INCORTX Academic Program

Computer Vision Foundations

From raw pixels to semantic understanding — master image formation, classical features, deep CNNs, and modern vision transformers to build systems that see.

Image Formation & Filtering · Feature Detection & Matching · CNNs & Transfer Learning · Object Detection & Segmentation · Vision Transformers (ViT) · 15-Week Semester

Weekly Curriculum Navigator

Explore the Semester Modules

A comprehensive overview of the semester curriculum, mapping out visual theory, mathematical foundations, and deep learning engineering applications across 13 instructional weeks, plus the Week 8 midterm and the Week 15 final project presentations.

Week 1

Course Overview & Image Formation

The CV-AI pipeline from camera sensor to prediction, images as 2D discrete signals, and color models from RGB to HSV and LAB.

The CV-AI Pipeline — from camera sensor to prediction
Images as 2D discrete signals: pixels, color models & spaces
App: Autonomous driving & medical imaging — how cameras feed AI
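Because an image is a 2D grid of discrete samples, a color-space conversion is simply a function applied independently to every pixel. A minimal sketch, assuming a toy 2x2 image stored as nested lists and using Python's standard-library `colorsys` module rather than OpenCV (OpenCV's `cv2.cvtColor` performs the same conversion on whole arrays, but scales 8-bit hue to 0..179):

```python
import colorsys

# A tiny 2x2 RGB "image": rows of (R, G, B) tuples with 8-bit values 0..255
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128)],
]

def rgb_image_to_hsv(img):
    """Convert every pixel to HSV; colorsys expects and returns floats in [0, 1]."""
    return [
        [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for (r, g, b) in row]
        for row in img
    ]

hsv = rgb_image_to_hsv(image)
for row in hsv:
    print([tuple(round(c, 3) for c in px) for px in row])
```

Pure red, green, and blue land at hues 0, 1/3, and 2/3, while the gray pixel has zero saturation: HSV separates "which color" from "how vivid" and "how bright", which RGB mixes together.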

Week 2

Spatial Filtering & Kernels

Convolution as a spatial operation, the sliding kernel intuition, and building an industrial edge-detection pipeline.

Convolution as a spatial operation — the sliding kernel intuition
Gaussian blur & edge detection: Sobel, LoG & Canny
App: Industrial surface inspection & document binarization
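The sliding-kernel intuition can be made concrete with a naive "valid" convolution in NumPy. A minimal sketch, assuming a synthetic step-edge image; a real pipeline would call an optimized routine such as OpenCV's `cv2.Sobel` instead of Python loops:

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2D convolution: flip the kernel, then slide it over the image."""
    k = np.flipud(np.fliplr(kernel))          # true convolution flips the kernel
    kh, kw = k.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the neighborhood currently under the kernel window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# Sobel kernel for horizontal intensity changes (responds to vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Synthetic image: dark left half, bright right half, i.e. one vertical edge
img = np.zeros((5, 6))
img[:, 3:] = 255.0

gx = convolve2d(img, sobel_x)
print(gx)
```

Only the windows that straddle the dark-to-bright boundary respond. The response is negative because true convolution flips the kernel; OpenCV's `cv2.filter2D` and CNN layers compute cross-correlation (no flip) and would report +1020 here. Edge detectors use the gradient magnitude, so the sign does not affect detection.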

Week 3

Feature Detection & Matching

What makes a good feature, Harris corners, SIFT & ORB descriptors, and building a keypoint matching pipeline.

What is a feature? Harris corners & scale-space theory
SIFT & ORB: descriptors, invariance & ratio-test matching
App: AR marker tracking, document scanning & product recognition
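The ratio test keeps a match only when the nearest neighbor is clearly closer than the second nearest, which filters out ambiguous matches on repetitive texture. A minimal sketch with toy 2D float vectors standing in for SIFT descriptors (an assumption; real SIFT descriptors are 128-dimensional, and binary ORB descriptors would be compared with Hamming distance rather than Euclidean distance):

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Brute-force matching with Lowe's ratio test.

    For each descriptor in desc_a, find its two nearest neighbors in desc_b
    (Euclidean distance) and keep the match only if the best is clearly
    better than the second best. Lowe's paper used 0.8; 0.75 is also common.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.argsort(dists)[:2]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches

desc_a = np.array([[0.1, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [5.0, 4.0], [5.0, 6.0]])
matches = ratio_test_matches(desc_a, desc_b)
print(matches)   # [(0, 0)]
```

The second query sits exactly between two candidates, so it is rejected as ambiguous. With OpenCV, the same check is usually applied to the output of a `cv2.BFMatcher` object's `knnMatch(des1, des2, k=2)`.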

Week 4

Image Alignment & Panoramas

Homography estimation with DLT, RANSAC for robust fitting under outliers, and automatic panorama stitching.

Homography matrix H & Direct Linear Transform (DLT)
RANSAC: robust estimation under outlier correspondences
App: Panorama stitching & medical image registration
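The DLT stacks two linear equations per correspondence into a matrix A and takes the nine homography entries as the null vector of A, found with an SVD. A minimal sketch in NumPy, assuming exact, noise-free correspondences and omitting the coordinate normalization (Hartley normalization) a robust implementation would add; inside RANSAC, a solver like this would be called on random 4-point samples:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from >= 4 point pairs via DLT + SVD."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h1 x + h2 y + h3) / (h7 x + h8 y + h9), and likewise for v
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # h is the right singular vector with the smallest singular value
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the scale ambiguity (assumes H[2, 2] != 0)

def apply_h(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical ground-truth homography, used only to generate correspondences
H_true = np.array([[1.2, 0.1, 5.0],
                   [0.05, 0.9, -3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100]], dtype=float)
dst = apply_h(H_true, src)

H_est = homography_dlt(src, dst)
print(np.round(H_est, 4))
```

With outliers in the matches, OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC)` combines this kind of estimator with RANSAC sampling and inlier scoring.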

Week 5

Neural Networks for Vision

From perceptron to MLP, backpropagation and the chain rule, and training a digit classifier from scratch in PyTorch.

From perceptron to MLP: activation, loss & the chain rule
Backpropagation & optimization — SGD, Momentum & Adam
App: Digit & handwriting classification as a vision foundation
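Backpropagation is the chain rule applied layer by layer. A minimal sketch for a single sigmoid neuron with a squared-error loss (a deliberately tiny, hypothetical setup, not the course's PyTorch digit classifier) that checks the hand-derived gradient against a finite difference and then takes one plain SGD step:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    """One neuron: z = w*x + b, a = sigmoid(z), loss = 0.5 * (a - y)^2."""
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - y) ** 2
    return z, a, loss

def backward(w, b, x, y):
    """Gradients by the chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    _, a, _ = forward(w, b, x, y)
    dL_da = a - y
    da_dz = a * (1.0 - a)          # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz * 1.0  # (dL/dw, dL/db), since dz/dw = x and dz/db = 1

w, b, x, y = 0.5, -0.2, 2.0, 1.0
dw, db = backward(w, b, x, y)

# Sanity check against a central finite difference
eps = 1e-6
num_dw = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
print(dw, num_dw)

# One plain SGD step moves the parameters against the gradient
lr = 0.1
_, _, loss_before = forward(w, b, x, y)
w, b = w - lr * dw, b - lr * db
_, _, loss_after = forward(w, b, x, y)
print(loss_before, loss_after)
```

PyTorch's autograd performs this bookkeeping automatically when `loss.backward()` is called; Momentum and Adam change only how the gradient is turned into a parameter update.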

Week 6

Convolutional Neural Networks (CNNs)

Conv layers, pooling, receptive fields, and the CNN feature hierarchy, traced from LeNet to AlexNet.

Convolutional layer: filters, stride, padding & feature maps
Pooling, receptive fields & the CNN feature hierarchy
App: Image classification from LeNet (1989) to AlexNet (2012)
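Two small formulas answer most "what shape comes out of this layer?" questions: the output size of a conv or pooling layer, and how the receptive field grows as layers are stacked. A minimal sketch, assuming square inputs and kernels and no dilation:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def receptive_field(layers):
    """Receptive field of stacked layers given as (kernel, stride) pairs.

    At each layer, r grows by (k - 1) * jump, where jump is the input-pixel
    distance between adjacent positions in that layer's input; jump then
    multiplies by the layer's stride.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r

print(conv_output_size(227, 11, stride=4))              # 55 (AlexNet-style conv1)
print(conv_output_size(32, 3, stride=1, padding=1))     # 32 ('same' 3x3 conv)
print(receptive_field([(3, 1), (3, 1), (3, 1)]))        # 7 (three stacked 3x3 convs)
print(receptive_field([(3, 1), (2, 2), (3, 1)]))        # 8 (conv, 2x2 pool, conv)
```

Three stacked 3x3 convolutions see a 7x7 region with 27C² weights instead of the 49C² of a single 7x7 layer (for C input and output channels), one reason later CNNs favor small kernels.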

Week 7

Modern CNNs & Transfer Learning

Skip connections in ResNet, efficient convolutions in MobileNet, and fine-tuning strategies for custom datasets.

Skip connections (ResNet) & efficient convolutions (MobileNet)
Transfer learning: pre-trained weights & fine-tuning strategies
App: Fine-tuning MobileNet for custom product quality control
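MobileNet's efficiency comes from factoring a standard convolution into a depthwise convolution (one k x k filter per input channel) followed by a 1x1 pointwise convolution. A minimal sketch that counts weights for both, assuming square kernels and ignoring biases:

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 32, 64)
sep = depthwise_separable_params(3, 32, 64)
print(std, sep, round(sep / std, 4))   # 18432 2336 0.1267
```

The ratio works out to 1/c_out + 1/k², so with 3x3 kernels the separable version needs roughly 8 to 9 times fewer weights once c_out is large; multiply-add counts per output pixel shrink by the same factor.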

Week 8

Midterm Examination (40%)

Comprehensive written exam covering Weeks 1–7. Open-note: one A4 cheat sheet, handwritten, double-sided.

Scope: Weeks 1–7 — classical CV, filters & CNN basics
Open-note: 1 handwritten A4 cheat sheet
Focus: problem-solving & architectural troubleshooting

Week 9

Object Detection Frameworks

Bounding boxes, IoU, anchor design, and a side-by-side comparison of two-stage vs. single-stage detector architectures.

Bounding boxes, IoU & the leap from classification to localization
Two-stage (Faster R-CNN) vs. single-stage (YOLO) architectures
App: Real-time YOLO for traffic monitoring & retail analytics
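IoU measures how well a predicted box overlaps a ground-truth box: the area of their intersection divided by the area of their union. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format with continuous coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # zero if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, about 0.1429
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))   # 0.0
```

Detectors use the same quantity to assign anchors to ground-truth boxes during training and to suppress duplicate boxes in non-maximum suppression; the PASCAL VOC benchmark counts a detection as correct when its IoU with the ground truth exceeds 0.5.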

Week 10

Semantic & Instance Segmentation

Pixel-wise classification with encoder-decoder U-Net, Mask R-CNN for instance masks, and medical image segmentation.

Pixel-wise classification: the encoder-decoder architecture (U-Net)
Instance segmentation (Mask R-CNN) & panoptic unification
App: Tumor segmentation & autonomous driving scene parsing

Week 11

3D Vision & Multi-View Geometry

Camera calibration, stereo disparity-to-depth, epipolar geometry, and Structure from Motion for 3D reconstruction.

Camera calibration, stereo vision & disparity-to-depth
Epipolar geometry, fundamental matrix & Structure from Motion
App: Autonomous vehicle depth perception & drone 3D reconstruction
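For a rectified stereo pair, depth is inversely proportional to disparity: Z = f · B / d, with focal length f in pixels, baseline B in meters, and disparity d in pixels. A minimal sketch with hypothetical camera parameters (700 px focal length, 12 cm baseline):

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px: focal length in pixels; baseline_m: camera separation in meters;
    disparity_px: horizontal pixel shift of the same point between the views.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px

print(disparity_to_depth(42, 700, 0.12))   # about 2.0 m
print(disparity_to_depth(21, 700, 0.12))   # halving disparity doubles depth: about 4.0 m
```

Because Z scales with 1/d, a one-pixel disparity error matters far more for distant points (small d) than for nearby ones, which is why stereo depth accuracy degrades with range.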

Week 12

Vision Transformers (ViT)

Scaled dot-product attention, multi-head attention, patch embeddings, and Swin's hierarchical shifted-window approach.

Self-attention mechanism: scaled dot-product & multi-head attention
ViT: patch embeddings; Swin: hierarchical shifted windows
App: Swin Transformer for satellite & aerial image analysis
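Scaled dot-product attention compares every query with every key, normalizes the scores with a softmax, and uses the resulting weights to mix the values: softmax(QKᵀ / √d_k) V. A minimal single-head sketch in NumPy, assuming random matrices in place of learned projections and four toy vectors in place of a real ViT patch-embedding sequence:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # (n_q, n_k) similarity matrix
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_patches, d_model = 4, 8          # toy sizes: 4 patch embeddings of width 8
X = rng.standard_normal((n_patches, d_model))
# Hypothetical random matrices standing in for learned W_Q, W_K, W_V
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out, attn = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape, attn.sum(axis=-1))
```

Multi-head attention runs several such heads in parallel on lower-dimensional projections and concatenates their outputs; Swin restricts the same computation to local windows that shift between consecutive layers.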

Week 13

Generative Models in Vision

VAE latent spaces, adversarial training in GANs, and diffusion model denoising — with medical image synthesis as a case study.

VAE & GAN: latent space learning & adversarial training dynamics
Diffusion models: forward noise process & learned reverse denoising
App: Medical image synthesis for data augmentation
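The forward process of a diffusion model has a closed form, so a noisy sample at any step t can be drawn directly from the clean image. A minimal sketch, assuming the linear beta schedule from the original DDPM paper (1e-4 to 0.02 over T = 1000 steps) and a toy 8x8 array as the image:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)           # linear noise schedule (DDPM-style)
alpha_bars = np.cumprod(1.0 - betas)          # alpha_bar_t = prod over s <= t of (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))     # toy "image" scaled to [-1, 1]
x_early, _ = q_sample(x0, 10, rng)           # still close to x0
x_late, _ = q_sample(x0, T - 1, rng)         # almost pure Gaussian noise
print(alpha_bars[10], alpha_bars[T - 1])
```

The learned reverse process trains a network to predict the noise `eps` from `x_t` and `t`, then removes it step by step; that network is not shown here.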

Week 14

Video Tracking & Edge AI

Optical flow with Lucas-Kanade, DeepSORT multi-object tracking, model compression, and ONNX/TensorRT edge deployment.

Optical flow (Lucas-Kanade) & multi-object tracking (DeepSORT)
Model optimization: pruning, quantization & knowledge distillation
App: Edge deployment with TensorRT & ONNX for embedded vision
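Quantization stores weights as low-precision integers plus a scale factor. A minimal sketch of symmetric per-tensor int8 quantization in NumPy, assuming random weights in place of a trained layer; toolchains such as TensorRT add calibration, per-channel scales, and quantized kernels on top of this basic idea:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)   # stand-in for a layer's weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()
print(q.dtype, w.nbytes, q.nbytes, float(max_err), float(scale))
```

Storage drops fourfold (float32 to int8), and the per-weight rounding error is bounded by half the scale.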

Week 15

Final Project Presentation (35%)

Pair-based mock technical system design interviews and end-to-end vision pipeline demonstrations.

Mock System Design Interview (teams of 2)
End-to-end vision pipeline demonstration
Peer evaluation & Q&A panel

Grading & Infrastructure

Course Assessment Breakdown

A transparent breakdown of how course performance is assessed. Grading rewards consistency, active participation, and solid visual AI engineering skills within a gamified, flipped-classroom model.

25%

In-Class Assignments

Weekly interactive lecture polling plus randomized team review presentations. Teams of 4 alternate between presenter and responder roles and build a short visualization each session.

40%

Midterm Exam

Held in Week 8 and covering Weeks 1–7: classical CV, filters, feature extraction, and CNN basics. Allowed aids: one handwritten, double-sided A4 sheet. Emphasis on problem-solving.

35%

Final Vision Project

A pair-based project (teams of 2) building a real-world image or video analysis system on a custom dataset. Evaluated via a mock technical system design interview in Week 15.

Ecosystem Stack

Course Learning Infrastructure

The software and platforms we will use to build projects, run experiments, and submit assignments throughout the semester.

🏫 Google Classroom (Materials & Submissions)

☁️ Google Colab (Cloud Coding Runtime)

📸 OpenCV (Image I/O & Classical CV)

🔥 PyTorch (Deep Learning)

🌟 torchvision (Pre-trained Models & Transforms)

🔬 scikit-image (Image Algorithms)

🧮 NumPy / Matplotlib (Math & Visualization)

🖼️ Pillow (Image Loading)

📋 ClickUp (Project Management)