INCORTX Academic Program

Computer Vision Foundations

From raw pixels to semantic understanding — master image formation, classical features, deep CNNs, and modern vision transformers to build systems that see.

Image Formation & Filtering | Feature Detection & Matching | CNNs & Transfer Learning | Object Detection & Segmentation | Vision Transformers (ViT) | 15-Week Semester

Week 1

Course Overview & Image Formation

The CV-AI pipeline from camera sensor to prediction, images as 2D discrete signals, and color models from RGB to HSV and LAB.

  • The CV-AI Pipeline — from camera sensor to prediction
  • Images as 2D discrete signals: pixels, color models & spaces
  • App: Autonomous driving & medical imaging — how cameras feed AI
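To make the color-model topic concrete, here is a minimal sketch that converts an 8-bit RGB pixel to HSV with Python's standard-library colorsys module. The helper name rgb8_to_hsv and its output convention (hue in degrees, saturation and value in [0, 1]) are illustrative assumptions, not part of the course materials.

```python
import colorsys

def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

red_hsv = rgb8_to_hsv(255, 0, 0)    # hue 0 degrees, fully saturated
blue_hsv = rgb8_to_hsv(0, 0, 255)   # hue near 240 degrees
gray_hsv = rgb8_to_hsv(128, 128, 128)  # zero saturation: no dominant color
print(red_hsv, blue_hsv, gray_hsv)
```

Libraries differ in how they scale these channels. OpenCV, for example, stores hue for 8-bit images in the range 0 to 179, so values should not be compared across libraries without rescaling.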
Week 2

Spatial Filtering & Kernels

Convolution as a spatial operation, the sliding kernel intuition, and building an industrial edge-detection pipeline.

  • Convolution as a spatial operation — the sliding kernel intuition
  • Gaussian blur & edge detection: Sobel, LoG & Canny
  • App: Industrial surface inspection & document binarization
Week 3

Feature Detection & Matching

What makes a good feature, Harris corners, SIFT & ORB descriptors, and building a keypoint matching pipeline.

  • What is a feature? Harris corners & scale-space theory
  • SIFT & ORB: descriptors, invariance & ratio-test matching
  • App: AR marker tracking, document scanning & product recognition
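Ratio-test matching can be sketched on toy descriptors as follows. The function name, the 2-D "descriptors", and the 0.75 threshold are illustrative assumptions; real SIFT or ORB descriptors are much longer, and ORB is normally compared with Hamming rather than Euclidean distance.

```python
import math

def ratio_test_matches(query, train, ratio=0.75):
    """Return (query_idx, train_idx) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Distance from this query descriptor to every train descriptor.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue
        (best, best_idx), (second, _) = dists[0], dists[1]
        # Keep the match only if the best is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches

query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (10.0, 10.0), (4.0, 6.0), (6.0, 4.0)]
matches = ratio_test_matches(query, train)
print(matches)
```

The first query descriptor has one clearly closest neighbor and is kept. The second has two equally close candidates, so it is rejected as ambiguous.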
Week 4

Image Alignment & Panoramas

Homography estimation with DLT, RANSAC for robust fitting under outliers, and automatic panorama stitching.

  • Homography matrix H & Direct Linear Transform (DLT)
  • RANSAC: robust estimation under outlier correspondences
  • App: Panorama stitching & medical image registration
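One way to reason about RANSAC is to ask how many random samples are needed before at least one is likely to be outlier-free. The sketch below uses the standard estimate N = log(1 - p) / log(1 - w^s), where w is the inlier ratio, s the sample size, and p the desired confidence. The function name and the example values are assumptions chosen for illustration.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Samples needed so that, with probability `confidence`,
    at least one sample contains only inliers.
    Assumes 0 < inlier_ratio < 1 and 0 < confidence < 1."""
    p_clean_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean_sample))

# A homography is estimated from 4 point correspondences per sample.
n_half_outliers = ransac_iterations(inlier_ratio=0.5, sample_size=4)
n_few_outliers = ransac_iterations(inlier_ratio=0.8, sample_size=4)
print(n_half_outliers, n_few_outliers)
```

The required count grows quickly as the inlier ratio drops, which is why filtering matches before RANSAC tends to pay off.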
Week 5

Neural Networks for Vision

From perceptron to MLP, backpropagation and the chain rule, and training a digit classifier from scratch in PyTorch.

  • From perceptron to MLP: activation, loss & the chain rule
  • Backpropagation & optimization — SGD, Momentum & Adam
  • App: Digit & handwriting classification as a vision foundation
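The chain rule behind backpropagation can be checked on the smallest possible network: a single sigmoid neuron with squared-error loss. The sketch below is an illustration under those assumptions (the function names and numeric values are hypothetical). It compares the analytic gradient with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def grad(w, b, x, y):
    """Analytic gradient via the chain rule: dL/da * da/dz * dz/dw."""
    a = sigmoid(w * x + b)
    dL_da = a - y
    da_dz = a * (1.0 - a)
    return dL_da * da_dz * x, dL_da * da_dz  # (dL/dw, dL/db)

# Central finite differences as an independent check on dL/dw.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
eps = 1e-6
numeric_dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
analytic_dw, analytic_db = grad(w, b, x, y)
print(numeric_dw, analytic_dw)

# One plain SGD step moves the weight against the gradient.
learning_rate = 0.1
w_new = w - learning_rate * analytic_dw
```

The same gradient check is a handy debugging habit when writing backpropagation by hand, before relying on automatic differentiation.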
Week 6

Convolutional Neural Networks (CNNs)

Conv layers, pooling, receptive fields, and the CNN feature hierarchy, traced from LeNet to AlexNet over roughly two decades.

  • Convolutional layer: filters, stride, padding & feature maps
  • Pooling, receptive fields & the CNN feature hierarchy
  • App: Image classification from LeNet to AlexNet, a journey of roughly two decades
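How stride and padding shape a feature map comes down to one formula, out = floor((n + 2p - k) / s) + 1 per spatial dimension. A minimal sketch follows; the function name and example sizes are illustrative assumptions.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, kernel=5))                        # 28
print(conv_output_size(32, kernel=3, padding=1))             # 32 (size preserved)
print(conv_output_size(224, kernel=7, stride=2, padding=3))  # 112
```

The same formula applies to pooling layers, with the pooling window as the kernel.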
Week 7

Modern CNNs & Transfer Learning

Skip connections in ResNet, efficient convolutions in MobileNet, and fine-tuning strategies for custom datasets.

  • Skip connections (ResNet) & efficient convolutions (MobileNet)
  • Transfer learning: pre-trained weights & fine-tuning strategies
  • App: Fine-tuning MobileNet for custom product quality control
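MobileNet's efficient convolutions are depthwise separable: a per-channel k x k filter followed by a 1 x 1 pointwise convolution. The parameter savings can be estimated with a short sketch. The function names and layer sizes are illustrative assumptions, and biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv per input channel, then a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, round(std / sep, 2))
```

For this hypothetical layer the separable version needs roughly one eighth of the weights, which is the kind of saving that makes on-device fine-tuning and inference practical.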
Week 8

Midterm Examination (40%)

Comprehensive written exam covering Weeks 1–7. Open-note: one A4 cheat sheet, handwritten, double-sided.

  • Scope: Weeks 1–7 — classical CV, filters & CNN basics
  • Open-note: 1 handwritten A4 cheat sheet
  • Focus: problem-solving & architectural troubleshooting
Week 9

Object Detection Frameworks

Bounding boxes, IoU, anchor design, and a side-by-side comparison of two-stage vs. single-stage detector architectures.

  • Bounding boxes, IoU & the leap from classification to localization
  • Two-stage (Faster R-CNN) vs. single-stage (YOLO) architectures
  • App: Real-time YOLO for traffic monitoring & retail analytics
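Intersection over Union (IoU) is the overlap measure used to compare a predicted box with a ground-truth box. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates with x2 > x1 and y2 > y1 (the function name and box format are illustrative choices):

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; width and height clamp to 0 when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Detectors typically count a prediction as correct when its IoU with a ground-truth box exceeds a chosen threshold, and the same measure drives non-maximum suppression.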
Week 10

Semantic & Instance Segmentation

Pixel-wise classification with encoder-decoder U-Net, Mask R-CNN for instance masks, and medical image segmentation.

  • Pixel-wise classification: the encoder-decoder architecture (U-Net)
  • Instance segmentation (Mask R-CNN) & panoptic unification
  • App: Tumor segmentation & autonomous driving scene parsing
Week 11

3D Vision & Multi-View Geometry

Camera calibration, stereo disparity-to-depth, epipolar geometry, and Structure from Motion for 3D reconstruction.

  • Camera calibration, stereo vision & disparity-to-depth
  • Epipolar geometry, fundamental matrix & Structure from Motion
  • App: Autonomous vehicle depth perception & drone 3D reconstruction
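For a rectified stereo pair, disparity converts to depth through Z = f * B / d, with focal length f in pixels, baseline B in meters, and disparity d in pixels. The sketch below illustrates the relation; the function name and the camera values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair, in meters."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline.
near = depth_from_disparity(70, focal_px=700, baseline_m=0.12)
far = depth_from_disparity(35, focal_px=700, baseline_m=0.12)
print(near, far)
```

Depth is inversely proportional to disparity, so halving the disparity doubles the estimated depth, and small disparity errors matter most for distant objects.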
Week 12

Vision Transformers (ViT)

Scaled dot-product attention, multi-head attention, patch embeddings, and Swin's hierarchical shifted-window approach.

  • Self-attention mechanism: scaled dot-product & multi-head attention
  • ViT: patch embeddings; Swin: hierarchical shifted windows
  • App: Swin Transformer for satellite & aerial image analysis
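Scaled dot-product attention computes softmax(Q K^T / sqrt(d)) V. The dependency-free sketch below shows a single head on plain Python lists. The function names and toy inputs are assumptions for illustration, and a real implementation would use batched tensor operations.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q, K: lists of d-dim vectors; V: one value vector per key."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))]
        )
    return outputs, weights

# Two identical keys receive equal weight, so the output averages the values.
out, attn = scaled_dot_product_attention(
    Q=[[1.0, 0.0]],
    K=[[1.0, 0.0], [1.0, 0.0]],
    V=[[0.0, 2.0], [4.0, 6.0]],
)
print(out, attn)
```

In a ViT, the queries, keys, and values are linear projections of patch embeddings, and multi-head attention runs several such heads in parallel before concatenating their outputs.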
Week 13

Generative Models in Vision

VAE latent spaces, adversarial training in GANs, and diffusion model denoising — with medical image synthesis as a case study.

  • VAE & GAN: latent space learning & adversarial training dynamics
  • Diffusion models: forward noise process & learned reverse denoising
  • App: Medical image synthesis for data augmentation
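The forward noise process of a diffusion model has a closed form: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). The sketch below assumes a DDPM-style linear beta schedule. The function names, schedule endpoints, and toy "image" are illustrative assumptions.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0 .. beta_{T-1}."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def q_sample(x0, betas, t, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar) * x0, (1 - abar) * I)."""
    abar = alpha_bar(betas, t)
    return [
        math.sqrt(abar) * x + math.sqrt(1.0 - abar) * rng.gauss(0.0, 1.0)
        for x in x0
    ]

betas = linear_betas(1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]            # toy "image" with three pixels
x_early = q_sample(x0, betas, 0, rng)    # almost unchanged
x_late = q_sample(x0, betas, 999, rng)   # almost pure Gaussian noise
print(alpha_bar(betas, 0), alpha_bar(betas, 999))
```

The learned reverse process is trained to undo these steps, typically by predicting the noise eps that was added at a given t.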
Week 14

Video Tracking & Edge AI

Optical flow with Lucas-Kanade, DeepSORT multi-object tracking, model compression, and ONNX/TensorRT edge deployment.

  • Optical flow (Lucas-Kanade) & multi-object tracking (DeepSORT)
  • Model optimization: pruning, quantization & knowledge distillation
  • App: Edge deployment with TensorRT & ONNX for embedded vision
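Quantization, one of the optimization techniques listed above, maps floating-point weights to low-precision integers. The sketch below shows symmetric per-tensor int8 quantization in plain Python. The function names and sample weights are assumptions, and deployment toolchains such as TensorRT use their own calibration procedures.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate floating-point values."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The reconstruction error is bounded by about half the scale, so tensors with a few large outliers lose more precision than tensors with a narrow value range.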
Week 15

Final Project Presentation (35%)

Pair-based mock technical system design interviews and end-to-end vision pipeline demonstrations.

  • Mock System Design Interview (teams of 2)
  • End-to-end vision pipeline demonstration
  • Peer evaluation & Q&A panel
Course Assessment Breakdown
How course performance is assessed. The scheme rewards consistency, active participation, and solid visual AI engineering skills through a gamified flipped-classroom model.
25%

In-Class Assignments

Weekly interactive lecture polling plus randomized team review presentations. Teams of 4 alternate presenter and responder roles, building a short visualization per session.

40%

Midterm Exam

Held in Week 8, covering Weeks 1–7: classical CV, filters, feature extraction, and CNN basics. Allowed: one handwritten, double-sided A4 sheet. Emphasis on problem-solving.

35%

Final Vision Project

A pair-based project (teams of 2) building a real-world image or video analysis system on a custom dataset. Evaluated via a mock technical system design interview in Week 15.

Course Learning Infrastructure
The software and platforms we will use to build, experiment, and submit assignments throughout the semester.
🏫 Google Classroom (Materials & Submissions)
☁️ Google Colab (Cloud Coding Runtime)
📸 OpenCV (Image I/O & Classical CV)
🔥 PyTorch (Deep Learning)
🌟 torchvision (Pre-trained Models & Transforms)
🔬 scikit-image (Image Algorithms)
🧮 NumPy / Matplotlib (Math & Visualization)
🖼️ Pillow (Image Loading)
📋 ClickUp (Project Management)