Computer Vision — Week 4

Image Alignment & Panoramas

Master the homography matrix, the Direct Linear Transform, and RANSAC — the geometric core of panoramic stitching and medical image registration.

Homography · Direct Linear Transform · RANSAC · Image Warping · Panoramic Stitching

Homography & the Direct Linear Transform

A homography is a 3×3 projective transformation that maps every pixel in one image plane to a corresponding pixel in another. The Direct Linear Transform (DLT) solves for this matrix from four or more point correspondences using a simple linear system.

After this section you will be able to
  • Define the homography matrix H and state its 8 degrees of freedom from first principles.
  • Apply the DLT algorithm to compute H by hand from 4 point correspondences.
  • Use cv2.findHomography to align two images and warp one onto the other.

How does your phone seamlessly stitch five separate shots into a single wide panorama — perfectly aligned, with no visible seam? The answer is a single 3×3 matrix called the homography, which mathematically describes how any plane in 3D projects onto a different image plane.

🎯
Why this matters: Every panorama app, AR marker tracker, document scanner, and satellite image mosaic uses homography under the hood. Without it, images cannot be aligned geometrically — you would only be able to overlay pixels naively, producing visible misalignments.
🔗
Think of it this way

A homography is like a GPS coordinate transform: just as GPS converts latitude/longitude from one datum to another so that the same physical location has a consistent address, H converts pixel coordinates from one camera view to another so that the same physical point lands at the right pixel in both images.

  • H (Homography): 3×3 projective transformation matrix mapping pixel coordinates between two planes, e.g. [[1,0,50],[0,1,30],[0,0,1]].
  • 8 (Degrees of Freedom): 9 matrix elements minus 1 scale normalization = 8 free parameters, e.g. translation, rotation, scale, shear, perspective.
  • 4 (Min Point Pairs): Each correspondence gives 2 equations; 4 pairs → 8 equations to solve 8 unknowns, e.g. 4 corner correspondences.
  • Homogeneous equality (∼): Equality up to scale in homogeneous coordinates; H is only defined up to a constant, e.g. 2H gives the same mapping as H.
[Figure: Source image, pixel coords (x, y) → 4+ point pairs, correspondences (x, y) ↔ (x′, y′) → DLT → SVD, solve Ah = 0 for h = vec(H) → H matrix (3×3), 8 DOF projective map → Warped image, aligned to target plane]

Homography computation pipeline: four point correspondences feed the DLT → SVD solver, producing H, which then warps the source image.

  • Matrix size: 3×3. H operates in 2D homogeneous coordinates.
  • Degrees of freedom: 8. Encodes translation, rotation, scale, shear, and perspective distortion.
  • Minimum pairs: 4. Each pair contributes 2 equations; 4 pairs solve all 8 unknowns.
  • Solver: SVD. The null-space of A gives h; the method is numerically stable, and exact when the correspondences are noise-free.

The Homography Equation

In homogeneous coordinates, a point $\mathbf{x} = (x, y, 1)^T$ in the source image maps to $\mathbf{x}' = (x', y', 1)^T$ in the destination image via:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} \sim H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}$$

The $\sim$ means equality up to scale. We normalize by setting $h_9 = 1$ (or $\|H\|_F = 1$), leaving 8 free unknowns.

$\mathbf{x}' \sim H\,\mathbf{x}$
  • $\mathbf{x}'$ (Destination): Pixel in target image (homogeneous)
  • $\sim$ (Up to scale): Equality in projective space
  • $H$ (Homography): 3×3 matrix, 8 DOF
  • $\mathbf{x}$ (Source): Pixel in source image (homogeneous)
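
As a minimal sketch of the mapping $\mathbf{x}' \sim H\mathbf{x}$, the helper below multiplies a point by H and divides by the third homogeneous coordinate. The name `apply_homography` is illustrative and not part of any library. It uses the example translation matrix [[1,0,50],[0,1,30],[0,0,1]] and checks that scaling H by 2 leaves the mapped point unchanged.

```python
def apply_homography(H, x, y):
    """Map (x, y) through a 3x3 homography H given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity (w = 0)")
    # Divide by w to return from homogeneous to Cartesian coordinates
    return xh / w, yh / w


H = [[1.0, 0.0, 50.0], [0.0, 1.0, 30.0], [0.0, 0.0, 1.0]]
H_scaled = [[2.0 * v for v in row] for row in H]  # same mapping, different scale

print(apply_homography(H, 10, 20))         # (60.0, 50.0)
print(apply_homography(H_scaled, 10, 20))  # (60.0, 50.0)
```

The division by w is what makes the overall scale of H irrelevant, which is why only 8 of the 9 entries are free.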

DLT: Linearising the Homography

For each correspondence $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$, expand $\mathbf{x}' \times H\mathbf{x} = \mathbf{0}$ to get two linear equations in the 9 entries of H:

$$\underbrace{\begin{bmatrix} -x_i & -y_i & -1 & 0 & 0 & 0 & x'_i x_i & x'_i y_i & x'_i \\ 0 & 0 & 0 & -x_i & -y_i & -1 & y'_i x_i & y'_i y_i & y'_i \end{bmatrix}}_{A_i} \mathbf{h} = \mathbf{0}$$

Stack all $A_i$ into matrix $A$ (size $2n \times 9$). The solution $\mathbf{h}$ is the right singular vector of $A$ corresponding to the smallest singular value, i.e. the last column of $V$ (equivalently, the last row of $V^T$) in the SVD $A = U\Sigma V^T$.

📝 Worked Example — Compute H from 4 point pairs

Background. Verify DLT gives the correct homography for a known pure translation.

Problem: Source corners: (0,0), (4,0), (4,4), (0,4). Destination corners: (2,1), (6,1), (6,5), (2,5). What is H?

  1. Identify the transform. Every point shifts by (+2, +1). So the true H is:
$$H = \begin{bmatrix}1 & 0 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1\end{bmatrix}$$
  2. Build $A_1$ for pair $(0,0)\to(2,1)$:
$$A_1 = \begin{bmatrix}0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1\end{bmatrix}$$
  3. Stack all 4 pairs → 8×9 matrix $A$, then SVD. The null-space vector $\mathbf{h}$, reshaped to 3×3 and divided by its last entry so that $h_9 = 1$, recovers $H$ above.
Result: H = [[1, 0, 2], [0, 1, 1], [0, 0, 1]] ✓
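
The three steps above can be sketched in NumPy as follows. This is a basic, unnormalized DLT intended only as an illustration: the function name `dlt_homography` is an assumption, and practical code would usually normalize the point coordinates before building $A$.

```python
import numpy as np


def dlt_homography(pts_src, pts_dst):
    """Estimate a 3x3 homography from n >= 4 point pairs (basic DLT)."""
    rows = []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # Two rows of A_i per correspondence, as in the DLT equation
        rows.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    A = np.asarray(rows, dtype=float)

    # h is the right singular vector for the smallest singular value:
    # the last row of Vt
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)

    # Remove the scale ambiguity by fixing h9 = 1
    return H / H[2, 2]


src = [(0, 0), (4, 0), (4, 4), (0, 4)]
dst = [(2, 1), (6, 1), (6, 5), (2, 5)]
H = dlt_homography(src, dst)
print(np.round(H, 6))
```

Up to floating-point rounding, the printed matrix should match the pure translation by (+2, +1) from step 1.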
✔ Quick Check

If the destination point for (4,0) is (6,1), verify that $H \cdot (4,0,1)^T = (6,1,1)^T$ using the H above.

H·[4,0,1]ᵀ = [1·4+0·0+2, 0·4+1·0+1, 0+0+1] = [6, 1, 1] ✓
⚠️
Common Mistake

Myth: "A homography works for any 3D scene — I can align two photos of a building taken from different positions."

Reality: A homography is only valid when all mapped points lie on a single plane, or when the camera undergoes pure rotation (no translation). For scenes with depth variation, a homography will produce parallax errors at depth discontinuities. RANSAC partially mitigates this by finding the dominant plane.

🤔 Pause & Predict

If you move the bottom-right destination point of the worked example from (6,5) to (7,5), stretching the right side of the image horizontally, which column of H do you predict will change most?

Implementation
Python · OpenCV — Compute & Apply Homography
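import cv2
import numpy as np

# Print small values as plain decimals instead of scientific notation
np.set_printoptions(suppress=True)

# Define 4 point correspondences (source → destination)
pts_src = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])
pts_dst = np.float32([[50, 20], [420, 0], [450, 290], [30, 310]])

# Default method (0): fit H to all given points, no outlier rejection.
# OpenCV normalizes the points internally before solving.
H, mask = cv2.findHomography(pts_src, pts_dst)
print("Homography H:")
print(np.round(H, 4))

# Warp source image to destination plane
img_src = cv2.imread('image.jpg')
if img_src is None:
    raise FileNotFoundError("image.jpg could not be read")
h, w = img_src.shape[:2]
img_warped = cv2.warpPerspective(img_src, H, (w, h))
cv2.imwrite('warped.jpg', img_warped)
Output (values rounded to 4 decimals; exact spacing may differ)
Homography H:
[[ 0.9164 -0.0785 50.    ]
 [-0.05    0.8443 20.    ]
 [-0.     -0.0004  1.    ]]
With exactly 4 correspondences, H reproduces every pair exactly. Because the first source point is the origin, the third column of H is its destination (50, 20) in homogeneous form.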
Key Takeaway

A homography encodes all geometric alignment between two planar views in exactly 8 numbers — and the DLT algorithm recovers those numbers from as few as 4 point correspondences by solving a linear system via SVD.

🛰️
Real-World Application

Google Maps Satellite Mosaic

Google's satellite layer stitches millions of aerial images taken at different times, altitudes, and angles. A standard way to align adjacent, roughly planar aerial tiles is a homography estimated from ground control points via DLT, the same algorithm you just computed by hand. Errors in H show up as visible seams or misaligned features at tile boundaries.

✦ Checkpoint Check Your Understanding — Homography & DLT

Q1 How many degrees of freedom does a homography H have, and why not 9?

Answer: 8 DOF. H is a 3×3 matrix with 9 entries, but it is defined only up to a global scale factor — multiplying every element by the same constant produces the exact same projective mapping. Fixing one entry (e.g., $h_9 = 1$) or normalizing $\|H\|_F = 1$ removes this ambiguity, leaving 8 free parameters.

Q2 You have 6 point correspondences. How many rows will the DLT matrix $A$ have, and which right singular vector of $A$ gives the solution?

Answer: Each correspondence contributes 2 rows, so $A$ is $12 \times 9$. The solution $\mathbf{h}$ is the right singular vector corresponding to the smallest singular value: the last row of $V^T$ (equivalently, the last column of $V$) in the SVD $A = U \Sigma V^T$. Having more than 4 pairs gives an overdetermined system solved in a least-squares sense.

Q3 Why does homography alignment fail when stitching photos of a 3D room (walls, furniture, depth variation) taken while walking sideways?

Answer: A homography is only valid for planar scenes or pure camera rotation. When the camera translates sideways, points at different depths project to different apparent positions in each view (parallax). A single H cannot simultaneously align objects at different depths — only the dominant plane fits well, and everything else shows ghosting or misalignment.

RANSAC — Random Sample Consensus

Real-world feature matching always contains mismatches. RANSAC (Random Sample Consensus) iteratively samples minimal subsets of correspondences, estimates a homography from each, and selects the model with the most geometric support — surviving even 50%+ outliers.

After this section you will be able to
  • Describe the four steps of the RANSAC algorithm and the role of each parameter.
  • Calculate the required number of iterations $N$ given outlier ratio $\varepsilon$, confidence $p$, and sample size $s$.
  • Apply cv2.findHomography(..., cv2.RANSAC) and interpret the inlier mask it returns.

Imagine you're trying to find the best-fit line through 100 data points — but 60 of them are completely wrong due to sensor noise. Least-squares would be dragged far off by the noise. RANSAC instead repeatedly picks 2 random points, draws a line, checks how many of the remaining 98 points agree, and keeps the line with the most "votes". It turns a messy problem into a robust one.

🎯
Why this matters: Feature detectors like ORB and SIFT produce 20–50% false matches even in good conditions. Without RANSAC, computing H from these raw matches yields a completely wrong homography — images won't align at all. RANSAC is what makes panoramic stitching reliable in practice.
🔗
Think of it this way

RANSAC is like a democratic jury system: rather than accepting the opinion of every witness (including unreliable ones), you repeatedly form a small jury of randomly selected witnesses, reach a verdict, then count how many other witnesses agree. The verdict that earns the most agreement across all witnesses is declared the truth.

  1. Sample $s$ random correspondences.
  2. Fit model H from the sample: H = DLT(s pairs), a 3×3 candidate.
  3. Count inliers (reprojection error < threshold τ).
  4. Keep the best H* (highest inlier count), then re-fit on all of its inliers.

RANSAC algorithm: random sample → fit → count inliers → keep best model. Repeat $N$ times.
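
The four steps can be sketched as a short NumPy loop. This is an illustrative sketch and not OpenCV's implementation: the helper names (`dlt`, `forward_error`, `ransac_homography`), the fixed iteration count, and the use of a one-directional reprojection error are all assumptions made for brevity. The synthetic data uses 30 exact inliers of a pure translation plus 20 gross outliers.

```python
import numpy as np


def dlt(src, dst):
    """Basic (unnormalized) DLT; returns H up to scale."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)


def forward_error(H, src, dst):
    """Distance between H*src and dst for every pair (inf if undefined)."""
    ph = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        proj = ph[:, :2] / ph[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
    return np.where(np.isfinite(err), err, np.inf)


def ransac_homography(src, dst, tau=3.0, n_iters=200, seed=0):
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)  # 1. sample
        H = dlt(src[idx], dst[idx])                        # 2. fit
        mask = forward_error(H, src, dst) < tau            # 3. count inliers
        if mask.sum() > best_mask.sum():                   # 4. keep best
            best_mask = mask
    if best_mask.sum() < 4:
        raise RuntimeError("no consensus set with at least 4 inliers")
    H = dlt(src[best_mask], dst[best_mask])                # re-fit on inliers
    return H / H[2, 2], best_mask


# Synthetic check: translation by (30, 20); last 20 pairs are gross outliers
rng = np.random.default_rng(1)
src = rng.uniform(0, 500, size=(50, 2))
dst = src + np.array([30.0, 20.0])
dst[30:] += rng.uniform(50, 200, size=(20, 2))

H_est, inlier_mask = ransac_homography(src, dst)
print(np.round(H_est, 3))
print(int(inlier_mask.sum()), "inliers of", len(src))
```

A fixed `n_iters` keeps the sketch short; in practice the iteration count is usually derived from the expected outlier ratio and the desired confidence.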

  • Outlier tolerance: 50%. RANSAC remains effective even when half of all matches are wrong.
  • Min sample size: 4. Minimum correspondences needed to uniquely determine H.
  • Reprojection threshold: 5 px. A typical inlier distance threshold τ passed to OpenCV's RANSAC (the library default is 3 px).
  • Iteration formula: log. N grows with log(1/(1 − p)), i.e. only logarithmically as the allowed failure probability 1 − p shrinks.

Number of Iterations $N$

To guarantee that at least one RANSAC sample is outlier-free with probability $p$, we need:

$$N = \frac{\log(1 - p)}{\log\!\left(1 - (1 - \varepsilon)^s\right)}$$
where
  • $N$ (Iterations): How many random samples to draw
  • $p$ (Confidence): Desired probability of at least one clean sample (e.g. 0.99)
  • $\varepsilon$ (Outlier ratio): Fraction of wrong matches (e.g. 0.5)
  • $s$ (Sample size): Minimum pairs for the model (4 for homography)
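
The iteration formula is easy to evaluate directly. The sketch below (the function name `ransac_iterations` is illustrative) rounds up to the next integer, since a fractional number of samples cannot be drawn, and treats the outlier-free case separately to avoid taking the logarithm of zero.

```python
import math


def ransac_iterations(p, eps, s):
    """Smallest integer N such that at least one all-inlier sample
    is drawn with probability >= p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must be in [0, 1)")
    w = (1.0 - eps) ** s          # probability that one sample is all inliers
    if w == 1.0:
        return 1                  # no outliers: the first sample is clean
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - w))


for eps in (0.30, 0.40, 0.50):
    print(eps, ransac_iterations(0.99, eps, 4))
# 0.3 17
# 0.4 34
# 0.5 72
```

The base of the logarithm cancels in the ratio, so natural or base-10 logs give the same N as long as numerator and denominator use the same base.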

Reprojection Error & Inlier Threshold

A correspondence $(x_i, y_i) \leftrightarrow (x'_i, y'_i)$ is called an inlier for candidate H if its symmetric reprojection error is below threshold $\tau$:

$$d(\mathbf{x}'_i,\, H\mathbf{x}_i)^2 + d(\mathbf{x}_i,\, H^{-1}\mathbf{x}'_i)^2 < \tau^2$$

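OpenCV's `ransacReprojThreshold` defaults to 3 px, and OpenCV tests only the forward error $d(\mathbf{x}'_i, H\mathbf{x}_i)$ against it rather than the symmetric sum above. After the main loop, the best H is re-estimated from all inliers (least-squares DLT, which OpenCV follows with a Levenberg-Marquardt refinement), giving a more accurate final result.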
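
A minimal NumPy sketch of the symmetric test in the inequality above, assuming an invertible H. The helper names `project` and `symmetric_transfer_error` are hypothetical, and the three sample pairs are made up for illustration: one exact match, one match that is 1 px off, and one gross mismatch.

```python
import numpy as np


def project(H, pts):
    """Apply H to an (n, 2) array of points and dehomogenize."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]


def symmetric_transfer_error(H, src, dst):
    """Squared forward error plus squared backward error per pair."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    fwd = np.sum((dst - project(H, src)) ** 2, axis=1)
    bwd = np.sum((src - project(np.linalg.inv(H), dst)) ** 2, axis=1)
    return fwd + bwd


H = np.array([[1.0, 0.0, 30.0], [0.0, 1.0, 20.0], [0.0, 0.0, 1.0]])
src = np.array([[0, 0], [10, 5], [100, 50]], dtype=float)
dst = np.array([[30, 20], [41, 25], [300, 300]], dtype=float)

tau = 3.0
err2 = symmetric_transfer_error(H, src, dst)
inliers = err2 < tau ** 2
print(inliers)  # [ True  True False]
```

The second pair contributes 1 px² in each direction, so its symmetric error is 2 px², below $\tau^2 = 9$.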

📝 Worked Example — Computing iterations N

Background. ORB matcher reports 200 matches, roughly half of which are expected to be wrong.

Problem: $\varepsilon = 0.50$, $p = 0.99$, $s = 4$ (homography). How many RANSAC iterations are needed?

  1. Prob. of one clean sample:
$$(1-\varepsilon)^s = (1-0.50)^4 = 0.50^4 = 0.0625$$
  2. Substitute into formula (base-10 logs here; any base works if used consistently):
$$N = \frac{\log(1-0.99)}{\log(1-0.0625)} = \frac{\log(0.01)}{\log(0.9375)}$$
$$= \frac{-2.0}{-0.02803} \approx 71.4$$
Rounding up, 72 RANSAC iterations give 99% confidence at 50% outliers.
✔ Quick Check

If the outlier ratio drops to $\varepsilon = 0.30$ (better matcher), approximately how many iterations are needed at $p = 0.99$, $s = 4$?

N = log(0.01)/log(1−0.7⁴) = log(0.01)/log(1−0.2401) = −2/log₁₀(0.7599) ≈ −2/−0.1192 ≈ 16.8 → 17 iterations.
💡
Key Insight

N is surprisingly small. Even at 50% outliers, only ~72 iterations are needed. This is why RANSAC is fast in practice: it does not need to try all $\binom{n}{4}$ possible 4-tuples (which for n=200 matches would be ~64.7 million). The logarithmic formula shows that N grows slowly even as $\varepsilon$ approaches 0.5.

However, as $\varepsilon \to 1$ (nearly all outliers), $N$ grows rapidly to infinity — RANSAC is not a magic bullet for extremely noisy data.

🤔 Pause & Predict

If you increase the outlier ratio from 30% to 60%, do you predict the required iterations will double, quadruple, or grow more than 4×? Use the formula to reason before checking.

Implementation
Python · OpenCV — RANSAC Homography with Inlier Mask
import cv2
import numpy as np

# Load the two overlapping views (file names are placeholders)
img1 = cv2.imread('left.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('right.jpg', cv2.IMREAD_GRAYSCALE)

# Detect ORB features and compute descriptors
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors with brute-force Hamming distance
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)[:500]

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# RANSAC: rejects outliers, returns binary inlier mask
H, mask = cv2.findHomography(
    pts1, pts2,
    method=cv2.RANSAC,
    ransacReprojThreshold=5.0
)
inliers = int(mask.sum())
print(f"Inliers: {inliers}/{len(mask)} ({100*inliers/len(mask):.1f}%)")
Output
Inliers: 312/500 (62.4%)
Key Takeaway

RANSAC turns an impossible least-squares problem (outlier-contaminated data) into a tractable one by repeatedly betting on small clean subsets. The logarithmic iteration formula shows that only ~72 samples are needed to reach 99% confidence even at 50% outliers.

🚗
Real-World Application

Autonomous Driving: Lane-Change Ego-Motion Estimation

Self-driving vehicles estimate their motion between frames by matching visual features (ORB/SIFT) between consecutive camera images. Road markings, distant trees, and moving cars all generate feature matches — but only the static background is geometrically consistent. RANSAC robustly finds the essential matrix or homography of the dominant static background, discarding matches on moving objects (pedestrians, other vehicles) as outliers, enabling safe ego-motion recovery even in busy traffic.

✦ Checkpoint Check Your Understanding — RANSAC

Q1 If $\varepsilon = 0.40$, $p = 0.99$, and $s = 4$, calculate the required number of RANSAC iterations N.

Answer: $(1-\varepsilon)^s = 0.60^4 = 0.1296$. $N = \log(0.01)/\log(1-0.1296) = \log(0.01)/\log(0.8704) \approx -2 / (-0.0603) \approx 33.2$ (base-10 logs). Rounding up, 34 iterations are needed.

Q2 A RANSAC run returns a mask array with values [1,1,0,1,0,0,1,1,0,1]. What does a 0 in the mask mean, and how many inliers were found?

Answer: A 0 means that correspondence is an outlier — its reprojection error under the best H exceeded the threshold τ. There are 6 inliers (six 1s in the mask).

Q3 Why does RANSAC re-estimate H using all inliers at the end, rather than just returning the H from the best random sample?

Answer: The H estimated from just $s = 4$ points fits those points exactly but is potentially noisy (sensitive to small errors in those 4 points). Re-estimating H via least-squares DLT using all inlier correspondences averages out measurement noise and produces a statistically more accurate, lower-variance homography, for the same reason that adding clean data generally tightens a least-squares fit.

Panoramas & Medical Image Registration

Homography and RANSAC are the engines behind two major application domains: consumer panoramic photography and clinical multimodal image registration. Both reduce to the same geometric pipeline — detect, match, estimate, warp, blend — but differ in projection model, transformation type, and accuracy requirements.

After this section you will be able to
  • Build a complete two-image panorama pipeline in Python using ORB, RANSAC, and warpPerspective.
  • Explain why cylindrical projection reduces edge stretching in wide-angle panoramas.
  • Compare rigid, affine, and deformable registration and justify which model fits a given medical imaging task.

Your phone's panorama mode handles geometry that took photogrammetrists decades to formalize — and the same algorithm, with minor modifications, aligns a CT scan of a tumor taken last month with an MRI scan taken today, enabling a surgeon to see both in a single fused view.

🎯
Why this matters: Panoramic stitching appears in every mapping application, AR headset, and 360° camera. Medical image registration is clinically critical for radiotherapy planning, surgical navigation, and longitudinal disease monitoring — errors in alignment can mean the difference between treating a tumor and missing it.
🔗
Think of it this way

Panoramic stitching is like assembling a jigsaw puzzle from photographs: each piece (image) overlaps with its neighbors, and the homography tells you exactly how to slide, rotate, and stretch each piece so the edges join seamlessly. Medical registration is the same puzzle, but the pieces were photographed by two completely different cameras (CT and MRI), so you must first learn how to translate one "color language" into the other.

  1. Detect: ORB / SIFT keypoints + descriptors.
  2. Match: BFMatcher / FLANN + ratio test.
  3. RANSAC: reject outliers, estimate H.
  4. Warp: warpPerspective (inverse mapping).
  5. Blend: linear alpha / multi-band blend.
  6. Panorama: seamless wide-angle composite image.

Complete panorama stitching pipeline. Steps 1–3 were covered in Topics 1–2; Steps 4–6 translate H into the final composite.


Inverse Warping & Cylindrical Projection

Warping fills every pixel of the output canvas by back-projecting through $H^{-1}$ (inverse mapping avoids holes):

$$\begin{bmatrix} x_{src} \\ y_{src} \\ 1 \end{bmatrix} \sim H^{-1} \begin{bmatrix} x_{dst} \\ y_{dst} \\ 1 \end{bmatrix}$$
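
To make inverse mapping concrete, here is a small NumPy sketch that fills each output pixel by looking up its source location through $H^{-1}$. It is a simplified illustration and not how `cv2.warpPerspective` is implemented: it assumes a 2D grayscale array, uses nearest-neighbour sampling instead of interpolation, and assumes no output pixel maps to infinity. The name `warp_inverse` is hypothetical.

```python
import numpy as np


def warp_inverse(img, H, out_shape):
    """Nearest-neighbour inverse warp of a 2D array onto an output canvas."""
    h_out, w_out = out_shape
    H_inv = np.linalg.inv(H)

    # Homogeneous coordinates of every destination pixel (3 x N)
    ys, xs = np.mgrid[0:h_out, 0:w_out]
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Back-project into the source image and dehomogenize
    src = H_inv @ dst
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)

    # Only sample where the source location falls inside the image
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros(h_out * w_out, dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(h_out, w_out)


img = np.arange(12).reshape(3, 4)
H = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # shift right by 2 px
warped = warp_inverse(img, H, (3, 6))
print(warped)
```

Every output pixel is visited exactly once, so the result has no holes; pixels whose source location lies outside the image are simply left at zero.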

For wide panoramas (>90°), planar projection causes severe edge stretching. Cylindrical projection reduces this by mapping each image onto a virtual cylinder of radius $f$ before stitching:

$$x_{cyl} = f \cdot \arctan\!\left(\frac{x - c_x}{f}\right) + c_x, \quad y_{cyl} = f \cdot \frac{y - c_y}{\sqrt{(x-c_x)^2 + f^2}} + c_y$$

where $f$ is the focal length in pixels and $(c_x, c_y)$ is the principal point (image centre).
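
The cylindrical mapping can be written as a two-line function. The sketch below (function name `to_cylindrical` is illustrative) evaluates the formula for a single pixel using a hypothetical camera with $f = 800$ px and principal point (800, 450).

```python
import math


def to_cylindrical(x, y, f, cx, cy):
    """Map a pixel (x, y) to cylindrical image coordinates."""
    dx = x - cx
    x_cyl = f * math.atan(dx / f) + cx
    y_cyl = f * (y - cy) / math.hypot(dx, f) + cy
    return x_cyl, y_cyl


x_cyl, y_cyl = to_cylindrical(1000, 450, f=800, cx=800, cy=450)
print(round(x_cyl, 1), y_cyl)                           # 996.0 450.0
print(to_cylindrical(800, 450, f=800, cx=800, cy=450))  # (800.0, 450.0)
```

A pixel 200 px right of centre moves only about 4 px inward, while the principal point maps to itself.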

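Transform          | DOF | Preserves           | Min Pairs
-------------------|-----|---------------------|------------
Translation        | 2   | Shape, size, angles | 1
Rigid (Euclidean)  | 3   | Shape, size         | 2
Similarity         | 4   | Shape, angles       | 2
Affine             | 6   | Parallel lines      | 3
Homography         | 8   | Straight lines      | 4
Deformable         | -   | Nothing global      | Dense field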

Medical Image Registration

Aligning images from different modalities (CT, MRI, PET) requires choosing the correct transformation model based on the anatomy and imaging protocol:

  • Rigid (3 DOF 2D / 6 DOF 3D): Skull, fixed joints — bone doesn't deform.
  • Affine (6–12 DOF): Brain atlas alignment — allows global scaling differences between subjects.
  • Deformable / Diffeomorphic: Soft tissue (liver, lungs) — requires a dense displacement field that warps every voxel independently.

The similarity metric also differs: Sum of Squared Differences (SSD) for same-modality, Mutual Information (MI) for cross-modality (e.g., CT↔MRI).
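
The difference between the two metrics can be seen on a toy example. In the sketch below, an intensity-inverted copy of an image stands in for a second modality: the structure is identical but the intensities are not. This is an assumption-laden illustration (random-noise "anatomy", a simple joint-histogram MI estimate with 32 bins, hypothetical function names), not a clinical registration metric.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared intensity differences."""
    return float(np.sum((a.astype(float) - b.astype(float)) ** 2))


def mutual_information(a, b, bins=32):
    """Mutual information (in nats) estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
fixed = rng.integers(0, 256, size=(64, 64))
aligned = 255 - fixed                    # same structure, inverted intensities
misaligned = np.roll(aligned, 5, axis=1)

mi_aligned = mutual_information(fixed, aligned)
mi_misaligned = mutual_information(fixed, misaligned)
print("MI  aligned / misaligned:", mi_aligned, mi_misaligned)
print("SSD aligned / misaligned:", ssd(fixed, aligned), ssd(fixed, misaligned))
```

MI is much higher for the aligned pair, whereas SSD stays large for it because the intensities differ by construction, which is why SSD is a poor guide across modalities.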

📝 Worked Example — Cylindrical projection pixel

Background. Camera: $f = 800$ px, image size $1600 \times 900$, so $c_x = 800$, $c_y = 450$.

Problem: Map pixel $(x, y) = (1000, 450)$ to cylindrical coordinates.

  1. x-component (arctan):
$$x_{cyl} = 800 \cdot \arctan\!\left(\frac{1000 - 800}{800}\right) + 800 = 800 \cdot \arctan(0.25) + 800$$
$$\approx 800 \cdot 0.2450 + 800 = 996.0$$
  2. y-component (no change since $y = c_y$):
$$y_{cyl} = 800 \cdot \frac{450 - 450}{\sqrt{200^2 + 800^2}} + 450 = 450$$
Result: $(1000, 450) \xrightarrow{\text{cyl}} (996.0, 450.0)$; pixels near the centre distort minimally.
✔ Quick Check

For the same camera, what is $x_{cyl}$ for a pixel at $(x, y) = (800, 450)$ (the image centre)?

x_cyl = 800 · arctan((800−800)/800) + 800 = 800 · arctan(0) + 800 = 800 · 0 + 800 = 800. The centre maps to itself. ✓
⚠️
Common Mistake

Myth: "I can stitch any two overlapping photos into a panorama using planar homography."

Reality: Planar homography works well only for small fields of view and when the camera rotates without translating. Camera translation introduces parallax, and wide fields of view suffer severe edge stretching under planar projection. Cylindrical or spherical projection is the usual remedy for wide-FOV stitching, which is why panorama software commonly warps images onto a cylinder or sphere internally, even though the final output looks "flat."

🤔 Pause & Predict

If you increase the overlap between two images from 20% to 40%, do you predict the RANSAC homography estimate will be more or less accurate? Why?

Hint: think about how many inlier matches are available in the overlap region.

Implementation
Python · OpenCV — Two-Image Panorama Pipeline
import cv2
import numpy as np

def stitch_pair(img1, img2):
    # 1. Detect and describe ORB features
    orb = cv2.ORB_create(3000)
    kp1, d1 = orb.detectAndCompute(img1, None)
    kp2, d2 = orb.detectAndCompute(img2, None)

    # 2. Match with BFMatcher (Hamming for binary desc.)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(d1, d2), key=lambda m: m.distance)[:300]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 3. RANSAC homography mapping img2 coordinates into img1's frame
    H, mask = cv2.findHomography(pts2, pts1, cv2.RANSAC, 5.0)

    # 4. Warp img2 onto img1's plane (canvas twice as wide as img1)
    h, w = img1.shape[:2]
    result = cv2.warpPerspective(img2, H, (w * 2, h))

    # 5. Copy img1 into the left half (plain overwrite, no blending)
    result[:h, :w] = img1
    return result

panorama = stitch_pair(cv2.imread('left.jpg'), cv2.imread('right.jpg'))
cv2.imwrite('panorama.jpg', panorama)
Output
Writes panorama.jpg: a canvas twice as wide as left.jpg and equally tall, holding the warped right image with the left image copied over the left half.
Key Takeaway

Panoramic stitching and medical image registration are both solved by the same geometric pipeline — detect, match, RANSAC, warp — but the correct transformation model must match the scene: homography for flat planes, cylindrical for wide fields, and deformable for soft tissue.

🏥
Real-World Application

Multimodal Medical Image Fusion (CT + MRI)

Radiation oncologists plan cancer treatment by fusing a CT scan (shows bone and tumour density) with an MRI scan (shows soft-tissue contrast) of the same patient. A rigid or affine registration aligns the two volumes so that anatomical landmarks coincide — then the oncologist sees bone detail from CT and tissue detail from MRI overlaid in one view, enabling precise tumour delineation and treatment field planning. Mutual Information maximization drives the registration metric, since CT and MRI pixel intensities are correlated but not identical.

✦ Checkpoint Check Your Understanding — Panoramas & Registration

Q1 Why does warpPerspective use inverse mapping (back-projecting from destination to source) rather than forward mapping?

Answer: Forward mapping (applying H directly to each source pixel) can leave holes in the destination image — multiple source pixels may map to the same destination location while other destination pixels receive no contribution. Inverse mapping iterates over every destination pixel, applies $H^{-1}$ to find the corresponding source location, and samples there (with interpolation), guaranteeing every output pixel is filled.

Q2 A radiologist wants to align a pre-surgery MRI brain scan with a post-surgery MRI of the same patient. The brain volume may have shifted and rotated slightly due to repositioning in the scanner. Which transformation model is most appropriate, and why?

Answer: Rigid registration (6 DOF in 3D: 3 translations + 3 rotations). The brain is enclosed in the skull and does not deform between scans. The difference between pre- and post-surgery positioning is purely a rigid body motion — the brain retains its shape. Using affine or deformable would over-fit and introduce spurious deformations, misrepresenting the true anatomy.

Q3 Why is Mutual Information (MI) preferred over Sum of Squared Differences (SSD) as the similarity metric for CT–MRI registration?

Answer: SSD assumes that aligned pixels have identical intensities. CT and MRI measure different physical properties (X-ray attenuation vs. hydrogen spin density), so the same tissue has completely different intensity values in each modality — SSD would be maximally confused. Mutual Information measures statistical dependence between the two intensity distributions: when the images are correctly aligned, knowing one image's intensity tells you something about the other's, maximizing MI even when the absolute values differ.


What You've Mastered

Three topics, one geometric toolkit: homography encodes the alignment, DLT solves for it, and RANSAC makes both robust in the real world.

🔢

Homography Matrix H

A 3×3 projective transform with 8 DOF that maps every pixel in one image plane onto another. The same matrix handles panoramas, AR markers, and document scanning.

⚙️

Direct Linear Transform

DLT stacks each correspondence into rows of matrix A, then extracts H as the last right singular vector of A via SVD. Just 4 point pairs solve all 8 unknowns.

🎯

RANSAC Robust Fitting

Randomly sample s points → fit H → count inliers (reprojection < τ) → repeat N times. Survives 50%+ outliers; N from the log formula guarantees 99% confidence.

🌅

Panorama & Registration

Detect → match → RANSAC → warpPerspective → blend. Medical registration extends this to deformable transforms and cross-modality metrics like Mutual Information.

Coming up — Week 5

Neural Networks for Vision: from the perceptron to the full backpropagation algorithm, and why gradient descent on a loss surface finds useful image representations.

Go Deeper

Primary textbook, geometry reference, OpenCV documentation, and lecture video — everything needed to master image alignment from first principles.

Textbook

Szeliski — Computer Vision Ch. 8

Image Alignment and Stitching. Covers DLT derivation, normalized DLT, RANSAC, cylindrical and spherical projection, and multi-image blending in full mathematical detail.

→ Primary reference for this week
Textbook

Hartley & Zisserman — MVG Ch. 4

Multiple View Geometry in Computer Vision. The definitive treatment of homographies, the DLT algorithm, normalized DLT, and the algebraic distance minimized by SVD.

→ For rigorous geometric proofs
Docs

OpenCV — findHomography

Official documentation for cv2.findHomography: method flags (LMEDS, RANSAC, RHO), confidence, threshold, and the returned inlier mask. Includes worked code examples.

→ Implementation reference
Video

Cyrill Stachniss — RANSAC & Image Stitching

Clear derivation of the RANSAC iteration count formula with visual examples of inlier/outlier splitting. Highly recommended for exam preparation on the N formula.

→ youtube.com/@CyrillStachniss

Week 4 Exercises

8 exercises covering homography computation, DLT, RANSAC iteration counting, image warping, and a full panorama pipeline — mirroring the exam calculation style exactly.

1
Theory · Homography Easy

Degrees of Freedom & Homogeneous Coordinates

Answer the following about a 2D homography H:

  1. State the number of degrees of freedom of H and explain why it is not 9.
  2. A source point is $(3, 5)$. Write it in homogeneous coordinates.
  3. If $H \cdot (3, 5, 1)^T = (9, 10, 2)^T$, what are the Cartesian destination coordinates?

Part 1: H has 8 DOF. The 3×3 matrix has 9 entries, but H is only defined up to scale — multiplying all entries by any non-zero constant gives the same projective mapping. Fixing one entry (e.g., $h_{33} = 1$) or normalising $\|H\|_F = 1$ removes this ambiguity.

Part 2: Append a 1: $(3, 5, 1)^T$.

Part 3: Divide by the third component: $(9/2,\; 10/2) = (4.5,\; 5.0)$.

2
Code · Direct Linear Transform Medium

Implement DLT from Scratch

Write a Python function dlt_homography(pts_src, pts_dst) that computes H without using cv2.findHomography. Then verify it against OpenCV's result on the same point pairs.

  1. For each of the 4 correspondences, build the 2-row sub-matrix $A_i$ and stack them into $A$ (shape 8×9).
  2. Compute the SVD of A: U, S, Vt = np.linalg.svd(A).
  3. Extract h as the last row of Vt and reshape to 3×3. Normalise by $h[2,2]$.
  4. Compare your H with cv2.findHomography(pts_src, pts_dst)[0]. Max absolute difference should be < 1e-6.

For correspondence $(x, y) \to (x', y')$, the two rows of $A_i$ are:

$$A_i = \begin{bmatrix} -x & -y & -1 & 0 & 0 & 0 & x'x & x'y & x' \\ 0 & 0 & 0 & -x & -y & -1 & y'x & y'y & y' \end{bmatrix}$$

The SVD solution is h = Vt[-1] (the right singular vector for the smallest singular value). Reshape to 3×3, divide by H[2,2] to normalise.

3
Theory · RANSAC Medium

RANSAC Iteration Count

Use the formula $N = \log(1-p) / \log(1-(1-\varepsilon)^s)$ to answer:

  1. Compute N for $\varepsilon = 0.35$, $p = 0.99$, $s = 4$ (homography). Show all steps.
  2. Compute N for $\varepsilon = 0.60$, $p = 0.99$, $s = 4$. How does raising the outlier fraction from 35% to 60% affect N?
  3. If you increase $p$ from 0.99 to 0.999 (keeping $\varepsilon = 0.5$, $s = 4$), by approximately what factor does N increase?

Part 1: $(1-0.35)^4 = 0.65^4 = 0.1785$. $N = \log(0.01)/\log(1-0.1785) = \log(0.01)/\log(0.8215) \approx -2 / (-0.0854) \approx \mathbf{23.4}$ (base-10 logs). Use $N = 24$.

Part 2: $(0.40)^4 = 0.0256$. $N = \log(0.01)/\log(0.9744) \approx -2/(-0.01126) \approx \mathbf{177.6}$. Use $N = 178$. Going from 35% to 60% outliers increases N by roughly 7.5×.

Part 3: At $p = 0.99$: $N_{99} \approx 72$. At $p = 0.999$: $N = \log(0.001)/\log(0.9375) \approx -3/(-0.02803) \approx 107$. Factor $\approx \mathbf{1.5 \times}$.

4
Code · RANSAC Homography Medium

Compare Plain DLT vs. RANSAC on Noisy Matches

Generate synthetic point correspondences with roughly 29% random outliers (8 of 28 matches) and compare the homography quality with and without RANSAC.

  1. Define a known H (pure translation: $t_x = 30$, $t_y = 20$). Generate 20 inlier pairs using H plus small Gaussian noise ($\sigma = 1$ px).
  2. Add 8 random outlier pairs with uniform random coordinates. Combine to get 28 total matches.
  3. Compute H using plain DLT: cv2.findHomography(pts_src, pts_dst) (no RANSAC).
  4. Compute H using RANSAC: cv2.findHomography(..., cv2.RANSAC, 5.0).
  5. Measure the mean reprojection error on the 20 true inlier pairs for each method. Report the difference.

True H = np.array([[1,0,30],[0,1,20],[0,0,1]], dtype=float). Inliers: apply H to random source points, add np.random.randn noise. Outliers: use np.random.uniform(0, 500, (8, 2)) for both src and dst independently.

Reprojection error: for each inlier pair apply the estimated H, convert from homogeneous, then compute Euclidean distance to true dst. Plain DLT typically yields errors of 10–50 px; RANSAC typically < 2 px.
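The data generation and the error metric can be sketched with NumPy alone, as below. The helper names `apply_homography` and `mean_reprojection_error`, the seed, and the 0–500 px coordinate range for the inlier sources are assumptions made for illustration. The resulting `pts_src` and `pts_dst` arrays are what steps 3 and 4 pass to cv2.findHomography.

```python
import numpy as np


def apply_homography(H, pts):
    """Map N x 2 points through H and convert back from homogeneous form."""
    pts = np.asarray(pts, dtype=float)
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]


def mean_reprojection_error(H, pts_src, pts_dst):
    """Mean Euclidean distance between H-projected sources and targets."""
    diff = apply_homography(H, pts_src) - np.asarray(pts_dst, dtype=float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


rng = np.random.default_rng(0)              # fixed seed, chosen arbitrarily
H_true = np.array([[1, 0, 30], [0, 1, 20], [0, 0, 1]], dtype=float)

# 20 inliers: true mapping plus Gaussian noise with sigma = 1 px.
src_in = rng.uniform(0, 500, (20, 2))
dst_in = apply_homography(H_true, src_in) + rng.normal(0.0, 1.0, (20, 2))

# 8 outliers: source and destination drawn independently.
src_out = rng.uniform(0, 500, (8, 2))
dst_out = rng.uniform(0, 500, (8, 2))

pts_src = np.vstack([src_in, src_out])
pts_dst = np.vstack([dst_in, dst_out])

# Baseline: even the true H leaves an error at the noise level on the inliers.
print(mean_reprojection_error(H_true, src_in, dst_in))
```

Evaluating the metric with the true H first gives a floor set by the noise, which is a useful reference when judging the two estimated matrices.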

5
Theory · Panorama Pipeline Medium

Justify Each Pipeline Step

For each step of the panorama pipeline below, explain why it is necessary and what goes wrong if it is skipped:

  1. Feature detection (ORB / SIFT) before matching.
  2. Ratio test (Lowe's test, ratio = 0.75) after nearest-neighbour matching.
  3. RANSAC before computing the final H.
  4. Inverse mapping in warpPerspective rather than forward mapping.

1. Feature detection: Raw pixel comparison is sensitive to brightness changes and not repeatable. Keypoints with descriptors give compact, distinctive, viewpoint-tolerant representations.

2. Ratio test: Rejects ambiguous matches where the nearest and second-nearest descriptors have similar distances — discarding matches where one descriptor could plausibly match two different points reduces the outlier ratio before RANSAC.

3. RANSAC: Even after the ratio test, a sizeable share of matches (often 20–40%) is still wrong. A least-squares DLT fit over all matches is pulled far off by these outliers and yields a wildly inaccurate H. RANSAC isolates the geometrically consistent inliers.

4. Inverse mapping: Forward mapping leaves holes (unsampled destination pixels). Inverse mapping guarantees every output pixel is filled by looking up the corresponding source location.
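The hole problem in answer 4 is easy to reproduce on a toy case. The sketch below upsamples a 4×4 all-white image by a factor of 2 in both directions; the setup is invented for illustration and uses nearest-neighbour lookup rather than the interpolation a real warp would use.

```python
import numpy as np

src = np.full((4, 4), 255, dtype=np.uint8)  # all-white toy image
scale = 2
out_shape = (8, 8)

# Forward mapping: push each source pixel to its destination location.
forward = np.zeros(out_shape, dtype=np.uint8)
written = np.zeros(out_shape, dtype=bool)
for ys in range(src.shape[0]):
    for xs in range(src.shape[1]):
        forward[ys * scale, xs * scale] = src[ys, xs]
        written[ys * scale, xs * scale] = True

# Inverse mapping: for each destination pixel, look up its source location.
inverse = np.zeros(out_shape, dtype=np.uint8)
for yd in range(out_shape[0]):
    for xd in range(out_shape[1]):
        inverse[yd, xd] = src[yd // scale, xd // scale]

holes_forward = int((~written).sum())       # destination pixels never written
holes_inverse = int((inverse == 0).sum())   # none, since every pixel was looked up
print(holes_forward, holes_inverse)
```

Forward mapping writes only one destination pixel per source pixel, so three quarters of the 8×8 output stay empty, while the inverse loop visits every output pixel exactly once.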

6
Code · Image Warping Hard

Apply a Perspective Warp to a Synthetic Checkerboard

Create a synthetic checkerboard image, define a perspective homography manually, and warp it using both cv2.warpPerspective and a hand-rolled inverse-mapping loop.

  1. Generate a 400×400 checkerboard with 40×40 px squares using NumPy.
  2. Define H that applies a perspective tilt: destination corners = [(0,0),(350,30),(330,370),(10,370)] from source corners [(0,0),(400,0),(400,400),(0,400)].
  3. Apply cv2.warpPerspective(checker, H, (400,400)). Save as warped_cv.png.
  4. Implement inverse mapping manually: iterate over every (x_d, y_d) in the output, compute $(x_s, y_s) = H^{-1}(x_d, y_d, 1)^T$, and bilinearly interpolate the source.
  5. Compare both outputs — they should be visually identical.

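For the manual inverse warp: H_inv = np.linalg.inv(H). For each pixel (xd, yd): p = H_inv @ [xd, yd, 1]; xs, ys = p[0]/p[2], p[1]/p[2]. Bilinear interpolation: take x0 = floor(xs) and x1 = x0 + 1 (likewise y0, y1) to form the 2×2 neighbourhood, then weight by the fractional parts xs - x0 and ys - y0. Using ceil for the second index collapses the neighbourhood whenever xs or ys is an integer. Leave output pixels black when (xs, ys) falls outside the source image.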

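Checkerboard: board = ((np.indices((400,400)) // 40).sum(axis=0) % 2 * 255).astype(np.uint8). The row and column indices must be integer-divided by 40 before they are summed; otherwise the pattern comes out as diagonal stripes instead of squares.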
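A vectorised version of the hand-rolled inverse warp might look like the sketch below. The function names `make_checkerboard` and `inverse_warp` are illustrative. It assumes a single-channel image and an invertible H whose third homogeneous component stays non-zero over the output grid, and it paints out-of-range pixels black.

```python
import numpy as np


def make_checkerboard(size=400, square=40):
    """Checkerboard of 0/255 squares; indices are divided before summing."""
    rows, cols = np.indices((size, size)) // square
    return ((rows + cols) % 2 * 255).astype(np.uint8)


def inverse_warp(src, H, out_shape):
    """Warp a grayscale image by H using inverse mapping + bilinear sampling."""
    h_out, w_out = out_shape
    h_src, w_src = src.shape
    H_inv = np.linalg.inv(H)

    # Homogeneous coordinates of every output pixel, shape (3, h_out * w_out).
    xd, yd = np.meshgrid(np.arange(w_out), np.arange(h_out))
    dst = np.stack([xd.ravel(), yd.ravel(), np.ones(xd.size)])
    p = H_inv @ dst
    xs, ys = p[0] / p[2], p[1] / p[2]

    # Only pixels whose source location lies inside the image are sampled.
    valid = (xs >= 0) & (xs <= w_src - 1) & (ys >= 0) & (ys <= h_src - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w_src - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h_src - 2)
    fx, fy = xs - x0, ys - y0               # fractional parts = bilinear weights

    img = src.astype(float)
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bottom = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    out = np.where(valid, (1 - fy) * top + fy * bottom, 0.0)
    return np.rint(out).reshape(h_out, w_out).astype(np.uint8)


board = make_checkerboard()
same = inverse_warp(board, np.eye(3), (400, 400))          # identity warp
H_shift = np.array([[1, 0, 10], [0, 1, 0], [0, 0, 1]], dtype=float)
shifted = inverse_warp(board, H_shift, (400, 400))         # 10 px to the right
```

Checking the identity and a pure shift before moving to the perspective H from step 2 makes it easier to separate interpolation bugs from mistakes in the matrix.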

7
Synthesis · Theory: Registration Model Selection Hard

Choose the Right Transformation for Each Scenario

For each clinical/industrial scenario, identify the most appropriate transformation model (translation, rigid, similarity, affine, homography, or deformable), justify your choice, and state how many point correspondences are required to uniquely determine it:

  1. Aligning two chest X-rays of the same patient taken one year apart — the patient was repositioned between scans.
  2. Stitching overhead drone photos of a flat agricultural field at the same altitude but different positions.
  3. Registering a pre-operative MRI of a patient's liver to an intra-operative ultrasound taken during surgery.
  4. Correcting for slight rotation and uniform zoom change in a document scanner between two calibration shots.

1. Chest X-ray (same patient, repositioned): Rigid (3 DOF in 2D: translation × 2 + rotation). The rib cage doesn't deform; repositioning is a rigid body motion. 2 point pairs minimum, since each pair contributes two equations.

2. Flat agricultural field, same altitude: Homography (8 DOF). The field is approximately planar and the camera translates — pure planar homography is valid. 4 pairs minimum.

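3. Liver MRI ↔ intra-op ultrasound: Deformable / non-rigid. The liver deforms significantly due to breathing, gravity changes, and tissue displacement during surgery, so a dense displacement field is required and no small fixed number of point pairs determines it uniquely. Because MRI and ultrasound intensities are not directly comparable, a multi-modal similarity metric such as mutual information is the usual choice.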

4. Scanner rotation + uniform zoom: Similarity (4 DOF: translation × 2, rotation, isotropic scale). Rotation and uniform scaling, but no shear or perspective. 2 point pairs minimum.
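The point-pair counts for the parametric models follow from one rule: each 2D correspondence supplies two equations, so a model with $d$ degrees of freedom needs at least $\lceil d/2 \rceil$ pairs. A short sketch of that bookkeeping follows; the dictionary and function names are illustrative, and deformable models are left out because they have no fixed parameter count.

```python
import math

# Degrees of freedom of the standard 2D parametric transformation models.
TRANSFORM_DOF = {
    "translation": 2,
    "rigid": 3,        # translation x 2 + rotation
    "similarity": 4,   # rigid + isotropic scale
    "affine": 6,
    "homography": 8,
}


def min_point_pairs(model):
    """Each point pair gives 2 equations, so ceil(dof / 2) pairs are needed."""
    return math.ceil(TRANSFORM_DOF[model] / 2)


for name in TRANSFORM_DOF:
    print(f"{name}: {TRANSFORM_DOF[name]} DOF, {min_point_pairs(name)} pair(s)")
```

Rigid and similarity both come out at 2 pairs: for the rigid model the second pair over-determines the 3 unknowns by one equation.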

8
Synthesis · Code: Two-Image Panorama Pipeline Hard

Build a Complete Panorama from Two Overlapping Images

Implement the full five-step pipeline end-to-end and evaluate stitching quality using a seam visibility metric.

  1. Load two overlapping test images (or generate them synthetically by cropping a single wide image with 30% overlap).
  2. Detect ORB features, match with BFMatcher, and apply Lowe's ratio test at 0.75.
  3. Estimate H with RANSAC (threshold = 5 px). Print the inlier count and ratio.
  4. Warp image 2 onto image 1's canvas using warpPerspective with a wide output size.
  5. Implement a linear alpha blend in the overlap zone: weight linearly from 1→0 over the overlap width.
  6. Evaluate seam quality: compute the mean absolute difference (MAD) between image 1 and the warped image 2 in the 20-pixel overlap band. A well-stitched panorama should have MAD < 5 intensity units.

Alpha blend in overlap zone: define x_start (leftmost column where image 2 appears) and overlap width ow. For column x in overlap: alpha = (x - x_start) / ow. Output pixel = (1-alpha)*img1[y,x] + alpha*warped2[y,x].

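Seam MAD: extract the 20-column-wide strip from both img1 and warped2 at the blend boundary, cast both to float so that uint8 subtraction cannot wrap around, and compute np.mean(np.abs(strip1.astype(float) - strip2.astype(float))). If the result is > 5, the RANSAC threshold or the feature matching quality needs improvement.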
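The blend and the seam metric can be sketched as two NumPy helpers. This sketch assumes that img1 and warped2 already sit on the same canvas (same shape), that image 2 is valid everywhere to the right of the overlap, and that the 20-column band is centred on a column `x_center` chosen by the caller. The function names are illustrative.

```python
import numpy as np


def linear_blend(img1, warped2, x_start, ow):
    """Blend columns [x_start, x_start + ow) with alpha rising from 0 towards 1."""
    out = img1.astype(float).copy()
    w2 = warped2.astype(float)
    alpha = np.arange(ow) / ow              # alpha = (x - x_start) / ow
    if img1.ndim == 3:
        alpha = alpha[:, None]              # broadcast over colour channels
    cols = slice(x_start, x_start + ow)
    out[:, cols] = (1 - alpha) * out[:, cols] + alpha * w2[:, cols]
    out[:, x_start + ow:] = w2[:, x_start + ow:]   # right of overlap: image 2 only
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def seam_mad(img1, warped2, x_center, band=20):
    """Mean absolute difference in a band of `band` columns around x_center."""
    half = band // 2
    cols = slice(x_center - half, x_center + half)
    strip1 = img1[:, cols].astype(float)    # float avoids uint8 wrap-around
    strip2 = warped2[:, cols].astype(float)
    return float(np.mean(np.abs(strip1 - strip2)))


# Toy canvases: constant grey levels make the expected values easy to verify.
a = np.full((10, 60), 100, dtype=np.uint8)
b = np.full((10, 60), 200, dtype=np.uint8)
pano = linear_blend(a, b, x_start=20, ow=20)
print(pano[0, 19], pano[0, 30], pano[0, 40], seam_mad(a, b, x_center=30))
```

On real images the same calls apply once `x_start` and `ow` have been read off the warped canvas; the constant-valued test only confirms that the weights and the float cast behave as intended.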