Part A: Image Warping and Mosaicing

An exploration of image transformation and homography recovery

Taking Images

Below are some images used in this project for the image rectification task:

Gon Image RSF Picture RSF Locker

Recovering Homographies

Before warping images into alignment, we need to recover the parameters of the transformation between each pair of images. In our case, this transformation is a homography, represented by the equation:

p' = H * p

Here, H is a 3x3 matrix with 8 degrees of freedom (the lower-right corner is a scaling factor and can be set to 1). We can recover the homography by establishing a set of (p’, p) pairs of corresponding points from two images.

The function structure used for homography recovery:

H = computeH(im1_pts, im2_pts)

In this function, im1_pts and im2_pts are n-by-2 matrices holding the (x, y) coordinates of n point correspondences from the two images, and H is the 3x3 homography matrix. Setting up a matrix equation, Ah = b, allows us to solve for the entries of H. Four correspondences determine H exactly; using more than four gives an overdetermined system that can be solved with least squares for better stability and accuracy.

System of Equations for Homography Recovery

We can set up the system of equations as follows (cited from CS180 Project 4B):

[x1, y1, 1, 0, 0, 0, -x'1*x1, -x'1*y1] * [h1 h2 h3 h4 h5 h6 h7 h8]^T = x'1
[0, 0, 0, x1, y1, 1, -y'1*x1, -y'1*y1] * [h1 h2 h3 h4 h5 h6 h7 h8]^T = y'1
...

Once the coefficients h_i are computed, the homography matrix can be constructed as:

H = [h1 h2 h3]
    [h4 h5 h6]
    [h7 h8  1]
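As a concrete illustration, here is a minimal sketch of how computeH might assemble and solve this least-squares system (assuming NumPy; the exact interface used in the project may differ):

import numpy as np

def computeH(im1_pts, im2_pts):
    """Estimate the 3x3 homography H mapping im1_pts to im2_pts (least squares)."""
    n = len(im1_pts)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (xp, yp)) in enumerate(zip(im1_pts, im2_pts)):
        A[2 * i]     = [x, y, 1, 0, 0, 0, -xp * x, -xp * y]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -yp * x, -yp * y]
        b[2 * i]     = xp
        b[2 * i + 1] = yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)   # solve Ah = b in the least-squares sense
    return np.append(h, 1).reshape(3, 3)        # h9 is fixed to 1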

Computing Coordinate Warps

With the homography matrix H, we can compute coordinate warps in homogeneous coordinates by multiplying H with the original coordinates and then scaling:

[x_h]       [x]
[y_h] = H * [y]
[ w ]       [1]

This results in transformed coordinates:

x' = x_h / w
y' = y_h / w
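A short sketch of this coordinate warp in code (warp_points is an illustrative helper name, not part of the project spec):

import numpy as np

def warp_points(H, pts):
    """Apply homography H to an (n, 2) array of (x, y) points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # append w = 1 to each point
    out = homog @ H.T                                   # rows are [x_h, y_h, w]
    return out[:, :2] / out[:, 2:3]                     # divide by w to leave homogeneous coords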

Warping the Images

Now that we know the parameters of the homography, we can use this matrix to warp each image towards a reference image. The function structure for this operation is:

imwarped = warpImage(im, H)

In this process, im is the input image to be warped, and H is the homography matrix. The warping can be done using either forward or inverse mapping. I used inverse mapping to trace the coordinates of the output image back to the original image, applying linear interpolation to fill in the new image plane.

One important aspect is determining the size of the resulting image. I predicted the bounding box coordinates by transforming the four corners of the input image using H.
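Putting these pieces together, a sketch of warpImage using inverse mapping could look like the following; it reuses the warp_points helper sketched above and scipy's RegularGridInterpolator, and returns the (xmin, ymin) offset recording where the bounding box sits in the target frame:

import numpy as np
from scipy.interpolate import RegularGridInterpolator

def warpImage(im, H):
    """Inverse-warp image im with homography H; returns the warped image and
    the (xmin, ymin) offset of its bounding box in the target frame."""
    h, w = im.shape[:2]

    # Predict the output bounding box by transforming the four input corners.
    corners = np.array([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]], dtype=float)
    warped_corners = warp_points(H, corners)
    xmin, ymin = np.floor(warped_corners.min(axis=0)).astype(int)
    xmax, ymax = np.ceil(warped_corners.max(axis=0)).astype(int)

    # For every output pixel, trace back to its source location with the inverse homography.
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    out_pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src_pts = warp_points(np.linalg.inv(H), out_pts)

    # Linear interpolation of the source image; samples outside the image become 0.
    interp = RegularGridInterpolator((np.arange(h), np.arange(w)), im,
                                     bounds_error=False, fill_value=0)
    vals = interp(src_pts[:, [1, 0]])                # interpolator expects (row, col) order
    return vals.reshape(ys.shape + im.shape[2:]), (xmin, ymin)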

Image Rectification

With the homography and warping code in place, I then tested them together by performing "rectification" on an image. This process re-projects an image so that known rectangular objects (like paintings or tiles) appear straightened and aligned. The results below help verify that our homography estimates are accurate enough.

Warping Images for Rectification

Once we have estimated the homography matrix, we can use it to re-project one image onto the plane of another for stitching. For instance, to project Image 1 onto Image 2, we perform an inverse warp to achieve a smooth reprojection.

This involves transforming the four corners of Image 1 using the homography matrix to map them onto Image 2, defining the bounding box where the re-projected image will be placed. The next step is to interpolate the pixel values of Image 1 to align accurately within this bounding box. Using scipy.interpolate.RegularGridInterpolator, we interpolate the pixels of Image 1 and place the interpolated values at corresponding locations in the final image.

For cases involving multiple images without direct homographies between each pair, transformations can be calculated by chaining homographies. For example, if there is a homography from Image A to Image B and another from Image B to Image C, the homography from Image A to Image C can be obtained by multiplying the two matrices.
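In code, chaining is just a matrix product (the variable names below are illustrative):

# H_AB maps points in image A to image B's frame; H_BC maps B to C.
# Composing them maps A directly into C's frame:
H_AC = H_BC @ H_AB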

For my examples, I selected the four corners of a rectangular object as im1_pts and defined im2_pts as a perfect square, i.e. a scaled version of the coordinates [0, 0; 0, 1; 1, 0; 1, 1], as in the sketch below.
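A hypothetical usage example for rectification, reusing the computeH and warpImage sketches above (all coordinate values are made up for illustration):

import numpy as np

# Four clicked corners of a rectangular object in the photo, in (x, y) order:
# top-left, top-right, bottom-right, bottom-left (values are illustrative).
im1_pts = np.array([[120, 80], [410, 95], [405, 300], [130, 310]], dtype=float)

# Target square of side 300 pixels, in the same corner order.
side = 300
im2_pts = np.array([[0, 0], [side, 0], [side, side], [0, side]], dtype=float)

H = computeH(im1_pts, im2_pts)          # maps the photo onto the square
rectified, offset = warpImage(im, H)    # im: the loaded input photo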

Results:

Frontal RSF Entrance Gon Rectified RSF Rectified

Directory containing rectified images:

media/project4a/image_rectification

Blend the Images into a Mosaic

To create a seamless mosaic, the images must be registered and blended together. Rather than overlaying one image directly onto another, which can cause strong edge artifacts, a weighted averaging technique is used. I selected one image as the reference and warped the other to align with the reference plane.

Each pixel in the region of an image is assigned a weight based on its Euclidean distance to the nearest edge using a distance transform. This distance is normalized from 0 to 1, serving as a blending weight. For improved results, a 4-level Laplacian pyramid was applied to handle different frequency bands. High-frequency details are blended based on the maximum distance transform values, while low-frequency details are combined using a weighted linear blend.
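The sketch below shows a simplified two-band version of this blend (the actual mosaics used a 4-level Laplacian pyramid); blend_two_band and its arguments are illustrative names, with mask1/mask2 marking each warped image's valid pixels on the shared canvas:

import numpy as np
import scipy.ndimage

def blend_two_band(im1, im2, mask1, mask2, sigma=5):
    """Blend two aligned images on a shared canvas using distance-transform weights."""
    im1, im2 = im1.astype(float), im2.astype(float)

    # Distance from each valid pixel to the nearest edge of its image, used as blending weights.
    d1 = scipy.ndimage.distance_transform_edt(mask1)
    d2 = scipy.ndimage.distance_transform_edt(mask2)
    w1 = d1 / (d1 + d2 + 1e-9)                       # normalized weight for im1

    # Split each image into a low-frequency and a high-frequency band.
    blur = (sigma, sigma, 0) if im1.ndim == 3 else sigma
    low1 = scipy.ndimage.gaussian_filter(im1, blur)
    low2 = scipy.ndimage.gaussian_filter(im2, blur)
    high1, high2 = im1 - low1, im2 - low2

    if im1.ndim == 3:                                # broadcast weights over color channels
        w1, d1, d2 = w1[..., None], d1[..., None], d2[..., None]

    # Low frequencies: weighted linear blend. High frequencies: take the pixel from
    # whichever image is farther from its own boundary (maximum distance transform).
    low = w1 * low1 + (1 - w1) * low2
    high = np.where(d1 >= d2, high1, high2)
    return low + high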

Pre-Mosaic Images

The following images were used as inputs for the mosaicing process:

RSF Locker Mosaic

RSF Locker 2 RSF Locker 3

Tower Mosaic

Tower 3 Tower 4

Campus Mosaic

Campus 2 Campus 3

Mosaic Results

Below are the final results of the mosaicing process using the blending techniques described:

RSF Locker Blended Tower Blended Final Campus Blended

Directory containing final mosaic images:

media/project4a/finalmosiac

It was exciting to see how basic linear algebra could lead to powerful methods like homography estimation, where a simple change of basis allows us to align images to a common plane. This not only enables seamless warping but also sets the stage for blending techniques that leverage low- and high-frequency components to create a cohesive, stunning final mosaic.

Detecting Corner Features in an Image

Here are the results of running the Harris corner detector on a sample image, using the code provided in harris.py (thank you, staff!). The detector operates on the grayscale version of the image, assigning a "corner strength" value to each pixel. Pixels with high corner strength relative to their immediate neighbors are identified as potential corner points. This produces a lot (and I mean a lot) of largely redundant candidates that carry little additional information.

Harris Interest Points Overlay

The image below shows the original image overlaid with Harris interest points detected in areas with high corner strength:

Original Image Harris Corner Overlay

Adaptive Non-Maximal Suppression

In the previous Harris corner detection, there were far too many candidate points to be meaningful. To address this, I applied Adaptive Non-Maximal Suppression (ANMS), as outlined in the paper the staff shared: https://inst.eecs.berkeley.edu/~cs180/fa24/hw/proj4/Papers/MOPS.pdf. ANMS filters out redundant keypoints, retaining only the strongest corners that are evenly spaced across the image. This made a big difference: it reduced overlap and ensured that the selected points represent significant, well-distributed features (at least, that's the idea!).
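A minimal sketch of the ANMS selection described above (a simple O(n^2) version, fine for a few thousand Harris candidates; the function and parameter names are illustrative):

import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii.
    coords: (n, 2) corner locations as (x, y); strengths: (n,) Harris corner strengths."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # A point is suppressed by any significantly stronger neighbor.
        stronger = c_robust * strengths > strengths[i]
        if np.any(stronger):
            radii[i] = np.linalg.norm(coords[stronger] - coords[i], axis=1).min()
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep]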

Before and After Applying ANMS

The images below show a comparison of detected corners before and after applying ANMS:

Harris Corners Before ANMS Harris Corners After ANMS

Feature Descriptor Extraction

For feature descriptor extraction, I sampled axis-aligned 8x8 patches from a larger 40x40 window around each keypoint, which yields a sufficiently blurred (low-frequency) descriptor for each feature point. I then applied bias and gain normalization to standardize the descriptors for comparison. Note that rotation invariance and wavelet transforms were not applied; this implementation focuses on axis-aligned descriptor extraction.
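A sketch of this descriptor extraction: the image is blurred, then each 40x40 window is sampled every 5th pixel to form an 8x8 patch (function and variable names are illustrative):

import numpy as np
import scipy.ndimage

def extract_descriptors(im_gray, coords, window=40, patch=8):
    """Axis-aligned descriptors: blurred 8x8 samples of a 40x40 window per keypoint.
    coords: (n, 2) keypoint locations as (x, y)."""
    spacing = window // patch                        # sample every 5th pixel
    blurred = scipy.ndimage.gaussian_filter(im_gray.astype(float), sigma=spacing / 2)
    half = window // 2
    kept, descriptors = [], []
    for x, y in coords.astype(int):
        win = blurred[y - half:y + half, x - half:x + half]
        if win.shape != (window, window):            # skip keypoints too close to the border
            continue
        desc = win[::spacing, ::spacing]             # downsample to an 8x8 patch
        desc = (desc - desc.mean()) / (desc.std() + 1e-9)   # bias/gain normalization
        kept.append((x, y))
        descriptors.append(desc.ravel())
    return np.array(kept), np.array(descriptors)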

Example Feature Descriptors

Below are examples of extracted feature descriptors:

Feature Descriptor Examples

Feature Matching

Once features are extracted from both images, the next step is to establish correspondences by matching features between them. For efficient matching, we use k-d trees to perform quick nearest-neighbor searches, where each feature in the first image is compared to potential matches in the second image. To ensure robust matches, a cross-check is applied, where a feature from the first image must have its best match in the second image, and vice versa. This bidirectional validation helps eliminate mismatches.

To further refine matches and minimize outliers, we apply a ratio test, inspired by Lowe’s method. By examining the ratio between the closest and second-closest matches for each feature, we can filter out ambiguous matches. Ideally, the closest match should be significantly stronger than the second-closest if it represents a true correspondence. This additional thresholding step enhances the reliability of our matching process.
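A sketch of the matching step described above, combining a k-d tree nearest-neighbor search, the cross-check, and Lowe's ratio test (the 0.7 threshold and the function name are illustrative):

import numpy as np
from scipy.spatial import cKDTree

def match_features(desc1, desc2, ratio=0.7):
    """Return index pairs (i, j) of mutual nearest neighbors that pass the ratio test."""
    tree1, tree2 = cKDTree(desc1), cKDTree(desc2)
    dists, nn2 = tree2.query(desc1, k=2)    # two closest matches in image 2 per descriptor in image 1
    _, nn1 = tree1.query(desc2, k=1)        # best match back in image 1, for the cross-check

    matches = []
    for i in range(len(desc1)):
        best = nn2[i, 0]
        passes_ratio = dists[i, 0] < ratio * dists[i, 1]   # closest clearly beats the runner-up
        passes_cross = nn1[best] == i                      # i and best choose each other
        if passes_ratio and passes_cross:
            matches.append((i, best))
    return np.array(matches)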

Feature Matching Results

The image below illustrates the results of feature matching on ANMS-selected points. Harris interest points are marked in blue, while ANMS-selected points are in red.

Before RANSAC Feature Matching Illustration

RANSAC for Robust Homography Estimation

Even with Lowe's ratio test reducing outliers in feature matching, some incorrect correspondences remain, and this "noise" can heavily skew the homography estimate. Least-squares optimization (without regularization) is quite sensitive to outliers, since it essentially returns an "expectation" over all the data it sees; related theory, such as the James-Stein estimator, shows that shrinkage can yield a better estimator when certain conditions hold (e.g., estimating three or more parameters). In practice, a single mismatched pair can distort the result significantly. To address this, we use RANSAC (Random Sample Consensus) to improve robustness.

In RANSAC, we randomly sample four pairs of matched points and compute the homography based on these samples. We then evaluate how well this homography aligns other correspondences, counting the number of "inliers"—points that fit the homography within a defined error threshold. This process is repeated multiple times, and the homography with the highest number of inliers is selected as the best estimate. By prioritizing the consensus among inliers, RANSAC minimizes the impact of outliers and yields a more reliable homography.
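A sketch of this RANSAC loop, reusing the computeH and warp_points sketches from earlier (the iteration count and pixel threshold below are illustrative):

import numpy as np

def ransac_homography(pts1, pts2, n_iters=2000, thresh=2.0):
    """Return the homography refit on the largest inlier set, plus the inlier mask."""
    rng = np.random.default_rng(0)
    best_inliers = np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(pts1), size=4, replace=False)
        H = computeH(pts1[sample], pts2[sample])
        # Inliers are correspondences whose reprojection error is under the threshold.
        errors = np.linalg.norm(warp_points(H, pts1) - pts2, axis=1)
        inliers = errors < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on all inliers of the best model.
    return computeH(pts1[best_inliers], pts2[best_inliers]), best_inliers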

RANSAC Results

The image below shows the effect of RANSAC on the matched features, with cross-checking disabled to highlight RANSAC’s impact on filtering outliers.

RANSAC Results Illustration RANSAC Results Illustration RANSAC Results Illustration

Final Results

The following images demonstrate the outcome of the full feature detection, matching, and homography estimation pipeline. The final mosaic is built by blending transformed images using robust homographies from RANSAC, resulting in cohesive alignment.

Automatic vs Manual Stitching Comparison

Auto Crop and Stitching
Manual Stitching

Automatic vs Manual Stitching Comparison for Tower Panorama

Auto Stitching Panorama
Manual Stitching Panorama

Automatic vs Manual Stitching Comparison

Auto Crop and Stitching
Manual Stitching

Automatic Blending Comparison for Outdoors Scene

Original Photo 1
Original Photo 2
Auto Blended Result

I was pleasantly surprised by how easy it is to construct panorama pictures. By the end of the project, I had fully automated homography estimation and correspondence point detection, and implemented a seamless blending process that works on any images without requiring further input from me.