CS194 Project 7B: Feature Matching for Autostitching

Rohan Chitnis

(Skip to bells and whistles section.)

Background

For this project, we continued upon part A (webpage) to do automatic panorama creation! Basically, this boils down to selecting matching control points rather than relying on a user to do so; everything after this remains the same. For this part, I used Gaussian and Laplacian stacks with size 3 for all my blending, since this provided me the best results from the previous part.

I performed the following steps. First, I implemented my very own Harris corner detector (no built-in library!) and applied it to each image. This was done by directly porting the provided sample code to Python. Because too many points were being generated, I removed all points whose value for the corner strength function is lower than the 95th percentile. I then used a technique called adaptive non-maximal suppression to only keep a nearly uniformly distributed subset of the chosen points for each image. For this, I followed the paper: for each Harris point, I obtained the minimum suppression radius as the distance to the closest other Harris point such that the corner strength of this point is less than 0.9 times the corner strength of the other. This means that the neighbor must have sufficiently larger corner strength. I then selected the 150 points with highest minimum suppression radius value. Afterward, I assigned each of these 150 points a feature: 64 pixels sampled evenly from the 40x40 grid around that point and normalized. Next, I matched points in each image using these features. My cutoff here was that the ratio of 1-NN / 2-NN distances from a point's features to the features of other points must be less than 0.6, a ratio which in practice worked well for me. Finally, I used 4-point RANSAC straightforwardly as described here to compute a good, robust homography matrix not affected by outliers/noise. My threshold for a set of points being in the inlier set was that the norm of the expected point minus the actual point must be less than 1 pixel. The results are quite amazing!

Results

Here, I trace through the algorithm presented in the Background section with a specific pictoral example.

The original 2 images of a lecture hall in Berkeley.

Harris points of interest for each image.

Points of interest after filtering using adaptive non-maximal suppression. As expected, they are roughly uniformly distributed throughout the images.

The automatically determined control points in each image, after feature matching and rejection.

Left: Automatically stitched mosaic with a homography matrix computed using RANSAC, just as beautiful and well-aligned as with user-defined control points!!!

Right: For reference, the manually stitched result.

Here are some other examples.

Original images of a theatre on Shattuck.

Left: Automatically stitched mosaic. It looks as nice as the manually stitched one to its right!

Right: For reference, the manually stitched result.

Original images outside HP Auditorium!

Left: Automatically stitched mosaic. Note that there are minor alignment issues at one area along the ground. This is due to the periodicity of the floor pattern causing some Harris points on the floor to incorrectly match (since all these points have very similar feature descriptors).

Right: For reference, the manually stitched result.

What I Learned

I learned that approximating homography matrices is actually not all that complex. RANSAC is a naive but powerful algorithm -- I was very surprised that this kind of guess-and-check method could actually be so robust and useful in practice. My favorite part was definitely extracting and matching feature descriptors for points, because it's just amazing how we can identify matching landmarks in different images.

Bells and Whistles

NOTE: There are more bells and whistles on my webpage for part A, linked at the top of this page.

1. Rotational Invariance

I successfully implemented rotational invariance as follows. First, for each Harris point, we compute the associated orientation vector as the vector of [smoothed x derivative, smoothed y derivative] at that coordinate. The smoothed derivative is obtained by simply sending each derivative through a Gaussian filter with parameter 4.5, as suggested in the paper. Call this orientation vector u. Second, to rotate the feature descriptor grid, we begin by observing that each point in the grid could originally be computed as grid_pt = img_center + 5 * i * [0, 1] + 5 * j * [1, 0], where i ranges from -3 to 4 and j ranges from -3 to 4. So, for the rotated case, we perform a change of basis and obtain grid_pt = img_center + 5 * i * u + 5 * j * u_perp, where u_perp equals u rotated by 90 degrees in either direction. Here are my results showcasing correctness.

The original 2 images I used. The second one is a rotated version of the one illustrated before.

With rotational invariance turned OFF (so using the original code), here are the control points found by matching feature descriptors (yep, no matches are found!).

Now using rotational invariance, here are the control points found by matching feature descriptors! Yay, it works!

Finally, the resulting mosaic, which looks nice just like before, save some minor alignment issues! Note that the black space occupying the bottom half of the image is simply a result of the increased output image dimensions due to the second image being rotated.

2. Panorama Recognition

I implemented panorama recognition from an unordered set. In order to do this, I followed and implemented the algorithm described in this paper. The core idea is that after my RANSAC homography matrix calculation, I determine the total number of features (n_f), as well as the number of features in the inlier set for the finally chosen matrix (n_i). I then use the formula derived in the paper, n_i > 5.9 + 0.22 * n_f, as my threshold for keeping a particular image within the mosaic. Here are some results. Note that if no images in the entire set are determined to create a panorama, I default to returning the first image.

Case 1

The original 3 images. Note that only the first and the third form a panorama.

Result. The second image has been discarded.

Case 2

The original 3 images. All three form a panorama.

Result. No images have been discarded.

Case 3

The original 3 images. None of them form a panorama.

Result. As expected, the code defaults to simply outputting the first image provided.