This project implements methods to algorithmically colorize photos from Sergei Mikhailovich Prokudin-Gorskii's (1863-1944) three-channel collection.
The colorization process involves two steps: Image Alignment and Color Restoration. In the image alignment phase, we attempt to align the three channels as closely as possible by treating this as an optimization problem. The optimization relies on two similarity metrics: Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC).
SSD measures the squared error between two images (lower is better), while NCC measures the correlation between the images treated as vectors, i.e. it seeks the smallest angle between them (higher is better).
In the alignment task, I searched for the displacement that best aligns the color channels, either by minimizing the SSD (taking the argmin over candidate displacements) or by maximizing the NCC (taking the argmax). For small images, an exhaustive search over a modest displacement window is feasible, but for larger, high-resolution images the true shift often lies outside that window and each candidate evaluation touches far more pixels. For those, I employed a pyramid search that recursively downscales the image, aligns at the coarsest level, and then refines the estimate over a small window at each finer level.
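A minimal sketch of the two search strategies, assuming NumPy float arrays for the channels and using SSD as the default score; the function names and the min_size cutoff are my own choices for illustration, not necessarily those of the original implementation:

```python
import numpy as np

def exhaustive_align(moving, base, window=15, metric=None):
    """Brute-force search over integer shifts in [-window, window]^2.

    Returns the (dy, dx) shift of `moving` that best aligns it to `base`
    according to `metric` (lower is better; defaults to SSD).
    """
    if metric is None:
        metric = lambda a, b: np.sum((a - b) ** 2)  # SSD by default
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = metric(shifted, base)
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def pyramid_align(moving, base, window=15, min_size=400):
    """Coarse-to-fine search: recurse on 2x-downsampled images, then refine.

    At the coarsest level an exhaustive search is used; each finer level
    doubles the coarse estimate and refines it within a small window.
    """
    if max(base.shape) <= min_size:
        return exhaustive_align(moving, base, window)
    # Downsample by taking every other pixel (a simple 2x reduction).
    coarse = pyramid_align(moving[::2, ::2], base[::2, ::2], window, min_size)
    dy, dx = 2 * coarse[0], 2 * coarse[1]
    # Refine around the upscaled estimate with a small local search.
    rolled = np.roll(moving, (dy, dx), axis=(0, 1))
    ry, rx = exhaustive_align(rolled, base, window=2)
    return dy + ry, dx + rx
```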
The SSD measures the sum of squared pixel intensity differences between the two images; the alignment seeks the displacement that minimizes it. Mathematically:

SSD(d) = ∑x,y (I1(x + dx, y + dy) - I2(x, y))²

Here, d = (dx, dy) is the displacement vector, and I1 and I2 are the intensities of the two images at pixel position (x, y).
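As a concrete illustration, here is a minimal NumPy sketch of this score for one candidate displacement (np.roll wraps pixels around the border, a common simplification; ssd_score is a hypothetical name):

```python
import numpy as np

def ssd_score(moving, base, dy, dx):
    """Sum of squared differences after shifting `moving` by (dy, dx).

    Lower is better; the best displacement minimizes this value.
    """
    shifted = np.roll(moving, (dy, dx), axis=(0, 1))
    return np.sum((shifted - base) ** 2)
```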
The NCC is used to find the displacement that results in the best correlation between the two images. It is given by:

NCC(d) = ∑x,y (I1(x + dx, y + dy) - μ1)(I2(x, y) - μ2) / √(∑x,y (I1(x + dx, y + dy) - μ1)² · ∑x,y (I2(x, y) - μ2)²)

Here, μ1 and μ2 are the mean intensity values of the two images.
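A matching NumPy sketch of this score (again a hypothetical name; higher values indicate better alignment):

```python
import numpy as np

def ncc_score(moving, base, dy, dx):
    """Normalized cross-correlation after shifting `moving` by (dy, dx).

    Both images are mean-centered and the product is divided by the norms,
    so the score is the cosine of the angle between the flattened, centered
    intensity vectors. Higher is better.
    """
    shifted = np.roll(moving, (dy, dx), axis=(0, 1))
    a = shifted - shifted.mean()
    b = base - base.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return np.sum(a * b) / denom
```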
In my implementation, I used the blue channel as the base image and aligned the red and green channels to it. I experimented with both SSD and NCC and found that NCC provided better alignment results. For smaller images, I used a brute-force search over a displacement window, while for larger images, I implemented a pyramid search to reduce computation time. I also found that applying a Sobel edge detection filter before alignment greatly improved the result for the Emir image specifically, presumably because the brightness and intensity distributions differ too much across its color channels, so raw intensities correlate poorly while the edges still line up. I therefore simply applied the edge-filtered alignment to all photos and got consistently good results.
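A minimal sketch of how this edge-based alignment could be wired together, assuming scikit-image's sobel filter and the pyramid_align sketch from above; displacements are estimated on the edge maps and then applied to the raw channels:

```python
import numpy as np
from skimage.filters import sobel  # edge-magnitude filter

def align_channels(red, green, blue):
    """Align the red and green channels to the blue base channel.

    Displacements are estimated on Sobel edge maps, which are robust to
    brightness differences between channels, then applied to the raw
    channels before stacking them into an RGB image.
    """
    base_edges = sobel(blue)
    aligned = {"blue": blue}
    shifts = {}
    for name, channel in (("green", green), ("red", red)):
        dy, dx = pyramid_align(sobel(channel), base_edges)  # sketch above
        aligned[name] = np.roll(channel, (dy, dx), axis=(0, 1))
        shifts[name] = (dy, dx)
    rgb = np.dstack([aligned["red"], aligned["green"], aligned["blue"]])
    return rgb, shifts
```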
To clean up the images, I implemented an automatic cropping method that removes unnecessary white or black borders by detecting the edge between the border and the image content. The cropping is based on intensity thresholding and calculating the bounding box of the actual image content.
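A minimal sketch of this cropping idea, assuming a float RGB image in [0, 1]; the threshold values here are illustrative, not the exact ones used:

```python
import numpy as np

def auto_crop(image, low=0.05, high=0.95):
    """Remove near-black and near-white borders from a float RGB image.

    A pixel counts as image content if its mean intensity lies strictly
    between the thresholds; the crop is the bounding box of those pixels.
    Assumes at least one content pixel exists.
    """
    gray = image.mean(axis=2)
    content = (gray > low) & (gray < high)
    rows = np.where(content.any(axis=1))[0]
    cols = np.where(content.any(axis=0))[0]
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```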
Additionally, I implemented an automatic contrast adjustment method to improve the visual quality of the image. This rescales the image intensity so that the darkest pixel maps to zero and the brightest pixel maps to one. This simple adjustment makes the images more visually striking without any manual intervention.
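A minimal sketch of this rescaling, assuming a float image array:

```python
import numpy as np

def auto_contrast(image):
    """Linearly rescale intensities so the darkest pixel becomes 0 and the
    brightest becomes 1; a constant image is returned unchanged."""
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else image
```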
In addition to simple intensity rescaling, I experimented with further enhancements to the visual quality of the images.
Below are the results from the pyramid search, restricted to outputs whose filenames record the green and red displacement vectors.
Below are the results from the single-scale search, which exhaustively searches a [-15, 15] window in both x and y image coordinates. This is nothing fancy, just the initial "naive" algorithm that works for smaller, low-resolution images. If the images below do not render for some reason, please see the single-naive folder in the GitHub media directory; the displacement vectors are correctly referenced there.
The project successfully automates the process of aligning and colorizing the digitized images from Prokudin-Gorskii's collection. The results demonstrate that NCC generally produces better alignment compared to SSD, and the pyramid search significantly reduces computation time for larger images.