Parts 1 and 2: NeRF!

As a reference, the images below show the process of optimizing the network to fit the given images. The results are for a neural field trained on a single input image (the fox), along with a second example (the bear).

Model Architecture and Hyperparameters
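As a sketch of the kind of model used for this part: an MLP applied to sinusoidal positional encodings of 2D pixel coordinates, predicting RGB. The hidden width, depth, and number of frequencies below are illustrative assumptions, not necessarily the exact values used in these runs.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(x, num_freqs=10):
    """Map coordinates in [0, 1] to [x, sin(2^k pi x), cos(2^k pi x)] features."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(np.sin(2.0**k * np.pi * x))
        feats.append(np.cos(2.0**k * np.pi * x))
    return np.concatenate(feats, axis=-1)

def make_mlp(in_dim, hidden=256, out_dim=3, depth=4):
    """Random-initialized weight/bias pairs for a simple fully connected net."""
    dims = [in_dim] + [hidden] * (depth - 1) + [out_dim]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(x, layers):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)     # ReLU on hidden layers
    return 1.0 / (1.0 + np.exp(-x))    # sigmoid keeps RGB in [0, 1]
```

With `num_freqs=10`, a 2D coordinate becomes a 42-dimensional feature vector (2 raw values plus 2 × 2 × 10 sinusoids), which is what lets the MLP represent high-frequency image detail.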

PSNR Over Training (Fox)

Below is the PSNR curve over epochs for the "fox" image training run:

Fox PSNR over epochs
Figure: Fox PSNR Curve
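For reference, the PSNR values plotted above follow the standard definition for images normalized to [0, 1]; this is a generic sketch of that computation, not necessarily the exact code used in the runs:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val**2 / mse)
```

For example, a reconstruction whose per-pixel MSE is 0.01 scores 20 dB.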

Below is the training loss over epochs for the "fox" image:

Fox Loss over epochs
Figure: Fox Loss Curve

Predicted Images Across Iterations (Fox)

These images show the reconstruction of the fox image at various training epochs, demonstrating the network’s progress.

Fox at early epoch
Fox Reconstruction: Early Epoch
Fox intermediate epoch
Fox Reconstruction: Intermediate Epoch
Fox later epoch
Fox Reconstruction: Later Epoch
Fox final reconstructed
Fox Reconstruction: Final

Second Example: Bear Image

I then repeated the process on another image (the bear) with a chosen set of hyperparameters. Below are the PSNR curve and a selection of reconstructions produced during training.

PSNR curve for the bear image trained with a learning rate of 1e-4:

Bear PSNR with LR=1e-4
Bear PSNR Curve with LR=1e-4

Bear Reconstruction Across Iterations

Bear epoch 1
Bear Reconstruction: Epoch 1
Bear epoch 200
Bear Reconstruction: Epoch 200
Bear epoch 400
Bear Reconstruction: Epoch 400
Bear epoch 600
Bear Reconstruction: Epoch 600
Bear epoch 800
Bear Reconstruction: Epoch 800
Bear final reconstructed
Bear Reconstruction: Final

Hyperparameter Tuning

I also ran hyperparameter tuning on the bear image. The following plots show loss curves and final results with different learning rates or network settings.

Bear LR=1e-4 Loss Curve
Bear Loss Curve with LR=1e-4
Bear Loss with tuned parameters
Bear Loss Curve with Tuned Hyperparameters

Part 2: Fit a Neural Radiance Field from Multi-view Images

In this part, I used a neural radiance field (NeRF) to represent a 3D scene, starting from multi-view calibrated images of a Lego object. The camera intrinsic and extrinsic parameters allow me to cast rays into the scene. I then sample points along these rays and feed them into a NeRF MLP to predict density and color. Finally, I volume render the scene, compare against the ground-truth images, and optimize the NeRF parameters.
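The ray-casting step can be sketched as follows. This is a generic pinhole-camera formulation, assuming intrinsics `K` and a camera-to-world matrix `c2w` with the camera looking down +z; the project's actual convention may differ (NeRF datasets often use -z).

```python
import numpy as np

def pixel_to_ray(K, c2w, u, v):
    """Return the (origin, direction) of the ray through pixel (u, v)."""
    # Back-project the pixel center to a direction in camera coordinates.
    d_cam = np.linalg.inv(K) @ np.array([u + 0.5, v + 0.5, 1.0])
    # Rotate into world coordinates; the ray origin is the camera center.
    d_world = c2w[:3, :3] @ d_cam
    d_world = d_world / np.linalg.norm(d_world)
    origin = c2w[:3, 3]
    return origin, d_world
```

With an identity pose, the ray through the principal point points straight along the camera's optical axis, which is a quick sanity check for the convention.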

Implementation Details

Ray, Camera, and Sample Visualization

Below is an example visualization of the rays and sample points drawn at a single training step, along with the camera poses. I plot at most 100 rays to keep the visualization uncluttered, as advised in the deliverables.

Sampled Rays Visualization 1
Sampled Rays Visualization 1
Sampled Rays Visualization 2
Sampled Rays Visualization 2
Sampled Rays Visualization 3
Sampled Rays Visualization 3
Sampled Rays Visualization 4
Sampled Rays Visualization 4
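The sample points shown above come from choosing depths along each ray between the near and far planes. A common scheme (assumed here, the exact sampler is not spelled out in this report) is stratified sampling: divide [near, far] into equal bins and jitter one sample per bin.

```python
import numpy as np

def stratified_depths(near, far, n_samples, rng=None):
    """One jittered depth per bin, so the samples cover [near, far] evenly."""
    if rng is None:
        rng = np.random.default_rng(0)
    edges = np.linspace(near, far, n_samples + 1)
    lower, upper = edges[:-1], edges[1:]
    return lower + (upper - lower) * rng.random(n_samples)

def sample_points(origin, direction, t_vals):
    """3D sample positions along a ray: origin + t * direction."""
    return origin[None, :] + t_vals[:, None] * direction[None, :]
```

Jittering within each bin avoids the banding artifacts that fixed, evenly spaced depths produce, while still guaranteeing coverage of the whole depth range.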

Partial Training Iterations Preview

While I am still working on the full visualization of the training process (predicted images across more iterations and PSNR curves), here is a quick preview of the model's output at a few selected iterations:

Legos Iteration 1
Legos Iteration 1
Legos Iteration 2
Legos Iteration 2
Legos Iteration 3
Legos Iteration 3
Legos Iteration 4
Legos Iteration 4

Additional Intermediate Results

Below are some additional iterations showing the model's progress at different completion percentages. As training goes on, the reconstruction quality improves, and more details become visible:

Lego iteration ~77%
Lego Reconstruction around 77% through training
Lego iteration ~88%
Lego Reconstruction around 88% through training
Lego iteration ~92%
Lego Reconstruction around 92% through training

Novel View Rendering

After training the network, I used it to render novel views of the Lego scene from arbitrary camera extrinsics. Below are examples of spherical rendering videos showing the Lego from multiple angles. The left video shows the result after 10 minutes of training and the right one after 2.5 hours of training, demonstrating the improvement in rendering quality with longer training.
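Rendering a novel view reduces to the same volume-rendering step used during training: composite the predicted per-sample densities and colors along each ray. Below is a minimal sketch of the standard discrete NeRF compositing equation; the variable names are mine, and the trained network that would supply `sigmas` and `rgbs` is assumed, not shown.

```python
import numpy as np

def volume_render(sigmas, rgbs, t_vals):
    """Composite per-sample density/color along one ray into a pixel color.

    sigmas: (N,) non-negative densities; rgbs: (N, 3) colors; t_vals: (N,) depths.
    """
    deltas = np.diff(t_vals, append=1e10)        # distances between samples
    alphas = 1.0 - np.exp(-sigmas * deltas)      # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)     # transmittance past each sample
    trans = np.concatenate([[1.0], trans[:-1]])  # light reaching each sample
    weights = alphas * trans
    color = (weights[:, None] * rgbs).sum(axis=0)
    return color, weights
```

A useful sanity check: if the first sample is fully opaque, the rendered color is that sample's color, and the weights along the ray sum to at most 1.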

Spherical Rendering after 10 minutes of training
Spherical Rendering after 2.5 hours of training

Novel View Rendering with Background Bells and Whistles

Spherical Rendering after 10 minutes of training
Spherical Rendering after 2.5 hours of training