Codebook-NeRF

Improving NeRF resolution based on codebook

Kyung Hee University
KSC 2024

Abstract

In this paper, we propose a new NeRF[1] method that restores high-resolution detail from low-resolution images without reference images. To this end, we keep the super-resolution process of NeRF-SR[2] and introduce the codebook structure of VQ-VAE[3] to learn the patterns of high-resolution images and improve detail refinement. We increase the number of embedding vectors in the codebook so that more high-resolution information can be learned, and the model is trained to imitate high-resolution latent characteristics at inference time without reference images. In our experiments, the proposed model maintained the PSNR performance of NeRF-SR[2] while generating clearer, more detail-rich images.
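
To make the codebook component concrete, here is a minimal PyTorch sketch of a VQ-VAE[3]-style vector quantizer with a straight-through gradient. The class name `Codebook` and the default hyperparameters (`num_embeddings`, `embedding_dim`, `beta`) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Codebook(nn.Module):
    """VQ-VAE-style codebook: snaps encoder features to learned embeddings."""
    def __init__(self, num_embeddings=1024, embedding_dim=64, beta=0.25):
        super().__init__()
        # Enlarging num_embeddings is the paper's lever for storing more
        # high-resolution patterns (hypothetical default shown here).
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        self.embedding.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)
        self.beta = beta  # weight of the commitment loss

    def forward(self, z):
        # z: (B, C, H, W) encoder features; C == embedding_dim.
        B, C, H, W = z.shape
        z_flat = z.permute(0, 2, 3, 1).reshape(-1, C)  # (B*H*W, C)
        # Squared L2 distance from each feature to every codebook vector.
        dist = (z_flat.pow(2).sum(1, keepdim=True)
                - 2.0 * z_flat @ self.embedding.weight.t()
                + self.embedding.weight.pow(2).sum(1))
        idx = dist.argmin(dim=1)                        # nearest code per feature
        z_q = self.embedding(idx).view(B, H, W, C).permute(0, 3, 1, 2)
        # Codebook + commitment losses (standard VQ-VAE objective).
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()                    # straight-through estimator
        return z_q, vq_loss, idx
```

Everything above is the standard VQ-VAE[3] quantizer; the only Codebook-NeRF-specific choice it reflects is the enlarged embedding table.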


Codebook-NeRF Training Pipeline

Training pipeline of Codebook-NeRF: SR (super-resolved, still low-detail) patches are trained to mimic the characteristics of HR (high-resolution) patches. (a) Both HR and SR patches are fed into the codebook. (b) The codebook learns high-resolution latent features from the HR patches, and the SR patches are trained to imitate these features. (c) A decoder reconstructs the HR patches' latent features, combining the codebook output at each deconvolution layer to produce a high-resolution image. (d) A UNet structure enhances the reconstruction by re-injecting the high-resolution details obtained at each stage as additional input.
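
The following is a minimal sketch of one training iteration as we read panels (a)-(d), assuming a shared `encoder`, the `Codebook` module sketched above, and a UNet-style `decoder` whose skip connections mix codebook features into each stage. All module names and loss weights (`lambda_rec`, `lambda_mimic`) are hypothetical stand-ins, not the released implementation.

```python
import torch.nn.functional as F

def train_step(encoder, codebook, decoder, sr_patch, hr_patch, optimizer,
               lambda_rec=1.0, lambda_mimic=1.0):
    optimizer.zero_grad()
    # (a) Encode both the HR patch and the NeRF-SR output patch.
    z_hr = encoder(hr_patch)
    z_sr = encoder(sr_patch)
    # (b) Quantize HR features with the codebook; SR features are trained
    #     to imitate the resulting high-resolution latents.
    zq_hr, vq_loss, _ = codebook(z_hr)
    mimic_loss = F.mse_loss(z_sr, zq_hr.detach())
    # (c)+(d) Decode the quantized latents back to an image; the UNet-style
    #     decoder re-injects codebook output at each deconvolution stage.
    recon = decoder(zq_hr)
    rec_loss = F.mse_loss(recon, hr_patch)
    loss = lambda_rec * rec_loss + vq_loss + lambda_mimic * mimic_loss
    loss.backward()
    optimizer.step()
    return {"rec": rec_loss.item(), "vq": vq_loss.item(), "mimic": mimic_loss.item()}
```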


Codebook-NeRF Test Pipeline

Test pipeline of Codebook-NeRF: only SR patches are used to restore high-resolution images. (a) SR patches are fed into the codebook to generate latent representations with high-resolution features. (b) These representations are passed to the decoder to produce the final high-resolution image. (c) In this way, the high-resolution details learned by the codebook are applied to the SR patches, enabling high-resolution restoration without reference images.
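
The test-time path then needs only the SR patch: its features are snapped to the nearest learned high-resolution codes and decoded. A short sketch reusing the hypothetical modules from the training snippet:

```python
import torch

@torch.no_grad()
def super_resolve(encoder, codebook, decoder, sr_patch):
    z_sr = encoder(sr_patch)   # (a) encode the SR patch
    zq, _, _ = codebook(z_sr)  #     snap to the learned HR latents
    return decoder(zq)         # (b) decode into the final high-resolution image
```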

Visualization: Here are examples of HuGS on different scenes (datasets). More results can be found in the paper and the data.

[Image grid. Columns: (1) Input, (2) Seg. w/ SfM, (3) HSfM, (4) Color Residual, (5) HCR, (6) Static Map. Rows (scenes): yoda, crab, statue, andbot, pillow, cars, brandenburg, sacre, taj, trevi.]

Rendering Results

Comparisons on the Distractor Dataset: Our method can better preserve static details while ignoring transient distractors.


[Image comparisons on BabyYoda, Crab, Statue, and Android: Mip-NeRF 360 vs. w/ RobustNeRF vs. w/ HuGS (ours).]

Comparisons on the Kubric Dataset:


[Image comparisons on Pillow, Chairs, and Cars: Mip-NeRF 360 vs. w/ RobustNeRF vs. w/ HuGS (ours).]

Comparisons on the Phototourism Dataset:


[Image comparisons on Brandenburg Gate, Sacre Coeur, Taj Mahal, and Trevi Fountain: Mip-NeRF 360 vs. w/ RobustNeRF vs. w/ HuGS (ours).]

BibTeX

@inproceedings{chen2024nerfhugs,
  author    = {Chen, Jiahao and Qin, Yipeng and Liu, Lingjie and Lu, Jiangbo and Li, Guanbin},
  title     = {Codebook-NeRF: Improving NeRF resolution based on codebook},
  booktitle = {CVPR},
  year      = {2024},
}