Junfeng Ni1,2,
Yu Liu1,2,
Ruijie Lu2,3,
Zirui Zhou1,
Song-Chun Zhu1,2,3,
Yixin Chen✉2,
Siyuan Huang✉2
✉ indicates corresponding author
1Tsinghua University
2State Key Laboratory of General Artificial Intelligence, BIGAI
3Peking University
DP-Recon incorporates diffusion priors for decompositional neural scene reconstruction to enhance reconstruction quality in sparsely captured and heavily occluded regions.
Installation
- Tested System: Ubuntu 20.04, CUDA 11.8
- Tested GPUs: A100, A800
```shell
conda create -n dprecon python=3.9
conda activate dprecon
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
pip install tensorflow-hub
pip install -r requirements.txt
pip install accelerate
pip install lightning==2.2.3
pip install xformers==0.0.20    # must exactly match the torch version
pip install setuptools==69.5.1
pip install git+https://github.com/KAIR-BAIR/nerfacc.git@v0.5.2
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/ashawkey/envlight.git
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/NVlabs/nvdiffrast.git
# pip uninstall datasets    # only if it conflicts with a local `datasets` package
```
If you are unable to access Hugging Face to download SD 1.5, SD 2.1, the CLIP models, or the inpainting models, you can download them from here and place the downloaded `pretrained_models` files under `code/pretrained_models`. Then configure Hugging Face to load models from local files as follows:
```shell
export TRANSFORMERS_OFFLINE=1
export DIFFUSERS_OFFLINE=1
export HF_HUB_OFFLINE=1
export HF_HOME=./code/pretrained_models/huggingface
```
Data
Please download the preprocessed data and unzip it in the `data` folder. The resulting folder structure should be:
```
└── DP-Recon
    └── data
        ├── replica
        │   └── scan ...
        ├── scannetpp
        │   └── scan ...
        ├── youtube
        │   └── scan ...
        └── scene_style_prompt.json    # scene style prompts we have tried
```

The YouTube data are obtained from YouTube videos (link to scan1, link to scan2 and link to scan3), with camera poses calibrated using COLMAP and object masks extracted by SAM2. This showcases the robust generalizability of our method to in-the-wild indoor scenes.
Since the full dataset is quite large, we use `replica/scan1` as an example scene. Please download this scene here and place it under `data/replica` following the folder structure above.
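Before launching training, it can help to sanity-check that the unzipped data landed in the expected places. The sketch below mirrors the folder tree above; `check_data_layout` is a hypothetical helper for illustration, not part of the repo:

```python
import os

def check_data_layout(root="data"):
    """Report which expected dataset folders exist under `root`.

    Returns a dict mapping dataset name -> bool (folder present).
    """
    expected = ["replica", "scannetpp", "youtube"]
    return {name: os.path.isdir(os.path.join(root, name)) for name in expected}

if __name__ == "__main__":
    for name, found in check_data_layout().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If only the example scene was downloaded, `replica` should report `found` while the other two report `missing`.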
Training
```shell
cd code

# 1. Decompositional reconstruction + geometry prior
torchrun training/exp_runner.py --conf CONFIG --scan_id SCAN_ID --prior_yaml geometry.yaml
### CONFIG is a config file in code/confs, and SCAN_ID is the id of the scene to reconstruct.
### By default, training logs are saved with wandb. If you prefer TensorBoard, set '--none_wandb'.

# 2. (Optional but recommended) Manually select the best background inpainting result
#    to reduce the randomness of the inpainting process.

# 3. Add texture prior
torchrun training/exp_runner.py --conf CONFIG --scan_id SCAN_ID --prior_yaml texture.yaml --is_continue --ft_folder PATH_TO_EXP --checkpoint CKPT_EPOCH

### Example training commands (Replica scan1):
torchrun training/exp_runner.py --conf confs/replica_grid.conf --scan_id 1 --prior_yaml geometry.yaml
# (Optional) Manually select the best background inpainting result from
# '../exps/dprecon_grid_replica_110/2025_03_16_23_12_57/plots/sds_views/obj_0/bg_pano/inpaint/'
# to replace the randomly selected result
# '../exps/dprecon_grid_replica_110/2025_03_16_23_12_57/plots/sds_views/obj_0/bg_pano/bg_inpaint.png'
torchrun training/exp_runner.py --conf confs/replica_grid.conf --scan_id 1 --prior_yaml texture.yaml --is_continue --ft_folder ../exps/dprecon_grid_replica_110/2025_03_16_23_12_57 --checkpoint 7500
```
Some of our reconstructed results are available here.
Evaluation
1. Reconstruction
```shell
# Evaluate the full scene
python eval/eval_scene_recon.py --dataset_type DATASET    # DATASET can be 'replica' or 'scannetpp'

# Evaluate each object
python eval/eval_object_recon.py --dataset_type DATASET

# Evaluate the background
python eval/eval_bg_recon.py --dataset_type DATASET
```
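Reconstruction quality of this kind is typically measured with the Chamfer distance between point clouds sampled from the predicted and ground-truth meshes. A minimal pure-Python sketch of the symmetric Chamfer distance follows; the evaluation scripts above may use different sampling, normalization, or additional metrics:

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two 3D point sets.

    For each point in one set, take the squared distance to its nearest
    neighbor in the other set, average, and sum both directions.
    Brute-force O(|A|*|B|); fine for small illustrative inputs.
    """
    def one_way(src, dst):
        total = 0.0
        for p in src:
            total += min(
                sum((pi - qi) ** 2 for pi, qi in zip(p, q))
                for q in dst
            )
        return total / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)

# Identical point sets give zero Chamfer distance.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(chamfer_distance(a, b))  # -> 0.0
```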
2. Rendering
```shell
# Evaluate Full-Reference metrics (i.e., PSNR, SSIM, LPIPS)
python eval/eval_rendering.py --dataset_type DATASET

# Evaluate No-Reference metrics (i.e., MUSIQ)
python eval/eval_musiq.py --dataset_type DATASET

# Evaluate object masks
python eval/eval_mask.py --dataset_type DATASET
```
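PSNR, the first of the Full-Reference metrics above, is simply a log-scaled mean squared error between a rendered view and its ground-truth image. A minimal stdlib-only sketch (the actual eval script likely operates on image tensors and averages over views):

```python
import math

def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio between two images.

    img_a / img_b: nested lists of pixel intensities in [0, max_val].
    Higher is better; identical images give infinite PSNR.
    """
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    mse = sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

gt = [[0.0, 0.5], [1.0, 0.25]]
noisy = [[0.1, 0.5], [0.9, 0.25]]
print(round(psnr(gt, noisy), 2))  # -> 23.01
```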
Acknowledgements
Parts of the code are borrowed from MonoSDF, RICO, ObjectSDF++, ThreeStudio, Fantasia3D, and RichDreamer. We thank all the authors for their great work.
Citation
```bibtex
@inproceedings{ni2025dprecon,
  title={Decompositional Neural Scene Reconstruction with Generative Diffusion Prior},
  author={Ni, Junfeng and Liu, Yu and Lu, Ruijie and Zhou, Zirui and Zhu, Song-Chun and Chen, Yixin and Huang, Siyuan},
  booktitle=CVPR,
  year={2025}
}
```
