Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson
Project Website, Paper (arXiv)
This repository contains the code for the paper "One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation". We provide a dockerized environment to run the code, or you can run it locally.
In summary, we open-source:
- The OneMap mapping and navigation code
- The evaluation code for single- and multi-object navigation
- The multi-object navigation dataset and benchmark
- The multi-object navigation dataset generation code, such that you can generate your own datasets
Abstract
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot
applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation
methods that allow a robot to search for an arbitrary object without prior training. However, these
zero-shot methods have so far treated the environment as unknown for each consecutive query.
In this paper, we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage
information gathered from previous searches to find new objects more efficiently. To address this problem, we build a
reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic
map update that mitigates common sources of error in semantic feature extraction, and we leverage this semantic uncertainty
for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation
and on a real robot, running in real-time on a Jetson Orin AGX. We demonstrate that it outperforms existing
state-of-the-art approaches on both single- and multi-object navigation tasks.
Setup (Docker)
0. Docker
You will need to have Docker installed on your system. Follow the official instructions to install it. You will also need to have the nvidia-container-toolkit installed and configured as the Docker runtime on your system.
1. Clone the repository
# https
git clone https://github.com/KTH-RPL/OneMap.git
# or ssh
git clone git@github.com:KTH-RPL/OneMap.git
cd OneMap/
2. Build the Docker Image
The docker image build process will build habitat-sim and download model weights. You can choose to let the container
download the habitat scenes during build, or if you have them already downloaded, you can set HM3D=LOCAL and provide
the absolute HM3D_PATH to the versioned_data directory on your machine in the .env file in the root of the repository.
If you want the container to download the scenes for you, set HM3D=FULL in the .env file and provide your
Matterport credentials. You can get access to Matterport for free here.
In that case, you do not need to provide an HM3D_PATH.
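For reference, a .env using locally downloaded scenes might look like the following (the path is an illustrative placeholder, and the exact credential variable names for the HM3D=FULL option are defined by the repository's compose setup, so check the provided .env template):

```
# Use scenes already on the host
HM3D=LOCAL
HM3D_PATH=/absolute/path/to/versioned_data

# Or let the container download the scenes (requires Matterport credentials)
# HM3D=FULL
```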
Having configured the .env file, you can build the docker image in the root of the repository with:
docker compose build
The build will take a while as habitat-sim is built from source. You can launch the docker container with:
docker compose up -d
and open a new terminal in the container with:
docker exec -it onemap-onemap-1 bash
Setup (Local, without Docker)
1. Clone the repository
# https
git clone https://github.com/KTH-RPL/OneMap.git
# or ssh
git clone git@github.com:KTH-RPL/OneMap.git
cd OneMap/
2. Install dependencies
python3 -m pip install gdown torch torchvision torchaudio meson
python3 -m pip install -r requirements.txt
Manually install a newer timm version (quote the requirement so the shell does not interpret the >= as a redirect):
python3 -m pip install --upgrade "timm>=1.0.7"
YOLOV7:
git clone https://github.com/WongKinYiu/yolov7
Build planning utilities:
python3 -m pip install ./planning_cpp/
3. Download the model weights
SED extracted weights:
gdown 1D_RE4lvA-CiwrP75wsL8Iu1a6NrtrP9T -O weights/clip.pth
YOLOV7 weights and MobileSAM weights:
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt -O weights/yolov7-e6e.pt
wget https://github.com/ChaoningZhang/MobileSAM/raw/refs/heads/master/weights/mobile_sam.pt -O weights/mobile_sam.pt
4. Download the habitat data
Running the code
1. Run the example
You can run the code on an example, visualized with rerun.io:
Docker
You will need to have rerun.io installed on the host for visualization. Ensure the Docker container is running and that you have opened a shell inside it as described in the Docker setup. Then launch the rerun viewer on the host (not inside the docker) with:
and launch the example in the container with:
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
Local
Open the rerun viewer and run the example from the root of the repository with:
rerun
python3 habitat_test.py --config config/mon/base_conf_sim.yaml
2. Run the evaluation
You can reproduce the evaluation results from the paper for single- and multi-object navigation.
Single-object navigation
python3 eval_habitat.py --config config/mon/eval_conf.yaml
This will run the evaluation and save the results in the results/ directory. You can read the results with:
python3 read_results.py --config config/mon/eval_conf.yaml
Multi-object navigation
python3 eval_habitat_multi.py --config config/mon/eval_multi_conf.yaml
This will run the evaluation and save the results in the results_multi/ directory. You can read the results with:
python3 read_results_multi.py --config config/mon/eval_multi_conf.yaml
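The results files store per-episode outcomes; their exact schema is defined by the read_results scripts, but the headline metrics reported in the paper are the standard ones. As a rough sketch (the record field names here are hypothetical, not the repository's actual schema), success rate and SPL are computed like this:

```python
# Hypothetical per-episode records; the actual schema of the files in
# results/ is defined by read_results.py, not by this sketch.
episodes = [
    {"success": True,  "path_length": 12.0, "shortest_path": 10.0},
    {"success": True,  "path_length": 10.0, "shortest_path": 10.0},
    {"success": False, "path_length": 25.0, "shortest_path": 8.0},
]

def success_rate(eps):
    """Fraction of episodes in which the agent found the goal object."""
    return sum(e["success"] for e in eps) / len(eps)

def spl(eps):
    """Success weighted by Path Length (Anderson et al., 2018):
    mean over episodes of S_i * l_i / max(p_i, l_i), where l_i is the
    shortest-path length and p_i the path the agent actually took."""
    total = 0.0
    for e in eps:
        if e["success"]:
            total += e["shortest_path"] / max(e["path_length"], e["shortest_path"])
    return total / len(eps)

print(f"SR:  {success_rate(episodes):.3f}")  # 2 of 3 succeed -> 0.667
print(f"SPL: {spl(episodes):.3f}")
```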
Dataset generation
While we provide the generated dataset for the evaluation of multi-object navigation, we also release the code to generate datasets with varying parameters. You can generate a dataset with:
python3 eval/dataset_utils/gen_multiobject_dataset.py
and change parameters such as the number of objects per episode in that file.
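Conceptually, a multi-object episode is a scene paired with an ordered sequence of distinct goal categories. The sketch below illustrates that idea only; the function, categories, and record layout are hypothetical, and the real generator in eval/dataset_utils/gen_multiobject_dataset.py additionally validates goals against the HM3D scenes:

```python
import random

# Illustrative goal categories; hypothetical, not the repository's list.
CATEGORIES = ["chair", "bed", "plant", "toilet", "tv_monitor", "sofa"]

def sample_episode(scene_id, objects_per_episode, rng):
    """Sample one episode: a scene plus an ordered list of distinct goals."""
    goals = rng.sample(CATEGORIES, objects_per_episode)
    return {"scene_id": scene_id, "object_goals": goals}

# Deterministic generation via a seeded RNG keeps the dataset reproducible.
rng = random.Random(42)
episodes = [
    sample_episode(f"scene_{i:03d}", objects_per_episode=3, rng=rng)
    for i in range(5)
]
print(episodes[0])
```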
Citation
If you use this code in your research, please cite our paper:
@misc{busch2024mapallrealtimeopenvocabulary,
title={One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation},
author={Finn Lukas Busch and Timon Homberger and Jesús Ortega-Peimbert and Quantao Yang and Olov Andersson},
year={2024},
eprint={2409.11764},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.11764},
}
