RussRobin/SpatialQA

SpatialQA enhances a vision-language model's spatial understanding by helping it comprehend and use depth maps.

This Hugging Face dataset provides SpatialQA.json and the high-level images. The low- and middle-level images come from Bunny_695k, which must be downloaded separately.

How to use this dataset

  1. Download the images and SpatialQA.json from this repo.
  2. Download Bunny_695k.
  3. Prepare depth maps for coco_2017 and visual_genome. Please refer to our instructions.
  4. Arrange the files in the following structure:
/images
/images/coco_2017
/images/coco_2017_d
/images/visual_genome
/images/visual_genome_d
/images/open_images
/images/ocrvqa

/images/2d3ds
/images/2d3ds_d
/images/kitti
/images/kitti_d
/images/nyudepthv2
/images/nyudepthv2_d
/images/sa1b # SA-1B images are packed in sa1b-1.tar ~ sa1b4.tar in this repo
/images/sa1b_d
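
The layout above can be checked before training with a short script. The following is a minimal sketch, not part of the official tooling: the helper name `missing_dirs` is hypothetical, the location of the `images` root is whatever path you pass in, and the reading of the `_d` folders as the depth maps for the matching RGB folder is an assumption based on the naming.

```python
from pathlib import Path

# Subdirectory names copied from the file structure listed above.
# Assumption: each "_d" folder holds the depth maps for the folder
# of the same name without the suffix.
EXPECTED_DIRS = [
    "coco_2017", "coco_2017_d",
    "visual_genome", "visual_genome_d",
    "open_images", "ocrvqa",
    "2d3ds", "2d3ds_d",
    "kitti", "kitti_d",
    "nyudepthv2", "nyudepthv2_d",
    "sa1b", "sa1b_d",
]


def missing_dirs(images_root):
    """Return the expected subdirectories that are absent under images_root."""
    root = Path(images_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs("images")` returns an empty list when every folder is in place. A non-empty result names the folders that still need to be downloaded, extracted, or generated.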

SpatialBot-3B

Our fine-tuned 3B model is available at: https://huggingface.co/RussRobin/SpatialBot-3B

Paper:

https://arxiv.org/abs/2406.13642

GitHub repo:

https://github.com/BAAI-DCAI/SpatialBot

SpatialBot, a VLM with precise depth understanding:

https://huggingface.co/RussRobin/SpatialBot

SpatialBench, the spatial understanding benchmark:

https://huggingface.co/datasets/RussRobin/SpatialBench