RussRobin/SpatialBot-3B-LoRA

SpatialBot is a VLM with spatial understanding and reasoning abilities: it precisely understands depth maps and uses them to perform high-level tasks.

In this HF repo, we provide the LoRA checkpoints of SpatialBot-3B, which is based on Phi-2 and SigLIP. It can perform well on general VLM tasks and on spatial understanding benchmarks such as SpatialBench.

To use the LoRA checkpoints, you will also need to download the pretrained checkpoint.

Paper:

https://arxiv.org/abs/2406.13642

GitHub repo:

https://github.com/BAAI-DCAI/SpatialBot

SpatialBench, the benchmark:

https://huggingface.co/datasets/RussRobin/SpatialBench

Merged SpatialBot-3B:

https://huggingface.co/RussRobin/SpatialBot-3B
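
Downloading example:

The repos linked above can be fetched with the `huggingface_hub` library. The snippet below is a minimal sketch, not the authors' official procedure: the repo IDs are taken from the links in this card, while the `checkpoints` folder and the helper names `local_dir_for` and `download` are hypothetical choices made for illustration. It does not cover the separate pretrained checkpoint mentioned above, since its location is not given here.

```python
from pathlib import Path

# Repo IDs taken from the links in this model card.
LORA_REPO = "RussRobin/SpatialBot-3B-LoRA"
MERGED_REPO = "RussRobin/SpatialBot-3B"


def local_dir_for(repo_id: str, root: str = "checkpoints") -> Path:
    """Map a Hub repo ID to a local folder (hypothetical layout)."""
    return Path(root) / repo_id.split("/")[-1]


def download(repo_id: str, root: str = "checkpoints") -> Path:
    """Download every file of a Hub repo into its local folder."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

Calling `download(LORA_REPO)` would place the LoRA weights under `checkpoints/SpatialBot-3B-LoRA`, and `download(MERGED_REPO)` would fetch the merged model instead. How the LoRA weights are then combined with the pretrained checkpoint is described in the GitHub repo, not here.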