Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning
πProject Page | βοΈPaper(Arxiv) | π₯Demo
Hao Chen, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Renrui Zhang, Xiaoqi Li, Xiao He, Yandong Guo, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng
π€ Unlike previous dual-system VLA methods that attach a separate policy head as System 1, FiS-VLA repurposes the final transformer blocks of an intact VLM as System 1, while retaining the full model for System 2 reasoning. Our System 2 processes low-frequency 2D images and language prompts into latent features that guide System 1, while System 1 additionally handles high-frequency inputs, including the robot state, 2D images, and point clouds, for robust and responsive action generation. Under this paradigm, FiS-VLA achieves superior performance and high-frequency control.
β¨ News β¨
- [2025/06/03] Fast-in-Slow is now live on arXiv with code release! π
- [2025/07/08] The pretrained checkpoint on a large-scale robotic dataset has been released! π
π¦ Installation
The code is built using Python 3.10, we also recommand to use Python above Python 3.10. We require PyTorch >= 2.2.0 and CUDA >= 12.0 (It may run with lower versions, but we have not tested it). We recommend using Miniconda and create an environment as follows:
conda create --name fisvla python=3.10 conda activate fisvla cd Fast-in-Slow pip install -r requirement.txt pip install git+https://github.com/moojink/dlimp_openvla@040105d256bd28866cc6620621a3d5f7b6b91b46 pip install git+https://github.com/arnoldland/openvla@5603207085d55148682e2a35b868ad77d7b42ece pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable" # Training additionally requires Flash-Attention 2 (https://github.com/Dao-AILab/flash-attention) pip install packaging ninja # Verify Ninja --> should return exit code "0" ninja --version; echo $? # Install Flash Attention 2 # =>> If you run into difficulty, try `pip cache remove flash_attn` first pip install "flash-attn==2.5.5" --no-build-isolation export COPPELIASIM_ROOT=${HOME}/CoppeliaSim export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT wget https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz mkdir -p $COPPELIASIM_ROOT && tar -xf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz -C $COPPELIASIM_ROOT --strip-components 1 rm -rf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz cd LIFT3D/third_party/RLBench pip install -e . cd ../../.. cd LIFT3D pip install -e . cd .. pip install --force-reinstall opencv-python-headless pip install numpy==1.26.4
π§© Framework
Our code is built based on HybridVLA and Lift3D and is organized in the following framework:
conf: config files for hydridvla trainingscripts: scripts for training and testingtraining: contains strategies for trainingmodels: contains Fast-in-Slow models, including backbones & diffusion & vlms & vlasutil: contains different kinds of tools funtionvla: from openvla's vla structure, including action tokenizer, etc.transformer: customized transformer packagetimm: customized timm package
π‘Getting Started
You can find the Pretrained Checkpoint from HFπ€.
See the train.sh and scripts/train.py for more training details.
See the test_rlbench.sh and scripts/sim.py for more testing details.
ποΈ License
This project is licensed under the MIT License - see the LICENSE file for details.
π BibTeX
@misc{chen2025fastinslowdualsystemfoundationmodel,
title={Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning},
author={Hao Chen and Jiaming Liu and Chenyang Gu and Zhuoyang Liu and Renrui Zhang and Xiaoqi Li and Xiao He and Yandong Guo and Chi-Wing Fu and Shanghang Zhang and Pheng-Ann Heng},
year={2025},
eprint={2506.01953},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2506.01953},
}