Easy-to-use VLA deployment, fast to react, smooth in motion.
About
VLASH is an efficient and easy-to-use framework for VLA (vision-language-action) fine-tuning and inference.
VLASH is efficient through:
- Asynchronous inference for fast reaction and smooth real-time motion (>30 Hz inference frequency for $\pi_{0.5}$ on an RTX 5090)
- Future-state awareness to keep asynchronous VLA inference stable without added overhead
- Action quantization for faster robot execution
- LoRA with shared observation encoding for efficient fine-tuning on consumer GPUs
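The core idea behind asynchronous inference is that the policy predicts the next action chunk in the background while the robot is still executing the current one, so control never stalls on model latency. The sketch below illustrates this producer/consumer pattern with Python's standard `threading` and `queue` modules; `dummy_policy` is a stand-in for a real VLA model, not VLASH's actual API.

```python
import queue
import threading
import time

def dummy_policy(observation):
    """Stand-in for a VLA model: sleeps to simulate inference latency,
    then returns a 4-step action chunk (assumption, not VLASH's API)."""
    time.sleep(0.01)
    return [observation + i for i in range(4)]

def inference_worker(policy, observations, chunk_queue):
    """Producer: runs inference in the background so the control loop
    never blocks on the model."""
    for obs in observations:
        chunk_queue.put(policy(obs))

chunks = queue.Queue()
worker = threading.Thread(
    target=inference_worker, args=(dummy_policy, range(3), chunks)
)
worker.start()

executed = []
for _ in range(3):
    # Consumer: "execute" each chunk as soon as it arrives; in a real
    # deployment this loop would send actions to the robot at a fixed rate.
    executed.extend(chunks.get())
worker.join()
print(len(executed))  # 3 chunks x 4 actions = 12
```

In a real system the consumer runs at the robot's control frequency, and the producer's latency is hidden as long as inference finishes before the current chunk runs out.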
VLASH is easy to use with:
- Seamless integration with LeRobot datasets (v2.1, v3.0), models and robots
- Simple YAML-based configuration system
- Support for various policy architectures (e.g., $\pi_{0.5}$, $\pi_0$, ...)
- Easy deployment on real robot hardware
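To give a feel for the YAML-based configuration, here is an illustrative sketch of what a config file might contain. Every key below is a guess for illustration only; the real schema is defined by the files under examples/ in the repository.

```yaml
# Illustrative sketch only -- not VLASH's actual schema.
# Consult the files under examples/ for real configurations.
policy:
  type: pi05                  # hypothetical key: which architecture to load
  checkpoint: /path/to/ckpt   # hypothetical key: fine-tuned weights
inference:
  async: true                 # hypothetical key: enable async inference
  action_quant_ratio: 1       # hypothetical key: 2 would double exec speed
robot:
  fps: 30                     # hypothetical key: control frequency
```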
Demo
Getting Started
conda create -n vlash python=3.10
conda activate vlash
conda install ffmpeg=7.1.1 -c conda-forge
pip install -e .
Quick Examples
Fine-tune a VLA policy for your task, enabling smooth async inference without overhead:
vlash train examples/train/pi05/async.yaml
Run async inference on a robot:
vlash run examples/inference/async.yaml
Run async inference with 2x speedup:
vlash run examples/inference/async.yaml --action_quant_ratio=2
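One plausible reading of the quantization ratio is temporal subsampling of the predicted action chunk: keep every second action and halve the per-step duration, so the same trajectory finishes in roughly half the wall-clock time. The NumPy sketch below illustrates that interpretation; `quantize_chunk` is an illustrative helper, not VLASH's actual implementation.

```python
import numpy as np

def quantize_chunk(actions, ratio):
    """Subsample a (T, D) action chunk along the time axis by `ratio`.
    Illustrative helper (assumption), not VLASH's actual API."""
    return actions[::ratio]

# 8 timesteps of a 1-DoF trajectory from 0.0 to 1.0.
chunk = np.linspace(0.0, 1.0, num=8).reshape(8, 1)

# At ratio 2, the robot executes half as many steps over the same path,
# so execution finishes ~2x faster at coarser granularity.
fast = quantize_chunk(chunk, ratio=2)
print(fast.shape)  # (4, 1)
```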
TODO
- LoRA fine-tuning for $\pi_{0.5}$ and $\pi_0$ in under 12 GB of GPU memory
- QLoRA fine-tuning for $\pi_{0.5}$ and $\pi_0$ in under 8 GB of GPU memory
- Efficient fine-tuning with shared observation encoding
Acknowledgment
This project is built upon the following excellent open-source projects: LeRobot and PEFT.
License
Apache 2.0
