Installation
After git cloning, run the following commands:
```
git submodule update --init --recursive
conda create -n point_tracking python=3.12 -y
conda activate point_tracking
pip install -r requirements.txt

# detectron gripper tracking
pip install git+https://github.com/facebookresearch/detectron2@9604f5995cc628619f0e4fd913453b4d7d61db3f
# download the detectron gripper weights from LLARVA
gdown --fuzzy https://drive.google.com/file/d/1qQV-yPZHqW9Z_eKR_U0aTnkcKpQ1em9V/view
mv model_final.pth detectron_gripper.pth
```
For CoTracker:

```
pip install git+https://github.com/facebookresearch/co-tracker
```
To download OpenX, make sure the `rlds_dataset_mod` submodule is initialized, check that the paths in `rlds_dataset_mod/prepare_open_x.sh` are correct, and then run the script:
```
cd rlds_dataset_mod
conda env create -f environment_ubuntu.yml
conda activate rlds_env
bash prepare_open_x.sh
cd ..
```
However, this downloads an old version of BRIDGE. To download the latest version, run the following script, passing the same save directory you used for the OpenX datasets:
```
bash download_bridge_v2.sh [/path/to/save/tensorflow_datasets]
```
Labeling Data with Points and Masks
To label OXE datasets, run the run_openx_processing.sh script.
See README_processing_scripts.md for instructions.
The script now processes datasets sequentially instead of running multiple iterations of the same dataset:
- **`SAVE_DIR` required:** `./run_openx_processing.sh /path/to/save/directory` processes all datasets
- **With dataset index:** `./run_openx_processing.sh /path/to/save/directory 3` processes the dataset at index 3 (see `run_openx_processing.sh` for the list of datasets)
- **With dataset name:** `./run_openx_processing.sh /path/to/save/directory bridge_v2` processes that specific dataset (see `run_openx_processing.sh` for the list of datasets)
To label LIBERO data, first download the OpenVLA-style processed LIBERO data here (LIBERO). Then run the following script:
```
python label_videos_libero.py --save_dir /path/to/save/directory --dataset_location [LOCATION_OF_FOLDERS_CONTAINING_LIBERO_HDF5_FILES] --which_dataset [path/to/libero_90_openvla_processed]
```
To label OXE datasets, run the following script:
```
bash run_openx_processing.sh /path/to/save/directory /path/to/openx_datasets_saved_location  # by default this is ~/tensorflow_datasets/openx_datasets/
```

Or for individual datasets, run the following script:

```
bash run_openx_processing.sh /path/to/save/directory /path/to/openx_datasets_saved_location [dataset_name]
```
Q/A: How does the script keep track of which videos have been labeled?
The scripts keep track of processed videos in separate JSON files:
- `finished_videos.json`: tracks videos that have been fully processed with tracking
- `labeled_videos.json`: tracks videos that have been labeled with trajectory segments
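A rough sketch of how such resume logic typically works (the function names and exact JSON contents here are illustrative assumptions, not the repository's actual API):

```python
import json
from pathlib import Path

# Hypothetical resume bookkeeping: a JSON file holding the IDs of
# videos that have already been fully processed.
FINISHED = Path("finished_videos.json")

def load_finished(path=FINISHED):
    """Return the set of already-processed video IDs (empty if no file yet)."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def mark_finished(video_id, path=FINISHED):
    """Record a video as fully processed so a rerun will skip it."""
    done = load_finished(path)
    done.add(video_id)
    path.write_text(json.dumps(sorted(done)))

done = load_finished()
for video_id in ["ep_000", "ep_001"]:
    if video_id in done:
        continue  # already tracked on a previous run; skip
    # ... run point tracking / labeling on video_id here ...
    mark_finished(video_id)
```

Because the file is rewritten after each video, an interrupted run can be restarted and will resume from the first unprocessed episode.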
HDF5 File Structure
The processed data is saved in HDF5 files with the following structure:
```
dataset_movement_and_masks.h5
├── {dataset_name}/
│   ├── {episode_key}/
│   │   ├── {img_key}/
│   │   │   ├── gripper_positions: uint16 array
│   │   │   ├── significant_points: uint16 array
│   │   │   ├── stopped_points: uint16 array
│   │   │   ├── movement_across_subtrajectory: float array
│   │   │   ├── masked_frames: uint8 array (compressed with gzip)
│   │   │   └── traj_splits_indices: uint16 array
```
Where:
- `dataset_name`: name of the dataset (e.g., "oxe")
- `episode_key`: unique identifier for each episode in the dataset
- `img_key`: camera view identifier (e.g., "primary", "secondary")
- `gripper_positions`: binary mask indicating gripper positions
- `significant_points`: binary mask indicating points with significant movement
- `stopped_points`: binary mask indicating points that have stopped moving
- `movement_across_subtrajectory`: array tracking movement across video frames
- `masked_frames`: video frames with object masks applied (compressed)
- `traj_splits_indices`: indices indicating trajectory splits
The data is organized hierarchically to maintain the relationship between datasets, episodes, and different camera views. Binary masks are stored as compact integer arrays (see the dtypes above) for efficiency, and the masked frames are compressed with gzip to reduce file size. The trajectory labels are stored as variable-length strings to accommodate captions of different lengths.
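To make the hierarchy concrete, a minimal `h5py` sketch can write and then walk a toy file with the same layout (the dataset, episode, and camera names below are placeholders, and only a few of the arrays are shown):

```python
import numpy as np
import h5py

# Build a tiny example file matching the structure above, then read it back.
# "oxe"/"episode_0000"/"primary" and the array shapes are placeholders.
with h5py.File("dataset_movement_and_masks.h5", "w") as f:
    g = f.create_group("oxe/episode_0000/primary")
    g.create_dataset("gripper_positions",
                     data=np.zeros((2, 8, 8), dtype=np.uint16))
    g.create_dataset("masked_frames",
                     data=np.zeros((2, 8, 8, 3), dtype=np.uint8),
                     compression="gzip")  # gzip-compressed, as in the real files
    g.create_dataset("traj_splits_indices",
                     data=np.array([0, 1], dtype=np.uint16))

# Walk dataset -> episode -> camera view, mirroring the tree above.
with h5py.File("dataset_movement_and_masks.h5", "r") as f:
    for dataset_name, episodes in f.items():
        for episode_key, views in episodes.items():
            for img_key, arrays in views.items():
                frames = arrays["masked_frames"][:]   # decompressed on read
                splits = arrays["traj_splits_indices"][:]
                print(dataset_name, episode_key, img_key, frames.shape, splits)
```

Slicing `masked_frames` transparently decompresses the gzip chunks, so no special handling is needed when reading.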
Note: LIBERO files follow the same structure but omit the top-level `dataset_name` group.