A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes1,
Tuan-Hung Vu1,2,
Andrei Bursuc1,2,
Patrick Pérez3,
Raoul de Charette1
1 Inria, 2 valeo.ai, 3 Kyutai
Project page: https://astra-vision.github.io/FAMix/
Paper: https://arxiv.org/abs/2311.17922
TL;DR: FAMix (for Freeze, Augment, and Mix) is a simple method for domain generalized semantic segmentation, based on minimal fine-tuning, language-driven patch-wise style augmentation, and patch-wise style mixing of original and augmented styles.
Citation
@InProceedings{fahes2024simple,
title={A Simple Recipe for Language-guided Domain Generalized Segmentation},
author={Fahes, Mohammad and Vu, Tuan-Hung and Bursuc, Andrei and P{\'e}rez, Patrick and de Charette, Raoul},
booktitle={CVPR},
year={2024}
}
Demo
Test on unseen YouTube videos recorded in different cities
Training dataset: GTA5
Backbone: ResNet-50
Segmenter: DeepLabv3+
Watch the full video on YouTube
⚠️⚠️Note1: When the test images have a higher resolution than the training images, downscaling the inputs by a factor of 2 (i.e., scale=0.5) and then upsampling the predictions back to the original resolution speeds up inference and can improve results. Thanks to tpy001 for raising this point in the issues. The scale can be set when running Evaluation by adding --scale <value>.
⚠️⚠️Note2: A further trick to improve performance at inference: (1) predict at scale=1 (i.e., the original size of the input image), (2) predict on the downsampled image (scale=0.5), and (3) ensemble the two predictions. This is implemented in the code and can be activated by adding --scale <value> and --ensemble in Evaluation.
Results with RN50 backbone and DLv3+ decoder trained on GTA5:
| Backbone | Decoder | Scale | Cityscapes | Mapillary | ACDC night | ACDC snow | ACDC rain | ACDC fog |
|---|---|---|---|---|---|---|---|---|
| RN50 | DLv3+ | 1 | 48.51 | 52.39 | 15.02 | 37.38 | 39.56 | 40.99 |
| RN50 | DLv3+ | 0.5 | 48.02 | 54.00 | 21.58 | 38.27 | 39.53 | 44.94 |
| RN50 | DLv3+ | ensemble (1 & 0.5) | 50.80 | 56.04 | 20.05 | 40.40 | 42.10 | 44.93 |
Results with RN101 backbone and DLv3+ decoder trained on GTA5:
| Backbone | Decoder | Scale | Cityscapes | Mapillary | ACDC night | ACDC snow | ACDC rain | ACDC fog |
|---|---|---|---|---|---|---|---|---|
| RN101 | DLv3+ | 1 | 49.13 | 53.41 | 21.28 | 41.49 | 42.19 | 44.30 |
| RN101 | DLv3+ | 0.5 | 50.06 | 55.31 | 23.97 | 40.34 | 42.41 | 44.98 |
| RN101 | DLv3+ | ensemble (1 & 0.5) | 51.46 | 56.95 | 24.53 | 43.33 | 44.77 | 47.39 |
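The two-scale ensemble from Note2 can be illustrated with a minimal, dependency-free sketch. It assumes that "ensembling" means averaging per-class scores after the half-resolution prediction is upsampled back to full size; the repository may combine the predictions differently. All names here (`downsample2x`, `upsample2x`, `ensemble_predict`, `fake_model`) are hypothetical, nested lists stand in for image tensors, and nearest-neighbour resizing stands in for whatever interpolation the actual code uses.

```python
# Illustrative only: plain nested lists stand in for image tensors.

def downsample2x(img):
    """Keep every second pixel in both directions (scale=0.5)."""
    return [row[::2] for row in img[::2]]


def upsample2x(scores, height, width):
    """Nearest-neighbour upsampling of a low-resolution score map to H x W."""
    max_y, max_x = len(scores) - 1, len(scores[0]) - 1
    return [[scores[min(y // 2, max_y)][min(x // 2, max_x)]
             for x in range(width)] for y in range(height)]


def ensemble_predict(model, img):
    """Average per-class scores from scale=1 and scale=0.5, then take the argmax."""
    h, w = len(img), len(img[0])
    full = model(img)                                   # H x W x C scores
    half = upsample2x(model(downsample2x(img)), h, w)   # resized to H x W x C
    labels = []
    for y in range(h):
        row = []
        for x in range(w):
            avg = [(a + b) / 2 for a, b in zip(full[y][x], half[y][x])]
            row.append(max(range(len(avg)), key=avg.__getitem__))
        labels.append(row)
    return labels


def fake_model(img):
    """Hypothetical 2-class 'segmenter': the class-1 score is the pixel value."""
    return [[[1.0 - p, p] for p in row] for row in img]
```

Calling `ensemble_predict(fake_model, img)` on a small grid of values in [0, 1] returns one class index per pixel. In practice the same steps would run on the network's logits or softmax outputs.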
Installation
Dependencies
First create a new conda environment with the required packages:
conda env create --file environment.yml
Then activate the environment using conda activate <env_name>, where <env_name> is the name defined in environment.yml.
Datasets
- ACDC: Download ACDC images and labels from ACDC. Please follow the dataset directory structure:
  <ACDC_DIR>/              % ACDC dataset root
  ├── rgb_anon/            % input image (rgb_anon_trainvaltest.zip)
  └── gt/                  % semantic segmentation labels (gt_trainval.zip)
- BDD100K: Download BDD100K images and labels from BDD100K. Please follow the dataset directory structure:
  <BDD100K_DIR>/           % BDD100K dataset root
  ├── images/              % input image
  └── labels/              % semantic segmentation labels
- Cityscapes: Follow the instructions in Cityscapes to download the images and semantic segmentation labels. Please follow the dataset directory structure:
  <CITYSCAPES_DIR>/        % Cityscapes dataset root
  ├── leftImg8bit/         % input image (leftImg8bit_trainvaltest.zip)
  └── gtFine/              % semantic segmentation labels (gtFine_trainvaltest.zip)
- GTA5: Download GTA5 images and labels from GTA5. Please follow the dataset directory structure:
  <GTA5_DIR>/              % GTA5 dataset root
  ├── images/              % input image
  └── labels/              % semantic segmentation labels
- Mapillary: Download Mapillary images and labels from Mapillary. Please follow the dataset directory structure:
  <MAPILLARY_DIR>/         % Mapillary dataset root
  ├── training             % Training subset
  │   ├── images           % input image
  │   └── labels           % semantic segmentation labels
  └── validation           % Validation subset
      ├── images           % input image
      └── labels           % semantic segmentation labels
- Synthia: Download Synthia images and labels from SYNTHIA-RAND-CITYSCAPES and split it following SPLIT-DATA. Please follow the dataset directory structure:
  <SYNTHIA>/               % Synthia dataset root
  ├── RGB/                 % input image
  └── GT/                  % semantic segmentation labels
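Before launching a run, it can help to check that a dataset root matches the expected layout. The following is a minimal sketch, not part of the repository: `EXPECTED_LAYOUT` and `missing_dirs` are hypothetical names, and the dictionary keys are labels chosen for this example that need not match the values accepted by --dataset. The ACDC image folder is written as rgb_anon, matching the archive name rgb_anon_trainvaltest.zip.

```python
from pathlib import Path

# Sub-directories taken from the directory structures listed above.
# The keys are labels for this sketch only (an assumption, not CLI values).
EXPECTED_LAYOUT = {
    "ACDC": ["rgb_anon", "gt"],
    "BDD100K": ["images", "labels"],
    "Cityscapes": ["leftImg8bit", "gtFine"],
    "GTA5": ["images", "labels"],
    "Mapillary": ["training/images", "training/labels",
                  "validation/images", "validation/labels"],
    "Synthia": ["RGB", "GT"],
}


def missing_dirs(dataset, root):
    """Return the expected sub-directories that are absent under `root`."""
    root = Path(root)
    return [sub for sub in EXPECTED_LAYOUT[dataset] if not (root / sub).is_dir()]
```

For example, `missing_dirs("GTA5", "<GTA5_DIR>")` returns an empty list when both images/ and labels/ are present, and the names of the missing folders otherwise.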
Trained models
The trained models are available here.
Running FAMix
Style mining
python3 patch_PIN.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--resize_feat \
--save_dir <path_for_learnt_parameters_saving>
Training
python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--total_itrs 40000 \
--batch_size 8 \
--val_interval 750 \
--transfer \
--data_aug \
--ckpts_path <path_to_save_checkpoints> \
--path_for_stats <path_for_mined_styles>
Evaluation
python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--ckpt <path_to_tested_model> \
--test_only \
--ACDC_sub <ACDC_subset_if_tested_on_ACDC>
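The three stages above run in order: style mining, then training, then evaluation. The sketch below chains them from Python using only the flags shown in this README (including --scale and --ensemble from Note1 and Note2). The function names are hypothetical, and it assumes that the directory passed to --save_dir during style mining is the one to pass to --path_for_stats during training; check the argument parsers in patch_PIN.py and main.py before relying on it.

```python
import subprocess


def mining_cmd(dataset, data_root, save_dir):
    """Style mining command, mirroring the flags listed under 'Style mining'."""
    return ["python3", "patch_PIN.py", "--dataset", dataset,
            "--data_root", data_root, "--resize_feat", "--save_dir", save_dir]


def training_cmd(dataset, data_root, ckpts_path, stats_path):
    """Training command, mirroring the flags listed under 'Training'."""
    return ["python3", "main.py", "--dataset", dataset, "--data_root", data_root,
            "--total_itrs", "40000", "--batch_size", "8", "--val_interval", "750",
            "--transfer", "--data_aug",
            "--ckpts_path", ckpts_path, "--path_for_stats", stats_path]


def evaluation_cmd(dataset, data_root, ckpt, acdc_sub=None, scale=None, ensemble=False):
    """Evaluation command; the optional flags are added only when requested."""
    cmd = ["python3", "main.py", "--dataset", dataset, "--data_root", data_root,
           "--ckpt", ckpt, "--test_only"]
    if acdc_sub is not None:
        cmd += ["--ACDC_sub", acdc_sub]
    if scale is not None:
        cmd += ["--scale", str(scale)]
    if ensemble:
        cmd.append("--ensemble")
    return cmd


def run_all(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage
```

With `dry_run=True` (the default), `run_all` only prints the commands, which is a cheap way to review paths before starting a 40000-iteration training run.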
Inference & Visualization
To test any model on any image and visualize the output, add the images to the predict_test directory and run:
python3 predict.py \
--ckpt <ckpt_path> \
--save_val_results_to <directory_for_saved_output_images>
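Since predict.py reads its inputs from predict_test, a small helper can gather images from another folder before running the command above. This is a sketch with hypothetical names (`stage_images`, `IMAGE_SUFFIXES`); the set of file extensions is an assumption, as the formats accepted by predict.py are not listed here.

```python
import shutil
from pathlib import Path

# Assumption: common image extensions; adjust to what predict.py accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def stage_images(source_dir, target_dir="predict_test"):
    """Copy image files from `source_dir` into `target_dir`; return the copied names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Files are copied rather than moved, so the source folder is left untouched.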
License
FAMix is released under the Apache 2.0 license.
Acknowledgement
The code is based on this implementation of DeepLabv3+, and uses code from CLIP, PODA and RobustNet.
