GitHub - astra-vision/FAMix: [CVPR 2024] Domain generalization by interpolating original feature styles with styles obtained using random descriptions in natural language

A Simple Recipe for Language-guided Domain Generalized Segmentation

Mohammad Fahes¹, Tuan-Hung Vu¹,², Andrei Bursuc¹,², Patrick Pérez³, Raoul de Charette¹
¹ Inria, ² valeo.ai, ³ Kyutai

Project page: https://astra-vision.github.io/FAMix/
Paper: https://arxiv.org/abs/2311.17922

TL;DR: FAMix (Freeze, Augment, and Mix) is a simple method for domain-generalized semantic segmentation, based on minimal fine-tuning, language-driven patch-wise style augmentation, and patch-wise mixing of original and augmented styles.
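
To make the "mix" step concrete, here is a minimal sketch of style mixing for one channel of one feature patch. It assumes that a "style" is the pair of mean and standard deviation of the activations, and that mixing is a convex combination with weight alpha followed by an AdaIN-style re-normalization. The function name, the toy values, and the uniform sampling of alpha are illustrative assumptions, not the repository's code.

```python
import random
import statistics


def mix_patch_style(patch, aug_mean, aug_std, alpha, eps=1e-5):
    """Re-style one channel of one feature patch with a mixed (mean, std)."""
    mean = statistics.fmean(patch)
    std = statistics.pstdev(patch) + eps
    # Interpolate between the original style and the augmented style.
    mixed_mean = alpha * mean + (1.0 - alpha) * aug_mean
    mixed_std = alpha * std + (1.0 - alpha) * aug_std
    # Remove the original statistics, then apply the mixed ones.
    return [mixed_std * (x - mean) / std + mixed_mean for x in patch]


patch = [0.2, 0.8, 1.4, 0.6]   # toy activations for one channel in one patch
alpha = random.random()         # assumption: mixing weight drawn uniformly in [0, 1]
styled = mix_patch_style(patch, aug_mean=2.0, aug_std=0.5, alpha=alpha)
```

With alpha=1 the patch is returned unchanged, and with alpha=0 it takes on the augmented statistics entirely.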

Citation

@InProceedings{fahes2024simple,
  title={A Simple Recipe for Language-guided Domain Generalized Segmentation},
  author={Fahes, Mohammad and Vu, Tuan-Hung and Bursuc, Andrei and P{\'e}rez, Patrick and de Charette, Raoul},
  booktitle={CVPR},
  year={2024}
}

Demo

Tested on unseen YouTube videos from different cities
Training dataset: GTA5
Backbone: ResNet-50
Segmenter: DeepLabv3+

Watch the full video on YouTube

⚠️⚠️ Note 1: For test datasets with a higher resolution than the one used for training, downscaling the images by a factor of 2 (i.e., scale=0.5) and then upsampling the predictions back to the original resolution speeds up inference and can improve results. Thanks to tpy001 for raising this point in the issues. The scale can be set by adding --scale <value> to the Evaluation command.
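
The downscale-then-upsample idea in Note 1 can be sketched in plain Python as follows. This is an illustration only: predict_fn stands in for the segmenter, and nearest-neighbour resizing is an assumption chosen for brevity (the repository may interpolate differently).

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list (rows of pixels or labels)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def predict_at_scale(image, predict_fn, scale=0.5):
    """Run predict_fn on a rescaled image, then resize the labels back."""
    h, w = len(image), len(image[0])
    small = resize_nearest(image, max(1, int(h * scale)), max(1, int(w * scale)))
    small_labels = predict_fn(small)  # hypothetical model call
    return resize_nearest(small_labels, h, w)


# Toy stand-in for a segmenter: label 1 for bright pixels, 0 otherwise.
def toy_model(img):
    return [[1 if px > 0.5 else 0 for px in row] for row in img]


image = [[0.1, 0.1, 0.9, 0.9],
         [0.1, 0.1, 0.9, 0.9],
         [0.9, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1]]
labels = predict_at_scale(image, toy_model, scale=0.5)
```

The model only sees a quarter of the pixels at scale=0.5, which is where the inference speed-up comes from.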

⚠️⚠️ Note 2: A further trick to improve performance at inference: (1) predict at scale=1 (i.e., the original size of the input image), (2) predict on the downsampled image (scale=0.5), and (3) ensemble the two predictions. This is implemented in the code and can be enabled by adding --scale <value> and --ensemble to the Evaluation command.
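
One way to picture the ensembling step of Note 2 is to average the per-pixel class probabilities from the two scales and take the argmax. The sketch below assumes exactly that, with both maps already at the original resolution. The averaging rule and the function name are assumptions, and the repository may combine predictions differently.

```python
def ensemble_labels(probs_full, probs_half_upsampled):
    """Average two per-pixel class-probability maps and take the argmax.

    Each argument is a 2-D grid whose entries are lists of class
    probabilities, both at the original image resolution.
    """
    labels = []
    for row_a, row_b in zip(probs_full, probs_half_upsampled):
        out_row = []
        for pa, pb in zip(row_a, row_b):
            avg = [(a + b) / 2.0 for a, b in zip(pa, pb)]
            out_row.append(max(range(len(avg)), key=avg.__getitem__))
        labels.append(out_row)
    return labels


# Two pixels, three classes: the two scales disagree on the first pixel.
full = [[[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]]
half = [[[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]]]
print(ensemble_labels(full, half))  # prints [[1, 2]]
```

In the toy example the more confident half-scale prediction wins on the first pixel, which is the kind of disagreement ensembling is meant to resolve.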

Results with RN50 backbone and DLv3+ decoder trained on GTA5:

Backbone	Decoder	Scale	Cityscapes	Mapillary	ACDC night	ACDC snow	ACDC rain	ACDC fog
RN50	DLv3+	1	48.51	52.39	15.02	37.38	39.56	40.99
RN50	DLv3+	0.5	48.02	54.00	21.58	38.27	39.53	44.94
RN50	DLv3+	ensemble (1 & 0.5)	50.80	56.04	20.05	40.40	42.10	44.93

Results with RN101 backbone and DLv3+ decoder trained on GTA5:

Backbone	Decoder	Scale	Cityscapes	Mapillary	ACDC night	ACDC snow	ACDC rain	ACDC fog
RN101	DLv3+	1	49.13	53.41	21.28	41.49	42.19	44.30
RN101	DLv3+	0.5	50.06	55.31	23.97	40.34	42.41	44.98
RN101	DLv3+	ensemble (1 & 0.5)	51.46	56.95	24.53	43.33	44.77	47.39

Installation

Dependencies

First create a new conda environment with the required packages:

conda env create --file environment.yml

Then activate the environment, replacing <env_name> with the name defined in environment.yml:

conda activate <env_name>

Datasets

ACDC: Download ACDC images and labels from ACDC. Please follow the dataset directory structure:

<ACDC_DIR>/                   % ACDC dataset root
├── rgb_anon/                 % input image (rgb_anon_trainvaltest.zip)
└── gt/                       % semantic segmentation labels (gt_trainval.zip)

BDD100K: Download BDD100K images and labels from BDD100K. Please follow the dataset directory structure:

<BDD100K_DIR>/              % BDD100K dataset root
├── images/                 % input image
└── labels/                 % semantic segmentation labels

Cityscapes: Follow the instructions in Cityscapes to download the images and semantic segmentation labels. Please follow the dataset directory structure:

<CITYSCAPES_DIR>/             % Cityscapes dataset root
├── leftImg8bit/              % input image (leftImg8bit_trainvaltest.zip)
└── gtFine/                   % semantic segmentation labels (gtFine_trainvaltest.zip)

GTA5: Download GTA5 images and labels from GTA5. Please follow the dataset directory structure:

<GTA5_DIR>/                   % GTA5 dataset root
├── images/                   % input image 
└── labels/                   % semantic segmentation labels

Mapillary: Download Mapillary images and labels from Mapillary. Please follow the dataset directory structure:

<MAPILLARY_DIR>/              % Mapillary dataset root
├── training/                 % Training subset
│   ├── images/               % input image
│   └── labels/               % semantic segmentation labels
└── validation/               % Validation subset
    ├── images/               % input image
    └── labels/               % semantic segmentation labels

Synthia: Download Synthia images and labels from SYNTHIA-RAND-CITYSCAPES and split it following SPLIT-DATA. Please follow the dataset directory structure:

<SYNTHIA_DIR>/             % Synthia dataset root
├── RGB/                   % input image 
└── GT/                    % semantic segmentation labels
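
Before launching a run, it can help to check that a dataset root matches the layouts listed above. The following is a small, hypothetical helper (not part of the repository) that reports which expected sub-directories are missing. The dictionary keys are illustrative labels and may differ from the values accepted by --dataset, and the ACDC image folder is assumed to be rgb_anon, matching the archive name.

```python
import tempfile
from pathlib import Path

# Sub-directories taken from the layouts above, keyed by an illustrative name.
EXPECTED_SUBDIRS = {
    "ACDC": ["rgb_anon", "gt"],
    "BDD100K": ["images", "labels"],
    "Cityscapes": ["leftImg8bit", "gtFine"],
    "GTA5": ["images", "labels"],
    "Mapillary": ["training/images", "training/labels",
                  "validation/images", "validation/labels"],
    "Synthia": ["RGB", "GT"],
}


def missing_subdirs(dataset, root):
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS[dataset] if not (root / sub).is_dir()]


# Demonstration on a temporary directory that only contains images/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images").mkdir()
    print(missing_subdirs("GTA5", tmp))  # prints ['labels']
```

An empty list means the root contains every directory named in the corresponding layout.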

Trained models

The trained models are available here.

Running FAMix

Style mining

python3 patch_PIN.py \
  --dataset <dataset_name> \
  --data_root <dataset_root> \
  --resize_feat \
  --save_dir <path_for_learnt_parameters_saving>

Training

python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--total_itrs 40000 \
--batch_size 8 \
--val_interval 750 \
--transfer \
--data_aug \
--ckpts_path <path_to_save_checkpoints> \
--path_for_stats <path_for_mined_styles>

Evaluation

python3 main.py \
--dataset <dataset_name> \
--data_root <dataset_root> \
--ckpt <path_to_tested_model> \
--test_only \
--ACDC_sub <ACDC_subset_if_tested_on_ACDC>
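
When evaluating on several ACDC conditions, the command above has to be repeated once per subset. The sketch below assembles those commands in Python using only the flags mentioned in this README (including --scale and --ensemble from the notes). The dataset name, paths, and subset names are placeholders and assumptions, so check them against the repository's argument parser before use.

```python
import shlex


def eval_command(dataset, data_root, ckpt, acdc_sub=None, scale=None, ensemble=False):
    """Assemble the argument list for an evaluation run of main.py."""
    cmd = ["python3", "main.py",
           "--dataset", dataset,
           "--data_root", data_root,
           "--ckpt", ckpt,
           "--test_only"]
    if acdc_sub is not None:
        cmd += ["--ACDC_sub", acdc_sub]
    if scale is not None:
        cmd += ["--scale", str(scale)]
    if ensemble:
        cmd.append("--ensemble")
    return cmd


# Hypothetical values: dataset name, paths, and subset names are placeholders.
for sub in ["night", "snow", "rain", "fog"]:
    cmd = eval_command("ACDC", "/data/ACDC", "checkpoints/model.pth",
                       acdc_sub=sub, scale=0.5, ensemble=True)
    print(shlex.join(cmd))  # to execute: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces.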

Inference & Visualization

To test a model on arbitrary images and visualize the output, add the images to the predict_test directory and run:

python3 predict.py \
--ckpt <ckpt_path> \
--save_val_results_to <directory_for_saved_output_images>

License

FAMix is released under the Apache 2.0 license.

Acknowledgement

The code is based on this implementation of DeepLabv3+, and uses code from CLIP, PODA and RobustNet.
