FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
This is the official implementation of our paper:
FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation
Yasser Benigmim, Mohammad Fahes, Tuan-Hung Vu, Andrei Bursuc, Raoul de Charette
Overview
Key observation: for each class, there exist individual templates that outperform the conventional classifier obtained by averaging all 80 templates.
FLOSS challenges the conventional wisdom of using averaged templates in CLIP-based OVSS models by:
- Identifying expert templates for each class that outperform averaged classifiers
- Using entropy as an unsupervised metric to select these expert templates without labels
- Introducing a novel fusion scheme to combine expert predictions
- Providing a plug-and-play solution requiring no training or labels
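The selection-and-fusion idea above can be sketched in a few lines. This is a simplified stand-in, not the paper's exact procedure: `select_expert_templates` picks, for each class, the template whose predictions on that class have the lowest average entropy, and `fuse` naively reads each class score from that class's expert (all function names are hypothetical):

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    # Shannon entropy of a probability distribution along `axis`.
    return -(p * np.log(p + eps)).sum(axis=axis)

def select_expert_templates(probs):
    # probs: (T, N, C) softmax outputs of T single-template classifiers
    # over N pixels and C classes.
    T, N, C = probs.shape
    experts = np.zeros(C, dtype=int)
    for c in range(C):
        scores = np.full(T, np.inf)
        for t in range(T):
            # Pixels that template t assigns to class c.
            mask = probs[t].argmax(axis=-1) == c
            if mask.any():
                # Lower mean entropy = more confident expert for class c.
                scores[t] = entropy(probs[t][mask]).mean()
        experts[c] = scores.argmin()
    return experts

def fuse(probs, experts):
    # Simplified fusion: each class's score comes from its expert template.
    T, N, C = probs.shape
    fused = np.stack([probs[experts[c], :, c] for c in range(C)], axis=-1)
    return fused.argmax(axis=-1)
```

The key property illustrated here is that expert selection needs no labels: entropy alone ranks the templates.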
Table of Contents
- Overview
- News
- Notebook
- Installation
- Data Preparation
- Evaluating Models
- Computing Template Rankings
- Results
- Citation
News
- 🚀 New (02/2026): FLOSS notebook is released
Notebook
Reproduce FLOSS results with a single end-to-end pipeline, run evaluation and ranking experiments directly in Colab, and replace entropy with your own unsupervised metric to compare outcomes immediately.
Colab: Open FLOSS_Reproducibility.ipynb
Local Jupyter: notebooks/FLOSS_Reproducibility.ipynb
To run it locally:
```shell
jupyter notebook notebooks/FLOSS_Reproducibility.ipynb
```
Installation
- Create and activate a conda environment:
```shell
conda create -n floss python=3.9
conda activate floss
```
- Install NumPy and PyTorch with CUDA support:
```shell
# For CUDA 12.4 (tested version)
pip install numpy==1.26.4 torch==2.6.0 torchvision==0.21.0
```
- Install the remaining dependencies:
```shell
pip install -r requirements.txt
```
Note
This code has been tested with CUDA 12.4. Make sure you have the appropriate CUDA version installed on your system.
Data Preparation
Download and process the datasets following the MMSegmentation guide. Place all datasets in ./data with the following structure:
```
data/
├── cityscapes/
├── VOCdevkit/VOC2012/   # PASCAL VOC20
├── VOCdevkit/VOC2010/   # PASCAL Context 59
├── ade20k/
├── coco_stuff164k/
├── mapillary/
├── bdd10k/
└── acdc/                # Night, Fog, Rain, Snow conditions
```
After downloading, run these conversion scripts:
```shell
# COCO-Stuff: Convert annotations to semantic segmentation format
python tools/convert_datasets/coco_stuff164k.py data/coco_stuff164k/ -o data/coco_stuff164k/

# Cityscapes: Generate *labelTrainIds.png
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8

# PASCAL Context: Convert annotations (requires the Detail package)
python tools/convert_datasets/pascal_context.py data/VOCdevkit data/VOCdevkit/VOC2010/trainval_merged.json

# Mapillary: Convert to Cityscapes format and resize the validation set
python tools/convert_datasets/mapillary2cityscape.py data/mapillary data/mapillary/cityscapes_trainIdLabel --train_id
python tools/convert_datasets/mapillary_resize.py data/mapillary/validation/images \
    data/mapillary/cityscapes_trainIdLabel/val/label \
    data/mapillary/half/val_img \
    data/mapillary/half/val_label
```
Evaluating Models
Download the CLIP-DINOiser checkpoint and place it in checkpoints/clip_dinoiser/.
To evaluate the baseline models (without FLOSS):
```shell
# For CLIP-DINOiser
python ./tools/test.py configs/clipdinoiser.py --dataset DATASET

# For MaskCLIP
python ./tools/test.py configs/maskclip.py --dataset DATASET

# For NACLIP
python ./tools/eval_naclip.py --dataset DATASET
```
To evaluate with FLOSS (requires pre-computed rankings available in rankings/ directory, see Computing Template Rankings to reproduce them):
```shell
# For CLIP-DINOiser + FLOSS
python ./tools/test.py configs/clipdinoiser.py --dataset DATASET --mode fusion

# For MaskCLIP + FLOSS
python ./tools/test.py configs/maskclip.py --dataset DATASET --mode fusion

# For NACLIP + FLOSS
python ./tools/eval_naclip.py --dataset DATASET --mode fusion
```
Replace DATASET with one of: cityscapes, pascalvoc20, pascalco59, ade20k, cocostuff, mapillary, bdd10k, acdc_night, acdc_snow, acdc_rain, or acdc_fog.
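To sweep every dataset in one go, a small helper can generate the evaluation commands above. A minimal sketch, assuming the config path and dataset names exactly as listed (the helper itself is not part of this repo):

```python
# All DATASET values accepted by tools/test.py, as listed above.
DATASETS = [
    "cityscapes", "pascalvoc20", "pascalco59", "ade20k", "cocostuff",
    "mapillary", "bdd10k", "acdc_night", "acdc_snow", "acdc_rain", "acdc_fog",
]

def eval_commands(config="configs/clipdinoiser.py", fusion=True):
    # Build one shell command per dataset; append --mode fusion for FLOSS.
    suffix = " --mode fusion" if fusion else ""
    return [
        f"python ./tools/test.py {config} --dataset {d}{suffix}"
        for d in DATASETS
    ]
```

Each returned string can then be launched with `subprocess.run(cmd, shell=True)` or written to a job script.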
Computing Template Rankings
To reproduce the rankings:
```shell
# For CLIP-DINOiser on a regular-size dataset
python ./tools/test.py configs/clipdinoiser.py --dataset DATASET --mode compute_metric --split train

# For CLIP-DINOiser on ADE20K (dividing the 80 templates into groups of 10)
python ./tools/test.py configs/clipdinoiser.py --dataset ade20k --mode compute_metric --split train --id-start 0 --id-end 9
python ./tools/test.py configs/clipdinoiser.py --dataset ade20k --mode compute_metric --split train --id-start 10 --id-end 19
python ./tools/test.py configs/clipdinoiser.py --dataset ade20k --mode compute_metric --split train --id-start 20 --id-end 29
# ... continue with the remaining template ranges (30-39, 40-49, ..., 70-79)
```
Tip
When computing metrics, you can use the training set by specifying `--split train`. Be aware that processing large datasets such as ADE20K and COCO-Stuff requires significant computational resources, ideally a GPU with 40 GB of memory 💾. To make this more manageable, process the templates in smaller batches using `--id-start` and `--id-end`: instead of handling all 80 templates at once, split them into groups (e.g., templates 0-9, then 10-19, and so on). This also lets you parallelize the work across multiple processes, which is more efficient for large-scale datasets.
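The batching scheme in the tip above can be generated programmatically rather than typed by hand. A minimal sketch, assuming 80 templates split into inclusive ranges matching the `--id-start`/`--id-end` flags (the helper names are hypothetical):

```python
def template_ranges(n_templates=80, group_size=10):
    # Inclusive (start, end) index pairs, e.g. (0, 9), (10, 19), ..., (70, 79).
    return [
        (s, min(s + group_size, n_templates) - 1)
        for s in range(0, n_templates, group_size)
    ]

def ranking_commands(dataset="ade20k", config="configs/clipdinoiser.py"):
    # One compute_metric invocation per template group.
    return [
        f"python ./tools/test.py {config} --dataset {dataset} "
        f"--mode compute_metric --split train --id-start {s} --id-end {e}"
        for s, e in template_ranges()
    ]
```

Since each command covers a disjoint template range, the eight invocations can run as independent processes on separate GPUs.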
Results
| Method | Cityscapes | VOC20 | CO59 | ADE20K | Stuff | Avg |
|---|---|---|---|---|---|---|
| MaskCLIP | 25.0 | 61.8 | 25.5 | 14.2 | 17.5 | 28.7 |
| + FLOSS | 25.8 | 61.8 | 26.2 | 14.9 | 17.8 | 29.3 |
| NACLIP | 35.5 | 79.7 | 35.2 | 17.4 | 23.3 | 38.2 |
| + FLOSS | 37.0 | 80.2 | 35.9 | 18.4 | 23.6 | 39.0 |
| CLIP-DINOiser | 31.1 | 80.9 | 35.9 | 20.0 | 24.6 | 38.5 |
| + FLOSS | 34.6 | 82.3 | 36.2 | 20.7 | 24.7 | 39.7 |
Citation
If you find this work useful, please cite our paper:
```bibtex
@misc{benigmim2025flossfreelunchopenvocabulary,
  title={FLOSS: Free Lunch in Open-vocabulary Semantic Segmentation},
  author={Yasser Benigmim and Mohammad Fahes and Tuan-Hung Vu and Andrei Bursuc and Raoul de Charette},
  year={2025},
  eprint={2504.10487},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2504.10487},
}
```
Acknowledgments
We would like to thank the authors of MMSegmentation, CLIP-DINOiser, and NACLIP for making their code publicly available. Our implementation builds upon their excellent work.