GitHub - unilight/sheet: Speech Human Evaluation Estimation Toolkit (SHEET)

🗣️ SHEET / MOS-Bench 🎧

Manipulate MOS-Bench with SHEET

MOS-Bench is a benchmark designed to evaluate the generalization abilities of subjective speech quality assessment (SSQA) models.
SHEET stands for the Speech Human Evaluation Estimation Toolkit. SHEET was designed to conduct research experiments with MOS-Bench.

📚 Full Documentation (NEW!) | 📝 arXiv paper (2024) | 🤗 HuggingFace Space demo

MOS-Bench Overview

See this Google Spreadsheet for an overview of the datasets in MOS-Bench.

Sep 2025: MOS-Bench now has 8 training sets and 17 test sets.
Nov 2024: The initial MOS-Bench has 7 training sets and 12 test sets.

Usage guide

SHEET can be used in three ways:

I am new to MOS prediction research. I want to train models! → Training guide
I already have my MOS predictor. I just want to do benchmarking! → Benchmarking guide
I just want to use your trained MOS predictor! → Quick start

Quick start

We use torch.hub to provide a convenient way to load pre-trained SSQA models and predict scores for wav files or torch tensors.

You can use the _id argument to specify which pre-trained model to use. If not specified, the default model is used. See the list of pre-trained models page for the complete table.

Note

Since SHEET is an ongoing project, if you use our pre-trained models in your paper, we suggest specifying the version. For instance: SHEET SSL-MOS v0.1.0, SHEET SSL-MOS v0.2.5, etc.

Tip

You don't need to install sheet following the installation instructions. However, you might need to install the following:

sheet-sqa
huggingface_hub

>>> import torch
# load default pre-trained model
>>> predictor = torch.hub.load("unilight/sheet:v0.2.5", "sheet_ssqa", trust_repo=True, force_reload=True)
# use `_id` to specify which pre-trained model to use
>>> predictor = torch.hub.load("unilight/sheet:v0.2.5", "sheet_ssqa", trust_repo=True, force_reload=True, _id="bvcc/sslmos-wavlm_large/1337")
# if you want to use cuda, use either of the following
>>> predictor = torch.hub.load("unilight/sheet:v0.2.5", "sheet_ssqa", trust_repo=True, force_reload=True, cpu=False)
>>> predictor.model.cuda()

# you can either provide a path to your wav file
>>> predictor.predict(wav_path="/path/to/wav/file.wav")
3.6066928

# or provide a torch tensor with shape [num_samples]
>>> predictor.predict(wav=torch.rand(16000))
1.5806346
# if you put the model on cuda...
>>> predictor.predict(wav=torch.rand(16000).cuda())
1.5806346
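
To score a whole folder of recordings, the `predict(wav_path=...)` call above can be wrapped in a small loop. The following is a minimal sketch, not part of SHEET itself: the helper names `load_predictor` and `score_directory` are hypothetical, and it assumes that the value returned by `predict` can be converted with `float()`, as the printed outputs above suggest.

```python
from pathlib import Path


def load_predictor(version="v0.2.5"):
    """Load the default pre-trained model via torch.hub, as in the quick start.

    Hypothetical helper; `version` is the release tag of unilight/sheet.
    """
    import torch  # imported lazily so score_directory can be used without torch

    return torch.hub.load(f"unilight/sheet:{version}", "sheet_ssqa", trust_repo=True)


def score_directory(predictor, wav_dir):
    """Return {file name: predicted score} for each .wav file directly under wav_dir.

    `predictor` is any object exposing predict(wav_path=...), such as the one
    returned by load_predictor().
    """
    scores = {}
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # Assumption: the returned score is a scalar that float() accepts.
        scores[wav_path.name] = float(predictor.predict(wav_path=str(wav_path)))
    return scores


# Usage (the model is downloaded on the first call):
# predictor = load_predictor()
# for name, score in score_directory(predictor, "/path/to/wav/dir").items():
#     print(f"{name}\t{score:.3f}")
```

Passing the predictor in as an argument means the model is loaded once and reused for every file, and the loop can be checked with a stand-in object before the real model is downloaded.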

Installation

Full installation is needed if your goal is to do training.

Editable installation with virtualenv

You don't need to prepare an environment (using conda, etc.) first. The following commands will automatically construct a virtual environment in tools/. When you run the recipes, the scripts will automatically activate the virtual environment.

git clone https://github.com/unilight/sheet.git
cd sheet/tools
make

Information

Citation

If you use the training scripts, benchmarking scripts or pre-trained models from this project, please consider citing the following paper.

@inproceedings{sheet,
  title     = {{SHEET: A Multi-purpose Open-source Speech Human Evaluation Estimation Toolkit}},
  author    = {Wen-Chin Huang and Erica Cooper and Tomoki Toda},
  year      = {2025},
  booktitle = {{Proc. Interspeech}},
  pages     = {2355--2359},
}


@article{huang2024,
      title={MOS-Bench: Benchmarking Generalization Abilities of Subjective Speech Quality Assessment Models}, 
      author={Wen-Chin Huang and Erica Cooper and Tomoki Toda},
      year={2024},
      eprint={2411.03715},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2411.03715}, 
}

Acknowledgements

This repo is greatly inspired by the following repos. In fact, many code snippets are taken directly from them.

Author

Wen-Chin Huang
Toda Laboratory, Nagoya University
E-mail: wen.chinhuang@g.sp.m.is.nagoya-u.ac.jp