Proposal to integrate into 🤗 Hub by patrickvonplaten · Pull Request #555 · TensorSpeech/TensorFlowTTS

Hi TensorSpeech team! I hereby propose an integration with the HuggingFace model hub 🤗

This integration would allow you to freely download/upload models from/to the Hugging Face Hub: https://huggingface.co/.

Your users could then download model weights directly within Python, without having to fetch and manage checkpoint files manually.
Taking your fastspeech_2_inference.ipynb example, the following diff shows how the code could change to download weights directly from the model hub.

import tensorflow as tf

-from tensorflow_tts.inference import AutoConfig
from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

processor = AutoProcessor.from_pretrained(
-    pretrained_path="../tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
+    pretrained_path="tensorspeech/fastspeech2_tts"
)

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

-config = AutoConfig.from_pretrained("../examples/fastspeech2/conf/fastspeech2.v1.yaml")
fastspeech2 = TFAutoModel.from_pretrained(
-    config=config, 
-    pretrained_path="../examples/fastspeech2/checkpoints/model-150000.h5",
+    pretrained_path="tensorspeech/fastspeech2_tts",
    is_build=True,
    name="fastspeech2"
)

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)
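
Under the hood, from_pretrained would mainly need to tell local paths apart from Hub model IDs. Below is a minimal sketch of that resolution step, assuming the huggingface_hub client library; the helper name and the config.yml/model.h5 file layout are illustrative assumptions, not necessarily what this PR implements:

import os

from huggingface_hub import hf_hub_download

def resolve_pretrained_path(pretrained_path, filename):
    # Local files and directories take precedence; anything else is
    # treated as a "namespace/repo_name" ID on the Hugging Face Hub.
    if os.path.isfile(pretrained_path):
        return pretrained_path
    if os.path.isdir(pretrained_path):
        return os.path.join(pretrained_path, filename)
    # Download the file (with local caching) and return its cached path.
    return hf_hub_download(repo_id=pretrained_path, filename=filename)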

As an example, I uploaded a FastSpeech 2 model to the Hub here: https://huggingface.co/patrickvonplaten/tf_tts_fast_speech_2.
If you'd like to add this feature to your library, we would obviously change the organization name from patrickvonplaten to tensorspeech.
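
Uploading works just as easily. As a rough sketch (again assuming the huggingface_hub client library; the repo ID and file names are placeholders), pushing a trained checkpoint and its config to a Hub repo could look like this:

from huggingface_hub import HfApi

api = HfApi()

# Push a trained checkpoint and its config to an (already created) Hub
# repo. The repo ID and the in-repo file names below are placeholders.
api.upload_file(
    path_or_fileobj="../examples/fastspeech2/checkpoints/model-150000.h5",
    path_in_repo="model.h5",
    repo_id="tensorspeech/fastspeech2_tts",
)
api.upload_file(
    path_or_fileobj="../examples/fastspeech2/conf/fastspeech2.v1.yaml",
    path_in_repo="config.yml",
    repo_id="tensorspeech/fastspeech2_tts",
)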

You can try it out by running the following code:

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# Download the text processor (text-to-ID mapper) from the Hub
processor = AutoProcessor.from_pretrained(pretrained_path="patrickvonplaten/tf_tts_fast_speech_2")

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

# Download the config and weights from the Hub and build the model
fastspeech2 = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_fast_speech_2",
    is_build=True,
    name="fastspeech2"
)

# Generate mel spectrograms (before and after the postnet) plus durations
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

Besides hosting your model weights for free, we also provide git-based version control and download statistics for your models :-) We could also set up a hosted inference API so that users can try out your models directly on the website.
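
Because each model repo on the Hub is a regular git repository, downloads can also be pinned to a specific revision. A small sketch, assuming huggingface_hub's hf_hub_download and that the repo stores its config as config.yml (an assumption about the repo layout):

from huggingface_hub import hf_hub_download

# Pin the download to a specific git revision (branch, tag, or commit
# hash); "main" is just the default branch here.
config_path = hf_hub_download(
    repo_id="patrickvonplaten/tf_tts_fast_speech_2",
    filename="config.yml",
    revision="main",
)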

We've already integrated with a couple of other libraries as well; you can check those integrations out on the Hub.

Sorry for the missing tests in the PR - I just made the minimal changes needed to show you how the integration with the HF hub could look :-) I'd also be more than happy to add you guys to a Slack channel where we can discuss further.

Cheers,
Patrick & Hugging Face team

Also cc @julien-c