Installation
You also need to install the llama-cpp-python package separately. This is because llama-cpp-python does not ship pre-built wheels on PyPi.
Don't worry, you can just run one of the following commands:
Linux/Windows
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpuMacOS with Apple Silicon
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metalUsage
Async FastRTC Demo
After installing orpheus-cpp, install fastrtc and run the following command:
Then go to http://localhost:7860 and you should see the demo.
2025-03-26_10-37-56.mp4
Sync TTS
from orpheus_cpp import OrpheusCpp from scipy.io.wavfile import write orpheus = OrpheusCpp() text = "I really hope the project deadline doesn't get moved up again." # output is a tuple of (sample_rate (24_000), samples (numpy int16 array)) sample_rate, samples = orpheus.tts(text, options={"voice_id": "tara"}) write("output.wav", sample_rate, samples.squeeze())
Streaming Sync
for sample_rate, samples in orpheus.stream_tts_sync(text, options={"voice_id": "tara"}): write("output.wav", sample_rate, samples.squeeze())
Streaming Async
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara"}): write("output.wav", sample_rate, samples.squeeze())
Tips
By default, we wait until 1.5 seconds of audio is generated before yielding the first chunk.
This is to ensure smooth audio streaming at the cost of a longer time to first audio.
Depending on your hardware, you can try to reduce the pre_buffer_size to get a faster time to first chunk.
async for sample_rate, samples in orpheus.stream_tts(text, options={"voice_id": "tara", "pre_buffer_size": 0.5}): write("output.wav", sample_rate, samples.squeeze())
License
orpheus-cpp is distributed under the terms of the MIT license, with an adapted section of code from https://github.com/Lex-au/Orpheus-FastAPI/ that is covered under Apache-2.0 license, in LICENSES/Apache-2.0.txt and noted in place.