Windows Setup (assumes user installed Python)
Windows PowerShell Setup
```
pip install uv

# From this local dir
#uv python install 3.11
uv venv -p 3.11.7
#uv init
.venv\Scripts\activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Use a TTS fork with Windows binary wheels
#uv pip install coqui-tts
uv pip install -r .\pyproject.toml --all-extras
```
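After activating the venv, a quick stdlib-only sanity check can confirm you are actually running the 3.11 interpreter uv created rather than a system Python (a minimal sketch, not part of the project):

```python
import sys

# The setup above pins Python 3.11.7; verify the active interpreter matches.
print(sys.version_info[:2])

# True when running inside a virtual environment (venv sets sys.prefix to the
# .venv directory while sys.base_prefix still points at the base install).
print(sys.prefix != sys.base_prefix)
```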
Optional WSL Setup
Claude Code currently only supports Linux, so we can set it up in WSL and point it at the same Windows folder. This is pretty ugly and leads to redundant packages and ML models being installed. A better current option (until Claude Code officially supports Windows) is probably to run the whole repo and its commands in a Linux container and access it with VS Code Dev Containers.
Launch WSL with `wsl -d Ubuntu-24.04`, then run the Linux commands below.
```
# Fix Windows PATH entries showing up in Linux (filter out /mnt/c paths and
# re-export the result -- echoing alone only prints it)
export PATH=$(echo $PATH | tr ':' '\n' | grep -v "/mnt/c" | tr '\n' ':' | sed 's/:$//')

# Create python symlink if needed
sudo ln -s /usr/bin/python3 /usr/bin/python
sudo apt install -y python3-pip

# Tried pathing python/pip to install uv, but easier (much uglier) was to just install uv directly
curl -LsSf https://astral.sh/uv/install.sh | sh

# Reload your shell or add to PATH
export PATH="$HOME/.local/bin:$PATH"
source ~/.bashrc

# Create our WSL .venv with a different directory name
uv venv .venv-linux -p 3.11.7
source .venv-linux/bin/activate  # May need to run this on subsequent WSL runs before running 'claude'

# This might get big because we have similar libraries as Windows (TODO: See
# if the ML libraries can be linked to keep size down)
uv pip install -r ./pyproject.toml --all-extras
```
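The PATH filter above is just "drop every entry under `/mnt/c`". The same logic in pure Python, using a hypothetical sample PATH to show the before/after:

```python
# Hypothetical PATH mixing Linux entries with a Windows mount entry.
path = "/usr/bin:/mnt/c/Windows/System32:/home/user/.local/bin"

# Split on ':', drop anything mounted from the Windows C: drive, rejoin.
cleaned = ":".join(p for p in path.split(":") if not p.startswith("/mnt/c"))
print(cleaned)  # → /usr/bin:/home/user/.local/bin
```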
Run
```
#python -m voice_clone_tts Syn.webm --method local --backend coqui --use-cpu
python -m voice_clone_tts "Syn.wav" --method local --backend coqui --use-cpu
python -m voice_clone_tts "Syn.wav" --method local --backend coqui --use-cpu --show-speaker-info --clean
python -m voice_clone_tts "Single.wav" --num-speakers 1 --method local --backend coqui --use-cpu --show-speaker-info --clean

# If using AWS Transcribe output
python -m voice_clone_tts "Syn.wav" --aws-transcribe "transcribe.json" --backend coqui --use-cpu --clean
python -m voice_clone_tts "Syn.wav" --aws-transcribe "transcribe.json" --backend coqui --use-cpu --use-transcript --clean
python -m voice_clone_tts "Syn.wav" --aws-transcribe "transcribe.json" --backend coqui --use-cpu --use-transcript --output-format mp3 --clean #--save-models
```
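For reference, the `transcribe.json` passed to `--aws-transcribe` follows AWS Transcribe's standard output schema, where the full text lives under `results.transcripts`. A minimal sketch of pulling the transcript out (the inline `sample` dict is a hypothetical stand-in for a real job output; how `voice_clone_tts` itself parses the file is not shown here):

```python
import json

# Hypothetical AWS Transcribe job output, trimmed to the fields we read.
sample = {
    "jobName": "demo-job",
    "results": {
        "transcripts": [{"transcript": "Hello there."}],
    },
}

# In practice this would be: data = json.load(open("transcribe.json"))
data = json.loads(json.dumps(sample))
transcript = " ".join(t["transcript"] for t in data["results"]["transcripts"])
print(transcript)  # → Hello there.
```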
Docker attempt
From https://docs.coqui.ai/en/latest/docker_images.html
```
# Use maintained fork "github.com/idiap/coqui-ai-TTS" that has several fixes
#docker pull ghcr.io/coqui-ai/tts-cpu --platform linux/amd64
#docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
docker pull ghcr.io/idiap/coqui-tts-cpu --platform linux/amd64
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/idiap/coqui-tts-cpu

# From within the docker container
tts --list_models  # To get the list of available models

# Start a server
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits
# 148MB model downloads, but why is this container so big (12GB) if the model
# doesn't already exist locally?
```
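Once the server is up on port 5002, synthesis requests go to its `/api/tts` endpoint. A sketch of building such a request URL from the host, assuming the Coqui demo server's query parameters (`text`, plus `speaker_id` since `vctk/vits` is multi-speaker; `p225` is one VCTK speaker ID):

```python
from urllib.parse import urlencode

# Build the GET URL for the demo server; fetching it with urllib.request or
# curl should return WAV audio when the server is actually running.
params = {"text": "Hello from Coqui", "speaker_id": "p225"}
url = "http://localhost:5002/api/tts?" + urlencode(params)
print(url)
```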