GPUStack
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
Python · 4.7k stars · 488 forks
Collection of Dockerfiles to build images for various inference services across different accelerated backends.
Dockerfile · 11 stars · 9 forks
Provides a unified interface for detecting GPU resources and managing GPU workloads.
Python · 12 stars · 14 forks
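A unified detection interface of this kind typically normalizes the output of vendor tools into a common device record. The sketch below is an illustrative assumption, not this repository's actual API: it parses the CSV output of the real `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` query into a simple dataclass.

```python
from dataclasses import dataclass

@dataclass
class GPUDevice:
    """A vendor-neutral device record (hypothetical shape for illustration)."""
    name: str
    memory_mib: int

def parse_nvidia_smi_csv(text: str) -> list[GPUDevice]:
    """Parse lines like 'NVIDIA GeForce RTX 4090, 24564 MiB' from
    `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`."""
    devices = []
    for line in text.strip().splitlines():
        name, mem = (field.strip() for field in line.split(","))
        # memory.total is reported as '<number> MiB'; keep the number only
        devices.append(GPUDevice(name=name, memory_mib=int(mem.split()[0])))
    return devices
```

Keeping the parser separate from the subprocess call makes the backend easy to test without a GPU present, and the same `GPUDevice` record can be produced by other vendor backends.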
Reviews and checks GGUF files, estimating their memory usage and maximum tokens per second.
Go · 253 stars · 24 forks
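One of the largest terms in such a memory estimate is the KV cache, whose size follows directly from the model's attention shape. The sketch below is not this tool's implementation; it applies the standard formula (K and V tensors per layer, per KV head, per position) to illustrative Llama-2-7B-style numbers rather than values read from a GGUF file.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size; the factor 2 covers the K and V tensors."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Llama-2-7B-style attention config at 4k context, fp16 cache
# (illustrative numbers, not parsed from a GGUF file)
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(size / 2**30)  # → 2.0 (GiB)
```

Models using grouped-query attention shrink `n_kv_heads`, which is why their caches are much smaller at the same context length.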
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
Python · 201 stars · 33 forks
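OpenAI-API compatibility means a standard client can target the server by changing only the base URL. The sketch below builds the request body for the OpenAI `POST /v1/audio/speech` endpoint shape; the base URL is a placeholder, and the `model`/`voice` values are OpenAI's defaults used for illustration — the names this server actually accepts (e.g. for its Whisper or CosyVoice backends) may differ.

```python
import json

BASE_URL = "http://localhost:8000/v1"  # placeholder address, not a documented default

def speech_request(text: str, model: str = "tts-1", voice: str = "alloy") -> dict:
    """Build the JSON body for POST {BASE_URL}/audio/speech
    (OpenAI-compatible text-to-speech request shape)."""
    return {"model": model, "input": text, "voice": voice}

body = speech_request("Hello from the cluster")
print(json.dumps(body))
```

Because the request shape matches OpenAI's, existing OpenAI SDK clients can usually be pointed at such a server via their `base_url` setting instead of hand-building requests.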