GPUStack
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
Python · 4.7k stars · 488 forks
Collection of Dockerfiles to build images for various inference services across different accelerated backends.
Dockerfile · 11 stars · 9 forks
Provides a unified interface for detecting GPU resources and managing GPU workloads.
Python · 12 stars · 14 forks
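A unified detection interface of this kind typically normalizes the output of vendor tools into a common device record. The sketch below is an illustrative assumption, not this repository's actual API: it parses the CSV output of the real `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` query into a simple dataclass.

```python
from dataclasses import dataclass

@dataclass
class GPUDevice:
    """A vendor-neutral device record (hypothetical shape for illustration)."""
    name: str
    memory_mib: int

def parse_nvidia_smi_csv(text: str) -> list[GPUDevice]:
    """Parse lines like 'NVIDIA GeForce RTX 4090, 24564 MiB' from
    `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`."""
    devices = []
    for line in text.strip().splitlines():
        name, mem = (field.strip() for field in line.split(","))
        # memory.total is reported as '<number> MiB'; keep the number only
        devices.append(GPUDevice(name=name, memory_mib=int(mem.split()[0])))
    return devices
```

Keeping the parser separate from the subprocess call makes the backend easy to test without a GPU present, and the same `GPUDevice` record can be produced by other vendor backends.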
Reviews and checks GGUF files, estimating their memory usage and maximum tokens per second.
Go · 253 stars · 24 forks
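One of the largest terms in such a memory estimate is the KV cache, whose size follows directly from the model's attention shape. The sketch below is not this tool's implementation; it applies the standard formula (K and V tensors per layer, per KV head, per position) to illustrative Llama-2-7B-style numbers rather than values read from a GGUF file.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size; the factor 2 covers the K and V tensors."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Llama-2-7B-style attention config at 4k context, fp16 cache
# (illustrative numbers, not parsed from a GGUF file)
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(size / 2**30)  # → 2.0 (GiB)
```

Models using grouped-query attention shrink `n_kv_heads`, which is why their caches are much smaller at the same context length.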
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
Python · 201 stars · 33 forks
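OpenAI-API compatibility means a standard client can target the server by changing only the base URL. The sketch below builds the request body for the OpenAI `POST /v1/audio/speech` endpoint shape; the base URL is a placeholder, and the `model`/`voice` values are OpenAI's defaults used for illustration — the names this server actually accepts (e.g. for its Whisper or CosyVoice backends) may differ.

```python
import json

BASE_URL = "http://localhost:8000/v1"  # placeholder address, not a documented default

def speech_request(text: str, model: str = "tts-1", voice: str = "alloy") -> dict:
    """Build the JSON body for POST {BASE_URL}/audio/speech
    (OpenAI-compatible text-to-speech request shape)."""
    return {"model": model, "input": text, "voice": voice}

body = speech_request("Hello from the cluster")
print(json.dumps(body))
```

Because the request shape matches OpenAI's, existing OpenAI SDK clients can usually be pointed at such a server via their `base_url` setting instead of hand-building requests.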