GPUStack

Pinned repositories

  1. A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

    Python · 4.7k stars · 488 forks

  2. Collection of Dockerfiles to build images for various inference services across different accelerated backends.

    Dockerfile · 11 stars · 9 forks

  3. Provides a unified interface to detect GPU resources and manage GPU workloads.

    Python · 12 stars · 14 forks

  4. Reviews and checks GGUF files, and estimates their memory usage and maximum tokens per second.

    Go · 253 stars · 24 forks
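The memory estimate such a tool produces can be approximated from first principles: weight memory is roughly parameter count times bytes per quantized weight, and KV-cache memory grows with context length. A minimal sketch of that back-of-the-envelope calculation (the formula, the function name, and all figures below are generic assumptions, not the repository's actual algorithm):

```python
def estimate_memory_bytes(
    n_params: int,           # total model parameters
    bits_per_weight: float,  # e.g. ~4.5 for a 4-bit K-quant style format
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    kv_bytes: int = 2,       # fp16 K/V cache entries
) -> int:
    """Rough VRAM estimate: quantized weights plus KV cache."""
    weights = int(n_params * bits_per_weight / 8)
    # K and V each store n_layers * n_kv_heads * head_dim values per token.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * kv_bytes
    return weights + kv_cache

# Example: a 7B-parameter model at ~4.5 bits/weight with an 8K context.
total = estimate_memory_bytes(
    n_params=7_000_000_000, bits_per_weight=4.5,
    n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192,
)
print(f"{total / 2**30:.1f} GiB")  # → 4.7 GiB
```

Real estimators also account for activation buffers, per-tensor quantization mixes, and engine overhead, so treat this as a lower bound.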

  5. A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.

    Python · 201 stars · 33 forks
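"Compatible with the OpenAI API" means a standard OpenAI-style client can target the server by changing only the base URL. As a sketch, the following builds the JSON body of an OpenAI-style `/v1/audio/speech` text-to-speech request; the localhost base URL, model name, and voice are assumed deployment details, not values from the project:

```python
import json

# Assumed local deployment address; any OpenAI-compatible server works here.
BASE_URL = "http://localhost:8000/v1"

def build_speech_request(text: str, model: str = "tts-1",
                         voice: str = "alloy") -> tuple[str, bytes]:
    """Return the (url, json_body) for an OpenAI-style TTS call."""
    url = f"{BASE_URL}/audio/speech"
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return url, body

url, body = build_speech_request("Hello from an OpenAI-compatible server")
print(url)  # → http://localhost:8000/v1/audio/speech
```

Because the wire format matches, existing OpenAI SDK clients can be pointed at such a server without code changes beyond the base URL and API key.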