WangErXiao - Overview

Pinned

  1. A high-throughput and memory-efficient inference and serving engine for LLMs

    Python, 71.9k stars, 13.9k forks

  2. Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python, 2.8k stars, 418 forks