WangErXiao - Overview

Pinned

A high-throughput and memory-efficient inference and serving engine for LLMs

Python, 71.9k stars, 13.9k forks

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python, 2.8k stars, 418 forks