mgoin - Overview

Pinned

A high-throughput and memory-efficient inference and serving engine for LLMs

Python, 75.9k stars, 15.4k forks

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python, 3k stars, 474 forks

Achieve state-of-the-art inference performance with modern accelerators on Kubernetes

Shell, 3k stars, 397 forks

Sparsity-aware deep learning inference runtime for CPUs

Python, 3.2k stars, 190 forks

RISC-V OS in Rust with hardware support for SiFive's HiFive1 board

Rust
Implementations of bitmask compression for weight sparsity in PyTorch

Python, 4 stars, 1 fork