mgoin - Overview
A high-throughput and memory-efficient inference and serving engine for LLMs
Python 75.9k 15.4k
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python 3k 474
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Shell 3k 397
Sparsity-aware deep learning inference runtime for CPUs
Python 3.2k 190
RISC-V OS in Rust with hardware support for SiFive's HiFive1 board
Rust
Implementations of bitmask compression for weight sparsity in PyTorch
Python 4 1