mgoin - Overview

Pinned

  1. A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 75.9k 15.4k

  2. Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python 3k 474

  3. Achieve state-of-the-art inference performance with modern accelerators on Kubernetes

    Shell 3k 397

  4. Sparsity-aware deep learning inference runtime for CPUs

    Python 3.2k 190

  5. RISC-V OS in Rust with hardware support for SiFive's HiFive1 board

    Rust

  6. Implementations of bitmask compression for weight sparsity in PyTorch

    Python 4 1