IST Austria Distributed Algorithms and Systems Lab

Popular repositories

  1. Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

    Python · 2.3k stars · 194 forks

  2. FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of 16-32 tokens.

    Python · 1k stars · 86 forks

  3. Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

    Python · 877 stars · 118 forks

  4. Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

    Python · 280 stars · 24 forks

  5. Quantized LLM training in pure CUDA/C++.

    C++ · 242 stars · 14 forks
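The repositories above share a common theme: representing LLM weights in low-bit formats such as INT4. As a rough illustration of what "4-bit quantization" means, here is a minimal sketch of per-group symmetric round-to-nearest quantization. This is a deliberately simplified stand-in, not the GPTQ algorithm itself (GPTQ minimizes layer-wise error using second-order information rather than plain rounding), and the function names and group size are illustrative choices, not taken from any of these repos.

```python
import numpy as np

def quantize_int4(w, group_size=128):
    """Per-group symmetric round-to-nearest quantization to INT4.

    Simplified illustration only: GPTQ uses Hessian-based error
    correction, not plain rounding. One FP16/FP32 scale is stored
    per group of `group_size` weights.
    """
    groups = w.reshape(-1, group_size)
    # Symmetric INT4 covers [-8, 7]; scale maps the group's max
    # absolute value onto 7.
    scale = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate FP32 weights from INT4 codes and scales."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix.
w = np.random.randn(4, 128).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s).reshape(w.shape)
max_err = np.abs(w - w_hat).max()
```

At 4 bits plus one scale per 128 weights, storage drops to roughly a quarter of FP16; the maximum per-element error of round-to-nearest is half a quantization step (scale / 2), which is exactly the gap that methods like GPTQ and SparseGPT work to shrink further or exploit.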