Tabrizian - Overview


  • NVIDIA

Pinned repositories

  1. TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

    Python · 13.4k stars · 2.3k forks

  2. The Triton Inference Server provides an optimized cloud and edge inferencing solution.

    Python · 10.6k stars · 1.8k forks

  3. Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.

    C++ · 673 stars · 193 forks

  4. Code for "Adaptive Gradient Quantization for Data-Parallel SGD", published at NeurIPS 2020.

    Jupyter Notebook · 30 stars · 5 forks

  5. Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.

    Python · 510 stars · 85 forks