Supported Models — TensorRT LLM

| Architecture | Model | HuggingFace Example |
|---|---|---|
| BertForSequenceClassification | BERT-based | textattack/bert-base-uncased-yelp-polarity |
| DeciLMForCausalLM | Nemotron | nvidia/Llama-3_1-Nemotron-51B-Instruct |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3 |
| DeepseekV32ForCausalLM | DeepSeek-V3.2 | deepseek-ai/DeepSeek-V3.2 |
| Exaone4ForCausalLM | EXAONE 4.0 | LGAI-EXAONE/EXAONE-4.0-32B |
| ExaoneMoEForCausalLM | K-EXAONE | LGAI-EXAONE/K-EXAONE-236B-A23B |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it |
| Glm4MoeForCausalLM | GLM-4.5, GLM-4.6, GLM-4.7 | THUDM/GLM-4-100B-A10B |
| Glm4MoeLiteForCausalLM [6] | GLM-4.7-Flash | zai-org/GLM-4.7-Flash |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-120b |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA | meta-llama/Meta-Llama-3.1-70B |
| Llama4ForConditionalGeneration | Llama 4 | meta-llama/Llama-4-Scout-17B-16E-Instruct |
| MiniMaxM2ForCausalLM | MiniMax M2/M2.1 | MiniMaxAI/MiniMax-M2 |
| MistralForCausalLM | Mistral | mistralai/Mistral-7B-v0.1 |
| MixtralForCausalLM | Mixtral | mistralai/Mixtral-8x7B-v0.1 |
| MllamaForConditionalGeneration | Llama 3.2 | meta-llama/Llama-3.2-11B-Vision |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base |
| NemotronHForCausalLM | Nemotron-3-Nano | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 |
| NemotronNASForCausalLM | NemotronNAS | nvidia/Llama-3_3-Nemotron-Super-49B-v1 |
| Phi3ForCausalLM | Phi-4 | microsoft/Phi-4 |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/Qwen2-7B-Instruct |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-30B-A3B |
| Qwen3NextForCausalLM | Qwen3Next | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Qwen3_5MoeForCausalLM [5] | Qwen3.5-MoE | Qwen/Qwen3.5-397B-A17B |
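
The Architecture names listed above appear to match the class names that Hugging Face checkpoints declare in the `architectures` field of their `config.json`. Under that assumption, a quick pre-flight check can compare a checkpoint's declared architecture against the list. The sketch below is illustrative only: `SUPPORTED_ARCHITECTURES` is a hand-copied subset of the list, and `supported_architectures` is a hypothetical helper, not part of TensorRT LLM.

```python
import json

# Hand-copied subset of the Architecture names listed above (assumption: the
# names correspond to the "architectures" field of a checkpoint's config.json).
# Footnote markers such as [5] and [6] are not part of the class names.
SUPPORTED_ARCHITECTURES = {
    "DeepseekV3ForCausalLM",
    "GptOssForCausalLM",
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
    "Qwen3MoeForCausalLM",
}


def supported_architectures(config_text: str) -> list[str]:
    """Return the declared architectures that appear in the supported set.

    Hypothetical helper: parses the JSON text of a config.json file and keeps
    only the architecture names found in SUPPORTED_ARCHITECTURES.
    """
    config = json.loads(config_text)
    declared = config.get("architectures") or []
    return [name for name in declared if name in SUPPORTED_ARCHITECTURES]


# Minimal stand-in for the contents of a checkpoint's config.json.
example_config = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'
print(supported_architectures(example_config))
```

An empty result means that none of the declared architectures is in the hand-copied subset, so the full list should be consulted before concluding that a model is unsupported.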