Supported Models — TensorRT LLM

| Architecture | Model | HuggingFace Example |
|---|---|---|
| `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` |
| `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` |
| `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3` |
| `DeepseekV32ForCausalLM` | DeepSeek-V3.2 | `deepseek-ai/DeepSeek-V3.2` |
| `Exaone4ForCausalLM` | EXAONE 4.0 | `LGAI-EXAONE/EXAONE-4.0-32B` |
| `ExaoneMoEForCausalLM` | K-EXAONE | `LGAI-EXAONE/K-EXAONE-236B-A23B` |
| `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it` |
| `Glm4MoeForCausalLM` | GLM-4.5, GLM-4.6, GLM-4.7 | `THUDM/GLM-4-100B-A10B` |
| `Glm4MoeLiteForCausalLM` [6] | GLM-4.7-Flash | `zai-org/GLM-4.7-Flash` |
| `GptOssForCausalLM` | GPT-OSS | `openai/gpt-oss-120b` |
| `LlamaForCausalLM` | Llama 3.1, Llama 3, Llama 2, LLaMA | `meta-llama/Meta-Llama-3.1-70B` |
| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| `MiniMaxM2ForCausalLM` | MiniMax M2/M2.1 | `MiniMaxAI/MiniMax-M2` |
| `MistralForCausalLM` | Mistral | `mistralai/Mistral-7B-v0.1` |
| `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` |
| `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` |
| `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` |
| `NemotronHForCausalLM` | Nemotron-3-Nano | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8` |
| `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` |
| `Phi3ForCausalLM` | Phi-4 | `microsoft/Phi-4` |
| `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` |
| `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` |
| `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` |
| `Qwen3ForCausalLM` | Qwen3 | `Qwen/Qwen3-8B` |
| `Qwen3MoeForCausalLM` | Qwen3MoE | `Qwen/Qwen3-30B-A3B` |
| `Qwen3NextForCausalLM` | Qwen3Next | `Qwen/Qwen3-Next-80B-A3B-Thinking` |
| `Qwen3_5MoeForCausalLM` [5] | Qwen3.5-MoE | `Qwen/Qwen3.5-397B-A17B` |
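
The Architecture column corresponds to the class name that HuggingFace checkpoints typically declare in the `architectures` field of their `config.json`. As a minimal sketch of how one might check a checkpoint against the table, the snippet below parses a config and reports which declared architectures are listed. The helper name `supported_architectures`, the `SUPPORTED_ARCHITECTURES` set (only a subset of the rows above), and the example config contents are illustrative assumptions and are not part of TensorRT LLM.

```python
import json

# Subset of the Architecture column above; extend with the remaining rows as needed.
SUPPORTED_ARCHITECTURES = {
    "DeepseekV3ForCausalLM",
    "Gemma3ForCausalLM",
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "MixtralForCausalLM",
    "Qwen2ForCausalLM",
    "Qwen3ForCausalLM",
    "Qwen3MoeForCausalLM",
}


def supported_architectures(config_text: str) -> list[str]:
    """Return the architectures declared in a config.json that appear in the set above."""
    config = json.loads(config_text)
    # A missing or null "architectures" field is treated as "nothing declared".
    declared = config.get("architectures") or []
    return [name for name in declared if name in SUPPORTED_ARCHITECTURES]


# Hypothetical, trimmed-down config.json contents used only for illustration.
example_config = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'
print(supported_architectures(example_config))  # prints ['Qwen3ForCausalLM']
```

An empty result only means the architecture is not in this illustrative subset. The full table remains the reference for what is supported.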