| BertForSequenceClassification | BERT-based | textattack/bert-base-uncased-yelp-polarity |
| DeciLMForCausalLM | Nemotron | nvidia/Llama-3_1-Nemotron-51B-Instruct |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3 |
| DeepseekV32ForCausalLM | DeepSeek-V3.2 | deepseek-ai/DeepSeek-V3.2 |
| Exaone4ForCausalLM | EXAONE 4.0 | LGAI-EXAONE/EXAONE-4.0-32B |
| ExaoneMoEForCausalLM | K-EXAONE | LGAI-EXAONE/K-EXAONE-236B-A23B |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it |
| Glm4MoeForCausalLM | GLM-4.5, GLM-4.6, GLM-4.7 | THUDM/GLM-4-100B-A10B |
| Glm4MoeLiteForCausalLM [6] | GLM-4.7-Flash | zai-org/GLM-4.7-Flash |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-120b |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA | meta-llama/Meta-Llama-3.1-70B |
| Llama4ForConditionalGeneration | Llama 4 | meta-llama/Llama-4-Scout-17B-16E-Instruct |
| MiniMaxM2ForCausalLM | MiniMax M2/M2.1 | MiniMaxAI/MiniMax-M2 |
| MistralForCausalLM | Mistral | mistralai/Mistral-7B-v0.1 |
| MixtralForCausalLM | Mixtral | mistralai/Mixtral-8x7B-v0.1 |
| MllamaForConditionalGeneration | Llama 3.2 | meta-llama/Llama-3.2-11B-Vision |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base |
| NemotronHForCausalLM | Nemotron-3-Nano | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 |
| NemotronNASForCausalLM | NemotronNAS | nvidia/Llama-3_3-Nemotron-Super-49B-v1 |
| Phi3ForCausalLM | Phi-4 | microsoft/Phi-4 |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/Qwen2-7B-Instruct |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-30B-A3B |
| Qwen3NextForCausalLM | Qwen3Next | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Qwen3_5MoeForCausalLM [5] | Qwen3.5-MoE | Qwen/Qwen3.5-397B-A17B |