
🦄Unsloth Docs

Unsloth is an open-source framework for running and training AI models on your own local hardware.

Our docs will guide you through running & training your own model locally.


Latest updates:

  • Run and train Google's new Gemma 4 models!

  • Introducing Unsloth Studio: a new open, no-code web UI to train and run LLMs.

  • New Qwen3.5 Small & Medium LLMs are here!

  • Run the new 4B and 120B models by NVIDIA.

  • Train MoE LLMs 12x faster with less VRAM.

  • Learn to run local LLMs via Claude & OpenAI.

  • Run & fine-tune the new 80B coding model.

  • Run & fine-tune a 30B model for agentic coding.

Unsloth streamlines local training, inference, data preparation, and deployment.

Unsloth lets you run and train models for text, audio, embeddings, vision and more. Unsloth provides many key features for both inference and training:

  • Search, download, and run any model format, including GGUFs, LoRA adapters, and safetensors.

  • Fine-tune and run RL on 500+ models ~2x faster with ~70% less VRAM (no accuracy loss).

  • Supports full fine-tuning, pre-training, 4-bit, 16-bit and FP8 training.

  • Observability: monitor training live, track loss and GPU usage, and customize graphs.

  • Multi-GPU training works today, and a much better version is coming!

Unsloth supports macOS, Linux, and Windows, on NVIDIA, Intel, and CPU-only setups. See: Unsloth Requirements. The same install command also updates Unsloth.
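As a minimal sketch, assuming a standard pip environment (check the Requirements page for platform-specific instructions):

```shell
# Install Unsloth from PyPI:
pip install unsloth

# The same command with --upgrade updates it to the latest release:
pip install --upgrade unsloth
```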

Use our official Docker image, unsloth/unsloth, which currently works for Windows, WSL and Linux. macOS support is coming soon.

What is Fine-tuning and RL? Why?

Fine-tuning an LLM customizes its behavior, enhances domain knowledge, and optimizes performance for specific tasks. By fine-tuning a pre-trained model (e.g. Llama-3.1-8B) on a dataset, you can:

  • Update Knowledge: Introduce new domain-specific information.

  • Customize Behavior: Adjust the model’s tone, personality, or response style.

  • Optimize for Tasks: Improve accuracy and relevance for specific use cases.
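As a sketch of what fine-tuning looks like in practice, the snippet below attaches LoRA adapters to a pre-trained model with Unsloth's `FastLanguageModel`. The model name, rank, and target modules are illustrative choices, not fixed requirements; running it needs a GPU and the `unsloth` package installed.

```python
from unsloth import FastLanguageModel

# Load a pre-trained model in 4-bit to cut VRAM usage.
# "unsloth/Llama-3.1-8B" and the hyperparameters below are
# illustrative choices, not requirements.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of extra weights is trained,
# which is what makes fine-tuning feasible on local hardware.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank: higher = more capacity, more VRAM
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# From here, a trainer (e.g. TRL's SFTTrainer) runs on your dataset
# to update knowledge, customize behavior, or optimize for a task.
```

Only the adapter weights change during training, so the resulting LoRA can be saved and shared separately from the base model.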

Reinforcement Learning (RL) is where an "agent" learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties.

  • Action: What the model generates (e.g. a sentence).

  • Reward: A signal indicating how good or bad the model's action was (e.g. did the response follow instructions? was it helpful?).

  • Environment: The scenario or task the model is working on (e.g. answering a user’s question).
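To make the action/reward split concrete, here is a toy reward function of the kind used in RL fine-tuning: the "action" is the model's generated text, and the reward scores it. The length thresholds and instruction check are illustrative assumptions, not part of any Unsloth API.

```python
# Toy reward function: scores a model's response (the "action")
# given the prompt (the "environment"). All thresholds here are
# illustrative assumptions.

def reward(prompt: str, response: str) -> float:
    score = 0.0
    # Reward responses that actually answer (at least a few words).
    if len(response.split()) >= 5:
        score += 1.0
    # Penalize ignoring an explicit "in one sentence" instruction.
    if "in one sentence" in prompt.lower() and response.count(".") > 1:
        score -= 0.5
    # Reward staying concise.
    if len(response.split()) <= 100:
        score += 0.5
    return score

print(reward("Explain RL in one sentence.",
             "RL trains an agent from reward signals."))  # → 1.5
```

An RL trainer would call a function like this on each generated response and update the model to prefer high-reward actions.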

Example fine-tuning or RL use-cases:

  • Predict whether a news headline impacts a company positively or negatively.

  • Use historical customer interactions to generate more accurate, customized responses.

  • Fine-tune an LLM on legal texts for contract analysis, case-law research, and compliance.

You can think of a fine-tuned model as a specialized agent designed to do specific tasks more effectively and efficiently. Fine-tuning can replicate all of RAG's capabilities, but not vice versa.