Modular Documentation | Modular

The Modular Platform accelerates AI inference and abstracts hardware complexity. Using our Docker container, you can deploy a GenAI model from Hugging Face with an OpenAI-compatible endpoint on a wide range of hardware.
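
As a sketch, starting the container might look like the following. The image name, tag, and flags shown here are assumptions for illustration only; check the Modular container documentation for the exact invocation for your hardware.

```shell
# Hypothetical invocation — the image name and flags below are assumptions,
# not a verified command. Serves an OpenAI-compatible API on port 8000.
docker run --gpus all -p 8000:8000 \
  modular/max-openai-api:latest \
  --model-path google/gemma-3-27b-it
```

Once the server is running, any OpenAI-compatible client can talk to it at `http://localhost:8000/v1`, as in the quickstart below.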

And if you need to customize the model or tune a GPU kernel, Modular provides a depth of model extensibility and GPU programmability that you won't find anywhere else.

Get started

```python
from openai import OpenAI

# Point the client at the local MAX endpoint; no real API key is needed.
client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="EMPTY")

# Send a chat request to the model being served.
completion = client.chat.completions.create(
  model="google/gemma-3-27b-it",
  messages=[
    {"role": "user", "content": "Who won the world series in 2020?"}
  ],
)

print(completion.choices[0].message.content)
```

Learning tools

500+ models supported

We're on a mission to make open source AI models as fast and easy to use as possible. Every model in our repo has been optimized with MAX Graph to ensure performance and portability across hardware architectures.

View Model Repo