SharpAI/mlx-server

A native Swift server that serves MLX models with an OpenAI-compatible API. No Python runtime required — compiles to a single binary that runs on Apple Silicon.

Features

  • 🚀 Native Swift — compiled binary, no Python dependency
  • 🍎 Apple Silicon optimized — uses Metal GPU via MLX
  • 🔌 OpenAI-compatible API — drop-in replacement for local inference
  • 📡 Streaming support — SSE streaming for real-time token generation
  • 🤗 HuggingFace models — loads any MLX-format model directly

Quick Start

# Build
swift build -c release

# Run (downloads model on first launch)
.build/release/mlx-server \
  --model mlx-community/Qwen2.5-3B-Instruct-4bit \
  --port 5413
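The first launch downloads the model before the server can answer requests, so scripts that start mlx-server in the background may want to wait for it. The sketch below polls the /health endpoint with curl until it returns a 2xx status. The function name wait_for_health and the default attempt budget are illustrative choices. It also assumes that a successful /health response means the server is ready for chat requests.

```shell
#!/bin/sh
# Poll mlx-server's /health endpoint until it answers with HTTP 2xx or the
# attempt budget runs out. Assumption: a 2xx from /health means the server
# is ready to serve requests.
wait_for_health() {
  url="${1:-http://localhost:5413/health}"
  attempts="${2:-120}"   # roughly one attempt per second
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP 4xx/5xx, -s: silent, --max-time: never hang on one try
    if curl -fs --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "no healthy response from $url after $attempts attempts" >&2
  return 1
}

# Example:
#   .build/release/mlx-server --model mlx-community/Qwen2.5-3B-Instruct-4bit --port 5413 &
#   wait_for_health http://localhost:5413/health 600 && echo "server ready"
```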

API Endpoints

Endpoint               Method   Description
/health                GET      Server health + loaded model
/v1/models             GET      List available models
/v1/chat/completions   POST     Chat completions (streaming & non-streaming)

Usage Examples

# Health check
curl http://localhost:5413/health

# Chat completion
curl http://localhost:5413/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Streaming
curl http://localhost:5413/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen2.5-3B-Instruct-4bit",
    "stream": true,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
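With "stream": true, the reply arrives as server-sent events. The sketch below assumes mlx-server follows OpenAI's framing, where each chunk sits on a "data: <json>" line and the stream ends with "data: [DONE]". This framing is not confirmed by this README. The helper strips the framing so each JSON chunk comes out on its own line, ready for a JSON tool. The name sse_payloads is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an OpenAI-style SSE stream on stdin and print each
# "data:" payload on its own line, stopping at the "[DONE]" sentinel.
sse_payloads() {
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}            # tolerate CRLF line endings
    case "$line" in
      data:*)
        payload=${line#data:}
        payload=${payload# }      # SSE allows one optional space after "data:"
        if [ "$payload" = "[DONE]" ]; then
          return 0
        fi
        printf '%s\n' "$payload"
        ;;
      *) ;;                       # blank separators, ":" comments, other fields
    esac
  done
}

# Example (-N turns off curl's output buffering so chunks appear as they arrive):
#   curl -sN http://localhost:5413/v1/chat/completions \
#     -H "Content-Type: application/json" \
#     -d '{"model": "mlx-community/Qwen2.5-3B-Instruct-4bit", "stream": true,
#          "messages": [{"role": "user", "content": "Hello!"}]}' | sse_payloads
```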

CLI Options

Option         Default        Description
--model        (required)     HuggingFace model ID or local path
--port         5413           Port to listen on
--host         127.0.0.1      Host address to bind to
--max-tokens   2048           Maximum tokens per request
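When the same options are reused across scripts or machines, a small wrapper can supply them from the environment. This is a sketch only. The function run_mlx_server and the MLX_* and DRY_RUN variable names are invented for the example, since mlx-server is only documented to take the flags above. The defaults mirror the table.

```shell
#!/bin/sh
# Hypothetical wrapper: builds the mlx-server command line from environment
# variables. MLX_SERVER_BIN, MLX_MODEL, MLX_HOST, MLX_PORT, MLX_MAX_TOKENS and
# DRY_RUN are names chosen for this sketch, not settings mlx-server reads.
run_mlx_server() {
  bin="${MLX_SERVER_BIN:-.build/release/mlx-server}"
  if [ -z "${MLX_MODEL:-}" ]; then
    echo "MLX_MODEL is required (HuggingFace model ID or local path)" >&2
    return 2
  fi
  # Rebuild the positional parameters as the full command line
  set -- "$bin" \
    --model "$MLX_MODEL" \
    --host "${MLX_HOST:-127.0.0.1}" \
    --port "${MLX_PORT:-5413}" \
    --max-tokens "${MLX_MAX_TOKENS:-2048}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # print the command instead of running it
  else
    "$@"
  fi
}

# Example: serve on all interfaces on port 8080
#   MLX_MODEL=mlx-community/Qwen2.5-3B-Instruct-4bit MLX_HOST=0.0.0.0 MLX_PORT=8080 run_mlx_server
```

The default host, 127.0.0.1, keeps the server reachable only from the local machine. Binding to 0.0.0.0, as in the example, would normally expose it to other hosts on the network.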

Metal Shader Library

MLX requires mlx.metallib to be co-located with the binary for GPU compute. If you encounter a "Failed to load the default metallib" error:

# Extract from official MLX Python package
python3 -m venv /tmp/mlx_venv
/tmp/mlx_venv/bin/pip install mlx
cp /tmp/mlx_venv/lib/python3.*/site-packages/mlx/lib/mlx.metallib .build/release/
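The copy step above can be made repeatable, so it is safe to rerun after every rebuild. This is a minimal sketch that assumes the same venv layout (lib/python3.*/site-packages/mlx/lib/mlx.metallib). The name install_metallib is a hypothetical helper. It skips the copy when the library is already next to the binary, and it fails with a clear message when the venv does not contain it.

```shell
#!/bin/sh
# Copy mlx.metallib from an MLX Python venv into the directory holding the
# mlx-server binary, unless it is already there.
install_metallib() {
  venv="${1:-/tmp/mlx_venv}"
  dest="${2:-.build/release}"
  if [ -f "$dest/mlx.metallib" ]; then
    echo "mlx.metallib already present in $dest"
    return 0
  fi
  # The glob covers whichever python3.x the venv was created with
  for lib in "$venv"/lib/python3.*/site-packages/mlx/lib/mlx.metallib; do
    if [ -f "$lib" ]; then
      cp "$lib" "$dest/" && echo "copied $lib -> $dest/"
      return $?
    fi
  done
  echo "mlx.metallib not found under $venv (is mlx pip-installed there?)" >&2
  return 1
}
```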

Requirements

  • macOS 14.0+
  • Apple Silicon (M1/M2/M3/M4/M5)
  • Xcode Command Line Tools
  • Metal Toolchain (xcodebuild -downloadComponent MetalToolchain)
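Two of the requirements above can be checked before building. This preflight sketch covers only the CPU architecture and the macOS version. It reads them with the standard macOS commands uname -m and sw_vers -productVersion. The name check_requirements is hypothetical, and its optional arguments exist only so the logic can be exercised away from a Mac. A shell running under Rosetta reports x86_64 even on Apple Silicon. For the Command Line Tools, xcode-select -p, which prints the active developer directory, is a quick separate check.

```shell
#!/bin/sh
# Preflight check for Apple Silicon (arm64) and macOS 14.0+.
# Optional arguments override the detected values (useful for testing).
check_requirements() {
  arch="${1:-$(uname -m)}"
  version="${2:-$(sw_vers -productVersion 2>/dev/null)}"
  status=0

  if [ "$arch" != "arm64" ]; then
    echo "unsupported architecture: $arch (Apple Silicon required)" >&2
    status=1
  fi

  major="${version%%.*}"          # "14.5" -> "14"
  case "$major" in
    ''|*[!0-9]*)
      echo "could not determine the macOS version (got '$version')" >&2
      status=1
      ;;
    *)
      if [ "$major" -lt 14 ]; then
        echo "macOS $version is too old (14.0 or later required)" >&2
        status=1
      fi
      ;;
  esac

  return "$status"
}
```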

Dependencies

License

MIT