A native Swift server that serves MLX models with an OpenAI-compatible API. No Python runtime required — compiles to a single binary that runs on Apple Silicon.
Features
- 🚀 Native Swift — compiled binary, no Python dependency
- 🍎 Apple Silicon optimized — uses Metal GPU via MLX
- 🔌 OpenAI-compatible API — drop-in replacement for local inference
- 📡 Streaming support — SSE streaming for real-time token generation
- 🤗 HuggingFace models — loads any MLX-format model directly
Quick Start
# Build swift build -c release # Run (downloads model on first launch) .build/release/mlx-server \ --model mlx-community/Qwen2.5-3B-Instruct-4bit \ --port 5413
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
/health |
GET | Server health + loaded model |
/v1/models |
GET | List available models |
/v1/chat/completions |
POST | Chat completions (streaming & non-streaming) |
Usage Examples
# Health check curl http://localhost:5413/health # Chat completion curl http://localhost:5413/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "mlx-community/Qwen2.5-3B-Instruct-4bit", "messages": [{"role": "user", "content": "Hello!"}] }' # Streaming curl http://localhost:5413/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "mlx-community/Qwen2.5-3B-Instruct-4bit", "stream": true, "messages": [{"role": "user", "content": "Hello!"}] }'
CLI Options
| Option | Default | Description |
|---|---|---|
--model |
(required) | HuggingFace model ID or local path |
--port |
5413 |
Port to listen on |
--host |
127.0.0.1 |
Host to bind |
--max-tokens |
2048 |
Max tokens per request |
Metal Shader Library
MLX requires mlx.metallib to be co-located with the binary for GPU compute. If you encounter a "Failed to load the default metallib" error:
# Extract from official MLX Python package python3 -m venv /tmp/mlx_venv /tmp/mlx_venv/bin/pip install mlx cp /tmp/mlx_venv/lib/python3.*/site-packages/mlx/lib/mlx.metallib .build/release/
Requirements
- macOS 14.0+
- Apple Silicon (M1/M2/M3/M4/M5)
- Xcode Command Line Tools
- Metal Toolchain (
xcodebuild -downloadComponent MetalToolchain)
Dependencies
- mlx-swift — Apple MLX framework for Swift
- mlx-swift-lm — Language model support
- Hummingbird — Swift HTTP server
- swift-argument-parser — CLI argument parsing
License
MIT