SmarterRouter

SmarterRouter is an intelligent, multi-backend AI router that sits between your application and your LLM providers. It profiles your models, aggregates benchmark data, and routes each query to the best available model for the task. Everything runs locally, at no cost.

Key Benefits:

  • Zero manual model selection - AI automatically picks the right model for each prompt
  • All local, zero cost - No cloud API fees, works with your existing models
  • Production-ready - Monitoring, metrics, and error handling built-in
  • Drop-in replacement - Works with any OpenAI-compatible client

Why SmarterRouter? (vs other LLM proxies)

Feature | SmarterRouter | OptiLLM | ClewdR | LLM-API-Proxy | Reader
Intelligent Routing | ✅ Auto-selects best model | ❌ Manual config | ❌ Claude-only | ❌ Manual routing | ❌ URL-only
Multi-Backend Support | ✅ Ollama + llama.cpp + OpenAI | ❌ OpenAI-only | ❌ Claude-only | ✅ 100+ providers | ❌ URL proxy
Local-first | ✅ All local models | ⚠️ Cloud proxy | ⚠️ Cloud proxy | ⚠️ Cloud proxy | ⚠️ Cloud proxy
Zero Code Changes | ✅ OpenAI-compatible | ✅ OpenAI-compatible | ✅ OpenAI-compatible | ✅ OpenAI-compatible | ✅ URL proxy
Production Features | ✅ Monitoring + Metrics | ✅ Metrics | ✅ Dashboard | ✅ Resilience | ✅ Simple
Learning Capability | ✅ Profiles models over time | ❌ Static config | ❌ Static config | ❌ Static config | ❌ Static config

Quick Start (5 minutes)

Get up and running with Docker in three steps:

# 1. Clone the repository
git clone https://github.com/peva3/SmarterRouter.git
cd SmarterRouter

# 2. Start with Docker Compose
docker-compose up -d

# 3. Verify it's running
curl http://localhost:11436/health

That's it! SmarterRouter will:

  • ✅ Discover all your Ollama models automatically
  • ✅ Profile each model for performance on your hardware (first run takes 30-60 min)
  • ✅ Start routing queries to the best model

Access the router at: http://localhost:11436
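
Once it's up, you can sanity-check the OpenAI-compatible API by listing the models the router exposes (a minimal sketch; assumes the standard OpenAI-compatible /v1/models route):

# List the models the router exposes
curl http://localhost:11436/v1/models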

Connect to OpenWebUI

  1. Open OpenWebUI → Settings → Connections → Add Connection
  2. Configure:
    • Name: SmarterRouter
    • Base URL: http://localhost:11436/v1
    • API Key: (leave empty)
    • Model: smarterrouter/main
  3. Save and start chatting

SmarterRouter will automatically select the best model for each prompt!
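
The same connection works from any OpenAI-compatible client. As a sketch, here is the equivalent raw request against the standard /v1/chat/completions route (no API key needed; the prompt is just an example):

# Send a chat request through the router; it picks the backend model for you
curl http://localhost:11436/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "smarterrouter/main",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}]
  }'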


What Gets Automated?

  • Model discovery - Automatically finds all available models from your backend
  • Performance profiling - Tests each model with standardized prompts on your hardware
  • Smart routing - Analyzes prompts and picks the optimal model based on category and complexity
  • VRAM management - Auto-detects all GPUs (NVIDIA, AMD, Intel, Apple Silicon), monitors usage, and unloads models when needed
  • Fallback handling - Automatically retries with backup models if the primary fails
  • Response caching - Caches identical prompts for instant repeat responses (see the timing sketch after this list)
  • Continuous learning - Collects user feedback to improve routing decisions
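
You can watch the response cache work by timing two identical requests; on a cache hit the second one should return almost instantly. A minimal sketch, reusing the chat endpoint shown above:

# Identical prompts: the first hits a model, the repeat should be served from cache
for i in 1 2; do
  time curl -s http://localhost:11436/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "smarterrouter/main", "messages": [{"role": "user", "content": "What is 2+2?"}]}' \
    > /dev/null
done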

Configuration Basics

All configuration is via the .env file. Copy the template and customize:

cp ENV_DEFAULT .env
nano .env  # edit as needed

Essential settings:

Variable | Purpose | Default
ROUTER_OLLAMA_URL | Your backend URL | http://localhost:11434
ROUTER_PROVIDER | Backend type: ollama, llama.cpp, or openai | ollama
ROUTER_QUALITY_PREFERENCE | Routing preference from 0.0 (speed) to 1.0 (quality) | 0.5
ROUTER_PINNED_MODEL | Keep a small model always loaded (optional) | (none)
ROUTER_ADMIN_API_KEY | Required in production to secure admin endpoints | (none)
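
As a minimal sketch, a .env for a single local Ollama backend using only the variables above might look like this (the pinned model name is illustrative):

# .env
ROUTER_OLLAMA_URL=http://localhost:11434
ROUTER_PROVIDER=ollama
# Lean toward quality over raw speed
ROUTER_QUALITY_PREFERENCE=0.7
# Optional: keep a small model always loaded (model name is illustrative)
ROUTER_PINNED_MODEL=llama3.2:1b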

VRAM monitoring: Enabled by default with auto-detection across NVIDIA, AMD, Intel, and Apple Silicon GPUs. Multi-GPU systems are fully supported. See Configuration Reference for details.

⚠️ Production security: Always set ROUTER_ADMIN_API_KEY in production to protect admin endpoints.
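
One way to generate a strong key (a sketch; any sufficiently random secret works):

# Append a random 64-character hex key to .env
echo "ROUTER_ADMIN_API_KEY=$(openssl rand -hex 32)" >> .env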

For complete configuration reference, see docs/configuration.md.


Documentation

Getting Started:

In-Depth Guides:

Examples:

Want to see how the sausage is made?

  • DEEPDIVE.md - Architecture, design decisions, and implementation details for the technically curious

Other Files:


Need Help?


License

MIT License - see LICENSE for details.