High-performance YOLO inference library written in Rust. This library provides a fast, safe, and efficient interface for running YOLO models using ONNX Runtime, with an API designed to match the Ultralytics Python package.
✨ Features
- 🚀 High Performance - Pure Rust implementation with zero-cost abstractions
- 🎯 Ultralytics API Compatible - `Results`, `Boxes`, `Masks`, `Keypoints`, `Probs` classes matching Python
- 🔧 Multiple Backends - CPU, CUDA, TensorRT, CoreML, OpenVINO, and more via ONNX Runtime
- 📦 Dual Use - Library for Rust projects + standalone CLI application
- 🏷️ Auto Metadata - Automatically reads class names, task type, and input size from ONNX models
- 🖼️ Multiple Sources - Images, directories, glob patterns, video files, webcams, and streams
- 🪶 Minimal Dependencies - No PyTorch, no heavy ML frameworks - just a handful of core crates
🚀 Quick Start
Prerequisites
- Rust 1.85+ (install via rustup, edition 2024 required)
- A YOLO ONNX model (export from Ultralytics: `yolo export model=yolo11n.pt format=onnx`)
Installation
```bash
# Clone the repository
git clone https://github.com/ultralytics/inference.git
cd inference

# Build release binary (not installed globally)
cargo build --release

# Install CLI globally from this git checkout (Cargo default location)
cargo install --path . --locked

# Install CLI globally with custom features
# Minimal build (no default features)
cargo install --path . --locked --no-default-features

# Enable video support
cargo install --path . --locked --features video

# Enable multiple accelerators
cargo install --path . --locked --features "cuda,tensorrt"
```
`cargo install` places binaries in Cargo's default bin directory:
- macOS/Linux: `~/.cargo/bin`
- Windows: `%USERPROFILE%\.cargo\bin`

Ensure this directory is in your PATH, then run from anywhere:
```bash
ultralytics-inference --help
```
Export a YOLO Model to ONNX
```bash
# Using Ultralytics CLI
yolo export model=yolo11n.pt format=onnx
```

```python
# Or with Python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.export(format="onnx")
```
Run Inference
```bash
# With defaults (auto-downloads model and sample images)
cargo run --release -- predict

# With explicit arguments
cargo run --release -- predict --model yolo11n.onnx --source image.jpg

# On a directory of images
cargo run --release -- predict --model yolo11n.onnx --source assets/

# With custom thresholds
cargo run --release -- predict -m yolo11n.onnx -s image.jpg --conf 0.5 --iou 0.45

# With visualization and custom image size
cargo run --release -- predict --model yolo11n.onnx --source video.mp4 --show --imgsz 1280

# Save individual frames for video input
cargo run --release -- predict --model yolo11n.onnx --source video.mp4 --save-frames

# Rectangular inference
cargo run --release -- predict --model yolo11n.onnx --source image.jpg --rect
```
Example Output
```
# ultralytics-inference predict
WARNING ⚠️ 'model' argument is missing. Using default 'model=yolo11n.onnx'.
WARNING ⚠️ 'source' argument is missing. Using default images: https://ultralytics.com/images/bus.jpg, https://ultralytics.com/images/zidane.jpg
Ultralytics 0.0.8 🚀 Rust ONNX FP32 CPU
Using ONNX Runtime CPUExecutionProvider
YOLO11n summary: 80 classes, imgsz=(640, 640)
image 1/2 /home/ultralytics/inference/bus.jpg: 640x480 4 persons, 1 bus, 36.4ms
image 2/2 /home/ultralytics/inference/zidane.jpg: 384x640 2 persons, 1 tie, 28.6ms
Speed: 1.5ms preprocess, 32.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
Results saved to runs/detect/predict1
💡 Learn more at https://docs.ultralytics.com/modes/predict
```
📖 Usage
As a CLI Tool
```bash
# Show help
cargo run --release -- help

# Show version
cargo run --release -- version

# Run inference
cargo run --release -- predict --model <model.onnx> --source <source>
```
CLI Options:
| Option | Short | Description | Default |
|---|---|---|---|
| `--model` | `-m` | Path to ONNX model file | `yolo11n.onnx` |
| `--source` | `-s` | Input source (image, video, webcam index, or URL) | Task-dependent Ultralytics URL assets |
| `--device` | | Device to use (`cpu`, `cuda:0`, `mps`, `coreml`, etc.) | `cpu` |
| `--conf` | | Confidence threshold | `0.25` |
| `--iou` | | IoU threshold for NMS | `0.7` |
| `--max-det` | | Maximum number of detections | `300` |
| `--imgsz` | | Inference image size | Model metadata |
| `--rect` | | Enable rectangular inference (minimal padding) | `true` |
| `--batch` | | Batch size for inference | `1` |
| `--half` | | Use FP16 half-precision inference | `false` |
| `--save` | | Save annotated results to `runs/{task}/predict` | `true` |
| `--save-frames` | | Save individual frames for video | `false` |
| `--show` | | Display results in a window | `false` |
| `--verbose` | | Show verbose output | `true` |
Source Options:
| Source Type | Example Input | Description |
|---|---|---|
| Image | `image.jpg` | Single image file |
| Directory | `images/` | Directory of images |
| Glob | `images/*.jpg` | Glob pattern for images |
| Video | `video.mp4` | Video file |
| Webcam | `0`, `1` | Webcam index (0 = default webcam) |
| URL | `https://example.com/image.jpg` | Remote image URL |
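The same `predict` subcommand accepts any of these source types, for example:
```bash
# Webcam index 0 (default camera)
cargo run --release -- predict --model yolo11n.onnx --source 0

# Every JPEG matching a glob pattern (quoted so the shell doesn't expand it)
cargo run --release -- predict --model yolo11n.onnx --source "images/*.jpg"

# Remote image URL
cargo run --release -- predict --model yolo11n.onnx --source https://ultralytics.com/images/bus.jpg
```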
As a Rust Library
Add to your `Cargo.toml`:
```toml
[dependencies]
ultralytics-inference = { git = "https://github.com/ultralytics/inference.git" }
```
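For reproducible builds, you may want to pin the git dependency to a specific commit; the `rev` value below is a placeholder, not a real SHA:
```toml
[dependencies]
# Pin to an exact commit (replace the placeholder with a real commit SHA)
ultralytics-inference = { git = "https://github.com/ultralytics/inference.git", rev = "<commit-sha>" }
```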
Basic Usage:
```rust
use ultralytics_inference::YOLOModel;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model - metadata (classes, task, imgsz) is read automatically
    let mut model = YOLOModel::load("yolo11n.onnx")?;

    // Run inference
    let results = model.predict("image.jpg")?;

    // Process results
    for result in &results {
        if let Some(ref boxes) = result.boxes {
            println!("Found {} detections", boxes.len());
            for i in 0..boxes.len() {
                let cls = boxes.cls()[i] as usize;
                let conf = boxes.conf()[i];
                let name = result.names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
                println!("  {} {:.2}", name, conf);
            }
        }
    }
    Ok(())
}
```
With Custom Configuration:
```rust
use ultralytics_inference::{YOLOModel, InferenceConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = InferenceConfig::new()
        .with_confidence(0.5)
        .with_iou(0.45)
        .with_max_det(300);

    let mut model = YOLOModel::load_with_config("yolo11n.onnx", config)?;
    let results = model.predict("image.jpg")?;
    Ok(())
}
```
Accessing Detection Data:
```rust
if let Some(ref boxes) = result.boxes {
    // Bounding boxes in different formats
    let xyxy = boxes.xyxy();   // [x1, y1, x2, y2]
    let xywh = boxes.xywh();   // [x_center, y_center, width, height]
    let xyxyn = boxes.xyxyn(); // Normalized [0-1]
    let xywhn = boxes.xywhn(); // Normalized [0-1]

    // Confidence scores and class IDs
    let conf = boxes.conf(); // Confidence scores
    let cls = boxes.cls();   // Class IDs
}
```
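These accessors compose naturally for post-filtering. A minimal sketch, assuming `xyxy()` yields one `[x1, y1, x2, y2]` entry per box as the comments above suggest (the `0.5` threshold and class `0` = person are arbitrary example values, not library defaults):
```rust
// Hypothetical filter: keep only confident "person" detections (COCO class 0).
if let Some(ref boxes) = result.boxes {
    let xyxy = boxes.xyxy();
    let conf = boxes.conf();
    let cls = boxes.cls();
    for i in 0..boxes.len() {
        if cls[i] as usize == 0 && conf[i] > 0.5 {
            println!("person at {:?} with confidence {:.2}", xyxy[i], conf[i]);
        }
    }
}
```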
Selecting a Device:
```rust
use ultralytics_inference::{Device, InferenceConfig, YOLOModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Select a device (e.g., CUDA, MPS, CPU)
    let device = Device::Cuda(0);

    // Configure the model to use this device
    let config = InferenceConfig::new().with_device(device);
    let mut model = YOLOModel::load_with_config("yolo11n.onnx", config)?;
    let results = model.predict("image.jpg")?;
    Ok(())
}
```
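Because accelerator backends are compile-time Cargo features, a common pattern is falling back to the CPU when a GPU feature isn't enabled. A sketch, assuming a `Device::Cpu` variant exists alongside the `Device::Cuda(0)` shown above:
```rust
use ultralytics_inference::Device;

// Pick CUDA when this crate was built with a `cuda` feature, else CPU.
// Note: cfg!(feature = "...") checks *your* crate's features, so your
// Cargo.toml must forward the feature to ultralytics-inference.
fn pick_device() -> Device {
    if cfg!(feature = "cuda") {
        Device::Cuda(0) // variant shown in the example above
    } else {
        Device::Cpu // assumed variant name
    }
}
```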
🏗️ Project Structure
```
inference/
├── src/
│   ├── lib.rs            # Library entry point and public exports
│   ├── main.rs           # CLI application
│   ├── model.rs          # YOLOModel - ONNX session and inference
│   ├── results.rs        # Results, Boxes, Masks, Keypoints, Probs, Obb
│   ├── preprocessing.rs  # Image preprocessing (letterbox, normalize, SIMD)
│   ├── postprocessing.rs # Detection post-processing (NMS, decode, SIMD)
│   ├── metadata.rs       # ONNX model metadata parsing
│   ├── source.rs         # Input source handling (images, video, webcam)
│   ├── task.rs           # Task enum (Detect, Segment, Pose, Classify, Obb)
│   ├── inference.rs      # InferenceConfig
│   ├── batch.rs          # Batch processing pipeline
│   ├── device.rs         # Device enum (CPU, CUDA, MPS, CoreML, etc.)
│   ├── download.rs       # Model and asset downloading
│   ├── annotate.rs       # Image annotation (bounding boxes, masks, keypoints)
│   ├── io.rs             # Result saving (images, videos)
│   ├── logging.rs        # Logging macros
│   ├── error.rs          # Error types
│   ├── utils.rs          # Utility functions (NMS, IoU)
│   ├── cli/              # CLI module
│   │   ├── mod.rs        # CLI module exports
│   │   ├── args.rs       # CLI argument parsing
│   │   └── predict.rs    # Predict command implementation
│   └── visualizer/       # Real-time visualization (minifb)
├── tests/
│   └── integration_test.rs # Integration tests
├── assets/               # Test images
│   ├── bus.jpg
│   └── zidane.jpg
├── Cargo.toml            # Rust dependencies and features
├── LICENSE               # AGPL-3.0 License
└── README.md             # This file
```
⚡ Hardware Acceleration
Enable hardware acceleration by adding features to your build:
```bash
# NVIDIA GPU (CUDA)
cargo build --release --features cuda

# NVIDIA TensorRT
cargo build --release --features tensorrt

# Apple CoreML (macOS/iOS)
cargo build --release --features coreml

# Intel OpenVINO
cargo build --release --features openvino

# Multiple features
cargo build --release --features "cuda,tensorrt"
```
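After building with an accelerator feature, select it at runtime with the `--device` option documented above, for example:
```bash
# Build with CUDA support, then run on the first NVIDIA GPU
cargo build --release --features cuda
./target/release/ultralytics-inference predict --model yolo11n.onnx --source image.jpg --device cuda:0
```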
Available Features:
| Feature | Description |
|---|---|
| `cuda` | NVIDIA CUDA support |
| `tensorrt` | NVIDIA TensorRT optimization |
| `coreml` | Apple CoreML (macOS/iOS) |
| `openvino` | Intel OpenVINO |
| `onednn` | Intel oneDNN |
| `rocm` | AMD ROCm |
| `directml` | DirectML (Windows) |
| `nnapi` | Android Neural Networks API |
| `xnnpack` | XNNPACK (cross-platform) |
| `nvidia` | Convenience: CUDA + TensorRT |
| `intel` | Convenience: OpenVINO + oneDNN |
| `mobile` | Convenience: NNAPI + CoreML + QNN |
📦 Dependencies
One of the key benefits of this library is its minimal dependency footprint - no PyTorch, TensorFlow, or other heavy ML frameworks required.
Core Dependencies (always included)
| Crate | Purpose |
|---|---|
| `ort` | ONNX Runtime bindings |
| `ndarray` | N-dimensional arrays |
| `image` | Image loading/decoding |
| `jpeg-decoder` | JPEG decoding |
| `fast_image_resize` | SIMD-optimized resizing |
| `half` | FP16 support |
| `lru` | LRU cache for preprocessing LUT |
| `wide` | SIMD for fast preprocessing |
Optional Dependencies (for `--save` support)
| Crate | Purpose |
|---|---|
| `imageproc` | Drawing boxes and shapes |
| `ab_glyph` | Text rendering (embedded font) |
Optional Dependencies (for Video & Visualization)
| Crate | Purpose |
|---|---|
| `minifb` | Window creation and buffer display |
| `video-rs` | Video decoding/encoding (FFmpeg) |
Video Support (FFmpeg)
Video features require FFmpeg (7 or 8) installed on your system:
```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
apt-get install -y ffmpeg libavutil-dev libavformat-dev libavfilter-dev libavdevice-dev libclang-dev

# Build with video support
cargo build --release --features video
```
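With FFmpeg installed and the `video` feature compiled in, video sources work through the same `predict` command, e.g.:
```bash
# Run on a video file and save individual annotated frames
cargo run --release --features video -- predict --model yolo11n.onnx --source video.mp4 --save-frames
```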
To build without annotation support (smaller binary):
```bash
cargo build --release --no-default-features
```
🧪 Testing
```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_boxes_creation
```
📊 Performance
Benchmarks on Apple M4 MacBook Pro (CPU, ONNX Runtime):
YOLO11n Detection Model (640x640)
| Precision | Model Size | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|
| FP32 | 10.2 MB | ~9ms | ~21ms | <1ms | ~31ms |
| FP16 | 5.2 MB | ~9ms | ~24ms | <1ms | ~34ms |
Key findings:
- FP16 models are ~50% smaller (5.2 MB vs 10.2 MB)
- FP32 is slightly faster on CPU (~21ms vs ~24ms) due to CPU's native FP32 support
- FP16 requires upcasting to FP32 for computation on most CPUs, adding overhead
- Use FP32 for CPU inference and FP16 for GPU inference, where it provides a speedup (see the export example below)
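To produce an FP16 model like the one benchmarked above, the Ultralytics exporter accepts a `half` argument:
```bash
# Export an FP16 ONNX model with the Ultralytics CLI
yolo export model=yolo11n.pt format=onnx half=True
```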
Threading Optimization
ONNX Runtime threading is set to auto (`num_threads: 0`), which lets ORT choose the optimal thread count:
- Manual threading (4 threads): ~40ms inference
- Auto threading (0 = ORT decides): ~21ms inference
🔮 Roadmap
Completed
- Detection, Segmentation, Pose, Classification, OBB inference
- ONNX model metadata parsing (auto-detect classes, task, imgsz)
- Hardware acceleration support (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK)
- Ultralytics-compatible Results API (`Boxes`, `Masks`, `Keypoints`, `Probs`, `Obb`)
- Multiple input sources (images, directories, globs, URLs)
- Video file support and webcam/RTSP streaming
- Image annotation and visualization
- FP16 half-precision inference
- Batch inference support
- Rectangular inference support and optimization
- Class filtering support
In Progress
- Python bindings (PyO3)
- WebAssembly (WASM) support for browser inference
💡 Contributing
Ultralytics thrives on community collaboration! We deeply value your contributions.
- Report Issues: Found a bug? Open an issue
- Feature Requests: Have an idea? Share it
- Pull Requests: Read our Contributing Guide first
- Feedback: Take our Survey
📄 License
Ultralytics offers two licensing options:
- AGPL-3.0 License: Open-source license for students, researchers, and enthusiasts. See LICENSE.
- Enterprise License: For commercial applications. Contact Ultralytics Licensing.
📮 Contact
- GitHub Issues: Bug reports and feature requests
- Discord: Join our community
- Documentation: docs.ultralytics.com