A lightweight ML runtime that runs ONNX models without Python. Fast, portable, and efficient.
## Features
- Single Binary: Deploy ML models with a single ~50MB binary
- Fast Cold Start: 0.01-0.05s startup time (100x faster than Python)
- Apple Silicon Acceleration: Native CoreML/Metal/Neural Engine support
- ONNX Support: Run models exported from PyTorch, TensorFlow, and more
- Zero Dependencies: No Python, no virtual environments, no package managers (only the ONNX Runtime shared library is needed at runtime)
- NLP Support: Text tokenization and embedding generation
## Installation

### macOS (Apple Silicon) - Recommended
```bash
# 1. Download airml
curl -L https://github.com/rlaope/airML/releases/latest/download/airml-macos-aarch64.tar.gz | tar xz
sudo mv airml /usr/local/bin/

# 2. Download ONNX Runtime (required)
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.23.1/onnxruntime-osx-arm64-1.23.1.tgz | tar xz -C /usr/local/lib

# 3. Set environment variable (add to ~/.zshrc for persistence)
export ORT_DYLIB_PATH=/usr/local/lib/onnxruntime-osx-arm64-1.23.1/lib/libonnxruntime.dylib
```
### macOS (Intel)
```bash
curl -L https://github.com/rlaope/airML/releases/latest/download/airml-macos-x86_64.tar.gz | tar xz
sudo mv airml /usr/local/bin/
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.23.1/onnxruntime-osx-x86_64-1.23.1.tgz | tar xz -C /usr/local/lib
export ORT_DYLIB_PATH=/usr/local/lib/onnxruntime-osx-x86_64-1.23.1/lib/libonnxruntime.dylib
```
### Linux (x86_64)
```bash
curl -L https://github.com/rlaope/airML/releases/latest/download/airml-linux-x86_64.tar.gz | tar xz
sudo mv airml /usr/local/bin/
curl -L https://github.com/microsoft/onnxruntime/releases/download/v1.23.1/onnxruntime-linux-x64-1.23.1.tgz | tar xz -C /usr/local/lib
export ORT_DYLIB_PATH=/usr/local/lib/onnxruntime-linux-x64-1.23.1/lib/libonnxruntime.so
```
### From Source
```bash
git clone https://github.com/rlaope/airML.git
cd airML
cargo build --release --features coreml,nlp
```

### Verify Installation
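A quick sanity check is to run `airml system`, which prints the detected platform and the execution providers available to the build (sample output under System Info below):

```bash
airml system
```

If the binary can't locate ONNX Runtime, double-check that `ORT_DYLIB_PATH` points at the library installed above.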
## Quick Start

### Image Classification
```bash
# Run classification on an image
airml run -m resnet50.onnx -i cat.jpg -l imagenet_labels.txt

# Output:
# Top 5 predictions:
# --------------------------------------------------
#   281  95.23%  ======================================== tabby
#   282   3.12%  ===  tiger cat
#   285   0.89%  =  Egyptian cat
```
### Text Embeddings
```bash
# Generate text embeddings
airml embed -m sentence-transformer.onnx -t tokenizer.json --text "Hello world"

# Output:
# {
#   "text": "Hello world",
#   "dimension": 384,
#   "embedding": [0.123456, 0.234567, ...]
# }
```
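Because the output is JSON, it pipes cleanly into standard tooling. For example, assuming `jq` is installed, you can inspect the dimension and the first few components without dumping the whole vector:

```bash
airml embed -m sentence-transformer.onnx -t tokenizer.json --text "Hello world" \
  | jq '{dimension, head: .embedding[:5]}'
```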
### Benchmarking
```bash
# Benchmark inference performance
airml bench -m model.onnx -n 100 -p neural-engine

# Output:
# Mean latency: 12.34 ms
# Throughput: 81.00 inferences/sec
```
### System Info
```bash
# Check available providers
airml system

# Output:
# OS: macos
# Architecture: aarch64
# Apple Silicon: true
# Available providers: cpu, coreml
```
## CLI Reference

### `airml run`
Run inference on an input image.
```
airml run --model <MODEL> --input <INPUT> [OPTIONS]

Options:
  -m, --model <MODEL>        Path to ONNX model file
  -i, --input <INPUT>        Path to input file (image)
  -l, --labels <LABELS>      Path to labels file
  -k, --top-k <N>            Top predictions to show [default: 5]
  -p, --provider <PROVIDER>  Execution provider (auto, cpu, coreml, neural-engine)
      --preprocess <PRESET>  Preprocessing preset (imagenet, clip, yolo, none)
      --raw                  Output raw tensor values
```
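For example, a run that pins the CoreML provider and shows only the top 3 classes (model, image, and label files are placeholders):

```bash
airml run -m resnet50.onnx -i cat.jpg -l imagenet_labels.txt \
  -p coreml --preprocess imagenet -k 3
```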
### `airml embed`

Generate text embeddings (requires the `nlp` feature).
```
airml embed --model <MODEL> --tokenizer <TOKENIZER> --text <TEXT> [OPTIONS]

Options:
  -m, --model <MODEL>          ONNX embedding model
  -t, --tokenizer <TOKENIZER>  tokenizer.json file
      --text <TEXT>            Text to embed
      --max-length <N>         Max sequence length [default: 512]
  -p, --provider <PROVIDER>    Execution provider
      --output <FORMAT>        Output format (json, raw)
      --normalize              L2-normalize embeddings
```
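Combining the options, e.g. to produce L2-normalized JSON output with a shorter sequence cap (file names are placeholders):

```bash
airml embed -m sentence-transformer.onnx -t tokenizer.json \
  --text "Hello world" --max-length 128 --normalize --output json
```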
### `airml info`
Display model information.
```bash
airml info --model <MODEL> [-v]
```
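For instance, to inspect a model verbosely (the model file is a placeholder; the fields printed depend on the model):

```bash
airml info -m resnet50.onnx -v
```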
### `airml bench`
Benchmark inference performance.
```
airml bench --model <MODEL> [OPTIONS]

Options:
  -n, --iterations <N>       Iterations [default: 100]
  -w, --warmup <N>           Warmup iterations [default: 10]
  -p, --provider <PROVIDER>  Execution provider
      --shape <SHAPE>        Input shape (e.g., "1,3,224,224")
```
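`--shape` is useful for models with dynamic input dimensions; a sketch of a run that pins the shape and adds extra warmup:

```bash
airml bench -m model.onnx -n 200 -w 20 -p coreml --shape "1,3,224,224"
```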
### `airml system`
Display system capabilities.
## Execution Providers
| Provider | Platform | Hardware | Build Flag |
|---|---|---|---|
| CPU | All | Any CPU | (default) |
| CoreML | macOS | Apple Silicon | --features coreml |
| Neural Engine | macOS | M1/M2/M3 ANE | --features coreml |
```bash
# Build with specific providers
cargo build --release                        # CPU only
cargo build --release --features coreml      # + CoreML
cargo build --release --features nlp         # + NLP
cargo build --release --features coreml,nlp  # All features
```
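Features determine which providers are compiled in; the provider actually used is still chosen at runtime with `-p`. For example, the same binary can switch providers per invocation:

```bash
airml run -m resnet50.onnx -i cat.jpg -p cpu
airml run -m resnet50.onnx -i cat.jpg -p neural-engine
```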
## Performance
Benchmarked on Apple M2 with ResNet50:
| Provider | Latency | Throughput |
|---|---|---|
| CPU | ~50ms | ~20 inf/s |
| CoreML (All) | ~15ms | ~65 inf/s |
| Neural Engine | ~8ms | ~125 inf/s |
Compared with a typical Python (PyTorch) deployment:

| Metric | airML | Python (PyTorch) |
|---|---|---|
| Binary Size | ~50MB | ~2GB |
| Cold Start | 0.01-0.05s | 2-5s |
| Memory Usage | ~100MB | ~500MB+ |
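To reproduce numbers like these on your own hardware (the model file is a placeholder; results vary by machine and model):

```bash
airml bench -m resnet50.onnx -n 100 -p cpu
airml bench -m resnet50.onnx -n 100 -p neural-engine
```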
## Using as a Library
```rust
use airml_core::{InferenceEngine, SessionConfig};
use airml_preprocess::ImagePreprocessor;
use airml_providers::CoreMLProvider;

fn main() -> anyhow::Result<()> {
    // Configure with CoreML
    let providers = vec![CoreMLProvider::default().neural_engine_only().into_dispatch()];
    let config = SessionConfig::new().with_providers(providers);

    // Load model
    let mut engine = InferenceEngine::from_file_with_config("model.onnx", config)?;

    // Preprocess and run
    let input = ImagePreprocessor::imagenet().load_and_process("image.jpg")?;
    let outputs = engine.run(input.into_dyn())?;

    Ok(())
}
```
## Embedding Models in Binary
```rust
use airml_embed::EmbeddedModel;

static MODEL: &[u8] = include_bytes!("model.onnx");

fn main() -> anyhow::Result<()> {
    let engine = EmbeddedModel::new(MODEL).into_engine()?;
    // Use engine...
    Ok(())
}
```
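Note that `include_bytes!` resolves its path relative to the source file that invokes it, and the model bytes are baked into the executable, so the binary grows by roughly the model's size.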
## Project Structure
```
airML/
├── crates/
│   ├── airml-core/        # Inference engine (ONNX Runtime wrapper)
│   ├── airml-preprocess/  # Image/text preprocessing
│   ├── airml-providers/   # Execution providers (CPU, CoreML)
│   └── airml-embed/       # Model embedding utilities
├── src/                   # CLI binary
│   ├── main.rs
│   ├── cli.rs             # Argument parsing
│   └── commands/          # Command implementations
├── docs/                  # Documentation
│   ├── ARCHITECTURE.md    # Internal architecture
│   ├── TUTORIAL.md        # Step-by-step tutorials
│   └── API.md             # API reference
└── models/                # Test models (gitignored)
```
## Documentation
- [Architecture](docs/ARCHITECTURE.md) - Internal design and data flow
- [Tutorial](docs/TUTORIAL.md) - Step-by-step guides
- [API Reference](docs/API.md) - Complete API documentation
## License
MIT License - see LICENSE for details.
## Maintainer

- [rlaope](https://github.com/rlaope)
## Contributing
See CONTRIBUTING.md for guidelines.