druide67 - Overview

Same model. Same Mac. 30 vs 71 tok/s. That's why I built asiai.

🦞 I'm Jean-Marc (druide67) — I build tools for local LLM inference on Apple Silicon.

asiai: Benchmark, monitor & compare 6 inference engines (Ollama, LM Studio, mlx-lm, llama.cpp, vllm-mlx, Exo). One CLI. Real numbers.

Built because my AI agents needed to monitor their own inference. So I gave them asiai's API. They started monitoring themselves.

Bench your claw!

Recent discoveries

  • MLX is 2.3x faster than llama.cpp for MoE architectures on Apple Silicon
  • DeltaNet KV cache stays flat from 64k to 256k context (same VRAM!)
  • Same model, same Mac: 30 tok/s on one engine, 71 tok/s on another
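The tok/s figures above come down to simple arithmetic: tokens generated divided by wall-clock seconds. A minimal sketch, using the 30 vs 71 tok/s example (the token count and timings here are illustrative, not actual benchmark data):

```python
# Throughput = tokens generated / elapsed wall-clock seconds.
# Engine names and timings are placeholders chosen to reproduce
# the ~30 and ~71 tok/s figures quoted above.
def tokens_per_second(tokens: int, seconds: float) -> float:
    return tokens / seconds

engine_a = tokens_per_second(512, 17.07)  # ~30 tok/s
engine_b = tokens_per_second(512, 7.21)   # ~71 tok/s
print(f"speedup: {engine_b / engine_a:.1f}x")  # ~2.4x
```

Same prompt, same model, same machine: only the engine changes, so the ratio is a clean apples-to-apples comparison.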

claude-whisper: Your Claude Code instances can now talk to each other. 240 lines of bash, zero daemon. The filesystem is the message bus.
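The filesystem-as-message-bus idea can be sketched in a few lines of POSIX shell. This is not claude-whisper's actual code; the directory layout and the `send`/`recv` helpers are invented for illustration. The one real trick it shows: write to a temp file, then `mv` into the inbox, so a reader never sees a half-written message.

```shell
#!/bin/sh
# Minimal filesystem-as-message-bus sketch (illustrative, not claude-whisper):
# each agent has an inbox directory; sending is write-then-rename,
# receiving is read-and-delete the oldest file.
BUS=$(mktemp -d)
mkdir -p "$BUS/agent-b"

send() {  # send <to> <text>: write to a temp file, then mv (atomic on one fs)
  tmp=$(mktemp "$BUS/.msg.XXXXXX")
  printf '%s\n' "$2" > "$tmp"
  mv "$tmp" "$BUS/$1/msg-$(date +%s)-$$.msg"
}

recv() {  # recv <inbox>: print and consume the oldest message, if any
  msg=$(ls "$BUS/$1"/*.msg 2>/dev/null | head -n 1)
  [ -n "$msg" ] || return 1
  cat "$msg" && rm "$msg"
}

send agent-b "hello from agent-a"
recv agent-b   # prints: hello from agent-a
```

No daemon, no socket: delivery is durable across restarts, and `ls`-by-name gives a rough FIFO ordering.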

OpenClaw: contributor — multi-agent AI assistant.

Strasbourg, France | asiai.dev | @jmn67 on X | LinkedIn