A suite of 26 single-header C++17 libraries for integrating large language models into native applications. Each library is a self-contained .hpp file -- drop in what you need, define one implementation macro, and ship. No Python, no SDKs, no package manager required.
Start Here
Just want to call an LLM? → llm-stream
Building a chatbot? → llm-chat + llm-retry
Building RAG? → llm-rag + llm-embed + llm-rank
Need production observability? → llm-log + llm-trace + llm-cost
The Suite
Core -- foundational primitives every LLM app needs
| Library | Description | Deps |
|---|---|---|
| llm-stream | Stream OpenAI & Anthropic responses via SSE | libcurl |
| llm-retry | Retry with exponential backoff + circuit breaker | None |
| llm-cost | Token counting + cost estimation for 6 models | None |
| llm-cache | LRU response cache -- skip identical API calls | None |
| llm-format | JSON schema enforcement + structured output | None |
| llm-json | Recursive-descent JSON parser and builder | None |
Data -- move, retrieve, and reshape information
| Library | Description | Deps |
|---|---|---|
| llm-embed | Text embeddings + cosine similarity + vector store | libcurl |
| llm-rag | Retrieval-augmented generation pipeline | libcurl |
| llm-rank | BM25 + LLM passage reranking, hybrid mode | libcurl† |
| llm-compress | Context compression: truncate, sliding window, summarize | None* |
| llm-parse | Offline HTML/markdown parsing, chunking, TextStats | None |
| llm-batch | Batch processing with thread pool, rate limiting, checkpointing | libcurl |
Ops -- observe, test, and operate at scale
| Library | Description | Deps |
|---|---|---|
| llm-log | Structured JSONL logging for every LLM call | None |
| llm-trace | RAII span tracing with OTLP JSON export | None |
| llm-pool | Concurrent request pool with priority queue + rate limiting | None |
| llm-mock | Mock LLM provider for unit testing -- zero network | None |
| llm-eval | N-run evaluation + consistency scoring + model comparison | libcurl |
| llm-ab | A/B testing with Welch t-test and Cohen d | libcurl |
App -- build complete user-facing features
| Library | Description | Deps |
|---|---|---|
| llm-chat | Multi-turn conversation manager with token-budget truncation | libcurl |
| llm-agent | Tool-calling agent loop (OpenAI function calling) | libcurl |
| llm-vision | Multimodal image+text for OpenAI and Anthropic | libcurl |
| llm-template | Mustache-style prompt templating | None |
| llm-router | Route prompts to the right model by complexity | None |
| llm-guard | PII detection + prompt injection scoring -- fully offline | None |
| llm-audio | Whisper transcription, translation, and TTS | libcurl |
| llm-finetune | Fine-tuning job lifecycle: upload, create, poll, manage models | libcurl |
*llm-compress requires libcurl only for the optional Summarize strategy. †llm-rank requires libcurl only for LLM-based reranking; local BM25 mode has zero deps.
Quickstart
Libraries compose naturally. Here is a production-ready pattern using llm-log, llm-retry, and llm-stream together:
#define LLM_LOG_IMPLEMENTATION #include "llm_log.hpp" #define LLM_RETRY_IMPLEMENTATION #include "llm_retry.hpp" #define LLM_STREAM_IMPLEMENTATION #include "llm_stream.hpp" int main() { llm::Logger logger("calls.jsonl"); llm::Config cfg; cfg.api_key = std::getenv("OPENAI_API_KEY"); cfg.model = "gpt-4o-mini"; const std::string prompt = "Explain backpressure in one paragraph."; auto log_id = logger.log_request(prompt, cfg.model); auto result = llm::with_retry<std::string>([&]() -> std::string { std::string output; llm::stream_openai(prompt, cfg, [&](std::string_view tok) { std::cout << tok << std::flush; output += tok; }, [](const llm::StreamStats& s) { std::cout << "\n[" << s.token_count << " tokens, " << s.tokens_per_sec << " tok/s]\n"; } ); return output; }); logger.log_response(log_id, result); }
Another example -- guard, route, and chat together:
#define LLM_GUARD_IMPLEMENTATION #include "llm_guard.hpp" #define LLM_ROUTER_IMPLEMENTATION #include "llm_router.hpp" #define LLM_CHAT_IMPLEMENTATION #include "llm_chat.hpp" int main() { // 1. Check input for PII / injection auto guard = llm::scan(user_input); if (!guard.safe) user_input = guard.scrubbed; // 2. Route to the right model llm::RouterConfig rcfg; rcfg.strategy = llm::RoutingStrategy::Balanced; rcfg.models = {{"gpt-4o-mini", 0.15, 0.5, 0.7, 40}, {"gpt-4o", 5.0, 1.0, 0.9, 100}}; auto decision = llm::Router(rcfg).route(user_input); // 3. Send with conversation memory llm::ChatConfig ccfg; ccfg.api_key = std::getenv("OPENAI_API_KEY"); ccfg.model = decision.model_name; llm::Conversation conv(ccfg); std::cout << conv.chat(user_input) << "\n"; }
Installation
Each library is a single .hpp file. Copy what you need:
# Core [](https://star-history.com/#Mattbusel/llm-cpp) curl -O https://raw.githubusercontent.com/Mattbusel/llm-stream/main/include/llm_stream.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-retry/main/include/llm_retry.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-cost/main/include/llm_cost.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-cache/main/include/llm_cache.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-format/main/include/llm_format.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-json/main/include/llm_json.hpp # Data [](https://star-history.com/#Mattbusel/llm-cpp) curl -O https://raw.githubusercontent.com/Mattbusel/llm-embed/main/include/llm_embed.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-rag/main/include/llm_rag.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-rank/main/include/llm_rank.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-compress/main/include/llm_compress.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-parse/main/include/llm_parse.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-batch/main/include/llm_batch.hpp # Ops [](https://star-history.com/#Mattbusel/llm-cpp) curl -O https://raw.githubusercontent.com/Mattbusel/llm-log/main/include/llm_log.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-trace/main/include/llm_trace.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-pool/main/include/llm_pool.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-mock/main/include/llm_mock.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-eval/main/include/llm_eval.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-ab/main/include/llm_ab.hpp # App [](https://star-history.com/#Mattbusel/llm-cpp) curl -O https://raw.githubusercontent.com/Mattbusel/llm-chat/main/include/llm_chat.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-agent/main/include/llm_agent.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-vision/main/include/llm_vision.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-template/main/include/llm_template.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-router/main/include/llm_router.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-guard/main/include/llm_guard.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-audio/main/include/llm_audio.hpp curl -O https://raw.githubusercontent.com/Mattbusel/llm-finetune/main/include/llm_finetune.hpp
In exactly one .cpp file per library, define the implementation macro before including:
#define LLM_STREAM_IMPLEMENTATION #define LLM_RETRY_IMPLEMENTATION #define LLM_LOG_IMPLEMENTATION #include "llm_stream.hpp" #include "llm_retry.hpp" #include "llm_log.hpp"
All other translation units just #include without the macro.
Requirements
| Requirement | Detail |
|---|---|
| C++ standard | C++17 or later |
| Compiler | GCC, Clang, MSVC -- all supported |
| External deps | libcurl for network libraries (see table above). All others: zero deps. |
| Build system | Any. Works with CMake, Make, Bazel, MSVC, plain g++. |
License
All 26 libraries: MIT -- Copyright (c) 2026 Mattbusel.
Related Projects by @Mattbusel
- Special-Relativity-in-Financial-Modeling -- C++ special relativity / Lorentz transform applied to OHLCV financial modeling
- LLMTokenStreamQuantEngine -- C++ engine combining LLM token streams with quantitative trading signals
- llm-stream -- Single-header C++ streaming library for OpenAI and Anthropic APIs