Cacheable stable-diffusion.cpp (Fork for Streaming API)
Diffusion model (SD, Flux, Wan, ...) inference in pure C/C++
This repository is a fork of leejet/stable-diffusion.cpp that adds condition caching through a Streaming API. The upstream repo excels at stateless generation; this fork targets real-time video generation and high-throughput img2img streaming, where re-running a heavy text encoder (e.g., Qwen/LLM for Flux.2) on every frame becomes a major bottleneck.
By leveraging this fork's C API extensions, you can cache prompt conditions and reference images, skipping the LLM layers entirely on subsequent frames.
🚀 What's New in this Fork?
We added the Streaming API Extensions to the C API.
These functions allow you to encode text conditions and reference images exactly once, preserving them in a persistent GGML context. The cached representations can then be passed to sd_img2img_with_cond on every frame, which raises video-to-video throughput because the text encoder no longer runs per frame.
Note: sd_encode_condition requires width and height parameters to match the output resolution of the subsequent sd_img2img_with_cond() call. These values are used for positional embeddings in SDXL/SD3 architectures and are unused but still required for Flux.
- For details on the architecture and caching mechanism: Streaming API Design
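The width/height requirement in the note above is easy to violate when the stream resolution changes mid-run. One way to guard against it is to store the encoding resolution next to the cached handle and check it before each frame. This is a minimal sketch, not part of this fork's API: `sized_cond_t`, `sized_cond_make`, and `sized_cond_fits` are hypothetical names, and the `void*` field stands in for the `sd_condition_t*` returned by sd_encode_condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: a cached condition plus the resolution it was encoded for. */
typedef struct {
    void*    cond;    /* would hold the sd_condition_t* from sd_encode_condition() */
    uint32_t width;   /* width passed to sd_encode_condition() */
    uint32_t height;  /* height passed to sd_encode_condition() */
} sized_cond_t;

/* Bundle a handle with its encoding resolution. A NULL handle is allowed and
   simply never "fits" any output size. */
sized_cond_t sized_cond_make(void* cond, uint32_t width, uint32_t height) {
    assert(width > 0 && height > 0);  /* precondition: a real resolution */
    sized_cond_t s = { cond, width, height };
    return s;
}

/* True when the cached condition may be reused for an output of out_w x out_h. */
bool sized_cond_fits(const sized_cond_t* s, uint32_t out_w, uint32_t out_h) {
    return s != NULL && s->cond != NULL &&
           s->width == out_w && s->height == out_h;
}
```

When `sized_cond_fits` returns false, the caller would free the old condition and encode again at the new resolution before calling sd_img2img_with_cond.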
📚 Documentation
Detailed documentation tailored for using this repository in your own projects:
- 💻 C API & Streaming API Reference: How to integrate the library into C/C++ projects, and full usage of the Condition Caching API.
- 🐚 Command-Line Interface (CLI) Guide: A complete reference guide for the sd-cli tool.
- ⚙️ Build Guide: Instructions on how to compile the project (CMake, CUDA, Vulkan, Metal).
- ⚡ Performance Optimization: Tips for reducing VRAM and increasing generation speed.
(Note: Additional model-specific documentation from the upstream repository is available in the docs/ folder, such as flux.md, sd3.md, lora.md, etc.)
Upstream Features
This fork retains 100% compatibility with all the amazing features developed by the original stable-diffusion.cpp contributors:
- Plain C/C++ implementation based on ggml, working similarly to llama.cpp.
- Super lightweight and without external dependencies.
- Supported Models: SD1.x, SD2.x, SDXL, SD3, FLUX.1/FLUX.2, Qwen-Image, Z-Image, Wan2.1/2.2, PhotoMaker, and more.
- Supported Backends: CPU (AVX2/AVX512), CUDA, Vulkan, Metal, OpenCL, SYCL.
- Supported Formats: PyTorch checkpoints (.ckpt/.pth), Safetensors (.safetensors), GGUF (.gguf).
- Flash Attention for aggressive memory usage optimization.
- LoRA support, ControlNet, LCM, ESRGAN upscaling, and TAESD faster latent decoding.
Quick Start
1. Build from Source
Since you will likely integrate this as a backend for another project, we recommend building from source. For full instructions, see the upstream Build Guide.

    # Example: Building with Vulkan acceleration and Shared Libraries (C API)
    mkdir build && cd build
    cmake .. -DSD_VULKAN=ON -DSD_BUILD_SHARED_LIBS=ON
    cmake --build . --config Release

    # After a successful build, the CLI binary is at: build/bin/sd-cli
    # The shared library is at: build/stable-diffusion.dll (Windows) or build/libstable-diffusion.so (Linux)

2. Standard CLI Usage
Download a core model file (e.g., v1-5-pruned-emaonly.safetensors from Hugging Face).

    ./bin/sd-cli -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat"

For detailed arguments and use-cases (like img2img or LoRA), check out the CLI Guide.
3. Streaming API Quick Start (C/C++)
The key benefit of this fork is condition caching. Here is a minimal example:

    #include "stable-diffusion.h"

    // 1. Initialize context (once)
    sd_ctx_params_t ctx_params;
    sd_ctx_params_init(&ctx_params);
    ctx_params.diffusion_model_path = "flux-2-klein-4b.gguf";
    ctx_params.vae_path = "ae.safetensors";
    ctx_params.llm_path = "qwen3-4b.gguf";
    ctx_params.flash_attn = true;
    sd_ctx_t* ctx = new_sd_ctx(&ctx_params);

    // 2. Encode prompt ONCE (the expensive LLM step)
    sd_condition_t* cond = sd_encode_condition(ctx, "cinematic oil painting", "", 512, 512);

    // 3. Process each video frame cheaply (no re-encoding)
    while (streaming) {
        sd_image_t frame = get_next_frame();
        sd_image_t result = sd_img2img_with_cond(ctx, frame, cond, NULL, 0, 0.75f, 4, 1.0f, -1, NULL);
        render(result);
        free(result.data);
        free(frame.data);
    }

    // 4. Cleanup
    sd_free_condition(cond);
    free_sd_ctx(ctx);

For a full working example including reference image caching, see examples/stream_img2img/main.cpp.
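If the prompt or resolution can change while streaming, the cached condition has to be replaced at that point and reused otherwise. The following is a minimal sketch of a single-slot cache keyed on prompt and resolution. It is not part of this fork's API: `cond_cache_t`, `encode_fn`, `release_fn`, and the `fake_*` stand-ins are hypothetical names introduced for illustration. In a real integration the two callbacks would presumably wrap sd_encode_condition and sd_free_condition, with the `sd_ctx_t*` passed through `user`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks; real ones would wrap the fork's encode/free calls. */
typedef void* (*encode_fn)(const char* prompt, int width, int height, void* user);
typedef void  (*release_fn)(void* cond, void* user);

/* Single-slot cache: remembers the last encoded prompt and resolution. */
typedef struct {
    char       prompt[512];
    int        width, height;
    void*      cond;     /* NULL while the slot is empty */
    encode_fn  encode;
    release_fn release;
    void*      user;     /* passed through to the callbacks */
} cond_cache_t;

void cond_cache_init(cond_cache_t* c, encode_fn enc, release_fn rel, void* user) {
    assert(c != NULL && enc != NULL && rel != NULL);
    c->prompt[0] = '\0';
    c->width = c->height = 0;
    c->cond = NULL;
    c->encode = enc;
    c->release = rel;
    c->user = user;
}

/* Drop the cached condition, if any. */
void cond_cache_clear(cond_cache_t* c) {
    if (c->cond != NULL) {
        c->release(c->cond, c->user);
        c->cond = NULL;
    }
}

/* Return the cached condition, re-encoding only when prompt or size changed.
   Returns NULL if the prompt is missing, too long for the slot, or encoding fails. */
void* cond_cache_get(cond_cache_t* c, const char* prompt, int width, int height) {
    if (prompt == NULL || strlen(prompt) >= sizeof(c->prompt)) return NULL;
    if (c->cond != NULL && width == c->width && height == c->height &&
        strcmp(prompt, c->prompt) == 0) {
        return c->cond;  /* cache hit: no encoder call */
    }
    cond_cache_clear(c);
    c->cond = c->encode(prompt, width, height, c->user);
    if (c->cond != NULL) {
        strcpy(c->prompt, prompt);  /* length checked above */
        c->width = width;
        c->height = height;
    }
    return c->cond;
}

/* Stand-in encoder for demonstration: counts calls instead of running an LLM. */
void* fake_encode(const char* prompt, int width, int height, void* user) {
    (void)prompt; (void)width; (void)height;
    int* calls = (int*)user;
    ++*calls;
    return calls;  /* any non-NULL pointer serves as a dummy handle */
}

void fake_release(void* cond, void* user) {
    (void)cond; (void)user;  /* nothing to free for the dummy handle */
}
```

Calling `cond_cache_get` once per frame keeps the streaming loop unchanged: the encoder runs on the first frame and again only when the prompt or resolution differs from the cached key. A single slot is a deliberate simplification; a stream that alternates between a few prompts would want more slots.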
References
As this is a fork, all credits for the base architecture belong to the respective original project creators:
