Open-source content fingerprinting and provenance engine. Named after Russell Kirsch, creator of the first digital image.
What It Does
Kirsch registers digital images, proves who published them first, and detects reuse, even after transformations such as compression, resizing, or cropping. It combines perceptual hashing, neural embeddings, and a tamper-evident provenance chain into a single system.
Core Engine (kirsch-core/)
Rust. No Python in production.
| Capability | How |
|---|---|
| Fingerprinting | pHash (DCT 64-bit) + CLIP ViT-B/32 (512-dim) + SHA-256 |
| Detection | Two-tier: pHash fast filter → CLIP cosine precision |
| Provenance | Append-only SHA-256 hash chain with payload integrity |
| Storage | JSONL registry + atomic JSON chain (crash-safe) |
| Search | HNSW approximate nearest neighbor (O(log N)) |
| Anchoring | External chain-head checkpoints (tail-deletion protection) |
| Inference | ONNX Runtime, session pool, optional CUDA/TensorRT |
| API | REST server (axum), file watcher for streaming ingestion |
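One way to read the two-tier detection row is sketched below: a cheap Hamming-distance check on the 64-bit pHash discards most candidates, and cosine similarity on the embeddings confirms the rest. This is a minimal illustration, not the code in `detect.rs`. The names `Record`, `detect`, `max_bits`, and `min_cos` are hypothetical, the thresholds are made up, and 3-dimensional vectors stand in for the 512-dimensional CLIP embeddings.

```rust
/// Hamming distance between two 64-bit perceptual hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Cosine similarity between two embedding vectors of equal length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical registry entry (not the project's actual type).
struct Record {
    id: &'static str,
    phash: u64,
    embedding: Vec<f32>,
}

/// Tier 1 keeps records whose pHash is within `max_bits` of the query.
/// Tier 2 keeps those whose embedding similarity is at least `min_cos`.
fn detect(
    query_hash: u64,
    query_emb: &[f32],
    registry: &[Record],
    max_bits: u32,
    min_cos: f32,
) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|r| hamming(query_hash, r.phash) <= max_bits)
        .filter(|r| cosine(query_emb, &r.embedding) >= min_cos)
        .map(|r| r.id)
        .collect()
}

/// Toy run: one near-duplicate and one unrelated record.
fn example() -> Vec<&'static str> {
    let registry = vec![
        Record { id: "original", phash: 0xFFFF_0000_FFFF_0000, embedding: vec![1.0, 0.0, 0.0] },
        Record { id: "unrelated", phash: 0x0123_4567_89AB_CDEF, embedding: vec![0.0, 1.0, 0.0] },
    ];
    // Illustrative thresholds: at most 10 differing bits, cosine of at least 0.9.
    detect(0xFFFF_0000_FFFF_0001, &[0.9, 0.1, 0.0], &registry, 10, 0.9)
}
```

Calling `example()` returns only the `original` record: its hash differs from the query by one bit and its embedding points in nearly the same direction, while the unrelated record is rejected by the first tier.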
Quick Start

    cd kirsch-core

    # Fingerprint an image
    cargo run --release -- fingerprint --model path/to/clip.onnx --image photo.jpg

    # Detect reuse
    cargo run --release -- detect --model path/to/clip.onnx --image suspect.jpg

    # Run the API server
    cargo run --release --features server -- serve --model path/to/clip.onnx --db ./data

Feature Flags
| Flag | What it enables |
|---|---|
| `cuda` | NVIDIA GPU inference via ONNX Runtime CUDA EP |
| `tensorrt` | NVIDIA TensorRT inference |
| `server` | REST API (axum + tower-http) |
| `watcher` | Directory watcher for auto-registration |
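Cargo features like these are typically consumed in Rust source through `cfg` attributes. The sketch below shows the general mechanism only. The function names `enabled_features` and `serve` are hypothetical and are not taken from kirsch-core.

```rust
/// Lists which of the optional features this build was compiled with.
fn enabled_features() -> Vec<&'static str> {
    let mut on = Vec::new();
    if cfg!(feature = "cuda") { on.push("cuda"); }
    if cfg!(feature = "tensorrt") { on.push("tensorrt"); }
    if cfg!(feature = "server") { on.push("server"); }
    if cfg!(feature = "watcher") { on.push("watcher"); }
    on
}

/// Hypothetical entry point that only exists when `server` is enabled.
/// Without the feature, this item is not compiled at all.
#[cfg(feature = "server")]
fn serve() {
    println!("REST API would start here");
}
```

Several features can be enabled together using Cargo's standard comma-separated form, for example `--features "server,watcher"`.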
Performance
Measured on CPU (release, target-cpu=native):
- Fingerprint: 92ms (CLIP inference dominates)
- pHash distance: 2ns
- Chain verify (1K records): 519µs
- Throughput: ~11 fingerprints/sec single-session, scales linearly with pool size
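The throughput figure follows from the fingerprint latency: 1000 ms / 92 ms ≈ 10.9 fingerprints per second for one session. The helpers below reproduce that arithmetic. They are illustrative only, and the linear-scaling formula is a best case that assumes sessions do not contend for cores or memory bandwidth.

```rust
/// Ideal throughput in items per second for `sessions` parallel sessions,
/// each taking `latency_ms` per item. Assumes no contention between sessions.
fn throughput_per_sec(latency_ms: f64, sessions: u32) -> f64 {
    1000.0 / latency_ms * f64::from(sessions)
}

/// Average time per record in nanoseconds, given a total in microseconds.
fn ns_per_record(total_us: f64, records: u32) -> f64 {
    total_us * 1000.0 / f64::from(records)
}
```

With the numbers above, `throughput_per_sec(92.0, 1)` is about 10.87, and `ns_per_record(519.0, 1000)` gives 519 ns per record for chain verification.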
Project Structure

    analysis/             Market research and landscape assessment
    architecture/         System design documents
    experiments/          Validation notebooks (Python): pHash, CLIP, chain integrity
    kirsch-core/          Production Rust engine (this is the product)
      src/
        fingerprint.rs    pHash + CLIP + SHA-256 + session pool
        detect.rs         Two-tier matching engine
        chain.rs          Tamper-evident provenance chain
        ann.rs            HNSW nearest-neighbor index
        storage.rs        Persistent JSONL registry
        anchor.rs         External anchoring
        server.rs         REST API (feature-gated)
        watcher.rs        File watcher (feature-gated)
        batch.rs          Parallel batch ingestion
        preprocess.rs     CLIP image preprocessing

Design Principles
- Open-standard — C2PA Content Credentials compatible
- Developer-first — CLI, Rust library, REST API
- Cryptographically verifiable — hash chain, not just pattern matching
- Real-time — streaming detection, not batch reports
- No vendor lock-in — content-addressed, open formats
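To make the "cryptographically verifiable" principle concrete, here is a minimal sketch of an append-only hash chain with an external checkpoint. It is not the implementation in `chain.rs` or `anchor.rs`. To stay dependency-free it uses the standard library's `DefaultHasher` as a stand-in digest, which is not cryptographic, whereas Kirsch uses SHA-256. The names `Entry`, `append`, `verify`, and `contains_checkpoint` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest. NOT cryptographic: a real chain would use SHA-256.
fn digest(prev: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

/// One chain record: its payload, the previous record's hash, and its own hash.
struct Entry {
    payload: String,
    prev: u64,
    hash: u64,
}

/// Appends a record whose hash covers both its payload and the previous hash.
fn append(chain: &mut Vec<Entry>, payload: &str) {
    let prev = chain.last().map_or(0, |e| e.hash);
    chain.push(Entry { payload: payload.to_string(), prev, hash: digest(prev, payload) });
}

/// Recomputes every link; returns false if any payload or link was altered.
fn verify(chain: &[Entry]) -> bool {
    let mut prev = 0u64;
    for e in chain {
        if e.prev != prev || e.hash != digest(prev, &e.payload) {
            return false;
        }
        prev = e.hash;
    }
    true
}

/// Checks that a head hash saved outside the system still appears in the chain.
/// A chain truncated below the checkpoint still verifies internally, so this
/// external comparison is what reveals the deletion.
fn contains_checkpoint(chain: &[Entry], checkpoint: u64) -> bool {
    verify(chain) && chain.iter().any(|e| e.hash == checkpoint)
}
```

Editing any stored payload makes `verify` fail, because the recomputed digest no longer matches. Removing the most recent records leaves `verify` passing but makes `contains_checkpoint` fail for a checkpoint taken before the deletion.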
License
Apache-2.0