GitHub - lispking/kvdb: A lightweight, high-performance embedded key-value database written in Zig, featuring B-tree indexing, Write-Ahead Logging (WAL) for durability, and ACID transactions.

An experimental embedded key-value database written in Zig, built around page-based storage, sorted leaf-page indexing, WAL record logging, and a small transaction API.

Project Status

KVDB is currently best viewed as an early-stage embedded storage engine.

What works today:

persistent page-based storage with CRUD operations
multi-page B-tree insert/search/iteration support
WAL record writing with CRC32 checksums
explicit transaction APIs plus compaction and CLI tooling
basic automated tests and CI

Important current limitations:

multi-page deletes work only while each touched non-root leaf keeps at least one entry; borrow/merge/root-shrink are not implemented yet
automatic startup recovery replays committed WAL records during Database.open
rollback is implemented by reloading the database file, not by targeted undo
compression is not implemented

Features

Implemented Today

Page-Based Storage: 4KB fixed-size pages optimized for simple file I/O
Sorted Leaf-Page Indexing: Keys are kept sorted inside the current root leaf page and searched with binary search
Basic Transaction API: One active transaction per database with explicit commit() / abort() methods
WAL Record Logging: Insert/delete/commit/abort records include CRC32 checksums
In-Memory Page Cache: Reduces repeated disk reads for loaded pages
Database Compaction: Rebuilds the database into a fresh file, validates logical contents, then atomically replaces the original
Zero External Dependencies: Pure Zig implementation with no external libraries required

Data Limits

Arbitrary Binary Data: Store any byte sequence as keys and values
Maximum Key Size: 1KB per key
Maximum Value Size: 2KB per value (half page size)
64-bit Page Identifiers: On-disk structures use 64-bit page IDs

Current Constraints

Multi-Page Inserts: Root and internal-node split propagation now allow the tree to grow beyond one page
Multi-Page Deletes: Deletes work while each touched non-root leaf keeps at least one entry; deletes that would require borrow/merge/root-shrink currently return NodeEmpty
Leaf Update Repacking: Repeated overwrites now repack the target leaf so updates do not keep leaking dead payload bytes
Recovery Wiring: Database.open replays committed WAL records and clears the WAL after successful recovery
Simplified Rollback: abort() reloads the database instead of doing targeted undo

Architecture

Storage Architecture

+---------------------------------------------------+
|                  Database File                     |
+---------------------------------------------------+
|  Page 0 (Metadata)  |  Page 1+ (B-tree Pages)      |
|  - Magic number     |  - Node headers              |
|  - Version          |  - Key/value data            |
|  - Root page ID     |  - Child page references     |
|  - Reserved fields  |                              |
+---------------------------------------------------+

The metadata format now tracks both the freelist head and the highest page ID grown from the file, so freed pages can be reused before the database extends on disk.

B-Tree Structure

The on-disk page format is now wired for multi-page growth during insert, while delete remains intentionally staged until rebalancing lands.

Leaf Nodes: Store key-value pairs directly
Internal Nodes: Multi-page insert/search/iteration now route through separator keys and child pointers
Current Root Capacity: The original root still starts as a leaf page, then promotes in place to an internal root on overflow
Sorted Keys: Keys are maintained in sorted order for binary search inside each node
Space Management: Key/value data grows upward from the end of the page
Delete Limitation: Deletes that would empty a non-root leaf are rejected with NodeEmpty until borrow/merge/root-shrink support is implemented

Each B-tree node layout:

[Node Header (4 bytes)]
[KeyInfo array (64 * 8 bytes)]
[Key/Value data (grows upward from end)]

Write-Ahead Log (WAL)

The WAL currently records write operations and stores enough information for validation and future recovery work:

[WAL Record Header (11 bytes)]
  - checksum: u32      (CRC32 of record data)
  - record_type: u8    (insert/delete/commit/abort)
  - key_len: u16       (length of key)
  - value_len: u32     (length of value)
[Key data (variable)]
[Value data (variable, optional)]

Record Types:

insert: Log a key-value insertion
delete: Log a key deletion
commit: Mark transaction as committed
abort: Mark transaction as rolled back

Automatic startup recovery now replays committed WAL records during Database.open; uncommitted and aborted WAL entries are discarded. A truncated tail header is ignored as an incomplete final record, while checksum mismatches and partial record payloads fail recovery.

Page Cache

The pager maintains an in-memory cache of recently accessed pages:

Pages are loaded from disk on first access
Modified pages marked as "dirty" and flushed on commit
Cache reduces disk I/O for frequently accessed data

Download

Pre-compiled binaries are available on the Releases page.

Pre-built Binaries

Platform	Architecture	Download
Linux	x86_64	kvdb-latest-x86_64-linux.tar.gz
Linux	ARM64	kvdb-latest-aarch64-linux.tar.gz
macOS	x86_64	kvdb-latest-x86_64-macos.tar.gz
macOS	Apple Silicon	kvdb-latest-aarch64-macos.tar.gz
Windows	x86_64	kvdb-latest-x86_64-windows.zip

Quick Install

Linux/macOS:

# Download latest release
curl -LO https://github.com/lispking/kvdb/releases/latest/download/kvdb-latest-x86_64-linux.tar.gz

# Extract
tar xzvf kvdb-latest-x86_64-linux.tar.gz
cd kvdb-latest-x86_64-linux

# Run
./kvdb-cli --help

Windows (PowerShell):

# Download latest release
Invoke-WebRequest -Uri "https://github.com/lispking/kvdb/releases/latest/download/kvdb-latest-x86_64-windows.zip" -OutFile "kvdb.zip"

# Extract
Expand-Archive -Path "kvdb.zip" -DestinationPath "."
cd kvdb-latest-x86_64-windows

# Run
.\kvdb-cli.exe --help

Quick Start

Prerequisites

If building from source:

Zig 0.15.2 or later
Unix-like system (macOS, Linux) or Windows

Installation

# Clone the repository
git clone https://github.com/lispking/kvdb.git
cd kvdb

# Install git hooks (optional, for development)
zig build install-hooks

# Build the library and CLI tool
zig build

Basic Usage

const std = @import("std");
const kvdb = @import("kvdb");

pub fn main() !void {
    var gpa = std.heap.DebugAllocator(.{}).init;
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Open database (creates if not exists)
    var db = try kvdb.open(allocator, "mydb.db");
    defer db.close();

    // Store data
    try db.put("name", "Alice");
    try db.put("age", "30");

    // Retrieve data
    if (try db.get("name")) |value| {
        defer allocator.free(value);
        std.debug.print("name = {s}\n", .{value});
    }

    // Delete data
    try db.delete("age");
}

API Usage

Opening and Closing

// Open with default options
var db = try kvdb.open(allocator, "path/to/db");
defer db.close();

// Open with custom options
var db = try Database.open(allocator, "path/to/db", .{
    .enable_wal = true,
    .fsync_policy = .always,
});

Basic Operations

// Insert or update a key-value pair
try db.put("key", "value");

// Retrieve a value (returns null if not found)
if (try db.get("key")) |value| {
    defer allocator.free(value);  // Caller must free
    // use value...
}

// Check if key exists
const exists = try db.contains("key");

// Delete a key
try db.delete("key");

Current Write Behavior:

put() and delete() do not require beginTransaction()
writes update the current database handle immediately in memory
durability happens later when dirty pages are flushed, such as during commit() or close()
reads through the same handle observe the updated in-memory state immediately

Transactions

// Begin transaction
const txn = try db.beginTransaction();

try {
    // Perform operations
    try db.put("key1", "value1");
    try db.put("key2", "value2");
    
    // Commit changes
    try txn.commit();
} catch |_| {
    // Abort on error
    try txn.abort();
}

Current Transaction Semantics:

Only one active transaction at a time per database handle
beginTransaction() does not create snapshots or isolation; it only establishes an explicit commit/abort boundary for the current handle
put() and delete() still modify the current handle's in-memory pages immediately
commit() appends a commit marker, flushes the handle's dirty pages, and clears the WAL
abort() appends an abort marker, clears the WAL, and reloads the database file, discarding the handle's unflushed in-memory changes
startup recovery treats WAL records as logical batches: insert / delete records stay pending until commit, abort discards the pending batch, and trailing incomplete batches are ignored
automatic startup recovery replays committed WAL records during Database.open
a truncated final WAL header is treated as an incomplete tail, while checksum mismatches and partial record payloads fail recovery
multi-page deletes succeed only when no touched non-root leaf becomes empty; deletes that would require borrow/merge/root-shrink currently fail with NodeEmpty

Recommendation: For predictable behavior in the current engine, group related writes inside an explicit transaction and avoid mixing earlier unflushed writes with a later abort().

Iteration

var iter = try db.iterator();
var count: usize = 0;

while (iter.next()) |entry| {
    std.debug.print("{s} = {s}\n", .{ entry.key, entry.value });
    count += 1;
}

Note: Current implementation iterates in sorted order across all reachable leaf pages.

Statistics

const stats = db.stats();
std.debug.print("Pages: {d}\n", .{stats.page_count});
std.debug.print("Page size: {d} bytes\n", .{stats.page_size});
std.debug.print("Total size: {d} bytes\n", .{stats.db_size});

CLI Tool

The kvdb-cli tool provides command-line access to the database.

Usage

kvdb-cli <database-file> <command> [args...]

Commands

Command	Description	Example
`get <key>`	Get value by key	`kvdb-cli my.db get name`
`put <key> <value>`	Set key-value pair	`kvdb-cli my.db put name Bob`
`delete <key>`	Delete a key	`kvdb-cli my.db delete name`
`list`	List all entries	`kvdb-cli my.db list`
`stats`	Show database statistics	`kvdb-cli my.db stats`
`inspect`	Show metadata and structural tree summary	`kvdb-cli my.db inspect`
`export <file>`	Stream all logical entries into a portable binary dump	`kvdb-cli my.db export backup.kvdbx`
`import <file>`	Load entries from a portable binary dump in one transaction	`kvdb-cli my.db import backup.kvdbx`
`compact`	Rewrite live data into a compacted database file	`kvdb-cli my.db compact`
`verify`	Validate metadata, tree ordering, freelist, and WAL records	`kvdb-cli my.db verify`

Examples

# Create database and insert data
kvdb-cli my.db put language Zig
kvdb-cli my.db put version "0.15.0"

# Query data
kvdb-cli my.db get language
# Output: Zig

# List all entries
kvdb-cli my.db list
# Output:
# language = Zig
# version = 0.15.0
#
# Total: 2 entries

# Check statistics
kvdb-cli my.db stats
# Output:
# Database Statistics:
#   Pages: 2
#   Page Size: 4096 bytes
#   Database Size: 8192 bytes (0.01 MB)

# Verify on-disk structure
kvdb-cli my.db verify
# Output:
# Verification OK
#   Tree pages checked: 1
#   Entries checked: 2
#   WAL records checked: 0

# Inspect metadata and tree shape
kvdb-cli my.db inspect
# Output:
# Database
#   Pages: 2
#   Page Size: 4096 bytes
#   Database Size: 8192 bytes (0.01 MB)
# Metadata
#   Root Page ID: 1
#   Freelist Head: 18446744073709551615
#   Freelist Pages: 0
#   Last Page ID: 1
# B-Tree
#   Height: 1
#   Nodes: 1
#   Leaf Nodes: 1
#   Internal Nodes: 0
#   Entries: 2

# Export logical contents to a binary-safe dump file
kvdb-cli my.db export backup.kvdbx
# Output:
# Exported 2 entries

# Import the dump into a fresh database
kvdb-cli restored.db import backup.kvdbx
# Output:
# Imported 2 entries

Building and Testing

Build Commands

# Build library and CLI tool
zig build

# Build in release mode
zig build -Doptimize=ReleaseFast

# Build specific target
zig build -Dtarget=x86_64-linux-gnu

Testing

# Run all tests
zig build test

# Run with verbose output
zig test src/kvdb.zig

Benchmark Suite

# Run the benchmark suite
zig build bench

# Build the benchmark binary without running it
zig build
./zig-out/bin/benchmark

The current benchmark binary reports these deterministic workloads under both fsync_policy = .always and fsync_policy = .batch:

sequential inserts
random inserts
point lookups
scans
updates
deletes
compaction

Each run uses fixed operation counts and a fixed PRNG seed so results are comparable across changes. Comparing the paired policy rows shows the durability/latency tradeoff without changing workload shape.

FFI Example

A minimal C example lives at examples/c_api_example.c and uses the exported C API directly. A minimal Python ctypes example lives at examples/python_ctypes_example.py.

# Build the static library
zig build

# Compile the C example against the installed archive
zig cc examples/c_api_example.c zig-out/lib/libkvdb.a -o c_api_example

# Run it
./c_api_example

# Run the Python ctypes example
python3 examples/python_ctypes_example.py

The example calls kvdb_open, kvdb_put, kvdb_get, kvdb_free, kvdb_close, and kvdb_status_code. When kvdb_get returns a non-null pointer, the caller owns that buffer and must release it with kvdb_free.

Mutating C API calls return stable numeric status codes instead of a generic -1:

KVDB_STATUS_OK = 0
KVDB_STATUS_INVALID_ARGUMENT = 1
KVDB_STATUS_NOT_FOUND = 2
KVDB_STATUS_TRANSACTION_CONFLICT = 3
KVDB_STATUS_STORAGE_ERROR = 4
KVDB_STATUS_WAL_ERROR = 5
KVDB_STATUS_INTERNAL_ERROR = 255

Use kvdb_status_code(...) from C to compare against the exported enum values without hard-coding numbers in callers.

Storage Test Matrix

Area	Core invariants	Current coverage	Missing high-risk coverage
`constants.zig`	Metadata layout stays stable and fits within one page	`constants: metadata layout`	Format-compatibility checks beyond struct size
`pager.zig`	Page allocation, freelist reuse, and persisted page count survive reopen	`pager: basic operations`	Dirty-page edge cases, invalid page handling, deeper reclaim integration
`btree.zig`	Root leaf inserts, lookups, deletes, and sorted order remain correct	`btree: basic operations`	Duplicate update/delete flows, `NodeFull`, key/value size boundaries, multi-page search
`wal.zig`	WAL records append and decode correctly with record-type boundaries	`wal: basic operations`, `wal: checksum mismatch is corruption`, `wal: truncated value payload is corruption`	Invalid record type handling, broader replay-path coverage
`kvdb.zig`	End-to-end CRUD, reopen persistence, compaction validation, and explicit commit path stay correct	`kvdb: basic operations`, `kvdb: transaction commit`, `kvdb: replay committed wal on open`, `kvdb: ignore uncommitted wal on open`, `kvdb: ignore aborted wal on open`, `kvdb: replay delete on open`, `kvdb: replay mixed wal batches on open`, `kvdb: replay is idempotent across reopen`, `kvdb: ignore truncated wal tail on open`, `kvdb: checksum corruption fails recovery on open`, `kvdb: compact preserves live key-value pairs`	Abort reload behavior, broader compaction correctness, WAL-disabled mode

Storage Test Checklist

Use this checklist when changing storage code:

verify restart persistence for any path that changes on-disk state
cover duplicate insert/update/delete flows when touching write logic
cover key/value boundary sizes and invalid arguments when touching validation
cover NodeFull behavior when touching B-tree insertion or page layout
cover checksum failure and truncated-tail handling when touching WAL parsing or recovery
compare live data before/after compaction or reopen when touching file replacement, freelist reuse, or reload paths
name tests with a subsystem prefix such as test "kvdb: ..." or test "wal: ..."

Development Setup

# Install pre-commit hooks
zig build install-hooks

# Format code
zig fmt src/ examples/

# Check formatting (read-only)
zig fmt --check src/ examples/

Pre-commit Hook

The pre-commit hook automatically:

Checks code formatting
Builds the project
Runs all tests

If any check fails, the commit is blocked. To bypass (not recommended):

git commit --no-verify -m "message"

Creating Releases

To create a new release with pre-built binaries:

# Tag a new version
git tag -a v1.0.0 -m "Release version 1.0.0"

# Push the tag (triggers release workflow)
git push origin v1.0.0

The Release workflow will automatically:

Build binaries for all supported platforms (Linux x86_64/ARM64, macOS x86_64/Apple Silicon, Windows x86_64)
Create a GitHub Release with downloadable archives
Generate release notes with installation instructions

Release Artifacts:

File	Description
`kvdb-vX.Y.Z-x86_64-linux.tar.gz`	Linux x86_64 binary
`kvdb-vX.Y.Z-aarch64-linux.tar.gz`	Linux ARM64 binary
`kvdb-vX.Y.Z-x86_64-macos.tar.gz`	macOS Intel binary
`kvdb-vX.Y.Z-aarch64-macos.tar.gz`	macOS Apple Silicon binary
`kvdb-vX.Y.Z-x86_64-windows.zip`	Windows x86_64 binary

Project Structure

kvdb/
├── build.zig              # Build configuration
├── build.zig.zon          # Package manifest
├── README.md              # This file
├── .gitignore             # Git ignore patterns
├── scripts/
│   └── pre-commit         # Git pre-commit hook
├── src/
│   ├── kvdb.zig           # Main database API (public interface)
│   ├── constants.zig      # Error codes, constants, data structures
│   ├── pager.zig          # Page management and I/O
│   ├── btree.zig          # B-tree index implementation
│   ├── wal.zig            # Write-ahead logging
│   └── cli.zig            # Command-line interface
└── examples/
    └── basic.zig          # Usage example

Module Descriptions

Module	Description	Key Types
`kvdb.zig`	Public API and database management	`Database`, `Transaction`, `Options`
`constants.zig`	Constants and error definitions	`Error`, `PageId`, `MetaData`, `WalRecordType`
`pager.zig`	Page-level storage operations	`Pager`, `Page`, `CacheEntry`
`btree.zig`	B-tree index operations	`BTree`, `BTreeNode`, `Iterator`
`wal.zig`	Write-ahead logging for durability	`Wal`, `Record`, `Iterator`
`cli.zig`	Command-line tool	CLI argument parsing and commands

Configuration

Database Options

pub const Options = struct {
    /// Enable Write-Ahead Logging record writing and startup recovery.
    enable_wal: bool = true,
    /// Choose whether commit/checkpoint paths force data to stable storage.
    fsync_policy: FsyncPolicy = .always,
};

enable_wal controls whether KVDB writes WAL records and performs startup replay at all.

fsync_policy controls how aggressively KVDB asks the OS to persist durability boundaries:

.always: call file.sync() at WAL commit/checkpoint and page-flush boundaries for the strongest current durability behavior
.batch: keep the same write ordering but skip explicit sync calls, which can improve throughput while allowing recent writes to remain in OS buffers after a crash or power loss

Note: WAL changes replay semantics only when enabled; fsync_policy changes durability cost, not logical recovery rules.

Constants

Constant	Value	Description
`PAGE_SIZE`	4096	Size of each database page
`MAX_KEY_SIZE`	1024	Maximum key size in bytes
`MAX_VALUE_SIZE`	2048	Maximum value size in bytes
`MAX_KEYS`	64	Maximum entries per B-tree node
`DB_VERSION`	1	Database format version
`MAGIC`	0x4B5644425F5A4947	File format identifier ("KVDB_ZIG")

Performance Considerations

Optimization Tips

Batch Operations: Group multiple puts in a single transaction
Reuse Connections: Keep database open for multiple operations
Page Size: The current engine uses fixed 4KB pages internally
Disable WAL: Only disable WAL if you intentionally want to skip WAL record logging
Tune fsync policy: Use .always for the strongest current durability behavior and .batch when benchmark throughput matters more than surviving the most recent buffered writes

Known Limitations

Inserts and deletes currently operate on a single root leaf page
Iteration is limited to the current root page
Automatic WAL replay runs during Database.open
Rollback is implemented via full reload, not targeted undo
No compression support yet
No encryption support
Single-writer model (no concurrent transactions)

Error Handling

The database uses a comprehensive error set:

pub const Error = error{
    // Storage errors
    DiskFull, CorruptedData, InvalidPageId, PageNotFound, PageOverflow,
    // Transaction errors  
    TransactionAlreadyActive, NoActiveTransaction, TransactionConflict,
    // B-tree errors
    KeyNotFound, KeyAlreadyExists, NodeFull, NodeEmpty,
    // WAL errors
    WalCorrupted, WalReplayFailed,
    // I/O errors
    IoError, InvalidArgument, DatabaseClosed,
};

All operations return Error!T and should be handled with try or catch.

License

MIT License - See LICENSE file for details

Contributing

Contributions are welcome! Please ensure:

Code follows Zig style guidelines (zig fmt)
All tests pass (zig build test)
New features include tests
Documentation is updated

Acknowledgments

Inspired by SQLite's architecture and B-tree implementation
Built with Zig 0.15.2
Thanks to the Zig community for feedback and support

Table of Contents