An extensible ecosystem for compressed columnar data. Spans in-memory arrays, on-disk file formats, over-the-wire protocols, and integrations with query engines — all built around the latest research from the database community.
Where to start
Read & write Vortex files
Get started with Vortex in Python, Rust, or Java. Convert from Parquet, compress your data, and query it.
Use with a query engine
Integrate Vortex with DataFusion, DuckDB, Spark, Trino, or Ray for accelerated queries over compressed data.
Understand the architecture
Learn how DTypes, Arrays, Encodings, Layouts, and the Scan API fit together as building blocks.
Extend Vortex
Write your own encodings, layouts, compute functions, or extension types in Rust or Python.
Create an engine integration
Build a query engine connector or data source using the Scan API, C FFI, or C++ wrapper.
Internals
Explore the crate architecture, async runtime, session system, and integration internals. Build and benchmark locally.
Highlights
Compressed arrays: Operate directly on compressed data with encodings like FastLanes, FSST, and ALP — no decompression needed for many operations.
Extensible file format: Zero-allocation reads, FlatBuffer metadata for O(1) column access, and optional WASM decompression kernels for forward compatibility.
Query engine integration: Filter and projection pushdown through the Scan API, with native integrations for DataFusion, DuckDB, Spark, Trino, and Ray.
Language bindings: First-class support for Python (PyO3), Java (JNI + Spark/Trino connectors), and C/C++ (FFI).
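To make the "compressed arrays" idea concrete, here is a toy, dependency-free sketch of computing directly on run-length-encoded data. This is an illustration of the general technique, not the Vortex API: the class and method names are invented for this example, and Vortex's actual encodings (FastLanes, FSST, ALP) are considerably more sophisticated.

```python
# Toy illustration (NOT the Vortex API): operating on run-length-encoded
# data without ever materializing the decompressed elements.
from dataclasses import dataclass

@dataclass
class RunLengthArray:
    values: list[int]   # one value per run
    lengths: list[int]  # length of each run

    def sum(self) -> int:
        # Sum in O(number of runs), not O(number of elements).
        return sum(v * n for v, n in zip(self.values, self.lengths))

    def compare_gt(self, threshold: int) -> "RunLengthArray":
        # An element-wise comparison maps runs to runs, so the
        # result stays compressed too.
        return RunLengthArray(
            values=[int(v > threshold) for v in self.values],
            lengths=list(self.lengths),
        )

# The logical array [1, 1, 1, 7, 7, 2] encoded as three runs:
arr = RunLengthArray(values=[1, 7, 2], lengths=[3, 2, 1])
print(arr.sum())            # 19
mask = arr.compare_gt(5)
print(mask.values, mask.lengths)  # [0, 1, 0] [3, 2, 1]
```

The same principle lets a query engine evaluate filters and aggregates over compressed columns while touching far less data than a decompress-then-compute pipeline would.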