The vortex-datafusion crate integrates Vortex as a native DataFusion FileFormat, supporting
both reads and writes with filter, projection, and limit pushdown.
Setup#
Add the dependency:
[dependencies] vortex-datafusion = "<version>"
Register the Vortex format with a SessionContext:
use std::sync::Arc; use datafusion::datasource::provider::DefaultTableFactory; use datafusion::execution::SessionStateBuilder; use datafusion::prelude::SessionContext; use datafusion_common::GetExt; use object_store::memory::InMemory; use crate::VortexFormatFactory; let factory = Arc::new(VortexFormatFactory::new()); let state = SessionStateBuilder::new() .with_default_features() .with_table_factory( factory.get_ext().to_uppercase(), Arc::new(DefaultTableFactory::new()), ) .with_file_formats(vec![factory]) .build(); let ctx = SessionContext::new_with_state(state).enable_url_table();
Reading Vortex Files#
SQL#
Create an external table and query it:
ctx.sql( "CREATE EXTERNAL TABLE my_table \ (name VARCHAR NOT NULL, age INT NOT NULL) \ STORED AS vortex \ LOCATION '/demo/'", ) .await?;
Rust API#
You can also register a ListingTable directly:
let ctx = SessionContext::new(); let format = Arc::new(VortexFormat::new(session)); let table_url = ListingTableUrl::parse( filepath .to_str() .ok_or_else(|| vortex_err!("Path is not valid UTF-8"))?, )?; let config = ListingTableConfig::new(table_url) .with_listing_options( ListingOptions::new(format).with_session_config_options(ctx.state().config()), ) .infer_schema(&ctx.state()) .await?; let listing_table = Arc::new(ListingTable::try_new(config)?); ctx.register_table("vortex_tbl", listing_table as _)?;
Writing Vortex Files#
Write query results to Vortex using INSERT INTO:
ctx.sql( "INSERT INTO my_table VALUES \ ('Alice', 30), ('Bob', 25), ('Charlie', 35), ('Diana', 28)", ) .await? .collect() .await?;
Partitioned writes are supported — DataFusion automatically creates subdirectories for each partition value.
Querying#
Filters and projections are pushed down into the Vortex scan:
let result = ctx .sql("SELECT name, age FROM my_table WHERE age > 28 ORDER BY age") .await? .collect() .await?;
Pushdown Support#
The integration pushes the following operations into the Vortex scan:
Projections — only referenced columns are read and decompressed.
Filters — comparison (
=,<,>), logical (AND,OR,NOT),IN,LIKE,IS NULL, and cast expressions are evaluated during the scan. Unsupported filters fall back to DataFusion post-scan evaluation.Limits — applied at the scan level when no filter is present.
File pruning — files are eliminated without being opened based on partition values and file-level column statistics (min/max).