datafusion/datafusion-examples at main · apache/datafusion

DataFusion Examples

This crate includes end to end, highly commented examples of how to use various DataFusion APIs to help you get started.

Prerequisites

Run git submodule update --init to init test files.

Running Examples

To run an example, use the cargo run command, such as:

git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run all examples in a group
cargo run --example <group> -- all

# Run a specific example within a group
cargo run --example <group> -- <subcommand>

# Run all examples in the `dataframe` group
cargo run --example dataframe -- all

# Run a single example from the `dataframe` group
# (apply the same pattern for any other group)
cargo run --example dataframe -- dataframe

Builtin Functions Examples

Group: builtin_functions

Category: Single Process

Subcommand File Path Description
date_time builtin_functions/date_time.rs Examples of date-time related functions and queries
function_factory builtin_functions/function_factory.rs Register CREATE FUNCTION handler to implement SQL macros
regexp builtin_functions/regexp.rs Examples of using regular expression functions

Custom Data Source Examples

Group: custom_data_source

Category: Single Process

Subcommand File Path Description
adapter_serialization custom_data_source/adapter_serialization.rs Preserve custom PhysicalExprAdapter information during plan serialization using PhysicalExtensionCodec interception
csv_json_opener custom_data_source/csv_json_opener.rs Use low-level FileOpener APIs for CSV/JSON
csv_sql_streaming custom_data_source/csv_sql_streaming.rs Run a streaming SQL query against CSV data
custom_datasource custom_data_source/custom_datasource.rs Query a custom TableProvider
custom_file_casts custom_data_source/custom_file_casts.rs Implement custom casting rules
custom_file_format custom_data_source/custom_file_format.rs Write to a custom file format
default_column_values custom_data_source/default_column_values.rs Custom default values using metadata
file_stream_provider custom_data_source/file_stream_provider.rs Read/write via FileStreamProvider for streams

Data IO Examples

Group: data_io

Category: Single Process

Subcommand File Path Description
catalog data_io/catalog.rs Register tables into a custom catalog
json_shredding data_io/json_shredding.rs Implement filter rewriting for JSON shredding
parquet_adv_idx data_io/parquet_advanced_index.rs Create a secondary index across multiple parquet files
parquet_emb_idx data_io/parquet_embedded_index.rs Store a custom index inside Parquet files
parquet_enc data_io/parquet_encrypted.rs Read & write encrypted Parquet files
parquet_enc_with_kms data_io/parquet_encrypted_with_kms.rs Encrypted Parquet I/O using a KMS-backed factory
parquet_exec_visitor data_io/parquet_exec_visitor.rs Extract statistics by visiting an ExecutionPlan
parquet_idx data_io/parquet_index.rs Create a secondary index
query_http_csv data_io/query_http_csv.rs Query CSV files via HTTP
remote_catalog data_io/remote_catalog.rs Interact with a remote catalog

DataFrame Examples

Group: dataframe

Category: Single Process

Subcommand File Path Description
cache_factory dataframe/cache_factory.rs Custom lazy caching for DataFrames using CacheFactory
dataframe dataframe/dataframe.rs Query DataFrames from various sources and write output
deserialize_to_struct dataframe/deserialize_to_struct.rs Convert Arrow arrays into Rust structs

Execution Monitoring Examples

Group: execution_monitoring

Category: Single Process

Subcommand File Path Description
mem_pool_exec_plan execution_monitoring/memory_pool_execution_plan.rs Memory-aware ExecutionPlan with spilling
mem_pool_tracking execution_monitoring/memory_pool_tracking.rs Demonstrates memory tracking
tracing execution_monitoring/tracing.rs Demonstrates tracing integration

External Dependency Examples

Group: external_dependency

Category: Single Process

Subcommand File Path Description
dataframe_to_s3 external_dependency/dataframe_to_s3.rs Query DataFrames and write results to S3
query_aws_s3 external_dependency/query_aws_s3.rs Query S3-backed data using object_store

Flight Examples

Group: flight

Category: Distributed

Subcommand File Path Description
client flight/client.rs Execute SQL queries via Arrow Flight protocol
server flight/server.rs Run DataFusion server accepting FlightSQL/JDBC queries
sql_server flight/sql_server.rs Standalone SQL server for JDBC clients

Proto Examples

Group: proto

Category: Single Process

Subcommand File Path Description
composed_extension_codec proto/composed_extension_codec.rs Use multiple extension codecs for serialization/deserialization
expression_deduplication proto/expression_deduplication.rs Example of expression caching/deduplication using the codec decorator pattern

Query Planning Examples

Group: query_planning

Category: Single Process

Subcommand File Path Description
analyzer_rule query_planning/analyzer_rule.rs Custom AnalyzerRule to change query semantics
expr_api query_planning/expr_api.rs Create, execute, analyze, and coerce Exprs
optimizer_rule query_planning/optimizer_rule.rs Replace predicates via a custom OptimizerRule
parse_sql_expr query_planning/parse_sql_expr.rs Parse SQL into DataFusion Expr
plan_to_sql query_planning/plan_to_sql.rs Generate SQL from expressions or plans
planner_api query_planning/planner_api.rs APIs for logical and physical plan manipulation
pruning query_planning/pruning.rs Use pruning to skip irrelevant files
thread_pools query_planning/thread_pools.rs Configure custom thread pools for DataFusion execution

Relation Planner Examples

Group: relation_planner

Category: Single Process

Subcommand File Path Description
match_recognize relation_planner/match_recognize.rs Implement MATCH_RECOGNIZE pattern matching
pivot_unpivot relation_planner/pivot_unpivot.rs Implement PIVOT / UNPIVOT
table_sample relation_planner/table_sample.rs Implement TABLESAMPLE

SQL Ops Examples

Group: sql_ops

Category: Single Process

Subcommand File Path Description
analysis sql_ops/analysis.rs Analyze SQL queries
custom_sql_parser sql_ops/custom_sql_parser.rs Implement a custom SQL parser to extend DataFusion
frontend sql_ops/frontend.rs Build LogicalPlans from SQL
query sql_ops/query.rs Query data using SQL

UDF Examples

Group: udf

Category: Single Process

Subcommand File Path Description
adv_udaf udf/advanced_udaf.rs Advanced User Defined Aggregate Function (UDAF)
adv_udf udf/advanced_udf.rs Advanced User Defined Scalar Function (UDF)
adv_udwf udf/advanced_udwf.rs Advanced User Defined Window Function (UDWF)
async_udf udf/async_udf.rs Asynchronous User Defined Scalar Function
udaf udf/simple_udaf.rs Simple UDAF example
udf udf/simple_udf.rs Simple UDF example
udtf udf/simple_udtf.rs Simple UDTF example
udwf udf/simple_udwf.rs Simple UDWF example