DataFusion Examples
This crate includes end to end, highly commented examples of how to use various DataFusion APIs to help you get started.
Prerequisites
Run git submodule update --init to init test files.
Running Examples
To run an example, use the cargo run command, such as:
git clone https://github.com/apache/datafusion cd datafusion # Download test data git submodule update --init # Change to the examples directory cd datafusion-examples/examples # Run all examples in a group cargo run --example <group> -- all # Run a specific example within a group cargo run --example <group> -- <subcommand> # Run all examples in the `dataframe` group cargo run --example dataframe -- all # Run a single example from the `dataframe` group # (apply the same pattern for any other group) cargo run --example dataframe -- dataframe
Builtin Functions Examples
Group: builtin_functions
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| date_time | builtin_functions/date_time.rs |
Examples of date-time related functions and queries |
| function_factory | builtin_functions/function_factory.rs |
Register CREATE FUNCTION handler to implement SQL macros |
| regexp | builtin_functions/regexp.rs |
Examples of using regular expression functions |
Custom Data Source Examples
Group: custom_data_source
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| adapter_serialization | custom_data_source/adapter_serialization.rs |
Preserve custom PhysicalExprAdapter information during plan serialization using PhysicalExtensionCodec interception |
| csv_json_opener | custom_data_source/csv_json_opener.rs |
Use low-level FileOpener APIs for CSV/JSON |
| csv_sql_streaming | custom_data_source/csv_sql_streaming.rs |
Run a streaming SQL query against CSV data |
| custom_datasource | custom_data_source/custom_datasource.rs |
Query a custom TableProvider |
| custom_file_casts | custom_data_source/custom_file_casts.rs |
Implement custom casting rules |
| custom_file_format | custom_data_source/custom_file_format.rs |
Write to a custom file format |
| default_column_values | custom_data_source/default_column_values.rs |
Custom default values using metadata |
| file_stream_provider | custom_data_source/file_stream_provider.rs |
Read/write via FileStreamProvider for streams |
Data IO Examples
Group: data_io
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| catalog | data_io/catalog.rs |
Register tables into a custom catalog |
| json_shredding | data_io/json_shredding.rs |
Implement filter rewriting for JSON shredding |
| parquet_adv_idx | data_io/parquet_advanced_index.rs |
Create a secondary index across multiple parquet files |
| parquet_emb_idx | data_io/parquet_embedded_index.rs |
Store a custom index inside Parquet files |
| parquet_enc | data_io/parquet_encrypted.rs |
Read & write encrypted Parquet files |
| parquet_enc_with_kms | data_io/parquet_encrypted_with_kms.rs |
Encrypted Parquet I/O using a KMS-backed factory |
| parquet_exec_visitor | data_io/parquet_exec_visitor.rs |
Extract statistics by visiting an ExecutionPlan |
| parquet_idx | data_io/parquet_index.rs |
Create a secondary index |
| query_http_csv | data_io/query_http_csv.rs |
Query CSV files via HTTP |
| remote_catalog | data_io/remote_catalog.rs |
Interact with a remote catalog |
DataFrame Examples
Group: dataframe
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| cache_factory | dataframe/cache_factory.rs |
Custom lazy caching for DataFrames using CacheFactory |
| dataframe | dataframe/dataframe.rs |
Query DataFrames from various sources and write output |
| deserialize_to_struct | dataframe/deserialize_to_struct.rs |
Convert Arrow arrays into Rust structs |
Execution Monitoring Examples
Group: execution_monitoring
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| mem_pool_exec_plan | execution_monitoring/memory_pool_execution_plan.rs |
Memory-aware ExecutionPlan with spilling |
| mem_pool_tracking | execution_monitoring/memory_pool_tracking.rs |
Demonstrates memory tracking |
| tracing | execution_monitoring/tracing.rs |
Demonstrates tracing integration |
External Dependency Examples
Group: external_dependency
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| dataframe_to_s3 | external_dependency/dataframe_to_s3.rs |
Query DataFrames and write results to S3 |
| query_aws_s3 | external_dependency/query_aws_s3.rs |
Query S3-backed data using object_store |
Flight Examples
Group: flight
Category: Distributed
| Subcommand | File Path | Description |
|---|---|---|
| client | flight/client.rs |
Execute SQL queries via Arrow Flight protocol |
| server | flight/server.rs |
Run DataFusion server accepting FlightSQL/JDBC queries |
| sql_server | flight/sql_server.rs |
Standalone SQL server for JDBC clients |
Proto Examples
Group: proto
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| composed_extension_codec | proto/composed_extension_codec.rs |
Use multiple extension codecs for serialization/deserialization |
| expression_deduplication | proto/expression_deduplication.rs |
Example of expression caching/deduplication using the codec decorator pattern |
Query Planning Examples
Group: query_planning
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| analyzer_rule | query_planning/analyzer_rule.rs |
Custom AnalyzerRule to change query semantics |
| expr_api | query_planning/expr_api.rs |
Create, execute, analyze, and coerce Exprs |
| optimizer_rule | query_planning/optimizer_rule.rs |
Replace predicates via a custom OptimizerRule |
| parse_sql_expr | query_planning/parse_sql_expr.rs |
Parse SQL into DataFusion Expr |
| plan_to_sql | query_planning/plan_to_sql.rs |
Generate SQL from expressions or plans |
| planner_api | query_planning/planner_api.rs |
APIs for logical and physical plan manipulation |
| pruning | query_planning/pruning.rs |
Use pruning to skip irrelevant files |
| thread_pools | query_planning/thread_pools.rs |
Configure custom thread pools for DataFusion execution |
Relation Planner Examples
Group: relation_planner
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| match_recognize | relation_planner/match_recognize.rs |
Implement MATCH_RECOGNIZE pattern matching |
| pivot_unpivot | relation_planner/pivot_unpivot.rs |
Implement PIVOT / UNPIVOT |
| table_sample | relation_planner/table_sample.rs |
Implement TABLESAMPLE |
SQL Ops Examples
Group: sql_ops
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| analysis | sql_ops/analysis.rs |
Analyze SQL queries |
| custom_sql_parser | sql_ops/custom_sql_parser.rs |
Implement a custom SQL parser to extend DataFusion |
| frontend | sql_ops/frontend.rs |
Build LogicalPlans from SQL |
| query | sql_ops/query.rs |
Query data using SQL |
UDF Examples
Group: udf
Category: Single Process
| Subcommand | File Path | Description |
|---|---|---|
| adv_udaf | udf/advanced_udaf.rs |
Advanced User Defined Aggregate Function (UDAF) |
| adv_udf | udf/advanced_udf.rs |
Advanced User Defined Scalar Function (UDF) |
| adv_udwf | udf/advanced_udwf.rs |
Advanced User Defined Window Function (UDWF) |
| async_udf | udf/async_udf.rs |
Asynchronous User Defined Scalar Function |
| udaf | udf/simple_udaf.rs |
Simple UDAF example |
| udf | udf/simple_udf.rs |
Simple UDF example |
| udtf | udf/simple_udtf.rs |
Simple UDTF example |
| udwf | udf/simple_udwf.rs |
Simple UDWF example |