[metrics] Materialize all metrics tags into top level columns by mattmkim · Pull Request #6237 · quickwit-oss/quickwit

alexanderbianchi added a commit to alexanderbianchi/quickwit that referenced this pull request

@alexanderbianchi @claude

… crate)

Adds a DataFusion-based query execution layer on top of the wide-schema
parquet metrics pipeline from PR quickwit-oss#6237.

- New crate `quickwit-datafusion` with pluggable `QuickwitDataSource` trait,
  `DataFusionSessionBuilder`, `QuickwitSchemaProvider`, and distributed
  worker session setup via datafusion-distributed WorkerService
- `MetricsDataSource` implements `QuickwitDataSource` for OSS parquet splits:
  metastore-backed split discovery, object-store caching, filter pushdown
  with CAST unwrapping fix, Substrait `ReadRel` consumption
- `DataFusionService` gRPC (ExecuteSql + ExecuteSubstrait streaming) wired
  into quickwit-serve alongside the existing searcher and OTLP services
- Distributed execution: DistributedPhysicalOptimizerRule produces
  PartitionIsolatorExec tasks (not shuffles) across multiple searcher nodes
- Integration tests covering: pruning, aggregation, time range, GROUP BY,
  distributed tasks, NULL column fill for missing parquet fields, Substrait
  named-table queries, rollup from file, partial schema projection
- dev/ local cluster setup with start-cluster, ingest-metrics, query-metrics

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>