feat(bigquery,cubestore): Parquet pre-aggregation export with WIF-native URLs by KrishnaRMaddikara · Pull Request #10499 · cube-js/cube
feat(bigquery,cubestore): Parquet pre-aggregation export with WIF-native URLs

Problem 1 — CSV.gz export is slow and expensive: the BigQuery driver exports pre-aggregation data as CSV.gz. For large tables this means gigabytes of intermediate files. Parquet is 3-5x smaller and is CubeStore's native internal format.

Problem 2 — getSignedUrl() requires SA key bytes (broken on GKE WIF): getSignedUrl() requires service account key bytes to sign URLs, and WIF tokens from the metadata server cannot sign them. The pre-aggregation pipeline fails silently: BigQuery exports fine, but CubeStore gets a 403.

Problem 3 — CubeStore cannot import Parquet (issue cube-js#3051): CubeStore only accepted CSV in its external import path. It already uses parquet/arrow internally for .chunk.parquet files, but CREATE TABLE ... WITH (input_format) lacked Parquet support.

Fix:

packages/cubejs-bigquery-driver/src/BigQueryDriver.ts:
- Export format: CSV.gz -> PARQUET
- URL generation: getSignedUrl() -> gs://bucket/object (IAM-authenticated)
- Return key: csvFile -> parquetFile

packages/cubejs-cubestore-driver/src/CubeStoreDriver.ts:
- Add importParquetFile() method
- Add parquetFile branch in uploadTableWithIndexes()
- Sends: CREATE TABLE t (...) WITH (input_format = 'parquet') LOCATION 'gs://...'

rust/cubestore/cubestore/src/metastore/mod.rs:
- Add ImportFormat::Parquet variant to the enum

rust/cubestore/cubestore/src/sql/mod.rs:
- Parse input_format = 'parquet' in the WITH clause

rust/cubestore/cubestore/src/import/mod.rs:
- Dispatch ImportFormat::Parquet to do_import_parquet()
- Add do_import_parquet() using DataFusion's ParquetRecordBatchReaderBuilder
- Add arrow_array_value_to_string() helper for Arrow-to-TableValue conversion
- Fix resolve_location() to handle gs:// URLs via the GCS API with a WIF token
- Fix estimate_location_row_count() to skip fs::metadata() for remote URLs

Works with Workload Identity when combined with the GCS WIF fix in gcs.rs.
Backward compatible: Postgres/Snowflake/Redshift pre-aggregations are unaffected.

Closes cube-js#3051
Closes cube-js#9837
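The CubeStoreDriver change boils down to emitting a `CREATE TABLE ... WITH (input_format = 'parquet') LOCATION ...` statement when the upstream driver hands back Parquet URLs. A minimal sketch of that statement builder, assuming a hypothetical `buildParquetImportSql` helper and `Column` shape (the actual driver code differs):

```typescript
// Hypothetical helper: not the real CubeStoreDriver implementation, just
// an illustration of the SQL shape described in the commit message above.
interface Column {
  name: string;
  type: string; // CubeStore column type, e.g. 'int', 'varchar(255)'
}

// Builds the CREATE TABLE statement CubeStore would receive when importing
// Parquet objects addressed by IAM-authenticated gs:// URLs.
function buildParquetImportSql(
  tableName: string,
  columns: Column[],
  locations: string[],
): string {
  const columnDefs = columns.map((c) => `\`${c.name}\` ${c.type}`).join(', ');
  const locationList = locations.map((l) => `'${l}'`).join(', ');
  return (
    `CREATE TABLE ${tableName} (${columnDefs}) ` +
    `WITH (input_format = 'parquet') LOCATION ${locationList}`
  );
}

// Example: a single exported Parquet object, no signed URL involved.
const exampleSql = buildParquetImportSql(
  'dev_pre_aggregations.orders_main',
  [{ name: 'id', type: 'int' }, { name: 'status', type: 'varchar(255)' }],
  ['gs://my-bucket/export/orders_main.parquet'],
);
console.log(exampleSql);
```

Because the URL is a plain `gs://` location rather than a signed HTTPS URL, CubeStore authenticates the fetch itself with its WIF token instead of relying on a signature baked into the URL.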
…uet path

- Make csvFile optional in TableCSVData — BigQuery now returns parquetFile only
- Add parquetFile?: string[] to the TableCSVData interface
- Update isDownloadTableCSVData() to recognise parquetFile
- Delete stale export files before the BQ extract to prevent prefix collisions
- Remove exportBucketCsvEscapeSymbol from the Parquet return (a CSV-specific field)
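The interface widening above can be sketched as follows. The `csvFile` / `parquetFile` field names come from the commit; the remaining fields and the guard body are illustrative assumptions, not the actual Cube source:

```typescript
// Sketch of the widened download-result shape: csvFile becomes optional
// and parquetFile is added alongside it (other fields are illustrative).
interface TableCSVData {
  csvFile?: string[];     // now optional — BigQuery returns parquetFile only
  parquetFile?: string[]; // new: gs:// URLs of exported Parquet objects
  exportBucketCsvEscapeSymbol?: string; // CSV-specific, omitted for Parquet
}

// A type guard in the spirit of isDownloadTableCSVData(): accept a result
// carrying either the legacy csvFile key or the new parquetFile key.
function isDownloadTableCSVData(tableData: unknown): tableData is TableCSVData {
  const data = tableData as TableCSVData;
  return Boolean(
    data &&
    (Array.isArray(data.csvFile) || Array.isArray(data.parquetFile)),
  );
}
```

With this shape, a BigQuery result of `{ parquetFile: ['gs://bucket/obj.parquet'] }` passes the guard even though it carries no `csvFile` key, which is what routes it into the driver's Parquet import branch.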