feat: Add table.maintenance.compact() for full-table data file compaction by qzyu999 · Pull Request #3124 · apache/iceberg-python

added 2 commits

March 5, 2026 21:32
This introduces a simplified, whole-table compaction strategy via the
MaintenanceTable API (`table.maintenance.compact()`).

Key implementation details:
- Reads the entire table state into memory via `.to_arrow()`.
- Uses `table.overwrite()` to rewrite data, leveraging PyIceberg's
  target file bin-packing (`write.target-file-size-bytes`) natively.
- Ensures atomicity by executing within a table transaction.
- Explicitly sets `snapshot-type: replace` and `replace-operation: compaction`
  to ensure correct metadata history for downstream engines.
- Includes a guard to safely ignore compaction requests on empty tables.
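
The steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `compact_whole_table` is hypothetical, and calling `overwrite` on the transaction object is an assumption about how the overwrite and the transaction fit together. The `table` argument is assumed to behave like a PyIceberg `Table`, exposing `scan().to_arrow()` and `transaction()`.

```python
def compact_whole_table(table):
    """Rewrite every row of `table` in one transaction.

    Hypothetical helper for illustration; returns True if a rewrite happened.
    """
    # Read the full table state into memory as a PyArrow table.
    arrow_table = table.scan().to_arrow()

    # Guard: there is nothing to compact on an empty table.
    if arrow_table.num_rows == 0:
        return False

    # Snapshot summary properties named in the commit message.
    snapshot_properties = {
        "snapshot-type": "replace",
        "replace-operation": "compaction",
    }

    # The transaction commits when the `with` block exits, so readers see
    # either the old data files or the new ones, never a mix of both.
    with table.transaction() as tx:
        tx.overwrite(arrow_table, snapshot_properties=snapshot_properties)
    return True
```

Because the whole table is materialized in memory, a sketch like this only suits tables that fit in RAM. File sizing is left to the writer, which is where `write.target-file-size-bytes` comes in.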

Includes full Pytest coverage in `tests/table/test_maintenance.py`.
Closes apache#1092

kevinjqliu

…ction in test_maintenance_compact()

@qzyu999


Formats the `compact` method docstring in `iceberg-python/pyiceberg/table/maintenance.py` so that the summary line stays on one line and ends with a period, satisfying pydocstyle rules D205 and D400.
Replaces the use of `.overwrite()` in `MaintenanceTable.compact()` with a new `.replace()` API backed by a `_RewriteFiles` producer. Compaction now generates an `Operation.REPLACE` snapshot instead of `Operation.OVERWRITE`, which tells downstream consumers that the table's logical contents are unchanged.
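
To illustrate why the snapshot operation matters, here is a small sketch of how a downstream incremental reader might treat the two operations. It is an assumption about consumer behavior, not code from this PR. The `Operation` enum below is a local stand-in using the operation names from the Iceberg table spec, and `snapshots_to_process` is a hypothetical helper.

```python
from enum import Enum


class Operation(Enum):
    """Local stand-in for the snapshot operations in the Iceberg spec."""

    APPEND = "append"
    REPLACE = "replace"
    OVERWRITE = "overwrite"
    DELETE = "delete"


def changes_logical_data(operation: Operation) -> bool:
    # A replace snapshot rewrites files without changing the rows they hold,
    # so an incremental consumer has nothing new to read from it.
    return operation is not Operation.REPLACE


def snapshots_to_process(history):
    """Return the ids of snapshots a row-level consumer still needs to read.

    `history` is an iterable of (snapshot_id, Operation) pairs.
    """
    return [sid for sid, op in history if changes_logical_data(op)]
```

With a consumer like this, a compaction recorded as `overwrite` would be reprocessed as if rows had changed, while one recorded as `replace` would be skipped.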

Fixes apache#1092

kevinjqliu
