feat: Add table.maintenance.compact() for full-table data file compaction by qzyu999 · Pull Request #3124 · apache/iceberg-python
added 2 commits
March 5, 2026 21:32

This introduces a simplified, whole-table compaction strategy via the MaintenanceTable API (`table.maintenance.compact()`).

Key implementation details:
- Reads the entire table state into memory via `.to_arrow()`.
- Uses `table.overwrite()` to rewrite data, leveraging PyIceberg's target file bin-packing (`write.target-file-size-bytes`) natively.
- Ensures atomicity by executing within a table transaction.
- Explicitly sets `snapshot-type: replace` and `replace-operation: compaction` to ensure correct metadata history for downstream engines.
- Includes a guard to safely ignore compaction requests on empty tables.

Includes full Pytest coverage in `tests/table/test_maintenance.py`.

Closes apache#1092
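For reviewers who want to try the new entry point, here is a minimal sketch of a caller. It assumes `compact()` takes no arguments, as the commit message suggests. The helper name `compact_if_fragmented` and the `min_files` threshold are hypothetical and not part of this PR. The helper relies only on `table.scan().plan_files()` and `table.maintenance.compact()`, so any object exposing those two calls will work.

```python
def compact_if_fragmented(table, min_files=2):
    """Compact `table` only when a full scan plans at least `min_files` tasks.

    Hypothetical helper for illustration; `table` is expected to behave like
    a PyIceberg Table that exposes `table.maintenance.compact()`.
    """
    # Count the scan tasks planned for the current snapshot
    # (roughly one task per data file).
    file_count = sum(1 for _ in table.scan().plan_files())

    # Nothing to gain from rewriting an empty or single-file table.
    if file_count < min_files:
        return False

    # Assumption: compact() is callable without arguments, per the PR text.
    table.maintenance.compact()
    return True
```

With a real catalog this might be called as `compact_if_fragmented(catalog.load_table("db.events"))`, where the table name is a placeholder. Because the current strategy reads the whole table into memory via `.to_arrow()`, a guard like this is only sensible for tables that fit comfortably in RAM.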
Formats the `compact` method docstring in `iceberg-python/pyiceberg/table/maintenance.py` so that the summary line does not wrap and ends with a period, satisfying pydocstyle rules D205 and D400.
Replaces the use of .overwrite() in MaintenanceTable.compact() with a new .replace() API backed by a _RewriteFiles producer. This ensures compaction now generates an Operation.REPLACE snapshot instead of Operation.OVERWRITE, preserving logical table state for downstream consumers. Fixes apache#1092
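To illustrate why the snapshot operation matters to downstream consumers, here is a small sketch under simplifying assumptions. Snapshots are modelled as plain dicts with an `operation` string rather than PyIceberg snapshot objects, and the function name `logically_changed` is hypothetical. The Iceberg spec describes `replace` as adding and removing files without changing table data, so an incremental reader can skip such snapshots, whereas it would have to treat an `overwrite` as a potential data change.

```python
def logically_changed(snapshots):
    """Return only the snapshots that may have changed table data.

    Each snapshot is a simplified dict such as
    {"snapshot_id": 2, "operation": "replace"}; this is an illustrative
    stand-in, not the PyIceberg snapshot model.
    """
    # "replace" rewrites files (e.g. compaction) but keeps the rows the same,
    # so an incremental consumer has nothing new to process.
    return [s for s in snapshots if s["operation"] != "replace"]


history = [
    {"snapshot_id": 1, "operation": "append"},
    {"snapshot_id": 2, "operation": "replace"},    # compaction via .replace()
    {"snapshot_id": 3, "operation": "overwrite"},  # compaction via .overwrite()
]

# Only the compaction committed as "replace" is filtered out.
to_process = logically_changed(history)
```

In this toy history, the compaction committed as `overwrite` (snapshot 3) would still be handed to the consumer, which is the behaviour the switch to `.replace()` is meant to avoid.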