fix(transaction): preserve delete-only manifests in fast_append#2149
Open
drbothen wants to merge 1 commit intoapache:mainfrom
Open
fix(transaction): preserve delete-only manifests in fast_append#2149drbothen wants to merge 1 commit intoapache:mainfrom
drbothen wants to merge 1 commit intoapache:mainfrom
Conversation
FastAppendAction::existing_manifest() filters which manifests from the current snapshot are carried forward to the new snapshot. The filter only checked has_added_files() and has_existing_files(), which drops manifests that contain only Deleted entries. After a rewrite_files operation, a delete-only manifest records which file paths were removed. If a subsequent fast_append drops this manifest, the deleted files reappear as alive in the new snapshot because the old manifests still carry their Added entries with no corresponding Delete entries to exclude them. This causes data duplication that compounds with each subsequent operation. Add has_deleted_files() to the filter so delete-only manifests survive across snapshot boundaries until expire_snapshots cleans them up.
big-mac-slice
pushed a commit
to perpetualsystems/iceberg-rust
that referenced
this pull request
Mar 6, 2026
Implements Operation::Replace so small files can be atomically swapped for fewer, larger ones without loading the entire table into memory. Three interconnected changes: 1. Fix snapshot_summary.rs: add Operation::Replace to the allowlist in update_snapshot_summaries(). Without this the commit fails at runtime before any manifest is written. 2. Fix snapshot.rs: wire the delete_entries() TODO. The return value was silently discarded; now deleted entries are written into a manifest via add_delete_entry() and included in the manifest list. Also fix a latent bug in summary() where previous_snapshot was looked up by the new (not-yet-existing) snapshot ID — it now uses current_snapshot() correctly. Add removed_data_files tracking to SnapshotProducer so deleted file counts propagate into the snapshot summary. 3. Add replace_data_files.rs: ReplaceDataFilesAction and ReplaceOperation modeled on FastAppendAction. Applies the PR apache#2149 fix (include delete-only manifests via has_deleted_files(). Four unit tests cover validation errors, full-table replacement with DELETED entry verification, and partial replacement where surviving files are preserved. All 1050 existing tests continue to pass. EOF )
big-mac-slice
pushed a commit
to perpetualsystems/iceberg-rust
that referenced
this pull request
Mar 6, 2026
Adds ReplaceDataFilesAction to Transaction for atomic replace operations (compaction/rewrite). Implements Operation::Replace end-to-end: snapshot summary accounting, manifest writing, and transaction action. Ports and extends the approach from apache/iceberg-rust PR apache#2106, incorporating the delete-manifest bug fix from PR apache#2149. Changes: - crates/iceberg/src/transaction/replace_data_files.rs (new) - ReplaceDataFilesAction with delete_files(), add_files(), set_commit_uuid(), set_key_metadata(), set_snapshot_properties() builders - Validates no duplicate paths in files_to_delete - Calls validate_duplicate_files() to reject files already alive in snapshot - Validates all files_to_delete paths exist in current snapshot (phantom delete protection) - ReplaceOperation implements SnapshotProduceOperation: scans existing manifests to build Deleted entries; per-manifest filtering in existing_manifest() preserves unaffected manifests - delete_files() doc comment describes manifest-granularity constraint - Comment on existing_manifest() documents O(2N) double scan limitation - crates/iceberg/src/transaction/snapshot.rs - Added removed_data_files field and with_removed_data_files() builder - Implemented write_delete_manifest() using add_delete_entry() - Fixed latent bug: summary() looked up new snapshot ID (not yet in metadata) to find previous snapshot, causing subtract-overflow panics - Fixed stale error message ("fast append" -> neutral wording) - Added comment explaining ManifestContentType::Data in write_delete_manifest - crates/iceberg/src/spec/snapshot_summary.rs - Added Operation::Replace to update_snapshot_summaries() allowlist - Fixed unwrap() -> unwrap_or(0) on property string parsing - Fixed u64 subtraction -> saturating_sub() to prevent underflow - Removed stale #[allow(dead_code)] from all four helpers - crates/iceberg/src/spec/manifest/writer.rs - Removed #[allow(dead_code)] from add_delete_entry() - crates/iceberg/src/transaction/mod.rs - Wired replace_data_files() method on Transaction - Added apply_updates_to_table() test helper - .gitignore: added .private/ Tests: 1052 pass (cargo test -p iceberg --lib)
big-mac-slice
pushed a commit
to perpetualsystems/iceberg-rust
that referenced
this pull request
Mar 6, 2026
- fix(append): carry forward delete-only manifests after Replace (port of apache/iceberg-rust PR apache#2149): add || entry.has_deleted_files() to FastAppendOperation::existing_manifest() so delete manifests created by Replace are not silently dropped by the next FastAppend - feat(replace): add validate_from_snapshot(snapshot_id) builder to ReplaceDataFilesAction for concurrent compaction workflows; validates files_to_delete against a historical snapshot instead of current, avoiding false rejections when another writer commits between planning and committing (port of apache/iceberg-rust PR apache#2106) - feat(replace): add data_sequence_number(seq_num) builder to ReplaceDataFilesAction and SnapshotProducer; threads an explicit sequence number into added manifest entries so compacted files can retain the original sequence number when equality deletes are present (port of apache/iceberg-rust PR apache#2106) - test: four new tests covering all three changes (1056 total, all pass)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
FastAppendAction::existing_manifest()to carry forward delete-only manifestsfast_append, causing previously deleted files to reappear as live dataCloses #2148
Details
The
existing_manifest()filter checkedhas_added_files() || has_existing_files()but nothas_deleted_files(). A delete-only manifest (created by operations likerewrite_filesto record file removals) would be silently dropped on the nextfast_append, breaking snapshot isolation and causing compounding data duplication.The fix adds
|| entry.has_deleted_files()so delete-only manifests persist untilexpire_snapshotscleans them up, consistent with the Iceberg spec's requirement that delete tracking survives across snapshot boundaries.Test plan
fast_appendtests pass (cargo test -p iceberg --lib -- append)cargo test -p iceberg --lib— 1,211 tests)rewrite_files(planned for that PR)