Fix data loss when inserting duplicate values during a migration by ggilder · Pull Request #1633 · github/gh-ost
…igration indexes
gh-ost could silently lose data when adding a unique index to a column during
a migration, even with the PanicOnWarnings flag enabled. This occurred when:
1. A migration adds a unique index to a column (e.g., email)
2. Rows with duplicate values are inserted into the original table after the
bulk copy phase completes (during postponed cutover)
3. These duplicate rows are applied via binlog replay to the ghost table
**Expected behavior:** Migration fails with clear error
**Actual behavior:** Original rows with duplicate values silently
deleted, data lost
Original table: id PRIMARY KEY, email (no unique constraint)
Ghost table: id PRIMARY KEY, email UNIQUE (being added)
Initial state (after bulk copy):
- Ghost table: (id=1, email='bob@example.com')
(id=2, email='alice@example.com')
During postponed cutover:
- INSERT (id=3, email='bob@example.com') into original table
Binlog replay attempts:
- INSERT (id=3, email='bob@example.com') into ghost table
- Duplicate email='bob@example.com' (conflicts with id=1)
- Row with id=1 silently deleted → DATA LOSS
The DMLInsertQueryBuilder used `REPLACE` statements for binlog event replay:
```sql
REPLACE INTO ghost_table (id, email) VALUES (3, 'bob@example.com');
REPLACE behavior:
- If duplicate PRIMARY KEY: deletes old row, inserts new row (no warning/error)
- If duplicate on OTHER unique index: deletes conflicting row, inserts new row
- NO warnings or errors generated, so PanicOnWarnings cannot detect the issue
The original design assumed REPLACE was needed to handle timing edge cases
where binlog replay might process a row before bulk copy, but this caused
silent data corruption when other unique indexes had duplicates.
Changed DMLInsertQueryBuilder to use INSERT IGNORE instead of REPLACE:
INSERT IGNORE INTO ghost_table (id, email) VALUES (3, 'bob@example.com');
INSERT IGNORE behavior:
- If duplicate on ANY unique index: skip insert, generate WARNING
- Does not delete existing rows
Added warning detection to ApplyDMLEventQueries() when PanicOnWarnings is
enabled:
- Checks SHOW WARNINGS after batch execution
- Ignores duplicates on migration unique key (expected - row already copied)
- FAILS migration for duplicates on other unique indexes
- Transaction rollback ensures no partial state
Edge Case: DELETE+INSERT Conversion
When an UPDATE modifies the migration unique key (e.g., PRIMARY KEY), gh-ost
converts it to DELETE+INSERT within a single transaction:
BEGIN;
DELETE FROM ghost WHERE id=2;
INSERT IGNORE INTO ghost VALUES (3, 'bob@example.com');
COMMIT;
If the INSERT encounters a duplicate on a non-migration unique index:
- With PanicOnWarnings: Warning detected, transaction rolled back, both
DELETE and INSERT undone → no data loss ✓
- Without PanicOnWarnings: DELETE succeeds, INSERT silently skips →
data loss. This further reinforces that PanicOnWarnings should default
to on.