Resume from checkpoint by meiji163 · Pull Request #1595 · github/gh-ost
Description
This PR introduces a checkpoint mechanism that can be used to resume a migration. In combination with `--gtid`, this allows the user to resume the migration using a different replica. If using file-based coordinates, the migration must be resumed using the same replica. This is a continuation of @shlomi-noach's POC in #343. Closes #205
Usage: run gh-ost normally with the `--checkpoint` flag. If the migration is interrupted/killed, restart gh-ost with the same arguments plus the additional `--resume` flag. By default a checkpoint is taken every 300 seconds, but the interval can be configured with `--checkpoint-seconds`. Also see doc/resume.md.
In case this PR introduced Go code changes:
- contributed code is using same conventions as original code
- `script/cibuild` returns with no formatting errors, build errors or unit test errors
Details
The two main operations of gh-ost are applying DML events from the binlog and copying rows to the ghost table.
A checkpoint saves the state of both:
- the binlog coordinates of the transaction last applied to the gh-ost table (`LastTrxCoords`)
- the range last copied to the gh-ost table (`IterationRangeMin` and `IterationRangeMax`)
It is safe to resume the migration from this state because:
- DML event application is idempotent at the row level. If the binlog streamer resumes at coordinates smaller than or equal to the coordinates last processed by the applier, the final values should be the same even if some DML events are applied twice.
- Copying a row is also idempotent, since a second `INSERT` will fail with a duplicate-key error. The DML applier will then bring the row up to date as usual.
To store the checkpoint we use a new `_ghk` table, which looks like:

```sql
CREATE TABLE _${original_tablename}_ghk (
  `gh_ost_chk_id` bigint auto_increment primary key,
  `gh_ost_chk_timestamp` bigint,
  `gh_ost_chk_coords` varchar(4096),
  `gh_ost_chk_iteration` bigint,
  `gh_ost_rows_copied` bigint,
  `gh_ost_dml_applied` bigint,
  `c1_min`, `c2_min`, ..., `cn_min`,
  `c1_max`, `c2_max`, ..., `cn_max`
);
```
where `(c1_min, c2_min, ..., cn_min)` and `(c1_max, ..., cn_max)` are created with the same types as the shared unique key `(c1, c2, ..., cn)` used by gh-ost.
Testing
Replica Tests
I tested resuming with `--test-on-replica` under a synthetic sysbench OLTP write load of ~2k DML/sec. I created a sysbench table with 300M rows and ran a no-op migration with `--gtid` and `--checkpoint`, set to time out after 10min. 10 seconds after the migration timed out, I started a new gh-ost process with `--resume`. When the migration finished, the ghost and original tables were checksummed, revealing no data discrepancy. ✅
I repeated this test using an initial timeout of 20min and a waiting period of 1hr before resuming. The data integrity check also passed. In addition, the test passed when run on two testing replicas in production clusters.
Switching Replicas
I tested resuming gh-ost using a different replica than the one it was originally attached to:
- Using the same 300M test table and sysbench write load, I started the migration:
  `gh-ost --alter='add index k_2 (c)' --host='replica1' --gtid --checkpoint`
- After 10min, I killed the migration.
- After waiting 5min, I resumed the migration using a second replica:
  `gh-ost --alter='add index k_2 (c)' --host='replica2' --gtid --checkpoint --resume`
- After a few minutes, I killed the sysbench write load (so no DML happens after cutover).
- Once migration completed, I checksummed the original and ghost table to verify data integrity. ✅
Failover Test
Using the same setup, I tested resuming a migration after a master failover triggered by orchestrator. The failover killed the migration, and I resumed it using the same replica. ✅