fix(jobsdb): handle database connection errors in migration with retry logic (#4535) by deepshekhardas · Pull Request #6782 · rudderlabs/rudder-server

@deepshekhardas


Fixes #4535

Problem

The backupDSLoop encounters 'driver: bad connection' errors when trying to mark the end of backup operations. This happens when the database connection is lost during long-running migration operations.

Solution

  • Add isRetryableError() function to detect retryable connection errors
  • Add retryWithBackoff() function with exponential backoff (delays of 1s, 2s, 4s, capped at 30s)
  • Wrap journalMarkDoneInTx with retry logic (up to 3 retries)

Changes

  • Added helper functions for retryable error detection
  • Applied retry logic to journal marking operations in migration

@github-actions


github-actions bot commented

Apr 9, 2026

This PR is considered to be stale. It has been open for 20 days with no further activity, so it is going to be closed in 7 days. To avoid this, please consider removing the stale label manually or adding a comment to the PR.

@github-actions github-actions bot added the Stale label

Apr 9, 2026

@lokey lokey closed this

Apr 10, 2026


Development

Successfully merging this pull request may close these issues.

backupDSLoop exception: mark end of backup operation: driver: bad connection; driver: bad connection

2 participants

@deepshekhardas @lokey