fix: retry redis conn in the bg and fail open otherwise by Flo4604 · Pull Request #5377 · unkeyed/unkey
What does this PR do?
Adds resilient middleware engine handling to prevent service failures when Redis is unavailable. The middleware engine now fails closed with a 503 Service Unavailable response when Redis connectivity is lost, and automatically retries connection in the background with exponential backoff.
Introduces a new ResilientEvaluator wrapper that atomically swaps between unavailable and working engine states. When Redis is configured but fails to connect, the service returns 503 errors instead of crashing, and continues attempting to reconnect until successful.
Adds a new error code EngineUnavailable with appropriate HTTP status mapping and Prometheus metrics for monitoring engine unavailability events.
Fixes #5365
Type of change
- Enhancement (small improvements)
- Bug fix (non-breaking change which fixes an issue)
- Chore (refactoring code, technical debt, workflow improvements)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update
How should this be tested?
- Start sentinel service with Redis URL configured but Redis server unavailable
- Verify requests return 503 with "middleware engine temporarily unavailable" message
- Start Redis server and verify engine automatically recovers
- Monitor the `sentinel_engine_unavailable_total` Prometheus metric during unavailability
- Test with empty Redis URL to ensure pass-through mode still works
Checklist
Required
- Filled out the "How to test" section in this PR
- Read Contributing Guide
- Self-reviewed my own code
- Commented on my code in hard-to-understand areas
- Ran `pnpm build`
- Ran `pnpm fmt`
- Ran `make fmt` on `/go` directory
- Checked for warnings, there are none
- Removed all `console.logs`
- Merged the latest changes from main onto my branch with `git pull origin main`
- My changes don't cause any responsiveness issues
Appreciated
- If a UI change was made: Added a screen recording or screenshots to this PR
- Updated the Unkey Docs if changes were necessary