fix: prevent replicas from restoring old-timeline WAL segments by IgorOhrimenko · Pull Request #10394 · cloudnative-pg/cloudnative-pg

Mar 31, 2026

After a switchover or failover, the WAL archive still contains segments
from previous timelines. When a replica restarts with existing PVC data,
its restore_command can fetch these old-timeline WAL segments from the
archive, causing the replica's timeline history to diverge from the
current primary.

This results in either:
- CrashLoopBackOff: "requested timeline N is not a child of this
  server's history"
- Replica stuck in "Standby (file based)" mode, unable to start
  streaming replication

The existing validateTimelineHistoryFile only checks .history files.
This commit adds validateWALSegmentTimeline, which rejects regular WAL
segments whose timeline is older than the cluster's current timeline.
The check applies to replicas in established clusters.
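A minimal sketch of the idea, not the code in this PR: PostgreSQL WAL
segment file names are 24 hexadecimal digits, and the first 8 encode the
timeline ID. The helper names below (parseWALTimeline,
rejectOldTimelineSegment) and the error text are hypothetical and chosen
only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// isHex reports whether s consists only of hexadecimal digits.
func isHex(s string) bool {
	for _, c := range s {
		switch {
		case c >= '0' && c <= '9', c >= 'a' && c <= 'f', c >= 'A' && c <= 'F':
		default:
			return false
		}
	}
	return true
}

// parseWALTimeline extracts the timeline ID from a name that starts with
// the 24 hex digits of a WAL segment (timeline, log, segment: 8 digits each).
// Hypothetical helper, not part of the CloudNativePG API.
func parseWALTimeline(walName string) (uint32, error) {
	if len(walName) < 24 || !isHex(walName[:24]) {
		return 0, fmt.Errorf("%q is not a WAL segment name", walName)
	}
	tli, err := strconv.ParseUint(walName[:8], 16, 32)
	if err != nil {
		return 0, err
	}
	return uint32(tli), nil
}

// rejectOldTimelineSegment returns an error when walName is a regular WAL
// segment from a timeline older than clusterTimeline. Names that are not
// regular segments (for example .history files) are left to other checks.
func rejectOldTimelineSegment(walName string, clusterTimeline uint32) error {
	tli, err := parseWALTimeline(walName)
	if err != nil {
		return nil // assumption: non-segment files are validated elsewhere
	}
	if tli < clusterTimeline {
		return fmt.Errorf("segment %s is on timeline %d, cluster is on timeline %d",
			walName, tli, clusterTimeline)
	}
	return nil
}

func main() {
	// A timeline-1 segment requested by a replica of a timeline-2 cluster.
	fmt.Println(rejectOldTimelineSegment("000000010000000000000003", 2))
	// A segment on the current timeline passes the check.
	fmt.Println(rejectOldTimelineSegment("000000020000000000000003", 2))
}
```

Returning nil for names that do not look like segments keeps this check
independent of the .history validation described above.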

The check is skipped when CurrentPrimary is not set (during bootstrap
or PITR), so that WAL can still be fetched from any timeline.
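
The skip condition can be sketched as a small guard. This is an
illustration under stated assumptions, not the PR's implementation: the
clusterState type, its PodName field, and the function name are
hypothetical, and comparing the pod name with CurrentPrimary is only one
assumed way to tell a replica from the primary.

```go
package main

import "fmt"

// clusterState is a hypothetical, trimmed-down view of what the restore
// path would need to know; it is not a CloudNativePG type.
type clusterState struct {
	CurrentPrimary string // empty during bootstrap or PITR
	PodName        string // the instance running restore_command
}

// shouldCheckSegmentTimeline reports whether the old-timeline check applies.
func shouldCheckSegmentTimeline(s clusterState) bool {
	if s.CurrentPrimary == "" {
		// No primary elected yet: recovery may need WAL from any timeline.
		return false
	}
	// Assumption: only replicas of an established cluster are restricted.
	return s.PodName != s.CurrentPrimary
}

func main() {
	fmt.Println(shouldCheckSegmentTimeline(clusterState{PodName: "pg-1"}))
	fmt.Println(shouldCheckSegmentTimeline(clusterState{CurrentPrimary: "pg-1", PodName: "pg-2"}))
}
```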

Closes: cloudnative-pg#4990
Signed-off-by: Igor Ohrimenko <igor.ohrimenko@travelata.ru>