fix(cluster): strip stale primary label during failover (#10403) by armru · Pull Request #10409 · cloudnative-pg/cloudnative-pg


@armru

When the operator initiates a failover, the old primary's pod keeps
its cnpg.io/instanceRole=primary label until ReconcileMetadata runs.
However, ReconcileMetadata is skipped for the entire failover window
(the CurrentPrimary != TargetPrimary guard returns early), so the -rw
service keeps routing traffic to the old primary. If the old primary
comes back (e.g. after a transient network partition), replicas
reconnect through the -rw service and satisfy the synchronous
replication quorum, and writes committed on the stale primary are
later discarded by pg_rewind.
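The failing guard can be modeled in miniature (function and parameter names here are hypothetical, not the operator's actual identifiers): metadata reconciliation is skipped whenever the recorded current primary and the failover target disagree, which is exactly the window in which the stale label needs fixing.

```go
package main

import "fmt"

// skipMetadataReconcile is a simplified model of the guard described
// above: while a failover is in flight (CurrentPrimary differs from
// TargetPrimary), metadata reconciliation returns early, so pod
// labels are not corrected.
func skipMetadataReconcile(currentPrimary, targetPrimary string) bool {
	return currentPrimary != targetPrimary
}

func main() {
	// Steady state: labels get reconciled.
	fmt.Println(skipMetadataReconcile("pg-1", "pg-1")) // false
	// Failover window: reconciliation is skipped, so the stale
	// cnpg.io/instanceRole=primary label survives on the old primary.
	fmt.Println(skipMetadataReconcile("pg-1", "pg-2")) // true
}
```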

Strip the primary label from the old primary's pod immediately when
the failover is initiated. As a second layer of defense, also call
ReconcileMetadata during dual-primary detection so the labels are
corrected even if the initial patch did not fire.
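The label-strip step reduces to removing one entry from the pod's label map; in the operator this would be issued as a patch against the old primary's Pod object. A stdlib-only sketch of that logic (the helper name is hypothetical; the label key and value are the real ones from the description above):

```go
package main

import "fmt"

const instanceRoleLabel = "cnpg.io/instanceRole"

// stripPrimaryLabel removes the primary role label from a pod's label
// map if present, reporting whether anything changed. A real
// implementation would apply this mutation via a patch to the Pod so
// the -rw service selector stops matching the old primary.
func stripPrimaryLabel(labels map[string]string) bool {
	if labels[instanceRoleLabel] == "primary" {
		delete(labels, instanceRoleLabel)
		return true
	}
	return false
}

func main() {
	labels := map[string]string{instanceRoleLabel: "primary", "app": "pg"}
	fmt.Println(stripPrimaryLabel(labels)) // true: label was stripped
	fmt.Println(labels)                    // only the "app" label remains
}
```

Returning a changed flag lets the caller skip the API patch entirely when the label is already absent, keeping the operation idempotent across reconcile loops.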

Fixes: #10403
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>