Setting the Flow Control MaxOutstandingElementCount causes stuck messages

The behaviour was observed in the 0.21.1-beta version of the Java client library.
It seems resolved in the 0.23.1-beta version; however, that newer version implements bi-directional streaming pull and the previous one doesn't.

Test Setup:

  • 100k messages published to a Topic
  • Single Subscription
  • n Subscribers (k8s running on GKE, each pod running on a separate node), n ∈ Z, 1 ≤ n ≤ 4
  • Each node had 2 vCPU and each Subscriber provided 10 user threads to the client library
  • All Subscribers running the same code (Parse message, build various classes, callback processing takes ~20ms)
  • Flow control settings: MaxOutstandingElementCount: 1000

Results:

  • m messages remain stuck on the Topic, m ∈ [223, 6652]

Observations:

  • When messages are stuck: mod_ack_deadline_request_count > 0, Pull request count > 0
  • When messages are stuck, the Subscriber callback threads are in the Waiting state
  • No stuck messages when retesting with default Flow Control Settings (i.e. MaxOutstandingElementCount = null)
  • Experiments with MaxOutstandingElementCount = 1000, 500, 250, 125 resulted in stuck message counts of ~6K, ~3K, ~1.7K, ~6K respectively

Interpretation:

Message ACK deadlines are being continuously extended, but the callback threads are not being invoked.
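
To make the interpretation concrete, here is a minimal sketch of how a flow-control limit could starve callbacks. It assumes the limit behaves like a counting semaphore whose permits are returned when a callback completes. This is a hypothetical model, not the client library's code: FlowControlModel, tryDeliver, complete and leakPermit are names invented for illustration, and the model does not reproduce the stuck counts reported above.

```java
import java.util.concurrent.Semaphore;

// Hypothetical model of element-count flow control; NOT the client library's code.
final class FlowControlModel {
    // One permit per message allowed to be outstanding (assumed semantics).
    private final Semaphore permits;

    FlowControlModel(int maxOutstandingElementCount) {
        this.permits = new Semaphore(maxOutstandingElementCount);
    }

    /** Hands one message to the callback if a permit is free; returns false when blocked. */
    boolean tryDeliver() {
        return permits.tryAcquire();
    }

    /** Called when a callback finishes; leakPermit simulates a permit that is never returned. */
    void complete(boolean leakPermit) {
        if (!leakPermit) {
            permits.release();
        }
    }

    /** Processes a backlog one message at a time and returns how many messages were never delivered. */
    static int run(int backlog, int limit, boolean leakPermit) {
        FlowControlModel model = new FlowControlModel(limit);
        int remaining = backlog;
        while (remaining > 0 && model.tryDeliver()) {
            model.complete(leakPermit);
            remaining--;
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Permits returned after each callback: the backlog drains completely.
        System.out.println("stuck, permits released: " + run(100_000, 1000, false));
        // Permits never returned: delivery stops once the limit is exhausted.
        System.out.println("stuck, permits leaked:   " + run(100_000, 1000, true));
    }
}
```

In this model, once every permit is held and none is returned, no further callback is invoked, which matches the symptom of waiting callback threads while deadlines keep being extended. Whether the 0.21.1-beta client actually loses permits this way is not established by the tests above.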