Fix infinite retry loops in flb_tls_net_read/write by spstack · Pull Request #11547 · fluent/fluent-bit

spstack marked this pull request as ready for review on March 17, 2026 22:16

This set of changes addresses an issue where the `flb_tls_net_read` and
`flb_tls_net_write` functions can hang and consume 100% CPU.

The issue occurs when a TLS connection is lost and the underlying
OpenSSL implementation repeatedly returns `SSL_ERROR_WANT_READ` or
`SSL_ERROR_WANT_WRITE`.

If no `io_timeout` is configured, the thread enters a tight loop,
retrying the read/write until the process is restarted.

This can be worked around by setting `net.io_timeout` to something
other than the default, but this set of changes addresses the
case where no timeout is configured.

The solution here is to fall back to a high timeout value
if the setting is zero. This does not modify the `net.io_timeout` value
and only applies to this set of functions. The reasoning is that there
should not be a case where the user would want to spin forever here.
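
As a rough illustration of the fallback described above, the timeout selection could look like the sketch below. The function name and the 60-second constant are hypothetical placeholders; this description does not state the actual fallback value, and the code is not taken from the patch.

```c
/* Hypothetical fallback in seconds; the real value is not stated here. */
#define EXAMPLE_FALLBACK_IO_TIMEOUT 60

/*
 * Return the timeout the retry loop should enforce.
 * The configured value is only read, never modified, so the user's
 * net.io_timeout setting stays as it was.
 */
int effective_io_timeout(int configured_io_timeout)
{
    if (configured_io_timeout > 0) {
        return configured_io_timeout;
    }

    /* Zero (or an invalid negative value) means "no timeout configured". */
    return EXAMPLE_FALLBACK_IO_TIMEOUT;
}
```

Computing the effective value locally, instead of overwriting the setting, keeps the change scoped to the TLS read/write paths.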

This change also adds a small delay between retries so that the loop
does not load the CPU unnecessarily while waiting for the next bit of
data, even when a timeout is in effect.
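
The bounded retry with a delay can be sketched as follows. This is a simplified model, not the code from the patch: every name is hypothetical, `IO_WANT_RETRY` stands in for `SSL_ERROR_WANT_READ` / `SSL_ERROR_WANT_WRITE`, and a fake clock replaces real sleeping so the behavior is easy to follow.

```c
/* All names below are hypothetical; none are Fluent Bit or OpenSSL APIs. */

enum io_status { IO_DONE, IO_WANT_RETRY };

/* Simulated connection with a fake clock, so no real sleep is needed. */
struct sim_conn {
    long now_ms;      /* fake monotonic clock, in milliseconds */
    int attempts;     /* number of I/O attempts made so far */
    int ready_after;  /* attempt number that succeeds; 0 = never (dead peer) */
};

/* One read/write attempt: succeeds only once ready_after is reached. */
enum io_status sim_try_io(struct sim_conn *c)
{
    c->attempts++;
    if (c->ready_after > 0 && c->attempts >= c->ready_after) {
        return IO_DONE;
    }
    return IO_WANT_RETRY;
}

/* Stand-in for a real sleep: just advances the fake clock. */
void sim_sleep_ms(struct sim_conn *c, long ms)
{
    c->now_ms += ms;
}

/* Returns 0 on success, -1 once timeout_ms has elapsed without success. */
int retry_io(struct sim_conn *c, long timeout_ms, long delay_ms)
{
    long deadline = c->now_ms + timeout_ms;

    for (;;) {
        if (sim_try_io(c) == IO_DONE) {
            return 0;
        }
        if (c->now_ms >= deadline) {
            return -1;            /* give up instead of spinning forever */
        }
        sim_sleep_ms(c, delay_ms); /* yield between attempts */
    }
}
```

With a dead peer (`ready_after = 0`), a 1000 ms timeout, and a 10 ms delay, the loop gives up after a bounded number of attempts instead of spinning; without the delay, the same loop would retry as fast as the CPU allows until the deadline.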

Signed-off-by: Scott Stack <scottstack14@gmail.com>