chore(flushing): standardize code with refactoring on some flushers and retries by duncanista · Pull Request #1018 · DataDog/datadog-lambda-extension

duncanista

Base automatically changed from jordan.gonzalez/flushing/create-service to main

February 5, 2026 21:32
Previously, a new client was created on every trace flush; now a single client is reused. This also removes traits that are no longer needed, since no additional tracing agents are being created for other use cases.
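The "build once, reuse on every flush" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the extension's actual code: `HttpClient`, `TraceFlusher`, and the `clients_built` counter are hypothetical names introduced here, and the client is a stand-in that sends nothing over the network.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a real HTTP client; not the extension's type.
struct HttpClient;

impl HttpClient {
    fn new() -> Self {
        HttpClient
    }

    /// Pretends to send the payload and reports how many bytes were "sent".
    fn send(&self, payload: &[u8]) -> usize {
        payload.len()
    }
}

/// Hypothetical flusher that builds its client once and reuses it afterwards.
struct TraceFlusher {
    client: OnceLock<HttpClient>,
    clients_built: AtomicUsize,
}

impl TraceFlusher {
    fn new() -> Self {
        Self {
            client: OnceLock::new(),
            clients_built: AtomicUsize::new(0),
        }
    }

    /// Flushes one payload, creating the client only on the first call.
    fn flush(&self, payload: &[u8]) -> usize {
        let client = self.client.get_or_init(|| {
            self.clients_built.fetch_add(1, Ordering::Relaxed);
            HttpClient::new()
        });
        client.send(payload)
    }

    /// Number of clients constructed so far (for illustration only).
    fn clients_built(&self) -> usize {
        self.clients_built.load(Ordering::Relaxed)
    }
}
```

Calling `flush` several times on the same `TraceFlusher` constructs the client only on the first call; later calls reuse the instance stored in the `OnceLock`.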

duncanista changed the title from "chore(flushing): standardize code with refactoring on trace flushers" to "chore(flushing): standardize code with refactoring on some flushers and retries"

Feb 5, 2026

duncanista deleted the jordan.gonzalez/flushing/standardize-mechanisms branch

February 12, 2026 21:02

duncanista added a commit that referenced this pull request

Feb 18, 2026
## Overview

Continuation of #1018, removing the unnecessary mut lock on callers for
dogstatsd.

duncanpharvey pushed a commit that referenced this pull request

Mar 10, 2026
## Overview

Continuation of #1018, removing the unnecessary mut lock on callers for
dogstatsd.

jchrostek-dd added a commit that referenced this pull request

Mar 11, 2026
… Lambda

## Problem

After upgrading from extension v92 to v93, customers reported a sharp
increase in "Max retries exceeded, returning request error" errors
(SVLS-8672, GitHub issue #1092).

## Root Cause

PR #1018 introduced HTTP client caching for performance improvements.
However, the cached client keeps idle connections in a pool, and those
connections can go stale under Lambda's freeze/resume execution model:

1. Lambda executes, HTTP client created with connection pool
2. Extension flushes traces, connections remain open in pool
3. Lambda freezes (paused between invocations - seconds to minutes)
4. Lambda resumes, cached client reuses stale connections
5. TCP errors → "Max retries exceeded"

In v92, a new HTTP client was created per-flush, so there were never
stale connections to reuse.
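The sequence above can be modeled with a toy pool and a simulated clock. This is only a sketch under stated assumptions: `Pool`, `Conn`, `SendResult`, `flush_after_freeze`, and the 5-second `PEER_IDLE_TIMEOUT_SECS` are hypothetical and chosen for illustration; no real sockets or HTTP libraries are involved. The `max_idle` field plays the role of a per-host idle connection cap.

```rust
/// A pooled connection, tagged with the simulated time it was last used.
#[derive(Debug)]
struct Conn {
    last_used_secs: u64,
}

/// Outcome of a single simulated request.
#[derive(Debug, PartialEq)]
enum SendResult {
    Ok,
    StaleConnection,
}

/// Assumption for illustration: the peer drops connections idle longer than this.
const PEER_IDLE_TIMEOUT_SECS: u64 = 5;

/// Toy connection pool that keeps at most `max_idle` idle connections.
struct Pool {
    idle: Vec<Conn>,
    max_idle: usize,
}

impl Pool {
    fn new(max_idle: usize) -> Self {
        Self { idle: Vec::new(), max_idle }
    }

    /// Sends one request at simulated time `now_secs`.
    fn send(&mut self, now_secs: u64) -> SendResult {
        // Reuse an idle connection if one exists, otherwise open a fresh one.
        let conn = self.idle.pop().unwrap_or(Conn { last_used_secs: now_secs });
        if now_secs - conn.last_used_secs > PEER_IDLE_TIMEOUT_SECS {
            // The peer already closed this connection, so the request fails
            // and the dead connection is discarded.
            return SendResult::StaleConnection;
        }
        // Keep the connection for later only if the pool allows idle entries.
        if self.idle.len() < self.max_idle {
            self.idle.push(Conn { last_used_secs: now_secs });
        }
        SendResult::Ok
    }
}

/// Flushes at time 0, "freezes" for `freeze_secs`, then flushes again.
/// Returns the result of the second flush.
fn flush_after_freeze(max_idle: usize, freeze_secs: u64) -> SendResult {
    let mut pool = Pool::new(max_idle);
    assert_eq!(pool.send(0), SendResult::Ok);
    pool.send(freeze_secs)
}
```

In this model, `flush_after_freeze(1, 60)` fails on the second flush because the pooled connection sat idle through the freeze, while `flush_after_freeze(0, 60)` succeeds because nothing was kept in the pool to reuse.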

## Solution

Disable connection pooling by setting `pool_max_idle_per_host(0)`. This
ensures each request gets a fresh connection, avoiding stale connection
issues while still benefiting from client caching.

This matches the pattern used in libdatadog's `new_client_periodic()`,
which explicitly disables pooling with the comment: "This client does
not keep connections because otherwise we would get a pipe closed every
second connection because of low keep alive in the agent."

Fixes: SVLS-8672
Fixes: #1092

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

jchrostek-dd added a commit that referenced this pull request

Mar 11, 2026
… Lambda (#1094)

## Summary

Fixes a regression introduced in v93 where customers see a sharp
increase in "Max retries exceeded, returning request error" errors after
upgrading from v92.

- Disables HTTP connection pooling for the trace/stats flusher by
setting `pool_max_idle_per_host(0)`
- Prevents stale connections from being reused after Lambda
freeze/resume cycles

## Problem

PR #1018 introduced HTTP client caching for performance improvements.
However, the cached client keeps idle connections in a pool, and those
connections can go stale under Lambda's freeze/resume execution model:

1. Lambda executes, HTTP client created with connection pool
2. Extension flushes traces, connections remain open in pool
3. Lambda **freezes** (paused between invocations - can be seconds to
minutes)
4. Lambda **resumes**, cached client reuses stale connections
5. TCP errors → "Max retries exceeded"

In v92, a new HTTP client was created per-flush, so there were never
stale connections to reuse.

## Solution

Disable connection pooling by setting `pool_max_idle_per_host(0)`. This
ensures each request gets a fresh connection, avoiding stale connection
issues while still benefiting from client caching (TLS session reuse,
configuration reuse, etc.).

This matches the pattern used in libdatadog's `new_client_periodic()`,
which explicitly disables pooling with the comment:
> "This client does not keep connections because otherwise we would get
a pipe closed every second connection because of low keep alive in the
agent."


## Related

- Fixes [SVLS-8672](https://datadoghq.atlassian.net/browse/SVLS-8672)
- Fixes #1092
- Regression introduced in #1018

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>