chore(flushing): standardize code with refactoring on some flushers and retries by duncanista · Pull Request #1018 · DataDog/datadog-lambda-extension
Base automatically changed from jordan.gonzalez/flushing/create-service to main
February 5, 2026 21:32
We were creating a client every time we flushed traces; now we reuse a single one. This also removes unnecessary traits, since we are not creating more tracing agents for other use cases.
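The change from one client per flush to a single reused client can be illustrated with a minimal sketch. This is not the extension's code: `HttpClient`, `TraceFlusher`, `flush`, and `clients_built` are hypothetical names, and the client is an empty stand-in so the example needs only the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an HTTP client; the real type in the
/// extension is different and carries actual connection state.
pub struct HttpClient;

/// Hypothetical flusher that builds its client lazily, exactly once.
pub struct TraceFlusher {
    client: OnceLock<HttpClient>,
    // Counts client constructions, only so the sketch can be checked.
    builds: AtomicUsize,
}

impl TraceFlusher {
    pub fn new() -> Self {
        Self {
            client: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    /// Returns the cached client, constructing it on first use only.
    fn client(&self) -> &HttpClient {
        self.client.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            HttpClient
        })
    }

    /// Every flush goes through the same client instead of building a new one.
    pub fn flush(&self, _payload: &[u8]) {
        let _client = self.client();
        // Sending the payload is omitted in this sketch.
    }

    /// Number of clients constructed so far.
    pub fn clients_built(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

Calling `flush` several times on the same `TraceFlusher` leaves `clients_built()` at 1, whereas a per-flush design would construct one client per call.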
duncanista changed the title from "chore(flushing): standardize code with refactoring on trace flushers" to "chore(flushing): standardize code with refactoring on some flushers and retries"
duncanista deleted the jordan.gonzalez/flushing/standardize-mechanisms branch
duncanista added a commit that referenced this pull request
Feb 18, 2026

## Overview

Continuation of #1018, removing an unnecessary mut lock on callers for dogstatsd.
duncanpharvey pushed a commit that referenced this pull request
Mar 10, 2026

## Overview

Continuation of #1018, removing an unnecessary mut lock on callers for dogstatsd.
jchrostek-dd added a commit that referenced this pull request
Mar 11, 2026
… Lambda

## Problem

After upgrading from extension v92 to v93, customers reported a sharp increase in "Max retries exceeded, returning request error" errors (SVLS-8672, GitHub issue #1092).

## Root Cause

PR #1018 introduced HTTP client caching for performance improvements. However, the cached client maintains a connection pool that doesn't work well with Lambda's freeze/resume execution model:

1. Lambda executes, HTTP client created with connection pool
2. Extension flushes traces, connections remain open in pool
3. Lambda freezes (paused between invocations - seconds to minutes)
4. Lambda resumes, cached client reuses stale connections
5. TCP errors → "Max retries exceeded"

In v92, a new HTTP client was created per-flush, so there were never stale connections to reuse.

## Solution

Disable connection pooling by setting `pool_max_idle_per_host(0)`. This ensures each request gets a fresh connection, avoiding stale connection issues while still benefiting from client caching.

This matches the pattern used in libdatadog's `new_client_periodic()` which explicitly disables pooling with the comment:

> "This client does not keep connections because otherwise we would get a pipe closed every second connection because of low keep alive in the agent."

Fixes: SVLS-8672
Fixes: #1092

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jchrostek-dd added a commit that referenced this pull request
Mar 11, 2026
… Lambda (#1094)

## Summary

Fixes a regression introduced in v93 where customers see a sharp increase in "Max retries exceeded, returning request error" errors after upgrading from v92.

- Disables HTTP connection pooling for the trace/stats flusher by setting `pool_max_idle_per_host(0)`
- Prevents stale connections from being reused after Lambda freeze/resume cycles

## Problem

PR #1018 introduced HTTP client caching for performance improvements. However, the cached client maintains a connection pool that doesn't work well with Lambda's freeze/resume execution model:

1. Lambda executes, HTTP client created with connection pool
2. Extension flushes traces, connections remain open in pool
3. Lambda **freezes** (paused between invocations - can be seconds to minutes)
4. Lambda **resumes**, cached client reuses stale connections
5. TCP errors → "Max retries exceeded"

In v92, a new HTTP client was created per-flush, so there were never stale connections to reuse.

## Solution

Disable connection pooling by setting `pool_max_idle_per_host(0)`. This ensures each request gets a fresh connection, avoiding stale connection issues while still benefiting from client caching (TLS session reuse, configuration reuse, etc.).

This matches the pattern used in libdatadog's `new_client_periodic()` which explicitly disables pooling with the comment:

> "This client does not keep connections because otherwise we would get a pipe closed every second connection because of low keep alive in the agent."

## Related

- Fixes [SVLS-8672](https://datadoghq.atlassian.net/browse/SVLS-8672)
- Fixes #1092
- Regression introduced in #1018

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
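The freeze/resume failure sequence and the effect of an idle-pool limit of zero can be modelled with a small toy pool. This is a sketch under stated assumptions, not how any real HTTP library implements pooling: `Pool`, `Conn`, `Outcome`, and `freeze_and_resume` are hypothetical names, and a "stale" connection is simulated with an epoch counter rather than a real TCP error.

```rust
use std::collections::VecDeque;

/// What happened to a simulated request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// A new connection was opened for this request.
    Fresh,
    /// An idle connection from the current epoch was reused successfully.
    ReusedLive,
    /// An idle connection from before a freeze was reused and failed.
    ReusedStale,
}

/// Hypothetical connection that remembers the epoch it was opened in.
struct Conn {
    opened_in_epoch: u64,
}

/// Toy connection pool; `max_idle` plays the role of the idle limit.
pub struct Pool {
    max_idle: usize,
    idle: VecDeque<Conn>,
    epoch: u64,
}

impl Pool {
    pub fn new(max_idle: usize) -> Self {
        Self { max_idle, idle: VecDeque::new(), epoch: 0 }
    }

    /// Simulates a freeze followed by a resume: every connection that was
    /// idle across the freeze is treated as closed by the remote side.
    pub fn freeze_and_resume(&mut self) {
        self.epoch += 1;
    }

    /// Simulates one request, reusing an idle connection when one exists.
    pub fn request(&mut self) -> Outcome {
        let (conn, outcome) = match self.idle.pop_front() {
            Some(c) if c.opened_in_epoch == self.epoch => (c, Outcome::ReusedLive),
            // The stale connection is dropped and the request fails.
            Some(_) => return Outcome::ReusedStale,
            None => (Conn { opened_in_epoch: self.epoch }, Outcome::Fresh),
        };
        // Return the connection to the pool only if the idle limit allows it.
        if self.idle.len() < self.max_idle {
            self.idle.push_back(conn);
        }
        outcome
    }
}
```

In this model, `Pool::new(1)` yields `Fresh`, then `ReusedLive`, then `ReusedStale` once `freeze_and_resume()` is called between requests, while `Pool::new(0)` yields `Fresh` for every request, because nothing is ever kept idle. The trade-off is a new connection on every request.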