[SVLS-8211] feat: Add timeout for requests to span_dedup_service by lym953 · Pull Request #986 · DataDog/datadog-lambda-extension

@lym953 marked this pull request as ready for review

January 9, 2026 19:37

litianningdatadog

@lym953 lym953 deleted the yiming.luo/span-dedup-timeout branch

January 13, 2026 18:11

duncanpharvey pushed a commit that referenced this pull request

Mar 10, 2026
## Problem
The span dedup service sometimes fails to return its result and logs
this error:
> DD_EXTENSION | ERROR | Failed to send check_and_add response: true

I see this error in our Self Monitoring and in a customer's account.
I also believe it causes the extension to fail to receive traces from the
tracer, resulting in missing traces. The caller of span dedup is
`process_traces()`, the function that handles the tracer's HTTP request
to send traces. If this function never gets the span dedup result and
gets stuck, the HTTP request times out.

## This PR
While I don't yet know what causes the error, this PR adds a patch to
mitigate the impact:
1. Change the log level from `error` to `warn`.
2. Add a 5-second timeout to the span dedup check, so that if the
caller doesn't get an answer in time, it defaults to treating the trace
as not a duplicate, which is the most common case.
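
For reviewers who want a picture of the fallback behavior, here is a minimal sketch of the pattern, not the code in this PR. It uses standard-library threads and channels as a stand-in for whatever async primitives the extension actually uses. The names `CheckAndAdd`, `is_duplicate`, and `spawn_worker` are hypothetical and exist only for illustration.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical request to a dedup worker: a span key plus a channel for the reply.
struct CheckAndAdd {
    key: u64,
    reply: mpsc::Sender<bool>,
}

/// Ask the worker whether `key` was already seen, waiting at most `timeout`.
/// If no answer arrives in time (or the worker is gone), fall back to `false`,
/// i.e. treat the span as not a duplicate.
fn is_duplicate(worker: &mpsc::Sender<CheckAndAdd>, key: u64, timeout: Duration) -> bool {
    let (reply_tx, reply_rx) = mpsc::channel();
    if worker.send(CheckAndAdd { key, reply: reply_tx }).is_err() {
        eprintln!("WARN: dedup worker unavailable, assuming not a duplicate");
        return false;
    }
    match reply_rx.recv_timeout(timeout) {
        Ok(seen_before) => seen_before,
        Err(RecvTimeoutError::Timeout) => {
            eprintln!("WARN: dedup check timed out, assuming not a duplicate");
            false
        }
        Err(RecvTimeoutError::Disconnected) => {
            eprintln!("WARN: dedup worker dropped the request, assuming not a duplicate");
            false
        }
    }
}

/// Spawn a toy dedup worker that answers each request after `delay`.
/// A large `delay` simulates the stuck service described above.
fn spawn_worker(delay: Duration) -> mpsc::Sender<CheckAndAdd> {
    let (tx, rx) = mpsc::channel::<CheckAndAdd>();
    thread::spawn(move || {
        let mut seen = HashSet::new();
        for req in rx {
            thread::sleep(delay);
            // `insert` returns false when the key was already present.
            let duplicate = !seen.insert(req.key);
            // The caller may have timed out and dropped its receiver; ignore that.
            let _ = req.reply.send(duplicate);
        }
    });
    tx
}
```

With a responsive worker, the first check for a key returns `false` and the second returns `true`. With a worker that is slower than the timeout, the caller returns `false` instead of blocking, which is the trade-off this PR accepts: a rare duplicate may slip through, but the tracer's HTTP request is no longer held up.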

## Testing
The plan is to merge this PR and then check the logs in Self Monitoring,
as it's hard to run high-volume tests in Self Monitoring from a non-main
branch.