feat(apm): aggregate and flush data accordingly for DSM, Profiling, LLMObs, and Live Debugger by duncanista · Pull Request #737 · DataDog/datadog-lambda-extension
Conversation
What?
Creates an aggregator and flusher for APM proxy data.
Motivation
With the introduction of #684, the Extension is way faster to continue invocations, I think this could have affected how data is aggregated for this various APM products in which we were just sending the request to intake as soon as it arrives.
Now, we aggregate it and respect the flushing pattern and algorithm.
Testing
- Profiling:
- LLMObs:
-
DSM:
tbd -
Live Debugger:
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't like how simple this aggregator is, its just a wrapper for a queue, but I think so far we should respect the way we are following this pattern, maybe we can create a trait and consolidate how we create aggregators
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a buffering layer for APM proxy requests in the Trace Agent, introducing an in-memory aggregator and a flusher that batches and retries HTTP calls for DSM, Profiling, LLMObs, and Live Debugger.
- Introduce
proxy_aggregatorandproxy_flushermodules to buffer and periodically flush requests. - Update
TraceAgentendpoints to enqueue proxy requests instead of forwarding immediately. - Adjust
get_clientsignature to take a reference and propagate that change to related flushers.
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| bottlecap/src/traces/trace_agent.rs | Switched proxy endpoints to enqueue into proxy_aggregator |
| bottlecap/src/traces/proxy_aggregator.rs | Added ProxyRequest struct and Aggregator for request buffering |
| bottlecap/src/traces/proxy_flusher.rs | Added Flusher to batch, send, and retry buffered proxy requests |
| bottlecap/src/traces/mod.rs | Registered new modules and added DD_ADDITIONAL_TAGS_HEADER const |
| bottlecap/src/logs/flusher.rs | Updated get_client call to pass a reference |
| bottlecap/src/http.rs | Changed get_client and build_client to accept &Arc<Config> |
Comments suppressed due to low confidence (1)
bottlecap/src/traces/proxy_flusher.rs:26
- [nitpick] There are no unit tests for the new
Flusherbehavior. Please add tests covering header merging, batch flushing, retry logic, and failure scenarios.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Feel free to request additional review from others if needed.
|
|
||
| pub async fn flush( | ||
| &self, | ||
| retry_requests: Option<Vec<reqwest::RequestBuilder>>, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my learning: is there a limit on the number of retries? I assumed the retry count in send() is a different things.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry count in the send method is mainly for the amount of times we try to retry a payload during a flush.
I think there's no limit on the number of failed payloads we try to retry – this is mainly bc we could have a network error which we need to handle.
For 400/500 errors we don't retry, maybe we should at some point, but we want to ensure we don't timeout the lambda or fill the buffer with data that will never arrive
astuyve
deleted the
jordan.gonzalez/trace-agent/aggregation-for-proxy
branch
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters


