Comparing v90...v91 · DataDog/datadog-lambda-extension
Commits on Dec 3, 2025
-
[SVLS-8054] add integration testing (#946)
## Overview

Set up integration tests for the lambda extension. These integration tests will run on every PR.

### Details

This PR includes:

* CDK stacks for deploying lambda integration tests. This covers the lambda, and any related resources, we want to test against.
* Integration tests, set up with Jest. These invoke lambda functions, wait, then fetch Datadog telemetry data to verify against.
* Gitlab integration test step (info below).
* `README.md` describing how to run tests locally.

Note: for simplicity, this is set up to test only the ARM variant (not AMD). It also doesn't include FIPS or AppSec builds. I think this is a reasonable starting point for our integration tests, and we can evaluate adding additional configuration support as needed.

### Gitlab Integration Test Step

The integration tests step in Gitlab will:

1. Publish the lambda extension.
2. Deploy CDK stacks, using the newly published lambda extension.
3. Run a test suite.
4. Destroy the CDK stacks.
5. Delete the lambda extension.

### Executing the integration tests

The integration tests run automatically on every PR. Developers can also run them locally with `npm run test`. Full information is included in `README.md`.

### Example Integration Tests

I added two basic tests, one for Node and one for Python. These lambda functions log 'Hello World' and are set up with the extension and tracer library. The integration test gets the logs and traces from Datadog. It confirms that we have a log with the message 'Hello World!'. It also confirms we have spans with the names `aws.lambda.cold_start`, `aws.lambda.load`, and `aws.lambda`. Note that this isn't actually working correctly for Python for `aws.lambda.load` and `aws.lambda.cold_start`: those spans are created, but with a different traceId, so they aren't getting linked to `aws.lambda`. I will follow up and investigate. I plan on having a follow-up PR with other runtimes.
## Testing

This PR triggered the integration tests; see the [corresponding gitlab pipeline](https://gitlab.ddbuild.io/DataDog/datadog-lambda-extension/-/pipelines/84401218) with the newly added step 'integration-tests' (or see 'dd-gitlab/integration-test' in the checks for this PR). The results from the integration test can be obtained by going to the [integration step](https://gitlab.ddbuild.io/DataDog/datadog-lambda-extension/-/jobs/1262527100) and downloading the artifacts. Screenshot attached below.

<img width="1786" height="1177" alt="Screenshot 2025-12-01 at 9 14 03 AM" src="https://github.com/user-attachments/assets/d1e1313c-b986-44d8-ad65-ab6341d0b909" />
Commits on Dec 4, 2025
-
fix(config): support colons in tag values (URLs, etc.) (#953)
https://datadoghq.atlassian.net/browse/SVLS-8095

## Overview

Tag parsing previously used `split(':')`, which broke values containing colons, like URLs (`git.repository_url:https://...`). Changed to use `splitn(2, ':')` to split only on the first colon, preserving the rest as the value.

Changes:

- Add `parse_key_value_tag()` helper to centralize parsing logic
- Refactor `deserialize_key_value_pairs` to use the helper
- Refactor `deserialize_key_value_pair_array_to_hashmap` to use the helper
- Add comprehensive test coverage for URL values and edge cases

## Testing

Unit tests; expect e2e tests to pass.

Co-authored-by: tianning.li <tianning.li@datadoghq.com>
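The first-colon split described above can be sketched as follows. The helper name `parse_key_value_tag` comes from the PR, but its exact signature here is an assumption:

```rust
/// Split a `key:value` tag on the FIRST colon only, so values that
/// themselves contain colons (e.g. URLs) are preserved intact.
/// Sketch only; the real helper in the PR may have a different signature.
fn parse_key_value_tag(tag: &str) -> Option<(String, String)> {
    let mut parts = tag.splitn(2, ':');
    match (parts.next(), parts.next()) {
        (Some(key), Some(value)) if !key.is_empty() => {
            Some((key.to_string(), value.to_string()))
        }
        _ => None,
    }
}

fn main() {
    // A URL value survives because only the first colon separates key from value.
    let parsed = parse_key_value_tag("git.repository_url:https://github.com/DataDog/x");
    assert_eq!(
        parsed,
        Some((
            "git.repository_url".to_string(),
            "https://github.com/DataDog/x".to_string()
        ))
    );
    // A tag with no colon has no value part.
    assert_eq!(parse_key_value_tag("no_colon_here"), None);
    println!("ok");
}
```

With the old `split(':')` approach, `"a:b:c"` would have produced three fragments and dropped everything after the second colon; `splitn(2, ':')` yields exactly `("a", "b:c")`.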
-
docs: Add Lambda Managed Instance mode documentation (#951)
https://datadoghq.atlassian.net/browse/SVLS-8083

## Overview

Add comprehensive documentation for Lambda Managed Instance support (v90+):

- Overview of Managed Instance mode and how it differs from standard Lambda
- Automatic detection and optimization behavior
- Background continuous flushing architecture with zero per-invocation overhead
- Key differences comparison table (invocation model, flushing, use cases)
- Getting started guide for users

Also clarifies that custom continuous flush intervals are respected in Managed Instance mode (not completely ignored, as previously stated).

## Testing

n/a
Commits on Dec 9, 2025
-
add integration tests for node and java (#958)
## Overview

* Adding integration tests for Java and Dotnet. These are similar to the existing Node/Python integration tests in that they just test very basic functionality: we get logs/traces from the lambda function. These tests are meant to be our starting point and serve as example setups for other integration tests.
* Fixed how we are filtering logs, to use `@lambda.request_id:{requestId}`. This uses log attributes instead of checking the actual log message.
* Updated the stack cleanup step to execute a CLI command instead of a CDK command. There was a slight issue when cleaning up the Java/Dotnet stacks due to their code assets, so the CLI command was easier.
* Added the tag `extension_integration_test: true` to all of the stacks, to make it easier to clean up stacks if cleanup gets missed (deployed locally and forgot to clean up, pipeline cancelled before the cleanup step, etc.). A follow-up item is to create a lambda function that periodically runs and cleans up all old stacks with this tag.

## Testing

* Integ tests for this PR passed.
* Checked the AWS account and confirmed that there are no stacks with the prefix `integ-61910f24`.

-
chore: Add a timer to avoid repeated debug logs (#954)
## Problem

When the Lambda runtime spins down, the extension may enter a loop waiting for unfinished work, printing hundreds of thousands of identical log lines:

> LOGS_AGENT | No more events to process but still have senders, continuing to drain...

For example, in one of my tests, this line was printed 31602 times within 1.28 seconds.

<img width="1097" height="344" alt="image" src="https://github.com/user-attachments/assets/b3aff30c-596f-4837-a081-fbe165ba6254" />

This:

1. slightly complicates debugging for our engineers
2. adds cost and confusion for customers who turn on `DD_LOG_LEVEL=debug` to debug the extension

## This PR

Add a timer and print this line at most once every 100ms, so it will be printed at most 20 times within the 2-second spindown period.

## Testing

No testing for now; the change should be straightforward. Will see if logs are reduced in future debugging.
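The timer described above can be sketched with `std::time::Instant`. The struct and method names below are illustrative, not the PR's actual code:

```rust
use std::time::{Duration, Instant};

/// Rate-limits a repeated message: `should_emit` returns true only when
/// at least `min_interval` has elapsed since it last returned true.
/// Mirrors the "at most once every 100ms" behavior described above;
/// names are hypothetical.
struct RateLimitedLog {
    min_interval: Duration,
    last_emitted: Option<Instant>,
}

impl RateLimitedLog {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_emitted: None }
    }

    fn should_emit(&mut self, now: Instant) -> bool {
        match self.last_emitted {
            Some(last) if now.duration_since(last) < self.min_interval => false,
            _ => {
                self.last_emitted = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut limiter = RateLimitedLog::new(Duration::from_millis(100));
    let start = Instant::now();
    let mut emitted = 0;
    // Simulate a tight drain loop firing every millisecond for 2 seconds.
    for ms in 0..2000u64 {
        let now = start + Duration::from_millis(ms);
        if limiter.should_emit(now) {
            emitted += 1;
        }
    }
    // 2000ms of spindown / 100ms window => 20 emissions
    // (the first fires immediately, then one per window).
    println!("emitted {emitted} times");
}
```

The drain loop would call `should_emit` before printing the "continuing to drain" line, turning 31602 prints into at most 20.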
Commits on Dec 10, 2025
-
APPSEC-60188: gracefully accept `null` in APIGW response (#960)
I strongly suspect the .NET Lambda SDK (from Amazon) produces `null` values instead of omitting fields, which appears to be accepted by API Gateway but is presently rejected by our parsing logic. This PR addresses the problem and adds a new test case. JJ-Change-Id: vprmkv ZD: 2375557 Jira: APPSEC-60188
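The lenient parsing described here, where an explicit JSON `null` is accepted and treated like an omitted field, can be illustrated with a small stdlib-only sketch. This is a model of the idea, not the extension's actual parsing code:

```rust
/// A field in a parsed APIGW response can be absent, explicitly null,
/// or present with a value. Lenient parsing treats Null like Missing
/// instead of rejecting the whole payload. Illustrative only.
#[derive(Debug, PartialEq)]
enum Field<T> {
    Missing,
    Null,
    Value(T),
}

impl<T> Field<T> {
    /// Collapse both "field omitted" and "field: null" into None.
    fn into_option(self) -> Option<T> {
        match self {
            Field::Value(v) => Some(v),
            Field::Missing | Field::Null => None,
        }
    }
}

fn main() {
    // A .NET SDK that serializes `null` instead of omitting the field
    // should parse the same as one that omits it entirely.
    let omitted: Field<String> = Field::Missing;
    let explicit_null: Field<String> = Field::Null;
    assert_eq!(omitted.into_option(), None);
    assert_eq!(explicit_null.into_option(), None);
    println!("ok");
}
```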
-
fix build layer script usages (#931)
## Overview

More accurate commenting on using the `build_bottlecap_layer` script.

Co-authored-by: olivier.ndjikenzia <olivier.ndjikenzia@datadoghq.com>
Commits on Dec 15, 2025
-
[SVLS-7934] feat: Support TLS certificate for trace/stats flusher (#961)
## Problem

A customer reported that their Lambda is behind a proxy, and the Rust-based extension can't send traces to Datadog via the proxy, while the previous Go-based extension worked.

## This PR

Supports the env var `DD_TLS_CERT_FILE`: the path to a file of concatenated CA certificates in PEM format. Example: `DD_TLS_CERT_FILE=/opt/ca-cert.pem`. When the extension flushes traces/stats to Datadog, the HTTP client it creates can load and use this cert and connect to the proxy properly.

## Testing

### Steps

1. Create a Lambda in a VPC with an NGINX proxy.
2. Add a layer to the Lambda which includes the CA certificate `ca-cert.pem`.
3. Set env vars:
   - `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
   - `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the private IP of the proxy EC2 instance
   - `DD_LOG_LEVEL=debug`
4. Update routing rules of security groups so the Lambda can reach `http://10.0.0.30:3128`.
5. Invoke the Lambda.

### Result

**Before**: trace flush failed with error logs:

> DD_EXTENSION | ERROR | Max retries exceeded, returning request error error=Network error: client error (Connect) attempts=1
> DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent

**After**: trace flush is successful:

> DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
> DD_EXTENSION | DEBUG | TRACES | Added root certificate from /opt/ca-cert.pem
> DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy: Some("http://10.0.0.30:3128")
> DD_EXTENSION | DEBUG | Sending with retry url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120 max_retries=1
> DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
> DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
> DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms

## Notes

This fix only covers the trace flusher and stats flusher, which use `ServerlessTraceFlusher::get_http_client()` to create the HTTP client. It doesn't cover the logs flusher and proxy flusher, which use a different function (`http.rs:get_client()`) to create the HTTP client. However, logs flushing was successful in my tests, even when no certificate was added. We can come back to the logs/proxy flusher if someone reports an error.

-
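Loading a file of concatenated PEM certificates, as `DD_TLS_CERT_FILE` expects, amounts to splitting the file on the PEM end marker before handing each block to the TLS library. A stdlib-only sketch of that splitting step (the extension itself presumably relies on its TLS crate's PEM parser):

```rust
/// Split a string of concatenated PEM certificates into individual
/// certificate blocks. Illustrates the file format DD_TLS_CERT_FILE
/// expects; real code would use a TLS crate's PEM parser instead.
fn split_pem_certs(pem: &str) -> Vec<String> {
    const END: &str = "-----END CERTIFICATE-----";
    let mut certs = Vec::new();
    let mut rest = pem;
    while let Some(idx) = rest.find(END) {
        // Take everything up to and including the end marker.
        let (block, tail) = rest.split_at(idx + END.len());
        let block = block.trim();
        if block.starts_with("-----BEGIN CERTIFICATE-----") {
            certs.push(block.to_string());
        }
        rest = tail;
    }
    certs
}

fn main() {
    // Two dummy certificates concatenated, as in /opt/ca-cert.pem.
    let bundle = "\
-----BEGIN CERTIFICATE-----
AAAA
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
BBBB
-----END CERTIFICATE-----
";
    let certs = split_pem_certs(bundle);
    println!("found {} certificates", certs.len());
}
```

Each recovered block would then be added as a root certificate on the flusher's HTTP client, matching the "Added root certificate from /opt/ca-cert.pem" debug line above.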
chore: Upgrade libdatadog (#964)
## Overview

The crate `datadog-trace-obfuscation` has been renamed to `libdd-trace-obfuscation`. This PR updates the dependency.

## Testing