Comparing v90...v91 · DataDog/datadog-lambda-extension

Commits on Dec 1, 2025

Commits on Dec 2, 2025

Commits on Dec 3, 2025

  1. [SVLS-8054] add integration testing (#946)

    ## Overview
    Set up integration tests for the Lambda extension. These integration
    tests will run on every PR.
    
    ### Details
    This PR includes:
    * CDK stacks for deploying Lambda integration tests, covering the Lambda
    function and any related resources we want to test against.
    * Integration tests, set up with Jest. These invoke Lambda functions,
    wait, then fetch Datadog telemetry data to verify against.
    * Gitlab Integration Test Step (info below).
    * `README.md` for how to run tests locally.
    
    Note:
    * For simplicity, this is set up to test only against the ARM variant
    (not AMD), and it doesn't include FIPS or AppSec builds. This should be a
    reasonable starting point for our integration tests, and we can evaluate
    adding additional configuration support as needed.
    
    ### Gitlab Integration Test Step
    The integration tests step in Gitlab will:
    1. Publish the lambda extension.
    2. Deploy CDK stacks, using the newly published lambda extension.
    3. Run a test suite.
    4. Destroy the CDK stacks.
    5. Delete the lambda extension.
    
    ### Executing the integration tests
    The integration tests will automatically run on every PR. Developers can
    also run the integration tests locally by running `npm run test`. Full
    information is included in `README.md`.
    
    ### Example Integration Tests
    I added two basic tests, one for Node and one for Python. These Lambda
    functions log 'Hello World' and are set up with the extension and tracer
    library. The integration test gets the logs and traces from Datadog. It
    confirms that we have a log with the message 'Hello World!'. It also
    confirms we have spans named `aws.lambda.cold_start`, `aws.lambda.load`,
    and `aws.lambda`. Note that this isn't actually working correctly for
    Python for `aws.lambda.load` and `aws.lambda.cold_start`: those spans are
    created, but with a different traceId, so they aren't getting linked to
    `aws.lambda`. I will follow up and investigate.
    
    I plan on having a follow up PR with other runtimes.
    
    ## Testing 
    This PR triggered the integration tests; see the [corresponding GitLab
    pipeline](https://gitlab.ddbuild.io/DataDog/datadog-lambda-extension/-/pipelines/84401218)
    with the newly added 'integration-tests' step (or see
    'dd-gitlab/integration-test' in the checks for this PR).
    
    The results from the integration test can be obtained by going to
    [integration
    step](https://gitlab.ddbuild.io/DataDog/datadog-lambda-extension/-/jobs/1262527100)
    and downloading the artifacts. Screenshot attached below.
    
    <img width="1786" height="1177" alt="Screenshot 2025-12-01 at 9 14
    03 AM"
    src="https://github.com/user-attachments/assets/d1e1313c-b986-44d8-ad65-ab6341d0b909"
    />

Commits on Dec 4, 2025

  1. fix(config): support colons in tag values (URLs, etc.) (#953)

    https://datadoghq.atlassian.net/browse/SVLS-8095
    
    ## Overview
    Tag parsing previously used `split(':')`, which broke values containing
    colons, such as URLs (`git.repository_url:https://...`). Changed to use
    `splitn(2, ':')` to split only on the first colon, preserving the rest as
    the value.
    
    Changes:
     - Add `parse_key_value_tag()` helper to centralize parsing logic
     - Refactor `deserialize_key_value_pairs` to use the helper
     - Refactor `deserialize_key_value_pair_array_to_hashmap` to use the helper
     - Add comprehensive test coverage for URL values and edge cases
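As a sketch of the first-colon-only behavior: the helper name comes from the PR, but its exact signature isn't shown, so this version is illustrative.

```rust
/// Illustrative sketch of first-colon-only tag parsing. The PR's real
/// helper is `parse_key_value_tag()`; its exact signature isn't shown.
fn parse_key_value_tag(tag: &str) -> Option<(String, String)> {
    // splitn(2, ':') yields at most two pieces: the key, and everything
    // after the first colon (which may itself contain more colons).
    let mut parts = tag.splitn(2, ':');
    let key = parts.next()?.trim();
    let value = parts.next()?.trim();
    if key.is_empty() || value.is_empty() {
        return None;
    }
    Some((key.to_string(), value.to_string()))
}

fn main() {
    // A URL value survives because only the first colon separates key from value.
    let (k, v) = parse_key_value_tag(
        "git.repository_url:https://github.com/DataDog/datadog-lambda-extension",
    )
    .unwrap();
    println!("{k} = {v}");
}
```

Splitting at most once means a value such as a URL keeps its own `:` separators intact.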
    
    ## Testing 
    Unit tests; e2e tests are expected to pass.
    
    Co-authored-by: tianning.li <tianning.li@datadoghq.com>

  2. docs: Add Lambda Managed Instance mode documentation (#951)

    https://datadoghq.atlassian.net/browse/SVLS-8083
    
    ## Overview
    Add comprehensive documentation for Lambda Managed Instance support
    (v90+):
    - Overview of Managed Instance mode and how it differs from standard
    Lambda
    - Automatic detection and optimization behavior
    - Background continuous flushing architecture with zero per-invocation
    overhead
    - Key differences comparison table (invocation model, flushing, use
    cases)
    - Getting started guide for users
    
    Also clarifies that custom continuous flush intervals are respected in
    Managed Instance mode (not completely ignored as previously stated).
    
    ## Testing 
    n/a

Commits on Dec 8, 2025

Commits on Dec 9, 2025

  1. add integration tests for node and java (#958)

    ## Overview
    * Adding integration tests for Java and Dotnet. These are similar to the
    existing Node/Python integration in that they just test very basic
    functionality - we get logs/traces from the lambda function. These tests
    are meant to be our starting point and serve as example setup for other
    integration tests.
    * Fixed how we are filtering logs to use
    `@lambda.request_id:{requestId}`. This makes it use log attributes
    instead of checking the actual log message.
    * Updated the stack cleanup step to execute a CLI command directly
    instead of a CDK command. There was a slight issue when cleaning up the
    Java/Dotnet stacks due to their code assets, so the CLI command was
    easier.
    * Added the tag `extension_integration_test: true` to all of the stacks
    to make it easier to clean up stacks if cleanup gets missed (deployed
    locally and forgot to clean up, pipeline cancelled before the cleanup
    step, etc.). A follow-up item is to create a Lambda function that
    periodically runs and cleans up all old stacks with this tag.
    
    ## Testing 
    * Integ tests for this PR passed.
    * Checked AWS account and confirmed that there are no stacks with prefix
    `integ-61910f24`

  2. chore: Add a timer to avoid repeated debug logs (#954)

    ## Problem
    When the Lambda runtime spins down, the extension may enter a loop
    waiting for unfinished work, printing hundreds of thousands of identical
    log lines:
    > LOGS_AGENT | No more events to process but still have senders,
    continuing to drain...
    
    For example, in one of my tests, this line was printed 31602 times
    within 1.28 seconds.
    
    <img width="1097" height="344" alt="image"
    src="https://github.com/user-attachments/assets/b3aff30c-596f-4837-a081-fbe165ba6254"
    />
    
    This:
    1. slightly complicates debugging for our engineers
    2. adds costs and confusion for customers who turn on
    `DD_LOG_LEVEL=debug` to debug the extension
    
    
    
    ## This PR
    Add a timer and print this line at most once every 100ms, so it will be
    printed at most 20 times within the 2-second spindown time.
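A minimal sketch of this kind of throttle using a `std::time::Instant` timer (illustrative names, not the extension's actual code):

```rust
use std::time::{Duration, Instant};

/// Illustrative throttle: remember when the message was last emitted and
/// suppress it until `interval` has elapsed.
struct ThrottledLog {
    last: Option<Instant>,
    interval: Duration,
}

impl ThrottledLog {
    fn new(interval: Duration) -> Self {
        Self { last: None, interval }
    }

    /// Returns true (and records the time) only if `interval` has elapsed
    /// since the last emitted message; otherwise the caller skips logging.
    fn should_log(&mut self) -> bool {
        let now = Instant::now();
        match self.last {
            Some(prev) if now.duration_since(prev) < self.interval => false,
            _ => {
                self.last = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut throttle = ThrottledLog::new(Duration::from_millis(100));
    // Two back-to-back checks: only the first one passes the 100 ms gate.
    println!("first: {}", throttle.should_log());
    println!("second: {}", throttle.should_log());
}
```

With a 100 ms interval, a tight drain loop emits the line at most ten times per second instead of tens of thousands of times.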
    
    ## Testing
    No testing for now. Should be straightforward. Will see if logs are
    reduced in future debugging.

Commits on Dec 10, 2025

  1. APPSEC-60188: gracefully accept null in APIGW response (#960)

    I strongly suspect the .NET Lambda SDK (from Amazon) produces `null`
    values instead of omitting fields, which appears to be accepted by API
    Gateway but is presently rejected by our parsing logic. This PR
    addresses that problem and adds a new test case.
    
    JJ-Change-Id: vprmkv
    ZD: 2375557
    Jira: APPSEC-60188


  3. fix build layer script usages (#931)

    ## Overview
    
    More accurate comments on using the `build_bottlecap_layer` script.
    
    Co-authored-by: olivier.ndjikenzia <olivier.ndjikenzia@datadoghq.com>
    Configuration menu

    Browse the repository at this point in the history

Commits on Dec 15, 2025


  3. [SVLS-7934] feat: Support TLS certificate for trace/stats flusher (#961)

    ## Problem
    A customer reported that their Lambda is behind a proxy, and the
    Rust-based extension can't send traces to Datadog via the proxy, while
    the previous Go-based extension worked.
    
    ## This PR
    Adds support for the env var `DD_TLS_CERT_FILE`: the path to a file of
    concatenated CA certificates in PEM format (example:
    `DD_TLS_CERT_FILE=/opt/ca-cert.pem`). When the extension flushes
    traces/stats to Datadog, the HTTP client it creates can load and use this
    certificate and connect to the proxy properly.
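A std-only sketch of the loading step (hypothetical code: the real extension hands these bytes to its HTTP client's TLS stack, which isn't shown here):

```rust
use std::fs;

/// Hypothetical sketch: read the PEM bundle named by DD_TLS_CERT_FILE and
/// do a basic sanity check before handing it to a TLS stack. The real
/// extension builds its HTTP client with these certificates loaded.
fn load_ca_bundle(path: &str) -> Option<Vec<u8>> {
    let bytes = fs::read(path).ok()?;
    // A bundle of concatenated CA certs starts with a PEM certificate header.
    if !bytes.starts_with(b"-----BEGIN CERTIFICATE-----") {
        eprintln!("{path} does not look like a PEM certificate bundle");
        return None;
    }
    Some(bytes)
}

fn main() {
    // In the extension the path comes from the DD_TLS_CERT_FILE env var,
    // e.g. /opt/ca-cert.pem supplied by a Lambda layer.
    let path = std::env::var("DD_TLS_CERT_FILE")
        .unwrap_or_else(|_| "/opt/ca-cert.pem".to_string());
    match load_ca_bundle(&path) {
        Some(bytes) => println!("loaded {} bytes of CA certificates", bytes.len()),
        None => println!("no usable CA bundle at {path}"),
    }
}
```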
    
    ## Testing
    ### Steps
    1. Create a Lambda in a VPC with an NGINX proxy.
    2. Add a layer to the Lambda, which includes the CA certificate
    `ca-cert.pem`
    3. Set env vars:
        - `DD_TLS_CERT_FILE=/opt/ca-cert.pem`
        - `DD_PROXY_HTTPS=http://10.0.0.30:3128`, where `10.0.0.30` is the
          private IP of the proxy EC2 instance
        - `DD_LOG_LEVEL=debug`
    4. Update routing rules of security groups so the Lambda can reach
    `http://10.0.0.30:3128`
    5. Invoke the Lambda
    ### Result
    **Before**
    Trace flush failed with error logs:
    > DD_EXTENSION | ERROR | Max retries exceeded, returning request error
    error=Network error: client error (Connect) attempts=1
    DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent
    
    **After**
    Trace flush is successful:
    > DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
    DD_EXTENSION | DEBUG | TRACES | Added root certificate from
    /opt/ca-cert.pem
    DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy:
    Some("http://10.0.0.30:3128")
    DD_EXTENSION | DEBUG | Sending with retry
    url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120
    max_retries=1
    DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
    DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
    DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms
    
    ## Notes
    This fix only covers the trace flusher and stats flusher, which use
    `ServerlessTraceFlusher::get_http_client()` to create the HTTP client. It
    doesn't cover the logs flusher and proxy flusher, which use a different
    function (`http.rs:get_client()`) to create the HTTP client. However,
    logs flushing was successful in my tests, even when no certificate was
    added. We can come back to the logs/proxy flusher if someone reports an
    error.

  4. chore: Upgrade libdatadog (#964)

    ## Overview
    The crate `datadog-trace-obfuscation` has been renamed to
    `libdd-trace-obfuscation`. This PR updates that dependency.
    
    ## Testing

Commits on Dec 17, 2025
