Switch to allocation-free enumeration of `Activity` objects by andrewlock · Pull Request #8041 · DataDog/dd-trace-dotnet
andrewlock added a commit that referenced this pull request on Jan 9, 2026

## Summary of changes

Update `Activity`-based benchmarks to reduce variability and make comparisons easier

## Reason for change

The various `Activity`-related benchmarks call the global `Tracer` instance, so we should make sure to configure it with our default benchmarking settings (basically disabling background jobs like telemetry/discovery/remote config) to reduce variation.

Also added a "baseline" job for comparison and a version that uses hierarchical IDs instead of W3C IDs.

Was a prerequisite for a bunch of other work.

## Implementation details

- Configure the `Benchmarks.OpenTelemetry.InstrumentedApi` project and `Benchmarks.Trace/ActivityBenchmark` to set up the global tracer with Telemetry etc disabled
- Add two extra benchmarks to `Benchmarks.Trace/ActivityBenchmark`
  - `StartStopWithChild_Baseline`, which is the same as `StartStopWithChild` but without the DD integration
  - `StartStopWithChild_Hierarchical`, which is the same as `StartStopWithChild` but uses the hierarchical ID format
  - Don't run either of these in CI for now (to avoid extra load), just for local comparisons
- Simplify the benchmark to just do explicit duck typing (which is closer to what we do normally anyway, and removes a bunch of code)

## Test coverage

Running these benchmarks locally shows the improvements we need to make, and highlights that we clearly have a bug with hierarchical IDs 😬

| Method | Runtime | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
| ------------------------------- | -------------------- | ----------: | ----------: | ----------: | -----: | -----: | --------: |
| StartStopWithChild_Baseline | .NET 6.0 | 672.9 ns | 23.23 ns | 66.66 ns | 0.0038 | - | 1.09 KB |
| StartStopWithChild_Hierarchical | .NET 6.0 | 35,478.1 ns | 1,035.96 ns | 3,021.94 ns | - | - | 10.47 KB |
| StartStopWithChild | .NET 6.0 | 3,681.7 ns | 50.23 ns | 44.53 ns | 0.0153 | - | 4.87 KB |
| StartStopWithChild_Baseline | .NET 8.0 | 570.8 ns | 16.96 ns | 49.22 ns | 0.0029 | - | 1.09 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 32,478.5 ns | 1,215.04 ns | 3,466.57 ns | - | - | 10.48 KB |
| StartStopWithChild | .NET 8.0 | 3,021.4 ns | 59.24 ns | 163.16 ns | 0.0153 | - | 4.77 KB |
| StartStopWithChild_Baseline | .NET Core 3.1 | 752.6 ns | 15.06 ns | 30.09 ns | 0.0038 | - | 1.19 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 34,170.1 ns | 572.80 ns | 478.31 ns | - | - | 10.75 KB |
| StartStopWithChild | .NET Core 3.1 | 4,905.0 ns | 31.63 ns | 29.58 ns | 0.0153 | - | 5.05 KB |
| StartStopWithChild_Baseline | .NET Framework 4.7.2 | 770.7 ns | 6.03 ns | 5.64 ns | 0.2012 | - | 1.24 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 37,386.2 ns | 542.73 ns | 453.21 ns | 1.8921 | 0.1221 | 11.77 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5,884.5 ns | 64.54 ns | 60.37 ns | 0.8621 | - | 5.3 KB |

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036 👈
- #8037
- #8038
- #8039
- #8040
- #8041
- #8042
andrewlock added a commit that referenced this pull request on Jan 9, 2026

## Summary of changes

Reduce allocation of `ActivityHandlerCommon` by removing `string` concatenation

## Reason for change

The `ActivityHandlerCommon.ActivityStarted` and `ActivityStopped` methods need to store and retrieve `Activity` instances from a `ConcurrentDictionary<>`. Today they do that by concatenating the `Activity`'s `TraceId` and `SpanId`, or by using its `ID`. All that concatenation causes a bunch of allocation, so introduce a simple `struct` to use as the key instead

## Implementation details

Introduce `ActivityKey`, which is essentially `internal readonly record struct ActivityKey(string TraceId, string SpanId)`, and use that for all of the dictionary lookups. This avoids all the string concatenation allocations.

## Test coverage

Added some unit tests for `ActivityKey`, but it's mostly covered by existing integration tests for correctness.

Benchmarks show a significant improvement over [the previous results](#8036), particularly for hierarchical IDs, which clearly were buggy

| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8036 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 4.217 us | 0.3227 us | 0.0153 | 4.09 KB | -6.38 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.413 us | 0.2505 us | 0.0076 | 4.09 KB | -6.39 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.676 us | 0.4636 us | 0.0153 | 4.32 KB | -6.43 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 6.813 us | 0.4969 us | 0.7324 | 4.53 KB | -7.24 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 4.105 us | 0.2677 us | 0.0153 | 4.3 KB | -0.57 KB |
| StartStopWithChild | .NET 8.0 | 3.475 us | 0.1570 us | 0.0114 | 4.2 KB | -0.57 KB |
| StartStopWithChild | .NET Core 3.1 | 5.647 us | 0.3129 us | 0.0153 | 4.48 KB | -0.57 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.842 us | 0.2992 us | 0.7629 | 4.69 KB | -0.61 KB |

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037 👈
- #8038
- #8039
- #8040
- #8041
- #8042
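To make the struct-key idea concrete, here is a minimal sketch. It assumes a hypothetical `ActivityKeySketch` type and a dictionary of plain strings; the real `ActivityKey` and the values stored by `ActivityHandlerCommon` may well look different.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the PR's ActivityKey; the real type may differ.
public readonly record struct ActivityKeySketch(string TraceId, string SpanId);

public static class ActivityKeyDemo
{
    // Assumption: values are plain strings here; the tracer stores richer objects.
    private static readonly ConcurrentDictionary<ActivityKeySketch, string> Map = new();

    // Building the key only copies two string references into a struct,
    // so no new string is allocated (unlike traceId + spanId).
    public static bool Store(string traceId, string spanId, string value)
        => Map.TryAdd(new ActivityKeySketch(traceId, spanId), value);

    // Record structs compare field by field, so an equal pair of IDs finds the entry.
    public static string? Find(string traceId, string spanId)
        => Map.TryGetValue(new ActivityKeySketch(traceId, spanId), out var value) ? value : null;
}
```

A side effect of keeping the two IDs as separate fields is that `("ab", "c")` and `("a", "bc")` are distinct keys, whereas a plain concatenation of the two would produce the same string for both.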
andrewlock added a commit that referenced this pull request on Jan 9, 2026

## Summary of changes

Fix incorrect nullable annotations on `Activity` duck types

## Reason for change

While working on other performance things, noticed that the nullable annotations often declared non-nullability when the values actually could be null.

A particularly confusing part is the `TraceId` and `SpanId` values in `IW3CActivity`. These were marked non-nullable because when you call `Activity.Start()` they will always be non-null, but _only_ if you're using W3C IDs. If you're using hierarchical IDs ([the default in <.NET 5](https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/5.0/default-activityidformat-changed)) then these values _will_ be null. As a side note, I suspect this explains the "we saw errors about these being null in error tracking but don't understand why" scenarios

Also, I think we should rename `IW3CActivity` to `IActivity3` instead. W3C _implies_ that it's a W3C activity, but that's not necessarily the case, and is essentially the source of the above confusion I think. `IActivity3` would then be named consistently with the `IActivity5` and `IActivity6` we also currently have.

## Implementation details

Add nullable annotations to values that _can_ be null, and fix the fallout (mostly in `ActivityKey`)

## Test coverage

Covered by existing tests sufficiently I think

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037 👈
- #8038
- #8039
- #8040
- #8041
- #8042
andrewlock added a commit that referenced this pull request on Jan 9, 2026

## Summary of changes

Avoid allocating a closure in .NET Core where possible

## Reason for change

.NET Core's `ConcurrentDictionary.GetOrAdd()` method has an overload that accepts a "state" argument, which is passed through to the value factory. Using this overload avoids allocating a closure every time the method is hit, and we can pass the state using a value tuple to avoid additional allocation there

## Implementation details

`#if`/`#else` to glory

## Test coverage

Functionality is covered by existing tests. Benchmarks show an incremental improvement over #8037 for .NET Core, as expected:

| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8037 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.746 us | 0.1128 us | 0.0114 | 3.81 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 2.759 us | 0.0374 us | 0.0114 | 3.81 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.762 us | 0.0584 us | 0.0153 | 4.04 KB | -0.28 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.651 us | 0.0717 us | 0.7324 | 4.54 KB | 0.01 KB (noise) |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.607 us | 0.0508 us | 0.0153 | 4.02 KB | -0.28 KB |
| StartStopWithChild | .NET 8.0 | 2.921 us | 0.0617 us | 0.0114 | 3.91 KB | -0.29 KB |
| StartStopWithChild | .NET Core 3.1 | 4.922 us | 0.0407 us | 0.0153 | 4.2 KB | -0.28 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.008 us | 0.0979 us | 0.7629 | 4.69 KB | 0 |

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039 👈
- #8040
- #8041
- #8042
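The difference between the two `GetOrAdd` overloads can be sketched roughly as follows. The cache, the key format, and the `prefix`/`id` inputs are made up for illustration and are not taken from the tracer; only the `ConcurrentDictionary<TKey, TValue>.GetOrAdd` overloads themselves are real API.

```csharp
using System;
using System.Collections.Concurrent;

public static class GetOrAddDemo
{
    // Hypothetical cache, used only to illustrate the two call shapes.
    private static readonly ConcurrentDictionary<string, string> Cache = new();

    // The lambda captures `prefix` and `id`, so the compiler has to allocate
    // a closure object and a delegate on every call, even on a cache hit.
    public static string WithClosure(string key, string prefix, int id)
        => Cache.GetOrAdd(key, k => prefix + ":" + id + ":" + k);

    // The inputs travel in a value tuple passed as the factory argument.
    // A static lambda cannot capture anything, so its delegate can be cached.
    public static string WithState(string key, string prefix, int id)
        => Cache.GetOrAdd(
            key,
            static (k, state) => state.Prefix + ":" + state.Id + ":" + k,
            (Prefix: prefix, Id: id));
}
```

Both methods return the same values; the second simply moves the captured inputs into an explicit argument. In a multi-targeted project the state-based call would presumably sit behind the `#if` the commit message mentions, with the closure version as the fallback.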
pablomartinezbernardo pushed a commit that referenced this pull request on Jan 10, 2026 (commit message identical to the `ActivityHandlerCommon` string-concatenation change above)
andrewlock added a commit that referenced this pull request on Jan 12, 2026

## Summary of changes

Various optimizations in the `Activity` handling code to reduce allocations and execution time

## Reason for change

Some properties are tricksy, as they do a bunch of allocation, so we should avoid them if we can. The changes in here _look_ bigger than they are diff-wise; I've added comments to aid review.

## Implementation details

- Don't call `ParentId` until we definitely need it.
  - This property does a bunch of allocation to generate a "valid" value, so we should avoid it if we can. This makes the conditions a bit harder to read, but delays calling `ParentId` until we're sure we don't have something better already
- Avoid calling `Tracer.Instance.ActiveScope?.Span` until we know we need it (very minor optimisation but why not 🤷‍♂️)
- Extract `StopActivitySlow` to a separate method, as we _shouldn't_ hit this now, so it should help the JIT out with things like code size etc of the calling method (conjecture, not tested, but I think it's better from a code understanding PoV too)
- Simplify `ShouldIgnoreByOperationName`, which also improves execution time 10x from ~30us to ~3us.

## Test coverage

Functionally covered by existing tests. Benchmarks compared to #8039 show improvements. Allocations listed below, but execution time is also improved:

I re-ran the numbers after making the updates, and the benchmarks are actually better:

<details><summary>Benchmarks for original PR</summary>
<p>

| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8039 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.410 us | 0.0658 us | 0.0153 | 3.79 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 2.582 us | 0.0443 us | 0.0114 | 3.79 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.272 us | 0.0731 us | 0.0153 | 4.02 KB | -0.02 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.165 us | 0.1245 us | 0.7324 | 4.51 KB | -0.03 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.312 us | 0.0266 us | 0.0153 | 3.78 KB | -0.04 KB |
| StartStopWithChild | .NET 8.0 | 2.648 us | 0.0306 us | 0.0114 | 3.78 KB | -0.13 KB |
| StartStopWithChild | .NET Core 3.1 | 4.344 us | 0.0555 us | 0.0076 | 3.97 KB | -0.23 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5.234 us | 0.1568 us | 0.7095 | 4.39 KB | -0.30 KB |

</p>
</details>

| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8039 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 3.770 us | 0.3101 us | 0.0076 | 3.58 KB | -0.23 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.039 us | 0.2377 us | 0.0076 | 3.58 KB | -0.23 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.490 us | 0.1135 us | 0.0076 | 3.8 KB | -0.22 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.303 us | 0.2469 us | 0.6943 | 4.3 KB | -0.24 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.386 us | 0.0971 us | 0.0114 | 3.59 KB | -0.43 KB |
| StartStopWithChild | .NET 8.0 | 2.661 us | 0.0218 us | 0.0114 | 3.59 KB | -0.32 KB |
| StartStopWithChild | .NET Core 3.1 | 4.540 us | 0.1625 us | 0.0076 | 3.78 KB | -0.42 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 5.563 us | 0.1946 us | 0.6790 | 4.2 KB | -0.49 KB |

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039
- #8040 👈
- #8041
- #8042
Base automatically changed from andrew/otel/ParentId to master on January 12, 2026 18:58

TODO:

- Add unit tests
- Add tests for handling of fallback path, particularly with "old" activity types
- Benchmark across all different TFMs
- Expose more widely?
- Use for activity links and events

- Explicitly checks we have the correct type on every execution
- Does an allocating-enumeration if we have the wrong type
- Encapsulate the enumeration
andrewlock added a commit that referenced this pull request on Jan 13, 2026

…8042)

## Summary of changes

Improve performance around the "population" of tags from an `Activity` into a `Span`

## Reason for change

Currently we do a _lot_ of allocation in the `OtlpHelpers.AgentConvertSpan` method. The `OpenTelemetryTags` object also has a lot of properties on it that we never set, and that are only used for _reading_ tags in the `OperationMapper`, even though many may never even be read. This increases the size of the tags object.

This PR aims to do various optimizations around the "close activity" paths:

- Reduce the size of the `OpenTelemetryTags` object by removing properties only used by `OperationMapper`
- Add properties to `OpenTelemetryTags` for values that we explicitly set in `OtlpHelpers.AgentConvertSpan`

## Implementation details

- Currently we're always creating the `OpenTelemetryTags`, even though we _may_ throw it away if the activity is ignored, so that's an easy win.
- I think we can assume that the tags object passed to `AgentConvertSpan` is only ever `OpenTelemetryTags` based on the call sites, so we can interact with the tags object directly where possible.
- "Simplify" some of the code paths by avoiding `span.SetTag` and favouring `tags.SetTag` when we know it's not a "special" tag.
- Inline a few method calls in places where they will often not be called, at the expense of some clarity
- Refactor `OperationMapper` to use the `GetTag()` API, and cache the values at various points.

The resulting code is uglier, but overall means we shave ~100 bytes off every span, so I think it's worth it

## Test coverage

Functionality is covered by existing tests; this is just a refactoring, and there are good tests for `OperationMapper` currently.

Benchmarks vs #8041 show a nice allocation improvement, at the expense of slower execution (this was local though, so not 100% on that, will see what the CI results show). If there is a significant slowdown, I'll try to isolate it, but I think the allocation improvements are probably worth it either way

| Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8041 |
| ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: |
| StartStopWithChild_Hierarchical | .NET 6.0 | 4.438 us | 0.7347 us | 0.0076 | 3.2 KB | -0.40 KB |
| StartStopWithChild_Hierarchical | .NET 8.0 | 3.216 us | 0.4260 us | 0.0076 | 3.2 KB | -0.40 KB |
| StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.293 us | 0.8316 us | 0.0076 | 3.42 KB | -0.41 KB |
| StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.735 us | 0.5592 us | 0.6332 | 3.92 KB | -0.41 KB |
| | | | | | | |
| StartStopWithChild | .NET 6.0 | 3.848 us | 0.5325 us | 0.0076 | 3.19 KB | -0.40 KB |
| StartStopWithChild | .NET 8.0 | 3.080 us | 0.2698 us | 0.0076 | 3.19 KB | -0.40 KB |
| StartStopWithChild | .NET Core 3.1 | 5.094 us | 0.4982 us | 0.0076 | 3.38 KB | -0.40 KB |
| StartStopWithChild | .NET Framework 4.7.2 | 6.172 us | 0.5948 us | 0.6104 | 3.79 KB | -0.41 KB |

## Other details

Added a couple of extra tests that were missing from the previous PR in the stack too

https://datadoghq.atlassian.net/browse/LANGPLAT-915

Part of a stack working to improve OTel performance

- #8036
- #8037
- #8038
- #8039
- #8040
- #8041
- #8042 👈
andrewlock added a commit that referenced this pull request on Jan 15, 2026

…#8058)

## Summary of changes

Don't use the `DynamicMethod` approach in .NET 10

## Reason for change

In #8041 we added allocation-free enumeration of tags objects by building a `DynamicMethod` that calls the `struct` method to avoid boxing. I benchmarked it on a bunch of TFMs, but missed .NET 10. However, .NET 10 enumeration is _already_ allocation free, so the `DynamicMethod` actually hurts performance (presumably primarily because it messes with other inlining and stack allocation improvements the JIT can do), so we _shouldn't_ use the `DynamicMethod` approach in .NET 10

## Implementation details

- Just do a "normal" enumeration of the tags if it's .NET 10
- Add .NET 10 to the benchmark project TFMs

## Test coverage

Did a quick benchmark comparing naive enumeration of the tag objects of a duck typed activity with and without the dynamic method approach. In general, the results are better for `DynamicMethod` in all these TFMs _except_ .NET 10 (I tested .NET 8/9 previously and confirmed `DynamicMethod` is better there too)

| Method | Runtime | Mean | Error | StdDev | Median | Allocated |
| -------------------------- | ------------------ | -------: | -------: | -------: | -------: | --------: |
| EnumerateTags | .NET 10.0 | 189.9 ns | 6.94 ns | 20.46 ns | 182.6 ns | 592 B |
| EnumerateTagsDynamicMethod | .NET 10.0 | 205.6 ns | 4.14 ns | 8.83 ns | 205.1 ns | 592 B |
| | | | | | | |
| EnumerateTags | .NET 6.0 | 366.1 ns | 9.49 ns | 27.54 ns | 361.0 ns | 624 B |
| EnumerateTagsDynamicMethod | .NET 6.0 | 307.9 ns | 6.37 ns | 18.29 ns | 299.4 ns | 592 B |
| | | | | | | |
| EnumerateTags | .NET Core 3.1 | 501.0 ns | 12.15 ns | 35.64 ns | 494.2 ns | 672 B |
| EnumerateTagsDynamicMethod | .NET Core 3.1 | 441.9 ns | 8.83 ns | 25.19 ns | 436.7 ns | 640 B |
| | | | | | | |
| EnumerateTags | .NET Framework 4.8 | 536.9 ns | 10.52 ns | 14.05 ns | 534.7 ns | 746 B |
| EnumerateTagsDynamicMethod | .NET Framework 4.8 | 542.4 ns | 13.83 ns | 40.33 ns | 534.9 ns | 714 B |

<details><summary>Benchmark additions in `ActivityBenchmark`</summary>
<p>

```csharp
private AllocationFreeEnumerator<IEnumerable<KeyValuePair<string, object?>>, KeyValuePair<string, object?>, long>.AllocationFreeForEachDelegate _enumerator;

[GlobalSetup]
public void GlobalSetup()
{
    // ...
    // Build the delegate once, from the runtime type behind TagObjects
    using var activity = CreateActivity();
    _enumerator = AllocationFreeEnumerator<IEnumerable<KeyValuePair<string, object?>>, KeyValuePair<string, object?>, long>
        .BuildAllocationFreeForEachDelegate(activity.DuckCast<IActivity6>().TagObjects.GetType());
}

[Benchmark]
public long EnumerateTags()
{
    using var parent = CreateActivity();
    var parentMock = parent.DuckCast<IActivity6>();
    long count = 0;

    // "Normal" enumeration through the interface
    foreach (var pair in parentMock.TagObjects)
    {
        count++;
    }

    return count;
}

[Benchmark]
public long EnumerateTagsDynamicMethod()
{
    using var parent = CreateActivity();
    var parentMock = parent.DuckCast<IActivity6>();
    long count = 0;

    // Enumeration through the prebuilt delegate, with the counter passed as ref state
    _enumerator(
        parentMock.TagObjects,
        ref count,
        static (ref state, i) =>
        {
            state++;
            return true;
        });
    return count;
}
```

</p>
</details>
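For readers unfamiliar with where the allocation comes from, the following sketch shows the general effect using `List<T>` as a stand-in. It is an assumption that the tags collection behaves similarly; the sketch does not use the PR's `AllocationFreeEnumerator` or any `DynamicMethod`, and the `AllocatedBy` helper is purely illustrative.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class EnumerationDemo
{
    // foreach over the concrete List<T> binds to List<T>.Enumerator,
    // which is a struct, so no enumerator object is boxed.
    public static long CountConcrete(List<KeyValuePair<string, object?>> tags)
    {
        long count = 0;
        foreach (var pair in tags)
        {
            count++;
        }

        return count;
    }

    // foreach over the interface calls IEnumerable<T>.GetEnumerator(), which hands
    // back the struct enumerator boxed as IEnumerator<T>. Newer JITs may remove the box.
    public static long CountViaInterface(IEnumerable<KeyValuePair<string, object?>> tags)
    {
        long count = 0;
        foreach (var pair in tags)
        {
            count++;
        }

        return count;
    }

    // Illustrative helper: bytes allocated on this thread by one call to `action`,
    // measured after a warm-up call so first-call JIT work is excluded.
    public static long AllocatedBy(Func<long> action)
    {
        action();
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Calling `AllocatedBy(() => EnumerationDemo.CountViaInterface(tags))` and the same for `CountConcrete` on a given runtime gives a rough view of the difference; the exact numbers depend on the runtime and JIT, which is consistent with the .NET 10 results in the table above. `GC.GetAllocatedBytesForCurrentThread` needs .NET Core 3.0+ or .NET Framework 4.8.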