Add benchmark for two scopes by andrewlock · Pull Request #7869 · DataDog/dd-trace-dotnet

@andrewlock

## Summary of changes

Replace `ArraySegment<Span>` with `SpanCollection`

## Reason for change

`ArraySegment<Span>` is a wrapper around an array, and by design, always
requires allocating an array.

However, if we look at the [distribution of
traces](https://app.datadoghq.com/notebook/13270656/partial-flush-worth-it-or-not?computational_onboarding=true&fullscreen_end_ts=1760111663000&fullscreen_paused=true&fullscreen_refresh_mode=paused&fullscreen_start_ts=1759829808000&fullscreen_widget=3c8idtpl)
received for processing, we see that 50% of traces only have a single
span.

Consequently, `SpanCollection` takes a similar approach [to
`StringValues`](https://andrewlock.net/a-brief-look-at-stringvalues/),
in which it is an abstraction around _either_ a single `Span` _or_ a
`Span[]`. This means we can avoid allocating the array entirely until we
need to.
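
As a rough illustration of the idea (not the PR's actual code), a single-field "one or many" collection might look like the sketch below. The `Span` class here is a hypothetical stand-in, and the member names, the `Add` method, and the initial capacity of 4 are assumptions made for the example.

```csharp
using System;

// Hypothetical stand-in for the tracer's Span type; only here so the sketch compiles.
public sealed class Span
{
    public Span(string name) { Name = name; }
    public string Name { get; }
}

// Illustrative only: names and members are assumptions, not the PR's implementation.
public readonly struct SpanCollection
{
    // Holds null (empty), a single Span, or a Span[], all in the same field.
    private readonly object _values;

    public SpanCollection(Span span)
    {
        _values = span;
        Count = 1;
    }

    public SpanCollection(Span[] spans, int count)
    {
        _values = spans;
        Count = count;
    }

    public int Count { get; }

    public Span this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Count)
            {
                throw new ArgumentOutOfRangeException(nameof(index));
            }

            // Single-span case: no array was ever allocated.
            if (_values is Span single)
            {
                return single;
            }

            return ((Span[])_values)[index];
        }
    }

    // Returns a collection that also contains `span`.
    // An array is only allocated when the second span arrives.
    // Note: in the array case the backing array is shared with the original value.
    public SpanCollection Add(Span span)
    {
        if (_values == null)
        {
            return new SpanCollection(span);
        }

        if (_values is Span single)
        {
            var grown = new Span[4]; // assumed initial capacity
            grown[0] = single;
            grown[1] = span;
            return new SpanCollection(grown, 2);
        }

        var array = (Span[])_values;
        if (Count == array.Length)
        {
            Array.Resize(ref array, Math.Max(4, array.Length * 2));
        }

        array[Count] = span;
        return new SpanCollection(array, Count + 1);
    }
}
```

With this shape, `default(SpanCollection).Add(rootSpan)` stores the span directly, and only a later `Add` pays for an array.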

For small traces (with a single span), this saves 56 bytes per
scope/trace, or ~8% of the basic trace size. For larger traces,
`SpanCollection` and `ArraySegment<Span>` are essentially identical, so
the allocation is the same.

Because `SpanCollection` (and `ArraySegment<Span>`, for that matter) are
both `readonly struct`s, I also added `in` to the signatures that take
them (both structs are the same size, and larger than a pointer).
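
For readers less familiar with `in`, here is a minimal sketch of the pattern. `Batch` and `BatchWriter` are hypothetical names invented for this example; they stand in for a two-field `readonly struct` such as `SpanCollection` and a method that consumes it.

```csharp
using System;

// Hypothetical two-field readonly struct (a reference plus an int), larger than a pointer.
public readonly struct Batch
{
    public Batch(object values, int count)
    {
        Values = values;
        Count = count;
    }

    public object Values { get; }
    public int Count { get; }
}

public static class BatchWriter
{
    // `in` passes a read-only reference instead of copying both fields.
    // Because Batch is a readonly struct, reading Count through that reference
    // does not require the compiler to make a defensive copy.
    public static int CountOf(in Batch batch) => batch.Count;
}
```

Callers can write either `BatchWriter.CountOf(in batch)` or `BatchWriter.CountOf(batch)`; the `in` keyword at the call site is optional.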

> The only practical way I could see to actually make `SpanCollection`
pointer-sized is to remove the `Count` field. But that means we
_either_ need to allocate an "ArraySegment" wrapper around the Array, to
hold the count, _or_ we always allocate an array of the "correct"
length. I explored the latter in a separate PR, using an array pool
during the "builder" step, and then allocating an array of the correct
size subsequently, but the allocation gains were marginal, and it didn't

## Implementation details

The changes are essentially:
- `SpanCollection` holds _either_ a `Span` _or_ a `Span[]` (_or_ `null`)
- We store this in the same field (much like `StringValues` does), as it
reduces the size of the struct, which brings small perf benefits
- Pass the `SpanCollection` around via `in` to reduce the chance of defensive copies
- Fix/replace uses of Moq, which requires different usage for `in`/`ref`
parameters and can't provide the same functionality as a stub

## Test coverage

Added some unit tests for the implementation, but the important thing is
the benchmarks. We see a reduction in allocations of roughly 8-8.5% for
create scope and about 9.7% for create span:


| Benchmark | Base Allocated | Diff Allocated | Change | Change % |
|:----------|-----------:|-----------:|--------:|--------:|
| Benchmarks.Trace.SpanBenchmark.StartFinishScope&#8209;net6.0 | 696 B | 640 B | -56 B | -8.05% |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope&#8209;netcoreapp3.1 | 696 B | 640 B | -56 B | -8.05% |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope&#8209;net472 | 658 B | 602 B | -56 B | -8.51% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan&#8209;net472 | 578 B | 522 B | -56 B | -9.69% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan&#8209;net6.0 | 576 B | 520 B | -56 B | -9.72% |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan&#8209;netcoreapp3.1 | 576 B | 520 B | -56 B | -9.72% |

Other benchmarks which create a single span see similar improvements.

Note that the _two_ scope benchmark (added in #7869) shows essentially
no change (as expected).

## Other details

I tried several variations on this approach:
- Keep separate fields for `Span` and `Span[]`
  - Nicer as it's type safe again, but increases allocations (by a
    pointer) so prob not worth it as this is hot path
- Don't pass via `in`
  - Slows things down slightly
- Remove the `Count` field (and make `SpanCollection` pointer-sized)
  - Requires allocating exact-sized arrays, so not practical to handle
    array growth
- Same as above, but use an array pool for the build stage, and then
  only allocate the fixed size on close
  - Does show improvements in allocation, but more complexity. May be
    worth considering (I'll create a separate PR for it)
- Don't have a builder, just use an array pool + a Count field, and rely
  on reliably cleaning up the pooling,
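
To make the "array pool for the build stage" variation more concrete, here is a rough sketch of that general technique using `System.Buffers.ArrayPool<T>`. It is not the code from the PR or the follow-up; `PooledBuilder<T>`, its members, and the minimum rent size of 4 are assumptions for illustration.

```csharp
using System;
using System.Buffers;

// Hypothetical builder: grows in pooled buffers, allocates one exact-sized array on Close().
public sealed class PooledBuilder<T>
{
    private T[] _buffer; // rented lazily on first Add; null when nothing is rented
    private int _count;

    public int Count => _count;

    public void Add(T item)
    {
        if (_buffer == null)
        {
            // The pool may hand back a larger array than requested.
            _buffer = ArrayPool<T>.Shared.Rent(4);
        }
        else if (_count == _buffer.Length)
        {
            // Grow by renting a bigger buffer and returning the old one.
            var larger = ArrayPool<T>.Shared.Rent(_buffer.Length * 2);
            Array.Copy(_buffer, larger, _count);
            ArrayPool<T>.Shared.Return(_buffer, clearArray: true);
            _buffer = larger;
        }

        _buffer[_count++] = item;
    }

    // Copies the items into an exact-sized array and returns the rented buffer to the pool.
    public T[] Close()
    {
        var result = new T[_count];
        if (_buffer != null)
        {
            Array.Copy(_buffer, result, _count);
            ArrayPool<T>.Shared.Return(_buffer, clearArray: true);
            _buffer = null;
        }

        _count = 0;
        return result;
    }
}
```

The trade-off is the one noted in the list: the only long-lived allocation is the exact-sized array, but every code path has to reach `Close()` so that rented buffers are reliably returned.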

https://datadoghq.atlassian.net/browse/LANGPLAT-841