Rework and publish metric benchmarks by jack-berg · Pull Request #8000 · open-telemetry/opentelemetry-java
As mentioned in #7986, I've been working through some ideas to improve the performance of the metric SDK under high contention.
To illustrate the impact of these changes, I've reworked MetricsBenchmark to include the dimensions that affect record performance. The dimensions that play some role are:
- Instrument type / aggregation (5): counter + sum, up down counter + sum, gauge + last value, histogram + explicit histogram, histogram + base2 expo histogram
- instrument value type (2): double, long
- memory mode (2): immutable, reusable
- temporality (2): cumulative, delta
- exemplars recorded (2): true, false
- threads (2): 1, 4
- cardinality (2): 1, 100
That forms 2 * 2 * 2 * 2 * 2 * 2 * 5 = 320 unique test cases, which is impractical. So I narrowed it down to the most meaningful dimensions:
- eliminated instrument value type: while long vs. double matters somewhat, it's not much
- eliminated memory mode: immutable vs. reusable mostly matters for the collect path
- eliminated exemplars: they can impact performance, but are less important than other factors
With these eliminated, we're down to 2 * 2 * 2 * 5 = 40 test cases, which is more reasonable.
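
The counts above are just the product of the number of values along each dimension. A minimal sketch of that arithmetic follows. The class and method names (`TestCaseCount`, `product`, `allDimensions`, `reducedDimensions`) are hypothetical and not part of this PR:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: counts benchmark parameter combinations.
class TestCaseCount {

  // Multiplies the number of values in each dimension to get the number of combinations.
  static int product(Map<String, Integer> dimensions) {
    int total = 1;
    for (int size : dimensions.values()) {
      total *= size;
    }
    return total;
  }

  // All seven dimensions listed in the description, with their value counts.
  static Map<String, Integer> allDimensions() {
    Map<String, Integer> dims = new LinkedHashMap<>();
    dims.put("instrument type / aggregation", 5);
    dims.put("instrument value type", 2);
    dims.put("memory mode", 2);
    dims.put("temporality", 2);
    dims.put("exemplars recorded", 2);
    dims.put("threads", 2);
    dims.put("cardinality", 2);
    return dims;
  }

  // The same dimensions after dropping value type, memory mode, and exemplars.
  static Map<String, Integer> reducedDimensions() {
    Map<String, Integer> dims = allDimensions();
    dims.remove("instrument value type");
    dims.remove("memory mode");
    dims.remove("exemplars recorded");
    return dims;
  }

  public static void main(String[] args) {
    System.out.println(product(allDimensions()));     // 320
    System.out.println(product(reducedDimensions())); // 40
  }
}
```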
I'm also using this as an opportunity to finish what @tylerbenson started and get into the routine of running benchmarks on each change on dedicated hardware, and publishing the results on https://open-telemetry.github.io/opentelemetry-java/benchmarks/
The unfinished problem was that the benchmarks in this repo are micro benchmarks. They're not very meaningful for end users and may even do more harm than good. What we need is a curated set of somewhat high level benchmarks, intentionally built to demonstrate / report on the types of performance characteristics that matter to end users.
This revamped MetricRecordBenchmark is the first of these. I will follow up with dedicated benchmarks for other areas:
- Log SDK record and export
- Trace SDK record and export
- Metric SDK export
- Noop implementation
For reference, here are the results of the revamped MetricRecordBenchmark on my machine:
Benchmark (aggregationTemporality) (cardinality) (instrumentTypeAndAggregation) Mode Cnt Score Error Units
MetricRecordBenchmark.threads1 DELTA 1 COUNTER_SUM thrpt 5 13414.208 ± 243.504 ops/s
MetricRecordBenchmark.threads1 DELTA 1 UP_DOWN_COUNTER_SUM thrpt 5 12276.148 ± 105.900 ops/s
MetricRecordBenchmark.threads1 DELTA 1 GAUGE_LAST_VALUE thrpt 5 10896.580 ± 705.898 ops/s
MetricRecordBenchmark.threads1 DELTA 1 HISTOGRAM_EXPLICIT thrpt 5 6642.787 ± 674.574 ops/s
MetricRecordBenchmark.threads1 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 3651.887 ± 304.134 ops/s
MetricRecordBenchmark.threads1 DELTA 100 COUNTER_SUM thrpt 5 8359.025 ± 777.598 ops/s
MetricRecordBenchmark.threads1 DELTA 100 UP_DOWN_COUNTER_SUM thrpt 5 9247.253 ± 423.551 ops/s
MetricRecordBenchmark.threads1 DELTA 100 GAUGE_LAST_VALUE thrpt 5 9165.700 ± 143.755 ops/s
MetricRecordBenchmark.threads1 DELTA 100 HISTOGRAM_EXPLICIT thrpt 5 7300.896 ± 684.395 ops/s
MetricRecordBenchmark.threads1 DELTA 100 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 3858.246 ± 34.989 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 1 COUNTER_SUM thrpt 5 12433.135 ± 148.315 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 1 UP_DOWN_COUNTER_SUM thrpt 5 13341.423 ± 242.611 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 1 GAUGE_LAST_VALUE thrpt 5 10628.592 ± 101.145 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 1 HISTOGRAM_EXPLICIT thrpt 5 6895.783 ± 740.681 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 4087.396 ± 435.895 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 100 COUNTER_SUM thrpt 5 10402.076 ± 240.933 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 100 UP_DOWN_COUNTER_SUM thrpt 5 9199.368 ± 107.627 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 100 GAUGE_LAST_VALUE thrpt 5 9056.580 ± 297.773 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 100 HISTOGRAM_EXPLICIT thrpt 5 7475.743 ± 979.090 ops/s
MetricRecordBenchmark.threads1 CUMULATIVE 100 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 3836.227 ± 131.765 ops/s
MetricRecordBenchmark.threads4 DELTA 1 COUNTER_SUM thrpt 5 1577.822 ± 219.796 ops/s
MetricRecordBenchmark.threads4 DELTA 1 UP_DOWN_COUNTER_SUM thrpt 5 1615.582 ± 335.284 ops/s
MetricRecordBenchmark.threads4 DELTA 1 GAUGE_LAST_VALUE thrpt 5 1208.008 ± 165.999 ops/s
MetricRecordBenchmark.threads4 DELTA 1 HISTOGRAM_EXPLICIT thrpt 5 904.243 ± 22.615 ops/s
MetricRecordBenchmark.threads4 DELTA 1 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 869.229 ± 31.214 ops/s
MetricRecordBenchmark.threads4 DELTA 100 COUNTER_SUM thrpt 5 1725.486 ± 240.360 ops/s
MetricRecordBenchmark.threads4 DELTA 100 UP_DOWN_COUNTER_SUM thrpt 5 1422.319 ± 594.337 ops/s
MetricRecordBenchmark.threads4 DELTA 100 GAUGE_LAST_VALUE thrpt 5 1560.890 ± 654.561 ops/s
MetricRecordBenchmark.threads4 DELTA 100 HISTOGRAM_EXPLICIT thrpt 5 1587.582 ± 458.715 ops/s
MetricRecordBenchmark.threads4 DELTA 100 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 1688.229 ± 181.653 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 1 COUNTER_SUM thrpt 5 1540.747 ± 137.303 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 1 UP_DOWN_COUNTER_SUM thrpt 5 1429.698 ± 220.415 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 1 GAUGE_LAST_VALUE thrpt 5 1215.367 ± 546.045 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 1 HISTOGRAM_EXPLICIT thrpt 5 1237.215 ± 18.528 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 1 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 837.980 ± 23.871 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 100 COUNTER_SUM thrpt 5 1602.628 ± 813.536 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 100 UP_DOWN_COUNTER_SUM thrpt 5 1717.663 ± 577.817 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 100 GAUGE_LAST_VALUE thrpt 5 1565.824 ± 298.550 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 100 HISTOGRAM_EXPLICIT thrpt 5 1352.174 ± 594.439 ops/s
MetricRecordBenchmark.threads4 CUMULATIVE 100 HISTOGRAM_BASE2_EXPONENTIAL thrpt 5 1465.394 ± 313.072 ops/s
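
To compare individual cases from the table, for example the same instrument at 1 and 4 threads, the rows can be parsed and their scores divided. This is a rough sketch that assumes the whitespace-separated, ten-column row layout shown above. `JmhRow` and its methods are hypothetical names, not part of JMH or this PR:

```java
// Hypothetical helper, not part of the PR or of JMH: parses one result row from the table above.
class JmhRow {
  final String benchmark;
  final String temporality;
  final int cardinality;
  final String instrument;
  final double score;
  final double error;

  JmhRow(String benchmark, String temporality, int cardinality,
         String instrument, double score, double error) {
    this.benchmark = benchmark;
    this.temporality = temporality;
    this.cardinality = cardinality;
    this.instrument = instrument;
    this.score = score;
    this.error = error;
  }

  // Assumed column order: benchmark, temporality, cardinality, instrument,
  // mode, count, score, plus-minus sign, error, units.
  static JmhRow parse(String line) {
    String[] t = line.trim().split("\\s+");
    if (t.length != 10) {
      throw new IllegalArgumentException("Expected 10 columns but got " + t.length);
    }
    return new JmhRow(t[0], t[1], Integer.parseInt(t[2]), t[3],
        Double.parseDouble(t[6]), Double.parseDouble(t[8]));
  }

  // Ratio of this row's score to a baseline row's score.
  double ratioTo(JmhRow baseline) {
    return score / baseline.score;
  }

  public static void main(String[] args) {
    // Two rows copied from the table; \u00b1 is the plus-minus sign.
    JmhRow one = parse(
        "MetricRecordBenchmark.threads1 DELTA 1 COUNTER_SUM thrpt 5 13414.208 \u00b1 243.504 ops/s");
    JmhRow four = parse(
        "MetricRecordBenchmark.threads4 DELTA 1 COUNTER_SUM thrpt 5 1577.822 \u00b1 219.796 ops/s");
    System.out.println(four.instrument + " threads4/threads1 score ratio: " + four.ratioTo(one));
  }
}
```

The ratio is only a convenience for reading the table. It does not account for the reported error margins, which are large for several of the 4-thread rows.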