Add OTel metrics dual export alongside Prometheus by shuheiktgw · Pull Request #6262 · quickwit-oss/quickwit

@shuheiktgw

…e ops)

OTel Gauge does not support relative operations (inc/dec/add/sub), which
forced the dual-write layer to read back from Prometheus after mutation.
This splits the type into two semantically correct variants:

- IntGauge: absolute state via set()/get(), backed by OTel Gauge<i64>
- IntUpDownCounter: relative deltas via inc()/dec()/add()/sub(), backed
  by OTel UpDownCounter<i64>

Both still use prometheus::IntGauge on the Prometheus side.

Also renames GaugeGuard → UpDownCounterGuard and
OwnedGaugeGuard → OwnedUpDownCounterGuard to match the new semantics.
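
A minimal sketch of the split described above, using only the standard
library. The two atomics stand in for the Prometheus and OTel backends;
the field names, the otel_total() helper, and the guard constructor are
hypothetical and are not the actual quickwit-common API. It shows that
set() and add() can each write to both sides without reading one
backend back.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Absolute state: the caller always supplies the full value.
#[derive(Default)]
pub struct IntGauge {
    prometheus_side: AtomicI64, // stand-in for prometheus::IntGauge
    otel_side: AtomicI64,       // stand-in for an OTel Gauge<i64>
}

impl IntGauge {
    pub fn set(&self, value: i64) {
        self.prometheus_side.store(value, Ordering::Relaxed);
        self.otel_side.store(value, Ordering::Relaxed);
    }

    pub fn get(&self) -> i64 {
        self.prometheus_side.load(Ordering::Relaxed)
    }
}

/// Relative deltas: the same delta is forwarded to both sides.
#[derive(Default)]
pub struct IntUpDownCounter {
    prometheus_side: AtomicI64, // stand-in for prometheus::IntGauge
    otel_side: AtomicI64,       // stand-in for an OTel UpDownCounter<i64>
}

impl IntUpDownCounter {
    pub fn add(&self, delta: i64) {
        self.prometheus_side.fetch_add(delta, Ordering::Relaxed);
        self.otel_side.fetch_add(delta, Ordering::Relaxed);
    }

    pub fn sub(&self, delta: i64) {
        self.add(-delta);
    }

    pub fn inc(&self) {
        self.add(1);
    }

    pub fn dec(&self) {
        self.add(-1);
    }

    /// Hypothetical helper so the sketch can be inspected in tests.
    pub fn otel_total(&self) -> i64 {
        self.otel_side.load(Ordering::Relaxed)
    }
}

/// RAII guard: adds `delta` on creation and subtracts it on drop.
pub struct UpDownCounterGuard<'a> {
    counter: &'a IntUpDownCounter,
    delta: i64,
}

impl<'a> UpDownCounterGuard<'a> {
    pub fn new(counter: &'a IntUpDownCounter, delta: i64) -> Self {
        counter.add(delta);
        Self { counter, delta }
    }
}

impl Drop for UpDownCounterGuard<'_> {
    fn drop(&mut self) {
        self.counter.sub(self.delta);
    }
}
```

Because the guard only ever applies deltas, it fits the up-down counter
and not the absolute gauge, which is presumably why the guards were
renamed.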

The no-op set() was a temporary shim after the IntGauge/IntUpDownCounter
split. This removes it and fixes the call sites that relied on it:

- WAL: remove wal field from InFlightDataGauges, add
  wal_memory_allocated_bytes IntGauge to INGEST_V2_METRICS
- Search: split pending (IntGauge) and ongoing (IntUpDownCounter) into
  standalone metrics instead of sharing a vec
- Storage: replace fd_cache set() calls with inc/dec based on push/pop
  return values
- Indexing: change ongoing_merge_operations to IntGauge
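
The Storage item above can be sketched as follows. This is an
illustration under assumptions: FdCache, push_tracked, and pop_tracked
are hypothetical names, and the cache is a simple bounded FIFO, not the
real fd cache. The point is that the counter is adjusted from the
return values of push/pop instead of being overwritten with set().

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicI64, Ordering};

/// Stand-in for an up-down counter that only accepts deltas.
#[derive(Default)]
pub struct UpDownCounter(AtomicI64);

impl UpDownCounter {
    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn dec(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }

    pub fn current(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical bounded FIFO cache of (path, file descriptor) pairs.
pub struct FdCache {
    capacity: usize,
    entries: VecDeque<(String, u32)>,
}

impl FdCache {
    pub fn new(capacity: usize) -> Self {
        // Clamp to at least one slot so a full cache always evicts.
        Self { capacity: capacity.max(1), entries: VecDeque::new() }
    }

    /// Inserts an entry and returns the evicted one, if the cache was full.
    pub fn push(&mut self, path: &str, fd: u32) -> Option<(String, u32)> {
        let evicted = if self.entries.len() >= self.capacity {
            self.entries.pop_front()
        } else {
            None
        };
        self.entries.push_back((path.to_string(), fd));
        evicted
    }

    /// Removes and returns the oldest entry, if any.
    pub fn pop(&mut self) -> Option<(String, u32)> {
        self.entries.pop_front()
    }
}

/// Pushes an entry and keeps the counter in sync via deltas only.
pub fn push_tracked(cache: &mut FdCache, counter: &UpDownCounter, path: &str, fd: u32) {
    if cache.push(path, fd).is_some() {
        counter.dec(); // an older entry was evicted
    }
    counter.inc(); // the new entry is now cached
}

/// Pops an entry and decrements the counter only if something was removed.
pub fn pop_tracked(cache: &mut FdCache, counter: &UpDownCounter) -> Option<(String, u32)> {
    let popped = cache.pop();
    if popped.is_some() {
        counter.dec();
    }
    popped
}
```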

Replace Vec<KeyValue> with Arc<[KeyValue]> for metric attributes and
Vec<String> with Arc<[Key]> for label names to reduce heap allocations
on the hot path when recording metrics with labels.
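
A small sketch of why Arc<[T]> helps here. KeyValue and LabeledCounter
below are simplified, hypothetical stand-ins, not the opentelemetry or
quickwit types. Cloning a Vec copies every element into a new heap
allocation, while cloning an Arc<[T]> only bumps a reference count.

```rust
use std::sync::Arc;

/// Simplified stand-in for an attribute key/value pair.
#[derive(Clone, Debug, PartialEq)]
pub struct KeyValue {
    pub key: &'static str,
    pub value: String,
}

/// Hypothetical metric handle that carries its attributes.
pub struct LabeledCounter {
    attributes: Arc<[KeyValue]>,
}

impl LabeledCounter {
    pub fn new(attributes: Vec<KeyValue>) -> Self {
        // The Vec is converted once, when the handle is built.
        Self { attributes: attributes.into() }
    }

    /// Hands out the shared slice; no KeyValue is copied.
    pub fn attributes(&self) -> Arc<[KeyValue]> {
        Arc::clone(&self.attributes)
    }
}
```

Two calls to attributes() return pointers to the same allocation, which
can be checked with Arc::ptr_eq.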

…-dual-export

# Conflicts:
#	quickwit/Cargo.lock
#	quickwit/quickwit-actors/src/mailbox.rs
#	quickwit/quickwit-common/Cargo.toml
#	quickwit/quickwit-common/src/runtimes.rs
#	quickwit/quickwit-common/src/stream_utils.rs
#	quickwit/quickwit-common/src/thread_pool.rs
#	quickwit/quickwit-metastore/src/metastore/postgres/metrics.rs

@shuheiktgw

Add FNV-hash-keyed caches to IntCounterVec, IntGaugeVec,
IntUpDownCounterVec, and HistogramVec, following the same strategy
used by the prometheus crate internally. On cache hit, with_label_values
now returns a clone of the cached metric (Arc refcount bumps only)
instead of re-allocating OTel attributes on every call.
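
The caching strategy can be sketched as below, assuming a 64-bit FNV-1a
hash over the label values and a map keyed by that hash. The types are
simplified stand-ins with hypothetical names, a single RwLock guards the
map, and hash collisions are ignored for brevity.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, RwLock};

/// 64-bit FNV-1a over all label values. A 0xff byte, which never occurs
/// in valid UTF-8, is hashed after each value as a separator.
fn fnv1a_hash(label_values: &[&str]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for value in label_values {
        for byte in value.bytes().chain(std::iter::once(0xff)) {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(PRIME);
        }
    }
    hash
}

/// Cheap-to-clone counter handle: cloning only bumps two refcounts.
#[derive(Clone)]
pub struct IntCounter {
    value: Arc<AtomicU64>,
    attributes: Arc<[String]>,
}

impl IntCounter {
    pub fn inc(&self) {
        self.value.fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.value.load(Ordering::Relaxed)
    }

    pub fn attributes(&self) -> &[String] {
        &self.attributes
    }
}

pub struct IntCounterVec {
    children: RwLock<HashMap<u64, IntCounter>>,
}

impl IntCounterVec {
    pub fn new() -> Self {
        Self { children: RwLock::new(HashMap::new()) }
    }

    pub fn with_label_values(&self, label_values: &[&str]) -> IntCounter {
        let key = fnv1a_hash(label_values);
        {
            // Fast path: cache hit under the read lock, no allocation.
            let children = self.children.read().unwrap();
            if let Some(counter) = children.get(&key) {
                return counter.clone();
            }
        }
        // Slow path: allocate the attributes once and cache the handle.
        let mut children = self.children.write().unwrap();
        children
            .entry(key)
            .or_insert_with(|| IntCounter {
                value: Arc::new(AtomicU64::new(0)),
                attributes: label_values.iter().map(|v| v.to_string()).collect(),
            })
            .clone()
    }
}
```

Repeated calls with the same label values return handles that share one
underlying counter, so increments through either handle are visible
through the other.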

@shuheiktgw

Reduces lock contention in with_label_values by switching from a
single RwLock guarding the entire HashMap to DashMap's sharded
concurrent map.
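
To illustrate the idea behind sharding, here is a standard-library-only
sketch. It is not DashMap's implementation and does not use its API;
ShardedMap and get_or_insert_with are hypothetical names. Each key maps
to one of several independently locked HashMaps, so callers touching
different shards do not wait on each other.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// A map split into independently locked shards.
pub struct ShardedMap<V> {
    shards: Vec<RwLock<HashMap<u64, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    pub fn new(shard_count: usize) -> Self {
        // Always keep at least one shard so indexing cannot fail.
        let shards = (0..shard_count.max(1))
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard for a key; keys are assumed to be hashes already.
    fn shard_for(&self, key: u64) -> &RwLock<HashMap<u64, V>> {
        let index = (key % self.shards.len() as u64) as usize;
        &self.shards[index]
    }

    /// Returns the cached value, creating it with `make` on a miss.
    /// Only the shard that owns `key` is locked.
    pub fn get_or_insert_with(&self, key: u64, make: impl FnOnce() -> V) -> V {
        let shard = self.shard_for(key);
        {
            let map = shard.read().unwrap();
            if let Some(value) = map.get(&key) {
                return value.clone();
            }
        }
        shard.write().unwrap().entry(key).or_insert_with(make).clone()
    }

    /// Total number of entries across all shards.
    pub fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}
```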