feat: implement diagnostic channels for observability by logaretm · Pull Request #3195

feat: implement diagnostic channels for observability by logaretm · Pull Request #3195 · redis/node-redis

AI review requested due to automatic review settings

March 11, 2026 20:08

Qard approved these changes Mar 11, 2026

This was referenced

Mar 13, 2026

Use number | undefined everywhere instead of mixing with number?.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Eliminates repetitive ternary pattern. TracingChannelContextMap maps
channel names to their context types so getTracingChannel infers
the correct type without explicit generics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tead

metrics.ts no longer imports dc directly. All channel acquisition
goes through the exported helpers which handle caching and
undefined checks. dc remains private to tracing.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace traceCommand, traceBatch, traceConnect, traceConnectionWait
with a single trace(CHANNELS.TRACE_*, fn, contextFactory) function.
Channel name resolves the context type via TracingChannelContextMap.

Also adds cache for getTracingChannel to avoid re-acquiring on
every call, matching getChannel's caching pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace separate #subscriptions and #tracingChannels arrays with a
single #unsubscribers list of cleanup closures. Simpler destroy(),
no need to store channel/handler pairs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Inline the single remaining noop as an object literal in the
default #instance. No noop classes or files remain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove duplicate COMMAND_REPLY publish from sendCommand — typed
commands already publish via _executeCommand. Also merge the two
separate COMMAND_REPLY subscribers (pubsub out + streaming) into
a single #subscribeCommandReply method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When the acquire timeout fires, rejectWait() is called so the
TracingChannel error channel fires — APMs see the timeout as an
error instead of an orphaned span that never closes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When OTel is not initialized or disabled, skip instrument registration
and subscriber creation entirely. No noop instruments needed — if
there are no subscribers, channels are never fired.

Removes 139-line noop-meter.ts and all noop fallback branches from
instrument creation helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- COMMAND_REPLY in _executeCommand now passes sanitizeArgs(parser.redisArgs)
  instead of raw args, preventing sensitive values from leaking to subscribers
- Pool connection wait trace promise gets .catch(noop) to prevent
  unhandled rejection if a tracing subscriber throws

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When pool.destroy() is called with tasks still queued, call
rejectWait() on each pending task so the TracingChannel error
channel fires and APM subscribers close their spans cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Tracing test: check isOpen before destroying already-closed client
- OTel E2E tests: rewrite 5 tests that called removed methods
  (resiliencyMetrics.recordClientErrors, streamMetrics.recordStreamLag)
  to publish via channels instead
- Connection closed metric: split subscriber for basic (connection count)
  and advanced (close reason) so tests enabling only one group see
  correct metrics

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The original code always recorded wait time, even for 0ms waits
when a client was available. The TracingChannel refactor only traced
the "no client, must wait" path. Add trace for immediate availability
so the metric is always emitted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…or event

Remove redundant publish(CHANNELS.ERROR) from sendCommand — the
TracingChannel already emits error events for command failures.
Wire OTel resiliency subscriber to commandTC.error for command-level
error counts; CHANNELS.ERROR now only carries cluster/internal errors.

Extract subscribeTC() helper to reduce TracingChannel subscription
boilerplate and #recordError() to share error metric recording logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The previous approach unconditionally dropped all MOVED/ASK errors in
#recordError, including cluster-origin ones that indicate slot migration.
Move the filter to each call site so client-origin redirections are
skipped while cluster-origin ones are still recorded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…tion

internal and origin are separate fields — an error can be non-internal
but still originate from the cluster. Use ctx.origin === 'client' as
Pavel suggested to correctly identify client-origin redirections.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Short-circuit with noop stubs when enabled is false, so metrics are not
recorded even when meterProvider and enabledMetricGroups are provided.
Update the disabled-metrics test to supply those options, verifying the
guard works rather than passing vacuously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Redis replies QUEUED for each command inside MULTI before EXEC runs,
so per-command traces resolve as successful even when EXEC fails. Only
the batch-level trace reflects the real outcome. Remove per-command
tracing from _executeMulti and keep only the TRACE_BATCH span.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>