feat: add e2e workspace build duration metric by sreya · Pull Request #21739 · coder/coder
Adds coderd_template_workspace_build_duration_seconds histogram that tracks the full duration from workspace build creation to agent ready. This captures the complete user-perceived build time including provisioning and agent startup. The metric is emitted when the agent reports ready/error/timeout via the lifecycle API, ensuring each build is counted exactly once per replica.

Labels: template_name, organization_name, workspace_transition, status

Fixes #21621
…tric
- Add 'prebuild' label to distinguish prebuild creation from user builds
- Emit metric only when all agents are ready (multi-agent workspaces)
- Use worst status across all agents (error > timeout > success)
- Use MAX(ready_at) for accurate duration calculation
- Add EnableOpenMetrics to promhttp.HandlerOpts for protobuf scraping
- Remove debug logging from lifecycle.go
Removes the self-join by querying workspace_resources directly instead of starting from workspace_agents. The agent's ResourceID is already available at the call site, making this the more efficient approach.
Makes WorkspaceBuildDurationHistogram injectable on LifecycleAPI so tests can use a per-test prometheus.Registry. Each metric-related test now gathers metrics and asserts:
- Correct label values (template_name, org, transition, status, is_prebuild)
- Exactly one observation with positive duration
- No observations when AllAgentsReady is false
Replace custom helper functions with the existing promhelp package (coderd/coderdtest/promhelp) which is the codebase standard for Prometheus metric testing. Each test now uses:
- promhelp.HistogramValue() to validate labels, sample count, and sum
- promhelp.MetricValue() to assert metric absence
- Per-test prometheus.Registry for isolation
Use fixed buildCreatedAt and agentReadyAt times (90s apart) so tests can assert the exact GetSampleSum() rather than just > 0.
Not required for native histograms - protobuf scraping is controlled by Prometheus scrape_protocols config, not the server-side handler.
Replace the global WorkspaceBuildDurationSeconds var with NewBuildDurationHistogram(reg) so tests use the real histogram definition (same buckets, native histogram config, labels) instead of a separate test-only histogram. The histogram is created in coderd.go when Prometheus is enabled, stored on the API struct, and threaded through to the agent API via Options.
- Replace package-level var with LifecycleMetrics struct and NewLifecycleMetrics constructor
- Make emitBuildDurationMetric a receiver on LifecycleAPI
- Unexport emitMetricsOnce
- Thread LifecycleMetrics through coderd -> agentapi Options
- Tests use NewLifecycleMetrics with per-test registries
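A rough sketch of the shape this refactor describes: a metrics struct built by a constructor, an emit method on the API receiver, and an unexported once-guard. It assumes `emitMetricsOnce` is a `sync.Once`, which the commit message does not state, and every type and field below is a simplified, hypothetical stand-in (an in-memory slice replaces the Prometheus histogram).

```go
package main

import (
	"fmt"
	"sync"
)

// lifecycleMetrics is a hypothetical stand-in for LifecycleMetrics; it stores
// observations in memory instead of a Prometheus histogram.
type lifecycleMetrics struct {
	mu       sync.Mutex
	observed []float64
}

func newLifecycleMetrics() *lifecycleMetrics {
	return &lifecycleMetrics{}
}

func (m *lifecycleMetrics) observe(seconds float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.observed = append(m.observed, seconds)
}

func (m *lifecycleMetrics) count() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.observed)
}

// lifecycleAPI is an illustration only. The once-guard is assumed to be a
// sync.Once scoped to this API instance.
type lifecycleAPI struct {
	metrics         *lifecycleMetrics // nil when metrics are disabled
	emitMetricsOnce sync.Once
}

// emitBuildDurationMetric records the duration at most once per API instance,
// no matter how many lifecycle reports arrive.
func (a *lifecycleAPI) emitBuildDurationMetric(seconds float64) {
	if a.metrics == nil {
		return
	}
	a.emitMetricsOnce.Do(func() {
		a.metrics.observe(seconds)
	})
}

func main() {
	api := &lifecycleAPI{metrics: newLifecycleMetrics()}

	// Simulate several concurrent lifecycle reports for the same build.
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			api.emitBuildDurationMetric(90)
		}()
	}
	wg.Wait()
	fmt.Println(api.metrics.count())
}
```

Keeping the guard on the receiver rather than in a package-level variable means each test, with its own API instance and metrics, starts from a clean state.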
sreya deleted the feat/workspace-build-e2e-metric branch