Live URLs:
- Production app frontend (live website): https://stickermule-frontend-386055911814.us-central1.run.app/stores
- Production app backend: https://stickermule-app-386055911814.us-central1.run.app
- Prometheus: http://35.225.111.249:9090
- Grafana: http://35.225.111.249:3000
- Jaeger: http://35.225.111.249:16686
- Redis: http://35.225.111.249:6379
- All tests passing (13/13)
- CI/CD operational (GitHub Actions)
- Full observability: Metrics + Logs + Traces
CURRENT PROJECT STRUCTURE
StickerMulePractice/
├── main.go (Handler struct, CRUD resolvers, Prometheus middleware)
├── main_test.go (13 test cases, all passing)
├── go.mod
├── go.sum
├── docker-compose.yml (Prometheus + Grafana stack)
├── traffic-generator.sh (Load testing script)
├── observability/
│ └── prometheus.yml (Prometheus scrape config)
├── .env (DATABASE_URL - not in Git)
├── .gitignore
├── README.md
└── PROGRESS.md
Testing
Running Tests
# Run all tests
go test -v
# Run with coverage
go test -cover
# Generate coverage report
go test -coverprofile=coverage.out
go tool cover -html=coverage.out
Test Strategy
- Unit tests with mocked database dependencies using sqlmock
- HTTP handler tests using httptest
- Comprehensive error path coverage (not found, database errors, invalid input)
- 13 comprehensive test cases
- 95%+ coverage on business logic handlers
- Test-driven development approach
- Mocked database dependencies for fast, isolated tests
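The handler-testing approach above can be sketched with the standard library alone. This is a minimal sketch, not the project's actual test code: `healthHandler` and `statusFor` are hypothetical names, and the sqlmock side is left out because it needs a third-party module.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler is a hypothetical stand-in for a handler defined in main.go.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprint(w, "ok")
}

// statusFor runs one request through the handler in memory and returns the
// recorded status code and body. No network listener is involved.
func statusFor(method, path string) (int, string) {
	req := httptest.NewRequest(method, path, nil)
	rec := httptest.NewRecorder()
	healthHandler(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := statusFor(http.MethodGet, "/health")
	fmt.Println(code, body)

	// Error path: a method the handler does not accept.
	code, _ = statusFor(http.MethodPost, "/health")
	fmt.Println(code)
}
```

In a real `main_test.go` the same calls would sit inside a `func TestXxx(t *testing.T)` and compare `rec.Code` against the expected status with `t.Errorf`.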
Features
GraphQL API
- Queries:
  - store(id: Int!) - Fetch store by ID
- Mutations:
  - createStore(name: String!, revenue: Float!, active: Boolean) - Create new store
  - updateStore(id: Int!, name: String, revenue: Float, total_orders: Int, active: Boolean) - Update existing store
  - deleteStore(id: Int!) - Delete store
Example Queries
Create a store:
mutation { createStore(name: "My Store", revenue: 50000, active: true) { id name revenue } }
Update a store:
mutation { updateStore(id: 1, revenue: 75000) { id name revenue } }
Delete a store:
mutation { deleteStore(id: 1) { success id } }
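To send one of these mutations from Go, the query string is typically wrapped in a JSON envelope and posted to the GraphQL endpoint. The sketch below only builds the request and does not send it. The `/graphql` path and the `localhost:8080` host are assumptions (the README does not state the endpoint path), and `graphQLBody` and `newGraphQLRequest` are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// graphQLBody wraps a query or mutation in the conventional
// GraphQL-over-HTTP JSON envelope: {"query": "..."}.
func graphQLBody(query string) ([]byte, error) {
	return json.Marshal(map[string]string{"query": query})
}

// newGraphQLRequest builds a POST request carrying the query as JSON.
func newGraphQLRequest(endpoint, query string) (*http.Request, error) {
	body, err := graphQLBody(query)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumed endpoint; substitute the real backend URL and path.
	req, err := newGraphQLRequest(
		"http://localhost:8080/graphql",
		`mutation { deleteStore(id: 1) { success id } }`,
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Content-Type"))
	// http.DefaultClient.Do(req) would send it.
}
```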
Prometheus Metrics
- Implemented HTTP instrumentation with Counter and Histogram metrics
- Automated middleware
- Metrics exposed at /metrics endpoint for Prometheus scraping
- RED Method Coverage:
  - Rate: http_requests_total (by method, path, status)
  - Errors: Status code tracking (200, 404, 500, etc.)
  - Duration: http_request_duration_seconds histogram with percentiles
- Created responseWriter wrapper to capture response status codes
- Performance insights: /health ~0.07ms, /store ~143ms (with database latency visible)
Key Learning: Middleware pattern enables instrumentation without touching business logic. Histograms reveal distribution patterns that averages hide.
Production Pattern: Single middleware automatically instruments all endpoints - scalable and maintainable.
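A minimal sketch of the wrapper-plus-middleware idea, using only the standard library. The `metrics` type here is an in-memory stand-in for the Prometheus counter and histogram, not the Prometheus client API, and all names (`statusRecorder`, `instrument`, `runDemo`) are hypothetical rather than taken from main.go.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// statusRecorder wraps http.ResponseWriter so the middleware can read the
// status code afterwards. It starts at 200, which is what net/http sends
// when a handler never calls WriteHeader.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// metrics is an in-memory stand-in for a counter labelled by
// method/path/status plus a per-path list of observed durations.
type metrics struct {
	mu        sync.Mutex
	requests  map[string]int
	durations map[string][]time.Duration
}

func newMetrics() *metrics {
	return &metrics{requests: map[string]int{}, durations: map[string][]time.Duration{}}
}

func (m *metrics) observe(method, path string, status int, d time.Duration) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[fmt.Sprintf("%s %s %d", method, path, status)]++
	m.durations[path] = append(m.durations[path], d)
}

// instrument wraps any handler; the wrapped business logic is untouched.
func instrument(m *metrics, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		m.observe(r.Method, r.URL.Path, rec.status, time.Since(start))
	})
}

// runDemo sends three in-memory requests through the middleware and
// returns the resulting request counts.
func runDemo() map[string]int {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok") // never calls WriteHeader, so status stays 200
	})
	m := newMetrics()
	h := instrument(m, mux)
	for _, path := range []string{"/health", "/health", "/missing"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, path, nil))
	}
	return m.requests
}

func main() {
	for key, n := range runDemo() {
		fmt.Println(key, n)
	}
}
```

Because the mux itself is wrapped, the unregistered `/missing` route is counted as a 404 without any handler knowing about metrics.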
Observability Stack
- Set up Prometheus and Grafana in Docker using docker-compose
- Configured Prometheus to scrape /metrics endpoint every 15s
- Built comprehensive Grafana dashboard with RED method:
  - Rate: rate(http_requests_total[1m]) - requests per second by endpoint
  - Errors: 4xx/5xx percentage tracking with regex filtering
  - Duration: p50/p95/p99 latency percentiles using histogram_quantile()
- Created traffic generator script for realistic load testing
- Real-time visualization of all metrics with historical data retention
- Architecture: Pull-based monitoring (Prometheus scrapes the app; the app has no dependency on the monitoring stack)
Key Learning: Histograms plus histogram_quantile() enable percentile calculations. The pull model keeps the app simple, and the monitoring infrastructure has zero impact on app reliability.
Production Insight: Latency percentiles reveal the distribution: /health at ~0.1ms vs /store at ~150ms (database overhead visible). p99 tracking catches worst-case user experiences that averages hide.
Observability Stack Extended + CI/CD
- Deployed app to GCP Cloud Run (serverless, auto-scaling)
- Set up CI/CD with GitHub Actions (auto-deploy on push to main)
- Deployed Prometheus + Grafana on GCP Compute Engine (e2-micro VM)
- Configured Prometheus to scrape Cloud Run metrics via HTTPS
- Rebuilt Grafana dashboard with RED method and exported it as a JSON config file:
  - Rate: rate(http_requests_total[1m]) - requests per second by endpoint
  - Errors: 4xx/5xx percentage tracking with regex filtering
  - Duration: p50/p95/p99 latency percentiles using histogram_quantile()
- Fixed traffic generator for live load testing
Key Learning: Cloud Run (serverless) vs Compute Engine (VMs): understanding when to use each. CI/CD eliminates machine-specific deployment issues. Pull-based monitoring keeps the app independent of observability infrastructure.
Production Insight: Latency distribution visible, /health ~0.1ms vs /store ~150ms (database overhead). p99 tracking reveals worst-case user experience. Error rate fluctuates 0-30% with traffic patterns.
Distributed Tracing
- Integrated OpenTelemetry SDK for trace instrumentation
- Wrapped HTTP handlers with otelhttp middleware
- Added custom spans for database queries with metadata
- Deployed Jaeger on observability VM
- Configured OTLP HTTP exporter to send traces from Cloud Run to Jaeger
- Opened firewall port 4318 for trace ingestion
Key Learning: Distributed tracing reveals WHERE time is spent, not just HOW MUCH. Nested spans show parent-child relationships. Tags provide context. Foundation for performance optimization.
Production Insight: Traces show individual request journeys through the system. Database network latency is the primary bottleneck, candidate for caching layer.
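The parent-child relationship between spans can be illustrated with a toy recorder built on `context.Context`. This is deliberately not the OpenTelemetry API: `recorder`, `span`, and the span and attribute names are hypothetical, the recorder is not safe for concurrent use, and the sleep simulates database latency.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// span is a toy trace span: a name, its parent's name, key/value
// attributes, and how long it took.
type span struct {
	name, parent string
	attrs        map[string]string
	start        time.Time
	duration     time.Duration
}

func (s *span) end() { s.duration = time.Since(s.start) }

type spanKey struct{}

// recorder collects spans in the order they were started.
type recorder struct{ spans []*span }

// startSpan creates a span whose parent is whatever span is already stored
// in ctx, then returns a ctx carrying the new span for any children.
func (r *recorder) startSpan(ctx context.Context, name string) (context.Context, *span) {
	s := &span{name: name, attrs: map[string]string{}, start: time.Now()}
	if p, ok := ctx.Value(spanKey{}).(*span); ok {
		s.parent = p.name
	}
	r.spans = append(r.spans, s)
	return context.WithValue(ctx, spanKey{}, s), s
}

// queryStore plays the role of a database call with a custom child span.
func queryStore(ctx context.Context, r *recorder, id int) {
	_, s := r.startSpan(ctx, "db.query")
	defer s.end()
	s.attrs["db.table"] = "stores" // hypothetical table name
	s.attrs["store.id"] = fmt.Sprint(id)
	time.Sleep(2 * time.Millisecond) // simulated database latency
}

// handleStore plays the role of the HTTP handler's root span.
func handleStore(ctx context.Context, r *recorder, id int) {
	ctx, s := r.startSpan(ctx, "GET /store")
	defer s.end()
	queryStore(ctx, r, id)
}

// traceRequest runs one simulated request and returns its spans.
func traceRequest() []*span {
	r := &recorder{}
	handleStore(context.Background(), r, 1)
	return r.spans
}

func main() {
	for _, s := range traceRequest() {
		fmt.Printf("%-11s parent=%q attrs=%v took=%v\n", s.name, s.parent, s.attrs, s.duration)
	}
}
```

Comparing the child's duration with the parent's is what shows where the time went: here nearly all of the request is spent inside `db.query`.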
Redis Caching
- Deployed Redis 7 on observability VM (256MB, LRU eviction)
- Cache-aside pattern for read operations
- Cache invalidation on mutations (create/update/delete)
- Added cache hit/miss Prometheus metrics
- X-Cache headers show HIT/MISS status for debugging
- Distributed tracing includes cache.get and cache.set spans
- Performance Results:
  - Cache hit rate: 55% initial
  - Network latency to Neon reduced significantly
Key Learning: Cache-aside pattern with TTL and invalidation prevents stale data. Graceful degradation ensures app works even if Redis fails. Metrics show cache effectiveness.
Production Insight: Hit rate climbs as cache warms up. Cache metrics enable optimization.
Architecture: Redis co-located with observability stack. Cloud Run connects via public IP (will move to internal networking with Kubernetes later).
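The read path, invalidation, and graceful degradation described in this section can be sketched as follows. An in-memory map stands in for Redis and another for the database, so this shows the pattern rather than the project's Redis client code. The 5-minute TTL and all type and function names are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errCacheDown = errors.New("cache unavailable")
	errNotFound  = errors.New("store not found")
)

type entry struct {
	value   string
	expires time.Time
}

// memCache is an in-memory stand-in for Redis; down simulates an outage.
type memCache struct {
	data map[int]entry
	down bool
}

func (c *memCache) get(id int) (string, bool, error) {
	if c.down {
		return "", false, errCacheDown
	}
	e, ok := c.data[id]
	if !ok || time.Now().After(e.expires) {
		return "", false, nil
	}
	return e.value, true, nil
}

func (c *memCache) set(id int, v string, ttl time.Duration) error {
	if c.down {
		return errCacheDown
	}
	c.data[id] = entry{value: v, expires: time.Now().Add(ttl)}
	return nil
}

func (c *memCache) del(id int) error {
	if c.down {
		return errCacheDown
	}
	delete(c.data, id)
	return nil
}

// store ties the cache to a map standing in for the database.
type store struct {
	cache        *memCache
	db           map[int]string
	hits, misses int
}

func newStore(db map[int]string) *store {
	return &store{cache: &memCache{data: map[int]entry{}}, db: db}
}

// getStore is the cache-aside read: try the cache, fall back to the
// database, then populate the cache. The second return value mirrors the
// X-Cache header. Cache errors are treated as misses, so reads keep
// working when the cache is down.
func (s *store) getStore(id int) (string, string, error) {
	if v, ok, err := s.cache.get(id); err == nil && ok {
		s.hits++
		return v, "HIT", nil
	}
	s.misses++
	v, ok := s.db[id]
	if !ok {
		return "", "MISS", errNotFound
	}
	_ = s.cache.set(id, v, 5*time.Minute) // best effort; TTL is an assumed value
	return v, "MISS", nil
}

// updateStore writes to the database, then invalidates the cached copy.
func (s *store) updateStore(id int, v string) {
	s.db[id] = v
	_ = s.cache.del(id) // best effort
}

func main() {
	s := newStore(map[int]string{1: "My Store"})
	for i := 0; i < 2; i++ {
		v, status, _ := s.getStore(1)
		fmt.Println(v, status)
	}
	s.updateStore(1, "Renamed Store")
	v, status, _ := s.getStore(1)
	fmt.Println(v, status, "hits:", s.hits, "misses:", s.misses)
}
```

One trade-off of best-effort invalidation: if the delete fails while the cache is otherwise reachable, a stale entry can survive until its TTL expires, which is why the TTL acts as a safety net.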
CI/CD Pipeline Refinement
Backend CI/CD Pipeline
- Two-stage pipeline: test job must pass before deploy job runs
- Code quality gates: go vet and staticcheck catch bugs before deployment
- Test coverage threshold: Pipeline fails if coverage drops below 20%
- Race detection: the go test -race flag reports data races found while the tests run
- Dependency caching: Go modules cached between runs for faster builds
- Version tagging: Automatic v1.0.TIMESTAMP tags on each successful deploy
- PR support: Tests run on pull requests without deploying
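One way a coverage gate like the 20% threshold above can work is to parse the `total:` line printed by `go tool cover -func=coverage.out` and compare it to the minimum. The sketch below is an assumption about how such a check could be written in Go, not the workflow's actual step. The helper names are hypothetical and the sample report is made up.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// totalCoverage extracts the percentage from the "total:" line of
// `go tool cover -func` output.
func totalCoverage(report string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == "total:" {
			pct := strings.TrimSuffix(fields[len(fields)-1], "%")
			return strconv.ParseFloat(pct, 64)
		}
	}
	return 0, errors.New("no total line found in coverage report")
}

// meetsThreshold reports whether total coverage is at least min percent.
func meetsThreshold(report string, min float64) (bool, error) {
	total, err := totalCoverage(report)
	if err != nil {
		return false, err
	}
	return total >= min, nil
}

func main() {
	// Made-up sample in the shape of `go tool cover -func` output.
	sample := "example.com/app/main.go:10:\thealthHandler\t100.0%\n" +
		"example.com/app/main.go:25:\tgetStore\t\t40.0%\n" +
		"total:\t\t\t\t(statements)\t\t45.2%\n"

	ok, err := meetsThreshold(sample, 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("coverage gate passed:", ok)
	// A CI step would call os.Exit(1) when ok is false to fail the job.
}
```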
Frontend CI/CD Pipeline
- TypeScript type checking: tsc --noEmit catches type errors before deploy
- ESLint integration: Code quality enforcement on every push
- Build verification: npm run build must succeed before deployment
- npm caching: Dependencies cached for faster CI runs
- Version tagging: Matches backend tagging pattern
Key Files Modified
- backend/.github/workflows/deploy.yml - Full CI/CD with test gates
- frontend/.github/workflows/deploy.yml - TypeScript/ESLint checks
Key Learning: needs: test creates job dependencies - deploy only runs after tests pass. permissions: contents: write required for git tagging. Go version in CI must match go.mod exactly.
Bug Fixes:
- Go 1.25 required for module compatibility (dependencies needed newer Go)
- staticcheck enforces Go style conventions (lowercase error messages)
- GitHub Actions needs explicit write permission for pushing tags





