GitHub - VejayPersaud/StickerMulePractice

Stickermule_SitePhoto1.png

Stickermule_SitePhoto2.png

Live URLs:

CURRENT PROJECT STRUCTURE

StickerMulePractice/
├── main.go                    (Handler struct, CRUD resolvers, Prometheus middleware)
├── main_test.go               (13 test cases, all passing)
├── go.mod
├── go.sum
├── docker-compose.yml         (Prometheus + Grafana stack)
├── traffic-generator.sh       (Load testing script)
├── observability/
│   └── prometheus.yml         (Prometheus scrape config)
├── .env                       (DATABASE_URL - not in Git)
├── .gitignore
├── README.md
└── PROGRESS.md                

Testing

Running Tests

# Run all tests
go test -v

# Run with coverage
go test -cover

# Generate coverage report
go test -coverprofile=coverage.out
go tool cover -html=coverage.out

Test Strategy

  • Unit tests with database dependencies mocked via sqlmock, keeping tests fast and isolated
  • HTTP handler tests using httptest
  • Error path coverage: not found, database errors, invalid input
  • 13 test cases, all passing
  • 95%+ coverage on business logic handlers
  • Test-driven development approach

Features

GraphQL API

  • Queries:
    • store(id: Int!) - Fetch store by ID
  • Mutations:
    • createStore(name: String!, revenue: Float!, active: Boolean) - Create new store
    • updateStore(id: Int!, name: String, revenue: Float, total_orders: Int, active: Boolean) - Update existing store
    • deleteStore(id: Int!) - Delete store

Example Queries

Create a store:

mutation {
  createStore(name: "My Store", revenue: 50000, active: true) {
    id
    name
    revenue
  }
}

Update a store:

mutation {
  updateStore(id: 1, revenue: 75000) {
    id
    name
    revenue
  }
}

Delete a store:

mutation {
  deleteStore(id: 1) {
    success
    id
  }
}

Prometheus Metrics

  • Implemented HTTP instrumentation with Counter and Histogram metrics
  • Middleware that instruments every endpoint automatically
  • Metrics exposed at /metrics endpoint for Prometheus scraping
  • RED Method Coverage:
    • Rate: http_requests_total (by method, path, status)
    • Errors: Status code tracking (200, 404, 500, etc.)
    • Duration: http_request_duration_seconds histogram with percentiles
  • Created responseWriter wrapper to capture response status codes
  • Performance insights: /health ~0.07ms, /store ~143ms (with database latency visible)

Key Learning: Middleware pattern enables instrumentation without touching business logic. Histograms reveal distribution patterns that averages hide.

Production Pattern: Single middleware automatically instruments all endpoints - scalable and maintainable.

GrafanaMetricsDashboard.png

Observability Stack

  • Set up Prometheus and Grafana in Docker using docker-compose
  • Configured Prometheus to scrape /metrics endpoint every 15s
  • Built comprehensive Grafana dashboard with RED method:
    • Rate: rate(http_requests_total[1m]) - requests per second by endpoint
    • Errors: 4xx/5xx percentage tracking with regex filtering
    • Duration: p50/p95/p99 latency percentiles using histogram_quantile()
  • Created traffic generator script for realistic load testing
  • Real-time visualization of all metrics with historical data retention
  • Architecture: Pull-based monitoring (Prometheus scrapes the app; the app has no dependency on the monitoring stack)

Key Learning: Histograms + histogram_quantile() enable percentile calculations. The pull model keeps the app simple, so the monitoring infrastructure has zero impact on app reliability.

Production Insight: Latency percentiles reveal the distribution: /health at ~0.1ms vs /store at ~150ms (database overhead visible). p99 tracking catches worst-case user experiences that averages hide.

GoogleCloudRun.png

GoogleComputeEngine.png

Observability Stack Extended + CI/CD

  • Deployed app to GCP Cloud Run (serverless, auto-scaling)
  • Set up CI/CD with GitHub Actions (auto-deploy on push to main)
  • Deployed Prometheus + Grafana on GCP Compute Engine (e2-micro VM)
  • Configured Prometheus to scrape Cloud Run metrics via HTTPS
  • Rebuilt Grafana dashboard with RED method and saved it as a JSON config file:
    • Rate: rate(http_requests_total[1m]) - requests per second by endpoint
    • Errors: 4xx/5xx percentage tracking with regex filtering
    • Duration: p50/p95/p99 latency percentiles using histogram_quantile()
  • Fixed traffic generator for live load testing

Key Learning: Cloud Run (serverless) vs Compute Engine (VMs) - understanding when to use each. CI/CD eliminates machine-specific deployment issues. Pull-based monitoring keeps the app independent of the observability infrastructure.

Production Insight: The latency distribution is visible: /health ~0.1ms vs /store ~150ms (database overhead). p99 tracking reveals worst-case user experience. Error rate fluctuates 0-30% with traffic patterns.

JaegerTraceHistory.png

Distributed Tracing

  • Integrated OpenTelemetry SDK for trace instrumentation
  • Wrapped HTTP handlers with otelhttp middleware
  • Added custom spans for database queries with metadata
  • Deployed Jaeger on observability VM
  • Configured OTLP HTTP exporter to send traces from Cloud Run to Jaeger
  • Opened firewall port 4318 for trace ingestion

Key Learning: Distributed tracing reveals WHERE time is spent, not just HOW MUCH. Nested spans show parent-child relationships. Tags provide context. Foundation for performance optimization.

Production Insight: Traces show individual request journeys through the system. Database network latency is the primary bottleneck, which makes it a candidate for a caching layer.

Redis Caching

  • Deployed Redis 7 on observability VM (256MB, LRU eviction)
  • Used the cache-aside pattern for read operations
  • Cache invalidation on mutations (create/update/delete)
  • Added cache hit/miss Prometheus metrics
  • X-Cache headers show HIT/MISS status for debugging
  • Distributed tracing includes cache.get and cache.set spans
  • Performance Results:
    • Cache hit rate: 55% initial
    • Network latency to Neon reduced significantly

Key Learning: Cache-aside pattern with TTL and invalidation prevents stale data. Graceful degradation ensures app works even if Redis fails. Metrics show cache effectiveness.

Production Insight: Hit rate climbs as cache warms up. Cache metrics enable optimization.

Architecture: Redis co-located with observability stack. Cloud Run connects via public IP (will move to internal networking with Kubernetes later).

CI/CD Pipeline Refinement

Backend CI/CD Pipeline

  • Two-stage pipeline: test job must pass before deploy job runs
  • Code quality gates: go vet and staticcheck catch bugs before deployment
  • Test coverage threshold: Pipeline fails if coverage drops below 20%
  • Race condition detection: go test -race flag identifies concurrency bugs
  • Dependency caching: Go modules cached between runs for faster builds
  • Version tagging: Automatic v1.0.TIMESTAMP tags on each successful deploy
  • PR support: Tests run on pull requests without deploying

Frontend CI/CD Pipeline

  • TypeScript type checking: tsc --noEmit catches type errors before deploy
  • ESLint integration: Code quality enforcement on every push
  • Build verification: npm run build must succeed before deployment
  • npm caching: Dependencies cached for faster CI runs
  • Version tagging: Matches backend tagging pattern

Key Files Modified

  • backend/.github/workflows/deploy.yml - Full CI/CD with test gates
  • frontend/.github/workflows/deploy.yml - TypeScript/ESLint checks

Key Learning: needs: test creates job dependencies - deploy only runs after tests pass. permissions: contents: write required for git tagging. Go version in CI must match go.mod exactly.

Bug Fixes:

  • Go 1.25 required for module compatibility (dependencies needed newer Go)
  • staticcheck enforces Go style conventions (lowercase error messages)
  • GitHub Actions needs explicit write permission for pushing tags