OpenStack Canary Monitoring System
⚠️ PROOF OF CONCEPT - NOT PRODUCTION READY
This is a proof-of-concept system generated by Claude AI and is UNTESTED. Do not deploy this in production environments without thorough testing, security review, and validation. Use at your own risk.
A comprehensive early warning system for OpenStack cloud infrastructure that monitors dataplane health across multiple datacenters and availability zones.
Overview
The OpenStack Canary system provides proactive monitoring of your OpenStack infrastructure through:
- Multi-AZ Deployment: Distributes canary instances across availability zones
- Synthetic Traffic Generation: Creates realistic workload patterns between instances
- Comprehensive Metrics: Tracks latency, throughput, system health, and application performance
- Early Warning Detection: Identifies dataplane issues before they impact production workloads
- Datadog Integration: Real-time monitoring, alerting, and dashboards
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                 OpenStack Cloud Infrastructure                  │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Datacenter 1   │  Datacenter 2   │        Datacenter N         │
├─────────────────┼─────────────────┼─────────────────────────────┤
│ ┌─────────────┐ │ ┌─────────────┐ │┌─────────────┬─────────────┐│
│ │    AZ-A     │ │ │    AZ-A     │ ││    AZ-A     │    AZ-B     ││
│ │ ┌─────────┐ │ │ │ ┌─────────┐ │ ││ ┌─────────┐ │ ┌─────────┐ ││
│ │ │ Canary  │ │ │ │ │ Canary  │ │ ││ │ Canary  │ │ │ Canary  │ ││
│ │ │Instance │ │ │ │ │Instance │ │ ││ │Instance │ │ │Instance │ ││
│ │ └─────────┘ │ │ │ └─────────┘ │ ││ └─────────┘ │ └─────────┘ ││
│ └─────────────┘ │ └─────────────┘ │└─────────────┴─────────────┘│
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │     Datadog     │
                        │   Monitoring    │
                        │   & Alerting    │
                        └─────────────────┘
Components
- Canary Application (app.py): Core web service with health endpoints
- Traffic Generator (traffic_generator.py): Synthetic workload creation
- System Monitor (system_monitor.py): OS-level monitoring and alerts
- Datadog Integration (datadog_config.py): Metrics, dashboards, and alerting
- Deployment Automation: Heat templates and Docker deployment scripts
Quick Start
Docker Deployment (Recommended)
- Clone and configure:
    git clone <repository>
    cd openstack-canary
- Set up environment:
    cp .env.example .env
    # Edit .env with your configuration
- Deploy:
    ./docker-deploy.sh start --datacenter dc1 --az zone-a
- Verify health:
    curl http://localhost:8080/health
OpenStack Heat Deployment
- Configure deployment:
    cp deploy-config.env.example deploy-config.env
    # Edit with your OpenStack configuration
- Deploy to OpenStack:
    ./deploy.sh deploy --datacenter dc1 --azs "nova,zone-a,zone-b"
- Set up monitoring:
    ./deploy.sh setup-datadog --dd-api-key YOUR_KEY --dd-app-key YOUR_APP_KEY
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| CANARY_ID | Unique identifier for canary instance | canary-{hostname} |
| DATACENTER | Datacenter name for tagging | unknown |
| AVAILABILITY_ZONE | AZ name for tagging | unknown |
| DD_API_KEY | Datadog API key | - |
| DD_APP_KEY | Datadog Application key | - |
| PEER_ENDPOINTS | Comma-separated peer endpoints | - |
| TRAFFIC_INTERVAL | Traffic generation interval (seconds) | 10 |
| MONITORING_INTERVAL | System monitoring interval (seconds) | 30 |
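The defaults in the table could be applied at startup roughly as follows. This is a minimal sketch, not the actual code in app.py: the `CanaryConfig` class and `load_config` function are hypothetical names, and the assumption that `PEER_ENDPOINTS` entries may carry stray whitespace is mine.

```python
import os
import socket
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class CanaryConfig:
    """Hypothetical container for the settings listed in the table."""
    canary_id: str
    datacenter: str
    availability_zone: str
    dd_api_key: Optional[str]
    dd_app_key: Optional[str]
    peer_endpoints: Tuple[str, ...]
    traffic_interval: int
    monitoring_interval: int


def load_config(env: Mapping[str, str] = os.environ) -> CanaryConfig:
    """Read settings from `env`, falling back to the documented defaults."""
    # Split the comma-separated peer list and drop empty entries.
    raw_peers = env.get("PEER_ENDPOINTS", "")
    peers = tuple(p.strip() for p in raw_peers.split(",") if p.strip())
    return CanaryConfig(
        canary_id=env.get("CANARY_ID", f"canary-{socket.gethostname()}"),
        datacenter=env.get("DATACENTER", "unknown"),
        availability_zone=env.get("AVAILABILITY_ZONE", "unknown"),
        dd_api_key=env.get("DD_API_KEY"),  # no default: None when unset
        dd_app_key=env.get("DD_APP_KEY"),
        peer_endpoints=peers,
        traffic_interval=int(env.get("TRAFFIC_INTERVAL", "10")),
        monitoring_interval=int(env.get("MONITORING_INTERVAL", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test without touching the real environment.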
Alert Thresholds
| Metric | Warning | Critical |
|---|---|---|
| CPU Usage | 70% | 90% |
| Memory Usage | 80% | 95% |
| Disk Usage | 85% | 95% |
| Load Average | 5.0 | 10.0 |
| Error Rate | 5% | 10% |
| Peer Connectivity (alerts when below) | 70% | 50% |
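The table can be read as a lookup from metric to severity. The sketch below encodes it that way; the metric keys, the `classify` function, and the inclusive boundaries are illustrative assumptions, as is the reading that Peer Connectivity alerts when the value falls below its thresholds while every other metric alerts when it rises above them.

```python
from typing import NamedTuple


class Threshold(NamedTuple):
    warning: float
    critical: float
    lower_is_worse: bool = False  # True for metrics that alert when they drop


# Values copied from the Alert Thresholds table; the key names are hypothetical.
THRESHOLDS = {
    "cpu_percent": Threshold(70, 90),
    "memory_percent": Threshold(80, 95),
    "disk_percent": Threshold(85, 95),
    "load_average": Threshold(5.0, 10.0),
    "error_rate_percent": Threshold(5, 10),
    "peer_connectivity_percent": Threshold(70, 50, lower_is_worse=True),
}


def classify(metric: str, value: float) -> str:
    """Return "ok", "warning", or "critical" for one metric reading."""
    t = THRESHOLDS[metric]
    if t.lower_is_worse:
        if value <= t.critical:
            return "critical"
        if value <= t.warning:
            return "warning"
        return "ok"
    if value >= t.critical:
        return "critical"
    if value >= t.warning:
        return "warning"
    return "ok"
```

For example, `classify("cpu_percent", 75)` yields a warning, and `classify("peer_connectivity_percent", 40)` yields a critical result under the assumptions above.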
API Endpoints
Health Checks
- GET /health - Basic health status
- GET /health/detailed - Detailed system metrics
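A health payload shaped like the one in the Example Response section might be assembled as sketched here. The function name and parameters are hypothetical, only the standard library is used, and CPU and memory figures are left out because collecting them would need an extra dependency; the real app.py may differ on all of these points.

```python
import shutil
from datetime import datetime, timezone
from typing import Optional


def build_health_payload(
    canary_id: str,
    datacenter: str,
    availability_zone: str,
    started_at: datetime,
    now: Optional[datetime] = None,
    disk_path: str = "/",
) -> dict:
    """Assemble a dictionary resembling the documented /health response."""
    now = now or datetime.now(timezone.utc)
    usage = shutil.disk_usage(disk_path)
    disk_percent = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    return {
        # Assumption: a real service would derive this from its own checks.
        "status": "healthy",
        "canary_id": canary_id,
        "datacenter": datacenter,
        "availability_zone": availability_zone,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "uptime_seconds": int((now - started_at).total_seconds()),
        "system_metrics": {
            "disk": {"percent": disk_percent, "free": usage.free},
        },
    }
```

Both timestamps are expected to be timezone-aware UTC values so that the subtraction and the trailing `Z` are consistent.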
Connectivity Tests
- GET /connectivity - Test connectivity to peer instances
- GET /load-test - Generate synthetic load
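One way a peer connectivity test could work is to probe every configured peer and report the share that answered. The sketch below assumes each entry in `PEER_ENDPOINTS` is a full URL that can be fetched directly; `http_probe` and `check_peers` are hypothetical names, and the actual endpoint may measure more (latency, for instance).

```python
import http.client
import urllib.request
from typing import Callable, Dict, Iterable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` yields a 2xx response in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, and bad URLs.
        return False


def check_peers(
    peers: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> Dict[str, object]:
    """Probe each peer and summarize how many were reachable."""
    results = {peer: probe(peer) for peer in peers}
    if not results:
        # No peers configured: report "no data" rather than a percentage.
        return {"peers": results, "reachable_percent": None}
    reachable = sum(1 for ok in results.values() if ok)
    return {
        "peers": results,
        "reachable_percent": 100.0 * reachable / len(results),
    }
```

The `probe` parameter is injectable so the summary logic can be exercised without any network access.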
Metrics
- GET /metrics - Prometheus-style metrics
Example Response
    {
      "status": "healthy",
      "canary_id": "canary-dc1-zone-a-001",
      "datacenter": "dc1",
      "availability_zone": "zone-a",
      "timestamp": "2024-01-15T10:30:00Z",
      "uptime_seconds": 3600,
      "system_metrics": {
        "cpu_percent": 15.2,
        "memory": {
          "percent": 45.8,
          "available": 2147483648
        },
        "disk": {
          "percent": 25.3,
          "free": 8589934592
        }
      }
    }
Monitoring & Alerting
Datadog Integration
The system automatically creates:
- Dashboard: Comprehensive overview of all canary instances
- Alerts: Proactive notifications for infrastructure issues
- SLOs: Service level objectives for availability tracking
Key Metrics
- canary.health_check - Health check frequency
- canary.peer_latency - Inter-instance latency
- canary.traffic_gen.success_rate - Traffic generation success rate
- canary.system.cpu_percent - CPU utilization
- canary.system.memory_percent - Memory utilization
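If these metrics are submitted through a local Datadog agent over DogStatsD (an assumption; datadog_config.py may use the HTTP API with `DD_API_KEY` instead), each sample is a small UDP datagram of the form `name:value|type|#tags`. The sketch below formats and sends one; the function names are hypothetical, and 8125 is the agent's default DogStatsD port.

```python
import socket
from typing import Mapping, Optional


def format_dogstatsd(
    metric: str,
    value: float,
    metric_type: str = "g",  # "g" = gauge, "c" = count, "h" = histogram
    tags: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a DogStatsD datagram such as "canary.peer_latency:12.5|g|#dc:dc1"."""
    datagram = f"{metric}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return datagram


def send_metric(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram to the agent over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))
```

Tagging every sample with the datacenter and availability zone is what lets a dashboard break the metrics down per location.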
Alert Examples
- High Error Rate: Error rate > 10% for 5 minutes
- Instance Down: No health checks for 10 minutes
- High Latency: Inter-DC latency > 1000ms for 10 minutes
- Resource Exhaustion: CPU > 90% or Memory > 95% for 15 minutes
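Each alert above pairs a threshold with a duration, so a single spike should not fire it. The sketch below shows one way to evaluate such a rule over timestamped samples; in practice Datadog monitors perform this evaluation, and the function names and the "continuous breach" interpretation are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, value)


def breach_duration(samples: Iterable[Sample], threshold: float, now: float) -> Optional[float]:
    """Seconds the value has continuously exceeded `threshold`, or None."""
    breach_start = None
    for timestamp, value in sorted(samples):
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # start of the current breach
        else:
            breach_start = None  # any recovery resets the clock
    if breach_start is None:
        return None
    return max(0.0, now - breach_start)


def should_alert(
    samples: Iterable[Sample], threshold: float, window_seconds: float, now: float
) -> bool:
    """True when the breach has lasted at least `window_seconds`."""
    duration = breach_duration(samples, threshold, now)
    return duration is not None and duration >= window_seconds
```

For the High Error Rate rule, this would be called with a threshold of 10 and a window of 300 seconds.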
Deployment Options
1. Docker Deployment
Pros: Easy setup, consistent environment, quick development
Cons: Limited OS-level monitoring, single-host deployment
    # Start all services
    ./docker-deploy.sh start
    # View logs
    ./docker-deploy.sh logs canary-app
    # Scale services
    ./docker-deploy.sh scale canary-app=3
    # Health check
    ./docker-deploy.sh health
2. OpenStack Heat Deployment
Pros: Multi-AZ deployment, native OpenStack integration, scalable
Cons: Requires OpenStack environment, more complex setup
    # Deploy across multiple AZs
    ./deploy.sh deploy -d dc1 -a "nova,zone-a,zone-b" -c 2
    # Update deployment
    ./deploy.sh update -n canary-prod
    # Check status
    ./deploy.sh status -n canary-prod
    # View logs
    ./deploy.sh logs -n canary-prod
3. Manual Deployment
For custom environments:
    # Install dependencies
    pip install -r requirements.txt
    # Start canary application
    gunicorn --bind 0.0.0.0:8080 app:app
    # Start traffic generator (separate terminal)
    python traffic_generator.py
    # Start system monitor (separate terminal)
    python system_monitor.py
Development
Running Tests
    # Unit tests
    python -m pytest tests/
    # Integration tests
    python -m pytest tests/integration/
    # Load tests
    python -m pytest tests/load/
Building Custom Images
    # Build Docker image
    docker build -t canary:latest .
    # Build with custom tag
    docker build -t canary:v1.2.3 .
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
Troubleshooting
Common Issues
Service Not Starting
    # Check logs
    ./docker-deploy.sh logs canary-app
    # Check container status
    docker ps -a | grep canary
    # Restart services
    ./docker-deploy.sh restart
High Memory Usage
    # Check system resources
    curl localhost:8080/health/detailed
    # View container stats
    docker stats canary-app
Connectivity Issues
    # Test connectivity
    curl localhost:8080/connectivity
    # Check network configuration
    docker network ls
    docker network inspect openstack-canary_canary-network
Datadog Metrics Missing
    # Verify the API key is set (prints only the first 8 characters)
    echo "$DD_API_KEY" | cut -c1-8
    # Check agent status
    docker exec datadog-agent agent status
    # Restart agent
    docker restart datadog-agent
Log Locations
- Docker: docker logs <container_name>
- OpenStack: /var/log/canary/
- System: /var/log/syslog
Performance Tuning
High Load Scenarios
    # Increase worker processes
    export GUNICORN_WORKERS=4
    # Adjust traffic intervals
    export TRAFFIC_INTERVAL=30
    # Scale horizontally
    ./deploy.sh scale 5
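Gunicorn does not read `GUNICORN_WORKERS` on its own, so the variable presumably has to be translated into a worker count by the project's scripts. One hypothetical way to do that is a Gunicorn configuration file (itself a Python module) loaded with `gunicorn -c gunicorn.conf.py app:app`. The sketch below assumes that approach and falls back to the `(2 x CPU cores) + 1` rule of thumb from the Gunicorn documentation when the variable is unset.

```python
# gunicorn.conf.py (illustrative; not confirmed to exist in this repository)
import multiprocessing
import os
from typing import Mapping, Optional


def worker_count(env: Mapping[str, str] = os.environ, cpu_count: Optional[int] = None) -> int:
    """Pick the number of Gunicorn workers from GUNICORN_WORKERS or CPU count."""
    configured = env.get("GUNICORN_WORKERS")
    if configured:
        return max(1, int(configured))
    cpus = cpu_count or multiprocessing.cpu_count()
    return 2 * cpus + 1


# Module-level names below are standard Gunicorn settings.
bind = "0.0.0.0:8080"
workers = worker_count()
```

More workers raise memory use roughly in proportion, so the count is worth checking against the memory thresholds listed earlier.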
Resource Optimization
    # Monitor resource usage
    curl localhost:8080/metrics | grep canary_
    # Adjust monitoring intervals
    export MONITORING_INTERVAL=60
Security Considerations
- Network Security: Use security groups to restrict access
- API Keys: Store Datadog keys securely and supply them through environment variables rather than hard-coding them
- Container Security: Run containers as non-root user
- TLS: Enable HTTPS for production deployments
- Monitoring: Monitor for unusual traffic patterns
Maintenance
Regular Tasks
- Weekly: Review error rates and performance metrics
- Monthly: Update Docker images and dependencies
- Quarterly: Review and update alert thresholds
Backup & Recovery
    # Backup configuration and data
    ./docker-deploy.sh backup
    # Restore from backup
    ./docker-deploy.sh restore /path/to/backup
Updates
    # Update Docker deployment
    ./docker-deploy.sh update
    # Update OpenStack deployment
    ./deploy.sh update -n canary-prod
Support
For issues and questions:
- Check the Troubleshooting section
- Review logs for error messages
- Check Datadog dashboard for system health
- Contact the infrastructure team
Disclaimer
⚠️ This is a proof-of-concept generated by Claude AI and is UNTESTED. See DISCLAIMER.md for full details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
v1.0.0 (2024-01-15)
- Initial release
- Multi-AZ deployment support
- Datadog integration
- Docker containerization
- OpenStack Heat templates
- Comprehensive monitoring and alerting