This repository contains Ansible playbooks and roles for deploying a Saga Pegasus cluster. The cluster can operate as a full node (optionally archive) or as a validator.
Prerequisites
- Dedicated Kubernetes cluster. Sharing the cluster with other workloads might be problematic.
- Kubeconfig file with access to the cluster available locally for the deployer. If using RBAC, make sure the Role is allowed to create workloads and Roles.
- Ansible 2.9+
- Python 3.6+
- AWS credentials (for S3 access). An IAM user with S3 permissions needs to be created. Once it is created, share its ARN with us, since it has to be whitelisted on the genesis bucket.
Kubernetes Addons
- CSI driver
- A StorageClass named saga-default, handled by the CSI driver and, ideally, persistent. Running on ephemeral storage is possible but not advised.
- If expose_p2p = true, a LoadBalancer implementation such as MetalLB installed in your cluster. Cloud-native solutions are also compatible. Exposing p2p is highly recommended, although it is possible not to expose the p2p port publicly and rely on ClusterIP services. The implementation should attach an external IP or hostname to the newly created LoadBalancer services. We will typically require 2-3 IPs.
- To create LoadBalancer services in a cloud-native environment, you can use the following parameters:
  - controller_lb_annotations: supports a list of any annotations required by your cloud provider.
  - chainlet_external_traffic_policy: defaults to Cluster. You can set it to Local to support a multi-port setup in a single load balancer. Consult your cloud provider's documentation for compatibility (for example, on Oracle Cloud the classic LB only supports multi-port configuration with the Local external traffic policy).
  - chainlet_allocate_loadbalancer_node_ports: similar to the above; defaults to false and can be set to true as needed.
Deploy Saga Pegasus
Inventory
Create your inventory file by copying one from the samples directory, based on the network (e.g.: mainnet) and the mode (e.g.: fullnode). Customize your Ansible variables, specifically:
- network: staging|testnet|mainnet
- mode: fullnode|service-provider|validator
- moniker: <your_moniker>
- kubeconfig_file: local path for the kubeconfig (e.g.: ~/.kube/your_cluster)
- expose_p2p: bool (optional). If true (default, recommended), you will have to be able to allocate external IPs (or hostnames) to LoadBalancer services.
In addition, these secrets are required:
- aws_access_key_id: AWS credentials to access the S3 genesis bucket. Ask to be whitelisted.
- aws_secret_access_key: AWS credentials to access the S3 genesis bucket.
- metrics_grafana_password: password to access the Grafana web interface (only if deploying metrics).
- keychain_password: password to the local keychain used by the validator (validators only).
- validator_mnemonic: only used if mode = validator.

RECOMMENDED: use ansible-vault to encrypt secrets, and keep them in a separate inventory file offline.
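As a minimal sketch of the recommendation above, the following shell helper writes a secrets file containing the variable names listed in this section, ready to be filled in and encrypted. The function name write_secrets_template, the file name secrets.yml and the placeholder values are illustrative assumptions and are not part of this repository.

```shell
# Hypothetical helper: write a secrets template with the variable names
# listed above. All values are placeholders to be replaced by hand.
write_secrets_template() {
  # $1: destination path (e.g. secrets.yml)
  cat > "$1" <<'EOF'
aws_access_key_id: "<your_access_key_id>"
aws_secret_access_key: "<your_secret_access_key>"
metrics_grafana_password: "<grafana_password>"   # only if deploying metrics
keychain_password: "<keychain_password>"         # validators only
validator_mnemonic: "<your_mnemonic>"            # only if mode = validator
EOF
  # Restrict access to the current user while the file is still in clear text
  chmod 600 "$1"
}

# Usage (not executed here):
#   write_secrets_template secrets.yml
#   "$EDITOR" secrets.yml                 # fill in the real values
#   ansible-vault encrypt secrets.yml     # encrypts the file in place
```

Keeping the template generation separate from the encryption step makes it easy to review the file before it is encrypted.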
Deploy
Run:
    cd ansible
    ansible-playbook -e @<inventory_file> -e @<secrets_file> --vault-password-file <vault_password_file> playbooks/deploy.yml
This will install all the roles in the right order: metrics (if enabled), ingress-nginx (if expose_p2p), ssc and controller. The latter is responsible for spinning up all the chainlets once SSC is in sync.
SSC (Saga Security Chain)
The SSC role is optional and controlled by the ssc.enabled variable:
- Enabled by default: Only for devnet environment
- Disabled by default: For mainnet and testnet environments
- Features:
  - Downloads genesis file from S3 during initialization
  - Generates validator keys automatically
  - Provides RPC (26657), P2P (26656), gRPC (9090), and metrics (26660) endpoints
  - Uses persistent storage for blockchain data
The playbooks are idempotent, so they can be run as many times as needed with no negative consequences. It is possible to redeploy a single component using --tags <role>. E.g.: --tags controller.
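The full and single-role deploys can be wrapped in one small function. This is only a sketch: the function name deploy and its argument order are assumptions, it expects to be run from the ansible directory, and it passes both variable files with the @ prefix that ansible-playbook uses to load extra vars from a file.

```shell
# Hypothetical wrapper around the deploy command.
# Usage: deploy <inventory_file> <secrets_file> <vault_password_file> [role]
deploy() {
  inventory="$1"
  secrets="$2"
  vault_pw="$3"
  role="${4:-}"

  # Rebuild the positional parameters as the ansible-playbook argument list
  set -- -e "@${inventory}" -e "@${secrets}" --vault-password-file "${vault_pw}"

  # Restrict the run to a single role when one is given (e.g. controller)
  if [ -n "${role}" ]; then
    set -- "$@" --tags "${role}"
  fi

  ansible-playbook "$@" playbooks/deploy.yml
}

# Usage (not executed here):
#   deploy my-inventory.yml secrets.yml ~/.vault_pass              # all roles
#   deploy my-inventory.yml secrets.yml ~/.vault_pass controller   # one role
```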
Migration from AWS
If you are already running a validator on AWS EKS, follow this part to migrate. It is really important to follow the exact sequence of operations to avoid double signing:
- Deploy the new cluster in fullnode mode. Verify all the chainlets are started and in sync. Check in Grafana that block production has no hiccups.
- Scale down the old cluster.
- Redeploy the new cluster in validator mode. This is just a redeploy of the controller, which then redeploys all the chains.
Deploy the new cluster in fullnode mode
- Make sure mode: fullnode is set in your inventory file.
- Follow the Deploy Saga Pegasus instructions.
- Make sure the chains are in sync: scripts/cluster.sh chainlets status [--kubeconfig <kubeconfig_file>]. It will print a success or failure message at the end.
Scale down the old cluster
After making sure the new cluster is in sync:
- Switch context to the old EKS cluster (or pass --kubeconfig <kubeconfig_file> to the kubectl commands).
- Scale down ssc: kubectl scale deployment ssc -n sagasrv-ssc --replicas=0
- Scale down controller: scripts/cluster.sh controller down
- Scale down chainlets: kubectl get pods -A | grep chainlet | awk '{print $1}' | xargs -I{} kubectl -n {} scale deployment/chainlet --replicas=0
- Verify all the chainlets are terminated. The output of this command should be empty: kubectl get pods -A | grep chainlet
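The final verification step can be automated with a polling loop. The sketch below is an assumption-laden illustration: the function names, the timeout and the polling interval are hypothetical, and it uses the same "grep chainlet" match as the commands above. It deliberately treats a kubectl failure as an error rather than as "no pods", since a false "all clear" here could lead to double signing.

```shell
# Count pods whose line in the pod listing mentions "chainlet".
# Returns non-zero if kubectl itself fails (e.g. wrong context).
count_chainlet_pods() {
  pods="$(kubectl get pods -A --no-headers)" || return 1
  # grep -c prints 0 and exits 1 when nothing matches, hence "|| true"
  printf '%s\n' "$pods" | grep -c chainlet || true
}

# Poll until no chainlet pods remain.
# Usage: wait_for_chainlets_gone [timeout_seconds] [interval_seconds]
wait_for_chainlets_gone() {
  timeout="${1:-300}"
  interval="${2:-5}"
  elapsed=0
  while :; do
    n="$(count_chainlet_pods)" || { echo "kubectl failed, cannot verify" >&2; return 2; }
    if [ "$n" -eq 0 ]; then
      echo "no chainlet pods left"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$n chainlet pod(s) still present after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage (not executed here), with the old EKS cluster as current context:
#   wait_for_chainlets_gone 600 10
```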
If something goes wrong, scale SSC and Controller back up:
- SSC: kubectl scale deployment ssc -n sagasrv-ssc --replicas=1
- Controller: scripts/cluster.sh controller up

The controller will scale chainlets back up and the validator will be restored.
Redeploy the new cluster in validator mode
After making sure the old cluster is scaled down completely:
⚠️⚠️⚠️ WARNING: Make sure the old cluster is completely scaled down before proceeding. Running two validators simultaneously will result in double signing and slashing. ⚠️⚠️⚠️
- In the inventory, set mode: validator
- Make sure you have the validator_mnemonic correctly set
- Wipe out SSC: kubectl delete deployment ssc -n sagasrv-ssc && kubectl delete pvc ssc-pvc -n sagasrv-ssc
- Redeploy (see Deploy)
- Restart the controller: scripts/cluster.sh controller restart
- Redeploy every chainlet by deleting the deployment and having the controller redeploy it as validator: run scripts/cluster.sh chainlets redeploy and then execute the commands in the output.
Now you should be able to see all the chainlets restarting: kubectl get pods -A | grep chainlet. Check the status with scripts/cluster.sh chainlets status making sure all of them are restarting and getting in sync.
Run a devnet cluster
The devnet cluster is meant for development. It can run a single-validator cluster. It is deployed like every other cluster (just set network: devnet). By default it comes without metrics and exposes neither p2p nor other services. For this reason, transactions will require port-forwarding. E.g.:
- SSC: kubectl port-forward -n sagasrv-ssc service/ssc 26657:26657. Then sscd --node http://localhost:26657 <your_command>
- Chainlets: kubectl port-forward -n saga-<chain_id> service/chainlet 26657:26657. Then sagaosd --node http://localhost:26657 <your_command>

Alternatively, ssc and chainlets can be exposed by setting expose_p2p: true. Metrics can also be deployed by setting metrics_enabled: true.
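The port-forward-then-run pattern described above can be wrapped so that the tunnel is opened, one command is executed, and the tunnel is closed again. This is a rough sketch: the function name with_port_forward is hypothetical, and the fixed two-second sleep is a crude stand-in for properly waiting until the local port accepts connections.

```shell
# Hypothetical helper: run one command while a port-forward is active.
# Usage: with_port_forward <namespace> <service> <port> <command> [args...]
with_port_forward() {
  ns="$1"
  svc="$2"
  port="$3"
  shift 3

  # Start the tunnel in the background and remember its PID
  kubectl port-forward -n "$ns" "service/$svc" "$port:$port" >/dev/null 2>&1 &
  pf_pid=$!

  # Assumption: two seconds is enough for the tunnel to come up
  sleep 2

  # Run the wrapped command and keep its exit status
  rc=0
  "$@" || rc=$?

  # Tear the tunnel down regardless of the command's outcome
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$rc"
}

# Usage (not executed here):
#   with_port_forward sagasrv-ssc ssc 26657 sscd --node http://localhost:26657 status
```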
Launch your first chainlet
- Port forward ssc: kubectl port-forward -n sagasrv-ssc service/ssc 26657:26657
- Create the chainlet stack, using the "foundation" address that was set up and the desired sagaosd version, and making sure you set the right chain_id for ssc based on your inventory: sscd --node http://localhost:26657 tx chainlet create-chainlet-stack SagaOS "SagaOS Chainlet Stack" sagaxyz/sagaos:0.13.1 0.13.1 sha256:ced72e81e44926157e56d1c9dd3c0de5a5af50c2a87380f4be80d9d5196d86d3 100upsaga day 100upsaga --fees 2000upsaga --from saga1nmu5laudnkpcn6jlejrv8dprumqtj00ujl0zk2 -y --chain-id <your_chain_id>
- Launch the chainlet: sscd --node http://localhost:26657 tx chainlet launch-chainlet saga1nmu5laudnkpcn6jlejrv8dprumqtj00ujl0zk2 SagaOS 0.13.1 myfirstchainlet '{"denom":"gas","gasLimit":10000000,"genAcctBalances":"saga1nmu5laudnkpcn6jlejrv8dprumqtj00ujl0zk2=1000000000","feeAccount":"saga1nmu5laudnkpcn6jlejrv8dprumqtj00ujl0zk2"}' --fees 500000upsaga --gas 800000 --from saga1nmu5laudnkpcn6jlejrv8dprumqtj00ujl0zk2 --yes --chain-id <your_chain_id>
- (optional) Stop the ssc port-forward process.
- Port forward your chainlet RPC: kubectl port-forward -n saga-<chain_id> service/chainlet 26657:26657
- Execute any cosmos transaction using sagaosd --node http://localhost:26657 <your_command>.

NOTE: EVM transactions require port-forwarding port 8545 instead of 26657.
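To check that the EVM endpoint answers once port 8545 is forwarded, a standard Ethereum JSON-RPC call such as eth_blockNumber can be used. The sketch below assumes that port 8545 is served by the same service/chainlet target used for 26657; the function names and the sed-based extraction are illustrative only (a JSON tool such as jq would be more robust).

```shell
# Convert a 0x-prefixed hex quantity (as returned by JSON-RPC) to decimal
hex_to_dec() {
  printf '%d\n' "$1"
}

# Ask the node for its latest block number (hex) over JSON-RPC.
# $1: endpoint URL, defaults to the local port-forward
evm_block_number() {
  url="${1:-http://localhost:8545}"
  curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$url" | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Usage (not executed here), in a second terminal after:
#   kubectl port-forward -n saga-<chain_id> service/chainlet 8545:8545
# run:
#   hex_to_dec "$(evm_block_number)"
```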
Utils
cluster.sh
A collection of utility commands to interact with the cluster. The script is organized into main commands with subcommands:
Controller Commands
- controller down: Scale down the controller deployment
- controller up: Scale up the controller deployment
- controller restart: Restart controller pod
Individual Chainlet Commands
- chainlet restart <identifier>: Restart chainlet pods by namespace or chain_id
- chainlet redeploy <identifier>: Redeploy chainlet deployment by namespace or chain_id
- chainlet wipe <identifier>: Wipe chainlet data (delete PVC) and redeploy
- chainlet logs <identifier>: Follow chainlet logs by namespace or chain_id
- chainlet status <identifier>: Show sync status for a specific chainlet
- chainlet height <identifier>: Show current block height for a specific chainlet
- chainlet expand-pvc <identifier> [%]: Expand chainlet PVC by percentage (default: 20%)
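To make the effect of the expand-pvc percentage concrete, here is the arithmetic as a tiny shell function. It is not taken from cluster.sh, which may compute or round the new size differently; the function name expanded_size, the use of whole Gi units and the round-up behaviour are all assumptions made for illustration.

```shell
# Illustrative arithmetic only: size after growing by a percentage.
# Usage: expanded_size <current_size_gi> [percent]   (percent defaults to 20)
expanded_size() {
  current="$1"
  percent="${2:-20}"
  # Integer arithmetic, rounded up to the next whole Gi
  echo $(( (current * (100 + percent) + 99) / 100 ))
}

expanded_size 100      # prints 120
expanded_size 33 20    # prints 40 (39.6 rounded up)
```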
Bulk Chainlets Commands
- chainlets status: Show status of all chainlets
- chainlets redeploy: Redeploy all chainlet deployments in saga-* namespaces
Validator Commands
- validator unjail <identifier>: Unjail validator by namespace or chain_id
- validator status [<identifier>]: Check validator status on chain(s). Fetches the moniker from SSC and checks whether the validator is in the active set (includes SSC when no identifier is specified)
Other Commands
- install-completion: Install bash completion for cluster.sh
Usage Examples:
    # Controller operations
    scripts/cluster.sh controller down
    scripts/cluster.sh controller restart

    # Individual chainlet operations
    scripts/cluster.sh chainlet restart saga-my-chain
    scripts/cluster.sh chainlet redeploy saga-my-chain
    scripts/cluster.sh chainlet wipe saga-my-chain
    scripts/cluster.sh chainlet logs my_chain_id
    scripts/cluster.sh chainlet status saga-my-chain
    scripts/cluster.sh chainlet height saga-my-chain

    # Bulk operations on all chainlets
    scripts/cluster.sh chainlets status
    scripts/cluster.sh chainlets redeploy

    # Validator operations
    scripts/cluster.sh validator unjail saga-my-chain
    scripts/cluster.sh validator unjail my_chain_id
    scripts/cluster.sh validator status saga-my-chain  # Check status on specific chain
    scripts/cluster.sh validator status my_chain_id    # Check status using chain_id
    scripts/cluster.sh validator status                # Check status on SSC and all chains
Optionally, pass --kubeconfig <your_kubeconfig> to use a different context than the current. Use scripts/cluster.sh --help or scripts/cluster.sh COMMAND --help for detailed usage information.
Make it faster
- Add alias c=<your_path>/scripts/cluster.sh to the file loaded on start of the terminal (e.g. ~/.bashrc, ~/.zshrc)
- Run c install-completion
- Enjoy autocomplete of commands, options, namespaces and chain IDs.
AlertManager Configuration
AlertManager can be configured with custom notification channels based on alert severity. This is optional and disabled by default.
Enable AlertManager Configuration
To enable AlertManager configuration, set in your inventory:
metrics_alertmanager_config_enabled: true
Notification Channels by Severity
Configure different notification channels for each severity level:
    metrics_alertmanager_channels:
      critical:
        - name: critical-slack
          type: slack
          api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
          channel: '#alerts-critical'
          title: 'Critical Alert - {{ "{{ .GroupLabels.alertname }}" }}'
          text: '{{ "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}" }}'
        - name: critical-email
          type: email
          to: ['admin@yourcompany.com']
          subject: 'CRITICAL: {{ "{{ .GroupLabels.alertname }}" }}'
      warning:
        - name: warning-slack
          type: slack
          api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
          channel: '#alerts-warning'
          title: 'Warning Alert - {{ "{{ .GroupLabels.alertname }}" }}'
          text: '{{ "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}" }}'
      info:
        - name: info-slack
          type: slack
          api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
          channel: '#alerts-info'
          title: 'Info Alert - {{ "{{ .GroupLabels.alertname }}" }}'
          text: '{{ "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}" }}'
Supported Channel Types
- Slack: Requires api_url, channel, title, text
- Email: Requires to (list), subject
- Webhook: Requires url
- PagerDuty: Requires routing_key, optional description
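Before putting a Slack webhook into the api_url field, it can be useful to confirm that the webhook itself works by posting a test message to it directly. The sketch below uses the plain JSON payload accepted by Slack incoming webhooks; the function names are hypothetical, and the escaping only handles backslashes and double quotes (not newlines or other control characters).

```shell
# Build a minimal Slack incoming-webhook payload for a one-line message
slack_payload() {
  # Escape backslashes and double quotes so the text is a valid JSON string
  escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"text":"%s"}\n' "$escaped"
}

# Post a test message to a webhook.
# $1: webhook URL, $2: message text
send_slack_test() {
  curl -s -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$2")" "$1"
}

# Usage (not executed here):
#   send_slack_test 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK' 'AlertManager test'
```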
Global SMTP Configuration
For email notifications, configure SMTP settings:
    metrics_alertmanager_global:
      smtp_smarthost: 'smtp.yourcompany.com:587'
      smtp_from: 'alertmanager@yourcompany.com'
      smtp_auth_username: 'your-smtp-user'
      smtp_auth_password: 'your-smtp-password'
      smtp_require_tls: true
Custom Routes and Inhibition Rules
Add custom routing rules for specific alerts:
    metrics_alertmanager_custom_routes:
      - match:
          alertname: ChainletDown
        receiver: critical-alerts
        group_wait: 10s
        repeat_interval: 1h

    metrics_alertmanager_inhibit_rules:
      - source_match:
          severity: critical
        target_match:
          severity: warning
        equal: ['alertname', 'cluster', 'service']
Example Configuration
Here's a complete example for your inventory file:
    # Disable noisy Kubernetes control plane ServiceMonitors
    metrics_kube_proxy_enabled: false
    metrics_kube_scheduler_enabled: false
    metrics_kube_etcd_enabled: false
    metrics_kube_controller_manager_enabled: false

    # Enable AlertManager configuration
    metrics_alertmanager_config_enabled: true

    # SMTP settings for email notifications
    metrics_alertmanager_global:
      smtp_smarthost: 'smtp.gmail.com:587'
      smtp_from: 'alerts@yourcompany.com'
      smtp_auth_username: 'alerts@yourcompany.com'
      smtp_auth_password: 'your-app-password'
      smtp_require_tls: true

    # Notification channels
    metrics_alertmanager_channels:
      critical:
        - name: critical-slack
          type: slack
          api_url: 'https://hooks.slack.com/services/<some_secrets>'
          channel: '#alerts-critical'
          title: 'CRITICAL: {{ "{{ .GroupLabels.alertname }}" }}'
          text: '{{ "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}" }}'
        - name: critical-pagerduty
          type: pagerduty
          routing_key: 'your-pagerduty-integration-key'
          description: 'Critical Saga Alert'
      warning:
        - name: warning-slack
          type: slack
          api_url: 'https://hooks.slack.com/services/<some_secrets>'
          channel: '#alerts-warning'
          title: 'Warning: {{ "{{ .GroupLabels.alertname }}" }}'
          text: '{{ "{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}" }}'