feat: implement weighted RPC load balancing with traffic distribution by DaMandal0rian · Pull Request #6126 · graphprotocol/graph-node

@DaMandal0rian DaMandal0rian changed the title feat: implement weighted RPC load balancing with comprehensive improv… feat: implement weighted RPC load balancing

Aug 23, 2025

@DaMandal0rian DaMandal0rian changed the title feat: implement weighted RPC load balancing feat: implement weighted RPC load balancing with traffic distribution

Aug 23, 2025

DaMandal0rian added a commit that referenced this pull request

Aug 24, 2025
…lience

This commit introduces dynamic weight adjustment for RPC providers, improving failover and resilience by adapting to real-time provider health.

Key changes include:
- Introduced a `Health` module (`chain/ethereum/src/health.rs`) to monitor RPC provider latency, error rates, and consecutive failures.
- Integrated health metrics into the RPC provider selection logic in `chain/ethereum/src/network.rs`.
- Dynamically adjusts provider weights based on their health scores, ensuring traffic is steered away from underperforming endpoints.
- Updated `node/src/network_setup.rs` to initialize and manage health checkers for Ethereum RPC adapters.
- Added `tokio` dependency to `chain/ethereum/Cargo.toml` and `node/Cargo.toml` for asynchronous health checks.
- Refactored test cases in `chain/ethereum/src/network.rs` to accommodate dynamic weighting.

This enhancement builds upon the existing static weighted RPC steering, allowing for more adaptive and robust RPC management.

Fixes #6126
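
Here is a minimal std-only Rust sketch of how health metrics could feed into dynamic weights. The `HealthSnapshot` type, the scoring formula, the `max_latency_ms` and `failure_threshold` parameters, and the rule of multiplying each configured weight by its score are assumptions for illustration. They are not taken from `chain/ethereum/src/health.rs`.

```rust
/// Hypothetical health snapshot for one RPC provider.
pub struct HealthSnapshot {
    pub error_rate: f64,           // fraction of failed requests, 0.0..=1.0
    pub avg_latency_ms: f64,       // recent average request latency
    pub consecutive_failures: u32, // failures since the last success
}

/// Hypothetical health score in [0.0, 1.0], where 1.0 means fully healthy.
/// `max_latency_ms` must be positive.
pub fn health_score(h: &HealthSnapshot, max_latency_ms: f64, failure_threshold: u32) -> f64 {
    // Treat a provider that failed too many times in a row as unusable.
    if h.consecutive_failures >= failure_threshold {
        return 0.0;
    }
    let error_factor = (1.0 - h.error_rate).clamp(0.0, 1.0);
    // Zero latency scores 1.0; latency at or above max_latency_ms scores 0.0.
    let latency_factor = (1.0 - h.avg_latency_ms / max_latency_ms).clamp(0.0, 1.0);
    error_factor * latency_factor
}

/// Scale each configured (static) weight by its provider's health score.
pub fn effective_weights(configured: &[f64], scores: &[f64]) -> Vec<f64> {
    configured.iter().zip(scores).map(|(w, s)| w * s).collect()
}
```

Multiplying by the score leaves the configured ratios unchanged while every provider is healthy. A degraded provider's share shrinks, but the provider stays in the list.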

DaMandal0rian added a commit that referenced this pull request

Jan 22, 2026
…lience (#6128)

* feat: Implement dynamic weighted RPC load balancing for enhanced resilience

This commit introduces dynamic weight adjustment for RPC providers, improving failover and resilience by adapting to real-time provider health.

Key changes include:
- Introduced a `Health` module (`chain/ethereum/src/health.rs`) to monitor RPC provider latency, error rates, and consecutive failures.
- Integrated health metrics into the RPC provider selection logic in `chain/ethereum/src/network.rs`.
- Dynamically adjusts provider weights based on their health scores, ensuring traffic is steered away from underperforming endpoints.
- Updated `node/src/network_setup.rs` to initialize and manage health checkers for Ethereum RPC adapters.
- Added `tokio` dependency to `chain/ethereum/Cargo.toml` and `node/Cargo.toml` for asynchronous health checks.
- Refactored test cases in `chain/ethereum/src/network.rs` to accommodate dynamic weighting.

This enhancement builds upon the existing static weighted RPC steering, allowing for more adaptive and robust RPC management.

Fixes #6126

* bump: tokio

Copilot AI review requested due to automatic review settings

January 22, 2026 22:31

DaMandal0rian added a commit that referenced this pull request

Jan 31, 2026
…lience (#6128)

* feat: Implement dynamic weighted RPC load balancing for enhanced resilience

This commit introduces dynamic weight adjustment for RPC providers, improving failover and resilience by adapting to real-time provider health.

Key changes include:
- Introduced a `Health` module (`chain/ethereum/src/health.rs`) to monitor RPC provider latency, error rates, and consecutive failures.
- Integrated health metrics into the RPC provider selection logic in `chain/ethereum/src/network.rs`.
- Dynamically adjusts provider weights based on their health scores, ensuring traffic is steered away from underperforming endpoints.
- Updated `node/src/network_setup.rs` to initialize and manage health checkers for Ethereum RPC adapters.
- Added `tokio` dependency to `chain/ethereum/Cargo.toml` and `node/Cargo.toml` for asynchronous health checks.
- Refactored test cases in `chain/ethereum/src/network.rs` to accommodate dynamic weighting.

This enhancement builds upon the existing static weighted RPC steering, allowing for more adaptive and robust RPC management.

Fixes #6126

* bump: tokio
…ements (#6090)

This commit introduces a complete weighted load balancing system for RPC endpoints
with traffic distribution based on configurable provider weights (0.0-1.0).

- Implements probabilistic selection using `WeightedIndex` from the `rand` crate
- Supports decimal weights (0.0-1.0) for precise traffic distribution
- Weights are relative and don't need to sum to 1.0 (normalized internally)
- Graceful fallback to random selection if weights are invalid
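
The selection rule above is roulette-wheel sampling over relative weights. The sketch below is a std-only stand-in for `WeightedIndex`. It takes a uniform sample `u` in `[0, 1)` as an argument so it stays deterministic, and it returns `None` for unusable weights so the caller can fall back to random selection. The function name and the exact set of rejected inputs (empty, negative, NaN, infinite, all zero) are assumptions.

```rust
/// Pick index i with probability weights[i] / sum(weights).
/// `u` must be a uniform sample in [0.0, 1.0).
/// Returns None when the weights are unusable (empty, negative, NaN,
/// infinite, or all zero), signalling a fallback to random selection.
pub fn select_weighted_index(weights: &[f64], u: f64) -> Option<usize> {
    if weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
        return None;
    }
    let total: f64 = weights.iter().sum();
    if total <= 0.0 {
        return None; // covers both the empty slice and all-zero weights
    }
    // Scale the sample by the total so weights need not sum to 1.0.
    let target = u * total;
    let mut cumulative = 0.0;
    for (i, w) in weights.iter().enumerate() {
        cumulative += w;
        if target < cumulative {
            return Some(i);
        }
    }
    // Rounding can leave target just at the total; use the last non-zero weight.
    weights.iter().rposition(|w| *w > 0.0)
}
```

A provider with weight 0.0 owns no part of the cumulative range, so this function never picks it.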

- Improved error retesting logic that preserves weight distribution
- Error retesting now occurs AFTER weight-based selection to minimize skew
- Maintains existing failover capabilities while respecting configured weights
- Robust handling of edge cases (all zero weights, invalid configurations)
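
The sketch below shows one plausible reading of "error retesting occurs after weight-based selection". The weighted pick is made first. Only a small, configurable fraction of picks is then redirected to a provider with recent errors, so that provider can show it has recovered. The `retest_percent` parameter, the error-count slice, and the two uniform samples are hypothetical inputs, and the ordering in `network.rs` may differ.

```rust
/// Hypothetical post-selection retest step.
/// `weighted_pick`: index chosen by weighted selection.
/// `error_counts[i]`: recent error count for adapter i.
/// `retest_percent`: 0.0..=100.0; `u_retest`, `u_pick`: uniform samples in [0.0, 1.0).
pub fn apply_error_retest(
    weighted_pick: usize,
    error_counts: &[u32],
    retest_percent: f64,
    u_retest: f64,
    u_pick: f64,
) -> usize {
    // With probability (100 - retest_percent)%, keep the weighted choice,
    // so the configured distribution is mostly preserved.
    if u_retest * 100.0 >= retest_percent {
        return weighted_pick;
    }
    // Otherwise retest a uniformly chosen adapter among those with errors.
    let errored: Vec<usize> = error_counts
        .iter()
        .enumerate()
        .filter_map(|(i, count)| if *count > 0 { Some(i) } else { None })
        .collect();
    if errored.is_empty() {
        return weighted_pick;
    }
    let idx = ((u_pick * errored.len() as f64) as usize).min(errored.len() - 1);
    errored[idx]
}
```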

- Added `weighted_rpc_steering` flag to enable/disable weighted selection
- Provider weight validation ensures values are between 0.0 and 1.0
- Validation prevents all-zero weight configurations
- Comprehensive configuration documentation with usage examples
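
The two validation rules (each weight must lie in `[0.0, 1.0]`, and the weights may not all be zero) could be written as follows. The function names and the `String` error type are illustrative. In the PR, these checks live in `Provider::validate()` and the chain-level validation.

```rust
/// Per-provider check: the weight must be a number in [0.0, 1.0].
pub fn validate_weight(label: &str, weight: f64) -> Result<(), String> {
    // RangeInclusive::contains returns false for NaN, so NaN is rejected too.
    if !(0.0..=1.0).contains(&weight) {
        return Err(format!("provider `{label}`: weight {weight} must be between 0.0 and 1.0"));
    }
    Ok(())
}

/// Chain-level check: every weight is valid and at least one is non-zero.
pub fn validate_chain_weights(providers: &[(&str, f64)]) -> Result<(), String> {
    for (label, weight) in providers {
        validate_weight(label, *weight)?;
    }
    if !providers.is_empty() && providers.iter().all(|(_, w)| *w == 0.0) {
        return Err("all provider weights are 0.0; at least one must be positive".to_string());
    }
    Ok(())
}
```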

- Refactored adapter selection into modular, well-documented functions:
  - `select_best_adapter()`: Chooses between weighted/random strategies
  - `select_weighted_adapter()`: Implements WeightedIndex-based selection
  - `select_random_adapter()`: Random selection that takes recent adapter errors into account
- Added comprehensive inline documentation explaining the selection algorithms
- Maintains thread safety with proper Arc usage and thread-safe RNG
- Added test coverage for weighted selection with statistical validation

- Extended Provider struct with f64 weight field (default: 1.0)
- Added weight validation in Provider::validate() method
- Added Chain-level validation to prevent all-zero weight configurations
- Integrated with existing configuration validation pipeline

- Added --weighted-rpc-steering command line flag (node/src/opt.rs)
- Integrated weighted flag through network setup pipeline (node/src/network_setup.rs)
- Updated chain configuration to pass weight values to adapters (node/src/chain.rs)

- Added comprehensive configuration documentation in full_config.toml
- Includes weight range explanation, distribution examples, and usage guidelines
- Clear examples showing relative weight calculations and traffic distribution
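
Because weights are relative, each provider's expected share of traffic is its weight divided by the sum of all weights. The small helper below makes that arithmetic explicit. `traffic_shares` is a hypothetical name.

```rust
/// Expected fraction of requests per provider: w_i / sum(w).
/// Returns None if the weights sum to zero, since nothing can be selected.
pub fn traffic_shares(weights: &[f64]) -> Option<Vec<f64>> {
    let total: f64 = weights.iter().sum();
    if total <= 0.0 {
        return None;
    }
    Some(weights.iter().map(|w| w / total).collect())
}
```

For example, `weight = 0.4` and `weight = 0.2` route about two thirds and one third of requests. That is the same split as weights of 1.0 and 0.5.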

- Updated the `rand` dependency to a version that provides `WeightedIndex`
- Used the correct import paths for the rand 0.9 distribution modules
- Fixed compilation errors by importing the `Distribution` trait

- Comprehensive inline documentation for all weight-related methods
- Clear separation of concerns with single-responsibility functions
- Maintained backward compatibility with existing random selection
- Added statistical test validation for weight distribution accuracy

- Comprehensive test suite validates weight distribution over 1000 iterations
- Statistical validation with 10% tolerance for weight accuracy
- All existing tests continue to pass, ensuring no regression
- Build verification across all affected packages
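
Below is a sketch of a test in the spirit of the statistical validation described above. To run with only the standard library, it uses a self-contained SplitMix64 generator instead of the `rand` crate. It also interprets "10% tolerance" as an absolute difference of at most 0.10 between observed and expected share. Both choices are assumptions.

```rust
/// SplitMix64: a tiny deterministic PRNG, used only to keep the sketch dependency-free.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [0.0, 1.0) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Cumulative-weight sampling; assumes non-negative weights with a positive sum.
fn pick(weights: &[f64], u: f64) -> usize {
    let total: f64 = weights.iter().sum();
    let target = u * total;
    let mut cumulative = 0.0;
    for (i, w) in weights.iter().enumerate() {
        cumulative += w;
        if target < cumulative {
            return i;
        }
    }
    weights.len() - 1
}

/// Draw `iterations` samples and return the largest absolute gap between
/// observed and expected share across all providers.
pub fn max_share_error(weights: &[f64], iterations: usize, seed: u64) -> f64 {
    let mut rng = SplitMix64(seed);
    let mut counts = vec![0usize; weights.len()];
    for _ in 0..iterations {
        counts[pick(weights, rng.next_f64())] += 1;
    }
    let total: f64 = weights.iter().sum();
    weights
        .iter()
        .zip(&counts)
        .map(|(w, &c)| (c as f64 / iterations as f64 - w / total).abs())
        .fold(0.0, f64::max)
}
```

With 1000 draws and a 0.7 weight, the standard deviation of the observed share is about 0.015. A 0.10 bound is therefore very unlikely to fail by chance, and a fixed seed keeps the test reproducible.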

```toml
weighted_rpc_steering = true

[chains.mainnet]
provider = [
  { label = "primary", url = "http://rpc1.io", weight = 0.7 },   # 70% traffic
  { label = "backup", url = "http://rpc2.io", weight = 0.3 },    # 30% traffic
]
```

This implementation adds weighted load balancing with a random-selection fallback
for invalid weights, provider- and chain-level weight validation, and statistical tests.

🤖 Generated with Claude Code

- Remove unused one_f64() function that was causing CI warnings
- Remove unused serde default attribute from Provider.weight field
- Add missing weighted_rpc_steering field to test fixtures
- Apply cargo fmt formatting fixes

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
- Add `health_check()` method to EthereumAdapter using `eth_blockNumber`
  with a fixed 5s timeout independent of json_rpc_timeout
- Replace RwLock with atomics (AtomicU64/AtomicU32) in Health struct,
  following the EndpointMetrics pattern to avoid lock poisoning
- Add CancellationToken support to health_check_task for graceful shutdown
- Add tokio-util dependency for CancellationToken
- Make `adapter` field private on EthereumNetworkAdapter, add getter
- Replace Vec-based health checker lookup with `HashMap<String, Arc<Health>>`
  for O(1) average-case lookups instead of the O(n*m) scan
- Remove redundant empty check in select_weighted_adapter; WeightedIndex
  already returns Err for empty input, falling through to random selection
- Replace struct literal construction in tests with ::new() calls
- Add explicit assertions that health scores start at 1.0
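
Storing an `f64` score in an `AtomicU64` through its bit pattern gives lock-free reads and updates, so there is no lock to poison. The sketch below assumes a score that starts at 1.0 and is updated as an exponentially weighted moving average, plus a consecutive-failure counter in an `AtomicU32`. The field names, the 0.2 smoothing factor, and the update rule are hypothetical and may differ from the `Health` struct in the PR.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical lock-free health record for one provider.
pub struct Health {
    score_bits: AtomicU64,           // f64 score stored as raw bits
    consecutive_failures: AtomicU32, // reset to 0 on success
}

impl Health {
    pub fn new() -> Self {
        Health {
            score_bits: AtomicU64::new(1.0f64.to_bits()), // start fully healthy
            consecutive_failures: AtomicU32::new(0),
        }
    }

    pub fn score(&self) -> f64 {
        f64::from_bits(self.score_bits.load(Ordering::Relaxed))
    }

    pub fn consecutive_failures(&self) -> u32 {
        self.consecutive_failures.load(Ordering::Relaxed)
    }

    /// Move the score toward `sample` by smoothing factor `alpha` in (0, 1].
    fn update_score(&self, sample: f64, alpha: f64) {
        // fetch_update retries the closure until its compare-and-swap succeeds.
        let _ = self
            .score_bits
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |bits| {
                let old = f64::from_bits(bits);
                Some((old + alpha * (sample - old)).to_bits())
            });
    }

    pub fn record_success(&self) {
        self.consecutive_failures.store(0, Ordering::Relaxed);
        self.update_score(1.0, 0.2);
    }

    pub fn record_failure(&self) {
        self.consecutive_failures.fetch_add(1, Ordering::Relaxed);
        self.update_score(0.0, 0.2);
    }
}
```
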

Previously, all health checkers were stored in a single Vec and passed
to every chain's EthereumNetworkAdapters. Now they are grouped by
ChainName, so each chain only receives its own health checkers.

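
Grouping health checkers by chain, with a per-provider `HashMap` inside each group, might look like the sketch below. Plain `String` keys stand in for `ChainName` and the provider label, and the `Health` type is a placeholder. Both are assumptions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Placeholder for the real health tracker.
pub struct Health {
    pub provider: String,
}

/// Build chain -> (provider label -> health checker), so each chain's adapters
/// see only their own checkers and lookups by label are O(1) on average.
pub fn group_by_chain(
    checkers: Vec<(String, Arc<Health>)>, // (chain name, checker)
) -> HashMap<String, HashMap<String, Arc<Health>>> {
    let mut by_chain: HashMap<String, HashMap<String, Arc<Health>>> = HashMap::new();
    for (chain, health) in checkers {
        by_chain
            .entry(chain)
            .or_default()
            .insert(health.provider.clone(), health);
    }
    by_chain
}
```
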
- Document that weight 0.0 is intentional: it removes the provider from weighted
  selection while keeping it available for error retesting
- Fix contradictory example in full_config.toml that showed weights >1.0
  despite validation rejecting them
- Remove weight from firehose provider config since it is only used
  for RPC providers