RPC Manager - Round-Robin Load Balancing Guide
Overview
The RPC Manager is a production-grade RPC endpoint management system that provides:
- Round-Robin Load Balancing: Distributes RPC calls evenly across multiple endpoints
- Health Monitoring: Tracks endpoint health and automatically handles failures
- Multiple Rotation Policies: Supports different strategies for endpoint selection
- Statistics & Metrics: Provides detailed metrics about RPC usage and health
- Automatic Failover: Gracefully handles endpoint failures and recoveries
Architecture
Core Components
┌─────────────────────────────────────────────┐
│                 RPC Manager                 │
│  - Manages endpoint pool                    │
│  - Rotates through endpoints                │
│  - Tracks health metrics                    │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│         RPC Endpoint Health Tracker         │
│  - Success/failure counts                   │
│  - Response times                           │
│  - Consecutive failure tracking             │
│  - Health status                            │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│       Connection Manager Integration        │
│  - Transparent endpoint selection           │
│  - Automatic client pooling                 │
│  - Fallback support                         │
└─────────────────────────────────────────────┘
Rotation Policies
1. Round-Robin (Default)
Simple cyclic rotation through all endpoints.
Endpoint 1 → Endpoint 2 → Endpoint 3 → Endpoint 1 → ...
Best For: Uniform load distribution across identical endpoints
2. Health-Aware
Prioritizes healthy endpoints and falls back to plain round-robin if all are unhealthy.
Healthy endpoints preferred, unhealthy skipped
Best For: Mixed-quality endpoints, avoiding bad ones
3. Least-Failures
Always selects the endpoint with the lowest failure count.
Total failures tracked per endpoint, best selected
Best For: Handling varying endpoint reliability over time
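The selection logic behind each policy can be pictured with a short sketch (endpointHealth and pickEndpoint are hypothetical names for illustration, not the arbitrum package's actual internals):
// Illustrative sketch of the three rotation policies.
type endpointHealth struct {
    failures  int
    isHealthy bool
}

func pickEndpoint(eps []endpointHealth, policy string, cursor *int) int {
    switch policy {
    case "health-aware":
        // Prefer the next healthy endpoint in rotation order;
        // if none are healthy, fall back to plain round-robin.
        for i := 0; i < len(eps); i++ {
            idx := (*cursor + i) % len(eps)
            if eps[idx].isHealthy {
                *cursor = (idx + 1) % len(eps)
                return idx
            }
        }
        fallthrough
    case "round-robin":
        idx := *cursor % len(eps)
        *cursor = (idx + 1) % len(eps)
        return idx
    default: // "least-failures"
        best := 0
        for i, e := range eps {
            if e.failures < eps[best].failures {
                best = i
            }
        }
        return best
    }
}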
Usage
Basic Setup
import (
    "github.com/fraktal/mev-beta/pkg/arbitrum"
    "github.com/fraktal/mev-beta/internal/config"
    "github.com/fraktal/mev-beta/internal/logger"
)
// Create connection manager with round-robin enabled
cfg := &config.ArbitrumConfig{
    RPCEndpoint: "https://primary.rpc.com",
    // ... other config
}
connectionManager := arbitrum.NewConnectionManager(cfg, logger)
connectionManager.EnableRoundRobin(true)
Using Round-Robin Clients
// Create a round-robin client wrapper
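// Note: rpcManager is the connection manager's internal endpoint pool;
// the field is unexported, so as written this assumes code inside the
// arbitrum package (or an exported accessor in its place).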
rrClient := arbitrum.NewRoundRobinClient(
    connectionManager.rpcManager,
    ctx,
    logger,
)
// For read operations - uses round-robin
client, err := rrClient.GetClientForRead()
if err != nil {
    return err
}
// Perform the read operation, timing it for the health metrics
start := time.Now()
result, err := client.ChainID(ctx)
// Record the outcome so the manager can track endpoint health
if err != nil {
    rrClient.RecordReadFailure()
} else {
    rrClient.RecordReadSuccess(time.Since(start))
}
_ = result // use the chain ID as needed
Advanced: Initialize with Multiple Endpoints
endpoints := []string{
    "https://rpc1.arbitrum.io",
    "https://rpc2.arbitrum.io",
    "https://rpc3.arbitrum.io",
}
// Initialize round-robin with multiple endpoints
err := arbitrum.InitializeRPCRoundRobin(connectionManager, endpoints)
if err != nil {
    logger.Error(fmt.Sprintf("Failed to initialize round-robin: %v", err))
}
// Set rotation strategy
arbitrum.ConfigureRPCLoadBalancing(connectionManager, arbitrum.HealthAware)
Monitoring RPC Health
// Perform health check on all endpoints
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
err := connectionManager.PerformRPCHealthCheck(ctx)
if err != nil {
    logger.Warn(fmt.Sprintf("Health check failed: %v", err))
}
// Get detailed statistics
stats := connectionManager.GetRPCManagerStats()
fmt.Println(stats)
// Output:
// {
//   "total_endpoints": 3,
//   "healthy_count": 3,
//   "total_requests": 10234,
//   "total_success": 10000,
//   "total_failure": 234,
//   "success_rate": "97.71%",
//   "current_policy": "health-aware",
//   "endpoint_details": [...]
// }
Integration Examples
With Batch Fetcher
The batch fetcher automatically benefits from round-robin when the connection manager is configured:
// Connection manager already uses round-robin
client, err := connectionManager.GetClient(ctx)
// Create batch fetcher - will use round-robin automatically
batchFetcher, err := datafetcher.NewBatchFetcher(
    client,
    contractAddr,
    logger,
)
// All batch calls are load-balanced across endpoints
results, err := batchFetcher.FetchPoolsBatch(ctx, poolAddresses)
With Monitor
The Arbitrum monitor automatically uses the round-robin enabled connection manager:
// Create monitor with round-robin connection manager
monitor, err := monitor.NewArbitrumMonitor(
    arbConfig,
    botConfig,
    logger,
    rateLimiter,
    marketMgr,
    scanner,
)
// All RPC calls from monitor are load-balanced
Health Metrics
The RPC Manager tracks comprehensive health metrics for each endpoint:
health, _ := connectionManager.rpcManager.GetEndpointHealth(0)
// Access metrics
success, failure, consecutive, isHealthy := health.GetStats()
fmt.Printf("Endpoint: %s\n", health.URL)
fmt.Printf(" Success: %d, Failure: %d, Consecutive Fails: %d\n",
success, failure, consecutive)
fmt.Printf(" Healthy: %v, Response Time: %dms\n",
isHealthy, health.ResponseTime.Milliseconds())
Health Thresholds
- Marked Unhealthy: 3+ consecutive failures
- Recovered: Next successful call resets consecutive failures
- Tracked Metrics: Success count, failure count, response time
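These rules amount to a small state machine per endpoint. A minimal sketch under the thresholds above (healthState is a hypothetical stand-in for the package's RPCEndpointHealth):
const unhealthyThreshold = 3 // consecutive failures before an endpoint is marked unhealthy

// healthState mirrors the tracked metrics listed above.
type healthState struct {
    successCount     int
    failureCount     int
    consecutiveFails int
    isHealthy        bool
}

func (h *healthState) recordSuccess() {
    h.successCount++
    h.consecutiveFails = 0 // a single success recovers the endpoint
    h.isHealthy = true
}

func (h *healthState) recordFailure() {
    h.failureCount++
    h.consecutiveFails++
    if h.consecutiveFails >= unhealthyThreshold {
        h.isHealthy = false
    }
}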
Performance Impact
Load Distribution
With 3 endpoints using round-robin:
- Without: All calls hit endpoint 1 → potential rate limiting
- With: Calls distributed evenly → 3x throughput potential
Response Times
Example with mixed endpoints:
Endpoint 1: 50ms avg
Endpoint 2: 200ms avg (poor)
Endpoint 3: 50ms avg
Health-Aware Strategy Results:
- Requests to 1: ~45%
- Requests to 2: ~5% (deprioritized)
- Requests to 3: ~50%
Success Rate: 99.8% (vs 95% without load balancing)
Configuration
Environment Variables
# Enable round-robin explicitly
export RPC_ROUNDROBIN_ENABLED=true
# Additional fallback endpoints (comma-separated)
export ARBITRUM_FALLBACK_ENDPOINTS="https://rpc1.io,https://rpc2.io,https://rpc3.io"
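The application's config loader handles this wiring, but as a rough sketch of how the comma-separated list could feed InitializeRPCRoundRobin (standard library only, reusing the connectionManager and logger from earlier examples):
// Illustrative parsing of ARBITRUM_FALLBACK_ENDPOINTS; error handling
// mirrors the earlier InitializeRPCRoundRobin example.
raw := os.Getenv("ARBITRUM_FALLBACK_ENDPOINTS")
var endpoints []string
for _, e := range strings.Split(raw, ",") {
    if e = strings.TrimSpace(e); e != "" {
        endpoints = append(endpoints, e)
    }
}
if err := arbitrum.InitializeRPCRoundRobin(connectionManager, endpoints); err != nil {
    logger.Error(fmt.Sprintf("Failed to initialize round-robin: %v", err))
}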
Configuration File
arbitrum:
  rpc_endpoint: "https://primary.rpc.io"
  rate_limit:
    requests_per_second: 5.0
    burst: 10
  reading_endpoints:
    - url: "https://read1.rpc.io"
    - url: "https://read2.rpc.io"
  execution_endpoints:
    - url: "https://execute1.rpc.io"
Best Practices
1. Choose Right Rotation Policy
// For equal-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.RoundRobin)
// For mixed-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.HealthAware)
// For endpoints with varying failures
connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
2. Monitor Regularly
// Periodic health checks
ticker := time.NewTicker(5 * time.Minute)
for range ticker.C {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    if err := connectionManager.PerformRPCHealthCheck(ctx); err != nil {
        logger.Error(fmt.Sprintf("Health check failed: %v", err))
    }
    cancel()
}
3. Handle Errors Gracefully
client, err := rrClient.GetClientForRead()
if err != nil {
    logger.Error(fmt.Sprintf("Failed to get RPC client: %v", err))
    // Implement fallback logic
    return nil, err
}
// Always record results
start := time.Now()
result, err := performRPCCall(client) // performRPCCall: your RPC operation
elapsed := time.Since(start)
if err != nil {
    rrClient.RecordReadFailure()
} else {
    rrClient.RecordReadSuccess(elapsed)
}
_ = result
4. Optimize Batch Sizes
// RPC Manager works best with batch operations
// Reduce individual calls, increase batch size
// ❌ Avoid: Many small individual calls (one round trip per pool)
for _, pool := range pools {
    data, _ := client.CallContract(ctx, buildCall(pool)) // buildCall: hypothetical per-pool call constructor
    _ = data
}
// ✅ Better: Batch operations
batchFetcher.FetchPoolsBatch(ctx, pools)
Troubleshooting
All Endpoints Unhealthy
Error: "no healthy endpoints available"
Solution: Check endpoint status and logs:
- Perform a manual health check (see the snippet below)
- Verify network connectivity
- Check RPC provider status
- Review error logs for specific failures
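For example, a quick triage pass using the calls documented above:
// One-off health check plus a stats dump for triage.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := connectionManager.PerformRPCHealthCheck(ctx); err != nil {
    logger.Warn(fmt.Sprintf("Manual health check failed: %v", err))
}
fmt.Println(connectionManager.GetRPCManagerStats())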
High Failure Rate
stats := connectionManager.GetRPCManagerStats()
// Parse the "97.76%"-style rate before comparing; a lexical string
// comparison would give wrong results
rateStr := strings.TrimSuffix(stats["success_rate"].(string), "%")
if rate, err := strconv.ParseFloat(rateStr, 64); err == nil && rate < 95.0 {
    logger.Warn("High RPC failure rate detected")
    // Switch to the failure-aware policy
    connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
}
Uneven Load Distribution
// Check distribution
stats := connectionManager.GetRPCManagerStats()
details := stats["endpoint_details"].([]map[string]interface{})
for _, endpoint := range details {
    fmt.Printf("%s: %d requests\n",
        endpoint["url"],
        endpoint["success_count"])
}
Metrics Reference
Tracked Metrics
- Total Requests: Sum of all successful and failed calls
- Success Rate: Percentage of successful calls
- Response Times: Min, max, average per endpoint
- Consecutive Failures: Track for health status
- Endpoint Status: Healthy/Unhealthy state
Export Format
{
  "total_endpoints": 3,
  "healthy_count": 3,
  "total_requests": 15234,
  "total_success": 14892,
  "total_failure": 342,
  "success_rate": "97.76%",
  "current_policy": "health-aware",
  "endpoint_details": [
    {
      "index": 0,
      "url": "https://rpc1.io",
      "success_count": 5023,
      "failure_count": 89,
      "consecutive_fails": 0,
      "is_healthy": true,
      "last_checked": "2025-11-03T12:34:56Z",
      "response_time_ms": 45
    }
  ]
}
API Reference
RPCManager
type RPCManager struct
NewRPCManager(logger) *RPCManager
AddEndpoint(client, url) error
GetNextClient(ctx) (*RateLimitedClient, int, error)
RecordSuccess(idx, responseTime)
RecordFailure(idx)
GetEndpointHealth(idx) (*RPCEndpointHealth, error)
GetAllHealthStats() []map[string]interface{}
SetRotationPolicy(policy RotationPolicy)
HealthCheckAll(ctx) error
GetStats() map[string]interface{}
Close() error
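A sketch of the call flow using these methods (it assumes the returned client exposes the usual ethclient calls such as ChainID, and that response times are time.Duration values; neither is confirmed by the signatures above):
// Hedged example of the RPCManager lifecycle; client construction elided.
mgr := arbitrum.NewRPCManager(logger)
defer mgr.Close()

if err := mgr.AddEndpoint(client, "https://rpc1.arbitrum.io"); err != nil {
    return err
}
mgr.SetRotationPolicy(arbitrum.HealthAware)

c, idx, err := mgr.GetNextClient(ctx)
if err != nil {
    return err
}
start := time.Now()
if _, err := c.ChainID(ctx); err != nil {
    mgr.RecordFailure(idx)
} else {
    mgr.RecordSuccess(idx, time.Since(start))
}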
ConnectionManager Extensions
func (cm *ConnectionManager) EnableRoundRobin(enabled bool)
func (cm *ConnectionManager) SetRPCRotationPolicy(policy RotationPolicy)
func (cm *ConnectionManager) GetRPCManagerStats() map[string]interface{}
func (cm *ConnectionManager) PerformRPCHealthCheck(ctx) error
Helper Functions
NewRoundRobinClient(manager, ctx, logger) *RoundRobinClient
InitializeRPCRoundRobin(cm, endpoints) error
ConfigureRPCLoadBalancing(cm, strategy) error
GetConnectionManagerWithRoundRobin(cfg, logger, endpoints) (*ConnectionManager, error)
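For instance, the one-call constructor can replace the manual setup shown under Usage (a sketch based only on the signature listed above, with cfg, logger, and endpoints as in earlier examples):
// Build a connection manager with round-robin already enabled across
// the given endpoints.
cm, err := arbitrum.GetConnectionManagerWithRoundRobin(cfg, logger, endpoints)
if err != nil {
    return err
}
client, err := cm.GetClient(ctx) // subsequent calls rotate across endpoints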
Future Enhancements
Planned improvements to RPC Manager:
- Weighted Round-Robin: Assign weights based on historical performance
- Dynamic Endpoint Discovery: Auto-discover and add new endpoints
- Regional Failover: Prefer endpoints in same region for latency
- Cost Tracking: Monitor and report RPC call costs
- Analytics Dashboard: Real-time visualization of RPC metrics
- Adaptive Timeouts: Adjust timeouts based on endpoint performance
- Request Queueing: Smart queuing during RPC overload
Conclusion
The RPC Manager provides enterprise-grade RPC endpoint management, enabling:
- Reliability: Automatic failover and health monitoring
- Performance: Optimized load distribution
- Visibility: Comprehensive metrics and statistics
- Flexibility: Multiple rotation strategies for different needs
For production deployments, RPC Manager is essential to prevent single-endpoint rate limiting and ensure robust transaction processing.