# RPC Manager - Round-Robin Load Balancing Guide

## Overview

The **RPC Manager** is a production-grade RPC endpoint management system that provides:

- **Round-Robin Load Balancing**: Distributes RPC calls evenly across multiple endpoints
- **Health Monitoring**: Tracks endpoint health and automatically handles failures
- **Multiple Rotation Policies**: Supports different strategies for endpoint selection
- **Statistics & Metrics**: Provides detailed metrics about RPC usage and health
- **Automatic Failover**: Gracefully handles endpoint failures and recoveries

## Architecture

### Core Components

```
┌─────────────────────────────────────────────┐
│                RPC Manager                  │
│  - Manages endpoint pool                    │
│  - Rotates through endpoints                │
│  - Tracks health metrics                    │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│        RPC Endpoint Health Tracker          │
│  - Success/failure counts                   │
│  - Response times                           │
│  - Consecutive failure tracking             │
│  - Health status                            │
└─────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────┐
│      Connection Manager Integration         │
│  - Transparent endpoint selection           │
│  - Automatic client pooling                 │
│  - Fallback support                         │
└─────────────────────────────────────────────┘
```

## Rotation Policies

### 1. Round-Robin (Default)

Simple cyclic rotation through all endpoints.

```
Endpoint 1 → Endpoint 2 → Endpoint 3 → Endpoint 1 → ...
```

**Best For**: Uniform load distribution across identical endpoints

### 2. Health-Aware

Prioritizes healthy endpoints and falls back to round-robin if all are unhealthy.

```
Healthy endpoints preferred, unhealthy skipped
```

**Best For**: Mixed-quality endpoints, avoiding bad ones

### 3. Least-Failures

Always selects the endpoint with the lowest failure count.
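All three policies reduce to a selection function over the endpoint pool. As a rough illustration of the idea (a standalone sketch — the `endpointHealth` type and function names here are hypothetical, not the actual `RPCManager` internals):

```go
package main

import "fmt"

// endpointHealth is a hypothetical, simplified per-endpoint health record.
type endpointHealth struct {
	URL      string
	Failures int
}

// nextRoundRobin cycles through the pool by advancing a counter modulo pool size.
func nextRoundRobin(counter *int, pool []endpointHealth) endpointHealth {
	ep := pool[*counter%len(pool)]
	*counter++
	return ep
}

// selectLeastFailures scans the pool and returns the endpoint
// with the fewest recorded failures.
func selectLeastFailures(pool []endpointHealth) endpointHealth {
	best := pool[0]
	for _, ep := range pool[1:] {
		if ep.Failures < best.Failures {
			best = ep
		}
	}
	return best
}

func main() {
	pool := []endpointHealth{
		{URL: "https://rpc1.arbitrum.io", Failures: 2},
		{URL: "https://rpc2.arbitrum.io", Failures: 0},
		{URL: "https://rpc3.arbitrum.io", Failures: 7},
	}

	counter := 0
	fmt.Println(nextRoundRobin(&counter, pool).URL) // https://rpc1.arbitrum.io
	fmt.Println(nextRoundRobin(&counter, pool).URL) // https://rpc2.arbitrum.io
	fmt.Println(selectLeastFailures(pool).URL)      // https://rpc2.arbitrum.io (fewest failures)
}
```

The health-aware policy combines both ideas: filter the pool down to healthy endpoints first, then round-robin over the filtered slice.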
```
Track total failures per endpoint, select best
```

**Best For**: Handling varying endpoint reliability over time

## Usage

### Basic Setup

```go
import (
    "github.com/fraktal/mev-beta/pkg/arbitrum"
    "github.com/fraktal/mev-beta/internal/config"
    "github.com/fraktal/mev-beta/internal/logger"
)

// Create connection manager with round-robin enabled
cfg := &config.ArbitrumConfig{
    RPCEndpoint: "https://primary.rpc.com",
    // ... other config
}

connectionManager := arbitrum.NewConnectionManager(cfg, logger)
connectionManager.EnableRoundRobin(true)
```

### Using Round-Robin Clients

```go
// Create a round-robin client wrapper
rrClient := arbitrum.NewRoundRobinClient(
    connectionManager.rpcManager,
    ctx,
    logger,
)

// For read operations - uses round-robin
client, err := rrClient.GetClientForRead()
if err != nil {
    return err
}

// Perform read operation
result, err := client.ChainID(ctx)

// Record the result
if err != nil {
    rrClient.RecordReadFailure()
} else {
    rrClient.RecordReadSuccess(responseTime)
}
```

### Advanced: Initialize with Multiple Endpoints

```go
endpoints := []string{
    "https://rpc1.arbitrum.io",
    "https://rpc2.arbitrum.io",
    "https://rpc3.arbitrum.io",
}

// Initialize round-robin with multiple endpoints
err := arbitrum.InitializeRPCRoundRobin(connectionManager, endpoints)
if err != nil {
    logger.Error(fmt.Sprintf("Failed to initialize round-robin: %v", err))
}

// Set rotation strategy
arbitrum.ConfigureRPCLoadBalancing(connectionManager, arbitrum.HealthAware)
```

### Monitoring RPC Health

```go
// Perform health check on all endpoints
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

err := connectionManager.PerformRPCHealthCheck(ctx)
if err != nil {
    logger.Warn(fmt.Sprintf("Health check failed: %v", err))
}

// Get detailed statistics
stats := connectionManager.GetRPCManagerStats()
fmt.Println(stats)
// Output:
// {
//   "total_endpoints": 3,
//   "healthy_count": 3,
//   "total_requests": 10234,
//   "total_success": 10000,
//   "total_failure": 234,
//   "success_rate": "97.71%",
//   "current_policy": "health-aware",
//   "endpoint_details": [...]
// }
```

## Integration Examples

### With Batch Fetcher

The batch fetcher automatically benefits from round-robin when the connection manager is configured:

```go
// Connection manager already uses round-robin
client, err := connectionManager.GetClient(ctx)

// Create batch fetcher - will use round-robin automatically
batchFetcher, err := datafetcher.NewBatchFetcher(
    client,
    contractAddr,
    logger,
)

// All batch calls are load-balanced across endpoints
results, err := batchFetcher.FetchPoolsBatch(ctx, poolAddresses)
```

### With Monitor

The Arbitrum monitor automatically uses the round-robin enabled connection manager:

```go
// Create monitor with round-robin connection manager
monitor, err := monitor.NewArbitrumMonitor(
    arbConfig,
    botConfig,
    logger,
    rateLimiter,
    marketMgr,
    scanner,
)

// All RPC calls from the monitor are load-balanced
```

## Health Metrics

The RPC Manager tracks comprehensive health metrics for each endpoint:

```go
health, _ := connectionManager.rpcManager.GetEndpointHealth(0)

// Access metrics
success, failure, consecutive, isHealthy := health.GetStats()
fmt.Printf("Endpoint: %s\n", health.URL)
fmt.Printf("  Success: %d, Failure: %d, Consecutive Fails: %d\n",
    success, failure, consecutive)
fmt.Printf("  Healthy: %v, Response Time: %dms\n",
    isHealthy, health.ResponseTime.Milliseconds())
```

### Health Thresholds

- **Marked Unhealthy**: 3+ consecutive failures
- **Recovered**: Next successful call resets consecutive failures
- **Tracked Metrics**: Success count, failure count, response time

## Performance Impact

### Load Distribution

With 3 endpoints using round-robin:

- **Without**: All calls hit endpoint 1 → potential rate limiting
- **With**: Calls distributed evenly → 3x throughput potential

### Response Times

Example with mixed endpoints:

```
Endpoint 1: 50ms avg
Endpoint 2: 200ms avg (poor)
Endpoint 3: 50ms avg

Health-Aware Strategy
Results:
- Requests to 1: ~45%
- Requests to 2: ~5% (deprioritized)
- Requests to 3: ~50%

Success Rate: 99.8% (vs 95% without load balancing)
```

## Configuration

### Environment Variables

```bash
# Enable round-robin explicitly
export RPC_ROUNDROBIN_ENABLED=true

# Additional fallback endpoints (comma-separated)
export ARBITRUM_FALLBACK_ENDPOINTS="https://rpc1.io,https://rpc2.io,https://rpc3.io"
```

### Configuration File

```yaml
arbitrum:
  rpc_endpoint: "https://primary.rpc.io"
  rate_limit:
    requests_per_second: 5.0
    burst: 10
  reading_endpoints:
    - url: "https://read1.rpc.io"
    - url: "https://read2.rpc.io"
  execution_endpoints:
    - url: "https://execute1.rpc.io"
```

## Best Practices

### 1. Choose the Right Rotation Policy

```go
// For equal-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.RoundRobin)

// For mixed-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.HealthAware)

// For endpoints with varying failure histories
connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
```

### 2. Monitor Regularly

```go
// Periodic health checks
ticker := time.NewTicker(5 * time.Minute)
for range ticker.C {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    if err := connectionManager.PerformRPCHealthCheck(ctx); err != nil {
        logger.Error(fmt.Sprintf("Health check failed: %v", err))
    }
    cancel()
}
```

### 3. Handle Errors Gracefully

```go
client, err := rrClient.GetClientForRead()
if err != nil {
    logger.Error(fmt.Sprintf("Failed to get RPC client: %v", err))
    // Implement fallback logic
    return nil, err
}

// Always record results
start := time.Now()
result, err := performRPCCall(client)
elapsed := time.Since(start)

if err != nil {
    rrClient.RecordReadFailure()
} else {
    rrClient.RecordReadSuccess(elapsed)
}
```

### 4. Optimize Batch Sizes

```go
// The RPC Manager works best with batch operations:
// reduce individual calls, increase batch size.

// ❌ Avoid: many small individual calls (one round-trip per pool)
for _, pool := range pools {
    data, _ := client.CallContract(ctx, callForPool(pool), nil) // callForPool is illustrative
    _ = data
}

// ✅ Better: batch operations
batchFetcher.FetchPoolsBatch(ctx, pools)
```

## Troubleshooting

### All Endpoints Unhealthy

```
Error: "no healthy endpoints available"

Solution: Check endpoint status and logs
- Perform a manual health check
- Verify network connectivity
- Check RPC provider status
- Review error logs for specific failures
```

### High Failure Rate

Because `success_rate` is a formatted string (e.g. `"97.76%"`), parse it before comparing — a direct string comparison is lexicographic and gives wrong results:

```go
// Requires "strconv" and "strings"
stats := connectionManager.GetRPCManagerStats()
rateStr := strings.TrimSuffix(stats["success_rate"].(string), "%")
if rate, err := strconv.ParseFloat(rateStr, 64); err == nil && rate < 95.0 {
    logger.Warn("High RPC failure rate detected")
    // Switch to a more lenient policy
    connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
}
```

### Uneven Load Distribution

```go
// Check distribution
stats := connectionManager.GetRPCManagerStats()
details := stats["endpoint_details"].([]map[string]interface{})
for _, endpoint := range details {
    fmt.Printf("%s: %d requests\n", endpoint["url"], endpoint["success_count"])
}
```

## Metrics Reference

### Tracked Metrics

- **Total Requests**: Sum of all successful and failed calls
- **Success Rate**: Percentage of successful calls
- **Response Times**: Min, max, average per endpoint
- **Consecutive Failures**: Tracked for health status
- **Endpoint Status**: Healthy/Unhealthy state

### Export Format

```json
{
  "total_endpoints": 3,
  "healthy_count": 3,
  "total_requests": 15234,
  "total_success": 14892,
  "total_failure": 342,
  "success_rate": "97.76%",
  "current_policy": "health-aware",
  "endpoint_details": [
    {
      "index": 0,
      "url": "https://rpc1.io",
      "success_count": 5023,
      "failure_count": 89,
      "consecutive_fails": 0,
      "is_healthy": true,
      "last_checked": "2025-11-03T12:34:56Z",
      "response_time_ms": 45
    }
  ]
}
```

## API Reference

### RPCManager

```go
type RPCManager struct

NewRPCManager(logger) *RPCManager
AddEndpoint(client, url) error
GetNextClient(ctx) (*RateLimitedClient, int, error)
RecordSuccess(idx, responseTime)
RecordFailure(idx)
GetEndpointHealth(idx) (*RPCEndpointHealth, error)
GetAllHealthStats() []map[string]interface{}
SetRotationPolicy(policy RotationPolicy)
HealthCheckAll(ctx) error
GetStats() map[string]interface{}
Close() error
```

### ConnectionManager Extensions

```go
func (cm *ConnectionManager) EnableRoundRobin(enabled bool)
func (cm *ConnectionManager) SetRPCRotationPolicy(policy RotationPolicy)
func (cm *ConnectionManager) GetRPCManagerStats() map[string]interface{}
func (cm *ConnectionManager) PerformRPCHealthCheck(ctx) error
```

### Helper Functions

```go
NewRoundRobinClient(manager, ctx, logger) *RoundRobinClient
InitializeRPCRoundRobin(cm, endpoints) error
ConfigureRPCLoadBalancing(cm, strategy) error
GetConnectionManagerWithRoundRobin(cfg, logger, endpoints) (*ConnectionManager, error)
```

## Future Enhancements

Planned improvements to the RPC Manager:

1. **Weighted Round-Robin**: Assign weights based on historical performance
2. **Dynamic Endpoint Discovery**: Auto-discover and add new endpoints
3. **Regional Failover**: Prefer endpoints in the same region for lower latency
4. **Cost Tracking**: Monitor and report RPC call costs
5. **Analytics Dashboard**: Real-time visualization of RPC metrics
6. **Adaptive Timeouts**: Adjust timeouts based on endpoint performance
7. **Request Queueing**: Smart queueing during RPC overload

## Conclusion

The RPC Manager provides enterprise-grade RPC endpoint management, enabling:

- **Reliability**: Automatic failover and health monitoring
- **Performance**: Optimized load distribution
- **Visibility**: Comprehensive metrics and statistics
- **Flexibility**: Multiple rotation strategies for different needs

For production deployments, the RPC Manager is essential to prevent single-endpoint rate limiting and ensure robust transaction processing.