# RPC Manager - Round-Robin Load Balancing Guide
## Overview

The **RPC Manager** is a production-grade RPC endpoint management system that provides:

- **Round-Robin Load Balancing**: Distributes RPC calls evenly across multiple endpoints
- **Health Monitoring**: Tracks endpoint health and automatically handles failures
- **Multiple Rotation Policies**: Supports different strategies for endpoint selection
- **Statistics & Metrics**: Provides detailed metrics about RPC usage and health
- **Automatic Failover**: Gracefully handles endpoint failures and recoveries

## Architecture

### Core Components

```
┌─────────────────────────────────────────────┐
│              RPC Manager                    │
│  - Manages endpoint pool                    │
│  - Rotates through endpoints                │
│  - Tracks health metrics                    │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│        RPC Endpoint Health Tracker          │
│  - Success/failure counts                   │
│  - Response times                           │
│  - Consecutive failure tracking             │
│  - Health status                            │
└─────────────────────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────┐
│       Connection Manager Integration        │
│  - Transparent endpoint selection           │
│  - Automatic client pooling                 │
│  - Fallback support                         │
└─────────────────────────────────────────────┘
```

## Rotation Policies

### 1. Round-Robin (Default)

Simple cyclic rotation through all endpoints.

```
Endpoint 1 → Endpoint 2 → Endpoint 3 → Endpoint 1 → ...
```

**Best For**: Uniform load distribution across identical endpoints
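
At its core, this rotation is just a counter advanced on every call, taken modulo the pool size. A minimal, standalone sketch of the idea (illustrative only, not the actual `RPCManager` implementation):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin cycles through endpoints; safe for concurrent callers.
type roundRobin struct {
	counter   atomic.Uint64
	endpoints []string
}

// next returns the next endpoint in strict rotation order.
func (r *roundRobin) next() string {
	// Add(1)-1 yields 0, 1, 2, ... across goroutines without a lock.
	i := r.counter.Add(1) - 1
	return r.endpoints[i%uint64(len(r.endpoints))]
}

func main() {
	rr := &roundRobin{endpoints: []string{"rpc1", "rpc2", "rpc3"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.next()) // rpc1, rpc2, rpc3, rpc1
	}
}
```

Using an atomic counter instead of a mutex keeps endpoint selection cheap even when many goroutines request clients at once.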

### 2. Health-Aware

Prioritizes healthy endpoints; falls back to round-robin if all are unhealthy.

```
Healthy endpoints preferred, unhealthy skipped
```

**Best For**: Mixed-quality endpoints, avoiding bad ones
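
Health-aware selection can be layered on top of the same rotation: walk the ring starting at the current cursor and take the first healthy endpoint, degrading to plain round-robin when none qualify. A standalone sketch of that rule (names are illustrative; this is not the real implementation):

```go
package main

import "fmt"

type endpoint struct {
	url     string
	healthy bool
}

// nextHealthy scans the ring starting at *cursor and returns the first
// healthy endpoint; if every endpoint is unhealthy it falls back to
// plain rotation so callers still receive a client.
func nextHealthy(eps []endpoint, cursor *int) string {
	n := len(eps)
	for i := 0; i < n; i++ {
		idx := (*cursor + i) % n
		if eps[idx].healthy {
			*cursor = idx + 1
			return eps[idx].url
		}
	}
	// All unhealthy: strict round-robin fallback.
	idx := *cursor % n
	*cursor++
	return eps[idx].url
}

func main() {
	eps := []endpoint{{"rpc1", true}, {"rpc2", false}, {"rpc3", true}}
	cursor := 0
	fmt.Println(nextHealthy(eps, &cursor)) // rpc1
	fmt.Println(nextHealthy(eps, &cursor)) // rpc3 (rpc2 skipped)
	fmt.Println(nextHealthy(eps, &cursor)) // rpc1
}
```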

### 3. Least-Failures

Always selects the endpoint with the lowest failure count.

```
Track total failures per endpoint, select best
```

**Best For**: Handling varying endpoint reliability over time
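
The selection itself is a simple arg-min over per-endpoint failure counters. A sketch under the same assumptions as the previous snippets:

```go
package main

import "fmt"

// leastFailures returns the index of the endpoint with the fewest
// recorded failures, breaking ties in favor of the lowest index.
func leastFailures(failures []int) int {
	best := 0
	for i, f := range failures {
		if f < failures[best] {
			best = i
		}
	}
	return best
}

func main() {
	failures := []int{12, 3, 7}          // failure counts for endpoints 0..2
	fmt.Println(leastFailures(failures)) // 1
}
```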

## Usage

### Basic Setup

```go
import (
	"github.com/fraktal/mev-beta/pkg/arbitrum"
	"github.com/fraktal/mev-beta/internal/config"
	"github.com/fraktal/mev-beta/internal/logger"
)

// Create connection manager with round-robin enabled
cfg := &config.ArbitrumConfig{
	RPCEndpoint: "https://primary.rpc.com",
	// ... other config
}

connectionManager := arbitrum.NewConnectionManager(cfg, logger)
connectionManager.EnableRoundRobin(true)
```

### Using Round-Robin Clients

```go
// Create a round-robin client wrapper
// (rpcManager is an unexported field, so this form only compiles inside
// the arbitrum package; external callers use the helper functions below)
rrClient := arbitrum.NewRoundRobinClient(
	connectionManager.rpcManager,
	ctx,
	logger,
)

// For read operations - uses round-robin
client, err := rrClient.GetClientForRead()
if err != nil {
	return err
}

// Perform read operation, timing it so the result can be recorded
start := time.Now()
result, err := client.ChainID(ctx)
responseTime := time.Since(start)

// Record result
if err != nil {
	rrClient.RecordReadFailure()
} else {
	rrClient.RecordReadSuccess(responseTime)
}
```

### Advanced: Initialize with Multiple Endpoints

```go
endpoints := []string{
	"https://rpc1.arbitrum.io",
	"https://rpc2.arbitrum.io",
	"https://rpc3.arbitrum.io",
}

// Initialize round-robin with multiple endpoints
err := arbitrum.InitializeRPCRoundRobin(connectionManager, endpoints)
if err != nil {
	logger.Error(fmt.Sprintf("Failed to initialize round-robin: %v", err))
}

// Set rotation strategy
arbitrum.ConfigureRPCLoadBalancing(connectionManager, arbitrum.HealthAware)
```

### Monitoring RPC Health

```go
// Perform health check on all endpoints
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

err := connectionManager.PerformRPCHealthCheck(ctx)
if err != nil {
	logger.Warn(fmt.Sprintf("Health check failed: %v", err))
}

// Get detailed statistics
stats := connectionManager.GetRPCManagerStats()
fmt.Println(stats)
// Output:
// {
//   "total_endpoints": 3,
//   "healthy_count": 3,
//   "total_requests": 10234,
//   "total_success": 10000,
//   "total_failure": 234,
//   "success_rate": "97.71%",
//   "current_policy": "health-aware",
//   "endpoint_details": [...]
// }
```

## Integration Examples

### With Batch Fetcher

The batch fetcher automatically benefits from round-robin when the connection manager is configured:

```go
// Connection manager already uses round-robin
client, err := connectionManager.GetClient(ctx)

// Create batch fetcher - will use round-robin automatically
batchFetcher, err := datafetcher.NewBatchFetcher(
	client,
	contractAddr,
	logger,
)

// All batch calls are load-balanced across endpoints
results, err := batchFetcher.FetchPoolsBatch(ctx, poolAddresses)
```

### With Monitor

The Arbitrum monitor automatically uses the round-robin enabled connection manager:

```go
// Create monitor with round-robin connection manager
monitor, err := monitor.NewArbitrumMonitor(
	arbConfig,
	botConfig,
	logger,
	rateLimiter,
	marketMgr,
	scanner,
)

// All RPC calls from monitor are load-balanced
```

## Health Metrics

The RPC Manager tracks comprehensive health metrics for each endpoint:

```go
health, _ := connectionManager.rpcManager.GetEndpointHealth(0)

// Access metrics
success, failure, consecutive, isHealthy := health.GetStats()

fmt.Printf("Endpoint: %s\n", health.URL)
fmt.Printf("  Success: %d, Failure: %d, Consecutive Fails: %d\n",
	success, failure, consecutive)
fmt.Printf("  Healthy: %v, Response Time: %dms\n",
	isHealthy, health.ResponseTime.Milliseconds())
```

### Health Thresholds

- **Marked Unhealthy**: 3+ consecutive failures
- **Recovered**: Next successful call resets consecutive failures
- **Tracked Metrics**: Success count, failure count, response time
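
These thresholds amount to a small state machine per endpoint. A standalone sketch of the rule (the 3-failure threshold matches the thresholds above; the type and method names are illustrative, not the real `RPCEndpointHealth`):

```go
package main

import "fmt"

const unhealthyThreshold = 3 // consecutive failures before marking unhealthy

// healthTracker applies the thresholds above to a single endpoint.
type healthTracker struct {
	success, failure, consecutive int
}

func (h *healthTracker) recordSuccess() {
	h.success++
	h.consecutive = 0 // a single success resets the failure streak
}

func (h *healthTracker) recordFailure() {
	h.failure++
	h.consecutive++
}

func (h *healthTracker) isHealthy() bool {
	return h.consecutive < unhealthyThreshold
}

func main() {
	var h healthTracker
	h.recordFailure()
	h.recordFailure()
	fmt.Println(h.isHealthy()) // true: only 2 consecutive failures
	h.recordFailure()
	fmt.Println(h.isHealthy()) // false: hit the threshold of 3
	h.recordSuccess()
	fmt.Println(h.isHealthy()) // true: success resets the streak
}
```

Note that total failure count never resets; only the consecutive streak does, which is what lets a flaky-but-recovering endpoint rejoin the healthy pool.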

## Performance Impact

### Load Distribution

With 3 endpoints using round-robin:

- **Without**: All calls hit endpoint 1 → potential rate limiting
- **With**: Calls distributed evenly → 3x throughput potential

### Response Times

Example with mixed endpoints:

```
Endpoint 1: 50ms avg
Endpoint 2: 200ms avg (poor)
Endpoint 3: 50ms avg

Health-Aware Strategy Results:
- Requests to 1: ~45%
- Requests to 2: ~5% (deprioritized)
- Requests to 3: ~50%

Success Rate: 99.8% (vs 95% without load balancing)
```

## Configuration

### Environment Variables

```bash
# Enable round-robin explicitly
export RPC_ROUNDROBIN_ENABLED=true

# Additional fallback endpoints (comma-separated)
export ARBITRUM_FALLBACK_ENDPOINTS="https://rpc1.io,https://rpc2.io,https://rpc3.io"
```

### Configuration File

```yaml
arbitrum:
  rpc_endpoint: "https://primary.rpc.io"
  rate_limit:
    requests_per_second: 5.0
    burst: 10
  reading_endpoints:
    - url: "https://read1.rpc.io"
    - url: "https://read2.rpc.io"
  execution_endpoints:
    - url: "https://execute1.rpc.io"
```

## Best Practices

### 1. Choose the Right Rotation Policy

```go
// For equal-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.RoundRobin)

// For mixed-quality endpoints
connectionManager.SetRPCRotationPolicy(arbitrum.HealthAware)

// For endpoints with varying failures
connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
```

### 2. Monitor Regularly

```go
// Periodic health checks
ticker := time.NewTicker(5 * time.Minute)
for range ticker.C {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	if err := connectionManager.PerformRPCHealthCheck(ctx); err != nil {
		logger.Error(fmt.Sprintf("Health check failed: %v", err))
	}
	cancel()
}
```

### 3. Handle Errors Gracefully

```go
client, err := rrClient.GetClientForRead()
if err != nil {
	logger.Error(fmt.Sprintf("Failed to get RPC client: %v", err))
	// Implement fallback logic
	return nil, err
}

// Always record results
start := time.Now()
result, err := performRPCCall(client)
elapsed := time.Since(start)

if err != nil {
	rrClient.RecordReadFailure()
} else {
	rrClient.RecordReadSuccess(elapsed)
}
```

### 4. Optimize Batch Sizes

```go
// RPC Manager works best with batch operations:
// reduce individual calls, increase batch size

// ❌ Avoid: Many small individual calls
for _, pool := range pools {
	data, _ := client.CallContract(ctx, call)
}

// ✅ Better: Batch operations
batchFetcher.FetchPoolsBatch(ctx, pools)
```

## Troubleshooting

### All Endpoints Unhealthy

```
Error: "no healthy endpoints available"

Solution: Check endpoint status and logs
- Perform manual health check
- Verify network connectivity
- Check RPC provider status
- Review error logs for specific failures
```

### High Failure Rate

```go
// Parse the numeric rate rather than comparing strings:
// a lexicographic compare is wrong ("100.00%" < "95%" as strings)
stats := connectionManager.GetRPCManagerStats()
rateStr := strings.TrimSuffix(stats["success_rate"].(string), "%")
rate, err := strconv.ParseFloat(rateStr, 64)
if err == nil && rate < 95.0 {
	logger.Warn("High RPC failure rate detected")
	// Switch to more lenient policy
	connectionManager.SetRPCRotationPolicy(arbitrum.LeastFailures)
}
```

### Uneven Load Distribution

```go
// Check distribution
stats := connectionManager.GetRPCManagerStats()
details := stats["endpoint_details"].([]map[string]interface{})

for _, endpoint := range details {
	fmt.Printf("%s: %d requests\n",
		endpoint["url"],
		endpoint["success_count"])
}
```

## Metrics Reference

### Tracked Metrics

- **Total Requests**: Sum of all successful and failed calls
- **Success Rate**: Percentage of successful calls
- **Response Times**: Min, max, average per endpoint
- **Consecutive Failures**: Tracked to determine health status
- **Endpoint Status**: Healthy/Unhealthy state

### Export Format

```json
{
  "total_endpoints": 3,
  "healthy_count": 3,
  "total_requests": 15234,
  "total_success": 14892,
  "total_failure": 342,
  "success_rate": "97.76%",
  "current_policy": "health-aware",
  "endpoint_details": [
    {
      "index": 0,
      "url": "https://rpc1.io",
      "success_count": 5023,
      "failure_count": 89,
      "consecutive_fails": 0,
      "is_healthy": true,
      "last_checked": "2025-11-03T12:34:56Z",
      "response_time_ms": 45
    }
  ]
}
```

## API Reference

### RPCManager

```go
type RPCManager struct

NewRPCManager(logger) *RPCManager
AddEndpoint(client, url) error
GetNextClient(ctx) (*RateLimitedClient, int, error)
RecordSuccess(idx, responseTime)
RecordFailure(idx)
GetEndpointHealth(idx) (*RPCEndpointHealth, error)
GetAllHealthStats() []map[string]interface{}
SetRotationPolicy(policy RotationPolicy)
HealthCheckAll(ctx) error
GetStats() map[string]interface{}
Close() error
```

### ConnectionManager Extensions

```go
func (cm *ConnectionManager) EnableRoundRobin(enabled bool)
func (cm *ConnectionManager) SetRPCRotationPolicy(policy RotationPolicy)
func (cm *ConnectionManager) GetRPCManagerStats() map[string]interface{}
func (cm *ConnectionManager) PerformRPCHealthCheck(ctx) error
```

### Helper Functions

```go
NewRoundRobinClient(manager, ctx, logger) *RoundRobinClient
InitializeRPCRoundRobin(cm, endpoints) error
ConfigureRPCLoadBalancing(cm, strategy) error
GetConnectionManagerWithRoundRobin(cfg, logger, endpoints) (*ConnectionManager, error)
```

## Future Enhancements

Planned improvements to the RPC Manager:

1. **Weighted Round-Robin**: Assign weights based on historical performance
2. **Dynamic Endpoint Discovery**: Auto-discover and add new endpoints
3. **Regional Failover**: Prefer endpoints in the same region for lower latency
4. **Cost Tracking**: Monitor and report RPC call costs
5. **Analytics Dashboard**: Real-time visualization of RPC metrics
6. **Adaptive Timeouts**: Adjust timeouts based on endpoint performance
7. **Request Queueing**: Smart queuing during RPC overload

## Conclusion

The RPC Manager provides enterprise-grade RPC endpoint management, enabling:

- **Reliability**: Automatic failover and health monitoring
- **Performance**: Optimized load distribution
- **Visibility**: Comprehensive metrics and statistics
- **Flexibility**: Multiple rotation strategies for different needs

For production deployments, the RPC Manager is essential to prevent single-endpoint rate limiting and ensure robust transaction processing.