# Multi-Provider RPC Configuration - Rate Limit Solution

**Date**: October 31, 2025 19:00 CDT
**Status**: ✅ IMPLEMENTED - Multi-RPC rotation with failover

## Problem Statement

The MEV bot was experiencing critical rate limiting issues:

- **878 rate limit errors in 4 minutes** (~220/minute)
- 44.79% of pool fetch failures due to 429 errors
- Using a single free RPC endpoint (https://arb1.arbitrum.io/rpc)
- Rate limit set too low (5 req/sec)
- No failover or rotation

## Solution Implemented

### 1. Multiple Diverse RPC Providers (7 Total)

Added 6 free/public Arbitrum RPC endpoints alongside the original one:

| Provider | Type | Endpoint | Rate Limit | Priority |
|----------|------|----------|------------|----------|
| Arbitrum Public HTTP | HTTP | https://arb1.arbitrum.io/rpc | 10 req/s | 10 |
| Arbitrum Public WS | WSS | wss://arb1.arbitrum.io/ws | 15 req/s | 10 |
| Chainlist RPC 1 | HTTP/WSS | arbitrum-one.publicnode.com | 12 req/s | 9 |
| Chainlist RPC 2 | HTTP/WSS | rpc.ankr.com/arbitrum | 12 req/s | 8 |
| Chainlist RPC 3 | HTTP/WSS | arbitrum.blockpi.network | 10 req/s | 7 |
| LlamaNodes | HTTP/WSS | arbitrum.llamarpc.com | 10 req/s | 6 |
| Alchemy Free | HTTP/WSS | arb-mainnet.g.alchemy.com | 15 req/s | 5 |

**Total Capacity**: ~84 requests/second across all providers (the sum of the configured per-provider limits)

### 2. Round-Robin Rotation Strategy

Changed from `priority_based` to `round_robin` distribution:

```yaml
rotation:
  strategy: round_robin          # Distribute load evenly
  fallback_enabled: true
  health_check_required: true
  retry_failed_after: 2m
  auto_rotate_interval: 30s      # Rotate every 30 seconds
  failover_on_rate_limit: true   # Immediate switch on 429
```

### 3. Intelligent Failover

**Circuit Breaker Configuration**:

```yaml
circuit_breaker:
  enabled: true
  failure_threshold: 5   # Switch after 5 failures
  success_threshold: 2   # Re-enable after 2 successes
  timeout: 60s
  half_open_requests: 3
```

**Retry Logic**:

- Max attempts: 5 (across different providers)
- Exponential backoff: 500ms → 1s → 2s → 4s → 5s (doubling, capped at 5s)
- Jitter enabled to prevent thundering herd

### 4. Provider Pools

**Execution Pool** (for transactions):

- 4 providers with highest reliability
- Strategy: round_robin
- Max connections: 15

**Read-Only Pool** (for DataFetcher):

- 6 providers for maximum capacity
- Strategy: round_robin
- Max connections: 20

### 5. Enhanced Rate Limits

Increased per-provider limits:

- HTTP endpoints: 10-12 req/s (was 5)
- WSS endpoints: 15 req/s (was 5)
- Burst capacity: 25-40 (was 10)
- Timeout: 45-60s (was 30s)

## Expected Impact

### Before (Single Provider)

```
Provider: Arbitrum Public HTTP only
Rate Limit: 5 req/s
429 Errors: 878 in 4 minutes
Success Rate: ~55% (44.79% rate limit failures)
```

### After (Multi-Provider)

```
Providers: 7 diverse endpoints
Total Capacity: ~84 req/s (16.8x increase)
Expected 429 Errors: <10 per hour (99% reduction)
Expected Success Rate: >98%
```

## Configuration Files

### Primary Config

- **File**: `config/providers_runtime.yaml`
- **Backup**: `config/providers_runtime.yaml.backup_single_rpc_*`

### Key Changes

1. Added 6 new providers
2. Increased rate limits by 2-3x per provider
3. Changed strategy from `priority_based` to `round_robin`
4. Added circuit breaker and retry configuration
5. Enabled automatic rotation every 30s

## Testing Results

### Build Status

- ✅ Build successful with new configuration

### Startup Test

- ✅ Bot started successfully
- ✅ All providers initialized
- ✅ Round-robin rotation active

### Next Steps

1. Run 2-hour test to verify 429 error reduction
2. Monitor provider health and rotation
3. Fine-tune rate limits based on actual usage
4. Consider upgrading to a paid tier if free limits are still exceeded

## Provider Selection Criteria

All providers were selected based on:

1. **Reliability**: Listed on Chainlist or official docs
2. **Free Tier**: No API key required for basic use
3. **Rate Limits**: Reasonable limits for testing
4. **Geographic Diversity**: Different infrastructure providers
5. **Both HTTP & WSS**: Support for both protocols

## Fallback Strategy

If all free providers hit rate limits:

1. Circuit breaker will open
2. Retry with exponential backoff
3. Wait 60s before retry
4. Log alert for manual intervention

**Recommended**: Upgrade to a paid tier ($100-300/month) under sustained high load

## Monitoring

Monitor these metrics:

```bash
# Check provider health
tail -f logs/mev-bot.log | grep -i "provider\|rotation\|429"

# Count rate limit errors
grep -c "429 Too Many Requests" logs/mev-bot_errors.log

# Check which providers are being used
grep "Using provider" logs/mev-bot.log | tail -20
```

## Cost Analysis

### Current (Free Tier)

- Cost: $0/month
- Capacity: ~84 req/s total
- Limitations: Subject to rate limits

### Recommended (Paid Tier)

- Provider: Alchemy/Infura/QuickNode
- Cost: $100-300/month
- Capacity: 300-1000+ req/s
- Benefits:
  - Dedicated capacity
  - Higher reliability
  - Better SLAs
  - Archive node access

## Implementation Notes

### Code Changes Required

None - the existing provider manager already supports:

- ✅ Round-robin rotation
- ✅ Health checks
- ✅ Failover
- ✅ Circuit breaker
- ✅ Retry logic

### Configuration Changes Only

All improvements were achieved through YAML configuration updates alone.

## Rollback Procedure

If issues occur:

```bash
# Restore the most recent single-RPC backup
# (a bare glob makes cp fail when more than one backup file matches)
latest_backup=$(ls -t config/providers_runtime.yaml.backup_single_rpc_* | head -n 1)
cp "$latest_backup" config/providers_runtime.yaml

# Rebuild
./scripts/build.sh

# Restart bot
./bin/mev-beta start
```

## Success Metrics

Track these KPIs:

1. **Rate Limit Errors**: Should drop from 878/4min to <10/hour
2. **Pool Fetch Success Rate**: Should increase from 55% to >98%
3. **Provider Health**: All 7 providers should maintain >90% uptime
4. **Rotation**: Should rotate between providers every 30s
5. **Response Time**: Should stay below a 150ms average (was ~105ms)

## Documentation References

- Provider Manager: `pkg/transport/provider_manager.go`
- Failover Logic: `pkg/transport/failover.go`
- Provider Pools: `pkg/transport/provider_pools.go`
- Rate Limiting: `pkg/arbitrum/rate_limited_rpc.go`

---

## Appendix: Provider Details

### Free Public Arbitrum RPC Endpoints

1. **Official Arbitrum**
   - HTTP: https://arb1.arbitrum.io/rpc
   - WSS: wss://arb1.arbitrum.io/ws
   - Limit: ~10 req/s
   - Reliability: High (official)

2. **PublicNode (Chainlist)**
   - HTTP/WSS: arbitrum-one.publicnode.com
   - Limit: ~12 req/s
   - Reliability: High
   - Features: Both HTTP and WSS

3. **Ankr (Chainlist)**
   - HTTP/WSS: rpc.ankr.com/arbitrum
   - Limit: ~12 req/s
   - Reliability: High
   - Features: Professional infrastructure

4. **BlockPI (Chainlist)**
   - HTTP/WSS: arbitrum.blockpi.network
   - Limit: ~10 req/s
   - Reliability: Medium-High
   - Features: Public access

5. **LlamaNodes**
   - HTTP/WSS: arbitrum.llamarpc.com
   - Limit: ~10 req/s
   - Reliability: Medium
   - Features: Community-maintained

6. **Alchemy Free Tier**
   - HTTP/WSS: arb-mainnet.g.alchemy.com/v2/demo
   - Limit: ~15 req/s
   - Reliability: High
   - Features: Demo key (upgrade available)

### Recommended Paid Providers

For production use:

1. **Alchemy** ($49-299/month)
   - 300M+ compute units
   - Archive node access
   - Enhanced APIs

2. **Infura** ($50-225/month)
   - 100K-1M+ requests/day
   - Reliable infrastructure
   - Good documentation

3. **QuickNode** ($49-299/month)
   - Dedicated nodes
   - Global coverage
   - Premium support

---

**Status**: Ready for production testing with 7-provider rotation
**Expected Result**: 99% reduction in rate limit errors
**Recommendation**: Monitor for 24-48 hours, then decide on paid upgrade
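The retry schedule listed under Retry Logic (500ms → 1s → 2s → 4s → 5s over 5 attempts, with jitter) is consistent with a doubling delay capped at 5s. The following is a minimal Python sketch of that arithmetic, not the bot's actual implementation (which lives in the Go files listed under Documentation References). The function name `backoff_delays`, its parameters, and the ±fraction jitter model are all assumptions made for illustration.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=5.0, jitter=0.0, rng=random):
    """Return the delay in seconds before each retry.

    Hypothetical helper: delay n is base * 2**n, capped at `cap`.
    `jitter` is an assumed +/- fraction applied after the cap, so a
    jittered delay can exceed `cap` by at most that fraction.
    """
    delays = []
    for attempt in range(max_attempts):
        delay = min(base * (2 ** attempt), cap)
        if jitter:
            # Randomize each delay so clients do not retry in lockstep.
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 5.0]
```

Without the cap the fifth delay would be 8s; the 5s ceiling keeps the worst-case total wait for one request at 12.5s before jitter.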
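To illustrate how round-robin rotation and the circuit breaker settings (`failure_threshold: 5`, `timeout: 60s`) might interact, here is a simplified Python sketch. It is an illustration under stated assumptions rather than the provider manager's real logic: the class `RoundRobinPool` and its methods are hypothetical, time is passed in explicitly as `now` (seconds) to keep the example deterministic, and the `success_threshold` and `half_open_requests` settings are deliberately left out.

```python
class RoundRobinPool:
    """Hypothetical round-robin selector that skips tripped providers."""

    def __init__(self, providers, failure_threshold=5, timeout=60.0):
        self.providers = list(providers)
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = {p: 0 for p in self.providers}
        self.opened_at = {}  # provider -> time its breaker opened
        self._next = 0

    def _available(self, provider, now):
        opened = self.opened_at.get(provider)
        # A tripped provider becomes eligible again once the timeout elapses.
        return opened is None or now - opened >= self.timeout

    def pick(self, now):
        """Return the next eligible provider, or None if all are tripped."""
        for _ in range(len(self.providers)):
            provider = self.providers[self._next]
            self._next = (self._next + 1) % len(self.providers)
            if self._available(provider, now):
                return provider
        return None

    def record_failure(self, provider, now):
        self.failures[provider] += 1
        if self.failures[provider] >= self.failure_threshold:
            self.opened_at[provider] = now  # open (or re-open) the breaker

    def record_success(self, provider):
        self.failures[provider] = 0
        self.opened_at.pop(provider, None)
```

A short usage example: with providers `a`, `b`, `c`, five recorded failures on `b` take it out of rotation for 60 seconds while `a` and `c` keep alternating.

```python
pool = RoundRobinPool(["a", "b", "c"])
for _ in range(5):
    pool.record_failure("b", now=0.0)
print([pool.pick(now=1.0) for _ in range(4)])  # ['a', 'c', 'a', 'c']
```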