# Multi-Provider RPC Configuration - Rate Limit Solution

**Date**: October 31, 2025 19:00 CDT

**Status**: ✅ IMPLEMENTED - Multi-RPC rotation with failover

## Problem Statement

The MEV bot was experiencing critical rate limiting issues:

- **878 rate limit errors in 4 minutes** (~220/minute)
- 44.79% of pool fetches failed with 429 (Too Many Requests) errors
- Using a single free RPC endpoint (https://arb1.arbitrum.io/rpc)
- Rate limit set too low (5 req/sec)
- No failover or rotation

## Solution Implemented

### 1. Multiple Diverse RPC Providers (7 Total)

Added 6 additional free/public Arbitrum RPC endpoints:

| Provider | Type | Endpoint | Rate Limit | Priority |
|----------|------|----------|------------|----------|
| Arbitrum Public HTTP | HTTP | https://arb1.arbitrum.io/rpc | 10 req/s | 10 |
| Arbitrum Public WS | WSS | wss://arb1.arbitrum.io/ws | 15 req/s | 10 |
| Chainlist RPC 1 (PublicNode) | HTTP/WSS | arbitrum-one.publicnode.com | 12 req/s | 9 |
| Chainlist RPC 2 (Ankr) | HTTP/WSS | rpc.ankr.com/arbitrum | 12 req/s | 8 |
| Chainlist RPC 3 (BlockPI) | HTTP/WSS | arbitrum.blockpi.network | 10 req/s | 7 |
| LlamaNodes | HTTP/WSS | arbitrum.llamarpc.com | 10 req/s | 6 |
| Alchemy Free | HTTP/WSS | arb-mainnet.g.alchemy.com | 15 req/s | 5 |

**Total Capacity**: ~84 requests/second across all providers (the sum of the per-provider limits above)
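
The ~84 req/s total is the sum of the seven per-provider limits in the table, and the 16.8x figure used later compares it with the old 5 req/s limit. A small Go sketch of that arithmetic, assuming a hypothetical `provider` struct (not the bot's real config type), can double as a sanity check whenever the table changes:

```go
package main

import "fmt"

// provider mirrors one row of the provider table.
// Hypothetical type for illustration; not the bot's real config struct.
type provider struct {
	name      string
	rateLimit float64 // requests per second
}

// totalCapacity sums the per-provider rate limits.
func totalCapacity(ps []provider) float64 {
	total := 0.0
	for _, p := range ps {
		total += p.rateLimit
	}
	return total
}

var providers = []provider{
	{"Arbitrum Public HTTP", 10},
	{"Arbitrum Public WS", 15},
	{"Chainlist RPC 1", 12},
	{"Chainlist RPC 2", 12},
	{"Chainlist RPC 3", 10},
	{"LlamaNodes", 10},
	{"Alchemy Free", 15},
}

func main() {
	total := totalCapacity(providers)
	// Compare against the old single-provider limit of 5 req/s.
	fmt.Printf("total capacity: %.0f req/s (%.1fx the old 5 req/s limit)\n", total, total/5)
}
```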

### 2. Round-Robin Rotation Strategy

Changed from `priority_based` to `round_robin` distribution:

```yaml
rotation:
  strategy: round_robin          # Distribute load evenly
  fallback_enabled: true
  health_check_required: true
  retry_failed_after: 2m
  auto_rotate_interval: 30s      # Rotate every 30 seconds
  failover_on_rate_limit: true   # Immediate switch on 429
```

### 3. Intelligent Failover

**Circuit Breaker Configuration**:

```yaml
circuit_breaker:
  enabled: true
  failure_threshold: 5    # Switch after 5 failures
  success_threshold: 2    # Re-enable after 2 successes
  timeout: 60s
  half_open_requests: 3
```

**Retry Logic**:

- Max attempts: 5 (across different providers)
- Exponential backoff: 500ms → 1s → 2s → 4s → 5s (doubling, capped at 5s)
- Jitter enabled to prevent a thundering herd

### 4. Provider Pools

**Execution Pool** (for transactions):

- 4 providers with highest reliability
- Strategy: round_robin
- Max connections: 15

**Read-Only Pool** (for DataFetcher):

- 6 providers for maximum capacity
- Strategy: round_robin
- Max connections: 20
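
One way to picture the two pools is as separate round-robin provider lists, each guarded by a counting semaphore sized to its max-connections value. The Go sketch below assumes that reading; the `pool` type and the `exec-N` / `read-N` endpoint names are placeholders, not taken from `pkg/transport/provider_pools.go`, and the sketch does not say which real providers belong to which pool.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical provider pool: a round-robin list of endpoints
// plus a buffered channel used as a semaphore to cap concurrent requests.
type pool struct {
	mu        sync.Mutex
	endpoints []string
	next      int
	slots     chan struct{}
}

func newPool(endpoints []string, maxConns int) *pool {
	return &pool{endpoints: endpoints, slots: make(chan struct{}, maxConns)}
}

// acquire blocks until a connection slot is free, then returns the next
// endpoint in round-robin order. Call release when the request finishes.
func (p *pool) acquire() string {
	p.slots <- struct{}{}
	p.mu.Lock()
	defer p.mu.Unlock()
	ep := p.endpoints[p.next]
	p.next = (p.next + 1) % len(p.endpoints)
	return ep
}

// release frees the slot taken by acquire.
func (p *pool) release() { <-p.slots }

func main() {
	// Sizes follow the text: 4 providers / 15 connections for execution,
	// 6 providers / 20 connections for read-only. Names are placeholders.
	execution := newPool([]string{"exec-1", "exec-2", "exec-3", "exec-4"}, 15)
	readOnly := newPool([]string{"read-1", "read-2", "read-3", "read-4", "read-5", "read-6"}, 20)

	ep := readOnly.acquire()
	fmt.Println("DataFetcher call via", ep)
	readOnly.release()

	ep = execution.acquire()
	fmt.Println("transaction via", ep)
	execution.release()
}
```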

### 5. Enhanced Rate Limits

Increased per-provider limits:

- HTTP endpoints: 10-12 req/s (was 5)
- WSS endpoints: 15 req/s (was 5)
- Burst capacity: 25-40 (was 10)
- Timeout: 45-60s (was 30s)

## Expected Impact

### Before (Single Provider)

```
Provider: Arbitrum Public HTTP only
Rate Limit: 5 req/s
429 Errors: 878 in 4 minutes
Success Rate: ~55% (44.79% rate limit failures)
```

### After (Multi-Provider)

```
Providers: 7 diverse endpoints
Total Capacity: ~84 req/s (16.8x the previous 5 req/s)
Expected 429 Errors: <10 per hour (>99% reduction)
Expected Success Rate: >98%
```
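
The reduction target can be checked with simple arithmetic: 878 errors in 4 minutes is 219.5 per minute, or about 13,170 per hour, so staying under 10 per hour is a reduction of more than 99.9%. A Go sketch of that calculation (the `reductionPct` helper is a hypothetical name):

```go
package main

import "fmt"

// reductionPct returns the percentage drop from before to after.
func reductionPct(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	const errorsObserved = 878.0 // 429s in the 4-minute sample
	const sampleMinutes = 4.0

	perMinute := errorsObserved / sampleMinutes // 219.5
	perHour := perMinute * 60                   // 13170
	fmt.Printf("before: %.1f/min, %.0f/hour\n", perMinute, perHour)
	fmt.Printf("target <10/hour => %.2f%% reduction\n", reductionPct(perHour, 10))
}
```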

## Configuration Files

### Primary Config

- **File**: `config/providers_runtime.yaml`
- **Backup**: `config/providers_runtime.yaml.backup_single_rpc_*`

### Key Changes

1. Added 6 new providers
2. Increased rate limits by 2-3x per provider
3. Changed strategy from `priority_based` → `round_robin`
4. Added circuit breaker and retry configuration
5. Enabled automatic rotation every 30s

## Testing Results

### Build Status

✅ Build successful with new configuration

### Startup Test

✅ Bot started successfully

✅ All providers initialized

✅ Round-robin rotation active

### Next Steps

1. Run a 2-hour test to verify the 429 error reduction
2. Monitor provider health and rotation
3. Fine-tune rate limits based on actual usage
4. Consider upgrading to a paid tier if free limits are still exceeded

## Provider Selection Criteria

All providers were selected based on:

1. **Reliability**: Listed on Chainlist or official docs
2. **Free Tier**: No API key required for basic use
3. **Rate Limits**: Reasonable limits for testing
4. **Geographic Diversity**: Different infrastructure providers
5. **Both HTTP & WSS**: Support for both protocols

## Fallback Strategy

If all free providers hit rate limits:

1. Circuit breaker will open
2. Retry with exponential backoff
3. Wait 60s before retry
4. Log alert for manual intervention

**Recommended**: Upgrade to a paid tier ($100-300/month) if high load is sustained

## Monitoring

Monitor these metrics:

```bash
# Follow provider, rotation, and 429 log lines live
tail -f logs/mev-bot.log | grep -iE "provider|rotation|429"

# Count rate limit errors
grep -c "429 Too Many Requests" logs/mev-bot_errors.log

# Check which providers are being used
grep "Using provider" logs/mev-bot.log | tail -20
```

## Cost Analysis

### Current (Free Tier)

- Cost: $0/month
- Capacity: ~84 req/s total
- Limitations: Subject to rate limits

### Recommended (Paid Tier)

- Provider: Alchemy/Infura/QuickNode
- Cost: $100-300/month
- Capacity: 300-1000+ req/s
- Benefits:
  - Dedicated capacity
  - Higher reliability
  - Better SLAs
  - Archive node access

## Implementation Notes

### Code Changes Required

None - the existing provider manager already supports:

- ✅ Round-robin rotation
- ✅ Health checks
- ✅ Failover
- ✅ Circuit breaker
- ✅ Retry logic

### Configuration Changes Only

All improvements were achieved through YAML configuration updates alone.

## Rollback Procedure

If issues occur:

```bash
# Restore previous config (use the newest backup in case several match the glob;
# cp fails if given multiple sources and a non-directory target)
cp "$(ls -t config/providers_runtime.yaml.backup_single_rpc_* | head -n 1)" config/providers_runtime.yaml

# Rebuild
./scripts/build.sh

# Restart bot
./bin/mev-beta start
```

## Success Metrics

Track these KPIs:

1. **Rate Limit Errors**: Should drop from 878/4min to <10/hour
2. **Pool Fetch Success Rate**: Should increase from ~55% to >98%
3. **Provider Health**: All 7 providers should maintain >90% uptime
4. **Rotation**: Should rotate between providers every 30s
5. **Response Time**: Should stay under a 150ms average (single-provider baseline: ~105ms)

## Documentation References

- Provider Manager: `pkg/transport/provider_manager.go`
- Failover Logic: `pkg/transport/failover.go`
- Provider Pools: `pkg/transport/provider_pools.go`
- Rate Limiting: `pkg/arbitrum/rate_limited_rpc.go`

---

## Appendix: Provider Details

### Free Public Arbitrum RPC Endpoints

1. **Official Arbitrum**
   - HTTP: https://arb1.arbitrum.io/rpc
   - WSS: wss://arb1.arbitrum.io/ws
   - Limit: ~10 req/s
   - Reliability: High (official)

2. **PublicNode (Chainlist)**
   - HTTP/WSS: arbitrum-one.publicnode.com
   - Limit: ~12 req/s
   - Reliability: High
   - Features: Both HTTP and WSS

3. **Ankr (Chainlist)**
   - HTTP/WSS: rpc.ankr.com/arbitrum
   - Limit: ~12 req/s
   - Reliability: High
   - Features: Professional infrastructure

4. **BlockPI (Chainlist)**
   - HTTP/WSS: arbitrum.blockpi.network
   - Limit: ~10 req/s
   - Reliability: Medium-High
   - Features: Public access

5. **LlamaNodes**
   - HTTP/WSS: arbitrum.llamarpc.com
   - Limit: ~10 req/s
   - Reliability: Medium
   - Features: Community-maintained

6. **Alchemy Free Tier**
   - HTTP/WSS: arb-mainnet.g.alchemy.com/v2/demo
   - Limit: ~15 req/s
   - Reliability: High
   - Features: Demo key (upgrade available)

### Recommended Paid Providers

For production use:

1. **Alchemy** ($49-299/month)
   - 300M+ compute units
   - Archive node access
   - Enhanced APIs

2. **Infura** ($50-225/month)
   - 100K-1M+ requests/day
   - Reliable infrastructure
   - Good documentation

3. **QuickNode** ($49-299/month)
   - Dedicated nodes
   - Global coverage
   - Premium support

---

**Status**: Ready for production testing with 7-provider rotation

**Expected Result**: >99% reduction in rate limit errors

**Recommendation**: Monitor for 24-48 hours, then decide on a paid upgrade