Multi-Provider RPC Configuration - Rate Limit Solution

Date: October 31, 2025, 19:00 CDT
Status: IMPLEMENTED - Multi-RPC rotation with failover

Problem Statement

The MEV bot was experiencing critical rate limiting issues:

  • 878 rate limit errors in 4 minutes (~220/minute)
  • 44.79% of pool fetch failures due to 429 errors
  • Using single free RPC endpoint (https://arb1.arbitrum.io/rpc)
  • Rate limit set too low (5 req/sec)
  • No failover or rotation

Solution Implemented

1. Multiple Diverse RPC Providers (7 Total)

Added 6 additional free/public Arbitrum RPC endpoints:

| Provider | Type | Endpoint | Rate Limit | Priority |
|----------|------|----------|------------|----------|
| Arbitrum Public HTTP | HTTP | https://arb1.arbitrum.io/rpc | 10 req/s | 10 |
| Arbitrum Public WS | WSS | wss://arb1.arbitrum.io/ws | 15 req/s | 10 |
| Chainlist RPC 1 | HTTP/WSS | arbitrum-one.publicnode.com | 12 req/s | 9 |
| Chainlist RPC 2 | HTTP/WSS | rpc.ankr.com/arbitrum | 12 req/s | 8 |
| Chainlist RPC 3 | HTTP/WSS | arbitrum.blockpi.network | 10 req/s | 7 |
| LlamaNodes | HTTP/WSS | arbitrum.llamarpc.com | 10 req/s | 6 |
| Alchemy Free | HTTP/WSS | arb-mainnet.g.alchemy.com | 15 req/s | 5 |

Total Capacity: ~84 requests/second across all providers

2. Round-Robin Rotation Strategy

Changed from priority_based to round_robin distribution:

rotation:
  strategy: round_robin  # Distribute load evenly
  fallback_enabled: true
  health_check_required: true
  retry_failed_after: 2m
  auto_rotate_interval: 30s  # Rotate every 30 seconds
  failover_on_rate_limit: true  # Immediate switch on 429
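
To make the strategy concrete, here is a minimal Go sketch of round-robin selection over healthy providers. The Provider and selector types are hypothetical, not the actual API of pkg/transport/provider_manager.go:

```go
package rotation

import (
	"errors"
	"sync"
)

// Provider is an illustrative stand-in for a configured RPC endpoint.
type Provider struct {
	Name    string
	URL     string
	Healthy bool // flipped false when the circuit breaker opens
}

type selector struct {
	mu        sync.Mutex
	providers []*Provider
	next      int
}

// Next walks the ring once, returning the first healthy provider and
// advancing the cursor so load spreads evenly across all endpoints.
func (s *selector) Next() (*Provider, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for i := 0; i < len(s.providers); i++ {
		p := s.providers[s.next]
		s.next = (s.next + 1) % len(s.providers)
		if p.Healthy {
			return p, nil
		}
	}
	return nil, errors.New("no healthy providers available")
}
```

On a 429, the caller marks the current provider unhealthy and simply calls Next() again, which is effectively what failover_on_rate_limit: true asks for.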

3. Intelligent Failover

Circuit Breaker Configuration:

circuit_breaker:
  enabled: true
  failure_threshold: 5  # Switch after 5 failures
  success_threshold: 2  # Re-enable after 2 successes
  timeout: 60s
  half_open_requests: 3
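
These settings describe the standard closed/open/half-open state machine. A simplified Go sketch, with hypothetical types and without the half_open_requests cap (the real logic lives in pkg/transport/failover.go):

```go
package breaker

import (
	"sync"
	"time"
)

type state int

const (
	closed   state = iota // normal operation
	open                  // provider disabled, requests rejected
	halfOpen              // probing after the timeout
)

// circuitBreaker mirrors the YAML above: open after 5 consecutive
// failures, close again after 2 half-open successes, probe after 60s.
type circuitBreaker struct {
	mu               sync.Mutex
	st               state
	failures         int
	successes        int
	openedAt         time.Time
	failureThreshold int           // failure_threshold: 5
	successThreshold int           // success_threshold: 2
	timeout          time.Duration // timeout: 60s
}

// Allow reports whether a request may be sent to this provider.
func (cb *circuitBreaker) Allow() bool {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if cb.st == open && time.Since(cb.openedAt) >= cb.timeout {
		cb.st = halfOpen // timeout elapsed: let probes through
	}
	return cb.st != open
}

// Record updates the breaker with the outcome of a request.
func (cb *circuitBreaker) Record(success bool) {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	if success {
		cb.failures = 0
		if cb.st == halfOpen {
			if cb.successes++; cb.successes >= cb.successThreshold {
				cb.st, cb.successes = closed, 0
			}
		}
		return
	}
	cb.failures++
	cb.successes = 0
	if cb.failures >= cb.failureThreshold || cb.st == halfOpen {
		cb.st = open
		cb.openedAt = time.Now()
		cb.failures = 0
	}
}
```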

Retry Logic:

  • Max attempts: 5 (across different providers)
  • Exponential backoff: 500ms → 1s → 2s → 4s → 5s
  • Jitter enabled to prevent thundering herd
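
That schedule is ordinary capped exponential backoff. A compact Go sketch using a hypothetical helper, not the bot's actual retry code:

```go
package retry

import (
	"math/rand"
	"time"
)

// backoff returns the wait before retry attempt n (0-indexed):
// 500ms, 1s, 2s, 4s, then capped at 5s, plus up to 20% random
// jitter so concurrent retries do not stampede one provider.
func backoff(attempt int) time.Duration {
	d := 500 * time.Millisecond << attempt // 500ms, 1s, 2s, 4s, 8s... (max attempts: 5)
	if d > 5*time.Second {
		d = 5 * time.Second // cap matches the schedule above
	}
	return d + time.Duration(rand.Int63n(int64(d)/5)) // jitter
}
```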

4. Provider Pools

Execution Pool (for transactions):

  • 4 providers with highest reliability
  • Strategy: round_robin
  • Max connections: 15

Read-Only Pool (for DataFetcher):

  • 6 providers for maximum capacity
  • Strategy: round_robin
  • Max connections: 20
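
A rough Go sketch of that split, with hypothetical names (the real pool logic is in pkg/transport/provider_pools.go):

```go
package pools

// pool groups RPC endpoints for one workload; illustrative only.
type pool struct {
	name           string
	endpoints      []string
	maxConnections int
}

var (
	executionPool = pool{name: "execution", maxConnections: 15}
	readOnlyPool  = pool{name: "read_only", maxConnections: 20}
)

// poolFor routes transaction traffic to the smaller high-reliability
// pool and DataFetcher reads to the larger high-throughput pool.
func poolFor(isTransaction bool) *pool {
	if isTransaction {
		return &executionPool
	}
	return &readOnlyPool
}
```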

5. Enhanced Rate Limits

Increased per-provider limits:

  • HTTP endpoints: 10-12 req/s (was 5)
  • WSS endpoints: 15 req/s (was 5)
  • Burst capacity: 25-40 (was 10)
  • Timeout: 45-60s (was 30s)
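
These per-provider limits map naturally onto a token bucket. A sketch using golang.org/x/time/rate with hypothetical helper names (the bot's actual limiter is pkg/arbitrum/rate_limited_rpc.go and may differ):

```go
package ratelimit

import (
	"context"

	"golang.org/x/time/rate"
)

// NewProviderLimiter builds a token bucket for one endpoint, e.g.
// NewProviderLimiter(12, 25) for a 12 req/s provider with burst 25.
func NewProviderLimiter(reqPerSec float64, burst int) *rate.Limiter {
	return rate.NewLimiter(rate.Limit(reqPerSec), burst)
}

// CallWithLimit blocks until a token is free (or ctx expires), so a
// caller can never exceed the provider's advertised rate.
func CallWithLimit(ctx context.Context, lim *rate.Limiter, call func() error) error {
	if err := lim.Wait(ctx); err != nil {
		return err
	}
	return call()
}
```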

Expected Impact

Before (Single Provider)

Provider: Arbitrum Public HTTP only
Rate Limit: 5 req/s
429 Errors: 878 in 4 minutes
Success Rate: ~55% (44.79% rate limit failures)

After (Multi-Provider)

Providers: 7 diverse endpoints
Total Capacity: ~84 req/s (16.8x increase)
Expected 429 Errors: <10 per hour (99% reduction)
Expected Success Rate: >98%

Configuration Files

Primary Config

  • File: config/providers_runtime.yaml
  • Backup: config/providers_runtime.yaml.backup_single_rpc_*

Key Changes

  1. Added 6 new providers
  2. Increased rate limits by 2-3x per provider
  3. Changed strategy from priority_based → round_robin
  4. Added circuit breaker and retry configuration
  5. Enabled automatic rotation every 30s

Testing Results

Build Status

  • Build successful with new configuration

Startup Test

  • Bot started successfully
  • All providers initialized
  • Round-robin rotation active

Next Steps

  1. Run 2-hour test to verify 429 error reduction
  2. Monitor provider health and rotation
  3. Fine-tune rate limits based on actual usage
  4. Consider upgrading to paid tier if free limits still exceeded

Provider Selection Criteria

All providers selected based on:

  1. Reliability: Listed on Chainlist or official docs
  2. Free Tier: No API key required for basic use
  3. Rate Limits: Reasonable limits for testing
  4. Geographic Diversity: Different infrastructure providers
  5. Both HTTP & WSS: Support for both protocols

Fallback Strategy

If all free providers hit rate limits:

  1. Circuit breaker will open
  2. Retry with exponential backoff
  3. Wait 60s before retry
  4. Log alert for manual intervention

Recommended: Upgrade to a paid tier ($100-300/month) if sustained high load keeps saturating the free providers

Monitoring

Monitor these metrics:

# Check provider health
tail -f logs/mev-bot.log | grep -i "provider\|rotation\|429"

# Count rate limit errors
grep -c "429 Too Many Requests" logs/mev-bot_errors.log

# Check which providers are being used
grep "Using provider" logs/mev-bot.log | tail -20

Cost Analysis

Current (Free Tier)

  • Cost: $0/month
  • Capacity: ~84 req/s total
  • Limitations: Subject to rate limits

Paid Tier (Recommended Upgrade)

  • Provider: Alchemy/Infura/QuickNode
  • Cost: $100-300/month
  • Capacity: 300-1000+ req/s
  • Benefits:
    • Dedicated capacity
    • Higher reliability
    • Better SLAs
    • Archive node access

Implementation Notes

Code Changes Required

None - existing provider manager already supports:

  • Round-robin rotation
  • Health checks
  • Failover
  • Circuit breaker
  • Retry logic

Configuration Changes Only

All improvements achieved through YAML configuration updates only.

Rollback Procedure

If issues occur:

# Restore previous config
cp config/providers_runtime.yaml.backup_single_rpc_* config/providers_runtime.yaml

# Rebuild
./scripts/build.sh

# Restart bot
./bin/mev-beta start

Success Metrics

Track these KPIs:

  1. Rate Limit Errors: Should drop from 878/4min to <10/hour
  2. Pool Fetch Success Rate: Should increase from 55% to >98%
  3. Provider Health: All 7 providers should maintain >90% uptime
  4. Rotation: Should rotate between providers every 30s
  5. Response Time: May rise from ~105ms as load spreads across slower public endpoints, but should average <150ms

Documentation References

  • Provider Manager: pkg/transport/provider_manager.go
  • Failover Logic: pkg/transport/failover.go
  • Provider Pools: pkg/transport/provider_pools.go
  • Rate Limiting: pkg/arbitrum/rate_limited_rpc.go

Appendix: Provider Details

Free Public Arbitrum RPC Endpoints

  1. Official Arbitrum

    • HTTP: https://arb1.arbitrum.io/rpc
    • WSS: wss://arb1.arbitrum.io/ws
    • Limit: ~10 req/s (HTTP), ~15 req/s (WSS)
    • Features: Official endpoint with both HTTP and WSS
  2. PublicNode (Chainlist)

    • HTTP/WSS: arbitrum-one.publicnode.com
    • Limit: ~12 req/s
    • Reliability: High
    • Features: Both HTTP and WSS
  3. Ankr (Chainlist)

    • HTTP/WSS: rpc.ankr.com/arbitrum
    • Limit: ~12 req/s
    • Reliability: High
    • Features: Professional infrastructure
  4. BlockPI (Chainlist)

    • HTTP/WSS: arbitrum.blockpi.network
    • Limit: ~10 req/s
    • Reliability: Medium-High
    • Features: Public access
  5. LlamaNodes

    • HTTP/WSS: arbitrum.llamarpc.com
    • Limit: ~10 req/s
    • Reliability: Medium
    • Features: Community-maintained
  6. Alchemy Free Tier

    • HTTP/WSS: arb-mainnet.g.alchemy.com/v2/demo
    • Limit: ~15 req/s
    • Reliability: High
    • Features: Demo key (upgrade available)

For production use:

  1. Alchemy ($49-299/month)

    • 300M+ compute units
    • Archive node access
    • Enhanced APIs
  2. Infura ($50-225/month)

    • 100K-1M+ requests/day
    • Reliable infrastructure
    • Good documentation
  3. QuickNode ($49-299/month)

    • Dedicated nodes
    • Global coverage
    • Premium support

Status: Ready for production testing with 7-provider rotation
Expected Result: 99% reduction in rate limit errors
Recommendation: Monitor for 24-48 hours, then decide on paid upgrade