mev-beta/logs/RATE_LIMIT_ANALYSIS_20251109.md
# Rate Limiting Analysis & Recommendations - November 9, 2025
## Summary
- **Configuration Changes Applied:** ✅ Successfully reduced rate limits
- **Error Rate Impact:** 🟡 Minimal improvement (5% reduction)
- **Root Cause:** Bot design incompatible with public RPC endpoints
- **Recommended Solution:** Use premium RPC endpoint or drastically reduce bot scope
## Comparison
### Before Rate Limit Fix
- **Container:** mev-bot-dev-master-dev (first instance)
- **Runtime:** 7 minutes
- **429 Errors:** 2,354 total
- **Error Rate:** 5.60 errors/second
- **Config:** 5 req/sec, 3 concurrent, burst 10
### After Rate Limit Fix
- **Container:** mev-bot-dev-master-dev (rebuilt)
- **Runtime:** 21 minutes
- **429 Errors:** 6,717 total
- **Error Rate:** 5.33 errors/second (-4.8%)
- **Config:** 2 req/sec, 1 concurrent, burst 3

**Improvement:** About a 5% reduction in error rate, which is still unacceptably high
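
The error rates above follow directly from the 429 counts and runtimes. A minimal sketch of that arithmetic (the `error_rate` helper is a name chosen for this example, not part of the bot):

```python
def error_rate(errors: int, runtime_minutes: float) -> float:
    """Average number of 429 errors per second over a run."""
    return errors / (runtime_minutes * 60)


before = error_rate(2354, 7)   # first instance: 5 req/sec, 3 concurrent, burst 10
after = error_rate(6717, 21)   # rebuilt container: 2 req/sec, 1 concurrent, burst 3
change_pct = (after - before) / before * 100

print(f"before: {before:.2f}/s, after: {after:.2f}/s, change: {change_pct:+.1f}%")
# prints "before: 5.60/s, after: 5.33/s, change: -4.9%"
```

With unrounded rates the change comes out near -4.9%; rounding the rates to two decimals first gives about -4.8%. Either way the reduction is roughly 5%.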
## Root Cause Analysis
### Bot's Request Pattern
The bot generates massive RPC request volume:
1. **Block Processing:** ~4-8 blocks/minute
   - Get block data
   - Get all transactions
   - Get transaction receipts
   - Parse events
   - **Estimate:** ~20-40 requests/minute
2. **Pool Discovery:** Per swap event detected
   - Query Uniswap V3 registry
   - Query Uniswap V2 factory
   - Query SushiSwap factory
   - Query Camelot V3 factory
   - Query 4 Curve registries
   - **Estimate:** ~8-12 requests per swap event
3. **Arbitrage Scanning:** Every few seconds
   - Creates 270 scan tasks for 45 token pairs
   - Each task queries multiple pools
   - Batch fetches pool state data
   - **Estimate:** 270+ requests per scan cycle

**Total Request Rate:** 400-600+ requests/minute = **6-10 requests/second**
### Public Endpoint Limits
Free public RPC endpoints typically allow:
- **arb1.arbitrum.io/rpc:** ~1-2 requests/second
- **publicnode.com:** ~1-2 requests/second
- **1rpc.io:** ~1-2 requests/second
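
**Gap:** The bot needs 6-10 req/sec while the endpoints allow 1-2 req/sec, so demand is roughly **5x over the limit** (between about 3x and 10x, depending on which ends of the two ranges apply)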
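
The size of the gap can be checked from the two ranges. This is only a back-of-the-envelope sketch using the estimates in this section; the variable names are illustrative:

```python
def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request estimate to requests per second."""
    return requests_per_minute / 60


# Estimated bot demand: 400-600 requests/minute.
demand_low, demand_high = per_second(400), per_second(600)
# Typical public endpoint allowance: 1-2 requests/second.
limit_low, limit_high = 1.0, 2.0

best_case = demand_low / limit_high     # lowest demand against the most generous limit
worst_case = demand_high / limit_low    # highest demand against the strictest limit
midpoint = ((demand_low + demand_high) / 2) / ((limit_low + limit_high) / 2)

print(f"over limit by {best_case:.1f}x to {worst_case:.1f}x (midpoint {midpoint:.1f}x)")
```

The midpoint of both ranges lands between 5x and 6x, which is where the "5x" figure comes from.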
## Why Rate Limiting Didn't Help
The bot's internal rate limit (2 req/sec) does not cap the actual outbound request volume because:
1. **Multiple concurrent operations:**
   - Block processor running
   - Event scanner running
   - Arbitrage service running
   - Each has its own RPC client
2. **Burst requests:**
   - 270 scan tasks created simultaneously
   - Even with queuing, bursts hit the endpoint
3. **Fallback endpoints:**
   - Also rate-limited
   - Switching between them doesn't help
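
One way to see why per-client limits do not cap the total: if each service holds its own limiter, the aggregate rate is the sum of the individual limits. The sketch below is a simplified simulation, not the bot's code. `TokenBucket`, `simulate`, and the assumption of three services that each carry an independent "2 req/sec, burst 3" limiter are illustrative.

```python
class TokenBucket:
    """Token bucket driven by an explicit clock so the simulation is deterministic."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate              # tokens added per second
        self.burst = burst            # maximum number of stored tokens
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def simulate(limiters, seconds: int = 60, attempts_per_second: int = 10) -> float:
    """Each service attempts a request on every tick; return requests sent per second."""
    sent = 0
    for tick in range(seconds * attempts_per_second):
        now = tick / attempts_per_second
        for limiter in limiters:
            if limiter.allow(now):
                sent += 1
    return sent / seconds


# Three services, each with its own limiter (assumed setup).
per_client = simulate([TokenBucket(2, 3) for _ in range(3)])

# The same three services sharing a single limiter.
shared_bucket = TokenBucket(2, 3)
shared = simulate([shared_bucket, shared_bucket, shared_bucket])

print(f"independent limiters: {per_client:.1f} req/s, shared limiter: {shared:.1f} req/s")
```

Under these assumptions the independent limiters let through about 6 req/sec in total, while a shared limiter holds the total near 2 req/sec. A shared limiter only throttles outbound traffic; it does not reduce how many requests the bot wants to make.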
## Current Bot Performance
Despite being rate-limited by the endpoints, the bot's status is:
### ✅ Working Correctly
- Block processing: Active
- DEX transaction detection: Functional
- Swap event parsing: Working
- Arbitrage scanning: Running (scan #260+ completed)
- Pool blacklisting: Protecting against bad pools
- Services: All healthy
### ❌ Performance Impact
- **No arbitrage opportunities detected:** 0 found in 21 minutes
- **Pool blacklist growing:** 926 pools blacklisted
- **Batch fetch failures:** ~200+ failed fetches
- **Scan completion:** Most scans fail due to missing pool data
## Solutions
### Option 1: Premium RPC Endpoint (RECOMMENDED)
**Pros:**
- Immediate fix
- Full bot functionality
- Designed for this use case

**Premium endpoints with high limits** (request rates are approximate; check each provider's current plans):
```bash
# Chainstack (50-100 req/sec on paid plans)
ARBITRUM_RPC_ENDPOINT=https://arbitrum-mainnet.core.chainstack.com/YOUR_API_KEY
# Alchemy (300 req/sec on Growth plan)
ARBITRUM_RPC_ENDPOINT=https://arb-mainnet.g.alchemy.com/v2/YOUR_API_KEY
# Infura (100 req/sec on paid plans)
ARBITRUM_RPC_ENDPOINT=https://arbitrum-mainnet.infura.io/v3/YOUR_API_KEY
# QuickNode (500 req/sec on paid plans)
ARBITRUM_RPC_ENDPOINT=https://YOUR_ENDPOINT.arbitrum-mainnet.quiknode.pro/YOUR_TOKEN/
```
**Cost:** $50-200/month depending on provider and tier
**Implementation:**
1. Sign up for premium endpoint
2. Update .env with API key
3. Restart container
4. Monitor - should see 95%+ reduction in 429 errors
### Option 2: Drastically Reduce Bot Scope
**Make bot compatible with public endpoints:**
1. **Disable Curve queries** (save ~4 requests per event):
   ```yaml
   # Reduce protocol coverage
   protocols:
     - uniswap_v3
     - camelot_v3
   # Remove: curve, balancer, etc.
   ```
2. **Reduce arbitrage scan frequency** (save ~100+ requests/minute):
   ```yaml
   arbitrage:
     scan_interval: 60 # Scan every 60 seconds instead of every 5
     max_scan_tasks: 50 # Reduce from 270 to 50
   ```
3. **Increase cache times** (reduce redundant queries):
   ```yaml
   uniswap:
     cache:
       expiration: 1800 # 30 minutes instead of 10
   ```
4. **Reduce block processing rate**:
   ```yaml
   bot:
     polling_interval: 10 # Process blocks slower
     max_workers: 1 # Single worker only
   ```
**Pros:** Free, uses public endpoints
**Cons:**
- Severely limited functionality
- Miss most opportunities
- Slow response time
- Not competitive
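
To illustrate why the longer cache expiration in step 3 reduces RPC traffic, here is a minimal time-to-live cache sketch. It is not the bot's cache: `TTLCache`, `fetches_per_hour`, the `"pool-state"` key, and the one-lookup-per-minute access pattern are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Return a cached value until it is older than `ttl_seconds`, then refetch."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.misses = 0   # each miss stands for one RPC request
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]           # still fresh: no request needed
        self.misses += 1
        value = fetch()
        self._entries[key] = (now, value)
        return value


def fetches_per_hour(ttl_seconds: float) -> int:
    """Look up the same key once a minute for an hour on a simulated clock."""
    now = [0.0]
    cache = TTLCache(ttl_seconds, clock=lambda: now[0])
    for t in range(0, 3600, 60):
        now[0] = float(t)
        cache.get("pool-state", fetch=lambda: {"reserve0": 1, "reserve1": 1})
    return cache.misses


print(fetches_per_hour(600), fetches_per_hour(1800))  # prints "6 2"
```

For a single key read once a minute, raising the expiration from 10 to 30 minutes cuts refetches from 6 to 2 per hour. The trade-off is that pool state can be up to 30 minutes stale.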
### Option 3: Run Your Own Arbitrum Node
**Setup:**
- Run full Arbitrum node locally
- Unlimited RPC requests
- No rate limiting

**Pros:** No rate limits, no per-request fees

**Cons:**
- High initial setup complexity
- Requires 2+ TB storage
- High bandwidth requirements
- Ongoing maintenance

**Cost:** ~$100-200/month in server costs
### Option 4: Hybrid Approach
**Use both public and premium:**
```yaml
arbitrum:
  rpc_endpoint: "https://arb-mainnet.g.alchemy.com/v2/YOUR_KEY" # Premium for critical
  fallback_endpoints:
    - url: "https://arb1.arbitrum.io/rpc" # Public for redundancy
    - url: "https://arbitrum-rpc.publicnode.com"
    - url: "https://1rpc.io/arb"
```
**Cost:** Lower tier premium ($20-50/month) + free fallbacks
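
The hybrid setup amounts to trying the premium endpoint first and moving to a public one only when a request is rejected. A minimal sketch of that ordering follows. It assumes a hypothetical transport function `send` that raises `RateLimited` on an HTTP 429; neither name comes from the bot's code, and `fake_send` stands in for real network calls.

```python
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Raised by the (hypothetical) transport when an endpoint answers HTTP 429."""


def call_with_fallback(endpoints: Sequence[str],
                       send: Callable[[str], dict]) -> dict:
    """Try each endpoint in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for url in endpoints:
        try:
            return send(url)
        except RateLimited as exc:
            last_error = exc          # remember the failure and try the next endpoint
    raise RuntimeError("all endpoints are rate-limited") from last_error


ENDPOINTS = [
    "https://arb-mainnet.g.alchemy.com/v2/YOUR_KEY",  # premium, tried first
    "https://arb1.arbitrum.io/rpc",                   # public fallback
]


def fake_send(url: str) -> dict:
    """Stand-in transport: pretend the premium endpoint is currently throttled."""
    if "alchemy" in url:
        raise RateLimited(url)
    return {"endpoint": url, "result": "0x1"}


result = call_with_fallback(ENDPOINTS, fake_send)
print(result["endpoint"])  # prints "https://arb1.arbitrum.io/rpc"
```

Because the public fallbacks are themselves rate-limited, this ordering gives redundancy rather than extra capacity.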
## Immediate Recommendations
### 🔴 CRITICAL - Choose One:
**A) Get Premium RPC Endpoint (Recommended for Production)**
```bash
# Quick start with Alchemy's demo key (for demo purposes only; substitute your own API key for production)
ARBITRUM_RPC_ENDPOINT=https://arb-mainnet.g.alchemy.com/v2/demo
```
**B) Reduce Bot Scope for Public Endpoint Testing**
Apply configuration changes in Option 2 above
### 🟡 URGENT - Monitor Performance
```bash
# Watch 429 errors
./scripts/dev-env.sh logs -f | grep "429"
# Count errors over time
watch -n 10 'podman logs mev-bot-dev-master-dev 2>&1 | grep "429" | wc -l'
# Check arbitrage stats
./scripts/dev-env.sh logs | grep "Arbitrage Service Stats" | tail -1
```
### 🟢 RECOMMENDED - Optimize Configuration
Even with a premium endpoint, optimize for efficiency:
1. **Disable Curve queries** - Most Arbitrum volume is Uniswap/Camelot
2. **Increase cache times** - Reduce redundant queries
3. **Tune scan frequency** - Balance speed vs resource usage
## Expected Results
### With Premium RPC Endpoint:
- ✅ 95%+ reduction in 429 errors (fewer than ~340 errors in 21 minutes, down from 6,717)
- ✅ Full arbitrage scanning capability
- ✅ Real-time opportunity detection
- ✅ Competitive performance
### With Reduced Scope on Public Endpoint:
- 🟡 50-70% reduction in 429 errors (~2,000-3,400 errors in 21 minutes)
- 🟡 Limited arbitrage scanning
- 🟡 Delayed opportunity detection
- ❌ Not competitive for production MEV
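
The reduction percentages can be turned into error counts using the 6,717 errors from the 21-minute run as the baseline. A small sketch of that conversion (`remaining_errors` is a name chosen for this example):

```python
BASELINE_ERRORS = 6717  # 429 errors in the 21-minute run after the rate limit fix


def remaining_errors(reduction_pct: int, baseline: int = BASELINE_ERRORS) -> int:
    """Errors left over the same 21 minutes after a percentage reduction (rounded down)."""
    return baseline * (100 - reduction_pct) // 100


premium = remaining_errors(95)        # 335
reduced_best = remaining_errors(70)   # 2015
reduced_worst = remaining_errors(50)  # 3358

print(premium, reduced_best, reduced_worst)  # prints "335 2015 3358"
```

A 95% cut leaves roughly 335 errors over 21 minutes, and a 50-70% cut leaves roughly 2,000-3,400.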
## Cost-Benefit Analysis
### Premium RPC Endpoint
**Cost:** $50-200/month
**Benefit:**
- Full bot functionality
- Potential to detect $100-1000+/day in opportunities (an estimate; none were detected during the current 21-minute run)
- **ROI:** Pays for itself on first successful trade
### Public Endpoint with Reduced Scope
**Cost:** $0/month
**Benefit:**
- Testing and development
- Learning and experimentation
- Not suitable for production MEV
- **ROI:** $0 (won't find profitable opportunities)
## Conclusion
**The bot is working correctly.** The issue is an architectural mismatch between:
- **Bot Design:** Built for premium RPC endpoints (100+ req/sec)
- **Current Setup:** Using public endpoints (1-2 req/sec)
**Recommendation:**
1. For production MEV: Get premium RPC endpoint ($50-200/month)
2. For testing/development: Reduce bot scope with Option 2 config
**Next Action:**
```bash
# Decision needed from user:
# A) Get premium endpoint and update .env
# B) Apply reduced scope configuration for public endpoint testing
```
The 5% improvement from the rate limit changes shows the new configuration is taking effect, but it is not enough to bridge the roughly 5x gap between what the bot needs and what public endpoints provide.