MEV Bot - Production Deployment Runbook
Version: 1.0
Last Updated: October 28, 2025
Audience: DevOps, Production Engineers
Table of Contents
- Pre-Deployment Checklist
- Environment Setup
- Configuration
- Deployment Steps
- Post-Deployment Validation
- Monitoring & Alerting
- Rollback Procedures
- Troubleshooting
Pre-Deployment Checklist
Code Readiness
- All tests passing (make test)
- Security audit completed and issues addressed
- Code review approved
- 24-hour validation test completed successfully
- Performance benchmarks meet targets
- No critical TODOs in codebase
Infrastructure Readiness
- RPC endpoints configured and tested
- Private key/wallet funded with gas (minimum 0.1 ETH)
- Monitoring systems operational
- Alert channels configured (Slack, email, PagerDuty)
- Backup RPC endpoints ready
- Database/storage systems ready
Team Readiness
- On-call engineer assigned
- Runbook reviewed by team
- Communication channels established
- Rollback plan understood
- Emergency contacts documented
Environment Setup
System Requirements
Minimum:
- CPU: 4 cores
- RAM: 8 GB
- Disk: 50 GB SSD
- Network: 100 Mbps, low latency
Recommended (Production):
- CPU: 8 cores
- RAM: 16 GB
- Disk: 100 GB NVMe SSD
- Network: 1 Gbps, < 20ms latency to Arbitrum RPC
Dependencies
# Install Go 1.24+
wget https://go.dev/dl/go1.24.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.24.0.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
# Verify installation
go version # Should show go1.24 or later
# Install build tools
sudo apt-get update
sudo apt-get install -y build-essential git curl jq htop # jq and htop are used in the validation steps
Repository Setup
# Clone repository
git clone https://github.com/your-org/mev-beta.git
cd mev-beta
# Checkout production branch
git checkout feature/production-profit-optimization
# Verify correct branch
git log -1 --oneline
# Install dependencies
go mod download
go mod verify
Configuration
1. Environment Variables
Create /etc/systemd/system/mev-bot.env:
# RPC Configuration
ARBITRUM_RPC_ENDPOINT=https://arbitrum-mainnet.core.chainstack.com/YOUR_KEY
ARBITRUM_WS_ENDPOINT=wss://arbitrum-mainnet.core.chainstack.com/YOUR_KEY
# Backup RPC (fallback)
BACKUP_RPC_ENDPOINT=https://arb1.arbitrum.io/rpc
# Application Configuration
LOG_LEVEL=info
LOG_FORMAT=json
LOG_OUTPUT=/var/log/mev-bot/mev_bot.log
# Metrics & Monitoring
METRICS_ENABLED=true
METRICS_PORT=9090
# Security
MEV_BOT_ENCRYPTION_KEY=your-32-char-encryption-key-here-minimum-length-required
# Execution Configuration (IMPORTANT: Set to false for detection-only mode)
EXECUTION_ENABLED=false
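# systemd does not strip trailing comments in an EnvironmentFile, so keep comments on their own lines
# 1 ETH in wei
MAX_POSITION_SIZE=1000000000000000000
# 0.05 ETH in wei
MIN_PROFIT_THRESHOLD=50000000000000000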
# Provider Configuration
PROVIDER_CONFIG_PATH=/opt/mev-bot/config/providers_runtime.yaml
CRITICAL: Never commit .env files with real credentials to version control!
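Before starting the service, it can help to confirm that the environment file actually defines the variables listed above. The following is a minimal sketch, not part of the repository: `check_env_file` is a hypothetical helper, and the list of required variables is an assumption drawn from the example file, so adjust it to whatever the bot really requires.

```shell
# Hypothetical helper: report required variables that are missing or empty
# in an env file. The variable list is an assumption based on the example above.
check_env_file() {
  local file="$1" missing=0 var
  for var in ARBITRUM_RPC_ENDPOINT ARBITRUM_WS_ENDPOINT \
             MEV_BOT_ENCRYPTION_KEY EXECUTION_ENABLED; do
    # Require "VAR=" at the start of a line followed by at least one character.
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "missing: ${var}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (needs read access to the file):
#   check_env_file /etc/systemd/system/mev-bot.env && echo "env file looks complete"
```

The check only verifies presence, not validity; an endpoint with a wrong API key would still pass.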
2. Provider Configuration
Edit config/providers_runtime.yaml:
providers:
  - name: "chainstack-primary"
    endpoint: "${ARBITRUM_RPC_ENDPOINT}"
    type: "https"
    weight: 100
    timeout: 30s
    rateLimit: 100
  - name: "chainstack-websocket"
    endpoint: "${ARBITRUM_WS_ENDPOINT}"
    type: "wss"
    weight: 90
    timeout: 30s
    rateLimit: 100
  - name: "public-fallback"
    endpoint: "https://arb1.arbitrum.io/rpc"
    type: "https"
    weight: 50
    timeout: 30s
    rateLimit: 50
pooling:
  maxIdleConnections: 10
  maxOpenConnections: 50
  connectionTimeout: 30s
  idleTimeout: 300s
retry:
  maxRetries: 3
  retryDelay: 1s
  backoffMultiplier: 2
  maxBackoff: 8s
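To see what the retry settings imply in practice, the sketch below prints the wait before each retry. It assumes the bot applies plain exponential backoff capped at maxBackoff, which the configuration names suggest but this runbook does not confirm; `backoff_schedule` is a hypothetical helper for illustration only.

```shell
# Print the delay in whole seconds before each retry, assuming exponential
# backoff (delay * multiplier per attempt) capped at max_backoff.
backoff_schedule() {
  local max_retries="$1" delay="$2" multiplier="$3" max_backoff="$4"
  local schedule="" i=0
  while [ "$i" -lt "$max_retries" ]; do
    schedule="${schedule}${schedule:+ }${delay}"
    delay=$((delay * multiplier))
    if [ "$delay" -gt "$max_backoff" ]; then delay="$max_backoff"; fi
    i=$((i + 1))
  done
  echo "$schedule"
}

# Values from the retry block above: 3 retries, 1s initial delay, x2, 8s cap.
backoff_schedule 3 1 2 8   # prints: 1 2 4
```

Under that assumption a failing call is retried after 1, 2, and 4 seconds, so the 8-second cap only comes into play if maxRetries is raised above 3.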
3. Systemd Service Configuration
Create /etc/systemd/system/mev-bot.service:
[Unit]
Description=MEV Arbitrage Bot
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=mev-bot
Group=mev-bot
WorkingDirectory=/opt/mev-bot
EnvironmentFile=/etc/systemd/system/mev-bot.env
ExecStart=/opt/mev-bot/bin/mev-bot start
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=10s
# Resource limits
LimitNOFILE=65536
MemoryMax=4G
CPUQuota=400%
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/mev-bot /opt/mev-bot/data
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=mev-bot
[Install]
WantedBy=multi-user.target
Deployment Steps
Phase 1: Build & Prepare (10-15 minutes)
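# 1. Build binary (back up the current one first so Quick Rollback has something to restore)
cd /opt/mev-bot
if [ -f bin/mev-bot ]; then cp bin/mev-bot bin/mev-bot.backup; fi
make build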
# Verify binary
./bin/mev-bot --version
# Expected: MEV Bot v1.0.0 (or similar)
# 2. Run tests
make test
# Ensure all tests pass
# 3. Check binary size and dependencies
ls -lh bin/mev-bot
ldd bin/mev-bot # Should show minimal dependencies
# 4. Create necessary directories
sudo mkdir -p /var/log/mev-bot
sudo mkdir -p /opt/mev-bot/data
sudo chown -R mev-bot:mev-bot /var/log/mev-bot /opt/mev-bot/data
# 5. Set permissions
chmod +x bin/mev-bot
sudo chmod 600 /etc/systemd/system/mev-bot.env # Protect sensitive config
Phase 2: Dry Run (5-10 minutes)
# Run bot in the background to verify configuration
sudo -u mev-bot /opt/mev-bot/bin/mev-bot start &
BOT_PID=$!
# Wait 2 minutes for initialization
sleep 120
# Check if running
pgrep -a mev-bot
# Check logs for errors
tail -100 /var/log/mev-bot/mev_bot.log | grep -i error
# Verify RPC connection
tail -100 /var/log/mev-bot/mev_bot.log | grep -i "connected"
# Stop dry run ($BOT_PID is the sudo wrapper, which relays the signal to the bot)
sudo kill "$BOT_PID"
Phase 3: Production Start (5 minutes)
# 1. Reload systemd
sudo systemctl daemon-reload
# 2. Enable service (start on boot)
sudo systemctl enable mev-bot
# 3. Start service
sudo systemctl start mev-bot
# 4. Verify status
sudo systemctl status mev-bot
# Expected: active (running)
# 5. Check logs
sudo journalctl -u mev-bot --no-pager --lines=50 # add -f to follow; it blocks until interrupted
# 6. Wait for initialization (30-60 seconds)
sleep 60
# 7. Verify healthy operation
curl -s http://localhost:9090/health/live | jq .
# Expected: {"status": "healthy"}
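A fixed `sleep 60` either wastes time or is too short on a slow start. As an alternative, the health check can be polled until it succeeds. The sketch below is a generic retry wrapper; `wait_until` is a hypothetical helper, not a script shipped with the bot, and the attempt count and interval in the usage comment are arbitrary choices.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" n=1
  shift 2
  while [ "$n" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after ${n} attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$n" -lt "$attempts" ]; then sleep "$delay"; fi
    n=$((n + 1))
  done
  echo "not ready after ${attempts} attempt(s)" >&2
  return 1
}

# Usage: poll the liveness endpoint every 5 seconds, 12 attempts at most.
#   wait_until 12 5 curl -fs http://localhost:9090/health/live
```

Because the function returns non-zero on timeout, it can gate the following validation steps in a deployment script.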
Phase 4: Validation (15-30 minutes)
# 1. Monitor for opportunities
tail -f /var/log/mev-bot/mev_bot.log | grep "ARBITRAGE OPPORTUNITY"
# 2. Check metrics endpoint
curl -s http://localhost:9090/metrics | grep mev_
# 3. Verify cache performance
tail -100 /var/log/mev-bot/mev_bot.log | grep "cache metrics"
# Look for hit rate 75-85%
# 4. Check for errors
sudo journalctl -u mev-bot --since "10 minutes ago" | grep ERROR
# Should have minimal errors
# 5. Monitor resource usage
htop # Check CPU and memory
# CPU should be 50-80%, Memory < 2GB
# 6. Test failover (optional)
# Temporarily block primary RPC, verify fallback works
Post-Deployment Validation
Health Checks
# Liveness probe (should return 200)
curl -f http://localhost:9090/health/live || echo "LIVENESS FAILED"
# Readiness probe (should return 200)
curl -f http://localhost:9090/health/ready || echo "READINESS FAILED"
# Startup probe (should return 200 after initialization)
curl -f http://localhost:9090/health/startup || echo "STARTUP FAILED"
Performance Metrics
# Check Prometheus metrics
curl -s http://localhost:9090/metrics | grep -E "mev_(opportunities|executions|profit|cache|rpc)"
# Expected metrics:
# - mev_opportunities_detected{} <number>
# - mev_opportunities_profitable{} <number>
# - mev_cache_hit_rate{} 0.75-0.85
# - mev_rpc_calls_total{} <number>
Log Analysis
# Analyze last hour of logs
./scripts/log-manager.sh analyze
# Check health score (target: > 90)
./scripts/log-manager.sh health
# Expected output:
# Health Score: 95.5/100 (Excellent)
# Error Rate: < 5%
# Cache Hit Rate: 75-85%
Monitoring & Alerting
Key Metrics to Monitor
| Metric | Threshold | Action |
|---|---|---|
| CPU Usage | > 90% | Scale up or investigate |
| Memory Usage | > 85% | Potential memory leak |
| Error Rate | > 10% | Check logs, may need rollback |
| RPC Failures | > 5/min | Check RPC provider |
| Opportunities/hour | < 1 | May indicate detection issue |
| Cache Hit Rate | < 70% | Review cache configuration |
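The thresholds in the table can be spot-checked from the command line against the metrics endpoint. This is a minimal sketch under stated assumptions: the metric name `mev_cache_hit_rate` is taken from the expected-metrics list earlier in this runbook, samples are assumed to have no timestamps and no spaces inside labels, and `metric_value` and `at_least` are hypothetical helpers.

```shell
# Read Prometheus text-format output on stdin and print the value of the
# first sample whose name matches. Assumes "name{labels} value" with no
# spaces inside the labels and no trailing timestamp.
metric_value() {
  awk -v name="$1" '$1 == name || index($1, name "{") == 1 { print $2; exit }'
}

# Succeed when value >= threshold (numeric comparison done in awk).
at_least() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# Usage: warn when the cache hit rate drops below the 70% threshold.
#   rate=$(curl -s http://localhost:9090/metrics | metric_value mev_cache_hit_rate)
#   at_least "$rate" 0.70 || echo "cache hit rate below threshold: $rate"
```

For continuous alerting the Prometheus and Slack configuration in the next section is the better fit; this is only meant for quick manual checks.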
Alert Configuration
Slack Webhook (edit in config/alerts.yaml):
alerts:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    channel: "#mev-bot-alerts"
  thresholds:
    error_rate: 0.10 # 10%
    cpu_usage: 0.90 # 90%
    memory_usage: 0.85 # 85%
    min_opportunities_per_hour: 1
Monitoring Commands
# Real-time monitoring
watch -n 5 'systemctl status mev-bot && curl -s http://localhost:9090/metrics | grep mev_'
# Start monitoring daemon (background)
./scripts/log-manager.sh start-daemon
# View operations dashboard
./scripts/log-manager.sh dashboard
# Opens HTML dashboard in browser
Rollback Procedures
Quick Rollback (< 5 minutes)
# 1. Stop current version
sudo systemctl stop mev-bot
# 2. Restore previous binary
sudo cp /opt/mev-bot/bin/mev-bot.backup /opt/mev-bot/bin/mev-bot
# 3. Restart service
sudo systemctl start mev-bot
# 4. Verify rollback
sudo systemctl status mev-bot
tail -100 /var/log/mev-bot/mev_bot.log
Full Rollback (< 15 minutes)
# 1. Stop service
sudo systemctl stop mev-bot
# 2. Checkout previous version
cd /opt/mev-bot
git fetch
git checkout <previous-commit-hash>
# 3. Rebuild
make build
# 4. Restart service
sudo systemctl start mev-bot
# 5. Validate
curl http://localhost:9090/health/live
Troubleshooting
Common Issues
Issue: Bot fails to start
Symptoms:
systemctl status mev-bot
● mev-bot.service - MEV Arbitrage Bot
Loaded: loaded
Active: failed (Result: exit-code)
Diagnosis:
# Check logs
sudo journalctl -u mev-bot -n 100 --no-pager
# Common causes:
# 1. Missing environment variables
# 2. Invalid RPC endpoint
# 3. Permission issues
Solution:
# Verify environment file (readable by root only after chmod 600)
sudo cat /etc/systemd/system/mev-bot.env
# Test RPC connection manually (export ARBITRUM_RPC_ENDPOINT in this shell first)
curl -X POST -H "Content-Type: application/json" \
--data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
"$ARBITRUM_RPC_ENDPOINT"
# Fix permissions
sudo chown -R mev-bot:mev-bot /opt/mev-bot
Issue: High error rate
Symptoms:
[ERROR] Failed to fetch pool state
[ERROR] RPC call failed
[ERROR] 429 Too Many Requests
Diagnosis:
# Check error rate
./scripts/log-manager.sh analyze | grep "Error Rate"
# Check RPC provider status
curl -s "$ARBITRUM_RPC_ENDPOINT"
Solution:
# 1. Enable backup RPC endpoint in config
# 2. Reduce rate limits
# 3. Contact RPC provider
# 4. Switch to different provider
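To tell whether rate limiting is the main cause, it helps to count how many of the recent errors are 429 responses. The sketch below assumes log lines contain the `[ERROR]` marker and the `429 Too Many Requests` text exactly as shown in the symptoms above; with `LOG_FORMAT=json` the patterns may need adjusting. `error_counts` is a hypothetical helper.

```shell
# Count ERROR lines on stdin and how many of them are rate-limit responses.
# The match strings are assumptions based on the sample log lines above.
error_counts() {
  awk '
    /\[ERROR\]/              { total++ }
    /429 Too Many Requests/  { limited++ }
    END { printf "errors=%d rate_limited=%d\n", total, limited }
  '
}

# Usage:
#   tail -1000 /var/log/mev-bot/mev_bot.log | error_counts
```

If most errors are rate-limited, reducing `rateLimit` for the affected provider or shifting weight to a backup endpoint is the likely fix; otherwise look at the provider's status first.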
Issue: No opportunities detected
Symptoms:
Blocks processed: 10000
Opportunities detected: 0
Diagnosis:
# Check if events are being detected
tail -100 /var/log/mev-bot/mev_bot.log | grep "processing.*event"
# Check profit thresholds
sudo grep MIN_PROFIT_THRESHOLD /etc/systemd/system/mev-bot.env
Solution:
# 1. Lower MIN_PROFIT_THRESHOLD (carefully!)
# 2. Check market conditions (volatility)
# 3. Verify DEX integrations working
# 4. Review price impact thresholds
Issue: Memory leak
Symptoms:
Memory usage increasing over time
OOM killer may terminate process
Diagnosis:
# Monitor memory over time
watch -n 10 'ps aux | grep mev-bot | grep -v grep'
# Generate heap profile
curl http://localhost:9090/debug/pprof/heap > heap.prof
go tool pprof heap.prof
Solution:
# 1. Restart service (temporary fix)
sudo systemctl restart mev-bot
# 2. Investigate with profiler
# 3. Check for goroutine leaks
curl "http://localhost:9090/debug/pprof/goroutine?debug=1" # quoted so the shell does not treat ? as a glob
# 4. May need code fix and redeploy
Emergency Contacts
| Role | Name | Contact | Availability |
|---|---|---|---|
| On-Call Engineer | TBD | +1-XXX-XXX-XXXX | 24/7 |
| DevOps Lead | TBD | Slack: @devops | Business hours |
| Product Owner | TBD | Email: product@company.com | Business hours |
Change Log
| Date | Version | Changes | Author |
|---|---|---|---|
| 2025-10-28 | 1.0 | Initial runbook | Claude Code |
END OF RUNBOOK
Remember:
- Always test in staging first
- Have rollback plan ready
- Monitor closely after deployment
- Document any issues encountered
- Keep this runbook updated