This commit adds critical production-ready optimizations and infrastructure:

New Features:

1. Pool Version Detector
   - Detects pool versions before calling slot0()
   - Eliminates ABI unpacking errors from V2 pools
   - Caches detection results for performance

2. Price Impact Validation System
   - Comprehensive risk categorization
   - Three threshold profiles (Conservative, Default, Aggressive)
   - Automatic trade splitting recommendations
   - All tests passing (10/10)

3. Flash Loan Execution Architecture
   - Complete execution flow design
   - Multi-provider support (Aave, Balancer, Uniswap)
   - Safety and risk management systems
   - Transaction signing and dispatch strategies

4. 24-Hour Validation Test Infrastructure
   - Production testing framework
   - Comprehensive monitoring with real-time metrics
   - Automatic report generation
   - System health tracking

5. Production Deployment Runbook
   - Complete deployment procedures
   - Pre-deployment checklist
   - Configuration templates
   - Monitoring and rollback procedures

Files Added:
- pkg/uniswap/pool_detector.go (273 lines)
- pkg/validation/price_impact_validator.go (265 lines)
- pkg/validation/price_impact_validator_test.go (242 lines)
- docs/architecture/flash_loan_execution_architecture.md (808 lines)
- docs/PRODUCTION_DEPLOYMENT_RUNBOOK.md (615 lines)
- scripts/24h-validation-test.sh (352 lines)

Testing: Core functionality tests passing. Stress test showing 867 TPS (below 1000 TPS target - to be investigated)

Impact: Ready for 24-hour validation test and production deployment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# MEV Bot - Production Deployment Runbook
**Version:** 1.0
**Last Updated:** October 28, 2025
**Audience:** DevOps, Production Engineers

---

## Table of Contents

1. [Pre-Deployment Checklist](#pre-deployment-checklist)
2. [Environment Setup](#environment-setup)
3. [Configuration](#configuration)
4. [Deployment Steps](#deployment-steps)
5. [Post-Deployment Validation](#post-deployment-validation)
6. [Monitoring & Alerting](#monitoring--alerting)
7. [Rollback Procedures](#rollback-procedures)
8. [Troubleshooting](#troubleshooting)

---

## Pre-Deployment Checklist

### Code Readiness
- [ ] All tests passing (`make test`)
- [ ] Security audit completed and issues addressed
- [ ] Code review approved
- [ ] 24-hour validation test completed successfully
- [ ] Performance benchmarks meet targets
- [ ] No critical TODOs in codebase

### Infrastructure Readiness
- [ ] RPC endpoints configured and tested
- [ ] Private key/wallet funded with gas (minimum 0.1 ETH)
- [ ] Monitoring systems operational
- [ ] Alert channels configured (Slack, email, PagerDuty)
- [ ] Backup RPC endpoints ready
- [ ] Database/storage systems ready

### Team Readiness
- [ ] On-call engineer assigned
- [ ] Runbook reviewed by team
- [ ] Communication channels established
- [ ] Rollback plan understood
- [ ] Emergency contacts documented

---

## Environment Setup

### System Requirements

**Minimum:**
- CPU: 4 cores
- RAM: 8 GB
- Disk: 50 GB SSD
- Network: 100 Mbps, low latency

**Recommended (Production):**
- CPU: 8 cores
- RAM: 16 GB
- Disk: 100 GB NVMe SSD
- Network: 1 Gbps, < 20ms latency to Arbitrum RPC

### Dependencies

```bash
# Install Go 1.24+ (release archives are named with the full version, e.g. 1.24.0)
wget https://go.dev/dl/go1.24.0.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.24.0.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin

# Verify installation
go version  # Should show go1.24 or later

# Install build tools (jq is used by the health checks later in this runbook)
sudo apt-get update
sudo apt-get install -y build-essential git curl jq
```

### Repository Setup

```bash
# Clone repository
git clone https://github.com/your-org/mev-beta.git
cd mev-beta

# Checkout production branch
git checkout feature/production-profit-optimization

# Verify the checked-out branch and commit
git branch --show-current
git log -1 --oneline

# Install dependencies
go mod download
go mod verify
```

---

## Configuration

### 1. Environment Variables

Create `/etc/systemd/system/mev-bot.env`:

```bash
# RPC Configuration
ARBITRUM_RPC_ENDPOINT=https://arbitrum-mainnet.core.chainstack.com/YOUR_KEY
ARBITRUM_WS_ENDPOINT=wss://arbitrum-mainnet.core.chainstack.com/YOUR_KEY

# Backup RPC (fallback)
BACKUP_RPC_ENDPOINT=https://arb1.arbitrum.io/rpc

# Application Configuration
LOG_LEVEL=info
LOG_FORMAT=json
LOG_OUTPUT=/var/log/mev-bot/mev_bot.log

# Metrics & Monitoring
METRICS_ENABLED=true
METRICS_PORT=9090

# Security
MEV_BOT_ENCRYPTION_KEY=your-32-char-encryption-key-here-minimum-length-required

# Execution Configuration (IMPORTANT: Set to false for detection-only mode)
EXECUTION_ENABLED=false
# 1 ETH in wei
MAX_POSITION_SIZE=1000000000000000000
# 0.05 ETH in wei
MIN_PROFIT_THRESHOLD=50000000000000000

# Provider Configuration
PROVIDER_CONFIG_PATH=/opt/mev-bot/config/providers_runtime.yaml
```

This file is loaded through `EnvironmentFile=` in the systemd unit. systemd treats only whole lines beginning with `#` or `;` as comments, so keep every comment on its own line rather than after a value.

**CRITICAL:** Never commit `.env` files with real credentials to version control!

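A pre-flight check can catch the most common mistakes in this file before systemd reads it. The sketch below is not part of the repository: the function name `validate_env_file` and the list of required variables are assumptions drawn from the example file above. It flags missing variables, trailing inline comments, and an encryption key shorter than the 32 characters the placeholder asks for.

```bash
# validate_env_file FILE  (hypothetical helper, not shipped with the bot)
# Prints one line per problem and returns non-zero if any were found.
validate_env_file() {
  local file="$1" problems=0 var value
  # Assumed to be mandatory; adjust to what your build actually requires.
  local required="ARBITRUM_RPC_ENDPOINT ARBITRUM_WS_ENDPOINT MEV_BOT_ENCRYPTION_KEY EXECUTION_ENABLED"

  for var in $required; do
    if ! grep -q "^${var}=." "$file"; then
      echo "missing or empty: $var"
      problems=$((problems + 1))
    fi
  done

  # A "VALUE # comment" line would pass the comment text through as part of the value.
  if grep -Eq '^[A-Za-z_][A-Za-z0-9_]*=[^#]*[[:space:]]#' "$file"; then
    echo "inline comment found; move comments to their own lines"
    problems=$((problems + 1))
  fi

  # Length check only; this says nothing about the quality of the key.
  value=$(grep '^MEV_BOT_ENCRYPTION_KEY=' "$file" | head -n 1 | cut -d= -f2-)
  if [ "${#value}" -lt 32 ]; then
    echo "MEV_BOT_ENCRYPTION_KEY is shorter than 32 characters"
    problems=$((problems + 1))
  fi

  [ "$problems" -eq 0 ]
}
```

Run it as root once the file is mode 600, for example `validate_env_file /etc/systemd/system/mev-bot.env`, and treat a non-zero exit status as a reason to stop the deployment.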

### 2. Provider Configuration

Edit `config/providers_runtime.yaml`:

```yaml
providers:
  - name: "chainstack-primary"
    endpoint: "${ARBITRUM_RPC_ENDPOINT}"
    type: "https"
    weight: 100
    timeout: 30s
    rateLimit: 100

  - name: "chainstack-websocket"
    endpoint: "${ARBITRUM_WS_ENDPOINT}"
    type: "wss"
    weight: 90
    timeout: 30s
    rateLimit: 100

  - name: "public-fallback"
    endpoint: "https://arb1.arbitrum.io/rpc"
    type: "https"
    weight: 50
    timeout: 30s
    rateLimit: 50

pooling:
  maxIdleConnections: 10
  maxOpenConnections: 50
  connectionTimeout: 30s
  idleTimeout: 300s

retry:
  maxRetries: 3
  retryDelay: 1s
  backoffMultiplier: 2
  maxBackoff: 8s
```
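To make the `retry` settings concrete: assuming the bot applies plain exponential backoff capped at `maxBackoff` (the retry implementation is not shown here, so this is an assumption), the wait before each retry can be worked out as follows. `backoff_schedule` is a hypothetical helper written only for illustration.

```bash
# backoff_schedule MAX_RETRIES DELAY_S MULTIPLIER MAX_BACKOFF_S
# Prints the wait in whole seconds before each retry, space-separated.
# Assumes delay(n) = min(DELAY_S * MULTIPLIER^n, MAX_BACKOFF_S).
backoff_schedule() {
  local retries="$1" delay="$2" mult="$3" cap="$4" i=0 out=""
  while [ "$i" -lt "$retries" ]; do
    if [ "$delay" -gt "$cap" ]; then delay="$cap"; fi
    out="${out:+$out }$delay"
    delay=$((delay * mult))
    i=$((i + 1))
  done
  echo "$out"
}

backoff_schedule 3 1 2 8   # prints: 1 2 4
backoff_schedule 6 1 2 8   # prints: 1 2 4 8 8 8
```

Under that assumption the configured values give waits of 1 s, 2 s and 4 s, and `maxBackoff: 8s` only starts to matter if `maxRetries` is raised to 5 or more.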

### 3. Systemd Service Configuration

Create `/etc/systemd/system/mev-bot.service`:

```ini
[Unit]
Description=MEV Arbitrage Bot
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=mev-bot
Group=mev-bot
WorkingDirectory=/opt/mev-bot
EnvironmentFile=/etc/systemd/system/mev-bot.env

ExecStart=/opt/mev-bot/bin/mev-bot start
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=10s

# Resource limits
LimitNOFILE=65536
MemoryMax=4G
CPUQuota=400%

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/log/mev-bot /opt/mev-bot/data

# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=mev-bot

[Install]
WantedBy=multi-user.target
```

---

## Deployment Steps

### Phase 1: Build & Prepare (10-15 minutes)

```bash
# 1. Back up the current binary (used by Quick Rollback), then build
cd /opt/mev-bot
if [ -f bin/mev-bot ]; then cp -p bin/mev-bot bin/mev-bot.backup; fi
make build

# Verify binary
./bin/mev-bot --version
# Expected: MEV Bot v1.0.0 (or similar)

# 2. Run tests
make test
# Ensure all tests pass

# 3. Check binary size and dependencies
ls -lh bin/mev-bot
ldd bin/mev-bot  # Should show minimal dependencies

# 4. Create necessary directories (assumes the mev-bot user and group already exist)
sudo mkdir -p /var/log/mev-bot
sudo mkdir -p /opt/mev-bot/data
sudo chown -R mev-bot:mev-bot /var/log/mev-bot /opt/mev-bot/data

# 5. Set permissions
chmod +x bin/mev-bot
sudo chmod 600 /etc/systemd/system/mev-bot.env  # Protect sensitive config
```

### Phase 2: Dry Run (5-10 minutes)

```bash
# Run bot in the background to verify configuration
# Note: systemd is not involved here, so mev-bot.env is not loaded automatically
sudo -u mev-bot /opt/mev-bot/bin/mev-bot start &
BOT_PID=$!

# Wait 2 minutes for initialization
sleep 120

# Check if running
pgrep -af "mev-bot start"

# Check logs for errors
tail -100 /var/log/mev-bot/mev_bot.log | grep -i error

# Verify RPC connection
tail -100 /var/log/mev-bot/mev_bot.log | grep -i "connected"

# Stop dry run
kill $BOT_PID
```

### Phase 3: Production Start (5 minutes)

```bash
# 1. Reload systemd
sudo systemctl daemon-reload

# 2. Enable service (start on boot)
sudo systemctl enable mev-bot

# 3. Start service
sudo systemctl start mev-bot

# 4. Verify status
sudo systemctl status mev-bot
# Expected: active (running)

# 5. Check logs (add -f to follow; Ctrl+C to stop following)
sudo journalctl -u mev-bot --no-pager --lines=50

# 6. Wait for initialization (30-60 seconds)
sleep 60

# 7. Verify healthy operation
curl -s http://localhost:9090/health/live | jq .
# Expected: {"status": "healthy"}
```
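The fixed `sleep 60` in step 6 can wait longer than needed, or not long enough on a slow start. One alternative, sketched here under the assumption that `/health/live` fails or returns a non-2xx status until the bot is up, is to poll until a command succeeds or a timeout expires. `wait_until` is a hypothetical helper, not something shipped with the bot.

```bash
# wait_until TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 (returns 0) or until roughly TIMEOUT_S
# seconds have passed (returns 1). COMMAND's output is discarded.
wait_until() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  while true; do
    if "$@" >/dev/null 2>&1; then return 0; fi
    if [ "$(date +%s)" -ge "$deadline" ]; then return 1; fi
    sleep "$interval"
  done
}
```

For example, `wait_until 60 2 curl -fsS http://localhost:9090/health/live` returns as soon as the liveness endpoint answers and fails after about 60 seconds otherwise, at which point the journal from step 5 is the next place to look.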

### Phase 4: Validation (15-30 minutes)

```bash
# 1. Monitor for opportunities (Ctrl+C to stop)
tail -f /var/log/mev-bot/mev_bot.log | grep "ARBITRAGE OPPORTUNITY"

# 2. Check metrics endpoint
curl -s http://localhost:9090/metrics | grep mev_

# 3. Verify cache performance
tail -100 /var/log/mev-bot/mev_bot.log | grep "cache metrics"
# Look for hit rate 75-85%

# 4. Check for errors
sudo journalctl -u mev-bot --since "10 minutes ago" | grep ERROR
# Should have minimal errors

# 5. Monitor resource usage
htop  # Check CPU and memory
# CPU should be 50-80%, Memory < 2GB

# 6. Test failover (optional)
# Temporarily block primary RPC, verify fallback works
```

---

## Post-Deployment Validation

### Health Checks

```bash
# Liveness probe (should return 200)
curl -f http://localhost:9090/health/live || echo "LIVENESS FAILED"

# Readiness probe (should return 200)
curl -f http://localhost:9090/health/ready || echo "READINESS FAILED"

# Startup probe (should return 200 after initialization)
curl -f http://localhost:9090/health/startup || echo "STARTUP FAILED"
```
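When these probes are run from a deployment script rather than by hand, it helps to get one exit status for all three. The following is a minimal sketch; `probe` and `check_health` are hypothetical names, and the only endpoints assumed are the three shown above.

```bash
# probe URL -> exit 0 when the endpoint answers with a 2xx status.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# check_health BASE_URL
# Runs the live, ready and startup probes, prints one line per probe,
# and returns 1 if any of them failed.
check_health() {
  local base="$1" failed=0 name
  for name in live ready startup; do
    if probe "$base/health/$name"; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `check_health http://localhost:9090 || exit 1` then stops the rollout on the first unhealthy deployment while still reporting which probe failed.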

### Performance Metrics

```bash
# Check Prometheus metrics
curl -s http://localhost:9090/metrics | grep -E "mev_(opportunities|executions|profit)"

# Expected metrics:
# - mev_opportunities_detected{} <number>
# - mev_opportunities_profitable{} <number>
# - mev_cache_hit_rate{} 0.75-0.85
# - mev_rpc_calls_total{} <number>
```
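Reading the expected ranges off `grep` output by eye is error-prone, so a scripted check may be useful. The sketch below assumes the endpoint serves the standard Prometheus text format, with no spaces inside label values and no trailing timestamps; `metric_value` and `in_range` are hypothetical helpers.

```bash
# metric_value NAME
# Reads Prometheus text exposition on stdin and prints the value of the first
# sample whose metric name is exactly NAME (labels are ignored).
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop any {label="..."} suffix
      if (metric == name) { print $NF; exit }
    }'
}

# in_range VALUE MIN MAX -> exit 0 when MIN <= VALUE <= MAX
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(v >= lo && v <= hi) }'
}
```

For instance, `rate=$(curl -s http://localhost:9090/metrics | metric_value mev_cache_hit_rate)` followed by `in_range "$rate" 0.75 0.85 || echo "cache hit rate out of range: $rate"` turns the expected range above into a pass/fail check.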

### Log Analysis

```bash
# Analyze last hour of logs
./scripts/log-manager.sh analyze

# Check health score (target: > 90)
./scripts/log-manager.sh health

# Expected output:
# Health Score: 95.5/100 (Excellent)
# Error Rate: < 5%
# Cache Hit Rate: 75-85%
```

---

## Monitoring & Alerting

### Key Metrics to Monitor

| Metric | Threshold | Action |
|--------|-----------|--------|
| CPU Usage | > 90% | Scale up or investigate |
| Memory Usage | > 85% | Potential memory leak |
| Error Rate | > 10% | Check logs, may need rollback |
| RPC Failures | > 5/min | Check RPC provider |
| Opportunities/hour | < 1 | May indicate detection issue |
| Cache Hit Rate | < 70% | Review cache configuration |

### Alert Configuration

**Slack Webhook** (edit in `config/alerts.yaml`):
```yaml
alerts:
  slack:
    enabled: true
    webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
    channel: "#mev-bot-alerts"

  thresholds:
    error_rate: 0.10  # 10%
    cpu_usage: 0.90  # 90%
    memory_usage: 0.85  # 85%
    min_opportunities_per_hour: 1
```
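As an illustration of how these thresholds are meant to be read, the sketch below evaluates one set of measurements against them. It is not how the bot's alerting is implemented (that code is not shown here); `breached_thresholds` is a hypothetical helper with the four threshold values from the YAML hard-coded, using strict comparisons as in the Key Metrics table.

```bash
# breached_thresholds ERROR_RATE CPU_USAGE MEMORY_USAGE OPPORTUNITIES_PER_HOUR
# Rates are fractions (0.10 = 10%). Prints the name of each breached
# threshold, one per line, and returns 1 if anything was breached.
breached_thresholds() {
  awk -v err="$1" -v cpu="$2" -v mem="$3" -v opp="$4" 'BEGIN {
    n = 0
    if (err > 0.10) { print "error_rate"; n++ }
    if (cpu > 0.90) { print "cpu_usage"; n++ }
    if (mem > 0.85) { print "memory_usage"; n++ }
    if (opp < 1)    { print "min_opportunities_per_hour"; n++ }
    exit (n > 0)
  }'
}

breached_thresholds 0.02 0.60 0.40 12   # prints nothing, returns 0
breached_thresholds 0.25 0.95 0.40 0    # prints error_rate, cpu_usage, min_opportunities_per_hour
```

A value exactly on a threshold (for example an error rate of 0.10) does not trigger in this sketch; whether the real alerting is inclusive or exclusive at the boundary should be checked against the implementation.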

### Monitoring Commands

```bash
# Real-time monitoring
watch -n 5 'systemctl status mev-bot && curl -s http://localhost:9090/metrics | grep mev_'

# Start monitoring daemon (background)
./scripts/log-manager.sh start-daemon

# View operations dashboard
./scripts/log-manager.sh dashboard
# Opens HTML dashboard in browser
```

---

## Rollback Procedures

### Quick Rollback (< 5 minutes)

Quick Rollback relies on `/opt/mev-bot/bin/mev-bot.backup`, a copy of the previous binary that must have been saved before the new build replaced it.

```bash
# 1. Stop current version
sudo systemctl stop mev-bot

# 2. Restore previous binary
sudo cp /opt/mev-bot/bin/mev-bot.backup /opt/mev-bot/bin/mev-bot

# 3. Restart service
sudo systemctl start mev-bot

# 4. Verify rollback
sudo systemctl status mev-bot
tail -100 /var/log/mev-bot/mev_bot.log
```
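Step 2 fails badly if the backup is missing or truncated, because the service is already stopped at that point. A guarded version of the copy might look like the sketch below; `restore_backup` is a hypothetical helper, and keeping the failed binary as `.failed` is a design choice of this sketch rather than something the runbook prescribes.

```bash
# restore_backup BINARY
# Copies BINARY.backup over BINARY, refusing to proceed when the backup is
# missing or empty. The binary being replaced is kept as BINARY.failed.
restore_backup() {
  local bin="$1" backup="$1.backup"
  if [ ! -s "$backup" ]; then
    echo "no usable backup at $backup" >&2
    return 1
  fi
  if [ -e "$bin" ]; then
    cp -p "$bin" "$bin.failed" || return 1
  fi
  cp -p "$backup" "$bin"
}
```

Checking for the backup before `systemctl stop` (for example with `[ -s /opt/mev-bot/bin/mev-bot.backup ]`) avoids taking the bot down when there is nothing to roll back to.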

### Full Rollback (< 15 minutes)

```bash
# 1. Stop service
sudo systemctl stop mev-bot

# 2. Checkout previous version
cd /opt/mev-bot
git fetch
git checkout <previous-commit-hash>

# 3. Rebuild
make build

# 4. Restart service
sudo systemctl start mev-bot

# 5. Validate
curl http://localhost:9090/health/live
```

---

## Troubleshooting

### Common Issues

#### Issue: Bot fails to start

**Symptoms:**
```
systemctl status mev-bot
● mev-bot.service - MEV Arbitrage Bot
   Loaded: loaded
   Active: failed (Result: exit-code)
```

**Diagnosis:**
```bash
# Check logs
sudo journalctl -u mev-bot -n 100 --no-pager

# Common causes:
# 1. Missing environment variables
# 2. Invalid RPC endpoint
# 3. Permission issues
```

**Solution:**
```bash
# Verify environment file (mode 600, so root is required; output contains secrets)
sudo cat /etc/systemd/system/mev-bot.env

# Test RPC connection manually (ARBITRUM_RPC_ENDPOINT must be set in this shell)
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  "$ARBITRUM_RPC_ENDPOINT"

# Fix permissions
sudo chown -R mev-bot:mev-bot /opt/mev-bot
```

---

#### Issue: High error rate

**Symptoms:**
```
[ERROR] Failed to fetch pool state
[ERROR] RPC call failed
[ERROR] 429 Too Many Requests
```

**Diagnosis:**
```bash
# Check error rate
./scripts/log-manager.sh analyze | grep "Error Rate"

# Check RPC provider status (a healthy endpoint returns the latest block number)
curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  "$ARBITRUM_RPC_ENDPOINT"
```
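If `log-manager.sh` is unavailable, a rough error rate can be computed straight from a log file. The sketch below is not the script's own method: it simply counts lines containing `[ERROR]`, assuming plain-text lines tagged as in the symptoms above. With `LOG_FORMAT=json` the level field would have to be matched instead. `error_rate_pct` is a hypothetical helper.

```bash
# error_rate_pct LOGFILE
# Prints the share of lines containing "[ERROR]" as a percentage with one
# decimal place. An empty file yields 0.0.
error_rate_pct() {
  awk '
    index($0, "[ERROR]") { errors++ }
    END {
      if (NR == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * errors / NR
    }' "$1"
}

# Example (path taken from LOG_OUTPUT in the environment file):
#   error_rate_pct /var/log/mev-bot/mev_bot.log
```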

**Solution:**
```bash
# 1. Enable backup RPC endpoint in config
# 2. Reduce rate limits
# 3. Contact RPC provider
# 4. Switch to different provider
```

---

#### Issue: No opportunities detected

**Symptoms:**
```
Blocks processed: 10000
Opportunities detected: 0
```

**Diagnosis:**
```bash
# Check if events are being detected
tail -100 /var/log/mev-bot/mev_bot.log | grep "processing.*event"

# Check profit thresholds
grep MIN_PROFIT_THRESHOLD /etc/systemd/system/mev-bot.env
```
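The threshold is stored in wei, where a misplaced zero changes the value tenfold and is easy to miss. A small conversion makes the configured amount readable; the sketch below uses string manipulation rather than floating point so that no digits are lost, and assumes a non-negative integer without leading zeros. `wei_to_eth` is a hypothetical helper.

```bash
# wei_to_eth WEI
# Prints WEI as a decimal ETH amount with all 18 fractional digits.
wei_to_eth() {
  local wei="$1" whole frac
  # Left-pad to at least 19 digits: 1 whole digit plus 18 fractional digits.
  while [ "${#wei}" -lt 19 ]; do wei="0$wei"; done
  frac=$(printf '%s' "$wei" | tail -c 18)   # last 18 digits
  whole="${wei%"$frac"}"                    # everything before them
  echo "$whole.$frac"
}

wei_to_eth 50000000000000000     # prints: 0.050000000000000000
wei_to_eth 1000000000000000000   # prints: 1.000000000000000000
```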

**Solution:**
```bash
# 1. Lower MIN_PROFIT_THRESHOLD (carefully!)
# 2. Check market conditions (volatility)
# 3. Verify DEX integrations working
# 4. Review price impact thresholds
```

---

#### Issue: Memory leak

**Symptoms:**
```
Memory usage increasing over time
OOM killer may terminate process
```

**Diagnosis:**
```bash
# Monitor memory over time
watch -n 10 'ps aux | grep mev-bot | grep -v grep'

# Generate heap profile
curl -s http://localhost:9090/debug/pprof/heap > heap.prof
go tool pprof heap.prof
```
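Watching `ps` output shows the trend but does not quantify it. A sketch for putting a number on the growth between two samples follows; `sample_rss_kb` and `rss_growth_pct` are hypothetical helpers, and the ten-minute interval in the example is an arbitrary choice, not a recommendation from the bot's authors.

```bash
# sample_rss_kb PID -> resident set size in KiB, as reported by ps
sample_rss_kb() {
  ps -o rss= -p "$1" | tr -d ' '
}

# rss_growth_pct FIRST_KB LAST_KB
# Prints the integer percentage change from FIRST_KB to LAST_KB.
rss_growth_pct() {
  local first="$1" last="$2"
  if [ "$first" -le 0 ]; then
    echo "first sample must be positive" >&2
    return 1
  fi
  echo $(( (last - first) * 100 / first ))
}

# Example: compare two samples taken ten minutes apart.
#   pid=$(pgrep -o mev-bot)
#   first=$(sample_rss_kb "$pid"); sleep 600; last=$(sample_rss_kb "$pid")
#   rss_growth_pct "$first" "$last"
```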

**Solution:**
```bash
# 1. Restart service (temporary fix)
sudo systemctl restart mev-bot

# 2. Investigate with profiler
# 3. Check for goroutine leaks
curl -s "http://localhost:9090/debug/pprof/goroutine?debug=1"

# 4. May need code fix and redeploy
```

---

## Emergency Contacts

| Role | Name | Contact | Availability |
|------|------|---------|--------------|
| On-Call Engineer | TBD | +1-XXX-XXX-XXXX | 24/7 |
| DevOps Lead | TBD | Slack: @devops | Business hours |
| Product Owner | TBD | Email: product@company.com | Business hours |

## Change Log

| Date | Version | Changes | Author |
|------|---------|---------|--------|
| 2025-10-28 | 1.0 | Initial runbook | Claude Code |

---

**END OF RUNBOOK**

**Remember:**
1. Always test in staging first
2. Have rollback plan ready
3. Monitor closely after deployment
4. Document any issues encountered
5. Keep this runbook updated