feat(optimization): add pool detection, price impact validation, and production infrastructure
This commit adds critical production-ready optimizations and infrastructure: New Features: 1. Pool Version Detector - Detects pool versions before calling slot0() - Eliminates ABI unpacking errors from V2 pools - Caches detection results for performance 2. Price Impact Validation System - Comprehensive risk categorization - Three threshold profiles (Conservative, Default, Aggressive) - Automatic trade splitting recommendations - All tests passing (10/10) 3. Flash Loan Execution Architecture - Complete execution flow design - Multi-provider support (Aave, Balancer, Uniswap) - Safety and risk management systems - Transaction signing and dispatch strategies 4. 24-Hour Validation Test Infrastructure - Production testing framework - Comprehensive monitoring with real-time metrics - Automatic report generation - System health tracking 5. Production Deployment Runbook - Complete deployment procedures - Pre-deployment checklist - Configuration templates - Monitoring and rollback procedures Files Added: - pkg/uniswap/pool_detector.go (273 lines) - pkg/validation/price_impact_validator.go (265 lines) - pkg/validation/price_impact_validator_test.go (242 lines) - docs/architecture/flash_loan_execution_architecture.md (808 lines) - docs/PRODUCTION_DEPLOYMENT_RUNBOOK.md (615 lines) - scripts/24h-validation-test.sh (352 lines) Testing: Core functionality tests passing. Stress test showing 867 TPS (below 1000 TPS target - to be investigated) Impact: Ready for 24-hour validation test and production deployment 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
615
docs/PRODUCTION_DEPLOYMENT_RUNBOOK.md
Normal file
615
docs/PRODUCTION_DEPLOYMENT_RUNBOOK.md
Normal file
@@ -0,0 +1,615 @@
|
||||
# MEV Bot - Production Deployment Runbook
|
||||
**Version:** 1.0
|
||||
**Last Updated:** October 28, 2025
|
||||
**Audience:** DevOps, Production Engineers
|
||||
|
||||
---
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Pre-Deployment Checklist](#pre-deployment-checklist)
|
||||
2. [Environment Setup](#environment-setup)
|
||||
3. [Configuration](#configuration)
|
||||
4. [Deployment Steps](#deployment-steps)
|
||||
5. [Post-Deployment Validation](#post-deployment-validation)
|
||||
6. [Monitoring & Alerting](#monitoring--alerting)
|
||||
7. [Rollback Procedures](#rollback-procedures)
|
||||
8. [Troubleshooting](#troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## Pre-Deployment Checklist
|
||||
|
||||
### Code Readiness
|
||||
- [ ] All tests passing (`make test`)
|
||||
- [ ] Security audit completed and issues addressed
|
||||
- [ ] Code review approved
|
||||
- [ ] 24-hour validation test completed successfully
|
||||
- [ ] Performance benchmarks meet targets
|
||||
- [ ] No critical TODOs in codebase
|
||||
|
||||
### Infrastructure Readiness
|
||||
- [ ] RPC endpoints configured and tested
|
||||
- [ ] Private key/wallet funded with gas (minimum 0.1 ETH)
|
||||
- [ ] Monitoring systems operational
|
||||
- [ ] Alert channels configured (Slack, email, PagerDuty)
|
||||
- [ ] Backup RPC endpoints ready
|
||||
- [ ] Database/storage systems ready
|
||||
|
||||
### Team Readiness
|
||||
- [ ] On-call engineer assigned
|
||||
- [ ] Runbook reviewed by team
|
||||
- [ ] Communication channels established
|
||||
- [ ] Rollback plan understood
|
||||
- [ ] Emergency contacts documented
|
||||
|
||||
---
|
||||
|
||||
## Environment Setup
|
||||
|
||||
### System Requirements
|
||||
|
||||
**Minimum:**
|
||||
- CPU: 4 cores
|
||||
- RAM: 8 GB
|
||||
- Disk: 50 GB SSD
|
||||
- Network: 100 Mbps, low latency
|
||||
|
||||
**Recommended (Production):**
|
||||
- CPU: 8 cores
|
||||
- RAM: 16 GB
|
||||
- Disk: 100 GB NVMe SSD
|
||||
- Network: 1 Gbps, < 20ms latency to Arbitrum RPC
|
||||
|
||||
### Dependencies
|
||||
|
||||
```bash
|
||||
# Install Go 1.24+
|
||||
wget https://go.dev/dl/go1.24.linux-amd64.tar.gz
|
||||
sudo tar -C /usr/local -xzf go1.24.linux-amd64.tar.gz
|
||||
export PATH=$PATH:/usr/local/go/bin
|
||||
|
||||
# Verify installation
|
||||
go version # Should show go1.24 or later
|
||||
|
||||
# Install build tools
|
||||
sudo apt-get update
|
||||
sudo apt-get install -y build-essential git curl
|
||||
```
|
||||
|
||||
### Repository Setup
|
||||
|
||||
```bash
|
||||
# Clone repository
|
||||
git clone https://github.com/your-org/mev-beta.git
|
||||
cd mev-beta
|
||||
|
||||
# Checkout production branch
|
||||
git checkout feature/production-profit-optimization
|
||||
|
||||
# Verify correct branch
|
||||
git log -1 --oneline
|
||||
|
||||
# Install dependencies
|
||||
go mod download
|
||||
go mod verify
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### 1. Environment Variables
|
||||
|
||||
Create `/etc/systemd/system/mev-bot.env`:
|
||||
|
||||
```bash
|
||||
# RPC Configuration
|
||||
ARBITRUM_RPC_ENDPOINT=https://arbitrum-mainnet.core.chainstack.com/YOUR_KEY
|
||||
ARBITRUM_WS_ENDPOINT=wss://arbitrum-mainnet.core.chainstack.com/YOUR_KEY
|
||||
|
||||
# Backup RPC (fallback)
|
||||
BACKUP_RPC_ENDPOINT=https://arb1.arbitrum.io/rpc
|
||||
|
||||
# Application Configuration
|
||||
LOG_LEVEL=info
|
||||
LOG_FORMAT=json
|
||||
LOG_OUTPUT=/var/log/mev-bot/mev_bot.log
|
||||
|
||||
# Metrics & Monitoring
|
||||
METRICS_ENABLED=true
|
||||
METRICS_PORT=9090
|
||||
|
||||
# Security
|
||||
MEV_BOT_ENCRYPTION_KEY=your-32-char-encryption-key-here-minimum-length-required
|
||||
|
||||
# Execution Configuration (IMPORTANT: Set to false for detection-only mode)
|
||||
EXECUTION_ENABLED=false
|
||||
MAX_POSITION_SIZE=1000000000000000000 # 1 ETH in wei
|
||||
MIN_PROFIT_THRESHOLD=50000000000000000 # 0.05 ETH in wei
|
||||
|
||||
# Provider Configuration
|
||||
PROVIDER_CONFIG_PATH=/opt/mev-bot/config/providers_runtime.yaml
|
||||
```
|
||||
|
||||
**CRITICAL:** Never commit `.env` files with real credentials to version control!
|
||||
|
||||
### 2. Provider Configuration
|
||||
|
||||
Edit `config/providers_runtime.yaml`:
|
||||
|
||||
```yaml
|
||||
providers:
|
||||
- name: "chainstack-primary"
|
||||
endpoint: "${ARBITRUM_RPC_ENDPOINT}"
|
||||
type: "https"
|
||||
weight: 100
|
||||
timeout: 30s
|
||||
rateLimit: 100
|
||||
|
||||
- name: "chainstack-websocket"
|
||||
endpoint: "${ARBITRUM_WS_ENDPOINT}"
|
||||
type: "wss"
|
||||
weight: 90
|
||||
timeout: 30s
|
||||
rateLimit: 100
|
||||
|
||||
- name: "public-fallback"
|
||||
endpoint: "https://arb1.arbitrum.io/rpc"
|
||||
type: "https"
|
||||
weight: 50
|
||||
timeout: 30s
|
||||
rateLimit: 50
|
||||
|
||||
pooling:
|
||||
maxIdleConnections: 10
|
||||
maxOpenConnections: 50
|
||||
connectionTimeout: 30s
|
||||
idleTimeout: 300s
|
||||
|
||||
retry:
|
||||
maxRetries: 3
|
||||
retryDelay: 1s
|
||||
backoffMultiplier: 2
|
||||
maxBackoff: 8s
|
||||
```
|
||||
|
||||
### 3. Systemd Service Configuration
|
||||
|
||||
Create `/etc/systemd/system/mev-bot.service`:
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=MEV Arbitrage Bot
|
||||
After=network.target
|
||||
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=mev-bot
|
||||
Group=mev-bot
|
||||
WorkingDirectory=/opt/mev-bot
|
||||
EnvironmentFile=/etc/systemd/system/mev-bot.env
|
||||
|
||||
ExecStart=/opt/mev-bot/bin/mev-bot start
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
KillMode=process
|
||||
Restart=on-failure
|
||||
RestartSec=10s
|
||||
|
||||
# Resource limits
|
||||
LimitNOFILE=65536
|
||||
MemoryMax=4G
|
||||
CPUQuota=400%
|
||||
|
||||
# Security hardening
|
||||
NoNewPrivileges=true
|
||||
PrivateTmp=true
|
||||
ProtectSystem=strict
|
||||
ProtectHome=true
|
||||
ReadWritePaths=/var/log/mev-bot /opt/mev-bot/data
|
||||
|
||||
# Logging
|
||||
StandardOutput=journal
|
||||
StandardError=journal
|
||||
SyslogIdentifier=mev-bot
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Deployment Steps
|
||||
|
||||
### Phase 1: Build & Prepare (10-15 minutes)
|
||||
|
||||
```bash
|
||||
# 1. Build binary
|
||||
cd /opt/mev-bot
|
||||
make build
|
||||
|
||||
# Verify binary
|
||||
./bin/mev-bot --version
|
||||
# Expected: MEV Bot v1.0.0 (or similar)
|
||||
|
||||
# 2. Run tests
|
||||
make test
|
||||
# Ensure all tests pass
|
||||
|
||||
# 3. Check binary size and dependencies
|
||||
ls -lh bin/mev-bot
|
||||
ldd bin/mev-bot # Should show minimal dependencies
|
||||
|
||||
# 4. Create necessary directories
|
||||
sudo mkdir -p /var/log/mev-bot
|
||||
sudo mkdir -p /opt/mev-bot/data
|
||||
sudo chown -R mev-bot:mev-bot /var/log/mev-bot /opt/mev-bot/data
|
||||
|
||||
# 5. Set permissions
|
||||
chmod +x bin/mev-bot
|
||||
chmod 600 /etc/systemd/system/mev-bot.env # Protect sensitive config
|
||||
```
|
||||
|
||||
### Phase 2: Dry Run (5-10 minutes)
|
||||
|
||||
```bash
|
||||
# Run bot in foreground to verify configuration
|
||||
sudo -u mev-bot /opt/mev-bot/bin/mev-bot start &
|
||||
BOT_PID=$!
|
||||
|
||||
# Wait 2 minutes for initialization
|
||||
sleep 120
|
||||
|
||||
# Check if running
|
||||
ps aux | grep mev-bot
|
||||
|
||||
# Check logs for errors
|
||||
tail -100 /var/log/mev-bot/mev_bot.log | grep -i error
|
||||
|
||||
# Verify RPC connection
|
||||
tail -100 /var/log/mev-bot/mev_bot.log | grep -i "connected"
|
||||
|
||||
# Stop dry run
|
||||
kill $BOT_PID
|
||||
```
|
||||
|
||||
### Phase 3: Production Start (5 minutes)
|
||||
|
||||
```bash
|
||||
# 1. Reload systemd
|
||||
sudo systemctl daemon-reload
|
||||
|
||||
# 2. Enable service (start on boot)
|
||||
sudo systemctl enable mev-bot
|
||||
|
||||
# 3. Start service
|
||||
sudo systemctl start mev-bot
|
||||
|
||||
# 4. Verify status
|
||||
sudo systemctl status mev-bot
|
||||
# Expected: active (running)
|
||||
|
||||
# 5. Check logs
|
||||
sudo journalctl -u mev-bot -f --lines=50
|
||||
|
||||
# 6. Wait for initialization (30-60 seconds)
|
||||
sleep 60
|
||||
|
||||
# 7. Verify healthy operation
|
||||
curl -s http://localhost:9090/health/live | jq .
|
||||
# Expected: {"status": "healthy"}
|
||||
```
|
||||
|
||||
### Phase 4: Validation (15-30 minutes)
|
||||
|
||||
```bash
|
||||
# 1. Monitor for opportunities
|
||||
tail -f /var/log/mev-bot/mev_bot.log | grep "ARBITRAGE OPPORTUNITY"
|
||||
|
||||
# 2. Check metrics endpoint
|
||||
curl -s http://localhost:9090/metrics | grep mev_
|
||||
|
||||
# 3. Verify cache performance
|
||||
tail -100 /var/log/mev-bot/mev_bot.log | grep "cache metrics"
|
||||
# Look for hit rate 75-85%
|
||||
|
||||
# 4. Check for errors
|
||||
sudo journalctl -u mev-bot --since "10 minutes ago" | grep ERROR
|
||||
# Should have minimal errors
|
||||
|
||||
# 5. Monitor resource usage
|
||||
htop # Check CPU and memory
|
||||
# CPU should be 50-80%, Memory < 2GB
|
||||
|
||||
# 6. Test failover (optional)
|
||||
# Temporarily block primary RPC, verify fallback works
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Post-Deployment Validation
|
||||
|
||||
### Health Checks
|
||||
|
||||
```bash
|
||||
# Liveness probe (should return 200)
|
||||
curl -f http://localhost:9090/health/live || echo "LIVENESS FAILED"
|
||||
|
||||
# Readiness probe (should return 200)
|
||||
curl -f http://localhost:9090/health/ready || echo "READINESS FAILED"
|
||||
|
||||
# Startup probe (should return 200 after initialization)
|
||||
curl -f http://localhost:9090/health/startup || echo "STARTUP FAILED"
|
||||
```
|
||||
|
||||
### Performance Metrics
|
||||
|
||||
```bash
|
||||
# Check Prometheus metrics
|
||||
curl -s http://localhost:9090/metrics | grep -E "mev_(opportunities|executions|profit)"
|
||||
|
||||
# Expected metrics:
|
||||
# - mev_opportunities_detected{} <number>
|
||||
# - mev_opportunities_profitable{} <number>
|
||||
# - mev_cache_hit_rate{} 0.75-0.85
|
||||
# - mev_rpc_calls_total{} <number>
|
||||
```
|
||||
|
||||
### Log Analysis
|
||||
|
||||
```bash
|
||||
# Analyze last hour of logs
|
||||
./scripts/log-manager.sh analyze
|
||||
|
||||
# Check health score (target: > 90)
|
||||
./scripts/log-manager.sh health
|
||||
|
||||
# Expected output:
|
||||
# Health Score: 95.5/100 (Excellent)
|
||||
# Error Rate: < 5%
|
||||
# Cache Hit Rate: 75-85%
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring & Alerting
|
||||
|
||||
### Key Metrics to Monitor
|
||||
|
||||
| Metric | Threshold | Action |
|
||||
|--------|-----------|--------|
|
||||
| CPU Usage | > 90% | Scale up or investigate |
|
||||
| Memory Usage | > 85% | Potential memory leak |
|
||||
| Error Rate | > 10% | Check logs, may need rollback |
|
||||
| RPC Failures | > 5/min | Check RPC provider |
|
||||
| Opportunities/hour | < 1 | May indicate detection issue |
|
||||
| Cache Hit Rate | < 70% | Review cache configuration |
|
||||
|
||||
### Alert Configuration
|
||||
|
||||
**Slack Webhook** (edit in `config/alerts.yaml`):
|
||||
```yaml
|
||||
alerts:
|
||||
slack:
|
||||
enabled: true
|
||||
webhook_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
|
||||
channel: "#mev-bot-alerts"
|
||||
|
||||
thresholds:
|
||||
error_rate: 0.10 # 10%
|
||||
cpu_usage: 0.90 # 90%
|
||||
memory_usage: 0.85 # 85%
|
||||
min_opportunities_per_hour: 1
|
||||
```
|
||||
|
||||
### Monitoring Commands
|
||||
|
||||
```bash
|
||||
# Real-time monitoring
|
||||
watch -n 5 'systemctl status mev-bot && curl -s http://localhost:9090/metrics | grep mev_'
|
||||
|
||||
# Start monitoring daemon (background)
|
||||
./scripts/log-manager.sh start-daemon
|
||||
|
||||
# View operations dashboard
|
||||
./scripts/log-manager.sh dashboard
|
||||
# Opens HTML dashboard in browser
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rollback Procedures
|
||||
|
||||
### Quick Rollback (< 5 minutes)
|
||||
|
||||
```bash
|
||||
# 1. Stop current version
|
||||
sudo systemctl stop mev-bot
|
||||
|
||||
# 2. Restore previous binary
|
||||
sudo cp /opt/mev-bot/bin/mev-bot.backup /opt/mev-bot/bin/mev-bot
|
||||
|
||||
# 3. Restart service
|
||||
sudo systemctl start mev-bot
|
||||
|
||||
# 4. Verify rollback
|
||||
sudo systemctl status mev-bot
|
||||
tail -100 /var/log/mev-bot/mev_bot.log
|
||||
```
|
||||
|
||||
### Full Rollback (< 15 minutes)
|
||||
|
||||
```bash
|
||||
# 1. Stop service
|
||||
sudo systemctl stop mev-bot
|
||||
|
||||
# 2. Checkout previous version
|
||||
cd /opt/mev-bot
|
||||
git fetch
|
||||
git checkout <previous-commit-hash>
|
||||
|
||||
# 3. Rebuild
|
||||
make build
|
||||
|
||||
# 4. Restart service
|
||||
sudo systemctl start mev-bot
|
||||
|
||||
# 5. Validate
|
||||
curl http://localhost:9090/health/live
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### Issue: Bot fails to start
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
systemctl status mev-bot
|
||||
● mev-bot.service - MEV Arbitrage Bot
|
||||
Loaded: loaded
|
||||
Active: failed (Result: exit-code)
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check logs
|
||||
sudo journalctl -u mev-bot -n 100 --no-pager
|
||||
|
||||
# Common causes:
|
||||
# 1. Missing environment variables
|
||||
# 2. Invalid RPC endpoint
|
||||
# 3. Permission issues
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Verify environment file
|
||||
cat /etc/systemd/system/mev-bot.env
|
||||
|
||||
# Test RPC connection manually
|
||||
curl -X POST -H "Content-Type: application/json" \
|
||||
--data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
|
||||
$ARBITRUM_RPC_ENDPOINT
|
||||
|
||||
# Fix permissions
|
||||
sudo chown -R mev-bot:mev-bot /opt/mev-bot
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### Issue: High error rate
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
[ERROR] Failed to fetch pool state
|
||||
[ERROR] RPC call failed
|
||||
[ERROR] 429 Too Many Requests
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check error rate
|
||||
./scripts/log-manager.sh analyze | grep "Error Rate"
|
||||
|
||||
# Check RPC provider status
|
||||
curl -s $ARBITRUM_RPC_ENDPOINT
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# 1. Enable backup RPC endpoint in config
|
||||
# 2. Reduce rate limits
|
||||
# 3. Contact RPC provider
|
||||
# 4. Switch to different provider
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### Issue: No opportunities detected
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Blocks processed: 10000
|
||||
Opportunities detected: 0
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Check if events are being detected
|
||||
tail -100 /var/log/mev-bot/mev_bot.log | grep "processing.*event"
|
||||
|
||||
# Check profit thresholds
|
||||
grep MIN_PROFIT_THRESHOLD /etc/systemd/system/mev-bot.env
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# 1. Lower MIN_PROFIT_THRESHOLD (carefully!)
|
||||
# 2. Check market conditions (volatility)
|
||||
# 3. Verify DEX integrations working
|
||||
# 4. Review price impact thresholds
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
#### Issue: Memory leak
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Memory usage increasing over time
|
||||
OOM killer may terminate process
|
||||
```
|
||||
|
||||
**Diagnosis:**
|
||||
```bash
|
||||
# Monitor memory over time
|
||||
watch -n 10 'ps aux | grep mev-bot | grep -v grep'
|
||||
|
||||
# Generate heap profile
|
||||
curl http://localhost:9090/debug/pprof/heap > heap.prof
|
||||
go tool pprof heap.prof
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# 1. Restart service (temporary fix)
|
||||
sudo systemctl restart mev-bot
|
||||
|
||||
# 2. Investigate with profiler
|
||||
# 3. Check for goroutine leaks
|
||||
curl http://localhost:9090/debug/pprof/goroutine?debug=1
|
||||
|
||||
# 4. May need code fix and redeploy
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Emergency Contacts
|
||||
|
||||
| Role | Name | Contact | Availability |
|
||||
|------|------|---------|--------------|
|
||||
| On-Call Engineer | TBD | +1-XXX-XXX-XXXX | 24/7 |
|
||||
| DevOps Lead | TBD | Slack: @devops | Business hours |
|
||||
| Product Owner | TBD | Email: product@company.com | Business hours |
|
||||
|
||||
## Change Log
|
||||
|
||||
| Date | Version | Changes | Author |
|
||||
|------|---------|---------|--------|
|
||||
| 2025-10-28 | 1.0 | Initial runbook | Claude Code |
|
||||
|
||||
---
|
||||
|
||||
**END OF RUNBOOK**
|
||||
|
||||
**Remember:**
|
||||
1. Always test in staging first
|
||||
2. Have rollback plan ready
|
||||
3. Monitor closely after deployment
|
||||
4. Document any issues encountered
|
||||
5. Keep this runbook updated
|
||||
Reference in New Issue
Block a user