# Critical Fixes and Recommendations

**Date**: 2025-10-30
**Priority**: URGENT - Production System Failure
**Related**: LOG_ANALYSIS_COMPREHENSIVE_REPORT_20251030.md

## 🚨 IMMEDIATE ACTIONS (Next 24 Hours)

### Priority 0: Fix WebSocket Connection
**Issue**: 9,065 "unsupported protocol scheme wss" errors
**Impact**: Cannot connect to the Arbitrum network via WebSocket

#### Root Cause
The code uses an HTTP client (`http.Post`) to connect to WebSocket URLs (`wss://`); Go's HTTP transport rejects the `wss` scheme, which produces the error above.

#### Fix Required

**File**: `pkg/arbitrum/connection.go` or `pkg/monitor/concurrent.go`

**Current (Incorrect)**:
```go
// Somewhere in connection initialization
client, err := rpc.Dial(wsEndpoint)     // or similar HTTP-based call
resp, err := http.Post(wsEndpoint, ...) // WRONG for WebSocket
```

**Fixed (Correct)**:
```go
import (
	"fmt"

	"github.com/ethereum/go-ethereum/ethclient"
)

// For WebSocket connections. ethclient.Dial picks the transport from the
// URL scheme, so wss:// endpoints get a real WebSocket connection.
func connectWebSocket(wsURL string) (*ethclient.Client, error) {
	client, err := ethclient.Dial(wsURL)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to %s: %w", wsURL, err)
	}
	return client, nil
}

// For HTTP connections (fallback)
func connectHTTP(httpURL string) (*ethclient.Client, error) {
	client, err := ethclient.Dial(httpURL)
	if err != nil {
		return nil, fmt.Errorf("failed to connect to %s: %w", httpURL, err)
	}
	return client, nil
}
```

**Implementation Steps**:
1. Locate the RPC client initialization code
2. Check whether it calls `rpc.Dial()` or `ethclient.Dial()`
3. Ensure WebSocket URLs go through `ethclient.Dial()` directly (see the dispatcher sketch below)
4. Remove any HTTP POST attempts to WebSocket endpoints
5. Test the connection with: `timeout 30 ./mev-bot start`

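As a guard for step 3, a small dispatcher can route each configured endpoint to the right helper. This is a minimal sketch, not code from the repository; `connectWebSocket`/`connectHTTP` are the functions defined above.

```go
import (
	"fmt"
	"net/url"

	"github.com/ethereum/go-ethereum/ethclient"
)

// dialEndpoint is a hypothetical helper: it rejects unknown schemes early
// instead of letting an HTTP client choke on wss:// at request time.
func dialEndpoint(endpoint string) (*ethclient.Client, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
	}
	switch u.Scheme {
	case "ws", "wss":
		return connectWebSocket(endpoint)
	case "http", "https":
		return connectHTTP(endpoint)
	default:
		return nil, fmt.Errorf("unsupported scheme %q in endpoint %s", u.Scheme, endpoint)
	}
}
```
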
**Validation**:
```bash
# Should see successful WebSocket connection
LOG_LEVEL=debug ./mev-bot start 2>&1 | grep -i "websocket\|wss"
```

### Priority 0: Fix Zero Address Parsing
**Issue**: 100% of liquidity events contain zero addresses
**Impact**: Invalid event data, corrupted arbitrage detection

#### Root Cause
Token address extraction from transaction logs returns zero addresses instead of the actual token addresses.

#### Fix Required

**File**: `pkg/arbitrum/abi_decoder.go`

**Current Issue**: The token extraction logic is likely doing:
```go
// WRONG - returning zero address on extraction failure
func extractTokenAddress(log types.Log) common.Address {
	// If parsing fails, returns common.Address{} which is 0x000...
	return common.Address{}
}
```

**Fixed Implementation**:
```go
func extractTokenAddress(log types.Log, topicIndex int) (common.Address, error) {
	if len(log.Topics) <= topicIndex {
		return common.Address{}, fmt.Errorf("topic index %d out of range", topicIndex)
	}

	address := common.BytesToAddress(log.Topics[topicIndex].Bytes())

	// CRITICAL: Validate address is not zero
	if address == (common.Address{}) {
		return common.Address{}, fmt.Errorf("extracted zero address from topic %d", topicIndex)
	}

	return address, nil
}

// For event parsing
func parseSwapEvent(log types.Log) (*SwapEvent, error) {
	// Extract token addresses from pool
	pool, err := getPoolContract(log.Address)
	if err != nil {
		return nil, fmt.Errorf("failed to get pool: %w", err)
	}

	token0, err := pool.Token0(nil)
	if err != nil {
		return nil, fmt.Errorf("failed to get token0: %w", err)
	}

	token1, err := pool.Token1(nil)
	if err != nil {
		return nil, fmt.Errorf("failed to get token1: %w", err)
	}

	// Validate addresses
	if token0 == (common.Address{}) || token1 == (common.Address{}) {
		return nil, fmt.Errorf("zero address detected: token0=%s, token1=%s", token0.Hex(), token1.Hex())
	}

	return &SwapEvent{
		Token0Address: token0,
		Token1Address: token1,
		// ...
	}, nil
}
```

**Additional Checks Needed**:
1. Add validation before event submission (see the sketch below)
2. Log and skip events with zero addresses
3. Add metrics for zero address detections
4. Review pool contract call logic

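A minimal sketch of checks 1-3, assuming the Prometheus Go client is (or can be made) a dependency; the counter name and `source` label are placeholders, not existing code.

```go
import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
	"github.com/prometheus/client_golang/prometheus"
)

// Counter for check 3; register it wherever the other metrics live.
var zeroAddressDetections = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "mev_zero_address_detections_total",
		Help: "Events dropped because a token address decoded to 0x0.",
	},
	[]string{"source"},
)

// validateEvent implements checks 1 and 2: count, report, and skip.
func validateEvent(ev *SwapEvent) error {
	if ev.Token0Address == (common.Address{}) || ev.Token1Address == (common.Address{}) {
		zeroAddressDetections.WithLabelValues("swap_event").Inc()
		return fmt.Errorf("zero token address in event, skipping")
	}
	return nil
}
```
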
**Validation**:
```bash
# Stream token addresses from new events; with the fix in place this
# should print only non-zero addresses
tail -f logs/liquidity_events_*.jsonl | jq -r '.token0Address, .token1Address' | grep -v "0x0000000000000000000000000000000000000000"
```

### Priority 0: Implement Rate Limiting Strategy
**Issue**: 100,709 rate limit errors (429 Too Many Requests)
**Impact**: Service degradation, failed API calls, incomplete data

#### Short-Term Fix (Immediate)
**File**: `internal/config/config.go` and `pkg/arbitrum/connection.go`

```go
import (
	"context"
	"fmt"
	"log"
	"strings"
	"time"

	"golang.org/x/time/rate"
)

type RateLimiter struct {
	limiter    *rate.Limiter
	maxRetries int
	backoff    time.Duration
}

func NewRateLimiter(rps int, burst int) *RateLimiter {
	return &RateLimiter{
		limiter:    rate.NewLimiter(rate.Limit(rps), burst),
		maxRetries: 3,
		backoff:    time.Second,
	}
}

func (rl *RateLimiter) Do(ctx context.Context, fn func() error) error {
	for attempt := 0; attempt <= rl.maxRetries; attempt++ {
		// Wait for a rate limit token
		if err := rl.limiter.Wait(ctx); err != nil {
			return fmt.Errorf("rate limiter error: %w", err)
		}

		err := fn()
		if err == nil {
			return nil
		}

		// Check if it's a rate limit error
		if strings.Contains(err.Error(), "429") || strings.Contains(err.Error(), "Too Many Requests") {
			backoff := rl.backoff * time.Duration(1<<attempt) // Exponential backoff
			log.Printf("Rate limited, backing off for %v (attempt %d/%d)", backoff, attempt+1, rl.maxRetries)
			// Sleep, but abort early if the context is cancelled
			select {
			case <-time.After(backoff):
			case <-ctx.Done():
				return ctx.Err()
			}
			continue
		}

		return err // Non-rate-limit error
	}

	return fmt.Errorf("max retries exceeded")
}
```

**Configuration**:
```yaml
# config/arbitrum_production.yaml
rpc:
  rate_limit:
    requests_per_second: 10  # Conservative limit
    burst: 20
    max_retries: 3
    backoff_seconds: 1
```

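One way to map that YAML onto the limiter; the struct and field names here are illustrative, not the existing `internal/config` layout.

```go
// Hypothetical mapping of the YAML above.
type RateLimitConfig struct {
	RequestsPerSecond int `yaml:"requests_per_second"`
	Burst             int `yaml:"burst"`
	MaxRetries        int `yaml:"max_retries"`
	BackoffSeconds    int `yaml:"backoff_seconds"`
}

// Wiring at startup, assuming cfg.RPC.RateLimit holds the parsed values:
limiter := NewRateLimiter(cfg.RPC.RateLimit.RequestsPerSecond, cfg.RPC.RateLimit.Burst)
```
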
**Apply to all RPC calls**:
```go
// Example usage. Note the assignment: declaring block outside the closure
// avoids an unused-variable compile error and keeps the result available
// after Do returns.
var block *types.Block
err := rateLimiter.Do(ctx, func() error {
	var innerErr error
	block, innerErr = client.BlockByNumber(ctx, blockNum)
	return innerErr
})
```

#### Long-Term Fix (48 hours)
**Upgrade RPC Provider**:
1. **Option A**: Purchase a paid Chainstack plan with higher RPS limits
2. **Option B**: Add multiple RPC providers with load balancing
3. **Option C**: Run a local Arbitrum archive node

**Recommended Multi-Provider Setup**:
```go
type RPCProvider struct {
	Name     string
	Endpoint string
	RPS      int
	Priority int
}

var providers = []RPCProvider{
	{Name: "Chainstack", Endpoint: "wss://arbitrum-mainnet.core.chainstack.com/...", RPS: 25, Priority: 1},
	{Name: "Alchemy", Endpoint: "wss://arb-mainnet.g.alchemy.com/v2/YOUR_KEY", RPS: 50, Priority: 2},
	{Name: "Infura", Endpoint: "wss://arbitrum-mainnet.infura.io/ws/v3/YOUR_KEY", RPS: 50, Priority: 3},
	{Name: "Fallback", Endpoint: "https://arb1.arbitrum.io/rpc", RPS: 5, Priority: 4},
}
```

## 🔧 CRITICAL FIXES (24-48 Hours)

### Fix 1: Connection Manager Resilience

**File**: `pkg/arbitrum/connection.go`

**Enhanced Connection Manager**:

```go
type EnhancedConnectionManager struct {
	providers      []RPCProvider
	activeProvider int
	rateLimiters   map[string]*RateLimiter
	healthChecks   map[string]*HealthStatus
	mu             sync.RWMutex
}

type HealthStatus struct {
	LastCheck    time.Time
	IsHealthy    bool
	ErrorCount   int
	SuccessCount int
	Latency      time.Duration
}

func (m *EnhancedConnectionManager) GetClient(ctx context.Context) (*ethclient.Client, error) {
	// Snapshot state under the read lock, then release it; holding the read
	// lock across the health updates below would deadlock against their
	// write lock.
	m.mu.RLock()
	providers := m.sortedProviders()
	m.mu.RUnlock()

	// Try providers in priority order
	for _, provider := range providers {
		m.mu.RLock()
		health, known := m.healthChecks[provider.Name]
		limiter := m.rateLimiters[provider.Name]
		m.mu.RUnlock()

		// Skip unknown or unhealthy providers
		if !known || !health.IsHealthy {
			continue
		}

		// Apply rate limiting
		var client *ethclient.Client
		err := limiter.Do(ctx, func() error {
			c, err := ethclient.DialContext(ctx, provider.Endpoint)
			if err != nil {
				return err
			}
			client = c
			return nil
		})

		if err == nil {
			m.updateHealthSuccess(provider.Name)
			return client, nil
		}

		m.updateHealthFailure(provider.Name, err)
	}

	return nil, fmt.Errorf("all RPC providers unavailable")
}

func (m *EnhancedConnectionManager) StartHealthChecks(ctx context.Context) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			m.checkAllProviders(ctx)
		}
	}
}
```

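`checkAllProviders`, `sortedProviders`, and the health-update helpers are left to the implementation. A per-provider probe could look like the following sketch, which uses a cheap `eth_blockNumber` call as a combined liveness and latency check; the method name is an assumption, not existing code.

```go
// checkProvider probes one endpoint and records the outcome.
func (m *EnhancedConnectionManager) checkProvider(ctx context.Context, p RPCProvider) {
	start := time.Now()

	client, err := ethclient.DialContext(ctx, p.Endpoint)
	if err == nil {
		_, err = client.BlockNumber(ctx) // cheap eth_blockNumber probe
		client.Close()
	}

	m.mu.Lock()
	defer m.mu.Unlock()
	health := m.healthChecks[p.Name]
	if health == nil {
		health = &HealthStatus{}
		m.healthChecks[p.Name] = health
	}
	health.LastCheck = time.Now()
	health.Latency = time.Since(start)
	if err != nil {
		health.ErrorCount++
		health.IsHealthy = false
		return
	}
	health.SuccessCount++
	health.IsHealthy = true
}
```
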
**Validation**:
```bash
# Monitor connection switching
LOG_LEVEL=debug ./mev-bot start 2>&1 | grep -i "provider\|connection\|health"
```

### Fix 2: Correct Health Scoring

**File**: `scripts/log-manager.sh:188`

**Current Bug**:
```bash
# Line 188 - unquoted variable causing "[: too many arguments"
if [ $error_rate -gt 10 ]; then
```

**Fixed**:
```bash
# Properly quote variables and handle empty values
if [ -n "$error_rate" ] && [ "$(echo "$error_rate > 10" | bc)" -eq 1 ]; then
    health_status="concerning"
elif [ -n "$error_rate" ] && [ "$(echo "$error_rate > 5" | bc)" -eq 1 ]; then
    health_status="warning"
else
    health_status="healthy"
fi
```

**Enhanced Health Calculation**:
```bash
calculate_health_score() {
    local total_lines=$1
    local error_lines=$2
    local warning_lines=$3
    local rpc_errors=$4
    local zero_addresses=$5

    # Guard against division by zero on an empty log
    if [ "$total_lines" -eq 0 ]; then
        echo 100
        return
    fi

    # Start with 100
    local health_score=100

    # Deduct for error rate
    local error_rate=$(echo "scale=2; $error_lines * 100 / $total_lines" | bc -l 2>/dev/null || echo 0)
    health_score=$(echo "$health_score - $error_rate" | bc)

    # Deduct for RPC failures (each 100 failures = -1 point)
    local rpc_penalty=$(echo "scale=2; $rpc_errors / 100" | bc -l 2>/dev/null || echo 0)
    health_score=$(echo "$health_score - $rpc_penalty" | bc)

    # Deduct for zero addresses (each 100 occurrences = -1 point)
    local zero_penalty=$(echo "scale=2; $zero_addresses / 100" | bc -l 2>/dev/null || echo 0)
    health_score=$(echo "$health_score - $zero_penalty" | bc)

    # Floor at 0
    if [ "$(echo "$health_score < 0" | bc)" -eq 1 ]; then
        health_score=0
    fi

    echo "$health_score"
}
```

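For a quick sanity check, run the function with round numbers (hypothetical values, not taken from the logs):

```bash
# 10,000 total lines, 150 errors, 40 warnings, 1,200 RPC errors, 300 zero addresses
calculate_health_score 10000 150 40 1200 300
# error_rate = 1.50, rpc_penalty = 12.00, zero_penalty = 3.00
# -> 100 - 1.50 - 12.00 - 3.00 = 83.50
```
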
### Fix 3: Port Conflict Resolution

**Issue**: Metrics (9090) and Dashboard (8080) port conflicts

**File**: `cmd/mev-bot/main.go`

**Current**:
```go
go startMetricsServer(":9090")
go startDashboard(":8080")
```

**Fixed with Port Checking**:

```go
func startWithPortCheck(service string, preferredPort int, handler http.Handler) error {
	port := preferredPort
	maxAttempts := 5

	for attempt := 0; attempt < maxAttempts; attempt++ {
		addr := fmt.Sprintf(":%d", port)
		server := &http.Server{
			Addr:    addr,
			Handler: handler,
		}

		listener, err := net.Listen("tcp", addr)
		if err != nil {
			log.Printf("%s port %d in use, trying %d", service, port, port+1)
			port++
			continue
		}

		log.Printf("✅ %s started on port %d", service, port)
		return server.Serve(listener)
	}

	return fmt.Errorf("failed to start %s after %d attempts", service, maxAttempts)
}

// Usage
go startWithPortCheck("Metrics", 9090, metricsHandler)
go startWithPortCheck("Dashboard", 8080, dashboardHandler)
```

**Alternative - Environment Variables**:
```go
metricsPort := os.Getenv("METRICS_PORT")
if metricsPort == "" {
	metricsPort = "9090"
}

dashboardPort := os.Getenv("DASHBOARD_PORT")
if dashboardPort == "" {
	dashboardPort = "8080"
}
```

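The two approaches compose: read the preferred port from the environment, then let the port scanner handle collisions. A small sketch (`parsePort` is a hypothetical helper, not existing code):

```go
// parsePort wraps strconv.Atoi with a default value.
func parsePort(envVar string, fallback int) int {
	if v := os.Getenv(envVar); v != "" {
		if p, err := strconv.Atoi(v); err == nil {
			return p
		}
	}
	return fallback
}

go startWithPortCheck("Metrics", parsePort("METRICS_PORT", 9090), metricsHandler)
go startWithPortCheck("Dashboard", parsePort("DASHBOARD_PORT", 8080), dashboardHandler)
```
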
## 📋 HIGH PRIORITY FIXES (48-72 Hours)

### Fix 4: Implement Request Caching

**Why**: Reduce RPC calls by 60-80%

**File**: `pkg/arbitrum/pool_cache.go` (new)

```go
// Using patrickmn/go-cache, which is already safe for concurrent use,
// so no extra mutex is needed here.
type PoolDataCache struct {
	cache *cache.Cache
}

type CachedPoolData struct {
	Token0    common.Address
	Token1    common.Address
	Fee       *big.Int
	Liquidity *big.Int
	FetchedAt time.Time
}

func NewPoolDataCache() *PoolDataCache {
	return &PoolDataCache{
		cache: cache.New(5*time.Minute, 10*time.Minute),
	}
}

func (c *PoolDataCache) GetPoolData(ctx context.Context, poolAddr common.Address, fetcher func() (*CachedPoolData, error)) (*CachedPoolData, error) {
	key := poolAddr.Hex()

	// Check cache first
	if data, found := c.cache.Get(key); found {
		return data.(*CachedPoolData), nil
	}

	// Cache miss - fetch from RPC
	data, err := fetcher()
	if err != nil {
		return nil, err
	}

	// Store in cache
	c.cache.Set(key, data, cache.DefaultExpiration)

	return data, nil
}
```

**Usage**:
```go
poolData, err := poolCache.GetPoolData(ctx, poolAddress, func() (*CachedPoolData, error) {
	// This only runs on cache miss. Check every error: silently ignoring
	// them is exactly how zero values end up cached.
	token0, err := poolContract.Token0(nil)
	if err != nil {
		return nil, fmt.Errorf("token0: %w", err)
	}
	token1, err := poolContract.Token1(nil)
	if err != nil {
		return nil, fmt.Errorf("token1: %w", err)
	}
	fee, err := poolContract.Fee(nil)
	if err != nil {
		return nil, fmt.Errorf("fee: %w", err)
	}
	liquidity, err := poolContract.Liquidity(nil)
	if err != nil {
		return nil, fmt.Errorf("liquidity: %w", err)
	}

	return &CachedPoolData{
		Token0:    token0,
		Token1:    token1,
		Fee:       fee,
		Liquidity: liquidity,
		FetchedAt: time.Now(),
	}, nil
})
```

### Fix 5: Batch RPC Requests

**File**: `pkg/arbitrum/batch_requests.go` (new)

```go
type BatchRequest struct {
	calls []rpc.BatchElem
	mu    sync.Mutex
}

func (b *BatchRequest) AddPoolDataRequest(poolAddr common.Address) int {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Index of this pool's first element in the batch
	idx := len(b.calls)

	// Add all pool data calls in one batch
	b.calls = append(b.calls,
		rpc.BatchElem{Method: "eth_call", Args: []interface{}{/* token0 call */}},
		rpc.BatchElem{Method: "eth_call", Args: []interface{}{/* token1 call */}},
		rpc.BatchElem{Method: "eth_call", Args: []interface{}{/* fee call */}},
		rpc.BatchElem{Method: "eth_call", Args: []interface{}{/* liquidity call */}},
	)

	return idx
}

func (b *BatchRequest) Execute(client *rpc.Client) error {
	b.mu.Lock()
	defer b.mu.Unlock()

	if len(b.calls) == 0 {
		return nil
	}

	err := client.BatchCall(b.calls)
	if err != nil {
		return fmt.Errorf("batch call failed: %w", err)
	}

	// Check individual results
	for i, call := range b.calls {
		if call.Error != nil {
			log.Printf("Batch call %d failed: %v", i, call.Error)
		}
	}

	return nil
}
```

**Impact**: Reduce four separate RPC calls per pool to one batch call (usage sketch below)
- **Before**: 100 pools × 4 calls = 400 RPC requests
- **After**: 100 pools → 1 batched request carrying 400 sub-calls

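A usage sketch, assuming the caller lives in the same package and decodes results out of the batch by index; `pools` and `rpcClient` are assumed in scope. The four elements for a pool start at the index returned by `AddPoolDataRequest`.

```go
var batch BatchRequest
indexes := make(map[common.Address]int, len(pools))
for _, pool := range pools {
	indexes[pool] = batch.AddPoolDataRequest(pool)
}

if err := batch.Execute(rpcClient); err != nil {
	return err
}
// For pool p, results live at batch.calls[indexes[p]] .. batch.calls[indexes[p]+3].
```
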
### Fix 6: Improve Arbitrage Profitability Calculation

**File**: `pkg/arbitrage/detection_engine.go`

**Issues**:
1. Gas cost estimation too high
2. Slippage tolerance too conservative
3. Zero amounts causing invalid calculations

**Enhanced Calculation**:
```go
type ProfitCalculator struct {
	gasPrice          *big.Int
	priorityFee       *big.Int
	slippageBps       int64 // Basis points (100 = 1%)
	minProfitUSD      float64
	executionGasLimit uint64
}

func (pc *ProfitCalculator) CalculateNetProfit(opp *Opportunity) (*ProfitEstimate, error) {
	// Validate inputs
	if opp.AmountIn.Cmp(big.NewInt(0)) == 0 || opp.AmountOut.Cmp(big.NewInt(0)) == 0 {
		return nil, fmt.Errorf("zero amount detected: amountIn=%s, amountOut=%s",
			opp.AmountIn.String(), opp.AmountOut.String())
	}

	// Calculate gross profit in ETH
	grossProfit := new(big.Int).Sub(opp.AmountOut, opp.AmountIn)
	grossProfitETH := new(big.Float).Quo(
		new(big.Float).SetInt(grossProfit),
		new(big.Float).SetInt(big.NewInt(1e18)),
	)

	// Realistic gas estimation
	gasLimit := pc.executionGasLimit // e.g., 300,000
	if opp.IsMultiHop {
		gasLimit *= 2 // Multi-hop needs more gas
	}

	gasPrice := new(big.Int).Add(pc.gasPrice, pc.priorityFee)
	gasCost := new(big.Int).Mul(gasPrice, big.NewInt(int64(gasLimit)))
	gasCostETH := new(big.Float).Quo(
		new(big.Float).SetInt(gasCost),
		new(big.Float).SetInt(big.NewInt(1e18)),
	)

	// Apply slippage tolerance
	slippageMultiplier := float64(10000-pc.slippageBps) / 10000.0
	grossProfitWithSlippage, _ := new(big.Float).Mul(
		grossProfitETH,
		big.NewFloat(slippageMultiplier),
	).Float64()

	gasCostFloat, _ := gasCostETH.Float64()
	netProfitETH := grossProfitWithSlippage - gasCostFloat

	// Calculate in USD
	ethPriceUSD := pc.getETHPrice() // From oracle or cache
	netProfitUSD := netProfitETH * ethPriceUSD

	return &ProfitEstimate{
		GrossProfitETH:  grossProfitETH,
		GasCostETH:      gasCostETH,
		NetProfitETH:    big.NewFloat(netProfitETH),
		NetProfitUSD:    netProfitUSD,
		IsExecutable:    netProfitUSD >= pc.minProfitUSD,
		SlippageApplied: pc.slippageBps,
		GasLimitUsed:    gasLimit,
	}, nil
}
```

**Configuration**:
```yaml
# config/arbitrum_production.yaml
arbitrage:
  profit_calculation:
    min_profit_usd: 5.0     # Minimum $5 profit
    slippage_bps: 50        # 0.5% slippage tolerance
    gas_limit: 300000       # Base gas limit
    priority_fee_gwei: 0.1  # Additional priority fee
```

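Worked example with these settings (the 0.1 gwei base fee and $3,500 ETH price are illustrative assumptions):

- Gross profit: 0.002 ETH; after 0.5% slippage: 0.002 × 0.995 = 0.00199 ETH
- Gas: 300,000 × (0.1 gwei base + 0.1 gwei priority) = 60,000 gwei = 0.00006 ETH
- Net: 0.00199 − 0.00006 = 0.00193 ETH ≈ $6.76 at $3,500/ETH
- $6.76 ≥ min_profit_usd ($5.00), so the opportunity is marked executable
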
## 🔄 OPERATIONAL IMPROVEMENTS (Week 1)

### Improvement 1: Automated Log Rotation

**File**: `/etc/logrotate.d/mev-bot` (system config)

```
/home/administrator/projects/mev-beta/logs/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 0600 administrator administrator
    size 50M
    postrotate
        /usr/bin/systemctl reload mev-bot.service > /dev/null 2>&1 || true
    endscript
}
```

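Before relying on the policy, dry-run it; `logrotate -d` prints what would happen without rotating anything:

```bash
sudo logrotate -d /etc/logrotate.d/mev-bot
```
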
### Improvement 2: Real-Time Alerting

**File**: `pkg/monitoring/alerts.go` (new)

```go
type AlertManager struct {
	slackWebhook string
	emailSMTP    string
	thresholds   AlertThresholds
	alertState   map[string]time.Time // must be initialized by the constructor
	mu           sync.Mutex
}

type AlertThresholds struct {
	ErrorRatePercent     float64 // Alert if >10%
	RPCFailuresPerMin    int     // Alert if >100/min
	ZeroAddressesPerHour int     // Alert if >10/hour
	NoOpportunitiesHours int     // Alert if no opps for N hours
}

func (am *AlertManager) CheckAndAlert(metrics *SystemMetrics) {
	am.mu.Lock()
	defer am.mu.Unlock()

	// Error rate alert
	if metrics.ErrorRate > am.thresholds.ErrorRatePercent {
		if am.shouldAlert("high_error_rate", 5*time.Minute) {
			am.sendAlert("🚨 HIGH ERROR RATE", fmt.Sprintf(
				"Error rate: %.2f%% (threshold: %.2f%%)\nTotal errors: %d",
				metrics.ErrorRate, am.thresholds.ErrorRatePercent, metrics.TotalErrors,
			))
		}
	}

	// RPC failure alert (clamp elapsed minutes to avoid dividing by zero
	// during the first minute of uptime)
	minutes := int(time.Since(metrics.StartTime).Minutes())
	if minutes < 1 {
		minutes = 1
	}
	rpcFailuresPerMin := metrics.RPCFailures / minutes
	if rpcFailuresPerMin > am.thresholds.RPCFailuresPerMin {
		if am.shouldAlert("rpc_failures", 10*time.Minute) {
			am.sendAlert("⚠️ RPC FAILURES", fmt.Sprintf(
				"RPC failures: %d/min (threshold: %d/min)\nCheck RPC providers and rate limits",
				rpcFailuresPerMin, am.thresholds.RPCFailuresPerMin,
			))
		}
	}

	// Zero address alert
	if metrics.ZeroAddressesLastHour > am.thresholds.ZeroAddressesPerHour {
		if am.shouldAlert("zero_addresses", 1*time.Hour) {
			am.sendAlert("❌ ZERO ADDRESS CONTAMINATION", fmt.Sprintf(
				"Zero addresses detected: %d in last hour\nData integrity compromised",
				metrics.ZeroAddressesLastHour,
			))
		}
	}
}

func (am *AlertManager) shouldAlert(alertType string, cooldown time.Duration) bool {
	lastAlert, exists := am.alertState[alertType]
	if !exists || time.Since(lastAlert) > cooldown {
		am.alertState[alertType] = time.Now()
		return true
	}
	return false
}
```

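`sendAlert` is referenced but not shown; a minimal Slack delivery sketch, assuming the standard incoming-webhook JSON payload (`{"text": ...}`) and the usual `bytes`, `encoding/json`, `log`, and `net/http` imports:

```go
func (am *AlertManager) sendAlert(title, body string) {
	payload, err := json.Marshal(map[string]string{"text": title + "\n" + body})
	if err != nil {
		log.Printf("alert marshal failed: %v", err)
		return
	}
	resp, err := http.Post(am.slackWebhook, "application/json", bytes.NewReader(payload))
	if err != nil {
		log.Printf("alert delivery failed: %v", err)
		return
	}
	resp.Body.Close()
}
```
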
### Improvement 3: Enhanced Logging with Context

**File**: All files using logging

**Current**:
```go
log.Printf("[ERROR] Failed to get pool data: %v", err)
```

**Enhanced**:
```go
import "log/slog"

logger := slog.With(
	"component", "pool_fetcher",
	"pool", poolAddress.Hex(),
	"block", blockNumber,
)

logger.Error("failed to get pool data",
	"error", err,
	"attempt", attempt,
	"rpc_endpoint", currentEndpoint,
)
```

**Benefits**:
- Structured logging for easy parsing
- Automatic context propagation
- Better filtering and analysis
- JSON output for log aggregation (enabled as shown below)

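To get that JSON output, install a `JSONHandler` as the default logger at startup; this uses only the standard `log/slog` API:

```go
logger := slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
	Level: slog.LevelDebug,
}))
slog.SetDefault(logger)
```
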
## 📊 MONITORING & VALIDATION

### Validation Checklist

After implementing fixes, validate each with:

```text
# 1. WebSocket Connection Fix
✅ No "unsupported protocol scheme wss" errors in logs
✅ Successful WebSocket connection messages
✅ Block subscription working

# 2. Zero Address Fix
✅ No zero addresses in liquidity_events_*.jsonl
✅ Valid token addresses in all events
✅ Factory addresses are non-zero

# 3. Rate Limiting Fix
✅ "Too Many Requests" errors reduced by >90%
✅ Successful RPC calls >95%
✅ Automatic backoff observable in logs

# 4. Connection Manager Fix
✅ Automatic provider failover working
✅ Health checks passing
✅ All providers being utilized

# 5. Health Scoring Fix
✅ Health score reflects actual system state
✅ Score <80 when errors >20%
✅ Alerts triggering at correct thresholds
```

### Performance Metrics to Track

| Metric | Before Fixes | Target After Fixes |
|--------|--------------|--------------------|
| Error Rate | 81.1% | <5% |
| RPC Failures | 100,709 | <100/day |
| Zero Addresses | 5,462 | 0 |
| Successful Arbitrages | 0 | >0 |
| Opportunities Rejected | 100% | <80% |

### Test Commands

```bash
# Comprehensive system test
./scripts/comprehensive-test.sh

# Individual component tests
go test ./pkg/arbitrum/... -v
go test ./pkg/arbitrage/... -v
go test ./pkg/monitor/... -v

# Integration test with real data
LOG_LEVEL=debug timeout 60 ./mev-bot start 2>&1 | tee test-run.log

# Analyze test run
./scripts/log-manager.sh analyze
./scripts/log-manager.sh health
```

## 🎯 IMPLEMENTATION ROADMAP

### Day 1 (Hours 0-24)
- [ ] Fix WebSocket connection (2 hours)
- [ ] Fix zero address parsing (3 hours)
- [ ] Implement basic rate limiting (2 hours)
- [ ] Fix health scoring script (1 hour)
- [ ] Test and validate (2 hours)
- [ ] Deploy to staging (1 hour)

### Day 2 (Hours 24-48)
- [ ] Enhanced connection manager (4 hours)
- [ ] Fix port conflicts (1 hour)
- [ ] Add multiple RPC providers (2 hours)
- [ ] Implement request caching (3 hours)
- [ ] Full system testing (2 hours)

### Day 3 (Hours 48-72)
- [ ] Batch RPC requests (3 hours)
- [ ] Improve profit calculation (2 hours)
- [ ] Add real-time alerting (2 hours)
- [ ] Enhanced logging (2 hours)
- [ ] Production deployment (3 hours)

### Week 1 (Days 4-7)
- [ ] Log rotation automation
- [ ] Monitoring dashboard improvements
- [ ] Performance optimization
- [ ] Documentation updates
- [ ] Team training on new systems

## 🔒 RISK MITIGATION

### Deployment Risks

| Risk | Probability | Impact | Mitigation |
|------|------------|--------|------------|
| WebSocket fix breaks HTTP fallback | Medium | High | Keep HTTP client as fallback |
| Rate limiting too aggressive | Medium | Medium | Make limits configurable |
| Cache serves stale data | Low | Medium | Add cache invalidation on errors |
| New errors introduced | Medium | High | Comprehensive testing + rollback plan |

### Rollback Plan

If issues occur after deployment:

```bash
# Quick rollback
git revert HEAD
make build
systemctl restart mev-bot

# Restore from backup
cp backups/mev-bot-backup-YYYYMMDD ./mev-bot
systemctl restart mev-bot

# Check rollback success
./scripts/log-manager.sh status
tail -f logs/mev_bot.log
```

### Gradual Rollout

1. **Staging** (Day 1): Deploy all fixes, test for 24 hours
2. **Canary** (Day 2): Deploy to 10% of production capacity
3. **Production** (Day 3): Full production deployment
4. **Monitoring** (Week 1): Intensive monitoring and tuning

## 📚 ADDITIONAL RESOURCES

### Documentation to Update
- [ ] CLAUDE.md - Add new configuration requirements
- [ ] README.md - Update deployment instructions
- [ ] TODO_AUDIT_FIX.md - Mark completed items
- [ ] API.md - Document new monitoring endpoints

### Code Reviews Required
- WebSocket connection changes
- Zero address validation logic
- Rate limiting implementation
- Connection manager enhancements

### Testing Requirements
- Unit tests for all new functions
- Integration tests for RPC connections
- Load testing for rate limiting
- End-to-end arbitrage execution test

---

**Document Version**: 1.0
**Last Updated**: 2025-10-30
**Review Required**: After each fix implementation
**Owner**: MEV Bot Development Team