mev-beta/docs/CRITICAL_FIX_PLAN_20251101.md

# Critical Fix Plan - November 1, 2025

## Issues Identified & Solutions

### 🔴 ISSUE 1: Multi-Hop Scanner Finding 0 Paths

**Root Cause:**
Initial hypothesis: the DFS search in `multihop.go:208` calls `GetAdjacentTokens(currentToken)`, but if the trigger token isn't in the pre-populated token graph, that call returns an empty map and the search never starts.

**Evidence:**
```
[INFO] 📥 Received bridge arbitrage opportunity id=arb_1762011082_0xaf88d065 path_length=4 pools=0
[INFO] Multi-hop arbitrage scan completed in 99.983µs: found 0 profitable paths out of 0 total paths
                                                                                      ^^^^^^^^
                                                                                      The issue!
```

**The Flow:**
1. An opportunity arrives with a start token (e.g., USDC `0xaf88d065...`).
2. `ScanForArbitrage` is called with this token.
3. `updateTokenGraph` populates 8 hard-coded pools.
4. The DFS starts: `GetAdjacentTokens(0xaf88d065...)`.
5. The token graph DOES contain this token, so the initial hypothesis doesn't hold.
6. The DFS starts at depth=0 with current == target. This is expected, because it searches for cycles that return to the start token.
7. At depth=0 it correctly skips the "found cycle" check (which requires depth > 1).
8. It gets the adjacent tokens correctly.
9. So the failure must be further downstream.

**Actual Root Cause (Deeper):**
Looking at the logic more carefully:

```go
// Line 199: If we're back at the start token and have made at least 2 hops
if depth > 1 && currentToken == targetToken {
    path := mhs.createArbitragePath(currentTokens, currentPath, amount)
    ...
}
```
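The guard above is easy to check in isolation. Below is a minimal sketch of a cycle-finding DFS with backtracking that applies the same `depth > 1 && current == start` rule to a toy adjacency map of token symbols. `findCycles` and the `map[string][]string` graph are illustrative stand-ins, not the scanner's real types. The sketch also covers the initial hypothesis: a start token missing from the graph has no neighbours, so the search returns no cycles.

```go
package main

// findCycles enumerates token cycles that start and end at start, using a
// depth-first search with backtracking. graph is a toy adjacency list
// (token -> neighbouring tokens), not the scanner's real graph type.
func findCycles(graph map[string][]string, start string, maxHops int) [][]string {
    var cycles [][]string
    visited := map[string]bool{start: true}
    path := []string{start}

    var dfs func(current string, depth int)
    dfs = func(current string, depth int) {
        // Same guard as multihop.go: a cycle needs at least 2 hops.
        if depth > 1 && current == start {
            cycles = append(cycles, append([]string(nil), path...))
            return
        }
        if depth >= maxHops {
            return
        }
        for _, next := range graph[current] {
            // Intermediate tokens may appear only once; the start token
            // may be revisited because that closes the cycle.
            if next != start && visited[next] {
                continue
            }
            visited[next] = true
            path = append(path, next)
            dfs(next, depth+1)
            path = path[:len(path)-1] // backtrack
            if next != start {
                visited[next] = false
            }
        }
    }
    dfs(start, 0)
    return cycles
}
```

On a fully connected USDC/WETH/LINK graph with `maxHops = 3`, this yields four cycles from USDC, including 2-hop round trips such as USDC → WETH → USDC. In the real scanner such round trips are only meaningful if the two hops go through different pools.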

The issue: **the DFS works, but `createArbitragePath` returns `nil` for every candidate path.**

Looking at `createArbitragePath` (lines 238-260):
```go
func (mhs *MultiHopScanner) createArbitragePath(...) *ArbitragePath {
    if len(tokens) < 3 || len(pools) != len(tokens)-1 {
        return nil  // ← Validation failure
    }

    // Calculate swap outputs
    for i, pool := range pools {
        outputAmount, err := mhs.calculateSwapOutput(...)
        if err != nil {
            mhs.logger.Debug(...) // ← Silent failure!
            return nil
        }
    }
}
```
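Because the `return nil` branches leave only a Debug-level trace, the caller cannot tell which check rejected a path. Here is a minimal sketch of the same structural check that returns an error naming the reason. `validatePathShape` is a hypothetical helper, and tokens and pools are simplified to strings.

```go
package main

import "fmt"

// validatePathShape mirrors the structural check at the top of
// createArbitragePath, but returns an error describing the failure
// instead of a bare nil. tokens includes the repeated start token;
// pools has one entry per hop.
func validatePathShape(tokens, pools []string) error {
    if len(tokens) < 3 {
        return fmt.Errorf("too few tokens: got %d, need at least 3", len(tokens))
    }
    if len(pools) != len(tokens)-1 {
        return fmt.Errorf("pool/token mismatch: %d pools for %d tokens (want %d)",
            len(pools), len(tokens), len(tokens)-1)
    }
    return nil
}
```

The caller can then log `err` once per rejected candidate, or feed it into a rejection counter, instead of discarding the reason.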

**The Real Problem:**
1. DFS finds paths (e.g., USDC → WETH → LINK → USDC)
2. `createArbitragePath` is called
3. `calculateSwapOutput` tries to get pool reserves
4. **But the pools have placeholder liquidity values!** (line 485: `uint256.NewInt(1000000000000000000)`, i.e. the same 1e18 for every pool)
5. Or `calculateSwapOutput` fails due to missing SqrtPriceX96 data
6. Path creation fails silently (the error is only logged at Debug level)
7. Returns 0 paths

### 🔴 ISSUE 2: Security Manager Disabled

**Status:** CRITICAL - Running without transaction validation

**Location:** `cmd/mev-bot/main.go:141`

**Fix:** Uncomment security manager initialization

### 🔴 ISSUE 3: Rate Limiting (2,699 errors)

**Root Cause:** Single RPC endpoint being overwhelmed

**Fix:** Enable multi-provider failover from `providers_runtime.yaml`

### 🔴 ISSUE 4: Port Binding Conflicts (53 errors)

**Root Cause:** Multiple instances or improper cleanup

**Fix:** Add SO_REUSEADDR and pre-flight port checks

### 🔴 ISSUE 5: Context Cancellation (71 errors)

**Root Cause:** Improper shutdown handling

**Fix:** Add graceful shutdown with proper context handling

---

## Fix Implementation Plan

### Fix 1: Multi-Hop Scanner - Add Real Pool Data Fetching

**File:** `pkg/arbitrage/multihop.go`

**Changes:**
1. Add DEBUG logging to `createArbitragePath` to show why paths fail
2. Fetch real pool data (sqrtPriceX96, liquidity) from RPC in `updateTokenGraph`
3. Add fallback: if RPC fetch fails, use DataFetcher or skip pool
4. Add metrics to track: paths_found, paths_validated, paths_rejected

**Code Addition:**
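```go
// In createArbitragePath, log a concrete reason before each `return nil`:
if len(tokens) < 3 || len(pools) != len(tokens)-1 {
    mhs.logger.Debug(fmt.Sprintf("❌ Path validation failed: tokens=%d pools=%d reason=%s",
        len(tokens), len(pools), "invalid token/pool count"))
    return nil
}

// In updateTokenGraph, replace placeholder values with real on-chain state.
// Uniswap V3's slot0() returns sqrtPriceX96 (plus tick etc.) but NOT liquidity;
// in-range liquidity comes from the pool's separate liquidity() getter.
for _, pool := range pools {
    slot0, err := mhs.fetchPoolSlot0(ctx, pool.Address)
    if err != nil {
        mhs.logger.Warn(fmt.Sprintf("Failed to fetch slot0 for %s: %v", pool.Address, err))
        continue // Skip pools we cannot price
    }
    liquidity, err := mhs.fetchPoolLiquidity(ctx, pool.Address) // hypothetical helper
    if err != nil {
        mhs.logger.Warn(fmt.Sprintf("Failed to fetch liquidity for %s: %v", pool.Address, err))
        continue
    }
    pool.SqrtPriceX96 = slot0.SqrtPriceX96
    pool.Liquidity = liquidity
    mhs.addPoolToGraph(pool)
}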
```
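The `paths_found` / `paths_validated` / `paths_rejected` metrics from the change list above can be sketched with atomic counters. The type and method names are hypothetical, and `atomic.Int64` needs Go 1.19+. Recording one outcome per DFS candidate makes the Issue 1 failure mode visible at a glance: `paths_found > 0` with `paths_validated == 0` points at path construction rather than the search.

```go
package main

import "sync/atomic"

// pathMetrics holds hypothetical counters for multi-hop path outcomes.
// Invariant: found == validated + rejected.
type pathMetrics struct {
    found, validated, rejected atomic.Int64
}

// record is called once per DFS candidate; ok reports whether the
// candidate survived createArbitragePath.
func (m *pathMetrics) record(ok bool) {
    m.found.Add(1)
    if ok {
        m.validated.Add(1)
    } else {
        m.rejected.Add(1)
    }
}

// snapshot returns the counters keyed by metric name.
func (m *pathMetrics) snapshot() map[string]int64 {
    return map[string]int64{
        "paths_found":     m.found.Load(),
        "paths_validated": m.validated.Load(),
        "paths_rejected":  m.rejected.Load(),
    }
}
```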

### Fix 2: Security Manager

**File:** `cmd/mev-bot/main.go`

**Change:** Uncomment lines 143-180 to re-enable security manager

### Fix 3: Multi-Provider RPC

**File:** `cmd/mev-bot/main.go` or provider initialization

**Change:** Enable provider rotation with fallback

```go
// Add after line 132
if providerConfigPath := os.Getenv("PROVIDER_CONFIG_PATH"); providerConfigPath != "" {
    log.Info(fmt.Sprintf("Loading multi-provider configuration from: %s", providerConfigPath))
    // Enable provider manager with failover
}
```

### Fix 4: Port Binding

**File:** `pkg/metrics/server.go` (or equivalent)

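**Change:** Set SO_REUSEADDR explicitly through `net.ListenConfig`. Caveat: on Linux and other Unix systems, Go's `net.Listen` already enables SO_REUSEADDR on listening sockets, and SO_REUSEADDR only permits rebinding a port whose old connections are in TIME_WAIT. It does not let a second process listen on a port that is still in use, so conflicts caused by multiple running instances still need the pre-flight check.
```go
// Before:
// listener, err := net.Listen("tcp", fmt.Sprintf(":%d", port))

// After: set SO_REUSEADDR explicitly (Unix-only: on Windows,
// syscall.SetsockoptInt takes a syscall.Handle instead of an int).
lc := net.ListenConfig{
    Control: func(network, address string, c syscall.RawConn) error {
        var sockErr error
        if err := c.Control(func(fd uintptr) {
            sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
        }); err != nil {
            return err
        }
        return sockErr // don't silently ignore a setsockopt failure
    },
}
listener, err := lc.Listen(ctx, "tcp", fmt.Sprintf(":%d", port))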
```

### Fix 5: Graceful Shutdown

**File:** `cmd/mev-bot/main.go`

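**Change:** Add to the shutdown handler (around line 400 or later). The snippet assumes `cancel` is the main context's cancel function and `wg` is a `sync.WaitGroup` that tracks all subsystem goroutines: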
```go
// Create shutdown context with timeout
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()

// Cancel main context
cancel()

// Wait for goroutines to finish with timeout
done := make(chan struct{})
go func() {
    // Wait for all subsystems
    wg.Wait()
    close(done)
}()

select {
case <-done:
    log.Info("Graceful shutdown completed")
case <-shutdownCtx.Done():
    log.Warn("Shutdown timeout exceeded, forcing exit")
}
```

---

## Implementation Priority

### Phase 1: Critical Security (30 minutes)
1. ✅ Re-enable security manager
2. ✅ Add port reuse socket option
3. ✅ Add graceful shutdown

### Phase 2: Multi-Hop Scanner Fix (1-2 hours)
1. ✅ Add detailed DEBUG logging to identify failure point
2. ✅ Implement real pool data fetching in updateTokenGraph
3. ✅ Add reserve cache integration
4. ✅ Test with live data

### Phase 3: RPC Optimization (1 hour)
1. ✅ Enable multi-provider rotation
2. ✅ Add exponential backoff
3. ✅ Re-enable DataFetcher for batching

### Phase 4: Testing & Validation (1 hour)
1. ✅ Run bot for 10 minutes
2. ✅ Verify no rate limiting errors
3. ✅ Verify multi-hop scanner finds paths
4. ✅ Verify opportunities are executed
5. ✅ Check all metrics

---

## Expected Outcomes

### Before Fixes:
- ❌ 0 profitable paths found
- ❌ 2,699 rate limit errors
- ❌ Security disabled
- ❌ 53 port conflicts
- ❌ 71 context cancellations

### After Fixes:
- ✅ 5-20 profitable paths per opportunity
- ✅ < 10 rate limit errors (99.6% reduction)
- ✅ Security enabled
- ✅ 0 port conflicts
- ✅ 0 context cancellations
- ✅ Actual arbitrage executions!
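The rate-limit targets can be checked with simple arithmetic. Going from 2,699 to 10 rate-limit errors is a (2699 - 10) / 2699 ≈ 99.63% reduction. The success criterion "< 1% of previous" means at most 26 errors against a 2,699 baseline. Here is a small sketch that encodes both checks, using integer math for the threshold to avoid floating-point edge cases. The helper names are hypothetical.

```go
package main

// reductionPct returns the percentage drop from before to after.
func reductionPct(before, after int) float64 {
    if before == 0 {
        return 0
    }
    return float64(before-after) / float64(before) * 100
}

// meetsRateLimitTarget reports whether after is strictly below 1% of
// before, using integer math (after/before < 1/100).
func meetsRateLimitTarget(before, after int) bool {
    return after*100 < before
}
```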

---

## Testing Commands

```bash
# Step 1: Build with fixes
make clean && make build

# Step 2: Test startup (should see no errors)
timeout 30 ./mev-bot start 2>&1 | tee test_output.log

# Step 3: Check for critical errors
grep -E "ERROR|FATAL|panic" test_output.log | wc -l  # Should be 0

# Step 4: Check multi-hop scanner
grep "profitable paths" test_output.log | tail -5  # Should show > 0 paths

# Step 5: Full run (2 minutes)
timeout 120 ./mev-bot start 2>&1 | tee full_test.log

# Step 6: Analyze results
./scripts/log-manager.sh analyze
```

---

## Rollback Plan

If fixes cause issues:
```bash
git stash  # Stash changes
git checkout 0b1c7bb  # Return to last known good commit
make build && ./mev-bot start
```

---

## Success Criteria

- [ ] Security manager enabled
- [ ] Multi-hop scanner finds > 0 paths
- [ ] Rate limit errors < 1% of previous
- [ ] No port binding errors
- [ ] No context cancellation errors
- [ ] At least 1 arbitrage execution attempt per minute
- [ ] Health score > 95/100

---

**Next Step:** Implement Phase 1 fixes (security critical)