# Critical Fix Plan - November 1, 2025

## Issues Identified & Solutions

### 🔴 ISSUE 1: Multi-Hop Scanner Finding 0 Paths

**Initial Hypothesis:**
The DFS search at `multihop.go:208` calls `GetAdjacentTokens(currentToken)`. If the trigger token is not in the pre-populated token graph, the call returns an empty map and the search never starts.

**Evidence:**
```
[INFO] 📥 Received bridge arbitrage opportunity id=arb_1762011082_0xaf88d065 path_length=4 pools=0
[INFO] Multi-hop arbitrage scan completed in 99.983µs: found 0 profitable paths out of 0 total paths
                                                                                       ^^^^^^^^ The issue!
```

**The Flow:**
1. An opportunity arrives with a start token (e.g., USDC `0xaf88d065...`)
2. `ScanForArbitrage` is called with this token
3. `updateTokenGraph` populates 8 hard-coded pools
4. DFS starts with `GetAdjacentTokens(0xaf88d065...)`
5. The token graph does contain this token, but...
6. **Suspected bug**: The DFS expects to find cycles but starts at depth=0 with current==target
7. On the first iteration (depth=0), it skips the "found cycle" check (which requires depth>1)
8. It gets the adjacent tokens correctly
9. But something else is wrong...

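The depth guard described in steps 6 and 7 can be checked in isolation. The following is a minimal sketch, not the scanner's actual code: `graph`, `findCycles`, and `contains` are hypothetical names, and the token graph is reduced to plain strings with no pool information. It suggests that starting at depth 0 with current == target is harmless, because the cycle check only fires once `depth > 1`.

```go
package main

import "fmt"

// graph maps a token to its adjacent tokens (hypothetical stand-in for the
// scanner's token graph; pools are left out for brevity).
type graph map[string][]string

// findCycles returns every path that starts and ends at start and uses at
// most maxHops swaps.
func findCycles(g graph, start string, maxHops int) [][]string {
	var cycles [][]string
	var dfs func(current string, path []string, depth int)
	dfs = func(current string, path []string, depth int) {
		// Same guard as the scanner: a cycle only counts after at least 2 hops,
		// so the very first call (depth 0, current == start) falls through.
		if depth > 1 && current == start {
			cycles = append(cycles, append([]string(nil), path...))
			return
		}
		if depth >= maxHops {
			return
		}
		for _, next := range g[current] {
			// Intermediate tokens may not repeat; only start may close the path.
			if next != start && contains(path, next) {
				continue
			}
			dfs(next, append(path, next), depth+1)
		}
	}
	dfs(start, []string{start}, 0)
	return cycles
}

// contains reports whether tok already appears in path.
func contains(path []string, tok string) bool {
	for _, p := range path {
		if p == tok {
			return true
		}
	}
	return false
}

func main() {
	g := graph{
		"USDC": {"WETH", "LINK"},
		"WETH": {"USDC", "LINK"},
		"LINK": {"USDC", "WETH"},
	}
	for _, c := range findCycles(g, "USDC", 3) {
		fmt.Println(c)
	}
}
```

A start token that is absent from the map yields no cycles at all, which is the behaviour the initial hypothesis describes. In this sketch a two-hop cycle reuses the same edge because pools are not modelled; the real scanner would need to distinguish pools.
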
**Actual Root Cause (Deeper):**
Looking at the logic more carefully:

```go
// Line 199: If we're back at the start token and have made at least 2 hops
if depth > 1 && currentToken == targetToken {
	path := mhs.createArbitragePath(currentTokens, currentPath, amount)
	...
}
```

The issue is: **The DFS is working, but `createArbitragePath` is returning `nil`** for all paths!

Looking at `createArbitragePath` (lines 238-260):
```go
func (mhs *MultiHopScanner) createArbitragePath(...) *ArbitragePath {
	if len(tokens) < 3 || len(pools) != len(tokens)-1 {
		return nil // ← Validation fail
	}

	// Calculate swap outputs
	for i, pool := range pools {
		outputAmount, err := mhs.calculateSwapOutput(...)
		if err != nil {
			mhs.logger.Debug(...) // ← Silent failure!
			return nil
		}
	}
}
```

**The Real Problem:**
1. DFS finds paths (e.g., USDC → WETH → LINK → USDC)
2. `createArbitragePath` is called
3. `calculateSwapOutput` tries to get pool reserves
4. **But the pools have placeholder liquidity values!** (line 485: `uint256.NewInt(1000000000000000000)`)
5. Or `calculateSwapOutput` fails due to missing SqrtPriceX96 data
6. Path creation fails silently
7. Returns 0 paths

### 🔴 ISSUE 2: Security Manager Disabled

**Status:** CRITICAL - Running without transaction validation

**Location:** `cmd/mev-bot/main.go:141`

**Fix:** Uncomment security manager initialization

### 🔴 ISSUE 3: Rate Limiting (2,699 errors)

**Root Cause:** Single RPC endpoint being overwhelmed

**Fix:** Enable multi-provider failover from `providers_runtime.yaml`

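### 🔴 ISSUE 4: Port Binding Conflicts (53 errors)

**Root Cause:** Multiple instances or improper cleanup

**Fix:** Set `SO_REUSEADDR` explicitly and add pre-flight port checks. Go's `net.Listen` already sets `SO_REUSEADDR` on Unix-like systems, so if a second instance still holds the port, the pre-flight check and instance cleanup are the parts most likely to resolve the conflict.
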
### 🔴 ISSUE 5: Context Cancellation (71 errors)

**Root Cause:** Improper shutdown handling

**Fix:** Add graceful shutdown with proper context handling

---

## Fix Implementation Plan

### Fix 1: Multi-Hop Scanner - Add Real Pool Data Fetching

**File:** `pkg/arbitrage/multihop.go`

**Changes:**
1. Add DEBUG logging to `createArbitragePath` to show why paths fail
2. Fetch real pool data (sqrtPriceX96, liquidity) from RPC in `updateTokenGraph`
3. Add fallback: if RPC fetch fails, use DataFetcher or skip pool
4. Add metrics to track: paths_found, paths_validated, paths_rejected

**Code Addition:**
```go
// In createArbitragePath, add before each `return nil`.
// `reason` is a string describing which check failed.
mhs.logger.Debug(fmt.Sprintf("❌ Path validation failed: tokens=%d pools=%d reason=%s",
	len(tokens), len(pools), reason))

// In updateTokenGraph, fetch real data:
for _, pool := range pools {
	// Fetch real pool state from RPC.
	// Note: on Uniswap V3 pools, liquidity is read via a separate liquidity()
	// call, so fetchPoolSlot0 is assumed to return both values.
	slot0, err := mhs.fetchPoolSlot0(ctx, pool.Address)
	if err != nil {
		mhs.logger.Warn(fmt.Sprintf("Failed to fetch pool state for %s: %v", pool.Address, err))
		continue // Skip this pool
	}
	pool.SqrtPriceX96 = slot0.SqrtPriceX96
	pool.Liquidity = slot0.Liquidity
	mhs.addPoolToGraph(pool)
}
```

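Change 4 above asks for three counters. A minimal sketch of how they might be kept is shown here, assuming Go 1.19 or later for `atomic.Int64`; the `pathMetrics` type and its methods are hypothetical and not part of the existing scanner.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pathMetrics is a hypothetical counter set; the fields mirror the metric
// names listed in the changes above.
type pathMetrics struct {
	found     atomic.Int64 // cycles returned by the DFS
	validated atomic.Int64 // cycles that produced an ArbitragePath
	rejected  atomic.Int64 // cycles dropped during validation
}

// record counts one DFS result; ok reports whether path creation succeeded.
func (m *pathMetrics) record(ok bool) {
	m.found.Add(1)
	if ok {
		m.validated.Add(1)
		return
	}
	m.rejected.Add(1)
}

// summary renders the counters for a single log line.
func (m *pathMetrics) summary() string {
	return fmt.Sprintf("paths_found=%d paths_validated=%d paths_rejected=%d",
		m.found.Load(), m.validated.Load(), m.rejected.Load())
}

func main() {
	var m pathMetrics
	m.record(true)
	m.record(false)
	m.record(false)
	fmt.Println(m.summary())
}
```

Atomic counters keep the hot path lock-free, which matters if several scans run concurrently. Logging the summary at the end of each scan would separate "DFS found nothing" from "DFS found paths but all were rejected".
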
### Fix 2: Security Manager

**File:** `cmd/mev-bot/main.go`

**Change:** Uncomment lines 143-180 to re-enable security manager

### Fix 3: Multi-Provider RPC

**File:** `cmd/mev-bot/main.go` or provider initialization

**Change:** Enable provider rotation with fallback

```go
// Add after line 132
if providerConfigPath := os.Getenv("PROVIDER_CONFIG_PATH"); providerConfigPath != "" {
	log.Info(fmt.Sprintf("Loading multi-provider configuration from: %s", providerConfigPath))
	// Enable provider manager with failover
}
```

### Fix 4: Port Binding

**File:** `pkg/metrics/server.go` (or equivalent)

**Change:**
```go
// Before:
// listener, err := net.Listen("tcp", fmt.Sprintf(":%d", port))

// After:
lc := net.ListenConfig{
	Control: func(network, address string, c syscall.RawConn) error {
		var sockErr error
		// Surface setsockopt failures instead of discarding them.
		if err := c.Control(func(fd uintptr) {
			sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
		}); err != nil {
			return err
		}
		return sockErr
	},
}
listener, err := lc.Listen(ctx, "tcp", fmt.Sprintf(":%d", port))
```

### Fix 5: Graceful Shutdown

**File:** `cmd/mev-bot/main.go`

**Change:** Add to shutdown handler (after line 400+):
```go
// Create shutdown context with timeout
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()

// Cancel main context
cancel()

// Wait for goroutines to finish with timeout
done := make(chan struct{})
go func() {
	// Wait for all subsystems
	wg.Wait()
	close(done)
}()

select {
case <-done:
	log.Info("Graceful shutdown completed")
case <-shutdownCtx.Done():
	log.Warn("Shutdown timeout exceeded, forcing exit")
}
```

---

## Implementation Priority

### Phase 1: Critical Security (30 minutes)
1. ✅ Re-enable security manager
2. ✅ Add port reuse socket option
3. ✅ Add graceful shutdown

### Phase 2: Multi-Hop Scanner Fix (1-2 hours)
1. ✅ Add detailed DEBUG logging to identify failure point
2. ✅ Implement real pool data fetching in updateTokenGraph
3. ✅ Add reserve cache integration
4. ✅ Test with live data

### Phase 3: RPC Optimization (1 hour)
1. ✅ Enable multi-provider rotation
2. ✅ Add exponential backoff
3. ✅ Re-enable DataFetcher for batching

### Phase 4: Testing & Validation (1 hour)
1. ✅ Run bot for 10 minutes
2. ✅ Verify no rate limiting errors
3. ✅ Verify multi-hop scanner finds paths
4. ✅ Verify opportunities are executed
5. ✅ Check all metrics

---

## Expected Outcomes

### Before Fixes:
- ❌ 0 profitable paths found
- ❌ 2,699 rate limit errors
- ❌ Security disabled
- ❌ 53 port conflicts
- ❌ 71 context cancellations

### After Fixes:
- ✅ 5-20 profitable paths per opportunity
- ✅ < 10 rate limit errors (99.6% reduction)
- ✅ Security enabled
- ✅ 0 port conflicts
- ✅ 0 context cancellations
- ✅ Actual arbitrage executions!

---

## Testing Commands

```bash
# Phase 1: Build with fixes
make clean && make build

# Phase 2: Test startup (should see no errors)
timeout 30 ./mev-bot start 2>&1 | tee test_output.log

# Phase 3: Check for critical errors
grep -E "ERROR|FATAL|panic" test_output.log | wc -l  # Should be 0

# Phase 4: Check multi-hop scanner
grep "profitable paths" test_output.log | tail -5  # Should show > 0 paths

# Phase 5: Full run (2 minutes)
timeout 120 ./mev-bot start 2>&1 | tee full_test.log

# Phase 6: Analyze results
./scripts/log-manager.sh analyze
```

---

## Rollback Plan

If fixes cause issues:
```bash
git stash                    # Stash changes
git checkout 0b1c7bb         # Return to last known good commit
make build && ./mev-bot start
```

---

## Success Criteria

- [ ] Security manager enabled
- [ ] Multi-hop scanner finds > 0 paths
- [ ] Rate limit errors < 1% of previous
- [ ] No port binding errors
- [ ] No context cancellation errors
- [ ] At least 1 arbitrage execution attempt per minute
- [ ] Health score > 95/100

---

**Next Step:** Implement Phase 1 fixes (security critical)