mev-beta/docs/CRITICAL_FIX_PLAN.md

# CRITICAL FIX PLAN: Zero Address Corruption

**Date:** October 23, 2025
**Priority:** P0 - BLOCKS ALL PROFIT
**Estimated Time:** 3-4 hours
**Status:** 🔴 Ready to Implement

---

## 🎯 Problem Summary

**100% of DEX transactions are rejected** due to zero address corruption in token extraction.

**Root Cause:** The "enhanced parser" integration is incomplete. The L2 parser's `extractTokensFromMulticallData()` method **still calls the broken** `calldata.ExtractTokensFromMulticallWithContext()` from multicall.go, which returns zero addresses.

---

## 🔍 The Chain of Failure

### Current (Broken) Flow

```
1. DEX Transaction Detected ✅
   ↓
2. Event Parser calls tokenExtractor.ExtractTokensFromMulticallData() ✅
   ↓
3. L2 Parser's extractTokensFromMulticallData() is called ✅
   ↓
4. ❌ L2 Parser calls calldata.ExtractTokensFromMulticallWithContext()
   ↓
5. ❌ multicall.go's heuristic extraction returns empty addresses
   ↓
6. ❌ Event has Token0=0x000..., Token1=0x000..., PoolAddress=0x000...
   ↓
7. ❌ Event REJECTED (100% rejection rate)
```

### The Smoking Gun

**File:** `pkg/arbitrum/l2_parser.go:1408-1414`

```go
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
	tokens, err := calldata.ExtractTokensFromMulticallWithContext(params, &calldata.MulticallContext{
		Stage:    "arbitrum.l2_parser.extractTokensFromMulticallData",
		Protocol: "unknown",
	})
	// ^^^ THIS IS THE PROBLEM! Still using broken multicall.go
```

**The Irony:** The L2 parser has perfectly good extraction methods for specific function signatures:
- ✅ `extractTokensFromSwapExactTokensForTokens()` - WORKS
- ✅ `extractTokensFromExactInputSingle()` - WORKS
- ✅ `extractTokensFromSwapExactETHForTokens()` - WORKS

But it's not using them! Instead, it calls the broken multicall.go code.

---

## ✅ The Solution

### Strategy: Bypass Broken Multicall.go Entirely

Instead of trying to fix the complex heuristic extraction in multicall.go, we'll make the L2 parser's `extractTokensFromMulticallData()` decode the multicall structure and route to its own working extraction methods.

### Implementation

**File:** `pkg/arbitrum/l2_parser.go`

**Current Broken Method (lines 1408-1438):**
```go
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
	tokens, err := calldata.ExtractTokensFromMulticallWithContext(params, &calldata.MulticallContext{
		Stage:    "arbitrum.l2_parser.extractTokensFromMulticallData",
		Protocol: "unknown",
	})
	// ...
}
```

**New Working Method:**
```go
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
	// CRITICAL FIX: Decode multicall structure and route to working extraction methods
	// instead of calling broken multicall.go heuristics

	if len(params) < 32 {
		return "", ""
	}

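	// ABI layout of a bytes[] argument: a head word holding the array offset,
	// then at that offset the array length followed by one offset per element.
	offset := new(big.Int).SetBytes(params[0:32]).Uint64()
	// Leave room for the 32-byte length word (len(params) >= 32 was checked above)
	if offset > uint64(len(params))-32 {
		return "", ""
	}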

	// Read array length
	arrayLength := new(big.Int).SetBytes(params[offset:offset+32]).Uint64()
	if arrayLength == 0 {
		return "", ""
	}

	// Process each call in the multicall
	currentOffset := offset + 32
	for i := uint64(0); i < arrayLength && i < 10; i++ { // Limit to first 10 calls
		if currentOffset + 32 > uint64(len(params)) {
			break
		}

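		// Read the call's offset. Element offsets are relative to the array
		// body, which starts right after the array-length word.
		relOffset := new(big.Int).SetBytes(params[currentOffset:currentOffset+32]).Uint64()
		currentOffset += 32

		if relOffset > uint64(len(params)) {
			continue
		}
		callOffset := offset + 32 + relOffset
		if callOffset+32 > uint64(len(params)) {
			continue
		}

		// Read call data length
		callLength := new(big.Int).SetBytes(params[callOffset:callOffset+32]).Uint64()
		callStart := callOffset + 32
		callEnd := callStart + callLength

		// callEnd < callStart catches uint64 wrap-around on a hostile length
		if callEnd < callStart || callEnd > uint64(len(params)) {
			continue
		}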

		// Extract the actual call data
		callData := params[callStart:callEnd]

		if len(callData) < 4 {
			continue
		}

		// Try to extract tokens using our WORKING signature-based methods
		t0, t1, err := p.ExtractTokensFromCalldata(callData)
		if err == nil && t0 != (common.Address{}) && t1 != (common.Address{}) {
			return t0.Hex(), t1.Hex()
		}
	}

	return "", ""
}
```
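
The offset handling is the easiest part of this decoder to get wrong, so the following is a minimal, dependency-free sketch of decoding an ABI-encoded `bytes[]`, assuming the payload starts with the array's head word (as in `multicall(bytes[])`). The names `readWord`, `decodeBytesArray`, and `encodeBytesArray` are illustrative and do not exist in the codebase; the encoder is there only to produce synthetic payloads for tests.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// readWord reads the 32-byte big-endian word at pos, rejecting reads past
// the end of data and values that do not fit in a uint64.
func readWord(data []byte, pos uint64) (uint64, error) {
	n := uint64(len(data))
	if n < 32 || pos > n-32 {
		return 0, errors.New("word out of bounds")
	}
	for _, b := range data[pos : pos+24] {
		if b != 0 {
			return 0, errors.New("word exceeds uint64")
		}
	}
	return binary.BigEndian.Uint64(data[pos+24 : pos+32]), nil
}

// decodeBytesArray decodes an ABI-encoded bytes[] whose head word sits at
// params[0:32]. Element offsets are relative to the array body, i.e. the
// position immediately after the array-length word.
func decodeBytesArray(params []byte) ([][]byte, error) {
	n := uint64(len(params))
	arrayPos, err := readWord(params, 0)
	if err != nil {
		return nil, err
	}
	count, err := readWord(params, arrayPos)
	if err != nil {
		return nil, err
	}
	if count > n/32 {
		return nil, errors.New("array length exceeds payload size")
	}
	body := arrayPos + 32
	calls := make([][]byte, 0, count)
	for i := uint64(0); i < count; i++ {
		rel, err := readWord(params, body+i*32)
		if err != nil {
			return nil, err
		}
		if rel > n {
			return nil, errors.New("element offset out of bounds")
		}
		elemPos := body + rel
		length, err := readWord(params, elemPos)
		if err != nil {
			return nil, err
		}
		start := elemPos + 32
		if length > n-start {
			return nil, errors.New("element data out of bounds")
		}
		calls = append(calls, params[start:start+length])
	}
	return calls, nil
}

// encodeBytesArray builds the matching encoding for synthetic test data.
func encodeBytesArray(calls [][]byte) []byte {
	word := func(v uint64) []byte {
		w := make([]byte, 32)
		binary.BigEndian.PutUint64(w[24:], v)
		return w
	}
	out := append(word(32), word(uint64(len(calls)))...)
	headSize := uint64(len(calls)) * 32
	var tails []byte
	for _, c := range calls {
		out = append(out, word(headSize+uint64(len(tails)))...)
		tails = append(tails, word(uint64(len(c)))...)
		tails = append(tails, c...)
		if pad := (32 - len(c)%32) % 32; pad > 0 {
			tails = append(tails, make([]byte, pad)...)
		}
	}
	return append(out, tails...)
}
```

Calling `decodeBytesArray(encodeBytesArray(calls))` should return the original calls. Multicall variants that take an extra leading argument (for example a deadline) place the array offset in a later head word, so they would need the head position adjusted before using a decoder like this.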

---

## 📋 Step-by-Step Implementation

### Phase 1: Replace Broken Multicall Extraction (1-2 hours)

1. **Update `pkg/arbitrum/l2_parser.go:extractTokensFromMulticallData()`**
   - Replace calldata.ExtractTokensFromMulticallWithContext() call
   - Implement proper multicall decoding
   - Route to existing working extraction methods
   - Add detailed logging for debugging

2. **Add Enhanced Logging**
   ```go
   p.logger.Debug("Multicall extraction attempt",
       "array_length", arrayLength,
       "call_index", i,
       "function_sig", hex.EncodeToString(callData[:4]))
   ```

3. **Add UniversalRouter Support** (detailed in Phase 3)
   - UniversalRouter uses a command-based format, not the `bytes[]` multicall layout
   - Add separate handling for function signature `0x3593564c` (execute)
   - Decode the V3_SWAP_EXACT_IN and V2_SWAP_EXACT_IN commands
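
When decoding UniversalRouter commands, two details may matter: the high bits of each command byte are flags rather than part of the command type, and a swap may be preceded by other commands (for example a permit or an ETH wrap), so looking only at the first byte could miss it. The sketch below scans the command string for the first supported swap. The constant values are taken from the public Universal Router `Commands` library as I understand it and should be verified against the deployed version; `firstSwapCommand` is a hypothetical helper name.

```go
package main

// Command identifiers and masks, assumed from the Universal Router's
// Commands library. Verify against the router version deployed on Arbitrum.
const (
	commandTypeMask  = 0x3f // low bits select the command type
	flagAllowRevert  = 0x80 // high bit: the command is allowed to revert
	cmdV3SwapExactIn = 0x00
	cmdV2SwapExactIn = 0x08
)

// firstSwapCommand returns the index and masked type of the first command
// that is one of the two swap types handled in this plan. The index also
// identifies the matching entry in the inputs array.
func firstSwapCommand(commands []byte) (index int, cmd byte, ok bool) {
	for i, raw := range commands {
		c := raw & commandTypeMask
		if c == cmdV3SwapExactIn || c == cmdV2SwapExactIn {
			return i, c, true
		}
	}
	return -1, 0, false
}
```

For a command string such as `0x0a 0x0b 0x80`, this returns index 2 with type `0x00`, because `0x80` is V3_SWAP_EXACT_IN with the allow-revert flag set.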

### Phase 2: Test & Validate (30 minutes)

1. **Unit Test**
   ```bash
   # Test with real multicall data from logs
   go test -v ./pkg/arbitrum -run TestExtractTokensFromMulticall
   ```

2. **Integration Test** (1-minute run)
   ```bash
   make build
   timeout 60 ./bin/mev-bot start
   # Minimum to proceed: >50% success rate (was 0%); the 70% target is listed under Validation Metrics
   ```

3. **Validation Metrics**
   - Success rate > 70%
   - Zero address rejections < 30%
   - Valid Token0/Token1/PoolAddress in logs

### Phase 3: Add UniversalRouter Support (1 hour)

UniversalRouter is the most common protocol (~60% of transactions) and uses a unique command-based format.

**File:** `pkg/arbitrum/l2_parser.go`

**Add Method:**
```go
// extractTokensFromUniversalRouter decodes UniversalRouter execute() commands
func (p *ArbitrumL2Parser) extractTokensFromUniversalRouter(params []byte) (token0, token1 common.Address, err error) {
	// UniversalRouter execute format:
	// bytes commands, bytes[] inputs, uint256 deadline

	if len(params) < 96 {
		return common.Address{}, common.Address{}, fmt.Errorf("params too short for universal router")
	}

	// Parse commands offset (first 32 bytes)
	commandsOffset := new(big.Int).SetBytes(params[0:32]).Uint64()

	// Parse inputs offset (second 32 bytes)
	inputsOffset := new(big.Int).SetBytes(params[32:64]).Uint64()

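	// Each offset must leave room for the 32-byte length word that follows it,
	// plus the first element offset for inputs. Subtracting from n avoids overflow.
	n := uint64(len(params)) // n >= 96 was checked above
	if commandsOffset > n-32 || inputsOffset > n-64 {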
		return common.Address{}, common.Address{}, fmt.Errorf("invalid offsets")
	}

	// Read commands length
	commandsLength := new(big.Int).SetBytes(params[commandsOffset:commandsOffset+32]).Uint64()
	commandsStart := commandsOffset + 32

	// Read first command (V3_SWAP_EXACT_IN = 0x00, V2_SWAP_EXACT_IN = 0x08)
	if commandsStart >= uint64(len(params)) || commandsLength == 0 {
		return common.Address{}, common.Address{}, fmt.Errorf("no commands")
	}

	// Mask off the flag bits (e.g. 0x80 = allow revert) so only the command type remains
	firstCommand := params[commandsStart] & 0x3f

	// Read inputs array
	inputsLength := new(big.Int).SetBytes(params[inputsOffset:inputsOffset+32]).Uint64()
	if inputsLength == 0 {
		return common.Address{}, common.Address{}, fmt.Errorf("no inputs")
	}

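	// Read first input offset and data. In ABI-encoded bytes[], element offsets
	// are relative to the array body, which starts right after the length word.
	inputsBase := inputsOffset + 32
	if inputsBase+32 > uint64(len(params)) {
		return common.Address{}, common.Address{}, fmt.Errorf("inputs array truncated")
	}
	relOffset := new(big.Int).SetBytes(params[inputsBase:inputsBase+32]).Uint64()
	inputDataOffset := inputsBase + relOffset

	if relOffset > uint64(len(params)) || inputDataOffset+32 > uint64(len(params)) {
		return common.Address{}, common.Address{}, fmt.Errorf("invalid input offset")
	}

	inputDataLength := new(big.Int).SetBytes(params[inputDataOffset:inputDataOffset+32]).Uint64()
	inputDataStart := inputDataOffset + 32
	inputDataEnd := inputDataStart + inputDataLength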

	if inputDataEnd > uint64(len(params)) {
		return common.Address{}, common.Address{}, fmt.Errorf("input data out of bounds")
	}

	inputData := params[inputDataStart:inputDataEnd]

	// Decode based on command type
	switch firstCommand {
	case 0x00: // V3_SWAP_EXACT_IN
		// Format: recipient(addr), amountIn(uint256), amountOutMin(uint256), path(bytes), payerIsUser(bool)
		if len(inputData) >= 160 {
			// The 4th head word (bytes 96-128) holds the offset of the path bytes
			pathOffset := new(big.Int).SetBytes(inputData[96:128]).Uint64()
			// Leave room for the path's 32-byte length word
			if pathOffset <= uint64(len(inputData))-32 {
				pathLength := new(big.Int).SetBytes(inputData[pathOffset:pathOffset+32]).Uint64()
				pathStart := pathOffset + 32

				// V3 path format: token0(20 bytes) + fee(3 bytes) + token1(20 bytes)
				if pathLength >= 43 && pathStart+43 <= uint64(len(inputData)) {
					token0 = common.BytesToAddress(inputData[pathStart:pathStart+20])
					token1 = common.BytesToAddress(inputData[pathStart+23:pathStart+43])
					return token0, token1, nil
				}
			}
		}

	case 0x08: // V2_SWAP_EXACT_IN
		// Format: recipient(addr), amountIn(uint256), amountOutMin(uint256), path(addr[]), payerIsUser(bool)
		if len(inputData) >= 128 {
			// Path array offset is at position 96 (4th parameter)
			pathOffset := new(big.Int).SetBytes(inputData[96:128]).Uint64()
			// Leave room for the array's 32-byte length word
			if pathOffset <= uint64(len(inputData))-32 {
				pathArrayLength := new(big.Int).SetBytes(inputData[pathOffset:pathOffset+32]).Uint64()
				// Cap the length by the payload size so the offset arithmetic cannot overflow
				if pathArrayLength >= 2 && pathArrayLength <= uint64(len(inputData))/32 {
					lastTokenOffset := pathOffset + 32 + (pathArrayLength-1)*32
					// Bounding the last element also bounds the first
					if lastTokenOffset+32 <= uint64(len(inputData)) {
						// First token
						token0 = common.BytesToAddress(inputData[pathOffset+32 : pathOffset+64])
						// Last token
						token1 = common.BytesToAddress(inputData[lastTokenOffset : lastTokenOffset+32])
						return token0, token1, nil
					}
				}
			}
		}
	}

	return common.Address{}, common.Address{}, fmt.Errorf("unsupported universal router command: 0x%02x", firstCommand)
}
```
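
The V3 branch above reads the first hop of the packed path, while the V2 branch returns the first and last tokens. If the same first-to-last behaviour is wanted for V3, a small helper along these lines could be used. This is a sketch under the assumption that the path is the usual packed sequence of 20-byte tokens separated by 3-byte fees; `v3PathEndpoints` is a hypothetical name, and it uses plain byte arrays instead of `common.Address` to stay dependency-free.

```go
package main

import "errors"

const (
	addrLen = 20                // bytes per token address
	feeLen  = 3                 // bytes per fee tier
	hopLen  = addrLen + feeLen  // each extra hop adds fee + token
)

// v3PathEndpoints returns the first and last token of a packed V3 path:
// token(20) | fee(3) | token(20) [| fee(3) | token(20) ...].
func v3PathEndpoints(path []byte) (tokenIn, tokenOut [addrLen]byte, err error) {
	// A valid path holds one token plus at least one whole hop.
	if len(path) < addrLen+hopLen || (len(path)-addrLen)%hopLen != 0 {
		return tokenIn, tokenOut, errors.New("malformed V3 path")
	}
	copy(tokenIn[:], path[:addrLen])
	copy(tokenOut[:], path[len(path)-addrLen:])
	return tokenIn, tokenOut, nil
}
```

For a single-hop path this yields the same pair as the branch above; for multi-hop paths it returns the overall input and output tokens rather than the first pool's pair, which may or may not be what the arbitrage detector expects.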

**Update ExtractTokensFromCalldata to support UniversalRouter:**
```go
func (p *ArbitrumL2Parser) ExtractTokensFromCalldata(calldata []byte) (token0, token1 common.Address, err error) {
	if len(calldata) < 4 {
		return common.Address{}, common.Address{}, fmt.Errorf("calldata too short")
	}

	functionSignature := hex.EncodeToString(calldata[:4])

	switch functionSignature {
	case "3593564c": // execute (UniversalRouter)
		return p.extractTokensFromUniversalRouter(calldata[4:])
	case "38ed1739": // swapExactTokensForTokens
		return p.extractTokensFromSwapExactTokensForTokens(calldata[4:])
	// ... rest of cases
	}
}
```

### Phase 4: Comprehensive Testing (30 minutes)

1. **5-Minute Production Run**
   ```bash
   make build
   timeout 300 ./bin/mev-bot start
   ```

2. **Expected Results**
   - Success rate: 80-90% (up from 0%)
   - Valid events: ~120-150 per minute
   - Arbitrage opportunities: 1-5 per minute
   - Zero-address rejections: < 20%

3. **Log Analysis**
   ```bash
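   # Count successes
   ok=$(grep -c "Enhanced parsing success" logs/mev_bot.log)

   # Count rejections
   bad=$(grep -c "REJECTED: Event with zero PoolAddress" logs/mev_bot.log)

   # Calculate success rate (should be > 80%), assuming these two
   # log lines together cover every parsed event
   awk -v ok="$ok" -v bad="$bad" 'BEGIN { if (ok + bad > 0) printf "success rate: %.1f%%\n", 100 * ok / (ok + bad); else print "no events found" }'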
   ```

---

## 🔧 Additional Fixes Needed

### 1. Add Pool Address Discovery

Currently, even with correct token extraction, PoolAddress is still zero because nothing looks up the pool for the extracted token pair.

**Solution:** Add pool address lookup after token extraction:

```go
// In event parser after successful token extraction
if token0 != (common.Address{}) && token1 != (common.Address{}) {
	// Query factory to get pool address
	poolAddr := p.getPoolAddress(token0, token1, protocol)
	event.PoolAddress = poolAddr
}
```
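
One way to keep that lookup cheap is to normalise the token order and cache results, so each pair is resolved once per protocol. The sketch below assumes Uniswap-style pools, where `token0` is the numerically smaller address. The `lookup` callback is a hypothetical stand-in for whatever factory query `getPoolAddress` ends up performing, and the key omits the V3 fee tier for brevity.

```go
package main

import "bytes"

type address [20]byte

// sortTokens orders two token addresses ascending, matching the
// token0 < token1 convention used by Uniswap V2/V3-style factories.
func sortTokens(a, b address) (token0, token1 address) {
	if bytes.Compare(a[:], b[:]) > 0 {
		return b, a
	}
	return a, b
}

type poolKey struct {
	token0, token1 address
	protocol       string
}

// poolCache memoizes pool lookups. It is not safe for concurrent use;
// a real implementation would need a mutex or sync.Map.
type poolCache struct {
	lookup func(token0, token1 address, protocol string) (address, bool)
	pools  map[poolKey]address
}

func newPoolCache(lookup func(token0, token1 address, protocol string) (address, bool)) *poolCache {
	return &poolCache{lookup: lookup, pools: make(map[poolKey]address)}
}

// get returns the pool for a token pair in either order, querying the
// lookup function only on a cache miss.
func (c *poolCache) get(a, b address, protocol string) (address, bool) {
	t0, t1 := sortTokens(a, b)
	key := poolKey{token0: t0, token1: t1, protocol: protocol}
	if pool, ok := c.pools[key]; ok {
		return pool, true
	}
	pool, ok := c.lookup(t0, t1, protocol)
	if !ok || pool == (address{}) {
		return address{}, false // never cache or return a zero pool address
	}
	c.pools[key] = pool
	return pool, true
}
```

Because the key is built from the sorted pair, `get(a, b, ...)` and `get(b, a, ...)` hit the same cache entry.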

### 2. Fix Event Creation Flow

**File:** `pkg/events/parser.go`

The event creation needs to properly use extracted tokens:

```go
event := &Event{
	Type:            Swap,
	Protocol:        protocol,
	PoolAddress:     poolAddress,  // ← Need to populate this
	Token0:          token0,        // ← These come from extraction
	Token1:          token1,        // ← These come from extraction
	TransactionHash: txHash,
	BlockNumber:     blockNumber,
	Timestamp:       timestamp,
}
```
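
To tell token-extraction failures apart from pool-lookup failures in the logs, the event could be validated field by field before it is emitted. This is only a sketch: `swapEvent` is a trimmed-down, hypothetical stand-in for the `Event` struct above, and `validateSwapEvent` does not exist in the codebase.

```go
package main

import "errors"

type address [20]byte

// swapEvent is a hypothetical, reduced stand-in for the Event struct,
// keeping only the fields involved in zero-address rejection.
type swapEvent struct {
	PoolAddress address
	Token0      address
	Token1      address
}

// validateSwapEvent reports which field is invalid, so a rejection can be
// attributed to token extraction or to pool discovery.
func validateSwapEvent(e swapEvent) error {
	var zero address
	switch {
	case e.Token0 == zero:
		return errors.New("token0 is the zero address")
	case e.Token1 == zero:
		return errors.New("token1 is the zero address")
	case e.Token0 == e.Token1:
		return errors.New("token0 and token1 are identical")
	case e.PoolAddress == zero:
		return errors.New("pool address is the zero address")
	}
	return nil
}
```

Logging the returned error alongside the transaction hash would show whether the remaining rejections come from extraction (Phases 1 and 3) or from the pool lookup described in the previous section.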

---

## 📊 Success Metrics

### Before Fix
- ❌ Success Rate: 0.00%
- ❌ Valid Events: 0/minute
- ❌ Opportunities: 0/minute
- ❌ Revenue: $0/day

### After Fix (Expected)
- ✅ Success Rate: 80-90%
- ✅ Valid Events: 120-150/minute
- ✅ Opportunities: 1-5/minute
- ✅ Revenue: $100-1000/day (with execution)

---

## ⚠️ Risks & Mitigation

### Risk 1: Complex Multicall Formats
**Impact:** Some complex multicalls may still fail
**Mitigation:** Fall back to the multicall.go heuristic only for unknown formats, and discard any zero-address result it returns
**Acceptable:** 10-20% failure rate for edge cases

### Risk 2: UniversalRouter Command Variants
**Impact:** Some UniversalRouter commands not supported
**Mitigation:** Add logging for unsupported commands, implement incrementally
**Acceptable:** Cover 80%+ of commands (V3_SWAP, V2_SWAP, WRAP_ETH)

### Risk 3: Protocol-Specific Differences
**Impact:** Each DEX may have slight format variations
**Mitigation:** Test against real transactions from logs
**Acceptable:** 90%+ coverage of major DEXs (Uniswap, SushiSwap, TraderJoe, Camelot)

---

## 🚀 Deployment Plan

### Step 1: Implement Core Fix (2 hours)
- Replace multicall extraction in L2 parser
- Add comprehensive logging
- Build and initial test

### Step 2: Add UniversalRouter Support (1 hour)
- Implement execute() decoder
- Handle V3_SWAP_EXACT_IN and V2_SWAP_EXACT_IN
- Test with real Universal Router transactions

### Step 3: Validate (30 minutes)
- Run 5-minute production test
- Analyze success rate (target: >80%)
- Check for any new error patterns

### Step 4: Commit & Document (30 minutes)
- Commit changes with detailed message
- Update TODO_AUDIT_FIX.md
- Document any remaining issues

---

## 📝 Files to Modify

1. **`pkg/arbitrum/l2_parser.go`** (PRIMARY)
   - Replace extractTokensFromMulticallData() implementation
   - Add extractTokensFromUniversalRouter() method
   - Update ExtractTokensFromCalldata() with UniversalRouter case
   - Estimated changes: ~150 lines

2. **`pkg/events/parser.go`** (SECONDARY - if needed)
   - Verify token extractor is being called correctly
   - Add pool address lookup after extraction
   - Estimated changes: ~20 lines

3. **`pkg/arbitrum/l2_parser_test.go`** (NEW)
   - Add unit tests for multicall extraction
   - Test UniversalRouter decoding
   - Test with real transaction data from logs
   - Estimated: ~200 lines of tests

---

## ✅ Definition of Done

- [ ] extractTokensFromMulticallData() no longer calls broken multicall.go
- [ ] UniversalRouter execute() transactions are decoded correctly
- [ ] Success rate > 80% in 5-minute production run
- [ ] Zero address rejections < 20%
- [ ] At least 1 arbitrage opportunity detected per minute
- [ ] All changes committed with comprehensive message
- [ ] Documentation updated with findings

---

**Next Steps:** Begin implementation of Phase 1

**Estimated Total Time:** 3-4 hours
**Priority:** P0 - Must fix before any profit can be generated
**Status:** Ready to implement