CRITICAL FIX PLAN: Zero Address Corruption
Date: October 23, 2025
Priority: P0 - BLOCKS ALL PROFIT
Estimated Time: 3-4 hours
Status: 🔴 Ready to Implement
🎯 Problem Summary
100% of DEX transactions are rejected due to zero address corruption in token extraction.
Root Cause: The "enhanced parser" integration is incomplete. The L2 parser's extractTokensFromMulticallData() method still calls the broken calldata.ExtractTokensFromMulticallWithContext() from multicall.go, which returns zero addresses.
🔍 The Chain of Failure
Current (Broken) Flow
1. DEX Transaction Detected ✅
↓
2. Event Parser calls tokenExtractor.ExtractTokensFromMulticallData() ✅
↓
3. L2 Parser's extractTokensFromMulticallData() is called ✅
↓
4. ❌ L2 Parser calls calldata.ExtractTokensFromMulticallWithContext()
↓
5. ❌ multicall.go's heuristic extraction returns empty addresses
↓
6. ❌ Event has Token0=0x000..., Token1=0x000..., PoolAddress=0x000...
↓
7. ❌ Event REJECTED (100% rejection rate)
The Smoking Gun
File: pkg/arbitrum/l2_parser.go:1408-1414
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
tokens, err := calldata.ExtractTokensFromMulticallWithContext(params, &calldata.MulticallContext{
Stage: "arbitrum.l2_parser.extractTokensFromMulticallData",
Protocol: "unknown",
})
// ^^^ THIS IS THE PROBLEM! Still using broken multicall.go
The Irony: The L2 parser already has perfectly good extraction methods for specific function signatures:
- ✅ extractTokensFromSwapExactTokensForTokens() - WORKS
- ✅ extractTokensFromExactInputSingle() - WORKS
- ✅ extractTokensFromSwapExactETHForTokens() - WORKS
But it isn't using them! Instead, it calls the broken multicall.go code.
✅ The Solution
Strategy: Bypass Broken Multicall.go Entirely
Instead of trying to fix the complex heuristic extraction in multicall.go, we'll make the L2 parser's extractTokensFromMulticallData() decode the multicall structure and route to its own working extraction methods.
Implementation
File: pkg/arbitrum/l2_parser.go
Current Broken Method (lines 1408-1438):
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
tokens, err := calldata.ExtractTokensFromMulticallWithContext(params, &calldata.MulticallContext{
Stage: "arbitrum.l2_parser.extractTokensFromMulticallData",
Protocol: "unknown",
})
// ...
}
New Working Method:
func (p *ArbitrumL2Parser) extractTokensFromMulticallData(params []byte) (token0, token1 string) {
	// CRITICAL FIX: Decode the multicall(bytes[]) structure and route each inner call to the
	// working signature-based extraction methods instead of the broken multicall.go heuristics.
	// Assumes params is the ABI-encoded argument block (4-byte selector already stripped) and
	// that its first word is the offset of the bytes[] array.
	// Requires imports: "math/big", "github.com/ethereum/go-ethereum/common".
	size := uint64(len(params))

	// readWord returns the 32-byte word at pos as a uint64; ok is false if the word is
	// out of bounds or does not fit in a uint64 (prevents slice panics on bad input).
	readWord := func(pos uint64) (uint64, bool) {
		if pos > size || size-pos < 32 {
			return 0, false
		}
		v := new(big.Int).SetBytes(params[pos : pos+32])
		if !v.IsUint64() {
			return 0, false
		}
		return v.Uint64(), true
	}

	// Multicall format: offset (32 bytes) -> array length (32 bytes) -> element offsets -> elements
	offset, ok := readWord(0)
	if !ok {
		return "", ""
	}
	arrayLength, ok := readWord(offset)
	if !ok || arrayLength == 0 {
		return "", ""
	}

	// Element offsets are relative to the first word after the array length,
	// not to the start of params.
	elemBase := offset + 32
	for i := uint64(0); i < arrayLength && i < 10; i++ { // Limit to first 10 calls
		relOffset, ok := readWord(elemBase + i*32)
		if !ok {
			break
		}
		if relOffset > size {
			continue
		}
		callOffset := elemBase + relOffset

		// Read call data length, then bounds-check the call data itself
		callLength, ok := readWord(callOffset)
		if !ok {
			continue
		}
		callStart := callOffset + 32
		if callLength > size-callStart {
			continue
		}
		callData := params[callStart : callStart+callLength]
		if len(callData) < 4 {
			continue
		}

		// Try to extract tokens using our WORKING signature-based methods
		t0, t1, err := p.ExtractTokensFromCalldata(callData)
		if err == nil && t0 != (common.Address{}) && t1 != (common.Address{}) {
			return t0.Hex(), t1.Hex()
		}
	}
	return "", ""
}
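In the ABI encoding of a bytes[] argument such as multicall's data, the per-element offsets are measured from the word immediately after the array length, not from the start of params. Each element is then a length word followed by zero-padded bytes. The stdlib-only sketch below round-trips that layout so it can be checked against real multicall payloads from the logs. encodeBytesArray, decodeBytesArray and their helpers are hypothetical illustrations, not code from the bot or from go-ethereum; go-ethereum's accounts/abi package is the usual production choice.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// word encodes v as a 32-byte big-endian ABI word.
func word(v uint64) []byte {
	w := make([]byte, 32)
	binary.BigEndian.PutUint64(w[24:], v)
	return w
}

// pad32 right-pads b with zeros to a multiple of 32 bytes.
func pad32(b []byte) []byte {
	out := append([]byte{}, b...)
	for len(out)%32 != 0 {
		out = append(out, 0)
	}
	return out
}

// encodeBytesArray ABI-encodes a single bytes[] argument (no selector): head offset,
// array length, element offsets relative to the word after the length, then each
// element as a length word plus right-padded data.
func encodeBytesArray(calls [][]byte) []byte {
	out := word(32) // the array starts right after this single head word
	out = append(out, word(uint64(len(calls)))...)
	var tails []byte
	for _, c := range calls {
		out = append(out, word(uint64(32*len(calls)+len(tails)))...)
		tails = append(tails, word(uint64(len(c)))...)
		tails = append(tails, pad32(c)...)
	}
	return append(out, tails...)
}

// readWord reads the 32-byte word at pos as a uint64, rejecting out-of-range
// positions and values that do not fit in 64 bits.
func readWord(b []byte, pos uint64) (uint64, error) {
	if pos > uint64(len(b)) || uint64(len(b))-pos < 32 {
		return 0, errors.New("word out of bounds")
	}
	for _, x := range b[pos : pos+24] {
		if x != 0 {
			return 0, errors.New("word exceeds uint64")
		}
	}
	return binary.BigEndian.Uint64(b[pos+24 : pos+32]), nil
}

// decodeBytesArray reverses encodeBytesArray with bounds checks at every step.
func decodeBytesArray(params []byte) ([][]byte, error) {
	offset, err := readWord(params, 0)
	if err != nil {
		return nil, err
	}
	n, err := readWord(params, offset)
	if err != nil {
		return nil, err
	}
	base := offset + 32 // element offsets are relative to this position
	size := uint64(len(params))
	if n > (size-base)/32 {
		return nil, errors.New("array length exceeds data")
	}
	calls := make([][]byte, 0, n)
	for i := uint64(0); i < n; i++ {
		rel, err := readWord(params, base+i*32)
		if err != nil {
			return nil, err
		}
		if rel > size {
			return nil, errors.New("element offset out of bounds")
		}
		l, err := readWord(params, base+rel)
		if err != nil {
			return nil, err
		}
		start := base + rel + 32
		if l > size-start {
			return nil, errors.New("element data out of bounds")
		}
		calls = append(calls, params[start:start+l])
	}
	return calls, nil
}
```

One way to build TestExtractTokensFromMulticall fixtures is to decode a captured multicall payload (selector stripped) this way and feed each element to ExtractTokensFromCalldata.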
📋 Step-by-Step Implementation
Phase 1: Replace Broken Multicall Extraction (1-2 hours)
1. Update pkg/arbitrum/l2_parser.go:extractTokensFromMulticallData()
   - Replace the calldata.ExtractTokensFromMulticallWithContext() call
   - Implement proper multicall decoding
   - Route to existing working extraction methods
   - Add detailed logging for debugging
2. Add Enhanced Logging
   p.logger.Debug("Multicall extraction attempt", "array_length", arrayLength, "call_index", i, "function_sig", hex.EncodeToString(callData[:4]))
3. Add Universal Router Support
   - UniversalRouter uses a command-based execute() format rather than multicall
   - Add separate handling for function signature 0x3593564c (execute)
   - Decode V3_SWAP_EXACT_IN and V2_SWAP_EXACT_IN commands
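In the Universal Router, each byte of commands selects the operation for the entry at the same index in inputs. According to Uniswap's Commands.sol (worth re-checking against the deployed router version), the high bit 0x80 is an allow-revert flag and the low six bits (mask 0x3f) are the command type. A flagged byte such as 0x80 is therefore still V3_SWAP_EXACT_IN. The sketch below parses a command string on that basis. Only the commands discussed in this plan plus their exact-out and WETH siblings are named, and parseCommands is a hypothetical helper.

```go
package main

import "fmt"

const (
	flagAllowRevert = 0x80 // high bit: this command may fail without reverting execute()
	commandTypeMask = 0x3f // low six bits: the command type
)

// commandNames covers only a subset of Universal Router commands.
var commandNames = map[byte]string{
	0x00: "V3_SWAP_EXACT_IN",
	0x01: "V3_SWAP_EXACT_OUT",
	0x08: "V2_SWAP_EXACT_IN",
	0x09: "V2_SWAP_EXACT_OUT",
	0x0b: "WRAP_ETH",
	0x0c: "UNWRAP_WETH",
}

// Command is one decoded byte of the commands string.
type Command struct {
	Index       int // position of the matching entry in inputs
	Type        byte
	Name        string
	AllowRevert bool
}

// parseCommands splits a commands byte string into typed commands.
func parseCommands(commands []byte) []Command {
	out := make([]Command, 0, len(commands))
	for i, c := range commands {
		t := c & commandTypeMask
		name, ok := commandNames[t]
		if !ok {
			name = fmt.Sprintf("UNKNOWN_0x%02x", t)
		}
		out = append(out, Command{Index: i, Type: t, Name: name, AllowRevert: c&flagAllowRevert != 0})
	}
	return out
}

// isTokenSwap reports whether the command carries a token path this plan decodes.
func isTokenSwap(c Command) bool {
	return c.Type == 0x00 || c.Type == 0x08
}
```

Scanning every command, rather than only the first, matters because swaps paid in ETH often begin with WRAP_ETH before the actual swap command.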
Phase 2: Test & Validate (30 minutes)
1. Unit Test
   # Test with real multicall data from logs
   go test -v ./pkg/arbitrum -run TestExtractTokensFromMulticall
2. Integration Test (1-minute run)
   make build
   timeout 60 ./bin/mev-bot start
   # Expected: >50% success rate (not 0%)
3. Validation Metrics
   - Success rate > 70%
   - Zero address rejections < 30%
   - Valid Token0/Token1/PoolAddress in logs
Phase 3: Add UniversalRouter Support (1 hour)
UniversalRouter is the most common protocol (~60% of transactions) and uses a unique command-based format.
File: pkg/arbitrum/l2_parser.go
Add Method:
// extractTokensFromUniversalRouter decodes UniversalRouter execute() commands.
// Requires imports: "fmt", "math/big", "github.com/ethereum/go-ethereum/common".
func (p *ArbitrumL2Parser) extractTokensFromUniversalRouter(params []byte) (token0, token1 common.Address, err error) {
	// UniversalRouter execute format (selector already stripped):
	// bytes commands, bytes[] inputs, uint256 deadline
	if len(params) < 96 {
		return common.Address{}, common.Address{}, fmt.Errorf("params too short for universal router")
	}
	size := uint64(len(params))

	// word reads the 32-byte word at pos in buf; ok is false if it is out of
	// bounds or does not fit in a uint64.
	word := func(buf []byte, pos uint64) (uint64, bool) {
		n := uint64(len(buf))
		if pos > n || n-pos < 32 {
			return 0, false
		}
		v := new(big.Int).SetBytes(buf[pos : pos+32])
		if !v.IsUint64() {
			return 0, false
		}
		return v.Uint64(), true
	}
	// dynBytes returns the length-prefixed bytes value that starts at pos in buf.
	dynBytes := func(buf []byte, pos uint64) ([]byte, bool) {
		length, ok := word(buf, pos)
		if !ok || length > uint64(len(buf))-(pos+32) {
			return nil, false
		}
		return buf[pos+32 : pos+32+length], true
	}

	// Head words: commands offset (first 32 bytes), inputs offset (second 32 bytes)
	commandsOffset, ok1 := word(params, 0)
	inputsOffset, ok2 := word(params, 32)
	if !ok1 || !ok2 {
		return common.Address{}, common.Address{}, fmt.Errorf("invalid offsets")
	}
	commands, ok := dynBytes(params, commandsOffset)
	if !ok || len(commands) == 0 {
		return common.Address{}, common.Address{}, fmt.Errorf("no commands")
	}
	// inputs[i] holds the ABI-encoded arguments for commands[i]
	inputsLength, ok := word(params, inputsOffset)
	if !ok || inputsLength < uint64(len(commands)) {
		return common.Address{}, common.Address{}, fmt.Errorf("no inputs")
	}
	// Element offsets in bytes[] are relative to the word after the array length
	elemBase := inputsOffset + 32

	// Scan every command: swaps are often preceded by WRAP_ETH or permit commands
	for i, cmd := range commands {
		rel, ok := word(params, elemBase+uint64(i)*32)
		if !ok || rel > size {
			break
		}
		inputData, ok := dynBytes(params, elemBase+rel)
		if !ok {
			continue
		}
		// High bit (0x80) is the allow-revert flag; the low six bits are the command type
		switch cmd & 0x3f {
		case 0x00: // V3_SWAP_EXACT_IN
			// Format: recipient(addr), amountIn(uint256), amountOutMin(uint256), path(bytes), payerIsUser(bool)
			// The path offset is the 4th head word (bytes 96:128)
			pathOffset, ok := word(inputData, 96)
			if !ok {
				continue
			}
			// V3 path format: token(20) + fee(3) + token(20) [+ fee(3) + token(20) ...]
			path, ok := dynBytes(inputData, pathOffset)
			if ok && len(path) >= 43 {
				// First token in, last token out (also correct for multi-hop paths)
				return common.BytesToAddress(path[:20]), common.BytesToAddress(path[len(path)-20:]), nil
			}
		case 0x08: // V2_SWAP_EXACT_IN
			// Format: recipient(addr), amountIn(uint256), amountOutMin(uint256), path(addr[]), payerIsUser(bool)
			pathOffset, ok := word(inputData, 96)
			if !ok {
				continue
			}
			pathLength, ok := word(inputData, pathOffset)
			if !ok || pathLength < 2 || pathLength > (uint64(len(inputData))-pathOffset-32)/32 {
				continue
			}
			first := pathOffset + 32                 // first token
			last := first + (pathLength-1)*32        // last token
			return common.BytesToAddress(inputData[first : first+32]), common.BytesToAddress(inputData[last : last+32]), nil
		}
	}
	return common.Address{}, common.Address{}, fmt.Errorf("unsupported universal router commands (first: 0x%02x)", commands[0])
}
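The V3 path inside V3_SWAP_EXACT_IN is a packed byte string, not an ABI array. It is a 20-byte token address followed by repeating (3-byte fee, 20-byte token) hops, so a single-hop path is 43 bytes and an n-hop path is 20 + 23n bytes. In a multi-hop path, bytes 23:43 hold an intermediate token, so the pair actually swapped is the first and last address. Below is a stdlib-only sketch of a full decoder. decodeV3Path is hypothetical, and a raw [20]byte stands in for common.Address.

```go
package main

import (
	"encoding/hex"
	"errors"
)

// Address stands in for common.Address to keep the sketch stdlib-only.
type Address [20]byte

func (a Address) Hex() string { return "0x" + hex.EncodeToString(a[:]) }

// Hop is one pool traversal in a V3 path.
type Hop struct {
	TokenIn  Address
	Fee      uint32 // fee tier in hundredths of a basis point, e.g. 3000 = 0.30%
	TokenOut Address
}

const (
	addrLen = 20
	feeLen  = 3
	hopLen  = addrLen + feeLen
)

// decodeV3Path splits a packed path token(20) | fee(3) | token(20) | ... into hops.
func decodeV3Path(path []byte) ([]Hop, error) {
	if len(path) < addrLen+hopLen || (len(path)-addrLen)%hopLen != 0 {
		return nil, errors.New("malformed V3 path length")
	}
	var hops []Hop
	for off := 0; off+hopLen+addrLen <= len(path); off += hopLen {
		var h Hop
		copy(h.TokenIn[:], path[off:off+addrLen])
		f := path[off+addrLen : off+hopLen] // 3-byte big-endian fee
		h.Fee = uint32(f[0])<<16 | uint32(f[1])<<8 | uint32(f[2])
		copy(h.TokenOut[:], path[off+hopLen:off+hopLen+addrLen])
		hops = append(hops, h)
	}
	return hops, nil
}
```

Keeping the per-hop fee is optional for token extraction. It becomes useful once pool addresses are looked up, because V3 pools are keyed by token pair and fee tier.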
Update ExtractTokensFromCalldata to support UniversalRouter:
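// Requires imports: "encoding/hex", "fmt", "github.com/ethereum/go-ethereum/common".
func (p *ArbitrumL2Parser) ExtractTokensFromCalldata(calldata []byte) (token0, token1 common.Address, err error) {
	if len(calldata) < 4 {
		return common.Address{}, common.Address{}, fmt.Errorf("calldata too short")
	}
	functionSignature := hex.EncodeToString(calldata[:4])
	switch functionSignature {
	case "3593564c": // execute(bytes,bytes[],uint256) (UniversalRouter)
		return p.extractTokensFromUniversalRouter(calldata[4:])
	case "38ed1739": // swapExactTokensForTokens
		return p.extractTokensFromSwapExactTokensForTokens(calldata[4:])
	// ... rest of cases
	default:
		// Without a default return the function does not compile ("missing return")
		return common.Address{}, common.Address{}, fmt.Errorf("unsupported function signature: 0x%s", functionSignature)
	}
}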
Phase 4: Comprehensive Testing (30 minutes)
1. 5-Minute Production Run
   make build
   timeout 300 ./bin/mev-bot start
2. Expected Results
   - Success rate: 80-90% (up from 0%)
   - Valid events: ~120-150 per minute
   - Arbitrage opportunities: 1-5 per minute
   - Zero-address rejections: < 20%
3. Log Analysis
   # Count successes
   grep "Enhanced parsing success" logs/mev_bot.log | wc -l
   # Count rejections
   grep "REJECTED: Event with zero PoolAddress" logs/mev_bot.log | wc -l
   # Success rate = successes / (successes + rejections); should be > 80%
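The success rate in the log analysis can be computed from the two grep counts. This assumes every parse attempt ends in exactly one of those two log lines. That assumption is simplifying: other failure paths that log differently would inflate the rate. A small sketch with hypothetical helper names:

```go
package main

import "fmt"

// successRate returns successes / (successes + rejections) as a percentage.
// It assumes each parse attempt logs exactly one success or one rejection line.
func successRate(successes, rejections int) (float64, error) {
	if successes < 0 || rejections < 0 {
		return 0, fmt.Errorf("counts must be non-negative")
	}
	total := successes + rejections
	if total == 0 {
		return 0, fmt.Errorf("no parse attempts logged")
	}
	return 100 * float64(successes) / float64(total), nil
}

// meetsTarget compares the rate against the >80% Phase 4 target.
func meetsTarget(rate float64) bool { return rate > 80 }
```

For example, 850 successes and 150 rejections give 85%, which clears the Phase 4 target.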
🔧 Additional Fixes Needed
1. Add Pool Address Discovery
Currently, even with correct token extraction, PoolAddress is still zero because nothing looks up the pool contract for the extracted token pair.
Solution: Add pool address lookup after token extraction:
// In event parser after successful token extraction
if token0 != (common.Address{}) && token1 != (common.Address{}) {
// Query factory to get pool address
poolAddr := p.getPoolAddress(token0, token1, protocol)
event.PoolAddress = poolAddr
}
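getPoolAddress is not specified in this plan. One way to keep factory queries from dominating per-event latency is to memoize lookups keyed on the sorted token pair, protocol and fee tier, since Uniswap-style pools order their tokens so that token0 < token1. The sketch below assumes a hypothetical PoolLookup function that performs the real factory call (for example getPair on a V2 factory). It uses a [20]byte stand-in for common.Address. A zero result is treated as "no pool" and is not cached.

```go
package main

import (
	"bytes"
	"sync"
)

type Address [20]byte

// PoolLookup performs the on-chain factory query (hypothetical).
type PoolLookup func(protocol string, token0, token1 Address, fee uint32) (Address, error)

type poolKey struct {
	protocol       string
	token0, token1 Address
	fee            uint32
}

// PoolCache memoizes pool addresses so each pair is queried roughly once.
type PoolCache struct {
	mu     sync.Mutex
	pools  map[poolKey]Address
	lookup PoolLookup
}

func NewPoolCache(lookup PoolLookup) *PoolCache {
	return &PoolCache{pools: make(map[poolKey]Address), lookup: lookup}
}

// sortTokens orders two addresses so the lower one comes first.
func sortTokens(a, b Address) (Address, Address) {
	if bytes.Compare(a[:], b[:]) > 0 {
		return b, a
	}
	return a, b
}

// Get returns the cached pool address, calling lookup on a miss. Concurrent misses
// for the same key may both call lookup, which is harmless here.
func (c *PoolCache) Get(protocol string, tokenA, tokenB Address, fee uint32) (Address, error) {
	t0, t1 := sortTokens(tokenA, tokenB)
	key := poolKey{protocol, t0, t1, fee}
	c.mu.Lock()
	if addr, ok := c.pools[key]; ok {
		c.mu.Unlock()
		return addr, nil
	}
	c.mu.Unlock()
	addr, err := c.lookup(protocol, t0, t1, fee)
	if err != nil || addr == (Address{}) {
		return addr, err // do not cache failures or "no pool"
	}
	c.mu.Lock()
	c.pools[key] = addr
	c.mu.Unlock()
	return addr, nil
}
```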
2. Fix Event Creation Flow
File: pkg/events/parser.go
The event creation needs to properly use extracted tokens:
event := &Event{
Type: Swap,
Protocol: protocol,
PoolAddress: poolAddress, // ← Need to populate this
Token0: token0, // ← These come from extraction
Token1: token1, // ← These come from extraction
TransactionHash: txHash,
BlockNumber: blockNumber,
Timestamp: timestamp,
}
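The "REJECTED: Event with zero PoolAddress" log line implies a validation gate on these fields. The sketch below reports every zero field at once, so logs can show whether token extraction or pool discovery failed. The Event type here is a trimmed, hypothetical stand-in for the one in pkg/events/parser.go.

```go
package main

import (
	"errors"
	"strings"
)

type Address [20]byte

// Event is a trimmed stand-in holding only the address fields being validated.
type Event struct {
	PoolAddress    Address
	Token0, Token1 Address
}

// validateAddresses returns an error naming every zero address field, or nil.
func validateAddresses(e *Event) error {
	var zero []string
	if e.PoolAddress == (Address{}) {
		zero = append(zero, "PoolAddress")
	}
	if e.Token0 == (Address{}) {
		zero = append(zero, "Token0")
	}
	if e.Token1 == (Address{}) {
		zero = append(zero, "Token1")
	}
	if len(zero) > 0 {
		return errors.New("zero address in " + strings.Join(zero, ", "))
	}
	return nil
}
```

An event with valid tokens but a zero pool points to pool discovery rather than extraction, which keeps the two fixes separately measurable.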
📊 Success Metrics
Before Fix
- ❌ Success Rate: 0.00%
- ❌ Valid Events: 0/minute
- ❌ Opportunities: 0/minute
- ❌ Revenue: $0/day
After Fix (Expected)
- ✅ Success Rate: 80-90%
- ✅ Valid Events: 120-150/minute
- ✅ Opportunities: 1-5/minute
- ✅ Revenue: $100-1000/day (with execution)
⚠️ Risks & Mitigation
Risk 1: Complex Multicall Formats
- Impact: Some complex multicalls may still fail
- Mitigation: Add fallback to heuristic for unknown formats
- Acceptable: 10-20% failure rate for edge cases
Risk 2: UniversalRouter Command Variants
- Impact: Some UniversalRouter commands are not supported
- Mitigation: Add logging for unsupported commands; implement incrementally
- Acceptable: Cover 80%+ of commands (V3_SWAP, V2_SWAP, WRAP_ETH)
Risk 3: Protocol-Specific Differences
- Impact: Each DEX may have slight format variations
- Mitigation: Test against real transactions from logs
- Acceptable: 90%+ coverage of major DEXs (Uniswap, SushiSwap, TraderJoe, Camelot)
🚀 Deployment Plan
Step 1: Implement Core Fix (2 hours)
- Replace multicall extraction in L2 parser
- Add comprehensive logging
- Build and initial test
Step 2: Add UniversalRouter Support (1 hour)
- Implement execute() decoder
- Handle V3_SWAP_EXACT_IN and V2_SWAP_EXACT_IN
- Test with real Universal Router transactions
Step 3: Validate (30 minutes)
- Run 5-minute production test
- Analyze success rate (target: >80%)
- Check for any new error patterns
Step 4: Commit & Document (30 minutes)
- Commit changes with detailed message
- Update TODO_AUDIT_FIX.md
- Document any remaining issues
📝 Files to Modify
1. pkg/arbitrum/l2_parser.go (PRIMARY)
   - Replace extractTokensFromMulticallData() implementation
   - Add extractTokensFromUniversalRouter() method
   - Update ExtractTokensFromCalldata() with UniversalRouter case
   - Estimated changes: ~150 lines
2. pkg/events/parser.go (SECONDARY - if needed)
   - Verify token extractor is being called correctly
   - Add pool address lookup after extraction
   - Estimated changes: ~20 lines
3. pkg/arbitrum/l2_parser_test.go (NEW)
   - Add unit tests for multicall extraction
   - Test UniversalRouter decoding
   - Test with real transaction data from logs
   - Estimated: ~200 lines of tests
✅ Definition of Done
- extractTokensFromMulticallData() no longer calls broken multicall.go
- UniversalRouter execute() transactions are decoded correctly
- Success rate > 80% in 5-minute production run
- Zero address rejections < 20%
- At least 1 arbitrage opportunity detected per minute
- All changes committed with comprehensive message
- Documentation updated with findings
Next Steps: Begin implementation of Phase 1
Estimated Total Time: 3-4 hours
Priority: P0 - Must fix before any profit can be generated
Status: Ready to implement