- Added poolCache field to EventParser struct with PoolCache interface
- Modified getPoolTokens() to check the cache before returning zero addresses
- Created PoolCache interface in pkg/interfaces for clean separation
- Added debug logging to identify pools missing from cache
- Documented long-term architecture improvements in PARSER_ARCHITECTURE_IMPROVEMENTS.md

This fixes the critical issue where Uniswap V3 swap events would show zero addresses for tokens when transaction calldata was unavailable. The parser now falls back to the pool cache, which contains previously discovered pool information.

Benefits:
- Eliminates zero-address errors for known pools
- Reduces unnecessary RPC calls
- Provides visibility into which pools are missing from the cache
- Lays the foundation for a per-exchange parser architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Parser Architecture Improvements
Current Issue
Zero address tokens appearing in parsed events due to missing token data when transaction fetch fails.
Immediate Fix Applied (2025-11-09)
- Added pool cache to EventParser
- Parser now checks pool cache before returning zero addresses
- Logs when pools are missing from cache to identify parsing errors
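The fallback described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: `Address` stands in for go-ethereum's `common.Address`, and `mapCache`, `getPoolTokens`'s exact signature, and the `fromTx` parameter are hypothetical.

```go
package main

import "fmt"

// Address is a stand-in for go-ethereum's common.Address.
type Address [20]byte

var zeroAddress Address

// PoolInfo holds previously discovered token data for a pool.
type PoolInfo struct {
	Token0, Token1 Address
}

// PoolCache is a hypothetical shape for the cache interface.
type PoolCache interface {
	GetPool(addr Address) *PoolInfo
}

type mapCache map[Address]*PoolInfo

func (m mapCache) GetPool(addr Address) *PoolInfo { return m[addr] }

// getPoolTokens falls back to the cache when calldata-derived token
// data is unavailable, instead of returning zero addresses.
func getPoolTokens(cache PoolCache, pool Address, fromTx *PoolInfo) (Address, Address, bool) {
	if fromTx != nil && fromTx.Token0 != zeroAddress {
		return fromTx.Token0, fromTx.Token1, true
	}
	if cached := cache.GetPool(pool); cached != nil {
		return cached.Token0, cached.Token1, true
	}
	// Pool missing from cache: the caller logs this for visibility.
	return zeroAddress, zeroAddress, false
}

func main() {
	pool := Address{0x01}
	cache := mapCache{pool: {Token0: Address{0xAA}, Token1: Address{0xBB}}}
	t0, t1, ok := getPoolTokens(cache, pool, nil)
	fmt.Println(ok, t0 != zeroAddress, t1 != zeroAddress)
}
```

The `ok` return lets the caller distinguish "resolved from cache" from "genuinely unknown pool", which drives the debug logging mentioned above.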
Proposed Long-term Architecture Improvements
1. Individual Parsers Per Exchange Type
Current: Single monolithic EventParser handles all DEX types.
Proposed: Factory pattern with exchange-specific parsers.
```go
type ExchangeParser interface {
    ParseEvent(log *types.Log, tx *types.Transaction) (*Event, error)
    ValidateEvent(event *Event) error
}

type UniswapV2Parser struct{}
type UniswapV3Parser struct{}
type SushiSwapParser struct{}
type CurveParser struct{}
```
Benefits:
- Cleaner code with focused responsibility
- Easier to add new DEX types
- Better testability
- Exchange-specific optimizations
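A factory that routes each log to the right parser could look like the sketch below. All names here are illustrative: `Log`, `Event`, and the `Topic0` routing keys are simplified stand-ins (the real factory would key on actual event signature hashes).

```go
package main

import "fmt"

// Protocol identifies a DEX type (hypothetical enum).
type Protocol int

const (
	UniswapV2 Protocol = iota
	UniswapV3
)

// Log and Event are simplified stand-ins for the real types.
type Log struct{ Topic0 string }
type Event struct{ Protocol Protocol }

// ExchangeParser mirrors the proposed interface, reduced for the sketch.
type ExchangeParser interface {
	ParseEvent(log *Log) (*Event, error)
}

type UniswapV2Parser struct{}

func (UniswapV2Parser) ParseEvent(l *Log) (*Event, error) { return &Event{Protocol: UniswapV2}, nil }

type UniswapV3Parser struct{}

func (UniswapV3Parser) ParseEvent(l *Log) (*Event, error) { return &Event{Protocol: UniswapV3}, nil }

// ParserFactory routes logs to the right parser by event signature.
type ParserFactory struct {
	bySignature map[string]ExchangeParser
}

func NewParserFactory() *ParserFactory {
	return &ParserFactory{bySignature: map[string]ExchangeParser{
		// Placeholder keys; real code would use keccak event signatures.
		"swap_v2": UniswapV2Parser{},
		"swap_v3": UniswapV3Parser{},
	}}
}

func (f *ParserFactory) Parse(l *Log) (*Event, error) {
	p, ok := f.bySignature[l.Topic0]
	if !ok {
		return nil, fmt.Errorf("no parser for topic %s", l.Topic0)
	}
	return p.ParseEvent(l)
}

func main() {
	ev, err := NewParserFactory().Parse(&Log{Topic0: "swap_v3"})
	fmt.Println(ev.Protocol == UniswapV3, err)
}
```

Adding a new DEX then means implementing one interface and registering one map entry, rather than growing a monolithic switch.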
2. Background Pool Data Validation Channel
Proposed: Separate goroutine for pool state validation and updates
```go
type PoolValidationEvent struct {
    PoolAddress   common.Address
    ParsedData    *PoolData
    CachedData    *PoolData
    Changed       bool
    ChangedFields []string
}

// Background validation
func (p *Parser) validatePoolData(ctx context.Context) {
    for event := range p.poolValidationChan {
        cached := p.poolCache.GetPool(event.PoolAddress)
        if cached != nil {
            // Validate parsed data against cache
            if event.ParsedData.Token0 != cached.Token0 {
                p.logger.Warn("Token0 mismatch",
                    "pool", event.PoolAddress,
                    "parsed", event.ParsedData.Token0,
                    "cached", cached.Token0)
            }
            // Log ALL discrepancies
        }
        // Update cache with latest data
        p.poolCache.Update(event.PoolAddress, event.ParsedData)
    }
}
```
Benefits:
- Real-time validation of parsing accuracy
- Identifies when sequencer data changes
- Helps catch parsing bugs immediately
- Non-blocking - doesn't slow down main parsing
- Audit trail of pool state changes
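The "non-blocking" property hinges on how the parse path enqueues work: a `select` with a `default` branch drops the event when the channel buffer is full, so validation backpressure can never stall parsing. A minimal sketch (the `enqueueValidation` helper and drop counter are assumptions, not existing code):

```go
package main

import "fmt"

// PoolValidationEvent is reduced to one field for the sketch.
type PoolValidationEvent struct{ Pool string }

// enqueueValidation sends without blocking the hot parse path:
// if the channel buffer is full, the event is dropped and counted.
func enqueueValidation(ch chan PoolValidationEvent, ev PoolValidationEvent, dropped *int) bool {
	select {
	case ch <- ev:
		return true
	default:
		*dropped++ // buffer full; validation is best-effort
		return false
	}
}

func main() {
	ch := make(chan PoolValidationEvent, 1) // tiny buffer to force a drop
	var dropped int
	fmt.Println(enqueueValidation(ch, PoolValidationEvent{"a"}, &dropped)) // true
	fmt.Println(enqueueValidation(ch, PoolValidationEvent{"b"}, &dropped)) // false: buffer full
	fmt.Println(dropped)                                                   // 1
}
```

Dropped validation events should feed a metric, since a persistently full channel means the validator goroutine is falling behind.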
3. Pool Data Validation Against Cache
Current: Parse data, submit event, hope it's correct.
Proposed: Validate parsed data against known good cache data.
```go
func (p *Parser) validateAndEnrichEvent(event *Event) error {
    // If pool is in cache, validate parsed data
    if cached := p.poolCache.GetPool(event.PoolAddress); cached != nil {
        validationErrors := []string{}

        // Validate Token0
        if event.Token0 != cached.Token0 && event.Token0 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token0 mismatch: parsed=%s, cached=%s",
                    event.Token0, cached.Token0))
        }
        // Validate Token1
        if event.Token1 != cached.Token1 && event.Token1 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token1 mismatch: parsed=%s, cached=%s",
                    event.Token1, cached.Token1))
        }
        // Validate Fee
        if event.Fee != cached.Fee && event.Fee != 0 {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Fee mismatch: parsed=%d, cached=%d",
                    event.Fee, cached.Fee))
        }

        if len(validationErrors) > 0 {
            p.logger.Error("Event validation failed",
                "pool", event.PoolAddress,
                "errors", validationErrors)
            return fmt.Errorf("validation errors: %v", validationErrors)
        }

        // Enrich event with cached data if parsed data is missing
        if event.Token0 == (common.Address{}) {
            event.Token0 = cached.Token0
        }
        if event.Token1 == (common.Address{}) {
            event.Token1 = cached.Token1
        }
    }
    return nil
}
```
Benefits:
- Self-healing: fixes missing data from cache
- Detects parsing errors immediately
- Provides confidence in parsed data
- Creates audit trail of validation failures
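The core invariant of the validate-and-enrich step is: a zero field is filled from cache, a non-zero field must agree with cache. This can be exercised in isolation with a stripped-down version (all types here are local stand-ins for the sketch, not the real structs):

```go
package main

import "fmt"

// Address is a stand-in for common.Address.
type Address [20]byte

var zero Address

type Event struct{ Token0, Token1 Address }
type PoolInfo struct{ Token0, Token1 Address }

// enrich fills missing (zero) token fields from cached data and
// reports a mismatch when both sides are set but disagree.
func enrich(ev *Event, cached *PoolInfo) error {
	if ev.Token0 != zero && ev.Token0 != cached.Token0 {
		return fmt.Errorf("Token0 mismatch")
	}
	if ev.Token0 == zero {
		ev.Token0 = cached.Token0
	}
	if ev.Token1 != zero && ev.Token1 != cached.Token1 {
		return fmt.Errorf("Token1 mismatch")
	}
	if ev.Token1 == zero {
		ev.Token1 = cached.Token1
	}
	return nil
}

func main() {
	cached := &PoolInfo{Token0: Address{1}, Token1: Address{2}}
	ev := &Event{} // both tokens missing from the parsed event
	err := enrich(ev, cached)
	fmt.Println(err == nil, ev.Token0 == cached.Token0) // enriched from cache
}
```

Keeping the mismatch check before the fill means a wrong-but-nonzero parse is surfaced as an error rather than silently overwritten.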
4. Fast Mapping for Pool Retrieval
Current: Already implemented with PoolCache using map[common.Address]*PoolInfo
Optimization: Add multi-index lookups
```go
type PoolCache struct {
    byAddress       map[common.Address]*PoolInfo
    byTokenPair     map[string][]*PoolInfo // "token0-token1", addresses sorted
    byProtocol      map[Protocol][]*PoolInfo
    byLiquidityRank []common.Address // sorted by liquidity
}

// O(1) lookups for all access patterns (result slices are maintained on insert)
func (c *PoolCache) GetByAddress(addr common.Address) *PoolInfo
func (c *PoolCache) GetByTokenPair(t0, t1 common.Address) []*PoolInfo
func (c *PoolCache) GetByProtocol(protocol Protocol) []*PoolInfo
func (c *PoolCache) GetTopByLiquidity(limit int) []*PoolInfo
```
Benefits:
- O(1) lookups for all common access patterns
- Faster arbitrage path finding
- Better pool discovery
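The `byTokenPair` index only works if `(A, B)` and `(B, A)` produce the same key, so the key must be built from the sorted addresses. A sketch of that canonicalization (the `pairKey` helper is an assumption; `Address` stands in for `common.Address`):

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// Address is a stand-in for common.Address.
type Address [20]byte

// pairKey builds a canonical "token0-token1" key by sorting the two
// addresses, so (A, B) and (B, A) map to the same cache bucket.
func pairKey(a, b Address) string {
	if bytes.Compare(a[:], b[:]) > 0 {
		a, b = b, a
	}
	return hex.EncodeToString(a[:]) + "-" + hex.EncodeToString(b[:])
}

func main() {
	a, b := Address{0x01}, Address{0x02}
	fmt.Println(pairKey(a, b) == pairKey(b, a)) // true: order-independent
}
```

This matches the "token0-token1 sorted" convention noted in the struct above and is the same ordering Uniswap itself uses when assigning token0/token1.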
5. Comprehensive Logging for Debugging
```go
type ParsingMetrics struct {
    TotalEvents        int64
    SuccessfulParses   int64
    FailedParses       int64
    ZeroAddressCount   int64
    ValidationFailures int64
    CacheHits          int64
    CacheMisses        int64
}

// pct guards against division by zero before any events arrive.
func pct(part, total int64) float64 {
    if total == 0 {
        return 0
    }
    return float64(part) / float64(total) * 100
}

func (p *Parser) logParsingMetrics() {
    p.logger.Info("Parsing metrics",
        "total", p.metrics.TotalEvents,
        "success_rate", pct(p.metrics.SuccessfulParses, p.metrics.TotalEvents),
        "zero_address_rate", pct(p.metrics.ZeroAddressCount, p.metrics.TotalEvents),
        "cache_hit_rate", pct(p.metrics.CacheHits, p.metrics.CacheHits+p.metrics.CacheMisses),
        "validation_failure_rate", pct(p.metrics.ValidationFailures, p.metrics.TotalEvents))
}
```
Implementation Roadmap
Phase 1: Immediate (Current)
- ✅ Add pool cache to parser
- ✅ Log missing pools
- ✅ Check cache before returning zero addresses
Phase 2: Validation (Next)
- Add validation channel
- Implement background validator goroutine
- Add validation metrics
- Create alerting for validation failures
Phase 3: Per-Exchange Parsers
- Create ExchangeParser interface
- Implement UniswapV2Parser
- Implement UniswapV3Parser
- Migrate existing code
- Add parser factory
Phase 4: Advanced Features
- Multi-index pool cache
- Historical state tracking
- Anomaly detection
- Performance profiling
Expected Benefits
Immediate
- ✅ Fewer zero address errors
- ✅ Better debugging visibility
- ✅ Reduced RPC calls (use cache)
After Full Implementation
- 99%+ parsing accuracy
- Self-healing parser that fixes missing data
- Real-time detection of parsing issues
- Complete audit trail for troubleshooting
- Faster arbitrage detection
- Easier to add new DEXes
Metrics to Track
- Parsing Accuracy
  - Zero address rate (target: < 0.1%)
  - Validation failure rate (target: < 0.5%)
  - Cache hit rate (target: > 95%)
- Performance
  - Parse time per event (target: < 1ms)
  - Cache lookup time (target: < 0.1ms)
  - Validation overhead (target: < 10%)
- Reliability
  - Data discrepancy rate (target: < 0.1%)
  - Parser error rate (target: < 0.01%)
  - Event drop rate (target: 0%)
Status: Phase 1 completed 2025-11-09
Next: Implement Phase 2 (validation channel)