Files
mev-beta/docs/PARSER_ARCHITECTURE_IMPROVEMENTS.md
Administrator e02ded0a6a fix: use pool cache to avoid zero addresses in Uniswap V3 parsing
- Added poolCache field to EventParser struct with PoolCache interface
- Modified getPoolTokens() to check cache before returning zero addresses
- Created PoolCache interface in pkg/interfaces for clean separation
- Added debug logging to identify pools missing from cache
- Documented long-term architecture improvements in PARSER_ARCHITECTURE_IMPROVEMENTS.md

This fixes the critical issue where Uniswap V3 swap events would show zero
addresses for tokens when transaction calldata was unavailable. The parser
now falls back to the pool cache which contains previously discovered pool
information.

Benefits:
- Eliminates zero address errors for known pools
- Reduces unnecessary RPC calls
- Provides visibility into which pools are missing from cache
- Lays foundation for per-exchange parser architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 09:59:37 +01:00

7.4 KiB

Parser Architecture Improvements

Current Issue

Zero address tokens appearing in parsed events due to missing token data when transaction fetch fails.

Immediate Fix Applied (2025-11-09)

  • Added pool cache to EventParser
  • Parser now checks pool cache before returning zero addresses
  • Logs when pools are missing from cache to identify parsing errors

Proposed Long-term Architecture Improvements

1. Individual Parsers Per Exchange Type

Current: Single monolithic EventParser handles all DEX types Proposed: Factory pattern with exchange-specific parsers

type ExchangeParser interface {
    ParseEvent(log *types.Log, tx *types.Transaction) (*Event, error)
    ValidateEvent(event *Event) error
}

type UniswapV2Parser struct {}
type UniswapV3Parser struct {}
type SushiSwapParser struct {}
type CurveParser struct {}

Benefits:

  • Cleaner code with focused responsibility
  • Easier to add new DEX types
  • Better testability
  • Exchange-specific optimizations

2. Background Pool Data Validation Channel

Proposed: Separate goroutine for pool state validation and updates

type PoolValidationEvent struct {
    PoolAddress  common.Address
    ParsedData   *PoolData
    CachedData   *PoolData
    Changed      bool
    ChangedFields []string
}

// Background validation
func (p *Parser) validatePoolData(ctx context.Context) {
    for event := range p.poolValidationChan {
        cached := p.poolCache.GetPool(event.PoolAddress)
        if cached != nil {
            // Validate parsed data against cache
            if event.ParsedData.Token0 != cached.Token0 {
                p.logger.Warn("Token0 mismatch",
                    "pool", event.PoolAddress,
                    "parsed", event.ParsedData.Token0,
                    "cached", cached.Token0)
            }
            // Log ALL discrepancies
        }
        // Update cache with latest data
        p.poolCache.Update(event.PoolAddress, event.ParsedData)
    }
}

Benefits:

  • Real-time validation of parsing accuracy
  • Identifies when sequencer data changes
  • Helps catch parsing bugs immediately
  • Non-blocking - doesn't slow down main parsing
  • Audit trail of pool state changes

3. Pool Data Validation Against Cache

Current: Parse data, submit event, hope it's correct Proposed: Validate parsed data against known good cache data

func (p *Parser) validateAndEnrichEvent(event *Event) error {
    // If pool is in cache, validate parsed data
    if cached := p.poolCache.GetPool(event.PoolAddress); cached != nil {
        validationErrors := []string{}

        // Validate Token0
        if event.Token0 != cached.Token0 && event.Token0 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token0 mismatch: parsed=%s, cached=%s",
                    event.Token0, cached.Token0))
        }

        // Validate Token1
        if event.Token1 != cached.Token1 && event.Token1 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token1 mismatch: parsed=%s, cached=%s",
                    event.Token1, cached.Token1))
        }

        // Validate Fee
        if event.Fee != cached.Fee && event.Fee != 0 {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Fee mismatch: parsed=%d, cached=%d",
                    event.Fee, cached.Fee))
        }

        if len(validationErrors) > 0 {
            p.logger.Error("Event validation failed",
                "pool", event.PoolAddress,
                "errors", validationErrors)
            return fmt.Errorf("validation errors: %v", validationErrors)
        }

        // Enrich event with cached data if parsed data is missing
        if event.Token0 == (common.Address{}) {
            event.Token0 = cached.Token0
        }
        if event.Token1 == (common.Address{}) {
            event.Token1 = cached.Token1
        }
    }

    return nil
}

Benefits:

  • Self-healing: fixes missing data from cache
  • Detects parsing errors immediately
  • Provides confidence in parsed data
  • Creates audit trail of validation failures

4. Fast Mapping for Pool Retrieval

Current: Already implemented with PoolCache using map[common.Address]*PoolInfo

Optimization: Add multi-index lookups

type PoolCache struct {
    byAddress      map[common.Address]*PoolInfo
    byTokenPair    map[string][]*PoolInfo // "token0-token1" sorted
    byProtocol     map[Protocol][]*PoolInfo
    byLiquidityRank []common.Address // Sorted by liquidity
}

// O(1) lookups for all access patterns
func (c *PoolCache) GetByAddress(addr common.Address) *PoolInfo
func (c *PoolCache) GetByTokenPair(t0, t1 common.Address) []*PoolInfo
func (c *PoolCache) GetByProtocol(protocol Protocol) []*PoolInfo
func (c *PoolCache) GetTopByLiquidity(limit int) []*PoolInfo

Benefits:

  • O(1) lookups for all common access patterns
  • Faster arbitrage path finding
  • Better pool discovery

5. Comprehensive Logging for Debugging

type ParsingMetrics struct {
    TotalEvents           int64
    SuccessfulParses      int64
    FailedParses          int64
    ZeroAddressCount      int64
    ValidationFailures    int64
    CacheHits             int64
    CacheMisses           int64
    DataDiscrepancies     int64
}

func (p *Parser) logParsingMetrics() {
    p.logger.Info("Parsing metrics",
        "total", p.metrics.TotalEvents,
        "success_rate", float64(p.metrics.SuccessfulParses)/float64(p.metrics.TotalEvents)*100,
        "zero_address_rate", float64(p.metrics.ZeroAddressCount)/float64(p.metrics.TotalEvents)*100,
        "cache_hit_rate", float64(p.metrics.CacheHits)/float64(p.metrics.CacheHits+p.metrics.CacheMisses)*100,
        "validation_failure_rate", float64(p.metrics.ValidationFailures)/float64(p.metrics.TotalEvents)*100)
}

Implementation Roadmap

Phase 1: Immediate (Current)

  • Add pool cache to parser
  • Log missing pools
  • Check cache before returning zero addresses

Phase 2: Validation (Next)

  • Add validation channel
  • Implement background validator goroutine
  • Add validation metrics
  • Create alerting for validation failures

Phase 3: Per-Exchange Parsers

  • Create ExchangeParser interface
  • Implement UniswapV2Parser
  • Implement UniswapV3Parser
  • Migrate existing code
  • Add parser factory

Phase 4: Advanced Features

  • Multi-index pool cache
  • Historical state tracking
  • Anomaly detection
  • Performance profiling

Expected Benefits

Immediate

  • Fewer zero address errors
  • Better debugging visibility
  • Reduced RPC calls (use cache)

After Full Implementation

  • 99%+ parsing accuracy
  • Self-healing parser that fixes missing data
  • Real-time detection of parsing issues
  • Complete audit trail for troubleshooting
  • Faster arbitrage detection
  • Easier to add new DEXes

Metrics to Track

  1. Parsing Accuracy

    • Zero address rate (target: < 0.1%)
    • Validation failure rate (target: < 0.5%)
    • Cache hit rate (target: > 95%)
  2. Performance

    • Parse time per event (target: < 1ms)
    • Cache lookup time (target: < 0.1ms)
    • Validation overhead (target: < 10%)
  3. Reliability

    • Data discrepancy rate (target: < 0.1%)
    • Parser error rate (target: < 0.01%)
    • Event drop rate (target: 0%)

Status: Phase 1 completed 2025-11-09 Next: Implement Phase 2 (validation channel)