# Parser Architecture Improvements

## Current Issue

Parsed events contain zero-address tokens because token data is missing when the transaction fetch fails.

## Immediate Fix Applied (2025-11-09)

- Added a pool cache to `EventParser`
- The parser now checks the pool cache before returning zero addresses
- Pools missing from the cache are logged, to help identify parsing errors

## Proposed Long-term Architecture Improvements

### 1. Individual Parsers Per Exchange Type

**Current:** A single monolithic `EventParser` handles all DEX types.

**Proposed:** A factory pattern with exchange-specific parsers.

```go
import "github.com/ethereum/go-ethereum/core/types"

// ExchangeParser is implemented once per exchange type; a factory selects the
// implementation for each log. Event is the parser's own event type.
type ExchangeParser interface {
	// ParseEvent decodes a single log from the given transaction into an Event.
	ParseEvent(log *types.Log, tx *types.Transaction) (*Event, error)
	// ValidateEvent checks a parsed event before it is submitted downstream.
	ValidateEvent(event *Event) error
}

// One parser per exchange type.
type UniswapV2Parser struct{}
type UniswapV3Parser struct{}
type SushiSwapParser struct{}
type CurveParser struct{}
```

**Benefits:**

- Cleaner code with focused responsibility
- Easier to add new DEX types
- Better testability
- Exchange-specific optimizations

---

### 2. Background Pool Data Validation Channel

**Proposed:** A separate goroutine for pool state validation and updates.

```go
import (
	"context"

	"github.com/ethereum/go-ethereum/common"
)

// PoolValidationEvent carries freshly parsed pool data to the background validator.
type PoolValidationEvent struct {
	PoolAddress   common.Address
	ParsedData    *PoolData
	CachedData    *PoolData
	Changed       bool
	ChangedFields []string
}

// validatePoolData runs in its own goroutine. It compares parsed pool data with
// the cache, logs discrepancies, then updates the cache. It returns when ctx is
// cancelled or poolValidationChan is closed.
func (p *Parser) validatePoolData(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case event, ok := <-p.poolValidationChan:
			if !ok {
				return
			}
			if cached := p.poolCache.GetPool(event.PoolAddress); cached != nil {
				// Validate parsed data against the cache
				if event.ParsedData.Token0 != cached.Token0 {
					p.logger.Warn("Token0 mismatch",
						"pool", event.PoolAddress,
						"parsed", event.ParsedData.Token0,
						"cached", cached.Token0)
				}
				// Log ALL discrepancies (Token1, Fee, ...) the same way
			}
			// Update the cache with the latest data
			p.poolCache.Update(event.PoolAddress, event.ParsedData)
		}
	}
}
```

**Benefits:**

- Real-time validation of parsing accuracy
- Identifies when sequencer data changes
- Helps catch parsing bugs immediately
- Non-blocking: does not slow down the main parsing path
- Audit trail of pool state changes

---
### 3. Pool Data Validation Against Cache

**Current:** Parse the data, submit the event, and hope it's correct.

**Proposed:** Validate parsed data against known-good cache data.

```go
import (
	"fmt"

	"github.com/ethereum/go-ethereum/common"
)

// validateAndEnrichEvent checks a parsed event against the pool cache, then fills
// in tokens the parser could not recover. Zero values mean "not parsed" and are
// never reported as mismatches.
func (p *Parser) validateAndEnrichEvent(event *Event) error {
	cached := p.poolCache.GetPool(event.PoolAddress)
	if cached == nil {
		return nil // pool unknown: nothing to validate against
	}

	var zero common.Address
	var validationErrors []string

	// Validate Token0, Token1 and Fee
	if event.Token0 != zero && event.Token0 != cached.Token0 {
		validationErrors = append(validationErrors,
			fmt.Sprintf("Token0 mismatch: parsed=%s, cached=%s", event.Token0, cached.Token0))
	}
	if event.Token1 != zero && event.Token1 != cached.Token1 {
		validationErrors = append(validationErrors,
			fmt.Sprintf("Token1 mismatch: parsed=%s, cached=%s", event.Token1, cached.Token1))
	}
	if event.Fee != 0 && event.Fee != cached.Fee {
		validationErrors = append(validationErrors,
			fmt.Sprintf("Fee mismatch: parsed=%d, cached=%d", event.Fee, cached.Fee))
	}

	if len(validationErrors) > 0 {
		p.logger.Error("Event validation failed",
			"pool", event.PoolAddress,
			"errors", validationErrors)
		return fmt.Errorf("validation errors: %v", validationErrors)
	}

	// Enrich the event with cached data where parsed data is missing
	if event.Token0 == zero {
		event.Token0 = cached.Token0
	}
	if event.Token1 == zero {
		event.Token1 = cached.Token1
	}
	return nil
}
```

**Benefits:**

- Self-healing: fills in missing data from the cache
- Detects parsing errors immediately
- Provides confidence in parsed data
- Creates an audit trail of validation failures

---

### 4. Fast Mapping for Pool Retrieval

**Current:** Already implemented: `PoolCache` uses a `map[common.Address]*PoolInfo`.

**Optimization:** Add multi-index lookups.

```go
import "github.com/ethereum/go-ethereum/common"

// PoolCache indexes the same *PoolInfo values several ways, so each keyed
// access pattern is a single map lookup.
type PoolCache struct {
	byAddress       map[common.Address]*PoolInfo
	byTokenPair     map[string][]*PoolInfo // key "token0-token1", addresses sorted
	byProtocol      map[Protocol][]*PoolInfo
	byLiquidityRank []common.Address // sorted by liquidity, highest first
}

// Lookup methods (signatures only; bodies omitted):
//
//	func (c *PoolCache) GetByAddress(addr common.Address) *PoolInfo
//	func (c *PoolCache) GetByTokenPair(t0, t1 common.Address) []*PoolInfo
//	func (c *PoolCache) GetByProtocol(protocol Protocol) []*PoolInfo
//	func (c *PoolCache) GetTopByLiquidity(limit int) []*PoolInfo
```

**Benefits:**

- Average O(1) map lookups by address, token pair, and protocol; the top-N liquidity query reads a prefix of a pre-sorted slice (O(limit)), at the cost of keeping every index in sync on updates
- Faster arbitrage path finding
- Better pool discovery

---

### 5. Comprehensive Logging for Debugging

```go
// ParsingMetrics counts parser outcomes. If these counters are updated from
// several goroutines, access them through sync/atomic.
type ParsingMetrics struct {
	TotalEvents        int64
	SuccessfulParses   int64
	FailedParses       int64
	ZeroAddressCount   int64
	ValidationFailures int64
	CacheHits          int64
	CacheMisses        int64
	DataDiscrepancies  int64
}

// pct returns num/den as a percentage, or 0 when den is 0, so the log shows 0
// instead of NaN before any events have been counted.
func pct(num, den int64) float64 {
	if den == 0 {
		return 0
	}
	return float64(num) / float64(den) * 100
}

func (p *Parser) logParsingMetrics() {
	m := p.metrics
	p.logger.Info("Parsing metrics",
		"total", m.TotalEvents,
		"success_rate", pct(m.SuccessfulParses, m.TotalEvents),
		"zero_address_rate", pct(m.ZeroAddressCount, m.TotalEvents),
		"cache_hit_rate", pct(m.CacheHits, m.CacheHits+m.CacheMisses),
		"validation_failure_rate", pct(m.ValidationFailures, m.TotalEvents))
}
```

---

## Implementation Roadmap

### Phase 1: Immediate (Current)

- ✅ Add pool cache to parser
- ✅ Log missing pools
- ✅ Check cache before returning zero addresses

### Phase 2: Validation (Next)

- [ ] Add validation channel
- [ ] Implement background validator goroutine
- [ ] Add validation metrics
- [ ] Create alerting for validation failures

### Phase 3: Per-Exchange Parsers

- [ ] Create `ExchangeParser` interface
- [ ] Implement `UniswapV2Parser`
- [ ] Implement `UniswapV3Parser`
- [ ] Migrate existing code
- [ ] Add parser factory

### Phase 4: Advanced Features

- [ ] Multi-index pool cache
- [ ] Historical state tracking
- [ ] Anomaly detection
- [ ] Performance profiling

---

## Expected Benefits

### Immediate

- ✅ Fewer zero-address errors
- ✅ Better debugging visibility
- ✅ Reduced RPC calls (data served from the cache)

### After Full Implementation

- 99%+ parsing accuracy
- Self-healing parser that fills in missing data
- Real-time detection of parsing issues
- Complete audit trail for troubleshooting
- Faster arbitrage detection
- Easier to add new DEXes

---

## Metrics to Track

1. **Parsing Accuracy**
   - Zero-address rate (target: < 0.1%)
   - Validation failure rate (target: < 0.5%)
   - Cache hit rate (target: > 95%)
2. **Performance**
   - Parse time per event (target: < 1 ms)
   - Cache lookup time (target: < 0.1 ms)
   - Validation overhead (target: < 10%)
3. **Reliability**
   - Data discrepancy rate (target: < 0.1%)
   - Parser error rate (target: < 0.01%)
   - Event drop rate (target: 0%)

---

**Status:** Phase 1 completed 2025-11-09

**Next:** Implement Phase 2 (validation channel)