- Added poolCache field to EventParser struct with PoolCache interface
- Modified getPoolTokens() to check the cache before returning zero addresses
- Created PoolCache interface in pkg/interfaces for clean separation
- Added debug logging to identify pools missing from cache
- Documented long-term architecture improvements in PARSER_ARCHITECTURE_IMPROVEMENTS.md

This fixes the critical issue where Uniswap V3 swap events would show zero addresses for tokens when transaction calldata was unavailable. The parser now falls back to the pool cache, which contains previously discovered pool information.

Benefits:
- Eliminates zero-address errors for known pools
- Reduces unnecessary RPC calls
- Provides visibility into which pools are missing from the cache
- Lays the foundation for a per-exchange parser architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Parser Architecture Improvements
Current Issue
Zero address tokens appearing in parsed events due to missing token data when transaction fetch fails.
Immediate Fix Applied (2025-11-09)
- Added pool cache to EventParser
- Parser now checks pool cache before returning zero addresses
- Logs when pools are missing from cache to identify parsing errors
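The fallback described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: `Address` stands in for go-ethereum's `common.Address`, and `mapCache`, `getPoolTokens`'s exact signature, and the `fromTx` parameter are hypothetical.

```go
package main

import "fmt"

// Address is a stand-in for go-ethereum's common.Address.
type Address [20]byte

var zeroAddress Address

// PoolInfo holds previously discovered token data for a pool.
type PoolInfo struct {
	Token0, Token1 Address
}

// PoolCache is a hypothetical shape for the cache interface.
type PoolCache interface {
	GetPool(addr Address) *PoolInfo
}

type mapCache map[Address]*PoolInfo

func (m mapCache) GetPool(addr Address) *PoolInfo { return m[addr] }

// getPoolTokens falls back to the cache when calldata-derived token
// data is unavailable, instead of returning zero addresses.
func getPoolTokens(cache PoolCache, pool Address, fromTx *PoolInfo) (Address, Address, bool) {
	if fromTx != nil && fromTx.Token0 != zeroAddress {
		return fromTx.Token0, fromTx.Token1, true
	}
	if cached := cache.GetPool(pool); cached != nil {
		return cached.Token0, cached.Token1, true
	}
	// Pool missing from cache: the caller logs this for visibility.
	return zeroAddress, zeroAddress, false
}

func main() {
	pool := Address{0x01}
	cache := mapCache{pool: {Token0: Address{0xAA}, Token1: Address{0xBB}}}
	t0, t1, ok := getPoolTokens(cache, pool, nil)
	fmt.Println(ok, t0 != zeroAddress, t1 != zeroAddress)
}
```

The `ok` return lets the caller distinguish "resolved from cache" from "genuinely unknown pool", which drives the debug logging mentioned above.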
Proposed Long-term Architecture Improvements
1. Individual Parsers Per Exchange Type
Current: Single monolithic EventParser handles all DEX types.
Proposed: Factory pattern with exchange-specific parsers.
```go
type ExchangeParser interface {
    ParseEvent(log *types.Log, tx *types.Transaction) (*Event, error)
    ValidateEvent(event *Event) error
}

type UniswapV2Parser struct{}
type UniswapV3Parser struct{}
type SushiSwapParser struct{}
type CurveParser struct{}
```
Benefits:
- Cleaner code with focused responsibility
- Easier to add new DEX types
- Better testability
- Exchange-specific optimizations
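A factory that routes each log to the right parser could look like the sketch below. All names here are illustrative: `Log`, `Event`, and the `Topic0` routing keys are simplified stand-ins (the real factory would key on actual event signature hashes).

```go
package main

import "fmt"

// Protocol identifies a DEX type (hypothetical enum).
type Protocol int

const (
	UniswapV2 Protocol = iota
	UniswapV3
)

// Log and Event are simplified stand-ins for the real types.
type Log struct{ Topic0 string }
type Event struct{ Protocol Protocol }

// ExchangeParser mirrors the proposed interface, reduced for the sketch.
type ExchangeParser interface {
	ParseEvent(log *Log) (*Event, error)
}

type UniswapV2Parser struct{}

func (UniswapV2Parser) ParseEvent(l *Log) (*Event, error) { return &Event{Protocol: UniswapV2}, nil }

type UniswapV3Parser struct{}

func (UniswapV3Parser) ParseEvent(l *Log) (*Event, error) { return &Event{Protocol: UniswapV3}, nil }

// ParserFactory routes logs to the right parser by event signature.
type ParserFactory struct {
	bySignature map[string]ExchangeParser
}

func NewParserFactory() *ParserFactory {
	return &ParserFactory{bySignature: map[string]ExchangeParser{
		// Placeholder keys; real code would use keccak event signatures.
		"swap_v2": UniswapV2Parser{},
		"swap_v3": UniswapV3Parser{},
	}}
}

func (f *ParserFactory) Parse(l *Log) (*Event, error) {
	p, ok := f.bySignature[l.Topic0]
	if !ok {
		return nil, fmt.Errorf("no parser for topic %s", l.Topic0)
	}
	return p.ParseEvent(l)
}

func main() {
	ev, err := NewParserFactory().Parse(&Log{Topic0: "swap_v3"})
	fmt.Println(ev.Protocol == UniswapV3, err)
}
```

Adding a new DEX then means implementing one interface and registering one map entry, rather than growing a monolithic switch.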
2. Background Pool Data Validation Channel
Proposed: Separate goroutine for pool state validation and updates
```go
type PoolValidationEvent struct {
    PoolAddress   common.Address
    ParsedData    *PoolData
    CachedData    *PoolData
    Changed       bool
    ChangedFields []string
}

// Background validation
func (p *Parser) validatePoolData(ctx context.Context) {
    for event := range p.poolValidationChan {
        cached := p.poolCache.GetPool(event.PoolAddress)
        if cached != nil {
            // Validate parsed data against cache
            if event.ParsedData.Token0 != cached.Token0 {
                p.logger.Warn("Token0 mismatch",
                    "pool", event.PoolAddress,
                    "parsed", event.ParsedData.Token0,
                    "cached", cached.Token0)
            }
            // Log ALL discrepancies
        }
        // Update cache with latest data
        p.poolCache.Update(event.PoolAddress, event.ParsedData)
    }
}
```
Benefits:
- Real-time validation of parsing accuracy
- Identifies when sequencer data changes
- Helps catch parsing bugs immediately
- Non-blocking - doesn't slow down main parsing
- Audit trail of pool state changes
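The "non-blocking" property hinges on how the parse path enqueues work: a `select` with a `default` branch drops the event when the channel buffer is full, so validation backpressure can never stall parsing. A minimal sketch (the `enqueueValidation` helper and drop counter are assumptions, not existing code):

```go
package main

import "fmt"

// PoolValidationEvent is reduced to one field for the sketch.
type PoolValidationEvent struct{ Pool string }

// enqueueValidation sends without blocking the hot parse path:
// if the channel buffer is full, the event is dropped and counted.
func enqueueValidation(ch chan PoolValidationEvent, ev PoolValidationEvent, dropped *int) bool {
	select {
	case ch <- ev:
		return true
	default:
		*dropped++ // buffer full; validation is best-effort
		return false
	}
}

func main() {
	ch := make(chan PoolValidationEvent, 1) // tiny buffer to force a drop
	var dropped int
	fmt.Println(enqueueValidation(ch, PoolValidationEvent{"a"}, &dropped)) // true
	fmt.Println(enqueueValidation(ch, PoolValidationEvent{"b"}, &dropped)) // false: buffer full
	fmt.Println(dropped)                                                   // 1
}
```

Dropped validation events should feed a metric, since a persistently full channel means the validator goroutine is falling behind.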
3. Pool Data Validation Against Cache
Current: Parse data, submit event, hope it's correct.
Proposed: Validate parsed data against known good cache data.
```go
func (p *Parser) validateAndEnrichEvent(event *Event) error {
    // If pool is in cache, validate parsed data
    if cached := p.poolCache.GetPool(event.PoolAddress); cached != nil {
        validationErrors := []string{}

        // Validate Token0
        if event.Token0 != cached.Token0 && event.Token0 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token0 mismatch: parsed=%s, cached=%s",
                    event.Token0, cached.Token0))
        }
        // Validate Token1
        if event.Token1 != cached.Token1 && event.Token1 != (common.Address{}) {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Token1 mismatch: parsed=%s, cached=%s",
                    event.Token1, cached.Token1))
        }
        // Validate Fee
        if event.Fee != cached.Fee && event.Fee != 0 {
            validationErrors = append(validationErrors,
                fmt.Sprintf("Fee mismatch: parsed=%d, cached=%d",
                    event.Fee, cached.Fee))
        }

        if len(validationErrors) > 0 {
            p.logger.Error("Event validation failed",
                "pool", event.PoolAddress,
                "errors", validationErrors)
            return fmt.Errorf("validation errors: %v", validationErrors)
        }

        // Enrich event with cached data if parsed data is missing
        if event.Token0 == (common.Address{}) {
            event.Token0 = cached.Token0
        }
        if event.Token1 == (common.Address{}) {
            event.Token1 = cached.Token1
        }
    }
    return nil
}
```
Benefits:
- Self-healing: fixes missing data from cache
- Detects parsing errors immediately
- Provides confidence in parsed data
- Creates audit trail of validation failures
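The core invariant of the validate-and-enrich step is: a zero field is filled from cache, a non-zero field must agree with cache. This can be exercised in isolation with a stripped-down version (all types here are local stand-ins for the sketch, not the real structs):

```go
package main

import "fmt"

// Address is a stand-in for common.Address.
type Address [20]byte

var zero Address

type Event struct{ Token0, Token1 Address }
type PoolInfo struct{ Token0, Token1 Address }

// enrich fills missing (zero) token fields from cached data and
// reports a mismatch when both sides are set but disagree.
func enrich(ev *Event, cached *PoolInfo) error {
	if ev.Token0 != zero && ev.Token0 != cached.Token0 {
		return fmt.Errorf("Token0 mismatch")
	}
	if ev.Token0 == zero {
		ev.Token0 = cached.Token0
	}
	if ev.Token1 != zero && ev.Token1 != cached.Token1 {
		return fmt.Errorf("Token1 mismatch")
	}
	if ev.Token1 == zero {
		ev.Token1 = cached.Token1
	}
	return nil
}

func main() {
	cached := &PoolInfo{Token0: Address{1}, Token1: Address{2}}
	ev := &Event{} // both tokens missing from the parsed event
	err := enrich(ev, cached)
	fmt.Println(err == nil, ev.Token0 == cached.Token0) // enriched from cache
}
```

Keeping the mismatch check before the fill means a wrong-but-nonzero parse is surfaced as an error rather than silently overwritten.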
4. Fast Mapping for Pool Retrieval
Current: Already implemented with PoolCache using map[common.Address]*PoolInfo
Optimization: Add multi-index lookups
```go
type PoolCache struct {
    byAddress       map[common.Address]*PoolInfo
    byTokenPair     map[string][]*PoolInfo // "token0-token1", addresses sorted
    byProtocol      map[Protocol][]*PoolInfo
    byLiquidityRank []common.Address // sorted by liquidity
}

// O(1) lookups for all access patterns (result slices are maintained on insert)
func (c *PoolCache) GetByAddress(addr common.Address) *PoolInfo
func (c *PoolCache) GetByTokenPair(t0, t1 common.Address) []*PoolInfo
func (c *PoolCache) GetByProtocol(protocol Protocol) []*PoolInfo
func (c *PoolCache) GetTopByLiquidity(limit int) []*PoolInfo
```
Benefits:
- O(1) lookups for all common access patterns
- Faster arbitrage path finding
- Better pool discovery
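The `byTokenPair` index only works if `(A, B)` and `(B, A)` produce the same key, so the key must be built from the sorted addresses. A sketch of that canonicalization (the `pairKey` helper is an assumption; `Address` stands in for `common.Address`):

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// Address is a stand-in for common.Address.
type Address [20]byte

// pairKey builds a canonical "token0-token1" key by sorting the two
// addresses, so (A, B) and (B, A) map to the same cache bucket.
func pairKey(a, b Address) string {
	if bytes.Compare(a[:], b[:]) > 0 {
		a, b = b, a
	}
	return hex.EncodeToString(a[:]) + "-" + hex.EncodeToString(b[:])
}

func main() {
	a, b := Address{0x01}, Address{0x02}
	fmt.Println(pairKey(a, b) == pairKey(b, a)) // true: order-independent
}
```

This matches the "token0-token1 sorted" convention noted in the struct above and is the same ordering Uniswap itself uses when assigning token0/token1.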
5. Comprehensive Logging for Debugging
```go
type ParsingMetrics struct {
    TotalEvents        int64
    SuccessfulParses   int64
    FailedParses       int64
    ZeroAddressCount   int64
    ValidationFailures int64
    CacheHits          int64
    CacheMisses        int64
}

// pct guards against division by zero before any events arrive.
func pct(part, total int64) float64 {
    if total == 0 {
        return 0
    }
    return float64(part) / float64(total) * 100
}

func (p *Parser) logParsingMetrics() {
    p.logger.Info("Parsing metrics",
        "total", p.metrics.TotalEvents,
        "success_rate", pct(p.metrics.SuccessfulParses, p.metrics.TotalEvents),
        "zero_address_rate", pct(p.metrics.ZeroAddressCount, p.metrics.TotalEvents),
        "cache_hit_rate", pct(p.metrics.CacheHits, p.metrics.CacheHits+p.metrics.CacheMisses),
        "validation_failure_rate", pct(p.metrics.ValidationFailures, p.metrics.TotalEvents))
}
```
Implementation Roadmap
Phase 1: Immediate (Current)
- ✅ Add pool cache to parser
- ✅ Log missing pools
- ✅ Check cache before returning zero addresses
Phase 2: Validation (Next)
- Add validation channel
- Implement background validator goroutine
- Add validation metrics
- Create alerting for validation failures
Phase 3: Per-Exchange Parsers
- Create ExchangeParser interface
- Implement UniswapV2Parser
- Implement UniswapV3Parser
- Migrate existing code
- Add parser factory
Phase 4: Advanced Features
- Multi-index pool cache
- Historical state tracking
- Anomaly detection
- Performance profiling
Expected Benefits
Immediate
- ✅ Fewer zero address errors
- ✅ Better debugging visibility
- ✅ Reduced RPC calls (use cache)
After Full Implementation
- 99%+ parsing accuracy
- Self-healing parser that fixes missing data
- Real-time detection of parsing issues
- Complete audit trail for troubleshooting
- Faster arbitrage detection
- Easier to add new DEXes
Metrics to Track
- Parsing Accuracy
  - Zero address rate (target: < 0.1%)
  - Validation failure rate (target: < 0.5%)
  - Cache hit rate (target: > 95%)
- Performance
  - Parse time per event (target: < 1ms)
  - Cache lookup time (target: < 0.1ms)
  - Validation overhead (target: < 10%)
- Reliability
  - Data discrepancy rate (target: < 0.1%)
  - Parser error rate (target: < 0.01%)
  - Event drop rate (target: 0%)
Status: Phase 1 completed 2025-11-09
Next: Implement Phase 2 (validation channel)