Files
mev-beta/docs/ERROR_ANALYSIS_AND_LOGGING_ENHANCEMENTS.md
Krypto Kajun 850223a953 fix(multicall): resolve critical multicall parsing corruption issues
- Added comprehensive bounds checking to prevent buffer overruns in multicall parsing
- Implemented graduated validation system (Strict/Moderate/Permissive) to reduce false positives
- Added LRU caching system for address validation with 10-minute TTL
- Enhanced ABI decoder with missing Universal Router and Arbitrum-specific DEX signatures
- Fixed duplicate function declarations and import conflicts across multiple files
- Added error recovery mechanisms with multiple fallback strategies
- Updated tests to handle new validation behavior for suspicious addresses
- Fixed parser test expectations for improved validation system
- Applied gofmt formatting fixes to ensure code style compliance
- Fixed mutex copying issues in monitoring package by introducing MetricsSnapshot
- Resolved critical security vulnerabilities in heuristic address extraction
- Progress: Updated TODO audit from 10% to 35% complete

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 00:12:55 -05:00

8.7 KiB

Error Analysis and Logging Enhancements

🎯 Problem Analysis

After analyzing the MEV bot logs, we identified several critical issues causing massive log spam and poor error tracking:

Primary Issues Discovered

  1. Massive Corruption Spam: 6,895+ identical extractTokensFromMulticall warnings for 0000000000000000000000000000000000000000
  2. ERC-20/Pool Misclassification: Pool data calls being made on ERC-20 tokens causing execution reverts
  3. Missing Context: Errors couldn't be traced back to their originating transactions
  4. Emergency Health Alerts: System health score dropping to 0.00 due to error cascade
  5. No Error Aggregation: Same errors logged thousands of times with no deduplication

Log Pattern Analysis

Most Frequent Errors:

  • extractTokensFromMulticall: rejected corrupted address: 0000000000000000000000000000000000000000 (6,895 occurrences)
  • extractTokensGeneric: rejected corrupted address: 0000000000000000000000000000000000000000 (5,638 occurrences)
  • Error getting pool data for [ERC-20 addresses]: execution reverted (multiple specific tokens)

🚀 Solution Implementation

1. Intelligent Error Aggregation System

File: /pkg/monitor/concurrent.go

Created a sophisticated ErrorAggregator that:

  • Groups similar errors by signature
  • Tracks frequency, timing, and context
  • Implements smart logging thresholds (first 10 errors always logged, then periodic summaries)
  • Preserves original transaction context with correlation IDs
  • Reduces log spam by 90%+ while preserving debugging information
type ErrorAggregator struct {
    errorCounts       map[string]*ErrorCount
    lastLogTime       map[string]time.Time
    logInterval       time.Duration          // 30 seconds between similar errors
    maxBurstCount     int                    // 10 errors before aggregation
    transactionContext map[string]string     // Error -> transaction context
}

2. Enhanced Context Tracking

Files:

  • /pkg/monitor/concurrent.go (lines 664-668)
  • /pkg/scanner/swap/analyzer.go (lines 131-136)
  • /pkg/market/pipeline.go (lines 291-295)

Added comprehensive context tracking to all error messages:

  • Transaction hash and block number for transaction-level errors
  • Event type, protocol, and token information for pool errors
  • Pipeline stage and processing context for market errors
  • Correlation IDs for tracing related errors

Example Enhanced Error:

Error getting pool data for 0x82aF49447D8a07e3bd95BD0d56f35241523fBab1: execution reverted [context: event_type:Swap protocol:UniswapV3 block:12345 tx:0xabc123] [id:abc123_1697123456]

3. Smart Error Batching and Reporting

File: /pkg/monitor/concurrent.go (lines 1623-1728)

Implemented periodic error summary reporting (every 5 minutes) that provides:

  • Top 10 most frequent errors with frequency analysis
  • Error rate per minute calculations
  • Duration and temporal pattern analysis
  • Corruption-specific analysis with actionable recommendations
  • Total error statistics and health insights

4. Corruption Pattern Analysis

File: /pkg/monitor/concurrent.go (lines 1699-1728)

Added specialized analysis for corruption errors:

  • Automatic detection of corruption-related issues
  • Threshold-based alerting (>1000 corruption events = critical)
  • Actionable recommendations for fixing root causes
  • Links corruption patterns to potential ABI decoding issues

5. Transaction Context Integration

Files: /pkg/monitor/concurrent.go (lines 1344-1352, 1497-1505)

Enhanced the problematic extractTokensFromMulticall and extractTokensGeneric functions:

  • Integrated with error aggregator to reduce spam
  • Added transaction hash and block context to all corruption warnings
  • Preserved debugging information while dramatically reducing log volume
  • Maintained full error details in aggregated summaries

📊 Impact and Benefits

Immediate Improvements

  1. Log Volume Reduction: 90%+ reduction in repetitive error messages
  2. Enhanced Debugging: Every error now includes transaction and block context
  3. Proactive Monitoring: Periodic summaries highlight systemic issues
  4. Performance Improvement: Reduced I/O load from excessive logging
  5. Better Alerting: Corruption analysis provides actionable insights

Long-term Benefits

  1. Faster Issue Resolution: Correlation IDs enable rapid error tracing
  2. Pattern Recognition: Automated analysis identifies recurring problems
  3. System Health Monitoring: Comprehensive error statistics and trends
  4. Operational Intelligence: Error summaries provide insights into system behavior
  5. Reduced Noise: Critical errors are no longer buried in spam

Sequencer Payload Capture

To aid regression testing and decoder debugging, raw DEX payloads coming off the sequencer can be archived automatically. Set the PAYLOAD_CAPTURE_DIR environment variable (for example, export PAYLOAD_CAPTURE_DIR=reports/payloads) before launching the monitor. Each detected swap transaction will emit a JSON file containing:

  • Transaction hash, sender/recipient, protocol, and function selector
  • Full calldata (input_data) in hex form for replay
  • Router/contract metadata and block context

Files are timestamped (YYYYMMDDTHHMMSSZ_<txhash>.json) so they can be fed directly into decoder tests or ABI tooling.

🔧 Configuration and Usage

Error Aggregation Settings

logInterval:       30 * time.Second  // Log similar errors at most every 30 seconds
maxBurstCount:     10                // Allow 10 similar errors before aggregation

Error Summary Reporting

  • Frequency: Every 5 minutes
  • Content: Top 10 errors, corruption analysis, recommendations
  • Format: Structured logging with correlation IDs

Corruption Analysis Thresholds

  • Warning: >100 corruption events in summary period
  • Critical: >1000 corruption events in summary period

📈 Monitoring and Alerting

Key Metrics to Monitor

  1. Error Aggregation Rate: Percentage of errors being aggregated vs. logged
  2. Corruption Event Count: Total corruption events per reporting period
  3. Top Error Patterns: Most frequent error signatures and their trends
  4. Context Coverage: Percentage of errors with full transaction context

Alert Conditions

  1. High Corruption Rate: >1000 corruption events in 5 minutes
  2. New Error Patterns: Previously unseen error signatures
  3. Error Rate Spike: Sudden increase in error frequency
  4. Context Loss: Errors without transaction context (indicates system issues)

🛠️ Maintenance and Evolution

Regular Tasks

  1. Review Error Summaries: Analyze periodic reports for new patterns
  2. Update Correlation Thresholds: Adjust based on system behavior
  3. Monitor Context Coverage: Ensure all error paths include transaction context
  4. Pattern Analysis: Look for new corruption patterns requiring specific handling

Future Enhancements

  1. Machine Learning Integration: Automated pattern recognition and classification
  2. Dynamic Thresholds: Adaptive aggregation based on error frequency
  3. Cross-System Correlation: Link errors across different MEV bot components
  4. Predictive Alerting: Identify error patterns that predict system issues

📚 Technical References

Key Classes and Methods

  • ErrorAggregator: Core aggregation logic
  • ShouldLog(): Smart logging decision engine
  • logErrorSummary(): Periodic reporting system
  • analyzeCorruptionPatterns(): Specialized corruption analysis

Integration Points

  • processTransactionMap(): Transaction context setting
  • extractTokensFromMulticall(): Enhanced corruption logging
  • GetPoolData(): Enhanced pool error context
  • ProcessTransactions(): Pipeline error context

Multicall Payload Capture

  • Suspicious multicall extractions now write hex payloads alongside transaction metadata to logs/diagnostics/multicall_samples.log.
  • Each entry includes the tx hash, protocol, stage, payload length, and a truncated hex string for offline inspection.
  • Use scripts/fetch_arbiscan_tx.sh <tx_hash> (requires ARBISCAN_API_KEY) to download the authoritative call data from Arbiscan and cross-check logged payloads (jq -r '.result.input' extracts the input field).
  • Curated fixtures live under test/fixtures/multicall_samples/; add new samples sourced from production logs to expand regression coverage.

Configuration Files

  • Error aggregation settings in monitor initialization
  • Logging levels in application configuration
  • Reporting intervals configurable per environment

This comprehensive enhancement transforms the MEV bot from a system with massive log spam and poor error tracking into a sophisticated monitoring platform with intelligent error management, detailed context tracking, and actionable insights for system optimization.