Restructured project for V2 refactor:

**Structure Changes:**
- Moved all V1 code to orig/ folder (preserved with git mv)
- Created docs/planning/ directory
- Added orig/README_V1.md explaining V1 preservation

**Planning Documents:**
- 00_V2_MASTER_PLAN.md: Complete architecture overview
  - Executive summary of critical V1 issues
  - High-level component architecture diagrams
  - 5-phase implementation roadmap
  - Success metrics and risk mitigation
- 07_TASK_BREAKDOWN.md: Atomic task breakdown
  - 99+ hours of detailed tasks
  - Every task < 2 hours (atomic)
  - Clear dependencies and success criteria
  - Organized by implementation phase

**V2 Key Improvements:**
- Per-exchange parsers (factory pattern)
- Multi-layer strict validation
- Multi-index pool cache
- Background validation pipeline
- Comprehensive observability

**Critical Issues Addressed:**
- Zero address tokens (strict validation + cache enrichment)
- Parsing accuracy (protocol-specific parsers)
- No audit trail (background validation channel)
- Inefficient lookups (multi-index cache)
- Stats disconnection (event-driven metrics)

Next Steps:
1. Review planning documents
2. Begin Phase 1: Foundation (P1-001 through P1-010)
3. Implement parsers in Phase 2
4. Build cache system in Phase 3
5. Add validation pipeline in Phase 4
6. Migrate and test in Phase 5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
MEV Bot V2 - Master Architecture Plan
Executive Summary
V2 represents a complete architectural overhaul addressing critical parsing, validation, and scalability issues identified in V1. The rebuild focuses on:
- Zero Tolerance for Invalid Data: Eliminate all zero addresses and zero amounts
- Per-Exchange Parser Architecture: Individual parsers for each DEX type
- Real-time Validation Pipeline: Background validation with audit trails
- Scalable Pool Discovery: Efficient caching and multi-index lookups
- Observable System: Comprehensive metrics, logging, and health monitoring
Critical Issues from V1
1. Zero Address/Amount Problems
- Root Cause: Parser returns zero addresses when transaction data is unavailable
- Impact: Invalid events submitted to scanner, wasted computation
- V2 Solution: Strict validation at multiple layers + pool cache enrichment
2. Parsing Accuracy Issues
- Root Cause: Monolithic parser handling all DEX types generically
- Impact: Missing token data, incorrect amounts, protocol-specific edge cases
- V2 Solution: Per-exchange parsers with protocol-specific logic
3. No Data Quality Audit Trail
- Root Cause: No validation or comparison of parsed data vs cached data
- Impact: Silent failures, no visibility into parsing degradation
- V2 Solution: Background validation channel with discrepancy logging
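A background validation channel could look roughly like the sketch below. It is illustrative only: `ParsedSwap`, `PoolInfo`, `Discrepancy`, and `StartBackgroundValidator` are hypothetical names, tokens are plain strings for brevity, and the `report` callback stands in for whatever discrepancy logger V2 ends up using.

```go
package main

// PoolInfo is the cached view of a pool (hypothetical shape).
type PoolInfo struct {
	Token0, Token1 string
}

// ParsedSwap is what a parser produced for one swap (hypothetical shape).
type ParsedSwap struct {
	Pool, Token0, Token1 string
}

// Discrepancy records one field where parsed and cached data disagree.
type Discrepancy struct {
	Pool, Field, Parsed, Cached string
}

// compare returns every token field that differs between the parsed
// event and the cached pool entry.
func compare(ev ParsedSwap, cached PoolInfo) []Discrepancy {
	var out []Discrepancy
	if ev.Token0 != cached.Token0 {
		out = append(out, Discrepancy{ev.Pool, "token0", ev.Token0, cached.Token0})
	}
	if ev.Token1 != cached.Token1 {
		out = append(out, Discrepancy{ev.Pool, "token1", ev.Token1, cached.Token1})
	}
	return out
}

// StartBackgroundValidator drains events in its own goroutine, compares
// each one with the cache, and passes mismatches to report. The returned
// channel is closed once events has been closed and fully drained.
func StartBackgroundValidator(
	events <-chan ParsedSwap,
	lookup func(pool string) (PoolInfo, bool),
	report func(Discrepancy),
) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range events {
			cached, ok := lookup(ev.Pool)
			if !ok {
				continue // unknown pool: nothing to compare against
			}
			for _, d := range compare(ev, cached) {
				report(d)
			}
		}
	}()
	return done
}
```

Because the comparison runs off the hot path, a slow or failing audit does not delay delivery of events to the scanner.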
4. Inefficient Pool Lookups
- Root Cause: Single-index cache (by address only)
- Impact: Slow arbitrage path discovery, no ranking by liquidity
- V2 Solution: Multi-index cache (address, token pair, protocol, liquidity)
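As a rough illustration of the multi-index idea, the sketch below keeps one map per index over the same pool records. All names (`Pool`, `PoolCache`, `LiquidityUSD`, `pairKey`) are assumptions made for this example, and liquidity ranking is done by sorting at read time rather than with the dedicated liquidity index the plan calls for. Updates and removals are omitted.

```go
package main

import (
	"sort"
	"strings"
	"sync"
)

// Pool is a hypothetical cache record.
type Pool struct {
	Address        string
	Token0, Token1 string
	Protocol       string
	LiquidityUSD   float64
}

// PoolCache indexes the same pools by address, token pair, and protocol.
type PoolCache struct {
	mu         sync.RWMutex
	byAddress  map[string]*Pool
	byPair     map[string][]*Pool
	byProtocol map[string][]*Pool
}

func NewPoolCache() *PoolCache {
	return &PoolCache{
		byAddress:  map[string]*Pool{},
		byPair:     map[string][]*Pool{},
		byProtocol: map[string][]*Pool{},
	}
}

// pairKey builds an order-independent, case-insensitive key for a token pair.
func pairKey(a, b string) string {
	a, b = strings.ToLower(a), strings.ToLower(b)
	if a > b {
		a, b = b, a
	}
	return a + "|" + b
}

// Add inserts a pool into every index. It returns false if the address
// is already cached.
func (c *PoolCache) Add(p Pool) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	addr := strings.ToLower(p.Address)
	if _, exists := c.byAddress[addr]; exists {
		return false
	}
	stored := p
	c.byAddress[addr] = &stored
	k := pairKey(p.Token0, p.Token1)
	c.byPair[k] = append(c.byPair[k], &stored)
	c.byProtocol[p.Protocol] = append(c.byProtocol[p.Protocol], &stored)
	return true
}

// ByAddress looks a pool up by its address, ignoring case.
func (c *PoolCache) ByAddress(addr string) (Pool, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.byAddress[strings.ToLower(addr)]
	if !ok {
		return Pool{}, false
	}
	return *p, true
}

// ByPair returns copies of all pools for a token pair, highest liquidity first.
func (c *PoolCache) ByPair(a, b string) []Pool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ptrs := c.byPair[pairKey(a, b)]
	out := make([]Pool, 0, len(ptrs))
	for _, p := range ptrs {
		out = append(out, *p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].LiquidityUSD > out[j].LiquidityUSD })
	return out
}

// ByProtocol returns copies of all pools registered for a protocol.
func (c *PoolCache) ByProtocol(protocol string) []Pool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]Pool, 0, len(c.byProtocol[protocol]))
	for _, p := range c.byProtocol[protocol] {
		out = append(out, *p)
	}
	return out
}
```

Arbitrage path discovery would then call something like `ByPair("WETH", "USDC")` and walk the result from the deepest pool down, instead of scanning every cached address.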
5. Stats Disconnection
- Root Cause: Events detected but not reflected in stats
- Impact: Monitoring blindness, unclear system health
- V2 Solution: Event-driven metrics with guaranteed consistency
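One way to keep stats from drifting away from detected events is to make the counter update part of the only code path that forwards an event. The sketch below assumes hypothetical names (`Stats`, `Snapshot`, `Handle`) and uses a mutex so a snapshot always satisfies `Detected == Accepted + Rejected`.

```go
package main

import "sync"

// Stats holds event counters guarded by a single mutex.
type Stats struct {
	mu       sync.Mutex
	detected uint64
	accepted uint64
	rejected uint64
}

// Snapshot is a consistent copy of the counters.
type Snapshot struct {
	Detected, Accepted, Rejected uint64
}

// Record counts one detected event as either accepted or rejected.
func (s *Stats) Record(valid bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.detected++
	if valid {
		s.accepted++
	} else {
		s.rejected++
	}
}

// Snapshot returns all three counters as read under one lock.
func (s *Stats) Snapshot() Snapshot {
	s.mu.Lock()
	defer s.mu.Unlock()
	return Snapshot{s.detected, s.accepted, s.rejected}
}

// Handle records the event first and forwards it only when it is valid,
// so no event can reach the scanner without being counted.
func Handle(s *Stats, valid bool, forward func()) {
	s.Record(valid)
	if valid {
		forward()
	}
}
```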
V2 Architecture Principles
1. Fail-Fast with Visibility
- Reject invalid data immediately at source
- Log all rejections with detailed context
- Never allow garbage data to propagate
2. Single Responsibility
- One parser per exchange type
- One validator per data type
- One cache per index type
3. Observable by Default
- Every component emits metrics
- Every operation is logged
- Every error has context
4. Self-Healing
- Automatic retry with exponential backoff
- Fallback to cache when RPC fails
- Circuit breakers for cascading failures
5. Test-Driven
- Unit tests for every parser
- Integration tests for full pipeline
- Chaos testing for failure scenarios
High-Level Component Architecture
┌─────────────────────────────────────────────────────────────┐
│                     Arbitrum Monitor                        │
│  - WebSocket subscription                                   │
│  - Transaction/receipt buffering                            │
│  - Rate limiting & connection management                    │
└───────────────┬─────────────────────────────────────────────┘
                │
                ├─ Transactions & Receipts
                │
                ▼
┌─────────────────────────────────────────────────────────────┐
│                     Parser Factory                          │
│  - Route to correct parser based on protocol                │
│  - Manage parser lifecycle                                  │
└───────────────┬─────────────────────────────────────────────┘
                │
     ┌──────────┼─────────────┬───────────┬───────────┐
     │          │             │           │           │
     ▼          ▼             ▼           ▼           ▼
┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐
│Uniswap  │ │Uniswap   │ │SushiSwap│ │ Camelot  │ │ Curve  │
│V2 Parser│ │V3 Parser │ │ Parser  │ │ Parser   │ │ Parser │
└────┬────┘ └────┬─────┘ └────┬────┘ └────┬─────┘ └───┬────┘
     │           │            │           │           │
     └───────────┴────────────┴───────────┴───────────┘
                              │
                              ▼
         ┌────────────────────────────────────────┐
         │         Event Validation Layer         │
         │  - Check zero addresses                │
         │  - Check zero amounts                  │
         │  - Validate against pool cache         │
         │  - Log discrepancies                   │
         └────────────┬───────────────────────────┘
                      │
           ┌──────────┴──────────┐
           │                     │
           ▼                     ▼
    ┌─────────────┐     ┌──────────────────┐
    │   Scanner   │     │  Background      │
    │  (Valid     │     │  Validation      │
    │   Events)   │     │  Channel         │
    └─────────────┘     │  (Audit Trail)   │
                        └──────────────────┘
V2 Directory Structure
mev-bot/
├── orig/                          # V1 codebase preserved
│   ├── cmd/
│   ├── pkg/
│   ├── internal/
│   └── config/
│
├── docs/
│   └── planning/                  # V2 planning documents
│       ├── 00_V2_MASTER_PLAN.md
│       ├── 01_PARSER_ARCHITECTURE.md
│       ├── 02_VALIDATION_PIPELINE.md
│       ├── 03_POOL_CACHE_SYSTEM.md
│       ├── 04_METRICS_OBSERVABILITY.md
│       ├── 05_DATA_FLOW.md
│       ├── 06_IMPLEMENTATION_PHASES.md
│       └── 07_TASK_BREAKDOWN.md
│
├── cmd/
│   └── mev-bot/
│       └── main.go                # New V2 entry point
│
├── pkg/
│   ├── parsers/                   # NEW: Per-exchange parsers
│   │   ├── factory.go
│   │   ├── interface.go
│   │   ├── uniswap_v2.go
│   │   ├── uniswap_v3.go
│   │   ├── sushiswap.go
│   │   ├── camelot.go
│   │   └── curve.go
│   │
│   ├── validation/                # NEW: Validation pipeline
│   │   ├── validator.go
│   │   ├── rules.go
│   │   ├── background.go
│   │   └── metrics.go
│   │
│   ├── cache/                     # NEW: Multi-index cache
│   │   ├── pool_cache.go
│   │   ├── index_by_address.go
│   │   ├── index_by_tokens.go
│   │   ├── index_by_liquidity.go
│   │   └── index_by_protocol.go
│   │
│   ├── discovery/                 # Pool discovery system
│   │   ├── scanner.go
│   │   ├── factory_watcher.go
│   │   └── blacklist.go
│   │
│   ├── monitor/                   # Arbitrum monitoring
│   │   ├── sequencer.go
│   │   ├── connection.go
│   │   └── rate_limiter.go
│   │
│   ├── events/                    # Event types and handling
│   │   ├── types.go
│   │   ├── router.go
│   │   └── processor.go
│   │
│   ├── arbitrage/                 # Arbitrage detection
│   │   ├── detector.go
│   │   ├── calculator.go
│   │   └── executor.go
│   │
│   └── observability/             # NEW: Metrics & logging
│       ├── metrics.go
│       ├── logger.go
│       ├── tracing.go
│       └── health.go
│
├── internal/
│   ├── config/                    # Configuration management
│   └── utils/                     # Shared utilities
│
└── tests/
    ├── unit/                      # Unit tests
    ├── integration/               # Integration tests
    └── e2e/                       # End-to-end tests
Implementation Phases
Phase 1: Foundation (Weeks 1-2)
Goal: Set up V2 project structure and core interfaces
Tasks:
- Create V2 directory structure
- Define all interfaces (Parser, Validator, Cache, etc.)
- Set up logging and metrics infrastructure
- Create base test framework
- Implement connection management
Phase 2: Parser Refactor (Weeks 3-5)
Goal: Implement per-exchange parsers with validation
Tasks:
- Create Parser interface and factory
- Implement UniswapV2 parser with tests
- Implement UniswapV3 parser with tests
- Implement SushiSwap parser with tests
- Implement Camelot parser with tests
- Implement Curve parser with tests
- Add strict validation layer
- Integration testing
Phase 3: Cache System (Weeks 6-7)
Goal: Multi-index pool cache with efficient lookups
Tasks:
- Design cache schema
- Implement address index
- Implement token-pair index
- Implement liquidity ranking index
- Implement protocol index
- Add cache persistence
- Add cache invalidation logic
- Performance testing
Phase 4: Validation Pipeline (Weeks 8-9)
Goal: Background validation with audit trails
Tasks:
- Create validation channel
- Implement background validator goroutine
- Add comparison logic (parsed vs cached)
- Implement discrepancy logging
- Create validation metrics
- Add alerting for validation failures
- Integration testing
Phase 5: Migration & Testing (Weeks 10-12)
Goal: Migrate from V1 to V2, comprehensive testing
Tasks:
- Create migration path
- Run parallel systems (V1 and V2)
- Compare outputs
- Fix discrepancies
- Load testing
- Chaos testing
- Production deployment
- Monitoring setup
Success Metrics
Parsing Accuracy
- Zero Address Rate: < 0.01% (target: 0%)
- Zero Amount Rate: < 0.01% (target: 0%)
- Validation Failure Rate: < 0.5%
- Cache Hit Rate: > 95%
Performance
- Parse Time: < 1ms per event (p99)
- Cache Lookup: < 0.1ms (p99)
- End-to-end Latency: < 10ms from receipt to scanner
Reliability
- Uptime: > 99.9%
- Data Discrepancy Rate: < 0.1%
- Event Drop Rate: 0%
Observability
- All Events Logged: 100%
- All Rejections Logged: 100%
- Metrics Coverage: 100% of components
Risk Mitigation
Risk: Breaking Changes During Migration
Mitigation:
- Run V1 and V2 in parallel
- Compare outputs
- Gradual rollout with feature flags
Risk: Performance Degradation
Mitigation:
- Comprehensive benchmarking
- Load testing before deployment
- Circuit breakers for cascading failures
Risk: Incomplete Test Coverage
Mitigation:
- TDD approach for all new code
- Minimum 90% test coverage requirement
- Integration and E2E tests mandatory
Risk: Data Quality Regression
Mitigation:
- Continuous validation against Arbiscan
- Alerting on validation failures
- Automated rollback on critical issues
Next Steps
- Review and approve this master plan
- Read detailed component plans in subsequent documents
- Review task breakdown in 07_TASK_BREAKDOWN.md
- Begin Phase 1 implementation
Document Status: Draft for Review
Created: 2025-11-10
Last Updated: 2025-11-10
Version: 1.0