mev-beta/docs/planning/00_V2_MASTER_PLAN.md
Administrator 803de231ba feat: create v2-prep branch with comprehensive planning
Restructured project for V2 refactor:

**Structure Changes:**
- Moved all V1 code to orig/ folder (preserved with git mv)
- Created docs/planning/ directory
- Added orig/README_V1.md explaining V1 preservation

**Planning Documents:**
- 00_V2_MASTER_PLAN.md: Complete architecture overview
  - Executive summary of critical V1 issues
  - High-level component architecture diagrams
  - 5-phase implementation roadmap
  - Success metrics and risk mitigation

- 07_TASK_BREAKDOWN.md: Atomic task breakdown
  - 99+ hours of detailed tasks
  - Every task < 2 hours (atomic)
  - Clear dependencies and success criteria
  - Organized by implementation phase

**V2 Key Improvements:**
- Per-exchange parsers (factory pattern)
- Multi-layer strict validation
- Multi-index pool cache
- Background validation pipeline
- Comprehensive observability

**Critical Issues Addressed:**
- Zero address tokens (strict validation + cache enrichment)
- Parsing accuracy (protocol-specific parsers)
- No audit trail (background validation channel)
- Inefficient lookups (multi-index cache)
- Stats disconnection (event-driven metrics)

**Next Steps:**
1. Review planning documents
2. Begin Phase 1: Foundation (P1-001 through P1-010)
3. Implement parsers in Phase 2
4. Build cache system in Phase 3
5. Add validation pipeline in Phase 4
6. Migrate and test in Phase 5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 10:14:26 +01:00

MEV Bot V2 - Master Architecture Plan

Executive Summary

V2 represents a complete architectural overhaul addressing critical parsing, validation, and scalability issues identified in V1. The rebuild focuses on:

  1. Zero Tolerance for Invalid Data: Eliminate all zero addresses and zero amounts
  2. Per-Exchange Parser Architecture: Individual parsers for each DEX type
  3. Real-time Validation Pipeline: Background validation with audit trails
  4. Scalable Pool Discovery: Efficient caching and multi-index lookups
  5. Observable System: Comprehensive metrics, logging, and health monitoring

Critical Issues from V1

1. Zero Address/Amount Problems

  • Root Cause: Parser returns zero addresses when transaction data is unavailable
  • Impact: Invalid events submitted to scanner, wasted computation
  • V2 Solution: Strict validation at multiple layers + pool cache enrichment
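
A minimal sketch of what the strict zero-address and zero-amount check could look like. The `Address`, `SwapEvent`, `Validate`, and `sampleEvent` names are hypothetical and not taken from the V2 code; `Address` stands in for a 20-byte EVM address type, and the values are made up.

```go
package main

import (
	"errors"
	"fmt"
	"math/big"
)

// Address is a 20-byte EVM address (hypothetical stand-in for a library type).
type Address [20]byte

// SwapEvent is a hypothetical, simplified parsed swap.
type SwapEvent struct {
	Pool      Address
	Token0    Address
	Token1    Address
	AmountIn  *big.Int
	AmountOut *big.Int
}

var (
	ErrZeroAddress = errors.New("zero address")
	ErrZeroAmount  = errors.New("zero amount")
)

// Validate returns nil only if every address and every amount is non-zero.
func Validate(ev SwapEvent) error {
	addrs := []struct {
		name string
		addr Address
	}{{"pool", ev.Pool}, {"token0", ev.Token0}, {"token1", ev.Token1}}
	for _, a := range addrs {
		if a.addr == (Address{}) {
			return fmt.Errorf("%s: %w", a.name, ErrZeroAddress)
		}
	}
	amounts := []struct {
		name string
		v    *big.Int
	}{{"amountIn", ev.AmountIn}, {"amountOut", ev.AmountOut}}
	for _, a := range amounts {
		// A nil amount is treated the same as zero.
		if a.v == nil || a.v.Sign() == 0 {
			return fmt.Errorf("%s: %w", a.name, ErrZeroAmount)
		}
	}
	return nil
}

// sampleEvent builds a well-formed event with made-up values.
func sampleEvent() SwapEvent {
	return SwapEvent{
		Pool:      Address{0x01},
		Token0:    Address{0x02},
		Token1:    Address{0x03},
		AmountIn:  big.NewInt(1000),
		AmountOut: big.NewInt(997),
	}
}

func main() {
	fmt.Println("valid event:", Validate(sampleEvent()))

	bad := sampleEvent()
	bad.Token1 = Address{} // simulate the V1 failure mode
	fmt.Println("bad event:", Validate(bad))
}
```

Wrapping the sentinel errors with `%w` lets a caller distinguish rejection reasons with `errors.Is`, which may be useful when rejections are counted per cause.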

2. Parsing Accuracy Issues

  • Root Cause: Monolithic parser handling all DEX types generically
  • Impact: Missing token data, incorrect amounts, mishandled protocol-specific edge cases
  • V2 Solution: Per-exchange parsers with protocol-specific logic

3. No Data Quality Audit Trail

  • Root Cause: No validation or comparison of parsed data vs cached data
  • Impact: Silent failures, no visibility into parsing degradation
  • V2 Solution: Background validation channel with discrepancy logging
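
One way the background validation channel might be wired, as a sketch only. `Observation`, `Auditor`, `Submit`, and `Close` are hypothetical names. The sketch assumes the hot path must never block on auditing, so `Submit` drops the observation (not the event itself) when the buffer is full, and discrepancies are collected in memory instead of being written to a log.

```go
package main

import (
	"fmt"
	"sync"
)

// Observation pairs what the parser produced with what the pool cache holds.
type Observation struct {
	Pool         string
	ParsedToken0 string
	CachedToken0 string
}

// Auditor compares observations on a background goroutine.
type Auditor struct {
	ch         chan Observation
	wg         sync.WaitGroup
	mismatches []Observation // written only by run; read only after Close
}

func NewAuditor(buffer int) *Auditor {
	a := &Auditor{ch: make(chan Observation, buffer)}
	a.wg.Add(1)
	go a.run()
	return a
}

// run drains the channel until it is closed and records every discrepancy.
func (a *Auditor) run() {
	defer a.wg.Done()
	for ob := range a.ch {
		if ob.ParsedToken0 != ob.CachedToken0 {
			a.mismatches = append(a.mismatches, ob)
		}
	}
}

// Submit never blocks; it reports false if the buffer was full.
func (a *Auditor) Submit(ob Observation) bool {
	select {
	case a.ch <- ob:
		return true
	default:
		return false
	}
}

// Close stops the goroutine, waits for it, and returns the discrepancies.
func (a *Auditor) Close() []Observation {
	close(a.ch)
	a.wg.Wait()
	return a.mismatches
}

func main() {
	a := NewAuditor(16)
	a.Submit(Observation{Pool: "0xpoolA", ParsedToken0: "0xWETH", CachedToken0: "0xWETH"})
	a.Submit(Observation{Pool: "0xpoolB", ParsedToken0: "0xUSDC", CachedToken0: "0xDAI"})
	for _, m := range a.Close() {
		fmt.Printf("discrepancy pool=%s parsed=%s cached=%s\n",
			m.Pool, m.ParsedToken0, m.CachedToken0)
	}
}
```

A production version would likely count dropped observations as a metric so that a saturated audit channel is itself visible.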

4. Inefficient Pool Lookups

  • Root Cause: Single-index cache (by address only)
  • Impact: Slow arbitrage path discovery, no ranking by liquidity
  • V2 Solution: Multi-index cache (address, token pair, protocol, liquidity)
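
The multi-index idea can be sketched as several maps that point at the same pool entries. Everything here (`Pool`, `PoolCache`, `pairKey`, the method names) is hypothetical; liquidity is a `uint64` purely for brevity, and ranking is done by sorting on read rather than by a dedicated liquidity index.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Pool is a hypothetical cache entry.
type Pool struct {
	Address   string
	Token0    string
	Token1    string
	Protocol  string
	Liquidity uint64
}

// pairKey stores the two tokens in sorted order so (A,B) and (B,A) share a key.
type pairKey struct{ lo, hi string }

func newPairKey(a, b string) pairKey {
	if a > b {
		a, b = b, a
	}
	return pairKey{a, b}
}

// PoolCache keeps one map per lookup dimension over shared entries.
type PoolCache struct {
	mu         sync.RWMutex
	byAddress  map[string]*Pool
	byPair     map[pairKey][]*Pool
	byProtocol map[string][]*Pool
}

func NewPoolCache() *PoolCache {
	return &PoolCache{
		byAddress:  make(map[string]*Pool),
		byPair:     make(map[pairKey][]*Pool),
		byProtocol: make(map[string][]*Pool),
	}
}

// Add indexes a pool under every key. Re-adding a known address is ignored here.
func (c *PoolCache) Add(p Pool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, known := c.byAddress[p.Address]; known {
		return
	}
	entry := &p
	k := newPairKey(p.Token0, p.Token1)
	c.byAddress[p.Address] = entry
	c.byPair[k] = append(c.byPair[k], entry)
	c.byProtocol[p.Protocol] = append(c.byProtocol[p.Protocol], entry)
}

func (c *PoolCache) ByAddress(addr string) (Pool, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.byAddress[addr]
	if !ok {
		return Pool{}, false
	}
	return *p, true
}

// ByPair returns pools for the pair in either token order, deepest first.
func (c *PoolCache) ByPair(a, b string) []Pool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return rankByLiquidity(c.byPair[newPairKey(a, b)])
}

// ByProtocol returns all pools of one protocol, deepest first.
func (c *PoolCache) ByProtocol(name string) []Pool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return rankByLiquidity(c.byProtocol[name])
}

// rankByLiquidity copies the entries and sorts them by descending liquidity.
func rankByLiquidity(src []*Pool) []Pool {
	out := make([]Pool, 0, len(src))
	for _, p := range src {
		out = append(out, *p)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Liquidity > out[j].Liquidity })
	return out
}

func main() {
	c := NewPoolCache()
	c.Add(Pool{Address: "0xa", Token0: "WETH", Token1: "USDC", Protocol: "uniswap_v3", Liquidity: 10})
	c.Add(Pool{Address: "0xb", Token0: "USDC", Token1: "WETH", Protocol: "sushiswap", Liquidity: 50})
	for _, p := range c.ByPair("WETH", "USDC") {
		fmt.Println(p.Address, p.Protocol, p.Liquidity)
	}
}
```

Returning copies keeps callers from mutating cached entries without the lock, at the cost of an allocation per lookup.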

5. Stats Disconnection

  • Root Cause: Events detected but not reflected in stats
  • Impact: Monitoring blindness, unclear system health
  • V2 Solution: Event-driven metrics with guaranteed consistency
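
A sketch of how event-driven metrics could keep stats in step with the pipeline: every event passes through a single `Record` call, and the received total is derived from the other counters rather than tracked separately. `Stats`, `Snapshot`, `Process`, and `rejectEmpty` are hypothetical names, and events are plain strings for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Stats counts outcomes; rejections are keyed by reason.
type Stats struct {
	mu       sync.Mutex
	accepted uint64
	rejected map[string]uint64
}

// Snapshot is a consistent point-in-time view of the counters.
type Snapshot struct {
	Received uint64
	Accepted uint64
	Rejected uint64
}

func NewStats() *Stats { return &Stats{rejected: make(map[string]uint64)} }

// Record must be called exactly once per event with its validation result.
func (s *Stats) Record(err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err == nil {
		s.accepted++
		return
	}
	s.rejected[err.Error()]++
}

// Snapshot derives Received from the other counters, so the three can never disagree.
func (s *Stats) Snapshot() Snapshot {
	s.mu.Lock()
	defer s.mu.Unlock()
	var rej uint64
	for _, n := range s.rejected {
		rej += n
	}
	return Snapshot{Received: s.accepted + rej, Accepted: s.accepted, Rejected: rej}
}

// Process validates each event, records the outcome, and forwards valid ones.
func Process(s *Stats, validate func(string) error, events []string) []string {
	var valid []string
	for _, ev := range events {
		err := validate(ev)
		s.Record(err)
		if err == nil {
			valid = append(valid, ev)
		}
	}
	return valid
}

// rejectEmpty is a toy validation rule for the demo.
func rejectEmpty(ev string) error {
	if ev == "" {
		return errors.New("empty event")
	}
	return nil
}

func main() {
	s := NewStats()
	out := Process(s, rejectEmpty, []string{"swap-1", "", "swap-2"})
	fmt.Println(len(out), s.Snapshot())
}
```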

V2 Architecture Principles

1. Fail-Fast with Visibility

  • Reject invalid data immediately at source
  • Log all rejections with detailed context
  • Never allow garbage data to propagate

2. Single Responsibility

  • One parser per exchange type
  • One validator per data type
  • One cache per index type

3. Observable by Default

  • Every component emits metrics
  • Every operation is logged
  • Every error has context

4. Self-Healing

  • Automatic retry with exponential backoff
  • Fallback to cache when RPC fails
  • Circuit breakers to prevent cascading failures
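
Retry with exponential backoff can be sketched as follows. `Retry` and `flakyCall` are hypothetical helpers; the attempt count and base delay are arbitrary, and jitter, which a production version would probably add, is omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Retry calls op up to attempts times, doubling the wait after each failure.
// It stops early if the context is cancelled.
func Retry(ctx context.Context, attempts int, base time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

// flakyCall simulates an RPC that fails `failures` times before succeeding
// and reports how many calls were made in total.
func flakyCall(failures int) (calls int, err error) {
	op := func() error {
		calls++
		if calls <= failures {
			return errors.New("rpc unavailable")
		}
		return nil
	}
	err = Retry(context.Background(), 4, time.Millisecond, op)
	return calls, err
}

func main() {
	calls, err := flakyCall(2)
	fmt.Println("calls:", calls, "err:", err)

	calls, err = flakyCall(10)
	fmt.Println("calls:", calls, "err:", err)
}
```

When `Retry` gives up, the caller is the natural place to fall back to the cache.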

5. Test-Driven

  • Unit tests for every parser
  • Integration tests for full pipeline
  • Chaos testing for failure scenarios

High-Level Component Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Arbitrum Monitor                         │
│  - WebSocket subscription                                   │
│  - Transaction/receipt buffering                            │
│  - Rate limiting & connection management                    │
└───────────────┬─────────────────────────────────────────────┘
                │
                ├─ Transactions & Receipts
                │
                ▼
┌─────────────────────────────────────────────────────────────┐
│                  Parser Factory                              │
│  - Route to correct parser based on protocol                │
│  - Manage parser lifecycle                                  │
└───────────────┬─────────────────────────────────────────────┘
                │
     ┌──────────┼─────────────┬───────────┬───────────┐
     │          │             │           │           │
     ▼          ▼             ▼           ▼           ▼
┌─────────┐ ┌──────────┐ ┌─────────┐ ┌──────────┐ ┌────────┐
│Uniswap  │ │Uniswap   │ │SushiSwap│ │ Camelot  │ │ Curve  │
│V2 Parser│ │V3 Parser │ │ Parser  │ │  Parser  │ │ Parser │
└────┬────┘ └────┬─────┘ └────┬────┘ └────┬─────┘ └───┬────┘
     │           │            │           │           │
     └───────────┴────────────┴───────────┴───────────┘
                              │
                              ▼
         ┌────────────────────────────────────────┐
         │        Event Validation Layer          │
         │  - Check zero addresses                │
         │  - Check zero amounts                  │
         │  - Validate against pool cache         │
         │  - Log discrepancies                   │
         └────────────┬───────────────────────────┘
                      │
           ┌──────────┴──────────┐
           │                     │
           ▼                     ▼
    ┌─────────────┐      ┌──────────────────┐
    │   Scanner   │      │ Background       │
    │   (Valid    │      │ Validation       │
    │   Events)   │      │ Channel          │
    └─────────────┘      │ (Audit Trail)    │
                         └──────────────────┘
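
The Parser Factory box in the diagram above could be realized roughly as below. This is a sketch under assumptions: `Log`, `Event`, `Parser`, `Factory`, and `stubParser` are hypothetical, and routing uses a protocol name. Because forks such as SushiSwap emit the same Swap event as Uniswap V2, the protocol key would probably have to come from a pool lookup rather than from the event topic alone.

```go
package main

import (
	"errors"
	"fmt"
)

// Log is a hypothetical, trimmed-down receipt log.
type Log struct {
	Address string
	Topics  []string
	Data    []byte
}

// Event is a hypothetical parsed swap event.
type Event struct {
	Protocol string
	Pool     string
}

// Parser is one possible shape for the per-exchange interface.
type Parser interface {
	Protocol() string
	Parse(l Log) (Event, error)
}

var ErrUnknownProtocol = errors.New("no parser registered for protocol")

// Factory routes logs to the parser registered for a protocol.
type Factory struct {
	parsers map[string]Parser
}

func NewFactory() *Factory { return &Factory{parsers: make(map[string]Parser)} }

// Register adds a parser and rejects duplicates for the same protocol.
func (f *Factory) Register(p Parser) error {
	if _, dup := f.parsers[p.Protocol()]; dup {
		return fmt.Errorf("parser for %q already registered", p.Protocol())
	}
	f.parsers[p.Protocol()] = p
	return nil
}

// Parse dispatches to the matching parser or fails with ErrUnknownProtocol.
func (f *Factory) Parse(protocol string, l Log) (Event, error) {
	p, ok := f.parsers[protocol]
	if !ok {
		return Event{}, fmt.Errorf("%q: %w", protocol, ErrUnknownProtocol)
	}
	return p.Parse(l)
}

// stubParser stands in for a real protocol-specific implementation.
type stubParser struct{ name string }

func (s stubParser) Protocol() string { return s.name }

func (s stubParser) Parse(l Log) (Event, error) {
	if len(l.Topics) == 0 {
		return Event{}, errors.New("log has no topics")
	}
	return Event{Protocol: s.name, Pool: l.Address}, nil
}

func main() {
	f := NewFactory()
	for _, name := range []string{"uniswap_v2", "uniswap_v3", "sushiswap", "camelot", "curve"} {
		if err := f.Register(stubParser{name: name}); err != nil {
			panic(err)
		}
	}
	ev, err := f.Parse("camelot", Log{Address: "0xpool", Topics: []string{"0xswap"}})
	fmt.Println(ev, err)

	_, err = f.Parse("balancer", Log{})
	fmt.Println(err)
}
```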

V2 Directory Structure

mev-bot/
├── orig/                          # V1 codebase preserved
│   ├── cmd/
│   ├── pkg/
│   ├── internal/
│   └── config/
│
├── docs/
│   └── planning/                  # V2 planning documents
│       ├── 00_V2_MASTER_PLAN.md
│       ├── 01_PARSER_ARCHITECTURE.md
│       ├── 02_VALIDATION_PIPELINE.md
│       ├── 03_POOL_CACHE_SYSTEM.md
│       ├── 04_METRICS_OBSERVABILITY.md
│       ├── 05_DATA_FLOW.md
│       ├── 06_IMPLEMENTATION_PHASES.md
│       └── 07_TASK_BREAKDOWN.md
│
├── cmd/
│   └── mev-bot/
│       └── main.go                # New V2 entry point
│
├── pkg/
│   ├── parsers/                   # NEW: Per-exchange parsers
│   │   ├── factory.go
│   │   ├── interface.go
│   │   ├── uniswap_v2.go
│   │   ├── uniswap_v3.go
│   │   ├── sushiswap.go
│   │   ├── camelot.go
│   │   └── curve.go
│   │
│   ├── validation/                # NEW: Validation pipeline
│   │   ├── validator.go
│   │   ├── rules.go
│   │   ├── background.go
│   │   └── metrics.go
│   │
│   ├── cache/                     # NEW: Multi-index cache
│   │   ├── pool_cache.go
│   │   ├── index_by_address.go
│   │   ├── index_by_tokens.go
│   │   ├── index_by_liquidity.go
│   │   └── index_by_protocol.go
│   │
│   ├── discovery/                 # Pool discovery system
│   │   ├── scanner.go
│   │   ├── factory_watcher.go
│   │   └── blacklist.go
│   │
│   ├── monitor/                   # Arbitrum monitoring
│   │   ├── sequencer.go
│   │   ├── connection.go
│   │   └── rate_limiter.go
│   │
│   ├── events/                    # Event types and handling
│   │   ├── types.go
│   │   ├── router.go
│   │   └── processor.go
│   │
│   ├── arbitrage/                 # Arbitrage detection
│   │   ├── detector.go
│   │   ├── calculator.go
│   │   └── executor.go
│   │
│   └── observability/             # NEW: Metrics & logging
│       ├── metrics.go
│       ├── logger.go
│       ├── tracing.go
│       └── health.go
│
├── internal/
│   ├── config/                    # Configuration management
│   └── utils/                     # Shared utilities
│
└── tests/
    ├── unit/                      # Unit tests
    ├── integration/               # Integration tests
    └── e2e/                       # End-to-end tests

Implementation Phases

Phase 1: Foundation (Weeks 1-2)

Goal: Set up V2 project structure and core interfaces

Tasks:

  1. Create V2 directory structure
  2. Define all interfaces (Parser, Validator, Cache, etc.)
  3. Set up logging and metrics infrastructure
  4. Create base test framework
  5. Implement connection management

Phase 2: Parser Refactor (Weeks 3-5)

Goal: Implement per-exchange parsers with validation

Tasks:

  1. Create Parser interface and factory
  2. Implement UniswapV2 parser with tests
  3. Implement UniswapV3 parser with tests
  4. Implement SushiSwap parser with tests
  5. Implement Camelot parser with tests
  6. Implement Curve parser with tests
  7. Add strict validation layer
  8. Integration testing

Phase 3: Cache System (Weeks 6-7)

Goal: Multi-index pool cache with efficient lookups

Tasks:

  1. Design cache schema
  2. Implement address index
  3. Implement token-pair index
  4. Implement liquidity ranking index
  5. Implement protocol index
  6. Add cache persistence
  7. Add cache invalidation logic
  8. Performance testing

Phase 4: Validation Pipeline (Weeks 8-9)

Goal: Background validation with audit trails

Tasks:

  1. Create validation channel
  2. Implement background validator goroutine
  3. Add comparison logic (parsed vs cached)
  4. Implement discrepancy logging
  5. Create validation metrics
  6. Add alerting for validation failures
  7. Integration testing

Phase 5: Migration & Testing (Weeks 10-12)

Goal: Migrate from V1 to V2, comprehensive testing

Tasks:

  1. Create migration path
  2. Run parallel systems (V1 and V2)
  3. Compare outputs
  4. Fix discrepancies
  5. Load testing
  6. Chaos testing
  7. Production deployment
  8. Monitoring setup

Success Metrics

Parsing Accuracy

  • Zero Address Rate: < 0.01% (target: 0%)
  • Zero Amount Rate: < 0.01% (target: 0%)
  • Validation Failure Rate: < 0.5%
  • Cache Hit Rate: > 95%

Performance

  • Parse Time: < 1ms per event (p99)
  • Cache Lookup: < 0.1ms (p99)
  • End-to-end Latency: < 10ms from receipt to scanner
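
Since the performance targets are stated as p99 values, here is a small sketch of checking a batch of latency samples against a budget using the nearest-rank percentile. `Percentile` is a hypothetical helper, samples are assumed to be in microseconds, and the numbers in `main` are invented.

```go
package main

import (
	"fmt"
	"sort"
)

// Percentile returns the nearest-rank p-th percentile of samples.
// It returns 0 for empty input or p <= 0; p above 100 is clamped to 100.
func Percentile(samples []int64, p int) int64 {
	if len(samples) == 0 || p <= 0 {
		return 0
	}
	if p > 100 {
		p = 100
	}
	sorted := append([]int64(nil), samples...) // copy so the caller's slice is untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := (p*len(sorted) + 99) / 100 // integer form of ceil(p/100 * n)
	return sorted[rank-1]
}

func main() {
	const budgetMicros = 1000 // the 1ms parse-time target
	parseMicros := []int64{120, 95, 110, 2300, 130, 105, 98, 101, 115, 99}
	p99 := Percentile(parseMicros, 99)
	fmt.Printf("p99=%dµs withinBudget=%t\n", p99, p99 < budgetMicros)
}
```

With small sample counts the p99 is simply the maximum, so a meaningful check needs a window of at least a few hundred events.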

Reliability

  • Uptime: > 99.9%
  • Data Discrepancy Rate: < 0.1%
  • Event Drop Rate: 0%

Observability

  • All Events Logged: 100%
  • All Rejections Logged: 100%
  • Metrics Coverage: 100% of components

Risk Mitigation

Risk: Breaking Changes During Migration

Mitigation:

  • Run V1 and V2 in parallel
  • Compare outputs
  • Gradual rollout with feature flags
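
The parallel run and gradual rollout could be supported by two small helpers, sketched here under assumptions: `UseV2` buckets a key (for example a pool address) deterministically with an FNV hash so the same key always takes the same path, and `Mismatches` lists the keys on which V1 and V2 outputs disagree. Both names and the string-keyed output maps are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// UseV2 reports whether key falls inside the rollout percentage (0 to 100).
// The decision is deterministic for a given key.
func UseV2(key string, percent uint32) bool {
	if percent >= 100 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32()%100 < percent
}

// Mismatches returns, in sorted order, the keys whose outputs differ
// or that are present on only one side.
func Mismatches(v1, v2 map[string]string) []string {
	var diff []string
	for k, a := range v1 {
		if b, ok := v2[k]; !ok || a != b {
			diff = append(diff, k)
		}
	}
	for k := range v2 {
		if _, ok := v1[k]; !ok {
			diff = append(diff, k)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	for _, pool := range []string{"0xpoolA", "0xpoolB", "0xpoolC"} {
		fmt.Println(pool, "uses V2 at 25%:", UseV2(pool, 25))
	}
	v1 := map[string]string{"tx1": "swap(WETH,USDC)", "tx2": "swap(0x0,USDC)"}
	v2 := map[string]string{"tx1": "swap(WETH,USDC)", "tx2": "swap(DAI,USDC)"}
	fmt.Println("mismatching keys:", Mismatches(v1, v2))
}
```

Because the bucket is derived from the key alone, raising the percentage only adds keys to the V2 path and never moves one back to V1.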

Risk: Performance Degradation

Mitigation:

  • Comprehensive benchmarking
  • Load testing before deployment
  • Circuit breakers to prevent cascading failures
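
A circuit breaker in its simplest form might look like the sketch below. `Breaker`, `fakeClock`, `errRPC`, and `demoCooldown` are hypothetical; the breaker opens after a number of consecutive failures, rejects calls during a cooldown, then lets a trial call through. It is not safe for concurrent use as written, and the clock is injected so the behaviour can be exercised without sleeping.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	ErrOpen = errors.New("circuit open")
	errRPC  = errors.New("rpc failed") // demo error
)

const demoCooldown = 30 * time.Second

// Breaker trips after `threshold` consecutive failures.
type Breaker struct {
	threshold int
	cooldown  time.Duration
	now       func() time.Time
	failures  int
	open      bool
	openedAt  time.Time
}

func NewBreaker(threshold int, cooldown time.Duration, now func() time.Time) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown, now: now}
}

// Call runs op unless the breaker is open and still cooling down.
func (b *Breaker) Call(op func() error) error {
	if b.open && b.now().Sub(b.openedAt) < b.cooldown {
		return ErrOpen // fail fast without touching the dependency
	}
	// Either closed, or the cooldown elapsed and this is a trial call.
	if err := op(); err != nil {
		b.failures++
		if b.open || b.failures >= b.threshold {
			b.open = true
			b.openedAt = b.now() // (re)start the cooldown
		}
		return err
	}
	b.failures = 0
	b.open = false
	return nil
}

// fakeClock is a manually advanced clock for the demo.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time          { return c.t }
func (c *fakeClock) Advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clk := &fakeClock{}
	b := NewBreaker(2, demoCooldown, clk.Now)
	failing := func() error { return errRPC }
	healthy := func() error { return nil }

	fmt.Println(b.Call(failing)) // first failure
	fmt.Println(b.Call(failing)) // second failure trips the breaker
	fmt.Println(b.Call(healthy)) // rejected while open
	clk.Advance(demoCooldown)
	fmt.Println(b.Call(healthy)) // trial call succeeds and closes the breaker
}
```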

Risk: Incomplete Test Coverage

Mitigation:

  • TDD approach for all new code
  • Minimum 90% test coverage requirement
  • Integration and E2E tests mandatory

Risk: Data Quality Regression

Mitigation:

  • Continuous validation against Arbiscan
  • Alerting on validation failures
  • Automated rollback on critical issues

Next Steps

  1. Review and approve this master plan
  2. Read detailed component plans in subsequent documents
  3. Review task breakdown in 07_TASK_BREAKDOWN.md
  4. Begin Phase 1 implementation

Document Status: Draft for Review
Created: 2025-11-10
Last Updated: 2025-11-10
Version: 1.0