mev-beta/SPEC.md
Administrator 3505921207 feat: comprehensive audit infrastructure and Phase 1 refactoring
This commit includes:

## Audit & Testing Infrastructure
- scripts/audit.sh: 12-section comprehensive codebase audit
- scripts/test.sh: 7 test types (unit, integration, race, bench, coverage, contracts, pkg)
- scripts/check-compliance.sh: SPEC.md compliance validation
- scripts/check-docs.sh: Documentation coverage checker
- scripts/dev.sh: Unified development script with all commands

## Documentation
- SPEC.md: Authoritative technical specification
- docs/AUDIT_AND_TESTING.md: Complete testing guide (600+ lines)
- docs/SCRIPTS_REFERENCE.md: All scripts documented (700+ lines)
- docs/README.md: Documentation index and navigation
- docs/DEVELOPMENT_SETUP.md: Environment setup guide
- docs/REFACTORING_PLAN.md: Systematic refactoring plan

## Phase 1 Refactoring (Critical Fixes)
- pkg/validation/helpers.go: Validation functions for addresses/amounts
- pkg/sequencer/selector_registry.go: Thread-safe selector registry
- pkg/sequencer/reader.go: Fixed race conditions with atomic metrics
- pkg/sequencer/swap_filter.go: Fixed race conditions, added error logging
- pkg/sequencer/decoder.go: Added address validation

## Changes Summary
- Fixed race conditions on 13 metric counters (atomic operations)
- Added validation at all ingress points
- Eliminated silent error handling
- Created selector registry for future ABI migration
- Reduced SPEC.md violations from 7 to 5

Build Status: All packages compile
Compliance: No race conditions, no silent failures
Documentation: 1,700+ lines across 5 comprehensive guides

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 07:17:13 +01:00


MEV Bot Technical Specification

Project Overview

High-performance MEV bot for Arbitrum focused on real-time swap detection and arbitrage opportunities from the Arbitrum sequencer feed.

Core Architecture Principles

1. Channel-Based Concurrency

ALL processing, parsing, and logging MUST use Go channels for optimal performance

  • Non-blocking message passing between components
  • Worker pools for parallel processing
  • Buffered channels to absorb bursts and limit backpressure
  • No synchronous blocking operations in hot paths
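
The channel principles above can be sketched as a small worker pool. This is a minimal illustration, not the project's implementation: Message, TrySend, and RunPool are hypothetical names, and the buffer sizes are arbitrary. TrySend shows a non-blocking send for hot paths, while the workers' send to the output channel deliberately blocks so that a slow consumer slows the workers instead of growing memory without bound.

```go
package main

import "sync"

// Message is a hypothetical stand-in for a raw sequencer feed message.
type Message struct{ ID int }

// TrySend attempts a non-blocking send and reports whether the message was
// queued. A full buffer returns false instead of stalling the caller.
func TrySend(ch chan<- Message, m Message) bool {
	select {
	case ch <- m:
		return true
	default:
		return false
	}
}

// RunPool starts n workers that drain in, apply fn, and forward each result
// to the returned buffered channel. The output channel is closed once in is
// closed and every worker has finished.
func RunPool(n, outBuf int, in <-chan Message, fn func(Message) int) <-chan int {
	out := make(chan int, outBuf)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range in {
				out <- fn(m)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```

A caller would typically count the messages for which TrySend returns false and expose that count as a metric, so dropped messages are never silent.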

2. Sequencer-First Architecture

The Arbitrum sequencer feed is the PRIMARY data source

  • WebSocket connection to: wss://arb1.arbitrum.io/feed
  • Real-time transaction broadcast before inclusion in blocks
  • NO reliance on HTTP RPC endpoints except for historical data
  • The sequencer feed reader MUST be isolated behind its own dedicated channel

3. Official Contract Sources

ALL contract ABIs MUST be derived from official contract sources

  • Store official DEX contracts in contracts/lib/ via Foundry
  • Build contracts using Foundry (forge build)
  • Extract ABIs from build artifacts in contracts/out/
  • Generate Go bindings using abigen from extracted ABIs
  • ALL contracts in contracts/src/ MUST have bindings
  • NO manually written ABI JSON files
  • NO hardcoded function selectors

Sequencer Processing Pipeline

Stage 1: Message Reception

Arbitrum Sequencer Feed
         ↓
  [Raw WebSocket Messages]
         ↓
   Message Channel

Stage 2: Swap Filtering

Message Channel
      ↓
[Swap Filter Workers]  ← Pool Cache (read-only)
      ↓
 Swap Event Channel

Swap Filter Responsibilities:

  • Identify swap transactions from supported DEXes
  • Extract pool addresses from transactions
  • Discover new pools not in cache
  • Emit SwapEvent to downstream channel

Supported DEXes:

  • Uniswap V2/V3/V4
  • Camelot V2/V3/V4
  • Balancer (all versions)
  • Kyber (all versions)
  • Curve (all versions)
  • SushiSwap
  • Other UniswapV2-compatible exchanges

Stage 3: Pool Discovery

Swap Event Channel
       ↓
[Pool Discovery]
       ↓
   Pool Cache  ← Auto-save to disk
       ↓
  Pool Mapping (address → info)

Pool Cache Behavior:

  • Thread-safe concurrent access (RWMutex)
  • Automatic persistence to JSON every 100 new pools
  • Periodic saves every 5 minutes
  • Mapping prevents duplicate processing
  • First seen timestamp tracking
  • Swap count statistics
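
The two persistence triggers listed above (every 100 new pools, every 5 minutes) could be combined in one small policy object. The sketch below is an assumption about how that might look; AutosavePolicy and its methods are hypothetical names, and the type is not safe for concurrent use on its own, so it is expected to be called under the cache's write lock or from a single owning goroutine.

```go
package main

import "time"

// AutosavePolicy decides when the pool cache should be flushed to disk.
// Hypothetical type: the spec only states the two triggers.
type AutosavePolicy struct {
	EveryN   int           // flush after this many newly discovered pools
	Interval time.Duration // flush at least this often
	pending  int           // new pools since the last flush
	lastSave time.Time
}

// NewAutosavePolicy returns a policy whose interval timer starts at now.
func NewAutosavePolicy(everyN int, interval time.Duration, now time.Time) *AutosavePolicy {
	return &AutosavePolicy{EveryN: everyN, Interval: interval, lastSave: now}
}

// RecordNewPool notes one newly discovered pool.
func (p *AutosavePolicy) RecordNewPool() { p.pending++ }

// ShouldSave reports whether the count trigger or the time trigger has fired.
func (p *AutosavePolicy) ShouldSave(now time.Time) bool {
	return p.pending >= p.EveryN || now.Sub(p.lastSave) >= p.Interval
}

// MarkSaved resets both triggers after a successful flush.
func (p *AutosavePolicy) MarkSaved(now time.Time) {
	p.pending, p.lastSave = 0, now
}
```

With the values from this spec, the policy would be created as NewAutosavePolicy(100, 5*time.Minute, time.Now()).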

Stage 4: Arbitrage Detection

Swap Event Channel
       ↓
[Arbitrage Scanner] ← Pool Cache (multi-index)
       ↓
 Opportunity Channel

Contract Bindings Management

Directory Structure

contracts/
├── lib/               # Foundry dependencies (official DEX contracts)
│   ├── v2-core/      # git submodule: Uniswap/v2-core
│   ├── v3-core/      # git submodule: Uniswap/v3-core
│   ├── camelot-amm/  # git submodule: CamelotLabs/camelot-amm-v2
│   └── ...
├── src/               # Custom wrapper contracts (if needed)
│   └── interfaces/    # Interface contracts for binding generation
├── out/               # Foundry build artifacts (gitignored)
│   └── *.sol/
│       └── *.json    # ABI + bytecode
└── foundry.toml       # Foundry configuration

bindings/
├── uniswap_v2/
│   ├── router.go     # Generated from IUniswapV2Router02
│   └── pair.go       # Generated from IUniswapV2Pair
├── uniswap_v3/
│   └── router.go     # Generated from ISwapRouter
├── camelot/
│   └── router.go     # Generated from ICamelotRouter
└── README.md          # Binding usage documentation

Binding Generation Workflow

  1. Install Official Contracts

    forge install Uniswap/v2-core
    forge install Uniswap/v3-core
    forge install Uniswap/v4-core
    forge install camelotlabs/camelot-amm-v2
    forge install balancer/balancer-v2-monorepo
    forge install KyberNetwork/ks-elastic-sc
    forge install curvefi/curve-contract
    
  2. Build Contracts

    forge build
    
  3. Extract ABIs

    # Example for UniswapV2Router02
    jq '.abi' contracts/out/IUniswapV2Router02.sol/IUniswapV2Router02.json > /tmp/router_abi.json
    
  4. Generate Bindings

    abigen --abi=/tmp/router_abi.json \
           --pkg=uniswap_v2 \
           --type=UniswapV2Router \
           --out=bindings/uniswap_v2/router.go
    
  5. Automate with Script

    • Use scripts/generate-bindings.sh to automate steps 3-4
    • Run after any contract update

Binding Usage in Code

DO THIS (ABI-based detection):

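import (
    "errors"
    "math/big"
    "strings"

    "github.com/ethereum/go-ethereum/accounts/abi"
    "github.com/ethereum/go-ethereum/common"
)

// Inside a function returning error. Parse the ABI once; never discard the error.
routerABI, err := abi.JSON(strings.NewReader(uniswap_v2.UniswapV2RouterABI))
if err != nil {
    return err
}
if len(txData) < 4 {
    return errors.New("calldata shorter than a 4-byte selector")
}
method, err := routerABI.MethodById(txData[:4])
if err == nil && strings.Contains(method.Name, "swap") {
    params, err := method.Inputs.Unpack(txData[4:])
    if err != nil {
        return err
    }
    // Positions match swapExactTokensForTokens(amountIn, amountOutMin, path, ...);
    // other swap variants differ, so use checked assertions.
    amountIn, okAmount := params[0].(*big.Int)
    path, okPath := params[2].([]common.Address)
    if !okAmount || !okPath {
        return errors.New("unexpected argument layout for " + method.Name)
    }
    _, _ = amountIn, path // hand off to downstream processing
}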

DON'T DO THIS (hardcoded selectors):

// WRONG - hardcoded, fragile, unmaintainable
if hex.EncodeToString(txData[0:4]) == "38ed1739" {
    // swapExactTokensForTokens
}

Pool Cache Design

Multi-Index Requirements

The pool cache MUST support efficient lookups by:

  1. Address - Primary key
  2. Token Pair - Find all pools for a pair (A,B)
  3. Protocol - Find all Uniswap pools, all Camelot pools, etc.
  4. Liquidity - Find top N pools by TVL

Data Structure

type PoolInfo struct {
    Address    common.Address
    Protocol   string // "UniswapV2", "Camelot", etc.
    Version    string // "V2", "V3", etc.
    Token0     common.Address
    Token1     common.Address
    Fee        uint32 // basis points
    FirstSeen  time.Time
    LastSeen   time.Time
    SwapCount  uint64
    Liquidity  *big.Int // Estimated TVL
}

type PoolCache struct {
    // Primary storage
    pools map[common.Address]*PoolInfo

    // Indexes
    byTokenPair map[TokenPair][]common.Address
    byProtocol  map[string][]common.Address
    byLiquidity []*PoolInfo // Sorted by liquidity

    mu sync.RWMutex
}
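
The structures above reference a TokenPair key without defining it. One possible shape, offered as a sketch under stated assumptions, is an order-independent pair whose two addresses are sorted bytewise, so that (A,B) and (B,A) hit the same index entry. To stay stdlib-only, the example uses a local Address type in place of common.Address and a uint64 in place of the *big.Int liquidity; Pool, Index, and their methods are hypothetical names.

```go
package main

import (
	"bytes"
	"sort"
	"sync"
)

// Address is a stdlib-only stand-in for go-ethereum's common.Address.
type Address [20]byte

// TokenPair is an order-independent map key: Lo sorts before Hi bytewise.
type TokenPair struct{ Lo, Hi Address }

// NewTokenPair returns the canonical key for tokens a and b in either order.
func NewTokenPair(a, b Address) TokenPair {
	if bytes.Compare(a[:], b[:]) > 0 {
		a, b = b, a
	}
	return TokenPair{Lo: a, Hi: b}
}

// Pool is a trimmed-down PoolInfo; Liquidity is simplified to uint64.
type Pool struct {
	Address        Address
	Protocol       string
	Token0, Token1 Address
	Liquidity      uint64
}

// Index keeps the primary map and the secondary indexes in step.
type Index struct {
	mu          sync.RWMutex
	pools       map[Address]*Pool
	byTokenPair map[TokenPair][]Address
	byProtocol  map[string][]Address
}

func NewIndex() *Index {
	return &Index{
		pools:       make(map[Address]*Pool),
		byTokenPair: make(map[TokenPair][]Address),
		byProtocol:  make(map[string][]Address),
	}
}

// Add inserts a pool into every index once; duplicates return false.
func (ix *Index) Add(p *Pool) bool {
	ix.mu.Lock()
	defer ix.mu.Unlock()
	if _, exists := ix.pools[p.Address]; exists {
		return false
	}
	ix.pools[p.Address] = p
	pair := NewTokenPair(p.Token0, p.Token1)
	ix.byTokenPair[pair] = append(ix.byTokenPair[pair], p.Address)
	ix.byProtocol[p.Protocol] = append(ix.byProtocol[p.Protocol], p.Address)
	return true
}

// ByPair returns a copy of the pool addresses trading a against b.
func (ix *Index) ByPair(a, b Address) []Address {
	ix.mu.RLock()
	defer ix.mu.RUnlock()
	return append([]Address(nil), ix.byTokenPair[NewTokenPair(a, b)]...)
}

// TopByLiquidity returns up to n pools, highest liquidity first.
// Pools are treated as immutable after Add in this sketch.
func (ix *Index) TopByLiquidity(n int) []*Pool {
	ix.mu.RLock()
	all := make([]*Pool, 0, len(ix.pools))
	for _, p := range ix.pools {
		all = append(all, p)
	}
	ix.mu.RUnlock()
	sort.Slice(all, func(i, j int) bool { return all[i].Liquidity > all[j].Liquidity })
	if n < 0 {
		n = 0
	}
	if n > len(all) {
		n = len(all)
	}
	return all[:n]
}
```

Sorting on demand keeps the sketch short; a cache that serves frequent top-N queries would more likely maintain the sorted byLiquidity slice incrementally, as the struct above suggests.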

Thread Safety

  • Use RWMutex for concurrent read/write access
  • Read locks for queries
  • Write locks for updates
  • No locks held during I/O operations (save to disk)
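
The last rule above (no locks held during I/O) is usually met by copying the data under a read lock and writing the copy after the lock is released. The following is a minimal sketch of that pattern; Cache, Entry, and the JSON field names are hypothetical, and the write-to-temp-then-rename step is an added assumption meant to avoid leaving a truncated file if the process dies mid-write.

```go
package main

import (
	"encoding/json"
	"os"
	"sync"
)

// Entry is a hypothetical, trimmed-down persisted pool record.
type Entry struct {
	Address   string `json:"address"`
	SwapCount uint64 `json:"swap_count"`
}

// Cache guards its map with an RWMutex, as the spec requires.
type Cache struct {
	mu    sync.RWMutex
	pools map[string]Entry
}

func NewCache() *Cache { return &Cache{pools: make(map[string]Entry)} }

// Put stores or replaces an entry under the write lock.
func (c *Cache) Put(e Entry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pools[e.Address] = e
}

// Snapshot copies all entries while holding only the read lock.
func (c *Cache) Snapshot() []Entry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]Entry, 0, len(c.pools))
	for _, e := range c.pools {
		out = append(out, e)
	}
	return out
}

// Save marshals and writes the snapshot with no lock held, so slow disk
// I/O never blocks readers or writers of the cache.
func (c *Cache) Save(path string) error {
	data, err := json.Marshal(c.Snapshot())
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```

The trade-off is that the file reflects the cache as of the snapshot, not as of the moment the write completes, which is acceptable for a cache that is saved again periodically.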

Development Environment

Containerized Development

ALL development MUST occur in containers

# docker-compose.yml profiles
services:
  go-dev:       # Go 1.21 with full toolchain
  python-dev:   # Python 3.11 for scripts
  foundry:      # Forge, Cast, Anvil for contract work

Start dev environment:

./scripts/dev-up.sh
# or
podman-compose up -d go-dev python-dev foundry

Enter containers:

podman exec -it mev-go-dev sh
podman exec -it mev-foundry sh

Build Process

# In go-dev container
cd /workspace
go build -o bin/mev-bot ./cmd/mev-bot

Testing

# Unit tests
go test ./pkg/... -v

# Integration tests
go test ./tests/integration/... -v

# Benchmarks
go test ./pkg/... -bench=. -benchmem

Observability

Metrics (Prometheus)

Every component MUST export metrics:

  • sequencer_messages_received_total
  • swaps_detected_total{protocol, version}
  • pools_discovered_total{protocol}
  • arbitrage_opportunities_found_total
  • arbitrage_execution_attempts_total{result}
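
To illustrate what a race-free labelled counter such as swaps_detected_total{protocol, version} involves, here is a stdlib-only stand-in built on sync/atomic. It is not the Prometheus client library and is only a sketch: CounterVec and its methods are hypothetical, and a real build would presumably register counters with the Prometheus Go client instead. Render emits lines in the same name{labels} value shape used by the Prometheus text format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

// CounterVec is a stdlib-only stand-in for a labelled Prometheus counter.
type CounterVec struct {
	name   string
	mu     sync.RWMutex
	values map[string]*atomic.Uint64 // key: rendered label set
}

func NewCounterVec(name string) *CounterVec {
	return &CounterVec{name: name, values: make(map[string]*atomic.Uint64)}
}

// Inc adds one to the series identified by labels, for example
// `protocol="UniswapV2"`. The map is locked only to find or create the
// series; the increment itself is a lock-free atomic add.
func (c *CounterVec) Inc(labels string) {
	c.mu.RLock()
	v, ok := c.values[labels]
	c.mu.RUnlock()
	if !ok {
		c.mu.Lock()
		if v, ok = c.values[labels]; !ok {
			v = new(atomic.Uint64)
			c.values[labels] = v
		}
		c.mu.Unlock()
	}
	v.Add(1)
}

// Get returns the current value of one series, or 0 if it does not exist.
func (c *CounterVec) Get(labels string) uint64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if v, ok := c.values[labels]; ok {
		return v.Load()
	}
	return 0
}

// Render lists every series as `name{labels} value`, sorted by label set.
func (c *CounterVec) Render() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s} %d\n", c.name, k, c.values[k].Load())
	}
	return b.String()
}
```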

Logging (Structured)

Use go-ethereum's structured logger:

logger.Info("swap detected",
    "protocol", swap.Protocol.Name,
    "hash", swap.TxHash,
    "pool", swap.Pool.Address.Hex(),
    "token0", swap.Pool.Token0.Hex(),
    "token1", swap.Pool.Token1.Hex())

Health Monitoring

  • Sequencer connection status
  • Message processing rate
  • Channel buffer utilization
  • Pool cache hit rate
  • Arbitrage execution success rate

Validation Rules

Swap Event Validation

MUST validate ALL parsed swap events:

  1. Non-zero addresses - token0, token1, pool address
  2. Non-zero amounts - amountIn, amountOut
  3. Valid token pair - token0 < token1 (canonical ordering)
  4. Known protocol - matches supported DEX list
  5. Reasonable amounts - within sanity bounds

Reject Invalid Data Immediately

  • Log rejection with full context
  • Increment rejection metrics
  • NEVER propagate invalid data downstream
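
The five validation rules and the reject-immediately policy above could be expressed as a single function that returns a sentinel error per rule. This is a sketch under assumptions: the SwapEvent fields, the protocol names in the supported set, and the 2^128 sanity bound are all illustrative choices, not values taken from the codebase, and Address stands in for common.Address.

```go
package main

import (
	"bytes"
	"errors"
	"math/big"
)

// Address is a stdlib-only stand-in for go-ethereum's common.Address.
type Address [20]byte

// SwapEvent is a hypothetical, trimmed-down parsed swap.
type SwapEvent struct {
	Protocol             string
	Pool, Token0, Token1 Address
	AmountIn, AmountOut  *big.Int
}

var (
	ErrZeroAddress     = errors.New("zero address")
	ErrBadAmount       = errors.New("amount is nil, non-positive, or out of bounds")
	ErrTokenOrder      = errors.New("token0 must sort before token1")
	ErrUnknownProtocol = errors.New("unknown protocol")

	// maxAmount is an assumed sanity bound (2^128), not a spec value.
	maxAmount = new(big.Int).Lsh(big.NewInt(1), 128)

	// supported uses illustrative protocol names.
	supported = map[string]bool{"UniswapV2": true, "UniswapV3": true, "Camelot": true, "SushiSwap": true}
)

// Validate applies the swap event rules and returns the first violation.
func Validate(e SwapEvent) error {
	var zero Address
	if e.Pool == zero || e.Token0 == zero || e.Token1 == zero {
		return ErrZeroAddress
	}
	for _, amt := range []*big.Int{e.AmountIn, e.AmountOut} {
		if amt == nil || amt.Sign() <= 0 || amt.Cmp(maxAmount) > 0 {
			return ErrBadAmount
		}
	}
	if bytes.Compare(e.Token0[:], e.Token1[:]) >= 0 {
		return ErrTokenOrder
	}
	if !supported[e.Protocol] {
		return ErrUnknownProtocol
	}
	return nil
}

// FilterValid returns the events that pass Validate and the number rejected,
// so invalid data never reaches downstream stages.
func FilterValid(events []SwapEvent) (valid []SwapEvent, rejected int) {
	for _, e := range events {
		if err := Validate(e); err != nil {
			rejected++ // a real pipeline would also log err with full context
			continue
		}
		valid = append(valid, e)
	}
	return valid, rejected
}
```

Returning distinct sentinel errors lets the caller label the rejection metric by reason without parsing error strings.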

Error Handling

Fail-Fast Philosophy

  • Reject bad data at the source
  • Log all errors with stack traces
  • Emit error metrics
  • Never fail silently

Graceful Degradation

  • Circuit breakers for RPC failover
  • Retry logic with exponential backoff
  • Automatic reconnection for WebSocket
  • Pool cache persistence survives restarts
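
Retry with exponential backoff, as listed above, might look like the sketch below. Backoff and Retry are hypothetical helpers, the doubling factor is an assumption, and jitter is omitted for brevity even though production reconnect loops usually add it. errors.Join requires Go 1.20 or later, which is consistent with the Go 1.21 toolchain named in this spec.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// Backoff returns the delay before retry number attempt (0-based):
// base doubled attempt times, capped at max.
func Backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// Retry calls op until it succeeds, the attempts are used up, or ctx is
// cancelled. It returns nil on success and the last error otherwise.
func Retry(ctx context.Context, attempts int, base, max time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(Backoff(i, base, max)):
		case <-ctx.Done():
			return errors.Join(ctx.Err(), err)
		}
	}
	return err
}
```

For a WebSocket reconnect loop, op would wrap the dial call, and the caller would restart Retry after each dropped connection.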

Configuration

Environment Variables

# Sequencer (PRIMARY)
ARBITRUM_SEQUENCER_URL=wss://arb1.arbitrum.io/feed

# RPC (FALLBACK ONLY)
RPC_URL=https://arbitrum-mainnet.core.chainstack.com/<key>
WS_URL=wss://arbitrum-mainnet.core.chainstack.com/<key>

# Chain
CHAIN_ID=42161

# API Keys
ARBISCAN_API_KEY=<key>

# Wallet
PRIVATE_KEY=<key>

Performance Tuning

# Worker pool sizes
SWAP_FILTER_WORKERS=16
ARBITRAGE_WORKERS=8

# Channel buffer sizes
MESSAGE_BUFFER=1000
SWAP_EVENT_BUFFER=500
OPPORTUNITY_BUFFER=100

# Pool cache
POOL_CACHE_AUTOSAVE_COUNT=100
POOL_CACHE_AUTOSAVE_INTERVAL=5m
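
One way to load the tuning variables above is a small typed loader that falls back to the listed values and rejects malformed input at startup, in line with the fail-fast rule. This is a sketch only: Tuning, LoadTuning, and the choice to use the values above as defaults are assumptions. The lookup function is injected so the loader can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// Tuning holds the performance settings from the environment.
type Tuning struct {
	SwapFilterWorkers int
	ArbitrageWorkers  int
	MessageBuffer     int
	SwapEventBuffer   int
	OpportunityBuffer int
	AutosaveCount     int
	AutosaveInterval  time.Duration
}

// lookupFunc matches the signature of os.LookupEnv.
type lookupFunc func(string) (string, bool)

// intVar reads a positive integer, returning def when the key is unset.
func intVar(lookup lookupFunc, key string, def int) (int, error) {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("%s: want a positive integer, got %q", key, raw)
	}
	return n, nil
}

// LoadTuning builds a Tuning from lookup, failing on the first bad value.
func LoadTuning(lookup lookupFunc) (Tuning, error) {
	t := Tuning{AutosaveInterval: 5 * time.Minute}
	ints := []struct {
		key string
		dst *int
		def int
	}{
		{"SWAP_FILTER_WORKERS", &t.SwapFilterWorkers, 16},
		{"ARBITRAGE_WORKERS", &t.ArbitrageWorkers, 8},
		{"MESSAGE_BUFFER", &t.MessageBuffer, 1000},
		{"SWAP_EVENT_BUFFER", &t.SwapEventBuffer, 500},
		{"OPPORTUNITY_BUFFER", &t.OpportunityBuffer, 100},
		{"POOL_CACHE_AUTOSAVE_COUNT", &t.AutosaveCount, 100},
	}
	for _, v := range ints {
		n, err := intVar(lookup, v.key, v.def)
		if err != nil {
			return Tuning{}, err
		}
		*v.dst = n
	}
	if raw, ok := lookup("POOL_CACHE_AUTOSAVE_INTERVAL"); ok && raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil || d <= 0 {
			return Tuning{}, fmt.Errorf("POOL_CACHE_AUTOSAVE_INTERVAL: want a positive duration, got %q", raw)
		}
		t.AutosaveInterval = d
	}
	return t, nil
}

// LoadTuningFromEnv reads the settings from the process environment.
func LoadTuningFromEnv() (Tuning, error) { return LoadTuning(os.LookupEnv) }
```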

Git Workflow

Branches

  • master - Stable production branch
  • feature/v2-prep - V2 planning and architecture
  • feature/<component> - Feature branches for V2 components

Commit Messages

type(scope): brief description

- Detailed changes
- Why the change was needed
- Breaking changes or migration notes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>

Types: feat, fix, perf, refactor, test, docs, build, ci

Critical Rules

MUST DO

  • Use Arbitrum sequencer feed as primary data source
  • Use channels for ALL inter-component communication
  • Derive contract ABIs from official sources via Foundry
  • Generate Go bindings for all contracts with abigen
  • Validate ALL parsed data before propagation
  • Use thread-safe concurrent data structures
  • Emit comprehensive metrics and structured logs
  • Run all development in containers
  • Write tests for all components

MUST NOT DO

  • Use HTTP RPC as primary data source (sequencer only!)
  • Write manual ABI JSON files (use Foundry builds!)
  • Hardcode function selectors (use ABI lookups!)
  • Allow zero addresses or zero amounts to propagate
  • Use blocking operations in hot paths
  • Modify shared state without locks
  • Fail silently without logging
  • Run builds outside of containers

References