mev-beta/SPEC.md
Administrator 3505921207 feat: comprehensive audit infrastructure and Phase 1 refactoring
This commit includes:

## Audit & Testing Infrastructure
- scripts/audit.sh: 12-section comprehensive codebase audit
- scripts/test.sh: 7 test types (unit, integration, race, bench, coverage, contracts, pkg)
- scripts/check-compliance.sh: SPEC.md compliance validation
- scripts/check-docs.sh: Documentation coverage checker
- scripts/dev.sh: Unified development script with all commands

## Documentation
- SPEC.md: Authoritative technical specification
- docs/AUDIT_AND_TESTING.md: Complete testing guide (600+ lines)
- docs/SCRIPTS_REFERENCE.md: All scripts documented (700+ lines)
- docs/README.md: Documentation index and navigation
- docs/DEVELOPMENT_SETUP.md: Environment setup guide
- docs/REFACTORING_PLAN.md: Systematic refactoring plan

## Phase 1 Refactoring (Critical Fixes)
- pkg/validation/helpers.go: Validation functions for addresses/amounts
- pkg/sequencer/selector_registry.go: Thread-safe selector registry
- pkg/sequencer/reader.go: Fixed race conditions with atomic metrics
- pkg/sequencer/swap_filter.go: Fixed race conditions, added error logging
- pkg/sequencer/decoder.go: Added address validation

## Changes Summary
- Fixed race conditions on 13 metric counters (atomic operations)
- Added validation at all ingress points
- Eliminated silent error handling
- Created selector registry for future ABI migration
- Reduced SPEC.md violations from 7 to 5

Build Status: All packages compile
Compliance: No race conditions, no silent failures
Documentation: 1,700+ lines across 5 comprehensive guides

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 07:17:13 +01:00


# MEV Bot Technical Specification
## Project Overview
High-performance MEV bot for Arbitrum, focused on real-time swap detection and arbitrage opportunity discovery driven by the Arbitrum sequencer feed.
## Core Architecture Principles
### 1. Channel-Based Concurrency
**ALL processing, parsing, and logging MUST use Go channels for optimal performance**
- Non-blocking message passing between components
- Worker pools for parallel processing
- Buffered channels to absorb bursts and reduce backpressure (a full buffer still blocks the sender)
- No synchronous blocking operations in hot paths
### 2. Sequencer-First Architecture
**The Arbitrum sequencer feed is the PRIMARY data source**
- WebSocket connection to: `wss://arb1.arbitrum.io/feed`
- Real-time transaction broadcast before inclusion in blocks
- NO reliance on HTTP RPC endpoints except for historical data
- Sequencer MUST be isolated in its own channel
### 3. Official Contract Sources
**ALL contract ABIs MUST be derived from official contract sources**
- Store official DEX contracts in `contracts/lib/` via Foundry
- Build contracts using Foundry (`forge build`)
- Extract ABIs from build artifacts in `contracts/out/`
- Generate Go bindings using `abigen` from extracted ABIs
- ALL contracts in `contracts/src/` MUST have bindings
- NO manually written ABI JSON files
- NO hardcoded function selectors
## Sequencer Processing Pipeline
### Stage 1: Message Reception
```
Arbitrum Sequencer Feed
    ↓
[Raw WebSocket Messages]
    ↓
Message Channel
```
### Stage 2: Swap Filtering
```
Message Channel
    ↓
[Swap Filter Workers] ← Pool Cache (read-only)
    ↓
Swap Event Channel
```
**Swap Filter Responsibilities:**
- Identify swap transactions from supported DEXes
- Extract pool addresses from transactions
- Discover new pools not in cache
- Emit SwapEvent to downstream channel
**Supported DEXes:**
- Uniswap V2/V3/V4
- Camelot V2/V3/V4
- Balancer (all versions)
- Kyber (all versions)
- Curve (all versions)
- SushiSwap
- Other UniswapV2-compatible exchanges
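A sketch of one filter step against a read-only view of the pool cache. It assumes a direct pool call, where the transaction's `to` address is the pool; router calls would instead need the pool derived from the decoded path. `RawTx`, `PoolReader`, `filterSwap`, and the `isSwap` lookup (which would be built from ABI-derived method IDs, not hardcoded selectors) are hypothetical, and `Address` stands in for go-ethereum's `common.Address`.

```go
package main

// Address stands in for go-ethereum's common.Address.
type Address [20]byte

// RawTx is a hypothetical decoded transaction taken from the message channel.
type RawTx struct {
	To   Address
	Data []byte
}

// SwapEvent is what the filter emits to the swap event channel.
type SwapEvent struct {
	Pool     Address
	Selector [4]byte
	NewPool  bool // true if the pool was not yet in the cache
}

// PoolReader is the read-only view of the pool cache given to filter workers.
type PoolReader interface {
	Has(pool Address) bool
}

// poolSet is a trivial PoolReader for illustration.
type poolSet map[Address]struct{}

func (s poolSet) Has(p Address) bool {
	_, ok := s[p]
	return ok
}

// filterSwap assumes a direct pool call (tx.To is the pool). isSwap is a
// hypothetical lookup populated from ABI-derived method IDs.
func filterSwap(tx RawTx, pools PoolReader, isSwap func([4]byte) bool) (SwapEvent, bool) {
	if len(tx.Data) < 4 {
		return SwapEvent{}, false // plain transfer or malformed calldata
	}
	var sel [4]byte
	copy(sel[:], tx.Data[:4])
	if !isSwap(sel) {
		return SwapEvent{}, false
	}
	return SwapEvent{Pool: tx.To, Selector: sel, NewPool: !pools.Has(tx.To)}, true
}
```

Handing workers the `PoolReader` interface rather than the cache itself makes the read-only rule a compile-time property: filter code cannot call a mutating method it never sees.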
### Stage 3: Pool Discovery
```
Swap Event Channel
    ↓
[Pool Discovery]
    ↓
Pool Cache ← Auto-save to disk
    ↓
Pool Mapping (address → info)
```
**Pool Cache Behavior:**
- Thread-safe concurrent access (RWMutex)
- Automatic persistence to JSON every 100 new pools
- Periodic saves every 5 minutes
- Mapping prevents duplicate processing
- First seen timestamp tracking
- Swap count statistics
### Stage 4: Arbitrage Detection
```
Swap Event Channel
    ↓
[Arbitrage Scanner] ← Pool Cache (multi-index)
    ↓
Opportunity Channel
```
## Contract Bindings Management
### Directory Structure
```
contracts/
├── lib/ # Foundry dependencies (official DEX contracts)
│ ├── v2-core/ # git submodule: Uniswap/v2-core
│ ├── v3-core/ # git submodule: Uniswap/v3-core
│ ├── camelot-amm/ # git submodule: CamelotLabs/camelot-amm-v2
│ └── ...
├── src/ # Custom wrapper contracts (if needed)
│ └── interfaces/ # Interface contracts for binding generation
├── out/ # Foundry build artifacts (gitignored)
│ └── *.sol/
│ └── *.json # ABI + bytecode
└── foundry.toml # Foundry configuration
bindings/
├── uniswap_v2/
│ ├── router.go # Generated from IUniswapV2Router02
│ └── pair.go # Generated from IUniswapV2Pair
├── uniswap_v3/
│ └── router.go # Generated from ISwapRouter
├── camelot/
│ └── router.go # Generated from ICamelotRouter
└── README.md # Binding usage documentation
```
### Binding Generation Workflow
1. **Install Official Contracts**
```bash
forge install Uniswap/v2-core
forge install Uniswap/v2-periphery   # IUniswapV2Router02
forge install Uniswap/v3-core
forge install Uniswap/v3-periphery   # ISwapRouter
forge install Uniswap/v4-core
forge install camelotlabs/camelot-amm-v2
forge install balancer/balancer-v2-monorepo
forge install KyberNetwork/ks-elastic-sc
forge install curvefi/curve-contract
```
2. **Build Contracts**
```bash
forge build
```
3. **Extract ABIs**
```bash
# Example for UniswapV2Router02
jq '.abi' contracts/out/IUniswapV2Router02.sol/IUniswapV2Router02.json > /tmp/router_abi.json
```
4. **Generate Bindings**
```bash
abigen --abi=/tmp/router_abi.json \
--pkg=uniswap_v2 \
--type=UniswapV2Router \
--out=bindings/uniswap_v2/router.go
```
5. **Automate with Script**
- Use `scripts/generate-bindings.sh` to automate steps 3-4
- Run after any contract update
### Binding Usage in Code
**DO THIS** (ABI-based detection):
```go
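import (
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
	// plus the generated bindings/uniswap_v2 package
)

// Parse the ABI once at startup, and never discard the error.
routerABI, err := abi.JSON(strings.NewReader(uniswap_v2.UniswapV2RouterABI))
if err != nil {
	return fmt.Errorf("parse UniswapV2Router ABI: %w", err)
}

if len(txData) >= 4 { // calldata shorter than a selector cannot be a call
	method, err := routerABI.MethodById(txData[:4])
	if err == nil && strings.HasPrefix(method.Name, "swap") {
		args := make(map[string]interface{})
		if err := method.Inputs.UnpackIntoMap(args, txData[4:]); err != nil {
			return fmt.Errorf("unpack %s: %w", method.Name, err)
		}
		// Access arguments by ABI name: positions differ between swap variants
		// (swapExactETHForTokens has no amountIn, so amountIn is nil there).
		amountIn, _ := args["amountIn"].(*big.Int)
		path, _ := args["path"].([]common.Address)
		// ...build the SwapEvent from amountIn, path, etc.
	}
}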
```
**DON'T DO THIS** (hardcoded selectors):
```go
// WRONG - hardcoded, fragile, unmaintainable
if hex.EncodeToString(txData[0:4]) == "38ed1739" {
// swapExactTokensForTokens
}
```
## Pool Cache Design
### Multi-Index Requirements
The pool cache MUST support efficient lookups by:
1. **Address** - Primary key
2. **Token Pair** - Find all pools for a pair (A,B)
3. **Protocol** - Find all Uniswap pools, all Camelot pools, etc.
4. **Liquidity** - Find top N pools by TVL
### Data Structure
```go
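import (
	"math/big"
	"sync"
	"time"

	"github.com/ethereum/go-ethereum/common"
)

type PoolInfo struct {
	Address   common.Address
	Protocol  string // "UniswapV2", "Camelot", etc.
	Version   string // "V2", "V3", etc.
	Token0    common.Address
	Token1    common.Address
	Fee       uint32 // basis points; Uniswap V3 stores fees in hundredths of a bip (3000 = 0.30%), so convert
	FirstSeen time.Time
	LastSeen  time.Time
	SwapCount uint64
	Liquidity *big.Int // Estimated TVL
}

// TokenPair is the pair-index key, stored in canonical order (Token0 < Token1).
type TokenPair struct {
	Token0 common.Address
	Token1 common.Address
}

type PoolCache struct {
	// Primary storage
	pools map[common.Address]*PoolInfo
	// Indexes
	byTokenPair map[TokenPair][]common.Address
	byProtocol  map[string][]common.Address
	byLiquidity []*PoolInfo // Sorted by liquidity
	mu          sync.RWMutex
}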
```
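A dependency-free sketch of how an add operation can keep the primary map and the secondary indexes consistent under one write lock, while lookups take only a read lock and return copies. `Address` stands in for `common.Address`; `NewTokenPair`, `Add`, and `PoolsForPair` are hypothetical names, and the `byLiquidity` index and most `PoolInfo` fields are omitted for brevity.

```go
package main

import (
	"bytes"
	"sync"
)

// Address stands in for go-ethereum's common.Address.
type Address [20]byte

// TokenPair is always stored in canonical order (Token0 < Token1).
type TokenPair struct{ Token0, Token1 Address }

// NewTokenPair orders two tokens canonically; comparing the big-endian
// address bytes matches numeric ordering.
func NewTokenPair(a, b Address) TokenPair {
	if bytes.Compare(a[:], b[:]) > 0 {
		a, b = b, a
	}
	return TokenPair{Token0: a, Token1: b}
}

type PoolInfo struct {
	Address        Address
	Protocol       string
	Token0, Token1 Address
}

type PoolCache struct {
	mu          sync.RWMutex
	pools       map[Address]*PoolInfo
	byTokenPair map[TokenPair][]Address
	byProtocol  map[string][]Address
}

func NewPoolCache() *PoolCache {
	return &PoolCache{
		pools:       make(map[Address]*PoolInfo),
		byTokenPair: make(map[TokenPair][]Address),
		byProtocol:  make(map[string][]Address),
	}
}

// Add stores a pool and updates every index under one write lock, so readers
// never see a pool that is missing from an index. It returns false for a pool
// that is already cached (a full cache would update LastSeen/SwapCount here).
func (c *PoolCache) Add(p PoolInfo) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.pools[p.Address]; exists {
		return false
	}
	c.pools[p.Address] = &p
	pair := NewTokenPair(p.Token0, p.Token1)
	c.byTokenPair[pair] = append(c.byTokenPair[pair], p.Address)
	c.byProtocol[p.Protocol] = append(c.byProtocol[p.Protocol], p.Address)
	return true
}

// PoolsForPair returns all pools for a pair given in either order. It copies
// the index slice so callers never share memory that Add may later modify.
func (c *PoolCache) PoolsForPair(a, b Address) []Address {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return append([]Address(nil), c.byTokenPair[NewTokenPair(a, b)]...)
}
```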
### Thread Safety
- Use `RWMutex` for concurrent read/write access
- Read locks for queries
- Write locks for updates
- No locks held during I/O operations (save to disk)
## Development Environment
### Containerized Development
**ALL development MUST occur in containers**
```yaml
# docker-compose.yml profiles
services:
go-dev: # Go 1.21 with full toolchain
python-dev: # Python 3.11 for scripts
foundry: # Forge, Cast, Anvil for contract work
```
**Start dev environment:**
```bash
./scripts/dev-up.sh
# or
podman-compose up -d go-dev python-dev foundry
```
**Enter containers:**
```bash
podman exec -it mev-go-dev sh
podman exec -it mev-foundry sh
```
### Build Process
```bash
# In go-dev container
cd /workspace
go build -o bin/mev-bot ./cmd/mev-bot
```
### Testing
```bash
# Unit tests
go test ./pkg/... -v
# Integration tests
go test ./tests/integration/... -v
# Benchmarks
go test ./pkg/... -bench=. -benchmem
```
## Observability
### Metrics (Prometheus)
Every component MUST export metrics:
- `sequencer_messages_received_total`
- `swaps_detected_total{protocol, version}`
- `pools_discovered_total{protocol}`
- `arbitrage_opportunities_found_total`
- `arbitrage_execution_attempts_total{result}`
### Logging (Structured)
Use go-ethereum's structured logger:
```go
logger.Info("swap detected",
"protocol", swap.Protocol.Name,
"hash", swap.TxHash,
"pool", swap.Pool.Address.Hex(),
"token0", swap.Pool.Token0.Hex(),
"token1", swap.Pool.Token1.Hex())
```
### Health Monitoring
- Sequencer connection status
- Message processing rate
- Channel buffer utilization
- Pool cache hit rate
- Arbitrage execution success rate
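Channel buffer utilization can be sampled without extra bookkeeping, because `len` and `cap` on a channel are safe to call from any goroutine. A minimal, hypothetical helper:

```go
package main

// utilization returns the fill ratio of a buffered channel, from 0.0 (empty)
// to 1.0 (full). The result is a point-in-time sample, not a synchronized read.
func utilization[T any](ch chan T) float64 {
	if cap(ch) == 0 {
		return 0 // unbuffered channels have no buffer to fill
	}
	return float64(len(ch)) / float64(cap(ch))
}
```

Because each reading is only a sample, it is best exported as a gauge and watched over time; a channel that sits near 1.0 marks the stage that is falling behind.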
## Validation Rules
### Swap Event Validation
MUST validate ALL parsed swap events:
1. **Non-zero addresses** - token0, token1, pool address
2. **Non-zero amounts** - amountIn, amountOut
3. **Valid token pair** - token0 < token1 when compared as unsigned 160-bit integers (the canonical ordering used by Uniswap V2/V3 pools)
4. **Known protocol** - matches supported DEX list
5. **Reasonable amounts** - within sanity bounds
### Reject Invalid Data Immediately
- Log rejection with full context
- Increment rejection metrics
- NEVER propagate invalid data downstream
## Error Handling
### Fail-Fast Philosophy
- Reject bad data at the source
- Log all errors with stack traces
- Emit error metrics
- No silent failures
### Graceful Degradation
- Circuit breakers for RPC failover
- Retry logic with exponential backoff
- Automatic reconnection for WebSocket
- Pool cache persistence survives restarts
## Configuration
### Environment Variables
```bash
# Sequencer (PRIMARY)
ARBITRUM_SEQUENCER_URL=wss://arb1.arbitrum.io/feed
# RPC (FALLBACK ONLY)
RPC_URL=https://arbitrum-mainnet.core.chainstack.com/<key>
WS_URL=wss://arbitrum-mainnet.core.chainstack.com/<key>
# Chain
CHAIN_ID=42161
# API Keys
ARBISCAN_API_KEY=<key>
# Wallet
PRIVATE_KEY=<key>
```
### Performance Tuning
```bash
# Worker pool sizes
SWAP_FILTER_WORKERS=16
ARBITRAGE_WORKERS=8
# Channel buffer sizes
MESSAGE_BUFFER=1000
SWAP_EVENT_BUFFER=500
OPPORTUNITY_BUFFER=100
# Pool cache
POOL_CACHE_AUTOSAVE_COUNT=100
POOL_CACHE_AUTOSAVE_INTERVAL=5m
```
## Git Workflow
### Branches
- `master` - Stable production branch
- `feature/v2-prep` - V2 planning and architecture
- `feature/<component>` - Feature branches for V2 components
### Commit Messages
```
type(scope): brief description
- Detailed changes
- Why the change was needed
- Breaking changes or migration notes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
```
**Types**: `feat`, `fix`, `perf`, `refactor`, `test`, `docs`, `build`, `ci`
## Critical Rules
### MUST DO
✅ Use Arbitrum sequencer feed as primary data source
✅ Use channels for ALL inter-component communication
✅ Derive contract ABIs from official sources via Foundry
✅ Generate Go bindings for all contracts with `abigen`
✅ Validate ALL parsed data before propagation
✅ Use thread-safe concurrent data structures
✅ Emit comprehensive metrics and structured logs
✅ Run all development in containers
✅ Write tests for all components
### MUST NOT DO
❌ Use HTTP RPC as primary data source (sequencer only!)
❌ Write manual ABI JSON files (use Foundry builds!)
❌ Hardcode function selectors (use ABI lookups!)
❌ Allow zero addresses or zero amounts to propagate
❌ Use blocking operations in hot paths
❌ Modify shared state without locks
❌ Fail silently without logging
❌ Run builds outside of containers
## References
- [Arbitrum Sequencer Feed](https://www.degencode.com/p/decoding-the-arbitrum-sequencer-feed)
- [Foundry Book](https://book.getfoundry.sh/)
- [Abigen Documentation](https://geth.ethereum.org/docs/tools/abigen)
- V2 Architecture: `docs/planning/00_V2_MASTER_PLAN.md`
- V2 Task Breakdown: `docs/planning/07_TASK_BREAKDOWN.md`
- Project Guidelines: `CLAUDE.md`