# MEV Bot Technical Specification

## Project Overview

High-performance MEV bot for Arbitrum focused on real-time swap detection and arbitrage opportunities from the Arbitrum sequencer feed.

## Core Architecture Principles

### 1. Channel-Based Concurrency
**ALL processing, parsing, and logging MUST use Go channels for optimal performance**

- Non-blocking message passing between components
- Worker pools for parallel processing
- Buffered channels to absorb bursts and keep backpressure off upstream stages
- No synchronous blocking operations in hot paths

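The bullets above can be sketched as a small worker pool. This is a minimal illustration, not the bot's actual code: `Message`, `runWorkers`, and `trySend` are hypothetical names, and the worker and buffer sizes are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical stand-in for a raw sequencer feed message.
type Message struct {
	ID int
}

// runWorkers starts n workers that read from in and write results to the
// returned buffered channel. The output channel is closed once in is closed
// and every worker has drained its remaining input.
func runWorkers(n, buffer int, in <-chan Message, process func(Message) string) <-chan string {
	out := make(chan string, buffer)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range in {
				out <- process(msg)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// trySend enqueues msg without blocking and reports whether it was accepted.
// A false result means the buffer is full and the caller should count a drop.
func trySend(ch chan<- Message, msg Message) bool {
	select {
	case ch <- msg:
		return true
	default:
		return false
	}
}

func main() {
	in := make(chan Message, 8)
	out := runWorkers(4, 8, in, func(m Message) string {
		return fmt.Sprintf("processed %d", m.ID)
	})
	for i := 0; i < 5; i++ {
		if !trySend(in, Message{ID: i}) {
			fmt.Println("buffer full, dropped", i)
		}
	}
	close(in)
	count := 0
	for range out {
		count++
	}
	fmt.Println("results:", count)
}
```

The `select` with a `default` branch is what keeps the producer non-blocking. Whether a full buffer should drop, log, or apply some other policy is a design decision this sketch does not settle.
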
### 2. Sequencer-First Architecture
**The Arbitrum sequencer feed is the PRIMARY data source**

- WebSocket connection to: `wss://arb1.arbitrum.io/feed`
- Real-time transaction broadcast before inclusion in blocks
- NO reliance on HTTP RPC endpoints except for historical data
- Sequencer MUST be isolated in its own channel

### 3. Official Contract Sources
**ALL contract ABIs MUST be derived from official contract sources**

- Store official DEX contracts in `contracts/lib/` via Foundry
- Build contracts using Foundry (`forge build`)
- Extract ABIs from build artifacts in `contracts/out/`
- Generate Go bindings using `abigen` from extracted ABIs
- ALL contracts in `contracts/src/` MUST have bindings
- NO manually written ABI JSON files
- NO hardcoded function selectors

## Sequencer Processing Pipeline

### Stage 1: Message Reception
```
Arbitrum Sequencer Feed
    ↓
[Raw WebSocket Messages]
    ↓
Message Channel
```

### Stage 2: Swap Filtering
```
Message Channel
    ↓
[Swap Filter Workers] ← Pool Cache (read-only)
    ↓
Swap Event Channel
```

**Swap Filter Responsibilities:**
- Identify swap transactions from supported DEXes
- Extract pool addresses from transactions
- Discover new pools not in cache
- Emit SwapEvent to downstream channel

**Supported DEXes:**
- Uniswap V2/V3/V4
- Camelot V2/V3/V4
- Balancer (all versions)
- Kyber (all versions)
- Curve (all versions)
- SushiSwap
- Other UniswapV2-compatible exchanges

### Stage 3: Pool Discovery
```
Swap Event Channel
    ↓
[Pool Discovery]
    ↓
Pool Cache ← Auto-save to disk
    ↓
Pool Mapping (address → info)
```

**Pool Cache Behavior:**
- Thread-safe concurrent access (RWMutex)
- Automatic persistence to JSON every 100 new pools
- Periodic saves every 5 minutes
- Mapping prevents duplicate processing
- First seen timestamp tracking
- Swap count statistics

### Stage 4: Arbitrage Detection
```
Swap Event Channel
    ↓
[Arbitrage Scanner] ← Pool Cache (multi-index)
    ↓
Opportunity Channel
```

## Contract Bindings Management

### Directory Structure
```
contracts/
├── lib/                 # Foundry dependencies (official DEX contracts)
│   ├── v2-core/         # git submodule: Uniswap/v2-core
│   ├── v3-core/         # git submodule: Uniswap/v3-core
│   ├── camelot-amm/     # git submodule: CamelotLabs/camelot-amm-v2
│   └── ...
├── src/                 # Custom wrapper contracts (if needed)
│   └── interfaces/      # Interface contracts for binding generation
├── out/                 # Foundry build artifacts (gitignored)
│   └── *.sol/
│       └── *.json       # ABI + bytecode
└── foundry.toml         # Foundry configuration

bindings/
├── uniswap_v2/
│   ├── router.go        # Generated from IUniswapV2Router02
│   └── pair.go          # Generated from IUniswapV2Pair
├── uniswap_v3/
│   └── router.go        # Generated from ISwapRouter
├── camelot/
│   └── router.go        # Generated from ICamelotRouter
└── README.md            # Binding usage documentation
```

### Binding Generation Workflow

1. **Install Official Contracts**
   ```bash
   forge install Uniswap/v2-core
   forge install Uniswap/v3-core
   forge install Uniswap/v4-core
   forge install camelotlabs/camelot-amm-v2
   forge install balancer/balancer-v2-monorepo
   forge install KyberNetwork/ks-elastic-sc
   forge install curvefi/curve-contract
   ```

2. **Build Contracts**
   ```bash
   forge build
   ```

3. **Extract ABIs**
   ```bash
   # Example for UniswapV2Router02
   jq '.abi' contracts/out/IUniswapV2Router02.sol/IUniswapV2Router02.json > /tmp/router_abi.json
   ```

4. **Generate Bindings**
   ```bash
   abigen --abi=/tmp/router_abi.json \
          --pkg=uniswap_v2 \
          --type=UniswapV2Router \
          --out=bindings/uniswap_v2/router.go
   ```

5. **Automate with Script**
   - Use `scripts/generate-bindings.sh` to automate steps 3-4
   - Run after any contract update

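### Binding Usage in Code

**DO THIS** (ABI-based detection):
```go
import (
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
	// plus the generated bindings package, e.g. bindings/uniswap_v2
)

// Inside a decoding function that returns an error.
// Parse the generated ABI once at startup, not per transaction.
routerABI, err := abi.JSON(strings.NewReader(uniswap_v2.UniswapV2RouterABI))
if err != nil {
	return fmt.Errorf("parse router ABI: %w", err)
}
if len(txData) < 4 {
	return fmt.Errorf("calldata too short: %d bytes", len(txData))
}
method, err := routerABI.MethodById(txData[:4])
if err != nil {
	return nil // selector does not belong to this router ABI
}
if strings.Contains(method.Name, "swap") {
	params, err := method.Inputs.Unpack(txData[4:])
	if err != nil {
		return fmt.Errorf("unpack %s: %w", method.Name, err)
	}
	// Argument order varies by method; this layout is swapExactTokensForTokens.
	amountIn, okAmount := params[0].(*big.Int)
	path, okPath := params[2].([]common.Address)
	if !okAmount || !okPath {
		return fmt.Errorf("unexpected argument layout for %s", method.Name)
	}
	_, _ = amountIn, path // hand off to validation
}
```

**DON'T DO THIS** (hardcoded selectors):
```go
// WRONG - hardcoded, fragile, unmaintainable
if hex.EncodeToString(txData[0:4]) == "38ed1739" {
	// swapExactTokensForTokens
}
```
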
## Pool Cache Design

### Multi-Index Requirements
The pool cache MUST support efficient lookups by:

1. **Address** - Primary key
2. **Token Pair** - Find all pools for a pair (A,B)
3. **Protocol** - Find all Uniswap pools, all Camelot pools, etc.
4. **Liquidity** - Find top N pools by TVL

### Data Structure
```go
import (
	"math/big"
	"sync"
	"time"

	"github.com/ethereum/go-ethereum/common"
)

type PoolInfo struct {
	Address   common.Address
	Protocol  string // "UniswapV2", "Camelot", etc.
	Version   string // "V2", "V3", etc.
	Token0    common.Address
	Token1    common.Address
	Fee       uint32 // basis points
	FirstSeen time.Time
	LastSeen  time.Time
	SwapCount uint64
	Liquidity *big.Int // Estimated TVL
}

type PoolCache struct {
	// Primary storage
	pools map[common.Address]*PoolInfo

	// Indexes
	// TokenPair (not shown here) must be a comparable type to serve as a map key.
	byTokenPair map[TokenPair][]common.Address
	byProtocol  map[string][]common.Address
	byLiquidity []*PoolInfo // Sorted by liquidity

	mu sync.RWMutex
}
```

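One way the indexes could be kept consistent is to update all of them under a single write lock. The sketch below is an assumption-laden illustration: `Address` is a local stand-in for `common.Address` so the example has no external dependencies, `TokenPair`, `NewTokenPair`, `Add`, and `PoolsForPair` are hypothetical names, and the liquidity index is omitted for brevity.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Address stands in for go-ethereum's common.Address in this sketch.
type Address [20]byte

// TokenPair is a hypothetical map key; NewTokenPair keeps it in canonical order.
type TokenPair struct {
	Token0, Token1 Address
}

// NewTokenPair orders the two tokens so (A,B) and (B,A) map to the same key.
func NewTokenPair(a, b Address) TokenPair {
	if bytes.Compare(a[:], b[:]) > 0 {
		a, b = b, a
	}
	return TokenPair{Token0: a, Token1: b}
}

// PoolInfo is trimmed to the fields the indexes need.
type PoolInfo struct {
	Address  Address
	Protocol string
	Token0   Address
	Token1   Address
}

type PoolCache struct {
	mu          sync.RWMutex
	pools       map[Address]*PoolInfo
	byTokenPair map[TokenPair][]Address
	byProtocol  map[string][]Address
}

func NewPoolCache() *PoolCache {
	return &PoolCache{
		pools:       make(map[Address]*PoolInfo),
		byTokenPair: make(map[TokenPair][]Address),
		byProtocol:  make(map[string][]Address),
	}
}

// Add stores the pool and updates every index under one write lock.
// It returns false if the pool was already known.
func (c *PoolCache) Add(p *PoolInfo) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, exists := c.pools[p.Address]; exists {
		return false
	}
	c.pools[p.Address] = p
	pair := NewTokenPair(p.Token0, p.Token1)
	c.byTokenPair[pair] = append(c.byTokenPair[pair], p.Address)
	c.byProtocol[p.Protocol] = append(c.byProtocol[p.Protocol], p.Address)
	return true
}

// PoolsForPair returns a copy of the addresses indexed under the pair,
// so callers cannot mutate the cache's internal slice.
func (c *PoolCache) PoolsForPair(a, b Address) []Address {
	c.mu.RLock()
	defer c.mu.RUnlock()
	src := c.byTokenPair[NewTokenPair(a, b)]
	return append([]Address(nil), src...)
}

func main() {
	cache := NewPoolCache()
	tokenA, tokenB := Address{1}, Address{2}
	cache.Add(&PoolInfo{Address: Address{0xaa}, Protocol: "UniswapV2", Token0: tokenA, Token1: tokenB})
	cache.Add(&PoolInfo{Address: Address{0xbb}, Protocol: "Camelot", Token0: tokenA, Token1: tokenB})
	// Lookup order does not matter because the key is canonicalized.
	fmt.Println(len(cache.PoolsForPair(tokenB, tokenA)))
}
```

Returning `false` for a known address is how the mapping can prevent duplicate processing: the caller only runs discovery work when `Add` reports a new pool.
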
### Thread Safety
- Use `RWMutex` for concurrent read/write access
- Read locks for queries
- Write locks for updates
- No locks held during I/O operations (save to disk)

## Development Environment

### Containerized Development
**ALL development MUST occur in containers**

```yaml
# docker-compose.yml profiles
services:
  go-dev:      # Go 1.21 with full toolchain
  python-dev:  # Python 3.11 for scripts
  foundry:     # Forge, Cast, Anvil for contract work
```

**Start dev environment:**
```bash
./scripts/dev-up.sh
# or
podman-compose up -d go-dev python-dev foundry
```

**Enter containers:**
```bash
podman exec -it mev-go-dev sh
podman exec -it mev-foundry sh
```

### Build Process
```bash
# In go-dev container
cd /workspace
go build -o bin/mev-bot ./cmd/mev-bot/main.go
```

### Testing
```bash
# Unit tests
go test ./pkg/... -v

# Integration tests
go test ./tests/integration/... -v

# Benchmarks
go test ./pkg/... -bench=. -benchmem
```

## Observability

### Metrics (Prometheus)
Every component MUST export metrics:

- `sequencer_messages_received_total`
- `swaps_detected_total{protocol, version}`
- `pools_discovered_total{protocol}`
- `arbitrage_opportunities_found_total`
- `arbitrage_execution_attempts_total{result}`

### Logging (Structured)
Use go-ethereum's structured logger:

```go
logger.Info("swap detected",
	"protocol", swap.Protocol.Name,
	"hash", swap.TxHash,
	"pool", swap.Pool.Address.Hex(),
	"token0", swap.Pool.Token0.Hex(),
	"token1", swap.Pool.Token1.Hex())
```

### Health Monitoring
- Sequencer connection status
- Message processing rate
- Channel buffer utilization
- Pool cache hit rate
- Arbitrage execution success rate

## Validation Rules

### Swap Event Validation
MUST validate ALL parsed swap events:

1. **Non-zero addresses** - token0, token1, pool address
2. **Non-zero amounts** - amountIn, amountOut
3. **Valid token pair** - token0 < token1 (canonical ordering)
4. **Known protocol** - matches supported DEX list
5. **Reasonable amounts** - within sanity bounds

### Reject Invalid Data Immediately
- Log rejection with full context
- Increment rejection metrics
- NEVER propagate invalid data downstream

## Error Handling

### Fail-Fast Philosophy
- Reject bad data at the source
- Log all errors with stack traces
- Emit error metrics
- Never fail silently

### Graceful Degradation
- Circuit breakers for RPC failover
- Retry logic with exponential backoff
- Automatic reconnection for WebSocket
- Pool cache persistence survives restarts

## Configuration

### Environment Variables
```bash
# Sequencer (PRIMARY)
ARBITRUM_SEQUENCER_URL=wss://arb1.arbitrum.io/feed

# RPC (FALLBACK ONLY)
RPC_URL=https://arbitrum-mainnet.core.chainstack.com/<key>
WS_URL=wss://arbitrum-mainnet.core.chainstack.com/<key>

# Chain
CHAIN_ID=42161

# API Keys
ARBISCAN_API_KEY=<key>

# Wallet
PRIVATE_KEY=<key>
```

### Performance Tuning
```bash
# Worker pool sizes
SWAP_FILTER_WORKERS=16
ARBITRAGE_WORKERS=8

# Channel buffer sizes
MESSAGE_BUFFER=1000
SWAP_EVENT_BUFFER=500
OPPORTUNITY_BUFFER=100

# Pool cache
POOL_CACHE_AUTOSAVE_COUNT=100
POOL_CACHE_AUTOSAVE_INTERVAL=5m
```

## Git Workflow

### Branches
- `master` - Stable production branch
- `feature/v2-prep` - V2 planning and architecture
- `feature/<component>` - Feature branches for V2 components

### Commit Messages
```
type(scope): brief description

- Detailed changes
- Why the change was needed
- Breaking changes or migration notes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
```

**Types**: `feat`, `fix`, `perf`, `refactor`, `test`, `docs`, `build`, `ci`

## Critical Rules

### MUST DO
- ✅ Use Arbitrum sequencer feed as primary data source
- ✅ Use channels for ALL inter-component communication
- ✅ Derive contract ABIs from official sources via Foundry
- ✅ Generate Go bindings for all contracts with `abigen`
- ✅ Validate ALL parsed data before propagation
- ✅ Use thread-safe concurrent data structures
- ✅ Emit comprehensive metrics and structured logs
- ✅ Run all development in containers
- ✅ Write tests for all components

### MUST NOT DO
- ❌ Use HTTP RPC as primary data source (sequencer only!)
- ❌ Write manual ABI JSON files (use Foundry builds!)
- ❌ Hardcode function selectors (use ABI lookups!)
- ❌ Allow zero addresses or zero amounts to propagate
- ❌ Use blocking operations in hot paths
- ❌ Modify shared state without locks
- ❌ Fail silently without logging
- ❌ Run builds outside of containers

## References

- [Arbitrum Sequencer Feed](https://www.degencode.com/p/decoding-the-arbitrum-sequencer-feed)
- [Foundry Book](https://book.getfoundry.sh/)
- [Abigen Documentation](https://geth.ethereum.org/docs/tools/abigen)
- V2 Architecture: `docs/planning/00_V2_MASTER_PLAN.md`
- V2 Task Breakdown: `docs/planning/07_TASK_BREAKDOWN.md`
- Project Guidelines: `CLAUDE.md`