feat: comprehensive audit infrastructure and Phase 1 refactoring

This commit includes:

## Audit & Testing Infrastructure
- scripts/audit.sh: 12-section comprehensive codebase audit
- scripts/test.sh: 7 test types (unit, integration, race, bench, coverage, contracts, pkg)
- scripts/check-compliance.sh: SPEC.md compliance validation
- scripts/check-docs.sh: Documentation coverage checker
- scripts/dev.sh: Unified development script with all commands

## Documentation
- SPEC.md: Authoritative technical specification
- docs/AUDIT_AND_TESTING.md: Complete testing guide (600+ lines)
- docs/SCRIPTS_REFERENCE.md: All scripts documented (700+ lines)
- docs/README.md: Documentation index and navigation
- docs/DEVELOPMENT_SETUP.md: Environment setup guide
- docs/REFACTORING_PLAN.md: Systematic refactoring plan

## Phase 1 Refactoring (Critical Fixes)
- pkg/validation/helpers.go: Validation functions for addresses/amounts
- pkg/sequencer/selector_registry.go: Thread-safe selector registry
- pkg/sequencer/reader.go: Fixed race conditions with atomic metrics
- pkg/sequencer/swap_filter.go: Fixed race conditions, added error logging
- pkg/sequencer/decoder.go: Added address validation

## Changes Summary
- Fixed race conditions on 13 metric counters (atomic operations)
- Added validation at all ingress points
- Eliminated silent error handling
- Created selector registry for future ABI migration
- Reduced SPEC.md violations from 7 to 5

Build Status: ✅ All packages compile
Compliance: ✅ No race conditions, no silent failures
Documentation: ✅ 1,700+ lines across 5 comprehensive guides

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
430
SPEC.md
Normal file
@@ -0,0 +1,430 @@
# MEV Bot Technical Specification

## Project Overview

High-performance MEV bot for Arbitrum focused on real-time swap detection and arbitrage opportunities from the Arbitrum sequencer feed.

## Core Architecture Principles

### 1. Channel-Based Concurrency
**ALL processing, parsing, and logging MUST use Go channels for optimal performance**

- Non-blocking message passing between components
- Worker pools for parallel processing
- Buffered channels to absorb bursts and reduce backpressure
- No synchronous blocking operations in hot paths

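The four points above can be illustrated with a small worker pool. This is a minimal sketch, not the bot's implementation: `runPool` and `trySend` are hypothetical helper names, and plain `int` values stand in for sequencer messages.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool starts `workers` goroutines that read from in, apply fn, and
// write to the returned buffered channel. The output channel is closed
// once in is closed and every worker has finished.
func runPool(in <-chan int, workers, buffer int, fn func(int) int) <-chan int {
	out := make(chan int, buffer)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range in {
				out <- fn(v)
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// trySend is a non-blocking send: it reports false instead of stalling
// the producer when the buffer is full.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	in := make(chan int, 8)
	for i := 1; i <= 5; i++ {
		if !trySend(in, i) {
			fmt.Println("dropped", i)
		}
	}
	close(in)

	sum := 0
	for v := range runPool(in, 4, 8, func(v int) int { return v * v }) {
		sum += v
	}
	fmt.Println("sum of squares:", sum) // 1+4+9+16+25 = 55
}
```

A non-blocking send trades completeness for latency: when the buffer is full the message is dropped, so a real pipeline would likely count drops in a metric rather than discard them silently.
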
### 2. Sequencer-First Architecture
**The Arbitrum sequencer feed is the PRIMARY data source**

- WebSocket connection to: `wss://arb1.arbitrum.io/feed`
- Real-time transaction broadcast before inclusion in blocks
- NO reliance on HTTP RPC endpoints except for historical data
- Sequencer MUST be isolated in its own channel

### 3. Official Contract Sources
**ALL contract ABIs MUST be derived from official contract sources**

- Store official DEX contracts in `contracts/lib/` via Foundry
- Build contracts using Foundry (`forge build`)
- Extract ABIs from build artifacts in `contracts/out/`
- Generate Go bindings using `abigen` from extracted ABIs
- ALL contracts in `contracts/src/` MUST have bindings
- NO manually written ABI JSON files
- NO hardcoded function selectors

## Sequencer Processing Pipeline

### Stage 1: Message Reception
```
Arbitrum Sequencer Feed
         ↓
[Raw WebSocket Messages]
         ↓
   Message Channel
```

### Stage 2: Swap Filtering
```
   Message Channel
         ↓
[Swap Filter Workers] ← Pool Cache (read-only)
         ↓
  Swap Event Channel
```

**Swap Filter Responsibilities:**
- Identify swap transactions from supported DEXes
- Extract pool addresses from transactions
- Discover new pools not in cache
- Emit SwapEvent to downstream channel

**Supported DEXes:**
- Uniswap V2/V3/V4
- Camelot V2/V3/V4
- Balancer (all versions)
- Kyber (all versions)
- Curve (all versions)
- SushiSwap
- Other UniswapV2-compatible exchanges

### Stage 3: Pool Discovery
```
  Swap Event Channel
         ↓
   [Pool Discovery]
         ↓
     Pool Cache ← Auto-save to disk
         ↓
Pool Mapping (address → info)
```

**Pool Cache Behavior:**
- Thread-safe concurrent access (RWMutex)
- Automatic persistence to JSON every 100 new pools
- Periodic saves every 5 minutes
- Mapping prevents duplicate processing
- First seen timestamp tracking
- Swap count statistics

### Stage 4: Arbitrage Detection
```
  Swap Event Channel
         ↓
 [Arbitrage Scanner] ← Pool Cache (multi-index)
         ↓
 Opportunity Channel
```

## Contract Bindings Management

### Directory Structure
```
contracts/
├── lib/                    # Foundry dependencies (official DEX contracts)
│   ├── v2-core/            # git submodule: Uniswap/v2-core
│   ├── v3-core/            # git submodule: Uniswap/v3-core
│   ├── camelot-amm/        # git submodule: CamelotLabs/camelot-amm-v2
│   └── ...
├── src/                    # Custom wrapper contracts (if needed)
│   └── interfaces/         # Interface contracts for binding generation
├── out/                    # Foundry build artifacts (gitignored)
│   └── *.sol/
│       └── *.json          # ABI + bytecode
└── foundry.toml            # Foundry configuration

bindings/
├── uniswap_v2/
│   ├── router.go           # Generated from IUniswapV2Router02
│   └── pair.go             # Generated from IUniswapV2Pair
├── uniswap_v3/
│   └── router.go           # Generated from ISwapRouter
├── camelot/
│   └── router.go           # Generated from ICamelotRouter
└── README.md               # Binding usage documentation
```

### Binding Generation Workflow

1. **Install Official Contracts**
   ```bash
   forge install Uniswap/v2-core
   forge install Uniswap/v3-core
   forge install Uniswap/v4-core
   forge install camelotlabs/camelot-amm-v2
   forge install balancer/balancer-v2-monorepo
   forge install KyberNetwork/ks-elastic-sc
   forge install curvefi/curve-contract
   ```

2. **Build Contracts**
   ```bash
   forge build
   ```

3. **Extract ABIs**
   ```bash
   # Example for UniswapV2Router02
   jq '.abi' contracts/out/IUniswapV2Router02.sol/IUniswapV2Router02.json > /tmp/router_abi.json
   ```

4. **Generate Bindings**
   ```bash
   abigen --abi=/tmp/router_abi.json \
          --pkg=uniswap_v2 \
          --type=UniswapV2Router \
          --out=bindings/uniswap_v2/router.go
   ```

5. **Automate with Script**
   - Use `scripts/generate-bindings.sh` to automate steps 3-4
   - Run after any contract update

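Step 3 can also be done without `jq`. The sketch below pulls the top-level `abi` field out of a Foundry build artifact with the standard library. `extractABI` is a hypothetical helper, not part of `scripts/generate-bindings.sh`, and the artifact in `main` is a trimmed, made-up example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractABI returns the raw "abi" field of a Foundry build artifact,
// the same value `jq '.abi'` selects.
func extractABI(artifact []byte) (json.RawMessage, error) {
	var parsed struct {
		ABI json.RawMessage `json:"abi"`
	}
	if err := json.Unmarshal(artifact, &parsed); err != nil {
		return nil, fmt.Errorf("parse artifact: %w", err)
	}
	if len(parsed.ABI) == 0 {
		return nil, fmt.Errorf("artifact has no abi field")
	}
	return parsed.ABI, nil
}

func main() {
	// Trimmed, made-up artifact for illustration only.
	artifact := []byte(`{"abi":[{"type":"function","name":"factory","inputs":[]}],"bytecode":{"object":"0x"}}`)

	abiJSON, err := extractABI(artifact)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(abiJSON))
}
```

The returned bytes can be written to a file and passed to `abigen --abi=...` exactly as in step 4.
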
### Binding Usage in Code

**DO THIS** (ABI-based detection):
```go
import (
	"fmt"
	"math/big"
	"strings"

	"github.com/ethereum/go-ethereum/accounts/abi"
	"github.com/ethereum/go-ethereum/common"
)

// Fragment from a decoder that returns an error. Parse the ABI once at startup.
routerABI, err := abi.JSON(strings.NewReader(uniswap_v2.UniswapV2RouterABI))
if err != nil {
	return err
}
if len(txData) < 4 {
	return nil // too short to carry a 4-byte selector
}
method, err := routerABI.MethodById(txData[:4])
if err != nil {
	return nil // selector is not in the router ABI
}
// Argument order differs between swap variants, so match the exact method.
if method.Name == "swapExactTokensForTokens" {
	params, err := method.Inputs.Unpack(txData[4:])
	if err != nil {
		return err
	}
	// Checked assertions: a type mismatch must not panic the worker.
	amountIn, okIn := params[0].(*big.Int)
	path, okPath := params[2].([]common.Address)
	if !okIn || !okPath {
		return fmt.Errorf("unexpected argument types for %s", method.Name)
	}
}
```

**DON'T DO THIS** (hardcoded selectors):
```go
// WRONG - hardcoded, fragile, unmaintainable
if hex.EncodeToString(txData[0:4]) == "38ed1739" {
	// swapExactTokensForTokens
}
```

## Pool Cache Design

### Multi-Index Requirements
The pool cache MUST support efficient lookups by:

1. **Address** - Primary key
2. **Token Pair** - Find all pools for a pair (A,B)
3. **Protocol** - Find all Uniswap pools, all Camelot pools, etc.
4. **Liquidity** - Find top N pools by TVL

### Data Structure
```go
import (
	"math/big"
	"sync"
	"time"

	"github.com/ethereum/go-ethereum/common"
)

// TokenPair keys the token-pair index (assumed shape; not defined in this spec).
type TokenPair struct {
	Token0 common.Address
	Token1 common.Address
}

type PoolInfo struct {
	Address   common.Address
	Protocol  string // "UniswapV2", "Camelot", etc.
	Version   string // "V2", "V3", etc.
	Token0    common.Address
	Token1    common.Address
	Fee       uint32 // basis points
	FirstSeen time.Time
	LastSeen  time.Time
	SwapCount uint64
	Liquidity *big.Int // Estimated TVL
}

type PoolCache struct {
	// Primary storage
	pools map[common.Address]*PoolInfo

	// Indexes
	byTokenPair map[TokenPair][]common.Address
	byProtocol  map[string][]common.Address
	byLiquidity []*PoolInfo // Sorted by liquidity

	mu sync.RWMutex // guards every field above
}
```

### Thread Safety
- Use `RWMutex` for concurrent read/write access
- Read locks for queries
- Write locks for updates
- No locks held during I/O operations (save to disk)

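The locking rules above can be sketched as follows. This is an illustration under assumptions rather than the project's cache: a local `Address` type stands in for `common.Address` so the example needs no third-party imports, the liquidity index is omitted, and `NewPoolCache`, `Add`, `ByPair`, and `Snapshot` are hypothetical method names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Address stands in for common.Address so the sketch is stdlib-only.
type Address [20]byte

type TokenPair struct{ Token0, Token1 Address }

type PoolInfo struct {
	Address  Address
	Protocol string
	Token0   Address
	Token1   Address
}

type PoolCache struct {
	mu          sync.RWMutex
	pools       map[Address]*PoolInfo
	byTokenPair map[TokenPair][]Address
	byProtocol  map[string][]Address
}

func NewPoolCache() *PoolCache {
	return &PoolCache{
		pools:       make(map[Address]*PoolInfo),
		byTokenPair: make(map[TokenPair][]Address),
		byProtocol:  make(map[string][]Address),
	}
}

// Add inserts a pool and updates every index under one write lock.
// It reports false if the address is already cached.
func (c *PoolCache) Add(p PoolInfo) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.pools[p.Address]; ok {
		return false
	}
	c.pools[p.Address] = &p
	pair := TokenPair{p.Token0, p.Token1}
	c.byTokenPair[pair] = append(c.byTokenPair[pair], p.Address)
	c.byProtocol[p.Protocol] = append(c.byProtocol[p.Protocol], p.Address)
	return true
}

// ByPair returns a copy of the addresses indexed under a token pair.
func (c *PoolCache) ByPair(pair TokenPair) []Address {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return append([]Address(nil), c.byTokenPair[pair]...)
}

// Snapshot copies the pools under a read lock; the caller serializes
// and writes the copy after the lock has been released.
func (c *PoolCache) Snapshot() []PoolInfo {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make([]PoolInfo, 0, len(c.pools))
	for _, p := range c.pools {
		out = append(out, *p)
	}
	return out
}

func main() {
	c := NewPoolCache()
	a, b := Address{1}, Address{2}
	c.Add(PoolInfo{Address: Address{0xAA}, Protocol: "UniswapV2", Token0: a, Token1: b})
	c.Add(PoolInfo{Address: Address{0xBB}, Protocol: "Camelot", Token0: a, Token1: b})
	fmt.Println("pools for pair:", len(c.ByPair(TokenPair{a, b}))) // 2

	data, err := json.Marshal(c.Snapshot()) // no lock is held during encoding or I/O
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	fmt.Println("snapshot encoded:", len(data) > 0) // true
}
```

Copying under the read lock and encoding afterwards keeps the slow part (JSON encoding and the disk write) out of the critical section, at the cost of one extra copy of the pool list.
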
## Development Environment

### Containerized Development
**ALL development MUST occur in containers**

```yaml
# docker-compose.yml profiles
services:
  go-dev:      # Go 1.21 with full toolchain
  python-dev:  # Python 3.11 for scripts
  foundry:     # Forge, Cast, Anvil for contract work
```

**Start dev environment:**
```bash
./scripts/dev-up.sh
# or
podman-compose up -d go-dev python-dev foundry
```

**Enter containers:**
```bash
podman exec -it mev-go-dev sh
podman exec -it mev-foundry sh
```

### Build Process
```bash
# In go-dev container
cd /workspace
go build -o bin/mev-bot ./cmd/mev-bot/main.go
```

### Testing
```bash
# Unit tests
go test ./pkg/... -v

# Integration tests
go test ./tests/integration/... -v

# Benchmarks
go test ./pkg/... -bench=. -benchmem
```

## Observability

### Metrics (Prometheus)
Every component MUST export metrics:

- `sequencer_messages_received_total`
- `swaps_detected_total{protocol, version}`
- `pools_discovered_total{protocol}`
- `arbitrage_opportunities_found_total`
- `arbitrage_execution_attempts_total{result}`

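Counters like these are incremented from many goroutines at once, so the increments must be race-free. The sketch below uses `sync/atomic` from the standard library and prints two of the counters in Prometheus text style. It is an assumption-laden illustration: the `Metrics` type and `Render` method are hypothetical, labels such as `{protocol, version}` are omitted, and a real exporter would more likely use a Prometheus client library.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Metrics holds counters that hot-path goroutines can bump without locks.
type Metrics struct {
	MessagesReceived atomic.Uint64
	SwapsDetected    atomic.Uint64
}

// Render formats the counters as "name value" lines.
func (m *Metrics) Render() string {
	return fmt.Sprintf(
		"sequencer_messages_received_total %d\nswaps_detected_total %d\n",
		m.MessagesReceived.Load(), m.SwapsDetected.Load())
}

func main() {
	var m Metrics
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				m.MessagesReceived.Add(1) // safe under concurrency
			}
		}()
	}
	wg.Wait()
	m.SwapsDetected.Add(3)
	fmt.Print(m.Render())
	// sequencer_messages_received_total 8000
	// swaps_detected_total 3
}
```
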
### Logging (Structured)
Use go-ethereum's structured logger:

```go
logger.Info("swap detected",
	"protocol", swap.Protocol.Name,
	"hash", swap.TxHash,
	"pool", swap.Pool.Address.Hex(),
	"token0", swap.Pool.Token0.Hex(),
	"token1", swap.Pool.Token1.Hex())
```

### Health Monitoring
- Sequencer connection status
- Message processing rate
- Channel buffer utilization
- Pool cache hit rate
- Arbitrage execution success rate

## Validation Rules

### Swap Event Validation
MUST validate ALL parsed swap events:

1. **Non-zero addresses** - token0, token1, pool address
2. **Non-zero amounts** - amountIn, amountOut
3. **Valid token pair** - token0 < token1 (canonical ordering)
4. **Known protocol** - matches supported DEX list
5. **Reasonable amounts** - within sanity bounds

### Reject Invalid Data Immediately
- Log rejection with full context
- Increment rejection metrics
- NEVER propagate invalid data downstream

## Error Handling

### Fail-Fast Philosophy
- Reject bad data at the source
- Log all errors with stack traces
- Emit error metrics
- Never fail silently

### Graceful Degradation
- Circuit breakers for RPC failover
- Retry logic with exponential backoff
- Automatic reconnection for WebSocket
- Pool cache persistence survives restarts

## Configuration

### Environment Variables
```bash
# Sequencer (PRIMARY)
ARBITRUM_SEQUENCER_URL=wss://arb1.arbitrum.io/feed

# RPC (FALLBACK ONLY)
RPC_URL=https://arbitrum-mainnet.core.chainstack.com/<key>
WS_URL=wss://arbitrum-mainnet.core.chainstack.com/<key>

# Chain
CHAIN_ID=42161

# API Keys
ARBISCAN_API_KEY=<key>

# Wallet
PRIVATE_KEY=<key>
```

### Performance Tuning
```bash
# Worker pool sizes
SWAP_FILTER_WORKERS=16
ARBITRAGE_WORKERS=8

# Channel buffer sizes
MESSAGE_BUFFER=1000
SWAP_EVENT_BUFFER=500
OPPORTUNITY_BUFFER=100

# Pool cache
POOL_CACHE_AUTOSAVE_COUNT=100
POOL_CACHE_AUTOSAVE_INTERVAL=5m
```

## Git Workflow

### Branches
- `master` - Stable production branch
- `feature/v2-prep` - V2 planning and architecture
- `feature/<component>` - Feature branches for V2 components

### Commit Messages
```
type(scope): brief description

- Detailed changes
- Why the change was needed
- Breaking changes or migration notes

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
```

**Types**: `feat`, `fix`, `perf`, `refactor`, `test`, `docs`, `build`, `ci`

## Critical Rules

### MUST DO
- ✅ Use Arbitrum sequencer feed as primary data source
- ✅ Use channels for ALL inter-component communication
- ✅ Derive contract ABIs from official sources via Foundry
- ✅ Generate Go bindings for all contracts with `abigen`
- ✅ Validate ALL parsed data before propagation
- ✅ Use thread-safe concurrent data structures
- ✅ Emit comprehensive metrics and structured logs
- ✅ Run all development in containers
- ✅ Write tests for all components

### MUST NOT DO
- ❌ Use HTTP RPC as primary data source (sequencer only!)
- ❌ Write manual ABI JSON files (use Foundry builds!)
- ❌ Hardcode function selectors (use ABI lookups!)
- ❌ Allow zero addresses or zero amounts to propagate
- ❌ Use blocking operations in hot paths
- ❌ Modify shared state without locks
- ❌ Fail silently without logging
- ❌ Run builds outside of containers

## References

- [Arbitrum Sequencer Feed](https://www.degencode.com/p/decoding-the-arbitrum-sequencer-feed)
- [Foundry Book](https://book.getfoundry.sh/)
- [Abigen Documentation](https://geth.ethereum.org/docs/tools/abigen)
- V2 Architecture: `docs/planning/00_V2_MASTER_PLAN.md`
- V2 Task Breakdown: `docs/planning/07_TASK_BREAKDOWN.md`
- Project Guidelines: `CLAUDE.md`