# Testing and Benchmarking Documentation

## Overview

The MEV Bot project includes comprehensive testing and benchmarking for all critical components, with particular focus on the mathematical functions in the `uniswap` package. This documentation covers the testing strategy, benchmarking procedures, and performance optimization validation.

## Testing Strategy

### Unit Testing

The project uses the `testing` package and `testify/assert` for assertions. Tests are organized by package and function:

1. **Mathematical Function Tests** - Located in `pkg/uniswap/*_test.go`
2. **Core Service Tests** - Located in each package's own test files
3. **Integration Tests** - Located in the `pkg/test/` directory

### Test Categories

#### Mathematical Accuracy Tests
- Verify the correctness of Uniswap V3 pricing calculations
- Validate round-trip conversions (sqrtPriceX96 ↔ price ↔ tick)
- Test edge cases and boundary conditions
- Compare optimized vs original implementations

#### Functional Tests
- Test service initialization and configuration
- Validate event processing workflows
- Verify database operations
- Check error handling and recovery

#### Integration Tests
- End-to-end testing of arbitrage detection
- Network interaction testing
- Contract interaction validation
- Performance testing under load

## Mathematical Function Testing

### Core Pricing Functions

#### `SqrtPriceX96ToPrice` Tests
- Verifies conversion from sqrtPriceX96 to standard price
- Tests known values (e.g., 2^96 → price = 1.0); see the sketch below
- Validates precision with floating-point comparisons
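
For illustration, a minimal sketch of such a known-value check, assuming `SqrtPriceX96ToPrice` accepts a `*big.Int` and returns a `float64` (adjust to the actual signature in `pkg/uniswap`):

```go
package uniswap

import (
	"math/big"
	"testing"

	"github.com/stretchr/testify/assert"
)

// Sketch: sqrtPriceX96 = 2^96 encodes a price of exactly 1.0, so the
// conversion should return ~1.0 within floating-point tolerance.
func TestSqrtPriceX96ToPriceKnownValue(t *testing.T) {
	q96 := new(big.Int).Lsh(big.NewInt(1), 96) // 2^96

	price := SqrtPriceX96ToPrice(q96) // assumed signature: *big.Int -> float64

	assert.InDelta(t, 1.0, price, 1e-12)
}
```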

#### `PriceToSqrtPriceX96` Tests
- Verifies conversion from standard price to sqrtPriceX96
- Tests known values (e.g., price = 1.0 → 2^96)
- Accounts for floating-point precision limitations

#### `TickToSqrtPriceX96` Tests
- Verifies conversion from tick to sqrtPriceX96
- Tests known values (e.g., tick = 0 → 2^96)

#### `SqrtPriceX96ToTick` Tests
- Verifies conversion from sqrtPriceX96 to tick
- Tests known values (e.g., 2^96 → tick = 0)

### Round-trip Conversion Tests

#### `TestRoundTripConversions`
- Validates sqrtPriceX96 → price → sqrtPriceX96 conversions
- Tests tick → sqrtPriceX96 → tick conversions (see the sketch below)
- Ensures precision is maintained within acceptable tolerance
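
A sketch of the tick round trip (same test file and imports as the sketch above), assuming `TickToSqrtPriceX96` and `SqrtPriceX96ToTick` map `int` ↔ `*big.Int` and that exact tick boundaries survive the round trip losslessly:

```go
// Round-trip sketch: converting a tick to its sqrtPriceX96 and back
// should recover the original tick at exact tick boundaries.
func TestTickRoundTripSketch(t *testing.T) {
	for _, tick := range []int{-1000, 0, 1000} {
		sqrtPrice := TickToSqrtPriceX96(tick) // assumed: int -> *big.Int
		got := SqrtPriceX96ToTick(sqrtPrice)  // assumed: *big.Int -> int
		assert.Equal(t, tick, got, "round trip should recover tick %d", tick)
	}
}
```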

#### `TestGetTickAtSqrtPriceWithUint256`
- Tests uint256-based tick calculations
- Validates compatibility with different data types

#### `TestTickSpacingCalculations`
- Tests tick spacing calculations for different fee tiers
- Validates next/previous tick calculations

### Cached Function Tests

#### `TestCachedFunctionAccuracy`
- Compares original vs cached function results (see the sketch below)
- Ensures mathematical accuracy is preserved in optimizations
- Validates that caching doesn't affect precision
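
A sketch of such an accuracy comparison, assuming the cached variant shares the original's signature (same test file and imports as the earlier sketches):

```go
// Sketch: the cached variant must agree with the original implementation
// for the same input; caching may not change the numeric result.
func TestCachedMatchesOriginalSketch(t *testing.T) {
	q96 := new(big.Int).Lsh(big.NewInt(1), 96)

	assert.InDelta(t, SqrtPriceX96ToPrice(q96), SqrtPriceX96ToPriceCached(q96), 1e-15)
}
```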

## Benchmarking

### Performance Testing Framework

The project uses Go's built-in benchmarking framework with the following approach (a minimal micro-benchmark sketch follows the list):

1. **Micro-benchmarks** - Individual function performance
2. **Macro-benchmarks** - End-to-end workflow performance
3. **Regression testing** - Performance comparison over time
4. **Load testing** - Performance under concurrent operations
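
A minimal micro-benchmark sketch (same test file as the earlier sketches, with the same assumed `SqrtPriceX96ToPrice` signature):

```go
// Micro-benchmark sketch: setup happens before ResetTimer so that only
// the conversion itself is measured across b.N iterations.
func BenchmarkSqrtPriceX96ToPriceSketch(b *testing.B) {
	q96 := new(big.Int).Lsh(big.NewInt(1), 96)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = SqrtPriceX96ToPrice(q96)
	}
}
```

Running it with `go test -bench=BenchmarkSqrtPriceX96ToPriceSketch -benchmem ./pkg/uniswap/` reports allocation counts alongside timings.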

### Mathematical Function Benchmarks

#### Original Functions
- `BenchmarkSqrtPriceX96ToPrice` - Baseline performance
- `BenchmarkPriceToSqrtPriceX96` - Baseline performance
- `BenchmarkTickToSqrtPriceX96` - Baseline performance
- `BenchmarkSqrtPriceX96ToTick` - Baseline performance

#### Cached Functions
- `BenchmarkSqrtPriceX96ToPriceCached` - Optimized performance
- `BenchmarkPriceToSqrtPriceX96Cached` - Optimized performance

#### Performance Comparison
The benchmarks demonstrate significant performance improvements:
- **SqrtPriceX96ToPriceCached**: ~24% faster than the original
- **PriceToSqrtPriceX96Cached**: ~12% faster than the original
- Memory allocations reduced by 20-33%

### Running Tests

#### Unit Tests
```bash
# Run all unit tests
go test ./...

# Run tests with verbose output
go test -v ./...

# Run tests with coverage
go test -cover ./...

# Run tests with coverage and output to file
go test -coverprofile=coverage.out ./...
```

#### Mathematical Function Tests
```bash
# Run only Uniswap pricing tests
go test ./pkg/uniswap/...

# Run with verbose output
go test -v ./pkg/uniswap/...

# Run with coverage
go test -cover ./pkg/uniswap/...
```

#### Specific Test Cases
```bash
# Run a specific test function
go test -run TestSqrtPriceX96ToPrice ./pkg/uniswap/

# Run tests matching a pattern (quote the regex to avoid shell globbing)
go test -run 'Test.*Price' ./pkg/uniswap/
```

### Math Audit CLI

The `tools/math-audit` CLI provides deterministic regression checks for the pricing engines across multiple DEX models (Uniswap V2/V3, Camelot/Algebra, Ramses, Curve, Balancer, TraderJoe). It also embeds pared-down versions of the round-trip and symmetry property tests so that math regressions are caught without relying on build tags.

```bash
# Run the audit against the canonical vector set and emit reports
go run ./tools/math-audit --vectors default --report reports/math/latest

# Or use the convenience script (writes to reports/math/latest)
scripts/run_audit_suite.sh

# Via make target
make math-audit
```

The CLI writes both JSON (`report.json`) and Markdown (`report.md`) summaries into the provided directory, which can be attached to CI artifacts or shared with reviewers.

When the Drone `test-suite` pipeline runs, it persists `reports/math/latest/report.{json,md}` as build artifacts. The stage fails if either file is missing or empty, guaranteeing downstream Harness promotions have the math audit evidence available for review.

### Profitability Simulation CLI

The profitability harness at `tools/simulation` replays historical opportunity vectors and reports hit rate and net profit after gas costs.

```bash
# Run against the bundled default vectors
make simulate-profit

# Override vector file and report location
SIMULATION_VECTORS=tools/simulation/vectors/my-slice.json \
  scripts/run_profit_simulation.sh /tmp/sim-report
```

The CLI emits stdout summaries and writes structured reports to `reports/simulation/latest/summary.{json,md}` (or the directory passed via `--report`). Use the Markdown file for change-management artifacts and stash the JSON alongside math-audit outputs for reproducible profitability audits.

### Environment-Specific Pipelines & Local Hooks

CI/CD now runs through Drone and Harness:

- **Drone `test-suite`** — lint, race/coverage tests, binary build, smoke start, math audit, profitability simulation, and dry-run Docker build.
- **Drone `security-suite`** — gosec, govulncheck, Nancy, and security fuzz tests on protected branches.
- **Drone `integration-opt-in`** — manual stage for integration tests requiring RPC access or heavy fixtures.
- **Harness `staging_promotion`** — builds on Drone artifacts, packages a Docker image, and upgrades the staging environment via Helm.

Use `drone exec --pipeline <name>` for local validation and `harness pipeline execute --file harness/pipelines/staging.yaml` (or the UI) for promotions.

Legacy fork-dependent suites are gated behind optional build tags:
- `go test -tags='integration legacy' ./...` runs RPC-heavy legacy harnesses.
- `go test -tags='integration forked' ./test/arbitrage_fork_test.go` exercises fork-only scenarios.

Developers should mirror the dev/test gates locally before pushing:

```bash
# Fast dev parity with pipeline-dev
./scripts/quality-check.sh

# Security/math parity with audit pipeline
./scripts/run_audit_suite.sh
```

The helper `scripts/git-workflow.sh push` command executes the same checks used by the CI pre-push hook (formatting, lint, unit tests). Add `./scripts/git-workflow.sh push` to your workflow or wire it into `.git/hooks/pre-push` to avoid CI surprises.

### Running Benchmarks

#### Basic Benchmarks
```bash
# Run all benchmarks
go test -bench=. ./...

# Run benchmarks with memory allocation statistics
go test -bench=. -benchmem ./...

# Run each benchmark for at least 5 seconds
go test -bench=. -benchtime=5s ./...

# Run a specific benchmark
go test -bench=BenchmarkSqrtPriceX96ToPrice ./pkg/uniswap/
```

#### Benchmark Analysis
```bash
# Run benchmarks and save results
go test -bench=. -benchmem ./pkg/uniswap/ > benchmark_results.txt

# Compare benchmark results (benchcmp is deprecated; benchstat is its modern replacement)
benchcmp old_results.txt new_results.txt
```

## Performance Optimization Validation

### Constant Caching Validation

The optimization strategy caches expensive constant calculations (sketched below):
- `2^96` - Used in sqrtPriceX96 conversions
- `2^192` - Used in price calculations
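
A sketch of the caching approach, with hypothetical names (the actual constants and function names in `pkg/uniswap` may differ):

```go
package uniswap

import "math/big"

// Hypothetical precomputed constants, built once at package load time
// instead of on every call.
var (
	q96Cached  = new(big.Int).Lsh(big.NewInt(1), 96)  // 2^96, sqrtPriceX96 conversions
	q192Cached = new(big.Int).Lsh(big.NewInt(1), 192) // 2^192, price calculations
)

// sqrtPriceX96ToPriceSketch computes price = sqrtPriceX96^2 / 2^192,
// reusing the shared constant. The constants are only ever read, which
// keeps concurrent use thread safe.
func sqrtPriceX96ToPriceSketch(sqrtPriceX96 *big.Int) *big.Float {
	num := new(big.Float).SetInt(new(big.Int).Mul(sqrtPriceX96, sqrtPriceX96))
	den := new(big.Float).SetInt(q192Cached)
	return new(big.Float).Quo(num, den)
}
```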

Validation ensures:
1. Mathematical accuracy is preserved
2. Performance improvements are measurable
3. Memory usage is optimized
4. Thread safety is maintained

### Uint256 Optimization Attempts

Attempts to optimize with uint256 operations were evaluated and found to:
- Provide no performance benefit, due to conversion overhead
- Maintain the same precision as `big.Int` operations (no accuracy gain)
- Add complexity without benefit

### Memory Allocation Reduction

Optimizations focus on:
- Reducing garbage collection pressure
- Minimizing object creation in hot paths
- Reusing precomputed constants
- Efficient data structure usage

## Continuous Integration Testing

### Test Automation
- Unit tests run on every commit
- Integration tests run on pull requests
- Performance benchmarks tracked over time
- Regression testing prevents performance degradation

### Code Quality Gates
- Minimum test coverage thresholds
- Performance regression detection
- Static analysis and linting
- Security scanning

## Best Practices

### Test Writing
1. Use table-driven tests for multiple test cases (see the sketch below)
2. Include edge cases and boundary conditions
3. Test error conditions and failure paths
4. Use meaningful test names and descriptions
5. Keep tests independent and isolated
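
A table-driven sketch built on known-value relationships (price = (sqrtPriceX96 / 2^96)^2, so shifting the input by one bit scales the price by 4x or 1/4x); signatures assumed as in the earlier sketches:

```go
// Table-driven sketch: each case is a named subtest with its own
// input and expected price.
func TestSqrtPriceX96ToPriceTable(t *testing.T) {
	q96 := new(big.Int).Lsh(big.NewInt(1), 96)

	cases := []struct {
		name  string
		input *big.Int
		want  float64
	}{
		{"price 1.0", q96, 1.0},
		{"price 4.0", new(big.Int).Lsh(q96, 1), 4.0},
		{"price 0.25", new(big.Int).Rsh(q96, 1), 0.25},
	}

	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			assert.InDelta(t, tc.want, SqrtPriceX96ToPrice(tc.input), 1e-12)
		})
	}
}
```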

### Benchmarking
1. Use realistic test data
2. Reset the timer to exclude setup time
3. Run benchmarks for sufficient iterations
4. Compare results against baselines
5. Document performance expectations

### Performance Validation
1. Measure before and after optimizations
2. Validate that mathematical accuracy is preserved
3. Test under realistic load conditions
4. Monitor memory allocation patterns
5. Profile CPU and memory usage

## Troubleshooting

### Common Test Issues
1. **Floating-point precision errors** - Use `assert.InDelta` for floating-point comparisons
2. **Race conditions** - Run tests with the `-race` flag to detect data races
3. **Timeout failures** - Increase the test timeout for slow operations
4. **Resource leaks** - Ensure proper cleanup in test functions

### Benchmark Issues
1. **Unstable results** - Run benchmarks multiple times
2. **Insufficient iterations** - Increase the benchmark time
3. **External interference** - Run benchmarks on isolated systems
4. **Measurement noise** - Use statistical analysis when comparing results

## Future Improvements

### Testing Enhancements
1. Property-based testing with `gopter` or a similar library (see the sketch below)
2. Fuzz testing for edge-case discovery
3. Load-testing frameworks for stress testing
4. Automated performance regression detection
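
As a taste of what property-based testing could look like here, a hedged `gopter` sketch of the tick round-trip property, assuming the dependency is added and using the signatures assumed in the earlier sketches:

```go
package uniswap

import (
	"testing"

	"github.com/leanovate/gopter"
	"github.com/leanovate/gopter/gen"
	"github.com/leanovate/gopter/prop"
)

// Property sketch: every tick in the valid Uniswap V3 range should
// survive the tick -> sqrtPriceX96 -> tick round trip.
func TestTickRoundTripProperty(t *testing.T) {
	properties := gopter.NewProperties(nil)

	properties.Property("tick survives round trip", prop.ForAll(
		func(tick int) bool {
			return SqrtPriceX96ToTick(TickToSqrtPriceX96(tick)) == tick
		},
		gen.IntRange(-887272, 887272), // Uniswap V3 MIN_TICK..MAX_TICK
	))

	properties.TestingRun(t)
}
```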

### Benchmarking Improvements
1. Continuous benchmark tracking
2. Comparative benchmarking across versions
3. Detailed profiling integration
4. Resource usage monitoring