Testing and Benchmarking Documentation
Overview
The MEV Bot project includes comprehensive testing and benchmarking for all critical components, with particular focus on the mathematical functions in the uniswap package. This documentation covers the testing strategy, benchmarking procedures, and performance optimization validation.
Testing Strategy
Unit Testing
The project uses Go's standard testing package together with testify/assert for assertions. Tests are organized by package and function:
- Mathematical Function Tests - Located in pkg/uniswap/*_test.go
- Core Service Tests - Located in the respective package test files
- Integration Tests - Located in the pkg/test/ directory
Test Categories
Mathematical Accuracy Tests
- Verify correctness of Uniswap V3 pricing calculations
- Validate round-trip conversions (sqrtPriceX96 ↔ price ↔ tick)
- Test edge cases and boundary conditions
- Compare optimized vs original implementations
Functional Tests
- Test service initialization and configuration
- Validate event processing workflows
- Verify database operations
- Check error handling and recovery
Integration Tests
- End-to-end testing of arbitrage detection
- Network interaction testing
- Contract interaction validation
- Performance under load testing
Mathematical Function Testing
Core Pricing Functions
SqrtPriceX96ToPrice Tests
- Verifies conversion from sqrtPriceX96 to standard price
- Tests known values (e.g., 2^96 → price = 1.0)
- Validates precision with floating-point comparisons
PriceToSqrtPriceX96 Tests
- Verifies conversion from standard price to sqrtPriceX96
- Tests known values (e.g., price = 1.0 → 2^96)
- Accounts for floating-point precision limitations
TickToSqrtPriceX96 Tests
- Verifies conversion from tick to sqrtPriceX96
- Tests known values (e.g., tick = 0 → 2^96)
SqrtPriceX96ToTick Tests
- Verifies conversion from sqrtPriceX96 to tick
- Tests known values (e.g., 2^96 → tick = 0)
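The four conversions above reduce to two identities: price = (sqrtPriceX96 / 2^96)^2 and price = 1.0001^tick. A minimal, self-contained sketch of those identities (the helper names here are illustrative, not the package's actual signatures):

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// sqrtPriceX96ToPrice demonstrates the identity the tests verify:
// price = (sqrtPriceX96 / 2^96)^2.
func sqrtPriceX96ToPrice(sqrtPriceX96 *big.Int) *big.Float {
	q96 := new(big.Float).SetInt(new(big.Int).Lsh(big.NewInt(1), 96))
	ratio := new(big.Float).Quo(new(big.Float).SetInt(sqrtPriceX96), q96)
	return new(big.Float).Mul(ratio, ratio)
}

// tickToPrice demonstrates the tick identity: price = 1.0001^tick,
// so tick = 0 corresponds to price = 1.0 and sqrtPriceX96 = 2^96.
func tickToPrice(tick int) float64 {
	return math.Pow(1.0001, float64(tick))
}

func main() {
	q96 := new(big.Int).Lsh(big.NewInt(1), 96)
	fmt.Println(sqrtPriceX96ToPrice(q96)) // 1 (the 2^96 -> 1.0 known value)
	fmt.Println(tickToPrice(0))           // 1
}
```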
Round-trip Conversion Tests
TestRoundTripConversions
- Validates sqrtPriceX96 → price → sqrtPriceX96 conversions
- Tests tick → sqrtPriceX96 → tick conversions
- Ensures precision is maintained within acceptable tolerance
TestGetTickAtSqrtPriceWithUint256
- Tests uint256-based tick calculations
- Validates compatibility with different data types
TestTickSpacingCalculations
- Tests tick spacing calculations for different fee tiers
- Validates next/previous tick calculations
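For context on the tick-spacing cases, a sketch using the standard Uniswap V3 fee-tier spacings; floorToSpacing is a hypothetical helper illustrating the next/previous-tick rounding these tests validate:

```go
package main

import "fmt"

// Standard Uniswap V3 tick spacings per fee tier (fee in hundredths of a bip).
var tickSpacing = map[int]int{
	100:   1,   // 0.01%
	500:   10,  // 0.05%
	3000:  60,  // 0.30%
	10000: 200, // 1.00%
}

// floorToSpacing rounds a tick down to the nearest initializable tick.
func floorToSpacing(tick, spacing int) int {
	floored := tick / spacing * spacing
	if tick < 0 && tick%spacing != 0 {
		floored -= spacing // Go integer division truncates toward zero
	}
	return floored
}

func main() {
	fmt.Println(floorToSpacing(87, tickSpacing[3000]))  // 60
	fmt.Println(floorToSpacing(-87, tickSpacing[3000])) // -120
}
```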
Cached Function Tests
TestCachedFunctionAccuracy
- Compares original vs cached function results
- Ensures mathematical accuracy is preserved in optimizations
- Validates that caching doesn't affect precision
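A sketch of what such an equivalence test can look like. The function names are taken from the benchmark list below; the exact signatures are assumed:

```go
package uniswap // illustrative placement inside the package under test

import (
	"math/big"
	"testing"
)

// TestCachedMatchesOriginalSketch runs the cached and uncached paths on the
// same inputs and requires identical results, so the optimization can never
// trade precision for speed. Assumes both functions return *big.Float.
func TestCachedMatchesOriginalSketch(t *testing.T) {
	inputs := []*big.Int{
		new(big.Int).Lsh(big.NewInt(1), 96), // price = 1.0
		new(big.Int).Lsh(big.NewInt(3), 95), // price = 2.25
	}
	for _, in := range inputs {
		orig := SqrtPriceX96ToPrice(in)
		cached := SqrtPriceX96ToPriceCached(in)
		if orig.Cmp(cached) != 0 {
			t.Errorf("cached result diverged for %s: %v != %v", in, orig, cached)
		}
	}
}
```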
Benchmarking
Performance Testing Framework
The project uses Go's built-in benchmarking framework with the following approach:
- Micro-benchmarks - Individual function performance
- Macro-benchmarks - End-to-end workflow performance
- Regression testing - Performance comparison over time
- Load testing - Performance under concurrent operations
Mathematical Function Benchmarks
Original Functions
- BenchmarkSqrtPriceX96ToPrice - Baseline performance
- BenchmarkPriceToSqrtPriceX96 - Baseline performance
- BenchmarkTickToSqrtPriceX96 - Baseline performance
- BenchmarkSqrtPriceX96ToTick - Baseline performance
Cached Functions
- BenchmarkSqrtPriceX96ToPriceCached - Optimized performance
- BenchmarkPriceToSqrtPriceX96Cached - Optimized performance
Performance Comparison
The benchmarks demonstrate significant performance improvements:
- SqrtPriceX96ToPriceCached: ~24% faster than original
- PriceToSqrtPriceX96Cached: ~12% faster than original
- Memory allocations reduced by 20-33%
Running Tests
Unit Tests
# Run all unit tests
go test ./...
# Run tests with verbose output
go test -v ./...
# Run tests with coverage
go test -cover ./...
# Run tests with coverage and output to file
go test -coverprofile=coverage.out ./...
Mathematical Function Tests
# Run only Uniswap pricing tests
go test ./pkg/uniswap/...
# Run with verbose output
go test -v ./pkg/uniswap/...
# Run with coverage
go test -cover ./pkg/uniswap/...
Specific Test Cases
# Run a specific test function
go test -run TestSqrtPriceX96ToPrice ./pkg/uniswap/
# Run tests matching a pattern
go test -run Test.*Price ./pkg/uniswap/
Math Audit CLI
The tools/math-audit CLI provides deterministic regression checks for the
pricing engines across multiple DEX models (Uniswap V2/V3, Camelot/Algebra,
Ramses, Curve, Balancer, TraderJoe). It also embeds pared-down versions of the
round-trip and symmetry property tests so that math regressions are caught
without relying on build tags.
# Run the audit against the canonical vector set and emit reports
go run ./tools/math-audit --vectors default --report reports/math/latest
# Or use the convenience script (writes to reports/math/latest)
scripts/run_audit_suite.sh
# Via make target
make math-audit
The CLI writes both JSON (report.json) and Markdown (report.md) summaries
into the provided directory, which can be attached to CI artifacts or shared
with reviewers.
When the Drone test-suite pipeline runs, it persists
reports/math/latest/report.{json,md} as build artifacts. The stage fails if
either file is missing or empty, guaranteeing downstream Harness promotions have
the math audit evidence available for review.
Profitability Simulation CLI
The profitability harness at tools/simulation replays historical opportunity
vectors and reports hit rate and net profit after gas costs.
# Run against the bundled default vectors
make simulate-profit
# Override vector file and report location
SIMULATION_VECTORS=tools/simulation/vectors/my-slice.json \
scripts/run_profit_simulation.sh /tmp/sim-report
The CLI emits stdout summaries and writes structured reports to
reports/simulation/latest/summary.{json,md} (or the directory passed via
--report). Use the Markdown file for change-management artefacts and stash the
JSON alongside math-audit outputs for reproducible profitability audits.
Environment-Specific Pipelines & Local Hooks
CI/CD now runs through Drone and Harness:
- Drone test-suite - lint, race/coverage tests, binary build, smoke start, math audit, profitability simulation, and dry-run Docker build.
- Drone security-suite - gosec, govulncheck, Nancy, and security fuzz tests on protected branches.
- Drone integration-opt-in - manual stage for integration tests requiring RPC access or heavy fixtures.
- Harness staging_promotion - builds on Drone artifacts, packages a Docker image, and upgrades the staging environment via Helm.
Use drone exec --pipeline <name> for local validation and harness pipeline execute --file harness/pipelines/staging.yaml (or the UI) for promotions.
Legacy fork-dependent suites are gated behind optional build tags:
- go test -tags='integration legacy' ./... runs the RPC-heavy legacy harnesses.
- go test -tags='integration forked' ./test/arbitrage_fork_test.go exercises fork-only scenarios.
Developers should mirror the dev/test gates locally before pushing:
# Fast dev parity with pipeline-dev
./scripts/quality-check.sh
# Security/math parity with audit pipeline
./scripts/run_audit_suite.sh
The scripts/git-workflow.sh push helper runs the same checks as the CI pre-push hook (formatting, lint, unit tests). Add ./scripts/git-workflow.sh push to your workflow or wire it into .git/hooks/pre-push to avoid CI surprises.
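If you take the hook route, the hook can simply delegate to the helper. A minimal sketch for .git/hooks/pre-push, assuming the script path is relative to the repository root (remember to make the hook executable):
#!/bin/sh
# Run the same checks CI enforces before every push
exec ./scripts/git-workflow.sh push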
Running Benchmarks
Basic Benchmarks
# Run all benchmarks
go test -bench=. ./...
# Run benchmarks with memory profiling
go test -bench=. -benchmem ./...
# Run benchmarks with timing
go test -bench=. -benchtime=5s ./...
# Run specific benchmark
go test -bench=BenchmarkSqrtPriceX96ToPrice ./pkg/uniswap/
Benchmark Analysis
# Run benchmarks and save results
go test -bench=. -benchmem ./pkg/uniswap/ > benchmark_results.txt
# Compare benchmark results
benchcmp old_results.txt new_results.txt
Performance Optimization Validation
Constant Caching Validation
The optimization strategy caches expensive constant calculations (a sketch follows the validation list below):
- 2^96 - Used in sqrtPriceX96 conversions
- 2^192 - Used in price calculations
Validation ensures:
- Mathematical accuracy is preserved
- Performance improvements are measurable
- Memory usage is optimized
- Thread safety is maintained
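A minimal sketch of the caching approach, assuming lazy initialization guarded by sync.Once for thread safety (package and variable names are illustrative):

```go
package uniswapmath // illustrative name

import (
	"math/big"
	"sync"
)

var (
	constOnce sync.Once
	q96       *big.Float // cached 2^96
	q192      *big.Float // cached 2^192
)

// constants computes 2^96 and 2^192 exactly once; sync.Once makes the
// initialization safe under concurrent callers, and every subsequent call
// reuses the precomputed values instead of reallocating big numbers.
func constants() (*big.Float, *big.Float) {
	constOnce.Do(func() {
		q96 = new(big.Float).SetInt(new(big.Int).Lsh(big.NewInt(1), 96))
		q192 = new(big.Float).SetInt(new(big.Int).Lsh(big.NewInt(1), 192))
	})
	return q96, q192
}
```

Callers must treat the returned values as read-only: a big.Float operation that writes into one of them as a receiver would corrupt the shared cache.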
Uint256 Optimization Attempts
Optimizations based on uint256 operations were evaluated and found to:
- Not provide performance benefits due to conversion overhead
- Maintain the same precision as big.Int operations
- Add complexity without benefit
Memory Allocation Reduction
Optimizations focus on:
- Reducing garbage collection pressure
- Minimizing object creation in hot paths
- Reusing precomputed constants
- Efficient data structure usage
Continuous Integration Testing
Test Automation
- Unit tests run on every commit
- Integration tests run on pull requests
- Performance benchmarks tracked over time
- Regression testing prevents performance degradation
Code Quality Gates
- Minimum test coverage thresholds
- Performance regression detection
- Static analysis and linting
- Security scanning
Best Practices
Test Writing
- Use table-driven tests for multiple test cases (a sketch follows this list)
- Include edge cases and boundary conditions
- Test error conditions and failure paths
- Use meaningful test names and descriptions
- Keep tests independent and isolated
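A minimal table-driven sketch combining several of these practices, with named, isolated subtests; priceFromSqrtX96 is a stand-in for the package's real conversion function:

```go
package uniswap_test // illustrative

import (
	"math/big"
	"testing"

	"github.com/stretchr/testify/assert"
)

// priceFromSqrtX96 stands in for the package's real conversion function.
func priceFromSqrtX96(sqrtPriceX96 *big.Int) float64 {
	q96 := new(big.Float).SetInt(new(big.Int).Lsh(big.NewInt(1), 96))
	ratio, _ := new(big.Float).Quo(new(big.Float).SetInt(sqrtPriceX96), q96).Float64()
	return ratio * ratio
}

// TestSqrtPriceX96ToPriceTable shows the table-driven layout: each case has
// a name, an input, and an expected value, and runs as an isolated subtest.
func TestSqrtPriceX96ToPriceTable(t *testing.T) {
	tests := []struct {
		name  string
		input *big.Int
		want  float64
	}{
		{"unit price", new(big.Int).Lsh(big.NewInt(1), 96), 1.0},
		{"price 4.0", new(big.Int).Lsh(big.NewInt(1), 97), 4.0},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := priceFromSqrtX96(tt.input)
			assert.InDelta(t, tt.want, got, 1e-9) // tolerance for float rounding
		})
	}
}
```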
Benchmarking
- Use realistic test data
- Reset timer to exclude setup time (shown in the sketch below)
- Run benchmarks for sufficient iterations
- Compare results against baselines
- Document performance expectations
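A sketch of the benchmark pattern these points describe: build realistic inputs first, then call b.ResetTimer so setup is excluded from the measurement (the loop body is a stand-in for the function under test):

```go
package uniswap_test // illustrative

import (
	"math/big"
	"testing"
)

func BenchmarkConversionSketch(b *testing.B) {
	// Realistic input built during setup: sqrtPriceX96 = 2^96 (price 1.0).
	input := new(big.Int).Lsh(big.NewInt(1), 96)
	q96 := new(big.Float).SetInt(input)

	b.ReportAllocs() // track allocations, since the optimizations target them
	b.ResetTimer()   // exclude the setup above from the timing
	for i := 0; i < b.N; i++ {
		ratio := new(big.Float).Quo(new(big.Float).SetInt(input), q96)
		_ = new(big.Float).Mul(ratio, ratio) // price = (sqrtPriceX96 / 2^96)^2
	}
}
```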
Performance Validation
- Measure before and after optimizations
- Validate mathematical accuracy is preserved
- Test under realistic load conditions
- Monitor memory allocation patterns
- Profile CPU and memory usage
Troubleshooting
Common Test Issues
- Floating-point precision errors - Use assert.InDelta for floating-point comparisons
- Race conditions - Use the -race flag to detect race conditions (command below)
- Timeout failures - Increase test timeout for slow operations
- Resource leaks - Ensure proper cleanup in test functions
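The race detector is enabled with a single flag:
# Detect data races while running the full suite
go test -race ./...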
Benchmark Issues
- Unstable results - Run benchmarks multiple times
- Insufficient iterations - Increase benchmark time
- External interference - Run benchmarks on isolated systems
- Measurement noise - Use statistical analysis for comparison (see the benchstat example below)
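For the noise-related items above, benchstat (installable via go install golang.org/x/perf/cmd/benchstat@latest) aggregates repeated runs and reports only statistically significant deltas:
# Collect 10 samples per benchmark, then compare the distributions
go test -bench=. -count=10 ./pkg/uniswap/ > new.txt
benchstat old.txt new.txt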
Future Improvements
Testing Enhancements
- Property-based testing with gopter or similar libraries
- Fuzz testing for edge case discovery (native-fuzzing sketch below)
- Load testing frameworks for stress testing
- Automated performance regression detection
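For the fuzz-testing item, Go 1.18+ native fuzzing needs no extra dependencies. A sketch assuming a hypothetical tickToSqrt/sqrtToTick round-trip pair and Uniswap V3's tick bounds (±887272):

```go
package uniswap_test // illustrative

import "testing"

// FuzzTickRoundTrip feeds fuzzer-generated ticks through a hypothetical
// tick -> sqrtPriceX96 -> tick round trip and fails on any drift.
func FuzzTickRoundTrip(f *testing.F) {
	f.Add(int64(0))      // seed corpus: the known tick = 0 case
	f.Add(int64(887272)) // seed corpus: the MAX_TICK boundary
	f.Fuzz(func(t *testing.T, tick int64) {
		if tick < -887272 || tick > 887272 {
			t.Skip() // outside Uniswap V3's valid tick range
		}
		sqrt := tickToSqrt(tick) // hypothetical conversion under test
		if got := sqrtToTick(sqrt); got != tick {
			t.Errorf("round trip drifted: %d -> %d", tick, got)
		}
	})
}
```

Run it with go test -fuzz=FuzzTickRoundTrip ./pkg/uniswap/ once real conversion functions are wired in.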
Benchmarking Improvements
- Continuous benchmark tracking
- Comparative benchmarking across versions
- Detailed profiling integration
- Resource usage monitoring