fix(critical): complete execution pipeline - all blockers fixed and operational
docs/STRUCTURED_ERROR_LOGGING_GUIDE.md
@@ -0,0 +1,344 @@

# Structured Error Logging Guide

## Overview

Every error in the MEV bot must now include:

1. **Reason** - Why the error occurred (root cause)
2. **Origin** - Where it happened (file, function, line - automatically tracked)
3. **Context** - What we were trying to do (logged as `Action`)
4. **Category** - Type of error (Network, Parsing, Validation, etc.)
5. **Severity** - How critical the error is
6. **Details** - Additional structured data

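As a rough illustration of how these six fields and the automatically tracked origin might fit together, here is a minimal sketch. The `StructuredError` type, its field names, and `newError` are hypothetical stand-ins written for this example, not the actual definitions in `pkg/errors`; only `runtime.Caller` and `runtime.FuncForPC` are real standard-library calls.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// StructuredError is a hypothetical stand-in for the type in pkg/errors;
// the real field names and types may differ.
type StructuredError struct {
	Message  string
	Reason   string         // why the error occurred
	Origin   string         // "file:line (function)", captured automatically
	Action   string         // context: what we were trying to do
	Category string         // e.g. "NETWORK"
	Severity string         // e.g. "ERROR"
	Details  map[string]any // additional structured data
}

// newError (hypothetical) records the caller's location so that Origin
// never has to be typed by hand.
func newError(category, severity, msg string) *StructuredError {
	e := &StructuredError{
		Message:  msg,
		Category: category,
		Severity: severity,
		Details:  map[string]any{},
	}
	// Skip one frame so we report the caller of newError, not newError itself.
	if pc, file, line, ok := runtime.Caller(1); ok {
		fn := "unknown"
		if f := runtime.FuncForPC(pc); f != nil {
			fn = f.Name()
		}
		e.Origin = fmt.Sprintf("%s:%d (%s)", filepath.Base(file), line, fn)
	}
	return e
}

func main() {
	err := newError("NETWORK", "ERROR", "DNS resolution failed")
	fmt.Println(err.Origin) // file, line, and function of this call site
}
```

If constructors are wrapped in further helpers, the skip count passed to `runtime.Caller` would need to grow by one per extra layer.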
## Quick Start

### Before (Old Way - BAD)
```go
// ❌ NO CONTEXT - Don't do this anymore
logger.Error("Failed to get latest block")

// ❌ MINIMAL CONTEXT - Still not enough
logger.Error("Failed to get latest block:", err)
```

### After (New Way - GOOD)
```go
import pkgerrors "github.com/fraktal/mev-beta/pkg/errors"

// ✅ FULL CONTEXT - Do this instead
logger.ErrorStructured(
	pkgerrors.NetworkError("Failed to fetch latest block").
		WithReason("RPC endpoint returned 429 rate limit").
		WithAction("Polling for new blocks to detect MEV opportunities").
		WithImpact("Block processing delayed, may miss time-sensitive arbitrage opportunities").
		WithSuggestion("Reduce polling frequency or use backup RPC endpoint").
		WithDetail("endpoint", rpcURL).
		WithDetail("blockNumber", lastBlock).
		Wrap(err),
)
```

## Error Categories

### Network Errors
```go
err := pkgerrors.NetworkError("DNS resolution failed").
	WithReason("Nameserver timeout for arb1.arbitrum.io").
	WithAction("Connecting to Arbitrum RPC endpoint").
	WithImpact("Cannot fetch blockchain data, all MEV operations suspended").
	WithSuggestion("Check DNS configuration in /etc/resolv.conf or use IP address").
	WithDetail("hostname", "arb1.arbitrum.io").
	WithDetail("nameserver", "8.8.8.8").
	Wrap(originalErr)

logger.ErrorStructured(err)
```

**Output**:
```
Main log (compact):
2025/11/02 20:19:03 [ERROR] [NETWORK/ERROR] DNS resolution failed | Reason: Nameserver timeout for arb1.arbitrum.io | Action: Connecting to Arbitrum RPC endpoint | Origin: pkg/arbitrum/connection.go:142 | Underlying: lookup arb1.arbitrum.io: i/o timeout

Error log (detailed):
2025/11/02 20:19:03 [ERROR] [ERR-1730584743-NETWORK] NETWORK/ERROR: DNS resolution failed
Origin: pkg/arbitrum/connection.go:142 (ConnectToRPC)
ErrorID: ERR-1730584743-NETWORK
Timestamp: 2025-11-02T20:19:03Z
Reason: Nameserver timeout for arb1.arbitrum.io
Action: Connecting to Arbitrum RPC endpoint
Impact: Cannot fetch blockchain data, all MEV operations suspended
Suggestion: Check DNS configuration in /etc/resolv.conf or use IP address
Details:
- hostname: arb1.arbitrum.io
- nameserver: 8.8.8.8
Underlying: lookup arb1.arbitrum.io: i/o timeout
```

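To make the compact layout concrete, the following sketch assembles a pipe-separated line from individual fields and skips any field that is empty. The `logFields` struct and `compactLine` function are hypothetical names for this example; the bot's own formatter is not shown in this guide and may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// logFields holds the pieces of the compact line (hypothetical struct).
type logFields struct {
	Category, Severity, Message        string
	Reason, Action, Origin, Underlying string
}

// compactLine joins the non-empty fields with " | ", in the order
// shown in the compact main-log output.
func compactLine(f logFields) string {
	parts := []string{fmt.Sprintf("[%s/%s] %s", f.Category, f.Severity, f.Message)}
	add := func(label, value string) {
		if value != "" {
			parts = append(parts, label+": "+value)
		}
	}
	add("Reason", f.Reason)
	add("Action", f.Action)
	add("Origin", f.Origin)
	add("Underlying", f.Underlying)
	return strings.Join(parts, " | ")
}

func main() {
	fmt.Println(compactLine(logFields{
		Category:   "NETWORK",
		Severity:   "ERROR",
		Message:    "DNS resolution failed",
		Reason:     "Nameserver timeout",
		Action:     "Connecting to RPC",
		Origin:     "connection.go:142",
		Underlying: "i/o timeout",
	}))
}
```

Skipping empty fields keeps the line short for errors that were created without, say, an underlying cause.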
### Parsing Errors
```go
err := pkgerrors.ParsingError("Failed to decode swap event").
	WithReason("ABI signature mismatch - expected Swap(address,address,int256,int256,uint160,uint128,int24) but got different signature").
	WithAction("Parsing Uniswap V3 swap transaction for arbitrage detection").
	WithImpact("This swap will not be considered for arbitrage opportunities").
	WithSuggestion("Update ABI definition or add support for this swap variant").
	WithDetail("txHash", "0x1234...").
	WithDetail("poolAddress", "0xabcd...").
	WithDetail("expectedSig", "0x1c411e9a").
	WithDetail("actualSig", "0x9f2c64").
	Wrap(abiErr)

logger.ErrorStructured(err)
```

### Validation Errors
```go
err := pkgerrors.ValidationError("Invalid token pair detected").
	WithReason("Token0 address is zero address (0x0000...)").
	WithAction("Validating swap event before profit calculation").
	WithImpact("Skipping this opportunity to avoid calculation errors").
	WithSuggestion("Fix pool detection logic to exclude invalid pools").
	WithDetail("token0", zeroAddress.Hex()).
	WithDetail("token1", token1.Hex()).
	WithDetail("poolAddress", pool.Hex())

logger.WarnStructured(err)
```

### Execution Errors
```go
err := pkgerrors.ExecutionError("Transaction reverted on-chain").
	WithReason("Insufficient liquidity in target pool at execution time").
	WithAction("Executing flash loan arbitrage transaction").
	WithImpact("Lost gas fees (~0.00008 ETH), no profit captured").
	WithSuggestion("Increase slippage tolerance or implement pre-execution simulation").
	WithDetail("txHash", tx.Hash().Hex()).
	WithDetail("gasUsed", receipt.GasUsed).
	WithDetail("revertReason", revertMsg).
	WithDetail("estimatedProfit", "0.015 ETH").
	WithDetail("actualLoss", "0.00008 ETH")

logger.ErrorStructured(err)
```

### Math/Calculation Errors
```go
err := pkgerrors.MathError("Profit margin calculation overflow").
	WithReason("AmountOut too small (0.000001 ETH), division by near-zero causes overflow").
	WithAction("Calculating profit margin for arbitrage opportunity").
	WithImpact("Opportunity rejected to prevent extreme values in logs").
	WithSuggestion("Add minimum amount threshold of 0.0001 ETH before calculations").
	WithDetail("amountIn", "0.5 ETH").
	WithDetail("amountOut", "0.000001 ETH").
	WithDetail("netProfit", "-0.00008 ETH")

logger.WarnStructured(err)
```

### Configuration Errors
```go
err := pkgerrors.ConfigurationError("Invalid RPC configuration").
	WithReason("providers_runtime.yaml missing required 'url' field for primary provider").
	WithAction("Loading RPC provider configuration at startup").
	WithImpact("Cannot connect to blockchain, bot will not start").
	WithSuggestion("Add 'url' field to primary provider configuration").
	WithDetail("configFile", "config/providers_runtime.yaml").
	WithDetail("provider", "primary").
	Wrap(configErr)

logger.ErrorStructured(err)
```

## Helper Functions

### Quick Error Creation
```go
// For common patterns, use helper functions.
// Declare err once with := and reassign with = so the snippet compiles in one scope.
err := pkgerrors.NetworkError("Connection timeout")
err = pkgerrors.ParsingError("ABI decode failed")
err = pkgerrors.ValidationError("Invalid input")
err = pkgerrors.ExecutionError("Transaction reverted")
err = pkgerrors.ConfigurationError("Missing config file")
err = pkgerrors.MathError("Division by zero")
err = pkgerrors.SecurityError("Unauthorized access")
```

### Custom Categories and Severities
```go
err := pkgerrors.NewStructuredError(
	pkgerrors.CategoryDatabase,
	pkgerrors.SeverityCritical,
	"Failed to save opportunity to database",
).
	WithReason("Connection pool exhausted, all 10 connections in use").
	WithAction("Persisting arbitrage opportunity for analysis").
	WithImpact("Opportunity data will be lost, cannot track historical performance").
	WithSuggestion("Increase database connection pool size or reduce write frequency")
```

## Migration from Old to New

### Pattern 1: Simple Error
```go
// OLD
logger.Error("Failed to parse transaction", "error", err)

// NEW
logger.ErrorStructured(
	pkgerrors.ParsingError("Failed to parse transaction").
		WithReason("Transaction data is incomplete or corrupted").
		WithAction("Parsing pending transaction from mempool").
		WithImpact("Transaction skipped, may miss MEV opportunity").
		Wrap(err),
)
```

### Pattern 2: Error with Context
```go
// OLD
logger.Error(fmt.Sprintf("Pool %s validation failed: %v", poolAddr, err))

// NEW
logger.ErrorStructured(
	pkgerrors.ValidationError("Pool validation failed").
		WithReason("Pool reserves returned zero values").
		WithAction("Validating pool before adding to arbitrage scan").
		WithImpact("Pool excluded from opportunity detection").
		WithSuggestion("Check if pool is active and has liquidity").
		WithDetail("poolAddress", poolAddr.Hex()).
		Wrap(err),
)
```

### Pattern 3: Warning
```go
// OLD
logger.Warn("Rate limit exceeded")

// NEW
logger.WarnStructured(
	pkgerrors.NetworkError("RPC rate limit exceeded").
		WithReason("Exceeded 100 requests per second quota").
		WithAction("Fetching pool data for arbitrage detection").
		WithImpact("Reduced scanning speed, may miss fast opportunities").
		WithSuggestion("Implement request batching or use backup endpoint").
		WithDetail("endpoint", rpcURL).
		WithDetail("requestCount", reqCount).
		WithDetail("timeWindow", "1s"),
)
```

## Best Practices

### 1. Always Provide Reason
```go
// ❌ BAD
WithReason("error occurred")

// ✅ GOOD
WithReason("TCP connection refused - RPC endpoint is down or firewalled")
```

### 2. Be Specific in Actions
```go
// ❌ BAD
WithAction("processing data")

// ✅ GOOD
WithAction("Fetching Uniswap V3 pool reserves for profit calculation")
```

### 3. Describe Real Impact
```go
// ❌ BAD
WithImpact("something might break")

// ✅ GOOD
WithImpact("Arbitrage detection stopped, estimated revenue loss: $50-100/hour")
```

### 4. Give Actionable Suggestions
```go
// ❌ BAD
WithSuggestion("fix the problem")

// ✅ GOOD
WithSuggestion("Restart with PROVIDER_CONFIG_PATH pointing to valid providers_runtime.yaml")
```

### 5. Add Relevant Details
```go
// ✅ GOOD
WithDetail("txHash", tx.Hash().Hex()).
WithDetail("blockNumber", blockNum).
WithDetail("gasPrice", gasPrice.String()).
WithDetail("poolAddress", pool.Hex()).
WithDetail("attemptNumber", retryCount)
```

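Details are key/value pairs, and if they are stored in a Go map (an assumption; this guide does not show the underlying storage), iteration order is not stable between runs. The sketch below sorts the keys before rendering so that two log entries with the same details always look identical. `formatDetails` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatDetails (hypothetical) renders details as "  - key: value" lines
// in sorted key order, so output is deterministic across runs.
func formatDetails(details map[string]any) string {
	keys := make([]string, 0, len(details))
	for k := range details {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "  - %s: %v\n", k, details[k])
	}
	return b.String()
}

func main() {
	fmt.Print(formatDetails(map[string]any{
		"txHash":        "0x1234...",
		"blockNumber":   1234,
		"attemptNumber": 2,
	}))
	// Prints attemptNumber, blockNumber, txHash in that order.
}
```

Stable ordering also makes log lines easier to diff and to match with simple text tools.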
## Output Format

### Compact (Main Log)
```
[NETWORK/ERROR] DNS resolution failed | Reason: Nameserver timeout | Action: Connecting to RPC | Origin: connection.go:142 | Underlying: i/o timeout
```

### Detailed (Error Log)
```
[ERR-1730584743-NETWORK] NETWORK/ERROR: DNS resolution failed
Origin: pkg/arbitrum/connection.go:142 (ConnectToRPC)
ErrorID: ERR-1730584743-NETWORK
Timestamp: 2025-11-02T20:19:03Z
Reason: Nameserver timeout for arb1.arbitrum.io
Action: Connecting to Arbitrum RPC endpoint
Impact: Cannot fetch blockchain data, all MEV operations suspended
Suggestion: Check DNS configuration or use IP address
Details:
- hostname: arb1.arbitrum.io
Underlying: lookup arb1.arbitrum.io: i/o timeout
```

## Error Categories Reference

| Category | Default Severity | Use For |
|----------|------------------|---------|
| `CategoryNetwork` | ERROR | RPC, DNS, connection issues |
| `CategoryParsing` | ERROR | ABI decoding, transaction parsing |
| `CategoryValidation` | WARNING | Input validation, data validation |
| `CategoryExecution` | CRITICAL | Transaction execution, contract calls |
| `CategoryConfiguration` | CRITICAL | Config loading, invalid settings |
| `CategoryDatabase` | ERROR | Database operations |
| `CategorySecurity` | CRITICAL | Security violations, unauthorized access |
| `CategoryMath` | ERROR | Arithmetic errors, overflow/underflow |
| `CategoryInternal` | ERROR | Internal logic errors, unexpected state |
| `CategoryExternal` | ERROR | External service failures |

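The table above can be read as a lookup from category to severity. A minimal sketch of that lookup follows. The `Category` and `Severity` types and the string values behind them are assumptions for illustration (only `NETWORK` and `ERROR` appear in the sample output in this guide), and the fallback to `ERROR` for unknown categories is likewise an assumption.

```go
package main

import "fmt"

// Category and Severity are hypothetical string types for this sketch.
type Category string
type Severity string

// defaultSeverity mirrors the reference table; the string spellings other
// than NETWORK and ERROR are guesses, not confirmed by the guide.
var defaultSeverity = map[Category]Severity{
	"NETWORK":       "ERROR",
	"PARSING":       "ERROR",
	"VALIDATION":    "WARNING",
	"EXECUTION":     "CRITICAL",
	"CONFIGURATION": "CRITICAL",
	"DATABASE":      "ERROR",
	"SECURITY":      "CRITICAL",
	"MATH":          "ERROR",
	"INTERNAL":      "ERROR",
	"EXTERNAL":      "ERROR",
}

// severityFor returns the table's severity for a category, falling back
// to ERROR for categories not listed (an assumption of this sketch).
func severityFor(c Category) Severity {
	if s, ok := defaultSeverity[c]; ok {
		return s
	}
	return "ERROR"
}

func main() {
	fmt.Println(severityFor("VALIDATION")) // WARNING
	fmt.Println(severityFor("SECURITY"))   // CRITICAL
}
```

A call such as `NewStructuredError` with an explicit severity, as shown under Custom Categories and Severities, would bypass a lookup like this.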
## Testing

```bash
# Build with new error system
go build -o mev-bot ./cmd/mev-bot

# Check error log for structured format
tail -f logs/mev_bot_errors.log

# Verify all errors have:
# - Category/Severity
# - Reason
# - Action
# - Origin (file:line)
```

## Benefits

1. **Debuggability**: Know exactly where and why each error occurred
2. **Monitoring**: Can alert on specific error categories
3. **Analytics**: Track error patterns over time
4. **Troubleshooting**: Users can quickly understand and fix issues
5. **Professionalism**: Production-grade error reporting

## Next Steps

1. Gradually migrate existing `logger.Error()` calls to `logger.ErrorStructured()`
2. Add error categorization to all new code
3. Update error handling in critical paths first (RPC, parsing, execution)
4. Monitor error logs for patterns and improve error messages