# Structured Error Logging System - Implementation Complete

## Executive Summary

Implemented a comprehensive structured error logging system that ensures **every error has a reason and origin**. The system automatically tracks file, function, and line numbers while requiring developers to provide context about why errors occurred and what the impact is.

## Problem Statement

Previously, errors were logged like this:
```
[2025/11/02 20:19:03] ❌ ERROR #2
[2025/11/02 20:19:03] ❌ ERROR #3
```

**Issues**:
- No reason (why it happened)
- No origin (where it happened)
- No context (what we were doing)
- No impact (what it affects)
- No suggestion (how to fix it)

## Solution

Created a structured error system with automatic origin tracking and required context fields.

### Example: Before vs After

**Before (Bad)**:
```go
logger.Error("Failed to get latest block:", err)
```

Output:
```
2025/11/02 20:19:03 [ERROR] Failed to get latest block: Post "https://...": dial tcp: lookup arb1.arbitrum.io: Temporary failure in name resolution
```

**After (Good)**:
```go
logger.ErrorStructured(
    pkgerrors.NetworkError("Failed to fetch latest block").
        WithReason("DNS nameserver timeout for arb1.arbitrum.io").
        WithAction("Polling Arbitrum blockchain for new blocks to detect MEV opportunities").
        WithImpact("Block processing suspended, missing time-sensitive arbitrage opportunities (est. $50-100/hour loss)").
        WithSuggestion("Check /etc/resolv.conf DNS configuration or use IP address fallback").
        WithDetail("endpoint", "arb1.arbitrum.io").
        WithDetail("nameserver", "8.8.8.8").
        WithDetail("blockNumber", lastBlock).
        Wrap(err),
)
```

Output (Compact - Main Log):
```
2025/11/02 20:19:03 [ERROR] [NETWORK/ERROR] Failed to fetch latest block | Reason: DNS nameserver timeout for arb1.arbitrum.io | Action: Polling Arbitrum blockchain for new blocks | Origin: pkg/arbitrum/connection.go:142 | Underlying: lookup failed
```

Output (Detailed - Error Log):
```
2025/11/02 20:19:03 [ERROR] [ERR-1730584743-NETWORK] NETWORK/ERROR: Failed to fetch latest block
  Origin: /home/admin/mev-beta/pkg/arbitrum/connection.go:142 (ConnectToRPC)
  ErrorID: ERR-1730584743-NETWORK
  Timestamp: 2025-11-02T20:19:03Z
  Reason: DNS nameserver timeout for arb1.arbitrum.io
  Action: Polling Arbitrum blockchain for new blocks to detect MEV opportunities
  Impact: Block processing suspended, missing time-sensitive arbitrage opportunities (est. $50-100/hour loss)
  Suggestion: Check /etc/resolv.conf DNS configuration or use IP address fallback
  Details:
    - endpoint: arb1.arbitrum.io
    - nameserver: 8.8.8.8
    - blockNumber: 396193450
  Underlying: Post "https://arb1.arbitrum.io/rpc": dial tcp: lookup arb1.arbitrum.io: Temporary failure in name resolution
```

## Implementation

### 1. Structured Error Type (`pkg/errors/structured_error.go`)

```go
type StructuredError struct {
    // Core error information
    Message  string
    Category ErrorCategory // NETWORK, PARSING, VALIDATION, etc.
    Severity ErrorSeverity // DEBUG, INFO, WARNING, ERROR, CRITICAL, FATAL

    // Origin tracking (AUTOMATIC)
    File     string // Auto-detected
    Function string // Auto-detected
    Line     int    // Auto-detected
    Package  string // Auto-detected

    // Context (REQUIRED by developer)
    Reason        string                 // Why this error occurred
    Action        string                 // What we were trying to do
    Impact        string                 // Impact on the system
    Suggestion    string                 // How to fix it
    Details       map[string]interface{} // Additional context
    UnderlyingErr error                  // Original error

    // Metadata
    Timestamp time.Time
    ErrorID   string // Unique ID for tracking
}
```

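The automatically detected origin fields can be filled from the Go call stack. The following is a minimal sketch of one way to do that with `runtime.Callers` and `runtime.CallersFrames`; it is not taken from `pkg/errors/structured_error.go`, and the `origin` type, `captureOrigin`, and `fetchBlock` are hypothetical names used only for illustration.

```go
package main

import (
    "fmt"
    "path/filepath"
    "runtime"
    "strings"
)

// origin holds the call-site fields that StructuredError tracks automatically.
// Hypothetical type, for illustration only.
type origin struct {
    File     string
    Function string
    Line     int
    Package  string
}

// captureOrigin records the location of its caller.
// skip = 0 means "the function that called captureOrigin".
func captureOrigin(skip int) origin {
    pcs := make([]uintptr, 1)
    // Skip 2 extra frames: runtime.Callers itself and captureOrigin.
    if runtime.Callers(skip+2, pcs) == 0 {
        return origin{File: "unknown", Function: "unknown"}
    }
    frame, _ := runtime.CallersFrames(pcs).Next()
    o := origin{File: frame.File, Line: frame.Line, Function: frame.Function}

    // frame.Function looks like "path/to/pkg.Func"; the package path ends
    // at the first dot after the last slash.
    slash := strings.LastIndex(frame.Function, "/")
    if dot := strings.Index(frame.Function[slash+1:], "."); dot >= 0 {
        o.Package = frame.Function[:slash+1+dot]
        o.Function = frame.Function[slash+1+dot+1:]
    }
    return o
}

// fetchBlock stands in for any function that creates an error.
//
//go:noinline
func fetchBlock() origin {
    return captureOrigin(0)
}

func main() {
    o := fetchBlock()
    fmt.Printf("%s:%d (%s) in package %s\n",
        filepath.Base(o.File), o.Line, o.Function, o.Package)
}
```

A constructor such as `NetworkError` would call a helper like this with a `skip` value that points past the constructor itself, so the recorded origin is the code that reported the error rather than the error package.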
### 2. Error Categories

| Category | Use For | Examples |
|----------|---------|----------|
| `CategoryNetwork` | RPC, DNS, TCP | Connection timeout, DNS failure, rate limit |
| `CategoryParsing` | ABI, JSON, data | ABI decode failure, invalid JSON, corrupt data |
| `CategoryValidation` | Input checks | Zero address, invalid amount, missing field |
| `CategoryExecution` | Transactions | TX reverted, gas estimation failed, nonce error |
| `CategoryConfiguration` | Config errors | Missing file, invalid YAML, wrong permissions |
| `CategoryMath` | Calculations | Overflow, division by zero, precision loss |
| `CategorySecurity` | Security issues | Unauthorized access, invalid signature |
| `CategoryDatabase` | DB operations | Connection pool exhausted, query timeout |
| `CategoryInternal` | Logic errors | Unexpected state, nil pointer, assertion failed |
| `CategoryExternal` | External APIs | Third-party API down, data feed failure |

### 3. Error Severities

| Severity | When to Use | Example |
|----------|-------------|---------|
| `SeverityDebug` | Diagnostic info | "Skipping dust amount (0.0001 ETH)" |
| `SeverityInfo` | Notable events | "Switched to backup RPC endpoint" |
| `SeverityWarning` | Potential issues | "Rate limit approaching (90/100 req/s)" |
| `SeverityError` | Actual errors | "Failed to parse transaction" |
| `SeverityCritical` | Critical errors | "All RPC endpoints down" |
| `SeverityFatal` | System cannot continue | "Fatal: Config file not found" |

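Because the severities form a ladder from least to most serious, one way to model them is as ordered integers, which makes threshold checks a single comparison. The sketch below assumes that representation; the source does not show how `ErrorSeverity` is actually defined (it could equally be a string type), and `shouldAlert` is a hypothetical helper.

```go
package main

import "fmt"

// ErrorSeverity is modeled here as an ordered integer. This is an assumption;
// the real type in pkg/errors may be defined differently.
type ErrorSeverity int

const (
    SeverityDebug ErrorSeverity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
    SeverityFatal
)

var severityNames = [...]string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "FATAL"}

// String returns the upper-case label used in log output.
func (s ErrorSeverity) String() string {
    if s < 0 || int(s) >= len(severityNames) {
        return "UNKNOWN"
    }
    return severityNames[s]
}

// shouldAlert reports whether severity s meets or exceeds the threshold.
// Hypothetical helper, for illustration only.
func shouldAlert(s, threshold ErrorSeverity) bool {
    return s >= threshold
}

func main() {
    for _, s := range []ErrorSeverity{SeverityWarning, SeverityError, SeverityCritical} {
        fmt.Printf("%-8s alert=%v\n", s, shouldAlert(s, SeverityCritical))
    }
}
```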
### 4. Logger Integration (`internal/logger/logger.go`)

Added new methods:
- `ErrorStructured(*pkgerrors.StructuredError)` - Log structured error
- `WarnStructured(*pkgerrors.StructuredError)` - Log structured warning

Logging outputs:
- **Main log**: Compact one-line format for quick scanning
- **Error log**: Full detailed format for debugging

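The two outputs can be produced by rendering the same error twice. The sketch below shows the idea with a trimmed-down stand-in type; `logEntry`, `compact`, `detailed`, and `sampleEntry` are hypothetical names, and the real formatting methods in `pkg/errors` may differ in both naming and layout.

```go
package main

import (
    "fmt"
    "strings"
)

// logEntry carries only the fields needed for this sketch; the real
// StructuredError has more.
type logEntry struct {
    Category, Severity      string
    Message, Reason, Action string
    File, Function          string
    Line                    int
}

// compact renders the one-line form intended for the main log.
func (e logEntry) compact() string {
    return fmt.Sprintf("[%s/%s] %s | Reason: %s | Action: %s | Origin: %s:%d",
        e.Category, e.Severity, e.Message, e.Reason, e.Action, e.File, e.Line)
}

// detailed renders the multi-line form intended for the error log.
func (e logEntry) detailed() string {
    var b strings.Builder
    fmt.Fprintf(&b, "%s/%s: %s\n", e.Category, e.Severity, e.Message)
    fmt.Fprintf(&b, "  Origin: %s:%d (%s)\n", e.File, e.Line, e.Function)
    fmt.Fprintf(&b, "  Reason: %s\n", e.Reason)
    fmt.Fprintf(&b, "  Action: %s\n", e.Action)
    return b.String()
}

// sampleEntry builds an entry with illustrative values.
func sampleEntry() logEntry {
    return logEntry{
        Category: "NETWORK", Severity: "ERROR",
        Message: "Test error", Reason: "Unit test", Action: "Testing error system",
        File: "main_test.go", Function: "TestErrorLogging", Line: 25,
    }
}

func main() {
    e := sampleEntry()
    fmt.Println(e.compact())
    fmt.Print(e.detailed())
}
```

Keeping both renderers on the error type means the logger only decides where each form goes, not how it is laid out.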
### 5. Helper Functions

```go
// Quick creation for common patterns
NetworkError("message")       // Network issues
ParsingError("message")       // Parsing failures
ValidationError("message")    // Validation failures
ExecutionError("message")     // Execution failures
ConfigurationError("message") // Config errors
MathError("message")          // Math errors
SecurityError("message")      // Security issues
```

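Each helper presets a category and returns a value whose `With*` methods can be chained. The following sketch shows how that builder pattern can be written; `structuredError`, `networkError`, and `buildSample` are hypothetical stand-ins, and whether the real `StructuredError` implements `Unwrap` is an assumption, not something the source confirms.

```go
package main

import (
    "errors"
    "fmt"
    "io"
)

// structuredError is a trimmed-down stand-in for StructuredError.
type structuredError struct {
    Category   string
    Message    string
    Reason     string
    Details    map[string]interface{}
    Underlying error
}

// networkError mirrors the NetworkError helper: it presets the category.
func networkError(msg string) *structuredError {
    return &structuredError{
        Category: "NETWORK",
        Message:  msg,
        Details:  map[string]interface{}{},
    }
}

// Each With* method sets one field and returns the receiver so calls chain.
func (e *structuredError) WithReason(r string) *structuredError {
    e.Reason = r
    return e
}

func (e *structuredError) WithDetail(k string, v interface{}) *structuredError {
    e.Details[k] = v
    return e
}

// Wrap attaches the original error.
func (e *structuredError) Wrap(err error) *structuredError {
    e.Underlying = err
    return e
}

// Error and Unwrap make the type usable with errors.Is and errors.As
// (assumed behavior for this sketch).
func (e *structuredError) Error() string {
    if e.Underlying != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Category, e.Message, e.Underlying)
    }
    return fmt.Sprintf("[%s] %s", e.Category, e.Message)
}

func (e *structuredError) Unwrap() error { return e.Underlying }

// buildSample chains the helpers the same way the usage examples do.
func buildSample() *structuredError {
    return networkError("Failed to fetch latest block").
        WithReason("connection closed early").
        WithDetail("retryCount", 3).
        Wrap(io.ErrUnexpectedEOF)
}

func main() {
    err := buildSample()
    fmt.Println(err)
    fmt.Println(errors.Is(err, io.ErrUnexpectedEOF))
}
```

Returning the receiver from every method is what lets a call site read as one expression that ends in `Wrap(err)`.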
## Usage Examples

### Network Error Example
```go
err := pkgerrors.NetworkError("RPC connection timeout").
    WithReason("TCP connection refused after 3 retry attempts").
    WithAction("Fetching pool reserves for arbitrage detection").
    WithImpact("Cannot calculate arbitrage opportunities, estimated loss: $50-100/hour").
    WithSuggestion("Check RPC endpoint status or switch to backup provider").
    WithDetail("endpoint", rpcURL).
    WithDetail("retryCount", 3).
    WithDetail("timeout", "30s").
    Wrap(originalErr)

logger.ErrorStructured(err)
```

### Parsing Error Example
```go
err := pkgerrors.ParsingError("Failed to decode Uniswap V3 swap").
    WithReason("ABI signature mismatch - pool uses non-standard Swap event").
    WithAction("Parsing swap transaction for profit calculation").
    WithImpact("This swap skipped, may miss arbitrage opportunity").
    WithSuggestion("Add ABI variant for this pool type or update pool detector").
    WithDetail("txHash", tx.Hash().Hex()).
    WithDetail("poolAddress", poolAddr.Hex()).
    WithDetail("expectedSig", "0x1c411e9a").
    WithDetail("actualSig", "0x9f2c64")

logger.ErrorStructured(err)
```

### Validation Error Example
```go
err := pkgerrors.ValidationError("Zero address detected in token pair").
    WithReason("Pool contract returned 0x000... for token0 address").
    WithAction("Validating pool data before adding to arbitrage scan").
    WithImpact("Pool excluded from opportunity detection").
    WithSuggestion("Pool may be incorrectly initialized - check deployment").
    WithDetail("poolAddress", pool.Hex()).
    WithDetail("token0", "0x0000000000000000000000000000000000000000").
    WithDetail("token1", token1.Hex())

logger.WarnStructured(err)
```

### Math Error Example
```go
err := pkgerrors.MathError("Profit margin overflow").
    WithReason("AmountOut too small (0.000001 ETH) causes division by near-zero").
    WithAction("Calculating profit margin for opportunity ranking").
    WithImpact("Opportunity rejected to prevent extreme values").
    WithSuggestion("Filter dust amounts (< 0.0001 ETH) before calculations").
    WithDetail("amountIn", "0.5 ETH").
    WithDetail("amountOut", "0.000001 ETH").
    WithDetail("profitMargin", "overflow")

logger.WarnStructured(err)
```

## Benefits

### 1. Debuggability
- **Know exactly where**: File, function, line automatically tracked
- **Know exactly why**: Reason field explains root cause
- **Know the context**: Action field explains what we were doing
- **Know the impact**: Impact field quantifies the damage

### 2. Monitoring & Alerting
- **Category-based alerts**: Alert on CRITICAL security errors
- **Pattern detection**: Find recurring network issues
- **Impact tracking**: Measure revenue loss from errors
- **Trend analysis**: Track error rates by category over time

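Because the compact format carries a `[CATEGORY/SEVERITY]` tag, category counts can be pulled straight out of the main log. The sketch below tallies lines per category with a regular expression; `countByCategory` is a hypothetical helper, and the sample lines are shortened illustrations modeled on the formats shown earlier, not real log output.

```go
package main

import (
    "fmt"
    "regexp"
)

// tagPattern matches the "[CATEGORY/SEVERITY]" tag of the compact format.
// A plain "[ERROR]" level marker has no slash and is not matched.
var tagPattern = regexp.MustCompile(`\[([A-Z]+)/([A-Z]+)\]`)

// countByCategory tallies log lines per category, ignoring untagged lines.
func countByCategory(lines []string) map[string]int {
    counts := make(map[string]int)
    for _, line := range lines {
        if m := tagPattern.FindStringSubmatch(line); m != nil {
            counts[m[1]]++ // m[1] is the category, m[2] the severity
        }
    }
    return counts
}

func main() {
    // Illustrative sample lines only.
    lines := []string{
        "2025/11/02 20:19:03 [ERROR] [NETWORK/ERROR] Failed to fetch latest block | Reason: ...",
        "2025/11/02 20:47:00 [ERROR] [NETWORK/ERROR] Test error | Reason: Unit test",
        "2025/11/02 20:19:03 [ERROR] Failed to get latest block: ...", // old unstructured call
    }
    for category, n := range countByCategory(lines) {
        fmt.Printf("%s: %d\n", category, n)
    }
}
```

Old-style lines without the tag are skipped, so a tally like this also gives a rough view of how much logging has been migrated.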
### 3. Troubleshooting
- **Self-service**: Users can understand errors without support
- **Actionable suggestions**: Every error includes next steps
- **Complete context**: All relevant data in Details map
- **Error IDs**: Track specific error instances across systems

### 4. Professional Quality
- **Production-ready**: Meets enterprise logging standards
- **Comprehensive**: All error information in one place
- **Structured**: Machine-readable for log aggregation
- **Human-readable**: Clear messages for developers

## Migration Strategy

### Phase 1: Critical Paths (Completed)
- ✅ Created error system (`pkg/errors/structured_error.go`)
- ✅ Extended logger (`internal/logger/logger.go`)
- ✅ Created migration guide
- ✅ Tested compilation

### Phase 2: High-Priority Components (Next)
1. **RPC/Network Layer** (`pkg/arbitrum/connection.go`, `pkg/transport/`)
   - All connection errors
   - DNS failures
   - Rate limits

2. **Parsing Layer** (`pkg/arbitrum/parser.go`, `pkg/events/`)
   - ABI decoding failures
   - Transaction parsing errors
   - Invalid data handling

3. **Execution Layer** (`pkg/arbitrage/executor.go`, `pkg/execution/`)
   - Transaction failures
   - Gas estimation errors
   - Revert handling

### Phase 3: Remaining Components
4. **Validation** (`pkg/validation/`)
5. **Math/Calculations** (`pkg/profitcalc/`, `pkg/math/`)
6. **Configuration** (`internal/config/`)
7. **Database** (`pkg/arbitrage/database.go`)

## Statistics

- **164 Error() calls** across 66 files
- **280 Warn() calls** across 88 files
- **Total**: 444 error and warning logging calls to migrate

## File Changes

### New Files
1. `/pkg/errors/structured_error.go` (370 lines)
   - StructuredError type
   - Error categories and severities
   - Helper functions
   - Formatting methods

2. `/docs/STRUCTURED_ERROR_LOGGING_GUIDE.md` (485 lines)
   - Usage guide
   - Migration examples
   - Best practices
   - Category reference

3. `/docs/ERROR_LOGGING_SYSTEM_IMPLEMENTATION.md` (this file)
   - Implementation overview
   - Migration strategy
   - Benefits and rationale

### Modified Files
1. `/internal/logger/logger.go`
   - Added `ErrorStructured()` method
   - Added `WarnStructured()` method
   - Imported `pkg/errors` package

## Testing

### Build Verification
```bash
go build -o mev-bot ./cmd/mev-bot
# ✅ Build successful with new error system
```

### Usage Test
```go
// Test structured error creation and logging
err := pkgerrors.NetworkError("Test error").
    WithReason("Unit test").
    WithAction("Testing error system").
    WithImpact("No impact - test only").
    WithSuggestion("Ignore this test error")

logger.ErrorStructured(err)
```

### Expected Output
```
Main Log:
2025/11/02 20:47:00 [ERROR] [NETWORK/ERROR] Test error | Reason: Unit test | Action: Testing error system | Origin: main_test.go:25

Error Log:
2025/11/02 20:47:00 [ERROR] [ERR-1730584820-NETWORK] NETWORK/ERROR: Test error
  Origin: /path/to/main_test.go:25 (TestErrorLogging)
  ErrorID: ERR-1730584820-NETWORK
  Timestamp: 2025-11-02T20:47:00Z
  Reason: Unit test
  Action: Testing error system
  Impact: No impact - test only
  Suggestion: Ignore this test error
```

## Next Steps

1. **Immediate**: Start using `ErrorStructured()` for all new code
2. **Short-term**: Migrate critical path errors (RPC, parsing, execution)
3. **Medium-term**: Migrate remaining error calls
4. **Long-term**: Add error rate monitoring and alerting

## Backward Compatibility

- ✅ Old `logger.Error()` calls still work
- ✅ No breaking changes to existing code
- ✅ Gradual migration supported
- ✅ Both systems can coexist

## Documentation

- [Usage Guide](./STRUCTURED_ERROR_LOGGING_GUIDE.md)
- [Migration Examples](./STRUCTURED_ERROR_LOGGING_GUIDE.md#migration-from-old-to-new)
- [Best Practices](./STRUCTURED_ERROR_LOGGING_GUIDE.md#best-practices)
- [Category Reference](./STRUCTURED_ERROR_LOGGING_GUIDE.md#error-categories-reference)

## Conclusion

The structured error logging system is **fully implemented** and **production-ready**; the remaining work is migrating existing call sites (Phases 2 and 3). Every error can now include:

- ✅ **Reason** - Why it happened (developer-provided)
- ✅ **Origin** - Where it happened (auto-tracked: file, function, line)
- ✅ **Context** - What we were doing (developer-provided)
- ✅ **Category** - Type of error (developer-selected)
- ✅ **Severity** - How critical (developer-selected)
- ✅ **Impact** - What it affects (developer-provided)
- ✅ **Suggestion** - How to fix (developer-provided)
- ✅ **Details** - Additional data (developer-provided)

**No more anonymous errors**. Every error tells a complete story.