Structured Error Logging Guide

Overview

Every error in the MEV bot must now include:

  1. Reason - Why the error occurred (root cause)
  2. Origin - Where it happened (file, function, line - automatically tracked)
  3. Context - What we were trying to do
  4. Category - Type of error (Network, Parsing, Validation, etc.)
  5. Severity - How critical the error is
  6. Details - Additional structured data
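
The guide says Origin is tracked automatically but does not show how. One plausible mechanism in Go is the runtime call-stack API. The sketch below is an assumption about how such tracking could work, not the actual pkg/errors code; the `Origin` type and `captureOrigin` function are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"runtime"
)

// Origin records where an error was created. The type and its fields are
// hypothetical; the real pkg/errors implementation may differ.
type Origin struct {
	File     string
	Line     int
	Function string
}

// captureOrigin walks the call stack. With skip=0 it reports the function
// that called captureOrigin; a constructor such as NetworkError would pass
// a larger skip so the origin points at user code instead of the constructor.
func captureOrigin(skip int) Origin {
	pcs := make([]uintptr, 1)
	// Frame 0 is runtime.Callers, frame 1 is captureOrigin, frame 2 is its caller.
	if runtime.Callers(skip+2, pcs) == 0 {
		return Origin{File: "unknown", Function: "unknown"}
	}
	frame, _ := runtime.CallersFrames(pcs).Next()
	return Origin{
		File:     filepath.Base(frame.File),
		Line:     frame.Line,
		Function: frame.Function,
	}
}

func (o Origin) String() string {
	return fmt.Sprintf("%s:%d (%s)", o.File, o.Line, o.Function)
}

// connectToRPC stands in for application code that creates an error.
func connectToRPC() Origin {
	return captureOrigin(0)
}

func main() {
	fmt.Println("Origin:", connectToRPC())
}
```

Using `runtime.CallersFrames` rather than `runtime.FuncForPC` keeps the reported function name accurate when the compiler inlines small functions.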

Quick Start

Before (Old Way - BAD)

// ❌ NO CONTEXT - Don't do this anymore
logger.Error("Failed to get latest block")

// ❌ MINIMAL CONTEXT - Still not enough
logger.Error("Failed to get latest block:", err)

After (New Way - GOOD)

import pkgerrors "github.com/fraktal/mev-beta/pkg/errors"

// ✅ FULL CONTEXT - Do this instead
logger.ErrorStructured(
    pkgerrors.NetworkError("Failed to fetch latest block").
        WithReason("RPC endpoint returned 429 rate limit").
        WithAction("Polling for new blocks to detect MEV opportunities").
        WithImpact("Block processing delayed, may miss time-sensitive arbitrage opportunities").
        WithSuggestion("Reduce polling frequency or use backup RPC endpoint").
        WithDetail("endpoint", rpcURL).
        WithDetail("blockNumber", lastBlock).
        Wrap(err),
)
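
For readers wondering how a chain like the one above can work, here is a minimal sketch of a fluent builder whose `Wrap` keeps the original error reachable. It is an illustration under assumptions: the `StructuredError` fields and the `Unwrap` method are hypothetical, and only a few of the `With*` methods are reproduced.

```go
package main

import (
	"errors"
	"fmt"
)

// StructuredError is a hypothetical stand-in for the type returned by
// pkgerrors.NetworkError and friends; the field names are assumptions.
type StructuredError struct {
	Category   string
	Message    string
	Reason     string
	Action     string
	Details    map[string]any
	Underlying error
}

// NetworkError creates an error in the NETWORK category.
func NetworkError(msg string) *StructuredError {
	return &StructuredError{Category: "NETWORK", Message: msg, Details: map[string]any{}}
}

// Each With* method mutates the receiver and returns it so calls can chain.
func (e *StructuredError) WithReason(r string) *StructuredError { e.Reason = r; return e }
func (e *StructuredError) WithAction(a string) *StructuredError { e.Action = a; return e }
func (e *StructuredError) WithDetail(k string, v any) *StructuredError {
	e.Details[k] = v
	return e
}

// Wrap attaches the original error instead of discarding it.
func (e *StructuredError) Wrap(err error) *StructuredError { e.Underlying = err; return e }

// Error satisfies the error interface.
func (e *StructuredError) Error() string {
	if e.Underlying != nil {
		return fmt.Sprintf("%s: %s: %v", e.Category, e.Message, e.Underlying)
	}
	return fmt.Sprintf("%s: %s", e.Category, e.Message)
}

// Unwrap lets errors.Is and errors.As see through to the wrapped error.
func (e *StructuredError) Unwrap() error { return e.Underlying }

var errRateLimited = errors.New("429 too many requests")

func main() {
	var err error = NetworkError("Failed to fetch latest block").
		WithReason("RPC endpoint returned 429 rate limit").
		WithAction("Polling for new blocks").
		WithDetail("blockNumber", 12345).
		Wrap(errRateLimited)

	fmt.Println(err)
	fmt.Println(errors.Is(err, errRateLimited)) // true: the chain is preserved
}
```

If the real type exposes `Unwrap` in this way, callers can keep using `errors.Is` and `errors.As` for retry decisions while the logger consumes the structured fields.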

Error Categories

Network Errors

err := pkgerrors.NetworkError("DNS resolution failed").
    WithReason("Nameserver timeout for arb1.arbitrum.io").
    WithAction("Connecting to Arbitrum RPC endpoint").
    WithImpact("Cannot fetch blockchain data, all MEV operations suspended").
    WithSuggestion("Check DNS configuration in /etc/resolv.conf or use IP address").
    WithDetail("hostname", "arb1.arbitrum.io").
    WithDetail("nameserver", "8.8.8.8").
    Wrap(originalErr)

logger.ErrorStructured(err)

Output:

Main log (compact):
2025/11/02 20:19:03 [ERROR] [NETWORK/ERROR] DNS resolution failed | Reason: Nameserver timeout for arb1.arbitrum.io | Action: Connecting to Arbitrum RPC endpoint | Origin: pkg/arbitrum/connection.go:142 | Underlying: lookup arb1.arbitrum.io: i/o timeout

Error log (detailed):
2025/11/02 20:19:03 [ERROR] [ERR-1730584743-NETWORK] NETWORK/ERROR: DNS resolution failed
  Origin: pkg/arbitrum/connection.go:142 (ConnectToRPC)
  ErrorID: ERR-1730584743-NETWORK
  Timestamp: 2025-11-02T20:19:03Z
  Reason: Nameserver timeout for arb1.arbitrum.io
  Action: Connecting to Arbitrum RPC endpoint
  Impact: Cannot fetch blockchain data, all MEV operations suspended
  Suggestion: Check DNS configuration in /etc/resolv.conf or use IP address
  Details:
    - hostname: arb1.arbitrum.io
    - nameserver: 8.8.8.8
  Underlying: lookup arb1.arbitrum.io: i/o timeout

Parsing Errors

err := pkgerrors.ParsingError("Failed to decode swap event").
    WithReason("ABI signature mismatch - expected Swap(address,address,int256,int256,uint160,uint128,int24) but got different signature").
    WithAction("Parsing Uniswap V3 swap transaction for arbitrage detection").
    WithImpact("This swap will not be considered for arbitrage opportunities").
    WithSuggestion("Update ABI definition or add support for this swap variant").
    WithDetail("txHash", "0x1234...").
    WithDetail("poolAddress", "0xabcd...").
    WithDetail("expectedSig", "0xc42079f9").
    WithDetail("actualSig", "0x9f2c64").
    Wrap(abiErr)

logger.ErrorStructured(err)

Validation Errors

err := pkgerrors.ValidationError("Invalid token pair detected").
    WithReason("Token0 address is zero address (0x0000...)").
    WithAction("Validating swap event before profit calculation").
    WithImpact("Skipping this opportunity to avoid calculation errors").
    WithSuggestion("Fix pool detection logic to exclude invalid pools").
    WithDetail("token0", zeroAddress.Hex()).
    WithDetail("token1", token1.Hex()).
    WithDetail("poolAddress", pool.Hex())

logger.WarnStructured(err)

Execution Errors

err := pkgerrors.ExecutionError("Transaction reverted on-chain").
    WithReason("Insufficient liquidity in target pool at execution time").
    WithAction("Executing flash loan arbitrage transaction").
    WithImpact("Lost gas fees (~0.00008 ETH), no profit captured").
    WithSuggestion("Increase slippage tolerance or implement pre-execution simulation").
    WithDetail("txHash", tx.Hash().Hex()).
    WithDetail("gasUsed", receipt.GasUsed).
    WithDetail("revertReason", revertMsg).
    WithDetail("estimatedProfit", "0.015 ETH").
    WithDetail("actualLoss", "0.00008 ETH")

logger.ErrorStructured(err)

Math/Calculation Errors

err := pkgerrors.MathError("Profit margin calculation overflow").
    WithReason("AmountOut too small (0.000001 ETH), division by near-zero causes overflow").
    WithAction("Calculating profit margin for arbitrage opportunity").
    WithImpact("Opportunity rejected to prevent extreme values in logs").
    WithSuggestion("Add minimum amount threshold of 0.0001 ETH before calculations").
    WithDetail("amountIn", "0.5 ETH").
    WithDetail("amountOut", "0.000001 ETH").
    WithDetail("netProfit", "-0.00008 ETH")

logger.WarnStructured(err)

Configuration Errors

err := pkgerrors.ConfigurationError("Invalid RPC configuration").
    WithReason("providers_runtime.yaml missing required 'url' field for primary provider").
    WithAction("Loading RPC provider configuration at startup").
    WithImpact("Cannot connect to blockchain, bot will not start").
    WithSuggestion("Add 'url' field to primary provider configuration").
    WithDetail("configFile", "config/providers_runtime.yaml").
    WithDetail("provider", "primary").
    Wrap(configErr)

logger.ErrorStructured(err)

Helper Functions

Quick Error Creation

// For common patterns, use the helper constructors.
// Each result needs its own name: repeating `err :=` in one scope does not compile.
var (
    netErr      = pkgerrors.NetworkError("Connection timeout")
    parseErr    = pkgerrors.ParsingError("ABI decode failed")
    validateErr = pkgerrors.ValidationError("Invalid input")
    execErr     = pkgerrors.ExecutionError("Transaction reverted")
    configErr   = pkgerrors.ConfigurationError("Missing config file")
    mathErr     = pkgerrors.MathError("Division by zero")
    securityErr = pkgerrors.SecurityError("Unauthorized access")
)

Custom Categories and Severities

err := pkgerrors.NewStructuredError(
    pkgerrors.CategoryDatabase,
    pkgerrors.SeverityCritical,
    "Failed to save opportunity to database",
).
    WithReason("Connection pool exhausted, all 10 connections in use").
    WithAction("Persisting arbitrage opportunity for analysis").
    WithImpact("Opportunity data will be lost, cannot track historical performance").
    WithSuggestion("Increase database connection pool size or reduce write frequency")

Migration from Old to New

Pattern 1: Simple Error

// OLD
logger.Error("Failed to parse transaction", "error", err)

// NEW
logger.ErrorStructured(
    pkgerrors.ParsingError("Failed to parse transaction").
        WithReason("Transaction data is incomplete or corrupted").
        WithAction("Parsing pending transaction from mempool").
        WithImpact("Transaction skipped, may miss MEV opportunity").
        Wrap(err),
)

Pattern 2: Error with Context

// OLD
logger.Error(fmt.Sprintf("Pool %s validation failed: %v", poolAddr, err))

// NEW
logger.ErrorStructured(
    pkgerrors.ValidationError("Pool validation failed").
        WithReason("Pool reserves returned zero values").
        WithAction("Validating pool before adding to arbitrage scan").
        WithImpact("Pool excluded from opportunity detection").
        WithSuggestion("Check if pool is active and has liquidity").
        WithDetail("poolAddress", poolAddr.Hex()).
        Wrap(err),
)

Pattern 3: Warning

// OLD
logger.Warn("Rate limit exceeded")

// NEW
logger.WarnStructured(
    pkgerrors.NetworkError("RPC rate limit exceeded").
        WithReason("Exceeded 100 requests per second quota").
        WithAction("Fetching pool data for arbitrage detection").
        WithImpact("Reduced scanning speed, may miss fast opportunities").
        WithSuggestion("Implement request batching or use backup endpoint").
        WithDetail("endpoint", rpcURL).
        WithDetail("requestCount", reqCount).
        WithDetail("timeWindow", "1s"),
)

Best Practices

1. Always Provide Reason

// ❌ BAD
WithReason("error occurred")

// ✅ GOOD
WithReason("TCP connection refused - RPC endpoint is down or firewalled")

2. Be Specific in Actions

// ❌ BAD
WithAction("processing data")

// ✅ GOOD
WithAction("Fetching Uniswap V3 pool reserves for profit calculation")

3. Describe Real Impact

// ❌ BAD
WithImpact("something might break")

// ✅ GOOD
WithImpact("Arbitrage detection stopped, estimated revenue loss: $50-100/hour")

4. Give Actionable Suggestions

// ❌ BAD
WithSuggestion("fix the problem")

// ✅ GOOD
WithSuggestion("Restart with PROVIDER_CONFIG_PATH pointing to valid providers_runtime.yaml")

5. Add Relevant Details

// ✅ GOOD
WithDetail("txHash", tx.Hash().Hex()).
WithDetail("blockNumber", blockNum).
WithDetail("gasPrice", gasPrice.String()).
WithDetail("poolAddress", pool.Hex()).
WithDetail("attemptNumber", retryCount)

Output Format

Compact (Main Log)

[NETWORK/ERROR] DNS resolution failed | Reason: Nameserver timeout | Action: Connecting to RPC | Origin: connection.go:142 | Underlying: i/o timeout

Detailed (Error Log)

[ERR-1730584743-NETWORK] NETWORK/ERROR: DNS resolution failed
  Origin: pkg/arbitrum/connection.go:142 (ConnectToRPC)
  ErrorID: ERR-1730584743-NETWORK
  Timestamp: 2025-11-02T20:19:03Z
  Reason: Nameserver timeout for arb1.arbitrum.io
  Action: Connecting to Arbitrum RPC endpoint
  Impact: Cannot fetch blockchain data, all MEV operations suspended
  Suggestion: Check DNS configuration or use IP address
  Details:
    - hostname: arb1.arbitrum.io
  Underlying: lookup arb1.arbitrum.io: i/o timeout
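
The two formats can be produced from the same set of fields. The following sketch reproduces the compact line and the Details block shown above; the `entry` struct and both function names are hypothetical, and the real logger may assemble its output differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry holds the fields that appear in the sample output. It is a
// hypothetical stand-in for whatever pkg/errors stores internally.
type entry struct {
	Category, Severity, Message string
	Reason, Action, Origin      string
	Underlying                  string
	Details                     map[string]string
}

// compact renders the one-line form, omitting fields that are empty.
func compact(e entry) string {
	parts := []string{fmt.Sprintf("[%s/%s] %s", e.Category, e.Severity, e.Message)}
	for _, f := range []struct{ label, value string }{
		{"Reason", e.Reason},
		{"Action", e.Action},
		{"Origin", e.Origin},
		{"Underlying", e.Underlying},
	} {
		if f.value != "" {
			parts = append(parts, f.label+": "+f.value)
		}
	}
	return strings.Join(parts, " | ")
}

// detailLines renders the Details block with sorted keys, because Go map
// iteration order is random and log output should be stable.
func detailLines(e entry) string {
	keys := make([]string, 0, len(e.Details))
	for k := range e.Details {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString("  Details:\n")
	for _, k := range keys {
		fmt.Fprintf(&b, "    - %s: %s\n", k, e.Details[k])
	}
	return b.String()
}

func main() {
	e := entry{
		Category: "NETWORK", Severity: "ERROR", Message: "DNS resolution failed",
		Reason: "Nameserver timeout", Action: "Connecting to RPC",
		Origin: "connection.go:142", Underlying: "i/o timeout",
		Details: map[string]string{"nameserver": "8.8.8.8", "hostname": "arb1.arbitrum.io"},
	}
	fmt.Println(compact(e))
	fmt.Print(detailLines(e))
}
```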

Error Categories Reference

| Category | Default Severity | Use For |
|---|---|---|
| CategoryNetwork | ERROR | RPC, DNS, connection issues |
| CategoryParsing | ERROR | ABI decoding, transaction parsing |
| CategoryValidation | WARNING | Input validation, data validation |
| CategoryExecution | CRITICAL | Transaction execution, contract calls |
| CategoryConfiguration | CRITICAL | Config loading, invalid settings |
| CategoryDatabase | ERROR | Database operations |
| CategorySecurity | CRITICAL | Security violations, unauthorized access |
| CategoryMath | ERROR | Arithmetic errors, overflow/underflow |
| CategoryInternal | ERROR | Internal logic errors, unexpected state |
| CategoryExternal | ERROR | External service failures |
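
The severities in this reference act as defaults for the helper constructors, while NewStructuredError lets the caller choose a severity explicitly. A lookup like the one below could back those defaults. This is a sketch under assumptions: the string values other than NETWORK, the `severityFor` name, and the fallback for unknown categories are all hypothetical.

```go
package main

import "fmt"

// Category and Severity are simplified stand-ins for the pkg/errors types.
type Category string
type Severity string

const (
	SeverityWarning  Severity = "WARNING"
	SeverityError    Severity = "ERROR"
	SeverityCritical Severity = "CRITICAL"
)

// defaultSeverity mirrors the reference list above. Only "NETWORK" appears
// in the guide's sample output; the other string values are assumed.
var defaultSeverity = map[Category]Severity{
	"NETWORK":       SeverityError,
	"PARSING":       SeverityError,
	"VALIDATION":    SeverityWarning,
	"EXECUTION":     SeverityCritical,
	"CONFIGURATION": SeverityCritical,
	"DATABASE":      SeverityError,
	"SECURITY":      SeverityCritical,
	"MATH":          SeverityError,
	"INTERNAL":      SeverityError,
	"EXTERNAL":      SeverityError,
}

// severityFor returns the default severity for a category. Falling back to
// ERROR for unknown categories is an assumption made for this sketch.
func severityFor(c Category) Severity {
	if s, ok := defaultSeverity[c]; ok {
		return s
	}
	return SeverityError
}

func main() {
	for _, c := range []Category{"VALIDATION", "EXECUTION", "UNKNOWN"} {
		fmt.Printf("%s -> %s\n", c, severityFor(c))
	}
}
```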

Testing

# Build with new error system
go build -o mev-bot ./cmd/mev-bot

# Check error log for structured format
tail -f logs/mev_bot_errors.log

# Verify all errors have:
# - Category/Severity
# - Reason
# - Action
# - Origin (file:line)
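
The manual checklist above can be automated for the compact format. The checker below is a hypothetical sketch, not part of the repository; its patterns are inferred from the sample log lines in this guide and may need adjusting to the real output.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRe matches the [CATEGORY/SEVERITY] tag seen in the compact format.
// A plain level tag such as [ERROR] has no slash and does not match.
var tagRe = regexp.MustCompile(`\[[A-Z]+/[A-Z]+\]`)

// originRe matches "Origin: <path>.go:<line>".
var originRe = regexp.MustCompile(`Origin: \S+\.go:\d+`)

// missingFields reports which required parts a compact log line lacks.
func missingFields(line string) []string {
	var missing []string
	if !tagRe.MatchString(line) {
		missing = append(missing, "Category/Severity")
	}
	for _, label := range []string{"Reason", "Action"} {
		if !strings.Contains(line, label+": ") {
			missing = append(missing, label)
		}
	}
	if !originRe.MatchString(line) {
		missing = append(missing, "Origin")
	}
	return missing
}

func main() {
	lines := []string{
		"2025/11/02 20:19:03 [ERROR] [NETWORK/ERROR] DNS resolution failed | Reason: Nameserver timeout | Action: Connecting to RPC | Origin: connection.go:142",
		"2025/11/02 20:19:03 [ERROR] Failed to get latest block",
	}
	for i, line := range lines {
		if m := missingFields(line); len(m) > 0 {
			fmt.Printf("line %d missing: %s\n", i+1, strings.Join(m, ", "))
		} else {
			fmt.Printf("line %d ok\n", i+1)
		}
	}
}
```

Feeding each line of the main log through `missingFields` gives a quick count of call sites that still use the old unstructured style.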

Benefits

  1. Debuggability: Know exactly where and why each error occurred
  2. Monitoring: Alert on specific error categories
  3. Analytics: Track error patterns over time
  4. Troubleshooting: Users can quickly understand and fix issues
  5. Professionalism: Production-grade error reporting

Next Steps

  1. Gradually migrate existing logger.Error() calls to logger.ErrorStructured()
  2. Add error categorization to all new code
  3. Update error handling in critical paths first (RPC, parsing, execution)
  4. Monitor error logs for patterns and improve error messages