Files
mev-beta/docs/WATCH_SCRIPT_ERROR_DISPLAY_FIX.md

158 lines
4.7 KiB
Markdown

# Watch Script Error Display Fix - November 2, 2025
## Problem
The live monitoring script (`scripts/watch-live.sh`) was showing anonymous error messages like:
```
[2025/11/02 20:19:03] ❌ ERROR #2
[2025/11/02 20:19:03] ❌ ERROR #3
[2025/11/02 20:19:03] ❌ ERROR #4
```
Without displaying the actual error details, making it impossible to diagnose issues.
## Root Cause
### Issue 1: Incorrect Error Message Extraction Pattern
The watch script used this pattern to extract error messages:
```bash
ERROR_MSG=$(echo "$line" | grep -oP 'error[=:].*' | head -c 80)
```
**Problem**: This pattern only matched lines containing lowercase `error:` or `error=`, but the actual log format was:
```
2025/11/02 20:19:03 [ERROR] ❌ CRITICAL: Failed to create Arbitrum monitor...
```
The pattern failed because:
1. The log uses `[ERROR]` tag, not `error:` or `error=`
2. No fallback patterns to extract the message after the tag
### Issue 2: Actual Errors Were DNS Resolution Failures
The anonymous errors were actually:
```
[ERROR] ❌ CRITICAL: Failed to create Arbitrum monitor: failed to create contract executor:
failed to get chain ID: Post "https://arb1.arbitrum.io/rpc": dial tcp:
lookup arb1.arbitrum.io: Temporary failure in name resolution
```
**Root Cause**: DNS resolution failure for `arb1.arbitrum.io` - the system couldn't resolve the Arbitrum RPC endpoint hostname.
## Solution
### Fix 1: Improved Error Message Extraction
Updated the watch script to use a **multi-pattern extraction strategy**:
```bash
# Try pattern 1: "error:" or "error=" (original pattern)
ERROR_MSG=$(echo "$line" | grep -oP 'error[=:].*' | head -c 100)
# Try pattern 2: Extract message after [ERROR] or [WARN] tag
if [[ -z "$ERROR_MSG" ]]; then
ERROR_MSG=$(echo "$line" | sed -n 's/.*\[ERROR\]\s*//p' | sed -n 's/.*\[WARN\]\s*//p' | head -c 100)
fi
# Try pattern 3: Extract everything after timestamp and log level
if [[ -z "$ERROR_MSG" ]]; then
ERROR_MSG=$(echo "$line" | sed -E 's/^[0-9]{4}\/[0-9]{2}\/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} \[(ERROR|WARN)\] //' | head -c 100)
fi
```
### Fix 2: Better Error Categorization
Added DNS/network error detection:
```bash
elif echo "$line" | grep -qi "DNS\|name resolution\|lookup.*failed"; then
RPC_ERRORS=$((RPC_ERRORS + 1))
echo -e "${RED}[$TIMESTAMP] ❌ DNS/NETWORK ERROR #$RPC_ERRORS${NC}"
```
### Fix 3: Increased Message Length
Changed from 80 to 100 characters to show more context:
```bash
head -c 100 # Was: head -c 80
```
## Before vs After
### Before (Anonymous Errors)
```
[2025/11/02 20:19:03] ❌ ERROR #2
[2025/11/02 20:19:03] ❌ ERROR #3
[2025/11/02 20:19:03] ❌ ERROR #4
```
### After (With Context)
```
[2025/11/02 20:19:03] ❌ DNS/NETWORK ERROR #2
⚠️ ❌ CRITICAL: Failed to create Arbitrum monitor: failed to create contract executor: failed to get c
[2025/11/02 20:19:03] ❌ CONNECTION ERROR #3
⚠️ ❌ Failed to get latest block: Post "https://arb1.arbitrum.io/rpc": dial tcp: lookup arb1.arbi
[2025/11/02 20:19:07] ❌ CONNECTION ERROR #4
⚠️ ❌ Failed to get latest block: Post "https://arb1.arbitrum.io/rpc": dial tcp: lookup arb1.arbi
```
## Testing
Test the extraction with a sample log line:
```bash
echo "2025/11/02 20:19:03 [ERROR] ❌ CRITICAL: Failed to create Arbitrum monitor: failed to get chain ID" | \
bash -c 'line="$(cat)"; ERROR_MSG=$(echo "$line" | sed -E "s/^[0-9]{4}\/[0-9]{2}\/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} \[(ERROR|WARN)\] //" | head -c 100); echo "$ERROR_MSG"'
```
Expected output:
```
❌ CRITICAL: Failed to create Arbitrum monitor: failed to get chain ID
```
## DNS Resolution Issue
The underlying cause of the 135+ errors was DNS failure:
**Problem**: System couldn't resolve `arb1.arbitrum.io`
**Solutions**:
1. Check DNS configuration: `cat /etc/resolv.conf`
2. Test DNS resolution: `nslookup arb1.arbitrum.io`
3. Use alternative RPC endpoint with IP address or different hostname
4. Add DNS fallback in provider configuration
## Files Modified
- `/home/administrator/projects/mev-beta/scripts/watch-live.sh`
## Benefits
1. **Visibility**: Errors now show actual error messages
2. **Debugging**: Can identify DNS, parsing, connection issues immediately
3. **Categorization**: Errors are labeled (DNS, CONNECTION, PARSING, etc.)
4. **Context**: 100 characters shows enough detail to understand the issue
## Future Improvements
1. **Full Message Display**: Option to show complete error (not truncated)
2. **Error Deduplication**: Group identical repeated errors
3. **Error Rate Tracking**: Show errors/minute metric
4. **Color Coding**: Different colors for different error types
## Verification
Run the watch script and verify error messages now display:
```bash
./scripts/watch-live.sh
```
You should now see detailed error messages instead of anonymous "ERROR #X".