# Pool Data Errors - Root Cause Analysis & Fix Plan

**Date**: November 3, 2025
**Status**: Active Investigation
**Impact**: High - Affecting opportunity executability validation

## Executive Summary

Pool data errors are preventing the system from validating opportunities as executable. Currently, **347+ opportunities have been detected but 0 are marked as executable**; all are rejected because pool reserve data cannot be fetched.

### Key Finding
**No opportunities are executable because pool validation fails silently** instead of returning error messages that downstream filtering can act on.

---

## Root Cause Analysis

### 1. **Primary Blocker: RPC Connection Failures**

**Evidence:**
```
Post "https://arb1.arbitrum.io/rpc": dial tcp: lookup arb1.arbitrum.io: Temporary failure in name resolution
```

**Impact:**
- Primary RPC endpoint unreachable
- System drops into fallback mode (basic block polling)
- Pool data cannot be fetched

**Current Status:** INTERMITTENT (failing 12:00-12:12, recovered by 14:11)

---

### 2. **Secondary: Batch Pool Data Fetch Timeouts**

**Evidence:**
```
[WARN] Batch fetch failed for 0x42FC852A750BA93D5bf772ecdc857e87a86403a9:
no data returned for pool - recording failure
[WARN] Failed to fetch batch 0-1: batch fetch V3 data failed:
Post "https://arb1.arbitrum.io/rpc": context deadline exceeded
```

**Root Cause:**
- Batch fetcher uses a 10-second timeout
- Network latency combined with RPC overload causes frequent timeouts
- Pools are queried one at a time instead of in a true batch

**Affected Code:** `pkg/datafetcher/batch_fetcher.go`

**Impact:**
- Legitimate pools fail due to timeouts
- The same pools are retried repeatedly (inefficient)
- Pools are blacklisted prematurely

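The root cause list above notes that pools are queried one at a time. One way to get a true batch, assuming the RPC endpoint accepts JSON-RPC 2.0 batch requests (not confirmed for this deployment), is to send an array of `eth_call` requests in a single POST. The sketch below only builds such a payload for the `token0()` and `token1()` getters. `BuildBatch`, `rpcRequest`, and `callArgs` are hypothetical names and are not taken from `batch_fetcher.go`; the project's own multicall path may batch differently.

```go
package main

import (
	"encoding/json"
)

// rpcRequest is one JSON-RPC 2.0 request object.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// callArgs is the transaction object passed to eth_call.
type callArgs struct {
	To   string `json:"to"`
	Data string `json:"data"`
}

// 4-byte function selectors for the pool getters queried here.
const (
	selectorToken0 = "0x0dfe1681" // token0()
	selectorToken1 = "0xd21220a7" // token1()
)

// BuildBatch (hypothetical helper) returns one JSON-RPC batch body that
// queries token0() and token1() on every pool, so N pools cost one HTTP
// round trip instead of 2*N. IDs are assigned sequentially so responses
// can be matched back to requests.
func BuildBatch(pools []string) ([]byte, error) {
	reqs := make([]rpcRequest, 0, 2*len(pools))
	for _, pool := range pools {
		for _, sel := range []string{selectorToken0, selectorToken1} {
			reqs = append(reqs, rpcRequest{
				JSONRPC: "2.0",
				ID:      len(reqs),
				Method:  "eth_call",
				Params:  []interface{}{callArgs{To: pool, Data: sel}, "latest"},
			})
		}
	}
	return json.Marshal(reqs)
}
```
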
---

### 3. **Tertiary: Division by Zero in Smart Contracts**

**Evidence:**
```
[WARN] Failed to fetch batch 0-1: batch fetch V3 data failed:
execution reverted: division or modulo by zero
```

**Root Causes:**
- Querying uninitialized or zero-liquidity pools
- Non-standard pool implementations (for example, a broken `fee()` function)
- Smart contract state inconsistencies on L2

**Affected Pools:** ~10-15 pools (from 2025-11-02 logs)

---

### 4. **Quaternary: Non-Standard Pool Implementations**

**Evidence:**
```
[ERROR] Error getting pool data for 0xC6962004f452bE9203591991D15f6b388e09E8D0:
pool ...is blacklisted: failed to call token1() - non-standard pool contract
```

**Issue:**
- Some pools don't implement the standard pool interface (`token0()` and `token1()` are pool getters, not part of ERC-20)
- `token0()` and `token1()` calls fail
- There is no graceful fallback to skip these pools

**Current Handling:** Blacklisting, which is correct, but the error message suggests these pools could be filtered out earlier

---

## Why All Opportunities Show "Not Executable"

### Call Chain:
1. Swap event detected ✅
2. Opportunity analyzed ✅
3. **Pool validation triggered for executability check**
   - Attempts to fetch reserve data
   - RPC call fails or times out
   - **Executability defaults to false**
4. Opportunity logged with `isExecutable:false`

### The Critical Issue:
When pool data can't be fetched, the system **doesn't return proper error context** for intelligent filtering. Instead, it:
- Returns nil reserves
- Marks the opportunity as non-executable
- Doesn't distinguish between:
  - "Pool doesn't exist" (skip)
  - "RPC timeout" (retry)
  - "Non-standard pool" (blacklist)

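The three failure classes listed above can be made distinguishable with sentinel errors and `errors.Is`. This is a minimal sketch, not the scanner's actual code: the error variables, the `Action` type, and both function names are hypothetical. Treating unknown errors as retryable is also an assumption, chosen so that a transient failure never causes a permanent blacklist entry.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical sentinel errors, one per failure class.
var (
	ErrPoolNotFound    = errors.New("pool not found")
	ErrNonStandardPool = errors.New("non-standard pool contract")
	ErrRPCTimeout      = errors.New("rpc timeout")
)

// Action tells the caller what to do with a pool after a fetch attempt.
type Action string

const (
	ActionProceed   Action = "proceed"
	ActionSkip      Action = "skip"
	ActionRetry     Action = "retry"
	ActionBlacklist Action = "blacklist"
)

// WrapFetchError adds the pool address to the message while keeping the
// underlying cause inspectable through errors.Is.
func WrapFetchError(pool string, cause error) error {
	return fmt.Errorf("fetch pool %s: %w", pool, cause)
}

// ClassifyFetchError maps a fetch error to an action.
func ClassifyFetchError(err error) Action {
	switch {
	case err == nil:
		return ActionProceed
	case errors.Is(err, ErrPoolNotFound):
		return ActionSkip
	case errors.Is(err, ErrNonStandardPool):
		return ActionBlacklist
	case errors.Is(err, ErrRPCTimeout), errors.Is(err, context.DeadlineExceeded):
		return ActionRetry
	default:
		// Assumption: unknown errors are treated as transient.
		return ActionRetry
	}
}
```
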
---

## System Status

### Watch Script Output
- **Opportunities Detected**: 347+
- **Executable**: 0 (all failing pool validation)
- **Executions**: 0
- **Errors**: 0 (watch script filters out expected warnings)

### Logs Status
```
2025/11/03 14:16:11 - Present
✅ Watch script successfully reading logs
✅ Opportunity detection working
❌ Pool validation blocking all executions
```

---

## Solution Strategy

### Phase 1: Immediate (Next 30 minutes)
1. **Increase batch fetch timeout** from 10s to 30s
2. **Implement exponential backoff** for retry logic
3. **Add proper error context** to distinguish error types

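The backoff in item 2 can be sketched as follows. `BackoffDelay`, `Retry`, and `demoRetry` are hypothetical names, and the delays used in the demo are illustrative rather than tuned values. The delay doubles after each failed attempt and is capped, and the loop stops early if the caller's context is cancelled.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// BackoffDelay returns the wait before retrying after the given 0-based
// attempt: base * 2^attempt, capped at maxDelay.
func BackoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

// Retry calls fn up to maxAttempts times, sleeping with exponential
// backoff between failures. It returns the last error from fn, or the
// context error if ctx is cancelled while waiting.
func Retry(ctx context.Context, maxAttempts int, base, maxDelay time.Duration,
	fn func(context.Context) error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		if attempt == maxAttempts-1 {
			break // no sleep after the final attempt
		}
		select {
		case <-time.After(BackoffDelay(attempt, base, maxDelay)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// demoRetry simulates a fetch that fails twice and then succeeds; it
// returns how many calls were made.
func demoRetry() (int, error) {
	calls := 0
	err := Retry(context.Background(), 5, time.Millisecond, 10*time.Millisecond,
		func(ctx context.Context) error {
			calls++
			if calls < 3 {
				return errors.New("simulated timeout")
			}
			return nil
		})
	return calls, err
}
```

A per-attempt timeout (the 30s from item 1) could be applied inside `fn` with `context.WithTimeout`. Adding random jitter to the delay is a common refinement when many pools retry at once; it is left out here to keep the sketch short.
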
### Phase 2: Short-term (Next hour)
1. **Fix RPC endpoint configuration** if primary is down
2. **Implement batch caching** to avoid repeated failures
3. **Add pool pre-validation** before RPC queries

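One reading of item 2 is a short-lived failure cache: a pool that just failed is not queried again until a cool-down has passed, which avoids hammering the RPC with the same failing requests. The sketch below assumes that interpretation; `FailureCache` and its methods are hypothetical names, and the cool-down length is left to the caller.

```go
package main

import (
	"sync"
	"time"
)

// FailureCache remembers pools that failed recently so they are not
// re-queried until their cool-down expires. Safe for concurrent use.
type FailureCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	expires map[string]time.Time
}

// NewFailureCache creates a cache whose entries expire after ttl.
func NewFailureCache(ttl time.Duration) *FailureCache {
	return &FailureCache{ttl: ttl, expires: make(map[string]time.Time)}
}

// MarkFailed records a failed fetch for pool.
func (c *FailureCache) MarkFailed(pool string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.expires[pool] = time.Now().Add(c.ttl)
}

// ShouldSkip reports whether pool failed within the last ttl.
// Expired entries are evicted on lookup.
func (c *FailureCache) ShouldSkip(pool string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	exp, ok := c.expires[pool]
	if !ok {
		return false
	}
	if time.Now().After(exp) {
		delete(c.expires, pool)
		return false
	}
	return true
}
```

Unlike a permanent blacklist, entries here expire on their own, so a pool that failed only because of an RPC timeout becomes eligible again once the cool-down has elapsed.
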
### Phase 3: Medium-term (Today)
1. **Smart pool filtering** - skip known bad contracts early
2. **Improved monitoring** - track pool failure patterns
3. **Emergency fallback** - use backup RPC providers

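The emergency fallback in item 3 can be sketched as an ordered list of endpoints tried in turn, each with its own timeout. `CallFunc`, `CallWithFallback`, and `demoFallback` are hypothetical names, and `https://backup.example/rpc` is a placeholder rather than a real provider. The sketch uses `errors.Join`, which requires Go 1.20 or later.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// CallFunc performs one RPC request against a single endpoint URL.
type CallFunc func(ctx context.Context, endpoint string) ([]byte, error)

// CallWithFallback tries each endpoint in order, giving each its own
// timeout, and returns the first successful response together with the
// endpoint that served it. If every endpoint fails, the per-endpoint
// errors are joined into one.
func CallWithFallback(ctx context.Context, endpoints []string,
	perCall time.Duration, call CallFunc) ([]byte, string, error) {
	var errs []error
	for _, ep := range endpoints {
		callCtx, cancel := context.WithTimeout(ctx, perCall)
		out, err := call(callCtx, ep)
		cancel()
		if err == nil {
			return out, ep, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", ep, err))
		if ctx.Err() != nil {
			break // the caller gave up; do not try further endpoints
		}
	}
	if len(errs) == 0 {
		return nil, "", errors.New("no RPC endpoints configured")
	}
	return nil, "", errors.Join(errs...)
}

// demoFallback simulates the primary outage from the logs: the first
// endpoint fails name resolution and the placeholder backup answers.
func demoFallback() (string, error) {
	fake := func(ctx context.Context, ep string) ([]byte, error) {
		if ep == "https://arb1.arbitrum.io/rpc" {
			return nil, errors.New("temporary failure in name resolution")
		}
		return []byte("ok"), nil
	}
	_, used, err := CallWithFallback(context.Background(),
		[]string{"https://arb1.arbitrum.io/rpc", "https://backup.example/rpc"},
		time.Second, fake)
	return used, err
}
```
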
---

## Affected Code Files

| File | Issue | Priority |
|------|-------|----------|
| `pkg/datafetcher/batch_fetcher.go` | 10s timeout, no backoff | HIGH |
| `pkg/scanner/market/scanner.go` | No error context in pool fetch | HIGH |
| `pkg/scanner/market/pool_validator.go` | Pre-validation could filter better | MEDIUM |
| `pkg/uniswap/multicall.go` | No fallback for failed calls | MEDIUM |

---

## Metrics to Track

- Pool fetch success rate (target: >95%)
- RPC timeout frequency (target: <1%)
- Pool blacklist size (current: ~10-15)
- Opportunity executability rate (current: 0%, target: >5%)

---

## Next Actions

1. Review the batch fetcher timeout configuration
2. Implement improved error handling
3. Add retry logic with backoff
4. Test with the current opportunity stream
5. Monitor for improvement in executability rate