# MEV & Profitability Research on Arbitrum

## Purpose
- Aggregate methodology, tooling, and findings related to identifying MEV and profit opportunities on Arbitrum.
- Provide reproducible guidance so agents can extend experiments without duplicating work.

## Current Capabilities Snapshot
- **Core services**: `cmd/mev-bot`, `pkg/arbitrage`, `pkg/transport`, `pkg/scanner`, and `pkg/profitcalc` implement the live pipeline.
- **Monitoring & reporting**: `internal/monitoring`, Prometheus dashboards, and `docs/8_reports/` capture historic profitability metrics.
- **Simulation tooling**: `tools/simulation`, `make simulate-profit`, and artifacts under `reports/simulation/` enable backtesting.

## Research Tracks
### 1. DEX Price Arbitrage
- Targets: Uniswap v3, Camelot, Sushi, GMX spot pools.
- Signals: Pool reserves, swap events, TWAP deltas, cross-pair spreads.
- KPIs: Expected profit per block, win rate, gas/priority fee sensitivity.
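Per-block profit KPIs start from the net profit of a single cross-pool round trip. The sketch below assumes v2-style constant-product (x*y=k) pools with fees in basis points and gas already converted into the input token; `Pool`, `RoundTripProfit`, and `BestSize` are hypothetical names, not the `pkg/profitcalc` API. Uniswap v3 concentrated liquidity only matches this formula within a single active range, so treat it as an approximation.

```go
package main

import "fmt"

// Pool is a hypothetical constant-product pool snapshot (v2-style approximation).
type Pool struct {
	ReserveIn  float64 // reserve of the token sold into the pool
	ReserveOut float64 // reserve of the token bought from the pool
	FeeBps     float64 // swap fee in basis points (30 = 0.30%)
}

// AmountOut applies the constant-product swap formula with the fee taken from the input.
func (p Pool) AmountOut(amountIn float64) float64 {
	inAfterFee := amountIn * (1 - p.FeeBps/10_000)
	return p.ReserveOut * inAfterFee / (p.ReserveIn + inAfterFee)
}

// RoundTripProfit sells amountIn of token A through leg1 (A->B), sells the B
// proceeds through leg2 (B->A), and subtracts gas denominated in token A.
func RoundTripProfit(leg1, leg2 Pool, amountIn, gasCostInA float64) float64 {
	b := leg1.AmountOut(amountIn)
	return leg2.AmountOut(b) - amountIn - gasCostInA
}

// BestSize scans candidate trade sizes and keeps the most profitable one.
// A production version would solve for the optimum instead of grid-scanning.
func BestSize(leg1, leg2 Pool, sizes []float64, gasCostInA float64) (bestSize, bestProfit float64) {
	for i, s := range sizes {
		p := RoundTripProfit(leg1, leg2, s, gasCostInA)
		if i == 0 || p > bestProfit {
			bestSize, bestProfit = s, p
		}
	}
	return bestSize, bestProfit
}

func main() {
	// Illustrative reserves: WETH is ~2100 USDC in leg1's pool, ~2000 USDC in leg2's.
	leg1 := Pool{ReserveIn: 1_000, ReserveOut: 2_100_000, FeeBps: 30} // sell WETH for USDC
	leg2 := Pool{ReserveIn: 2_000_000, ReserveOut: 1_000, FeeBps: 30} // buy WETH back with USDC
	size, profit := BestSize(leg1, leg2, []float64{0.5, 1, 2, 5, 10}, 0.001)
	fmt.Printf("best size %.1f WETH, net profit %.5f WETH\n", size, profit)
}
```

Sweeping `gasCostInA` over a range is a cheap way to approximate the gas-sensitivity KPI before running full simulations.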

### 2. Liquidation Monitoring
- Targets: Aave, Radiant, other Arbitrum lending markets.
- Signals: Health factor drift, oracle price updates, pending liquidation calls.
- KPIs: Post-liquidation slippage, competing bot density, execution latency.

### 3. Cross-Domain / Cross-Chain Opportunities
- Scenarios: L1↔L2 basis gaps, bridge delays, stablecoin depegs.
- Signals: L1 oracle vs L2 pool divergence, bridge queue depth, sequencer backlog.
- KPIs: Net basis capture, transfer latency risk, capital lock-up duration.

### 4. Latency & Order-Flow Strategies *(ethics review required)*
- Includes sandwiching, back-running, private order flow analysis.
- Requires legal and policy review before any experimentation.

## External Research Snapshot (as of 2025-10-19)
- **Timeboost express lane audit (Sep 2025):** Analysis of ~11.5M auctions found that two participants won over 90% of them, alongside 22% revert rates, weakening secondary markets, and declining DAO revenue. This indicates that the current Timeboost design is centralising order flow and falling short of its fairness objectives.
- **Spam-based arbitrage on fast-finality rollups (Jun 2025):** Shows splitting MEV into many micro transactions remains optimal post-Dencun; on Arbitrum, 80% of reverted swaps concentrate in USDC/WETH pairs and cluster at block tops, signalling a sustained latency race outside priority-fee auctions.
- **Optimistic MEV measurement (Jun 2025):** Quantifies "on-chain probe" strategies driving 7% of Arbitrum gas usage in Q1 2025 despite limited fee contribution—highlighting speculative load on sequencers and sensitivity to volatility and aggregator activity.
- **Cross-chain arbitrage taxonomy (Jan 2025):** Longitudinal study across nine chains attributes ~32% of observed events to bridge-based moves, yielding a conservative $9.5M profit lower bound; provides a baseline for assessing Arbitrum cross-domain MEV defences.
- **Sequencer profit sustainability (Mar 2025):** DAO-commissioned report decomposes sequencer revenues/costs (including blob and L1 settlement fees) and stresses integrating Timeboost and orderflow auctions into long-term economic planning.
- **Community proposals and dashboards (Apr–Sep 2025):** FairFlow proposal aims to adjust Timeboost parameters for broader participation; community analytics suggest Timeboost revenue is nearing parity with base fees (~$1M/month) with potential to reach $100M annually if adoption expands.

*Actionable follow-up*: Fold the findings above into the experiment backlog, for example: replicate the Timeboost revert analysis locally, extend spam-detection metrics in `pkg/scanner`, and simulate bridge-based arbitrage using the cross-chain taxonomy figures as benchmarks.
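Replicating the revert-rate and block-top clustering findings mostly needs a per-pair aggregation over observed swap attempts. This is a sketch only: it assumes each observation already carries a decoded pair name, the receipt status (failed receipts report status 0), and the transaction's index within its block. `TxObs`, `RevertStats`, and the `topFrac` cut-off are hypothetical, not existing `pkg/scanner` types.

```go
package main

import (
	"fmt"
	"sort"
)

// TxObs is one observed swap attempt.
type TxObs struct {
	Pair         string
	Reverted     bool // from receipt status (0 = failed)
	IndexInBlock int  // 0-based transaction index
	BlockTxCount int  // number of transactions in the block
}

// PairStats aggregates revert behaviour for one trading pair.
type PairStats struct {
	Total, Reverted, RevertedAtTop int
}

// RevertRate is reverted attempts over all attempts for the pair.
func (s PairStats) RevertRate() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Reverted) / float64(s.Total)
}

// TopShare is the fraction of reverted attempts that landed at the top of a block.
func (s PairStats) TopShare() float64 {
	if s.Reverted == 0 {
		return 0
	}
	return float64(s.RevertedAtTop) / float64(s.Reverted)
}

// RevertStats groups observations by pair. A tx counts as "top of block" when
// its index falls within the first topFrac of the block (0.1 = first 10%).
func RevertStats(txs []TxObs, topFrac float64) map[string]PairStats {
	stats := make(map[string]PairStats)
	for _, tx := range txs {
		s := stats[tx.Pair]
		s.Total++
		if tx.Reverted {
			s.Reverted++
			if float64(tx.IndexInBlock) < topFrac*float64(tx.BlockTxCount) {
				s.RevertedAtTop++
			}
		}
		stats[tx.Pair] = s
	}
	return stats
}

func main() {
	txs := []TxObs{
		{"USDC/WETH", true, 0, 10}, {"USDC/WETH", true, 1, 10},
		{"USDC/WETH", true, 7, 10}, {"USDC/WETH", false, 3, 10},
		{"ARB/WETH", false, 0, 10},
	}
	stats := RevertStats(txs, 0.2)
	pairs := make([]string, 0, len(stats))
	for p := range stats {
		pairs = append(pairs, p)
	}
	sort.Strings(pairs)
	for _, p := range pairs {
		s := stats[p]
		fmt.Printf("%-10s revert=%.2f top=%.2f n=%d\n", p, s.RevertRate(), s.TopShare(), s.Total)
	}
}
```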

## Data Sources & Access Checklist
- **On-chain RPC/archive**: Document providers (Alchemy, Infura, self-hosted nodes), how credentials are provisioned (never the secrets themselves), and per-provider rate limits.
- **Mempool / private relays**: Track availability of Flashbots-style endpoints or sequencer feeds.
- **Historical datasets**: Record storage locations under `data/` (Parquet/CSV), retention policies, refresh cadence.
- **Off-chain signals**: Centralised exchange order books, funding rates, oracle feeds.

### Dataset Inventory (Initial)
| Path | Description | Refresh Cadence | Notes |
| --- | --- | --- | --- |
| `data/pools.txt` | Seed list of Arbitrum liquidity pool addresses (Uniswap v3, Sushi, Camelot). | Manual | Generated October 2025; extend with TVL, fee tier metadata before backtests. |
| `data/raw_arbitrum_portal_projects.json` | Raw Arbitrum Portal `/api/projects` export (all categories). | Pull ad hoc | Auto-fetched by `make refresh-mev-datasets` (or run `curl -s https://portal-data.arbitrum.io/api/projects > data/raw_arbitrum_portal_projects.json`). |
| `datasets/arbitrum_llama_exchanges.csv` | DeFiLlama snapshot of all Arbitrum Dex/Derivatives/Options protocols. | Pull ad hoc | Generated via `pull_llama_exchange_snapshot.py` (run automatically by `make refresh-mev-datasets`). |
| `datasets/arbitrum_portal_exchanges.csv` | Portal-derived exchange list filtered to DEX / Aggregator / Perps / Options / Derivatives. | Pull ad hoc | Generated 2025-10-19 via helper script (see below); retains project IDs, chains, URLs. |
| `datasets/arbitrum_llama_exchange_subset.csv` | DeFiLlama exchange slice limited to Dexs / DEX Aggregator / Derivatives / Options categories. | Pull ad hoc | Rebuilt 2025-10-19 from `arbitrum_llama_exchanges.csv` for easier joins (source CSV generated via API pull). |
| `datasets/arbitrum_exchange_sources.csv` | Combined view of Portal + DeFiLlama exchanges with `sources` flag. | Derived | Regenerate after refreshing either upstream dataset to track coverage gaps. |
| `datasets/arbitrum_lending_markets.csv` | Lending/CDP venues on Arbitrum with TVL + borrowed balances, audit coverage, and oracle support. | Pull ad hoc | Generated 2025-10-19 via `update_market_datasets.py`; derive liquidation watchlists and oracle dependencies. |
| `datasets/arbitrum_bridges.csv` | Bridge + cross-chain routing protocols exposing Arbitrum liquidity with share-of-TVL metrics. | Pull ad hoc | Generated 2025-10-19 via `update_market_datasets.py`; baseline for cross-domain arbitrage monitoring. |
| `reports/simulation/latest/summary.md` | Most recent profitability simulation output. | Per simulation run | Use as baseline for comparing new opportunity vectors. |
| `reports/simulation/latest/summary.json` | Machine-readable KPIs from latest simulation. | Per simulation run | Ingest into notebooks for longitudinal analysis. |
| `reports/ci/` | CI pipeline logs (lint, gosec, etc.). | Per pipeline run | Useful when correlating security changes with profitability regressions. |
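The `sources` flag in `datasets/arbitrum_exchange_sources.csv` amounts to an outer join of the Portal and DeFiLlama lists. The real logic lives in the Python helpers and is not reproduced here; this Go sketch assumes matching on a lower-cased, trimmed protocol name and uses hypothetical `portal`/`defillama` flag values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// SourceRow is one merged exchange entry.
type SourceRow struct {
	Key     string // normalised join key
	Name    string // first display name seen
	Sources string // "portal", "defillama", or "portal;defillama"
}

// normalise builds the join key: lower-case with surrounding whitespace removed.
func normalise(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

// MergeSources outer-joins both name lists and records which sources listed each name.
func MergeSources(portal, llama []string) []SourceRow {
	type entry struct {
		name              string
		inPortal, inLlama bool
	}
	byKey := map[string]*entry{}
	add := func(names []string, fromPortal bool) {
		for _, n := range names {
			k := normalise(n)
			if k == "" {
				continue
			}
			e, ok := byKey[k]
			if !ok {
				e = &entry{name: strings.TrimSpace(n)}
				byKey[k] = e
			}
			if fromPortal {
				e.inPortal = true
			} else {
				e.inLlama = true
			}
		}
	}
	add(portal, true)
	add(llama, false)

	keys := make([]string, 0, len(byKey))
	for k := range byKey {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	rows := make([]SourceRow, 0, len(keys))
	for _, k := range keys {
		e := byKey[k]
		var src []string
		if e.inPortal {
			src = append(src, "portal")
		}
		if e.inLlama {
			src = append(src, "defillama")
		}
		rows = append(rows, SourceRow{Key: k, Name: e.name, Sources: strings.Join(src, ";")})
	}
	return rows
}

func main() {
	portal := []string{"Camelot", "GMX", "Vertex"}
	llama := []string{"camelot", "Uniswap V3", "GMX "}
	for _, r := range MergeSources(portal, llama) {
		fmt.Printf("%-12s %s\n", r.Name, r.Sources)
	}
}
```

Exact-name matching misses aliases such as "Uniswap" versus "Uniswap V3", which is one reason single-source rows deserve a manual check before they are counted as coverage gaps.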

#### Exchange Dataset Refresh Workflow
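Run the following from the repo root whenever Portal or DeFiLlama listings change:
```bash
# Default: fetch the latest Portal catalogue and DeFiLlama snapshot, then rebuild
# every downstream CSV (prerequisites are validated automatically).
make refresh-mev-datasets

# Alternative: stage your own Portal dump (e.g. a pinned copy), then skip the built-in fetch.
curl -fsS https://portal-data.arbitrum.io/api/projects -o data/raw_arbitrum_portal_projects.json
SKIP_PORTAL_FETCH=1 make refresh-mev-datasets
```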
`scripts/refresh-mev-datasets.sh` orchestrates the Python regenerators, fetching the latest Portal catalogue and DeFiLlama snapshot before rebuilding downstream CSVs. Set `SKIP_PORTAL_FETCH=1` if you already staged a customised Portal dump; direct invocation (`pull_llama_exchange_snapshot.py`, `update_exchange_datasets.py`, `update_market_datasets.py`) remains available for bespoke filters.

## Methodology Template
1. Define hypothesis & expected alpha source.
2. Enumerate required datasets & tooling (ETL scripts, simulations, live hooks).
3. Implement deterministic data extraction (commit scripts to `tools/` or `scripts/`).
4. Run analysis/backtests; save notebooks or summaries under `reports/research/`.
5. Evaluate results (KPIs, risk, infrastructure requirements).
6. Record follow-up tasks, blockers, and owners.

### Experiment Log Format
```
YYYY-MM-DD – <experiment title>
Hypothesis:
Setup:
Datasets:
Results:
Risks/Assumptions:
Next Steps:
Artifacts: reports/research/YYYY-MM-DD_<slug>.md
```

### Repository Structure
- `experiments/` – Checked-in summaries of completed experiments (one markdown per study).
- `datasets/` – Documentation of raw/processed datasets leveraged during research.
- `tooling/` – Notes on scripts, notebooks, and automation supporting experiments.
- `reports/research/` (repo root) – Canonical location for detailed experiment artifacts referenced above.

**Related datasets:**
- `datasets/arbitrum_exchanges.md` – narrative breakdown of major Arbitrum exchanges with metrics and citations.
- `datasets/arbitrum_exchanges.csv` – structured CSV for ingesting exchange metadata (variant, category, key notes, source URL).
- `datasets/arbitrum_llama_exchanges.csv` – DeFiLlama snapshot of all Arbitrum Dex/Derivatives/Options protocols (re-generated automatically from the protocols API).
- `datasets/arbitrum_portal_exchanges.csv` – machine-readable Arbitrum Portal exchange list (DEX/Perps/Options/Derivatives).
- `datasets/arbitrum_exchange_sources.csv` – merged Portal + DeFiLlama source map with gap indicators.
- `datasets/arbitrum_lending_markets.csv` – liquidation/borrowing venue roster with Arbitrum TVL + borrowed metrics and oracle coverage.
- `datasets/arbitrum_bridges.csv` – cross-domain bridge inventory with Arbitrum share-of-liquidity statistics for basis/opportunity tracking.
- `verification/arbitrum_pool_verifications.md` – verification status tracker for high-priority pools/routers (link back to contract audits).

## Tooling Inventory
- **Collection**: Extend `pkg/scanner`, `pkg/events`, and custom scripts under `scripts/` to ingest new pools or lending data.
- **Simulation**: Use `tools/simulation` with new vector captures; document command variants.
- **Analytics**: Prefer reproducible notebooks or Go/Polars pipelines; store outputs under `reports/research/`.
- **Security constraints**: Align experiments with `pkg/security` (rate limiting, key usage); update `TODO_AUDIT_FIX.md` if additional permissions are required.

## Compliance & Safety
- Respect RPC provider ToS and applicable regulations, particularly those covering front-running and market manipulation.
- Avoid storing private keys or sensitive order flow in shared logs; follow `docs/6_operations/SECURITY.md`.
- Coordinate with stakeholders before testing intrusive strategies (e.g., sandwiching live users).

## Immediate Next Actions
1. Inventory existing Arbitrum datasets and document access details here.
2. Select an initial research question (e.g., Uniswap ↔ Camelot price divergence).
3. Capture a baseline simulation run; archive outputs under `reports/research/`.
4. Append checklist items within this document as work progresses.