# Dataset Notes

Document raw and processed data sources used for MEV research. Each entry should cover:

- Source / acquisition method
- Schema or key fields
- Refresh cadence and retention policy
- Storage path (e.g., `data/`, `reports/`)

## Current Datasets

- `arbitrum_exchanges.md`: Narrative overview of leading Arbitrum exchanges (spot, aggregator, derivatives, options) with citations and contextual analytics.
- `arbitrum_exchanges.csv`: Structured table of exchange variants, categories, feature notes, and source URLs for downstream ingestion.
- `arbitrum_llama_exchanges.csv`: Auto-generated snapshot (288 rows as of 2025-10-19) of every Arbitrum protocol tagged as Dexs/Derivatives/DEX Aggregator/Options on DeFiLlama, including slug, website, Twitter, and current Arbitrum TVL for coverage validation.
- `data/raw_arbitrum_portal_projects.json`: Full Arbitrum Portal `/api/projects` dump captured on 2025-10-19 (631 KB); refresh with `curl -s https://portal-data.arbitrum.io/api/projects`.
- `arbitrum_portal_exchanges.csv`: Filtered list (151 rows) of Portal projects whose subcategories include `DEX`, `DEX Aggregator`, `Perpetuals`, `Options`, `Derivatives`, or `Centralized Exchange`; retains project IDs, chains, and URLs.
- `arbitrum_llama_exchange_subset.csv`: Normalised slice of the DeFiLlama export limited to Dexs / DEX Aggregator / Derivatives / Options categories for quicker joins (288 rows).
- `arbitrum_exchange_sources.csv`: Canonical merge of Portal + DeFiLlama exchanges with source flags so coverage gaps are easy to spot (409 merged rows).
- `arbitrum_lending_markets.csv`: Snapshot of Arbitrum-enabled lending/CDP venues from the DeFiLlama protocols API, including chain coverage, TVL, borrowed balances, audit status, and oracle usage (147 rows as of 2025-10-19).
- `arbitrum_bridges.csv`: Catalog of bridge and cross-chain routing protocols touching Arbitrum with per-chain TVL allocation and governance metadata (63 rows as of 2025-10-19).
- `verification/arbitrum_pool_verifications.md`: Filtered shortlist of priority pools/routers with contract verification status snapshots (updated 2025-10-19); moved under the verification workspace.

### Refresh scripts

- `pull_llama_exchange_snapshot.py`: Downloads the DeFiLlama protocols catalogue and writes `arbitrum_llama_exchanges.csv` for downstream joins.
- `scripts/refresh-mev-datasets.sh`: Coordinated runner that fetches the latest Portal catalogue (skipped when `SKIP_PORTAL_FETCH=1` is set), pulls the DeFiLlama snapshot, and executes both dataset generators; exposed via `make refresh-mev-datasets`.
- `update_exchange_datasets.py`: Rebuilds the exchange CSVs from saved Arbitrum Portal + DeFiLlama exports.
- `update_market_datasets.py`: Fetches DeFiLlama protocols online to build the lending/CDP and bridge datasets used to prepare liquidation and cross-domain research.
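These notes do not describe how the source flags in `arbitrum_exchange_sources.csv` are computed. The following is a minimal Python sketch of one way to do it, not the implementation in `update_exchange_datasets.py`. It filters DeFiLlama protocols to the exchange categories listed above and Portal projects to the exchange subcategories, then outer-joins the two on a normalised name. The record fields `name`, `category`, and `chains` (DeFiLlama) and `name` and `subcategories` (Portal) are assumptions for illustration. Matching on a normalised display name is also a simplifying assumption; the real merge may key on slugs, URLs, or project IDs instead.

```python
"""Sketch: flag which exchange venues appear in the Portal, DeFiLlama, or both.

ASSUMPTION: record fields (name, category, chains, subcategories) are
illustrative, not a confirmed schema of either export.
"""
from __future__ import annotations

import re

# Category / subcategory labels taken from the dataset notes.
LLAMA_EXCHANGE_CATEGORIES = {"Dexs", "DEX Aggregator", "Derivatives", "Options"}
PORTAL_EXCHANGE_SUBCATEGORIES = {
    "DEX", "DEX Aggregator", "Perpetuals", "Options", "Derivatives", "Centralized Exchange",
}


def normalise(name: str) -> str:
    """Lower-case and drop non-alphanumerics so 'Uniswap V3' and 'uniswap-v3' match."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def llama_exchanges(protocols: list[dict]) -> dict[str, str]:
    """Keep DeFiLlama records in an exchange category that list Arbitrum as a chain."""
    return {
        normalise(p["name"]): p["name"]
        for p in protocols
        if p.get("category") in LLAMA_EXCHANGE_CATEGORIES
        and "Arbitrum" in p.get("chains", [])
    }


def portal_exchanges(projects: list[dict]) -> dict[str, str]:
    """Keep Portal projects with at least one exchange-like subcategory."""
    return {
        normalise(p["name"]): p["name"]
        for p in projects
        if PORTAL_EXCHANGE_SUBCATEGORIES.intersection(p.get("subcategories", []))
    }


def merge_sources(portal: dict[str, str], llama: dict[str, str]) -> list[dict]:
    """Outer-join on the normalised name and record which source(s) list each venue."""
    rows = []
    for key in sorted(portal.keys() | llama.keys()):
        rows.append({
            # Prefer the Portal display name when both sources have the venue.
            "name": portal.get(key) or llama[key],
            "in_portal": key in portal,
            "in_llama": key in llama,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the real exports.
    portal = portal_exchanges([
        {"name": "Uniswap V3", "subcategories": ["DEX"]},
        {"name": "Some Wallet", "subcategories": ["Wallet"]},
    ])
    llama = llama_exchanges([
        {"name": "Uniswap v3", "category": "Dexs", "chains": ["Ethereum", "Arbitrum"]},
        {"name": "GMX", "category": "Derivatives", "chains": ["Arbitrum"]},
        {"name": "Aave V3", "category": "Lending", "chains": ["Arbitrum"]},
    ])
    for row in merge_sources(portal, llama):
        print(row)
```

Rows where only one of `in_portal` / `in_llama` is true are the coverage gaps the merged CSV is meant to surface. In the sample, GMX appears only on the DeFiLlama side and Aave V3 is excluded because its category is not an exchange category.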