
# Layer 2 Optimizations Implementation Guide

**Created:** 2025-11-01
**Status:** Ready for Phase 1
**Risk Level:** Low (All changes are non-breaking)

---

## Overview

This guide provides step-by-step instructions for implementing Arbitrum-specific optimizations based on our Layer 2 research. The Phase 1 and Phase 2 changes are **non-breaking**: each sits behind a feature flag and can be rolled back if needed.

---

## Quick Start

### Step 1: Review Research
Read: `docs/L2_MEV_BOT_RESEARCH_REPORT.md`
- Validates our current implementation ✅
- Identifies non-breaking improvements 🟡
- Provides competitive analysis 📊

### Step 2: Test Configuration
```bash
# Backup current config
cp config/arbitrum_production.yaml config/arbitrum_production.yaml.backup

# Test-merge the optimized config (dry run)
./scripts/validate-l2-config.sh --dry-run

# Apply Phase 1 optimizations
./scripts/apply-l2-optimizations.sh --phase 1
```

### Step 3: Monitor Results
```bash
# Watch live with L2-specific metrics
./scripts/watch-l2-metrics.sh

# Compare with baseline
./scripts/compare-performance.sh --baseline before --current after
```

---

## Phase-by-Phase Implementation

### Phase 1: Configuration Tuning (Week 1)
**Effort:** 1-2 hours | **Risk:** Low | **Reversible:** Yes

#### What's Changing
- `opportunity_ttl`: 30s → 5s (tuned for 250ms blocks)
- `max_path_age`: 60s → 10s (tuned for 250ms blocks)
- Add `execution_deadline`: 3s (new parameter)

#### Implementation Steps

**1. Enable Phase 1 in config:**
```yaml
# config/arbitrum_production.yaml

# Add at the end of the file:

# ===== LAYER 2 OPTIMIZATIONS (Phase 1) =====
features:
  use_arbitrum_optimized_timeouts: true
  use_dynamic_ttl: false  # Start with static, enable later

arbitrage_optimized:
  opportunity_ttl: "5s"                # 20 blocks @ 250ms
  max_path_age: "10s"                  # 40 blocks @ 250ms
  execution_deadline: "3s"              # 12 blocks @ 250ms

  # Backward compatibility
  legacy_opportunity_ttl: "30s"        # For rollback
  legacy_max_path_age: "60s"           # For rollback
```

**2. Update code to read new config:**
```go
// internal/config/config.go

type ArbitrageOptimized struct {
    OpportunityTTL    time.Duration `yaml:"opportunity_ttl"`
    MaxPathAge        time.Duration `yaml:"max_path_age"`
    ExecutionDeadline time.Duration `yaml:"execution_deadline"`

    // Legacy values for rollback
    LegacyOpportunityTTL time.Duration `yaml:"legacy_opportunity_ttl"`
    LegacyMaxPathAge     time.Duration `yaml:"legacy_max_path_age"`
}

type Features struct {
    UseArbitrumOptimizedTimeouts bool `yaml:"use_arbitrum_optimized_timeouts"`
    UseDynamicTTL                bool `yaml:"use_dynamic_ttl"`
}

// In Config struct
type Config struct {
    // ... existing fields ...
    Features           Features           `yaml:"features"`
    ArbitrageOptimized ArbitrageOptimized `yaml:"arbitrage_optimized"`
}

// Helper to get active TTL
func (c *Config) GetOpportunityTTL() time.Duration {
    if c.Features.UseArbitrumOptimizedTimeouts {
        return c.ArbitrageOptimized.OpportunityTTL
    }
    return c.Arbitrage.OpportunityTTL  // Legacy
}
```

**3. Update arbitrage service:**
```go
// pkg/arbitrage/service.go

func (s *ArbitrageService) isOpportunityValid(opp *types.ArbitrageOpportunity) bool {
    // Use configurable TTL
    ttl := s.config.GetOpportunityTTL()

    age := time.Since(opp.Timestamp)
    if age > ttl {
        s.logger.Debug(fmt.Sprintf(
            "Opportunity expired: age=%s, ttl=%s",
            age, ttl,
        ))
        return false
    }

    return true
}
```

**4. Test Phase 1:**
```bash
# Start the bot with the Phase 1 config (timeout stops it after 60 seconds)
PROVIDER_CONFIG_PATH=$PWD/config/providers_runtime.yaml timeout 60 ./mev-bot start

# Monitor logs for:
# - Opportunities detected
# - Opportunities expired (should see more due to shorter TTL)
# - Execution attempts
# - Success rate
```

**5. Validate Results:**
```bash
# After 1 hour, compare metrics
./scripts/analyze-l2-phase1.sh

# Check for:
# - Reduced stale opportunity execution ✅
# - Similar or better success rate ✅
# - No increase in errors ✅
```

**6. Rollback if Needed:**
```yaml
# Set in config/arbitrum_production.yaml
features:
  use_arbitrum_optimized_timeouts: false  # Back to legacy
```

---

### Phase 2: Transaction Pre-filtering (Week 2)
**Effort:** 4-6 hours | **Risk:** Medium | **Reversible:** Yes

#### What's Changing
- Filter non-DEX transactions at monitor level
- Expected 80-90% reduction in processed transactions
- Improved latency and reduced CPU usage

#### Implementation Steps

**1. Create DEX filter module:**
```go
// pkg/monitor/dex_filter.go

package monitor

import (
    "encoding/hex"
    "sync"

    "github.com/ethereum/go-ethereum/common"
    "github.com/ethereum/go-ethereum/core/types"
)

type DEXFilter struct {
    knownDEXAddresses map[common.Address]bool
    swapSignatures    map[string]bool
    mu                sync.RWMutex

    // Statistics
    totalTx      uint64
    filteredTx   uint64
    passedTx     uint64
}

func NewDEXFilter(dexAddresses []common.Address, swapSigs []string) *DEXFilter {
    filter := &DEXFilter{
        knownDEXAddresses: make(map[common.Address]bool),
        swapSignatures:    make(map[string]bool),
    }

    // Build lookup maps
    for _, addr := range dexAddresses {
        filter.knownDEXAddresses[addr] = true
    }

    for _, sig := range swapSigs {
        filter.swapSignatures[sig] = true
    }

    return filter
}

func (f *DEXFilter) ShouldProcess(tx *types.Transaction) bool {
    f.mu.Lock()
    f.totalTx++
    f.mu.Unlock()

    // Must have a recipient
    if tx.To() == nil {
        f.incrementFiltered()
        return false
    }

    // Check if recipient is known DEX
    if f.knownDEXAddresses[*tx.To()] {
        f.incrementPassed()
        return true
    }

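    // Check the 4-byte function selector. hex.EncodeToString returns lowercase
    // hex without a "0x" prefix, so swapSignatures keys must use that form.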
    if len(tx.Data()) >= 4 {
        sig := hex.EncodeToString(tx.Data()[:4])
        if f.swapSignatures[sig] {
            f.incrementPassed()
            return true
        }
    }

    f.incrementFiltered()
    return false
}

func (f *DEXFilter) GetStats() (total, filtered, passed uint64) {
    f.mu.RLock()
    defer f.mu.RUnlock()
    return f.totalTx, f.filteredTx, f.passedTx
}

func (f *DEXFilter) incrementFiltered() {
    f.mu.Lock()
    f.filteredTx++
    f.mu.Unlock()
}

func (f *DEXFilter) incrementPassed() {
    f.mu.Lock()
    f.passedTx++
    f.mu.Unlock()
}
```
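
The filter above takes a write lock twice per transaction just to bump counters. If that lock shows up in profiles, one alternative is to keep the counters in `sync/atomic` values, which need no mutex. This is a sketch of that idea, not a change the guide requires; `filterStats` is a hypothetical type and `atomic.Uint64` assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// filterStats holds lock-free counters for the DEX filter (hypothetical type).
type filterStats struct {
	total, filtered, passed atomic.Uint64
}

// record counts one transaction and whether it passed the filter.
func (s *filterStats) record(passed bool) {
	s.total.Add(1)
	if passed {
		s.passed.Add(1)
	} else {
		s.filtered.Add(1)
	}
}

// snapshot reads the counters. Each load is atomic, but the three values
// are not read as a single consistent unit.
func (s *filterStats) snapshot() (total, filtered, passed uint64) {
	return s.total.Load(), s.filtered.Load(), s.passed.Load()
}

func main() {
	var stats filterStats
	var wg sync.WaitGroup

	// Simulate 1000 concurrent transactions, one in ten passing the filter.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			stats.record(i%10 == 0)
		}(i)
	}
	wg.Wait()

	total, filtered, passed := stats.snapshot()
	fmt.Printf("total=%d filtered=%d passed=%d\n", total, filtered, passed)
}
```

The lookup maps themselves are only written in `NewDEXFilter`, so reading them without a lock stays safe as long as they are never mutated afterwards.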

**2. Integrate into monitor:**
```go
// pkg/monitor/concurrent.go

type ArbitrumMonitor struct {
    // ... existing fields ...
    dexFilter *DEXFilter
    filterEnabled bool
}

func (m *ArbitrumMonitor) processTransaction(tx *types.Transaction) {
    // Apply filter if enabled
    if m.filterEnabled && !m.dexFilter.ShouldProcess(tx) {
        // Log occasionally (1% sample rate)
        if rand.Float64() < 0.01 {
            // tx.To() is nil for contract creations, which the filter
            // also rejects, so guard before dereferencing it.
            to := "contract creation"
            if tx.To() != nil {
                to = tx.To().Hex()[:10]
            }
            m.logger.Debug(fmt.Sprintf(
                "Filtered non-DEX tx: %s to %s",
                tx.Hash().Hex()[:10],
                to,
            ))
        }
        return
    }

    // Process as normal
    m.processSwapTransaction(tx)
}

// Periodic stats logging
func (m *ArbitrumMonitor) logFilterStats() {
    total, filtered, passed := m.dexFilter.GetStats()
    if total == 0 {
        return // Nothing seen yet; avoids a 0/0 (NaN) filter rate
    }
    filterRate := float64(filtered) / float64(total) * 100

    m.logger.Info(fmt.Sprintf(
        "DEX Filter Stats: total=%d, passed=%d (%.1f%%), filtered=%d (%.1f%%)",
        total, passed, 100-filterRate, filtered, filterRate,
    ))
}
```
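
The monitor code hard-codes a 1% log sample rate, while the Phase 2 config exposes `filtered_log_sample_rate`. A small sampler that takes the rate as a parameter would let the two stay in sync. The sketch below is one way to do that; `sampler` and `newSampler` are hypothetical names, and how the rate reaches the monitor from config is left open.

```go
package main

import (
	"fmt"
	"math/rand"
)

// sampler decides whether a filtered transaction should be logged.
// It wraps its own *rand.Rand, which is NOT safe for concurrent use;
// give each goroutine its own sampler or add a mutex.
type sampler struct {
	rate float64
	rng  *rand.Rand
}

// newSampler clamps rate into [0, 1] and seeds a private generator.
func newSampler(rate float64, seed int64) *sampler {
	if rate < 0 {
		rate = 0
	}
	if rate > 1 {
		rate = 1
	}
	return &sampler{rate: rate, rng: rand.New(rand.NewSource(seed))}
}

// sample reports true for roughly rate of all calls.
// Float64 returns values in [0, 1), so rate 0 never fires and rate 1 always does.
func (s *sampler) sample() bool {
	return s.rng.Float64() < s.rate
}

func main() {
	s := newSampler(0.01, 42) // mirrors filtered_log_sample_rate: 0.01
	hits := 0
	const n = 100000
	for i := 0; i < n; i++ {
		if s.sample() {
			hits++
		}
	}
	fmt.Printf("logged %d of %d filtered transactions\n", hits, n)
}
```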

**3. Enable Phase 2:**
```yaml
# config/arbitrum_production.yaml

features:
  enable_dex_prefilter: true           # Enable filtering
  log_filtered_transactions: true      # Log for monitoring

dex_filter:
  enabled: true
  filter_mode: "whitelist"
  log_filtered: true
  filtered_log_sample_rate: 0.01       # Log 1%
```

**4. Test and Monitor:**
```bash
# Start with filtering enabled
PROVIDER_CONFIG_PATH=$PWD/config/providers_runtime.yaml timeout 60 ./mev-bot start

# Watch filter statistics
tail -f logs/mev_bot.log | grep "DEX Filter Stats"

# Should see ~80-90% filtering rate
# Example: DEX Filter Stats: total=1000, passed=110 (11.0%), filtered=890 (89.0%)
```

**5. Validate No Missed Opportunities:**
```bash
# Check that we're not filtering profitable transactions
./scripts/validate-filter-accuracy.sh

# Reviews:
# - Opportunities before filtering: X
# - Opportunities after filtering: Y
# - Difference should be <1%
```
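
The "<1% difference" check can be written down explicitly. The sketch below shows the arithmetic under the assumption that X and Y are simple opportunity counts taken over the same time window; `missedPct` and `filterIsAccurate` are hypothetical helpers and say nothing about what `validate-filter-accuracy.sh` does internally.

```go
package main

import "fmt"

// missedPct returns the percentage of baseline opportunities that are no
// longer detected once filtering is enabled.
func missedPct(before, after int) float64 {
	if before <= 0 {
		return 0 // no baseline to compare against
	}
	missed := before - after
	if missed < 0 {
		missed = 0 // detecting more after filtering is not a miss
	}
	return float64(missed) / float64(before) * 100
}

// filterIsAccurate applies the guide's threshold: fewer than 1% missed.
func filterIsAccurate(before, after int) bool {
	return missedPct(before, after) < 1.0
}

func main() {
	fmt.Println("1000 -> 995 acceptable:", filterIsAccurate(1000, 995))
	fmt.Println("1000 -> 980 acceptable:", filterIsAccurate(1000, 980))
}
```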

---

### Phase 3: Sequencer Feed (Week 3)
**Effort:** 8-12 hours | **Risk:** Medium | **Reversible:** Yes

#### Status
⏸️ **Deferred** - Requires more extensive testing before implementation

#### Reason
- Direct sequencer feed monitoring requires careful testing
- Risk of connection issues impacting operations
- Phase 1 & 2 provide significant improvements already

#### Future Implementation
When ready for Phase 3, see `docs/L2_MEV_BOT_RESEARCH_REPORT.md` Section 3.2 for detailed implementation plan.

---

### Phase 4-5: Timeboost (Month 2+)
**Effort:** 16-24 hours | **Risk:** High | **Reversible:** Partial

#### Status
🔮 **Future Feature** - Only if competition demands

#### Decision Criteria
Implement Timeboost only if all of the following hold:
1. ✅ Phases 1-2 deployed successfully
2. ✅ Consistently profitable for >1 month
3. ✅ Evidence of opportunities being sniped by express lane users
4. ✅ Average opportunity profit >$100
5. ✅ Sufficient capital for express lane bidding ($1000+ reserved)

#### Implementation
See `docs/L2_MEV_BOT_RESEARCH_REPORT.md` Section 3.1 for complete Timeboost integration guide.

---

## Testing Checklist

### Pre-Deployment Tests
- [ ] Config validates without errors
- [ ] All DEX addresses are correct
- [ ] Swap signatures are complete
- [ ] Backward compatibility config present
- [ ] Rollback procedure tested

### Phase 1 Tests
- [ ] Opportunities expire faster (5s vs 30s)
- [ ] No increase in error rate
- [ ] Similar or better success rate
- [ ] Reduced stale opportunity execution
- [ ] System remains stable

### Phase 2 Tests
- [ ] 80-90% transaction filtering rate
- [ ] No missed DEX transactions
- [ ] Reduced CPU usage
- [ ] Improved latency
- [ ] Filter stats logging works
- [ ] Sample logging at correct rate (1%)

### Performance Tests
- [ ] Load test with 1000+ tx/sec
- [ ] Memory usage stable
- [ ] No goroutine leaks
- [ ] Latency within targets (<200ms)
- [ ] Success rate maintained or improved

---

## Monitoring & Metrics

### Key Metrics to Track

**Phase 1 (Timing):**
- Opportunity TTL hits (count)
- Average opportunity age at execution
- Stale opportunity rejections
- Execution success rate

**Phase 2 (Filtering):**
- Total transactions processed
- Transactions filtered (%)
- Transactions passed (%)
- CPU usage (before/after)
- Memory usage (before/after)
- Average detection latency

**Comparative:**
- Opportunities detected (before/after)
- Opportunities executed (before/after)
- Total profit (before/after)
- Success rate (before/after)

### Monitoring Commands

```bash
# Real-time L2-specific monitoring
./scripts/watch-l2-metrics.sh

# Generate performance report
./scripts/generate-l2-report.sh --period 24h

# Compare with baseline
./scripts/compare-performance.sh \
    --baseline logs/baseline_metrics.json \
    --current logs/current_metrics.json
```

---

## Rollback Procedures

### Emergency Rollback (Immediate)

If critical issues are detected:

```bash
# Stop bot
pkill mev-bot

# Restore backup config
cp config/arbitrum_production.yaml.backup config/arbitrum_production.yaml

# Restart
PROVIDER_CONFIG_PATH=$PWD/config/providers_runtime.yaml ./mev-bot start
```

### Feature-Specific Rollback

**Phase 1:**
```yaml
features:
  use_arbitrum_optimized_timeouts: false
```

**Phase 2:**
```yaml
features:
  enable_dex_prefilter: false
```

### Automatic Rollback

The system includes automatic rollback after a configurable number of consecutive failures:

```yaml
legacy_config:
  auto_rollback_on_failure: true
  rollback_threshold_failures: 10  # After 10 consecutive failures
```
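
For reference, the "consecutive failures" rule amounts to a counter that resets on any success. The following is an illustrative sketch of that logic only; `rollbackGuard` is a hypothetical type, and the actual trigger in the bot may be implemented differently.

```go
package main

import "fmt"

// rollbackGuard trips once threshold failures occur in a row.
// It is not safe for concurrent use without external locking.
type rollbackGuard struct {
	threshold   int
	consecutive int
}

// record notes one execution result and reports whether rollback should
// trigger. Any success resets the streak.
func (g *rollbackGuard) record(success bool) bool {
	if success {
		g.consecutive = 0
		return false
	}
	g.consecutive++
	return g.consecutive >= g.threshold
}

func main() {
	g := &rollbackGuard{threshold: 10} // mirrors rollback_threshold_failures: 10

	// Nine failures followed by a success do not trip the guard.
	for i := 0; i < 9; i++ {
		g.record(false)
	}
	fmt.Println("tripped after success:", g.record(true))

	// Ten failures in a row do.
	tripped := false
	for i := 0; i < 10; i++ {
		tripped = g.record(false)
	}
	fmt.Println("tripped after 10 failures:", tripped)
}
```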

---

## Success Criteria

### Phase 1 Success
- ✅ Reduced stale opportunity execution by >50%
- ✅ Maintained or improved success rate
- ✅ No increase in error rate
- ✅ System stability maintained

### Phase 2 Success
- ✅ 80-90% transaction filtering achieved
- ✅ <1% missed DEX transactions
- ✅ >30% reduction in CPU usage
- ✅ >20% improvement in detection latency
- ✅ No degradation in opportunity detection

### Overall Success
- ✅ Maintained profitability
- ✅ Improved competitive position
- ✅ Reduced resource usage
- ✅ Better alignment with L2 characteristics
- ✅ No breaking changes or downtime

---

## Troubleshooting

### Issue: Increased Opportunity Expiration
**Symptom:** Many opportunities expiring before execution
**Cause:** TTL too short (5s might be aggressive)
**Fix:**
```yaml
arbitrage_optimized:
  opportunity_ttl: "7s"  # Increase to 7s (28 blocks)
```

### Issue: Filter Missing Opportunities
**Symptom:** Fewer opportunities detected with filter enabled
**Cause:** A DEX address or swap signature is missing from the filter lists
**Fix:**
1. Check filtered transaction logs
2. Identify missed DEX addresses or signatures
3. Add to filter configuration
4. Redeploy
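
When adding signatures, note that the filter compares against `hex.EncodeToString(tx.Data()[:4])`, which is lowercase and has no `0x` prefix. An entry written as `0x38ED1739` would therefore never match. A small normalization step at load time avoids that class of miss. This is a sketch; `normalizeSelector` is a hypothetical helper, and `38ed1739` is used purely as a sample value.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// normalizeSelector converts a configured selector such as "0x38ED1739"
// into the lowercase, prefix-free form the filter's map lookup expects,
// and rejects anything that is not exactly 4 bytes of hex.
func normalizeSelector(s string) (string, error) {
	s = strings.TrimPrefix(strings.ToLower(strings.TrimSpace(s)), "0x")
	raw, err := hex.DecodeString(s)
	if err != nil {
		return "", fmt.Errorf("selector %q is not valid hex: %w", s, err)
	}
	if len(raw) != 4 {
		return "", fmt.Errorf("selector %q must be 4 bytes, got %d", s, len(raw))
	}
	return s, nil
}

func main() {
	for _, in := range []string{"0x38ED1739", "38ed1739", " 0x38ed1739 ", "0x38ed", "swap"} {
		out, err := normalizeSelector(in)
		fmt.Printf("%q -> %q (err: %v)\n", in, out, err)
	}
}
```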

### Issue: High CPU Usage with Filter
**Symptom:** CPU usage higher than expected
**Cause:** Inefficient filter lookups
**Fix:**
```yaml
dex_filter:
  cache_lookups: true
  cache_ttl: "5m"
```

---

## Next Steps After Deployment

1. **Week 1:** Deploy Phase 1, monitor for 7 days
2. **Week 2:** If Phase 1 successful, deploy Phase 2
3. **Week 3:** Monitor combined Phase 1+2 performance
4. **Week 4:** Gather data for Phase 3 decision
5. **Month 2+:** Evaluate Timeboost based on competition

---

## Support & Documentation

- **Research Report:** `docs/L2_MEV_BOT_RESEARCH_REPORT.md`
- **Configuration:** `config/arbitrum_optimized.yaml`
- **Scripts:** `scripts/*-l2-*.sh`
- **Monitoring:** `scripts/watch-l2-metrics.sh`

---

**Status:** ✅ Ready for Phase 1 Deployment
**Risk Level:** 🟢 Low (Non-Breaking Changes)
**Estimated Impact:** 📈 20-30% Performance Improvement