# MEDIUM-001: Rate Limiting Enhancement - Detailed Fix Plan

- **Issue ID:** MEDIUM-001
- **Category:** Security
- **Priority:** Medium
- **Status:** Not Started
- **Generated:** October 9, 2025
- **Estimate:** 3-4 hours

## Overview

This plan enhances rate limiting mechanisms to prevent abuse and ensure fair resource usage. The implementation will include sliding window rate limiting, distributed rate limiting support, adaptive rate limiting, and bypass detection capabilities.

## Current Implementation Issues

- Basic rate limiting only, in `pkg/security/keymanager.go:781-823`
- No distributed rate limiting for multiple instances
- Static rate limits that don't adapt to system load
- No detection mechanism for rate limiting bypass attempts

## Implementation Tasks

### 1. Implement Sliding Window Rate Limiting

- **Task ID:** MEDIUM-001.1
- **Time Estimate:** 1.5 hours
- **Dependencies:** None

Replace the basic rate limiting in `pkg/security/keymanager.go:781-823` with a sliding window implementation:

- Implement a sliding window algorithm for more accurate rate limiting
- Track request timestamps per key within the sliding window
- Calculate requests per time unit dynamically
- Maintain accuracy across window boundaries

```go
import (
    "sync"
    "time"
)

// SlidingWindowRateLimiter enforces a per-key cap on the number of
// requests permitted within a rolling time window.
type SlidingWindowRateLimiter struct {
    mu          sync.RWMutex
    windowSize  time.Duration
    maxRequests int
    requests    map[string][]time.Time
}

func NewSlidingWindowRateLimiter(windowSize time.Duration, maxRequests int) *SlidingWindowRateLimiter {
    return &SlidingWindowRateLimiter{
        windowSize:  windowSize,
        maxRequests: maxRequests,
        requests:    make(map[string][]time.Time),
    }
}

// Allow reports whether a request for key is permitted right now,
// recording its timestamp if it is.
func (rl *SlidingWindowRateLimiter) Allow(key string) bool {
    rl.mu.Lock()
    defer rl.mu.Unlock()

    now := time.Now()
    windowStart := now.Add(-rl.windowSize)

    // Drop timestamps that have slid out of the window, reusing the
    // existing backing array to avoid reallocating on every call.
    kept := rl.requests[key][:0]
    for _, reqTime := range rl.requests[key] {
        if reqTime.After(windowStart) {
            kept = append(kept, reqTime)
        }
    }
    rl.requests[key] = kept

    // Admit the request only if the window still has capacity.
    if len(kept) < rl.maxRequests {
        rl.requests[key] = append(kept, now)
        return true
    }
    return false
}

// GetRemaining returns how many more requests key may make in the
// current window.
func (rl *SlidingWindowRateLimiter) GetRemaining(key string) int {
    rl.mu.RLock()
    defer rl.mu.RUnlock()

    windowStart := time.Now().Add(-rl.windowSize)
    count := 0
    for _, reqTime := range rl.requests[key] {
        if reqTime.After(windowStart) {
            count++
        }
    }
    return rl.maxRequests - count
}
```
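
A quick usage sketch; the one-minute window and 100-request ceiling are illustrative values, not settings taken from the existing key manager:

```go
// Allow at most 100 requests per key per minute (illustrative values).
limiter := NewSlidingWindowRateLimiter(time.Minute, 100)

if limiter.Allow("key-123") {
    // proceed with the operation
} else {
    // reject: this key has exhausted its window
}
```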

### 2. Add Distributed Rate Limiting Support

- **Task ID:** MEDIUM-001.2
- **Time Estimate:** 1 hour
- **Dependencies:** MEDIUM-001.1

Implement distributed rate limiting so limits hold across multiple instances:

- Use Redis or similar for shared rate limit state
- Implement a distributed sliding window algorithm
- Handle Redis connection failures gracefully
- Fall back to in-memory limiting if Redis is unavailable

```go
import (
    "fmt"
    "time"

    "github.com/go-redis/redis/v7" // assumes the context-free go-redis v7 API
)

// DistributedRateLimiter shares rate limit state across instances via a
// Redis sorted set per key, falling back to a local limiter when Redis
// is unreachable.
type DistributedRateLimiter struct {
    localLimiter *SlidingWindowRateLimiter
    redisClient  *redis.Client
    windowSize   time.Duration
    maxRequests  int
}

func (drl *DistributedRateLimiter) Allow(key string) bool {
    // Try distributed rate limiting first.
    if drl.redisClient != nil {
        return drl.allowDistributed(key)
    }
    // Fall back to local rate limiting.
    return drl.localLimiter.Allow(key)
}

func (drl *DistributedRateLimiter) allowDistributed(key string) bool {
    now := time.Now().UnixNano()
    windowStart := now - drl.windowSize.Nanoseconds()
    redisKey := "rate_limit:" + key

    // Each request is one sorted-set member scored by its nanosecond
    // timestamp, so trimming by score implements the sliding window.
    pipe := drl.redisClient.Pipeline()

    // Remove entries older than the window.
    pipe.ZRemRangeByScore(redisKey, "0", fmt.Sprintf("%d", windowStart))

    // Record the current request.
    pipe.ZAdd(redisKey, &redis.Z{
        Score:  float64(now),
        Member: fmt.Sprintf("%d", now),
    })

    // Count requests remaining in the window (including this one).
    countCmd := pipe.ZCard(redisKey)

    // Let idle keys expire on their own.
    pipe.Expire(redisKey, drl.windowSize)

    if _, err := pipe.Exec(); err != nil {
        // Fall back to the local limiter on Redis errors.
        return drl.localLimiter.Allow(key)
    }

    count, err := countCmd.Result()
    if err != nil {
        return drl.localLimiter.Allow(key)
    }

    return int(count) <= drl.maxRequests
}
```
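
The plan does not specify a constructor; a minimal sketch, assuming the `NewDistributedRateLimiter` helper name and a locally reachable Redis:

```go
// Hypothetical constructor: wires up the Redis client and the local fallback.
func NewDistributedRateLimiter(client *redis.Client, windowSize time.Duration, maxRequests int) *DistributedRateLimiter {
    return &DistributedRateLimiter{
        localLimiter: NewSlidingWindowRateLimiter(windowSize, maxRequests),
        redisClient:  client,
        windowSize:   windowSize,
        maxRequests:  maxRequests,
    }
}

// Example wiring (address is illustrative):
// client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
// drl := NewDistributedRateLimiter(client, time.Minute, 100)
```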

### 3. Implement Adaptive Rate Limiting

- **Task ID:** MEDIUM-001.3
- **Time Estimate:** 1 hour
- **Dependencies:** MEDIUM-001.1, MEDIUM-001.2

Create adaptive rate limiting based on system load:

- Monitor system resources (CPU, memory, network)
- Adjust rate limits based on current load
- Implement different limits for different user tiers
- Provide configurable load thresholds

```go
// AdaptiveRateLimiter scales the request ceiling with system load.
// SystemMonitor and SystemLoad are assumed helpers; a gopsutil-based
// sketch follows below.
type AdaptiveRateLimiter struct {
    baseLimiter    *SlidingWindowRateLimiter
    systemMonitor  *SystemMonitor
    loadThresholds LoadThresholds

    // Collaborators used by the bypass detection in task 4; the logger
    // and alerting types are project-specific assumptions.
    metrics     *RateLimitMetrics
    logger      Logger
    alertSystem AlertSystem
}

type LoadThresholds struct {
    lowLoad  int     // max requests per window when system load is low
    highLoad int     // max requests per window when system load is high
    cpuHigh  float64 // CPU percentage considered high
    memHigh  float64 // memory percentage considered high
}

func (arl *AdaptiveRateLimiter) Allow(key string) bool {
    load := arl.systemMonitor.GetSystemLoad()

    // Adjust the shared limiter's ceiling in place. A fresh limiter per
    // call would discard the request history and never throttle anything.
    arl.baseLimiter.mu.Lock()
    arl.baseLimiter.maxRequests = arl.calculateAdjustedLimit(load)
    arl.baseLimiter.mu.Unlock()

    allowed := arl.baseLimiter.Allow(key)
    arl.detectBypassAttempts(key, allowed)
    return allowed
}

func (arl *AdaptiveRateLimiter) calculateAdjustedLimit(load *SystemLoad) int {
    // Under high load, tighten the rate limit.
    if load.CPU > arl.loadThresholds.cpuHigh || load.Memory > arl.loadThresholds.memHigh {
        return arl.loadThresholds.highLoad
    }
    return arl.loadThresholds.lowLoad
}
```
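
`SystemMonitor` and `SystemLoad` are referenced above but not defined in this plan; a minimal sketch, assuming the gopsutil library for resource sampling:

```go
import (
    "github.com/shirou/gopsutil/v3/cpu"
    "github.com/shirou/gopsutil/v3/mem"
)

// SystemLoad captures the resource readings the limiter adjusts on.
type SystemLoad struct {
    CPU    float64 // CPU utilization percentage
    Memory float64 // memory utilization percentage
}

type SystemMonitor struct{}

// GetSystemLoad samples current CPU and memory usage. Errors degrade to
// zero readings, which leaves the generous low-load limit in effect.
func (sm *SystemMonitor) GetSystemLoad() *SystemLoad {
    load := &SystemLoad{}
    if percents, err := cpu.Percent(0, false); err == nil && len(percents) > 0 {
        load.CPU = percents[0]
    }
    if vm, err := mem.VirtualMemory(); err == nil {
        load.Memory = vm.UsedPercent
    }
    return load
}
```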

### 4. Add Rate Limiting Bypass Detection and Alerting

- **Task ID:** MEDIUM-001.4
- **Time Estimate:** 0.5 hours
- **Dependencies:** MEDIUM-001.1, MEDIUM-001.2, MEDIUM-001.3

Implement monitoring for rate limiting bypass attempts:

- Detect unusual patterns that might indicate bypass attempts
- Log suspicious activity for analysis
- Send alerts for potential bypass attempts
- Track statistics on bypass detection

```go
// detectBypassAttempts inspects each rate limit decision for signs of
// deliberate probing. Called from Allow with the decision result.
func (arl *AdaptiveRateLimiter) detectBypassAttempts(key string, allowed bool) {
    if allowed {
        return
    }

    // The request was blocked: record it for analysis.
    arl.metrics.IncRateLimitExceeded(key)

    // Check for a pattern of rapid consecutive blocked requests.
    if arl.isBypassPattern(key) {
        arl.logger.Warn("Potential rate limit bypass attempt detected",
            "key", key,
            "timestamp", time.Now().Unix(),
        )

        arl.alertSystem.SendAlert("Rate Limit Bypass Attempt", map[string]interface{}{
            "key":       key,
            "timestamp": time.Now().Unix(),
        })
    }
}

func (arl *AdaptiveRateLimiter) isBypassPattern(key string) bool {
    // Pattern detection could also consider:
    // - requests from multiple IPs using the same key
    // - requests with unusual timing patterns
    // For now, flag sustained hammering of a blocked key.
    return arl.metrics.GetBlockedRequestsPerMinute(key) > 50
}
```
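
The `metrics` collaborator is likewise assumed; a minimal in-memory sketch that supports the two calls used above:

```go
import (
    "sync"
    "time"
)

// RateLimitMetrics counts blocked requests per key in one-minute buckets.
// Old buckets are never pruned here; a production version would expire them.
type RateLimitMetrics struct {
    mu      sync.Mutex
    blocked map[string]map[int64]int // key -> minute bucket -> count
}

func NewRateLimitMetrics() *RateLimitMetrics {
    return &RateLimitMetrics{blocked: make(map[string]map[int64]int)}
}

func (m *RateLimitMetrics) IncRateLimitExceeded(key string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    bucket := time.Now().Unix() / 60
    if m.blocked[key] == nil {
        m.blocked[key] = make(map[int64]int)
    }
    m.blocked[key][bucket]++
}

func (m *RateLimitMetrics) GetBlockedRequestsPerMinute(key string) int {
    m.mu.Lock()
    defer m.mu.Unlock()
    return m.blocked[key][time.Now().Unix()/60]
}
```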

## Integration with Key Manager

### Enhanced Key Manager with Rate Limiting

```go
type KeyManager struct {
    // ... existing fields
    rateLimiter *DistributedRateLimiter
    // ... other fields
}

func (km *KeyManager) SignTransaction(keyID string, tx *types.Transaction) (*types.Transaction, error) {
    // Check rate limit before signing
    if allowed := km.rateLimiter.Allow(keyID); !allowed {
        km.logger.Warn("Rate limit exceeded for key", "keyID", keyID)
        return nil, fmt.Errorf("rate limit exceeded for key %s", keyID)
    }

    // Perform the signing operation
    // ... existing signing logic
}
```
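
Wiring this together at construction time might look like the following sketch; the constructor signature and config field names are assumptions, not the project's actual API:

```go
// Hypothetical wiring; RateLimitWindow and RateLimitMaxRequests are
// assumed config fields, not existing settings.
func NewKeyManager(cfg Config, redisClient *redis.Client, logger Logger) *KeyManager {
    return &KeyManager{
        rateLimiter: NewDistributedRateLimiter(
            redisClient,
            cfg.RateLimitWindow,      // e.g. time.Minute
            cfg.RateLimitMaxRequests, // e.g. 100 per key per window
        ),
        logger: logger,
        // ... other fields
    }
}
```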

## Testing Strategy

- Unit tests for the sliding window algorithm (see the example after this list)
- Integration tests for distributed rate limiting
- Load testing to verify adaptive behavior
- Negative tests for bypass detection
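
A minimal unit test sketch for the sliding window limiter; the window size and counts are illustrative:

```go
import (
    "testing"
    "time"
)

func TestSlidingWindowRateLimiter(t *testing.T) {
    // Two requests allowed per 100ms window.
    rl := NewSlidingWindowRateLimiter(100*time.Millisecond, 2)

    if !rl.Allow("k") || !rl.Allow("k") {
        t.Fatal("first two requests should be allowed")
    }
    if rl.Allow("k") {
        t.Fatal("third request inside the window should be blocked")
    }

    // After the window slides past the first requests, capacity returns.
    time.Sleep(110 * time.Millisecond)
    if !rl.Allow("k") {
        t.Fatal("request after the window elapsed should be allowed")
    }
}
```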

## Code Review Checklist

- Sliding window algorithm implemented correctly
- Distributed rate limiting supports multiple instances
- Adaptive rate limiting responds to system load
- Bypass detection and alerting implemented
- Fallback mechanisms for Redis failures
- Performance impact is acceptable
- Tests cover all scenarios

## Rollback Strategy

If issues arise after deployment:

1. Disable distributed rate limiting (use local only)
2. Revert to the basic rate limiting implementation
3. Monitor performance and request patterns

## Success Metrics

- Accurate rate limiting with sliding window
- Distributed rate limiting working across instances
- Adaptive rate limiting responding to system load
- Rate limit bypass attempts detected and logged
- No performance degradation beyond acceptable limits