# HIGH-002: Race Condition Fixes - Detailed Fix Plan **Issue ID:** HIGH-002 **Category:** Security **Priority:** High **Status:** Not Started **Generated:** October 9, 2025 **Estimate:** 4-5 hours ## Overview This plan addresses multiple race condition vulnerabilities in critical code paths, particularly around shared state access and synchronization. Race conditions can lead to data corruption, inconsistent states, and security vulnerabilities in concurrent environments. ## Affected Files and Areas - `pkg/security/keymanager.go:481,526,531` - Atomic operation consistency - `pkg/arbitrage/service.go` - Shared state protection - `pkg/scanner/concurrent.go` - Worker pool synchronization - `pkg/transport/provider_manager.go` - Connection state management ## Implementation Tasks ### 1. Review All Shared State Access Patterns **Task ID:** HIGH-002.1 **Time Estimate:** 1.5 hours **Dependencies:** None Conduct comprehensive review of shared state access patterns: - Identify all shared variables and data structures - Map access patterns (read/write operations) - Document current synchronization mechanisms - Identify potential race conditions - Assess risk level for each identified race condition **Focus Areas:** - In `pkg/security/keymanager.go`: Review all concurrent access to key data structures - In `pkg/arbitrage/service.go`: Examine order book and pricing data access - In `pkg/scanner/concurrent.go`: Analyze worker state and result handling - In `pkg/transport/provider_manager.go`: Evaluate connection pool management ### 2. Replace Inconsistent Atomic Usage with Proper Synchronization **Task ID:** HIGH-002.2 **Time Estimate:** 1.5 hours **Dependencies:** HIGH-002.1 Fix atomic operation inconsistencies in `pkg/security/keymanager.go:481,526,531`: - Replace inappropriate atomic operations with mutexes where needed - Ensure atomic operations are used correctly for simple operations - Consolidate access patterns to a consistent approach - Add proper synchronization for complex shared state ```go // Example of fixing atomic usage inconsistency type KeyManager struct { mu sync.RWMutex keys map[string]*ecdsa.PrivateKey counter *atomic.Int64 // Use atomic for simple counters } func (km *KeyManager) GetKey(id string) (*ecdsa.PrivateKey, error) { km.mu.RLock() defer km.mu.RUnlock() key, exists := km.keys[id] if !exists { return nil, fmt.Errorf("key not found: %s", id) } return key, nil } func (km *KeyManager) IncrementCounter() { km.counter.Add(1) // Proper atomic operation } ``` ### 3. Add Race Detection Tests to CI Pipeline **Task ID:** HIGH-002.3 **Time Estimate:** 0.5 hours **Dependencies:** HIGH-002.1, HIGH-002.2 Implement race detection in the CI pipeline: - Add `-race` flag to all Go test commands - Configure race detection for integration tests - Set up automated race condition testing - Monitor for race conditions in pull requests ### 4. Implement Proper Read-Write Lock Usage **Task ID:** HIGH-002.4 **Time Estimate:** 1 hour **Dependencies:** HIGH-002.1, HIGH-002.2 Replace basic mutexes with appropriate read-write locks where the usage pattern is predominantly read-access: - In arbitrage service for order book data - In scanner for cached results - In provider manager for connection state - Optimize for read-heavy scenarios ```go // Example implementation of proper RWLock usage type ArbitrageService struct { mu sync.RWMutex orderBooks map[string]*OrderBook prices map[string]*big.Float } func (as *ArbitrageService) GetPrice(pair string) (*big.Float, error) { as.mu.RLock() // Use read lock for read operations defer as.mu.RUnlock() price, exists := as.prices[pair] if !exists { return nil, fmt.Errorf("price not found for pair: %s", pair) } return new(big.Float).Set(price), nil } func (as *ArbitrageService) UpdatePrice(pair string, price *big.Float) error { as.mu.Lock() // Use write lock for updates defer as.mu.Unlock() as.prices[pair] = new(big.Float).Set(price) return nil } ``` ### 5. Conduct Comprehensive Race Condition Testing **Task ID:** HIGH-002.5 **Time Estimate:** 0.5 hours **Dependencies:** HIGH-002.2, HIGH-002.3, HIGH-002.4 Perform stress testing for race conditions: - High-concurrency unit tests - Load testing with concurrent access patterns - Long-running integration tests - Manual verification of synchronized access ## Detailed Implementation Steps ### In `pkg/security/keymanager.go`: ```go import ( "sync" "sync/atomic" ) type KeyManager struct { mu sync.RWMutex keys map[string]*ecdsa.PrivateKey counter int64 // Changed from inconsistent usage // ... other fields } func (km *KeyManager) GetKey(id string) (*ecdsa.PrivateKey, error) { km.mu.RLock() defer km.mu.RUnlock() key, exists := km.keys[id] if !exists { return nil, fmt.Errorf("key not found") } return key, nil } func (km *KeyManager) UpdateCounter() { atomic.AddInt64(&km.counter, 1) // Proper atomic usage } // Fix lines 481, 526, 531 to use appropriate synchronization func (km *KeyManager) ProcessKey(id string) error { km.mu.Lock() defer km.mu.Unlock() key, exists := km.keys[id] if !exists { return fmt.Errorf("key does not exist") } // Process key operations with mutex held // ... return nil } ``` ### In `pkg/arbitrage/service.go`: ```go type ArbitrageService struct { mu sync.RWMutex orderBooks map[string]*OrderBook strategies sync.Map // Use sync.Map for concurrent access // ... other fields } func (as *ArbitrageService) UpdateOrderBook(pair string, book *OrderBook) error { as.mu.Lock() defer as.mu.Unlock() as.orderBooks[pair] = book return nil } func (as *ArbitrageService) GetOrderBook(pair string) (*OrderBook, error) { as.mu.RLock() defer as.mu.RUnlock() book, exists := as.orderBooks[pair] if !exists { return nil, fmt.Errorf("order book not found for pair %s", pair) } return book, nil } ``` ### In `pkg/scanner/concurrent.go`: ```go type ScannerWorkerPool struct { mu sync.Mutex workers []*Worker results chan *ScanResult isActive atomic.Bool // ... other fields } func (swp *ScannerWorkerPool) AddWorker(w *Worker) { swp.mu.Lock() defer swp.mu.Unlock() swp.workers = append(swp.workers, w) } func (swp *ScannerWorkerPool) SubmitResult(result *ScanResult) { // Use non-blocking send or handle channel full select { case swp.results <- result: default: // Handle full channel - log error or implement backup strategy } } ``` ## Testing Strategy - Unit tests with high concurrency - Integration tests with race detection enabled - Stress testing with concurrent access patterns - Manual code review for synchronization logic ## Code Review Checklist - [ ] All shared state is properly synchronized - [ ] Atomic operations used appropriately for simple values - [ ] Read-write locks used for read-heavy scenarios - [ ] Mutexes used for complex state changes - [ ] Race condition tests pass with -race flag - [ ] No deadlocks introduced - [ ] Performance impact is acceptable ## Rollback Strategy If issues arise after deployment: 1. Revert synchronization changes 2. Temporarily run in single-threaded mode for critical operations 3. Monitor performance and stability metrics ## Success Metrics - No race conditions detected with -race flag enabled - All concurrent tests pass consistently - No performance degradation beyond acceptable thresholds - No deadlocks or lock contention issues - Consistent state across all shared resources