# LLM Integration Plan

## Overview

This document outlines the implementation plan for integrating multiple Large Language Model (LLM) providers into the Copper Tone Technologies platform. The system will allow users to configure API keys for different providers and interact with AI assistants through a unified chat interface.

## Supported LLM Providers

1. **OpenAI (ChatGPT)** - GPT-4, GPT-4o, GPT-3.5-turbo
2. **Google Gemini** - Gemini Pro, Gemini Flash
3. **Anthropic Claude** - Claude Sonnet, Opus, Haiku
4. **Qwen AI** - Qwen Max, Qwen Plus
5. **HuggingFace** - Meta Llama 3.3, Mistral, and other open-source models
## Architecture

### Frontend Components

#### 1. ChatBox Component (`components/ui/ChatBox.vue`)
- **Status**: ✅ Created
- **Features**:
  - Floating chat button (bottom-right corner)
  - Expandable/collapsible chat window
  - Provider selection dropdown
  - Message history display
  - Typing indicator
  - Message input with send button
  - Smooth animations and transitions
- **Events**:
  - `open` - Triggered when chat is opened
  - `close` - Triggered when chat is closed
  - `providerChange` - Emitted when user switches providers
  - `messageSent` - Emitted when user sends a message
- **Exposed Methods**:
  - `addAssistantMessage(content: string)` - Add AI response to chat
  - `stopTyping()` - Stop typing indicator
#### 2. LLM Settings View (`views/LLMSettingsView.vue`)
- **Status**: ✅ Created
- **Route**: `/settings/llm` (protected, requires authentication)
- **Features**:
  - Grid display of provider cards
  - Configuration status indicators
  - API key management
  - Model selection
  - Temperature and max tokens configuration
  - Secure API key storage
  - Information section about API usage
#### 3. LLM Provider Card (`components/views/LLMSettings/LLMProviderCard.vue`)
- **Status**: ✅ Created
- **Features**:
  - Provider icon and description
  - Configuration status badge
  - Secure API key input (show/hide toggle)
  - Model selection with default values
  - Advanced settings (temperature, max tokens)
  - Edit/Save/Delete actions
  - Confirmation dialogs for destructive actions
### State Management

#### LLM Store (`stores/llm.ts`)
- **Status**: ✅ Created
- **State**:
  - `configs` - API configurations for each provider
  - `isConfigured` - Boolean flags indicating which providers are configured
  - `chatHistory` - Message history per provider
  - `error` - Error messages
  - `isLoading` - Loading state
- **Actions**:
  - `loadConfigs()` - Load configurations from backend
  - `saveConfig(provider, config)` - Save provider configuration
  - `deleteConfig(provider)` - Delete provider configuration
  - `sendMessage(provider, message)` - Send message to LLM
  - `clearHistory(provider)` - Clear chat history
  - `getDefaultModel(provider)` - Get default model for provider
### Backend Service

#### LLM Service (Go)
- **Status**: ⏳ Needs Implementation
- **Location**: `backend/functions/llm-service/`
- **Port**: 8085
- **Database Tables**:

```sql
CREATE TABLE llm_configs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    provider VARCHAR(50) NOT NULL, -- 'openai', 'gemini', 'claude', 'qwen', 'huggingface'
    api_key_encrypted TEXT NOT NULL, -- Encrypted API key
    model VARCHAR(100),
    temperature DECIMAL(3, 2) DEFAULT 0.7,
    max_tokens INTEGER DEFAULT 2048,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(user_id, provider)
);

CREATE TABLE llm_chat_history (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    provider VARCHAR(50) NOT NULL,
    role VARCHAR(20) NOT NULL, -- 'user', 'assistant', 'system'
    content TEXT NOT NULL,
    tokens_used INTEGER,
    created_at TIMESTAMP DEFAULT NOW()
);

-- PostgreSQL does not support inline INDEX clauses inside CREATE TABLE
-- (that is MySQL syntax); create the index as a separate statement,
-- matching the migration in the Deployment section.
CREATE INDEX idx_user_provider ON llm_chat_history (user_id, provider);
```
#### API Endpoints

##### 1. Get All Configurations
```
GET /llm/configs
Authorization: Bearer {jwt_token}

Response:
{
  "configs": {
    "openai": {
      "model": "gpt-4o",
      "temperature": 0.7,
      "maxTokens": 2048,
      "apiKey": "sk-***" // Masked for security
    },
    "gemini": { ... }
  }
}
```
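The masking shown in the response above can be done with a small helper before configs are serialized. This is a sketch; `maskAPIKey` and its 3-character prefix length are illustrative assumptions, not part of the plan.

```go
package main

import "fmt"

// maskAPIKey keeps the provider prefix and the last four characters of
// a stored key so the UI can show e.g. "sk-***1234" without exposing it.
// The 3-character prefix length is an assumption for illustration.
func maskAPIKey(key string) string {
	if len(key) <= 7 {
		// Too short to reveal anything safely.
		return "***"
	}
	return key[:3] + "***" + key[len(key)-4:]
}

func main() {
	fmt.Println(maskAPIKey("sk-abcdefgh1234")) // sk-***1234
}
```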
##### 2. Save Configuration
```
POST /llm/config/{provider}
Authorization: Bearer {jwt_token}
Content-Type: application/json

Request:
{
  "apiKey": "sk-...",
  "model": "gpt-4o",
  "temperature": 0.7,
  "maxTokens": 2048
}

Response:
{
  "success": true,
  "message": "Configuration saved"
}
```
##### 3. Delete Configuration
```
DELETE /llm/config/{provider}
Authorization: Bearer {jwt_token}

Response:
{
  "success": true,
  "message": "Configuration deleted"
}
```
##### 4. Send Chat Message
```
POST /llm/chat/{provider}
Authorization: Bearer {jwt_token}
Content-Type: application/json

Request:
{
  "message": "Hello, how are you?",
  "history": [
    { "role": "user", "content": "Previous message", "timestamp": "2025-11-24T..." },
    { "role": "assistant", "content": "Previous response", "timestamp": "2025-11-24T..." }
  ]
}

Response:
{
  "response": "I'm doing well, thank you! How can I assist you today?",
  "tokensUsed": 45,
  "model": "gpt-4o"
}
```
##### 5. Get Chat History
```
GET /llm/history/{provider}?limit=50&offset=0
Authorization: Bearer {jwt_token}

Response:
{
  "history": [
    {
      "id": 123,
      "role": "user",
      "content": "Hello",
      "tokensUsed": 2,
      "createdAt": "2025-11-24T..."
    },
    {
      "id": 124,
      "role": "assistant",
      "content": "Hi! How can I help?",
      "tokensUsed": 8,
      "createdAt": "2025-11-24T..."
    }
  ],
  "total": 150
}
```
##### 6. Clear Chat History
```
DELETE /llm/history/{provider}
Authorization: Bearer {jwt_token}

Response:
{
  "success": true,
  "message": "Chat history cleared"
}
```
## Provider-Specific Implementation

### 1. OpenAI Integration

**Library**: `github.com/sashabaranov/go-openai`

**Configuration**:
```go
client := openai.NewClient(apiKey)
resp, err := client.CreateChatCompletion(
	context.Background(),
	openai.ChatCompletionRequest{
		Model:       config.Model,
		Messages:    messages,
		Temperature: float32(config.Temperature), // request field is float32
		MaxTokens:   config.MaxTokens,
	},
)
```

**Default Model**: `gpt-4o`
**API Documentation**: https://platform.openai.com/docs/api-reference
### 2. Google Gemini Integration

**Library**: `github.com/google/generative-ai-go`

**Configuration**:
```go
client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
model := client.GenerativeModel(config.Model)
model.SetTemperature(float32(config.Temperature)) // setter takes float32
model.SetMaxOutputTokens(int32(config.MaxTokens))
resp, err := model.GenerateContent(ctx, genai.Text(message))
```

**Default Model**: `gemini-2.0-flash-exp`
**API Documentation**: https://ai.google.dev/docs
### 3. Anthropic Claude Integration

**Library**: `github.com/anthropics/anthropic-sdk-go`

**Configuration**:
```go
// The SDK takes its API key via a request option and creates messages
// with Messages.New.
client := anthropic.NewClient(option.WithAPIKey(apiKey))
resp, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
	Model:       anthropic.Model(config.Model),
	Messages:    messages,
	Temperature: anthropic.Float(config.Temperature),
	MaxTokens:   int64(config.MaxTokens),
})
```

**Default Model**: `claude-sonnet-4-5-20250929`
**API Documentation**: https://docs.anthropic.com/claude/reference
### 4. Qwen AI Integration

**Library**: HTTP client with API calls to DashScope

**Configuration**:
```go
// Using DashScope API
url := "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation"
// Custom HTTP request with API key in Authorization header
```

**Default Model**: `qwen-max`
**API Documentation**: https://help.aliyun.com/zh/dashscope/
### 5. HuggingFace Integration

**Library**: HTTP client with Inference API

**Configuration**:
```go
url := fmt.Sprintf("https://api-inference.huggingface.co/models/%s", config.Model)
// Custom HTTP request with API key in Authorization header
```

**Default Model**: `meta-llama/Llama-3.3-70B-Instruct`
**API Documentation**: https://huggingface.co/docs/api-inference/index
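The "custom HTTP request" above can be sketched as follows. The `{"inputs": ...}` request and `[{"generated_text": ...}]` response shapes follow the public text-generation format of the Inference API, but the exact field names should be verified against the current docs; `callHuggingFace` and `inferenceURL` are illustrative names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// inferenceURL builds the Inference API endpoint for a model ID.
func inferenceURL(model string) string {
	return fmt.Sprintf("https://api-inference.huggingface.co/models/%s", model)
}

// callHuggingFace posts a prompt and returns the first generated text.
func callHuggingFace(apiKey, model, prompt string) (string, error) {
	body, err := json.Marshal(map[string]string{"inputs": prompt})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPost, inferenceURL(model), bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	// Text-generation models respond with an array of candidates.
	var out []struct {
		GeneratedText string `json:"generated_text"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out) == 0 {
		return "", fmt.Errorf("empty response from model %s", model)
	}
	return out[0].GeneratedText, nil
}

func main() {
	fmt.Println(inferenceURL("meta-llama/Llama-3.3-70B-Instruct"))
}
```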
## Security Considerations

### API Key Encryption
- API keys must be encrypted at rest using AES-256-GCM
- Encryption key stored in environment variable `ENCRYPTION_KEY`
- Keys decrypted only when needed for API calls
- Never return unencrypted API keys to frontend (mask with `sk-***`)
### Rate Limiting
- Implement per-user rate limiting (e.g., 100 requests/hour)
- Prevent abuse of expensive API calls
- Return 429 status code when rate limit exceeded
### Input Validation
- Validate message content (max length, sanitize HTML)
- Validate temperature (0.0 - 2.0)
- Validate max tokens (1 - 32000)
- Sanitize user inputs to prevent injection attacks
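The range checks above can be collected into one guard that runs before any provider call. The 8000-character message cap is an assumed value for illustration (the plan only requires "max length"), and `validateChatRequest` is a hypothetical name.

```go
package main

import "fmt"

// maxMessageLen is an assumed cap; the plan does not fix a number.
const maxMessageLen = 8000

// validateChatRequest enforces the ranges listed above.
func validateChatRequest(message string, temperature float64, maxTokens int) error {
	if len(message) == 0 || len(message) > maxMessageLen {
		return fmt.Errorf("message must be 1-%d characters", maxMessageLen)
	}
	if temperature < 0.0 || temperature > 2.0 {
		return fmt.Errorf("temperature must be between 0.0 and 2.0")
	}
	if maxTokens < 1 || maxTokens > 32000 {
		return fmt.Errorf("max tokens must be between 1 and 32000")
	}
	return nil
}

func main() {
	fmt.Println(validateChatRequest("Hello", 0.7, 2048))        // <nil>
	fmt.Println(validateChatRequest("Hello", 3.0, 2048) != nil) // true
}
```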
### Access Control
- Only authenticated users can access LLM features
- Users can only access their own configurations
- JWT token validation on all endpoints
- RBAC: All roles (CLIENT, STAFF, ADMIN) can use chatbot
## Cost Management

### Token Tracking
- Record tokens used for each request
- Display usage statistics in settings
- Optional: Set per-user token limits

### Cost Estimation
- Calculate estimated costs based on provider pricing
- Display warnings when approaching limits
- Provider pricing (approximate):
  - OpenAI GPT-4o: $2.50/$10.00 per 1M input/output tokens
  - Gemini Flash: Free tier available, $0.075/$0.30 per 1M tokens
  - Claude Sonnet: $3.00/$15.00 per 1M input/output tokens
  - Qwen: Varies by region
  - HuggingFace: Free for limited usage, paid tiers available
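With the recorded token counts, the estimate is a straight rate multiplication. The sketch below uses the approximate prices from the list above; Qwen and HuggingFace are omitted because their pricing varies, and `estimateCostUSD` is an illustrative name.

```go
package main

import "fmt"

// perMillion holds USD prices per 1M input/output tokens.
type perMillion struct{ input, output float64 }

// Approximate rates from the pricing list above.
var pricing = map[string]perMillion{
	"openai": {2.50, 10.00}, // GPT-4o
	"gemini": {0.075, 0.30}, // Gemini Flash, paid tier
	"claude": {3.00, 15.00}, // Claude Sonnet
}

// estimateCostUSD converts token counts into an estimated charge;
// unknown providers report zero rather than guessing.
func estimateCostUSD(provider string, inputTokens, outputTokens int) float64 {
	r, ok := pricing[provider]
	if !ok {
		return 0
	}
	return (float64(inputTokens)*r.input + float64(outputTokens)*r.output) / 1e6
}

func main() {
	fmt.Printf("$%.4f\n", estimateCostUSD("openai", 10_000, 2_000)) // $0.0450
}
```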
## Testing Plan

### Unit Tests
- Test LLM config CRUD operations
- Test API key encryption/decryption
- Test message history storage
- Test rate limiting logic

### Integration Tests
- Test each provider integration with test API keys
- Verify error handling for invalid keys
- Test chat completion flow end-to-end

### E2E Tests (Cypress)
- Test opening/closing chatbox
- Test provider selection
- Test sending messages
- Test configuration management in settings
## Deployment

### Environment Variables
```bash
# LLM Service
LLM_SERVICE_PORT=8085
ENCRYPTION_KEY=<64-character-hex-string>

# Database
DB_HOST=db
DB_PORT=5432
DB_USER=user
DB_PASSWORD=password
DB_NAME=coppertone_db

# JWT
JWT_SECRET=<same-as-auth-service>
```
### podman-compose.yml
```yaml
llm-service:
  build:
    context: ./backend/functions/llm-service
    dockerfile: Containerfile
  ports:
    - "8085:8080"
  environment:
    - DB_HOST=db
    - DB_USER=user
    - DB_PASSWORD=password
    - DB_NAME=coppertone_db
    - JWT_SECRET=${JWT_SECRET}
    - ENCRYPTION_KEY=${ENCRYPTION_KEY}
  depends_on:
    - db
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
    interval: 30s
    timeout: 10s
    retries: 3
```
### Migration
```sql
-- migrations/005_create_llm_tables.up.sql
CREATE TABLE IF NOT EXISTS llm_configs (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    provider VARCHAR(50) NOT NULL,
    api_key_encrypted TEXT NOT NULL,
    model VARCHAR(100),
    temperature DECIMAL(3, 2) DEFAULT 0.7,
    max_tokens INTEGER DEFAULT 2048,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(user_id, provider)
);

CREATE INDEX idx_llm_configs_user ON llm_configs(user_id);

CREATE TABLE IF NOT EXISTS llm_chat_history (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    provider VARCHAR(50) NOT NULL,
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    tokens_used INTEGER,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_llm_history_user_provider ON llm_chat_history(user_id, provider, created_at DESC);
```
## Implementation Checklist

### Frontend (Mostly Complete)
- [x] ChatBox component created
- [x] LLM Settings view created
- [x] LLM Provider Card component created
- [x] LLM store (Pinia) created
- [x] Router updated with settings route
- [ ] Add chatbox to dashboard layouts
- [ ] Add "AI Assistant Settings" link to dashboard navigation
- [ ] Test UI components locally

### Backend (To Do)
- [ ] Create `backend/functions/llm-service/` directory
- [ ] Initialize Go module
- [ ] Implement API key encryption/decryption
- [ ] Create database migration (005)
- [ ] Implement GET /llm/configs endpoint
- [ ] Implement POST /llm/config/{provider} endpoint
- [ ] Implement DELETE /llm/config/{provider} endpoint
- [ ] Implement POST /llm/chat/{provider} endpoint with all providers
- [ ] Implement GET /llm/history/{provider} endpoint
- [ ] Implement DELETE /llm/history/{provider} endpoint
- [ ] Add rate limiting middleware
- [ ] Add input validation
- [ ] Create Containerfile
- [ ] Update podman-compose.yml
- [ ] Add health check endpoint
- [ ] Write unit tests
- [ ] Write integration tests
### Deployment
- [ ] Add environment variables to deployment
- [ ] Test in staging environment
- [ ] Update PRODUCTION_CHECKLIST.md
- [ ] Document API endpoints in API documentation
## Future Enhancements

1. **Streaming Responses**: Implement Server-Sent Events (SSE) for real-time streaming
2. **File Attachments**: Allow users to upload files for analysis
3. **Conversation Management**: Save and organize multiple conversation threads
4. **Prompt Templates**: Pre-built prompts for common tasks
5. **Multi-Model Comparison**: Send same message to multiple models and compare responses
6. **Custom System Prompts**: Allow users to set custom system prompts per provider
7. **Usage Analytics Dashboard**: Visualize token usage and costs over time
8. **Admin Monitoring**: ADMIN users can see platform-wide LLM usage statistics
## Resources

- OpenAI API: https://platform.openai.com/docs
- Google Gemini: https://ai.google.dev/docs
- Anthropic Claude: https://docs.anthropic.com/claude/reference
- Qwen AI: https://help.aliyun.com/zh/dashscope/
- HuggingFace: https://huggingface.co/docs/api-inference/index

---

**Document Version**: 1.0.0
**Last Updated**: 2025-11-24
**Status**: Frontend Complete, Backend Planned