LLM Integration Plan
Overview
This document outlines the implementation plan for integrating multiple Large Language Model (LLM) providers into the Copper Tone Technologies platform. The system will allow users to configure API keys for different providers and interact with AI assistants through a unified chat interface.
Supported LLM Providers
- OpenAI (ChatGPT) - GPT-4, GPT-4o, GPT-3.5-turbo
- Google Gemini - Gemini Pro, Gemini Flash
- Anthropic Claude - Claude Sonnet, Opus, Haiku
- Qwen AI - Qwen Max, Qwen Plus
- HuggingFace - Meta Llama 3.3, Mistral, and other open-source models
Architecture
Frontend Components
1. ChatBox Component (components/ui/ChatBox.vue)
- Status: ✅ Created
- Features:
- Floating chat button (bottom-right corner)
- Expandable/collapsible chat window
- Provider selection dropdown
- Message history display
- Typing indicator
- Message input with send button
- Smooth animations and transitions
- Events:
- open - Triggered when chat is opened
- close - Triggered when chat is closed
- providerChange - Emitted when user switches providers
- messageSent - Emitted when user sends a message
- Exposed Methods:
- addAssistantMessage(content: string) - Add AI response to chat
- stopTyping() - Stop typing indicator
2. LLM Settings View (views/LLMSettingsView.vue)
- Status: ✅ Created
- Route: /settings/llm (protected, requires authentication)
- Features:
- Grid display of provider cards
- Configuration status indicators
- API key management
- Model selection
- Temperature and max tokens configuration
- Secure API key storage
- Information section about API usage
3. LLM Provider Card (components/views/LLMSettings/LLMProviderCard.vue)
- Status: ✅ Created
- Features:
- Provider icon and description
- Configuration status badge
- Secure API key input (show/hide toggle)
- Model selection with default values
- Advanced settings (temperature, max tokens)
- Edit/Save/Delete actions
- Confirmation dialogs for destructive actions
State Management
LLM Store (stores/llm.ts)
- Status: ✅ Created
- State:
- configs - API configurations for each provider
- isConfigured - Boolean flags indicating which providers are configured
- chatHistory - Message history per provider
- error - Error messages
- isLoading - Loading state
- Actions:
- loadConfigs() - Load configurations from backend
- saveConfig(provider, config) - Save provider configuration
- deleteConfig(provider) - Delete provider configuration
- sendMessage(provider, message) - Send message to LLM
- clearHistory(provider) - Clear chat history
- getDefaultModel(provider) - Get default model for provider
Backend Service
LLM Service (Go)
- Status: ⏳ Needs Implementation
- Location: backend/functions/llm-service/
- Port: 8085
- Database Tables:
CREATE TABLE llm_configs (
  id SERIAL PRIMARY KEY,
  user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  provider VARCHAR(50) NOT NULL, -- 'openai', 'gemini', 'claude', 'qwen', 'huggingface'
  api_key_encrypted TEXT NOT NULL, -- Encrypted API key
  model VARCHAR(100),
  temperature DECIMAL(3, 2) DEFAULT 0.7,
  max_tokens INTEGER DEFAULT 2048,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW(),
  UNIQUE(user_id, provider)
);
CREATE TABLE llm_chat_history (
  id SERIAL PRIMARY KEY,
  user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  provider VARCHAR(50) NOT NULL,
  role VARCHAR(20) NOT NULL, -- 'user', 'assistant', 'system'
  content TEXT NOT NULL,
  tokens_used INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);
-- PostgreSQL does not support inline INDEX clauses inside CREATE TABLE; create the index separately:
CREATE INDEX idx_user_provider ON llm_chat_history(user_id, provider);
API Endpoints
1. Get All Configurations
GET /llm/configs
Authorization: Bearer {jwt_token}
Response:
{
"configs": {
"openai": {
"model": "gpt-4o",
"temperature": 0.7,
"maxTokens": 2048,
"apiKey": "sk-***" // Masked for security
},
"gemini": { ... }
}
}
2. Save Configuration
POST /llm/config/{provider}
Authorization: Bearer {jwt_token}
Content-Type: application/json
Request:
{
"apiKey": "sk-...",
"model": "gpt-4o",
"temperature": 0.7,
"maxTokens": 2048
}
Response:
{
"success": true,
"message": "Configuration saved"
}
3. Delete Configuration
DELETE /llm/config/{provider}
Authorization: Bearer {jwt_token}
Response:
{
"success": true,
"message": "Configuration deleted"
}
4. Send Chat Message
POST /llm/chat/{provider}
Authorization: Bearer {jwt_token}
Content-Type: application/json
Request:
{
"message": "Hello, how are you?",
"history": [
{ "role": "user", "content": "Previous message", "timestamp": "2025-11-24T..." },
{ "role": "assistant", "content": "Previous response", "timestamp": "2025-11-24T..." }
]
}
Response:
{
"response": "I'm doing well, thank you! How can I assist you today?",
"tokensUsed": 45,
"model": "gpt-4o"
}
5. Get Chat History
GET /llm/history/{provider}?limit=50&offset=0
Authorization: Bearer {jwt_token}
Response:
{
"history": [
{
"id": 123,
"role": "user",
"content": "Hello",
"tokensUsed": 2,
"createdAt": "2025-11-24T..."
},
{
"id": 124,
"role": "assistant",
"content": "Hi! How can I help?",
"tokensUsed": 8,
"createdAt": "2025-11-24T..."
}
],
"total": 150
}
6. Clear Chat History
DELETE /llm/history/{provider}
Authorization: Bearer {jwt_token}
Response:
{
"success": true,
"message": "Chat history cleared"
}
Provider-Specific Implementation
1. OpenAI Integration
Library: github.com/sashabaranov/go-openai
Configuration:
client := openai.NewClient(apiKey)
resp, err := client.CreateChatCompletion(
context.Background(),
openai.ChatCompletionRequest{
Model: config.Model,
Messages: messages,
Temperature: config.Temperature,
MaxTokens: config.MaxTokens,
},
)
Default Model: gpt-4o
API Documentation: https://platform.openai.com/docs/api-reference
2. Google Gemini Integration
Library: github.com/google/generative-ai-go
Configuration:
client, err := genai.NewClient(ctx, option.WithAPIKey(apiKey))
model := client.GenerativeModel(config.Model)
model.SetTemperature(config.Temperature)
model.SetMaxOutputTokens(int32(config.MaxTokens))
resp, err := model.GenerateContent(ctx, genai.Text(message))
Default Model: gemini-2.0-flash-exp
API Documentation: https://ai.google.dev/docs
3. Anthropic Claude Integration
Library: github.com/anthropics/anthropic-sdk-go
Configuration:
// option is "github.com/anthropics/anthropic-sdk-go/option"
client := anthropic.NewClient(option.WithAPIKey(apiKey))
resp, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
    Model:       anthropic.Model(config.Model),
    Messages:    messages,
    Temperature: anthropic.Float(config.Temperature),
    MaxTokens:   int64(config.MaxTokens),
})
Default Model: claude-sonnet-4-5-20250929
API Documentation: https://docs.anthropic.com/claude/reference
4. Qwen AI Integration
Library: HTTP client with API calls to DashScope
Configuration:
// Using DashScope API
url := "https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation"
// Custom HTTP request with API key in Authorization header
Default Model: qwen-max
API Documentation: https://help.aliyun.com/zh/dashscope/
5. HuggingFace Integration
Library: HTTP client with Inference API
Configuration:
url := fmt.Sprintf("https://api-inference.huggingface.co/models/%s", config.Model)
// Custom HTTP request with API key in Authorization header
Default Model: meta-llama/Llama-3.3-70B-Instruct
API Documentation: https://huggingface.co/docs/api-inference/index
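Since Qwen and HuggingFace are called over raw HTTP rather than through an SDK, the service only needs to build a signed JSON request. A sketch for the HuggingFace case, assuming the public Inference API's {"inputs": ...} payload shape; newHFRequest is a hypothetical helper and response parsing is omitted:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newHFRequest builds a POST request for the HuggingFace Inference API
// with the API key in the Authorization header, as described above.
func newHFRequest(apiKey, model, prompt string) (*http.Request, error) {
	url := fmt.Sprintf("https://api-inference.huggingface.co/models/%s", model)
	body, err := json.Marshal(map[string]string{"inputs": prompt})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, _ := newHFRequest("hf_xxx", "meta-llama/Llama-3.3-70B-Instruct", "Hello")
	fmt.Println(req.URL.Host) // api-inference.huggingface.co
}
```

The DashScope (Qwen) integration follows the same pattern with a different URL and request body.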
Security Considerations
API Key Encryption
- API keys must be encrypted at rest using AES-256-GCM
- Encryption key stored in environment variable ENCRYPTION_KEY
- Keys decrypted only when needed for API calls
- Never return unencrypted API keys to frontend (mask as sk-***)
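The encrypt/decrypt pair can be built entirely from Go's standard library. A minimal sketch, assuming the 64-character hex ENCRYPTION_KEY decodes to a 32-byte AES-256 key; encryptAPIKey and decryptAPIKey are illustrative names:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// encryptAPIKey seals the plaintext with AES-256-GCM. The random nonce is
// prepended to the ciphertext so decryptAPIKey can recover it later.
func encryptAPIKey(key []byte, plaintext string) (string, error) {
	block, err := aes.NewCipher(key) // key must be 32 bytes for AES-256
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, []byte(plaintext), nil)
	return hex.EncodeToString(sealed), nil
}

// decryptAPIKey reverses encryptAPIKey: split off the nonce, then open.
func decryptAPIKey(key []byte, encoded string) (string, error) {
	sealed, err := hex.DecodeString(encoded)
	if err != nil {
		return "", err
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return "", err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return "", err
	}
	if len(sealed) < gcm.NonceSize() {
		return "", fmt.Errorf("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	pt, err := gcm.Open(nil, nonce, ct, nil)
	if err != nil {
		return "", err
	}
	return string(pt), nil
}

func main() {
	key := make([]byte, 32) // in production, hex-decode ENCRYPTION_KEY instead
	enc, _ := encryptAPIKey(key, "sk-demo")
	dec, _ := decryptAPIKey(key, enc)
	fmt.Println(dec == "sk-demo") // true
}
```

GCM authenticates as well as encrypts, so a tampered ciphertext fails in gcm.Open rather than decrypting to garbage.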
Rate Limiting
- Implement per-user rate limiting (e.g., 100 requests/hour)
- Prevent abuse of expensive API calls
- Return 429 status code when rate limit exceeded
Input Validation
- Validate message content (max length, sanitize HTML)
- Validate temperature (0.0 - 2.0)
- Validate max tokens (1 - 32000)
- Sanitize user inputs to prevent injection attacks
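These checks collapse into one validation function on the chat endpoint. A sketch using only the standard library; the 8000-character message cap is an assumed value, not something the plan has fixed:

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

const maxMessageLen = 8000 // assumed limit; tune per provider context window

// validateChatInput applies the checks listed above: non-empty message,
// length cap, temperature and max-token ranges, and HTML escaping so stored
// history cannot be replayed into the UI as markup.
func validateChatInput(message string, temperature float64, maxTokens int) (string, error) {
	msg := strings.TrimSpace(message)
	if msg == "" {
		return "", fmt.Errorf("message must not be empty")
	}
	if len(msg) > maxMessageLen {
		return "", fmt.Errorf("message exceeds %d characters", maxMessageLen)
	}
	if temperature < 0.0 || temperature > 2.0 {
		return "", fmt.Errorf("temperature must be between 0.0 and 2.0")
	}
	if maxTokens < 1 || maxTokens > 32000 {
		return "", fmt.Errorf("maxTokens must be between 1 and 32000")
	}
	return html.EscapeString(msg), nil
}

func main() {
	out, err := validateChatInput("<b>hello</b>", 0.7, 2048)
	fmt.Println(out, err)
}
```

Escaping at write time is one defensible choice; escaping at render time in the Vue layer works too, as long as exactly one layer owns it.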
Access Control
- Only authenticated users can access LLM features
- Users can only access their own configurations
- JWT token validation on all endpoints
- RBAC: All roles (CLIENT, STAFF, ADMIN) can use chatbot
Cost Management
Token Tracking
- Record tokens used for each request
- Display usage statistics in settings
- Optional: Set per-user token limits
Cost Estimation
- Calculate estimated costs based on provider pricing
- Display warnings when approaching limits
- Provider pricing (approximate):
- OpenAI GPT-4o: $2.50/$10.00 per 1M input/output tokens
- Gemini Flash: Free tier available; $0.075/$0.30 per 1M input/output tokens
- Claude Sonnet: $3.00/$15.00 per 1M input/output tokens
- Qwen: Varies by region
- HuggingFace: Free for limited usage, paid tiers available
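With tokens recorded per request, the estimate is simple arithmetic over a rate table. A sketch seeded with the approximate figures above — these rates will drift as providers change pricing, and Qwen/HuggingFace deliberately return no estimate:

```go
package main

import "fmt"

// pricing holds per-1M-token USD rates (input, output).
type pricing struct {
	inputPerM  float64
	outputPerM float64
}

// rates copies the approximate figures listed above; refresh periodically.
var rates = map[string]pricing{
	"openai": {2.50, 10.00}, // GPT-4o
	"gemini": {0.075, 0.30}, // Gemini Flash, paid tier
	"claude": {3.00, 15.00}, // Claude Sonnet
}

// estimateCost converts token counts into an approximate USD cost.
// ok is false when the provider has no fixed published rate.
func estimateCost(provider string, inputTokens, outputTokens int) (cost float64, ok bool) {
	p, ok := rates[provider]
	if !ok {
		return 0, false
	}
	return float64(inputTokens)/1e6*p.inputPerM +
		float64(outputTokens)/1e6*p.outputPerM, true
}

func main() {
	cost, _ := estimateCost("openai", 1000, 500)
	fmt.Printf("$%.6f\n", cost) // $0.007500
}
```

The same function can drive the "approaching limit" warning by comparing a running monthly total against a per-user budget.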
Testing Plan
Unit Tests
- Test LLM config CRUD operations
- Test API key encryption/decryption
- Test message history storage
- Test rate limiting logic
Integration Tests
- Test each provider integration with test API keys
- Verify error handling for invalid keys
- Test chat completion flow end-to-end
E2E Tests (Cypress)
- Test opening/closing chatbox
- Test provider selection
- Test sending messages
- Test configuration management in settings
Deployment
Environment Variables
# LLM Service
LLM_SERVICE_PORT=8085
ENCRYPTION_KEY=<64-character-hex-string>
# Database
DB_HOST=db
DB_PORT=5432
DB_USER=user
DB_PASSWORD=password
DB_NAME=coppertone_db
# JWT
JWT_SECRET=<same-as-auth-service>
podman-compose.yml
llm-service:
build:
context: ./backend/functions/llm-service
dockerfile: Containerfile
ports:
- "8085:8080"
environment:
- DB_HOST=db
- DB_USER=user
- DB_PASSWORD=password
- DB_NAME=coppertone_db
- JWT_SECRET=${JWT_SECRET}
- ENCRYPTION_KEY=${ENCRYPTION_KEY}
depends_on:
- db
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
Migration
-- migrations/005_create_llm_tables.up.sql
CREATE TABLE IF NOT EXISTS llm_configs (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
provider VARCHAR(50) NOT NULL,
api_key_encrypted TEXT NOT NULL,
model VARCHAR(100),
temperature DECIMAL(3, 2) DEFAULT 0.7,
max_tokens INTEGER DEFAULT 2048,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
UNIQUE(user_id, provider)
);
CREATE INDEX idx_llm_configs_user ON llm_configs(user_id);
CREATE TABLE IF NOT EXISTS llm_chat_history (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
provider VARCHAR(50) NOT NULL,
role VARCHAR(20) NOT NULL,
content TEXT NOT NULL,
tokens_used INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_llm_history_user_provider ON llm_chat_history(user_id, provider, created_at DESC);
Implementation Checklist
Frontend (Completed)
- ChatBox component created
- LLM Settings view created
- LLM Provider Card component created
- LLM store (Pinia) created
- Router updated with settings route
- Add chatbox to dashboard layouts
- Add "AI Assistant Settings" link to dashboard navigation
- Test UI components locally
Backend (To Do)
- Create backend/functions/llm-service/ directory
- Initialize Go module
- Implement API key encryption/decryption
- Create database migration (005)
- Implement GET /llm/configs endpoint
- Implement POST /llm/config/{provider} endpoint
- Implement DELETE /llm/config/{provider} endpoint
- Implement POST /llm/chat/{provider} endpoint with all providers
- Implement GET /llm/history/{provider} endpoint
- Implement DELETE /llm/history/{provider} endpoint
- Add rate limiting middleware
- Add input validation
- Create Containerfile
- Update podman-compose.yml
- Add health check endpoint
- Write unit tests
- Write integration tests
Deployment
- Add environment variables to deployment
- Test in staging environment
- Update PRODUCTION_CHECKLIST.md
- Document API endpoints in API documentation
Future Enhancements
- Streaming Responses: Implement Server-Sent Events (SSE) for real-time streaming
- File Attachments: Allow users to upload files for analysis
- Conversation Management: Save and organize multiple conversation threads
- Prompt Templates: Pre-built prompts for common tasks
- Multi-Model Comparison: Send same message to multiple models and compare responses
- Custom System Prompts: Allow users to set custom system prompts per provider
- Usage Analytics Dashboard: Visualize token usage and costs over time
- Admin Monitoring: ADMIN users can see platform-wide LLM usage statistics
Resources
- OpenAI API: https://platform.openai.com/docs
- Google Gemini: https://ai.google.dev/docs
- Anthropic Claude: https://docs.anthropic.com/claude/reference
- Qwen AI: https://help.aliyun.com/zh/dashscope/
- HuggingFace: https://huggingface.co/docs/api-inference/index
Document Version: 1.0.0
Last Updated: 2025-11-24
Status: Frontend Complete, Backend Planned