first commit
.taskmaster/tasks/task_012.md

# Task ID: 12

**Title:** Implement Local LLM Provider Integrations

**Status:** pending

**Dependencies:** 11

**Priority:** medium

**Description:** Create adapters for different local LLM providers (Ollama, Gemini CLI, vLLM, and the DeepSeek API) to handle request refinement and response filtering.

**Details:**

Create LLM provider implementations:

```typescript
// providers/ollama.ts
export class OllamaProvider implements LLMProvider {
  constructor(private config: { host: string; model: string }) {}

  async refineQuery(params: RefineParams): Promise<any> {
    const prompt = `You are helping refine a data request.
Tool: ${params.tool}
Original request: ${JSON.stringify(params.originalArgs)}
Context (what the user wants): ${params.context}

Refine this query to be more specific. Output JSON only.`;

    // stream: false makes Ollama return one JSON object instead of the
    // default stream of NDJSON chunks, so response.json() works as expected.
    const response = await fetch(`${this.config.host}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.config.model, prompt, format: 'json', stream: false })
    });
    return JSON.parse((await response.json()).response);
  }

  async filterResponse(params: FilterParams): Promise<string> {
    const prompt = `Filter this data to only include relevant information.
Query: ${JSON.stringify(params.query)}
Data: ${JSON.stringify(params.response).slice(0, 10000)}

Extract only the relevant parts. Be concise. Max ${params.maxTokens} tokens.`;

    const response = await fetch(`${this.config.host}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: this.config.model, prompt, stream: false })
    });
    return (await response.json()).response;
  }
}

// providers/gemini-cli.ts
export class GeminiCliProvider implements LLMProvider {
  async refineQuery(params: RefineParams): Promise<any> {
    // Pass the prompt via stdin rather than interpolating it into the
    // command line, which would be vulnerable to shell injection (see 12.2).
    const { stdout } = await execa('gemini', ['-m', 'gemini-2.0-flash'], {
      input: this.buildPrompt(params)
    });
    return JSON.parse(stdout);
  }
}

// providers/deepseek.ts
export class DeepSeekProvider implements LLMProvider {
  constructor(private apiKey: string) {}

  async refineQuery(params: RefineParams): Promise<any> {
    const response = await fetch('https://api.deepseek.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'deepseek-chat',
        messages: [{ role: 'user', content: this.buildPrompt(params) }]
      })
    });
    return JSON.parse((await response.json()).choices[0].message.content);
  }
}

// Factory
export function createLLMProvider(config: LLMConfig): LLMProvider {
  switch (config.type) {
    case 'ollama': return new OllamaProvider(config);
    case 'gemini-cli': return new GeminiCliProvider();
    case 'deepseek': return new DeepSeekProvider(config.apiKey);
    case 'vllm': return new VLLMProvider(config);
    default: throw new Error(`Unknown LLM provider: ${config.type}`);
  }
}
```

**Test Strategy:**

Unit test each provider with mocked API responses. Integration test against a local Ollama instance. Test fallback behavior when a provider is unavailable. Benchmark the token-usage reduction.

## Subtasks

### 12.1. Implement OllamaProvider with TDD, health checks, and circuit breaker pattern

**Status:** pending

**Dependencies:** None

Create the Ollama LLM provider implementation with a full TDD approach, including health-check endpoint monitoring, a circuit breaker for fault tolerance, and a mock mode for testing without a running Ollama instance.

**Details:**

Create src/providers/ollama.ts implementing the LLMProvider interface from Task 11. Write Vitest tests BEFORE implementation covering:

1. refineQuery() sends the correct POST to /api/generate with model and format: json.
2. filterResponse() handles large responses by truncating input to 10K chars.
3. healthCheck() calls the /api/tags endpoint and returns true if the model exists.
4. The circuit breaker opens after 3 consecutive failures within 30s, stays open for 60s, then half-opens.
5. Timeout handling with AbortController after a configurable duration (default 30s).
6. Mock mode returns deterministic responses when OLLAMA_MOCK=true, for CI/CD.

Implement connection pooling using an undici Agent. Add structured logging for SRE monitoring with fields: model, prompt_tokens, completion_tokens, latency_ms, error_type. Security: sanitize all prompt inputs using the PromptSanitizer from Task 11.4, and validate JSON responses with a Zod schema before parsing. Rate limiting: configurable requests-per-minute with a token bucket algorithm.
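The open/half-open behavior described above (3 consecutive failures within 30s opens the breaker, which stays open for 60s and then allows one trial call) can be sketched as a small state machine. This is an illustrative sketch, not the task's implementation; the class name and the injected clock are assumptions made for testability.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

// Illustrative sketch of the 12.1 breaker: opens after `failureThreshold`
// consecutive failures inside `failureWindowMs`, stays open for `openMs`,
// then lets one trial request through (half-open). The clock is injected
// so tests can advance time deterministically.
class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures: number[] = []; // timestamps of recent consecutive failures
  private openedAt = 0;

  constructor(
    private now: () => number = Date.now,
    private failureThreshold = 3,
    private failureWindowMs = 30_000,
    private openMs = 60_000,
  ) {}

  canRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.openMs) {
      this.state = 'half-open'; // allow one trial request
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = [];
    this.state = 'closed';
  }

  recordFailure(): void {
    const t = this.now();
    // A failed trial call while half-open re-opens the breaker immediately.
    if (this.state === 'half-open') {
      this.state = 'open';
      this.openedAt = t;
      return;
    }
    this.failures = this.failures.filter((f) => t - f <= this.failureWindowMs);
    this.failures.push(t);
    if (this.failures.length >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = t;
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

In the provider, `canRequest()` would gate every fetch to Ollama, with `recordSuccess()`/`recordFailure()` called from the response handler.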
### 12.2. Implement GeminiCliProvider and DeepSeekProvider with security hardening

**Status:** pending

**Dependencies:** 12.1

Create a Gemini CLI provider using subprocess execution with shell-injection prevention, and a DeepSeek API provider with secure API key handling, both following TDD methodology.

**Details:**

Create src/providers/gemini-cli.ts: use execa (not child_process.exec) to prevent shell injection by passing the prompt via a stdin pipe, not command-line arguments. Implement buildPrompt() with template literals and JSON.stringify for safe interpolation. Add timeout handling (default 60s for the CLI). Parse stdout as JSON with Zod validation. Health check: verify the 'gemini' binary exists using the which command.

Create src/providers/deepseek.ts: implement an OpenAI-compatible API client with fetch. Read the API key from config (never log it or include it in prompts). Implement retry with exponential backoff for 429/5xx responses, and a circuit breaker for API unavailability.

Both providers: implement the LLMProvider interface methods refineQuery() and filterResponse(), and add mock modes for testing. Security review: (1) no credentials in logged prompts, (2) validate all API responses before parsing, (3) sanitize user inputs in prompts using the shared PromptSanitizer.
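The retry schedule for 429/5xx responses can be computed as a pure function, which keeps the backoff policy unit-testable apart from the network code. A minimal sketch; the base delay and cap are illustrative assumptions, and production code would usually add random jitter (omitted here to keep the schedule deterministic):

```typescript
// Illustrative backoff schedule for the 12.2 retry logic: the delay doubles
// per attempt and is capped at `capMs`. Jitter is deliberately left out so
// the schedule is deterministic; the base/cap values are assumptions.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  if (attempt < 0) throw new Error('attempt must be >= 0');
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Only the statuses the subtask calls out are retried: 429 and any 5xx.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```

The DeepSeek client would loop: on a retryable status, sleep `backoffDelayMs(attempt)` and retry up to a configured maximum; any other error fails fast into the circuit breaker.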
### 12.3. Implement VLLMProvider with OpenAI-compatible API and batch inference support

**Status:** pending

**Dependencies:** 12.1

Create a vLLM provider supporting the OpenAI-compatible API endpoint, with batch-inference optimization for processing multiple requests efficiently, and configurable model selection.

**Details:**

Create src/providers/vllm.ts implementing the LLMProvider interface. vLLM exposes an OpenAI-compatible endpoint at /v1/completions or /v1/chat/completions; support both completion and chat modes via config. Implement batch inference: when multiple refineQuery/filterResponse calls arrive within the batching window (default 50ms), combine them into a single API call with multiple prompts for better GPU utilization. Configuration: { host: string, model: string, maxTokens: number, temperature: number, batchWindowMs: number }. Health check: call the /health or /v1/models endpoint. Implement request queuing with a configurable max queue size, and a circuit breaker pattern matching OllamaProvider. Add metrics collection: batch_size_histogram, queue_depth_gauge, inference_time_per_request. Security: same prompt sanitization as the other providers. Provide a mock mode for CI/CD testing.
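One way to structure the batching window is to separate the accumulator from the timer that flushes it. A sketch of the accumulator half, with assumed names; in the provider, the first `enqueue()` would arm a `setTimeout(batchWindowMs)` that calls `flush()`, but the timer is left out here so the behavior stays synchronous and testable:

```typescript
// Illustrative accumulator for the 12.3 batching window: prompts that arrive
// before flush() are combined into one batch. Enqueues past maxQueueSize are
// rejected, matching the "configurable max queue size" requirement.
class BatchAccumulator {
  private pending: { prompt: string; resolve: (out: string) => void }[] = [];

  constructor(private maxQueueSize = 64) {}

  enqueue(prompt: string): Promise<string> {
    if (this.pending.length >= this.maxQueueSize) {
      return Promise.reject(new Error('batch queue full'));
    }
    return new Promise((resolve) => this.pending.push({ prompt, resolve }));
  }

  size(): number {
    return this.pending.length;
  }

  // `infer` stands in for the single vLLM call that takes all prompts at once
  // and returns one completion per prompt, in order.
  async flush(infer: (prompts: string[]) => Promise<string[]>): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    if (batch.length === 0) return;
    const outputs = await infer(batch.map((p) => p.prompt));
    batch.forEach((p, i) => p.resolve(outputs[i]));
  }
}
```

Batching this way trades up to `batchWindowMs` of added latency per request for fewer, larger GPU-side inference calls.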
### 12.4. Implement LLM provider factory with configuration validation and provider benchmarking utilities

**Status:** pending

**Dependencies:** 12.1, 12.2, 12.3

Create the factory function and configuration system for instantiating LLM providers, plus benchmarking utilities for data scientists to compare provider performance, quality, and cost.

**Details:**

Create src/providers/factory.ts with a createLLMProvider(config: LLMConfig): LLMProvider function. LLMConfig Zod schema: { type: 'ollama' | 'gemini-cli' | 'deepseek' | 'vllm', ...provider-specific fields }. Validate the config at construction time with descriptive errors.

Create src/utils/benchmark.ts with a ProviderBenchmark class. Methods: runBenchmark(provider, testCases): BenchmarkResult and compareBenchmarks(results[]): ComparisonReport. BenchmarkResult type: { provider: string, testCases: { input, output, latencyMs, inputTokens, outputTokens, qualityScore? }[], avgLatency, p95Latency, totalTokens, estimatedCost? }. Include standard test cases for filtering accuracy: database rows, Slack messages, and Jira tickets with known 'correct' filtered outputs. Quality scoring: compare filtered output against a golden reference using semantic similarity (optionally LLM-as-judge). Export results as JSON and as a markdown table for documentation. Add a CLI command: mcpctl benchmark-providers --providers ollama,deepseek --test-suite standard.
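The task specifies a Zod discriminated union for LLMConfig; the "validate at construction time with descriptive errors" behavior it asks for can be sketched without the dependency. This hand-rolled version is only an illustration of the failure modes the Zod schema must cover; the field sets follow the provider constructors shown earlier, and the error wording is an assumption:

```typescript
// Dependency-free sketch of the 12.4 config validation. The real factory
// uses a Zod discriminated union on `type`; this check only illustrates
// failing fast with an error that names the offending field.
type LLMConfig =
  | { type: 'ollama'; host: string; model: string }
  | { type: 'gemini-cli' }
  | { type: 'deepseek'; apiKey: string }
  | { type: 'vllm'; host: string; model: string };

function validateLLMConfig(raw: unknown): LLMConfig {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('LLMConfig must be an object');
  }
  const cfg = raw as Record<string, unknown>;
  const requireString = (field: string): string => {
    if (typeof cfg[field] !== 'string' || cfg[field] === '') {
      throw new Error(`LLMConfig.${field} must be a non-empty string for type '${String(cfg.type)}'`);
    }
    return cfg[field] as string;
  };
  switch (cfg.type) {
    case 'ollama':
      return { type: 'ollama', host: requireString('host'), model: requireString('model') };
    case 'vllm':
      return { type: 'vllm', host: requireString('host'), model: requireString('model') };
    case 'gemini-cli':
      return { type: 'gemini-cli' };
    case 'deepseek':
      return { type: 'deepseek', apiKey: requireString('apiKey') };
    default:
      throw new Error(`Unknown LLM provider: ${String(cfg.type)}`);
  }
}
```

Running this inside createLLMProvider means a typo'd provider type or a missing apiKey surfaces at startup rather than on the first request.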
### 12.5. Implement security review layer and comprehensive integration tests for all providers

**Status:** pending

**Dependencies:** 12.1, 12.2, 12.3, 12.4

Create security middleware for prompt-injection prevention across all providers, implement rate limiting, add comprehensive integration tests verifying provider interoperability, and document the security controls.

**Details:**

Create src/providers/security.ts with:

1. PromptSanitizer class: detect and neutralize injection patterns ('ignore previous', 'system:', 'assistant:', embedded JSON/XML that could hijack prompts) using regexes plus heuristic scoring.
2. ResponseValidator: validate that LLM outputs match the expected schema; detect and reject responses that contain prompt leakage or injection artifacts.
3. RateLimiter: a token bucket per provider with configurable limits, shared across provider instances.
4. AuditLogger: log all LLM interactions for security review: timestamp, provider, sanitized_prompt (no PII), response_length, flagged_patterns.

Create tests/integration/providers.test.ts: test all 4 providers with the same suite, verifying interface compliance. Create SECURITY_AUDIT.md documenting all security controls, the threat model (prompt injection, data exfiltration, DoS), test coverage, and a manual review checklist. Add to CI: a security-focused test suite that must pass before merge.
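The detection half of the PromptSanitizer described above (regexes plus heuristic scoring) can be sketched from the patterns the subtask lists. The weights and the flagging threshold are illustrative assumptions, and a real sanitizer would also neutralize matches rather than only score them:

```typescript
// Illustrative sketch of the 12.5 PromptSanitizer detection pass. The
// patterns come from the subtask description; the weights and the
// "score >= 3" threshold are assumptions.
const INJECTION_PATTERNS: { name: string; re: RegExp; weight: number }[] = [
  { name: 'ignore-previous', re: /ignore\s+(all\s+)?previous/i, weight: 3 },
  { name: 'system-role', re: /(^|\n)\s*system\s*:/i, weight: 2 },
  { name: 'assistant-role', re: /(^|\n)\s*assistant\s*:/i, weight: 2 },
  { name: 'embedded-tag', re: /<\/?\s*(system|instructions)\b/i, weight: 2 },
];

function scorePromptInjection(input: string): { score: number; flagged: string[] } {
  const flagged: string[] = [];
  let score = 0;
  for (const p of INJECTION_PATTERNS) {
    if (p.re.test(input)) {
      flagged.push(p.name);
      score += p.weight;
    }
  }
  return { score, flagged };
}

// Assumed policy: anything scoring >= 3 is treated as an injection attempt
// and rejected (or neutralized) before the prompt reaches a provider.
function isSuspicious(input: string): boolean {
  return scorePromptInjection(input).score >= 3;
}
```

The `flagged` pattern names feed directly into the AuditLogger's flagged_patterns field.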