Task ID: 11
Title: Design Local LLM Proxy Architecture
Status: pending
Dependencies: 1, 3
Priority: high
Description: Design the local proxy component that intercepts MCP requests, uses local LLMs to pre-filter data, and communicates with mcpd.
Details:
Create the local-proxy package architecture:
// src/local-proxy/src/index.ts
// The local proxy acts as an MCP server that Claude connects to.
// It intercepts requests, uses a local LLM for filtering, then forwards to mcpd.
import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { ListToolsRequestSchema, CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// ProxyConfig, LLMProvider, McpdClient, and createLLMProvider are defined
// elsewhere in the package (see subtasks 11.2 and 11.3).

export class McpctlLocalProxy {
  private server: Server;
  private llmProvider: LLMProvider;
  private mcpdClient: McpdClient;

  constructor(config: ProxyConfig) {
    this.server = new Server(
      { name: 'mcpctl-proxy', version: '1.0.0' },
      { capabilities: { tools: {} } }
    );
    this.llmProvider = createLLMProvider(config.llm);
    this.mcpdClient = new McpdClient(config.mcpdUrl);
    this.setupHandlers();
  }

  private setupHandlers() {
    // List available tools from all configured MCP servers.
    // Note: the TypeScript SDK's setRequestHandler takes a request schema,
    // not a method-name string.
    this.server.setRequestHandler(ListToolsRequestSchema, async () => {
      const tools = await this.mcpdClient.listAvailableTools();
      return { tools };
    });

    // Handle tool calls with pre-filtering.
    this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
      const { name, arguments: args } = request.params;

      // Step 1: Use the local LLM to interpret the request.
      const refinedQuery = await this.llmProvider.refineQuery({
        tool: name,
        originalArgs: args,
        context: request.params._context // Custom field: what Claude is looking for
      });

      // Step 2: Forward to mcpd with the refined query.
      const rawResult = await this.mcpdClient.callTool(name, refinedQuery);

      // Step 3: Use the local LLM to filter/summarize the response.
      const filteredResult = await this.llmProvider.filterResponse({
        tool: name,
        query: refinedQuery,
        response: rawResult,
        maxTokens: 2000 // Keep the context window small for Claude
      });

      return { content: [{ type: 'text', text: filteredResult }] };
    });
  }

  async start() {
    const transport = new StdioServerTransport();
    await this.server.connect(transport);
  }
}

// LLM Provider interface
interface LLMProvider {
  refineQuery(params: RefineParams): Promise<any>;
  filterResponse(params: FilterParams): Promise<string>;
}
Architecture flow:

Claude <--stdio--> mcpctl-proxy <--HTTP--> mcpd <---> MCP servers (containers)
                        |
                        v
                   Local LLM (Ollama/Gemini/vLLM)
Test Strategy:
Unit test the request/response transformation. Mock the LLM provider and verify the refinement logic. Integration test against an actual local LLM. Test error handling when the LLM is unavailable.
Subtasks
11.1. Create local-proxy package structure with TDD infrastructure and mock LLM provider
Status: pending
Dependencies: None
Initialize the src/local-proxy directory with clean architecture layers, Vitest configuration, and a comprehensive mock LLM provider for testing without GPU requirements.
Details:
Create src/local-proxy/ with directory structure: src/{handlers,providers,services,middleware,types,utils}. Set up package.json with @modelcontextprotocol/sdk, vitest, and shared workspace dependencies. Configure vitest.config.ts with coverage requirements (>90%). Implement MockLLMProvider class that returns deterministic responses for testing - this is critical for CI/CD pipelines without GPU. Create test fixtures with sample MCP requests/responses for Slack, Jira, and database query scenarios. Include test utilities: createMockMcpRequest(), createMockLLMResponse(), createTestProxyInstance(). The mock provider must support configurable latency simulation and error injection for chaos testing.
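A minimal sketch of the mock provider described above. The RefineParams/FilterParams shapes follow the interface in the task details; the latency-simulation and error-injection options are illustrative, not a fixed API:

```typescript
// Hypothetical mock provider matching the LLMProvider interface sketch;
// returns deterministic output so tests need no GPU.
interface RefineParams { tool: string; originalArgs: unknown; context?: string }
interface FilterParams { tool: string; query: unknown; response: unknown; maxTokens: number }

interface MockOptions {
  latencyMs?: number; // simulated inference delay
  failWith?: Error;   // error injection for chaos testing
}

class MockLLMProvider {
  constructor(private opts: MockOptions = {}) {}

  private async simulate(): Promise<void> {
    if (this.opts.latencyMs) {
      await new Promise((resolve) => setTimeout(resolve, this.opts.latencyMs));
    }
    if (this.opts.failWith) throw this.opts.failWith;
  }

  // Deterministic: echoes the original args, tagged as refined.
  async refineQuery(params: RefineParams): Promise<any> {
    await this.simulate();
    return { ...(params.originalArgs as object), _refinedBy: 'mock' };
  }

  // Deterministic: stringifies and truncates to a rough token budget.
  async filterResponse(params: FilterParams): Promise<string> {
    await this.simulate();
    const text = JSON.stringify(params.response);
    return text.slice(0, params.maxTokens * 4); // ~4 chars/token heuristic
  }
}
```

Because the outputs are pure functions of the inputs, tests can assert exact values, and `failWith` lets chaos tests exercise every error path without touching a real model.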
11.2. Design and implement LLMProvider interface with pluggable adapter architecture
Status: pending
Dependencies: 11.1
Create the abstract LLMProvider interface and adapter factory pattern that allows swapping LLM backends (Ollama, Gemini, vLLM, DeepSeek) without changing proxy logic.
Details:
Define LLMProvider interface in src/types/llm.ts with methods: refineQuery(params: RefineParams): Promise&lt;any&gt;, filterResponse(params: FilterParams): Promise&lt;string&gt;, healthCheck(): Promise&lt;boolean&gt;, getMetrics(): ProviderMetrics. Create LLMProviderFactory that accepts provider configuration and returns appropriate implementation. Design for composability - allow chaining providers (e.g., Ollama for refinement, Gemini for filtering). Include connection pooling interface for providers that support it. Create abstract BaseLLMProvider class with common retry logic, timeout handling, and metrics collection. Define clear error types: LLMUnavailableError, LLMTimeoutError, LLMRateLimitError, PromptInjectionDetectedError.
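The factory pattern above can be sketched as follows. The registry-based lookup and the `ProviderKind`/`ProviderConfig` shapes are assumptions for illustration; only the selection logic is shown, with concrete adapters (Ollama, Gemini, vLLM) left as placeholders:

```typescript
// Sketch of the pluggable provider factory. Concrete adapters register a
// builder function; swapping backends never touches proxy logic.
interface LLMProvider {
  refineQuery(params: { tool: string; originalArgs: unknown }): Promise<any>;
  filterResponse(params: { tool: string; response: unknown; maxTokens: number }): Promise<string>;
  healthCheck(): Promise<boolean>;
}

type ProviderKind = 'ollama' | 'gemini' | 'vllm' | 'mock';

interface ProviderConfig { kind: ProviderKind; baseUrl?: string; model?: string }

class LLMUnavailableError extends Error {}

type ProviderRegistry = Partial<Record<ProviderKind, (config: ProviderConfig) => LLMProvider>>;

function createLLMProvider(config: ProviderConfig, registry: ProviderRegistry): LLMProvider {
  const build = registry[config.kind];
  if (!build) {
    throw new LLMUnavailableError(`no provider registered for '${config.kind}'`);
  }
  return build(config);
}
```

Passing the registry in (rather than hard-coding it) keeps the factory testable and lets the composability goal — e.g. one backend for refinement, another for filtering — fall out of constructing two providers with different configs.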
11.3. Implement MCP SDK server handlers with request/response transformation and validation
Status: pending
Dependencies: 11.1, 11.2
Create the core McpctlLocalProxy class using @modelcontextprotocol/sdk with handlers for tools/list and tools/call, including MCP protocol message validation to prevent malformed requests.
Details:
Implement McpctlLocalProxy in src/index.ts following the architecture from task details. Create setRequestHandler for 'tools/list' that fetches available tools from mcpd and caches them with TTL. Create setRequestHandler for 'tools/call' with three-phase processing: (1) refineQuery phase using LLM, (2) forward to mcpd phase, (3) filterResponse phase using LLM. Implement MCP protocol validation middleware using Zod schemas - validate all incoming JSON-RPC messages against MCP specification before processing. Create McpdClient class in src/services/mcpd-client.ts with HTTP client for mcpd communication, including connection pooling and health checks. Handle stdio transport initialization with proper cleanup on SIGTERM/SIGINT.
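The three-phase processing can be sketched as a plain function decoupled from the SDK transport, which is what makes each phase unit-testable with the mock provider. The `llm` and `mcpd` parameters stand in for the LLMProvider and McpdClient described above:

```typescript
// Sketch of the three-phase tools/call pipeline: refine -> forward -> filter.
interface ToolCall { name: string; args: unknown }

async function handleToolCall(
  call: ToolCall,
  llm: {
    refineQuery(p: { tool: string; originalArgs: unknown }): Promise<any>;
    filterResponse(p: { tool: string; query: unknown; response: unknown; maxTokens: number }): Promise<string>;
  },
  mcpd: { callTool(name: string, args: unknown): Promise<unknown> },
  maxTokens = 2000,
) {
  // Phase 1: the local LLM rewrites the arguments into a tighter query.
  const refined = await llm.refineQuery({ tool: call.name, originalArgs: call.args });

  // Phase 2: forward the refined query to mcpd.
  const raw = await mcpd.callTool(call.name, refined);

  // Phase 3: the local LLM compresses the raw result for Claude's context.
  const text = await llm.filterResponse({
    tool: call.name, query: refined, response: raw, maxTokens,
  });

  // MCP tool-result shape expected by the SDK handler.
  return { content: [{ type: 'text' as const, text }] };
}
```

The SDK handler registered for tools/call then reduces to extracting `name`/`arguments` from the validated request and delegating to this function.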
11.4. Implement security layer with prompt injection prevention and data isolation
Status: pending
Dependencies: 11.2, 11.3
Create security middleware that validates all inputs, prevents prompt injection in LLM queries, ensures no data leakage between users, and sanitizes all MCP protocol messages.
Details:
Create src/middleware/security.ts with: (1) PromptInjectionValidator that scans user inputs for common injection patterns before sending to LLM - detect and reject inputs containing 'ignore previous', 'system:', role-switching attempts. (2) InputSanitizer that validates and sanitizes all tool arguments against expected schemas. (3) ResponseSanitizer that removes potentially sensitive data patterns (API keys, passwords, PII) from LLM-filtered responses before returning to Claude. (4) RequestIsolation middleware ensuring each request has its own context with no shared mutable state - critical for multi-tenant scenarios. Create SECURITY_AUDIT.md documenting all security controls and their test coverage. Implement allowlist-based argument validation for known MCP tools.
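A minimal sketch of the pattern-based injection screen described in (1). The pattern list here is illustrative only — a real validator would need a broader, regularly updated set and should be treated as one layer of defense, not a complete one:

```typescript
// Hypothetical prompt-injection screen: reject inputs matching known
// injection phrasings before they reach the local LLM.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous/i, // "ignore previous instructions"
  /^\s*system\s*:/im,            // attempts to open a system turn
  /you\s+are\s+now\s+/i,         // role-switching attempts
];

class PromptInjectionDetectedError extends Error {}

function assertNoInjection(input: string): void {
  for (const pattern of INJECTION_PATTERNS) {
    if (pattern.test(input)) {
      throw new PromptInjectionDetectedError(`input matched ${pattern}`);
    }
  }
}
```

Throwing a dedicated error type lets the metrics layer count rejections separately (error_total by error_type) and lets the handler return a clean refusal instead of forwarding tainted input.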
11.5. Implement configurable filtering strategies with per-profile aggressiveness settings
Status: pending
Dependencies: 11.2, 11.3
Create composable filtering strategy system that allows data scientists to configure filtering aggressiveness per MCP server type, supporting different needs for raw SQL vs pre-aggregated dashboards.
Details:
Design FilterStrategy interface in src/services/filter-engine.ts with methods: shouldFilter(response: McpResponse): boolean, filter(response: McpResponse, config: FilterConfig): FilteredResponse, getAggressiveness(): number. Implement AggressiveFilter for raw SQL results (summarize, limit rows, remove redundant columns), MinimalFilter for pre-aggregated data (pass-through with size limits only), and AdaptiveFilter that adjusts based on response characteristics. Create FilterConfig type with per-profile settings stored in mcpd: { profileId: string, strategy: 'aggressive' | 'minimal' | 'adaptive', maxTokens: number, preserveFields: string[], summaryPrompt?: string }. Implement FilterStrategyComposer that chains multiple strategies. Support runtime strategy switching without proxy restart.
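The MinimalFilter case can be sketched as follows, using the FilterConfig shape given above. The chars-per-token heuristic and the truncation envelope are illustrative assumptions:

```typescript
// Sketch of the strategy interface and the pass-through-with-size-limit
// MinimalFilter intended for pre-aggregated dashboard data.
interface FilterConfig {
  profileId: string;
  strategy: 'aggressive' | 'minimal' | 'adaptive';
  maxTokens: number;
  preserveFields: string[];
  summaryPrompt?: string;
}

interface FilterStrategy {
  shouldFilter(response: unknown): boolean;
  filter(response: unknown, config: FilterConfig): unknown;
}

const minimalFilter: FilterStrategy = {
  // Minimal strategy considers every response, but only to enforce size.
  shouldFilter: () => true,

  filter: (response, config) => {
    const text = JSON.stringify(response);
    const budget = config.maxTokens * 4; // rough chars-per-token heuristic
    if (text.length <= budget) return response; // pass-through untouched
    // Oversized: return a truncation envelope instead of the raw payload.
    return { truncated: true, preview: text.slice(0, budget) };
  },
};
```

AggressiveFilter and AdaptiveFilter would implement the same interface, which is what makes the composer and runtime switching possible: the proxy only ever holds a `FilterStrategy` reference looked up per profile.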
11.6. Implement chunking and streaming for large data responses with pagination support
Status: pending
Dependencies: 11.3, 11.5
Design pagination and streaming strategy for handling large data responses (100k+ rows from database MCPs) that cannot be simply filtered, supporting cursor-based pagination in the proxy layer.
Details:
Create src/services/pagination.ts with PaginationManager class handling: (1) Detection of large responses that require chunking (configurable threshold, default 10K rows), (2) Cursor-based pagination with stable cursors stored in proxy memory with TTL, (3) Response streaming using async iterators for progressive delivery, (4) Chunk size optimization based on estimated token count. Implement PagedResponse type with { data: any[], cursor?: string, hasMore: boolean, totalEstimate?: number, chunkIndex: number }. Create ChunkingStrategy interface for different data types - TabularChunker for SQL results, JSONChunker for nested objects, TextChunker for large text responses. Add pagination metadata to MCP tool responses so Claude can request next pages. Handle cursor expiration gracefully with re-query capability.
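The cursor mechanics can be sketched over an in-memory result set. This omits the TTL, token-based chunk sizing, and re-query-on-expiry behavior described above, showing only the stable-cursor bookkeeping:

```typescript
// Sketch of cursor-based chunking: cursors index proxy-held result sets;
// the cursor disappears once the last chunk is served.
interface PagedResponse<T> {
  data: T[];
  cursor?: string;
  hasMore: boolean;
  chunkIndex: number;
}

class PaginationManager<T> {
  private pages = new Map<string, { rows: T[]; offset: number; chunk: number }>();

  // Register a large result set and return its first chunk.
  open(rows: T[], chunkSize: number): PagedResponse<T> {
    const cursor = `c${this.pages.size}-${Date.now()}`;
    this.pages.set(cursor, { rows, offset: 0, chunk: 0 });
    return this.next(cursor, chunkSize);
  }

  // Serve the next chunk for a cursor; expired cursors force a re-query.
  next(cursor: string, chunkSize: number): PagedResponse<T> {
    const state = this.pages.get(cursor);
    if (!state) throw new Error('cursor expired'); // caller should re-query
    const data = state.rows.slice(state.offset, state.offset + chunkSize);
    state.offset += data.length;
    const hasMore = state.offset < state.rows.length;
    if (!hasMore) this.pages.delete(cursor); // free memory on final chunk
    return { data, cursor: hasMore ? cursor : undefined, hasMore, chunkIndex: state.chunk++ };
  }
}
```

Returning `cursor` and `hasMore` inside the tool response is what lets Claude ask for the next page as an ordinary follow-up tool call.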
11.7. Implement observability with metrics endpoint and structured logging for SRE monitoring
Status: pending
Dependencies: 11.2, 11.3, 11.5
Create comprehensive metrics collection and exposure system with /metrics endpoint (Prometheus format) and structured JSON logging for monitoring proxy health, performance, and LLM efficiency.
Details:
Create src/services/metrics.ts with MetricsCollector class tracking: requests_total (counter), request_duration_seconds (histogram), llm_inference_duration_seconds (histogram), filter_reduction_ratio (gauge - original_size/filtered_size), active_connections (gauge), error_total by error_type (counter), tokens_saved_total (counter). Implement /metrics HTTP endpoint on configurable port (separate from stdio MCP transport) serving Prometheus exposition format. Create structured logger in src/utils/logger.ts outputting JSON with fields: timestamp, level, requestId, toolName, phase (refine/forward/filter), duration_ms, input_tokens, output_tokens, reduction_percent. Add request tracing with correlation IDs propagated to mcpd. Include health check endpoint /health with component status (llm: ok/degraded, mcpd: ok/disconnected).
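A tiny sketch of the collector and its Prometheus exposition rendering, covering only unlabeled counters and gauges (histograms and labels are omitted for brevity):

```typescript
// Sketch of a minimal counter/gauge registry rendering the Prometheus
// text exposition format for the /metrics endpoint.
class MetricsCollector {
  private counters = new Map<string, number>();
  private gauges = new Map<string, number>();

  inc(name: string, by = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + by);
  }

  set(name: string, value: number): void {
    this.gauges.set(name, value);
  }

  // Render the text-based exposition format: a # TYPE line per metric,
  // then "name value".
  render(): string {
    const lines: string[] = [];
    for (const [name, value] of this.counters) {
      lines.push(`# TYPE ${name} counter`, `${name} ${value}`);
    }
    for (const [name, value] of this.gauges) {
      lines.push(`# TYPE ${name} gauge`, `${name} ${value}`);
    }
    return lines.join('\n') + '\n';
  }
}
```

Serving `render()` from a small HTTP server on a separate port keeps the stdio MCP transport untouched, as the subtask requires.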
11.8. Create integration tests and local development environment with docker-compose
Status: pending
Dependencies: 11.1, 11.2, 11.3, 11.4, 11.5, 11.6, 11.7
Build comprehensive integration test suite testing the complete proxy flow against local mcpd and local Ollama, plus docker-compose setup for easy local development without external dependencies.
Details:
Create deploy/docker-compose.proxy.yml with services: ollama (with pre-pulled model), mcpd (from src/mcpd), postgres (for mcpd), and local-proxy. Add scripts/setup-local-dev.sh that pulls Ollama models, starts services, and verifies connectivity. Create integration test suite in tests/integration/ testing: (1) Full request flow from Claude-style request through proxy to mcpd and back, (2) LLM refinement actually modifies queries appropriately, (3) Response filtering reduces token count measurably, (4) Pagination works for large responses, (5) Error handling when Ollama is unavailable (falls back gracefully), (6) Metrics are recorded correctly during real requests. Create performance benchmark suite measuring latency overhead vs direct mcpd access. Document local development setup in LOCAL_DEV.md.
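Integration test (3) — measurable token reduction — can be expressed with a helper like the one below. The chars-per-token estimate and the function names are assumptions; real tests would wire the actual proxy against the docker-compose services rather than a stub filter:

```typescript
// Hypothetical helper for asserting that filtering saves tokens.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough chars-per-token heuristic
}

// Returns the fraction of tokens saved by a filter function.
async function measureReduction(
  raw: unknown,
  filter: (response: unknown) => Promise<string>,
): Promise<number> {
  const before = estimateTokens(JSON.stringify(raw));
  const after = estimateTokens(await filter(raw));
  return 1 - after / before;
}
```

A test can then assert, e.g., that filtering a 100-row result saves more than half the estimated tokens, giving the benchmark suite a concrete pass/fail threshold.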