Task ID: 19
Title: Implement Local LLM Pre-filtering Proxy
Status: cancelled
Dependencies: None
Priority: high
Description: Build the local proxy component that intercepts Claude's MCP requests, uses local LLMs (Gemini CLI, Ollama, vLLM, or DeepSeek API) to interpret questions, fetch relevant data from mcpd, and filter/refine responses to minimize context window usage before returning to Claude.
Details:
Create src/local-proxy/src/ with the following architecture:
Core Components:
- MCP Protocol Handler (mcp-handler.ts):
- Implement MCP server interface using @modelcontextprotocol/sdk
- Register as the MCP endpoint Claude connects to
- Parse incoming tool calls and extract the semantic intent
- LLM Provider Abstraction (providers/):

interface LLMProvider {
  name: string;
  interpretQuery(query: string, context: McpToolCall): Promise<InterpretedQuery>;
  filterResponse(data: unknown, originalQuery: string, maxTokens: number): Promise<FilteredResponse>;
}

Implement providers:
- gemini-cli.ts: shell out to the gemini CLI binary
- ollama.ts: HTTP client to the local Ollama server (localhost:11434)
- vllm.ts: OpenAI-compatible API client
- deepseek.ts: DeepSeek API client
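The Ollama provider could take the following shape. This is a minimal sketch assuming Ollama's `/api/generate` endpoint on the default port, Node 18+ (global `fetch`), and an illustrative `llama3.2` model; the prompt wording mirrors the filter pseudo-code later in this task, and the helper names are not a fixed API.

```typescript
// providers/ollama.ts (sketch) -- one possible provider implementation.

interface FilteredResponse { data: unknown; filtered: boolean; originalSize?: number; }

// Pure prompt builder, kept separate so it can be unit-tested without HTTP.
function buildFilterPrompt(data: string, originalQuery: string): string {
  return [
    `Given this query: "${originalQuery}"`,
    "Extract ONLY the relevant information from this data.",
    "Return a JSON array of relevant items, max 10 items.",
    `Data: ${data}`,
  ].join("\n");
}

async function ollamaGenerate(prompt: string, model = "llama3.2", maxTokens = 2000): Promise<string> {
  // Ollama's generate endpoint; stream: false returns a single JSON object
  // whose `response` field holds the completion. num_predict caps output tokens.
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false, options: { num_predict: maxTokens } }),
  });
  if (!res.ok) throw new Error(`Ollama error: ${res.status}`);
  const body = (await res.json()) as { response: string };
  return body.response;
}
```

The gemini-cli, vllm, and deepseek providers would differ only in transport (child process vs. OpenAI-compatible HTTP), which is why the prompt construction is factored out.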
- Query Interpreter (interpreter.ts):
- Takes Claude's raw MCP request (e.g., 'get_slack_messages')
- Uses local LLM to understand semantic intent: "Find messages related to security and linux servers from my team"
- Generates optimized query parameters for mcpd
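The interpreter step could be sketched as follows. The prompt text and the `InterpretedQuery` shape are assumptions for illustration; parsing is defensive because small local models often wrap JSON output in prose, and bad output falls back to passing the original call through unmodified.

```typescript
// interpreter.ts (sketch) -- turn a raw tool call into optimized mcpd parameters.

interface InterpretedQuery { tool: string; params: Record<string, unknown>; }

function buildInterpretPrompt(tool: string, args: Record<string, unknown>): string {
  return `Claude called tool "${tool}" with arguments ${JSON.stringify(args)}. ` +
    `Return JSON of the form {"tool": string, "params": object} with optimized query parameters.`;
}

function parseInterpretation(raw: string, fallbackTool: string): InterpretedQuery {
  // Extract the first {...} block in case the model added surrounding text.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return { tool: fallbackTool, params: {} };
  try {
    const parsed = JSON.parse(match[0]);
    return { tool: parsed.tool ?? fallbackTool, params: parsed.params ?? {} };
  } catch {
    // On unparseable output, fall back to the original call unmodified.
    return { tool: fallbackTool, params: {} };
  }
}
```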
- Response Filter (filter.ts):
- Receives raw data from mcpd (potentially thousands of Slack messages, large Terraform docs)
- Uses local LLM to extract ONLY relevant information matching original query
- Implements token counting to stay within configured limits
- Returns compressed, relevant subset of data
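One way to implement the token budget: a rough chars/4 estimate is usually close enough for gating (exact counts would need a real tokenizer); the 4-characters-per-token ratio is a common heuristic, not a spec, so treat these helpers as a sketch.

```typescript
// Token budgeting helpers (sketch) -- heuristic, not tokenizer-accurate.

function estimateTokens(text: string): number {
  // ~4 characters per token is a common rule of thumb for English text.
  return Math.ceil(text.length / 4);
}

function withinBudget(text: string, maxTokens: number): boolean {
  return estimateTokens(text) <= maxTokens;
}

function truncateToBudget(text: string, maxTokens: number): string {
  // Hard-truncate to the approximate character equivalent of the budget.
  return withinBudget(text, maxTokens) ? text : text.slice(0, maxTokens * 4);
}
```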
- mcpd Client (mcpd-client.ts):
- HTTP client to communicate with mcpd server
- Handles authentication (forwards Claude session token)
- Supports all MCP operations exposed by mcpd
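A minimal sketch of the mcpd client, assuming mcpd exposes an HTTP endpoint per tool and that Claude's session token is forwarded as a bearer token. The URL path and payload shape are assumptions; adapt them to mcpd's actual API.

```typescript
// mcpd-client.ts (sketch) -- thin HTTP client with auth forwarding.

function buildHeaders(sessionToken?: string): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Forward Claude's session token when present; omit the header otherwise.
  if (sessionToken) headers["Authorization"] = `Bearer ${sessionToken}`;
  return headers;
}

async function callMcpd(
  mcpdUrl: string,
  tool: string,
  params: Record<string, unknown>,
  sessionToken?: string
): Promise<unknown> {
  // Hypothetical per-tool endpoint; the real route depends on mcpd.
  const res = await fetch(`${mcpdUrl}/tools/${encodeURIComponent(tool)}`, {
    method: "POST",
    headers: buildHeaders(sessionToken),
    body: JSON.stringify(params),
  });
  if (!res.ok) throw new Error(`mcpd error: ${res.status}`);
  return res.json();
}
```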
Configuration:
interface ProxyConfig {
  mcpdUrl: string; // e.g., 'http://mcpd.local:3000'
  llmProvider: 'gemini-cli' | 'ollama' | 'vllm' | 'deepseek';
  llmConfig: {
    model?: string; // e.g., 'llama3.2', 'gemini-pro'
    endpoint?: string; // for vllm/deepseek
    maxTokensPerFilter: number; // target output size
  };
  filteringEnabled: boolean; // can be disabled for passthrough
}
Flow:
- Claude calls local-proxy MCP server
- Proxy interprets query semantics via local LLM
- Proxy calls mcpd with optimized query
- mcpd returns raw MCP data
- Proxy filters response via local LLM
- Claude receives minimal, relevant context
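The flow above can be sketched as a single handler with injected dependencies, so that each stage can be stubbed in tests and passthrough mode short-circuits cleanly. The names (`interpret`, `fetchFromMcpd`, `filter`) are illustrative, not a fixed API.

```typescript
// Orchestration sketch -- wires interpreter, mcpd client, and filter together.

interface ProxyDeps {
  interpret: (tool: string, args: Record<string, unknown>) => Promise<{ tool: string; params: Record<string, unknown> }>;
  fetchFromMcpd: (tool: string, params: Record<string, unknown>) => Promise<unknown>;
  filter: (raw: unknown, originalQuery: string) => Promise<unknown>;
  filteringEnabled: boolean;
}

async function handleToolCall(
  deps: ProxyDeps,
  tool: string,
  args: Record<string, unknown>
): Promise<unknown> {
  // In passthrough mode, skip interpretation and forward the call as-is.
  const query = deps.filteringEnabled ? await deps.interpret(tool, args) : { tool, params: args };
  const raw = await deps.fetchFromMcpd(query.tool, query.params);
  // Passthrough mode also skips the filtering stage entirely.
  return deps.filteringEnabled ? deps.filter(raw, JSON.stringify(args)) : raw;
}
```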
Pseudo-code for filter.ts:
async function filterResponse(
  rawData: unknown,
  originalQuery: string,
  provider: LLMProvider
): Promise<FilteredResponse> {
  const dataStr = JSON.stringify(rawData);
  // Small payloads pass through unfiltered; the LLM call would cost more than it saves
  if (dataStr.length < 4000) return { data: rawData, filtered: false };
  // Truncate oversized input before handing it to the local LLM; the provider
  // wraps it in an extraction prompt built from originalQuery (e.g. "Given this
  // query, return a JSON array of relevant items, max 10 items")
  const truncated = dataStr.slice(0, 50000);
  const filtered = await provider.filterResponse(truncated, originalQuery, 2000);
  return { data: filtered, filtered: true, originalSize: dataStr.length };
}
Test Strategy:
- Unit tests for each LLM provider with mocked HTTP/CLI responses
- Integration tests with an actual Ollama instance (docker-compose service)
- Test that query interpretation produces valid mcpd parameters
- Test that filtering reduces data size while preserving relevant content
- Load test with large payloads (10 MB JSON) to verify memory handling
- Test fallback behavior when the LLM provider is unavailable
- Test passthrough mode when filtering is disabled
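A mocked-provider test could look like this. It is framework-agnostic (swap the bare assertions for your test runner of choice), and the `mockProvider` and the filter under test are illustrative stand-ins, not the real module.

```typescript
// Unit-test sketch: verify passthrough for small payloads and that large
// payloads reach the (mocked) LLM provider exactly once.

interface FilteredResponse { data: unknown; filtered: boolean; originalSize?: number; }

const mockProvider = {
  name: "mock",
  calls: [] as string[],
  async filterResponse(_data: unknown, query: string, _maxTokens: number): Promise<unknown> {
    this.calls.push(query);       // record the call for assertions
    return ["relevant-item"];     // canned LLM output
  },
};

async function filterWithProvider(rawData: unknown, query: string): Promise<FilteredResponse> {
  const dataStr = JSON.stringify(rawData);
  if (dataStr.length < 4000) return { data: rawData, filtered: false }; // passthrough
  const data = await mockProvider.filterResponse(dataStr.slice(0, 50000), query, 2000);
  return { data, filtered: true, originalSize: dataStr.length };
}
```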