feat: implement v2 3-tier architecture (mcpctl → mcplocal → mcpd)
- Rename local-proxy to mcplocal with HTTP server, LLM pipeline, mcpd discovery
- Add LLM pre-processing: token estimation, filter cache, metrics, Gemini CLI + DeepSeek providers
- Add mcpd auth (login/logout) and MCP proxy endpoints
- Update CLI: dual URLs (mcplocalUrl/mcpdUrl), auth commands, --direct flag
- Add tiered health monitoring, shell completions, e2e integration tests
- 57 test files, 597 tests passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New file: .taskmaster/docs/prd-v2-architecture.md (272 lines)
# mcpctl v2 - Corrected 3-Tier Architecture PRD

## Overview

mcpctl is a kubectl-inspired system for managing MCP (Model Context Protocol) servers. It consists of 4 components arranged in a 3-tier architecture:

```
Claude Code
    |
    v (stdio - MCP protocol)
mcplocal (Local Daemon - runs on developer machine)
    |
    v (HTTP REST)
mcpd (External Daemon - runs on server/NAS)
    |
    v (Docker API / K8s API)
mcp_servers (MCP server containers)
```
## Components

### 1. mcpctl (CLI Tool)

- **Package**: `src/cli/` (`@mcpctl/cli`)
- **What it is**: kubectl-like CLI for managing the entire system
- **Talks to**: mcplocal (local daemon) via HTTP REST
- **Key point**: mcpctl does NOT talk to mcpd directly. It always goes through mcplocal.
- **Distributed as**: RPM package via Gitea registry (bun compile + nfpm)
- **Commands**: get, describe, apply, setup, instance, claude, project, backup, restore, config, status
### 2. mcplocal (Local Daemon)

- **Package**: `src/local-proxy/` (rename to `src/mcplocal/`)
- **What it is**: Local daemon running on the developer's machine
- **Talks to**: mcpd (external daemon) via HTTP REST
- **Exposes to Claude**: MCP protocol via stdio (tools, resources, prompts)
- **Exposes to mcpctl**: HTTP REST API for management commands

**Core responsibility: LLM Pre-processing**

This is the intelligence layer. When Claude asks for data from MCP servers, mcplocal:

1. Receives Claude's request (e.g., "get Slack messages about security")
2. Uses a local/cheap LLM (Gemini CLI binary, Ollama, vLLM, DeepSeek API) to interpret what Claude actually wants
3. Sends narrow, filtered requests to mcpd, which forwards them to the actual MCP servers
4. Receives raw results from the MCP servers (via mcpd)
5. Uses the local LLM again to filter/summarize the results, extracting only what is relevant
6. Returns the smallest response to Claude that still covers everything relevant

**Why**: Claude Code tokens are expensive. Instead of dumping 500 Slack messages into Claude's context window, mcplocal uses a cheap LLM to pre-filter them down to the 12 relevant ones.

**LLM Provider Strategy** (already partially exists):
- Gemini CLI binary (local, free)
- Ollama (local, free)
- vLLM (local, free)
- DeepSeek API (cheap)
- OpenAI API (fallback)
- Anthropic API (fallback)

**Additional mcplocal responsibilities**:
- MCP protocol routing (namespaced tools: `slack/send_message`, `jira/create_issue`)
- Connection health monitoring for upstream MCP servers
- Caching frequently requested data
- Proxying mcpctl management commands to mcpd
### 3. mcpd (External Daemon)

- **Package**: `src/mcpd/` (`@mcpctl/mcpd`)
- **What it is**: Server-side daemon that runs on centralized infrastructure (Synology NAS, cloud server, etc.)
- **Deployed via**: Docker Compose (Dockerfile + docker-compose.yml)
- **Database**: PostgreSQL for state, audit logs, access control

**Core responsibilities**:
- **Deploy and run MCP server containers** (Docker now, Kubernetes later)
- **Instance lifecycle management**: start, stop, restart, logs, inspect
- **MCP server registry**: Store server definitions, configuration templates, profiles
- **Project management**: Group MCP profiles into projects for Claude sessions
- **Auditing**: Log every operation - who ran what, when, with what result
- **Access management**: Users, sessions, permissions - who can access which MCP servers
- **Credential storage**: MCP servers often need API tokens (Slack, Jira, GitHub) - stored securely on the server side, never exposed to the local machine
- **Backup/restore**: Export and import configuration

**Key point**: mcpd holds the credentials. When mcplocal asks mcpd to query Slack, mcpd runs the Slack MCP server container with the proper SLACK_TOKEN injected - mcplocal never sees the token.
### 4. mcp_servers (MCP Server Containers)

- **What they are**: The actual MCP server processes (Slack, Jira, GitHub, Terraform, filesystem, postgres, etc.)
- **Managed by**: mcpd via the Docker/Podman API
- **Network**: Isolated network, only accessible by mcpd
- **Credentials**: Injected by mcpd as environment variables
- **Communication**: MCP protocol (stdio or SSE/HTTP) between mcpd and the containers
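A minimal sketch of the credential-injection point: the options mcpd might hand to dockerode's `createContainer`. `buildCreateOptions`, `ServerDef`, and the concrete values are assumptions for illustration; `Image`, `Env`, and `HostConfig` are standard Docker Engine create-container fields:

```typescript
// Hypothetical builder for the dockerode createContainer options of an MCP
// server instance. Credentials are injected here, server-side, so mcplocal
// never sees them.
interface ServerDef {
  name: string;
  image: string;
  credentials: Record<string, string>; // e.g. { SLACK_TOKEN: "..." } from secure storage
}

function buildCreateOptions(def: ServerDef) {
  return {
    name: `mcp-${def.name}`,
    Image: def.image,
    // Credentials become container env vars on the server side only.
    Env: Object.entries(def.credentials).map(([k, v]) => `${k}=${v}`),
    HostConfig: {
      // Attach only to the isolated network reachable by mcpd.
      NetworkMode: "mcp-servers",
      RestartPolicy: { Name: "unless-stopped" },
    },
  };
}
```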
## Data Flow Examples

### Example 1: Claude asks for Slack messages

```
Claude: "Get messages about security incidents from the last week"
    |
    v (MCP tools/call: slack/search_messages)
mcplocal:
  1. Intercepts the tool call
  2. Calls local Gemini: "User wants security incident messages from last week.
     Generate optimal Slack search query and date filters."
  3. Gemini returns: query="security incident OR vulnerability OR CVE", after="2024-01-15"
  4. Sends filtered request to mcpd
    |
    v (HTTP POST /api/v1/mcp/proxy)
mcpd:
  1. Looks up Slack MCP instance (injects SLACK_TOKEN)
  2. Forwards narrowed query to Slack MCP server container
  3. Returns raw results (200 messages)
    |
    v (response)
mcplocal:
  1. Receives 200 messages
  2. Calls local Gemini: "Filter these 200 Slack messages. Keep only those
     directly about security incidents. Return message IDs and 1-line summaries."
  3. Gemini returns: 15 relevant messages with summaries
  4. Returns filtered result to Claude
    |
    v (MCP response: 15 messages instead of 200)
Claude: processes only the relevant 15 messages
```
### Example 2: mcpctl management command

```
$ mcpctl get servers
    |
    v (HTTP GET)
mcplocal:
  1. Recognizes this is a management command (not MCP data)
  2. Proxies directly to mcpd (no LLM processing needed)
    |
    v (HTTP GET /api/v1/servers)
mcpd:
  1. Queries PostgreSQL for server definitions
  2. Returns list
    |
    v (proxied response)
mcplocal -> mcpctl -> formatted table output
```
### Example 3: mcpctl instance management

```
$ mcpctl instance start slack
    |
    v
mcplocal -> mcpd:
  1. Creates Docker container for Slack MCP server
  2. Injects SLACK_TOKEN from secure storage
  3. Connects it to the isolated mcp-servers network
  4. Logs audit entry: "user X started slack instance"
  5. Returns instance status
```
## What Already Exists (completed work)

### Done and reusable as-is:
- Project structure: pnpm monorepo, TypeScript strict mode, Vitest, ESLint
- Database schema: Prisma + PostgreSQL (User, McpServer, McpProfile, Project, McpInstance, AuditLog)
- mcpd server framework: Fastify 5, routes, services, repositories, middleware
- mcpd MCP server CRUD: registration, profiles, projects
- mcpd Docker container management: dockerode, instance lifecycle
- mcpd audit logging, health monitoring, metrics, backup/restore
- mcpctl CLI framework: Commander.js, commands, config, API client, formatters
- mcpctl RPM distribution: bun compile, nfpm, Gitea publishing, shell completions
- MCP protocol routing in local-proxy: namespaced tools, resources, prompts
- LLM provider abstractions: OpenAI, Anthropic, Ollama adapters (defined but unused)
- Shared types and profile templates

### Needs rework:
- mcpctl currently talks to mcpd directly -> must talk to mcplocal instead
- local-proxy is just a dumb router -> needs LLM pre-processing intelligence
- local-proxy has no HTTP API for mcpctl -> needs REST endpoints for management proxying
- mcpd has no MCP proxy endpoint -> needs an endpoint that mcplocal can call to execute MCP tool calls on managed instances
- No integration between the LLM providers and the MCP request/response pipeline
## New Tasks Needed

### Phase 1: Rename and restructure local-proxy -> mcplocal
- Rename `src/local-proxy/` to `src/mcplocal/`
- Update all package references and imports
- Add an HTTP REST server (Fastify) alongside the existing stdio server
- mcplocal needs TWO interfaces: stdio for Claude, HTTP for mcpctl
### Phase 2: mcplocal management proxy
- Add REST endpoints that mirror mcpd's API (get servers, instances, projects, etc.)
- mcpctl config changes: `daemonUrl` now points to mcplocal (e.g., localhost:3200) instead of mcpd
- mcplocal proxies management requests to mcpd (configurable `mcpdUrl`, e.g., http://nas:3100)
- Pass-through with no LLM processing for management commands
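The pass-through split can be sketched with a small routing helper. The path list and helper names are assumptions for illustration, not the actual mcplocal route table:

```typescript
// Hypothetical sketch: mcplocal decides whether a request is a management
// command (proxied verbatim to mcpd, no LLM) or MCP data traffic.
const MANAGEMENT_PREFIXES = ["/api/v1/servers", "/api/v1/instances", "/api/v1/projects"];

function isManagementRequest(path: string): boolean {
  return MANAGEMENT_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

// Management requests are forwarded unchanged to the configured upstream mcpd.
function upstreamUrl(mcpdUrl: string, path: string): string {
  return new URL(path, mcpdUrl).toString();
}
```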
### Phase 3: mcpd MCP proxy endpoint
- Add a `/api/v1/mcp/proxy` endpoint to mcpd
- Accepts `{ serverId, method, params }` - executes an MCP tool call on a managed instance
- mcpd looks up the instance, connects to the container, executes the MCP call, and returns the result
- This is how mcplocal talks to MCP servers without needing direct Docker access
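The `{ serverId, method, params }` shape comes from the PRD; the handler below is a minimal sketch of the lookup-and-execute step, with the instance registry stubbed as a Map standing in for mcpd's instance manager:

```typescript
// Sketch of the /api/v1/mcp/proxy handler. The registry and executor are
// illustrative stubs; the real mcpd resolves serverId to a live container
// connection.
interface ProxyRequest {
  serverId: string;
  method: string; // e.g. "tools/call"
  params: unknown;
}

type McpExecutor = (method: string, params: unknown) => Promise<unknown>;

async function handleProxy(
  registry: Map<string, McpExecutor>, // serverId -> live MCP connection
  req: ProxyRequest,
): Promise<{ ok: boolean; result?: unknown; error?: string }> {
  const exec = registry.get(req.serverId);
  if (!exec) return { ok: false, error: `unknown server: ${req.serverId}` };
  // mcpd executes the MCP call against the managed container and returns the
  // raw result; mcplocal never needs Docker access.
  const result = await exec(req.method, req.params);
  return { ok: true, result };
}
```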
### Phase 4: LLM pre-processing pipeline in mcplocal
- Create a request interceptor in mcplocal's MCP router
- Before forwarding `tools/call` to mcpd, run the request through the LLM for interpretation
- After receiving the response from mcpd, run it through the LLM for filtering/summarization
- LLM provider selection based on config (prefer local/cheap models)
- Configurable: enable/disable pre-processing per server or per tool
- Bypass for simple operations (list, create, delete - no filtering needed)
### Phase 5: Smart context optimization
- Token counting: estimate how many tokens the raw response would consume
- Decision logic: if the raw response is below a threshold, skip LLM filtering (not worth the latency)
- If the raw response exceeds the threshold, filter with the LLM
- Cache LLM filtering decisions for repeated similar queries
- Metrics: track tokens saved and latency added by filtering
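The skip-or-filter decision can be sketched in a few lines. The 4-characters-per-token heuristic and the default threshold are assumptions for illustration; a real implementation would use the target model's tokenizer:

```typescript
// Hypothetical token estimate + threshold decision for Phase 5.
function estimateTokens(text: string): number {
  // Rough heuristic: ~4 characters per token for English text.
  return Math.ceil(text.length / 4);
}

function shouldFilter(rawResponse: string, thresholdTokens = 2000): boolean {
  // Below the threshold, LLM filtering costs more latency than it saves.
  return estimateTokens(rawResponse) > thresholdTokens;
}
```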
### Phase 6: mcpctl -> mcplocal migration
- Update mcpctl's default daemonUrl to point to mcplocal (localhost:3200)
- Update all CLI commands to work through the mcplocal proxy
- Add `mcpctl config set mcpd-url <url>` for configuring the upstream mcpd
- Add `mcpctl config set mcplocal-url <url>` for configuring the local daemon
- Health check: `mcpctl status` shows both mcplocal and mcpd connectivity
- Update shell completions if needed
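A sketch of the dual-URL resolution the migration implies. The field names `mcplocalUrl`/`mcpdUrl` follow the PRD; the defaults (3200 for mcplocal, 3100 for mcpd) match the example ports used elsewhere in this document but are otherwise assumptions:

```typescript
// Hypothetical mcpctl config resolution with the two URLs from Phase 6.
interface CliConfig {
  mcplocalUrl?: string; // local daemon: default target of all commands
  mcpdUrl?: string;     // upstream daemon: passed through to mcplocal
}

function resolveUrls(cfg: CliConfig): { mcplocalUrl: string; mcpdUrl: string } {
  return {
    mcplocalUrl: cfg.mcplocalUrl ?? "http://localhost:3200",
    mcpdUrl: cfg.mcpdUrl ?? "http://localhost:3100",
  };
}
```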
### Phase 7: End-to-end integration testing
- Test the full flow: mcpctl -> mcplocal -> mcpd -> mcp_server -> response -> LLM filter -> Claude
- Test that management commands pass through correctly
- Test that LLM pre-processing reduces context window size
- Test credential isolation (mcplocal never sees MCP server credentials)
- Test health monitoring across all tiers
## Authentication & Authorization

### Database ownership
- **mcpd owns the database** (PostgreSQL). It is the only component that talks to the DB.
- mcplocal has NO database. It is stateless (config file only).
- mcpctl has NO database. It stores user credentials locally in `~/.mcpctl/config.yaml`.

### Auth flow
```
mcpctl login
    |
    v (user enters mcpd URL + credentials)
mcpctl stores API token in ~/.mcpctl/config.yaml
    |
    v (passes token to mcplocal config)
mcplocal authenticates to mcpd using the Bearer token on every request
    |
    v (Authorization: Bearer <token>)
mcpd validates the token against the Session table in PostgreSQL
    |
    v (authenticated request proceeds)
```
### mcpctl responsibilities
- `mcpctl login` command: prompts the user for the mcpd URL and credentials (username/password or API token)
- `mcpctl login` calls mcpd's auth endpoint to get a session token
- Stores the token in `~/.mcpctl/config.yaml` (or `~/.mcpctl/credentials` with restricted permissions)
- Passes the token to mcplocal (either via config or as a startup argument)
- `mcpctl logout` command: invalidates the session token

### mcplocal responsibilities
- Reads the auth token from its config (set by mcpctl)
- Attaches an `Authorization: Bearer <token>` header to ALL requests to mcpd
- If mcpd returns 401, mcplocal returns an appropriate error to mcpctl/Claude
- Does NOT store credentials itself - they come from mcpctl's config

### mcpd responsibilities
- Owns the User and Session tables
- Provides auth endpoints: `POST /api/v1/auth/login`, `POST /api/v1/auth/logout`
- Validates Bearer tokens on every request via auth middleware (already exists)
- Returns 401 for invalid/expired tokens
- Audit logs include the authenticated user
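The Bearer check mcpd's middleware performs can be sketched as a pure function. The session lookup is stubbed with a Map here; the real implementation queries the Session table in PostgreSQL:

```typescript
// Hypothetical sketch of mcpd's Bearer-token validation.
interface Session {
  userId: string;
  expiresAt: number; // epoch ms
}

function authenticate(
  sessions: Map<string, Session>, // stand-in for the Session table
  authHeader: string | undefined,
  now = Date.now(),
): { status: 200; userId: string } | { status: 401 } {
  if (!authHeader?.startsWith("Bearer ")) return { status: 401 };
  const token = authHeader.slice("Bearer ".length);
  const session = sessions.get(token);
  // Invalid or expired tokens yield 401, as specified above.
  if (!session || session.expiresAt <= now) return { status: 401 };
  return { status: 200, userId: session.userId };
}
```

The authenticated `userId` is what the audit log records alongside each operation.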
## Non-functional Requirements
- mcplocal must start fast (it runs on the developer's machine, per-session or as a daemon)
- LLM pre-processing must not add more than 2-3 seconds of latency
- If the local LLM is unavailable, fall back to passing data through unfiltered
- All components must be independently deployable and testable
- mcpd must remain stateless (outside of the DB) and horizontally scalable
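The LLM-unavailable fallback requirement amounts to a small wrapper: if the filtering call fails, pass the raw data through unfiltered. The names are illustrative, not the shipped API:

```typescript
// Hypothetical sketch of the "fall back to unfiltered" requirement above.
async function filterWithFallback(
  llmFilter: (raw: string) => Promise<string>,
  raw: string,
): Promise<string> {
  try {
    return await llmFilter(raw);
  } catch {
    // Local LLM down or errored: better to spend Claude tokens than drop data.
    return raw;
  }
}
```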