mcpctl v2 - Corrected 3-Tier Architecture PRD
Overview
mcpctl is a kubectl-inspired system for managing MCP (Model Context Protocol) servers. It consists of 4 components arranged in a 3-tier architecture:
Claude Code
|
v (stdio - MCP protocol)
mcplocal (Local Daemon - runs on developer machine)
|
v (HTTP REST)
mcpd (External Daemon - runs on server/NAS)
|
v (Docker API / K8s API)
mcp_servers (MCP server containers)
Components
1. mcpctl (CLI Tool)
- Package: src/cli/ (@mcpctl/cli)
- What it is: kubectl-like CLI for managing the entire system
- Talks to: mcplocal (local daemon) via HTTP REST
- Key point: mcpctl does NOT talk to mcpd directly. It always goes through mcplocal.
- Distributed as: RPM package via Gitea registry (bun compile + nfpm)
- Commands: get, describe, apply, setup, instance, claude, project, backup, restore, config, status
2. mcplocal (Local Daemon)
- Package: src/local-proxy/ (rename to src/mcplocal/)
- What it is: Local daemon running on the developer's machine
- Talks to: mcpd (external daemon) via HTTP REST
- Exposes to Claude: MCP protocol via stdio (tools, resources, prompts)
- Exposes to mcpctl: HTTP REST API for management commands
Core responsibility: LLM Pre-processing
This is the intelligence layer. When Claude asks for data from MCP servers, mcplocal:
- Receives Claude's request (e.g., "get Slack messages about security")
- Uses a local/cheap LLM (Gemini CLI binary, Ollama, vLLM, DeepSeek API) to interpret what Claude actually wants
- Sends narrow, filtered requests to mcpd which forwards to the actual MCP servers
- Receives raw results from MCP servers (via mcpd)
- Uses the local LLM again to filter/summarize results - extracting only what's relevant
- Returns the smallest, most comprehensive response to Claude
Why: Claude Code tokens are expensive. Instead of dumping 500 Slack messages into Claude's context window, mcplocal uses a cheap LLM to pre-filter to the 12 relevant ones.
LLM Provider Strategy (already partially exists):
- Gemini CLI binary (local, free)
- Ollama (local, free)
- vLLM (local, free)
- DeepSeek API (cheap)
- OpenAI API (fallback)
- Anthropic API (fallback)
Additional mcplocal responsibilities:
- MCP protocol routing (namespaced tools: slack/send_message, jira/create_issue)
- Connection health monitoring for upstream MCP servers
- Caching frequently requested data
- Proxying mcpctl management commands to mcpd
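The namespaced-tool routing above can be sketched as a pure parsing step. This is an illustrative helper, not the actual router API; the function name and error text are assumptions:

```typescript
// Split a namespaced tool name like "slack/send_message" into the upstream
// server id and the tool that server exposes.
interface RoutedTool {
  serverId: string; // e.g. "slack"
  toolName: string; // e.g. "send_message"
}

function parseNamespacedTool(name: string): RoutedTool {
  const slash = name.indexOf("/");
  if (slash <= 0 || slash === name.length - 1) {
    throw new Error(`tool name is not namespaced: ${name}`);
  }
  return {
    serverId: name.slice(0, slash),
    toolName: name.slice(slash + 1),
  };
}
```

Keeping this a pure function makes the routing table trivially testable without a running daemon.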
3. mcpd (External Daemon)
- Package: src/mcpd/ (@mcpctl/mcpd)
- What it is: Server-side daemon that runs on centralized infrastructure (Synology NAS, cloud server, etc.)
- Deployed via: Docker Compose (Dockerfile + docker-compose.yml)
- Database: PostgreSQL for state, audit logs, access control
Core responsibilities:
- Deploy and run MCP server containers (Docker now, Kubernetes later)
- Instance lifecycle management: start, stop, restart, logs, inspect
- MCP server registry: Store server definitions, configuration templates, profiles
- Project management: Group MCP profiles into projects for Claude sessions
- Auditing: Log every operation - who ran what, when, with what result
- Access management: Users, sessions, permissions - who can access which MCP servers
- Credential storage: MCP servers often need API tokens (Slack, Jira, GitHub) - stored securely on server side, never exposed to local machine
- Backup/restore: Export and import configuration
Key point: mcpd holds the credentials. When mcplocal asks mcpd to query Slack, mcpd runs the Slack MCP server container with the proper SLACK_TOKEN injected - mcplocal never sees the token.
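The credential-injection step can be sketched as mcpd assembling the container environment from its server-side store. The store shape and function name are illustrative assumptions; only the "mcplocal never sees the token" property is prescribed:

```typescript
// Hypothetical sketch: mcpd builds a dockerode-style Env array
// (["KEY=value", ...]) from server-side credential storage.
// These values stay inside mcpd; they are never returned to mcplocal.
type CredentialStore = Record<string, Record<string, string>>;

function buildContainerEnv(
  store: CredentialStore,
  serverId: string,
  extra: Record<string, string> = {},
): string[] {
  const creds = store[serverId] ?? {};
  return Object.entries({ ...creds, ...extra }).map(([k, v]) => `${k}=${v}`);
}
```

The resulting array would be passed to the container-creation call; audit logging of the operation happens alongside, without logging the secret values themselves.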
4. mcp_servers (MCP Server Containers)
- What they are: The actual MCP server processes (Slack, Jira, GitHub, Terraform, filesystem, postgres, etc.)
- Managed by: mcpd via Docker/Podman API
- Network: Isolated network, only accessible by mcpd
- Credentials: Injected by mcpd as environment variables
- Communication: MCP protocol (stdio or SSE/HTTP) between mcpd and the containers
Data Flow Examples
Example 1: Claude asks for Slack messages
Claude: "Get messages about security incidents from the last week"
|
v (MCP tools/call: slack/search_messages)
mcplocal:
1. Intercepts the tool call
2. Calls local Gemini: "User wants security incident messages from last week.
Generate optimal Slack search query and date filters."
3. Gemini returns: query="security incident OR vulnerability OR CVE", after="2024-01-15"
4. Sends filtered request to mcpd
|
v (HTTP POST /api/v1/mcp/proxy)
mcpd:
1. Looks up Slack MCP instance (injects SLACK_TOKEN)
2. Forwards narrowed query to Slack MCP server container
3. Returns raw results (200 messages)
|
v (response)
mcplocal:
1. Receives 200 messages
2. Calls local Gemini: "Filter these 200 Slack messages. Keep only those
directly about security incidents. Return message IDs and 1-line summaries."
3. Gemini returns: 15 relevant messages with summaries
4. Returns filtered result to Claude
|
v (MCP response: 15 messages instead of 200)
Claude: processes only the relevant 15 messages
Example 2: mcpctl management command
$ mcpctl get servers
|
v (HTTP GET)
mcplocal:
1. Recognizes this is a management command (not MCP data)
2. Proxies directly to mcpd (no LLM processing needed)
|
v (HTTP GET /api/v1/servers)
mcpd:
1. Queries PostgreSQL for server definitions
2. Returns list
|
v (proxied response)
mcplocal -> mcpctl -> formatted table output
Example 3: mcpctl instance management
$ mcpctl instance start slack
|
v
mcplocal -> mcpd:
1. Creates Docker container for Slack MCP server
2. Injects SLACK_TOKEN from secure storage
3. Connects to isolated mcp-servers network
4. Logs audit entry: "user X started slack instance"
5. Returns instance status
What Already Exists (completed work)
Done and reusable as-is:
- Project structure: pnpm monorepo, TypeScript strict mode, Vitest, ESLint
- Database schema: Prisma + PostgreSQL (User, McpServer, McpProfile, Project, McpInstance, AuditLog)
- mcpd server framework: Fastify 5, routes, services, repositories, middleware
- mcpd MCP server CRUD: registration, profiles, projects
- mcpd Docker container management: dockerode, instance lifecycle
- mcpd audit logging, health monitoring, metrics, backup/restore
- mcpctl CLI framework: Commander.js, commands, config, API client, formatters
- mcpctl RPM distribution: bun compile, nfpm, Gitea publishing, shell completions
- MCP protocol routing in local-proxy: namespace tools, resources, prompts
- LLM provider abstractions: OpenAI, Anthropic, Ollama adapters (defined but unused)
- Shared types and profile templates
Needs rework:
- mcpctl currently talks to mcpd directly -> must talk to mcplocal instead
- local-proxy is just a dumb router -> needs LLM pre-processing intelligence
- local-proxy has no HTTP API for mcpctl -> needs REST endpoints for management proxying
- mcpd has no MCP proxy endpoint -> needs endpoint that mcplocal can call to execute MCP tool calls on managed instances
- No integration between LLM providers and MCP request/response pipeline
New Tasks Needed
Phase 1: Rename and restructure local-proxy -> mcplocal
- Rename src/local-proxy/ to src/mcplocal/
- Update all package references and imports
- Add HTTP REST server (Fastify) alongside existing stdio server
- mcplocal needs TWO interfaces: stdio for Claude, HTTP for mcpctl
Phase 2: mcplocal management proxy
- Add REST endpoints that mirror mcpd's API (get servers, instances, projects, etc.)
- mcpctl config changes: daemonUrl now points to mcplocal (e.g., localhost:3200) instead of mcpd
- mcplocal proxies management requests to mcpd (configurable mcpdUrl, e.g., http://nas:3100)
- Management commands pass through with no LLM processing
Phase 3: mcpd MCP proxy endpoint
- Add a /api/v1/mcp/proxy endpoint to mcpd
- Accepts: { serverId, method, params } - executes an MCP tool call on a managed instance
- mcpd looks up the instance, connects to the container, executes the MCP call, and returns the result
- This is how mcplocal talks to MCP servers without needing direct Docker access
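The proxy payload can be pinned down as a small typed contract. Only { serverId, method, params } is specified above; the validation helper and error messages are illustrative:

```typescript
// Illustrative request shape for the proposed /api/v1/mcp/proxy endpoint.
interface McpProxyRequest {
  serverId: string;                // managed instance to target, e.g. "slack"
  method: string;                  // MCP method, e.g. "tools/call"
  params: Record<string, unknown>; // method-specific parameters
}

// Minimal body validation mcpd would run before touching any container.
function validateProxyRequest(body: unknown): McpProxyRequest {
  const b = body as Partial<McpProxyRequest>;
  if (typeof b?.serverId !== "string" || typeof b?.method !== "string") {
    throw new Error("serverId and method are required strings");
  }
  return { serverId: b.serverId, method: b.method, params: b.params ?? {} };
}
```

In the Fastify implementation this would more likely live in a route schema, but the invariant is the same: reject before any Docker lookup happens.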
Phase 4: LLM pre-processing pipeline in mcplocal
- Create request interceptor in mcplocal's MCP router
- Before forwarding tools/call to mcpd, run the request through the LLM for interpretation
- After receiving the response from mcpd, run it through the LLM for filtering/summarization
- LLM provider selection based on config (prefer local/cheap models)
- Configurable: enable/disable pre-processing per server or per tool
- Bypass for simple operations (list, create, delete - no filtering needed)
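The bypass rule can be sketched as a pure predicate. The suffix list and per-server opt-out shape are assumptions, not finalized policy:

```typescript
// Simple CRUD-style operations skip the LLM pass entirely; there is nothing
// to narrow or summarize. Suffix list is a placeholder to be tuned.
const BYPASS_SUFFIXES = ["list", "create", "delete", "get"];

function shouldBypassLlm(toolName: string, disabledServers: Set<string>): boolean {
  const [server, tool = ""] = toolName.split("/");
  if (disabledServers.has(server)) return true; // per-server opt-out from config
  return BYPASS_SUFFIXES.some((s) => tool.endsWith(s));
}
```

A per-tool override table could layer on top of this, but the default should stay cheap: a string check, no LLM call.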
Phase 5: Smart context optimization
- Token counting: estimate how many tokens the raw response would consume
- Decision logic: if raw response < threshold, skip LLM filtering (not worth the latency)
- If raw response > threshold, filter with LLM
- Cache LLM filtering decisions for repeated similar queries
- Metrics: track tokens saved, latency added by filtering
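The threshold decision above can be sketched in a few lines. The ~4 characters/token heuristic and the 2000-token default are assumptions to be tuned against the metrics this phase adds:

```typescript
// Cheap token estimate: ~4 characters per token for English-ish text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Below the threshold, the latency of an extra LLM call outweighs the token
// savings, so the raw response is passed through unfiltered.
function needsLlmFiltering(rawResponse: string, thresholdTokens = 2000): boolean {
  return estimateTokens(rawResponse) > thresholdTokens;
}
```

A real tokenizer (per target model) could replace the heuristic later; the decision logic stays the same.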
Phase 6: mcpctl -> mcplocal migration
- Update mcpctl's default daemonUrl to point to mcplocal (localhost:3200)
- Update all CLI commands to work through mcplocal proxy
- Add mcpctl config set mcpd-url <url> for configuring the upstream mcpd
- Add mcpctl config set mcplocal-url <url> for configuring the local daemon
- Health check: mcpctl status shows both mcplocal and mcpd connectivity
- Update shell completions if needed
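A possible shape for ~/.mcpctl/config.yaml after this phase. Only the two URLs and the stored token are implied by the commands above; the exact key names are illustrative:

```yaml
# ~/.mcpctl/config.yaml (illustrative sketch)
mcplocalUrl: http://localhost:3200   # local daemon, default target of mcpctl
mcpdUrl: http://nas:3100             # upstream mcpd, used by mcplocal
token: <session token from mcpctl login>
```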
Phase 7: End-to-end integration testing
- Test full flow: mcpctl -> mcplocal -> mcpd -> mcp_server -> response -> LLM filter -> Claude
- Test management commands pass through correctly
- Test LLM pre-processing reduces context window size
- Test credential isolation (mcplocal never sees MCP server credentials)
- Test health monitoring across all tiers
Authentication & Authorization
Database ownership
- mcpd owns the database (PostgreSQL). It is the only component that talks to the DB.
- mcplocal has NO database. It is stateless (config file only).
- mcpctl has NO database. It stores user credentials locally in ~/.mcpctl/config.yaml.
Auth flow
mcpctl login
|
v (user enters mcpd URL + credentials)
mcpctl stores API token in ~/.mcpctl/config.yaml
|
v (passes token to mcplocal config)
mcplocal authenticates to mcpd using Bearer token on every request
|
v (Authorization: Bearer <token>)
mcpd validates token against Session table in PostgreSQL
|
v (authenticated request proceeds)
mcpctl responsibilities
- mcpctl login command: prompts the user for the mcpd URL and credentials (username/password or API token)
- mcpctl login calls mcpd's auth endpoint to obtain a session token
- Stores the token in ~/.mcpctl/config.yaml (or ~/.mcpctl/credentials with restricted permissions)
- Passes the token to mcplocal (either via config or as a startup argument)
- mcpctl logout command: invalidates the session token
mcplocal responsibilities
- Reads auth token from its config (set by mcpctl)
- Attaches an Authorization: Bearer <token> header to ALL requests to mcpd
- If mcpd returns 401, mcplocal returns an appropriate error to mcpctl/Claude
- Does NOT store credentials itself - they come from mcpctl's config
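The header attachment and 401 translation can be sketched as two small helpers; the function names and error strings are illustrative, not a fixed API:

```typescript
// Build the auth header mcplocal attaches to every upstream mcpd request.
function authHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

// Map an upstream HTTP status to a user-facing error, or null if none.
function mapUpstreamStatus(status: number): string | null {
  if (status === 401) return "mcpd rejected the session token; run `mcpctl login` again";
  if (status >= 500) return "mcpd is unreachable or failing";
  return null;
}
```

Centralizing this in one client wrapper keeps the "no stored credentials in mcplocal" rule auditable in a single place.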
mcpd responsibilities
- Owns User and Session tables
- Provides auth endpoints: POST /api/v1/auth/login, POST /api/v1/auth/logout
- Validates Bearer tokens on every request via auth middleware (already exists)
- Returns 401 for invalid/expired tokens
- Audit logs include the authenticated user
Non-functional Requirements
- mcplocal must start fast (developer's machine, runs per-session or as daemon)
- LLM pre-processing must not add more than 2-3 seconds of latency
- If local LLM is unavailable, fall back to passing data through unfiltered
- All components must be independently deployable and testable
- mcpd must remain stateless (outside of DB) and horizontally scalable
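The unfiltered-fallback requirement can be sketched as a wrapper around the LLM call: on failure or timeout, return the raw data unchanged. The function name and 3-second default (matching the latency budget above) are assumptions:

```typescript
// If the local LLM is unavailable or exceeds its latency budget, pass the
// raw response through unfiltered rather than failing the request.
async function filterWithFallback(
  raw: string,
  llmFilter: (text: string) => Promise<string>,
  timeoutMs = 3000,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("llm timeout")), timeoutMs);
  });
  try {
    return await Promise.race([llmFilter(raw), timeout]);
  } catch {
    return raw; // LLM down or too slow: unfiltered pass-through
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

This keeps the failure mode graceful: Claude gets a larger response than ideal, but never an error caused by the optimization layer itself.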