# mcpctl v2 - Corrected 3-Tier Architecture PRD
## Overview
mcpctl is a kubectl-inspired system for managing MCP (Model Context Protocol) servers. It consists of 4 components arranged in a 3-tier architecture:

```
Claude Code
    |
    v (stdio - MCP protocol)
mcplocal (Local Daemon - runs on developer machine)
    |
    v (HTTP REST)
mcpd (External Daemon - runs on server/NAS)
    |
    v (Docker API / K8s API)
mcp_servers (MCP server containers)
```
## Components

### 1. mcpctl (CLI Tool)
- **Package**: `src/cli/` (`@mcpctl/cli`)
- **What it is**: kubectl-like CLI for managing the entire system
- **Talks to**: mcplocal (local daemon) via HTTP REST
- **Key point**: mcpctl does NOT talk to mcpd directly. It always goes through mcplocal.
- **Distributed as**: RPM package via Gitea registry (bun compile + nfpm)
- **Commands**: get, describe, apply, setup, instance, claude, project, backup, restore, config, status

### 2. mcplocal (Local Daemon)
- **Package**: `src/local-proxy/` (to be renamed `src/mcplocal/`)
- **What it is**: local daemon running on the developer's machine
- **Talks to**: mcpd (external daemon) via HTTP REST
- **Exposes to Claude**: MCP protocol via stdio (tools, resources, prompts)
- **Exposes to mcpctl**: HTTP REST API for management commands

**Core responsibility: LLM Pre-processing**

This is the intelligence layer. When Claude asks for data from MCP servers, mcplocal:

1. Receives Claude's request (e.g., "get Slack messages about security")
2. Uses a local/cheap LLM (Gemini CLI binary, Ollama, vLLM, DeepSeek API) to interpret what Claude actually wants
3. Sends narrow, filtered requests to mcpd, which forwards them to the actual MCP servers
4. Receives raw results from the MCP servers (via mcpd)
5. Uses the local LLM again to filter/summarize the results, extracting only what's relevant
6. Returns the smallest, most comprehensive response to Claude

**Why**: Claude Code tokens are expensive. Instead of dumping 500 Slack messages into Claude's context window, mcplocal uses a cheap LLM to pre-filter them down to the 12 relevant ones.
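
The six steps above can be sketched as a two-pass pipeline. This is a minimal illustration, not the existing provider code: `CheapLlm`, `narrowRequest`, and `filterResults` are hypothetical names.

```typescript
// Sketch of the two-pass LLM pipeline (hypothetical names throughout).
interface CheapLlm {
  complete(prompt: string): Promise<string>;
}

interface ToolCall {
  tool: string;                          // e.g. "slack/search_messages"
  params: Record<string, unknown>;
}

// Pass 1: interpret Claude's request and narrow it before it reaches mcpd.
async function narrowRequest(llm: CheapLlm, call: ToolCall): Promise<ToolCall> {
  const query = await llm.complete(
    `Rewrite these tool params as the narrowest equivalent query: ${JSON.stringify(call.params)}`,
  );
  return { ...call, params: { ...call.params, query } };
}

// Pass 2: keep only the results the cheap LLM judges relevant to the intent.
async function filterResults(llm: CheapLlm, raw: string[], intent: string): Promise<string[]> {
  const answer = await llm.complete(
    `From ${raw.length} items, return comma-separated indices relevant to: ${intent}`,
  );
  const keep = answer.split(",").map((s) => Number(s.trim()));
  return raw.filter((_, i) => keep.includes(i));
}
```

Both passes call the same cheap model; only the prompts differ, which keeps the provider abstraction unchanged.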

**LLM Provider Strategy** (already partially exists):
- Gemini CLI binary (local, free)
- Ollama (local, free)
- vLLM (local, free)
- DeepSeek API (cheap)
- OpenAI API (fallback)
- Anthropic API (fallback)
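
The list above implies a preference order: local and free first, cheap APIs next, paid APIs last. A sketch of that selection, where the `ProviderDef` shape and the availability probe are assumptions rather than the existing adapter interfaces:

```typescript
// Sketch of provider selection: first available provider in preference order
// wins. Names and the availability probes are illustrative.
interface ProviderDef {
  name: string;
  cost: "free" | "cheap" | "paid";
  available: () => boolean;
}

const PREFERENCE: ProviderDef[] = [
  { name: "gemini-cli", cost: "free", available: () => false },  // e.g. binary not on PATH
  { name: "ollama", cost: "free", available: () => false },
  { name: "vllm", cost: "free", available: () => false },
  { name: "deepseek", cost: "cheap", available: () => true },
  { name: "openai", cost: "paid", available: () => true },
  { name: "anthropic", cost: "paid", available: () => true },
];

function selectProvider(providers: ProviderDef[]): ProviderDef | undefined {
  return providers.find((p) => p.available());
}
```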

**Additional mcplocal responsibilities**:
- MCP protocol routing (namespaced tools: `slack/send_message`, `jira/create_issue`)
- Connection health monitoring for upstream MCP servers
- Caching frequently requested data
- Proxying mcpctl management commands to mcpd
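
The routing bullet relies on tool names of the form `<server>/<tool>`. A sketch of the split/join helpers (the function names are illustrative, not the existing router API):

```typescript
// Sketch of the namespacing scheme: "<server>/<tool>" lets the router dispatch
// a tools/call to the right upstream server.
function namespaceTool(server: string, tool: string): string {
  return `${server}/${tool}`;
}

function splitNamespacedTool(name: string): { server: string; tool: string } {
  const slash = name.indexOf("/");
  if (slash <= 0 || slash === name.length - 1) {
    throw new Error(`not a namespaced tool name: ${name}`);
  }
  return { server: name.slice(0, slash), tool: name.slice(slash + 1) };
}
```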

### 3. mcpd (External Daemon)
- **Package**: `src/mcpd/` (`@mcpctl/mcpd`)
- **What it is**: server-side daemon that runs on centralized infrastructure (Synology NAS, cloud server, etc.)
- **Deployed via**: Docker Compose (Dockerfile + docker-compose.yml)
- **Database**: PostgreSQL for state, audit logs, and access control

**Core responsibilities**:
- **Deploy and run MCP server containers** (Docker now, Kubernetes later)
- **Instance lifecycle management**: start, stop, restart, logs, inspect
- **MCP server registry**: store server definitions, configuration templates, profiles
- **Project management**: group MCP profiles into projects for Claude sessions
- **Auditing**: log every operation - who ran what, when, with what result
- **Access management**: users, sessions, permissions - who can access which MCP servers
- **Credential storage**: MCP servers often need API tokens (Slack, Jira, GitHub) - stored securely on the server side, never exposed to the local machine
- **Backup/restore**: export and import configuration

**Key point**: mcpd holds the credentials. When mcplocal asks mcpd to query Slack, mcpd runs the Slack MCP server container with the proper SLACK_TOKEN injected - mcplocal never sees the token.
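
The credential-isolation property can be sketched as follows. The secret store, field names, and the example token value are all illustrative; the point is that secrets only flow into the container environment, while callers get a redacted view:

```typescript
// Sketch of server-side credential injection (all names and values illustrative).
const secretStore = new Map<string, Record<string, string>>([
  ["slack", { SLACK_TOKEN: "xoxb-example-secret" }],
]);

function buildContainerEnv(serverId: string, baseEnv: Record<string, string>) {
  const secrets = secretStore.get(serverId) ?? {};
  // `env` is handed to the container runtime only; it never crosses the API boundary.
  const env: Record<string, string> = { ...baseEnv, ...secrets };
  // Callers (mcplocal, mcpctl) only ever see the redacted view.
  const redacted: Record<string, string> = Object.fromEntries(
    Object.entries(env).map(([k, v]) => [k, k in secrets ? "***" : v]),
  );
  return { env, redacted };
}
```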

### 4. mcp_servers (MCP Server Containers)
- **What they are**: the actual MCP server processes (Slack, Jira, GitHub, Terraform, filesystem, postgres, etc.)
- **Managed by**: mcpd via the Docker/Podman API
- **Network**: isolated network, only accessible by mcpd
- **Credentials**: injected by mcpd as environment variables
- **Communication**: MCP protocol (stdio or SSE/HTTP) between mcpd and the containers
## Data Flow Examples

### Example 1: Claude asks for Slack messages
```
Claude: "Get messages about security incidents from the last week"
    |
    v (MCP tools/call: slack/search_messages)
mcplocal:
  1. Intercepts the tool call
  2. Calls local Gemini: "User wants security incident messages from last week.
     Generate optimal Slack search query and date filters."
  3. Gemini returns: query="security incident OR vulnerability OR CVE", after="2024-01-15"
  4. Sends filtered request to mcpd
    |
    v (HTTP POST /api/v1/mcp/proxy)
mcpd:
  1. Looks up the Slack MCP instance (injects SLACK_TOKEN)
  2. Forwards the narrowed query to the Slack MCP server container
  3. Returns raw results (200 messages)
    |
    v (response)
mcplocal:
  1. Receives 200 messages
  2. Calls local Gemini: "Filter these 200 Slack messages. Keep only those
     directly about security incidents. Return message IDs and 1-line summaries."
  3. Gemini returns: 15 relevant messages with summaries
  4. Returns filtered result to Claude
    |
    v (MCP response: 15 messages instead of 200)
Claude: processes only the relevant 15 messages
```

### Example 2: mcpctl management command
```
$ mcpctl get servers
    |
    v (HTTP GET)
mcplocal:
  1. Recognizes this is a management command (not MCP data)
  2. Proxies directly to mcpd (no LLM processing needed)
    |
    v (HTTP GET /api/v1/servers)
mcpd:
  1. Queries PostgreSQL for server definitions
  2. Returns list
    |
    v (proxied response)
mcplocal -> mcpctl -> formatted table output
```

### Example 3: mcpctl instance management
```
$ mcpctl instance start slack
    |
    v
mcplocal -> mcpd:
  1. Creates Docker container for the Slack MCP server
  2. Injects SLACK_TOKEN from secure storage
  3. Connects it to the isolated mcp-servers network
  4. Logs audit entry: "user X started slack instance"
  5. Returns instance status
```
## What Already Exists (completed work)

### Done and reusable as-is:
- Project structure: pnpm monorepo, TypeScript strict mode, Vitest, ESLint
- Database schema: Prisma + PostgreSQL (User, McpServer, McpProfile, Project, McpInstance, AuditLog)
- mcpd server framework: Fastify 5, routes, services, repositories, middleware
- mcpd MCP server CRUD: registration, profiles, projects
- mcpd Docker container management: dockerode, instance lifecycle
- mcpd audit logging, health monitoring, metrics, backup/restore
- mcpctl CLI framework: Commander.js, commands, config, API client, formatters
- mcpctl RPM distribution: bun compile, nfpm, Gitea publishing, shell completions
- MCP protocol routing in local-proxy: namespaced tools, resources, prompts
- LLM provider abstractions: OpenAI, Anthropic, Ollama adapters (defined but unused)
- Shared types and profile templates

### Needs rework:
- mcpctl currently talks to mcpd directly -> must talk to mcplocal instead
- local-proxy is just a dumb router -> needs LLM pre-processing intelligence
- local-proxy has no HTTP API for mcpctl -> needs REST endpoints for management proxying
- mcpd has no MCP proxy endpoint -> needs an endpoint that mcplocal can call to execute MCP tool calls on managed instances
- No integration between the LLM providers and the MCP request/response pipeline
## New Tasks Needed

### Phase 1: Rename and restructure local-proxy -> mcplocal
- Rename `src/local-proxy/` to `src/mcplocal/`
- Update all package references and imports
- Add an HTTP REST server (Fastify) alongside the existing stdio server
- mcplocal needs TWO interfaces: stdio for Claude, HTTP for mcpctl

### Phase 2: mcplocal management proxy
- Add REST endpoints that mirror mcpd's API (get servers, instances, projects, etc.)
- mcpctl config changes: `daemonUrl` now points to mcplocal (e.g., localhost:3200) instead of mcpd
- mcplocal proxies management requests to mcpd (configurable `mcpdUrl`, e.g., http://nas:3100)
- Pass-through with no LLM processing for management commands
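
The pass-through amounts to rewriting the request origin while keeping path and query intact. A sketch, assuming the `mcpdUrl` setting named above (the helper name is illustrative):

```typescript
// Sketch of the management pass-through: same path and query, different origin,
// no LLM involvement anywhere on this path.
function rewriteToUpstream(mcpdUrl: string, localPath: string): string {
  const base = mcpdUrl.replace(/\/+$/, "");   // tolerate a trailing slash in config
  return `${base}${localPath}`;
}
```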

### Phase 3: mcpd MCP proxy endpoint
- Add a `/api/v1/mcp/proxy` endpoint to mcpd
- Accepts `{ serverId, method, params }` and executes an MCP tool call on a managed instance
- mcpd looks up the instance, connects to the container, executes the MCP call, and returns the result
- This is how mcplocal talks to MCP servers without needing direct Docker access
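
A sketch of validating that request body. The `{ serverId, method, params }` shape comes from the bullet above; the validation details and the defaulting of `params` are assumptions:

```typescript
// Sketch of parsing the /api/v1/mcp/proxy request body (details assumed).
interface McpProxyRequest {
  serverId: string;
  method: string;                        // e.g. "tools/call"
  params: Record<string, unknown>;
}

function parseProxyRequest(body: unknown): McpProxyRequest {
  const b = body as Partial<McpProxyRequest> | null;
  if (!b || typeof b.serverId !== "string" || typeof b.method !== "string") {
    throw new Error("serverId and method are required");
  }
  return { serverId: b.serverId, method: b.method, params: b.params ?? {} };
}
```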

### Phase 4: LLM pre-processing pipeline in mcplocal
- Create a request interceptor in mcplocal's MCP router
- Before forwarding `tools/call` to mcpd, run the request through the LLM for interpretation
- After receiving the response from mcpd, run it through the LLM for filtering/summarization
- LLM provider selection based on config (prefer local/cheap models)
- Configurable: enable/disable pre-processing per server or per tool
- Bypass for simple operations (list, create, delete - no filtering needed)
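
The per-tool bypass rule might look like the following. The config shape and the verb heuristic are assumptions for illustration:

```typescript
// Sketch: decide whether a tool call goes through the LLM pipeline or bypasses it.
interface PreprocessConfig {
  enabled: boolean;
  disabledTools: Set<string>;          // per-tool opt-out, e.g. "slack/send_message"
}

const SIMPLE_VERBS = ["list", "create", "delete"]; // filtering adds no value here

function shouldPreprocess(cfg: PreprocessConfig, tool: string): boolean {
  if (!cfg.enabled || cfg.disabledTools.has(tool)) return false;
  const bare = tool.split("/").pop() ?? tool;      // strip the server namespace
  return !SIMPLE_VERBS.some((v) => bare.startsWith(v));
}
```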

### Phase 5: Smart context optimization
- Token counting: estimate how many tokens the raw response would consume
- Decision logic: if the raw response is below a threshold, skip LLM filtering (not worth the latency)
- If the raw response exceeds the threshold, filter with the LLM
- Cache LLM filtering decisions for repeated similar queries
- Metrics: track tokens saved and latency added by filtering
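
The threshold decision can be sketched with a cheap character-based token estimate (roughly 4 characters per token is a common heuristic; the threshold value is illustrative):

```typescript
// Sketch of the skip-below-threshold rule from Phase 5.
const FILTER_THRESHOLD_TOKENS = 2000;   // illustrative, would come from config

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);    // rough ~4-chars-per-token heuristic
}

function needsLlmFilter(rawResponse: string): boolean {
  // Small responses are cheaper to pass through than to re-summarize.
  return estimateTokens(rawResponse) > FILTER_THRESHOLD_TOKENS;
}
```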

### Phase 6: mcpctl -> mcplocal migration
- Update mcpctl's default daemonUrl to point to mcplocal (localhost:3200)
- Update all CLI commands to work through the mcplocal proxy
- Add `mcpctl config set mcpd-url <url>` for configuring the upstream mcpd
- Add `mcpctl config set mcplocal-url <url>` for configuring the local daemon
- Health check: `mcpctl status` shows both mcplocal and mcpd connectivity
- Update shell completions if needed
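
After this phase the CLI carries two URLs. A sketch of the resolved config, where the field names mirror the settings above and the default values are illustrative:

```typescript
// Sketch of the dual-URL config after Phase 6 (defaults illustrative).
interface CliConfig {
  mcplocalUrl: string;   // where mcpctl sends every request
  mcpdUrl: string;       // upstream daemon, used by mcplocal only
}

const DEFAULTS: CliConfig = {
  mcplocalUrl: "http://localhost:3200",
  mcpdUrl: "http://localhost:3100",
};

function resolveConfig(overrides: Partial<CliConfig>): CliConfig {
  return { ...DEFAULTS, ...overrides };
}
```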

### Phase 7: End-to-end integration testing
- Test the full flow: mcpctl -> mcplocal -> mcpd -> mcp_server -> response -> LLM filter -> Claude
- Test that management commands pass through correctly
- Test that LLM pre-processing reduces context window size
- Test credential isolation (mcplocal never sees MCP server credentials)
- Test health monitoring across all tiers
## Authentication & Authorization

### Database ownership
- **mcpd owns the database** (PostgreSQL). It is the only component that talks to the DB.
- mcplocal has NO database. It is stateless (config file only).
- mcpctl has NO database. It stores user credentials locally in `~/.mcpctl/config.yaml`.

### Auth flow
```
mcpctl login
    |
    v (user enters mcpd URL + credentials)
mcpctl stores API token in ~/.mcpctl/config.yaml
    |
    v (passes token to mcplocal config)
mcplocal authenticates to mcpd using Bearer token on every request
    |
    v (Authorization: Bearer <token>)
mcpd validates token against Session table in PostgreSQL
    |
    v (authenticated request proceeds)
```

### mcpctl responsibilities
- `mcpctl login`: prompts the user for the mcpd URL and credentials (username/password or API token)
- `mcpctl login` calls mcpd's auth endpoint to get a session token
- Stores the token in `~/.mcpctl/config.yaml` (or `~/.mcpctl/credentials` with restricted permissions)
- Passes the token to mcplocal (either via config or as a startup argument)
- `mcpctl logout`: invalidates the session token

### mcplocal responsibilities
- Reads the auth token from its config (set by mcpctl)
- Attaches an `Authorization: Bearer <token>` header to ALL requests to mcpd
- If mcpd returns 401, mcplocal returns an appropriate error to mcpctl/Claude
- Does NOT store credentials itself - they come from mcpctl's config
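
The header-attach and 401-handling bullets can be sketched as follows; the class and function names are illustrative, not the existing client code:

```typescript
// Sketch: every mcplocal -> mcpd request carries the Bearer token from config,
// and a 401 surfaces as a distinct auth error (names illustrative).
class UpstreamAuthError extends Error {}

function buildAuthHeaders(token: string): Record<string, string> {
  return { Authorization: `Bearer ${token}` };
}

function checkUpstreamStatus(status: number): void {
  if (status === 401) {
    throw new UpstreamAuthError("mcpd rejected the token; run `mcpctl login` again");
  }
}
```

Keeping the auth error as its own type lets the stdio and HTTP front ends translate it differently (MCP error vs. HTTP status) without string matching.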

### mcpd responsibilities
- Owns the User and Session tables
- Provides auth endpoints: `POST /api/v1/auth/login`, `POST /api/v1/auth/logout`
- Validates Bearer tokens on every request via auth middleware (already exists)
- Returns 401 for invalid/expired tokens
- Audit logs include the authenticated user

## Non-functional Requirements
- mcplocal must start fast (it runs on the developer's machine, per-session or as a daemon)
- LLM pre-processing must not add more than 2-3 seconds of latency
- If the local LLM is unavailable, fall back to passing data through unfiltered
- All components must be independently deployable and testable
- mcpd must remain stateless (outside of the DB) and horizontally scalable