# PRD: ProxyModels — Programmable MCP Content Processing

## The Concept

A **proxymodel** is a named, composable pipeline that defines how mcplocal transforms content between upstream MCP servers and the client LLM.

### Relationship to proxyMode

The existing `proxyMode` field on projects is the on/off switch:

```
proxyMode: direct → clients connect to upstream servers directly
                    no proxy in the path, no processing, no gating
                    (generates MCP config with direct server entries)

proxyMode: proxy  → all traffic flows through mcplocal
                    proxyModel pipeline applies
                    (generates MCP config pointing to the mcplocal endpoint)
```

`proxyMode: filtered` (the current name) is renamed to `proxyMode: proxy`.

### The "default" proxymodel — what we already built

Everything we've implemented so far IS a proxymodel. It becomes the `default` model that ships with mcpctl:

| Feature | Already implemented |
|---|---|
| Gated sessions (begin_session / ungate) | Yes |
| Prompt tag matching + scoring | Yes |
| LLM-based prompt selection (when a provider is configured) | Yes |
| Deterministic tag matching (no-LLM fallback) | Yes |
| read_prompts for on-demand context | Yes |
| Gated intercept (auto-ungate on real tool call) | Yes |
| Pagination for large responses | Yes |
| tools/list_changed notification | Yes |
| System prompts (gate-instructions, encouragement, etc.) | Yes |
| Prompt byte budget with priority scoring | Yes |

The `default` proxymodel is NOT replaced — it's the foundation. Future proxymodels extend it by adding processing stages for content that flows through the existing pipeline.

Architecturally, the gated session system is itself a proxymodel — it's a **session controller** that intercepts JSON-RPC methods, manages per-session state, injects virtual tools, and dispatches notifications. The framework recognizes two types of processing: **session controllers** (method-level hooks, state management) and **content stages** (text in → text out transformation). The `default` proxymodel combines the gate session controller with the passthrough + paginate content stages. See the "Gated Sessions as a ProxyModel" section for the full analysis.
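The controller/stage split can be pictured as an interface sketch. This is hypothetical: Phase 1 deliberately keeps `SessionController` internal, and these method names are illustrative, mirroring the five extension points listed under Phase 1 (initialize, tools/list, tool call intercept, tool result, close).

```typescript
// Sketch only: the public SessionController API is deferred past Phase 1.
// Names are illustrative, not a committed surface.
export interface SessionState { sessionId: string; projectName: string; ungated: boolean; }
export interface Tool { name: string; description: string; }
export interface ToolCall { server: string; tool: string; args: Record<string, unknown>; }

export interface SessionController {
  onInitialize?(session: SessionState): Promise<void>;
  // e.g. the gate controller hides real tools behind begin_session
  onToolsList?(session: SessionState, tools: Tool[]): Promise<Tool[]>;
  // 'intercept' lets the gate auto-ungate on the first real tool call
  onToolCall?(session: SessionState, call: ToolCall): Promise<'allow' | 'intercept'>;
  // hands the result text off to the content stages
  onToolResult?(session: SessionState, result: string): Promise<string>;
  onClose?(session: SessionState): Promise<void>;
}
```

A `controller: none` proxymodel would simply leave every hook undefined.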
### Future proxymodels build on default

```
"default" proxymodel (what exists today):
  [controller: gate] → prompt match → serve prompts → ungate
  [stages: passthrough, paginate] → route tool calls → paginate large responses

"summarize" proxymodel (future):
  [controller: gate] → prompt match → SUMMARIZE prompts → serve summaries → ungate
  [stages: summarize] → route → SUMMARIZE large responses

"index" proxymodel (future):
  [controller: gate] → prompt match → INDEX prompts → serve ToC → serve sections on demand → ungate
  [stages: section-split, summarize-tree] → route → INDEX large responses

"ungated" proxymodel (for projects that want proxy features without gating):
  [controller: none] → all tools visible immediately
  [stages: summarize] → route → SUMMARIZE large responses
```

Each future model can reuse or replace the gate controller and add content processing stages. A project that wants content summarization without gating uses `controller: none`.
### All proxymodels

| ProxyModel | Controller | Content Stages | Requires LLM |
|---|---|---|---|
| `default` | `gate` | passthrough, paginate | No (optional for prompt selection) |
| `subindex` | `gate` | section-split, summarize-tree | Yes (for prose summaries) |
| `summarize` | `gate` | summarize | Yes |
| `summarize+index` | `gate` | summarize, index | Yes |
| `enhance` | `gate` | enhance | Yes |
| `compress` | `gate` | compress | Yes |
| `ungated-subindex` | none | section-split, summarize-tree | Yes |

Proxymodels apply to all content flowing through the proxy: prompt text, tool results, resource content. A 120K-character `get_flows` response benefits from a proxymodel that summarizes it before it hits Claude's context window.
## Why this matters

The proxy sits between the LLM and every piece of content it consumes. That position gives it the power to:

- **Reduce token burn** — Claude doesn't read 120K of JSON when a 2K summary would do
- **Improve task quality** — structured prompts lead to better outcomes than prose
- **Adapt to the LLM** — what works for Claude may not work for GPT, Gemini, etc.
- **Measure and iterate** — same content, different proxymodels, compare results

But without caching, any proxymodel involving LLM processing adds 3-10 seconds per request (Gemini, local models). The cache is what makes proxymodels practical — compute once, serve forever until the source changes.
## Architecture: The ProxyModel Framework

The framework is a **plugin runtime**. It provides the API contract, services, and execution environment. Proxymodel authors — whether us or 300 external users — write stages against this contract without touching mcpctl internals.

### The Stage Contract

A **stage** is the atomic unit. It's a function that takes content in and returns content out, with access to platform services:

```typescript
// This is the public API that proxymodel authors write against

export interface StageHandler {
  (content: string, ctx: StageContext): Promise<StageResult>;
}

/** Services the framework provides to every stage */
export interface StageContext {
  // What are we processing?
  contentType: 'prompt' | 'toolResult' | 'resource';
  sourceName: string;      // prompt name, "server/tool", resource URI
  projectName: string;
  sessionId: string;

  // The original unmodified content (even if a previous stage changed it)
  originalContent: string;

  // Platform services — stages don't build these, they use them
  llm: LLMProvider;        // call the configured LLM (Gemini, Ollama, etc.)
  cache: CacheProvider;    // content-addressed read/write
  log: Logger;

  // Stage-specific configuration from the proxymodel YAML
  config: Record<string, unknown>;
}

export interface StageResult {
  content: string;                      // the transformed content
  sections?: Section[];                 // optional: section index for drill-down
  metadata?: Record<string, unknown>;   // optional: metrics, debug info
}

export interface Section {
  id: string;       // addressable key (e.g. "token-handling")
  title: string;    // human-readable label
  content: string;  // full section content (served on drill-down)
}
```

**Key principle: stages never import mcpctl internals.** They only import types from `mcpctl/proxymodel` (a public package/entrypoint). This is what lets 300 people write their own stages without forking the app.
### Services the Framework Provides

| Service | What it does | Why stages need it |
|---|---|---|
| `ctx.llm` | Call any configured LLM provider | Summarize, index, enhance, compress all need an LLM |
| `ctx.cache` | Content-addressed read/write cache | Avoid re-processing unchanged content |
| `ctx.log` | Structured logging tied to session/stage | Debug and metrics without console.log |
| `ctx.config` | Stage-specific settings from YAML | `maxTokens: 500`, `keepHeaders: true`, etc. |
| `ctx.originalContent` | The raw content before any stage touched it | Stages can reference the original even after prior stages modified it |

The framework wires these up. A stage author writing a custom summarizer writes:

```typescript
// ~/.mcpctl/stages/my-summarizer.ts
import type { StageHandler } from 'mcpctl/proxymodel';

const handler: StageHandler = async (content, ctx) => {
  // Use the platform LLM — don't care if it's Gemini, Ollama, or Claude
  const summary = await ctx.llm.complete(
    `Summarize this ${ctx.contentType} in ${ctx.config.maxTokens ?? 500} tokens:\n\n${content}`
  );
  return { content: summary };
};

export default handler;
```

They never think about HTTP, caching, session management, or database access.
### ProxyModel Definition

A proxymodel is a named **pipeline** — an optional session controller plus an ordered list of content stages. It's a YAML file:

```yaml
# ~/.mcpctl/proxymodels/summarize+index.yaml
kind: ProxyModel
metadata:
  name: summarize+index
spec:
  controller: gate          # session controller (optional, default: gate)
  controllerConfig:         # config passed to the controller
    byteBudget: 8192
  stages:
    - type: summarize       # built-in stage
      config:
        maxTokens: 500
        includeSectionLinks: true
    - type: index           # built-in stage
      config:
        maxDepth: 2
    - type: my-summarizer   # custom stage (resolved from ~/.mcpctl/stages/)
      config:
        keepHeaders: true
  appliesTo:
    - prompts
    - toolResults
  cacheable: true
```

The `controller` field specifies a session controller that handles method-level hooks (tools/list, initialize, tool call intercept). The default is `gate` — the existing gated session system. Set it to `none` for projects that want content processing without gating. Content stages compose left-to-right — the output of stage N becomes the input of stage N+1.
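In TypeScript terms, the loader would parse this YAML into a shape like the following. This is a sketch: the field names mirror the YAML above, but the exact parsed types are not specified by this PRD.

```typescript
// Sketch of the parsed proxymodel definition, mirroring the YAML schema above.
export interface StageRef {
  type: string;                             // built-in name or ~/.mcpctl/stages/<type>.ts
  config?: Record<string, unknown>;
}

export interface ProxyModelSpec {
  controller: 'gate' | 'none';              // session controller; default 'gate'
  controllerConfig?: Record<string, unknown>;
  stages: StageRef[];                       // executed in order, left-to-right
  appliesTo: Array<'prompts' | 'toolResults' | 'resources'>;
  cacheable: boolean;
}

export interface ProxyModel {
  kind: 'ProxyModel';
  metadata: { name: string };
  spec: ProxyModelSpec;
}
```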
### Stage Resolution

Stage `type` names resolve in order:

```
type: "summarize"
  → check ~/.mcpctl/stages/summarize.ts → found? load it
  → check built-in stages (compiled)    → found? use it
  → error: unknown stage type "summarize"
```

This means users can:

- **Use built-in stages** by name (`summarize`, `index`, `compress`)
- **Write custom stages** as `.ts` files in `~/.mcpctl/stages/`
- **Override built-in stages** by placing a file with the same name in `~/.mcpctl/stages/`
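A sketch of that resolution order as code. The built-in registry shape (a `Map`) and the use of a cache-busting dynamic import are assumptions for illustration; `StageHandler` is a simplified local alias.

```typescript
// Sketch: resolve a stage type to a handler, local file first, then built-ins.
import { existsSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

type StageHandler = (content: string, ctx: unknown) => Promise<{ content: string }>;

export async function resolveStage(
  type: string,
  builtins: Map<string, StageHandler>,
): Promise<StageHandler> {
  // 1. Local file wins: this is how users override built-in stages
  const localPath = join(homedir(), '.mcpctl', 'stages', `${type}.ts`);
  if (existsSync(localPath)) {
    const mod = await import(localPath);  // assumes a TS-aware runtime (e.g. tsx)
    return mod.default as StageHandler;
  }
  // 2. Fall back to the compiled built-in registry
  const builtin = builtins.get(type);
  if (builtin) return builtin;
  // 3. Nothing matched
  throw new Error(`unknown stage type "${type}"`);
}
```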
### Built-in Stages (ship with mcpctl)

| Stage | What it does | Requires LLM |
|---|---|---|
| `passthrough` | Returns content unchanged | No |
| `paginate` | Splits into pages with navigation | No |
| `section-split` | Splits on headers into named sections | No |
| `summarize` | LLM-generated summary with section refs | Yes |
| `index` | Table of contents with section drill-down | Yes (or heuristic) |
| `enhance` | Restructure for LLM consumption (action items first, bullets) | Yes |
| `compress` | Strip boilerplate, keep actionable content | Yes |

These are reference implementations. A user who wants a different summarization strategy writes their own `summarize.ts` and drops it in `~/.mcpctl/stages/` — it overrides the built-in.
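As an illustration of the no-LLM stages, the markdown case of `section-split` might reduce to something like this. A sketch only: the shipped stage also handles JSON, YAML, XML, and code, and its actual splitting rules are not fixed by this PRD.

```typescript
// Sketch of header-based splitting for the section-split stage (markdown only).
// Section ids follow the "token-handling" slug style used elsewhere in this doc.
interface Section { id: string; title: string; content: string; }

export function splitMarkdownSections(content: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const line of content.split('\n')) {
    const m = /^#{1,6}\s+(.*)$/.exec(line);
    if (m) {
      if (current) sections.push(current);
      const title = m[1].trim();
      current = {
        id: title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, ''),
        title,
        content: '',
      };
    } else if (current) {
      // Leaf content is kept verbatim: drill-down serves the exact original text
      current.content += (current.content ? '\n' : '') + line;
    }
  }
  if (current) sections.push(current);
  return sections;
}
```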
### Where ProxyModels Live

**Built-in proxymodels** — compiled into the binary:

- `default` — current behavior (gate, prompt match, paginate). Always present.
- May ship others as references (e.g. `summarize`, `index`)

**Local proxymodels** — YAML files in `~/.mcpctl/proxymodels/<name>.yaml`:

- Created by users
- Can reference both built-in and custom stages
- Can override built-in proxymodels by using the same name

**Custom stages** — TypeScript files in `~/.mcpctl/stages/<name>.ts`:

- Implement the `StageHandler` interface
- Loaded dynamically by the framework at startup
- Hot-reloadable (file watcher)

**No database table for proxymodel or stage definitions.** mcpd stores:

- RBAC bindings (who can use which proxymodel on which project)
- Cache artifacts (produced by stages)
- Session metrics (which proxymodel was active, performance data)
### Resolution & RBAC

**Proxymodel resolution** when a project references one by name:

```
project.proxyModel: "summarize"
  → check ~/.mcpctl/proxymodels/ → found? use it
  → check built-in (compiled)    → found? use it
  → error: unknown proxymodel "summarize"
```

**RBAC controls usage, not creation.** Proxymodels are files — anyone can create them locally. RBAC controls which proxymodels a user can **activate** on shared projects:

```yaml
kind: RbacBinding
spec:
  subject: group/developers
  role: run
  resource: proxymodels
  name: summarize   # specific model, or * for all
```

Without `run` permission, the project falls back to `default`.
### Project Configuration

```yaml
kind: Project
metadata:
  name: homeautomation
spec:
  proxyModel: summarize+index          # default for this project
  proxyModelOverrides:
    prompts:
      security-policy: enhance+index   # this prompt gets special treatment
    toolResults:
      "*/get_flows": summarize         # large tool results get summarized
```
### Framework Runtime

```
Client request arrives at mcplocal
  → Content identified (prompt text / tool result / resource)
  → Resolve proxymodel name from project config
  → Resolve each stage in the pipeline (local → built-in)
  → For each stage in order:
      → Compute cache key: (contentHash, proxyModelName, stageName, configHash, stageFileHash)
      → Cache hit?  → skip stage, use cached result
      → Cache miss? → call stage handler with content + context
      → Cache result if proxymodel.cacheable
  → Serve final content to client
  → Record metrics (tokens, timing, cache hit rate)
```
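Stripped of caching and metrics, the stage-execution part of that loop reduces to a few lines. A sketch with simplified local aliases for `StageHandler`/`StageContext`:

```typescript
// Sketch of the stage loop: output of stage N feeds stage N+1, while
// ctx.originalContent stays pinned to the untouched input, per the contract.
type Ctx = { originalContent: string; config: Record<string, unknown> };
type Handler = (content: string, ctx: Ctx) => Promise<{ content: string }>;

export async function runPipeline(
  stages: Array<{ handler: Handler; config: Record<string, unknown> }>,
  input: string,
): Promise<string> {
  let content = input;
  for (const stage of stages) {
    const result = await stage.handler(content, {
      originalContent: input,   // always the raw content, even mid-pipeline
      config: stage.config,
    });
    content = result.content;
  }
  return content;
}
```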
### Content Addressing for Drill-Down

When a stage produces sections (via `StageResult.sections`), the framework enables drill-down:

```
# Initial: Claude gets a summary
read_prompts({ tags: ["security"] })
→ "Key requirements: [1] Token handling [2] Network security [3] Audit logging"

# Drill-down: Claude requests a specific section
read_prompts({ tags: ["security"], section: "token-handling" })
→ Full section content about token handling
```

For tool results, the existing pagination mechanism extends with section addressing:

```
# Tool returns 120K of flows
my-node-red/get_flows()
→ "10 flows found: [1] Thermostat (12 nodes) [2] Lighting (8 nodes) ... call with _section for details"

# Client requests a specific flow
my-node-red/get_flows({ _section: "thermostat" })
→ Full flow definition for Thermostat only
```
### CLI

```bash
# List all proxymodels (built-in + local)
mcpctl get proxymodels
NAME           SOURCE     STAGES                   REQUIRES-LLM  CACHEABLE
default        built-in   passthrough,paginate     no            no
summarize      built-in   summarize                yes           yes
my-experiment  local      my-summarizer,compress   yes           yes

# List all stages (built-in + custom)
mcpctl get stages
NAME           SOURCE     REQUIRES-LLM
passthrough    built-in   no
paginate       built-in   no
summarize      built-in   yes
my-summarizer  local      yes

# Inspect
mcpctl describe proxymodel summarize
mcpctl describe stage summarize

# Scaffold a new stage (generates a boilerplate .ts file)
mcpctl create stage my-filter
# → Created ~/.mcpctl/stages/my-filter.ts

# Scaffold a new proxymodel
mcpctl create proxymodel my-pipeline --stages summarize,my-filter
# → Created ~/.mcpctl/proxymodels/my-pipeline.yaml

# Delete local resources (built-ins can't be deleted)
mcpctl delete proxymodel my-experiment
mcpctl delete stage my-filter

# Validate a proxymodel (check all stages resolve, config valid)
mcpctl proxymodel validate my-experiment
```
## Cache System

### Why caching is non-negotiable

Any proxymodel stage that involves LLM processing costs 3-10s (Gemini, local models) or real money (cloud APIs). Without caching:

- First `begin_session` for a gated project: 5-15s just to summarize prompts
- Every `get_flows` call: 5-10s to summarize results
- Users would see this as broken, not enhanced

### Content-addressed, two-tier

**Cache key:** `(contentHash, proxyModelName, stageName, configHash, stageFileHash)` → artifact

Content hash makes invalidation automatic — when the source changes, the hash changes and old entries become unreachable.
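A sketch of the key computation in TypeScript. The exact hash layout is an assumption, and the stage-file hash component (added in Phase 2) is omitted here for brevity:

```typescript
// Sketch: content-addressed cache key. Any change to the source content,
// model name, stage name, or stage config yields a different key, so stale
// entries are simply never looked up again.
import { createHash } from 'node:crypto';

export function cacheKey(
  content: string,
  proxyModelName: string,
  stageName: string,
  config: Record<string, unknown> = {},
): string {
  const contentHash = createHash('sha256').update(content).digest('hex');
  const configHash = createHash('sha256').update(JSON.stringify(config)).digest('hex');
  return createHash('sha256')
    .update([contentHash, proxyModelName, stageName, configHash].join('\n'))
    .digest('hex');
}
```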
**Tier 1: mcplocal (local, per-user)**

- `~/.mcpctl/cache/proxymodel/`
- Instant lookup, no network
- LRU eviction at a configurable size limit (default 100MB)

**Tier 2: mcpd (shared, central)**

- `prompt_cache` database table
- Shared across all users of a project
- Requires the `cache` RBAC permission to push
- Pull available to anyone with `view` on the project

**Lookup order:** local → mcpd → generate → cache locally → optionally push to mcpd
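That lookup order, sketched as code. The `Tier` shape is an assumption; it stands in for both the local file cache and the mcpd table:

```typescript
// Sketch of the two-tier lookup: a local hit is free, a shared hit saves the
// LLM call for everyone, and a miss generates once and back-fills both tiers.
interface Tier {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

export async function cachedTransform(
  key: string,
  local: Tier,
  shared: Tier | null,             // null when mcpd is unreachable or no permission
  generate: () => Promise<string>, // run the actual stage (LLM call)
): Promise<string> {
  const hit = await local.get(key);
  if (hit !== null) return hit;

  const remote = shared ? await shared.get(key) : null;
  if (remote !== null) {
    await local.set(key, remote);  // warm the local tier for next time
    return remote;
  }

  const fresh = await generate();
  await local.set(key, fresh);
  if (shared) await shared.set(key, fresh);  // push requires the `cache` permission
  return fresh;
}
```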
### Cache CLI

```bash
mcpctl cache list --project homeautomation    # show cached artifacts
mcpctl cache push --project homeautomation    # push local → shared
mcpctl cache clear --project homeautomation   # clear local
mcpctl cache stats                            # hit rates, sizes
```

### RBAC

Two new permissions:

- `cache` on `proxymodels` — grants the ability to push cached artifacts to the shared cache
- `run` on `proxymodels` (name-scoped) — grants the ability to use a specific proxymodel on projects

Without `run` permission on a proxymodel, the project falls back to `default` (which requires no permission).
## Model Studio: Live ProxyModel Development

The development workflow is **live and interactive**. You watch a real Claude session, intervene when things go wrong, and teach a monitoring Claude to fix the proxymodel — all without breaking the running session.

### The Setup: Three Windows

```
┌─────────────────────────────────────────────────────────────────────┐
│ Window 1: Claude Client                                             │
│   claude                                                            │
│   (connected to mcplocal, working on homeautomation project)        │
│   (uses whatever proxyModel is configured for the project)          │
│                                                                     │
│ Window 2: Model Studio (TUI)                                        │
│   mcpctl console --model-studio homeautomation                      │
│   (you watch traffic, see original vs transformed content,          │
│    pause messages, edit them, switch models in-flight)              │
│                                                                     │
│ Window 3: Claude Monitor                                            │
│   claude                                                            │
│   (connected to mcpctl-studio MCP server in .mcp.json,              │
│    observes traffic + your corrections, modifies the proxymodel)    │
└─────────────────────────────────────────────────────────────────────┘
```

**Window 1** is a normal Claude Code session. It doesn't know it's being watched. It connects to mcplocal, goes through the gate, uses tools. The proxymodel processes content before Claude sees it.

**Window 2** is `mcpctl console --model-studio` — an Ink TUI that extends `--inspect` with:

- **Original vs. Transformed view**: for every prompt/tool result, see the raw content and what the proxymodel turned it into
- **Pause/Resume**: hold outgoing responses so you can inspect or edit before Claude receives them
- **Inline editing**: modify a response before it's sent to Claude
- **Model switching**: change the active proxymodel for the project mid-session
- **Same keyboard patterns** as `--inspect`: `j`/`k` navigate, `Enter` expand, `s` sidebar, arrows scroll

**Window 3** is a Claude session with the `mcpctl-studio` MCP server added to `.mcp.json`. This Claude can:

- See all traffic events (same as `--inspect --stdin-mcp`)
- See your corrections (edits you made in the studio)
- Modify proxymodel files (stages + YAML)
- Hot-swap the active proxymodel on the project
- Use the corrections you make as its training signal
### The Workflow

```
1. Start Claude Client in window 1 — it begins working on a task
2. Watch traffic in Model Studio (window 2)
3. Claude Client receives a prompt through the proxymodel...
   → You see: ORIGINAL (raw prompt) vs TRANSFORMED (what the proxymodel produced)
   → It looks wrong — the summary dropped important security requirements
4. You PAUSE outgoing messages
5. You EDIT the transformed content to fix it
6. You RESUME — Claude Client receives your edited version
7. mcplocal records a CORRECTION event: { original, transformed, edited }
8. In window 3, you tell Claude Monitor:
   "The summarize stage dropped security requirements. Look at correction #3.
    Adjust the stage to always preserve lines containing 'MUST' or 'REQUIRED'."
9. Claude Monitor:
   - Calls get_corrections to see your edit
   - Reads the current stage file
   - Modifies ~/.mcpctl/stages/summarize.ts
   - Calls switch_model to reload the stage
10. Next time Claude Client triggers that content, the updated stage runs
11. You tell Claude Client: "retry that last step"
    (or /clear and start fresh if needed)
```
### Traffic Events for Model Studio

Extends the existing inspector events with new types:

| Event Type | Description |
|---|---|
| `content_original` | Raw content before proxymodel processing |
| `content_transformed` | Content after the proxymodel pipeline |
| `content_paused` | User paused this response in the studio |
| `content_edited` | User edited the transformed content (includes before + after) |
| `content_released` | Paused/edited content sent to the client |
| `model_switched` | Active proxymodel changed mid-session |
| `stage_reloaded` | A stage file was modified and hot-reloaded |

Correction events (`content_edited`) carry the full diff:

```typescript
interface CorrectionEvent {
  eventType: 'content_edited';
  sessionId: string;
  contentType: 'prompt' | 'toolResult';
  sourceName: string;      // which prompt or tool
  original: string;        // raw content from upstream
  transformed: string;     // what the proxymodel produced
  edited: string;          // what the user changed it to
  activeModel: string;     // which proxymodel was active
  activeStages: string[];  // which stages ran
  timestamp: number;
}
```

These are streamed via the existing SSE `/inspect` endpoint and available through the MCP server tools.
### Model Studio TUI

```
┌─ Model Studio: homeautomation ───────────────────────── model: summarize ─┐
│                                                                           │
│ Sessions                │ Traffic                                         │
│ ▸ session-abc (active)  │ 11:03:25 → initialize client=claude-code        │
│                         │ 11:03:25 ← initialize server=mcpctl-proxy       │
│                         │ 11:03:26 → tools/list                           │
│                         │ 11:03:26 ← tools/list 1 tool: begin_session     │
│                         │ 11:03:27 → begin_session(tags: security,flows)  │
│                         │ 11:03:27 ← begin_session [2 prompts matched]    │
│                         │                                                 │
│                         │ ┌─ ORIGINAL ─────────────────────────────────┐  │
│                         │ │ # Security Policy                          │  │
│                         │ │ All tokens MUST be rotated every 90 days.  │  │
│                         │ │ Network access MUST use mTLS.              │  │
│                         │ │ ... +45 more lines                         │  │
│                         │ ├─ TRANSFORMED (summarize) ──────────────────┤  │
│                         │ │ Security policy covers token management    │  │
│                         │ │ and network security practices.            │  │
│                         │ │ [!] MUST requirements dropped              │  │
│                         │ ├─ ⏸ PAUSED ─── [e]dit [r]elease [d]rop ─────┤  │
│                         │ │                                            │  │
│                         │ └────────────────────────────────────────────┘  │
│                                                                           │
│ [m] switch model  [p] pause/resume  [e] edit  [j/k] navigate              │
└───────────────────────────────────────────────────────────────────────────┘
```

**Keyboard shortcuts (extends the `--inspect` patterns):**

| Key | Action |
|---|---|
| `j`/`k` | Navigate events |
| `Enter` | Expand event (original vs transformed view) |
| `s` | Toggle sidebar |
| `p` | Toggle pause mode (hold all outgoing responses) |
| `e` | Edit the selected/paused response (opens in `$EDITOR` or inline) |
| `r` | Release paused message (send to client) |
| `d` | Drop paused message (don't send) |
| `m` | Switch active proxymodel (picker) |
| `o` | Toggle original/transformed/diff view |
| `Esc` | Close expanded view / exit edit |
| `↑`/`↓` | Scroll within expanded view |
| `G` | Jump to latest event |
| `c` | Clear traffic log |
| `q` | Quit |
### Studio MCP Server (for Claude Monitor)

Added to `.mcp.json` as a separate server:

```json
{
  "mcpctl-studio": {
    "command": "mcpctl",
    "args": ["console", "--model-studio", "--stdin-mcp"]
  }
}
```

**Tools available to Claude Monitor:**

| Tool | Description |
|---|---|
| `list_sessions` | Active sessions with project, model, event count |
| `get_traffic { sessionId, limit?, method? }` | Traffic events (same as the inspector) |
| `get_content_diff { eventId }` | Original vs transformed vs edited for a specific event |
| `get_corrections { sessionId? }` | All user corrections (edits) in a session |
| `get_active_model { project }` | Current proxymodel name + stage list |
| `switch_model { project, model }` | Hot-swap the active proxymodel (reloads stages) |
| `reload_stages` | Force reload all stages from `~/.mcpctl/stages/` |
| `get_stage_source { name }` | Read the source of a stage file |
| `list_models` | Available proxymodels (built-in + local) |
| `list_stages` | Available stages (built-in + custom) |

**Example conversation with Claude Monitor:**

> **User:** The summarize stage is dropping all the MUST requirements from the security prompt. Look at correction #2.
>
> **Claude Monitor:** *calls `get_corrections`* I see — the original had "All tokens MUST be rotated every 90 days" but the summary just says "token management". Let me check the stage... *calls `get_stage_source { name: "summarize" }`*
>
> The current prompt doesn't instruct the LLM to preserve requirement markers. I'll add that.
>
> *modifies `~/.mcpctl/stages/summarize.ts`*
> *calls `reload_stages`*
>
> Done. The summarize stage now preserves lines containing MUST, REQUIRED, or CRITICAL verbatim in the summary. The next time this content is processed, it should retain those requirements.
### Hot-Swap Without Breaking Sessions

When the model or stages change mid-session:

1. **Stage reload**: mcplocal watches `~/.mcpctl/stages/` for changes. When a file is modified, it re-imports the module. The next content that flows through the pipeline uses the new version. No session restart needed.

2. **Model switch**: When `switch_model` is called (or the user presses `m` in the studio), mcplocal updates the project's active proxymodel reference. The session transport stays open. The next content processing call uses the new pipeline. Previous responses are not re-processed — they were already sent.

3. **Cache invalidation on stage change**: When a stage file changes, all cached artifacts produced by that stage are invalidated (the stage file hash is part of the cache key). This ensures the new stage logic runs fresh.

```
switch_model called or stage file modified
  → mcplocal reloads stage modules
  → invalidate affected cache entries
  → emit stage_reloaded / model_switched event (visible in studio + MCP)
  → next content flows through updated pipeline
  → client session unaffected (transport stays open)
```
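The registry bookkeeping behind steps 1 and 3 can be sketched independently of the file watcher. Names here are illustrative, and the artifact index is a simplification: in the real system the stage-file hash inside the cache key does the invalidating.

```typescript
// Sketch: swap a reloaded stage handler in place and drop every cached
// artifact recorded against that stage. Open sessions keep their transport;
// only future pipeline runs see the change.
type Handler = (content: string, ctx: unknown) => Promise<{ content: string }>;

export class StageRegistry {
  private handlers = new Map<string, Handler>();
  // cacheKey → name of the stage that produced the artifact
  constructor(private artifactIndex: Map<string, string>) {}

  get(name: string): Handler | undefined {
    return this.handlers.get(name);
  }

  /** Called by the file watcher after re-importing a modified stage module. */
  reload(name: string, handler: Handler): string[] {
    this.handlers.set(name, handler);
    const invalidated: string[] = [];
    for (const [key, stage] of this.artifactIndex) {
      if (stage === name) {
        this.artifactIndex.delete(key);  // stale: produced by the old stage code
        invalidated.push(key);
      }
    }
    return invalidated;
  }
}
```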
### Pause/Edit Flow in mcplocal

When the studio is active and pause mode is on:

```
Content arrives (prompt match or tool result)
  → Pipeline runs stages → produces transformed content
  → Instead of sending to the client immediately:
      → Emit content_original + content_transformed events
      → Hold response in a pending queue
      → Studio shows ⏸ PAUSED indicator
      → User can:
          [r] release → send as-is → emit content_released
          [e] edit    → modify → emit content_edited → send edited version
          [d] drop    → discard → don't send (client sees timeout or empty)
```
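The pending queue itself is small. A sketch (class and method names are illustrative, not a committed API):

```typescript
// Sketch of the pending queue: transformed responses are parked until the
// studio releases, edits, or drops them. With pause mode off, hold() resolves
// immediately, which is the "zero overhead" path.
export class PauseQueue {
  private pending = new Map<string, { content: string; resolve: (c: string | null) => void }>();
  constructor(public pauseActive = false) {}

  /** Called by the pipeline; the returned promise resolves when the content may be sent. */
  hold(eventId: string, content: string): Promise<string | null> {
    if (!this.pauseActive) return Promise.resolve(content);
    return new Promise(resolve => this.pending.set(eventId, { content, resolve }));
  }

  release(eventId: string): void {              // [r] send as-is → content_released
    const p = this.pending.get(eventId);
    if (p) { this.pending.delete(eventId); p.resolve(p.content); }
  }

  edit(eventId: string, edited: string): void { // [e] send edited → content_edited
    const p = this.pending.get(eventId);
    if (p) { this.pending.delete(eventId); p.resolve(edited); }
  }

  drop(eventId: string): void {                 // [d] discard → null, nothing sent
    const p = this.pending.get(eventId);
    if (p) { this.pending.delete(eventId); p.resolve(null); }
  }
}
```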
## Implementation Phases

Framework and first model (`subindex`) are built together. The framework API is shaped by real usage — every interface gets validated against `subindex` before being finalized. Don't build Phase 1 in isolation and hope it fits; build them in lockstep.

The same applies to monitoring. While building and debugging `subindex`, the developer (or Claude via `--inspect --stdin-mcp`) will naturally discover what debugging information is missing. "I need to see what section-split produced before summarize-tree ran." "I need to see the cache key that was computed." "I need to see why this JSON wasn't detected as structured." These discoveries drive the Model Studio feature set — don't design all the monitoring tools upfront; add them as you hit real debugging needs during Phase 1.

### Phase 1: Framework Core + `subindex` Model

Build the minimal framework needed to run the `subindex` model end-to-end. **Critical architectural constraint:** design the pipeline executor and endpoint integration so that the existing gated session logic occupies a clear "session controller" slot — don't weave content stages into the gating code or vice versa. Even though `SessionController` won't be a public API in Phase 1, the internal separation must be clean enough that extracting it later is a refactor, not a rewrite.

1. `StageHandler`, `StageContext`, `StageResult`, `Section` types — the public contract (`mcpctl/proxymodel` entrypoint)
2. `LLMProvider` interface + adapter for the existing provider registry
3. `CacheProvider` interface (in-memory for now — enough to prove the API)
4. Content type detection: JSON, YAML, XML, code, prose
5. `section-split` stage: structural splitting per content type (JSON keys, markdown headers, etc.)
6. `summarize-tree` stage: recursive summarization with structural summaries for programmatic content, LLM summaries for prose
7. Section drill-down: the framework serves `sections[id].content` when the client requests a specific section. Leaf = exact original content, never rewritten.
8. Pipeline executor: wire stages, pass context, run in order. **Separate method routing (controller layer) from content processing (stage layer)** — the executor calls stages only after the controller has decided what content to process.
9. `subindex` proxymodel definition (YAML) using `section-split` + `summarize-tree`
10. `default` proxymodel wrapping current behavior (gate controller + `passthrough` + `paginate`)
11. Refactor `project-mcp-endpoint.ts` to route content through the pipeline — **gate logic stays but is cleanly separated from stage execution**. Identify the 5 extension points (initialize, tools/list, tool call intercept, tool result, close) as internal interfaces even if not yet exposed as `SessionController`.
12. ProxyModel YAML schema + loader (`~/.mcpctl/proxymodels/`) — includes `controller` and `controllerConfig` fields
13. Custom stage loader (dynamic import from `~/.mcpctl/stages/`)
14. Stage + proxymodel registry: merge built-in + local, resolve by name
15. Hot-reload: file watcher on `~/.mcpctl/stages/` and `~/.mcpctl/proxymodels/`
16. Hot-swap: API to switch the active proxymodel on a project without dropping the session
17. Extend `--inspect` traffic events as needed during debugging (e.g. per-stage input/output, cache hits/misses, content type detection results). The existing inspector (`mcpctl console --inspect --stdin-mcp`) gives Claude access to debug alongside the developer.

**Milestone: the `subindex` model runs on a real project. Claude navigates a 120K `get_flows` response via structural index without reading the full JSON.**
|
|
|
|
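The pipeline executor described above is deliberately small: run handlers in order, keep the original content available, and fall back to the previous stage's output when a handler throws. A minimal sketch with simplified local types — the real `mcpctl/proxymodel` contract may differ:

```typescript
// Simplified local shapes standing in for the real mcpctl/proxymodel types.
type StageResult = { content: string };
type StageContext = {
  originalContent: string;
  config: Record<string, unknown>;
  log: { warn: (msg: string) => void };
};
type StageHandler = (content: string, ctx: StageContext) => Promise<StageResult>;

interface StageSpec {
  name: string;
  handler: StageHandler;
  config?: Record<string, unknown>;
}

// Runs stages in order; a failing stage is skipped and the pipeline
// continues with the previous stage's content (passthrough on error).
async function runPipeline(original: string, stages: StageSpec[]): Promise<string> {
  let content = original;
  for (const stage of stages) {
    const ctx: StageContext = {
      originalContent: original,
      config: stage.config ?? {},
      log: { warn: (msg) => console.warn(`[${stage.name}] ${msg}`) },
    };
    try {
      content = (await stage.handler(content, ctx)).content;
    } catch (err) {
      ctx.log.warn(`stage failed, passing through: ${err}`);
    }
  }
  return content;
}
```

The executor stays dumb on purpose: it knows nothing about summarization or indexing, only ordering, context, and error fallback.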
### Phase 2: Cache Layer

The `subindex` model works but LLM summaries are slow without caching. Fix that.

18. `CacheProvider` real implementation — content-addressed local cache (`~/.mcpctl/cache/`)
19. Cache key: `(contentHash, proxyModelName, stageName, configHash, stageFileHash)` → artifact
20. LRU eviction at a configurable size limit
21. Stage file hash in the cache key — automatic invalidation when stage code changes
22. Cache lookup integration in the pipeline executor (before calling the stage handler)
23. Shared cache in mcpd (table + API) — push/pull with RBAC `cache` permission
24. `mcpctl cache list/push/clear/stats` CLI commands

**Milestone: Second `begin_session` on the same project is instant — all summaries served from cache.**

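The cache key tuple above can be derived by hashing each input that can change a stage's output. A minimal sketch assuming Node's `crypto` module; the field names are illustrative, not the real implementation:

```typescript
import { createHash } from 'node:crypto';

function sha256(input: string): string {
  return createHash('sha256').update(input).digest('hex');
}

// The key covers everything that can change a stage's output: the content,
// the proxymodel, the stage, its config, and the stage source file itself
// (so editing a stage automatically invalidates its cached artifacts).
function cacheKey(opts: {
  content: string;
  proxyModelName: string;
  stageName: string;
  config: Record<string, unknown>;
  stageFileSource: string;
}): string {
  return [
    sha256(opts.content),
    opts.proxyModelName,
    opts.stageName,
    sha256(JSON.stringify(opts.config)),
    sha256(opts.stageFileSource),
  ].join(':');
}
```

Changing any one input — content, config, or the stage file itself — produces a different key, which is what makes invalidation automatic.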
### Phase 3: CLI & Integration

Wire everything into mcpctl properly.

25. `mcpctl get proxymodels` + `mcpctl get stages` (merged built-in + local)
26. `mcpctl describe proxymodel` / `mcpctl describe stage`
27. `mcpctl create stage <name>` — scaffold a boilerplate `.ts` file
28. `mcpctl create proxymodel <name> --stages ...` — scaffold YAML
29. `mcpctl proxymodel validate <name>` — check stages resolve, config valid
30. Project-level `proxyModel` field + `proxyModelOverrides`
31. Rename `proxyMode: filtered` → `proxyMode: proxy`
32. `run` RBAC permission on the proxymodels resource
33. Shell completions for all new commands, resources, and flags

### Phase 4: Model Studio

The live development environment.

34. New traffic event types: `content_original`, `content_transformed`, `content_paused`, `content_edited`, `content_released`, `model_switched`, `stage_reloaded`
35. Emit original + transformed events in the pipeline executor
36. Pause queue in mcplocal: hold outgoing responses when studio pause is active
37. Edit API: accept modified content from the studio, emit a correction event, forward to the client
38. `mcpctl console --model-studio` TUI: original vs transformed view, pause/resume, inline edit, model picker
39. Same keyboard patterns as `--inspect` plus `p` pause, `e` edit, `r` release, `d` drop, `m` model switch, `o` toggle original/transformed/diff
40. `mcpctl console --model-studio --stdin-mcp` — MCP server for Claude Monitor
41. Studio MCP tools: `get_content_diff`, `get_corrections`, `switch_model`, `reload_stages`, `get_stage_source`, `get_active_model`
42. Correction events visible to Claude Monitor so it can learn from user edits

**Milestone: User can watch Claude using the `subindex` model, pause a response, edit a summary, and have Claude Monitor adjust the stage to produce better summaries.**

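The studio pause queue described above (hold outgoing responses while paused, flush on release, or drop) is a small data structure. A sketch with illustrative names, not the real mcplocal API:

```typescript
// Holds outgoing items while the studio has paused the session.
class PauseQueue<T> {
  private paused = false;
  private held: T[] = [];

  constructor(private deliver: (item: T) => void) {}

  pause(): void { this.paused = true; }

  /** Deliver immediately, or hold if the studio has paused the session. */
  send(item: T): void {
    if (this.paused) this.held.push(item);
    else this.deliver(item);
  }

  /** Resume and flush everything held, in arrival order. */
  release(): void {
    this.paused = false;
    for (const item of this.held.splice(0)) this.deliver(item);
  }

  /** Discard held items without delivering (the studio `d` key). */
  drop(): void { this.held.length = 0; }
}
```

The edit API would slot in between `pause()` and `release()`: mutate an item in `held`, emit a correction event, then flush.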
### Phase 5: Additional Built-in Stages

More reference stages, informed by what we learned from `subindex` and Model Studio.

43. `enhance` stage — restructure prose for LLM consumption (action items first, bullets)
44. `compress` stage — strip boilerplate, keep actionable content
45. `summarize` standalone stage — flat LLM summary without hierarchy (simpler than `summarize-tree`)
46. Future stages driven by studio observations

## First Model: Hierarchical Subindexing (`subindex`)

The first real proxymodel beyond `default`. Building it drives the framework — we don't build the framework in isolation, we build it alongside `subindex` so the API is shaped by real usage. Every framework interface (`StageHandler`, `StageContext`, `CacheProvider`) gets validated against this model before it's finalized.

Instead of sending Claude a 120K prompt or tool result as a wall of text, `subindex` breaks content into a navigable hierarchy of summaries.

### How it works

Content is split into sections, each section gets an LLM-generated summary, summaries are grouped and summarized again, creating a tree. Claude only sees the top-level summary, with links to drill into specific areas.

```
Original content (120,000 chars)
└─ split into ~10 sections by headers/structure
   ├─ Section 1: "Thermostat Control" (12,000 chars) → summary (200 chars)
   ├─ Section 2: "Lighting Automation" (8,000 chars) → summary (150 chars)
   ├─ Section 3: "Security Monitoring" (15,000 chars) → summary (250 chars)
   │  └─ Sub-sections split further if the section is large
   │     ├─ 3.1 "Camera Config" → sub-summary (100 chars)
   │     ├─ 3.2 "Alert Rules" → sub-summary (100 chars)
   │     └─ 3.3 "Access Control" → sub-summary (120 chars)
   └─ ...

What Claude sees first (top-level, ~1,500 chars):
  "10 sections covering home automation flows:
   [1] Thermostat Control — manages temperature schedules and HVAC...
   [2] Lighting Automation — room-based lighting scenes with motion...
   [3] Security Monitoring — camera feeds, alert rules, access control...
       → 3 sub-sections available
   ...
   Use section parameter to read details."

Drill-down level 1 — Claude requests section 3:
  "Security Monitoring (3 sub-sections):
   [3.1] Camera Config — IP camera integration with recording schedules...
   [3.2] Alert Rules — motion detection triggers, notification routing...
   [3.3] Access Control — door lock automation, guest codes, audit log...
   Use section parameter to read full content."

Drill-down level 2 — Claude requests section 3.2:
  → Full original text of the "Alert Rules" section (no summary, raw content)
```

### Why this works

- Claude burns ~400 tokens reading the top-level summary instead of ~30,000 for the full content
- If Claude only needs "Alert Rules", it drills down in 2 requests: 400 + 200 + 2,000 tokens = 2,600 instead of 30,000
- If Claude needs everything, it can still get it — section by section
- Summaries are cached (content-addressed), so the LLM cost is paid once per unique content

### Pipeline

```yaml
# ~/.mcpctl/proxymodels/subindex.yaml (or built-in)
kind: ProxyModel
metadata:
  name: subindex
spec:
  stages:
    - type: section-split        # built-in: split on headers/structure
      config:
        minSectionSize: 2000     # don't split tiny sections
        maxSectionSize: 15000    # re-split sections larger than this
    - type: summarize-tree       # new stage: recursive summarization
      config:
        maxSummaryTokens: 200    # per-section summary length
        maxGroupSize: 5          # group N sections before summarizing the group
        maxDepth: 3              # max nesting levels
        leafIsFullContent: true  # leaf drill-down returns raw content, not a summary
  appliesTo:
    - prompts
    - toolResults
  cacheable: true
```

### The `summarize-tree` stage

This is the core new stage. It does:

1. Receive sections from `section-split` (or from raw content if no prior split)
2. For each section, generate an LLM summary → cache it
3. If there are many sections, group them and generate group-level summaries
4. Return the top-level summary as `content`, with the full tree as `sections`
5. Each section in the tree has its own `sections` (sub-sections) for hierarchical drill-down

```typescript
// Built-in stage: summarize-tree
// (parseSections, splitContent, groupAndSummarize: structural helpers omitted for brevity)
import type { StageHandler, StageContext, Section } from 'mcpctl/proxymodel';

const handler: StageHandler = async (content, ctx) => {
  const maxTokens = (ctx.config.maxSummaryTokens as number) ?? 200;
  const maxGroup = (ctx.config.maxGroupSize as number) ?? 5;
  const maxDepth = (ctx.config.maxDepth as number) ?? 3;

  // Content arrives pre-split into sections from the section-split stage
  // (or as a single block if no prior stage split it)
  const sections = parseSections(content);

  // Recursively build the summary tree
  const tree = await buildTree(sections, ctx, { maxTokens, maxGroup, maxDepth, depth: 0 });

  // Top-level output: summary of summaries with drill-down links
  const toc = tree.map((s) =>
    `[${s.id}] ${s.title} — ${s.summary}` +
    (s.subSections?.length ? `\n    → ${s.subSections.length} sub-sections available` : '')
  ).join('\n');

  return {
    content: `${tree.length} sections:\n${toc}\n\nUse section parameter to read details.`,
    sections: tree,
  };
};

interface TreeOpts {
  maxTokens: number;
  maxGroup: number;
  maxDepth: number;
  depth: number;
}

async function buildTree(sections: Section[], ctx: StageContext, opts: TreeOpts): Promise<Section[]> {
  // For each section: summarize (cached), recurse if large
  for (const section of sections) {
    section.summary = await ctx.cache.getOrCompute(
      `summary:${ctx.cache.hash(section.content)}:${opts.maxTokens}`,
      () => ctx.llm.complete(
        `Summarize in ${opts.maxTokens} tokens, preserve MUST/REQUIRED items:\n\n${section.content}`
      )
    );

    // If the section is large and we haven't hit max depth, split and recurse
    if (section.content.length > 5000 && opts.depth < opts.maxDepth) {
      section.subSections = await buildTree(
        splitContent(section.content),
        ctx,
        { ...opts, depth: opts.depth + 1 }
      );
    }
  }

  // If there are too many sections at this level, group and summarize the groups
  if (sections.length > opts.maxGroup) {
    return groupAndSummarize(sections, ctx, opts);
  }

  return sections;
}
```

### What the cache stores

```
~/.mcpctl/cache/proxymodel/
├── summary:<hash1>:200        → "Thermostat Control — manages temperature..."
├── summary:<hash2>:200        → "Lighting Automation — room-based lighting..."
├── summary:<hash3>:200        → "Security Monitoring — camera feeds, alert..."
├── summary:<hash4>:200        → "Camera Config — IP camera integration..."
└── tree:<hash-full>:subindex  → serialized section tree (full hierarchy)
```

When any section's source content changes, its hash changes, and only that summary is regenerated. The rest of the tree is served from cache.

### Structured Content Detection

Not all content is prose. Tool results are often JSON, YAML, XML, or code. The `section-split` stage must detect the content type and split structurally — **never rewrite programmatic content**, because the LLM may need to use it verbatim in tool calls.

| Detected Type | How to split | Summary strategy | Leaf content |
|---|---|---|---|
| **Prose/Markdown** | Split on `##` headers | LLM summary | Raw text |
| **JSON array** | Split on array elements | Structural: key names, counts, sizes | Exact JSON element |
| **JSON object** | Split on top-level keys | Key name + value type + size | Exact JSON value |
| **YAML** | Split on top-level keys | Key name + child count | Exact YAML block |
| **XML** | Split on top-level elements | Tag name + child count + attributes | Exact XML element |
| **Code** | Split on functions/classes/blocks | Function signature + docstring | Exact code block |
| **Mixed** | Detect boundaries, split by type | Per-type strategy | Exact original |

**Critical rule: leaf drill-down ALWAYS returns exact original content.** Summaries are navigation aids — they help Claude find what it needs. But when Claude drills to the leaf, it gets the untouched original. This is essential for JSON/code because:

- Claude may need to pass the exact JSON as a tool argument
- Modified JSON might have wrong types, missing commas, or altered values
- Code needs to be syntactically valid

**Example: JSON array from `get_flows`**

```
Original: [{"id":"flow1","label":"Thermostat","nodes":[...]}, {"id":"flow2",...}, ...]
(120,000 chars, 10 flow objects)

Top-level summary (structural, no LLM needed):
  "10 flows:
   [flow1] Thermostat (12 nodes, 3 subflows)
   [flow2] Lighting (8 nodes, 1 subflow)
   [flow3] Security (22 nodes, 5 subflows)
   ...
   Use _section=flow1 to get the full flow definition."

Drill-down _section=flow3:
  → Exact JSON object for flow3 (if small enough, return as-is)
  → Or sub-index it further:
    "Security flow (22 nodes):
     [inject-1] Trigger: every 30s
     [mqtt-1] MQTT subscribe: cameras/motion
     [function-1] Process motion event (48 lines)
     ...
     Use _section=flow3.function-1 to get the node definition."

Drill-down _section=flow3.function-1:
  → Exact JSON: {"id":"function-1","type":"function","func":"...","wires":[...]}
```

**No LLM was needed for the JSON navigation.** The structure IS the index — key names, array indices, type fields. The `section-split` stage detects JSON and uses structural splitting. LLM summaries are only needed for prose content where headers aren't enough.

**Content type detection** (in the `section-split` stage):

```typescript
function detectContentType(content: string): 'json' | 'yaml' | 'xml' | 'code' | 'prose' {
  const trimmed = content.trimStart();
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) return 'json';
  if (trimmed.startsWith('<?xml') || trimmed.startsWith('<')) return 'xml';
  if (/^[a-zA-Z_]+:\s/m.test(trimmed)) return 'yaml';
  if (/^(function |class |def |const |import |export |package )/m.test(trimmed)) return 'code';
  return 'prose';
}
```

For structured content, the splitter uses the content's own structure (JSON keys, YAML blocks, XML elements) instead of looking for markdown headers. The summaries are generated structurally (key names, counts, types) rather than via LLM — which means they're instant, free, and deterministic.

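Structural splitting of a JSON array needs no LLM: each element becomes a section whose summary is derived from its own keys. A sketch that assumes elements may carry `id`/`label` fields (as in the `get_flows` example); a real splitter would slice the original text byte-exactly rather than re-serialize:

```typescript
interface JsonSection {
  id: string;
  summary: string;
  content: string; // the JSON for this element, served verbatim on drill-down
}

function splitJsonArray(raw: string): JsonSection[] {
  const parsed = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error('expected a JSON array');
  return parsed.map((el, i) => {
    // Prefer an element's own id; fall back to the array index.
    const id = typeof el?.id === 'string' ? el.id : `item${i}`;
    const label = typeof el?.label === 'string' ? ` ${el.label}` : '';
    const keys = el && typeof el === 'object' ? Object.keys(el).length : 0;
    return {
      id,
      summary: `[${id}]${label} (${keys} keys)`,
      // NOTE: re-serialized here for brevity; a real implementation would
      // keep the original byte range so leaf content stays exact.
      content: JSON.stringify(el),
    };
  });
}
```

The summaries are deterministic and free — exactly the "structure IS the index" property described above.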
### Testing with Model Studio

This is where the studio shines:

1. Start Claude working on homeautomation with the `subindex` model active
2. In Model Studio, watch the original 120K response and the tree summary Claude receives
3. If a summary drops important details, PAUSE → EDIT → add the missing info
4. Tell Claude Monitor about the pattern: "summaries are dropping MUST requirements"
5. Claude Monitor adjusts the `summarize-tree` stage prompt to preserve requirement markers
6. Reload stages → the next content gets better summaries automatically

## Design Principles

1. **Stages never import mcpctl internals.** The `StageHandler` contract + `StageContext` services is the entire public API. If a stage needs to reach inside mcpctl, the framework is missing a service — add the service, not the import.
2. **The default proxymodel must be zero-cost.** Projects that don't configure a proxymodel get `passthrough` + `paginate` — no LLM calls, no latency, same as today.
3. **Cache makes everything practical.** Any proxymodel with LLM stages MUST be cacheable. The first session pays the processing cost; every session after is instant.
4. **Source content is never modified.** ProxyModels produce derived artifacts alongside the original. The original is always accessible via `ctx.originalContent`. No semantic drift.
5. **The framework is dumb, stages are smart.** The executor doesn't know what summarization or indexing means. It runs handlers in order and manages cache/metrics/context. All intelligence lives in stages — which are replaceable.
6. **Local overrides everything.** Users can override any built-in stage or proxymodel by placing a file with the same name in `~/.mcpctl/`. This makes experimentation frictionless and the upgrade path safe.
7. **Measurement before optimization.** Phases 1-2 establish the framework and cache. Only then do we build LLM stages, because we need to prove they actually help.
8. **`mcpctl create stage` is the starting point for new authors.** It scaffolds a working `.ts` file with the correct imports and a placeholder handler. The barrier to entry is: edit one function, restart mcplocal.

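The "local overrides everything" principle reduces to a lookup order in the registry. A sketch using maps as stand-ins for the real stage/proxymodel loaders:

```typescript
// A local (~/.mcpctl/) entry wins over a built-in of the same name.
function resolveByName<T>(
  name: string,
  local: Map<string, T>,
  builtIn: Map<string, T>,
): T {
  const found = local.get(name) ?? builtIn.get(name);
  if (found === undefined) {
    throw new Error(`unknown stage or proxymodel: ${name}`);
  }
  return found;
}
```

This is also why hot-reload is cheap: rebuilding the `local` map from the watched directories changes resolution without touching the built-ins.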
## Gated Sessions as a ProxyModel: Framework Requirements

The existing **gated session** system (begin_session → ungate → tools/list_changed) is a proxymodel. Don't remove it yet — it works and is well-tested — but the framework must be designed so that gating COULD be reimplemented as a proxymodel. This drives framework requirements that pure content-transformation stages wouldn't surface.

### What Gating Does Today (Mapped to ProxyModel Concepts)

| Gating Behavior | ProxyModel Equivalent | Current Stage Contract Covers It? |
|---|---|---|
| Intercept `tools/list` → return only `begin_session` when gated | **Method hook** — transform a specific JSON-RPC method response | No — stages only see content strings, not method types |
| Track gated/ungated state per session | **Session state** — mutable state that persists across requests within a session | No — `StageHandler` is stateless (pure function) |
| Inject `begin_session`, `read_prompts`, `propose_prompt` tools | **Virtual tool registration** — proxymodel adds tools that don't exist in any upstream server | No — stages have no tool registration API |
| Send `tools/list_changed` notification after ungating | **Client notification dispatch** — proxymodel triggers notifications to the MCP client | No — stages have no notification API |
| Auto-ungate when Claude calls a real tool while still gated | **Request intercept** — inspect incoming tool calls, change behavior based on call + session state | No — stages only process response content, not incoming requests |
| Build composite response: matched prompts + index + encouragement + tool inventory | **Response assembly** — construct a structured multi-part response from multiple sources | Partially — `StageResult.sections` exists but isn't rich enough |
| Prompt tag matching with byte-budget scoring | **Content selection** — choose which content to include based on relevance scoring | No — stages process individual content items, not select across a set |
| Gate instructions in `initialize` response | **Lifecycle hook** — modify the initialize response with session-specific instructions | No — stages don't hook into session lifecycle events |

### What the Framework Needs (Beyond StageHandler)

The current `StageHandler` is the right contract for content transformation stages (summarize, index, compress). But gating reveals a **second type of stage**: a **session-level controller** that:

1. **Hooks into JSON-RPC methods** — not just content, but `tools/list`, `tools/call`, `initialize`
2. **Maintains session state** — mutable state that survives across multiple requests in the same session
3. **Registers virtual tools** — adds tools to the tool list that the proxymodel handles itself
4. **Dispatches notifications** — sends `tools/list_changed` or custom notifications to the client
5. **Intercepts requests** — examines incoming requests and can short-circuit, modify, or augment responses
6. **Selects content** — chooses from a set of available content items (prompts) based on relevance

This suggests the framework needs two handler types:

```typescript
// Type 1: Content Stage (what we have) — pure content transformation
export interface StageHandler {
  (content: string, ctx: StageContext): Promise<StageResult>;
}

// Type 2: Session Controller — method-level hooks with session state
export interface SessionController {
  /** Called once when the session starts (initialize) */
  onInitialize?(ctx: SessionContext): Promise<InitializeHook>;

  /** Called when tools/list is requested — can modify the tool list */
  onToolsList?(tools: ToolDefinition[], ctx: SessionContext): Promise<ToolDefinition[]>;

  /** Called before a tool call is routed — can intercept */
  onToolCall?(toolName: string, args: unknown, ctx: SessionContext): Promise<InterceptResult | null>;

  /** Called after a tool call returns — can transform the result */
  onToolResult?(toolName: string, result: unknown, ctx: SessionContext): Promise<unknown>;

  /** Called when the session ends */
  onClose?(ctx: SessionContext): Promise<void>;
}

export interface SessionContext extends StageContext {
  /** Per-session mutable state (persists across requests) */
  state: Map<string, unknown>;

  /** Register a virtual tool that this controller handles */
  registerTool(tool: ToolDefinition, handler: VirtualToolHandler): void;

  /** Queue a notification to the MCP client */
  queueNotification(method: string, params?: unknown): void;

  /** Access the prompt index (for content selection patterns) */
  prompts: PromptIndex;
}

interface InitializeHook {
  /** Additional instructions to append to the initialize response */
  instructions?: string;
}

interface InterceptResult {
  /** If set, this replaces the normal tool call response */
  result: unknown;
  /** If true, also ungate the session (emit tools/list_changed) */
  ungate?: boolean;
}
```

### How Gating Would Look as a ProxyModel

```yaml
# Built-in: proxymodels/gated.yaml
kind: ProxyModel
metadata:
  name: gated
spec:
  controller: gate-controller  # session controller (not a content stage)
  stages:                      # content stages still apply after ungating
    - type: passthrough
    - type: paginate
  controllerConfig:
    byteBudget: 8192
    promptScoring: keyword     # or "llm" if a provider is configured
    interceptEnabled: true     # auto-ungate on real tool call while gated
```

```typescript
// Built-in controller: gate-controller.ts
import type { SessionController } from 'mcpctl/proxymodel';

const controller: SessionController = {
  async onInitialize(ctx) {
    ctx.state.set('gated', ctx.config.gated !== false);
    if (ctx.state.get('gated')) {
      const instructions = await buildGatedInstructions(ctx);
      return { instructions };
    }
    return {};
  },

  async onToolsList(tools, ctx) {
    if (ctx.state.get('gated')) {
      return [getBeginSessionTool()]; // hide all tools except begin_session
    }
    // After ungating: include virtual tools alongside real ones
    return [...tools, getReadPromptsTool(), getProposePromptTool()];
  },

  async onToolCall(toolName, args, ctx) {
    if (toolName === 'begin_session') {
      const matchResult = await matchPrompts(args, ctx);
      ctx.state.set('gated', false);
      ctx.queueNotification('notifications/tools/list_changed');
      return { result: matchResult };
    }
    // Auto-ungate on a real tool call while gated
    if (ctx.state.get('gated') && ctx.config.interceptEnabled) {
      const briefing = await buildInterceptBriefing(toolName, args, ctx);
      ctx.state.set('gated', false);
      ctx.state.set('pendingBriefing', briefing); // onToolResult prepends it to the result
      ctx.queueNotification('notifications/tools/list_changed');
      return null; // let the real tool call proceed
    }
    return null; // don't intercept — let normal routing handle it
  },
};

export default controller;
```

### What This Means for Framework Design

**Don't build `SessionController` in Phase 1.** The gated system works today. But design the framework's internal architecture so that:

1. The **pipeline executor** separates "method routing" from "content processing" cleanly
2. The points where gating hooks in today (the `tools/list` check, the `tools/call` intercept, `initialize` instructions) are **identifiable extension points** — not spaghetti woven into the handler
3. The `StageContext` can be extended to `SessionContext` without breaking existing stages
4. Virtual tools and notifications are dispatched through interfaces, not hardcoded in the endpoint

**Phase 1 builds `StageHandler` for content transformation.** A future phase extracts the gating logic into `SessionController` and makes it a proper proxymodel. The current code stays as-is until then — it's tested, it works, and reimplementing it is not the priority. But the framework should not make reimplementation impossible.

### Benefits of Gating-as-ProxyModel (Future)

- **Users could write their own session controllers** — custom gate flows, different prompt selection strategies, progressive disclosure patterns
- **Gate behavior becomes configurable per-project** — not just on/off, but which controller runs
- **Testing becomes uniform** — same Model Studio, same inspector, same correction workflow for gate behavior as for content transformation
- **Composability** — a proxymodel could combine a custom session controller with content stages: custom gate → ungate → summarize → serve

---

## Authoring Guide: How to Build a ProxyModel

This section is the complete reference for anyone (human or AI) creating a new proxymodel or stage. Follow it step by step.

### Concepts

- A **stage** is a single content transformation: text in → text out. It's a TypeScript file exporting a `StageHandler` function.
- A **proxymodel** is a YAML file listing an ordered pipeline of stages with per-stage configuration.
- The **framework** loads stages, wires them into a pipeline, and provides services (`ctx.llm`, `ctx.cache`, etc.) so stages don't need to know about mcpctl internals.
- Content flows through mcplocal's proxy in two places: **prompt content** (delivered via `begin_session` and `read_prompts`) and **tool results** (responses from upstream MCP servers). A proxymodel can process either or both.

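For reference, these are the shapes this guide assumes for `StageResult` and `Section`. They mirror the examples in this document; the shipped typings in `mcpctl/proxymodel` are authoritative:

```typescript
// Assumed shapes — mirrored from this document's examples, not the real typings.
interface Section {
  id: string;
  title: string;
  content: string;         // exact original content for this section
  subSections?: Section[]; // present when the section was split further
}

interface StageResult {
  content: string;         // what the client sees for this stage's output
  sections?: Section[];    // enables drill-down when present
  metadata?: Record<string, unknown>; // optional metrics
}

// Small helper: does this result support drill-down?
function hasSections(result: StageResult): boolean {
  return Array.isArray(result.sections) && result.sections.length > 0;
}
```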
### File Locations

```
~/.mcpctl/
├── stages/                # Custom stage implementations
│   ├── my-summarizer.ts   # A stage handler
│   └── my-filter.ts       # Another stage handler
├── proxymodels/           # Custom proxymodel definitions
│   ├── my-pipeline.yaml   # Pipeline: stages + config
│   └── smart-summary.yaml # Another pipeline
└── cache/                 # Content cache (managed by the framework)
    └── proxymodel/        # Cached stage outputs
```

### Step 1: Write a Stage

A stage is a single `.ts` file in `~/.mcpctl/stages/`. It exports a default `StageHandler`:

```typescript
// ~/.mcpctl/stages/bullet-points.ts
import type { StageHandler } from 'mcpctl/proxymodel';

const handler: StageHandler = async (content, ctx) => {
  // ctx.contentType is 'prompt' | 'toolResult' | 'resource'
  // ctx.sourceName is the prompt name, "server/tool", or resource URI
  // ctx.config has settings from the proxymodel YAML

  const maxBullets = (ctx.config.maxBullets as number) ?? 10;

  const result = await ctx.llm.complete(
    `Convert the following ${ctx.contentType} into a bullet-point summary ` +
    `with at most ${maxBullets} bullets. Preserve all actionable items.\n\n${content}`
  );

  return { content: result };
};

export default handler;
```

**Rules for stages:**

1. **Import only from `mcpctl/proxymodel`** — never import mcpctl internal modules
2. **Export default a `StageHandler`** — the framework looks for the default export
3. **Use `ctx.llm` for any LLM calls** — don't instantiate your own client
4. **Use `ctx.cache` for expensive sub-computations** — the framework handles top-level caching, but stages can cache their own intermediate results
5. **Return `{ content }` at minimum** — optionally include `sections` for drill-down or `metadata` for metrics
6. **Read config from `ctx.config`** — all stage-specific settings come from the proxymodel YAML, not from hardcoded values
7. **Access the original via `ctx.originalContent`** — even if a prior stage modified the content, the original is always available
8. **Never throw errors for recoverable situations** — return the input content unchanged if processing fails, and log via `ctx.log.warn()`

### Step 2: Write a ProxyModel

A proxymodel is a YAML file in `~/.mcpctl/proxymodels/`:

```yaml
# ~/.mcpctl/proxymodels/smart-summary.yaml
kind: ProxyModel
metadata:
  name: smart-summary
spec:
  stages:
    - type: bullet-points  # resolves to ~/.mcpctl/stages/bullet-points.ts
      config:
        maxBullets: 8
    - type: section-split  # built-in stage (no custom file needed)
      config:
        splitOn: headers
  appliesTo:
    - prompts              # process prompt content
    - toolResults          # process tool response content
  cacheable: true          # cache stage results for unchanged content
```

**ProxyModel YAML fields:**

| Field | Required | Description |
|---|---|---|
| `metadata.name` | Yes | Unique name. This is what projects reference in `proxyModel: smart-summary` |
| `spec.controller` | No | Session controller name. Default: `gate` (gated sessions). Set `none` for no controller |
| `spec.controllerConfig` | No | Config passed to the session controller (e.g. `byteBudget`, `promptScoring`) |
| `spec.stages` | Yes | Ordered list of content stages. Each has `type` (stage name) and optional `config` |
| `spec.stages[].type` | Yes | Stage name. Resolved: local `~/.mcpctl/stages/` → built-in |
| `spec.stages[].config` | No | Arbitrary key-value config passed to the stage as `ctx.config` |
| `spec.appliesTo` | No | Array of `prompts`, `toolResults`, `resource`. Default: all |
| `spec.cacheable` | No | Whether the framework should cache stage results. Default: `true` |

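A loader would check the table's required fields after YAML parsing. A validation sketch that treats the parsed document as plain data — an illustration of the rules above, not the real schema implementation:

```typescript
// Assumed document shape, derived from the fields table above.
interface ProxyModelDoc {
  kind: 'ProxyModel';
  metadata: { name: string };
  spec: {
    controller?: string;
    controllerConfig?: Record<string, unknown>;
    stages: Array<{ type: string; config?: Record<string, unknown> }>;
    appliesTo?: Array<'prompts' | 'toolResults' | 'resource'>;
    cacheable?: boolean;
  };
}

// Returns a list of human-readable problems; empty means valid.
function validateProxyModel(doc: unknown): string[] {
  const errors: string[] = [];
  const d = doc as Partial<ProxyModelDoc>;
  if (d?.kind !== 'ProxyModel') errors.push('kind must be "ProxyModel"');
  if (!d?.metadata?.name) errors.push('metadata.name is required');
  const stages = d?.spec?.stages;
  if (!Array.isArray(stages) || stages.length === 0) {
    errors.push('spec.stages must be a non-empty list');
  } else {
    stages.forEach((s, i) => {
      if (!s?.type) errors.push(`spec.stages[${i}].type is required`);
    });
  }
  return errors;
}
```

`mcpctl proxymodel validate` would additionally resolve each `type` against the registry, which a pure shape check cannot do.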
### Step 3: Assign to a Project

```bash
# Via CLI
mcpctl patch project homeautomation --set proxyModel=smart-summary

# Or via YAML
mcpctl apply -f - <<EOF
kind: Project
metadata:
  name: homeautomation
spec:
  proxyModel: smart-summary
EOF
```

Restart mcplocal or wait for the config refresh. The next session on this project will route content through the `smart-summary` pipeline.

### Step 4: Test

```bash
# Validate the proxymodel (checks all stages resolve, config valid)
mcpctl proxymodel validate smart-summary

# Run a scripted test session
mcpctl console --fake-llm --script test.json --project homeautomation --proxy-model smart-summary

# Watch live traffic through the inspector
mcpctl console --inspect homeautomation
# (In another terminal, connect Claude or any MCP client to the project)

# Compare against other models
mcpctl proxymodel benchmark --script test.json --project homeautomation \
  --models default,smart-summary,summarize
```

### Producing Sections for Drill-Down

If your stage splits content into sections, return them in the `sections` field:

```typescript
import type { StageHandler, Section } from 'mcpctl/proxymodel';

const handler: StageHandler = async (content, ctx) => {
  // Split content into logical sections on markdown headers
  const parts = content.split(/^## /m).filter(Boolean);

  const sections: Section[] = parts.map((part) => {
    const firstLine = part.split('\n')[0].trim();
    return {
      id: firstLine.toLowerCase().replace(/\s+/g, '-'),
      title: firstLine,
      content: part,
    };
  });

  // Return a summary as the main content, with full sections available for drill-down
  const toc = sections.map((s, i) => `[${i + 1}] ${s.title}`).join('\n');
  return {
    content: `${sections.length} sections found:\n${toc}\n\nUse section parameter to read a specific section.`,
    sections,
  };
};

export default handler;
```

When the framework sees `sections` in the result, it enables drill-down via `read_prompts({ section: "token-handling" })` or `tool_call({ _section: "thermostat" })`.

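Resolving a drill-down path like `_section=flow3.function-1` is a walk down the section tree. A sketch assuming the nested `Section` shape used in this document's examples:

```typescript
// Minimal nested-section shape, mirroring this document's examples.
interface TreeSection {
  id: string;
  content: string;
  subSections?: TreeSection[];
}

// Walks dot-separated ids level by level; undefined means "no such section".
function resolveSection(tree: TreeSection[], path: string): TreeSection | undefined {
  let current: TreeSection[] | undefined = tree;
  let found: TreeSection | undefined;
  for (const part of path.split('.')) {
    found = current?.find((s) => s.id === part);
    if (!found) return undefined;
    current = found.subSections;
  }
  return found; // at the leaf, the caller serves found.content verbatim
}
```

Serving `found.content` untouched is what enforces the critical rule that leaf drill-down always returns exact original content.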
### Using the Cache Manually
|
|
|
|
The framework caches full-stage results automatically (keyed by content hash + stage + config). But stages can also cache their own sub-computations:
|
|
|
|
```typescript
|
|
const handler: StageHandler = async (content, ctx) => {
|
|
// Cache an expensive intermediate result
|
|
const embedding = await ctx.cache.getOrCompute(
|
|
`embedding:${ctx.cache.hash(content)}`,
|
|
async () => {
|
|
return await ctx.llm.complete(`Generate a semantic embedding description for:\n${content}`);
|
|
}
|
|
);
|
|
|
|
// Use the cached embedding for further processing
|
|
const summary = await ctx.llm.complete(
|
|
`Given this semantic description: ${embedding}\nSummarize the original:\n${content}`
|
|
);
|
|
|
|
return { content: summary };
|
|
};
|
|
```

### Composing Stages

Stages receive the output of the previous stage as their `content` parameter, and can always access `ctx.originalContent` for the raw input. This enables patterns like:

```yaml
# Pipeline: first summarize, then convert to bullet points
stages:
  - type: summarize # built-in: produces a prose summary
    config:
      maxTokens: 1000
  - type: bullet-points # custom: converts prose to bullets
    config:
      maxBullets: 8
```

The `summarize` stage gets the original content. The `bullet-points` stage gets the summary. Both can read `ctx.originalContent` if they need the raw input.
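
To make the composition concrete, here is a minimal sketch of what the hypothetical `bullet-points` custom stage could do when no LLM is involved: split the incoming summary into sentences and emit at most `maxBullets` bullets. The function name, `StageResult` shape, and deterministic splitting are all assumptions for illustration only:

```typescript
type StageResult = { content: string };

// Deterministic bullet conversion: one bullet per sentence, capped at maxBullets.
function bulletPoints(content: string, maxBullets: number): StageResult {
  const sentences = content
    .split(/(?<=[.!?])\s+/) // split after sentence-ending punctuation
    .map((s) => s.trim())
    .filter(Boolean)
    .slice(0, maxBullets);
  return { content: sentences.map((s) => `- ${s}`).join('\n') };
}
```

A real implementation would likely delegate to `ctx.llm.complete` when a provider is available and fall back to something like this otherwise.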

### Error Handling

Stages should be resilient:

```typescript
const handler: StageHandler = async (content, ctx) => {
  try {
    const result = await ctx.llm.complete(`Summarize:\n${content}`);
    return { content: result };
  } catch (err) {
    // LLM unavailable — return content unchanged, log the failure
    ctx.log.warn(`summarize stage failed, passing through: ${err}`);
    return { content }; // passthrough on failure
  }
};
```

The framework also wraps each stage call — if a stage throws, the pipeline continues with the content from the previous stage and logs the error.
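
That framework-level safety wrapper reduces to a simple loop. The sketch below assumes a simplified `Stage` signature for illustration; the real pipeline passes a richer context:

```typescript
type Stage = (content: string) => Promise<string>;

// Run stages in order; if a stage throws, keep the last good content and continue.
async function runPipeline(stages: Stage[], input: string): Promise<string> {
  let content = input;
  for (const stage of stages) {
    try {
      content = await stage(content);
    } catch (err) {
      console.warn(`stage failed, continuing with previous content: ${err}`);
    }
  }
  return content;
}
```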

### Available `ctx.llm` Methods

```typescript
interface LLMProvider {
  /** Simple completion — send a prompt, get text back */
  complete(prompt: string): Promise<string>;

  /** Completion with system prompt */
  complete(prompt: string, options: { system?: string; maxTokens?: number }): Promise<string>;

  /** Check if an LLM provider is configured and available */
  available(): boolean;
}
```

`ctx.llm` uses whatever LLM provider is configured for the project (Gemini, Ollama, Claude, etc.). The stage doesn't choose the provider — the user does via project config.

### Available `ctx.cache` Methods

```typescript
interface CacheProvider {
  /** Get a cached value by key, or compute and cache it */
  getOrCompute(key: string, compute: () => Promise<string>): Promise<string>;

  /** Hash content for use as a cache key component */
  hash(content: string): string;

  /** Manually read from cache (returns null if miss) */
  get(key: string): Promise<string | null>;

  /** Manually write to cache */
  set(key: string, value: string): Promise<void>;
}
```
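
To pin down the intended `getOrCompute` semantics (compute once, then serve from cache), here is a minimal in-memory sketch of the contract. The real provider presumably persists to disk and scopes keys per stage; this Map-backed version is illustrative only:

```typescript
import { createHash } from 'node:crypto';

class MemoryCache {
  private store = new Map<string, string>();

  // Short content hash, suitable as a cache-key component
  hash(content: string): string {
    return createHash('sha256').update(content).digest('hex').slice(0, 16);
  }

  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }

  // Return the cached value if present; otherwise compute, store, and return it
  async getOrCompute(key: string, compute: () => Promise<string>): Promise<string> {
    const hit = await this.get(key);
    if (hit !== null) return hit;
    const value = await compute();
    await this.set(key, value);
    return value;
  }
}
```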

### Quick Reference: Built-in Stage Types

| Name | What it does | Config keys |
|---|---|---|
| `passthrough` | Returns content unchanged | none |
| `paginate` | Splits content into pages by size | `pageSize` (chars, default 8000) |
| `section-split` | Splits on markdown headers | `splitOn` (`headers` or `blank-lines`) |
| `summarize` | LLM summary with section refs | `maxTokens`, `includeSectionLinks` |
| `index` | ToC with section drill-down | `maxDepth`, `sectionAddressing` |
| `enhance` | Restructure for LLM consumption | `format` (`bullets`, `action-items`) |
| `compress` | Strip boilerplate | `keepHeaders`, `minLineLength` |
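
As an example of how the table's config keys map to behavior, here is a hedged sketch of what `compress` could do with `keepHeaders` and `minLineLength`: drop short boilerplate lines while always keeping markdown headers. The actual built-in may use different heuristics:

```typescript
// Illustrative compress: keep headers (optionally) and any line at or above
// minLineLength after trimming; drop everything else.
function compress(content: string, opts: { keepHeaders: boolean; minLineLength: number }): string {
  return content
    .split('\n')
    .filter((line) => {
      if (opts.keepHeaders && /^#{1,6}\s/.test(line)) return true;
      return line.trim().length >= opts.minLineLength;
    })
    .join('\n');
}
```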

### Full Example: Building a "Security Audit" ProxyModel

Goal: for security-related prompts, extract action items and add severity ratings.

**Stage: `~/.mcpctl/stages/security-audit.ts`**

```typescript
import type { StageHandler } from 'mcpctl/proxymodel';

const handler: StageHandler = async (content, ctx) => {
  if (!ctx.llm.available()) {
    ctx.log.warn('No LLM configured, returning content as-is');
    return { content };
  }

  const result = await ctx.llm.complete(
    `You are a security auditor. Analyze this ${ctx.contentType} and produce:\n` +
    `1. A severity rating (critical/high/medium/low)\n` +
    `2. Action items as a numbered list\n` +
    `3. A one-paragraph executive summary\n\n` +
    `Content:\n${content}`,
    { maxTokens: (ctx.config.maxTokens as number) ?? 800 }
  );

  return { content: result };
};

export default handler;
```

**ProxyModel: `~/.mcpctl/proxymodels/security-audit.yaml`**

```yaml
kind: ProxyModel
metadata:
  name: security-audit
spec:
  stages:
    - type: security-audit
      config:
        maxTokens: 800
  appliesTo:
    - prompts
  cacheable: true
```

**Assign to project with override:**

```yaml
kind: Project
metadata:
  name: homeautomation
spec:
  proxyModel: default # default model for most content
  proxyModelOverrides:
    prompts:
      security-policy: security-audit # this specific prompt gets the audit treatment
```

Now every time Claude triggers `begin_session` and the `security-policy` prompt matches, it gets the audited version instead of the raw prompt text.