docs: ProxyModel authoring guide in README, mark cache tasks done

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Michal
2026-03-07 23:37:07 +00:00
parent d773419ccd
commit 9fc31e5945
2 changed files with 294 additions and 62 deletions

README.md

@@ -256,6 +256,215 @@ mcpctl describe proxymodel default # Pipeline details (stages, controller)
mcpctl describe proxymodel gate # Plugin details (hooks, extends)
```
### Custom Stages
Drop `.js` or `.mjs` files in `~/.mcpctl/stages/` to add custom transformation stages. Each file must `export default` an async function matching the `StageHandler` contract:
```javascript
// ~/.mcpctl/stages/redact-keys.js
export default async function(content, ctx) {
// ctx provides: contentType, sourceName, projectName, sessionId,
// originalContent, llm, cache, log, config
const redacted = content.replace(/([A-Z_]+_KEY)=\S+/g, '$1=***');
ctx.log.info(`Redacted ${content.length - redacted.length} chars of secrets`);
return { content: redacted };
}
```
Stages loaded from disk are reported with source `local`. Use them in a custom ProxyModel YAML:
```yaml
kind: ProxyModel
metadata:
name: secure-pipeline
spec:
stages:
- type: redact-keys # matches filename without extension
- type: section-split
- type: summarize-tree
```
**Stage contract reference:**
| Field | Type | Description |
|-------|------|-------------|
| `content` | `string` | Input content (from previous stage or raw upstream) |
| `ctx.contentType` | `'toolResult' \| 'prompt' \| 'resource'` | What kind of content is being processed |
| `ctx.sourceName` | `string` | Tool name, prompt name, or resource URI |
| `ctx.projectName` | `string` | Project the request belongs to |
| `ctx.sessionId` | `string` | MCP session identifier |
| `ctx.originalContent` | `string` | The unmodified content before any stage ran |
| `ctx.llm` | `LLMProvider` | Call `ctx.llm.complete(prompt)` for LLM summarization |
| `ctx.cache` | `CacheProvider` | Call `ctx.cache.getOrCompute(key, fn)` to cache expensive results |
| `ctx.log` | `StageLogger` | `debug()`, `info()`, `warn()`, `error()` |
| `ctx.config` | `Record<string, unknown>` | Config values from the ProxyModel YAML |
**Return value:**
```typescript
{ content: string; sections?: Section[]; metadata?: Record<string, unknown> }
```
If `sections` is returned, the framework stores them and presents a table of contents to the client. The client can drill into individual sections via `_resultId` + `_section` parameters on subsequent tool or prompt calls.
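As a concrete illustration of the `ctx.cache` and `ctx.llm` helpers, here is a hypothetical stage; the stage name, cache key scheme, and prompt are invented for this example:

```javascript
// ~/.mcpctl/stages/summarize-via-cache.js (hypothetical example)
// In a real stage file this function would be the `export default`.
async function summarizeViaCache(content, ctx) {
  // Cache the expensive LLM call; identical inputs then skip the API.
  // A production key would hash the content rather than use its length.
  const key = `summary:${ctx.sourceName}:${content.length}`;
  const summary = await ctx.cache.getOrCompute(key, () =>
    ctx.llm.complete(`Summarize in one line:\n${content}`)
  );
  return { content: summary, metadata: { originalLength: content.length } };
}
```

Because the LLM call goes through `ctx.cache.getOrCompute`, re-running the pipeline on unchanged upstream content does not trigger another completion request.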
### Section Drill-Down
When a stage (like `section-split`) produces sections, the pipeline automatically:
1. Replaces the full content with a compact table of contents
2. Appends a `_resultId` for subsequent drill-down
3. Stores the full sections in memory (5-minute TTL)
Claude then calls the same tool (or `prompts/get`) again with `_resultId` and `_section` parameters to retrieve a specific section. This works for both tool results and prompt responses.
```
# What Claude sees (tool result):
3 sections (json):
[users] Users (4K chars)
[config] Config (1K chars)
[logs] Logs (8K chars)
_resultId: pm-abc123 — use _resultId and _section parameters to drill into a section.
# Claude drills down:
→ tools/call: grafana/query { _resultId: "pm-abc123", _section: "logs" }
← [full 8K content of the logs section]
```
### Hot-Reload
Stages and ProxyModels reload automatically when files change — no restart needed.
- **Stages** (`~/.mcpctl/stages/*.js`): File watcher with 300ms debounce. Add, edit, or remove stage files and they take effect on the next tool call.
- **ProxyModels** (`~/.mcpctl/proxymodels/*.yaml`): Re-read from disk on every request, so changes are always picked up.
Force a manual reload via the HTTP API:
```bash
curl -X POST http://localhost:3200/proxymodels/reload
# {"loaded": 3}
curl http://localhost:3200/proxymodels/stages
# [{"name":"passthrough","source":"built-in"},{"name":"redact-keys","source":"local"},...]
```
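The 300ms debounce on the stage watcher can be sketched with a generic helper (illustrative only, not mcpctl's actual code):

```javascript
// Coalesce a burst of events into a single call after things settle.
function debounce(fn, delayMs = 300) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);                          // cancel the pending call
    timer = setTimeout(() => fn(...args), delayMs); // reschedule it
  };
}

// e.g. const reload = debounce(loadStagesFromDisk, 300);
// Saving a stage file three times in quick succession reloads once.
```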
### Built-in Stages Reference
| Stage | Description | Key Config |
|-------|------------|------------|
| `passthrough` | Returns content unchanged | — |
| `paginate` | Splits large content into numbered pages | `pageSize` (default: 8000 chars) |
| `section-split` | Splits content into named sections by structure (headers, JSON keys, code boundaries) | `minSectionSize` (500), `maxSectionSize` (15000) |
| `summarize-tree` | Generates LLM summaries for each section | `maxTokens` (200), `maxDepth` (2) |
`section-split` detects content type automatically:
| Content Type | Split Strategy |
|-------------|---------------|
| JSON array | One section per array element, using `name`/`id`/`label` as section ID |
| JSON object | One section per top-level key |
| YAML | One section per top-level key |
| Markdown | One section per `##` header |
| Code | One section per function/class boundary |
| XML | One section per top-level element |
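A rough sketch of the JSON-object strategy (one section per top-level key); this is illustrative, not the actual implementation:

```javascript
// Split a JSON object into sections keyed by its top-level keys.
function splitJsonObject(text) {
  const obj = JSON.parse(text);
  return Object.entries(obj).map(([key, value]) => ({
    id: key,                                   // section ID for _section
    title: key,                                // shown in the table of contents
    content: JSON.stringify(value, null, 2),   // drill-down payload
  }));
}
```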
### Pause Queue (Model Studio)
The pause queue lets you intercept pipeline results in real time: inspect what the pipeline produced, edit it, or drop it before Claude receives the response.
```bash
# Enable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":true}'
# View queued items (blocked tool calls waiting for your decision)
curl http://localhost:3200/pause/queue
# Release an item (send transformed content to Claude)
curl -X POST http://localhost:3200/pause/queue/<id>/release
# Edit and release (send your modified content instead)
curl -X POST http://localhost:3200/pause/queue/<id>/edit -d '{"content":"modified content"}'
# Drop an item (send empty response)
curl -X POST http://localhost:3200/pause/queue/<id>/drop
# Release all queued items at once
curl -X POST http://localhost:3200/pause/release-all
# Disable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":false}'
```
The pause queue is also available as MCP tools via `mcpctl console --stdin-mcp`, which gives Claude direct access to `pause`, `get_pause_queue`, and `release_paused` tools for self-monitoring.
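The three outcomes for a queued item can be modeled in a few lines (a toy sketch; the function and item shape are invented for illustration):

```javascript
// What Claude ultimately receives for a paused item, per action.
function resolvePaused(item, action, editedContent) {
  switch (action) {
    case 'release': return item.content;   // transformed content, unchanged
    case 'edit':    return editedContent;  // your replacement content
    case 'drop':    return '';             // empty response
    default: throw new Error(`unknown action: ${action}`);
  }
}
```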
## LLM Providers
ProxyModel stages that need LLM capabilities (like `summarize-tree`) use configurable providers. Configure in `~/.mcpctl/config.yaml`:
```yaml
llm:
- name: vllm-local
type: openai-compatible
baseUrl: http://localhost:8000/v1
model: Qwen/Qwen3-32B
- name: anthropic
type: anthropic
model: claude-sonnet-4-20250514
# API key from: mcpctl create secret llm-keys --data ANTHROPIC_API_KEY=sk-...
```
Providers support **tiered routing** (`fast` for quick summaries, `heavy` for complex analysis) and **automatic failover** — if one provider is down, the next is tried.
```bash
# Check active providers
mcpctl status # Shows LLM provider status
# View provider details
curl http://localhost:3200/llm/providers
```
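The failover behavior described above can be sketched as follows; the tier names come from the docs, but the provider object shape here is an assumption:

```javascript
// Try each provider in the requested tier until one succeeds.
async function completeWithFailover(providers, tier, prompt) {
  const candidates = providers.filter((p) => p.tier === tier);
  let lastErr;
  for (const p of candidates) {
    try {
      return await p.complete(prompt);  // first healthy provider wins
    } catch (err) {
      lastErr = err;                    // provider down: try the next one
    }
  }
  throw lastErr ?? new Error(`no providers for tier ${tier}`);
}
```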
## Pipeline Cache
ProxyModel pipelines cache LLM-generated results (summaries, section indexes) to avoid redundant API calls. The cache persists across restarts.
### Namespace Isolation
Each combination of **LLM provider + model + ProxyModel** gets its own cache namespace:
```
~/.mcpctl/cache/openai--gpt-4o--content-pipeline/
~/.mcpctl/cache/anthropic--claude-sonnet-4-20250514--content-pipeline/
~/.mcpctl/cache/vllm--qwen-72b--subindex/
```
Switching LLM providers or models automatically uses a fresh cache — no stale results from a different model.
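Deriving the namespace directory name can be sketched like this, matching the `a--b--c` pattern shown above; the exact sanitization rule is an assumption:

```javascript
// Build the cache namespace from provider, model, and ProxyModel name.
function cacheNamespace(provider, model, proxyModel) {
  // Assumed rule: lowercase, collapse path separators and other unsafe
  // characters to '-', join the three parts with '--'.
  const safe = (s) => s.toLowerCase().replace(/[^a-z0-9.-]+/g, '-');
  return [provider, model, proxyModel].map(safe).join('--');
}
```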
### CLI Management
```bash
# View cache statistics (per-namespace breakdown)
mcpctl cache stats
# Clear all cache entries
mcpctl cache clear
# Clear a specific namespace
mcpctl cache clear openai--gpt-4o--content-pipeline
# Clear entries older than 7 days
mcpctl cache clear --older-than 7
```
### Size Limits
The cache enforces a configurable maximum size (default: 256MB). When the limit is exceeded, the least recently used entries are evicted first (LRU), and entries older than 30 days expire automatically.
Size can be specified as bytes, human-readable units, or a percentage of the filesystem:
```typescript
new FileCache('ns', { maxSize: '512MB' }) // fixed size
new FileCache('ns', { maxSize: '1.5GB' }) // fractional units
new FileCache('ns', { maxSize: '10%' }) // 10% of partition
```
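Parsing those size strings might look like the sketch below; it assumes decimal units (1MB = 10^6 bytes) and that the percentage form is resolved against the partition size elsewhere, neither of which is confirmed by the docs:

```javascript
// Turn a maxSize spec ('512MB', '1.5GB', '10%') into a byte count.
function parseMaxSize(spec, partitionBytes) {
  const pct = spec.match(/^([\d.]+)%$/);
  if (pct) return Math.floor((Number(pct[1]) / 100) * partitionBytes);
  const m = spec.match(/^([\d.]+)\s*(B|KB|MB|GB|TB)$/i);
  if (!m) throw new Error(`bad size: ${spec}`);
  const mult = { B: 1, KB: 1e3, MB: 1e6, GB: 1e9, TB: 1e12 }[m[2].toUpperCase()];
  return Math.floor(Number(m[1]) * mult);
}
```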
## Resources
| Resource | What it is | Example |
@@ -301,6 +510,8 @@ mcpctl delete server grafana
mcpctl logs grafana # Container logs
mcpctl console monitoring # Interactive MCP console
mcpctl console --inspect # Traffic inspector
mcpctl console --audit # Audit event timeline
mcpctl console --stdin-mcp # Claude monitor (MCP tools for Claude)
# Backup and restore
mcpctl backup -o backup.json
@@ -387,6 +598,27 @@ The traffic inspector watches MCP traffic from other clients in real-time:
mcpctl console --inspect
```
### Claude Monitor (stdin-mcp)
Connect Claude itself as a monitor via the inspect MCP server:
```bash
mcpctl console --stdin-mcp
```
This exposes MCP tools that let Claude observe and control traffic:
| Tool | Description |
|------|------------|
| `list_models` | List configured LLM providers and their status |
| `list_stages` | List all available pipeline stages (built-in + custom) |
| `switch_model` | Change the active LLM provider for pipeline stages |
| `get_model_info` | Get details about a specific LLM provider |
| `reload_stages` | Force reload custom stages from disk |
| `pause` | Toggle pause mode (intercept pipeline results) |
| `get_pause_queue` | List items held in the pause queue |
| `release_paused` | Release, edit, or drop a paused item |
## Architecture
```