docs: ProxyModel authoring guide in README, mark cache tasks done

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Michal
2026-03-07 23:37:07 +00:00
parent d773419ccd
commit 9fc31e5945
2 changed files with 294 additions and 62 deletions

README.md

@@ -256,6 +256,215 @@ mcpctl describe proxymodel default # Pipeline details (stages, controller)
mcpctl describe proxymodel gate # Plugin details (hooks, extends)
```
### Custom Stages
Drop `.js` or `.mjs` files in `~/.mcpctl/stages/` to add custom transformation stages. Each file must `export default` an async function matching the `StageHandler` contract:
```javascript
// ~/.mcpctl/stages/redact-keys.js
export default async function(content, ctx) {
// ctx provides: contentType, sourceName, projectName, sessionId,
// originalContent, llm, cache, log, config
const redacted = content.replace(/([A-Z_]+_KEY)=\S+/g, '$1=***');
ctx.log.info(`Redacted ${content.length - redacted.length} chars of secrets`);
return { content: redacted };
}
```
Stages loaded from disk are reported with source `local`. Use them in a custom ProxyModel YAML:
```yaml
kind: ProxyModel
metadata:
name: secure-pipeline
spec:
stages:
- type: redact-keys # matches filename without extension
- type: section-split
- type: summarize-tree
```
**Stage contract reference:**
| Field | Type | Description |
|-------|------|-------------|
| `content` | `string` | Input content (from previous stage or raw upstream) |
| `ctx.contentType` | `'toolResult' \| 'prompt' \| 'resource'` | What kind of content is being processed |
| `ctx.sourceName` | `string` | Tool name, prompt name, or resource URI |
| `ctx.projectName` | `string` | Project the request belongs to |
| `ctx.sessionId` | `string` | MCP session identifier |
| `ctx.originalContent` | `string` | The unmodified content before any stage ran |
| `ctx.llm` | `LLMProvider` | Call `ctx.llm.complete(prompt)` for LLM summarization |
| `ctx.cache` | `CacheProvider` | Call `ctx.cache.getOrCompute(key, fn)` to cache expensive results |
| `ctx.log` | `StageLogger` | `debug()`, `info()`, `warn()`, `error()` |
| `ctx.config` | `Record<string, unknown>` | Config values from the ProxyModel YAML |
**Return value:**
```typescript
{ content: string; sections?: Section[]; metadata?: Record<string, unknown> }
```
If `sections` is returned, the framework stores them and presents a table of contents to the client. The client can drill into individual sections via `_resultId` + `_section` parameters on subsequent tool or prompt calls.
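As a concrete illustration of the `ctx.cache` and `ctx.llm` helpers, here is a hypothetical stage; the stage name, cache key scheme, and prompt are invented for this example:

```javascript
// ~/.mcpctl/stages/summarize-via-cache.js (hypothetical example)
// In a real stage file this function would be the `export default`.
async function summarizeViaCache(content, ctx) {
  // Cache the expensive LLM call; identical inputs then skip the API.
  // A production key would hash the content rather than use its length.
  const key = `summary:${ctx.sourceName}:${content.length}`;
  const summary = await ctx.cache.getOrCompute(key, () =>
    ctx.llm.complete(`Summarize in one line:\n${content}`)
  );
  return { content: summary, metadata: { originalLength: content.length } };
}
```

Because the LLM call goes through `ctx.cache.getOrCompute`, re-running the pipeline on unchanged upstream content does not trigger another completion request.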
### Section Drill-Down
When a stage (like `section-split`) produces sections, the pipeline automatically:
1. Replaces the full content with a compact table of contents
2. Appends a `_resultId` for subsequent drill-down
3. Stores the full sections in memory (5-minute TTL)
Claude then calls the same tool (or `prompts/get`) again with `_resultId` and `_section` parameters to retrieve a specific section. This works for both tool results and prompt responses.
```
# What Claude sees (tool result):
3 sections (json):
[users] Users (4K chars)
[config] Config (1K chars)
[logs] Logs (8K chars)
_resultId: pm-abc123 — use _resultId and _section parameters to drill into a section.
# Claude drills down:
→ tools/call: grafana/query { _resultId: "pm-abc123", _section: "logs" }
← [full 8K content of the logs section]
```
### Hot-Reload
Stages and ProxyModels reload automatically when files change — no restart needed.
- **Stages** (`~/.mcpctl/stages/*.js`): File watcher with 300ms debounce. Add, edit, or remove stage files and they take effect on the next tool call.
- **ProxyModels** (`~/.mcpctl/proxymodels/*.yaml`): Re-read from disk on every request, so changes are always picked up.
Force a manual reload via the HTTP API:
```bash
curl -X POST http://localhost:3200/proxymodels/reload
# {"loaded": 3}
curl http://localhost:3200/proxymodels/stages
# [{"name":"passthrough","source":"built-in"},{"name":"redact-keys","source":"local"},...]
```
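The 300ms debounce on the stage watcher can be sketched with a generic helper (illustrative only, not mcpctl's actual code):

```javascript
// Coalesce a burst of events into a single call after things settle.
function debounce(fn, delayMs = 300) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);                          // cancel the pending call
    timer = setTimeout(() => fn(...args), delayMs); // reschedule it
  };
}

// e.g. const reload = debounce(loadStagesFromDisk, 300);
// Saving a stage file three times in quick succession reloads once.
```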
### Built-in Stages Reference
| Stage | Description | Key Config |
|-------|------------|------------|
| `passthrough` | Returns content unchanged | — |
| `paginate` | Splits large content into numbered pages | `pageSize` (default: 8000 chars) |
| `section-split` | Splits content into named sections by structure (headers, JSON keys, code boundaries) | `minSectionSize` (500), `maxSectionSize` (15000) |
| `summarize-tree` | Generates LLM summaries for each section | `maxTokens` (200), `maxDepth` (2) |
`section-split` detects content type automatically:
| Content Type | Split Strategy |
|-------------|---------------|
| JSON array | One section per array element, using `name`/`id`/`label` as section ID |
| JSON object | One section per top-level key |
| YAML | One section per top-level key |
| Markdown | One section per `##` header |
| Code | One section per function/class boundary |
| XML | One section per top-level element |
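A rough sketch of the JSON-object strategy (one section per top-level key); this is illustrative, not the actual implementation:

```javascript
// Split a JSON object into sections keyed by its top-level keys.
function splitJsonObject(text) {
  const obj = JSON.parse(text);
  return Object.entries(obj).map(([key, value]) => ({
    id: key,                                   // section ID for _section
    title: key,                                // shown in the table of contents
    content: JSON.stringify(value, null, 2),   // drill-down payload
  }));
}
```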
### Pause Queue (Model Studio)
The pause queue lets you intercept pipeline results in real time: inspect what the pipeline produced, edit it, or drop it before Claude receives the response.
```bash
# Enable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":true}'
# View queued items (blocked tool calls waiting for your decision)
curl http://localhost:3200/pause/queue
# Release an item (send transformed content to Claude)
curl -X POST http://localhost:3200/pause/queue/<id>/release
# Edit and release (send your modified content instead)
curl -X POST http://localhost:3200/pause/queue/<id>/edit -d '{"content":"modified content"}'
# Drop an item (send empty response)
curl -X POST http://localhost:3200/pause/queue/<id>/drop
# Release all queued items at once
curl -X POST http://localhost:3200/pause/release-all
# Disable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":false}'
```
The pause queue is also available as MCP tools via `mcpctl console --stdin-mcp`, which gives Claude direct access to `pause`, `get_pause_queue`, and `release_paused` tools for self-monitoring.
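The three outcomes for a queued item can be modeled in a few lines (a toy sketch; the function and item shape are invented for illustration):

```javascript
// What Claude ultimately receives for a paused item, per action.
function resolvePaused(item, action, editedContent) {
  switch (action) {
    case 'release': return item.content;   // transformed content, unchanged
    case 'edit':    return editedContent;  // your replacement content
    case 'drop':    return '';             // empty response
    default: throw new Error(`unknown action: ${action}`);
  }
}
```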
## LLM Providers
ProxyModel stages that need LLM capabilities (like `summarize-tree`) use configurable providers. Configure in `~/.mcpctl/config.yaml`:
```yaml
llm:
- name: vllm-local
type: openai-compatible
baseUrl: http://localhost:8000/v1
model: Qwen/Qwen3-32B
- name: anthropic
type: anthropic
model: claude-sonnet-4-20250514
# API key from: mcpctl create secret llm-keys --data ANTHROPIC_API_KEY=sk-...
```
Providers support **tiered routing** (`fast` for quick summaries, `heavy` for complex analysis) and **automatic failover** — if one provider is down, the next is tried.
```bash
# Check active providers
mcpctl status # Shows LLM provider status
# View provider details
curl http://localhost:3200/llm/providers
```
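The failover behavior described above can be sketched as follows; the tier names come from the docs, but the provider object shape here is an assumption:

```javascript
// Try each provider in the requested tier until one succeeds.
async function completeWithFailover(providers, tier, prompt) {
  const candidates = providers.filter((p) => p.tier === tier);
  let lastErr;
  for (const p of candidates) {
    try {
      return await p.complete(prompt);  // first healthy provider wins
    } catch (err) {
      lastErr = err;                    // provider down: try the next one
    }
  }
  throw lastErr ?? new Error(`no providers for tier ${tier}`);
}
```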
## Pipeline Cache
ProxyModel pipelines cache LLM-generated results (summaries, section indexes) to avoid redundant API calls. The cache persists across restarts.
### Namespace Isolation
Each combination of **LLM provider + model + ProxyModel** gets its own cache namespace:
```
~/.mcpctl/cache/openai--gpt-4o--content-pipeline/
~/.mcpctl/cache/anthropic--claude-sonnet-4-20250514--content-pipeline/
~/.mcpctl/cache/vllm--qwen-72b--subindex/
```
Switching LLM providers or models automatically uses a fresh cache — no stale results from a different model.
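Deriving the namespace directory name can be sketched like this, matching the `a--b--c` pattern shown above; the exact sanitization rule is an assumption:

```javascript
// Build the cache namespace from provider, model, and ProxyModel name.
function cacheNamespace(provider, model, proxyModel) {
  // Assumed rule: lowercase, collapse path separators and other unsafe
  // characters to '-', join the three parts with '--'.
  const safe = (s) => s.toLowerCase().replace(/[^a-z0-9.-]+/g, '-');
  return [provider, model, proxyModel].map(safe).join('--');
}
```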
### CLI Management
```bash
# View cache statistics (per-namespace breakdown)
mcpctl cache stats
# Clear all cache entries
mcpctl cache clear
# Clear a specific namespace
mcpctl cache clear openai--gpt-4o--content-pipeline
# Clear entries older than 7 days
mcpctl cache clear --older-than 7
```
### Size Limits
The cache enforces a configurable maximum size (default: 256MB). When the limit is exceeded, the least recently used entries are evicted first (LRU), and entries older than 30 days expire automatically.
Size can be specified as bytes, human-readable units, or a percentage of the filesystem:
```typescript
new FileCache('ns', { maxSize: '512MB' }) // fixed size
new FileCache('ns', { maxSize: '1.5GB' }) // fractional units
new FileCache('ns', { maxSize: '10%' }) // 10% of partition
```
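Parsing those size strings might look like the sketch below; it assumes decimal units (1MB = 10^6 bytes) and that the percentage form is resolved against the partition size elsewhere, neither of which is confirmed by the docs:

```javascript
// Turn a maxSize spec ('512MB', '1.5GB', '10%') into a byte count.
function parseMaxSize(spec, partitionBytes) {
  const pct = spec.match(/^([\d.]+)%$/);
  if (pct) return Math.floor((Number(pct[1]) / 100) * partitionBytes);
  const m = spec.match(/^([\d.]+)\s*(B|KB|MB|GB|TB)$/i);
  if (!m) throw new Error(`bad size: ${spec}`);
  const mult = { B: 1, KB: 1e3, MB: 1e6, GB: 1e9, TB: 1e12 }[m[2].toUpperCase()];
  return Math.floor(Number(m[1]) * mult);
}
```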
## Resources
| Resource | What it is | Example |
@@ -301,6 +510,8 @@ mcpctl delete server grafana
mcpctl logs grafana # Container logs
mcpctl console monitoring # Interactive MCP console
mcpctl console --inspect # Traffic inspector
mcpctl console --audit # Audit event timeline
mcpctl console --stdin-mcp # Claude monitor (MCP tools for Claude)
# Backup and restore
mcpctl backup -o backup.json
@@ -387,6 +598,27 @@ The traffic inspector watches MCP traffic from other clients in real-time:
mcpctl console --inspect
```
### Claude Monitor (stdin-mcp)
Connect Claude itself as a monitor via the inspect MCP server:
```bash
mcpctl console --stdin-mcp
```
This exposes MCP tools that let Claude observe and control traffic:
| Tool | Description |
|------|------------|
| `list_models` | List configured LLM providers and their status |
| `list_stages` | List all available pipeline stages (built-in + custom) |
| `switch_model` | Change the active LLM provider for pipeline stages |
| `get_model_info` | Get details about a specific LLM provider |
| `reload_stages` | Force reload custom stages from disk |
| `pause` | Toggle pause mode (intercept pipeline results) |
| `get_pause_queue` | List items held in the pause queue |
| `release_paused` | Release, edit, or drop a paused item |
## Architecture
```