docs: ProxyModel authoring guide in README, mark cache tasks done
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
README.md (+232)
@@ -256,6 +256,215 @@ mcpctl describe proxymodel default # Pipeline details (stages, controller)
mcpctl describe proxymodel gate    # Plugin details (hooks, extends)
```

### Custom Stages

Drop `.js` or `.mjs` files in `~/.mcpctl/stages/` to add custom transformation stages. Each file must `export default` an async function matching the `StageHandler` contract:

```javascript
// ~/.mcpctl/stages/redact-keys.js
export default async function (content, ctx) {
  // ctx provides: contentType, sourceName, projectName, sessionId,
  // originalContent, llm, cache, log, config
  const redacted = content.replace(/([A-Z_]+_KEY)=\S+/g, '$1=***');
  ctx.log.info(`Redacted ${content.length - redacted.length} chars of secrets`);
  return { content: redacted };
}
```

Stages loaded from disk appear as `local` source. Use them in a custom ProxyModel YAML:

```yaml
kind: ProxyModel
metadata:
  name: secure-pipeline
spec:
  stages:
    - type: redact-keys     # matches filename without extension
    - type: section-split
    - type: summarize-tree
```

**Stage contract reference:**

| Field | Type | Description |
|-------|------|-------------|
| `content` | `string` | Input content (from previous stage or raw upstream) |
| `ctx.contentType` | `'toolResult' \| 'prompt' \| 'resource'` | What kind of content is being processed |
| `ctx.sourceName` | `string` | Tool name, prompt name, or resource URI |
| `ctx.originalContent` | `string` | The unmodified content before any stage ran |
| `ctx.llm` | `LLMProvider` | Call `ctx.llm.complete(prompt)` for LLM summarization |
| `ctx.cache` | `CacheProvider` | Call `ctx.cache.getOrCompute(key, fn)` to cache expensive results |
| `ctx.log` | `StageLogger` | `debug()`, `info()`, `warn()`, `error()` |
| `ctx.config` | `Record<string, unknown>` | Config values from the ProxyModel YAML |

**Return value:**

```typescript
{ content: string; sections?: Section[]; metadata?: Record<string, unknown> }
```

If `sections` is returned, the framework stores them and presents a table of contents to the client. The client can drill into individual sections via `_resultId` + `_section` parameters on subsequent tool or prompt calls.
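
As an illustration of this contract, a stage might return one section per paragraph (a hypothetical sketch; `paragraph-split` is not a built-in stage, and the exact `Section` fields shown here are assumptions):

```javascript
// ~/.mcpctl/stages/paragraph-split.js (hypothetical example)
function paragraphSplit(content) {
  const parts = content.split(/\n\n+/).filter((p) => p.trim().length > 0);
  const sections = parts.map((text, i) => ({
    id: `para-${i + 1}`,                    // assumed Section shape: { id, title, content }
    title: text.split('\n')[0].slice(0, 40),
    content: text,
  }));
  // Returning `sections` triggers the table-of-contents + drill-down behavior.
  return { content, sections };
}

export default async function (content, ctx) {
  return paragraphSplit(content);
}
```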

### Section Drill-Down

When a stage (like `section-split`) produces sections, the pipeline automatically:

1. Replaces the full content with a compact table of contents
2. Appends a `_resultId` for subsequent drill-down
3. Stores the full sections in memory (5-minute TTL)

Claude then calls the same tool (or `prompts/get`) again with `_resultId` and `_section` parameters to retrieve a specific section. This works for both tool results and prompt responses.

```
# What Claude sees (tool result):
3 sections (json):
  [users] Users (4K chars)
  [config] Config (1K chars)
  [logs] Logs (8K chars)

_resultId: pm-abc123 — use _resultId and _section parameters to drill into a section.

# Claude drills down:
→ tools/call: grafana/query { _resultId: "pm-abc123", _section: "logs" }
← [full 8K content of the logs section]
```

### Hot-Reload

Stages and ProxyModels reload automatically when files change — no restart needed.

- **Stages** (`~/.mcpctl/stages/*.js`): File watcher with 300ms debounce. Add, edit, or remove stage files and they take effect on the next tool call.
- **ProxyModels** (`~/.mcpctl/proxymodels/*.yaml`): Re-read from disk on every request, so changes are always picked up.

Force a manual reload via the HTTP API:

```bash
curl -X POST http://localhost:3200/proxymodels/reload
# {"loaded": 3}

curl http://localhost:3200/proxymodels/stages
# [{"name":"passthrough","source":"built-in"},{"name":"redact-keys","source":"local"},...]
```

### Built-in Stages Reference

| Stage | Description | Key Config |
|-------|-------------|------------|
| `passthrough` | Returns content unchanged | — |
| `paginate` | Splits large content into numbered pages | `pageSize` (default: 8000 chars) |
| `section-split` | Splits content into named sections by structure (headers, JSON keys, code boundaries) | `minSectionSize` (500), `maxSectionSize` (15000) |
| `summarize-tree` | Generates LLM summaries for each section | `maxTokens` (200), `maxDepth` (2) |

`section-split` detects content type automatically:

| Content Type | Split Strategy |
|--------------|----------------|
| JSON array | One section per array element, using `name`/`id`/`label` as section ID |
| JSON object | One section per top-level key |
| YAML | One section per top-level key |
| Markdown | One section per `##` header |
| Code | One section per function/class boundary |
| XML | One section per top-level element |
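
The Markdown row above can be pictured with a simplified sketch (an illustration only, not the actual `section-split` implementation):

```javascript
// Simplified sketch: split Markdown into one section per `##` header.
function splitMarkdown(markdown) {
  const sections = [];
  let current = null;
  for (const line of markdown.split('\n')) {
    const m = line.match(/^## (.+)$/);
    if (m) {
      if (current) sections.push(current);
      // Derive a section id from the header text (assumed scheme).
      current = { id: m[1].toLowerCase().replace(/\s+/g, '-'), title: m[1], content: '' };
    } else if (current) {
      current.content += line + '\n';
    }
  }
  if (current) sections.push(current);
  return sections;
}
```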

### Pause Queue (Model Studio)

The pause queue lets you intercept pipeline results in real-time — inspect what the pipeline produced, edit it, or drop it before Claude receives the response.

```bash
# Enable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":true}'

# View queued items (blocked tool calls waiting for your decision)
curl http://localhost:3200/pause/queue

# Release an item (send transformed content to Claude)
curl -X POST http://localhost:3200/pause/queue/<id>/release

# Edit and release (send your modified content instead)
curl -X POST http://localhost:3200/pause/queue/<id>/edit -d '{"content":"modified content"}'

# Drop an item (send empty response)
curl -X POST http://localhost:3200/pause/queue/<id>/drop

# Release all queued items at once
curl -X POST http://localhost:3200/pause/release-all

# Disable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":false}'
```

The pause queue is also available as MCP tools via `mcpctl console --stdin-mcp`, which gives Claude direct access to `pause`, `get_pause_queue`, and `release_paused` tools for self-monitoring.

## LLM Providers

ProxyModel stages that need LLM capabilities (like `summarize-tree`) use configurable providers. Configure them in `~/.mcpctl/config.yaml`:

```yaml
llm:
  - name: vllm-local
    type: openai-compatible
    baseUrl: http://localhost:8000/v1
    model: Qwen/Qwen3-32B
  - name: anthropic
    type: anthropic
    model: claude-sonnet-4-20250514
    # API key from: mcpctl create secret llm-keys --data ANTHROPIC_API_KEY=sk-...
```

Providers support **tiered routing** (`fast` for quick summaries, `heavy` for complex analysis) and **automatic failover** — if one provider is down, the next is tried.
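
The failover behavior can be pictured as trying providers in order (a conceptual sketch; the real provider interface is internal to mcpctl and may differ):

```javascript
// Conceptual failover sketch: try each provider until one succeeds.
async function completeWithFailover(providers, prompt) {
  let lastError;
  for (const provider of providers) {
    try {
      // complete() shape assumed, mirroring ctx.llm.complete(prompt)
      return await provider.complete(prompt);
    } catch (err) {
      lastError = err; // provider down or errored: fall through to the next one
    }
  }
  throw lastError ?? new Error('no LLM providers configured');
}
```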

```bash
# Check active providers
mcpctl status   # Shows LLM provider status

# View provider details
curl http://localhost:3200/llm/providers
```

## Pipeline Cache

ProxyModel pipelines cache LLM-generated results (summaries, section indexes) to avoid redundant API calls. The cache is persistent across mcplocal restarts.

### Namespace Isolation

Each combination of **LLM provider + model + ProxyModel** gets its own cache namespace:

```
~/.mcpctl/cache/openai--gpt-4o--content-pipeline/
~/.mcpctl/cache/anthropic--claude-sonnet-4-20250514--content-pipeline/
~/.mcpctl/cache/vllm--qwen-72b--subindex/
```

Switching LLM providers or models automatically uses a fresh cache — no stale results from a different model.

### CLI Management

```bash
# View cache statistics (per-namespace breakdown)
mcpctl cache stats

# Clear all cache entries
mcpctl cache clear

# Clear a specific namespace
mcpctl cache clear openai--gpt-4o--content-pipeline

# Clear entries older than 7 days
mcpctl cache clear --older-than 7
```

### Size Limits

The cache enforces a configurable maximum size (default: 256MB). When exceeded, the oldest entries are evicted (LRU). Entries older than 30 days are automatically expired.

Size can be specified as bytes, human-readable units, or a percentage of the filesystem:

```typescript
new FileCache('ns', { maxSize: '512MB' })  // fixed size
new FileCache('ns', { maxSize: '1.5GB' })  // fractional units
new FileCache('ns', { maxSize: '10%' })    // 10% of partition
```
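
Those size strings could be normalized to bytes along these lines (a hedged sketch; `parseMaxSize` is a hypothetical helper, not part of the FileCache API):

```typescript
// Hypothetical helper: normalize '512MB' / '1.5GB' / '10%' style limits to bytes.
function parseMaxSize(spec: string, fsBytes: number): number {
  const pct = spec.match(/^([\d.]+)%$/);
  if (pct) return Math.floor(fsBytes * (parseFloat(pct[1]) / 100)); // percentage of partition
  const m = spec.match(/^([\d.]+)\s*(B|KB|MB|GB)?$/i);
  if (!m) throw new Error(`unrecognized size: ${spec}`);
  const units: Record<string, number> = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  return Math.floor(parseFloat(m[1]) * units[(m[2] ?? 'B').toUpperCase()]);
}
```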

## Resources

| Resource | What it is | Example |
@@ -301,6 +510,8 @@ mcpctl delete server grafana
mcpctl logs grafana          # Container logs
mcpctl console monitoring    # Interactive MCP console
mcpctl console --inspect     # Traffic inspector
mcpctl console --audit       # Audit event timeline
mcpctl console --stdin-mcp   # Claude monitor (MCP tools for Claude)

# Backup and restore
mcpctl backup -o backup.json
@@ -387,6 +598,27 @@ The traffic inspector watches MCP traffic from other clients in real-time:
mcpctl console --inspect
```

### Claude Monitor (stdin-mcp)

Connect Claude itself as a monitor via the inspect MCP server:

```bash
mcpctl console --stdin-mcp
```

This exposes MCP tools that let Claude observe and control traffic:

| Tool | Description |
|------|-------------|
| `list_models` | List configured LLM providers and their status |
| `list_stages` | List all available pipeline stages (built-in + custom) |
| `switch_model` | Change the active LLM provider for pipeline stages |
| `get_model_info` | Get details about a specific LLM provider |
| `reload_stages` | Force reload custom stages from disk |
| `pause` | Toggle pause mode (intercept pipeline results) |
| `get_pause_queue` | List items held in the pause queue |
| `release_paused` | Release, edit, or drop a paused item |

## Architecture

```