**kubectl for MCP servers.** A management system for [Model Context Protocol](https://modelcontextprotocol.io) servers — define, deploy, and connect MCP servers to Claude using familiar kubectl-style commands.
```
mcpctl get servers
NAME             TRANSPORT   REPLICAS   DOCKER IMAGE                                DESCRIPTION
grafana          STDIO       1          grafana/mcp-grafana:latest                  Grafana MCP server
home-assistant   SSE         1          ghcr.io/homeassistant-ai/ha-mcp:latest      Home Assistant MCP
docmost          SSE         1          10.0.0.194:3012/michal/docmost-mcp:latest   Docmost wiki MCP
```
## What is this?
mcpctl manages MCP servers the same way kubectl manages Kubernetes pods. You define servers declaratively in YAML, group them into projects, and connect them to Claude Code or any MCP client through a local proxy.
- **mcplocal** — local proxy. Runs on your machine, presents a single MCP endpoint to Claude that merges tools from all your servers. Handles namespacing (`grafana/search_dashboards`), plugin execution (gating, content pipelines), and prompt delivery.
The `default` plugin doesn't reimplement anything — it inherits the gating hooks from `gate` and the content hooks from `content-pipeline`. Custom plugins can extend built-in ones the same way.
**Gating** means Claude initially sees only a `begin_session` tool. After calling it with a task description, relevant prompts are delivered and the full tool list is revealed. This keeps Claude's context focused.
Plugins intercept MCP requests/responses at specific lifecycle points. When a plugin extends another, it inherits all the parent's hooks. If both parent and child define the same hook, the child's version wins.
When multiple parents define the same hook, lifecycle hooks (`onSessionCreate`, `onSessionDestroy`) chain sequentially. All other hooks require the child to override — otherwise it's a conflict error.
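As a rough sketch of that resolution rule (the hook table shape and the `onToolResult` hook name here are assumptions for illustration, not mcpctl's actual plugin API):

```typescript
type Hook = (...args: unknown[]) => unknown;
type Hooks = Record<string, Hook[]>;

// Lifecycle hooks chain across parents; any other hook defined by
// multiple parents must be overridden by the child.
const LIFECYCLE = new Set(['onSessionCreate', 'onSessionDestroy']);

function resolveHooks(parents: Hooks[], child: Hooks): Hooks {
  const merged: Hooks = {};
  for (const parent of parents) {
    for (const [name, fns] of Object.entries(parent)) {
      if (!merged[name]) {
        merged[name] = [...fns]; // inherited from the first parent that defines it
      } else if (LIFECYCLE.has(name)) {
        merged[name].push(...fns); // lifecycle hooks chain sequentially
      } else if (!child[name]) {
        throw new Error(`hook conflict: ${name} defined by multiple parents`);
      }
    }
  }
  // The child's own definition always wins over inherited ones.
  for (const [name, fns] of Object.entries(child)) merged[name] = [...fns];
  return merged;
}
```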
Drop `.js` or `.mjs` files in `~/.mcpctl/stages/` to add custom transformation stages. Each file must `export default` an async function matching the `StageHandler` contract:
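As a hedged sketch of such a stage, assuming the handler receives the content plus an options object and returns the (optionally sectioned) result (shown with TypeScript types for clarity; a plain `.mjs` stage would drop them):

```typescript
// ~/.mcpctl/stages/redact-emails.mjs (hypothetical custom stage).
// The signature and return shape are assumptions, not the actual
// StageHandler contract.
interface StageResult {
  content: string;
  sections?: { id: string; title: string; content: string }[];
}

export default async function redactEmails(
  content: string,
  options: { replacement?: string } = {},
): Promise<StageResult> {
  const replacement = options.replacement ?? '[redacted]';
  // Crude e-mail pattern; real redaction would be more careful.
  return { content: content.replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, replacement) };
}
```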
If `sections` is returned, the framework stores them and presents a table of contents to the client. The client can drill into individual sections via `_resultId` + `_section` parameters on subsequent tool or prompt calls.
### Section Drill-Down
When a stage (like `section-split`) produces sections, the pipeline automatically:
1. Replaces the full content with a compact table of contents
2. Appends a `_resultId` for subsequent drill-down
3. Stores the full sections in memory (5-minute TTL)
Claude then calls the same tool (or `prompts/get`) again with `_resultId` and `_section` parameters to retrieve a specific section. This works for both tool results and prompt responses.
```
# What Claude sees (tool result):
3 sections (json):
[users] Users (4K chars)
[config] Config (1K chars)
[logs] Logs (8K chars)
_resultId: pm-abc123 — use _resultId and _section parameters to drill into a section.
```
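The follow-up call then repeats the original tool call with the drill-down parameters added. The tool name and payload shape below are illustrative; only `_resultId` and `_section` come from the docs:

```typescript
// Hypothetical MCP tools/call payload for drilling into one section.
const drillDown = {
  method: 'tools/call',
  params: {
    name: 'grafana/search_dashboards', // same tool as the first call
    arguments: {
      _resultId: 'pm-abc123', // id returned with the table of contents
      _section: 'users',      // which section to retrieve
    },
  },
};
```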
| Stage | Description | Options |
|-------|-------------|---------|
| `paginate` | Splits large content into numbered pages | `pageSize` (default: 8000 chars) |
| `section-split` | Splits content into named sections by structure (headers, JSON keys, code boundaries) | `minSectionSize` (500), `maxSectionSize` (15000) |
| `summarize-tree` | Generates LLM summaries for each section | `maxTokens` (200), `maxDepth` (2) |
`section-split` detects content type automatically:
| Content Type | Split Strategy |
|-------------|---------------|
| JSON array | One section per array element, using `name`/`id`/`label` as section ID |
| JSON object | One section per top-level key |
| YAML | One section per top-level key |
| Markdown | One section per `##` header |
| Code | One section per function/class boundary |
| XML | One section per top-level element |
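A toy version of that detection order might look like the following; these heuristics are purely illustrative, and `section-split`'s real detection is more robust:

```typescript
// Illustrative content-type sniffing, in the order of the table above.
function detectContentType(content: string): string {
  const t = content.trim();
  if (t.startsWith('[')) return 'json-array';
  if (t.startsWith('{')) return 'json-object';
  if (t.startsWith('<')) return 'xml';
  if (/^#{1,6} /m.test(t)) return 'markdown';   // markdown headers
  if (/^[\w-]+:(\s|$)/m.test(t)) return 'yaml'; // top-level keys
  return 'code';                                // fall back to code boundaries
}
```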
### Pause Queue (Model Studio)
The pause queue lets you intercept pipeline results in real time: inspect what the pipeline produced, edit it, or drop it before Claude receives the response.
```bash
# Enable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":true}'
# View queued items (blocked tool calls waiting for your decision)
curl http://localhost:3200/pause/queue
# Release an item (send transformed content to Claude)
curl -X POST http://localhost:3200/pause/queue/<id>/release
# Edit and release (send your modified content instead)
curl -X POST http://localhost:3200/pause/queue/<id>/edit -d '{"content":"modified content"}'
# Drop an item (send empty response)
curl -X POST http://localhost:3200/pause/queue/<id>/drop
# Release all queued items at once
curl -X POST http://localhost:3200/pause/release-all
# Disable pause mode
curl -X PUT http://localhost:3200/pause -d '{"paused":false}'
```
The pause queue is also available as MCP tools via `mcpctl console --stdin-mcp`, which gives Claude direct access to `pause`, `get_pause_queue`, and `release_paused` tools for self-monitoring.
## LLM Providers
ProxyModel stages that need LLM capabilities (like `summarize-tree`) use configurable providers. Configure them in `~/.mcpctl/config.yaml`:
```yaml
llm:
  - name: vllm-local
    type: openai-compatible
    baseUrl: http://localhost:8000/v1
    model: Qwen/Qwen3-32B
  - name: anthropic
    type: anthropic
    model: claude-sonnet-4-20250514
    # API key from: mcpctl create secret llm-keys --data ANTHROPIC_API_KEY=sk-...
```
Providers support **tiered routing** (`fast` for quick summaries, `heavy` for complex analysis) and **automatic failover** — if one provider is down, the next is tried.
```bash
# Check active providers
mcpctl status # Shows LLM provider status
# View provider details
curl http://localhost:3200/llm/providers
```
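The failover behavior can be sketched roughly like this; the `Provider` interface and its `tier` field are assumptions for illustration, not mcplocal's actual implementation:

```typescript
interface Provider {
  name: string;
  tier: 'fast' | 'heavy';
  complete(prompt: string): Promise<string>;
}

// Try each provider in the requested tier until one succeeds.
async function completeWithFailover(
  providers: Provider[],
  tier: 'fast' | 'heavy',
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const p of providers.filter((p) => p.tier === tier)) {
    try {
      return await p.complete(prompt);
    } catch (err) {
      lastError = err; // provider down: fall through to the next one
    }
  }
  throw lastError ?? new Error(`no provider configured for tier "${tier}"`);
}
```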
## Pipeline Cache
ProxyModel pipelines cache LLM-generated results (summaries, section indexes) to avoid redundant API calls. The cache is persistent across mcplocal restarts.
### Namespace Isolation
Each combination of **LLM provider + model + ProxyModel** gets its own cache namespace.
The cache enforces a configurable maximum size (default: 256MB). When exceeded, the least recently used entries are evicted (LRU). Entries older than 30 days are automatically expired.
Size can be specified as bytes, human-readable units, or a percentage of the filesystem:
```typescript
new FileCache('ns', { maxSize: '512MB' }) // fixed size
new FileCache('ns', { maxSize: '1.5GB' }) // fractional units
new FileCache('ns', { maxSize: '10%' })   // 10% of partition
```
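A sketch of how such a `maxSize` string could be resolved to bytes; the `parseMaxSize` helper and its unit rules are assumptions based on the accepted forms above, not FileCache's actual parser:

```typescript
// Hypothetical parser for the maxSize forms shown above.
function parseMaxSize(spec: string | number, partitionBytes: number): number {
  if (typeof spec === 'number') return spec; // already bytes
  const pct = spec.match(/^([\d.]+)%$/);
  if (pct) return Math.floor(partitionBytes * (parseFloat(pct[1]) / 100));
  const m = spec.match(/^([\d.]+)\s*(B|KB|MB|GB)$/i);
  if (!m) throw new Error(`unrecognized size: ${spec}`);
  const units: Record<string, number> = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };
  return Math.floor(parseFloat(m[1]) * units[m[2].toUpperCase()]);
}
```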
1. Claude sees only a `begin_session` tool initially
2. Claude calls `begin_session` with a description of its task
3. mcplocal matches relevant prompts and delivers them
4. The full tool list is revealed
This keeps Claude's context focused — instead of dumping 100+ tools and pages of docs upfront, only the relevant ones are delivered based on the task at hand.
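Step 2 might look like this on the wire; the tool name comes from the docs, but the `task` argument name is an assumption:

```typescript
// Hypothetical begin_session call in an MCP tools/call payload.
const beginSession = {
  method: 'tools/call',
  params: {
    name: 'begin_session',
    arguments: { task: 'Investigate slow Grafana dashboard queries' },
  },
};
```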
Prompts are curated content delivered to Claude through the MCP protocol. They can be plain text or linked to external MCP resources (like wiki pages).
```bash
# Create a text prompt
mcpctl create prompt deployment-guide \
  --project monitoring \
  --content-file docs/deployment.md \
  --priority 7
# Create a linked prompt (content fetched live from an MCP resource)
```
Clients never connect to MCP server containers directly — all tool calls go through mcplocal → mcpd, which proxies them to the right container via STDIO/SSE/HTTP. This keeps containers unexposed and lets mcpd enforce RBAC and health checks.
**Tool namespacing**: When Claude connects to a project with servers `grafana` and `slack`, it sees tools like `grafana/search_dashboards` and `slack/send_message`. mcplocal routes each call through mcpd to the correct upstream server.
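Routing on a namespaced name reduces to splitting at the first slash; a minimal sketch, not mcplocal's actual router:

```typescript
// Split 'grafana/search_dashboards' into server and tool parts.
function splitToolName(namespaced: string): { server: string; tool: string } {
  const i = namespaced.indexOf('/');
  if (i <= 0) throw new Error(`tool name is not namespaced: ${namespaced}`);
  return { server: namespaced.slice(0, i), tool: namespaced.slice(i + 1) };
}
```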