# Gated MCP Sessions: What Claude Recognizes (and What It Doesn't)

Lessons learned from building and testing mcpctl's gated session system with Claude Code (Opus 4.6, v2.1.59). These patterns apply to any MCP proxy that needs to control tool access through a gate step.

## The Problem

When Claude connects to an MCP server, it receives an `initialize` response with `instructions`, then calls `tools/list` to see available tools. In a gated session, we want Claude to call `begin_session` before accessing real tools. This is surprisingly hard to get right because Claude has strong default behaviors that fight against the gate pattern.

---

## What Works

### 1. One gate tool, zero ambiguity

When `tools/list` returns exactly ONE tool (`begin_session`), Claude recognizes it must call that tool first. Having multiple tools available in the gated state confuses Claude: it may try to call a "real" tool and skip the gate entirely.

**Working pattern:**
```json
{
  "tools": [{
    "name": "begin_session",
    "description": "Start your session by providing keywords...",
    "inputSchema": { ... }
  }]
}
```
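
A minimal TypeScript sketch of this rule on the server side, assuming a hypothetical per-session `gated` flag. The tool objects follow the shape of an MCP `tools/list` result; `SessionState`, `BEGIN_SESSION`, and `listTools` are illustrative names rather than mcpctl's actual code, and the `tags`-only schema stands in for the noLLM-mode variant.

```typescript
// Shape of one entry in an MCP tools/list result.
interface Tool {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Hypothetical per-session state; mcpctl's real bookkeeping may differ.
interface SessionState {
  gated: boolean;
}

// The single gate tool (illustrative description and schema).
const BEGIN_SESSION: Tool = {
  name: "begin_session",
  description: "Start your session. (Illustrative text.)",
  inputSchema: {
    type: "object",
    properties: { tags: { type: "array", items: { type: "string" } } },
    required: ["tags"],
  },
};

// While gated, return exactly one tool so there is nothing else to call.
function listTools(session: SessionState, upstreamTools: Tool[]): { tools: Tool[] } {
  return session.gated ? { tools: [BEGIN_SESSION] } : { tools: upstreamTools };
}
```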

### 2. "Check its input schema" instead of naming parameters

Claude reads the tool's `inputSchema` to understand what arguments are needed. When the instructions **name a specific parameter** that doesn't exist in the schema, Claude gets confused and may not call the tool at all.

**FAILED (named the wrong parameter):**
> "Call begin_session with a description of the user's task"

This failed because in noLLM mode the `begin_session` schema has a `tags` parameter, not `description`. Claude saw the mismatch between the instructions and the schema, got confused, and went exploring the filesystem instead.

**WORKS (schema-agnostic):**
> "Call begin_session immediately using the arguments it requires (check its input schema). If it accepts a description, briefly describe the user's task. If it accepts tags, provide 3-7 keywords relevant to the user's request."

This works for both LLM mode (`description` param) and noLLM mode (`tags` param) because Claude reads the actual schema.
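
To make the idea concrete, here is a TypeScript sketch with two hypothetical `begin_session` input schemas, one per mode, and a single instruction string that only mentions parameters conditionally. `beginSessionArgs` merely illustrates the decision the instruction leaves to the client (read the schema, then fill whichever argument it declares); it is not how Claude works internally.

```typescript
// Minimal JSON Schema subset used by begin_session's inputSchema.
interface InputSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required?: string[];
}

// Hypothetical schemas for the two modes described above.
const llmModeSchema: InputSchema = {
  type: "object",
  properties: { description: { type: "string" } },
  required: ["description"],
};
const noLlmModeSchema: InputSchema = {
  type: "object",
  properties: { tags: { type: "array" } },
  required: ["tags"],
};

// One instruction for both modes: parameters are only mentioned conditionally
// ("if it accepts ..."), so it cannot contradict whichever schema is exposed.
const GATE_INSTRUCTION =
  "Call begin_session immediately using the arguments it requires (check its input schema). " +
  "If it accepts a description, briefly describe the user's task. " +
  "If it accepts tags, provide 3-7 keywords relevant to the user's request.";

// Illustration of the choice the instruction delegates: inspect the schema,
// then supply whichever argument it declares (at most 7 tags).
function beginSessionArgs(
  schema: InputSchema,
  task: string,
  keywords: string[],
): Record<string, unknown> {
  if ("description" in schema.properties) return { description: task };
  if ("tags" in schema.properties) return { tags: keywords.slice(0, 7) };
  return {};
}
```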

### 3. Instructions must say "immediately" and "required"

Without urgency words, Claude may acknowledge the gate exists but decide to "explore first" before calling it. Two critical phrases:

- **"immediately"**: prevents Claude from doing reconnaissance first
- **"required before using other tools"**: makes it clear this isn't optional

**Working instruction block:**
```
This project uses a gated session. Before you can access tools, you must start a session by calling begin_session.

Call begin_session immediately using the arguments it requires (check its input schema).
```
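
One way to keep the call-to-action at the top is to build the `instructions` string with the gate text first and append everything else after it. A TypeScript sketch, where `buildInstructions` and the example section are hypothetical and the gate text is copied from the working block above:

```typescript
// Gate text, verbatim from the working instruction block.
const GATE_TEXT = [
  "This project uses a gated session. Before you can access tools, you must start a session by calling begin_session.",
  "",
  "Call begin_session immediately using the arguments it requires (check its input schema).",
].join("\n");

// Hypothetical helper: the gate call-to-action always comes first; any other
// context (tool preview, prompt index, ...) is appended after it.
function buildInstructions(extraSections: string[]): string {
  return [GATE_TEXT].concat(extraSections).join("\n\n");
}

// The result would be returned as the `instructions` field of the initialize response.
const instructions = buildInstructions([
  "Available project prompts:\n- pnpm (priority 5)\n- stack (priority 5)",
]);
```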

### 4. Show available tools as a preview (names only)

Listing tool names in the initialize instructions (without making them callable) helps Claude understand what's available and craft better `begin_session` keywords. Claude uses this list to generate relevant tags.

**Working pattern:**
```
Available MCP server tools (accessible after begin_session):
my-node-red/get_flows
my-node-red/create_flow
my-home-assistant/ha_get_entity
...
```

Claude then produces tags like `["node-red", "flows", "automation"]`, directly informed by the tool names it saw.
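
A TypeScript sketch of how such a preview could be rendered, assuming each upstream tool is identified by a server name and a tool name. The `UpstreamTool` shape and `formatToolPreview` are hypothetical.

```typescript
// Hypothetical reference to one upstream tool.
interface UpstreamTool {
  server: string;
  tool: string;
}

// Names only, no schemas: the client learns what exists after ungating,
// but nothing in this text is callable.
function formatToolPreview(tools: UpstreamTool[]): string {
  const lines = tools.map((t) => `${t.server}/${t.tool}`);
  return ["Available MCP server tools (accessible after begin_session):"].concat(lines).join("\n");
}
```

Appending this after the gate text, not before it, matters: with 108+ tools the list is long enough to push the call-to-action out of the first few lines.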

### 5. Show prompt index with priorities

When the instructions list available prompts with priorities, Claude uses them to choose relevant `begin_session` keywords:

```
Available project prompts:
- pnpm (priority 5)
- stack (priority 5)

Choose your begin_session keywords based on which of these prompts seem relevant to your task.
```
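
A TypeScript sketch that renders this index from prompt metadata. The `PromptInfo` shape and `formatPromptIndex` are hypothetical, and ordering by descending priority (then by name) is an assumption; the example above does not show how ties or different priorities are ordered.

```typescript
// Hypothetical prompt metadata.
interface PromptInfo {
  name: string;
  priority: number;
}

// Render the prompt index block, highest priority first, ties by name.
function formatPromptIndex(prompts: PromptInfo[]): string {
  const sorted = prompts
    .slice()
    .sort((a, b) => b.priority - a.priority || a.name.localeCompare(b.name));
  const lines = sorted.map((p) => `- ${p.name} (priority ${p.priority})`);
  return ["Available project prompts:"]
    .concat(lines)
    .concat([
      "",
      "Choose your begin_session keywords based on which of these prompts seem relevant to your task.",
    ])
    .join("\n");
}
```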

### 6. `tools/list_changed` notification after ungating

After `begin_session` succeeds, the server must send a `notifications/tools/list_changed` notification. Claude then re-fetches `tools/list` and sees all 108+ tools. Without this notification, Claude continues thinking only `begin_session` is available.
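
In the MCP specification this is a JSON-RPC notification with method `notifications/tools/list_changed`, and a server that emits it advertises `listChanged: true` under the `tools` capability in its initialize result. The TypeScript sketch below shows the ordering used in this document (tool result first, then the notification); `completeBeginSession` and the `send` callback, which stands in for whatever transport writes to the client, are hypothetical.

```typescript
// Declared in the initialize result's `capabilities` so clients expect the notification.
const serverCapabilities = { tools: { listChanged: true } };

interface Session {
  gated: boolean;
}

// Hypothetical completion step for a successful begin_session call.
function completeBeginSession(
  session: Session,
  requestId: number,
  resultText: string,
  send: (message: object) => void,
): void {
  session.gated = false;
  // 1. Answer the begin_session tools/call request itself.
  send({
    jsonrpc: "2.0",
    id: requestId,
    result: { content: [{ type: "text", text: resultText }] },
  });
  // 2. Tell the client its cached tool list is stale so it re-fetches tools/list.
  send({ jsonrpc: "2.0", method: "notifications/tools/list_changed" });
}
```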

### 7. The intercept fallback (auto-ungate on real tool call)

If Claude somehow bypasses the gate and calls a real tool directly, the server auto-ungates the session, extracts keywords from the tool call, matches relevant prompts, and prepends the context as a preamble to the tool result. This is a safety net, not the primary path.
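
A TypeScript sketch of this fallback under stated assumptions: keywords come from splitting the tool name and string arguments into lowercase words, prompts carry a hypothetical `tags` list, and `callUpstream` stands in for forwarding the call. None of these names come from mcpctl, and a real server would also send the `tools/list_changed` notification when it ungates here.

```typescript
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}
interface Prompt {
  name: string;
  tags: string[];
  content: string;
}
interface Session {
  gated: boolean;
}

// Hypothetical keyword extraction: lowercase words (3+ chars, de-duplicated)
// from the namespaced tool name and any string-valued arguments.
function extractKeywords(call: ToolCall): string[] {
  const parts = [call.name];
  for (const key of Object.keys(call.arguments)) {
    const value = call.arguments[key];
    if (typeof value === "string") parts.push(value);
  }
  const words = parts.join(" ").toLowerCase().split(/[^a-z0-9]+/);
  return words.filter((w, i) => w.length > 2 && words.indexOf(w) === i);
}

// Safety net for a real tool called while still gated: forward the call,
// ungate, and prepend any matched prompts to the upstream result.
function interceptGatedCall(
  session: Session,
  call: ToolCall,
  prompts: Prompt[],
  callUpstream: (c: ToolCall) => string,
): string {
  const upstreamResult = callUpstream(call);
  if (!session.gated) return upstreamResult;
  session.gated = false;
  const keywords = extractKeywords(call);
  const matched = prompts.filter((p) => p.tags.some((t) => keywords.indexOf(t) !== -1));
  if (matched.length === 0) return upstreamResult;
  const preamble = matched.map((p) => `[${p.name}]\n${p.content}`).join("\n\n");
  return `${preamble}\n\n---\n\n${upstreamResult}`;
}
```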

---

## What Fails

### 1. Referencing parameters that don't exist in the schema

If instructions say "call begin_session with a description" but the schema only has `tags`, Claude recognizes the inconsistency and may refuse to call the tool entirely. It falls back to filesystem exploration or asks the user for help.

**Root cause:** Claude cross-references instruction text against tool schemas. Mismatches create distrust.

### 2. Complex conditional instructions

Don't write instructions like:
> "If the project is gated, check for begin_session. If begin_session accepts tags, provide tags. Otherwise if it accepts description, provide a description. But first check if..."

Claude handles simple, direct instructions better than decision trees. One clear path: "Call begin_session immediately, check its input schema for what arguments it needs."

### 3. Having read_prompts available in gated state

In early iterations, both `begin_session` and `read_prompts` were available in the gated state. Claude sometimes called `read_prompts` instead of `begin_session`, or tried to use `read_prompts` to understand the environment before beginning the session. This delayed or skipped the gate.

**Fix:** Only `begin_session` is available when gated. `read_prompts` appears after ungating.

### 4. Putting gate instructions only in the tool description

The tool description alone is not enough. Claude reads `instructions` from the initialize response first and forms its plan there. If the initialize instructions don't mention the gate, Claude may ignore the tool description and try to find other ways to accomplish the task.

**Both are needed:**
- Initialize `instructions` field: explains the gate and what to do
- Tool `description` field: reinforces the purpose of begin_session

### 5. Long instructions that bury the call-to-action

If the initialize instructions contain 200 lines of context before mentioning "call begin_session", Claude may not reach that instruction. The gate call-to-action must be in the **first few lines** of the instructions.

### 6. Expecting Claude to remember instructions across reconnects

Each new session starts fresh. Claude doesn't carry over knowledge from previous sessions. The gate instructions must be self-contained in every initialize response.

---

## Prompt Scoring: Ensuring Prompts Reach Claude

### The byte budget problem

When `begin_session` returns matched prompts, there's a byte budget (default 8KB) to prevent token overflow. Prompts are included in score order until the budget is full. Prompts that don't fit get listed as index-only (name + summary).
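
A TypeScript sketch of that packing step, assuming each prompt already carries a score (priority 10 mapped to `Infinity`, as in the scoring formula), taking 8KB as 8 × 1024 bytes, and measuring the UTF-8 size of the prompt content only. `packPrompts`, `ScoredPrompt`, and the `name: summary` rendering of index-only entries are hypothetical. This version keeps scanning after a prompt fails to fit, so a smaller, lower-scored prompt can still use leftover space; that is one possible reading of "until the budget is full".

```typescript
interface ScoredPrompt {
  name: string;
  summary: string;
  content: string;
  score: number;
}

// UTF-8 byte length computed from UTF-16 code units (no Buffer/TextEncoder needed).
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) bytes += 1;
    else if (c < 0x800) bytes += 2;
    else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

// Include full prompts in descending score order while they fit the budget;
// everything else is listed as "name: summary" only.
function packPrompts(prompts: ScoredPrompt[], budgetBytes: number = 8 * 1024) {
  const sorted = prompts
    .slice()
    .sort((a, b) => (a.score === b.score ? 0 : a.score > b.score ? -1 : 1));
  const included: ScoredPrompt[] = [];
  const indexOnly: string[] = [];
  let usedBytes = 0;
  for (const p of sorted) {
    const size = utf8Length(p.content);
    // Infinite scores (priority 10) are included even when they exceed the budget.
    if (p.score === Infinity || usedBytes + size <= budgetBytes) {
      included.push(p);
      usedBytes += size;
    } else {
      indexOnly.push(`${p.name}: ${p.summary}`);
    }
  }
  return { included, indexOnly, usedBytes };
}
```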

### Scoring formula: `priority + (matchCount * priority)`

- **Priority alone is the baseline**: every prompt gets at least its priority score
- **Tag matches multiply the priority**: each matching tag adds another `priority` to the score, so relevant prompts score much higher
- **Priority 10 = Infinity**: priority-10 (system) prompts are always included, regardless of budget

**Failed formula:** `matchCount * priority`

This meant prompts with zero tag matches scored 0 and were never included, even if they were high-priority global prompts (like "stack" with priority 5). A priority-5 prompt with no tag matches should still compete for inclusion.

**Working formula:** `priority + (matchCount * priority)`

A priority-5 prompt with 0 matches scores 5 (baseline). With 2 matches it scores 15 (5 + 2 × 5). This ensures global prompts are included when budget allows.
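
The two formulas side by side in TypeScript, with the worked numbers from the text as checks. `matchCount` here counts case-insensitive overlaps between the session's tags and a hypothetical `tags` list on each prompt; how mcpctl actually matches keywords to prompts is not described in this document, so treat that part as an assumption.

```typescript
// Hypothetical prompt metadata: a priority and the tags it is indexed under.
interface PromptMeta {
  name: string;
  priority: number;
  tags: string[];
}

// Assumed matching rule: case-insensitive overlap between prompt tags and session tags.
function matchCount(prompt: PromptMeta, sessionTags: string[]): number {
  const wanted = sessionTags.map((t) => t.toLowerCase());
  return prompt.tags.filter((t) => wanted.indexOf(t.toLowerCase()) !== -1).length;
}

// Working formula: priority + matchCount * priority; priority 10 always wins.
function score(prompt: PromptMeta, sessionTags: string[]): number {
  if (prompt.priority === 10) return Infinity;
  return prompt.priority + matchCount(prompt, sessionTags) * prompt.priority;
}

// Failed formula, kept for comparison: an unmatched prompt scores 0 whatever its priority.
function failedScore(prompt: PromptMeta, sessionTags: string[]): number {
  return matchCount(prompt, sessionTags) * prompt.priority;
}
```

One consequence of the working formula: a priority-5 prompt with two matches (15) outranks an unmatched priority-9 prompt (9), yet the unmatched one still competes for leftover budget instead of being dropped at 0.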

### Response truncation safety cap

All responses are capped at 24,000 characters. Larger responses get truncated with a message to use `read_prompts` for the full content. This prevents a single massive prompt from consuming Claude's entire context window.
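
A TypeScript sketch of such a cap. The notice wording and the `capResponse` name are hypothetical, and this version counts the notice toward the 24,000 limit so the final text never exceeds it. JavaScript string length counts UTF-16 code units, which equals the character count only for characters in the Basic Multilingual Plane.

```typescript
const MAX_RESPONSE_CHARS = 24000;

// Hypothetical notice text pointing the client at read_prompts.
const TRUNCATION_NOTICE =
  "\n\n[Response truncated. Use read_prompts to retrieve the full content.]";

// Return `text` unchanged if it fits; otherwise cut it so that text + notice
// is exactly `max` UTF-16 code units long.
function capResponse(text: string, max: number = MAX_RESPONSE_CHARS): string {
  if (text.length <= max) return text;
  return text.slice(0, max - TRUNCATION_NOTICE.length) + TRUNCATION_NOTICE;
}
```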

---

## The Complete Flow (What Actually Happens)

```
Client                     mcplocal                   upstream servers
  │                            │                              │
  │── initialize ─────────────>│                              │
  │<── instructions + caps ────│  (instructions contain       │
  │                            │   gate-instructions,         │
  │                            │   tool list preview,         │
  │                            │   prompt index)              │
  │── tools/list ─────────────>│                              │
  │<── [begin_session] ────────│  (ONLY begin_session)        │
  │                            │                              │
  │── prompts/list ───────────>│                              │
  │<── [] ─────────────────────│  (empty - gated)             │
  │                            │                              │
  │── resources/list ─────────>│                              │
  │<── [prompt resources] ─────│  (prompts visible as         │
  │                            │   resources always)          │
  │                            │                              │
  │  Claude reads instructions, sees begin_session is the     │
  │  only tool, calls it with relevant tags/description       │
  │                            │                              │
  │── tools/call ─────────────>│                              │
  │  begin_session             │── match prompts ────────────>│
  │  {tags:[...]}              │<── prompt content ───────────│
  │                            │                              │
  │<── matched prompts ────────│  (full content of matched    │
  │  + tool list               │   prompts, tool names,       │
  │  + encouragement           │   encouragement to use       │
  │                            │   read_prompts later)        │
  │                            │                              │
  │<── notification ───────────│  tools/list_changed          │
  │                            │                              │
  │── tools/list ─────────────>│                              │
  │<── [108 tools] ────────────│  (ALL tools now visible)     │
  │                            │                              │
  │  Claude proceeds with the user's original request         │
  │  using the full tool set                                  │
```

---

## Testing Gate Behavior

The MCP Inspector (`mcpctl console --inspect`) is essential for debugging gate issues. It shows the exact sequence of requests/responses between Claude and mcplocal, including:

- What Claude sees in the initialize response
- Whether Claude calls `begin_session` or tries to bypass it
- What tags/description Claude provides
- What prompts are matched and returned
- Whether `tools/list_changed` notification fires
- The full tool list after ungating

Run it alongside Claude Code to see exactly what happens:
```bash
# Terminal 1: Inspector
mcpctl console --inspect

# Terminal 2: Claude Code connected to the project
claude
```

---

## Checklist for New Gate Configurations

- [ ] Initialize instructions mention gate in first 3 lines
- [ ] Instructions say "immediately" and "required"
- [ ] Instructions say "check its input schema" (not "pass description/tags")
- [ ] Only `begin_session` in tools/list when gated
- [ ] Tool names listed in instructions as preview
- [ ] Prompt index shown with priorities
- [ ] `tools/list_changed` notification sent after ungate
- [ ] Response size under 24K characters
- [ ] Prompt scoring uses baseline priority (not just match count)
- [ ] Test with Inspector to verify the full flow