New package @mcpctl/agent that replaces LiteLLM's broken MCP
integration (it dropped the Mcp-Session-Id header and ignored
tools/list_changed notifications) with a thin ~200 LOC agent loop
built on @modelcontextprotocol/sdk and the openai SDK. LiteLLM stays
in its actual lane, OpenAI-compatible model routing, and this agent
handles MCP correctly.
Core (src/agent.ts):
- StreamableHTTPClientTransport for MCP (auto-preserves Mcp-Session-Id).
- Re-fetches tools/list at the top of every loop so list_changed
notifications surface new tools to the model on the next turn
(fixes the gated-session case: begin_session reveals the full
upstream tool set, next round's inference sees all of them).
- OpenAI-compatible inference via process.env.AGENT_LLM_BASE_URL
— points at LiteLLM or vLLM directly.
- Graceful failure: a failed tool call is serialized back into the
  conversation as the tool's response, and the agent keeps going.
- maxIterations cap stops runaway loops; a hitIterationLimit flag
  surfaces truncation in the result.
- Structural `McpLike` / `LlmLike` interfaces keep the loop testable
without booting real SDKs.
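The shape of the loop can be sketched against the structural interfaces.
This is a minimal sketch, not the actual src/agent.ts: the interface
shapes (ToolDef, LlmMessage, the complete/callTool signatures) are
assumptions, since the commit only names McpLike and LlmLike.

```typescript
// Hypothetical shapes; the real McpLike / LlmLike in src/agent.ts may differ.
interface ToolDef { name: string; description?: string }
interface LlmMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  toolCall?: { name: string; args: unknown };
}
interface McpLike {
  listTools(): Promise<ToolDef[]>;
  callTool(name: string, args: unknown): Promise<string>;
  close(): Promise<void>;
}
interface LlmLike {
  complete(messages: LlmMessage[], tools: ToolDef[]): Promise<LlmMessage>;
}
interface AgentResult { answer: string; hitIterationLimit: boolean }

async function runAgent(
  mcp: McpLike,
  llm: LlmLike,
  prompt: string,
  maxIterations = 10,
  systemPrompt?: string,
): Promise<AgentResult> {
  const messages: LlmMessage[] = [];
  if (systemPrompt) messages.push({ role: "system", content: systemPrompt });
  messages.push({ role: "user", content: prompt });
  try {
    for (let i = 0; i < maxIterations; i++) {
      // Re-fetch tools/list every round: tools revealed after list_changed
      // (e.g. once begin_session opens the gate) reach the next inference.
      const tools = await mcp.listTools();
      const reply = await llm.complete(messages, tools);
      messages.push(reply);
      if (!reply.toolCall) {
        return { answer: reply.content, hitIterationLimit: false };
      }
      let result: string;
      try {
        result = await mcp.callTool(reply.toolCall.name, reply.toolCall.args);
      } catch (err) {
        // Graceful failure: feed the error back as the tool's response.
        result = `Tool error: ${String(err)}`;
      }
      messages.push({ role: "tool", content: result });
    }
    return { answer: "", hitIterationLimit: true };
  } finally {
    await mcp.close(); // runs even when the loop throws
  }
}
```

Because the loop only depends on these two interfaces, tests can inject
plain objects instead of booting the MCP or OpenAI SDKs.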
CLI (src/cli.ts):
mcpctl-agent run "<prompt>" \
--model qwen3-thinking --project sre \
[--system "..."] [--max-iterations N] [-o text|json] [--verbose]
Env fallbacks: AGENT_MCP_URL, AGENT_MCP_TOKEN,
AGENT_LLM_BASE_URL, AGENT_LLM_API_KEY, AGENT_MODEL
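A possible invocation relying on the env fallbacks (all values below are
hypothetical placeholders, not real endpoints or secrets):

```shell
# Config sketch: every flag has an env fallback, so a wrapper script
# or systemd unit can set these once and keep the command line short.
export AGENT_MCP_URL="https://mcp.example.internal/mcp"
export AGENT_MCP_TOKEN="changeme"                              # placeholder
export AGENT_LLM_BASE_URL="http://litellm.example.internal:4000/v1"
export AGENT_LLM_API_KEY="sk-local"
export AGENT_MODEL="qwen3-thinking"

mcpctl-agent run "Why is the sre namespace crashlooping?" --project sre -o json
```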
Tests (7 cases):
- direct answer (no tool call) → ok
- single-round tool call + synthesis → message history correct
- list_changed refresh: tools/list called at startup + after each
round → next inference sees newly-exposed tools
- maxIterations cap → hitIterationLimit flag set
- failing tool → error serialized into conversation, agent recovers
- systemPrompt prepended
- mcp.close() runs even when loop throws (finally-block guarantee)
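The list_changed case is easy to exercise with a fake whose tool list
expands after begin_session. A minimal sketch (the fake and its member
names are illustrative, not the actual test code):

```typescript
interface ToolDef { name: string }

// Hypothetical fake for the gated-session test: tools/list returns only
// begin_session until it has been called, then the full upstream set.
// The agent only notices the expansion because it re-fetches tools/list
// at the top of every round.
class GatedMcpFake {
  private opened = false;
  listToolsCalls = 0;

  async listTools(): Promise<ToolDef[]> {
    this.listToolsCalls++;
    return this.opened
      ? [{ name: "begin_session" }, { name: "aws-docs/search_documentation" }]
      : [{ name: "begin_session" }];
  }

  async callTool(name: string): Promise<string> {
    if (name === "begin_session") this.opened = true; // gate opens here
    return "ok";
  }

  async close(): Promise<void> {}
}
```

The assertion is then that the tool set seen before begin_session has
one entry and the set fetched afterward has more, mirroring the
1-tool/115-tool rounds observed against the live cluster.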
End-to-end verified against live cluster:
Round 1: sees 1 tool (begin_session) → calls it
Round 2: sees 115 tools (gate opened) → calls aws-docs/search_documentation
Final: model synthesizes answer
— LiteLLM's chat UI cannot do this today; this loop does.
Still to do (follow-up PRs):
- Wire into mcpctl binary as `mcpctl agent run ...`
- Docker image + Pulumi deploy for a long-running HTTP service mode
- Minimal chat UI (HTMX or plain fetch)
- Streaming responses
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>