Michal 3a28128fb4 feat(agent): MCP-correct chat agent shim on top of LiteLLM
New package @mcpctl/agent replaces LiteLLM's broken MCP integration
(it dropped the Mcp-Session-Id header and ignored tools/list_changed
notifications) with a thin ~200 LOC loop built on
@modelcontextprotocol/sdk plus the openai SDK. LiteLLM stays in its
actual lane, OpenAI-compatible model routing; this agent handles MCP
correctly.

Core (src/agent.ts):
  - StreamableHTTPClientTransport for MCP (auto-preserves Mcp-Session-Id).
  - Re-fetches tools/list at the top of every loop so list_changed
    notifications surface new tools to the model on the next turn
    (fixes the gated-session case: begin_session reveals the full
    upstream tool set, next round's inference sees all of them).
  - OpenAI-compatible inference via process.env.AGENT_LLM_BASE_URL,
    which can point at LiteLLM or at vLLM directly.
  - Graceful failure: a broken tool call is serialized back into the
    conversation as that tool's response, and the agent keeps going.
  - maxIterations cap stops runaway loops; hitIterationLimit surfaces
    truncation in the result.
  - Structural `McpLike` / `LlmLike` interfaces keep the loop testable
    without booting real SDKs.
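The loop those bullets describe can be sketched against the structural
interfaces. Everything below (the type shapes, `runAgent`, field names)
is a hypothetical reconstruction for illustration, not the actual
src/agent.ts:

```typescript
// Structural stand-ins for the MCP client and the LLM endpoint; the
// real SDK types differ, these exist so the loop is testable in isolation.
type Tool = { name: string };
type ToolCall = { name: string; args: unknown };
type LlmTurn = { text?: string; toolCalls: ToolCall[] };
type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

interface McpLike {
  listTools(): Promise<Tool[]>;            // re-fetched every round
  callTool(call: ToolCall): Promise<string>; // may throw
}
interface LlmLike {
  complete(messages: Msg[], tools: Tool[]): Promise<LlmTurn>;
}

async function runAgent(
  mcp: McpLike,
  llm: LlmLike,
  prompt: string,
  maxIterations = 8,
): Promise<{ answer: string; hitIterationLimit: boolean }> {
  const messages: Msg[] = [{ role: "user", content: prompt }];
  for (let i = 0; i < maxIterations; i++) {
    // Re-fetch tools/list at the top of every loop so tools revealed by a
    // list_changed notification are visible to the next inference call.
    const tools = await mcp.listTools();
    const turn = await llm.complete(messages, tools);
    if (turn.toolCalls.length === 0) {
      return { answer: turn.text ?? "", hitIterationLimit: false };
    }
    messages.push({ role: "assistant", content: turn.text ?? "" });
    for (const call of turn.toolCalls) {
      let result: string;
      try {
        result = await mcp.callTool(call);
      } catch (e) {
        // Graceful failure: the error becomes the tool's response and
        // the conversation continues.
        result = `tool ${call.name} failed: ${String(e)}`;
      }
      messages.push({ role: "tool", content: result });
    }
  }
  return { answer: "", hitIterationLimit: true };
}
```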

CLI (src/cli.ts):
  mcpctl-agent run "<prompt>" \
    --model qwen3-thinking --project sre \
    [--system "..."] [--max-iterations N] [-o text|json] [--verbose]
  Env fallbacks: AGENT_MCP_URL, AGENT_MCP_TOKEN,
                 AGENT_LLM_BASE_URL, AGENT_LLM_API_KEY, AGENT_MODEL
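The flag-over-env precedence could be resolved with a helper along these
lines. `resolveConfig` and the config field names are hypothetical; only
the environment variable names come from the commit:

```typescript
// Hypothetical sketch of the CLI's flag -> env fallback order.
// Required values fail fast with a message naming the env var to set.
interface AgentConfig {
  mcpUrl: string;
  mcpToken?: string;
  llmBaseUrl: string;
  llmApiKey?: string;
  model: string;
}

function resolveConfig(
  flags: Partial<AgentConfig>,
  env: Record<string, string | undefined>, // pass process.env at the call site
): AgentConfig {
  const need = (v: string | undefined, name: string): string => {
    if (!v) throw new Error(`missing ${name}: pass a flag or set the env var`);
    return v;
  };
  return {
    mcpUrl: need(flags.mcpUrl ?? env.AGENT_MCP_URL, "AGENT_MCP_URL"),
    mcpToken: flags.mcpToken ?? env.AGENT_MCP_TOKEN,
    llmBaseUrl: need(flags.llmBaseUrl ?? env.AGENT_LLM_BASE_URL, "AGENT_LLM_BASE_URL"),
    llmApiKey: flags.llmApiKey ?? env.AGENT_LLM_API_KEY,
    model: need(flags.model ?? env.AGENT_MODEL, "AGENT_MODEL"),
  };
}
```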

Tests (7 cases):
  - direct answer (no tool call) → ok
  - single-round tool call + synthesis → message history correct
  - list_changed refresh: tools/list called at startup + after each
    round → next inference sees newly-exposed tools
  - maxIterations cap → hitIterationLimit flag set
  - failing tool → error serialized into conversation, agent recovers
  - systemPrompt prepended
  - mcp.close() runs even when loop throws (finally-block guarantee)

End-to-end verified against live cluster:
  Round 1: sees 1 tool (begin_session) → calls it
  Round 2: sees 115 tools (gate opened) → calls aws-docs/search_documentation
  Final: model synthesizes answer
  — LiteLLM's chat UI cannot do this today; this loop does.

Still to do (follow-up PRs):
  - Wire into mcpctl binary as `mcpctl agent run ...`
  - Docker image + Pulumi deploy for a long-running HTTP service mode
  - Minimal chat UI (HTMX or plain fetch)
  - Streaming responses

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:24:29 +01:00
