feat(agents): smoke tests + README + docs (Stage 6, final)

Closes the agents feature.

Smoke tests (run via `pnpm test:smoke` against a live mcpd at
$MCPD_URL, default https://mcpctl.ad.itaz.eu):

* tests/smoke/agent.smoke.test.ts — full CRUD round-trip:
  create secret + Llm + agent with sampling defaults; `get agents`
  surfaces it; `get agent foo -o yaml | apply -f` round-trips
  identically; create + list a thread via the HTTP API; agent delete
  leaves Llm + secret intact (Restrict + SetNull as designed).
  Self-skips with a warning when /healthz is unreachable.

* tests/smoke/agent-chat.smoke.test.ts — gated on
  MCPCTL_SMOKE_LLM_URL + MCPCTL_SMOKE_LLM_KEY. Provisions secret +
  Llm + agent against a real upstream, runs
  `mcpctl chat -m … --no-stream` (asserts a reply lands), then runs
  the streaming default (asserts text on stdout + `(thread: …)` on
  stderr). The fast path for verifying the in-cluster qwen3-thinking
  deployment:

      MCPCTL_SMOKE_LLM_URL=http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
      MCPCTL_SMOKE_LLM_MODEL=qwen3-thinking \
      MCPCTL_SMOKE_LLM_KEY=$(pulumi config get --stack homelab \
        secrets:litellmMcpctlGatewayToken) \
        pnpm test:smoke

Docs:

* README.md — new "Agents" section under Resources with the
  qwen3-thinking quickstart and links to docs/agents.md and
  docs/chat.md. Adds llm + agent rows to the resources table.

* docs/agents.md (new) — full reference: data model, chat-parameter
  table, HTTP API, RBAC mapping, tool-use loop semantics, yaml
  round-trip shorthand, the kubernetes-deployment wiring recipe,
  and a troubleshooting section (namespace collision, llm-in-use,
  pending-row recovery, Anthropic-tool limitation).

* docs/chat.md (new) — user-facing `mcpctl chat` walkthrough:
  modes, per-call flags, slash-commands, threads, and a
  troubleshooting section.

* CLAUDE.md — adds a "Resource types" cheatsheet with one-line
  pointers to each, including the new `agent` row that links to
  the docs.

All suites still green: mcpd 759/759, mcplocal 715/715, cli 430/430.
Smoke tests typecheck and self-skip when no live mcpd is reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Michal
2026-04-25 17:08:37 +01:00
parent 727e7d628c
commit 8b56f09f25
6 changed files with 767 additions and 0 deletions

docs/agents.md

@@ -0,0 +1,197 @@
# Agents
An `Agent` is an LLM persona pinned to a specific `Llm`, with a system prompt,
a description that surfaces in MCP `tools/list`, optional attachment to a
`Project`, and LiteLLM-style sampling defaults. Conversations are persisted
as `ChatThread` + `ChatMessage` rows so REPL sessions resume across runs.

Two surfaces use an agent:

1. **Direct chat** via `mcpctl chat <name>` (interactive REPL or one-shot
`-m "msg"`). Streams over SSE; tool calls and tool results print to
stderr in dim brackets. Slash-commands `/set`, `/system`, `/tools`,
`/clear`, `/save`, `/quit` adjust runtime behavior.
2. **Virtual MCP server** registered into every project session by
mcplocal's agents plugin. The agent shows up as `agent-<name>` with
one tool `chat`, whose description is the agent's own description.
Other Claude sessions / MCP clients see the agent as just another
tool in `tools/list` and can consult it.
## Data model
Three Prisma models added to `src/db/prisma/schema.prisma`:
- **`Agent`** — `name` (unique), `description`, `systemPrompt`, `llmId`
(FK Restrict — an Llm in active use cannot be deleted), `projectId`
(FK SetNull — agents survive project deletion), `proxyModelName`
(optional informational override), `defaultParams` (Json,
LiteLLM-style), `extras` (Json, reserved for future LoRA / tool
allowlists), `ownerId`, version, timestamps.
- **`ChatThread`** — `agentId`, `ownerId`, `title`, `lastTurnAt`,
timestamps. Cascade delete on agent.
- **`ChatMessage`** — `threadId`, `turnIndex` (monotonic per thread,
enforced by `@@unique([threadId, turnIndex])`), `role`
(`'system' | 'user' | 'assistant' | 'tool'`), `content`, `toolCalls`
(Json — assistant turn's `[{id,name,arguments}]`), `toolCallId`
(which call a tool turn answers), `status`
(`'pending' | 'complete' | 'error'`), `createdAt`. Cascade delete
on thread.

`status` stays `pending` while the orchestrator runs an in-flight assistant
or tool turn, then flips to `complete` once the round settles. On any
exception in the chat loop, every `pending` row in the thread is flipped to
`error` so the trail stays auditable.
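
The status lifecycle above can be sketched with an in-memory array standing in for the `ChatMessage` table. This is only an illustration: `markPendingAsError` and `loadHistory` are named in this document, but their signatures here, along with `markPendingAsComplete`, `runRound`, and the row shape, are assumptions rather than the actual mcpd code.

```typescript
type Status = "pending" | "complete" | "error";

// Hypothetical, trimmed-down row shape (the real model has more columns).
interface ChatMessageRow {
  threadId: string;
  turnIndex: number;
  role: "system" | "user" | "assistant" | "tool";
  status: Status;
}

// Flip every pending row of one thread to `to`; returns how many rows changed.
function flipPending(rows: ChatMessageRow[], threadId: string, to: Status): number {
  let changed = 0;
  for (const row of rows) {
    if (row.threadId === threadId && row.status === "pending") {
      row.status = to;
      changed++;
    }
  }
  return changed;
}

const markPendingAsComplete = (rows: ChatMessageRow[], threadId: string) =>
  flipPending(rows, threadId, "complete");
const markPendingAsError = (rows: ChatMessageRow[], threadId: string) =>
  flipPending(rows, threadId, "error");

// Run one round: pending rows settle to `complete`, or to `error` on any throw.
function runRound(rows: ChatMessageRow[], threadId: string, round: () => void): void {
  try {
    round();
    markPendingAsComplete(rows, threadId);
  } catch (err) {
    markPendingAsError(rows, threadId);
    throw err;
  }
}

// Rebuild context for the next turn, skipping rows that ended in `error`.
function loadHistory(rows: ChatMessageRow[], threadId: string): ChatMessageRow[] {
  return rows
    .filter((r) => r.threadId === threadId && r.status !== "error")
    .sort((a, b) => a.turnIndex - b.turnIndex);
}
```

Because the error rows are kept rather than deleted, the failed turn stays visible in the thread's message log while being excluded from the context sent to the model.
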
## Chat parameters (LiteLLM-style passthrough)
Per-call resolution: request body → `agent.defaultParams` → adapter default.
Setting a key to `null` in the request explicitly clears a default.

| Key | Type | Notes |
|---|---|---|
| `temperature` | number | 0..2 |
| `top_p` | number | 0..1 |
| `top_k` | integer | Anthropic-only; OpenAI ignores |
| `max_tokens` | integer | adapter clamps to provider max |
| `stop` | string \| string[] | up to 4 sequences |
| `presence_penalty` | number | OpenAI |
| `frequency_penalty` | number | OpenAI |
| `seed` | integer | reproducibility (provider-dependent) |
| `response_format` | object | `text` \| `json_object` \| `json_schema` |
| `tool_choice` | enum/object | `auto`\|`none`\|`required`\|`{type:'function',function:{name}}` |
| `tools_allowlist` | string[] | restricts which project MCP tools the agent can call this turn |
| `systemOverride` | string | replaces `agent.systemPrompt` for this call |
| `systemAppend` | string | concatenated to system block (after project Prompts) |
| `messages` | array | full message history override; if set, `message`/threadId history is ignored |
| `extra` | object | provider-specific knobs (Anthropic `metadata.user_id`, vLLM `repetition_penalty`) — adapters cherry-pick |
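
As a rough illustration of the per-call resolution order (request body, then `agent.defaultParams`, then adapter default), the sketch below merges three plain objects. The function name `resolveParams` is hypothetical, and it assumes that an explicit `null` drops the agent default so the adapter default (if any) applies; the actual mcpd behavior may differ.

```typescript
type Params = Record<string, unknown>;

// Hypothetical helper: layer request over agent defaults over adapter defaults.
function resolveParams(
  request: Params,
  agentDefaults: Params,
  adapterDefaults: Params,
): Params {
  // Lowest priority first.
  const resolved: Params = { ...adapterDefaults };

  // Agent defaults override adapter defaults.
  for (const [key, value] of Object.entries(agentDefaults)) {
    if (value !== undefined && value !== null) resolved[key] = value;
  }

  // The request body wins; an explicit null clears the agent default.
  for (const [key, value] of Object.entries(request)) {
    if (value === undefined) continue; // key not set: keep lower layers
    if (value === null) {
      // Assumption: fall back to the adapter default, or drop the key entirely.
      if (key in adapterDefaults) resolved[key] = adapterDefaults[key];
      else delete resolved[key];
    } else {
      resolved[key] = value;
    }
  }
  return resolved;
}

// Example: the request overrides temperature and clears the agent's top_p.
const example = resolveParams(
  { temperature: 0.7, top_p: null },
  { temperature: 0.2, top_p: 0.9, max_tokens: 4096 },
  { max_tokens: 1024 },
);
console.log(example);
```
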
## HTTP API (mcpd)
```
GET    /api/v1/agents                 list                         (RBAC: view:agents)
GET    /api/v1/agents/:idOrName       describe                     (view:agents)
POST   /api/v1/agents                 create                       (create:agents)
PUT    /api/v1/agents/:idOrName       update                       (edit:agents)
DELETE /api/v1/agents/:idOrName       delete                       (delete:agents)
POST   /api/v1/agents/:name/chat      chat (non-streaming or SSE)  (run:agents:<name>)
POST   /api/v1/agents/:name/threads   create thread                (run:agents:<name>)
GET    /api/v1/agents/:name/threads   list threads                 (run:agents:<name>)
GET    /api/v1/threads/:id/messages   replay history               (view:agents)
GET    /api/v1/projects/:p/agents     project-scoped list          (view:projects:<p>)
```
The chat endpoint reuses the SSE pattern from `llm-infer.ts` exactly: same
headers (`text/event-stream`, `X-Accel-Buffering: no`), same `data: …\n\n`
framing, same `[DONE]` terminator. SSE chunk types:
- `{type:'text', delta}` — assistant text increments
- `{type:'tool_call', toolName, args}` — model decided to call a tool
- `{type:'tool_result', toolName, ok}` — tool dispatch outcome
- `{type:'final', threadId, turnIndex}` — terminal turn
- `{type:'error', message}` — fatal error in the loop
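
A client consuming this stream might decode it roughly as follows. This is a minimal sketch, not the CLI's actual parser: the `ChatChunk` union mirrors the chunk types listed above, while `parseSseFrames` and its return shape are hypothetical. It assumes each event is a single `data:` line terminated by a blank line.

```typescript
type ChatChunk =
  | { type: "text"; delta: string }
  | { type: "tool_call"; toolName: string; args: unknown }
  | { type: "tool_result"; toolName: string; ok: boolean }
  | { type: "final"; threadId: string; turnIndex: number }
  | { type: "error"; message: string };

interface ParseResult {
  chunks: ChatChunk[]; // fully received events, in order
  done: boolean;       // true once the [DONE] terminator was seen
  rest: string;        // trailing partial frame to prepend to the next read
}

// Hypothetical helper: split a buffer on blank lines and decode `data:` payloads.
function parseSseFrames(buffer: string): ParseResult {
  const chunks: ChatChunk[] = [];
  let done = false;

  const frames = buffer.split("\n\n");
  // The last element is either "" or an incomplete frame; keep it for later.
  const rest = frames.pop() ?? "";

  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (!line.startsWith("data:")) continue;
      const payload = line.slice("data:".length).trim();
      if (payload === "[DONE]") {
        done = true;
        continue;
      }
      chunks.push(JSON.parse(payload) as ChatChunk);
    }
  }
  return { chunks, done, rest };
}
```

A caller would append each network read to `rest`, call `parseSseFrames` again, write `text` deltas to stdout, and stop once `done` is true.
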
## Tool-use loop
When the agent's project has MCP servers attached, mcpd's `ChatService` lists
each server's tools (via `mcp-proxy.service.ts` — same path real MCP traffic
uses) and presents them to the model namespaced as `<server>__<tool>`. On a
`tool_calls` response the loop dispatches each call back through the same
proxy, persists the assistant + tool turns linked by `toolCallId`, and loops
(cap = 12 iterations) until the model returns terminal text.

Persistence is **non-transactional across the loop** because tool calls can
take minutes; long-held DB transactions would starve other writers.
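
The loop described above can be sketched with the model call and the proxy dispatch injected as callbacks. Everything here is illustrative: `runToolLoop`, `splitNamespaced`, and the message shapes are hypothetical names, the sketch assumes server names never contain `__`, and throwing when the cap is reached is an assumption about what happens after 12 iterations.

```typescript
const MAX_ITERATIONS = 12; // cap stated in this document
const SEP = "__";

interface ToolCall { id: string; name: string; arguments: string }
type ModelReply =
  | { kind: "text"; content: string }
  | { kind: "tool_calls"; calls: ToolCall[] };
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}

function namespaced(server: string, tool: string): string {
  return `${server}${SEP}${tool}`;
}

// Split `<server>__<tool>` on the first separator.
function splitNamespaced(name: string): { server: string; tool: string } {
  const i = name.indexOf(SEP);
  if (i <= 0) throw new Error(`not a namespaced tool: ${name}`);
  return { server: name.slice(0, i), tool: name.slice(i + SEP.length) };
}

async function runToolLoop(
  history: Message[],
  callModel: (history: Message[]) => Promise<ModelReply>,
  dispatch: (server: string, tool: string, args: string) => Promise<string>,
): Promise<string> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const reply = await callModel(history);
    if (reply.kind === "text") {
      history.push({ role: "assistant", content: reply.content });
      return reply.content; // terminal text ends the loop
    }
    // Record the assistant turn, then one tool turn per call, linked by id.
    history.push({ role: "assistant", content: "", toolCalls: reply.calls });
    for (const call of reply.calls) {
      const { server, tool } = splitNamespaced(call.name);
      const result = await dispatch(server, tool, call.arguments);
      history.push({ role: "tool", content: result, toolCallId: call.id });
    }
  }
  throw new Error(`tool loop exceeded ${MAX_ITERATIONS} iterations`);
}
```

In this sketch each `history.push` stands in for a separate database write, which matches the non-transactional persistence noted above.
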
## RBAC
Agents are their own resource (`agents`), independent of project bindings.
Recommended permissions:
- `view:agents` — list / describe
- `create:agents` / `edit:agents` / `delete:agents` — CRUD
- `run:agents:<name>` — drive a chat turn or manage its threads

Project-attached agents do not implicitly inherit project RBAC. If a project
member should be able to chat with the project's agents, grant them
`run:agents:<each-name>` (or wildcard `run:agents`) explicitly.
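
To make the explicit-grant rule concrete, here is a small sketch of a check over permission strings. The function `canRunAgent` and the flat string-array representation of grants are assumptions for illustration; mcpd's real RBAC evaluation is likely structured differently.

```typescript
// Hypothetical check: a caller may chat with an agent only via an explicit
// per-agent grant or the `run:agents` wildcard. Project grants do not count.
function canRunAgent(grants: string[], agentName: string): boolean {
  return grants.some(
    (grant) => grant === "run:agents" || grant === `run:agents:${agentName}`,
  );
}

// A project member without a run grant is denied, even on a project agent.
console.log(canRunAgent(["view:projects:mcpctl-dev"], "reviewer"));
// A per-agent grant covers that agent only.
console.log(canRunAgent(["run:agents:reviewer"], "reviewer"));
// The wildcard covers every agent.
console.log(canRunAgent(["run:agents"], "deployer"));
```
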
## YAML round-trip
`mcpctl get agent foo -o yaml | mcpctl apply -f -` is a no-op. The `apply` schema
also accepts shorthand:
```yaml
apiVersion: mcpctl.io/v1
kind: agent
metadata: { name: deployer }
spec:
  description: "I help you deploy code"
  llm: qwen3-thinking # shorthand for `{ name: qwen3-thinking }`
  project: mcpctl-dev # shorthand for `{ name: mcpctl-dev }`
  systemPrompt: |
    You are a deployment assistant for mcpctl. Always check fulldeploy.sh
    and the k8s context before suggesting actions.
  defaultParams:
    temperature: 0.2
    max_tokens: 4096
    top_p: 0.9
    stop: ["</deploy>"]
```
## Wiring against your in-cluster qwen3-thinking
The `kubernetes-deployment` repo provisions LiteLLM in the `nvidia-nim`
namespace (`http://litellm.nvidia-nim.svc.cluster.local:4000/v1` in-cluster,
`https://llm.ad.itaz.eu/v1` external) and a virtual key reserved for mcpctl
in the Pulumi secret `secrets:litellmMcpctlGatewayToken`. To pull the token
once and wire up an agent:
```bash
cd /path/to/kubernetes-deployment
LITELLM_TOKEN=$(pulumi config get --stack homelab secrets:litellmMcpctlGatewayToken)
# fallback if Pulumi isn't authed locally:
# LITELLM_TOKEN=$(kubectl --context worker0-k8s0 -n nvidia-nim get secret litellm-secrets \
# -o jsonpath='{.data.LITELLM_MCPCTL_GATEWAY_TOKEN}' | base64 -d)
cd /path/to/mcpctl
mcpctl create secret litellm-key --data "API_KEY=${LITELLM_TOKEN}"
mcpctl create llm qwen3-thinking \
  --type openai --model qwen3-thinking \
  --url http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
  --api-key-ref litellm-key/API_KEY \
  --description "Qwen3-30B-A3B-Thinking-FP8 via in-cluster vLLM behind LiteLLM"
mcpctl create agent reviewer \
  --llm qwen3-thinking \
  --description "I review what you're shipping; ask after each major change." \
  --default-temperature 0.2 --default-max-tokens 4096
mcpctl chat reviewer
```
## Troubleshooting
- **Namespace collision** in mcplocal: if a project has an upstream MCP
server literally named `agent-<x>`, the agents plugin detects the
collision in `onSessionCreate`, skips that agent's registration, and
emits a `ctx.log.warn` line. Treat the `agent-` prefix as reserved and
avoid it in real server names.
- **Llm-in-use blocks delete**: `Agent.llm` is `onDelete: Restrict`. Detach
every agent (or delete them) before deleting the underlying Llm.
- **Stale `pending` rows**: a crash mid-loop leaves `pending` ChatMessage
rows. The next request recovers — `markPendingAsError` flips them on the
next failure path, and `loadHistory` filters out `error` rows when
rebuilding context for the next turn.
- **`proxyModelName` is informational only** for agents. The agent's own
internal tool loop runs server-side in mcpd and bypasses mcplocal's
proxymodel pipeline entirely. Don't try to plumb it.
- **Anthropic + tools**: the Anthropic adapter currently drops `tool` role
messages and doesn't translate OpenAI `tool_calls` to Anthropic
`tool_use` / `tool_result` blocks. Use an OpenAI-compatible provider
(LiteLLM, vLLM, OpenAI) for agents that need tool calling until that
translation lands.

docs/chat.md

@@ -0,0 +1,124 @@
# `mcpctl chat`
Open an interactive chat session with an `Agent`, or send a single message
in one shot. See [agents.md](agents.md) for what an Agent is and how to
create one.
## Modes
```bash
mcpctl chat <agent> # interactive REPL, new thread
mcpctl chat <agent> --thread <id> # interactive REPL, resume thread
mcpctl chat <agent> -m "hi" # one-shot, prints reply, no REPL
mcpctl chat <agent> -m "hi" --no-stream # one-shot, single JSON response (no SSE)
```
Streaming is on by default. Text deltas land on stdout as they arrive; tool
calls and tool results print to stderr in dim brackets so the chat output
stays clean.
## Per-call flags
All optional. They override the agent's `defaultParams` for this session
only — use the in-REPL `/save` slash-command to persist the current set
back to the agent.
```bash
--system <text> # replace agent.systemPrompt for this session
--system-file <path> # read --system text from a file
--system-append <text> # append to the agent system block (after project Prompts)
--temperature <n> # 0..2
--top-p <n> # 0..1
--top-k <n> # integer; Anthropic-only, OpenAI ignores
--max-tokens <n> # cap on assistant tokens
--seed <n> # reproducibility (provider-dependent)
--stop <text> # stop sequence (repeatable, up to 4)
--allow-tool <name> # repeat to allowlist project MCP tools
--extra <key=value> # provider-specific knob (repeatable)
--no-stream # disable SSE; single JSON response
```
`--extra` is the LiteLLM-style escape hatch: pass anything the underlying
adapter understands. Numeric values are auto-parsed (`--extra
repetition_penalty=1.1`); strings stay strings.
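
The auto-parsing rule for `--extra` could look something like the sketch below. `parseExtra` is a hypothetical name, and the exact numeric detection the CLI uses is not specified here; this version treats any value that `Number()` converts to a finite number as numeric and leaves everything else as a string.

```typescript
// Hypothetical parser for repeated `--extra key=value` flags.
function parseExtra(pairs: string[]): Record<string, string | number> {
  const out: Record<string, string | number> = {};
  for (const pair of pairs) {
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`expected key=value, got: ${pair}`);

    const key = pair.slice(0, eq);
    const raw = pair.slice(eq + 1); // everything after the first "="

    // Empty strings would convert to 0, so exclude them explicitly.
    const asNumber = Number(raw);
    out[key] = raw.trim() !== "" && Number.isFinite(asNumber) ? asNumber : raw;
  }
  return out;
}

console.log(parseExtra(["repetition_penalty=1.1", "metadata.user_id=alice"]));
```

Splitting on the first `=` only means values that themselves contain `=` survive intact.
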
## In-REPL slash-commands
```
/set KEY VALUE   adjust an override for the rest of the session
                 (temperature, top-p, top-k, max-tokens, seed, stop,
                 or any provider-specific knob; unknown keys go
                 into `extra`)
/system <text>   set systemAppend for this turn onward (empty = clear)
/tools           list MCP servers the agent can call as tools
/clear           start a fresh thread (same agent)
/save            PATCH agent.defaultParams = current overrides
                 (systemOverride / systemAppend are NOT persisted)
/quit, /exit     leave the REPL (Ctrl-D works too)
```
## Threads
Threads persist server-side. To resume:
```bash
mcpctl get threads --agent reviewer
mcpctl chat reviewer --thread <id>
```
`mcpctl get thread <id>` prints the message log:
```bash
mcpctl get thread c0abc… -o yaml
```
## Examples
**Quick gut-check on a deploy:**
```bash
$ mcpctl chat reviewer -m "is fulldeploy.sh safe to run on the current branch?"
Yes — I checked: tests are green on commit 727e7d6 and there's no
in-flight migration. The k8s context is worker0-k8s0 (production); confirm
that's intended before running.
(thread: cm9k…)
```
**Resuming with overrides:**
```bash
$ mcpctl chat reviewer --thread cm9k… --temperature 0.0 --max-tokens 256
> walk me through what changed since the last deploy
```
**Pinning sampling defaults to the agent:**
```bash
$ mcpctl chat deployer --temperature 0.0 --max-tokens 8000
> /save
(saved current overrides as agent.defaultParams)
> /quit
```
## Troubleshooting
- **No agents appear in `tools/list`** — check the agent has a project
attachment (`mcpctl describe agent <name>`). The mcplocal plugin only
exposes agents on their attached project's session.
- **Tool calls fail with `Project not found`** — the agent has no project
attachment. Either attach it (`mcpctl edit agent <name>` and set the project
field), or expect text-only chat.
- **Anthropic agents can't call tools** — known limitation; the Anthropic
adapter doesn't translate OpenAI tool format yet. Use LiteLLM or a
direct OpenAI-compatible provider for tool-using agents until the
translator ships.
- **`mcpctl chat <agent>` returns 404** — the agent name doesn't resolve.
Run `mcpctl get agents` to confirm the spelling.
- **REPL feels stuck** — agent tool calls can take minutes (e.g. running a
Grafana query). Watch stderr for `[tool_call: …]` / `[tool_result: …]`
brackets; those tell you the loop is alive.