Closes the agents feature.

Smoke tests (run via `pnpm test:smoke` against a live mcpd at $MCPD_URL, default https://mcpctl.ad.itaz.eu):

* tests/smoke/agent.smoke.test.ts: full CRUD round-trip: create secret + Llm + agent with sampling defaults; `get agents` surfaces it; `get agent foo -o yaml | apply -f` round-trips identically; create + list a thread via the HTTP API; agent delete leaves Llm + secret intact (Restrict + SetNull as designed). Self-skips with a warning when /healthz is unreachable.
* tests/smoke/agent-chat.smoke.test.ts: gated on MCPCTL_SMOKE_LLM_URL + MCPCTL_SMOKE_LLM_KEY. Provisions secret + Llm + agent against a real upstream, runs `mcpctl chat -m … --no-stream` (asserts a reply lands), then runs the streaming default (asserts text on stdout + `(thread: …)` on stderr).

The fast path for verifying the in-cluster qwen3-thinking deployment:

    MCPCTL_SMOKE_LLM_URL=http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
    MCPCTL_SMOKE_LLM_MODEL=qwen3-thinking \
    MCPCTL_SMOKE_LLM_KEY=$(pulumi config get --stack homelab \
      secrets:litellmMcpctlGatewayToken) \
    pnpm test:smoke

Docs:

* README.md: new "Agents" section under Resources with the qwen3-thinking quickstart and links to docs/agents.md and docs/chat.md. Adds llm + agent rows to the resources table.
* docs/agents.md (new): full reference: data model, chat-parameter table, HTTP API, RBAC mapping, tool-use loop semantics, yaml round-trip shorthand, the kubernetes-deployment wiring recipe, and a troubleshooting section (namespace collision, llm-in-use, pending-row recovery, Anthropic-tool limitation).
* docs/chat.md (new): user-facing `mcpctl chat` walkthrough: modes, per-call flags, slash-commands, threads, and a troubleshooting section.
* CLAUDE.md: adds a "Resource types" cheatsheet with one-line pointers to each, including the new `agent` row that links to the docs.

All suites still green: mcpd 759/759, mcplocal 715/715, cli 430/430. Smoke tests typecheck and self-skip when no live mcpd is reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agents
An Agent is an LLM persona pinned to a specific Llm, with a system prompt,
a description that surfaces in MCP tools/list, optional attachment to a
Project, and LiteLLM-style sampling defaults. Conversations are persisted
as ChatThread + ChatMessage rows so REPL sessions resume across runs.
Two surfaces use an agent:
- Direct chat via `mcpctl chat <name>` (interactive REPL or one-shot `-m "msg"`). Streams over SSE; tool calls and tool results print to stderr in dim brackets. Slash-commands `/set`, `/system`, `/tools`, `/clear`, `/save`, `/quit` adjust runtime behavior.
- Virtual MCP server registered into every project session by mcplocal's agents plugin. The agent shows up as `agent-<name>` with one tool `chat`, whose description is the agent's own description. Other Claude sessions / MCP clients see the agent as just another tool in `tools/list` and can consult it.
Data model
Three Prisma models added to src/db/prisma/schema.prisma:
- `Agent`: `name` (unique), `description`, `systemPrompt`, `llmId` (FK Restrict: an Llm in active use cannot be deleted), `projectId` (FK SetNull: agents survive project deletion), `proxyModelName` (optional informational override), `defaultParams` (Json, LiteLLM-style), `extras` (Json, reserved for future LoRA / tool allowlists), `ownerId`, version, timestamps.
- `ChatThread`: `agentId`, `ownerId`, `title`, `lastTurnAt`, timestamps. Cascade delete on agent.
- `ChatMessage`: `threadId`, `turnIndex` (monotonic per thread, enforced by `@@unique([threadId, turnIndex])`), `role` (`'system' | 'user' | 'assistant' | 'tool'`), `content`, `toolCalls` (Json: the assistant turn's `[{id,name,arguments}]`), `toolCallId` (which call a tool turn answers), `status` (`'pending' | 'complete' | 'error'`), `createdAt`. Cascade delete on thread.
status stays pending while the orchestrator runs an in-flight assistant
or tool turn, then flips to complete once the round settles. On any
exception in the chat loop, every pending row in the thread is flipped to
error so the trail stays auditable.
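
The status lifecycle above can be sketched against an in-memory stand-in for the ChatMessage table. This is a rough illustration rather than mcpd's code: `ChatMessageRow`, `nextTurnIndex`, `runRound`, and the synchronous `produce` callback are hypothetical, and the real orchestrator is presumably asynchronous and Prisma-backed. `markPendingAsError` borrows its name from the troubleshooting notes; its body here is an assumption.

```typescript
type Status = "pending" | "complete" | "error";

// Hypothetical in-memory row; only the fields needed for the lifecycle.
interface ChatMessageRow {
  threadId: string;
  turnIndex: number;
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  status: Status;
}

// Next free turnIndex for a thread (monotonic per thread).
function nextTurnIndex(rows: ChatMessageRow[], threadId: string): number {
  let max = -1;
  for (const row of rows) {
    if (row.threadId === threadId && row.turnIndex > max) max = row.turnIndex;
  }
  return max + 1;
}

// Flip every pending row in the thread to error; returns how many were flipped.
function markPendingAsError(rows: ChatMessageRow[], threadId: string): number {
  let flipped = 0;
  for (const row of rows) {
    if (row.threadId === threadId && row.status === "pending") {
      row.status = "error";
      flipped += 1;
    }
  }
  return flipped;
}

// One assistant round: insert a pending row, fill it in, then mark it complete.
// Any exception flips the thread's pending rows to error and is rethrown.
function runRound(rows: ChatMessageRow[], threadId: string, produce: () => string): ChatMessageRow {
  const row: ChatMessageRow = {
    threadId,
    turnIndex: nextTurnIndex(rows, threadId),
    role: "assistant",
    content: "",
    status: "pending",
  };
  rows.push(row);
  try {
    row.content = produce();
    row.status = "complete";
    return row;
  } catch (err) {
    markPendingAsError(rows, threadId);
    throw err;
  }
}
```

Rows that already reached `complete` are left untouched on failure, so the thread keeps a record of what succeeded before the error.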
Chat parameters (LiteLLM-style passthrough)
Per-call resolution: request body → agent.defaultParams → adapter default.
Setting a key to null in the request explicitly clears a default.
| Key | Type | Notes |
|---|---|---|
| `temperature` | number | 0..2 |
| `top_p` | number | 0..1 |
| `top_k` | integer | Anthropic-only; OpenAI ignores |
| `max_tokens` | integer | adapter clamps to provider max |
| `stop` | string \| string[] | up to 4 sequences |
| `presence_penalty` | number | OpenAI |
| `frequency_penalty` | number | OpenAI |
| `seed` | integer | reproducibility (provider-dependent) |
| `response_format` | object | `text` \| `json_object` \| `json_schema` |
| `tool_choice` | enum/object | `auto` \| `none` \| `required` \| `{type:'function',function:{name}}` |
| `tools_allowlist` | string[] | restricts which project MCP tools the agent can call this turn |
| `systemOverride` | string | replaces `agent.systemPrompt` for this call |
| `systemAppend` | string | concatenated to system block (after project Prompts) |
| `messages` | array | full message history override; if set, `message`/`threadId` history is ignored |
| `extra` | object | provider-specific knobs (Anthropic `metadata.user_id`, vLLM `repetition_penalty`); adapters cherry-pick |
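
The resolution order (request body, then `agent.defaultParams`, then adapter default) might look roughly like the sketch below. `resolveParams` and the `Params` type are hypothetical names. The handling of `null` is one reading of "explicitly clears a default": the agent-level value is skipped and the adapter default, if any, applies.

```typescript
type ParamValue = number | string | string[] | boolean | object | null;
type Params = { [key: string]: ParamValue | undefined };

// Merge the three layers key by key: request > agent defaults > adapter defaults.
function resolveParams(request: Params, agentDefaults: Params, adapterDefaults: Params): Params {
  const resolved: Params = {};
  const keys = Object.keys({ ...adapterDefaults, ...agentDefaults, ...request });
  for (const key of keys) {
    const requested = request[key];
    // Assumption: an explicit null in the request bypasses the agent default
    // and falls through to the adapter default (which may be absent).
    const value = requested === null
      ? adapterDefaults[key]
      : requested ?? agentDefaults[key] ?? adapterDefaults[key];
    if (value !== undefined && value !== null) resolved[key] = value;
  }
  return resolved;
}

// Example: the request clears temperature and overrides max_tokens.
const effective = resolveParams(
  { temperature: null, max_tokens: 512 },
  { temperature: 0.2, top_p: 0.9 },
  { temperature: 1, max_tokens: 1024 },
);
```

Here `effective` ends up with the adapter's `temperature`, the request's `max_tokens`, and the agent's `top_p`.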
HTTP API (mcpd)
GET /api/v1/agents list (RBAC: view:agents)
GET /api/v1/agents/:idOrName describe (view:agents)
POST /api/v1/agents create (create:agents)
PUT /api/v1/agents/:idOrName update (edit:agents)
DELETE /api/v1/agents/:idOrName delete (delete:agents)
POST /api/v1/agents/:name/chat chat — non-streaming or SSE (run:agents:<name>)
POST /api/v1/agents/:name/threads create thread (run:agents:<name>)
GET /api/v1/agents/:name/threads list threads (run:agents:<name>)
GET /api/v1/threads/:id/messages replay history (view:agents)
GET /api/v1/projects/:p/agents project-scoped list (view:projects:<p>)
The chat endpoint reuses the SSE pattern from llm-infer.ts exactly: same
headers (text/event-stream, X-Accel-Buffering: no), same data: …\n\n
framing, same [DONE] terminator. SSE chunk types:
- `{type:'text', delta}`: assistant text increments
- `{type:'tool_call', toolName, args}`: model decided to call a tool
- `{type:'tool_result', toolName, ok}`: tool dispatch outcome
- `{type:'final', threadId, turnIndex}`: terminal turn
- `{type:'error', message}`: fatal error in the loop
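
A client consuming this stream could split the buffered text on the blank-line frame boundary and decode each `data:` payload. The sketch below assumes the payload after `data:` is JSON, which the chunk shapes suggest but the text does not state. `parseSseBuffer`, `ChatChunk`, and `ParseResult` are hypothetical names, and the field types are guesses.

```typescript
type ChatChunk =
  | { type: "text"; delta: string }
  | { type: "tool_call"; toolName: string; args: unknown }
  | { type: "tool_result"; toolName: string; ok: boolean }
  | { type: "final"; threadId: string; turnIndex: number }
  | { type: "error"; message: string };

interface ParseResult {
  chunks: ChatChunk[]; // fully received chunks, in order
  done: boolean;       // true once the [DONE] terminator was seen
  rest: string;        // trailing partial frame to prepend to the next read
}

function parseSseBuffer(buffer: string): ParseResult {
  const chunks: ChatChunk[] = [];
  let done = false;
  const frames = buffer.split("\n\n");
  // The last element is an incomplete frame, or "" if the buffer ended on a boundary.
  const rest = frames.pop() ?? "";
  for (const frame of frames) {
    for (const line of frame.split("\n")) {
      if (line.indexOf("data:") !== 0) continue; // ignore comments and other SSE fields
      const payload = line.slice(5).trim();
      if (payload === "[DONE]") {
        done = true;
        continue;
      }
      // Assumption: each payload is a JSON-encoded chunk object.
      if (payload.length > 0) chunks.push(JSON.parse(payload) as ChatChunk);
    }
  }
  return { chunks, done, rest };
}
```

Returning `rest` lets the caller carry a partially received frame over to the next network read instead of dropping it.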
Tool-use loop
When the agent's project has MCP servers attached, mcpd's ChatService lists
each server's tools (via mcp-proxy.service.ts — same path real MCP traffic
uses) and presents them to the model namespaced as <server>__<tool>. On a
tool_calls response the loop dispatches each call back through the same
proxy, persists the assistant + tool turns linked by toolCallId, and loops
(cap = 12 iterations) until the model returns terminal text.
Persistence is non-transactional across the loop because tool calls can take minutes; long-held DB transactions would starve other writers.
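
The loop shape described above can be sketched with injected stand-ins for the model and the proxy. Everything named here (`runToolLoop`, `splitToolName`, `ModelFn`, `DispatchFn`) is hypothetical, and the sketch is synchronous for brevity while the real loop is presumably async. Throwing once the cap is reached and splitting on the first `__` are assumptions; the text only states the cap of 12 and the `<server>__<tool>` naming.

```typescript
const MAX_ITERATIONS = 12;

interface ToolCall { id: string; name: string; arguments: unknown }
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
  toolCalls?: ToolCall[];
  toolCallId?: string;
}
type ModelFn = (history: Turn[]) => { content: string; toolCalls: ToolCall[] };
type DispatchFn = (server: string, tool: string, args: unknown) => string;

// Split "<server>__<tool>" on the first double underscore (assumption).
function splitToolName(namespaced: string): { server: string; tool: string } {
  const at = namespaced.indexOf("__");
  if (at <= 0) throw new Error("not a namespaced tool name: " + namespaced);
  return { server: namespaced.slice(0, at), tool: namespaced.slice(at + 2) };
}

function runToolLoop(history: Turn[], model: ModelFn, dispatch: DispatchFn): string {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const reply = model(history);
    if (reply.toolCalls.length === 0) {
      // Terminal text: record the assistant turn and stop.
      history.push({ role: "assistant", content: reply.content });
      return reply.content;
    }
    // Record the assistant turn, then one tool turn per call, linked by toolCallId.
    history.push({ role: "assistant", content: reply.content, toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      const { server, tool } = splitToolName(call.name);
      const result = dispatch(server, tool, call.arguments);
      history.push({ role: "tool", content: result, toolCallId: call.id });
    }
  }
  // Assumption: hitting the cap is surfaced as an error.
  throw new Error("tool loop exceeded " + MAX_ITERATIONS + " iterations");
}
```

Each turn is pushed as soon as it exists rather than batched at the end, mirroring the non-transactional persistence described above.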
RBAC
Agents are their own resource (agents), independent of project bindings.
Recommended:
- `view:agents`: list / describe
- `create:agents` / `edit:agents` / `delete:agents`: CRUD
- `run:agents:<name>`: drive a chat turn or manage its threads
Project-attached agents do not implicitly inherit project RBAC. If a project
member should be able to chat with the project's agents, grant them
run:agents:<each-name> (or wildcard run:agents) explicitly.
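
A permission check consistent with these rules could be as small as the sketch below. The grant strings follow the `action:resource[:name]` forms used in this document; the matching logic and the `isGranted` helper are assumptions, not mcpd's actual RBAC evaluator.

```typescript
// True if the grant list contains either the wildcard form ("run:agents")
// or the name-scoped form ("run:agents:<name>") for the requested action.
function isGranted(grants: string[], action: string, resource: string, name?: string): boolean {
  const wildcard = action + ":" + resource;
  const scoped = name === undefined ? undefined : wildcard + ":" + name;
  for (const grant of grants) {
    if (grant === wildcard) return true;
    if (scoped !== undefined && grant === scoped) return true;
  }
  return false;
}

// A project viewer without an explicit run grant cannot chat with the agent.
const projectMemberOnly = isGranted(["view:projects:mcpctl-dev"], "run", "agents", "reviewer");
// An explicit per-agent grant allows it.
const explicitGrant = isGranted(["run:agents:reviewer"], "run", "agents", "reviewer");
```

In this sketch `projectMemberOnly` is `false` and `explicitGrant` is `true`, which matches the rule that project membership alone does not confer `run` on the project's agents.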
YAML round-trip
`get agent foo -o yaml | mcpctl apply -f -` is a no-op. The apply schema
also accepts shorthand:

    apiVersion: mcpctl.io/v1
    kind: agent
    metadata: { name: deployer }
    spec:
      description: "I help you deploy code"
      llm: qwen3-thinking        # shorthand for `{ name: qwen3-thinking }`
      project: mcpctl-dev        # shorthand for `{ name: mcpctl-dev }`
      systemPrompt: |
        You are a deployment assistant for mcpctl. Always check fulldeploy.sh
        and the k8s context before suggesting actions.
      defaultParams:
        temperature: 0.2
        max_tokens: 4096
        top_p: 0.9
        stop: ["</deploy>"]
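
One way an apply schema could accept both the shorthand and the long form is to normalize string references into `{ name }` objects before validation. The sketch below is illustrative only: `NameRef`, `AgentSpecInput`, `toNameRef`, and `normalizeAgentSpec` are hypothetical names, and only the `description`, `llm`, and `project` fields are modelled.

```typescript
type NameRef = { name: string };

interface AgentSpecInput {
  description?: string;
  llm: string | NameRef;
  project?: string | NameRef;
}

interface AgentSpec {
  description?: string;
  llm: NameRef;
  project?: NameRef;
}

// Expand the string shorthand into the long form; pass objects through.
function toNameRef(value: string | NameRef): NameRef {
  return typeof value === "string" ? { name: value } : value;
}

function normalizeAgentSpec(input: AgentSpecInput): AgentSpec {
  const out: AgentSpec = { llm: toNameRef(input.llm) };
  if (input.description !== undefined) out.description = input.description;
  if (input.project !== undefined) out.project = toNameRef(input.project);
  return out;
}

// Both spellings normalize to the same structure.
const fromShorthand = normalizeAgentSpec({ llm: "qwen3-thinking", project: "mcpctl-dev" });
const fromLongForm = normalizeAgentSpec({ llm: { name: "qwen3-thinking" }, project: { name: "mcpctl-dev" } });
```

Normalizing before comparison is what would let a shorthand manifest and its exported long form be treated as the same spec.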
Wiring against your in-cluster qwen3-thinking
The kubernetes-deployment repo provisions LiteLLM in the nvidia-nim
namespace (http://litellm.nvidia-nim.svc.cluster.local:4000/v1 in-cluster,
https://llm.ad.itaz.eu/v1 external) and a virtual key reserved for mcpctl
in the Pulumi secret secrets:litellmMcpctlGatewayToken. Pulling it once:

    cd /path/to/kubernetes-deployment
    LITELLM_TOKEN=$(pulumi config get --stack homelab secrets:litellmMcpctlGatewayToken)
    # fallback if Pulumi isn't authed locally:
    # LITELLM_TOKEN=$(kubectl --context worker0-k8s0 -n nvidia-nim get secret litellm-secrets \
    #   -o jsonpath='{.data.LITELLM_MCPCTL_GATEWAY_TOKEN}' | base64 -d)

    cd /path/to/mcpctl
    mcpctl create secret litellm-key --data "API_KEY=${LITELLM_TOKEN}"
    mcpctl create llm qwen3-thinking \
      --type openai --model qwen3-thinking \
      --url http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
      --api-key-ref litellm-key/API_KEY \
      --description "Qwen3-30B-A3B-Thinking-FP8 via in-cluster vLLM behind LiteLLM"
    mcpctl create agent reviewer \
      --llm qwen3-thinking \
      --description "I review what you're shipping; ask after each major change." \
      --default-temperature 0.2 --default-max-tokens 4096

    mcpctl chat reviewer
Troubleshooting
- Namespace collision in mcplocal: if a project has an upstream MCP server literally named `agent-<x>`, the agents plugin detects the collision in `onSessionCreate`, skips that agent's registration, and emits a `ctx.log.warn` line. Document the `agent-` prefix as reserved on real server names.
- Llm-in-use blocks delete: `Agent.llm` is `onDelete: Restrict`. Detach every agent (or delete them) before deleting the underlying Llm.
- Stale `pending` rows: a crash mid-loop leaves `pending` ChatMessage rows. The next request recovers: `markPendingAsError` flips them on the next failure path, and `loadHistory` filters out `error` rows when rebuilding context for the next turn.
- `proxyModelName` is informational only for agents. The agent's own internal tool loop runs server-side in mcpd and bypasses mcplocal's proxymodel pipeline entirely. Don't try to plumb it.
- Anthropic + tools: the Anthropic adapter currently drops `tool` role messages and doesn't translate OpenAI `tool_calls` to Anthropic `tool_use` / `tool_result` blocks. Use an OpenAI-compatible provider (LiteLLM, vLLM, OpenAI) for agents that need tool calling until that translation lands.
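
For the namespace-collision case, the decision the agents plugin makes can be approximated by the sketch below. It is a guess at the shape of the check, not the plugin's code: `planAgentServers` and `RegistrationPlan` are hypothetical, and the real check runs inside `onSessionCreate` and logs through `ctx.log.warn`.

```typescript
interface RegistrationPlan {
  register: string[]; // virtual server names that are safe to register
  skipped: string[];  // names that collide with a real upstream server
}

// Decide, per agent, whether its virtual "agent-<name>" server can be registered.
function planAgentServers(upstreamServerNames: string[], agentNames: string[]): RegistrationPlan {
  const plan: RegistrationPlan = { register: [], skipped: [] };
  for (const agent of agentNames) {
    const virtualName = "agent-" + agent;
    if (upstreamServerNames.indexOf(virtualName) !== -1) {
      plan.skipped.push(virtualName); // a real server already owns this name
    } else {
      plan.register.push(virtualName);
    }
  }
  return plan;
}

// An upstream server literally named "agent-reviewer" shadows the reviewer agent.
const plan = planAgentServers(["github", "agent-reviewer"], ["reviewer", "deployer"]);
```

With these inputs only `agent-deployer` is registered, and `agent-reviewer` is reported as skipped.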