# Agents

An `Agent` is an LLM persona pinned to a specific `Llm`, with a system prompt, a description that surfaces in MCP `tools/list`, optional attachment to a `Project`, and LiteLLM-style sampling defaults. Conversations are persisted as `ChatThread` + `ChatMessage` rows so REPL sessions resume across runs.

Two surfaces use an agent:

1. **Direct chat** via `mcpctl chat <agent>` (interactive REPL or one-shot `-m "msg"`). Streams over SSE; tool calls and tool results print to stderr in dim brackets. Slash-commands `/set`, `/system`, `/tools`, `/clear`, `/save`, `/quit` adjust runtime behavior.
2. **Virtual MCP server** registered into every project session by mcplocal's agents plugin. The agent shows up as `agent-<name>` with one tool, `chat`, whose description is the agent's own description. Other Claude sessions / MCP clients see the agent as just another tool in `tools/list` and can consult it.

## Data model

Three Prisma models added to `src/db/prisma/schema.prisma`:

- **`Agent`** — `name` (unique), `description`, `systemPrompt`, `llmId` (FK Restrict — an Llm in active use cannot be deleted), `projectId` (FK SetNull — agents survive project deletion), `proxyModelName` (optional informational override), `defaultParams` (Json, LiteLLM-style), `extras` (Json, reserved for future LoRA / tool allowlists), `ownerId`, version, timestamps.
- **`ChatThread`** — `agentId`, `ownerId`, `title`, `lastTurnAt`, timestamps. Cascade delete on agent.
- **`ChatMessage`** — `threadId`, `turnIndex` (monotonic per thread, enforced by `@@unique([threadId, turnIndex])`), `role` (`'system' | 'user' | 'assistant' | 'tool'`), `content`, `toolCalls` (Json — the assistant turn's `[{id,name,arguments}]`), `toolCallId` (which call a tool turn answers), `status` (`'pending' | 'complete' | 'error'`), `createdAt`. Cascade delete on thread.

`status` stays `pending` while the orchestrator runs an in-flight assistant or tool turn, then flips to `complete` once the round settles. On any exception in the chat loop, every `pending` row in the thread is flipped to `error` so the trail stays auditable.
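For orientation, a sketch of roughly how these three models could look in `schema.prisma`. Field types, the ID strategy, and optionality are assumptions rather than copies from the repo, and the `Llm` / `Project` models (with their back-relation fields) are elided; the relation actions and the `@@unique` constraint are the ones described above.

```prisma
model Agent {
  id             String       @id @default(cuid())
  name           String       @unique
  description    String
  systemPrompt   String
  llmId          String
  llm            Llm          @relation(fields: [llmId], references: [id], onDelete: Restrict)
  projectId      String?
  project        Project?     @relation(fields: [projectId], references: [id], onDelete: SetNull)
  proxyModelName String?      // informational override only
  defaultParams  Json?        // LiteLLM-style sampling defaults
  extras         Json?        // reserved for future LoRA / tool allowlists
  ownerId        String
  version        Int          @default(1)
  createdAt      DateTime     @default(now())
  updatedAt      DateTime     @updatedAt
  threads        ChatThread[]
}

model ChatThread {
  id         String        @id @default(cuid())
  agentId    String
  agent      Agent         @relation(fields: [agentId], references: [id], onDelete: Cascade)
  ownerId    String
  title      String?
  lastTurnAt DateTime?
  createdAt  DateTime      @default(now())
  updatedAt  DateTime      @updatedAt
  messages   ChatMessage[]
}

model ChatMessage {
  id         String     @id @default(cuid())
  threadId   String
  thread     ChatThread @relation(fields: [threadId], references: [id], onDelete: Cascade)
  turnIndex  Int
  role       String     // 'system' | 'user' | 'assistant' | 'tool'
  content    String
  toolCalls  Json?      // assistant turn's [{id,name,arguments}]
  toolCallId String?    // which call a tool turn answers
  status     String     @default("pending") // 'pending' | 'complete' | 'error'
  createdAt  DateTime   @default(now())

  @@unique([threadId, turnIndex])
}
```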
## Chat parameters (LiteLLM-style passthrough)

Per-call resolution: request body → `agent.defaultParams` → adapter default. Setting a key to `null` in the request explicitly clears a default.

| Key | Type | Notes |
|---|---|---|
| `temperature` | number | 0..2 |
| `top_p` | number | 0..1 |
| `top_k` | integer | Anthropic-only; OpenAI ignores |
| `max_tokens` | integer | adapter clamps to provider max |
| `stop` | string \| string[] | up to 4 sequences |
| `presence_penalty` | number | OpenAI |
| `frequency_penalty` | number | OpenAI |
| `seed` | integer | reproducibility (provider-dependent) |
| `response_format` | object | `text` \| `json_object` \| `json_schema` |
| `tool_choice` | enum/object | `auto` \| `none` \| `required` \| `{type:'function',function:{name}}` |
| `tools_allowlist` | string[] | restricts which project MCP tools the agent can call this turn |
| `systemOverride` | string | replaces `agent.systemPrompt` for this call |
| `systemAppend` | string | concatenated to the system block (after project Prompts) |
| `messages` | array | full message-history override; if set, `message`/`threadId` history is ignored |
| `extra` | object | provider-specific knobs (Anthropic `metadata.user_id`, vLLM `repetition_penalty`) — adapters cherry-pick |

## HTTP API (mcpd)

```
GET    /api/v1/agents                    list (RBAC: view:agents)
GET    /api/v1/agents/:idOrName          describe (view:agents)
POST   /api/v1/agents                    create (create:agents)
PUT    /api/v1/agents/:idOrName          update (edit:agents)
DELETE /api/v1/agents/:idOrName          delete (delete:agents)
POST   /api/v1/agents/:name/chat         chat — non-streaming or SSE (run:agents:<name>)
POST   /api/v1/agents/:name/threads      create thread (run:agents:<name>)
GET    /api/v1/agents/:name/threads      list threads (run:agents:<name>)
GET    /api/v1/threads/:id/messages      replay history (view:agents)
GET    /api/v1/projects/:p/agents        project-scoped list (view:projects:<name>)
```

The chat endpoint reuses the SSE pattern from `llm-infer.ts` exactly: same headers (`text/event-stream`, `X-Accel-Buffering: no`), same `data: …\n\n` framing, same `[DONE]` terminator. SSE chunk types:

- `{type:'text', delta}` — assistant text increments
- `{type:'tool_call', toolName, args}` — model decided to call a tool
- `{type:'tool_result', toolName, ok}` — tool dispatch outcome
- `{type:'final', threadId, turnIndex}` — terminal turn
- `{type:'error', message}` — fatal error in the loop
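A minimal client-side sketch of consuming that stream, not the shipped client. It assumes Node 18+ `fetch`, a bearer token in `MCPD_TOKEN`, and a `{ message }` request body; only the `data: …\n\n` framing, the `[DONE]` terminator, and the chunk types are taken from the description above.

```ts
// Minimal SSE consumer for POST /api/v1/agents/:name/chat (a sketch, not the shipped client).
type ChatChunk =
  | { type: 'text'; delta: string }
  | { type: 'tool_call'; toolName: string; args: unknown }
  | { type: 'tool_result'; toolName: string; ok: boolean }
  | { type: 'final'; threadId: string; turnIndex: number }
  | { type: 'error'; message: string };

async function streamChat(baseUrl: string, agent: string, message: string): Promise<void> {
  const res = await fetch(`${baseUrl}/api/v1/agents/${encodeURIComponent(agent)}/chat`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Accept: 'text/event-stream',
      Authorization: `Bearer ${process.env.MCPD_TOKEN ?? ''}`,
    },
    body: JSON.stringify({ message }), // body shape is assumed, not confirmed above
  });
  if (!res.ok || !res.body) throw new Error(`chat failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    let sep: number;
    while ((sep = buffer.indexOf('\n\n')) !== -1) {
      const frame = buffer.slice(0, sep).trim();
      buffer = buffer.slice(sep + 2);
      if (!frame.startsWith('data: ')) continue;
      const payload = frame.slice('data: '.length);
      if (payload === '[DONE]') return;              // same terminator as llm-infer.ts
      const chunk = JSON.parse(payload) as ChatChunk;
      if (chunk.type === 'text') process.stdout.write(chunk.delta);
      else if (chunk.type === 'error') throw new Error(chunk.message);
      else console.error(`[${chunk.type}]`, chunk);  // tool_call / tool_result / final
    }
  }
}
```

Called as `streamChat(baseUrl, 'reviewer', 'What changed since the last deploy?')`, it prints text deltas to stdout and tool activity to stderr, roughly mirroring the REPL's behavior.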

## Tool-use loop

When the agent's project has MCP servers attached, mcpd's `ChatService` lists each server's tools (via `mcp-proxy.service.ts` — the same path real MCP traffic uses) and presents them to the model namespaced as `<server>__<tool>`. On a `tool_calls` response the loop dispatches each call back through the same proxy, persists the assistant + tool turns linked by `toolCallId`, and loops (cap = 12 iterations) until the model returns terminal text. Persistence is **non-transactional across the loop** because tool calls can take minutes; long-held DB transactions would starve other writers.
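A simplified sketch of that loop's shape, assuming OpenAI-style `tool_calls`; `listProjectTools`, `callProviderChat`, `dispatchToolCall`, and `persistTurn` are hypothetical stand-ins for the real `ChatService` internals.

```ts
// Shape of the tool-use loop (a sketch). The declared helpers below are hypothetical
// stand-ins for ChatService internals, not real exports.
type ToolCall = { id: string; name: string; arguments: string };
type Message =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string; tool_calls?: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

declare function listProjectTools(): Promise<unknown[]>;
declare function callProviderChat(
  messages: Message[],
  tools: unknown[],
): Promise<{ content: string; tool_calls?: ToolCall[] }>;
declare function dispatchToolCall(call: ToolCall): Promise<unknown>;
declare function persistTurn(threadId: string, turn: Record<string, unknown>): Promise<void>;

const MAX_TOOL_ITERATIONS = 12;

async function runChatTurn(threadId: string, history: Message[]): Promise<string> {
  const tools = await listProjectTools(); // namespaced `<server>__<tool>` via the MCP proxy
  const messages = [...history];

  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const reply = await callProviderChat(messages, tools);

    if (!reply.tool_calls?.length) {
      // Terminal text: persist the final assistant turn and end the round.
      await persistTurn(threadId, { role: 'assistant', content: reply.content });
      return reply.content;
    }

    // Persist the assistant turn that requested the calls, then answer each call in order.
    await persistTurn(threadId, { role: 'assistant', content: reply.content, toolCalls: reply.tool_calls });
    messages.push({ role: 'assistant', content: reply.content, tool_calls: reply.tool_calls });

    for (const call of reply.tool_calls) {
      const result = await dispatchToolCall(call); // back through the same MCP proxy path
      const content = JSON.stringify(result);
      await persistTurn(threadId, { role: 'tool', toolCallId: call.id, content });
      messages.push({ role: 'tool', tool_call_id: call.id, content });
    }
    // No transaction spans the loop: each turn is written as it completes.
  }
  throw new Error('tool-use loop hit the 12-iteration cap without terminal text');
}
```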
## RBAC

Agents are their own resource (`agents`), independent of project bindings. Recommended:

- `view:agents` — list / describe
- `create:agents` / `edit:agents` / `delete:agents` — CRUD
- `run:agents:<name>` — drive a chat turn or manage its threads

Project-attached agents do not implicitly inherit project RBAC. If a project member should be able to chat with the project's agents, grant them `run:agents:<name>` (or the wildcard `run:agents`) explicitly.

## YAML round-trip

`mcpctl get agent foo -o yaml | mcpctl apply -f -` is a no-op. The `apply` schema also accepts shorthand:

```yaml
apiVersion: mcpctl.io/v1
kind: agent
metadata: { name: deployer }
spec:
  description: "I help you deploy code"
  llm: qwen3-thinking        # shorthand for `{ name: qwen3-thinking }`
  project: mcpctl-dev        # shorthand for `{ name: mcpctl-dev }`
  systemPrompt: |
    You are a deployment assistant for mcpctl.
    Always check fulldeploy.sh and the k8s context before suggesting actions.
  defaultParams:
    temperature: 0.2
    max_tokens: 4096
    top_p: 0.9
    stop: [""]
```

## Wiring against your in-cluster qwen3-thinking

The `kubernetes-deployment` repo provisions LiteLLM in the `nvidia-nim` namespace (`http://litellm.nvidia-nim.svc.cluster.local:4000/v1` in-cluster, `https://llm.ad.itaz.eu/v1` external) and a virtual key reserved for mcpctl in the Pulumi secret `secrets:litellmMcpctlGatewayToken`. Pulling it once:

```bash
cd /path/to/kubernetes-deployment
LITELLM_TOKEN=$(pulumi config get --stack homelab secrets:litellmMcpctlGatewayToken)
# fallback if Pulumi isn't authed locally:
# LITELLM_TOKEN=$(kubectl --context worker0-k8s0 -n nvidia-nim get secret litellm-secrets \
#   -o jsonpath='{.data.LITELLM_MCPCTL_GATEWAY_TOKEN}' | base64 -d)

cd /path/to/mcpctl
mcpctl create secret litellm-key --data "API_KEY=${LITELLM_TOKEN}"

mcpctl create llm qwen3-thinking \
  --type openai --model qwen3-thinking \
  --url http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
  --api-key-ref litellm-key/API_KEY \
  --description "Qwen3-30B-A3B-Thinking-FP8 via in-cluster vLLM behind LiteLLM"

mcpctl create agent reviewer \
  --llm qwen3-thinking \
  --description "I review what you're shipping; ask after each major change." \
  --default-temperature 0.2 --default-max-tokens 4096

mcpctl chat reviewer
```

## Troubleshooting

- **Namespace collision** in mcplocal: if a project has an upstream MCP server literally named `agent-<name>`, the agents plugin detects the collision in `onSessionCreate`, skips that agent's registration, and emits a `ctx.log.warn` line. Document the `agent-` prefix as reserved on real server names.
- **Llm-in-use blocks delete**: `Agent.llm` is `onDelete: Restrict`. Detach every agent (or delete them) before deleting the underlying Llm.
- **Stale `pending` rows**: a crash mid-loop leaves `pending` ChatMessage rows. The next request recovers — `markPendingAsError` flips them on the next failure path, and `loadHistory` filters out `error` rows when rebuilding context for the next turn.
- **`proxyModelName` is informational only** for agents. The agent's own internal tool loop runs server-side in mcpd and bypasses mcplocal's proxymodel pipeline entirely. Don't try to plumb it.
- **Anthropic + tools**: the Anthropic adapter currently drops `tool`-role messages and doesn't translate OpenAI `tool_calls` into Anthropic `tool_use` / `tool_result` blocks. Use an OpenAI-compatible provider (LiteLLM, vLLM, OpenAI) for agents that need tool calling until that translation lands.

## See also

- [personalities.md](./personalities.md) — named overlays of prompts on top of an agent. Same agent, different prompt bundles, picked per-turn via `--personality <name>` or `agent.defaultPersonality`.
- [chat.md](./chat.md) — `mcpctl chat` flow and LiteLLM-style flags.