feat(agents+chat): agents feature + live chat UX #57

Merged
michal merged 14 commits from feat/agents-and-chat-ux into main 2026-04-26 17:53:30 +00:00

Summary

  • Agents feature (Stages 1–6): Agent / ChatThread / ChatMessage Prisma schema, mcpd repos + services with tool-use loop, mcpd routes + RBAC, mcplocal agents plugin, `mcpctl chat` REPL + agent CRUD, smoke tests + README + docs.
  • Smoke env repair: closed 27 post-deploy smoke failures (rotator/auth/RBAC).
  • LLM auth probe: verify upstream credentials at `mcpctl create llm` time so misconfigured tokens fail fast (422) instead of at first chat.
  • Chat UX:
    • Surface `reasoning_content` from qwen3-thinking / deepseek-reasoner / o1 as dim+italic `thinking` chunks on stderr.
    • Live words/sec ticker pinned to a DECSTBM bottom-row status bar (no more mid-word smearing from cursor save/restore).
    • Final stats footer in the scroll region for copy-pasted transcripts.
    • Startup banner showing agent + LLM + project + effective system prompt + active session overrides.
  • `--system`, `--system-file`, `--system-append` flags layered on top of `agent.systemPrompt`.

Test plan

  • [x] Unit: `pnpm --filter @mcpctl/cli exec vitest run` (430/430)
  • [x] Unit: `pnpm --filter @mcpctl/mcpd exec vitest run`
  • [x] Smoke: full pipeline via `bash fulldeploy.sh` (18/18 files, 123/123 tests)
  • [ ] Manual TTY: REPL banner visible, status bar pins to bottom row during streaming, terminal cleans up on /quit and Ctrl-C
  • [ ] Pipe sanity: `mcpctl chat reviewer -m hi 2>&1 | tee /tmp/chat.log` — no escape leaks
michal added 14 commits 2026-04-26 17:53:05 +00:00
Introduces the persistence layer for the upcoming Agent feature: an LLM
persona pinned to a specific Llm, optionally attached to a Project, with
persisted chat threads/messages so conversations survive REPL exits.

Constraint shape:
- Agent.llm uses ON DELETE RESTRICT — deleting an Llm in active use fails.
- Agent.project uses ON DELETE SET NULL — agents survive project deletion.
- ChatThread → ChatMessage cascade so deleting an agent purges its history.
- ChatMessage @@unique([threadId, turnIndex]) gives append ordering even
  under racing writers (services retry on collision).
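
A minimal sketch of that append path (the helper and the role/content
fields are illustrative, not the actual repo API):

  import { Prisma, PrismaClient } from '@prisma/client';

  const db = new PrismaClient();

  // Claim the next turnIndex; when a racing writer wins the
  // @@unique([threadId, turnIndex]) slot first, recompute and retry.
  async function appendMessage(threadId: string, role: string, content: string) {
    for (let attempt = 0; attempt < 5; attempt++) {
      const last = await db.chatMessage.findFirst({
        where: { threadId },
        orderBy: { turnIndex: 'desc' },
        select: { turnIndex: true },
      });
      try {
        return await db.chatMessage.create({
          data: { threadId, role, content, turnIndex: (last?.turnIndex ?? -1) + 1 },
        });
      } catch (err) {
        // P2002 = Prisma's unique-constraint violation code.
        if (err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002') continue;
        throw err;
      }
    }
    throw new Error(`appendMessage: gave up after 5 collisions on thread ${threadId}`);
  }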

LiteLLM-style per-call overrides will live in Agent.defaultParams (Json);
the loose extras Json field is reserved for future LoRA/tool-allowlist work.

Pinned vitest fileParallelism=false in @mcpctl/db: all suites share the
same Postgres, and adding a second suite exposed FK contention between a
clearAllTables in one file and a create in another. Per-test isolation
still comes from beforeEach.
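
The pin itself is one line of vitest config (sketch; the exact file
layout in @mcpctl/db may differ):

  // vitest.config.ts — run test files sequentially; per-test isolation
  // inside each file still comes from beforeEach.
  import { defineConfig } from 'vitest/config';

  export default defineConfig({
    test: { fileParallelism: false },
  });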

Tests: 8/8 green in src/db/tests/agent-schema.test.ts (defaults, name
uniqueness, llm-in-use Restrict, project-delete SetNull, agent-delete
cascade, duplicate (threadId, turnIndex) blocked, tool-call payload
round-trip, lastTurnAt DESC ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Layers the persistence-side logic on top of the Stage 1 schema. AgentService
mirrors LlmService's CRUD shape with name-resolved llm/project references and
yaml round-trip support; ChatService is the orchestrator that drives one chat
turn end-to-end: build the merged system block (agent.systemPrompt + project
Prompts ordered by priority desc + per-call systemAppend), persist the user
turn, run the adapter, dispatch any tool_calls through an injected
ChatToolDispatcher, persist tool turns linked back via toolCallId, and loop
until the model returns terminal text.

Per-call params resolve LiteLLM-style: request body → agent.defaultParams →
adapter default. The escape hatch `extra` is forwarded as-is so each adapter
can cherry-pick provider-specific knobs (Anthropic metadata, vLLM
repetition_penalty, etc.) without code changes here.
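
The precedence is plain spread order; roughly (the type is a stand-in
for the real params shape):

  type ChatParams = { temperature?: number; top_p?: number; max_tokens?: number };

  // Later spreads win: adapter default < agent.defaultParams < request
  // body. Keys absent from the parsed body simply fall through.
  function resolveParams(
    adapterDefaults: ChatParams,
    agentDefaults: ChatParams,
    body: ChatParams,
  ): ChatParams {
    return { ...adapterDefaults, ...agentDefaults, ...body };
  }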

Persistence is non-transactional across the loop because tool calls can take
minutes; long-held DB transactions would starve other writers. Instead each
in-flight assistant turn is written `pending` and flipped to `complete` only
after its tool results land. On failure or max-iter overrun, every `pending`
row in the thread is flipped to `error` so the trail is auditable.

Tools are namespaced on the wire as `<server>__<tool>`, unmarshalled at
dispatch time; `tools_allowlist` filters before the model sees the list.
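
One plausible shape for that round-trip (helper names are illustrative):

  const SEP = '__';

  const namespaceTool = (server: string, tool: string) => `${server}${SEP}${tool}`;

  // Split on the FIRST separator so tool names that themselves contain
  // '__' survive the round-trip.
  function unmarshalTool(wireName: string): { server: string; tool: string } {
    const i = wireName.indexOf(SEP);
    if (i < 0) throw new Error(`not a namespaced tool name: ${wireName}`);
    return { server: wireName.slice(0, i), tool: wireName.slice(i + SEP.length) };
  }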

Tests:
  agent-service.test.ts (7) — CRUD with name-resolved llm/project, conflict
    on duplicate, llm switch, project detach, listByProject filtering,
    upsertByName branch coverage.
  chat-service.test.ts (9) — plain text turn, full text→tool→text loop with
    toolCallId linkage, max-iter cap leaves zero pending, adapter-throws
    leaves zero pending, body→defaultParams merge, `extra` passthrough,
    project-Prompt priority ordering in the system block, tool-without-
    project rejection, tools_allowlist filtering.

All 16 green; full mcpd suite still 737/737.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the Stage 2 services into HTTP. New routes:

  GET    /api/v1/agents                — list
  GET    /api/v1/agents/:idOrName       — describe
  POST   /api/v1/agents                 — create
  PUT    /api/v1/agents/:idOrName       — update
  DELETE /api/v1/agents/:idOrName       — delete
  GET    /api/v1/projects/:p/agents     — project-scoped list (mcplocal discovery)
  POST   /api/v1/agents/:name/chat      — chat (non-streaming or SSE stream)
  POST   /api/v1/agents/:name/threads   — create thread explicitly
  GET    /api/v1/agents/:name/threads   — list threads
  GET    /api/v1/threads/:id/messages   — replay history

The chat endpoint reuses the SSE pattern from llm-infer.ts (same headers
incl. X-Accel-Buffering:no, same `data: …\n\n` framing, same `[DONE]`
terminator). Each ChatService chunk is one frame. Non-streaming returns
{threadId, assistant, turnIndex} as JSON.
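
The frame shape, roughly (a sketch against Fastify's raw response, not
the actual llm-infer.ts code):

  import type { FastifyReply } from 'fastify';

  function startSse(reply: FastifyReply) {
    reply.raw.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'X-Accel-Buffering': 'no', // keep nginx from buffering the stream
    });
  }

  function writeSseChunk(reply: FastifyReply, chunk: unknown) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`); // one ChatService chunk per frame
  }

  function endSse(reply: FastifyReply) {
    reply.raw.write('data: [DONE]\n\n'); // terminator the client watches for
    reply.raw.end();
  }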

RBAC mapping in main.ts:mapUrlToPermission:
  - /agents/:name/{chat,threads*}  → run:agents:<name>
  - /threads/:id/*                 → view:agents (service-level owner check
    handles fine-grained access since the URL doesn't carry the agent name)
  - /agents and /agents/:idOrName  → default {GET:view, POST:create,
    PUT:edit, DELETE:delete} on resource 'agents'.
'agents' added to nameResolvers so RBAC's CUID→name lookup works.
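
Approximately (an illustrative sketch, not the actual mapUrlToPermission
body):

  function mapAgentUrl(method: string, path: string): string | null {
    const agentOp = path.match(/^\/api\/v1\/agents\/([^/]+)\/(chat|threads)/);
    if (agentOp) return `run:agents:${agentOp[1]}`;

    if (path.startsWith('/api/v1/threads/')) return 'view:agents'; // owner check in the service

    if (/^\/api\/v1\/agents(\/|$)/.test(path)) {
      const verbs: Record<string, string> = { GET: 'view', POST: 'create', PUT: 'edit', DELETE: 'delete' };
      return verbs[method] ? `${verbs[method]}:agents` : null;
    }
    return null;
  }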

ChatToolDispatcherImpl bridges ChatService to McpProxyService: it lists a
project's MCP servers, fans out tools/list calls to each, namespaces tool
names as `<server>__<tool>`, and routes tools/call back to the right
serverId on dispatch. tools/list errors on a single server are logged and
that server's tools are dropped from the turn's tool surface — one bad
server doesn't poison the whole list.

Tests:
  agent-routes.test.ts (15) — full HTTP CRUD round-trip, 404/409 paths,
    project-scoped list, non-streaming + SSE chat, thread create/list,
    /threads/:id/messages replay, body-required 400.
  chat-tool-dispatcher.test.ts (7) — empty list when no project / no
    servers, namespacing + inputSchema forwarding, partial-failure
    skipping with audit log, callTool dispatch shape, missing-server
    rejection, JSON-RPC error surfacing.

All 22 new green; mcpd suite now 759/759 (was 737).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a Claude (or any other MCP client) connects to a project's mcplocal
endpoint, every Agent attached to that project now appears in the
session's tools/list as a virtual MCP server named `agent-<agentName>`
with one tool `chat`. Calling that tool POSTs to the Stage 3 chat
endpoint and returns the assistant's reply as MCP content. The tool's
description is the agent's own description, so connecting clients see
prose like "I review security design — ask me after each major change."
This is what makes one agent reachable from another's MCP session.

Plumbing:
  * src/mcplocal/src/proxymodel/plugins/agents.ts (new) — the plugin.
    onSessionCreate fetches /api/v1/projects/:p/agents via mcpd, then
    registers a VirtualServer per agent. The chat tool's inputSchema
    mirrors the LiteLLM-style override surface (temperature, top_p,
    top_k, max_tokens, stop, seed, tools_allowlist, extra) plus
    threadId for follow-ups. Namespace collision with an existing
    upstream MCP server named `agent-<x>` is detected and skipped with
    a `ctx.log.warn` line — better to surface the conflict than to
    silently shadow real tool entries in the virtualTools map.
  * src/mcplocal/src/proxymodel/plugins/compose.ts (new) — generic
    N-plugin composition helper. Lifecycle hooks fan out in order;
    transform hooks (onToolsList, onResourcesList, onPromptsList,
    onToolCallAfter) pipeline; intercept hooks (onToolCallBefore,
    onResourceRead, onPromptGet, onInitialize) short-circuit on the
    first non-null. Generalizes what createDefaultPlugin does for
    two fixed parents; see the sketch after this list.
  * src/mcplocal/src/http/project-mcp-endpoint.ts — every project
    session now uses composePlugins([defaultPlugin, agentsPlugin]) so
    agents show up no matter which proxymodel the project is on.
  * Plugin context: added getFromMcpd(path) alongside postToMcpd. The
    existing postToMcpd was hard-coded to POST; the agents plugin
    needs GET to discover. Wired through plugin.ts → plugin-context.ts
    → router.ts.
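
The composition semantics as a toy (hook list truncated; the plugin type
is an assumption, not mcplocal's actual interface):

  type Tool = { name: string };
  type ToolResult = { content: unknown };

  type ProxyPlugin = {
    onSessionCreate?: (ctx: unknown) => Promise<void>;                              // lifecycle
    onToolsList?: (tools: Tool[]) => Promise<Tool[]>;                               // transform
    onToolCallBefore?: (name: string, args: unknown) => Promise<ToolResult | null>; // intercept
  };

  function composePlugins(plugins: ProxyPlugin[]): ProxyPlugin {
    if (plugins.length === 0) throw new Error('composePlugins: need at least one plugin');
    return {
      // Lifecycle hooks fan out in order.
      async onSessionCreate(ctx) {
        for (const p of plugins) await p.onSessionCreate?.(ctx);
      },
      // Transform hooks pipeline: each plugin sees the previous output.
      async onToolsList(tools) {
        let acc = tools;
        for (const p of plugins) acc = (await p.onToolsList?.(acc)) ?? acc;
        return acc;
      },
      // Intercept hooks short-circuit on the first non-null answer.
      async onToolCallBefore(name, args) {
        for (const p of plugins) {
          const hit = await p.onToolCallBefore?.(name, args);
          if (hit != null) return hit;
        }
        return null;
      },
    };
  }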

Tests:
  plugin-agents.test.ts (8) — registers per agent, falls back to a
    generic description, skips on namespace collision, no-ops with
    zero agents, logs and continues on mcpd error, chat handler
    POSTs correct body and returns content array, isError surfacing
    on mcpd error, onSessionDestroy unregisters everything.
  plugin-compose.test.ts (6) — single-plugin pass-through, empty
    rejection, lifecycle ordering, intercept short-circuit, list
    pipeline, no-op composition stays minimal.

mcplocal suite: 715/715. mcpd suite still 759/759.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the moment the user can actually talk to an agent end-to-end:

  mcpctl create llm qwen3-thinking --type openai --model qwen3-thinking \
    --url http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
    --api-key-ref litellm-key/API_KEY
  mcpctl create agent reviewer --llm qwen3-thinking --project mcpctl-dev \
    --description "I review security design — ask me after each major change."
  mcpctl chat reviewer

Pieces:

* src/cli/src/commands/chat.ts (new) — REPL + one-shot. Streams the SSE
  endpoint and prints text deltas to stdout as they arrive; tool_call /
  tool_result events go to stderr in dim-style brackets so the chat
  output stays clean. LiteLLM-style flags (--temperature / --top-p /
  --top-k / --max-tokens / --seed / --stop / --allow-tool / --extra)
  layer over agent.defaultParams. In-REPL slash-commands: /set KEY VAL,
  /system <text>, /tools (list project's MCP servers), /clear (new
  thread), /save (PATCH agent.defaultParams = current overrides),
  /quit.

* src/cli/src/commands/create.ts — `create agent` mirroring the llm
  pattern. Every yaml-applyable field has a corresponding flag (memory
  rule); --default-temperature / --default-top-p / --default-top-k /
  --default-max-tokens / --default-seed / --default-stop /
  --default-extra / --default-params-file all populate agent.defaultParams.

* src/cli/src/commands/apply.ts — AgentSpecSchema accepts both `llm:
  qwen3-thinking` shorthand and `llm: { name: ... }` long form; runs
  after llms in the apply order so apiKey/llm references resolve. Round-
  trips with `get agent foo -o yaml | apply -f -` (memory rule).

* src/cli/src/commands/get.ts — agentColumns (NAME, LLM, PROJECT,
  DESCRIPTION, ID); RESOURCE_KIND mapping for yaml export.

* src/cli/src/commands/shared.ts — `agent`/`agents`/`thread`/`threads`
  added to RESOURCE_ALIASES.

* src/cli/src/index.ts — wires createChatCommand into the program; passes
  the resolved baseUrl + token so chat can stream SSE without going
  through ApiClient (which only does buffered request/response).

* completions/mcpctl.{fish,bash} regenerated. scripts/generate-completions.ts
  knows about agents (canonical + aliases) and emits a special-case
  `chat)` block that completes the first arg with `mcpctl get agents`
  names. tests/completions.test.ts: +9 new assertions covering agents in
  the resource list, chat in the commands list, --llm flag for create
  agent, agent-name completion for chat, etc.

CLI suite: 430/430 (was 421). Completions --check is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the agents feature.

Smoke tests (run via `pnpm test:smoke` against a live mcpd at
$MCPD_URL, default https://mcpctl.ad.itaz.eu):

* tests/smoke/agent.smoke.test.ts — full CRUD round-trip:
  create secret + Llm + agent with sampling defaults; `get agents`
  surfaces it; `get agent foo -o yaml | apply -f` round-trips
  identically; create + list a thread via the HTTP API; agent delete
  leaves Llm + secret intact (Restrict + SetNull as designed). Self-
  skips with a warning when /healthz is unreachable.

* tests/smoke/agent-chat.smoke.test.ts — gated on
  MCPCTL_SMOKE_LLM_URL + MCPCTL_SMOKE_LLM_KEY. Provisions secret +
  Llm + agent against a real upstream, runs `mcpctl chat -m …
  --no-stream` (asserts a reply lands), then runs the streaming default
  (asserts text on stdout + `(thread: …)` on stderr). The fast path
  for verifying the in-cluster qwen3-thinking deployment:

      MCPCTL_SMOKE_LLM_URL=http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
      MCPCTL_SMOKE_LLM_MODEL=qwen3-thinking \
      MCPCTL_SMOKE_LLM_KEY=$(pulumi config get --stack homelab \
        secrets:litellmMcpctlGatewayToken) \
        pnpm test:smoke

Docs:

* README.md — new "Agents" section under Resources with the
  qwen3-thinking quickstart and links to docs/agents.md and
  docs/chat.md. Adds llm + agent rows to the resources table.

* docs/agents.md (new) — full reference: data model, chat-parameter
  table, HTTP API, RBAC mapping, tool-use loop semantics, yaml
  round-trip shorthand, the kubernetes-deployment wiring recipe,
  and a troubleshooting section (namespace collision, llm-in-use,
  pending-row recovery, Anthropic-tool limitation).

* docs/chat.md (new) — user-facing `mcpctl chat` walkthrough:
  modes, per-call flags, slash-commands, threads, and a
  troubleshooting section.

* CLAUDE.md — adds a "Resource types" cheatsheet with one-line
  pointers to each, including the new `agent` row that links to
  the docs.

All suites still green: mcpd 759/759, mcplocal 715/715, cli 430/430.
Smoke tests typecheck and self-skip when no live mcpd is reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream OpenBao re-init caused 27 post-deploy smoke failures

This commit lands the durable side of the post-deploy investigation:
genuine bugs that let the upstream OpenBao re-init silently break every
secret write for 4 days, plus test-code bugs that masked the same
breakage in the smoke output.

mcpd — fail loud on dead OpenBao tokens
=======================================
secret-backend-rotator.service.ts
  When `mintRoleToken` or `lookupSelf` returns 403/401, classify it as
  BACKEND_TOKEN_DEAD (likely cause: upstream OpenBao re-init invalidated
  every pre-existing token), wrap the thrown error with explicit
  remediation (mint via root + `mcpctl create secret <name> --data
  <key>=<token> --force`), persist the same message to
  tokenMeta.lastRotationError, and emit a structured `level:fatal`
  console.error so it shows up in `kubectl logs deploy/mcpd` with grep-
  friendly `kind:BACKEND_TOKEN_DEAD`. Adds a `healthCheck(backendId)`
  method that runs lookup-self without minting — so the boot-time loop
  can detect the dead-token state immediately, not 24 hours later.

secret-backend-rotator-loop.ts
  Boot-time health check: in `start()`, for every rotatable backend, call
  `rotator.healthCheck(b.id)` and on failure log a structured fatal entry.
  This converts the prior silent failure mode (24h wait until scheduled
  rotation reveals the dead token, with secret writes failing under it
  the entire time) into "mcpd boots, immediately sees the dead token,
  alerts loudly". Existing isOverdue path is unchanged.

mcpd — Prisma userId crash on /me
=================================
routes/auth.ts
  GET /api/v1/auth/me used `request.userId!` which lied: an authenticated
  McpToken bearer satisfies the auth middleware but has no associated
  User row, so userId stayed undefined and `findUnique({ id: undefined })`
  threw PrismaClientValidationError. Now returns 401 with a clear
  "service-account/token-bound principal cannot be queried via /me"
  message instead of bubbling a 500.

mcplocal — token revocation propagation
=======================================
http/token-auth.ts
  Lowered default introspection positiveTtl from 30s → 5s. mcpd's
  introspection endpoint is a single DB lookup; the cache only protects
  against burst restart storms, not steady-state load. The 30s window
  let revoked tokens keep working for the full window after revocation
  (caught by mcptoken.smoke's negative-cache assertion). Aligns with the
  existing 5s negativeTtl and the test's `wait 7s after revoke` expectation.

smoke tests — read URL the same way the CLI does
================================================
mcp-client.ts
  Adds `loadMcpdAuth()`: URL from `~/.mcpctl/config.json`, token from
  `~/.mcpctl/credentials`. Critically, the URL does NOT come from
  credentials. credentials.mcpdUrl is a stale legacy field that goes
  out of sync (left over from old `mcpctl login
  --mcpd-url localhost:3xxx` invocations) — tests reading it ended up
  hitting whatever URL the user last logged into rather than the URL
  the CLI is actually using right now. audit/security/system-prompts
  smoke now use loadMcpdAuth(), eliminating ~10 cascade failures.
  Also: switch httpRequest to https.request when scheme is https
  (matching audit/security/system-prompts/mcp-client/agent helpers).
  Bumps default callTool timeout from 30s → 60s; many tools that fetch
  external resources routinely run 10-30s.

agent.smoke.test.ts
  - readToken read from `credentials.json`; the file is `credentials`
    (no extension). Caused 401 on POST /threads.
  - `mcpctl get <resource> <name> -o json` returns an array, not a bare
    object. Round-trip yaml test now indexes [0] before reading
    description.

secretbackend.smoke.test.ts
  Two genuine assertion-drift fixes (env was right, test was stale):
  - "lists at least one secretbackend": stop hard-coding the default
    backend type as 'plaintext'; the invariant is "exactly one default
    exists". The seeded plaintext is the bootstrap default but operators
    routinely promote a remote backend (openbao etc.) once it's healthy.
  - "refuses to delete the seeded default": widen the regex from
    /default|in use|cannot delete/ to also accept "referenced" — the
    exact wording has shifted to "is still referenced by N secret(s);
    migrate them first".

audit.test.ts / system-prompts.test.ts / security.test.ts
  Switch http.request → https.request when URL is https (each had its
  own copy of the helper). Drop the now-orphan loadMcpdCredentials in
  favour of loadMcpdAuth from mcp-client.ts.

Tests
=====
mcpd 759/759, mcplocal 715/715 unit suites still green. Smoke (live):
  Run 1 (pre-commit, post bao-token rotation):  27 → 12 failures.
  Run 2 (after fixes-batch, pre-redeploy):      12 →  2 failures.
The remaining 2 (mcptoken cache TTL, proxy-pipeline timeout) are what
the durable code changes here address; verify after the next redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to e51b924. The middleware default in token-auth.ts is 5s, but
serve.ts wraps the construction with its own env-fallback default of
30000ms — so when MCPLOCAL_TOKEN_POSITIVE_TTL_MS isn't set in the
environment, serve.ts always wins and revoked tokens still propagate
slowly. Lowered serve.ts to 5s for symmetry; operators wanting a longer
window can set the env var explicitly.
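
The footgun in miniature (variable names illustrative):

  // token-auth.ts defaults with `opts.positiveTtlMs ?? 5_000` — but
  // serve.ts always passed a value, so that fallback never fired:
  const before = Number(process.env.MCPLOCAL_TOKEN_POSITIVE_TTL_MS ?? 30_000); // wrapper wins: 30s
  const after  = Number(process.env.MCPLOCAL_TOKEN_POSITIVE_TTL_MS ?? 5_000);  // aligned: 5s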

Caught by mcptoken.smoke continuing to fail after the previous redeploy:
verified the token-auth.js shipped with `?? 5_000`, but the wrapper was
overriding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 — thread reads now enforce ownership
========================================
chat.service.ts / routes/agent-chat.ts
  GET /api/v1/threads/:id/messages was previously RBAC-mapped to
  view:agents (no resourceName scope) with the route comment promising
  "service-level owner check enforces fine-grained access" — but the
  service didn't actually check. Any caller with view:agents could read
  another user's thread by guessing/learning the threadId. CUIDs are
  hard to brute-force but they leak: SSE `final` chunks, agents-plugin
  `_meta.threadId`, and several response bodies surface them. Now
  ChatService.listMessages(threadId, ownerId) loads the thread, returns
  404 (not 403, to avoid id-enumeration via differential status codes)
  if ownerId doesn't match. Regression test in chat-service.test.ts
  covers Alice/Bob isolation + nonexistent-thread same-shape 404.
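
Sketch of the check (repo access and the ownerId field are assumed):

  import { PrismaClient } from '@prisma/client';

  class NotFoundError extends Error {}

  async function listMessages(db: PrismaClient, threadId: string, ownerId: string) {
    const thread = await db.chatThread.findUnique({ where: { id: threadId } });
    // Same 404 shape for "no such thread" and "not your thread", so a
    // guessed id can't be confirmed via differential status codes.
    if (!thread || thread.ownerId !== ownerId) throw new NotFoundError('thread not found');
    return db.chatMessage.findMany({ where: { threadId }, orderBy: { turnIndex: 'asc' } });
  }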

P2 — AgentChatRequestSchema strict mode
========================================
validation/agent.schema.ts
  `.merge()` does NOT inherit `.strict()` from AgentChatParamsSchema.
  Typo'd fields (e.g. `temprature`) silently fell through and the agent
  silently used the default — debuggable only by reading the LLM call
  payload. Re-applied `.strict()` on the merged schema.
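
Minimal repro (schemas reduced to one field each):

  import { z } from 'zod';

  const AgentChatParamsSchema = z.object({ temperature: z.number().optional() }).strict();
  const base = z.object({ message: z.string() });

  const loose  = AgentChatParamsSchema.merge(base);          // NOT strict: `temprature` is silently stripped
  const strict = AgentChatParamsSchema.merge(base).strict(); // typos now fail validation

  strict.parse({ message: 'hi', temprature: 0.2 }); // throws ZodError: unrecognized key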

P2 — per-agent maxIterations override + clamp
==============================================
chat.service.ts
  Loop cap was a hard-coded module constant (12), wrong for both
  research-style agents (need higher) and cheap-probe agents (could opt
  lower). Now reads `agent.extras.maxIterations`, clamps 1..50, falls
  back to 12 default. The clamp is the soft-DoS guard: a hostile agent
  definition with `maxIterations:1000000` can't burn unbounded LLM calls
  per request. Both chat() and chatStream() use ctx.maxIterations now.
  Regression test covers low-cap override (rejects with `exceeded 2`)
  and hostile-value clamp (rejects with `exceeded 50`).
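
The resolution, roughly:

  const DEFAULT_MAX_ITERATIONS = 12;

  // Per-agent override with a hard clamp: hostile values like 1_000_000
  // collapse to 50, junk falls back to the default.
  function resolveMaxIterations(extras?: { maxIterations?: unknown }): number {
    const raw = Number(extras?.maxIterations);
    if (!Number.isFinite(raw)) return DEFAULT_MAX_ITERATIONS;
    return Math.min(50, Math.max(1, Math.trunc(raw)));
  }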

P3 — SSE write to closed socket
================================
routes/agent-chat.ts
  When the upstream adapter throws after some chunks were already
  written AND the client disconnected, the catch block tried to flush
  more chunks to a closed socket. Without an `on('error')` handler
  Node emits unhandled error events; once Pino is wired to alerts
  this'd page on every disconnect-mid-stream. writeSseChunk now
  checks `reply.raw.destroyed || writableEnded` before write.
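
The guard, in the same writeSseChunk shape as the Stage 3 notes (sketch):

  import type { FastifyReply } from 'fastify';

  function writeSseChunk(reply: FastifyReply, chunk: unknown): boolean {
    // Client already gone: drop the frame instead of writing to a dead
    // socket and tripping an unhandled 'error' event.
    if (reply.raw.destroyed || reply.raw.writableEnded) return false;
    return reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }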

P3 — BACKEND_TOKEN_DEAD preserves original stack
=================================================
services/secret-backend-rotator.service.ts
  When wrapping mintRoleToken/lookupSelf failures as
  BACKEND_TOKEN_DEAD, the new Error() discarded the original throw —
  hard to tell whether the inner failure was a network blip vs an
  OpenBao API mismatch vs DNS. Now uses `new Error(msg, { cause: err })`
  so the inner stack survives.

P3 — .gitignore .claude/scheduled_tasks.lock
=============================================
This persisted state file was leaking into every `git status`.

Tests
=====
mcpd 761/761 (+2 regression tests). mcplocal 715/715. cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mcpd now runs a cheap auth probe whenever an Llm is created (or its
apiKeyRef/url is updated). Catches misconfigured tokens / wrong URLs at
registration with a 422 + structured error message, instead of silently
500-ing on first chat with a generic "fetch failed". Caught in the wild
today: the homelab Pulumi config exposed `MCPCTL_GATEWAY_TOKEN` (which
is mcpctl_pat_-prefixed, intended for LiteLLM→mcplocal direction) where
LiteLLM expects `LITELLM_MASTER_KEY` (sk-prefixed). The probe makes
this immediate.

Probe shape (LlmAdapter.verifyAuth):
  - OpenAI passthrough → GET <url>/v1/models. Cheap, idempotent, gated
    by the same auth as chat/completions.
  - Anthropic → POST /v1/messages with max_tokens:1, "ping". Anthropic
    has no list-models endpoint; this is the cheapest auth-exercising
    call.
  - Returns one of:
      { ok: true }
      { ok: false, reason: "auth", status, body }    — 401/403, fail hard
      { ok: false, reason: "unreachable", error }    — network, warn-only
      { ok: false, reason: "unexpected", status, body } — non-auth 4xx, warn-only

Behavior:
  - LlmService.create()/update() runs the probe after resolveApiKey.
    Throws LlmAuthVerificationError on `auth`, logs warn for
    unreachable/unexpected, swallows for offline registration.
  - Probe is skipped when there's no apiKeyRef (nothing to verify) or
    when the caller passes skipAuthCheck=true.
  - update() probes only when apiKeyRef OR url changes — pure
    description/tier updates don't trigger upstream calls.
  - Routes catch LlmAuthVerificationError and return 422 with
    `{ error, status }`. The CLI surfaces the message verbatim via
    ApiError.

Opt-out:
  - CLI: `mcpctl create llm ... --skip-auth-check` for offline
    registration before the upstream is reachable.
  - HTTP: side-channel body field `_skipAuthCheck: true` (stripped
    before validation, never persisted on the row).

Side fix in same commit (caught while testing): src/cli/src/index.ts
read `program.opts()` BEFORE `program.parse()`, so `--direct` was a
no-op for ApiClient — every command went to mcplocal regardless. Some
commands accidentally still worked because mcplocal forwards plain
`/api/v1/*` to mcpd, but flows that need direct SSE streaming (e.g.
`mcpctl chat`) couldn't reach mcpd. Fixed by peeking at process.argv
directly for the two global flags before Commander's parse runs.
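
The fix is a pre-parse peek (sketch):

  // Read the global flag straight off argv before Commander parses, so
  // the client can be constructed with it already applied (the second
  // global flag is handled the same way).
  const direct = process.argv.includes('--direct');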

Tests:
  - llm-adapters.test.ts (+8): OpenAI 200/401/403/404/network, Anthropic
    200/401/400 (typo'd model = unexpected, NOT auth — registration
    shouldn't block on bad model names that surface at chat time).
  - llm-service.test.ts (+6): create-throws-on-auth-fail (no row
    written), warn-only on unreachable/unexpected, skipAuthCheck
    bypass, no-key skip, update-only-probes-on-auth-affecting-change.

mcpd 775/775, mcplocal 715/715, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reasoning models (qwen3-thinking, deepseek-reasoner, OpenAI o1 family) emit
their scratchpad as `delta.reasoning_content` (or `delta.reasoning`,
or `delta.provider_specific_fields.reasoning_content` when LiteLLM passes
through from vLLM) — separate from `delta.content`. Before this commit
mcpd's parseStreamingChunk only watched `content`, so the model's 30-90s
reasoning phase looked like dead air to the REPL: streaming connection
open, no chunks, no progress. Caught during the agents-feature shakedown
when qwen3-thinking sat silent for 90s on a docmost__list_pages call.

mcpd
====
chat.service.ts
  - parseStreamingChunk extracts a `reasoningDelta` from the chunk body,
    accepting all four spellings (reasoning_content / reasoning /
    provider_specific_fields.{reasoning_content,reasoning}). Future
    providers can add their own field names by extending the
    fallback chain (sketched after this list).
  - chatStream yields `{ type: 'thinking', delta }` chunks as reasoning
    arrives, alongside the existing `{ type: 'text', delta }` for content.
  - Reasoning is intentionally NOT persisted to the thread. It's the
    model's scratchpad, not part of the conversation. Subsequent turns
    don't see it.
  - Adds 'thinking' to the ChatStreamChunk.type union.
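
The fallback chain, sketched (delta shape per the spellings above):

  type Delta = {
    content?: string;
    reasoning_content?: string;
    reasoning?: string;
    provider_specific_fields?: { reasoning_content?: string; reasoning?: string };
  };

  function extractReasoningDelta(delta: Delta): string | undefined {
    return (
      delta.reasoning_content ??
      delta.reasoning ??
      delta.provider_specific_fields?.reasoning_content ??
      delta.provider_specific_fields?.reasoning
    );
  }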

CLI
===
chat.ts
  - streamOnce handles 'thinking' chunks: writes them dim+italic to
    stderr (ANSI 2;3m) so the model's reasoning visually flows like a
    quote block while the final answer streams to stdout. Plain text
    when stderr isn't a TTY (pipe to file → no escape codes leak).
  - chatRequestNonStream replaces the shared ApiClient.post() for the
    --no-stream path. ApiClient defaults to a 10s timeout, way too tight
    for any chat that calls a tool: LLM round + tool dispatch + LLM
    summary easily exceeds 10s. The new helper uses the same 600s timeout
    the streaming path has been using all along.

Tests:
  chat-service.test.ts (+2):
    - reasoning_content deltas surface as `thinking` chunks (not text);
      reasoning is NOT persisted to the assistant turn's content.
    - LiteLLM's provider_specific_fields.reasoning_content shape parses
      identically to the vendor-native shape.

mcpd 777/777, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
While streaming, the REPL now shows a live word/sec counter on a status
line one row below the cursor — refreshes every 250ms via ANSI cursor
save+restore so it floats with the content as the response grows.
After each response, a dim stats footer prints on stderr:

  (47w · 12.3 w/s · 3.9s | thinking 234w · 38 w/s · 6.2s)

The ticker is stderr-only and only emits when stderr is a TTY — pipes
to a file stay clean for grepping/redirect. Words are whitespace-
separated tokens (good enough across English/code/Markdown without a
tokenizer dependency; CJK under-counts but the rate is still
directional).
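
That is, roughly:

  // Whitespace-separated word count: tokenizer-free and cheap enough to
  // run on every 250ms tick.
  const countWords = (s: string) => s.split(/\s+/).filter(Boolean).length;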

Both phases tracked separately:
  - thinking: reasoning_content from qwen3-thinking / deepseek-reasoner
    / o1, where the model's scratchpad is the long part
  - content: the actual assistant answer

Final stats also added to the --no-stream path: total HTTP duration
and word count, since we don't get per-token timing there.

CLI suite still 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous ticker used cursor save/restore (\x1b[s / \x1b[u) to draw
a stats line one row below the cursor. Save/restore is unreliable when
content scrolls or wraps — the saved row drifts off the visible area
and the restore lands inside content lines, smearing the ticker into
mid-word positions:

  Here are the available tools you can
  ⏵ 7w · 56.5 w/s · 0.1s | thinking 41 use with Docmost:6s

Replace it with a DECSTBM scroll region. Lock the bottom row, scroll
rows 1..N-1 for content, redraw the locked row in place every 250 ms.
This is how htop / tig / mosh pin their status footers — content and
status physically can't overlap.

Lifecycle: install once per chat-session (REPL or one-shot), tear down
on close / Ctrl-D / /quit / SIGINT / SIGTERM / uncaughtException. Pipes
and small terminals (<5 rows) get a no-op StatusBar so output stays
clean. Resize re-emits the scroll region with the new height.
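
The escape-sequence skeleton (DECSTBM and friends are standard
VT100/xterm; the surrounding structure is illustrative):

  function installStatusBar(out: NodeJS.WriteStream = process.stderr) {
    const rows = out.rows ?? 0;
    if (!out.isTTY || rows < 5) {
      return { draw: (_: string) => {}, teardown: () => {} }; // no-op for pipes / tiny terminals
    }
    out.write(`\x1b[1;${rows - 1}r`); // DECSTBM: confine scrolling to rows 1..N-1
    out.write(`\x1b[${rows - 1};1H`); // park the cursor inside the scroll region
    return {
      draw(text: string) {
        // Save cursor, repaint the locked bottom row, restore. Safe here:
        // nothing scrolls between save and restore in one synchronous call.
        out.write(`\x1b7\x1b[${rows};1H\x1b[2K${text}\x1b8`);
      },
      teardown() {
        out.write('\x1b[r');                 // reset the scroll region
        out.write(`\x1b[${rows};1H\x1b[2K`); // wipe the stale status row
      },
    };
  }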

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(chat): print agent + system prompt banner at chat start
When you launch `mcpctl chat <agent>` it's not always obvious which
agent, LLM, project, or system prompt you're actually wired to,
especially when --system / --system-append flags are layered on top
of the agent's defaults. The session would just start at `> ` with
no confirmation of the configuration.

Now both REPL and one-shot modes print a banner to stderr listing:
  - agent name + description
  - LLM + project (if attached)
  - effective system prompt (or --system override) and any
    --system-append addendum, indented for readability
  - active sampling overrides (temperature, top_p, etc.)

Goes through stderr so `mcpctl chat ... -m "hi" 2>/dev/null` keeps
piping clean. Best-effort: a metadata fetch failure logs and lets
the chat proceed rather than blocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit 9389ffff3c into main 2026-04-26 17:53:30 +00:00
Reference: michal/mcpctl#57