Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.
1. openai-passthrough adapter doubled `/v1` in the request URL. The
adapter hard-codes `/v1/chat/completions` after the configured base,
but every OpenAI-compat provider documents its base URL with a
trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
pasting that conventional shape produced
`https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
trailing `/v1` so both forms canonicalize. `/v1beta` (Gemini-style)
is preserved.
2. Non-streaming chat returned an empty assistant message when thinking
models (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
`reasoning_content` with `content: null`. extractChoice now also
pulls reasoning (every spelling the streaming parser already knows
about), and a new pickAssistantText helper falls back to it when
content is empty. A `[response truncated by max_tokens]` marker is
appended when finish_reason is `length`, so users see the cut-off
instead of guessing why the answer is short. Symmetric streaming
fix: the chatStream loop accumulates reasoning and yields ONE
synthesized `text` frame at the end when content stayed empty,
keeping the CLI's stdout (which only prints `text` deltas) in sync
with the persisted thread message.
3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
lifecycle field) instead of `kind: agent` (apply envelope), so
round-tripping through `apply -f` failed. Same fix shape as the v1
Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
inactiveSince/providerSessionId for the agents resource too.
4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
stderr; streaming printed `(thread: <cuid>)` (with space). Tests
and any other consumer matching one form with a regex missed the
other. Standardize on `thread: <cuid>` (single space) in both paths.
5. agent-chat.smoke's `run()` used `execSync`, which returns only stdout
and never hands stderr back to the caller on success, making any
`expect(stderr).toMatch(...)` assertion structurally impossible to
satisfy in the happy path. Switch to `spawnSync` so stderr is
actually captured. Includes a small
shell-style argv splitter so the existing call sites with quoted
multi-word values (`--system-prompt "..."`) keep working.
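A rough sketch of what item 5 describes, not the code in the smoke test itself. `splitArgv`, `RunResult`, and the two-argument `run(bin, command)` signature are hypothetical names chosen for illustration. The splitter assumes only whitespace plus single and double quotes need handling; backslash escapes and shell expansion are out of scope.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical splitter: whitespace separates arguments, single or double
// quotes group words. No backslash escapes, globbing, or variable expansion.
function splitArgv(command: string): string[] {
  const args: string[] = [];
  let current = "";
  let quote: string | null = null; // the quote character we are inside, if any
  let inToken = false;

  for (const ch of command) {
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote ends the quoted run
      else current += ch;             // everything else is literal
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;                 // so that "" still yields an empty argument
    } else if (/\s/.test(ch)) {
      if (inToken) {
        args.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error(`unterminated quote in: ${command}`);
  if (inToken) args.push(current);
  return args;
}

interface RunResult {
  stdout: string;
  stderr: string;
  status: number | null;
}

// spawnSync returns stdout and stderr separately, so stderr can be asserted
// on even when the process exits 0.
function run(bin: string, command: string): RunResult {
  const res = spawnSync(bin, splitArgv(command), { encoding: "utf8" });
  if (res.error) throw res.error;
  return { stdout: res.stdout, stderr: res.stderr, status: res.status };
}
```

With this shape, a call such as `run("mcpctl", 'chat -m "hello there" --no-stream')` passes `hello there` as one argument and leaves `stderr` available for `toMatch`.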
Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
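The two behaviors those unit tests cover can be sketched roughly as follows. `endpointUrl` and `pickAssistantText` are named in this message, but the bodies here are assumptions: the `ExtractedChoice` shape, the newline separator before the marker, and treating whitespace-only content as empty are all illustrative choices, not the adapter's actual code.

```typescript
// Hypothetical shape of a non-streaming choice after extraction.
interface ExtractedChoice {
  content: string | null;
  reasoning: string | null;
  finishReason: string | null;
}

const TRUNCATION_MARKER = "[response truncated by max_tokens]";

// Strip trailing slashes and one trailing `/v1`, then append the fixed path.
// `/v1beta` survives because the pattern requires `/v1` to end the string.
function endpointUrl(base: string): string {
  const trimmed = base.replace(/\/+$/, "").replace(/\/v1$/, "");
  return `${trimmed}/v1/chat/completions`;
}

// Prefer content; fall back to reasoning when content is empty.
// Append the marker when the upstream stopped on the token limit.
function pickAssistantText(choice: ExtractedChoice): string {
  const content = choice.content ?? "";
  const text = content.trim() !== "" ? content : (choice.reasoning ?? "");
  if (choice.finishReason !== "length") return text;
  return text === "" ? TRUNCATION_MARKER : `${text}\n${TRUNCATION_MARKER}`;
}
```

Under these assumptions both `https://x` and `https://x/v1` resolve to `https://x/v1/chat/completions`, and a choice with `content: null` plus non-empty reasoning yields the reasoning text instead of an empty reply.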
Closes the agents feature.
Smoke tests (run via `pnpm test:smoke` against a live mcpd at
$MCPD_URL, default https://mcpctl.ad.itaz.eu):
* tests/smoke/agent.smoke.test.ts — full CRUD round-trip:
create secret + Llm + agent with sampling defaults; `get agents`
surfaces it; `get agent foo -o yaml | apply -f` round-trips
identically; create + list a thread via the HTTP API; agent delete
leaves Llm + secret intact (Restrict + SetNull as designed).
Self-skips with a warning when /healthz is unreachable.
* tests/smoke/agent-chat.smoke.test.ts — gated on
MCPCTL_SMOKE_LLM_URL + MCPCTL_SMOKE_LLM_KEY. Provisions secret +
Llm + agent against a real upstream, runs `mcpctl chat -m … --no-stream`
(asserts a reply lands), then runs the streaming default
(asserts text on stdout + `(thread: …)` on stderr). The fast path
for verifying the in-cluster qwen3-thinking deployment:
    MCPCTL_SMOKE_LLM_URL=http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
    MCPCTL_SMOKE_LLM_MODEL=qwen3-thinking \
    MCPCTL_SMOKE_LLM_KEY=$(pulumi config get --stack homelab \
        secrets:litellmMcpctlGatewayToken) \
    pnpm test:smoke
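The gating and self-skip behavior described for the two smoke files could look something like the sketch below. The environment variable names and the `/healthz` path come from this message; `llmSmokeConfig`, `mcpdReachable`, the 3-second timeout, and treating `MCPCTL_SMOKE_LLM_MODEL` as optional are assumptions made for illustration. It relies on the global `fetch` and `AbortSignal.timeout` available in recent Node releases.

```typescript
// Hypothetical config object for the gated agent-chat smoke.
interface LlmSmokeConfig {
  url: string;
  key: string;
  model: string | undefined;
}

// Returns null when either gating variable is missing, so the caller can
// skip the suite instead of failing it.
function llmSmokeConfig(env: Record<string, string | undefined>): LlmSmokeConfig | null {
  const url = env.MCPCTL_SMOKE_LLM_URL;
  const key = env.MCPCTL_SMOKE_LLM_KEY;
  if (!url || !key) return null;
  return { url, key, model: env.MCPCTL_SMOKE_LLM_MODEL };
}

// Probe <base>/healthz; any network error or non-2xx counts as unreachable.
async function mcpdReachable(baseUrl: string, timeoutMs = 3000): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl.replace(/\/+$/, "")}/healthz`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false;
  }
}
```

A test file would typically evaluate both once up front and pass the result to whatever skip mechanism the runner provides, logging a warning when the probe fails.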
Docs:
* README.md — new "Agents" section under Resources with the
qwen3-thinking quickstart and links to docs/agents.md and
docs/chat.md. Adds llm + agent rows to the resources table.
* docs/agents.md (new) — full reference: data model, chat-parameter
table, HTTP API, RBAC mapping, tool-use loop semantics, yaml
round-trip shorthand, the kubernetes-deployment wiring recipe,
and a troubleshooting section (namespace collision, llm-in-use,
pending-row recovery, Anthropic-tool limitation).
* docs/chat.md (new) — user-facing `mcpctl chat` walkthrough:
modes, per-call flags, slash-commands, threads, and a
troubleshooting section.
* CLAUDE.md — adds a "Resource types" cheatsheet with one-line
pointers to each, including the new `agent` row that links to
the docs.
All suites still green: mcpd 759/759, mcplocal 715/715, cli 430/430.
Smoke tests typecheck and self-skip when no live mcpd is reachable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>