feat: virtual agents v3 (Stages 1-3) + real fixes for chat/adapter/CLI thread format #67

Merged
michal merged 5 commits from feat/virtual-agent-v3 into main 2026-04-27 18:06:59 +00:00

Summary

v3 Stage 1 (already in branch): Agent.kind/status/... lifecycle
fields + chat.service kind=virtual branch + DB migration.

v3 Stage 2 (already in branch): AgentService virtual methods +
GC cascade so Agent.llmId Restrict FK doesn't block Llm sweeps.

v3 Stage 3 (this round): _provider-register payload accepts an
optional agents[] array; mcplocal config gains a top-level agents
block; mcplocal registrar reads it and forwards to mcpd.

Real fixes uncovered by running the live smoke against
qwen3-thinking — none of these are fixed by tweaking the tests:

  • openai-passthrough adapter no longer doubles /v1 when the user
    passes a conventional https://x/v1 base URL.
  • Non-streaming + streaming chat now fall back to reasoning_content
    when thinking models emit only thinking output (with a
    [response truncated by max_tokens] marker on finish_reason: length).
  • get agent X -o yaml strips kind/status/lastHeartbeatAt/inactiveSince/
    providerSessionId so apply-round-trip works (same fix shape as the
    v1 Llm strip).
  • CLI emits a single consistent thread: <cuid> format on stderr
    (was thread:cuid no-space in non-streaming, (thread: cuid)
    with-space in streaming).
  • agent-chat smoke run() switched from execSync (discards stderr
    on success — made stderr assertions structurally impossible in the
    happy path) to spawnSync with a tiny quoted-argv splitter.
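
A minimal sketch of the /v1 canonicalization from the first bullet. The function name `chatCompletionsUrl` is hypothetical and the adapter's actual `endpointUrl` may be shaped differently; the point is only that the pattern is anchored to the end of the base URL, so `/v1beta` is left alone.

```typescript
// Hypothetical helper, not the adapter's real code.
function chatCompletionsUrl(baseUrl: string): string {
  // Drop trailing slashes so "https://x/v1/" and "https://x/v1" behave alike.
  const trimmed = baseUrl.replace(/\/+$/, "");
  // Strip one trailing "/v1" segment. "/v1beta" does not match because the
  // pattern requires the string to end right after "v1".
  const root = trimmed.replace(/\/v1$/, "");
  return `${root}/v1/chat/completions`;
}
```

With this shape, `https://x` and `https://x/v1` both resolve to the same request URL.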

Test plan

  • mcpd unit suite: 860/860 (was 854; +6 new tests)
  • mcplocal unit suite: 723/723
  • Full smoke against live infra: 139/139
  • agent + agent-chat smoke specifically: 10/10
  • Round-trip mcpctl get agent X -o yaml | apply -f - produces
    no diff
  • Streaming + non-streaming chat both emit matching thread: <cuid>
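
The round-trip item depends on dropping server-managed lifecycle fields before emitting YAML. A rough sketch of that strip follows; `toAgentApplyDoc` and the flat record shape are assumptions for illustration, not the CLI's actual `toApplyDocs` code.

```typescript
// Lifecycle fields named in this PR; everything else passes through.
const LIFECYCLE_FIELDS: readonly string[] = [
  "kind",
  "status",
  "lastHeartbeatAt",
  "inactiveSince",
  "providerSessionId",
];

// Hypothetical helper: builds an apply document from an API row.
function toAgentApplyDoc(agent: Record<string, unknown>): Record<string, unknown> {
  // The apply envelope uses kind: agent, which is why the lifecycle
  // "kind" (public/virtual) must not leak into the output.
  const doc: Record<string, unknown> = { kind: "agent" };
  for (const [key, value] of Object.entries(agent)) {
    if (LIFECYCLE_FIELDS.includes(key)) continue;
    doc[key] = value;
  }
  return doc;
}
```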
michal added 4 commits 2026-04-27 17:39:26 +00:00
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
  mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
  new types. Existing rows backfill kind=public/status=active so v1
  CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups.
  Total agent-schema tests: 20/20.

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
  on ctx.llmKind: 'public' goes through the existing adapter
  registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
  (mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern.
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Pre-this-stage, those agents 502'd against the
empty url field.
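
For readers unfamiliar with the queue + wake pattern mentioned above, here is a generic sketch of bridging a callback API to an async iterator. `CallbackProducer` is a hypothetical stand-in for a callback-style call such as `enqueueInferTask`; its real signature is not shown in this PR.

```typescript
type ChunkHandler = (chunk: string) => void;
// Hypothetical producer: calls onChunk zero or more times, then settles.
type CallbackProducer = (onChunk: ChunkHandler) => Promise<void>;

async function* bridgeToAsyncIterator(produce: CallbackProducer): AsyncGenerator<string> {
  const queue: string[] = [];
  let done = false;
  let failed = false;
  let failure: unknown = undefined;
  let wake: (() => void) | null = null;

  // Wakes the consumer if it is currently parked waiting for data.
  const notify = (): void => {
    if (wake) {
      const resume = wake;
      wake = null;
      resume();
    }
  };

  produce((chunk) => {
    queue.push(chunk);
    notify();
  }).then(
    () => { done = true; notify(); },
    (err: unknown) => { failure = err; failed = true; done = true; notify(); },
  );

  while (true) {
    // Drain everything buffered so far before deciding to park.
    while (queue.length > 0) yield queue.shift() as string;
    if (done) break;
    await new Promise<void>((resolve) => { wake = resolve; });
  }
  // Surface a producer rejection to the consumer after buffered chunks.
  if (failed) throw failure;
}
```

Chunks that arrive while the consumer is busy simply accumulate in the queue, so no wake-up is lost.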

Tests: 4 new chat-service-virtual-llm.test.ts cover the relay path
non-streaming, streaming, missing-dispatcher error, and rejection
surfacing. mcpd suite: 841/841 (was 833, +8 across stages 1+v3-Stage-1).
Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind, providerSessionId,
  status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals, findExpiredInactives.

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals
  from a foreign session can be adopted (sticky reconnect). Refuses
  to overwrite a public agent or a foreign session's still-active
  virtual (HTTP 409). Pinned LLM is resolved via LlmService — caller
  posts Llms first.
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).
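
The two GC thresholds can be read as a pure classification step. The sketch below assumes a simplified row shape and strict greater-than comparisons; the constant names, `VirtualRow`, and `classifyForSweep` are illustrative and not taken from the service.

```typescript
const HEARTBEAT_STALE_MS = 90 * 1000;          // 90s without a heartbeat
const INACTIVE_TTL_MS = 4 * 60 * 60 * 1000;    // 4h spent inactive

// Assumed minimal row shape for illustration.
interface VirtualRow {
  status: "active" | "inactive";
  lastHeartbeatAt: Date;
  inactiveSince: Date | null;
}

type SweepAction = "keep" | "mark-inactive" | "delete";

function classifyForSweep(row: VirtualRow, now: Date): SweepAction {
  if (row.status === "active") {
    const silentForMs = now.getTime() - row.lastHeartbeatAt.getTime();
    return silentForMs > HEARTBEAT_STALE_MS ? "mark-inactive" : "keep";
  }
  // An inactive row without a timestamp is kept rather than guessed at.
  if (row.inactiveSince === null) return "keep";
  const inactiveForMs = now.getTime() - row.inactiveSince.getTime();
  return inactiveForMs > INACTIVE_TTL_MS ? "delete" : "keep";
}
```

Because the function is pure, running it twice against the same rows and clock yields the same actions, which matches the idempotence the tests check for.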

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the
  agent sweep FIRST (so any agent that would block an Llm delete via
  Restrict is already gone), and adds a defensive
  deleteVirtualAgentsForLlm step right before each Llm delete in case
  an agent's heartbeat lagged its Llm's just enough to escape this
  round's 4h cutoff.

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip
+ delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when agent's heartbeat lagged).
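
The register variants listed above reduce to a small decision table. This is a sketch under assumed types; `ExistingAgent` and `decideRegister` are hypothetical names, and the real `registerVirtualAgents` also resolves the pinned Llm and writes to the database.

```typescript
// Assumed minimal view of an existing Agent row with the same name.
interface ExistingAgent {
  kind: "public" | "virtual";
  status: "active" | "inactive";
  providerSessionId: string | null;
}

// "conflict" corresponds to the HTTP 409 described above.
type RegisterDecision = "insert" | "reactivate" | "adopt" | "conflict";

function decideRegister(existing: ExistingAgent | null, sessionId: string): RegisterDecision {
  if (existing === null) return "insert";                       // new name
  if (existing.kind === "public") return "conflict";            // never overwrite public
  if (existing.providerSessionId === sessionId) return "reactivate"; // same session
  // Foreign session: adoptable only once the old owner has gone inactive.
  return existing.status === "inactive" ? "adopt" : "conflict";
}
```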

mcpd suite: 854/854 (was 841; +13 new). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the existing `_provider-register` payload with an optional `agents`
array so a single round-trip atomically publishes both virtual Llms and
their pinned virtual Agents. v1/v2 publishers (providers-only) keep
working unchanged — the agents path is gated on the route receiving an
AgentService instance, otherwise it logs a warning and ignores the array.

mcplocal config gains a top-level `agents` block (loadLocalAgents)
mirroring the providers shape. The registrar reads it, builds
RegistrarPublishedAgent entries against the published provider names,
and folds them into the same register POST. mcpd routes the agents
through AgentService.registerVirtualAgents(sessionId, ..., ownerId),
which was added in Stage 2.
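
To illustrate the backward-compatible payload, here is a sketch of how a publisher might assemble it. All field names (`name`, `llm`) and the `buildRegisterPayload` helper are assumptions for illustration; the real `_provider-register` schema and `RegistrarPublishedAgent` type are not shown in this PR. Rejecting agents that pin an unpublished provider is one possible policy, not necessarily what the registrar does.

```typescript
// Hypothetical, heavily simplified shapes.
interface PublishedProvider { name: string; }
interface PublishedAgent { name: string; llm: string; }

interface RegisterPayload {
  providers: PublishedProvider[];
  agents?: PublishedAgent[]; // optional, so providers-only publishers still validate
}

function buildRegisterPayload(
  providers: PublishedProvider[],
  agents: PublishedAgent[],
): RegisterPayload {
  const published = new Set(providers.map((p) => p.name));
  // Assumed policy: every agent must pin a provider from the same POST.
  const dangling = agents.filter((a) => !published.has(a.llm));
  if (dangling.length > 0) {
    const names = dangling.map((a) => a.name).join(", ");
    throw new Error(`agents pinned to unpublished providers: ${names}`);
  }
  // Omit the key entirely when there is nothing to publish.
  return agents.length > 0 ? { providers, agents } : { providers };
}
```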

No CLI changes here — `mcpctl chat <virtual-agent>` already works once
chat.service has the kind=virtual branch (Stage 1) and the agents are
present in the Agent table. CLI columns + smoke land in Stage 4.
fix(chat): real fixes for thinking-model + URL conventions, not test tweaks
Some checks failed
CI/CD / lint (pull_request) Successful in 54s
CI/CD / test (pull_request) Successful in 1m7s
CI/CD / typecheck (pull_request) Successful in 2m37s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Successful in 5m42s
CI/CD / publish (pull_request) Has been skipped
610808b9e7
Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize. `/v1beta` (Google Gemini-style)
   is preserved.

2. Non-streaming chat returned an empty assistant when thinking models
   (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming
   fix: the chatStream loop accumulates reasoning and yields ONE
   synthesized `text` frame at the end when content stayed empty,
   keeping the CLI's stdout (which only prints `text` deltas) in sync
   with the persisted thread message.

3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
   lifecycle field) instead of `kind: agent` (apply envelope), so
   round-tripping through `apply -f` failed. Same fix shape as the v1
   Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
   inactiveSince/providerSessionId for the agents resource too.

4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests
   and any other regex watching for one form missed the other.
   Standardize on `thread: <cuid>` (single space) in both paths.

5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working.
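
A sketch of the reasoning fallback from item 2, under stated assumptions: `ExtractedChoice` is a simplified stand-in for what `extractChoice` returns, the marker is appended whenever `finishReason` is `length`, and the newline separator is a guess. The function is named `pickAssistantTextSketch` to make clear it is not the real helper.

```typescript
// Assumed simplified shape of an extracted choice.
interface ExtractedChoice {
  content: string | null;
  reasoning: string | null;
  finishReason: string | null;
}

const TRUNCATION_MARKER = "[response truncated by max_tokens]";

function pickAssistantTextSketch(choice: ExtractedChoice): string {
  const content = choice.content ?? "";
  // Prefer real content; fall back to reasoning only when content is empty.
  const base = content.trim().length > 0 ? content : (choice.reasoning ?? "");
  // Make the cut-off visible instead of returning a silently short answer.
  return choice.finishReason === "length" ? `${base}\n${TRUNCATION_MARKER}` : base;
}
```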

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
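
For item 5, a sketch of what a `spawnSync`-based `run()` with a quoted-argv splitter could look like. This is not the smoke test's actual code: `splitArgv` handles only single and double quotes (no escapes or nesting), which is an assumption about what the call sites need.

```typescript
import { spawnSync } from "node:child_process";

// Splits a command line on whitespace, keeping quoted runs as one argument.
function splitArgv(command: string): string[] {
  const args: string[] = [];
  let current = "";
  let quote: string | null = null;
  let inToken = false;
  for (const ch of command) {
    if (quote !== null) {
      if (ch === quote) quote = null;   // closing quote ends the quoted run
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      inToken = true;                   // so "" still yields an empty argument
    } else if (/\s/.test(ch)) {
      if (inToken) {
        args.push(current);
        current = "";
        inToken = false;
      }
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) throw new Error("unterminated quote in command");
  if (inToken) args.push(current);
  return args;
}

// Unlike execSync, spawnSync returns stderr alongside stdout on success.
function run(command: string): { stdout: string; stderr: string; status: number | null } {
  const [file, ...args] = splitArgv(command);
  if (file === undefined) throw new Error("empty command");
  const result = spawnSync(file, args, { encoding: "utf8" });
  return { stdout: result.stdout, stderr: result.stderr, status: result.status };
}
```

Since no shell is involved, shell features such as pipes or globbing in the command string would not work with this shape.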
michal added 1 commit 2026-04-27 17:47:11 +00:00
feat(cli+docs): mcpctl get agent KIND/STATUS columns + virtual-agent smoke + docs (v3 Stage 4)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m10s
CI/CD / typecheck (pull_request) Successful in 2m30s
CI/CD / build (pull_request) Successful in 2m36s
CI/CD / smoke (pull_request) Failing after 5m56s
CI/CD / publish (pull_request) Has been skipped
1998b733b2
CLI: `mcpctl get agent` table view gains KIND and STATUS columns
mirroring the `get llm` shape from v1. Public agents render as
`public/active` (the AgentRow defaults) and virtual ones surface their
true lifecycle state, so `mcpctl get agent` becomes a single-pane view
for both manually-created and mcplocal-published personas.

Smoke: tests/smoke/virtual-agent.smoke.test.ts mirrors virtual-llm's
in-process registrar pattern — publishes a fake provider + agent in
one round-trip, confirms mcpd surfaces the agent kind=virtual /
status=active under /api/v1/agents, then disconnects and verifies the
paired Llm-and-Agent both flip to inactive (deletion is GC-driven, not
disconnect-driven, so the rows must still exist post-stop). Heartbeat-
stale and 4 h sweep paths are covered by the unit suite to keep smoke
duration in check.

Docs: docs/virtual-llms.md gets a "Virtual agents (v3)" section with a
config sample, lifecycle notes, listing example, and the cluster-wide
name-uniqueness caveat. The API surface block now mentions the new
`agents[]` field on _provider-register, the join-by-session heartbeat
behavior, and the `GET /api/v1/agents` lifecycle fields. docs/agents.md
gains a one-paragraph note pointing to the v3 publishing path.

Tests: full smoke suite 141/141 (was 139, +2 new), unit suites
unchanged (mcpd 860/860, mcplocal 723/723).
michal merged commit f5bdeea8e7 into main 2026-04-27 18:06:59 +00:00
michal deleted branch feat/virtual-agent-v3 2026-04-27 18:07:02 +00:00

Reference: michal/mcpctl#67