Commit Graph

338 Commits

Michal
7b18bb6d6b feat(mcpd): VirtualLlmService rewires through durable queue (v5 Stage 2)
The in-memory `tasksById` map for inference tasks is gone. Every
inference call lands as a row in `InferenceTask`; the result POST
updates the row + emits a wakeup; the in-flight HTTP handler unblocks
on the wake. mcpd surviving a restart no longer drops in-flight tasks,
and a worker disconnecting mid-task no longer fails the caller — the
row reverts to pending and a sibling worker on the same pool drains it.

Wake tasks (publisher control messages, not inference) keep their own
small in-memory map (`wakeTasks`). They're millisecond-scoped and
don't benefit from durability — a missed wake on restart just means
the next infer fires a fresh wake.
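
A minimal sketch of the queue-backed path described above, with
illustrative names (the dispatch helpers here are assumptions, not the
real API):

  // Every inference becomes a durable row before anything else happens.
  async function inferViaQueue(llm: { name: string; poolName: string | null }, body: unknown) {
    const task = await inferenceTasks.enqueue({
      poolName: llm.poolName ?? llm.name,   // effective pool key
      llmName: llm.name,
      requestBody: body,
    });
    // If a worker session is bound for this pool, push the task frame now;
    // otherwise the row just stays pending until bindSession drains it.
    const session = findActiveSessionForPool(task.poolName);  // assumed helper
    if (session) session.push({ kind: 'infer', taskId: task.id });
    // Block on the row, not the socket: the worker's result POST updates
    // the row and emits the wakeup that resolves `done`.
    const { done } = await inferenceTasks.waitFor(task.id, INFER_AWAIT_TIMEOUT_MS);
    return await done;
  }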

Behavioral changes worth flagging:

- Worker disconnect mid-task: WAS reject ref.done with "publisher
  disconnected"; NOW revert claimed/running rows to pending. Original
  caller's ref.done keeps waiting up to INFER_AWAIT_TIMEOUT_MS (10
  min); whichever worker delivers the result fulfills it.

- bindSession drains pending tasks for the session's pool keys. So
  tasks queued while no worker was up automatically get dispatched
  when one shows up. The drain matches by *effective pool key*
  (poolName ?? name) — tasks queued against vllm-alice get drained
  by any session whose owned Llms share alice's pool.

- New `failFast: true` option on enqueueInferTask (default: false).
  Existing callers that NEED fast-fail get it explicitly:
    - Direct `/api/v1/llms/<name>/infer` route: caller pinned a
      specific Llm and wants 503 immediately if the publisher is
      offline; queueing for an unknown future worker would be
      surprising (usage sketched after this list).
    - chat.service pool failover loop: it iterates pool candidates
      and needs each candidate's transport failure to surface fast.
      Without failFast, a downed pool member would absorb the call
      into the queue and the loop would wait 10 min before trying
      the next.
  The async API route (Stage 3) leaves failFast=false — that's the
  whole point of the durable queue path.

- VirtualLlmService now requires an InferenceTaskService dep at
  construction. Older test wirings that didn't pass it get a clear
  "InferenceTaskService not wired" error from enqueueInferTask
  rather than a confusing in-memory stub.
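
A hedged sketch of the fast-fail caller shape (the option name comes
from this commit; everything around it is illustrative):

  // Direct /api/v1/llms/<name>/infer: the caller pinned one Llm, so a dead
  // publisher should surface as an immediate 503 rather than queue the call
  // for a worker that may never appear.
  const ref = await virtualLlms.enqueueInferTask(llm.name, requestBody, {
    failFast: true,
  });
  const result = await ref.done;
  // The async Stage 3 route leaves failFast at its default (false), so its
  // rows simply wait in the durable queue until a pool worker drains them.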

Tests:

- 12 existing virtual-llm-service tests updated for the new
  semantics: "rejects when no session" → "queues durably"; "rejects
  when row inactive" → "still queues (pool may have a sibling)";
  "unbindSession rejects in-flight tasks" → "reverts to pending".
  Wake-task probing now uses `wakeTasks` instead of `tasksById`.

- 3 new v5-specific tests: drain-on-bind matches by effective pool
  key (not just name); enqueue without a session keeps the row
  pending; completeTask via the result-route updates the DB and
  emits the wakeup that resolves ref.done.

- chat-service-virtual-llm + llm-infer-route assertions updated to
  expect the new {failFast: true} option arg.

mcpd 884/884 (was 881; +3 v5 cases). mcplocal 723/723. Full smoke
suite 144/144 against the deployed queue-backed mcpd.

Stage 3 (next): expose the durable queue via async API endpoints.
POST /api/v1/inference-tasks (enqueue with failFast=false), GET
/api/v1/inference-tasks/:id (poll), GET /api/v1/inference-tasks/:id/stream
(SSE), DELETE /api/v1/inference-tasks/:id (cancel). New `tasks` RBAC
resource.
2026-04-28 02:33:26 +01:00
Michal
ed21ad1b5a feat(mcpd+db): durable InferenceTask queue + state machine (v5 Stage 1)
The persistence + signaling layer for v5. No integration with the
existing in-flight inference path yet — that's Stage 2. This commit
just lands the durable queue underneath, with a state machine that
mcpd's HTTP handlers, the worker result-POST route, and the GC sweep
will all build on.

Schema (src/db/prisma/schema.prisma + migration):

- New `InferenceTask` model + `InferenceTaskStatus` enum
  (pending|claimed|running|completed|error|cancelled).
- Routing fields stored at enqueue time so a later rename of
  `Llm.poolName` doesn't reroute already-queued work: `poolName`
  (effective pool key), `llmName` (pinned target), `model`, `tier`.
- Worker tracking: `claimedBy` (providerSessionId) + `claimedAt`,
  cleared on revert.
- Bodies as `Json`: requestBody (always set), responseBody (set at
  completion). Streaming chunks are NOT persisted — too expensive at
  delta granularity. The final assembled body lands once per task.
- Lifecycle timestamps: createdAt, claimedAt, streamStartedAt,
  completedAt. Plus ownerId (RBAC + audit) and agentId (null for
  direct chat-llm calls).
- Indexes for the hot paths: (status, poolName) for the dispatcher's
  drain query, claimedBy for the disconnect revert, completedAt for
  the GC retention sweep, owner/agent for the async API listing.
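
The same row restated as a TypeScript view (field names from the list
above; exact optionality and nullability are approximated):

  type InferenceTaskStatus =
    | 'pending' | 'claimed' | 'running' | 'completed' | 'error' | 'cancelled';

  interface InferenceTaskRow {
    id: string;
    status: InferenceTaskStatus;
    // Routing, captured at enqueue time so later renames don't reroute work.
    poolName: string;             // effective pool key
    llmName: string;              // pinned target
    model: string;
    tier: string;
    // Worker tracking, cleared on revert.
    claimedBy: string | null;     // providerSessionId
    claimedAt: Date | null;
    // Bodies; streaming deltas are never persisted, only the final body.
    requestBody: unknown;
    responseBody: unknown | null;
    // Lifecycle + audit.
    createdAt: Date;
    streamStartedAt: Date | null;
    completedAt: Date | null;
    ownerId: string;
    agentId: string | null;       // null for direct chat-llm calls
  }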

Repository (src/mcpd/src/repositories/inference-task.repository.ts):

- CRUD + state transitions as conditional CAS via `updateMany`. Two
  workers racing to claim the same row both run the UPDATE; whichever
  the DB serializes first sees affected=1 and gets the row, the loser
  sees 0 and falls through to the next candidate. No application-
  level locking required.
- findPendingForPools(poolNames[]) for the worker drain on bind.
- findHeldBy(claimedBy) for the unbindSession revert.
- findStalePending + findExpiredTerminal for the GC sweep.
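
A minimal sketch of the conditional-CAS claim, assuming Prisma's
updateMany (the real method bodies may differ):

  // Only a row still in `pending` can move to `claimed`. Racing workers both
  // run the UPDATE; the DB serializes them and exactly one sees count === 1.
  async tryClaim(taskId: string, sessionId: string): Promise<boolean> {
    const res = await this.prisma.inferenceTask.updateMany({
      where: { id: taskId, status: 'pending' },
      data: { status: 'claimed', claimedBy: sessionId, claimedAt: new Date() },
    });
    return res.count === 1;   // the loser sees 0 and falls through
  }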

Service (src/mcpd/src/services/inference-task.service.ts):

- Owns the in-process EventEmitter that wakes blocked HTTP handlers
  when a worker POSTs results. The DB row is the source of truth for
  *state*; the EventEmitter just signals "go re-read row X" so we
  don't have to poll. Single-instance assumption for v5; pg
  LISTEN/NOTIFY is the v6 swap when scaling horizontally — no schema
  change needed, just replace the emitter wakeup.
- waitFor(taskId, timeoutMs) returns { done, chunks }: the terminal
  promise + an async iterator of streaming deltas (usage sketched
  after this list). Throws on cancel
  (clear message) or error (worker's errorMessage propagates) or
  timeout. Polls the row once at subscribe time so an already-
  terminal task resolves immediately without waiting for an event
  that's never coming.
- gcSweep flips stale pending rows to error (with a clear message
  about the timeout) and deletes terminal rows past retention.
  Defaults: 1h pending timeout, 7d terminal retention; both
  configurable.
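
How a blocked handler might consume waitFor under the { done, chunks }
shape above (a sketch, not the actual route code):

  const { done, chunks } = await service.waitFor(task.id, timeoutMs);

  // Streaming callers forward deltas as they arrive...
  for await (const chunk of chunks) {
    reply.raw.write(`data: ${JSON.stringify(chunk)}\n\n`);
  }
  // ...and every caller awaits the terminal promise, which resolves
  // immediately if the row was already terminal at subscribe time.
  const finalRow = await done;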

Tests:
- 6 db-level schema tests (defaults, json roundtrip, drain query
  shape, claimedBy filter, GC predicate, agentId nullable).
- 13 service tests covering enqueue, the CAS race on tryClaim,
  complete/fail/cancel, idempotent terminal transitions, revertHeldBy
  on disconnect, and the full waitFor signal lifecycle (immediate
  resolve, wake on event, chunk streaming, cancel/error/timeout
  paths). Plus a gcSweep test with a fixed clock.

mcpd 881/881 (was 868; +13). db pool-schema 14/14, +6 new
inference-task-schema. Pre-existing failures in models.test.ts
(Secret FK fixture issue, also fails on main HEAD) are unrelated.

Stage 2 (next): VirtualLlmService rewires through this — remove the
in-memory pendingTasks map; enqueue creates a row, dispatch picks an
active session, the result-route updates the row + emits the wakeup.
Worker disconnect reverts; worker bind drains.
2026-04-28 02:14:45 +01:00
256e117021 Merge pull request 'feat: v4 LB pools by shared poolName' (#69) from feat/llm-pool-by-name into main
Reviewed-on: #69
2026-04-28 01:02:45 +00:00
Michal
137711fdf6 feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)
Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:

  1. /api/v1/llms/<name>/members surfaces both with the correct
     effective pool key, size, activeCount, and per-member kind/status.
  2. Chat through an agent pinned to one pool member dispatches across
     the pool — verified by running 12 calls and asserting at least
     one response from each backend (the random-shuffle selection
     would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
  3. Failover: stop one publisher, the surviving member still serves
     chat. /members shows the stopped row as inactive immediately
     (unbindSession runs synchronously on SSE close).

docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.

Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
2026-04-27 23:22:15 +01:00
Michal
e21f96080d feat(mcpd+cli+mcplocal): /llms/<name>/members + POOL column + --pool-name (v4 Stage 2)
Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` —
  same as the single-Llm route. Members are full LlmView shapes so
  callers don't need a second roundtrip to render the pool block.

- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates
  the same character set as CreateLlmSchema.poolName. Backwards
  compatible — older mcplocals that don't send the field continue to
  publish solo Llms.

- CLI `get llm` table: new POOL column right after NAME. Solo rows
  show "-" so the "no pool / pool of 1" case is unambiguous (per
  user direction "make sure we see it, prominently visible and
  impossible to mistake").

- CLI `describe llm`: fetches /members and renders a Pool block at
  the top of the detail view when the row is in an explicit pool OR
  when its implicit pool has size > 1. Each member line shows
  kind/status; the anchor row gets "← this row". Block is suppressed
  for solo rows so describe stays compact in the common case.

- CLI `create llm --pool-name <name>` flag and apply schema both
  accept the new field. Yaml round-trip preserves it: get -o yaml
  emits `poolName: <name>`, apply -f re-imports it without diff.
  Verified end-to-end against the live mcpd.

- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
  and registrar.ts thread it through into the register payload. Use
  case for distributed inference: each user's mcplocal picks a
  unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
  (e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
  that auto-grows as workers come online (config shape sketched
  after this list).

- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in
  fish + bash for `mcpctl create llm`.
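
A hedged sketch of the distributed-inference shape from the mcplocal
bullet — only `name`, `poolName`, and `publish` come from these
commits; the rest of the entry is assumed:

  // One entry per worker host in ~/.mcpctl/config.json (shape approximated).
  const entry: LlmProviderFileEntry = {
    name: 'vllm-alice-qwen3',              // globally unique per worker
    poolName: 'user-vllm-qwen3-thinking',  // shared → joins one logical pool
    publish: true,                         // opt in to publishing (v1 flag)
    // ...the provider's own model/endpoint fields stay as they were
  };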

Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
  mcpd 868/868 (was 865, +3),
  mcplocal 723/723,
  cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
2026-04-27 23:18:53 +01:00
Michal
7949e1393d feat(mcpd+db): Llm.poolName + chat dispatcher pool failover (v4 Stage 1)
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.

Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm to pool seed.
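
The key resolution itself is just a null-coalesce; as a sketch:

  // Solo rows (poolName = null) are a "pool of 1" addressable by their own name.
  const effectivePoolKey = (llm: { name: string; poolName: string | null }) =>
    llm.poolName ?? llm.name;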

Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.

When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.
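
A simplified sketch of the dispatch loop described above (helper names
are illustrative; streaming commits to a backend after its first
chunk):

  async function runOneInference(ctx: ChatContext, body: unknown) {
    let lastErr: unknown;
    // ctx.poolCandidates: randomly shuffled members with status !== 'inactive',
    // resolved once per turn by prepareContext.
    for (const candidate of ctx.poolCandidates) {
      try {
        return await dispatchTo(candidate, body);   // adapter or virtual relay
      } catch (err) {
        if (!isTransportError(err)) throw err;      // auth/4xx: siblings fail too
        lastErr = err;                              // transport: try the next member
      }
    }
    throw lastErr ?? new Error(`No active Llm in pool '${ctx.poolKey}'`);
  }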

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool
  edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
2026-04-27 22:02:41 +01:00
c0b4dc89f3 Merge pull request 'chore: fulldeploy uses bao-backed pulumi wrapper for drift check' (#68) from chore/fulldeploy-pulumi-wrapper into main
Reviewed-on: #68
2026-04-27 20:21:33 +00:00
Michal
7f49294b36 chore(fulldeploy): use kubernetes-deployment/scripts/pulumi.sh wrapper
The pre-flight drift check now calls the bao-backed pulumi wrapper
that landed with the litellm key persistence work, so deploys no
longer need PULUMI_CONFIG_PASSPHRASE in .env or shell env. The
passphrase is fetched from OpenBao at runtime by the wrapper and
exec-passed to pulumi only — never touches the parent shell's
state.

Falls back to a clear warning if the wrapper isn't present (older
clone of kubernetes-deployment) instead of pretending to skip the
check silently.
2026-04-27 19:14:36 +01:00
f5bdeea8e7 Merge pull request 'feat: virtual agents v3 (Stages 1-3) + real fixes for chat/adapter/CLI thread format' (#67) from feat/virtual-agent-v3 into main
Reviewed-on: #67
2026-04-27 18:06:59 +00:00
Michal
1998b733b2 feat(cli+docs): mcpctl get agent KIND/STATUS columns + virtual-agent smoke + docs (v3 Stage 4)
CLI: `mcpctl get agent` table view gains KIND and STATUS columns
mirroring the `get llm` shape from v1. Public agents render as
`public/active` (the AgentRow defaults) and virtual ones surface their
true lifecycle state, so `mcpctl get agent` becomes a single-pane view
for both manually-created and mcplocal-published personas.

Smoke: tests/smoke/virtual-agent.smoke.test.ts mirrors virtual-llm's
in-process registrar pattern — publishes a fake provider + agent in
one round-trip, confirms mcpd surfaces the agent kind=virtual /
status=active under /api/v1/agents, then disconnects and verifies the
paired Llm-and-Agent both flip to inactive (deletion is GC-driven, not
disconnect-driven, so the rows must still exist post-stop). Heartbeat-
stale and 4 h sweep paths are covered by the unit suite to keep smoke
duration in check.

Docs: docs/virtual-llms.md gets a "Virtual agents (v3)" section with a
config sample, lifecycle notes, listing example, and the cluster-wide
name-uniqueness caveat. The API surface block now mentions the new
`agents[]` field on _provider-register, the join-by-session heartbeat
behavior, and the `GET /api/v1/agents` lifecycle fields. docs/agents.md
gains a one-paragraph note pointing to the v3 publishing path.

Tests: full smoke suite 141/141 (was 139, +2 new), unit suites
unchanged (mcpd 860/860, mcplocal 723/723).
2026-04-27 18:47:03 +01:00
Michal
610808b9e7 fix(chat): real fixes for thinking-model + URL conventions, not test tweaks
Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize. `/v1beta` (Anthropic-style)
   is preserved. (Sketched after this list.)

2. Non-streaming chat returned an empty assistant when thinking models
   (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming
   fix: the chatStream loop accumulates reasoning and yields ONE
   synthesized `text` frame at the end when content stayed empty,
   keeping the CLI's stdout (which only prints `text` deltas) in sync
   with the persisted thread message.

3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
   lifecycle field) instead of `kind: agent` (apply envelope), so
   round-tripping through `apply -f` failed. Same fix shape as the v1
   Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
   inactiveSince/providerSessionId for the agents resource too.

4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests
   and any other regex watching for one form missed the other.
   Standardize on `thread: <cuid>` (single space) in both paths.

5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working.
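
A minimal sketch of the trailing-/v1 normalization from item 1 (the
real adapter code may differ; `/v1beta` never matches the strip):

  function endpointUrl(base: string): string {
    const trimmed = base.replace(/\/+$/, '');   // drop trailing slashes
    const root = trimmed.replace(/\/v1$/, '');  // strip a literal trailing /v1 only
    return `${root}/v1/chat/completions`;
  }

  endpointUrl('https://llm.example.com/v1'); // https://llm.example.com/v1/chat/completions
  endpointUrl('https://llm.example.com');    // https://llm.example.com/v1/chat/completions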

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
2026-04-27 18:39:01 +01:00
Michal
58bc277242 feat(mcpd+mcplocal): register-agents endpoint + mcplocal agents block (v3 Stage 3)
Extends the existing `_provider-register` payload with an optional `agents`
array so a single round-trip atomically publishes both virtual Llms and
their pinned virtual Agents. v1/v2 publishers (providers-only) keep
working unchanged — the agents path is gated on the route receiving an
AgentService instance, otherwise it logs a warning and ignores the array.
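
A hedged sketch of the extended register payload — the top-level array
names and the fields inside each agents entry are illustrative
assumptions, not the documented shape:

  // POST /api/v1/llms/_provider-register: one round-trip for both kinds.
  const payload = {
    llms: [
      { name: 'vllm-local', model: 'Qwen/Qwen2.5-7B-Instruct' /* … */ },
    ],
    // Optional in v3; v1/v2 publishers that omit it keep working unchanged.
    agents: [
      {
        name: 'research-assistant',   // assumed field names
        llm: 'vllm-local',            // must reference a provider published above
        systemPrompt: 'You are…',
      },
    ],
  };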

mcplocal config gains a top-level `agents` block (loadLocalAgents)
mirroring the providers shape. The registrar reads it, builds
RegistrarPublishedAgent entries against the published provider names,
and folds them into the same register POST. mcpd routes the agents
through AgentService.registerVirtualAgents(sessionId, ..., ownerId),
which was added in Stage 2.

No CLI changes here — `mcpctl chat <virtual-agent>` already works once
chat.service has the kind=virtual branch (Stage 1) and the agents are
present in the Agent table. CLI columns + smoke land in Stage 4.
2026-04-27 18:38:37 +01:00
Michal
c7b1bd8e2c feat(mcpd): AgentService virtual methods + GC cascade (v3 Stage 2)
State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind, providerSessionId,
  status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals, findExpiredInactives.

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals
  from a foreign session can be adopted (sticky reconnect). Refuses
  to overwrite a public agent or a foreign session's still-active
  virtual (HTTP 409). Pinned LLM is resolved via LlmService — caller
  posts Llms first.
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the
  agent sweep FIRST (so any agent that would block an Llm delete via
  Restrict is already gone), and adds a defensive
  deleteVirtualAgentsForLlm step right before each Llm delete in case
  an agent's heartbeat lagged its Llm's just enough to escape this
  round's 4h cutoff.
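
The ordering constraint, roughly (method names from this commit; the
bodies and the cutoff helper are paraphrased):

  async gcSweep(now = new Date()) {
    // Agents first, so Restrict on Agent.llmId can't block the Llm deletes below.
    await this.agentService?.gcSweepVirtualAgents();
    // ...existing heartbeat-stale flip for active virtual Llms...
    for (const llm of await this.repo.findExpiredInactives(retentionCutoff(now))) {
      // Defensive: an agent whose heartbeat lagged just enough to dodge this
      // round's 4h cutoff would otherwise block the delete.
      await this.agentService?.deleteVirtualAgentsForLlm(llm.id);
      await this.repo.delete(llm.id);
    }
  }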

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip
+ delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when agent's heartbeat lagged).

mcpd suite: 854/854 (was 841 + 13 new). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
Michal
9afd24a3aa feat(db+mcpd): Agent lifecycle + chat.service kind=virtual branch (v3 Stage 1)
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
  mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
  new types. Existing rows backfill kind=public/status=active so v1
  CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups.
  Total agent-schema tests: 20/20.

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
  on ctx.llmKind: 'public' goes through the existing adapter
  registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
  (mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern (sketched after
  this list).
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.
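
The queue + wake bridge from the streaming bullet, as a small generic
sketch (not the actual helper):

  // Turns a callback-style onChunk API into something `for await` can consume.
  function chunkBridge<T>() {
    const queue: T[] = [];
    let wake: (() => void) | null = null;
    let done = false;
    return {
      push(chunk: T) { queue.push(chunk); wake?.(); },
      end() { done = true; wake?.(); },
      async *[Symbol.asyncIterator]() {
        for (;;) {
          while (queue.length > 0) yield queue.shift() as T;
          if (done) return;
          await new Promise<void>((resolve) => { wake = resolve; });
          wake = null;
        }
      },
    };
  }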

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Pre-this-stage, those agents 502'd against the
empty url field.

Tests: 4 new chat-service-virtual-llm.test.ts cover the relay path
non-streaming, streaming, missing-dispatcher error, and rejection
surfacing. mcpd suite: 841/841 (was 833, +8 across stages 1+v3-Stage-1).
Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
9374a2652b perf: vitest threads pool + Dockerfile pnpm cache mount (#66)
2026-04-27 16:07:05 +00:00
Michal
18245be0c1 perf: vitest threads pool + Dockerfile pnpm cache mount
Two tuning knobs that were leaving most of the host idle:

1) vitest.config.ts pool=threads with maxThreads ≈ cores/2.
   Default left this 64-core workstation at ~10% CPU during
   `pnpm test:run`. Threads pool uses the box: same 152-file/2050-test
   suite now runs at ~700% CPU instead of ~150%. Wall time gain is
   modest (workload is dominated by a handful of slow individual files
   that one thread must run serially), but the parallel headroom is
   there for when the suite grows. Cap = max(2, cores/2) keeps laptops
   reasonable; override with `VITEST_MAX_THREADS=N` in the env (config
   sketched after this list).

2) Dockerfile.mcpd uses BuildKit cache mounts on both pnpm install
   steps. Adds `# syntax=docker/dockerfile:1.6` and a
   `--mount=type=cache,target=/root/.local/share/pnpm/store` so
   pnpm's content-addressed store survives across image rebuilds.
   Cold rebuilds where the lockfile changed are unaffected; warm
   rebuilds where only source changed drop the install step from
   ~60s to <5s. fulldeploy.sh's mcpd image rebuild gets that back
   minus the docker push hash mismatch.
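
A sketch of the thread-pool knob from point 1, assuming vitest's
standard pool / poolOptions shape:

  // vitest.config.ts (excerpt)
  import os from 'node:os';
  import { defineConfig } from 'vitest/config';

  const maxThreads =
    Number(process.env.VITEST_MAX_THREADS) ||
    Math.max(2, Math.floor(os.cpus().length / 2));

  export default defineConfig({
    test: {
      pool: 'threads',
      poolOptions: { threads: { maxThreads } },
    },
  });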

Test parity: 2050/2050 across 152 files; per-package mcpd 837/837.
Both unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:06:39 +01:00
45c7737ee1 feat: virtual LLMs v2 (wake-on-demand) (#65)
2026-04-27 14:20:59 +00:00
Michal
e0cfe0ba4d feat: virtual-LLM v2 smoke + docs (v2 Stage 3)
Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.

Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins a tiny in-process HTTP "wake controller"; the published
  provider's isAvailable() returns false until the wake POST flips
  the bool. Asserts:
    1. Provider publishes as kind=virtual / status=hibernating.
    2. First inference triggers the wake recipe, the recipe POSTs
       to the controller, the provider becomes available, mcpd
       relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.

Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
  "reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
  recipe types (http + command), the wake-then-infer flow diagram,
  concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.

Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).

v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:20:18 +01:00
Michal
db839afc57 feat(mcpd): wake-before-infer for hibernating virtual LLMs (v2 Stage 2)
Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — `wakeInFlight` map
dedupes by Llm name.
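
The dedupe is the usual share-one-promise-per-key pattern; a sketch
approximating the commit's wakeInFlight map:

  private wakeInFlight = new Map<string, Promise<void>>();

  private ensureAwake(llmName: string): Promise<void> {
    // Every concurrent infer for the same hibernating Llm awaits the same wake.
    let wake = this.wakeInFlight.get(llmName);
    if (!wake) {
      wake = this.runWake(llmName)
        .finally(() => this.wakeInFlight.delete(llmName));
      this.wakeInFlight.set(llmName, wake);
    }
    return wake;
  }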

State machine in enqueueInferTask:
  active        → push infer task immediately (existing path).
  inactive      → 503, publisher offline (existing path).
  hibernating   → ensureAwake() → push infer task (new in v2).

ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
  active + bumps lastHeartbeatAt, so all queued + future infers
  hit the active path. On non-2xx or service.failTask, the row
  stays hibernating (next request retries).

Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:18:24 +01:00
Michal
af0fabd84f feat(mcplocal+mcpd): wake-recipe config + wake-task execution (v2 Stage 1)
First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.

Config (LlmProviderFileEntry):
- New \`wake\` block on a published provider:
    wake:
      type: http        # or: command
      url: ...           # http only
      method: POST       # http only, default POST
      headers: {...}     # http only
      body: ...          # http only
      command: ...       # command only
      args: [...]        # command only
      maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires

Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
  returns false report initialStatus=hibernating to mcpd. Without a
  wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
  spawn), then polls isAvailable() up to maxWaitSeconds, sending a
  heartbeat each loop so mcpd's GC sweep doesn't time us out
  mid-boot. Reports { ok, ms } on success or { error } on
  timeout/recipe failure via the existing _provider-task/:id/result
  (poll loop sketched after this list).
- Replaces the v1 stub that rejected wake tasks with "not implemented".
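
A sketch of the poll-with-heartbeat loop from the handleWakeTask
bullet (recipe execution and the result POST are elided; the heartbeat
helper is an assumption):

  async function waitUntilAvailable(provider: LlmProvider, maxWaitSeconds: number) {
    const start = Date.now();
    while (Date.now() - start < maxWaitSeconds * 1000) {
      if (await provider.isAvailable()) {
        return { ok: true, ms: Date.now() - start };
      }
      await sendHeartbeat();   // keep mcpd's GC sweep from timing the session out
      await new Promise((resolve) => setTimeout(resolve, 2_000));
    }
    return { error: `backend not available after ${maxWaitSeconds}s` };
  }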

mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
  'hibernating'). The register/upsert path uses it for both new and
  reconnecting rows. Defaults to 'active' so v1 publishers still
  work unchanged.
- Provider-register route's coercer accepts the new field.

Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when wake configured + unavailable, active otherwise,
active when no wake even if unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:15:46 +01:00
700d1683c2 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML (#64)
2026-04-27 13:47:18 +00:00
Michal
2a44f60785 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML
The smoke test `llm.smoke > round-trips yaml output → apply -f` failed
after v1 of the virtual-LLM feature: `mcpctl get llm <name> -o yaml`
output now starts with `kind: public` (the new schema column) instead
of `kind: llm` (the apply-doc envelope), because toApplyDocs spread
the cleaned item AFTER setting the kind, so the cleaned item's `kind`
overwrote.

Fix: in toApplyDocs, when serialising the `llms` resource, drop the
new lifecycle fields (kind, status, lastHeartbeatAt, inactiveSince,
providerSessionId) before merging. They collide with the apply-doc
envelope and aren't apply-able anyway — they're derived runtime state
owned by VirtualLlmService. Public-LLM round-trip is now byte-clean
(those fields default to public/active anyway). Virtual rows are
created by the registrar, not via apply -f, so dropping them on
output is the right call.

CLI suite: 437/437. Smoke will re-run against the live mcpd via
scripts/release.sh after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:47:00 +01:00
65b6b265d9 feat: virtual LLMs v1 (registration skeleton) (#63)
2026-04-27 13:38:50 +00:00
Michal
866f6abc88 feat: virtual-LLM smoke test + docs (v1 Stage 6)
Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: row appears with kind=virtual / status=active, infer
  through /api/v1/llms/<name>/infer comes back through the SSE
  relay with the provider's content + finish_reason, and a 503
  appears immediately after registrar.stop() (publisher offline).
- Times out / cleanup paths idempotent so re-runs against the same
  cluster don't litter rows. The 90-s heartbeat-stale flip and 4-h
  GC are unit-tested — too slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs creating a regular
  Llm row, how to opt-in via publish: true, the lifecycle table,
  the inference-relay sequence, the v1 streaming caveat, the v2-v5
  roadmap, and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.

Stage roadmap (each its own future PR):
  v2 wake-on-demand · v3 virtual agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:28:43 +01:00
Michal
7e6b0cab44 feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)
Closes the loop on user-facing surface:

  $ mcpctl get llm
  NAME             KIND     STATUS    TYPE     MODEL                       TIER  KEY  ID
  qwen3-thinking   public   active    openai   qwen3-thinking              fast  ...  ...
  vllm-local       virtual  active    openai   Qwen/Qwen2.5-7B-Instruct    fast  -    ...

  $ mcpctl chat-llm vllm-local
  ────────────────────────────────────────
  LLM: vllm-local  openai → Qwen/Qwen2.5-7B-Instruct-AWQ
  Kind: virtual    Status: active
  ────────────────────────────────────────
  > hello?
  Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools,
  no project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles relay-through-mcplocal transparently,
  so the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row tokens-per-
  second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows
  feel natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special-case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in
the regenerated completions golden). Workspace: 2043/2043 across
152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:25:38 +01:00
Michal
97174f450f feat(mcplocal): virtual-LLM registrar (v1 Stage 4)
The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping `publish: true` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.

Config:
- LlmProviderFileEntry gains optional `publish: boolean` (default false,
  so existing setups don't change).

Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
  /api/v1/llms/_provider-register with the publishable set, persists
  the returned providerSessionId to ~/.mcpctl/provider-session for
  sticky reconnects, then opens the SSE control channel and starts a
  30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
  task frames trigger handleInferTask: convert OpenAI body to
  CompletionOptions, call provider.complete(), POST the result back as
  either { status, body } (non-streaming) or two chunk POSTs
  (streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s
  (sketched after this list). On
  successful reconnect the persisted sessionId revives the same Llm
  rows in mcpd (mcpd flips them back to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; cleanly handed
  off from main.ts's existing shutdown handler.
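
The backoff itself, sketched (5 s doubling to the 60 s cap; the stream
helper is an assumption):

  let reconnectDelayMs = 5_000;

  function scheduleReconnect() {
    setTimeout(async () => {
      try {
        await openProviderStream(sessionId);   // reuses the sticky session id
        reconnectDelayMs = 5_000;              // reset on success
      } catch {
        reconnectDelayMs = Math.min(reconnectDelayMs * 2, 60_000);
        scheduleReconnect();
      }
    }, reconnectDelayMs);
  }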

Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
  the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
  best-effort skip (logs a warning, returns null) — never a boot
  blocker.

v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.

Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.

Workspace suite: 2043/2043 across 152 files (was 2006/149, +5
new tests + the new file gets discovered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:20:54 +01:00
Michal
192a3831df feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)
End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.

Routes (new file: routes/virtual-llms.ts):
  POST /api/v1/llms/_provider-register    → returns { providerSessionId, llms[] }
  GET  /api/v1/llms/_provider-stream      → SSE channel keyed by
                                            x-mcpctl-provider-session header.
                                            Emits `event: hello` on open,
                                            `event: task` on inference fan-out,
                                            `: ping` every 20 s for proxies.
  POST /api/v1/llms/_provider-heartbeat   → bumps lastHeartbeatAt
  POST /api/v1/llms/_provider-task/:id/result
                                          → mcplocal pushes result back;
                                            body shape is one of:
                                              { error: 'msg' }
                                              { chunk: { data, done? } }
                                              { status, body }

LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
  handlers + the upcoming `mcpctl get llm` columns can branch on
  kind without re-fetching the row.

llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
  VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
  supported; on 503 (publisher offline) the existing audit hook still
  fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
  absence in test fixtures returns a 500 with a clear "server
  misconfiguration" message rather than silently falling through to
  the public path against an empty URL.

main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
  shutdown alongside the existing reconcile timer.

Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.

The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:15:18 +01:00
Michal
2215922618 feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)
The state machine for kind=virtual Llm rows. Wires the schema added
in Stage 1 into something that can register, heartbeat, time out,
and relay inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId) — the reconnect lookup.
- findStaleVirtuals(cutoff) — heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff) — 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the
  session; revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back. Streaming variant exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from
  the result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90s
  cutoff), deletes inactives past 4h. Idempotent.

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy
path + 503 paths (no session, inactive, public-Llm), complete/fail/
streaming chunk fan-out, GC sweep flip + delete + idempotence.

mcpd suite: 819/819 (was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:05:19 +01:00
Michal
1acd8b58bc feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
`mcpctl create llm`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when
the publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.

All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index.

mcpd suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:59:44 +01:00
e65a396d3e fix(cli): status probe accepts reasoning_content for thinking models (#62)
2026-04-27 11:10:15 +00:00
Michal
a84214dad1 fix(cli): status probe accepts reasoning_content for thinking models
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final `content` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:09:42 +01:00
54e56f7b71 feat(cli): live "say hi" probe for server LLMs in mcpctl status (#61)
2026-04-27 11:02:26 +00:00
Michal
e4af16477c feat(cli): live "say hi" probe for server LLMs in mcpctl status
Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:

  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0

Each registered LLM gets a one-line health line:

  Server LLMs: 2 registered (probing live "say hi"...)
    fast   qwen3-thinking  ✓ "hi" 312ms
              openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
    heavy  sonnet  ✗ upstream auth failed: 401
              anthropic → claude-sonnet-4-5  provider default  no key

Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
`health: { ok, ms, say?, error? }` field per server LLM so dashboards
get the same liveness signal.

Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:02:00 +01:00
de96af7bf6 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500 (#60)
2026-04-27 10:28:10 +00:00
Michal
0db37e92a4 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500
Two related fixes:

1. `mcpctl status` now lists mcpd-managed Llm rows (the ones created via
   `mcpctl create llm`) under a new "Server LLMs:" section, grouped by
   tier with type, model, upstream URL, and key reference. JSON/YAML
   output gains a `serverLlms` array.

   Bearer token (from `mcpctl auth login` / saved credentials) is
   passed through; if mcpd is unreachable or returns non-200 the
   section is silently omitted (the existing mcpd connectivity line
   already conveys that). 6 new tests cover happy path, empty list,
   token plumbing, and JSON shape.

2. SPA fallback at \`/ui/<deeplink>\` was returning 500 because we
   registered \`@fastify/static\` with \`decorateReply: false\` and then
   called \`reply.sendFile\`. Read index.html once at startup and serve
   it with \`reply.send(html)\` instead — also dodges a per-request
   stat call. Drop \`decorateReply: false\` so future code can use
   reply.sendFile if it ever needs to.

Full suite: 2005/2005 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:27:45 +01:00
899f2c750c fix(test): vitest 4 projects + src/web jsdom env (#59)
2026-04-26 20:31:47 +00:00
Michal
bf0a60bc0a fix(test): switch workspace runner to vitest 4 `projects` field
The workspace-level `pnpm test:run` (which fulldeploy.sh runs as a
gate) was failing with `localStorage is not defined` on the new
src/web tests. Two intertwined causes:

1. vitest 4 deprecated `vitest.workspace.ts`. The file was being
   silently ignored, so per-package configs (cli, mcpd, mcplocal)
   weren't being honored under workspace mode either — the root
   config was being used for all of them.

2. With the root config in charge, src/web/tests ran with the default
   Node environment, no `localStorage` global, so the api wrapper's
   test setup blew up.

Fix:
- Move workspace projects into the root `vitest.config.ts` under the
  new `projects` array (the vitest 4 replacement; sketched after this
  list).
- Add a proper `src/web/vitest.config.ts` (vitest 4 doesn't auto-pick
  up vite.config.ts as a test config in workspace mode, even though
  per-package `pnpm --filter` does).
- Exclude `src/web/tests/**` from the root-level include so we don't
  double-run them under the wrong env.
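
A sketch of the replacement, assuming vitest 4's `projects` field under
`test` (the paths are illustrative):

  // root vitest.config.ts (excerpt)
  import { defineConfig } from 'vitest/config';

  export default defineConfig({
    test: {
      // Replaces the deprecated vitest.workspace.ts; each entry points at a
      // per-package config so cli/mcpd/mcplocal/web keep their own environments.
      projects: [
        'src/cli/vitest.config.ts',
        'src/mcpd/vitest.config.ts',
        'src/mcplocal/vitest.config.ts',
        'src/web/vitest.config.ts',
      ],
    },
  });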

After: `pnpm test:run` runs 1999/1999 across 149 files (was 1992/1996
with 4 web failures). Per-package runs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:31:27 +01:00
c0ba0a9040 feat: web prompt editor + agent personalities (#58)
2026-04-26 20:21:53 +00:00
Michal
4cbf58d212 feat(mcpd+deploy): serve web UI at /ui + smoke tests + docs (Stage 6)
The closing stage. mcpd now hosts the Stage 5 SPA, the Docker image
bundles the build artifact, a smoke test exercises the personality
HTTP surface end-to-end, and the user-facing docs spell out the
mental model.

mcpd:
- Add @fastify/static dep.
- New routes/web-ui.ts: registers /ui/* against a static bundle. Looks
  for the bundle at $MCPD_WEB_ROOT, then /usr/share/mcpd/web (the
  Docker image path), then a dev-tree fallback. Logs and skips
  cleanly if missing — API-only deploys keep working.
- SPA fallback: any /ui/<path> that doesn't match a file falls through
  to index.html so direct hits to react-router URLs work.
- /ui/* falls through to `kind: skip` in mapUrlToPermission, so the
  static assets are served unauthenticated. Each API call from the
  SPA still carries the bearer token.
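
A sketch of the bundle lookup order; only $MCPD_WEB_ROOT and the Docker path
come from this commit, the dev-tree path and helper name are illustrative:

```typescript
// Bundle resolution in the spirit of routes/web-ui.ts.
import { existsSync } from 'node:fs';
import path from 'node:path';

function resolveWebRoot(): string | null {
  const candidates = [
    process.env.MCPD_WEB_ROOT,                     // explicit override
    '/usr/share/mcpd/web',                         // Docker image path
    path.resolve(process.cwd(), 'src/web/dist'),   // dev-tree fallback (illustrative)
  ].filter((p): p is string => Boolean(p));
  return candidates.find((p) => existsSync(path.join(p, 'index.html'))) ?? null;
}

// In the route plugin: if resolveWebRoot() returns null, log and return without
// registering /ui/*, so API-only deploys keep working.
```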

Deploy:
- Dockerfile.mcpd builds the @mcpctl/web bundle in the same builder
  stage and copies dist/ to /usr/share/mcpd/web in the runtime image.

Smoke (personality.smoke.test.ts):
- Live mcpd flow: create secret/llm/agent/personality, attach an
  agent-direct prompt, verify the binding listing, reject double-
  attach (409) + foreign-agent prompt (400), set defaultPersonality
  by name, detach + delete cleanup.

Docs:
- New docs/personalities.md: VLAN-on-ethernet model, system-block
  ordering table, three prompt scopes, CLI walkthrough, web UI
  walkthrough, full API surface, RBAC notes.
- agents.md and chat.md cross-link.
- README's Agents section gains a Personalities subsection.

Test count after Stage 6:
  mcpd:   801/801      cli:  430/430
  web:    7/7          db:   58/62 (4 pre-existing failures)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:48:43 +01:00
Michal
0010cc18b7 feat(web): browser-based prompt + personality editor (Stage 5)
New workspace package @mcpctl/web — a Vite + React 19 SPA that talks
to mcpd's existing HTTP API. Bundles to a static dist/ which Stage 6
will bake into the RPM and serve from mcpd at /ui via @fastify/static.

Pages:
  /ui/projects                       list projects
  /ui/projects/:name/prompts         CRUD project prompts (Monaco editor)
  /ui/agents                         list agents
  /ui/agents/:name                   tabs: Direct prompts | Personalities
  /ui/personalities/:id              bind/unbind prompts to a personality

Auth: paste a session token (mcpctl auth login) or PAT (mcpctl_pat_*)
once on a login screen, kept in localStorage; logout clears it.

API client: 60-line fetch wrapper, attaches the bearer header from
storage, throws an ApiError with status + parsed body on non-2xx.
A 200-line useFetch hook provides loading/error/data without a
state-management library — we are not building Notion.
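
A condensed sketch of that wrapper; the ApiError fields match the description
above, while the storage key is an assumption:

```typescript
// Fetch wrapper sketch: bearer header from storage, ApiError on non-2xx.
export class ApiError extends Error {
  constructor(public status: number, public body: unknown) {
    super(`API error ${status}`);
  }
}

export async function api<T>(path: string, init: RequestInit = {}): Promise<T> {
  const token = localStorage.getItem('mcpctl.token'); // key name is illustrative
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (token) headers.Authorization = `Bearer ${token}`;
  const res = await fetch(path, { ...init, headers });
  if (!res.ok) throw new ApiError(res.status, await res.json().catch(() => null));
  return res.status === 204 ? (undefined as T) : ((await res.json()) as T);
}
```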

UX:
  - Dark terminal-adjacent theme so the page feels like the CLI.
  - Monaco @monaco-editor/react for prompt content (markdown mode,
    word-wrap, search, multi-cursor).
  - Personality detail's "attach prompt" picker filters in-scope
    candidates: agent-direct + same-project + globals.

Dev loop:  pnpm --filter @mcpctl/web dev   (vite at :5173, proxies
  /api to https://mcpctl.ad.itaz.eu — override with MCPCTL_API_URL).
Build:     pnpm --filter @mcpctl/web build → src/web/dist/.
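
The dev proxy in sketch form, assuming a stock Vite + React plugin setup; only
the default target and the MCPCTL_API_URL override come from the text above:

```typescript
// vite.config.ts sketch for the dev loop described above.
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/api': {
        target: process.env.MCPCTL_API_URL ?? 'https://mcpctl.ad.itaz.eu',
        changeOrigin: true,
      },
    },
  },
});
```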

Tests: 7 vitest cases covering the bearer header / 4xx body / 204
no-content path on the api wrapper, and the login storage round-trip
+ help toggle. Production build green: 269 KB JS / 84 KB gzipped.
Typecheck clean (TS strict + exactOptionalPropertyTypes carried over).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:41:57 +01:00
Michal
9050918a83 feat(cli): personality flag + create/get/edit/delete personalities (Stage 4)
End-to-end CLI surface for the personality overlay:

  mcpctl create personality grumpy --agent reviewer --description "be terse"
  mcpctl create prompt tone --agent reviewer --content "Be very terse."
  mcpctl get personalities
  mcpctl get personalities --agent reviewer
  mcpctl edit personality <id>
  mcpctl delete personality grumpy --agent reviewer
  mcpctl chat reviewer --personality grumpy

Chat banner gains a "Personality:" line that shows either the active
flag value or the agent's `defaultPersonality` (when no flag given),
so the user knows which overlay is in effect before sending a message.

`--personality` is stripped from `/save` (it's a per-turn override,
not a `defaultParams` field — the agent's defaultPersonality lives on
its own column and is set via PUT /agents).

Backend (small additions to land Stage 4 cleanly):
- `GET /api/v1/personalities[?agent=name]` so `mcpctl get
  personalities` doesn't require an agent filter.
- PersonalityService.listAll() aggregates across agents.

Completions: regenerated fish + bash. `personalities` added as a
canonical resource with `personality` alias; edit-resource list
extended; the per-resource argument completers pick up the new
type automatically.

CLI suite: 430/430. mcpd: 801/801. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:32:48 +01:00
Michal
faef1e732d feat(mcpd): personality routes + chat system block overlay (Stage 3)
End-to-end backend wiring for the agents-feature evolution. After
this stage you can curl all the endpoints; CLI + Web UI follow.

Routes (new):
  GET    /api/v1/agents/:agentName/personalities
  POST   /api/v1/agents/:agentName/personalities
  GET    /api/v1/personalities/:id
  PUT    /api/v1/personalities/:id
  DELETE /api/v1/personalities/:id
  GET    /api/v1/personalities/:id/prompts
  POST   /api/v1/personalities/:id/prompts
  DELETE /api/v1/personalities/:id/prompts/:promptId
  GET    /api/v1/agents/:agentName/prompts            (agent-direct)

Routes (extended):
  POST /api/v1/prompts now resolves `agent: <name>` like `project: <name>`
  POST /api/v1/agents/:name/chat accepts `personality: <name>`

RBAC: `personalities` segment maps to the `agents` resource so
view/edit/create/delete on the parent agent governs personality access.
No new RBAC roles — piggybacking keeps the surface flat.

System block (chat.service.ts):
  agent.systemPrompt
  + agent-direct prompts (Prompt.agentId === agent.id, priority desc)
  + project prompts        (existing behavior, priority desc)
  + personality prompts    (PersonalityPrompt[chosen], priority desc)
  + systemAppend

Personality is selected by request body `personality: <name>`, falling
back to `agent.defaultPersonalityId` if unset. A typo'd flag throws
404 rather than silently dropping back to no overlay — failing loudly
on misconfiguration is the only way users learn it didn't apply.
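
A sketch of that assembly order; buildSystemBlock, the row shape, and the join
separator are illustrative, not the actual chat.service internals:

```typescript
// System-block assembly in the order listed above.
interface PromptRow { content: string; priority: number }

const byPriorityDesc = (a: PromptRow, b: PromptRow) => b.priority - a.priority;

function buildSystemBlock(opts: {
  systemPrompt: string;
  agentPrompts: PromptRow[];        // Prompt.agentId === agent.id
  projectPrompts: PromptRow[];      // existing behavior
  personalityPrompts: PromptRow[];  // bindings of the chosen personality
  systemAppend?: string;
}): string {
  const parts = [
    opts.systemPrompt,
    ...[...opts.agentPrompts].sort(byPriorityDesc).map((p) => p.content),
    ...[...opts.projectPrompts].sort(byPriorityDesc).map((p) => p.content),
    ...[...opts.personalityPrompts].sort(byPriorityDesc).map((p) => p.content),
    opts.systemAppend ?? '',
  ];
  return parts.filter(Boolean).join('\n\n');
}
```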

Backwards-compatible by construction: when no agent-direct prompts
exist and no personality is selected, the resulting block is byte-
identical to the old layout (verified by a regression test).

Tests: 5 new chat-service.test cases cover ordering, default-
personality fallback, missing-personality 404, and the regression
guard. mcpd suite: 801/801 (was 796). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:27:59 +01:00
Michal
6b5bd78cfa feat(mcpd): personality + prompt-by-agent repos and services (Stage 2)
Wires the schema landed in Stage 1 into the service layer. No HTTP
routes yet — Stage 3 will register `/api/v1/...` endpoints and update
chat.service to read agent-direct + personality prompts when building
the system block.

Repositories:
- PersonalityRepository: CRUD + listPrompts/attach/detach bindings.
- PromptRepository: findByAgent + findByNameAndAgent; create/update
  accept the new agentId column. findGlobal now also filters
  agentId=null so agent-direct prompts don't leak into global lists.
- AgentRepository: defaultPersonalityId on create + connect/disconnect
  in update.

Services:
- PersonalityService: CRUD scoped per agent, plus attach/detach with
  scope enforcement — a prompt may bind only if it's agent-direct on
  the same agent, in the agent's project, or global. Foreign-project
  / foreign-agent attachments are rejected with 400.
- PromptService: createPrompt / upsertByName accept agentId and
  resolve `agent: <name>`, with XOR-with-project guard. Adds
  listPromptsForAgent.
- AgentService: defaultPersonality (by name on the agent's own
  personality set) round-trips through update + AgentView.

Validation:
- prompt.schema.ts: refine() rejects projectId+agentId together.
- personality.schema.ts: new Create/Update/AttachPrompt schemas.
- agent.schema.ts: defaultPersonality { name } | null on update.
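
The XOR guard in prompt.schema.ts could look roughly like this zod sketch
(field names beyond projectId/agentId are assumptions):

```typescript
// Reject prompts that try to be both project-scoped and agent-direct.
import { z } from 'zod';

export const createPromptSchema = z
  .object({
    name: z.string().min(1),
    content: z.string().min(1),
    projectId: z.string().nullable().optional(),
    agentId: z.string().nullable().optional(),
  })
  .refine((p) => !(p.projectId && p.agentId), {
    message: 'projectId and agentId are mutually exclusive',
  });
```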

Tests: 12 PersonalityService + 7 PromptService agent-scope tests
covering happy paths, XOR/scope enforcement, double-attach guard,
detach-not-bound. mcpd suite: 796/796 (was 777). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:20:51 +01:00
Michal
f60f00f1fd feat(db): add personalities + agent-direct prompts schema (Stage 1)
A Personality is a named overlay on top of an Agent — same agent,
same LLM, but a different bundle of prompts injected into the system
block at chat time. VLAN-on-ethernet semantics: ethernet still works
without VLAN; with a VLAN tag, frames are segmented but still ethernet.

Schema additions:
- Prompt.agentId (nullable FK + index, cascade on delete) so prompts
  can attach directly to an agent without going through a project.
- Personality { id, name, description, agentId, priority } with
  unique (name, agentId).
- PersonalityPrompt join table with per-binding priority override.
- Agent.defaultPersonalityId (SetNull on delete) so an agent can pick
  one personality as the default when no --personality flag is passed.

Backwards-compatible by construction: every new column is nullable;
existing rows are valid as-is; the chat.service systemBlock changes
land in Stage 3.

8 new prisma-level assertions in agent-schema.test.ts cover unique
constraints, cascade behavior, the SetNull on defaultPersonalityId,
and shared-prompt-across-personalities. All 16 db tests pass; mcpd
typecheck + 777 mcpd unit tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:12:22 +01:00
9389ffff3c feat(agents+chat): agents feature + live chat UX (#57)
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / test (push) Successful in 1m6s
CI/CD / typecheck (push) Successful in 2m17s
CI/CD / smoke (push) Failing after 1m38s
CI/CD / build (push) Successful in 2m35s
CI/CD / publish (push) Has been skipped
2026-04-26 17:53:27 +00:00
Michal
21f406037a feat(chat): print agent + system prompt banner at chat start
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m5s
CI/CD / lint (pull_request) Successful in 2m29s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 5m30s
CI/CD / publish (pull_request) Has been skipped
When you launch `mcpctl chat <agent>` it's not always obvious which
agent, LLM, project, or system prompt you're actually wired to,
especially when --system / --system-append flags are layered on top
of the agent's defaults. The session would just start at `> ` with
no confirmation of the configuration.

Now both REPL and one-shot modes print a banner to stderr listing:
  - agent name + description
  - LLM + project (if attached)
  - effective system prompt (or --system override) and any
    --system-append addendum, indented for readability
  - active sampling overrides (temperature, top_p, etc.)

Goes through stderr so `mcpctl chat ... -m "hi" 2>/dev/null` keeps
piping clean. Best-effort: a metadata fetch failure logs and lets
the chat proceed rather than blocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:37:06 +01:00
Michal
ae54210a52 fix(chat): pin live tokens/sec ticker to a bottom-row status bar
The previous ticker used cursor save/restore (\x1b[s / \x1b[u) to draw
a stats line one row below the cursor. Save/restore is unreliable when
content scrolls or wraps — the saved row drifts off the visible area
and the restore lands inside content lines, smearing the ticker into
mid-word positions:

  Here are the available tools you can
  ⏵ 7w · 56.5 w/s · 0.1s | thinking 41 use with Docmost:6s

Replace it with a DECSTBM scroll region. Lock the bottom row, scroll
rows 1..N-1 for content, redraw the locked row in place every 250 ms.
This is how htop / tig / mosh status pin their footers — content and
status physically can't overlap.
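
The escape sequences involved, in sketch form; the real StatusBar also covers
the lifecycle and resize handling described below:

```typescript
// DECSTBM status-bar sketch: scrolling confined to rows 1..N-1, bottom row pinned.
function installStatusRegion(out: NodeJS.WriteStream) {
  const rows = out.rows ?? 24;
  out.write(`\x1b[1;${rows - 1}r`); // DECSTBM: restrict scrolling above the last row
  return {
    draw(text: string) {
      // Save/restore here is only around an instantaneous redraw, not across
      // scrolling output, so it doesn't drift like the old floating ticker.
      out.write('\x1b7');                         // DECSC: save cursor
      out.write(`\x1b[${rows};1H\x1b[2K${text}`); // jump to bottom row, clear, redraw
      out.write('\x1b8');                         // DECRC: restore cursor
    },
    teardown() {
      out.write('\x1b[r'); // reset scroll region to the full screen
    },
  };
}
```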

Lifecycle: install once per chat-session (REPL or one-shot), tear down
on close / Ctrl-D / /quit / SIGINT / SIGTERM / uncaughtException. Pipes
and small terminals (<5 rows) get a no-op StatusBar so output stays
clean. Resize re-emits the scroll region with the new height.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:49:26 +01:00
Michal
cc9822d38b feat(chat): live tokens/sec ticker + final stats footer
While streaming, the REPL now shows a live word/sec counter on a status
line one row below the cursor — refreshes every 250ms via ANSI cursor
save+restore so it floats with the content as the response grows.
After each response, a dim stats footer prints on stderr:

  (47w · 12.3 w/s · 3.9s | thinking 234w · 38 w/s · 6.2s)

The ticker is stderr-only and only emits when stderr is a TTY — pipes
to a file stay clean for grepping/redirect. Words are whitespace-
separated tokens (good enough across English/code/Markdown without a
tokenizer dependency; CJK under-counts but the rate is still
directional).

Both phases tracked separately:
  - thinking: reasoning_content from qwen3-thinking / deepseek-reasoner
    / o1, where the model's scratchpad is the long part
  - content: the actual assistant answer
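
A sketch of the counting heuristic and per-phase tracking (names are
illustrative):

```typescript
// Whitespace-token word counting, one accumulator per phase.
const countWords = (s: string): number => s.split(/\s+/).filter(Boolean).length;

class PhaseStats {
  private buf = '';
  private readonly start = Date.now();
  add(delta: string) { this.buf += delta; }      // feed thinking or content deltas
  get words() { return countWords(this.buf); }
  get seconds() { return (Date.now() - this.start) / 1000; }
  get wps() { return this.seconds > 0 ? this.words / this.seconds : 0; }
}
```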

Final stats also added to the --no-stream path: total HTTP duration
and word count, since we don't get per-token timing there.

CLI suite still 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:15:26 +01:00
Michal
7cfa449465 feat(chat): surface reasoning_content as thinking chunks; fix --no-stream timeout
Reasoning models (qwen3-thinking, deepseek-reasoner, OpenAI o1 family) emit
their scratchpad as `delta.reasoning_content` (or `delta.reasoning`,
or `delta.provider_specific_fields.reasoning_content` when LiteLLM passes
through from vLLM) — separate from `delta.content`. Before this commit
mcpd's parseStreamingChunk only watched `content`, so the model's 30-90s
reasoning phase looked like dead air to the REPL: streaming connection
open, no chunks, no progress. Caught during the agents-feature shakedown
when qwen3-thinking sat silent for 90s on a docmost__list_pages call.

mcpd
====
chat.service.ts
  - parseStreamingChunk extracts a `reasoningDelta` from the chunk body,
    accepting all four spellings (reasoning_content / reasoning /
    provider_specific_fields.{reasoning_content,reasoning}). Future
    providers can add their own field names by extending the
    fallback chain.
  - chatStream yields `{ type: 'thinking', delta }` chunks as reasoning
    arrives, alongside the existing `{ type: 'text', delta }` for content.
  - Reasoning is intentionally NOT persisted to the thread. It's the
    model's scratchpad, not part of the conversation. Subsequent turns
    don't see it.
  - Adds 'thinking' to the ChatStreamChunk.type union.
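
The fallback chain in sketch form; the delta shape is a minimal subset of what
parseStreamingChunk actually sees:

```typescript
// Extract the reasoning delta across the four spellings listed above.
interface StreamDelta {
  content?: string;
  reasoning_content?: string;
  reasoning?: string;
  provider_specific_fields?: { reasoning_content?: string; reasoning?: string };
}

function extractReasoningDelta(delta: StreamDelta): string | undefined {
  return (
    delta.reasoning_content ??
    delta.reasoning ??
    delta.provider_specific_fields?.reasoning_content ??
    delta.provider_specific_fields?.reasoning
  );
}

// chatStream then yields { type: 'thinking', delta } for reasoning and
// { type: 'text', delta } for content; reasoning is never persisted to the thread.
```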

CLI
===
chat.ts
  - streamOnce handles 'thinking' chunks: writes them dim+italic to
    stderr (ANSI 2;3m) so the model's reasoning visually flows like a
    quote block while the final answer streams to stdout. Plain text
    when stderr isn't a TTY (pipe to file → no escape codes leak).
  - chatRequestNonStream replaces the shared ApiClient.post() for the
    --no-stream path. ApiClient defaults to a 10s timeout, way too tight
    for any chat that calls a tool: LLM round + tool dispatch + LLM
    summary easily exceeds 10s. The new helper uses the same 600s timeout
    the streaming path has been using all along.

Tests:
  chat-service.test.ts (+2):
    - reasoning_content deltas surface as `thinking` chunks (not text);
      reasoning is NOT persisted to the assistant turn's content.
    - LiteLLM's provider_specific_fields.reasoning_content shape parses
      identically to the vendor-native shape.

mcpd 777/777, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:04:01 +01:00
Michal
cc225eb70f feat(llm): probe upstream auth at registration time
mcpd now runs a cheap auth probe whenever an Llm is created (or its
apiKeyRef/url is updated). Catches misconfigured tokens / wrong URLs at
registration with a 422 + structured error message, instead of silently
500-ing on first chat with a generic "fetch failed". Caught in the wild
today: the homelab Pulumi config exposed `MCPCTL_GATEWAY_TOKEN` (which
is mcpctl_pat_-prefixed, intended for LiteLLM→mcplocal direction) where
LiteLLM expects `LITELLM_MASTER_KEY` (sk-prefixed). The probe makes
this immediate.

Probe shape (LlmAdapter.verifyAuth):
  - OpenAI passthrough → GET <url>/v1/models. Cheap, idempotent, gated
    by the same auth as chat/completions.
  - Anthropic → POST /v1/messages with max_tokens:1, "ping". Anthropic
    has no list-models endpoint; this is the cheapest auth-exercising
    call.
  - Returns one of:
      { ok: true }
      { ok: false, reason: "auth", status, body }    — 401/403, fail hard
      { ok: false, reason: "unreachable", error }    — network, warn-only
      { ok: false, reason: "unexpected", status, body } — non-auth 4xx, warn-only
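
In sketch form, the result union plus the OpenAI-passthrough branch (the
Anthropic branch is elided; the helper name is illustrative):

```typescript
// verifyAuth-style probe: cheap GET that exercises the same auth as chat/completions.
type VerifyAuthResult =
  | { ok: true }
  | { ok: false; reason: 'auth'; status: number; body: string }
  | { ok: false; reason: 'unreachable'; error: string }
  | { ok: false; reason: 'unexpected'; status: number; body: string };

async function verifyOpenAiAuth(url: string, apiKey: string): Promise<VerifyAuthResult> {
  try {
    const res = await fetch(`${url}/v1/models`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (res.ok) return { ok: true };
    const body = await res.text();
    if (res.status === 401 || res.status === 403) {
      return { ok: false, reason: 'auth', status: res.status, body }; // fail hard
    }
    return { ok: false, reason: 'unexpected', status: res.status, body }; // warn-only
  } catch (err) {
    return { ok: false, reason: 'unreachable', error: String(err) };      // warn-only
  }
}
```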

Behavior:
  - LlmService.create()/update() runs the probe after resolveApiKey.
    Throws LlmAuthVerificationError on `auth`, logs warn for
    unreachable/unexpected, swallows for offline registration.
  - Probe is skipped when there's no apiKeyRef (nothing to verify) or
    when the caller passes skipAuthCheck=true.
  - update() probes only when apiKeyRef OR url changes — pure
    description/tier updates don't trigger upstream calls.
  - Routes catch LlmAuthVerificationError and return 422 with
    `{ error, status }`. The CLI surfaces the message verbatim via
    ApiError.

Opt-out:
  - CLI: `mcpctl create llm ... --skip-auth-check` for offline
    registration before the upstream is reachable.
  - HTTP: side-channel body field `_skipAuthCheck: true` (stripped
    before validation, never persisted on the row).

Side fix in same commit (caught while testing): src/cli/src/index.ts
read `program.opts()` BEFORE `program.parse()`, so `--direct` was a
no-op for ApiClient — every command went to mcplocal regardless. Some
commands accidentally still worked because mcplocal forwards plain
`/api/v1/*` to mcpd, but flows that need direct SSE streaming (e.g.
`mcpctl chat`) couldn't reach mcpd. Fixed by peeking at process.argv
directly for the two global flags before Commander's parse runs.
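
A sketch of the argv peek; only --direct is named because the commit calls it
out, and the helper name is illustrative:

```typescript
// Read a global flag straight from argv before Commander has parsed anything.
function hasGlobalFlag(flag: string, argv: string[] = process.argv.slice(2)): boolean {
  return argv.includes(flag);
}

// Used when constructing ApiClient, so the flag takes effect even though
// program.parse() hasn't run yet:
//   const direct = hasGlobalFlag('--direct');
//   const client = new ApiClient({ direct });   // illustrative constructor
//   program.parse(process.argv);                // Commander parses afterwards
```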

Tests:
  - llm-adapters.test.ts (+8): OpenAI 200/401/403/404/network, Anthropic
    200/401/400 (typo'd model = unexpected, NOT auth — registration
    shouldn't block on bad model names that surface at chat time).
  - llm-service.test.ts (+6): create-throws-on-auth-fail (no row
    written), warn-only on unreachable/unexpected, skipAuthCheck
    bypass, no-key skip, update-only-probes-on-auth-affecting-change.

mcpd 775/775, mcplocal 715/715, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:51:55 +01:00