256e117021db085703b9c260c9609ce945f735fa
336 Commits

256e117021  Merge pull request 'feat: v4 LB pools by shared poolName' (#69) from feat/llm-pool-by-name into main
Reviewed-on: #69

137711fdf6  feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)

Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:
1. /api/v1/llms/<name>/members surfaces both with the correct
effective pool key, size, activeCount, and per-member kind/status.
2. Chat through an agent pinned to one pool member dispatches across
the pool — verified by running 12 calls and asserting at least
one response from each backend (the random-shuffle selection
would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
3. Failover: stop one publisher, the surviving member still serves
chat. /members shows the stopped row as inactive immediately
(unbindSession runs synchronously on SSE close).
docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.
Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
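
For flavor, the dispatch assertion in point 2 could look roughly like this. A hedged vitest sketch: `chatOnce` and the backend markers are hypothetical stand-ins, not the actual smoke code.

```ts
import { expect } from 'vitest';

// Hypothetical helper: one chat turn through the agent pinned to the pool.
declare function chatOnce(agent: string, message: string): Promise<string>;

const seen = new Set<string>();
for (let i = 0; i < 12; i++) {
  const reply = await chatOnce('pool-agent', 'which backend answered?');
  if (reply.includes('backend-A')) seen.add('A');
  if (reply.includes('backend-B')) seen.add('B');
}
// Random selection hitting only one member 12 times in a row has
// probability 2 * (1/2)^12 = 1/2048, hence the "~1/2048" bound above.
expect(seen).toEqual(new Set(['A', 'B']));
```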

e21f96080d  feat(mcpd+cli+mcplocal): /llms/&lt;name&gt;/members + POOL column + --pool-name (v4 Stage 2)

Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` — same
  as the single-Llm route. Members are full LlmView shapes so callers
  don't need a second roundtrip to render the pool block.
- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates the
  same character set as CreateLlmSchema.poolName. Backwards compatible —
  older mcplocals that don't send the field continue to publish solo Llms.
- CLI `get llm` table: new POOL column right after NAME. Solo rows show
  "-" so the "no pool / pool of 1" case is unambiguous (per user
  direction: "make sure we see it, prominently visible and impossible to
  mistake").
- CLI `describe llm`: fetches /members and renders a Pool block at the
  top of the detail view when the row is in an explicit pool OR when its
  implicit pool has size > 1. Each member line shows kind/status; the
  anchor row gets "← this row". The block is suppressed for solo rows so
  describe stays compact in the common case.
- CLI `create llm --pool-name <name>` flag and apply schema both accept
  the new field. Yaml round-trip preserves it: get -o yaml emits
  `poolName: <name>`, apply -f re-imports it without diff. Verified
  end-to-end against the live mcpd.
- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts and
  registrar.ts thread it through into the register payload. Use case for
  distributed inference: each user's mcplocal picks a unique `name`
  (e.g. `vllm-<host>-qwen3`) but a shared `poolName` (e.g.
  `user-vllm-qwen3-thinking`); agents see one logical pool that
  auto-grows as workers come online.
- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in fish +
  bash for `mcpctl create llm`.

Tests: +3 new mcpd route tests for /members (explicit pool, solo pool of
1, missing-anchor 404). All suites green: mcpd 868/868 (was 865, +3),
mcplocal 723/723, cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name + docs.
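A plausible reading of the /members response as a TypeScript shape, inferred from this commit's wording; not the authoritative contract.

```ts
// Inferred, non-authoritative sketch of GET /api/v1/llms/:name/members.
interface LlmView {
  name: string;
  kind: 'public' | 'virtual';
  status: 'active' | 'inactive' | 'hibernating';
}

interface PoolMembersResponse {
  pool: string;        // effective pool key: poolName ?? name
  explicit: boolean;   // whether the anchor row declares poolName itself
  size: number;        // members sharing the key
  activeCount: number; // members currently active
  members: LlmView[];  // full LlmView rows, so no second roundtrip needed
}
```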

7949e1393d  feat(mcpd+db): Llm.poolName + chat dispatcher pool failover (v4 Stage 1)

Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time. Effective pool key = `poolName ?? name`. Solo
rows (poolName=null) are addressable as a "pool of 1" via their own
name, so existing single-Llm agents and YAMLs keep working unchanged.
A solo row whose name happens to match an explicit poolName joins the
same pool — by design — so an operator can transparently promote an
existing Llm to a pool seed.

Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.

When the agent's pinned Llm is itself inactive but a sibling pool member
is up, dispatch transparently uses the sibling — that's the whole point.
When every member is inactive, prepareContext throws a clear "No active
Llm in pool '<key>' (pinned: <name>)" error rather than letting the
dispatcher's "exhausted" branch surface it.

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool edge
  case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe block
+ --pool-name flag, yaml round-trip.
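The dispatcher loop described above, reduced to a hedged sketch; helper names are stand-ins, and the real runOneInference lives in chat.service.

```ts
interface Candidate { name: string; status: string }
declare function callBackend(llm: Candidate, body: unknown): Promise<unknown>;

// Candidates = the randomly-shuffled viable pool members resolved once
// per turn by prepareContext (status != inactive).
async function runOneInferenceSketch(candidates: Candidate[], body: unknown) {
  let lastErr: unknown;
  for (const llm of candidates) {
    try {
      // Transport-level failures throw and we fall through to a sibling;
      // auth/4xx results come back as result.status and are not retried.
      return await callBackend(llm, body);
    } catch (err) {
      lastErr = err;
    }
  }
  throw new Error(`pool exhausted: ${String(lastErr)}`);
}
```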

c0b4dc89f3  Merge pull request 'chore: fulldeploy uses bao-backed pulumi wrapper for drift check' (#68) from chore/fulldeploy-pulumi-wrapper into main
Reviewed-on: #68

7f49294b36  chore(fulldeploy): use kubernetes-deployment/scripts/pulumi.sh wrapper

The pre-flight drift check now calls the bao-backed pulumi wrapper that
landed with the litellm key persistence work, so deploys no longer need
PULUMI_CONFIG_PASSPHRASE in .env or the shell env. The wrapper fetches
the passphrase from OpenBao at runtime and exec-passes it to pulumi
only — it never touches the parent shell's state. If the wrapper isn't
present (an older clone of kubernetes-deployment), the script falls back
to a clear warning instead of silently skipping the check.

f5bdeea8e7  Merge pull request 'feat: virtual agents v3 (Stages 1-3) + real fixes for chat/adapter/CLI thread format' (#67) from feat/virtual-agent-v3 into main
Reviewed-on: #67

1998b733b2  feat(cli+docs): mcpctl get agent KIND/STATUS columns + virtual-agent smoke + docs (v3 Stage 4)

CLI: `mcpctl get agent` table view gains KIND and STATUS columns
mirroring the `get llm` shape from v1. Public agents render as
`public/active` (the AgentRow defaults) and virtual ones surface their
true lifecycle state, so `mcpctl get agent` becomes a single-pane view
for both manually-created and mcplocal-published personas.

Smoke: tests/smoke/virtual-agent.smoke.test.ts mirrors virtual-llm's
in-process registrar pattern — publishes a fake provider + agent in one
round-trip, confirms mcpd surfaces the agent kind=virtual /
status=active under /api/v1/agents, then disconnects and verifies the
paired Llm-and-Agent both flip to inactive (deletion is GC-driven, not
disconnect-driven, so the rows must still exist post-stop). Heartbeat-
stale and 4 h sweep paths are covered by the unit suite to keep smoke
duration in check.

Docs: docs/virtual-llms.md gets a "Virtual agents (v3)" section with a
config sample, lifecycle notes, a listing example, and the cluster-wide
name-uniqueness caveat. The API surface block now mentions the new
`agents[]` field on _provider-register, the join-by-session heartbeat
behavior, and the `GET /api/v1/agents` lifecycle fields. docs/agents.md
gains a one-paragraph note pointing to the v3 publishing path.

Tests: full smoke suite 141/141 (was 139, +2 new), unit suites unchanged
(mcpd 860/860, mcplocal 723/723).

610808b9e7  fix(chat): real fixes for thinking-model + URL conventions, not test tweaks

Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the test
was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize. `/v1beta`
   (Anthropic-style) is preserved.
2. Non-streaming chat returned an empty assistant message when thinking
   models (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming fix:
   the chatStream loop accumulates reasoning and yields ONE synthesized
   `text` frame at the end when content stayed empty, keeping the CLI's
   stdout (which only prints `text` deltas) in sync with the persisted
   thread message.
3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3 lifecycle
   field) instead of `kind: agent` (apply envelope), so round-tripping
   through `apply -f` failed. Same fix shape as the v1 Llm strip in
   toApplyDocs — drop kind/status/lastHeartbeatAt/inactiveSince/
   providerSessionId for the agents resource too.
4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests and
   any other regex watching for one form missed the other. Standardize
   on `thread: <cuid>` (single space) in both paths.
5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working.

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd +
mcplocal + smoke green: 860/860 + 723/723 + 139/139.
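The canonicalization in fix 1, as a standalone sketch assuming the described behavior; not the adapter's verbatim code.

```ts
// Strip one conventional trailing "/v1" before appending the fixed path.
// The regex is anchored at the end of the string, so "/v1beta"
// (Anthropic-style) is not stripped.
function endpointUrl(base: string): string {
  const trimmed = base.replace(/\/+$/, '').replace(/\/v1$/, '');
  return `${trimmed}/v1/chat/completions`;
}

endpointUrl('https://api.openai.com/v1'); // → https://api.openai.com/v1/chat/completions
endpointUrl('https://llm.example.com');   // → https://llm.example.com/v1/chat/completions
```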

58bc277242  feat(mcpd+mcplocal): register-agents endpoint + mcplocal agents block (v3 Stage 3)

Extends the existing `_provider-register` payload with an optional
`agents` array so a single round-trip atomically publishes both virtual
Llms and their pinned virtual Agents. v1/v2 publishers (providers-only)
keep working unchanged — the agents path is gated on the route receiving
an AgentService instance; otherwise it logs a warning and ignores the
array.

mcplocal config gains a top-level `agents` block (loadLocalAgents)
mirroring the providers shape. The registrar reads it, builds
RegistrarPublishedAgent entries against the published provider names,
and folds them into the same register POST. mcpd routes the agents
through AgentService.registerVirtualAgents(sessionId, ..., ownerId),
which was added in Stage 2.

No CLI changes here — `mcpctl chat <virtual-agent>` already works once
chat.service has the kind=virtual branch (Stage 1) and the agents are
present in the Agent table. CLI columns + smoke land in Stage 4.
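Read as a payload shape, the extended register round-trip might look like this; field names are inferred from the commit text and hypothetical, not the real schema.

```ts
interface RegisterProviderInput {
  name: string;
  model: string;
}

interface ProviderRegisterBody {
  providers: RegisterProviderInput[]; // virtual Llms, as in v1/v2
  // New in v3 Stage 3; optional, so providers-only publishers still validate.
  agents?: Array<{
    name: string;         // cluster-unique virtual agent name
    llm: string;          // pinned provider name published in the same POST
    systemPrompt?: string;
  }>;
}
```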

c7b1bd8e2c  feat(mcpd): AgentService virtual methods + GC cascade (v3 Stage 2)

State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind,
  providerSessionId, status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals,
  findExpiredInactives.

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals from
  a foreign session can be adopted (sticky reconnect). Refuses to
  overwrite a public agent or a foreign session's still-active virtual
  (HTTP 409). Pinned LLM is resolved via LlmService — caller posts Llms
  first.
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the agent
  sweep FIRST (so any agent that would block an Llm delete via Restrict
  is already gone), and adds a defensive deleteVirtualAgentsForLlm step
  right before each Llm delete in case an agent's heartbeat lagged its
  Llm's just enough to escape this round's 4h cutoff.

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip +
delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when an agent's heartbeat lagged). mcpd suite: 854/854 (was 841 + 13
new). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

9afd24a3aa  feat(db+mcpd): Agent lifecycle + chat.service kind=virtual branch (v3 Stage 1)

Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus
  enums; no new types. Existing rows backfill kind=public/status=active
  so v1 CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups. Total
  agent-schema tests: 20/20.

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch on
  ctx.llmKind: 'public' goes through the existing adapter registry,
  'virtual' relays through VirtualLlmService.enqueueInferTask (mirrors
  the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an async
  iterator via a small queue + wake pattern (see the sketch after this
  commit).
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Before this stage, those agents 502'd against the
empty url field.

Tests: 4 new chat-service-virtual-llm.test.ts cases cover the relay path
non-streaming, streaming, missing-dispatcher error, and rejection
surfacing. mcpd suite: 841/841 (was 833, +8 across Stage 1 + v3
Stage 1). Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
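The queue + wake bridge mentioned above, as a generic self-contained sketch; not the actual mcpd code.

```ts
// Adapt a callback-based chunk API into an async iterator: chunks are
// queued as they arrive; the consumer parks on a promise and is woken
// by the next callback. `null` is this sketch's end-of-stream sentinel.
function bridgeChunks<T>(
  subscribe: (onChunk: (chunk: T | null) => void) => void,
): AsyncGenerator<T> {
  const queue: (T | null)[] = [];
  let wake: (() => void) | undefined;
  subscribe((chunk) => {
    queue.push(chunk);
    wake?.();
    wake = undefined;
  });
  return (async function* () {
    for (;;) {
      while (queue.length === 0) await new Promise<void>((r) => (wake = r));
      const next = queue.shift()!;
      if (next === null) return;
      yield next;
    }
  })();
}
```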

9374a2652b  perf: vitest threads pool + Dockerfile pnpm cache mount (#66)

18245be0c1  perf: vitest threads pool + Dockerfile pnpm cache mount

Two tuning knobs that were leaving most of the host idle:

1) vitest.config.ts pool=threads with maxThreads ≈ cores/2. The default
   left this 64-core workstation at ~10% CPU during `pnpm test:run`.
   The threads pool uses the box: the same 152-file/2050-test suite now
   runs at ~700% CPU instead of ~150%. Wall time gain is modest (the
   workload is dominated by a handful of slow individual files that one
   thread must run serially), but the parallel headroom is there for
   when the suite grows. Cap = max(2, cores/2) keeps laptops reasonable;
   override with `VITEST_MAX_THREADS=N` in the env. See the sketch
   below.
2) Dockerfile.mcpd uses BuildKit cache mounts on both pnpm install
   steps. Adds `# syntax=docker/dockerfile:1.6` and a
   `--mount=type=cache,target=/root/.local/share/pnpm/store` so pnpm's
   content-addressed store survives across image rebuilds. Cold rebuilds
   where the lockfile changed are unaffected; warm rebuilds where only
   source changed drop the install step from ~60s to <5s. fulldeploy.sh's
   mcpd image rebuild gets that back, minus the docker push hash
   mismatch.

Test parity: 2050/2050 across 152 files; per-package mcpd 837/837. Both
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
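Knob 1 might look roughly like this in vitest config terms; a sketch consistent with the commit text, not the verbatim file.

```ts
import os from 'node:os';
import { defineConfig } from 'vitest/config';

const cores = os.cpus().length;
// Cap = max(2, cores/2); VITEST_MAX_THREADS overrides.
const maxThreads =
  Number(process.env.VITEST_MAX_THREADS) || Math.max(2, Math.floor(cores / 2));

export default defineConfig({
  test: {
    pool: 'threads',
    poolOptions: { threads: { maxThreads } },
  },
});
```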

45c7737ee1  feat: virtual LLMs v2 (wake-on-demand) (#65)

e0cfe0ba4d  feat: virtual-LLM v2 smoke + docs (v2 Stage 3)

Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.
Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins a tiny in-process HTTP "wake controller"; the published
provider's isAvailable() returns false until the wake POST flips
the bool. Asserts:
1. Provider publishes as kind=virtual / status=hibernating.
2. First inference triggers the wake recipe, the recipe POSTs
to the controller, the provider becomes available, mcpd
relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.
Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
"reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
recipe types (http + command), the wake-then-infer flow diagram,
concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.
Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).
v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

db839afc57  feat(mcpd): wake-before-infer for hibernating virtual LLMs (v2 Stage 2)

Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — the `wakeInFlight` map
dedupes by Llm name.
State machine in enqueueInferTask:
active → push infer task immediately (existing path).
inactive → 503, publisher offline (existing path).
hibernating → ensureAwake() → push infer task (new in v2).
ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
active + bumps lastHeartbeatAt, so all queued + future infers
hit the active path. On non-2xx or service.failTask, the row
stays hibernating (next request retries).
Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
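
The dedupe described above fits in a few lines; a hedged sketch, where runWake stands in for the real private method.

```ts
declare function runWake(llmName: string): Promise<void>;

// Concurrent infers against the same hibernating Llm share one wake task.
const wakeInFlight = new Map<string, Promise<void>>();

function ensureAwake(llmName: string): Promise<void> {
  let pending = wakeInFlight.get(llmName);
  if (!pending) {
    pending = runWake(llmName).finally(() => wakeInFlight.delete(llmName));
    wakeInFlight.set(llmName, pending);
  }
  return pending; // a wake failure rejects every queued infer at once
}
```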

af0fabd84f  feat(mcplocal+mcpd): wake-recipe config + wake-task execution (v2 Stage 1)

First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.
Config (LlmProviderFileEntry):
- New `wake` block on a published provider:
wake:
type: http # or: command
url: ... # http only
method: POST # http only, default POST
headers: {...} # http only
body: ... # http only
command: ... # command only
args: [...] # command only
maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires
Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
returns false report initialStatus=hibernating to mcpd. Without a
wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
spawn), then polls isAvailable() up to maxWaitSeconds, sending a
heartbeat each loop so mcpd's GC sweep doesn't time us out
mid-boot. Reports { ok, ms } on success or { error } on
timeout/recipe failure via the existing _provider-task/:id/result.
- Replaces the v1 stub that rejected wake tasks with "not implemented".
mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
'hibernating'). The register/upsert path uses it for both new and
reconnecting rows. Defaults to 'active' so v1 publishers still
work unchanged.
- Provider-register route's coercer accepts the new field.
Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when wake configured + unavailable, active otherwise,
active when no wake even if unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
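
handleWakeTask's run-recipe-then-poll loop, sketched under the semantics above; helper names are hypothetical stand-ins.

```ts
interface WakeRecipe { maxWaitSeconds?: number }
interface Provider { isAvailable(): Promise<boolean> }
declare function runRecipe(recipe: WakeRecipe): Promise<void>;
declare function sendHeartbeat(): Promise<void>;
declare function sleep(ms: number): Promise<void>;

async function handleWakeTaskSketch(provider: Provider, recipe: WakeRecipe) {
  const started = Date.now();
  await runRecipe(recipe); // HTTP request OR child-process spawn
  const deadline = started + (recipe.maxWaitSeconds ?? 60) * 1000;
  while (Date.now() < deadline) {
    if (await provider.isAvailable()) return { ok: true, ms: Date.now() - started };
    await sendHeartbeat(); // keep mcpd's GC sweep from timing us out mid-boot
    await sleep(2_000);
  }
  return { error: 'wake recipe ran but the provider never became available' };
}
```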

700d1683c2  fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML (#64)

2a44f60785  fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML

The smoke test `llm.smoke > round-trips yaml output → apply -f` failed
after v1 of the virtual-LLM feature: `mcpctl get llm <name> -o yaml`
output now starts with `kind: public` (the new schema column) instead of
`kind: llm` (the apply-doc envelope), because toApplyDocs spread the
cleaned item AFTER setting the kind, so the cleaned item's `kind`
overwrote it.

Fix: in toApplyDocs, when serialising the `llms` resource, drop the new
lifecycle fields (kind, status, lastHeartbeatAt, inactiveSince,
providerSessionId) before merging. They collide with the apply-doc
envelope and aren't apply-able anyway — they're derived runtime state
owned by VirtualLlmService. The public-LLM round-trip is now byte-clean
(those fields default to public/active anyway). Virtual rows are created
by the registrar, not via apply -f, so dropping them on output is the
right call.

CLI suite: 437/437. Smoke will re-run against the live mcpd via
scripts/release.sh after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
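The fix shape, sketched; the field list comes from the commit, but this is not the real toApplyDocs.

```ts
// Drop derived lifecycle fields before merging into the apply-doc
// envelope, so the envelope's `kind: llm` can no longer be overwritten.
function toLlmApplyDocSketch(item: Record<string, unknown>) {
  const { kind, status, lastHeartbeatAt, inactiveSince, providerSessionId, ...rest } = item;
  return { kind: 'llm', ...rest };
}
```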

65b6b265d9  feat: virtual LLMs v1 (registration skeleton) (#63)

866f6abc88  feat: virtual-LLM smoke test + docs (v1 Stage 6)

Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: the row appears with kind=virtual / status=active, infer
  through /api/v1/llms/<name>/infer comes back through the SSE relay
  with the provider's content + finish_reason, and a 503 appears
  immediately after registrar.stop() (publisher offline).
- Timeout / cleanup paths are idempotent so re-runs against the same
  cluster don't litter rows. The 90-s heartbeat-stale flip and 4-h GC
  are unit-tested — too slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs creating a regular Llm
  row, how to opt in via publish: true, the lifecycle table, the
  inference-relay sequence, the v1 streaming caveat, the v2-v5 roadmap,
  and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.
Stage roadmap (each its own future PR): v2 wake-on-demand · v3 virtual
agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

7e6b0cab44  feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)

Closes the loop on user-facing surface:

  $ mcpctl get llm
  NAME            KIND     STATUS  TYPE    MODEL                      TIER  KEY  ID
  qwen3-thinking  public   active  openai  qwen3-thinking             fast  ...  ...
  vllm-local      virtual  active  openai  Qwen/Qwen2.5-7B-Instruct   fast  -    ...

  $ mcpctl chat-llm vllm-local
  ────────────────────────────────────────
  LLM: vllm-local
  openai → Qwen/Qwen2.5-7B-Instruct-AWQ
  Kind: virtual   Status: active
  ────────────────────────────────────────
  > hello?
  Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools, no
  project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles relay-through-mcplocal transparently, so
  the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row
  tokens-per-second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows feel
  natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in the
regenerated completions golden). Workspace: 2043/2043 across 152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

97174f450f  feat(mcplocal): virtual-LLM registrar (v1 Stage 4)

The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping `publish: true` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.
Config:
- LlmProviderFileEntry gains optional `publish: boolean` (default false,
so existing setups don't change).
Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
/api/v1/llms/_provider-register with the publishable set, persists
the returned providerSessionId to ~/.mcpctl/provider-session for
sticky reconnects, then opens the SSE control channel and starts a
30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
task frames trigger handleInferTask: convert OpenAI body to
CompletionOptions, call provider.complete(), POST the result back as
either { status, body } (non-streaming) or two chunk POSTs
(streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s. On
successful reconnect the persisted sessionId revives the same Llm
rows in mcpd (mcpd flips them back to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; cleanly handed
off from main.ts's existing shutdown handler.
Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
best-effort skip (logs a warning, returns null) — never a boot
blocker.
v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.
Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.
Workspace suite: 2043/2043 across 152 files (was 2006/149, +5
new tests + the new file gets discovered).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
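
The reconnect policy described above, as a minimal sketch; openSseChannel is a hypothetical stand-in.

```ts
declare function openSseChannel(): Promise<void>;

// Exponential backoff from 5 s capped at 60 s; reset on success.
let delayMs = 5_000;
function scheduleReconnect() {
  setTimeout(async () => {
    try {
      await openSseChannel(); // persisted sessionId revives the same Llm rows
      delayMs = 5_000;
    } catch {
      delayMs = Math.min(delayMs * 2, 60_000);
      scheduleReconnect();
    }
  }, delayMs);
}
```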

192a3831df  feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)

End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.
Routes (new file: routes/virtual-llms.ts):
POST /api/v1/llms/_provider-register → returns { providerSessionId, llms[] }
GET /api/v1/llms/_provider-stream → SSE channel keyed by
x-mcpctl-provider-session header.
Emits `event: hello` on open,
`event: task` on inference fan-out,
`: ping` every 20 s for proxies.
POST /api/v1/llms/_provider-heartbeat → bumps lastHeartbeatAt
POST /api/v1/llms/_provider-task/:id/result
→ mcplocal pushes result back;
body shape is one of:
{ error: 'msg' }
{ chunk: { data, done? } }
{ status, body }
LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
handlers + the upcoming `mcpctl get llm` columns can branch on
kind without re-fetching the row.
llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
supported; on 503 (publisher offline) the existing audit hook still
fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
absence in test fixtures returns a 500 with a clear "server
misconfiguration" message rather than silently falling through to
the public path against an empty URL.
main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
shutdown alongside the existing reconcile timer.
Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.
The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
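
The three result-POST body shapes, written as a TypeScript union for clarity; a reading of the list above, not the real schema file.

```ts
type ProviderTaskResultBody =
  | { error: string }                           // task failed on the publisher side
  | { chunk: { data: string; done?: boolean } } // streaming: one chunk per POST
  | { status: number; body: unknown };          // non-streaming final result
```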

2215922618  feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)

The state machine for kind=virtual Llm rows. Wires the schema added in
Stage 1 into something that can register, heartbeat, time out, and relay
inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId) — the reconnect lookup.
- findStaleVirtuals(cutoff) — heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff) — 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the session;
  revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back. The streaming variant exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from the
  result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90s
  cutoff), deletes inactives past 4h. Idempotent.

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy path
+ 503 paths (no session, inactive, public-Llm), complete/fail/streaming
chunk fan-out, GC sweep flip + delete + idempotence. mcpd suite: 819/819
(was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1acd8b58bc  feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)

First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
`mcpctl create llm`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when the
publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.

All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts cover:
defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, the hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index. mcpd
suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

e65a396d3e  fix(cli): status probe accepts reasoning_content for thinking models (#62)

a84214dad1  fix(cli): status probe accepts reasoning_content for thinking models

Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final `content` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap models
  but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count the LLM as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer to
  what a thinking model can short-circuit on.

Tests: the existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

54e56f7b71  feat(cli): live "say hi" probe for server LLMs in mcpctl status (#61)

e4af16477c  feat(cli): live "say hi" probe for server LLMs in mcpctl status

Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:
messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
max_tokens: 8, temperature: 0
Each registered LLM gets a one-line health line:
Server LLMs: 2 registered (probing live "say hi"...)
fast qwen3-thinking ✓ "hi" 312ms
openai → qwen3-thinking http://litellm.../v1 key:litellm/API_KEY
heavy sonnet ✗ upstream auth failed: 401
anthropic → claude-sonnet-4-5 provider default no key
Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
`health: { ok, ms, say?, error? }` field per server LLM so dashboards
get the same liveness signal.
Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
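
One probe might look roughly like this; a hedged sketch of the described behavior, with URL/auth plumbing simplified and the response shape assumed OpenAI-compatible.

```ts
async function probeLlm(baseUrl: string, token: string, name: string) {
  const started = Date.now();
  try {
    const res = await fetch(`${baseUrl}/api/v1/llms/${name}/infer`, {
      method: 'POST',
      headers: { authorization: `Bearer ${token}`, 'content-type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }],
        max_tokens: 8,
        temperature: 0,
      }),
      signal: AbortSignal.timeout(15_000), // per-probe 15 s timeout
    });
    const body: any = await res.json();
    const say = body?.choices?.[0]?.message?.content ?? '';
    return res.ok
      ? { ok: true, ms: Date.now() - started, say }
      : { ok: false, ms: Date.now() - started, error: `HTTP ${res.status}` };
  } catch (err) {
    return { ok: false, ms: Date.now() - started, error: String(err) };
  }
}

// Probes run concurrently; a slow LLM only delays its own line:
// const health = await Promise.all(llms.map((l) => probeLlm(url, token, l.name)));
```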

de96af7bf6  feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500 (#60)

0db37e92a4  feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500

Two related fixes:

1. `mcpctl status` now lists mcpd-managed Llm rows (the ones created via
   `mcpctl create llm`) under a new "Server LLMs:" section, grouped by
   tier with type, model, upstream URL, and key reference. JSON/YAML
   output gains a `serverLlms` array. The bearer token (from
   `mcpctl auth login` / saved credentials) is passed through; if mcpd
   is unreachable or returns non-200 the section is silently omitted
   (the existing mcpd connectivity line already conveys that). 6 new
   tests cover the happy path, empty list, token plumbing, and JSON
   shape.
2. The SPA fallback at `/ui/<deeplink>` was returning 500 because we
   registered `@fastify/static` with `decorateReply: false` and then
   called `reply.sendFile`. Read index.html once at startup and serve it
   with `reply.send(html)` instead — which also dodges a per-request
   stat call. Drop `decorateReply: false` so future code can use
   reply.sendFile if it ever needs to.

Full suite: 2005/2005 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
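Fix 2 in Fastify terms, sketched; the handler shape is illustrative, not the exact mcpd registration.

```ts
import { readFileSync } from 'node:fs';
import { join } from 'node:path';
import type { FastifyInstance } from 'fastify';

declare const app: FastifyInstance;
declare const webRoot: string;

// Read once at startup: no per-request stat, no reply.sendFile needed.
const indexHtml = readFileSync(join(webRoot, 'index.html'), 'utf8');

app.setNotFoundHandler((req, reply) => {
  if (req.raw.url?.startsWith('/ui/')) {
    return reply.type('text/html').send(indexHtml); // SPA fallback for deep links
  }
  return reply.code(404).send({ error: 'not found' });
});
```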

899f2c750c  fix(test): vitest 4 projects + src/web jsdom env (#59)

bf0a60bc0a  fix(test): switch workspace runner to vitest 4 `projects` field

The workspace-level `pnpm test:run` (which fulldeploy.sh runs as a gate)
was failing with `localStorage is not defined` on the new src/web tests.
Two intertwined causes:

1. vitest 4 deprecated `vitest.workspace.ts`. The file was being
   silently ignored, so per-package configs (cli, mcpd, mcplocal)
   weren't being honored under workspace mode either — the root config
   was being used for all of them.
2. With the root config in charge, src/web/tests ran with the default
   Node environment, no `localStorage` global, so the api wrapper's test
   setup blew up.

Fix:
- Move workspace projects into the root `vitest.config.ts` under the new
  `projects` array (the vitest 4 replacement).
- Add a proper `src/web/vitest.config.ts` (vitest 4 doesn't auto-pick up
  vite.config.ts as a test config in workspace mode, even though
  per-package `pnpm --filter` does).
- Exclude `src/web/tests/**` from the root-level include so we don't
  double-run them under the wrong env.

After: `pnpm test:run` runs 1999/1999 across 149 files (was 1992/1996
with 4 web failures). Per-package runs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
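The shape in the fix's first bullet, sketched with illustrative paths.

```ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    // vitest 4: per-package configs enumerated here, replacing vitest.workspace.ts
    projects: ['src/cli', 'src/mcpd', 'src/mcplocal', 'src/web'],
  },
});
```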

c0ba0a9040  feat: web prompt editor + agent personalities (#58)

4cbf58d212  feat(mcpd+deploy): serve web UI at /ui + smoke tests + docs (Stage 6)

The closing stage. mcpd now hosts the Stage 5 SPA, the Docker image
bundles the build artifact, a smoke test exercises the personality HTTP
surface end-to-end, and the user-facing docs spell out the mental model.

mcpd:
- Add @fastify/static dep.
- New routes/web-ui.ts: registers /ui/* against a static bundle. Looks
  for the bundle at $MCPD_WEB_ROOT, then /usr/share/mcpd/web (the Docker
  image path), then a dev-tree fallback. Logs and skips cleanly if
  missing — API-only deploys keep working.
- SPA fallback: any /ui/<path> that doesn't match a file falls through
  to index.html so direct hits to react-router URLs work.
- /ui/* falls through to `kind: skip` in mapUrlToPermission, so the
  static assets are served unauthenticated. Each API call from the SPA
  still carries the bearer token.

Deploy:
- Dockerfile.mcpd builds the @mcpctl/web bundle in the same builder
  stage and copies dist/ to /usr/share/mcpd/web in the runtime image.

Smoke (personality.smoke.test.ts):
- Live mcpd flow: create secret/llm/agent/personality, attach an
  agent-direct prompt, verify the binding listing, reject double-attach
  (409) + foreign-agent prompt (400), set defaultPersonality by name,
  detach + delete cleanup.

Docs:
- New docs/personalities.md: VLAN-on-ethernet model, system-block
  ordering table, three prompt scopes, CLI walkthrough, web UI
  walkthrough, full API surface, RBAC notes.
- agents.md and chat.md cross-link.
- README's Agents section gains a Personalities subsection.

Test count after Stage 6:
  mcpd 801/801
  cli  430/430
  web  7/7
  db   58/62 (4 pre-existing)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0010cc18b7  feat(web): browser-based prompt + personality editor (Stage 5)

New workspace package @mcpctl/web — a Vite + React 19 SPA that talks
to mcpd's existing HTTP API. Bundles to a static dist/ which Stage 6
will bake into the RPM and serve from mcpd at /ui via @fastify/static.
Pages:
/ui/projects list projects
/ui/projects/:name/prompts CRUD project prompts (Monaco editor)
/ui/agents list agents
/ui/agents/:name tabs: Direct prompts | Personalities
/ui/personalities/:id bind/unbind prompts to a personality
Auth: paste a session token (mcpctl auth login) or PAT (mcpctl_pat_*)
once on a login screen, kept in localStorage; logout clears it.
API client: 60-line fetch wrapper, attaches the bearer header from
storage, throws an ApiError with status + parsed body on non-2xx.
A 200-line useFetch hook provides loading/error/data without a
state-management library — we are not building Notion.
UX:
- Dark terminal-adjacent theme so the page feels like the CLI.
- Monaco @monaco-editor/react for prompt content (markdown mode,
word-wrap, search, multi-cursor).
- Personality detail's "attach prompt" picker filters in-scope
candidates: agent-direct + same-project + globals.
Dev loop: pnpm --filter @mcpctl/web dev (vite at :5173, proxies
/api to https://mcpctl.ad.itaz.eu — override with MCPCTL_API_URL).
Build: pnpm --filter @mcpctl/web build → src/web/dist/.
Tests: 7 vitest cases covering the bearer header / 4xx body / 204
no-content path on the api wrapper, and the login storage round-trip
+ help toggle. Production build green: 269 KB JS / 84 KB gzipped.
Typecheck clean (TS strict + exactOptionalPropertyTypes carried over).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
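
The described wrapper, condensed into a hedged sketch; the storage key is illustrative.

```ts
export class ApiError extends Error {
  constructor(public status: number, public body: unknown) {
    super(`API error ${status}`);
  }
}

// Attach the bearer header from localStorage; throw ApiError on non-2xx.
export async function api<T>(
  path: string,
  init: { method?: string; headers?: Record<string, string>; body?: string } = {},
): Promise<T> {
  const token = localStorage.getItem('mcpctl.token'); // illustrative key
  const headers = {
    ...(init.headers ?? {}),
    ...(token ? { authorization: `Bearer ${token}` } : {}),
  };
  const res = await fetch(path, { ...init, headers });
  if (res.status === 204) return undefined as T; // no-content path
  const body = await res.json().catch(() => undefined);
  if (!res.ok) throw new ApiError(res.status, body);
  return body as T;
}
```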

9050918a83  feat(cli): personality flag + create/get/edit/delete personalities (Stage 4)

End-to-end CLI surface for the personality overlay:

  mcpctl create personality grumpy --agent reviewer --description "be terse"
  mcpctl create prompt tone --agent reviewer --content "Be very terse."
  mcpctl get personalities
  mcpctl get personalities --agent reviewer
  mcpctl edit personality <id>
  mcpctl delete personality grumpy --agent reviewer
  mcpctl chat reviewer --personality grumpy

The chat banner gains a "Personality:" line that shows either the active
flag value or the agent's `defaultPersonality` (when no flag is given),
so the user knows which overlay is in effect before sending a message.
`--personality` is stripped from `/save` (it's a per-turn override, not
a `defaultParams` field — the agent's defaultPersonality lives on its
own column and is set via PUT /agents).

Backend (small additions to land Stage 4 cleanly):
- `GET /api/v1/personalities[?agent=name]` so `mcpctl get personalities`
  doesn't require an agent filter.
- PersonalityService.listAll() aggregates across agents.

Completions: regenerated fish + bash. `personalities` added as a
canonical resource with a `personality` alias; the edit-resource list
extended; the per-resource argument completers pick up the new type
automatically.

CLI suite: 430/430. mcpd: 801/801. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

faef1e732d  feat(mcpd): personality routes + chat system block overlay (Stage 3)

End-to-end backend wiring for the agents-feature evolution. After this
stage you can curl all the endpoints; CLI + Web UI follow.

Routes (new):
  GET    /api/v1/agents/:agentName/personalities
  POST   /api/v1/agents/:agentName/personalities
  GET    /api/v1/personalities/:id
  PUT    /api/v1/personalities/:id
  DELETE /api/v1/personalities/:id
  GET    /api/v1/personalities/:id/prompts
  POST   /api/v1/personalities/:id/prompts
  DELETE /api/v1/personalities/:id/prompts/:promptId
  GET    /api/v1/agents/:agentName/prompts   (agent-direct)

Routes (extended):
  POST /api/v1/prompts now resolves `agent: <name>` like `project: <name>`
  POST /api/v1/agents/:name/chat accepts `personality: <name>`

RBAC: the `personalities` segment maps to the `agents` resource so
view/edit/create/delete on the parent agent governs personality access.
No new RBAC roles — piggybacking keeps the surface flat.

System block (chat.service.ts):
  agent.systemPrompt
  + agent-direct prompts (Prompt.agentId === agent.id, priority desc)
  + project prompts (existing behavior, priority desc)
  + personality prompts (PersonalityPrompt[chosen], priority desc)
  + systemAppend

The personality is selected by request body `personality: <name>`,
falling back to `agent.defaultPersonalityId` if unset. A typo'd flag
throws 404 rather than silently dropping back to no overlay — failing
loudly on misconfiguration is the only way users learn it didn't apply.

Backwards-compatible by construction: when no agent-direct prompts exist
and no personality is selected, the resulting block is byte-identical to
the old layout (verified by a regression test).

Tests: 5 new chat-service.test cases cover ordering, default-personality
fallback, missing-personality 404, and the regression guard. mcpd suite:
801/801 (was 796). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

6b5bd78cfa  feat(mcpd): personality + prompt-by-agent repos and services (Stage 2)

Wires the schema landed in Stage 1 into the service layer. No HTTP
routes yet — Stage 3 will register `/api/v1/...` endpoints and update
chat.service to read agent-direct + personality prompts when building
the system block.
Repositories:
- PersonalityRepository: CRUD + listPrompts/attach/detach bindings.
- PromptRepository: findByAgent + findByNameAndAgent; create/update
accept the new agentId column. findGlobal now also filters
agentId=null so agent-direct prompts don't leak into global lists.
- AgentRepository: defaultPersonalityId on create + connect/disconnect
in update.
Services:
- PersonalityService: CRUD scoped per agent, plus attach/detach with
scope enforcement — a prompt may bind only if it's agent-direct on
the same agent, in the agent's project, or global. Foreign-project
/ foreign-agent attachments are rejected with 400.
- PromptService: createPrompt / upsertByName accept agentId and
resolve `agent: <name>`, with XOR-with-project guard. Adds
listPromptsForAgent.
- AgentService: defaultPersonality (by name on the agent's own
personality set) round-trips through update + AgentView.
Validation:
- prompt.schema.ts: refine() rejects projectId+agentId together.
- personality.schema.ts: new Create/Update/AttachPrompt schemas.
- agent.schema.ts: defaultPersonality { name } | null on update.
Tests: 12 PersonalityService + 7 PromptService agent-scope tests
covering happy paths, XOR/scope enforcement, double-attach guard,
detach-not-bound. mcpd suite: 796/796 (was 777). Typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

f60f00f1fd  feat(db): add personalities + agent-direct prompts schema (Stage 1)

A Personality is a named overlay on top of an Agent — same agent,
same LLM, but a different bundle of prompts injected into the system
block at chat time. VLAN-on-ethernet semantics: ethernet still works
without VLAN; with a VLAN tag, frames are segmented but still ethernet.
Schema additions:
- Prompt.agentId (nullable FK + index, cascade on delete) so prompts
can attach directly to an agent without going through a project.
- Personality { id, name, description, agentId, priority } with
unique (name, agentId).
- PersonalityPrompt join table with per-binding priority override.
- Agent.defaultPersonalityId (SetNull on delete) so an agent can pick
one personality as the default when no --personality flag is passed.
Backwards-compatible by construction: every new column is nullable;
existing rows are valid as-is; the chat.service systemBlock changes
land in Stage 3.
8 new prisma-level assertions in agent-schema.test.ts cover unique
constraints, cascade behavior, the SetNull on defaultPersonalityId,
and shared-prompt-across-personalities. All 16 db tests pass; mcpd
typecheck + 777 mcpd unit tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

9389ffff3c  feat(agents+chat): agents feature + live chat UX (#57)

21f406037a  feat(chat): print agent + system prompt banner at chat start

When you launch `mcpctl chat <agent>` it's not always obvious which
agent, LLM, project, or system prompt you're actually wired to,
especially when --system / --system-append flags are layered on top
of the agent's defaults. The session would just start at `> ` with
no confirmation of the configuration.
Now both REPL and one-shot modes print a banner to stderr listing:
- agent name + description
- LLM + project (if attached)
- effective system prompt (or --system override) and any
--system-append addendum, indented for readability
- active sampling overrides (temperature, top_p, etc.)
Goes through stderr so `mcpctl chat ... -m "hi" 2>/dev/null` keeps
piping clean. Best-effort: a metadata fetch failure logs and lets
the chat proceed rather than blocking.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ae54210a52  fix(chat): pin live tokens/sec ticker to a bottom-row status bar

The previous ticker used cursor save/restore (\x1b[s / \x1b[u) to draw a
stats line one row below the cursor. Save/restore is unreliable when
content scrolls or wraps — the saved row drifts off the visible area and
the restore lands inside content lines, smearing the ticker into
mid-word positions:

  Here are the available tools you can ⏵ 7w · 56.5 w/s · 0.1s | thinking 41
  use with Docmost:6s

Replace it with a DECSTBM scroll region. Lock the bottom row, scroll
rows 1..N-1 for content, redraw the locked row in place every 250 ms.
This is how htop / tig / mosh pin their status footers — content and
status physically can't overlap.

Lifecycle: install once per chat session (REPL or one-shot), tear down
on close / Ctrl-D / /quit / SIGINT / SIGTERM / uncaughtException. Pipes
and small terminals (<5 rows) get a no-op StatusBar so output stays
clean. Resize re-emits the scroll region with the new height.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
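The DECSTBM mechanics, sketched with illustrative escape sequences; not the actual StatusBar code.

```ts
const out = process.stderr;

function install(rows: number) {
  if (!out.isTTY || rows < 5) return; // pipes / tiny terminals: no-op StatusBar
  out.write(`\x1b[1;${rows - 1}r`);   // DECSTBM: confine scrolling to rows 1..N-1
}

function drawFooter(rows: number, stats: string) {
  // Save cursor, jump to the locked bottom row, clear it, draw, restore.
  out.write(`\x1b7\x1b[${rows};1H\x1b[2K${stats}\x1b8`);
}

function teardown() {
  out.write('\x1b[r');                // reset the scroll region to the full screen
}
```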

cc9822d38b  feat(chat): live tokens/sec ticker + final stats footer

While streaming, the REPL now shows a live word/sec counter on a status
line one row below the cursor — refreshes every 250ms via ANSI cursor
save+restore so it floats with the content as the response grows.
After each response, a dim stats footer prints on stderr:
(47w · 12.3 w/s · 3.9s | thinking 234w · 38 w/s · 6.2s)
The ticker is stderr-only and only emits when stderr is a TTY — pipes
to a file stay clean for grepping/redirect. Words are whitespace-
separated tokens (good enough across English/code/Markdown without a
tokenizer dependency; CJK under-counts but the rate is still
directional).
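A sketch of the word-rate accounting under those assumptions; the
helper name is hypothetical, and splitting each delta independently can
double-count a word that straddles two chunks, which is acceptable for
a directional rate:
```ts
// Words are whitespace-separated tokens; rate = words / elapsed seconds.
function makeRateCounter() {
  let words = 0;
  const start = Date.now();
  return {
    feed(delta: string): void {
      words += delta.split(/\s+/).filter(Boolean).length;
    },
    snapshot() {
      const secs = (Date.now() - start) / 1000;
      return { words, wps: secs > 0 ? words / secs : 0, secs };
    },
  };
}
```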
Both phases tracked separately:
- thinking: reasoning_content from qwen3-thinking / deepseek-reasoner
/ o1, where the model's scratchpad is the long part
- content: the actual assistant answer
Final stats also added to the --no-stream path: total HTTP duration
and word count, since we don't get per-token timing there.
CLI suite still 430/430.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7cfa449465 |
feat(chat): surface reasoning_content as thinking chunks; fix --no-stream timeout
Reasoning models (qwen3-thinking, deepseek-reasoner, OpenAI o1 family) emit
their scratchpad as `delta.reasoning_content` (or `delta.reasoning`,
or `delta.provider_specific_fields.reasoning_content` when LiteLLM passes
through from vLLM) — separate from `delta.content`. Before this commit
mcpd's parseStreamingChunk only watched `content`, so the model's 30-90s
reasoning phase looked like dead air to the REPL: streaming connection
open, no chunks, no progress. Caught during the agents-feature shakedown
when qwen3-thinking sat silent for 90s on a docmost__list_pages call.
mcpd
====
chat.service.ts
- parseStreamingChunk extracts a `reasoningDelta` from the chunk body,
accepting all four spellings (reasoning_content / reasoning /
provider_specific_fields.{reasoning_content,reasoning}). Future
providers can add their own field names by extending the
fallback chain (sketched after this list).
- chatStream yields `{ type: 'thinking', delta }` chunks as reasoning
arrives, alongside the existing `{ type: 'text', delta }` for content.
- Reasoning is intentionally NOT persisted to the thread. It's the
model's scratchpad, not part of the conversation. Subsequent turns
don't see it.
- Adds 'thinking' to the ChatStreamChunk.type union.
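A minimal sketch of that fallback chain; the delta field names are
taken from the commit, while the helper name and `Delta` type are
illustrative:
```ts
type Delta = {
  reasoning_content?: string;
  reasoning?: string;
  provider_specific_fields?: {
    reasoning_content?: string;
    reasoning?: string;
  };
};

// Returns undefined when the chunk carries no reasoning; new providers
// extend the chain by adding their spelling.
function extractReasoningDelta(delta: Delta): string | undefined {
  return (
    delta.reasoning_content ??
    delta.reasoning ??
    delta.provider_specific_fields?.reasoning_content ??
    delta.provider_specific_fields?.reasoning
  );
}
```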
CLI
===
chat.ts
- streamOnce handles 'thinking' chunks: writes them dim+italic to
stderr (ANSI 2;3m) so the model's reasoning visually flows like a
quote block while the final answer streams to stdout. Plain text
when stderr isn't a TTY (pipe to file → no escape codes leak).
- chatRequestNonStream replaces the shared ApiClient.post() for the
--no-stream path. ApiClient defaults to a 10s timeout, way too tight
for any chat that calls a tool: LLM round + tool dispatch + LLM
summary easily exceeds 10s. The new helper uses the same 600s timeout
the streaming path has been using all along.
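A hedged sketch of the helper's shape, assuming Node's global fetch and
AbortSignal.timeout; the real chatRequestNonStream may differ:
```ts
async function chatRequestNonStream(url: string, body: unknown, token: string) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "content-type": "application/json", authorization: `Bearer ${token}` },
    body: JSON.stringify(body),
    // 600s to match the streaming path; ApiClient's 10s default is far
    // too tight once a tool call is in the loop.
    signal: AbortSignal.timeout(600_000),
  });
  if (!res.ok) throw new Error(`chat failed: ${res.status}`);
  return res.json();
}
```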
Tests:
chat-service.test.ts (+2):
- reasoning_content deltas surface as `thinking` chunks (not text);
reasoning is NOT persisted to the assistant turn's content.
- LiteLLM's provider_specific_fields.reasoning_content shape parses
identically to the vendor-native shape.
mcpd 777/777, cli 430/430.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cc225eb70f |
feat(llm): probe upstream auth at registration time
mcpd now runs a cheap auth probe whenever an Llm is created (or its
apiKeyRef/url is updated). Catches misconfigured tokens / wrong URLs at
registration with a 422 + structured error message, instead of silently
500-ing on first chat with a generic "fetch failed". Caught in the wild
today: the homelab Pulumi config exposed `MCPCTL_GATEWAY_TOKEN` (which
is mcpctl_pat_-prefixed, intended for LiteLLM→mcplocal direction) where
LiteLLM expects `LITELLM_MASTER_KEY` (sk-prefixed). The probe makes
this immediate.
Probe shape (LlmAdapter.verifyAuth; OpenAI arm sketched after this list):
- OpenAI passthrough → GET <url>/v1/models. Cheap, idempotent, gated
by the same auth as chat/completions.
- Anthropic → POST /v1/messages with max_tokens:1, "ping". Anthropic
has no list-models endpoint; this is the cheapest auth-exercising
call.
- Returns one of:
{ ok: true }
{ ok: false, reason: "auth", status, body } — 401/403, fail hard
{ ok: false, reason: "unreachable", error } — network, warn-only
{ ok: false, reason: "unexpected", status, body } — non-auth 4xx, warn-only
Behavior:
- LlmService.create()/update() runs the probe after resolveApiKey.
Throws LlmAuthVerificationError on `auth`, logs warn for
unreachable/unexpected, swallows for offline registration.
- Probe is skipped when there's no apiKeyRef (nothing to verify) or
when the caller passes skipAuthCheck=true.
- update() probes only when apiKeyRef OR url changes — pure
description/tier updates don't trigger upstream calls.
- Routes catch LlmAuthVerificationError and return 422 with
`{ error, status }`. The CLI surfaces the message verbatim via
ApiError.
Opt-out:
- CLI: `mcpctl create llm ... --skip-auth-check` for offline
registration before the upstream is reachable.
- HTTP: side-channel body field `_skipAuthCheck: true` (stripped
before validation, never persisted on the row).
Side fix in same commit (caught while testing): src/cli/src/index.ts
read `program.opts()` BEFORE `program.parse()`, so `--direct` was a
no-op for ApiClient — every command went to mcplocal regardless. Some
commands accidentally still worked because mcplocal forwards plain
`/api/v1/*` to mcpd, but flows that need direct SSE streaming (e.g.
`mcpctl chat`) couldn't reach mcpd. Fixed by peeking at process.argv
directly for the two global flags before Commander's parse runs.
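A minimal sketch of the pre-parse peek; the commit names only
`--direct` of the two global flags, and the helper below is a
hypothetical reduction:
```ts
// Read a boolean flag straight from process.argv, before Commander's
// parse() runs, so ApiClient is constructed with the right target.
function readFlag(name: string): boolean {
  return process.argv.slice(2).includes(name);
}

const direct = readFlag("--direct");
```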
Tests:
- llm-adapters.test.ts (+8): OpenAI 200/401/403/404/network, Anthropic
200/401/400 (typo'd model = unexpected, NOT auth — registration
shouldn't block on bad model names that surface at chat time).
- llm-service.test.ts (+6): create-throws-on-auth-fail (no row
written), warn-only on unreachable/unexpected, skipAuthCheck
bypass, no-key skip, update-only-probes-on-auth-affecting-change.
mcpd 775/775, mcplocal 715/715, cli 430/430.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
1f0be8a5c1 |
fix(agents): close gaps from /gstack-review
P1 — thread reads now enforce ownership
========================================
chat.service.ts / routes/agent-chat.ts
GET /api/v1/threads/:id/messages was previously RBAC-mapped to
view:agents (no resourceName scope) with the route comment promising
"service-level owner check enforces fine-grained access" — but the
service didn't actually check. Any caller with view:agents could read
another user's thread by guessing/learning the threadId. CUIDs are
hard to brute-force but they leak: SSE `final` chunks, agents-plugin
`_meta.threadId`, and several response bodies surface them. Now
ChatService.listMessages(threadId, ownerId) loads the thread, returns
404 (not 403, to avoid id-enumeration via differential status codes)
if ownerId doesn't match. Regression test in chat-service.test.ts
covers Alice/Bob isolation + nonexistent-thread same-shape 404.
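A sketch of the ownership check, with hypothetical `thread`/`message`
Prisma model names; the 404-for-both shape is the point:
```ts
import { PrismaClient } from "@prisma/client";

class NotFoundError extends Error {}

// Same 404 for "missing" and "not yours", so differential status codes
// can't confirm that a guessed threadId exists.
async function listMessages(prisma: PrismaClient, threadId: string, ownerId: string) {
  const thread = await prisma.thread.findUnique({ where: { id: threadId } });
  if (!thread || thread.ownerId !== ownerId) {
    throw new NotFoundError("thread not found");
  }
  return prisma.message.findMany({ where: { threadId }, orderBy: { createdAt: "asc" } });
}
```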
P2 — AgentChatRequestSchema strict mode
========================================
validation/agent.schema.ts
`.merge()` does NOT inherit `.strict()` from AgentChatParamsSchema.
Typo'd fields (e.g. `temprature`) silently fell through and the agent
silently used the default — debuggable only by reading the LLM call
payload. Re-applied `.strict()` on the merged schema.
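A minimal zod repro of the gotcha, with illustrative schema names:
merge() takes the unknown-keys policy of the schema passed in, so
strict() on the left-hand schema is silently lost.
```ts
import { z } from "zod";

const Params = z.object({ temperature: z.number().optional() }).strict();
const Request = Params.merge(z.object({ message: z.string() }));

Request.parse({ message: "hi", temprature: 0.2 });          // passes; typo stripped silently
Request.strict().parse({ message: "hi", temprature: 0.2 }); // throws: unrecognized key
```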
P2 — per-agent maxIterations override + clamp
==============================================
chat.service.ts
Loop cap was a hard-coded module constant (12), wrong for both
research-style agents (need higher) and cheap-probe agents (could opt
lower). Now reads `agent.extras.maxIterations`, clamps 1..50, falls
back to 12 default. The clamp is the soft-DoS guard: a hostile agent
definition with `maxIterations:1000000` can't burn unbounded LLM calls
per request. Both chat() and chatStream() use ctx.maxIterations now.
Regression test covers low-cap override (rejects with `exceeded 2`)
and hostile-value clamp (rejects with `exceeded 50`).
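A sketch of the clamp; the `extras` shape follows the commit and the
helper name is illustrative:
```ts
// Per-agent override, bounded 1..50, default 12. A hostile
// maxIterations:1000000 can't burn unbounded LLM calls per request.
function effectiveMaxIterations(extras?: { maxIterations?: unknown }): number {
  const raw = Number(extras?.maxIterations);
  if (!Number.isFinite(raw)) return 12; // default
  return Math.min(50, Math.max(1, Math.floor(raw)));
}
```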
P3 — SSE write to closed socket
================================
routes/agent-chat.ts
When the upstream adapter throws after some chunks were already
written AND the client disconnected, the catch block tried to flush
more chunks to a closed socket. Without an `on('error')` handler
Node emits unhandled error events; once Pino is wired to alerts
this'd page on every disconnect-mid-stream. writeSseChunk now
checks `reply.raw.destroyed || writableEnded` before write.
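A sketch of the guard, assuming Fastify's `reply.raw` (a Node
ServerResponse):
```ts
import type { ServerResponse } from "node:http";

// Never write to a socket the client has already torn down; a silent
// drop beats an unhandled 'error' event on every mid-stream disconnect.
function writeSseChunk(raw: ServerResponse, data: string): void {
  if (raw.destroyed || raw.writableEnded) return;
  raw.write(`data: ${data}\n\n`);
}
```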
P3 — BACKEND_TOKEN_DEAD preserves original stack
=================================================
services/secret-backend-rotator.service.ts
When wrapping mintRoleToken/lookupSelf failures as
BACKEND_TOKEN_DEAD, the new Error() discarded the original throw —
hard to tell whether the inner failure was a network blip vs an
OpenBao API mismatch vs DNS. Now uses `new Error(msg, { cause: err })`
so the inner stack survives.
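A minimal shape of the wrap; `mintRoleToken` below is a stand-in stub:
```ts
async function mintRoleToken(): Promise<string> {
  throw new Error("connect ECONNREFUSED"); // e.g. a network blip
}

async function rotate(): Promise<string> {
  try {
    return await mintRoleToken();
  } catch (err) {
    // { cause } keeps the inner throw attached, so err.cause distinguishes
    // a network blip from an OpenBao API mismatch from DNS.
    throw new Error("BACKEND_TOKEN_DEAD", { cause: err });
  }
}
```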
P3 — .gitignore .claude/scheduled_tasks.lock
=============================================
This persisted state file was leaking into every `git status`.
Tests
=====
mcpd 761/761 (+2 regression tests). mcplocal 715/715. cli 430/430.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2e266e318a |
fix(mcplocal): lower default token introspection TTL in serve.ts too
Followup to
|