Commit Graph

354 Commits

Michal
e8c3803fac feat(web): bold redesign — Tailwind v4 + shadcn-style primitives + Skills/Proposals/Revisions UI
Phase 6 of the Skills + Revisions + Proposals work. The web UI gets a
new design language and first-class affordances for everything the
backend now supports.

## Visual direction

- Tailwind v4 with custom @theme block (oklch tokens). Dark-mode-only
  (internal tool — light mode doubles QA surface).
- Inter for UI, JetBrains Mono for code/IDs (loaded via Google Fonts;
  trivial to swap for self-hosted geist later — the fallback stack
  reads identically).
- Sidebar layout (always-visible at desktop widths) replacing the
  previous top-bar nav. Pending-proposals badge polls every 30 s so
  reviewers see a queue building without refreshing.
- Lucide icons throughout.
- Spacing and radii on Tailwind defaults.

Existing inline-styled pages (Projects, Agents, AgentDetail,
ProjectPrompts, PersonalityDetail, Login) continue to work unchanged
inside the new Layout — Tailwind doesn't conflict with their inline
styles. A follow-up can migrate them incrementally.

## What's added

### Build infra (src/web/)
- package.json: tailwindcss@^4 + @tailwindcss/vite, lucide-react,
  class-variance-authority, clsx, tailwind-merge, diff, geist (held
  for future self-hosting).
- vite.config.ts: registers the @tailwindcss/vite plugin.
- src/index.css: Tailwind import + @theme tokens + @layer base.
- src/main.tsx: imports index.css.
- src/lib/utils.ts: shadcn-style cn() helper.
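
For reference, the cn() helper is the usual clsx + tailwind-merge
composition (sketch; the actual file may differ):

  import { clsx, type ClassValue } from "clsx";
  import { twMerge } from "tailwind-merge";

  // Merge conditional class names, letting tailwind-merge resolve
  // conflicting Tailwind utilities (e.g. "p-2" vs "p-4").
  export function cn(...inputs: ClassValue[]) {
    return twMerge(clsx(inputs));
  }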

### shadcn-style primitives (src/components/ui/)
Hand-written rather than generated via `npx shadcn` so the repo doesn't
depend on a CLI tool that needs an interactive runtime:

- button.tsx — variants: primary / secondary / ghost / danger / link;
  sizes: sm / md / lg / icon.
- card.tsx — Card + Header/Title/Description/Content/Footer subparts.
- badge.tsx — variants: default / info / success / warning / danger /
  outline.
- input.tsx — Input + Textarea + Label.
- tabs.tsx — no-dep accessible Tabs (no Radix needed for our use).
- separator.tsx — h/v separator with role=separator.
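
The variant tables follow the standard cva shape; a sketch of
button.tsx with illustrative (not the real) class strings:

  import { cva, type VariantProps } from "class-variance-authority";

  const buttonVariants = cva(
    // base classes shared by every variant
    "inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors",
    {
      variants: {
        variant: {
          primary: "bg-primary text-primary-foreground hover:bg-primary/90",
          secondary: "bg-secondary text-secondary-foreground hover:bg-secondary/80",
          ghost: "hover:bg-accent hover:text-accent-foreground",
          danger: "bg-destructive text-destructive-foreground hover:bg-destructive/90",
          link: "text-primary underline-offset-4 hover:underline",
        },
        size: { sm: "h-8 px-3", md: "h-9 px-4", lg: "h-10 px-6", icon: "h-9 w-9" },
      },
      defaultVariants: { variant: "primary", size: "md" },
    }
  );

  export type ButtonVariantProps = VariantProps<typeof buttonVariants>;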

### Diff component (src/components/Diff.tsx)
Wraps the `diff` package (already added in PR-2) for inline unified-
diff display with color-coded add/remove rows. Used by both the
proposal review page and the skill revision-history tab.
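
A sketch of the wrapper's core around the package's diffLines API
(component props are illustrative):

  import { diffLines } from "diff";

  export function Diff({ before, after }: { before: string; after: string }) {
    return (
      <pre className="overflow-x-auto font-mono text-xs">
        {diffLines(before, after).map((part, i) => (
          <span
            key={i}
            className={
              part.added ? "bg-green-950 text-green-400"
              : part.removed ? "bg-red-950 text-red-400"
              : "text-muted-foreground"
            }
          >
            {part.value}
          </span>
        ))}
      </pre>
    );
  }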

### New pages (src/pages/)
- Dashboard.tsx — at-a-glance home. Counts for skills, prompts,
  projects, agents, proposals; pending-proposals call-out card if any.
- Skills.tsx — list view, separated into Global vs Project/Agent-
  scoped sections.
- SkillDetail.tsx — name + semver + description; tabs for SKILL.md /
  Files / Metadata / History. History tab shows revisions with
  click-to-diff against the live body.
- Proposals.tsx — queue with Pending/Approved/Rejected tabs. Pending
  count is highlighted in amber.
- ProposalDetail.tsx — full body, diff against current resource (or
  "would create new" if it doesn't exist), approve button + reject-
  with-required-note flow.

### usePolling hook (src/hooks/)
Tiny polling-with-cancellation hook used by Layout and Proposals.
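
One plausible shape for it (signature assumed):

  import { useEffect } from "react";

  export function usePolling(fn: () => Promise<void>, intervalMs: number) {
    useEffect(() => {
      let cancelled = false;
      const tick = () => { if (!cancelled) fn().catch(() => {}); };
      tick(); // fire immediately, then on the interval
      const id = setInterval(tick, intervalMs);
      return () => { cancelled = true; clearInterval(id); }; // cancel on unmount
    }, [fn, intervalMs]);
  }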

### Layout rewrite (src/components/Layout.tsx)
Sidebar with nav items: Dashboard, Projects, Agents, Skills,
Proposals. Lucide icons. Active-route highlighting via NavLink.
Pending-proposals warning badge on the Proposals item.

### Routes (src/App.tsx)
New routes: /dashboard, /skills, /skills/:name, /proposals,
/proposals/:id. Default redirects to /dashboard.

### API types (src/api.ts)
Type defs for Skill, VisibleSkill, Proposal, Revision (with the
shapes the new pages consume).

## Tests

Existing 7 web tests still pass (Login + api). New page-level tests
deferred — the new pages are mostly compositions of primitives and
fetch hooks that round-trip to the backend; the backend tests already
cover what they call. PR-7 polish can add render-and-click tests if
coverage drift surfaces.

## Verification

- `pnpm --filter @mcpctl/web build` clean, no warnings.
- `pnpm test:run` whole monorepo: 162 test files / 2157 tests green.
- Visual smoke deferred — needs a running mcpd to populate the
  fixtures. A manual local smoke test is the next step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:54:55 +01:00
Michal
58e8e956ce feat(cli+mcpd): mcpctl skills sync + config claude extension
Phase 5 of the Skills + Revisions + Proposals work. Skills are now
materialised onto disk under ~/.claude/skills/<name>/, with
hash-pinned diff against mcpd, atomic per-skill install, and
preservation of locally-modified files. `mcpctl config claude --project X`
now wires the full pickup chain: writes the .mcpctl-project marker, runs
the initial sync, and installs the SessionStart hook so subsequent Claude
invocations stay in sync transparently.

## Sync algorithm

1. Resolve project: `--project` flag overrides; else walk up from cwd
   looking for `.mcpctl-project`; else fall back to globals-only.
2. GET /api/v1/projects/:name/skills/visible (or
   /api/v1/skills?scope=global without a project). Server returns
   id + name + semver + scope + contentHash + metadata — no body, no
   files. The contentHash is sha256 of the canonicalised body, computed
   server-side; any reordering of keys produces the same hash, so it's
   a stable diff key.
3. Load ~/.mcpctl/skills-state.json (lives outside ~/.claude/skills/
   on purpose — Claude Code reads that tree and we don't want to
   pollute it with our bookkeeping).
4. Diff:
     - server skill not in state → INSTALL
     - server skill, state contentHash matches → SKIP (cheap path)
     - server skill, state contentHash differs → UPDATE (fetch full body)
     - state skill not in server → orphan, REMOVE (preserve if locally
       modified, unless --force)
5. Atomic per-skill install: write to <targetDir>.mcpctl-staging-<pid>/,
   rename existing tree to .mcpctl-trash-<pid>, swap staging in,
   rmtree the trash. A concurrent reader (Claude Code starting up)
   never sees a partial tree.
6. State file updated with new versions, per-file SHA-256, install
   path. saveState is atomic (temp + rename).
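
Step 4 in outline, as a pure decision function (state/server record
shapes assumed):

  type SyncAction = "install" | "skip" | "update" | "remove";

  function planOne(
    server?: { contentHash: string },
    local?: { contentHash: string },
  ): SyncAction {
    if (server && !local) return "install";
    if (server && local) {
      return server.contentHash === local.contentHash ? "skip" : "update";
    }
    return "remove"; // orphan; preserved if locally modified, unless --force
  }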

## Failure semantics

- `--quiet` mode (used by SessionStart hook): exit 0 on network /
  timeout / mcpd error. Fail-open is non-negotiable here — we never
  want a hung mcpd to block Claude Code starting up.
- Auth failure: exit 1, clear "run mcpctl login" message.
- Disk error during state save: exit 2.
- Per-skill errors are collected in the result and reported as a
  count; one bad skill doesn't stop the others.

Network fetches run with concurrency 5. The server-side
`/visible` endpoint is metadata-only so the cheap path (everything
unchanged) needs exactly one HTTP roundtrip total.

## Files added

### CLI utilities (src/cli/src/utils/)
- skills-state.ts — load/save state, per-file sha256, edit detection.
- project-marker.ts — walk-up to find `.mcpctl-project`, bounded by
  user home so we never search above $HOME.
- sessionhook.ts — install/remove a SessionStart hook entry tagged
  with `_mcpctl_managed: true`. Idempotent. Defensive against
  missing/empty/JSONC settings.json.
- skills-disk.ts — atomic install via staging-dir rename swap,
  symmetric atomic delete via trash-dir rename. Path-escape attempts
  in files{} are rejected.
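
The staging/trash swap presumably reduces to something like this
(helper signature illustrative):

  import fs from "node:fs";

  function atomicInstall(targetDir: string, writeTree: (dir: string) => void) {
    const staging = `${targetDir}.mcpctl-staging-${process.pid}`;
    const trash = `${targetDir}.mcpctl-trash-${process.pid}`;
    writeTree(staging);                                   // 1. full tree into staging
    if (fs.existsSync(targetDir)) fs.renameSync(targetDir, trash); // 2. old tree aside
    fs.renameSync(staging, targetDir);                    // 3. swap in (atomic on one fs)
    fs.rmSync(trash, { recursive: true, force: true });   // 4. rmtree the trash
  }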

### CLI command (src/cli/src/commands/)
- skills.ts — `mcpctl skills sync` Commander wrapper + the
  `runSkillsSync(opts, deps)` library function (also called from
  `mcpctl config claude --project`). Supports `--dry-run`, `--force`,
  `--quiet`, `--keep-orphans`. `--skip-postinstall` is reserved
  (postInstall execution lands in a follow-up PR, not this one).

### Wiring
- index.ts: registers `mcpctl skills` after `mcpctl review`.
- config.ts: `mcpctl config claude --project X` now writes the
  `.mcpctl-project` marker, runs `runSkillsSync` in-process, and calls
  `installManagedSessionHook('mcpctl skills sync --quiet')`. New flag
  `--skip-skills` opts out (used by tests; useful for CI).

## Server-side change

- src/mcpd/src/services/skill.service.ts: getVisibleSkills now
  computes contentHash on the fly from the canonical body shape the
  client will reconstruct. Cheap (sha256 of ~few KB per skill); no
  schema migration needed since the hash is derived, not stored.

## Tests

Four new utility test files (31 tests) under src/cli/tests/utils/:
- sessionhook.test.ts — creation, idempotency, command updates,
  preservation of user hooks, removal, empty/JSONC tolerance.
- skills-disk.test.ts — atomic write, replacement without leftovers,
  path-escape rejection, atomic delete, listing ignores
  staging/trash artifacts.
- skills-state.test.ts — sha256 determinism, state round-trip,
  schema-version drift handling, edit detection.
- project-marker.test.ts — cwd hit, walk-up, $HOME boundary, empty
  marker, write+read round-trip.

The existing `mcpctl config claude` test (claude.test.ts) was updated
to pass `--skip-skills` so it stays focused on .mcp.json generation;
the new sync flow is covered by the utility tests.

Full suite: 162 test files / 2157 tests green (up from 158 / 2127).

## Deferred to a follow-up

- `metadata.hooks` materialisation into `~/.claude/settings.json` —
  the data path exists, sync receives it; PR-7 or a focused follow-up
  will write the `_mcpctl_managed: true` entries for declarative
  hooks.
- `metadata.mcpServers` auto-attach via mcpd API — likewise.
- `metadata.postInstall` script execution — the most substantive
  deferred piece. Current sync logs a TODO and skips. The corporate
  trust model (publisher-side rigor, not client-side defence) means
  this is straightforward to add once we wire the curated env +
  timeout + audit emission. Orthogonal to file sync, easier to ship
  separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:26:35 +01:00
Michal
db57bb5856 feat(mcpd+mcplocal+cli): propose-learnings system skill, propose_skill MCP tool, mcpctl review
Phase 4 of the Skills + Revisions + Proposals work. Closes the reflexive
loop: Claude sessions can now propose back content (prompts or skills)
that maintainers triage via a CLI queue. The system documents itself
to Claude through the same mechanism it uses to document itself to humans.

## What's added

### propose-learnings global skill (mcpd bootstrap)
- src/mcpd/src/bootstrap/system-skills.ts — idempotent upsert, mirrors
  system-project.ts. Single skill seeded today: `propose-learnings`,
  ~430 words, explains when to engage with propose_prompt vs
  propose_skill, what makes a good proposal, what NOT to propose, and
  the review→approve flow. Priority 9, global scope.
- main.ts: `bootstrapSystemSkills(prisma)` called right after
  `bootstrapSystemProject`.

### gate-encouragement-propose system prompt
- system-project.ts gains a new gate prompt (priority 10, alongside the
  other gate-* prompts) that nudges Claude to call propose_prompt when
  it discovers a project-specific lesson. Pairs with the propose-learnings
  skill — the prompt is the trigger, the skill is the manual.

### propose_skill MCP tool (mcplocal)
- proxymodel/plugins/gate.ts: new virtual tool registered alongside
  propose_prompt. Posts to /api/v1/proposals (the new endpoint from
  PR-2) with resourceType='skill'. Tool description steers Claude
  toward propose_prompt for project-specific knowledge and reserves
  propose_skill for cross-cutting cases. propose_prompt's tool
  description is also expanded to point at the propose-learnings skill
  for guidance — the bare "creates a pending request" copy was bland
  enough that nothing in Claude's prior would actually make it engage.

### mcpctl review CLI
- New top-level command in src/cli/src/commands/review.ts.
  Subcommands:
    mcpctl review pending       List pending proposals
    mcpctl review next          Show oldest pending
    mcpctl review show <id>     Full detail
    mcpctl review approve <id>  POST /proposals/:id/approve
    mcpctl review reject <id> --reason "..."
    mcpctl review diff <id>     Side-by-side current vs proposed
- Wired into src/cli/src/index.ts. Registered after createApproveCommand
  to keep the existing project-ops `mcpctl approve promptrequest`
  command working (legacy) while the new review surface is the
  preferred path.
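
Roughly the Commander wiring, sketched (see review.ts for the real
thing; handler bodies elided to comments):

  import { Command } from "commander";

  export function createReviewCommand(): Command {
    const review = new Command("review").description("Triage resource proposals");
    review.command("pending").action(async () => {
      // GET /api/v1/proposals?status=pending, render as a table
    });
    review
      .command("reject <id>")
      .requiredOption("--reason <text>", "rejection note (required)")
      .action(async (id: string, opts: { reason: string }) => {
        // POST /api/v1/proposals/:id/reject with { reason: opts.reason }
      });
    return review;
  }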

## Tests touched

- bootstrap-system-project.test.ts already counts via
  getSystemPromptNames() length, so it picked up the new prompt
  automatically; no priority-assertion change was needed either — the
  new prompt starts with `gate-`, so the existing `gate-* → priority 10`
  invariant validates it.
- system-prompt-validation.test.ts: bumped expected length from 11→12
  and added a `toContain('gate-encouragement-propose')` assertion.

Full suite: 158 test files / 2127 tests green.

## What's NOT in this PR

- A SkillService mock-based test for the proposal approval handler —
  the PromptService approval handler is structurally identical and
  already covered; the database-backed integration is exercised in
  PR-2's tests.
- Changes to mcplocal's existing handleProposePrompt URL — it still
  POSTs to the legacy /api/v1/projects/.../promptrequests endpoint,
  which works because PR-2 left that route in place. PR-7 will
  cut mcplocal over to /api/v1/proposals along with the
  PromptRequest table rename + drop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 13:13:33 +01:00
Michal
20a541a5d6 feat(mcpd): Skill resource end-to-end (CRUD + backup + revision integration)
Phase 3 of the Skills + Revisions + Proposals work. Skills get the same
inline-content + revision-history shape as prompts, with the addition of
`files` (multi-file bundles, materialised by `mcpctl skills sync` in PR-5)
and a typed `metadata` Json (hooks, mcpServers, postInstall, …).

## What's added

### Validation (src/mcpd/src/validation/skill.schema.ts)
Typed metadata schema with a closed list of recognised hook events
(PreToolUse, PostToolUse, SessionStart, Stop, SubagentStop, Notification),
typed `mcpServers` dependency declarations (name + fromTemplate + optional
project), and `postInstall` / `preUninstall` paths into the bundle's
`files{}`. `.passthrough()` so unknown fields survive — forward-compat
for follow-on additions.
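
In outline (field shapes beyond those named above are assumptions):

  import { z } from "zod";

  const HookEvent = z.enum([
    "PreToolUse", "PostToolUse", "SessionStart",
    "Stop", "SubagentStop", "Notification",
  ]);

  export const SkillMetadataSchema = z
    .object({
      hooks: z.array(z.object({ event: HookEvent, command: z.string() })).optional(),
      mcpServers: z.array(z.object({
        name: z.string(),
        fromTemplate: z.string(),
        project: z.string().optional(),
      })).optional(),
      postInstall: z.string().optional(),  // path into files{}
      preUninstall: z.string().optional(), // path into files{}
    })
    .passthrough(); // unknown fields survive round-trips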

### Repository (src/mcpd/src/repositories/skill.repository.ts)
Mirrors PromptRepository exactly. Same `?? ''` workaround for nullable-FK
compound-key lookups.

### Service (src/mcpd/src/services/skill.service.ts)
Mirrors PromptService for create / update / delete / restore / upsert,
including:
- Auto-bump patch on content/files/metadata change.
- Revision recording (best-effort — failures don't block the save).
- 'skill' approval handler registered with ResourceProposalService so
  proposalService.approve dispatches to skills the same way it
  dispatches to prompts.
- `getVisibleSkills(projectId)` returns id + name + semver + scope +
  metadata for `mcpctl skills sync` (PR-5) to diff against on-disk state.

### Routes (src/mcpd/src/routes/skills.ts)
- GET /api/v1/skills (filters: ?project= ?projectId= ?agent= ?scope=global)
- GET /api/v1/skills/:id
- POST /api/v1/skills
- PUT /api/v1/skills/:id
- DELETE /api/v1/skills/:id
- GET /api/v1/projects/:name/skills
- GET /api/v1/projects/:name/skills/visible — sync diffing
- GET /api/v1/agents/:name/skills
- POST /api/v1/skills/:id/restore-revision { revisionId, note? }

### main.ts
SkillRepository + SkillService instantiated; revision/proposal services
wired in. `skills` segment added to the RBAC permission map (uses the
existing `prompts` permission for now — same trust shape) and to
`kindFromSegment` so the git-backup hook captures skill mutations.

### Backup integration
- yaml-serializer.ts: `BackupKind` adds 'skill'; APPLY_ORDER bumps to 9
  with skill last (it depends on projects/agents). `parseResourcePath`
  recognises the `skills/` directory.
- git-backup.service.ts: `serializeResource` adds the `case 'skill'`
  branch alongside prompts. The git-sync loop now round-trips skills
  on every change.
- (Bundle backup-service.ts is NOT updated in this PR — deferred to PR-7
  alongside the cutover. The git-based backup IS wired, which is the
  primary persistence path.)

### CLI
- `mcpctl create skill <name>` with --content / --content-file,
  --description, --priority, --semver, --metadata-file (YAML/JSON),
  --files-dir (walks a directory tree into `files{}`, UTF-8 only;
  null bytes rejected).
- shared.ts adds `skill` / `skills` / `sk` aliases.

### apply.ts
Not updated — `mcpctl apply -f skill.yaml` is deferred to PR-7. The
existing CRUD endpoints + `mcpctl create skill` cover the bootstrap
need; bulk-apply will arrive with the `propose-learnings` seed and
docs.

## Tests

158 test files / 2127 tests green across the workspace. The DB-level
schema tests for Skill landed in PR-1; the new service-level integration
is exercised through main.ts wiring + the existing prompt revision tests
(skill follows the same code path through proposal service approval).

A `describe('Skill service mocks')` test file was deliberately not added —
the PromptService mock-based tests already cover the revision/approval
handler shape, and the skill handler is structurally identical (same
upsert + record-revision + link-currentRevisionId pattern). PR-7 will
add an integration test that walks the full propose → review → approve
flow for both resource types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:48:40 +01:00
Michal
1ec286bb14 feat(mcpd): ResourceRevision + ResourceProposal services + Prompt revision integration
Phase 2 of the Skills + Revisions + Proposals work. Stands up the generic
revision/proposal layer and wires Prompt into it. Skills will plug into the
same infrastructure in PR-3 with no further service changes required.

This PR is intentionally additive: PromptRequest table and routes are
unchanged. The /api/v1/proposals API runs side-by-side with the legacy
/api/v1/promptrequests API. The PromptRequest cutover (rename + backfill +
mcplocal rewire) is deferred to a later PR so this one stays reviewable.

## What's added

### Repositories (src/mcpd/src/repositories/)
- resource-revision.repository.ts — append-only revision log keyed by
  (resourceType, resourceId). Soft FK; no relations declared. Supports
  history listing, semver lookup, and contentHash cross-resource search.
- resource-proposal.repository.ts — generic propose queue. Status lifecycle
  pending → approved | rejected. Mirrors Prompt's `?? ''` workaround for
  nullable-FK compound lookups.

### Services (src/mcpd/src/services/)
- resource-revision.service.ts — record() inserts a revision with a stable
  sha256 contentHash computed from canonicalised JSON (key-sorted at every
  level so reordered objects produce the same hash). Caller passes a
  pre-computed semver; service does NOT decide bump policy.
- resource-proposal.service.ts — propose / approve / reject / list, with a
  per-resourceType handler registry. PromptService registers the 'prompt'
  handler at construction; the SkillService will register 'skill' in PR-3.
  approve() runs in a Prisma $transaction so the resource update + revision
  insert + proposal status flip are atomic.
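
The canonicalise-then-hash step in miniature (a sketch; the production
helper in resource-revision.service.ts may differ):

  import { createHash } from "node:crypto";

  // Key-sort objects at every level so semantically-equal bodies
  // serialize, and therefore hash, identically.
  function canonicalise(v: unknown): unknown {
    if (Array.isArray(v)) return v.map(canonicalise);
    if (v !== null && typeof v === "object") {
      return Object.fromEntries(
        Object.entries(v as Record<string, unknown>)
          .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
          .map(([k, val]) => [k, canonicalise(val)]),
      );
    }
    return v;
  }

  export function contentHash(body: unknown): string {
    return createHash("sha256").update(JSON.stringify(canonicalise(body))).digest("hex");
  }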

### Pure utility (src/mcpd/src/utils/semver.ts)
- bumpSemver(current, kind) for major / minor / patch
- compareSemver(a, b) — numeric, not lex (10 > 9)
- isValidSemver(s)
- Invalid input falls back to '0.1.0' rather than throwing — keeps the
  audit-write path from blowing up the prompt update if a row's semver
  ever drifts out of MAJOR.MINOR.PATCH shape.
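
A sketch matching the documented behaviour:

  export function bumpSemver(current: string, kind: "major" | "minor" | "patch"): string {
    const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
    if (!m) return "0.1.0"; // invalid input falls back rather than throwing
    let [major, minor, patch] = m.slice(1).map(Number);
    if (kind === "major") { major += 1; minor = 0; patch = 0; }
    else if (kind === "minor") { minor += 1; patch = 0; }
    else patch += 1;
    return `${major}.${minor}.${patch}`;
  }

  export function compareSemver(a: string, b: string): number {
    const pa = a.split(".").map(Number);
    const pb = b.split(".").map(Number);
    for (let i = 0; i < 3; i++) {
      if (pa[i] !== pb[i]) return pa[i] - pb[i]; // numeric, not lex: 10 > 9
    }
    return 0;
  }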

### Routes (src/mcpd/src/routes/)
- revisions.ts — GET /api/v1/revisions?resourceType=&resourceId=,
  GET /api/v1/revisions/:id, GET /api/v1/revisions/:id/diff?against=<id|live>
  (unified-format diff via the `diff` package), and POST
  /api/v1/prompts/:id/restore-revision { revisionId, note? }.
- proposals.ts — GET / POST /api/v1/proposals,
  GET /api/v1/proposals/:id, PUT for body updates, POST .../approve and
  POST .../reject, plus DELETE.

## What's changed

- PromptService.create / update now record a ResourceRevision when the
  revision service is wired. Update auto-bumps patch on content change;
  authors can override via `--bump major|minor|patch` or `--semver X.Y.Z`
  on the CLI (forwarded into the PUT body). Best-effort: revision write
  failures are swallowed so the prompt save still succeeds (revision is
  audit, not source of truth).
- PromptService.setProposalService registers a 'prompt' approval handler
  with the proposal service. Approval runs in a Prisma transaction:
  upsert prompt → record revision → update currentRevisionId → flip
  proposal status. semver bumps to 0.1.0 on first approval, patch
  thereafter.
- New CLI flags on `mcpctl edit prompt`: --bump, --semver, --note. They're
  prompt-only (validated client-side); other resources reject them.
- Aliases in shared.ts: `proposal`/`prop` → proposals,
  `revision`/`rev` → revisions.
- diff dependency added to mcpd.

## Tests

- src/mcpd/tests/utils/semver.test.ts — covers bump/compare/validate
  including numeric (not lex) semver compare and invalid-input fallback.
- prompt-service.test.ts updated: makePrompt fixture now sets semver +
  agentId + currentRevisionId; updatePrompt assertion expects the
  auto-bumped patch in the same update call.
- prompt-routes.test.ts updated symmetrically.

## RBAC

`proposals` and `revisions` URL segments map to the existing `prompts`
permission for now. PR-7 may split if a "reviewer" role becomes useful.

## Verification

Full suite: 158 test files / 2127 tests green.
`pnpm build` clean across all 6 workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:38:35 +01:00
Michal
fbe68fa693 feat(db): schema for ResourceRevision, ResourceProposal, Skill
Phase 1 of the Skills + Revisions + Proposals work. Purely additive — no
existing rows are touched, no tables renamed, no columns dropped.

New tables:
- ResourceRevision — append-only audit + diff log keyed by
  (resourceType, resourceId). Both Prompt and Skill produce revisions on
  every change. Soft FK so revisions outlive the resources they describe.
  Indexed for history viewer (latest-first), semver lookup, and
  cross-resource sync diff via contentHash.
- ResourceProposal — generic propose/approve/reject queue. Drop-in
  replacement for the prompt-only PromptRequest. Created empty here;
  PR-2 will rename PromptRequest → _PromptRequest_legacy and backfill.
- Skill — new resource type that mirrors Prompt for everything CRUD-
  shaped. Adds `files` Json (multi-file bundles, materialised onto disk
  by `mcpctl skills sync` in PR-5) and `metadata` Json (typed app-layer
  in PR-3: hooks, mcpServers, postInstall, …).
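
For orientation, a rough TypeScript view of the revision row (field
names beyond those mentioned above are assumptions; schema.prisma is
authoritative):

  interface ResourceRevisionRow {
    id: string;
    resourceType: "prompt" | "skill"; // soft FK, no relation declared
    resourceId: string;               // outlives the resource it describes
    semver: string;
    contentHash: string;              // sha256 of the canonicalised body
    body: unknown;                    // snapshot for the history/diff viewer
    createdAt: Date;
  }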

New columns on Prompt:
- semver (semver string, default '0.1.0') — auto-bumped patch on save
  by PromptService.update once PR-2 wires it. Distinct from `version`,
  which stays as the optimistic-concurrency counter.
- currentRevisionId — soft pointer to the latest ResourceRevision row.

DB tests cover scope rules (project XOR agent XOR neither), name
uniqueness across both compound keys, cascade-on-delete, soft-FK
survival of deletion, and JSON column persistence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:18:21 +01:00
f8aa6c2f0d feat: mcpctl provider {enable,disable} — persistent on/off switch (#74)
2026-05-03 14:57:21 +00:00
Michal
d04adb5623 feat(cli+mcplocal): persistent provider disable/enable
Adds two new subcommands on top of v7's provider lifecycle CLI:

  mcpctl provider disable vllm-local   # release GPU + survive restart
  mcpctl provider enable  vllm-local   # clear the flag, ready to chat

Use case: vLLM keeps crashing on engine init. `down` works for "now"
but the next chat triggers a restart; `disable` writes
`disabled: true` into the provider's entry in ~/.mcpctl/config.json
and short-circuits complete()/ensureRunning() until you re-enable.

Implementation:
- LlmProviderEntry / LlmProviderFileEntry: new optional `disabled` field
- ManagedVllmProvider: setDisabled(bool), isDisabled(), gate in
  complete()/ensureRunning(), expose `disabled` in getStatus()
- mcplocal HTTP: POST /llm/providers/:name/{disable,enable} write the
  config file and apply the change live; /start returns 409 when the
  target is disabled instead of silently failing
- Boot: createSingleProvider honors `entry.disabled` so a known-bad
  vLLM doesn't auto-start on the first chat after mcplocal restart
- CLI: `disable` / `enable` subcommands on `mcpctl provider`; status
  output now shows `(disabled)` next to the state

`enable` is live — provider stays in the registry while disabled, so
flipping the flag back is enough; no mcplocal restart needed.

Tests: cli 437/437, mcplocal 731/731.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 15:57:01 +01:00
fe27947f80 feat: mcpctl provider <name> {up,down,status} for managed LLMs (#73)
2026-05-03 14:40:57 +00:00
Michal
356cbe87b5 feat(cli+mcplocal): mcpctl provider <name> {up,down,status} for managed LLMs
Adds lifecycle control for managed local LLM providers (vllm-managed)
without the nuclear option of restarting mcplocal. Practical use:

  mcpctl provider vllm-local down    # release GPU memory now
  mcpctl provider vllm-local up      # warm up before the next chat
  mcpctl provider vllm-local status  # see state, pid, uptime

mcplocal exposes three new endpoints:

  GET  /llm/providers/:name/status   → returns lifecycle state for
                                       managed providers, { managed: false }
                                       for unmanaged (anthropic, openai, …)
  POST /llm/providers/:name/start    → calls warmup() (202 + initial state)
  POST /llm/providers/:name/stop     → calls dispose() (200 + post-stop state)

Stop and start return 400 for non-managed providers — stopping an API-key
provider is meaningless. The CLI surfaces the error verbatim.

Restarting mcplocal would also free the GPU, but it drops the SSE
connection to mcpd and forces every virtual Llm to re-publish; the new
commands are the targeted, non-disruptive escape hatch.

The completions test gained a `topLevelMarkers` filter so a sub-command
named `status` (under `provider`) doesn't trip the existing "non-project
commands must guard with __mcpctl_has_project" rule.

Tests: cli 437/437, mcplocal 731/731.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:58:46 +01:00
3071bcee8e feat: v6 polish — per-publisher namespacing + auto-create project (#71)
2026-04-28 23:33:39 +00:00
46697f4f63 feat: v5 durable inference task queue (#70)
2026-04-28 23:33:36 +00:00
Michal
ee18c5107e feat(mcpd): auto-create project on virtual-agent register (v6 Stage 2)
Closes the v3-deferred "project must already exist" gap. When a
virtual agent declares `project: "my-team"` and no such project
exists, mcpd creates it idempotently with the publishing user as
owner (instead of throwing 404 from registerVirtualAgents).

ProjectService gains `ensureByName(name, ownerId, opts)` — find
the project or create it with sensible defaults (description carries
an audit note pointing at the registrar; proxyModel/gated take
their schema defaults). First publisher to land on a name owns the
row; subsequent publishers reuse the existing one.
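
A plausible shape for the find-or-create, using a Prisma upsert for the
idempotency (model/field names assumed):

  import { PrismaClient } from "@prisma/client";

  // Hypothetical standalone sketch; the real ensureByName lives on
  // ProjectService and also threads opts/defaults through.
  async function ensureByName(prisma: PrismaClient, name: string, ownerId: string) {
    return prisma.project.upsert({
      where: { name },
      update: {}, // first publisher to land on the name owns the row
      create: {
        name,
        ownerId,
        description: "auto-created on virtual-agent register",
      },
    });
  }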

AgentService.registerVirtualAgents calls ensureByName instead of
resolveAndGet, so the same agent register payload works regardless
of whether the project pre-existed or not.

Tests: 2 new tests (auto-creates a missing project on first publish;
reuses an existing project without re-creating). Mock projects
factory rebuilt to track _created names + maintain id→name reverse
lookup so the agent's toView returns the correct project name
(prior mock hardcoded 'mcpctl-dev').

Existing 13 virtual-agent tests + 870 mcpd suite green.
2026-04-28 15:54:27 +01:00
Michal
c346b93789 feat(mcplocal): per-publisher namespacing for virtual Llms/Agents (v6 Stage 1)
Two mcplocals sharing the same config template (`vllm-local-qwen3`)
no longer collide on mcpd's cluster-wide unique-name constraint.
Each publisher can append a suffix derived from hostname (or any
other stable per-host identifier) so the wire-side names become
distinct (`vllm-local-qwen3-alice`, `vllm-local-qwen3-bob`).
Pair with an explicit `poolName` (v4) and the rows still appear as
one logical pool — agents pinned to any member load-balance across
both.

Config (`~/.mcpctl/config.json`):

  {
    "publisher": { "suffix": "auto" }   // → os.hostname() sanitized
                  // or { "suffix": "alice" } for explicit override
  }

Or via env: `MCPCTL_PUBLISHER_SUFFIX=alice` (operations override).

Resolution order: env var → config.publisher.suffix → empty
(legacy behavior, no mangling). Sanitization lowercases, replaces
non-`[a-z0-9-]` runs with `-`, strips leading/trailing dashes —
the result must satisfy mcpd's name validation, otherwise the
register POST would 422.
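
The sanitization as one small function (name illustrative):

  import os from "node:os";

  function sanitizeSuffix(raw: string): string {
    return raw
      .toLowerCase()
      .replace(/[^a-z0-9-]+/g, "-") // collapse disallowed runs to a single dash
      .replace(/^-+|-+$/g, "");     // strip leading/trailing dashes
  }

  sanitizeSuffix(os.hostname()); // "Alice's-MBP.local" → "alice-s-mbp-local"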

Wire shape: RegistrarPublishedProvider gets an optional
`publishName` field. When set, the wire payload's `name` is
`publishName` (suffixed); when not, today's `provider.name`.
Inbound infer/wake task lookups match `publishName ?? provider.name`
so the local registry stays addressable by its original name —
SSE frames carrying the suffixed wire name still find their
provider.

Agents are forwarded with their own suffixed name AND a
`llmName` rewritten through the same per-local→wire map so the
agent rows pin to the suffixed Llm wire name (otherwise
registerVirtualAgents would 404).

Tests: 8 new tests covering applyPublisherSuffix (empty, normal,
length limit, exact-100) and loadPublisherSuffix (env override,
absent, sanitization, dash stripping). Existing registrar tests
untouched — no suffix means no behavior change.
2026-04-28 15:54:06 +01:00
Michal
7320b50dac feat(cli+docs+smoke): inference-task CLI + GC ticker + smoke + docs (v5 Stage 4)
CLI surface for the durable queue:

- `mcpctl get tasks` — table view (ID, STATUS, POOL, LLM, MODEL,
  STREAM, AGE, WORKER). Aliases `task`, `tasks`, `inference-task`,
  `inference-tasks` all normalize to the canonical plural so URL
  construction works uniformly. RESOURCE_ALIASES + completions
  generator updated.
- `mcpctl chat-llm <name> --async -m <msg>` — enqueue and exit. stdout
  is just the task id (pipeable into `xargs mcpctl get task`); stderr
  carries human-readable status. REPL mode is rejected for --async
  (fire-and-forget doesn't make sense without -m).

GC ticker in mcpd: 5-min interval. Pending tasks past 1 h queue
timeout flip to error with a clear message; terminal tasks past 7 d
retention get deleted. Both queries are index-backed.

Crash fix uncovered by the smoke: when the async route doesn't await
ref.done, a later cancel/error rejected the in-flight Promise as
unhandled and crashed mcpd. The route now attaches a no-op `.catch`
so the legacy `done` semantic still works for sync callers (chat,
direct infer) without taking out the process for async ones. The
EnqueueInferOptions also gained an explicit `ownerId` field so the
async API can stamp the authenticated user on the row instead of
inheriting 'system' from the constructor's resolveOwner — without
this, every GET/DELETE from the original caller would 404 due to
foreign-owner mismatch.
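
The hazard in miniature (self-contained sketch; the real fix attaches
the same no-op `.catch` to `ref.done`):

  // A promise that will reject later, with nobody awaiting it:
  const done = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("task cancelled")), 1_000),
  );

  // Without this line the rejection fires as an unhandledRejection and
  // (on modern Node) takes the process down. Other consumers that DO
  // await `done` still observe the rejection as before.
  done.catch(() => {});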

Smoke (tests/smoke/inference-task.smoke.test.ts):

  1. POST /inference-tasks while no worker bound → row=pending.
  2. Bring a registrar online → bindSession drain claims and
     dispatches → worker complete()s → row=completed → GET returns
     the assistant body.
  3. Stop worker, enqueue, DELETE → row=cancelled, persisted.

docs/inference-tasks.md (new): full data model, lifecycle diagram,
async API reference, CLI examples, RBAC table, GC defaults, and the
v5 limitations / v6 roadmap. Cross-linked from virtual-llms.md and
agents.md.

Tests + smoke: mcpd 893/893, mcplocal 723/723, cli 437/437, full
smoke 146/146 (was 144, +2 new task smoke). Live mcpd verified via
manual curl: enqueue → cancel → re-fetch — no crash, owner scoping
returns 404 on foreign ids, GC ticker logs at info when it sweeps.

v5 complete: durable queue (Stage 1) + VirtualLlmService rewire
(Stage 2) + async API & RBAC (Stage 3) + CLI/GC/smoke/docs (Stage 4).
2026-04-28 15:25:09 +01:00
Michal
1dcfdc8b05 feat(mcpd): async inference task API + tasks RBAC resource (v5 Stage 3)
Exposes the durable queue (Stage 1+2) as a first-class API so callers
can enqueue work, get a task id immediately, and poll/stream/cancel
without holding open the original HTTP connection.

New endpoints (`/api/v1/inference-tasks`):

  POST   /                  → enqueue, return task id (201 + row).
                              failFast:false — task stays pending if no
                              worker is up; future bindSession drains.
                              Rejects 400 for public Llms (the existing
                              /llms/<name>/infer is the right tool there)
                              and 404 for missing Llms.
  GET    /                  → list owner's tasks. Optional ?status,
                              ?poolName, ?agentId, ?limit query.
                              Owner-scoped at the route layer; cross-
                              user listing requires resource-wide grant.
  GET    /:id               → poll one task. 404 (not 403) on a
                              foreign-owner id to prevent enumeration.
  DELETE /:id               → cancel a non-terminal task. Already-
                              terminal rows return 200 + current shape
                              (no-op). 404 on foreign owner.
  GET    /:id/stream        → SSE feed of `chunk` and `terminal` events.
                              Re-fetches the row at subscribe time so
                              already-completed tasks emit one terminal
                              event and close immediately.

RBAC:

- New `tasks` resource added to RBAC_RESOURCES + the URL→permission
  map in main.ts. Default action mapping: GET=view, POST=create,
  DELETE=delete. The route layer enforces owner-scoping ON TOP of
  the hook (404 on foreign owner) — without this, anyone with
  `view:tasks` could list/peek every user's queued work.
- Singular alias `task` and the multi-word `inference-task` /
  `inference-tasks` all normalize to `tasks` so users can write
  `mcpctl create rbac-binding --resource task --role view ...` or
  any of the variants and have it map correctly.

Tests: 9 new route tests covering the wire shapes, owner scoping
(matching/foreign), public-Llm rejection, missing-Llm 404, list
filter, and cancel semantics (pending→cancelled, terminal→no-op).
mcpd 893/893 (was 884, +9). Live smoke: POST against a public Llm
returns the documented 400, POST against missing returns 404, GET
list returns [] cleanly.

Stage 4 (next): CLI surface (`mcpctl get tasks`, `--async` flag on
chat-llm), GC ticker, smoke test (enqueue → connect worker →
drain), docs.
2026-04-28 15:06:31 +01:00
Michal
7b18bb6d6b feat(mcpd): VirtualLlmService rewires through durable queue (v5 Stage 2)
The in-memory `tasksById` map for inference tasks is gone. Every
inference call lands as a row in `InferenceTask`; the result POST
updates the row + emits a wakeup; the in-flight HTTP handler unblocks
on the wake. mcpd surviving a restart no longer drops in-flight tasks,
and a worker disconnecting mid-task no longer fails the caller — the
row reverts to pending and a sibling worker on the same pool drains it.

Wake tasks (publisher control messages, not inference) keep their own
small in-memory map (`wakeTasks`). They're millisecond-scoped and
don't benefit from durability — a missed wake on restart just means
the next infer fires a fresh wake.

Behavioral changes worth flagging:

- Worker disconnect mid-task: WAS reject ref.done with "publisher
  disconnected"; NOW revert claimed/running rows to pending. Original
  caller's ref.done keeps waiting up to INFER_AWAIT_TIMEOUT_MS (10
  min); whichever worker delivers the result fulfills it.

- bindSession drains pending tasks for the session's pool keys. So
  tasks queued while no worker was up automatically get dispatched
  when one shows up. The drain matches by *effective pool key*
  (poolName ?? name) — tasks queued against vllm-alice get drained
  by any session whose owned Llms share alice's pool.

- New `failFast: true` option on enqueueInferTask (default: false).
  Existing callers that NEED fast-fail get it explicitly:
    - Direct `/api/v1/llms/<name>/infer` route: caller pinned a
      specific Llm and wants 503 immediately if the publisher is
      offline; queueing for an unknown future worker would surprise.
    - chat.service pool failover loop: it iterates pool candidates
      and needs each candidate's transport failure to surface fast.
      Without failFast, a downed pool member would absorb the call
      into the queue and the loop would wait 10 min before trying
      the next.
  The async API route (Stage 3) leaves failFast=false — that's the
  whole point of the durable queue path.

- VirtualLlmService now requires an InferenceTaskService dep at
  construction. Older test wirings that didn't pass it get a clear
  "InferenceTaskService not wired" error from enqueueInferTask
  rather than a confusing in-memory stub.

Tests:

- 12 existing virtual-llm-service tests updated for the new
  semantics: "rejects when no session" → "queues durably"; "rejects
  when row inactive" → "still queues (pool may have a sibling)";
  "unbindSession rejects in-flight tasks" → "reverts to pending".
  Wake-task probing now uses `wakeTasks` instead of `tasksById`.

- 3 new v5-specific tests: drain-on-bind matches by effective pool
  key (not just name); enqueue without a session keeps the row
  pending; completeTask via the result-route updates the DB and
  emits the wakeup that resolves ref.done.

- chat-service-virtual-llm + llm-infer-route assertions updated to
  expect the new {failFast: true} option arg.

mcpd 884/884 (was 881; +3 v5 cases). mcplocal 723/723. Full smoke
suite 144/144 against the deployed queue-backed mcpd.

Stage 3 (next): expose the durable queue via async API endpoints.
POST /api/v1/inference-tasks (enqueue with failFast=false), GET
/api/v1/inference-tasks/:id (poll), GET /api/v1/inference-tasks/:id/stream
(SSE), DELETE /api/v1/inference-tasks/:id (cancel). New `tasks` RBAC
resource.
2026-04-28 02:33:26 +01:00
Michal
ed21ad1b5a feat(mcpd+db): durable InferenceTask queue + state machine (v5 Stage 1)
The persistence + signaling layer for v5. No integration with the
existing in-flight inference path yet — that's Stage 2. This commit
just lands the durable queue underneath, with a state machine that
mcpd's HTTP handlers, the worker result-POST route, and the GC sweep
will all build on.

Schema (src/db/prisma/schema.prisma + migration):

- New `InferenceTask` model + `InferenceTaskStatus` enum
  (pending|claimed|running|completed|error|cancelled).
- Routing fields stored at enqueue time so a later rename of
  `Llm.poolName` doesn't reroute already-queued work: `poolName`
  (effective pool key), `llmName` (pinned target), `model`, `tier`.
- Worker tracking: `claimedBy` (providerSessionId) + `claimedAt`,
  cleared on revert.
- Bodies as `Json`: requestBody (always set), responseBody (set at
  completion). Streaming chunks are NOT persisted — too expensive at
  delta granularity. The final assembled body lands once per task.
- Lifecycle timestamps: createdAt, claimedAt, streamStartedAt,
  completedAt. Plus ownerId (RBAC + audit) and agentId (null for
  direct chat-llm calls).
- Indexes for the hot paths: (status, poolName) for the dispatcher's
  drain query, claimedBy for the disconnect revert, completedAt for
  the GC retention sweep, owner/agent for the async API listing.

Repository (src/mcpd/src/repositories/inference-task.repository.ts):

- CRUD + state transitions as conditional CAS via `updateMany`. Two
  workers racing to claim the same row both run the UPDATE; whichever
  the DB serializes first sees affected=1 and gets the row, the loser
  sees 0 and falls through to the next candidate. No application-
  level locking required.
- findPendingForPools(poolNames[]) for the worker drain on bind.
- findHeldBy(claimedBy) for the unbindSession revert.
- findStalePending + findExpiredTerminal for the GC sweep.
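
The claim step in outline (model/field names per the schema described
above):

  import { PrismaClient } from "@prisma/client";

  // Two racing workers both run this UPDATE; the DB serializes them and
  // only the winner sees count === 1. No application-level locking.
  async function tryClaim(prisma: PrismaClient, taskId: string, sessionId: string) {
    const { count } = await prisma.inferenceTask.updateMany({
      where: { id: taskId, status: "pending" }, // the CAS condition
      data: { status: "claimed", claimedBy: sessionId, claimedAt: new Date() },
    });
    return count === 1;
  }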

Service (src/mcpd/src/services/inference-task.service.ts):

- Owns the in-process EventEmitter that wakes blocked HTTP handlers
  when a worker POSTs results. The DB row is the source of truth for
  *state*; the EventEmitter just signals "go re-read row X" so we
  don't have to poll. Single-instance assumption for v5; pg
  LISTEN/NOTIFY is the v6 swap when scaling horizontally — no schema
  change needed, just replace the emitter wakeup.
- waitFor(taskId, timeoutMs) returns { done, chunks }: the terminal
  promise + an async iterator of streaming deltas. Throws on cancel
  (clear message) or error (worker's errorMessage propagates) or
  timeout. Polls the row once at subscribe time so an already-
  terminal task resolves immediately without waiting for an event
  that's never coming.
- gcSweep flips stale pending rows to error (with a clear message
  about the timeout) and deletes terminal rows past retention.
  Defaults: 1h pending timeout, 7d terminal retention; both
  configurable.

Tests:
- 6 db-level schema tests (defaults, json roundtrip, drain query
  shape, claimedBy filter, GC predicate, agentId nullable).
- 13 service tests covering enqueue, the CAS race on tryClaim,
  complete/fail/cancel, idempotent terminal transitions, revertHeldBy
  on disconnect, and the full waitFor signal lifecycle (immediate
  resolve, wake on event, chunk streaming, cancel/error/timeout
  paths). Plus a gcSweep test with a fixed clock.

mcpd 881/881 (was 868; +13). db pool-schema 14/14, +6 new
inference-task-schema. Pre-existing failures in models.test.ts
(Secret FK fixture issue, also fails on main HEAD) are unrelated.

Stage 2 (next): VirtualLlmService rewires through this — remove the
in-memory pendingTasks map; enqueue creates a row, dispatch picks an
active session, the result-route updates the row + emits the wakeup.
Worker disconnect reverts; worker bind drains.
2026-04-28 02:14:45 +01:00
256e117021 Merge pull request 'feat: v4 LB pools by shared poolName' (#69) from feat/llm-pool-by-name into main
Reviewed-on: #69
2026-04-28 01:02:45 +00:00
Michal
137711fdf6 feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)
Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:

  1. /api/v1/llms/<name>/members surfaces both with the correct
     effective pool key, size, activeCount, and per-member kind/status.
  2. Chat through an agent pinned to one pool member dispatches across
     the pool — verified by running 12 calls and asserting at least
     one response from each backend (the random-shuffle selection
     would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
  3. Failover: stop one publisher, the surviving member still serves
     chat. /members shows the stopped row as inactive immediately
     (unbindSession runs synchronously on SSE close).

docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.

Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
2026-04-27 23:22:15 +01:00
Michal
e21f96080d feat(mcpd+cli+mcplocal): /llms/<name>/members + POOL column + --pool-name (v4 Stage 2)
Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` —
  same as the single-Llm route. Members are full LlmView shapes so
  callers don't need a second roundtrip to render the pool block.

- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates
  the same character set as CreateLlmSchema.poolName. Backwards
  compatible — older mcplocals that don't send the field continue to
  publish solo Llms.

- CLI `get llm` table: new POOL column right after NAME. Solo rows
  show "-" so the "no pool / pool of 1" case is unambiguous (per
  user direction "make sure we see it, prominently visible and
  impossible to mistake").

- CLI `describe llm`: fetches /members and renders a Pool block at
  the top of the detail view when the row is in an explicit pool OR
  when its implicit pool has size > 1. Each member line shows
  kind/status; the anchor row gets "← this row". Block is suppressed
  for solo rows so describe stays compact in the common case.

- CLI `create llm --pool-name <name>` flag and apply schema both
  accept the new field. Yaml round-trip preserves it: get -o yaml
  emits `poolName: <name>`, apply -f re-imports it without diff.
  Verified end-to-end against the live mcpd.

- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
  and registrar.ts thread it through into the register payload. Use
  case for distributed inference: each user's mcplocal picks a
  unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
  (e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
  that auto-grows as workers come online.

- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in
  fish + bash for `mcpctl create llm`.

Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
  mcpd 868/868 (was 865, +3),
  mcplocal 723/723,
  cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
2026-04-27 23:18:53 +01:00
Michal
7949e1393d feat(mcpd+db): Llm.poolName + chat dispatcher pool failover (v4 Stage 1)
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.

Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm to a pool seed.

Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.
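
The retry loop in outline (types assumed; the real code lives in
runOneInference / streamInference):

  // Transport errors throw and move us to the next pool member;
  // result-level (auth/4xx) failures come back as values and are NOT retried.
  async function dispatchWithFailover<T>(
    candidates: string[],                  // pre-shuffled viable member names
    call: (llmName: string) => Promise<T>,
  ): Promise<T> {
    let lastErr: unknown = new Error("pool exhausted");
    for (const name of candidates) {
      try {
        return await call(name);
      } catch (err) {
        lastErr = err; // transport-level failure: try the next sibling
      }
    }
    throw lastErr;
  }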

When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool
  edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
2026-04-27 22:02:41 +01:00
c0b4dc89f3 Merge pull request 'chore: fulldeploy uses bao-backed pulumi wrapper for drift check' (#68) from chore/fulldeploy-pulumi-wrapper into main
Reviewed-on: #68
2026-04-27 20:21:33 +00:00
Michal
7f49294b36 chore(fulldeploy): use kubernetes-deployment/scripts/pulumi.sh wrapper
The pre-flight drift check now calls the bao-backed pulumi wrapper
that landed with the litellm key persistence work, so deploys no
longer need PULUMI_CONFIG_PASSPHRASE in .env or shell env. The
passphrase is fetched from OpenBao at runtime by the wrapper and passed
only to the exec'd pulumi process — it never touches the parent shell's
state.

Falls back to a clear warning if the wrapper isn't present (older
clone of kubernetes-deployment) instead of silently skipping the
check.
2026-04-27 19:14:36 +01:00
f5bdeea8e7 Merge pull request 'feat: virtual agents v3 (Stages 1-3) + real fixes for chat/adapter/CLI thread format' (#67) from feat/virtual-agent-v3 into main
Reviewed-on: #67
2026-04-27 18:06:59 +00:00
Michal
1998b733b2 feat(cli+docs): mcpctl get agent KIND/STATUS columns + virtual-agent smoke + docs (v3 Stage 4)
CLI: `mcpctl get agent` table view gains KIND and STATUS columns
mirroring the `get llm` shape from v1. Public agents render as
`public/active` (the AgentRow defaults) and virtual ones surface their
true lifecycle state, so `mcpctl get agent` becomes a single-pane view
for both manually-created and mcplocal-published personas.

Smoke: tests/smoke/virtual-agent.smoke.test.ts mirrors virtual-llm's
in-process registrar pattern — publishes a fake provider + agent in
one round-trip, confirms mcpd surfaces the agent kind=virtual /
status=active under /api/v1/agents, then disconnects and verifies the
paired Llm-and-Agent both flip to inactive (deletion is GC-driven, not
disconnect-driven, so the rows must still exist post-stop). Heartbeat-
stale and 4 h sweep paths are covered by the unit suite to keep smoke
duration in check.

Docs: docs/virtual-llms.md gets a "Virtual agents (v3)" section with a
config sample, lifecycle notes, listing example, and the cluster-wide
name-uniqueness caveat. The API surface block now mentions the new
`agents[]` field on _provider-register, the join-by-session heartbeat
behavior, and the `GET /api/v1/agents` lifecycle fields. docs/agents.md
gains a one-paragraph note pointing to the v3 publishing path.

Tests: full smoke suite 141/141 (was 139, +2 new), unit suites
unchanged (mcpd 860/860, mcplocal 723/723).
2026-04-27 18:47:03 +01:00
Michal
610808b9e7 fix(chat): real fixes for thinking-model + URL conventions, not test tweaks
Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize. `/v1beta` (Anthropic-style)
   is preserved.

2. Non-streaming chat returned an empty assistant when thinking models
   (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming
   fix: the chatStream loop accumulates reasoning and yields ONE
   synthesized `text` frame at the end when content stayed empty,
   keeping the CLI's stdout (which only prints `text` deltas) in sync
   with the persisted thread message.

3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
   lifecycle field) instead of `kind: agent` (apply envelope), so
   round-tripping through `apply -f` failed. Same fix shape as the v1
   Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
   inactiveSince/providerSessionId for the agents resource too.

4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests
   and any other regex watching for one form missed the other.
   Standardize on `thread: <cuid>` (single space) in both paths.

5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working.
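
Minimal sketches of fixes 1 and 2 (helper and field names are
illustrative where they aren't quoted above; the real chat.service
shapes differ in detail):

  // Fix 1: strip exactly one trailing "/v1" from the configured base
  // so both documented shapes canonicalize; "/v1beta" is left alone.
  const stripTrailingV1 = (base: string) =>
    base.replace(/\/+$/, '').replace(/\/v1$/, '');
  // stripTrailingV1('https://x/v1')     -> 'https://x'
  // stripTrailingV1('https://x')        -> 'https://x'
  // stripTrailingV1('https://x/v1beta') -> 'https://x/v1beta'

  // Fix 2: prefer content, fall back to reasoning when content stayed
  // empty, and mark max_tokens cut-offs. Field spellings here are the
  // common ones, not the exhaustive list the parser knows.
  function pickAssistantText(
    msg: { content?: string | null; reasoning_content?: string | null },
    finishReason?: string,
  ): string {
    let text = msg.content ?? '';
    if (!text && msg.reasoning_content) text = msg.reasoning_content;
    if (finishReason === 'length') text += '\n[response truncated by max_tokens]';
    return text;
  }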

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
2026-04-27 18:39:01 +01:00
Michal
58bc277242 feat(mcpd+mcplocal): register-agents endpoint + mcplocal agents block (v3 Stage 3)
Extends the existing `_provider-register` payload with an optional `agents`
array so a single round-trip atomically publishes both virtual Llms and
their pinned virtual Agents. v1/v2 publishers (providers-only) keep
working unchanged — the agents path is gated on the route receiving an
AgentService instance, otherwise it logs a warning and ignores the array.

mcplocal config gains a top-level `agents` block (loadLocalAgents)
mirroring the providers shape. The registrar reads it, builds
RegistrarPublishedAgent entries against the published provider names,
and folds them into the same register POST. mcpd routes the agents
through AgentService.registerVirtualAgents(sessionId, ..., ownerId),
which was added in Stage 2.
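
For orientation, a config shape along these lines (field names are an
assumption — only the block's existence and its mirroring of the
providers shape are established above):

  agents:
    - name: local-persona     # cluster-unique agent name
      llm: vllm-local         # must reference a published provider above
      # ...persona fields as the block defines them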

No CLI changes here — `mcpctl chat <virtual-agent>` already works once
chat.service has the kind=virtual branch (Stage 1) and the agents are
present in the Agent table. CLI columns + smoke land in Stage 4.
2026-04-27 18:38:37 +01:00
Michal
c7b1bd8e2c feat(mcpd): AgentService virtual methods + GC cascade (v3 Stage 2)
State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind, providerSessionId,
  status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals, findExpiredInactives.

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals
  from a foreign session can be adopted (sticky reconnect). Refuses
  to overwrite a public agent or a foreign session's still-active
  virtual (HTTP 409). Pinned LLM is resolved via LlmService — caller
  posts Llms first. (Upsert decision sketched after this list.)
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).
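
The upsert decision in miniature (a sketch of the rules above only —
the real method also resolves the pinned Llm and batches whole input
lists):

  type Existing =
    | { kind: 'public' | 'virtual';
        status: 'active' | 'inactive';
        providerSessionId?: string }
    | undefined;

  function registerAction(row: Existing, sessionId: string) {
    if (!row) return 'insert';                      // new name -> virtual/active
    if (row.kind === 'public') return 'conflict';   // refuse overwrite (409)
    if (row.providerSessionId === sessionId) return 'reactivate';
    return row.status === 'inactive'
      ? 'adopt'                                     // sticky reconnect
      : 'conflict';                                 // foreign still-active (409)
  }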

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the
  agent sweep FIRST (so any agent that would block an Llm delete via
  Restrict is already gone), and adds a defensive
  deleteVirtualAgentsForLlm step right before each Llm delete in case
  an agent's heartbeat lagged its Llm's just enough to escape this
  round's 4h cutoff.

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip
+ delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when agent's heartbeat lagged).

mcpd suite: 854/854 (was 841, +13). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
Michal
9afd24a3aa feat(db+mcpd): Agent lifecycle + chat.service kind=virtual branch (v3 Stage 1)
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
  mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
  new types. Existing rows backfill kind=public/status=active so v1
  CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups.
  Total agent-schema tests: 20/20.

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
  on ctx.llmKind: 'public' goes through the existing adapter
  registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
  (mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern (sketched after
  this list).
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.
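
The queue + wake bridge, in miniature (sketch: error propagation and
cleanup elided; the null end-of-stream sentinel is an assumption):

  function callbackToIterator<T>(
    subscribe: (onChunk: (chunk: T | null) => void) => void,
  ): AsyncGenerator<T> {
    const queue: (T | null)[] = [];
    let wake: (() => void) | null = null;
    subscribe((chunk) => {
      queue.push(chunk);     // producer side: enqueue...
      wake?.();              // ...and wake a parked consumer
      wake = null;
    });
    return (async function* () {
      for (;;) {
        while (queue.length === 0)
          await new Promise<void>((resolve) => (wake = resolve));
        const next = queue.shift()!;
        if (next === null) return;   // end of stream
        yield next;
      }
    })();
  }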

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Pre-this-stage, those agents 502'd against the
empty url field.

Tests: 4 new in chat-service-virtual-llm.test.ts cover the relay path
(non-streaming and streaming), the missing-dispatcher error, and
rejection surfacing. mcpd suite: 841/841 (was 833; +8 = 4 schema +
4 chat-service).
Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
9374a2652b perf: vitest threads pool + Dockerfile pnpm cache mount (#66)
Some checks failed
CI/CD / lint (push) Successful in 58s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m35s
CI/CD / smoke (push) Failing after 1m43s
CI/CD / build (push) Successful in 2m21s
CI/CD / publish (push) Has been skipped
2026-04-27 16:07:05 +00:00
Michal
18245be0c1 perf: vitest threads pool + Dockerfile pnpm cache mount
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / lint (pull_request) Successful in 2m40s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Failing after 7m6s
CI/CD / publish (pull_request) Has been skipped
Two tuning knobs that were leaving most of the host idle:

1) vitest.config.ts pool=threads with maxThreads ≈ cores/2.
   The default pool left this 64-core workstation at ~10% CPU during
   `pnpm test:run`. The threads pool uses the box: the same
   152-file/2050-test suite now runs at ~700% CPU instead of ~150%.
   Wall-time gain is modest (the workload is dominated by a handful of
   slow individual files that one thread must run serially), but the
   parallel headroom is there for when the suite grows. Cap =
   max(2, cores/2) keeps laptops reasonable; override with
   `VITEST_MAX_THREADS=N` in the env (config sketched after this list).

2) Dockerfile.mcpd uses BuildKit cache mounts on both pnpm install
   steps. Adds `# syntax=docker/dockerfile:1.6` and a
   `--mount=type=cache,target=/root/.local/share/pnpm/store` so
   pnpm's content-addressed store survives across image rebuilds.
   Cold rebuilds where the lockfile changed are unaffected; warm
   rebuilds where only source changed drop the install step from
   ~60s to <5s. fulldeploy.sh's mcpd image rebuild gets that saving
   back (minus the separate docker-push hash-mismatch issue).
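
The pool knob from 1), roughly (sketch — the exact option shape
varies by vitest version, and the real file carries the rest of the
project config):

  import os from 'node:os';
  import { defineConfig } from 'vitest/config';

  const maxThreads =
    Number(process.env.VITEST_MAX_THREADS) ||
    Math.max(2, Math.floor(os.cpus().length / 2));

  export default defineConfig({
    test: { pool: 'threads', poolOptions: { threads: { maxThreads } } },
  });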

Test parity: 2050/2050 across 152 files; per-package mcpd 837/837.
Both unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:06:39 +01:00
45c7737ee1 feat: virtual LLMs v2 (wake-on-demand) (#65)
Some checks failed
CI/CD / lint (push) Successful in 54s
CI/CD / test (push) Successful in 1m12s
CI/CD / typecheck (push) Successful in 2m42s
CI/CD / smoke (push) Failing after 1m43s
CI/CD / build (push) Successful in 2m33s
CI/CD / publish (push) Has been skipped
2026-04-27 14:20:59 +00:00
Michal
e0cfe0ba4d feat: virtual-LLM v2 smoke + docs (v2 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m43s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 5m28s
CI/CD / publish (pull_request) Has been skipped
Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.

Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins a tiny in-process HTTP "wake controller"; the published
  provider's isAvailable() returns false until the wake POST flips
  the bool. Asserts:
    1. Provider publishes as kind=virtual / status=hibernating.
    2. First inference triggers the wake recipe, the recipe POSTs
       to the controller, the provider becomes available, mcpd
       relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.

Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
  "reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
  recipe types (http + command), the wake-then-infer flow diagram,
  concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.

Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).

v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:20:18 +01:00
Michal
db839afc57 feat(mcpd): wake-before-infer for hibernating virtual LLMs (v2 Stage 2)
Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — the `wakeInFlight` map
dedupes by Llm name.

State machine in enqueueInferTask:
  active        → push infer task immediately (existing path).
  inactive      → 503, publisher offline (existing path).
  hibernating   → ensureAwake() → push infer task (new in v2).

ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
  active + bumps lastHeartbeatAt, so all queued + future infers
  hit the active path. On non-2xx or service.failTask, the row
  stays hibernating (next request retries).
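
The dedup in miniature (sketch — runWake stands in for the
taskId/SSE plumbing described above):

  const wakeInFlight = new Map<string, Promise<void>>();

  function ensureAwake(llmName: string): Promise<void> {
    let inflight = wakeInFlight.get(llmName);
    if (!inflight) {
      inflight = runWake(llmName)                      // one wake task...
        .finally(() => wakeInFlight.delete(llmName));  // ...cleared when done
      wakeInFlight.set(llmName, inflight);
    }
    return inflight;                                   // shared by all callers
  }
  declare function runWake(llmName: string): Promise<void>;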

Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:18:24 +01:00
Michal
af0fabd84f feat(mcplocal+mcpd): wake-recipe config + wake-task execution (v2 Stage 1)
First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.

Config (LlmProviderFileEntry):
- New `wake` block on a published provider:
    wake:
      type: http        # or: command
      url: ...           # http only
      method: POST       # http only, default POST
      headers: {...}     # http only
      body: ...          # http only
      command: ...       # command only
      args: [...]        # command only
      maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires

Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
  returns false report initialStatus=hibernating to mcpd. Without a
  wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
  spawn), then polls isAvailable() up to maxWaitSeconds, sending a
  heartbeat each loop so mcpd's GC sweep doesn't time us out
  mid-boot. Reports { ok, ms } on success or { error } on
  timeout/recipe failure via the existing _provider-task/:id/result.
- Replaces the v1 stub that rejected wake tasks with "not implemented".
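
The handler's shape (sketch — runRecipe/sendHeartbeat are stand-ins
for the mechanisms above, and the 2 s poll interval is an assumption):

  async function handleWakeTask(p: PublishedProvider, maxWaitSeconds: number) {
    const start = Date.now();
    await runRecipe(p.wake);                          // HTTP request OR child spawn
    while (!(await p.isAvailable())) {
      if (Date.now() - start > maxWaitSeconds * 1000)
        return { error: 'wake timed out' };
      await sendHeartbeat();                          // keep mcpd's GC at bay
      await new Promise((r) => setTimeout(r, 2000));
    }
    return { ok: true, ms: Date.now() - start };
  }
  declare function runRecipe(wake: unknown): Promise<void>;
  declare function sendHeartbeat(): Promise<void>;
  interface PublishedProvider { wake: unknown; isAvailable(): Promise<boolean>; }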

mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
  'hibernating'). The register/upsert path uses it for both new and
  reconnecting rows. Defaults to 'active' so v1 publishers still
  work unchanged.
- Provider-register route's coercer accepts the new field.

Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when a wake recipe is configured and the backend is down,
active when it's already up, active when no wake recipe exists even if
the backend is unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:15:46 +01:00
700d1683c2 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML (#64)
Some checks failed
CI/CD / lint (push) Successful in 56s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m49s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 3m10s
CI/CD / publish (push) Has been skipped
2026-04-27 13:47:18 +00:00
Michal
2a44f60785 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m59s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 6m35s
CI/CD / publish (pull_request) Has been skipped
The smoke test `llm.smoke > round-trips yaml output → apply -f` failed
after v1 of the virtual-LLM feature: `mcpctl get llm <name> -o yaml`
output now starts with `kind: public` (the new schema column) instead
of `kind: llm` (the apply-doc envelope), because toApplyDocs spread
the cleaned item AFTER setting the kind, so the cleaned item's `kind`
overwrote the envelope's.

Fix: in toApplyDocs, when serialising the `llms` resource, drop the
new lifecycle fields (kind, status, lastHeartbeatAt, inactiveSince,
providerSessionId) before merging. They collide with the apply-doc
envelope and aren't apply-able anyway — they're derived runtime state
owned by VirtualLlmService. Public-LLM round-trip is now byte-clean
(those fields default to public/active anyway). Virtual rows are
created by the registrar, not via apply -f, so dropping them on
output is the right call.
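
The spread-order bug in miniature (values illustrative):

  const cleaned = { name: 'vllm-local', kind: 'public', status: 'active' };
  const buggy = { kind: 'llm', ...cleaned };    // cleaned.kind wins -> 'public'
  const { kind, status, ...rest } = cleaned;    // drop lifecycle fields first...
  const doc = { kind: 'llm', ...rest };         // ...so the envelope kind wins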

CLI suite: 437/437. Smoke will re-run against the live mcpd via
scripts/release.sh after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:47:00 +01:00
65b6b265d9 feat: virtual LLMs v1 (registration skeleton) (#63)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m12s
CI/CD / typecheck (push) Successful in 2m13s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 4m50s
CI/CD / publish (push) Has been skipped
2026-04-27 13:38:50 +00:00
Michal
866f6abc88 feat: virtual-LLM smoke test + docs (v1 Stage 6)
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / lint (pull_request) Successful in 2m6s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 2m11s
CI/CD / publish (pull_request) Has been skipped
Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: row appears with kind=virtual / status=active, infer
  through /api/v1/llms/<name>/infer comes back through the SSE
  relay with the provider's content + finish_reason, and a 503
  appears immediately after registrar.stop() (publisher offline).
- Timeout and cleanup paths are idempotent, so re-runs against the
  same cluster don't litter rows. The 90-s heartbeat-stale flip and 4-h
  GC are unit-tested — too slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs creating a regular
  Llm row, how to opt-in via publish: true, the lifecycle table,
  the inference-relay sequence, the v1 streaming caveat, the v2-v5
  roadmap, and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.

Stage roadmap (each its own future PR):
  v2 wake-on-demand · v3 virtual agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:28:43 +01:00
Michal
7e6b0cab44 feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)
Closes the loop on user-facing surface:

  $ mcpctl get llm
  NAME             KIND     STATUS    TYPE     MODEL                       TIER  KEY  ID
  qwen3-thinking   public   active    openai   qwen3-thinking              fast  ...  ...
  vllm-local       virtual  active    openai   Qwen/Qwen2.5-7B-Instruct    fast  -    ...

  $ mcpctl chat-llm vllm-local
  ────────────────────────────────────────
  LLM: vllm-local  openai → Qwen/Qwen2.5-7B-Instruct-AWQ
  Kind: virtual    Status: active
  ────────────────────────────────────────
  > hello?
  Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools,
  no project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles relay-through-mcplocal transparently,
  so the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row tokens-per-
  second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows
  feel natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special-case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in
the regenerated completions golden). Workspace: 2043/2043 across
152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:25:38 +01:00
Michal
97174f450f feat(mcplocal): virtual-LLM registrar (v1 Stage 4)
The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping `publish: true` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.

Config:
- LlmProviderFileEntry gains optional `publish: boolean` (default false,
  so existing setups don't change).

Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
  /api/v1/llms/_provider-register with the publishable set, persists
  the returned providerSessionId to ~/.mcpctl/provider-session for
  sticky reconnects, then opens the SSE control channel and starts a
  30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
  task frames trigger handleInferTask: convert OpenAI body to
  CompletionOptions, call provider.complete(), POST the result back as
  either { status, body } (non-streaming) or two chunk POSTs
  (streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s
  (sketched after this list). On successful reconnect the persisted
  sessionId revives the same Llm rows in mcpd (mcpd flips them back
  to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; it's invoked
  cleanly from main.ts's existing shutdown handler.
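
The backoff in miniature (sketch — jitter and error handling elided):

  let reconnectDelayMs = 5_000;                 // reset to 5 s on success
  function scheduleReconnect(connect: () => void) {
    setTimeout(connect, reconnectDelayMs);
    reconnectDelayMs = Math.min(reconnectDelayMs * 2, 60_000);  // 60 s cap
  }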

Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
  the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
  best-effort skip (logs a warning, returns null) — never a boot
  blocker.

v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.

Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.

Workspace suite: 2043/2043 across 152 files (was 2006 across 149
files; +5 new tests here on top of the Stage 2/3 additions, plus the
new files getting discovered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:20:54 +01:00
Michal
192a3831df feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)
End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.

Routes (new file: routes/virtual-llms.ts):
  POST /api/v1/llms/_provider-register    → returns { providerSessionId, llms[] }
  GET  /api/v1/llms/_provider-stream      → SSE channel keyed by
                                            x-mcpctl-provider-session header.
                                            Emits `event: hello` on open,
                                            `event: task` on inference fan-out,
                                            `: ping` every 20 s for proxies.
  POST /api/v1/llms/_provider-heartbeat   → bumps lastHeartbeatAt
  POST /api/v1/llms/_provider-task/:id/result
                                          → mcplocal pushes result back;
                                            body shape is one of:
                                              { error: 'msg' }
                                              { chunk: { data, done? } }
                                              { status, body }

LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
  handlers + the upcoming `mcpctl get llm` columns can branch on
  kind without re-fetching the row.

llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
  VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
  supported; on 503 (publisher offline) the existing audit hook still
  fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
  absence in test fixtures returns a 500 with a clear "server
  misconfiguration" message rather than silently falling through to
  the public path against an empty URL.

main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
  shutdown alongside the existing reconcile timer.

Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.

The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:15:18 +01:00
Michal
2215922618 feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)
The state machine for kind=virtual Llm rows. Wires the schema added
in Stage 1 into something that can register, heartbeat, time out,
and relay inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId) — the reconnect lookup.
- findStaleVirtuals(cutoff) — heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff) — 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the
  session; revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back (sketched after this list). Streaming variant
  exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from
  the result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90s
  cutoff), deletes inactives past 4h. Idempotent.
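
The pending-task plumbing in miniature (sketch — the streaming onChunk
variant and session-scoped rejection are elided):

  type Pending = { resolve: (body: unknown) => void; reject: (e: Error) => void };
  const pending = new Map<string, Pending>();

  function enqueue(taskId: string): Promise<unknown> {
    return new Promise((resolve, reject) =>
      pending.set(taskId, { resolve, reject }));  // parked until the result POST
  }
  function completeTask(taskId: string, body: unknown) {
    pending.get(taskId)?.resolve(body);
    pending.delete(taskId);
  }
  function failTask(taskId: string, message: string) {
    pending.get(taskId)?.reject(new Error(message));
    pending.delete(taskId);
  }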

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy
path + 503 paths (no session, inactive, public-Llm), complete/fail/
streaming chunk fan-out, GC sweep flip + delete + idempotence.

mcpd suite: 819/819 (was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:05:19 +01:00
Michal
1acd8b58bc feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
\`mcpctl create llm\`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when
the publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.
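
In schema terms, roughly (sketch — surrounding model fields elided):

  enum LlmKind {
    public
    virtual
  }
  enum LlmStatus {
    active
    inactive
    hibernating
  }

  model Llm {
    // ...existing columns...
    kind              LlmKind   @default(public)
    status            LlmStatus @default(active)
    providerSessionId String?
    lastHeartbeatAt   DateTime?
    inactiveSince     DateTime?

    @@index([kind, status])        // GC sweep
    @@index([providerSessionId])   // reconnect lookup
  }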

All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index.

mcpd suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:59:44 +01:00
e65a396d3e fix(cli): status probe accepts reasoning_content for thinking models (#62)
Some checks failed
CI/CD / typecheck (push) Successful in 56s
CI/CD / test (push) Successful in 1m10s
CI/CD / lint (push) Successful in 2m40s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 5m5s
CI/CD / publish (push) Has been skipped
2026-04-27 11:10:15 +00:00
Michal
a84214dad1 fix(cli): status probe accepts reasoning_content for thinking models
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / lint (pull_request) Successful in 3m6s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / build (pull_request) Successful in 2m39s
CI/CD / smoke (pull_request) Failing after 3m58s
CI/CD / publish (pull_request) Has been skipped
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final \`content\` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.
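
The acceptance rule in miniature (sketch — preview truncation and
transport elided):

  function probeVerdict(msg: { content?: string; reasoning_content?: string }) {
    if (msg.content) return { ok: true, say: msg.content };
    if (msg.reasoning_content)
      return { ok: true, say: `[thinking] ${msg.reasoning_content}` };
    return { ok: false, error: 'empty content' };
  }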

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:09:42 +01:00
54e56f7b71 feat(cli): live "say hi" probe for server LLMs in mcpctl status (#61)
Some checks failed
CI/CD / lint (push) Successful in 57s
CI/CD / typecheck (push) Successful in 57s
CI/CD / test (push) Has been cancelled
CI/CD / smoke (push) Has been cancelled
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
2026-04-27 11:02:26 +00:00
Michal
e4af16477c feat(cli): live "say hi" probe for server LLMs in mcpctl status
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m13s
CI/CD / typecheck (pull_request) Successful in 3m10s
CI/CD / smoke (pull_request) Failing after 1m46s
CI/CD / build (pull_request) Successful in 3m24s
CI/CD / publish (pull_request) Has been skipped
Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:

  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0

Each registered LLM gets a one-line health line:

  Server LLMs: 2 registered (probing live "say hi"...)
    fast   qwen3-thinking  ✓ "hi" 312ms
              openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
    heavy  sonnet  ✗ upstream auth failed: 401
              anthropic → claude-sonnet-4-5  provider default  no key

Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
`health: { ok, ms, say?, error? }` field per server LLM so dashboards
get the same liveness signal.
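
The fan-out shape (sketch — auth headers and response-body parsing
elided; AbortSignal.timeout needs Node 17.3+):

  async function probeAll(names: string[], mcpdUrl: string) {
    return Promise.all(names.map(async (name) => {
      const t0 = Date.now();
      try {
        const res = await fetch(`${mcpdUrl}/api/v1/llms/${name}/infer`, {
          method: 'POST',
          headers: { 'content-type': 'application/json' },
          body: JSON.stringify({
            messages: [{ role: 'user',
              content: "Say exactly the word 'hi' and nothing else." }],
            max_tokens: 8, temperature: 0,
          }),
          signal: AbortSignal.timeout(15_000),   // per-probe 15 s timeout
        });
        return { name, ok: res.ok, ms: Date.now() - t0 };
      } catch (err) {
        return { name, ok: false, ms: Date.now() - t0, error: String(err) };
      }
    }));
  }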

Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:02:00 +01:00
de96af7bf6 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500 (#60)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m9s
CI/CD / typecheck (push) Failing after 7m9s
CI/CD / smoke (push) Has been skipped
CI/CD / build (push) Has been skipped
CI/CD / publish (push) Has been skipped
2026-04-27 10:28:10 +00:00