Commit Graph

354 Commits

Michal
e8c3803fac feat(web): bold redesign — Tailwind v4 + shadcn-style primitives + Skills/Proposals/Revisions UI
Phase 6 of the Skills + Revisions + Proposals work. The web UI gets a
new design language and first-class affordances for everything the
backend now supports.

## Visual direction

- Tailwind v4 with custom @theme block (oklch tokens). Dark-mode-only
  (internal tool — light mode doubles QA surface).
- Inter for UI, JetBrains Mono for code/IDs (loaded via Google Fonts;
  trivial to swap for self-hosted geist later — the fallback stack
  reads identically).
- Sidebar layout (always-visible at desktop widths) replacing the
  previous top-bar nav. Pending-proposals badge polls every 30 s so
  reviewers see a queue building without refreshing.
- Lucide icons throughout.
- Spacing and radii on Tailwind defaults.

Existing inline-styled pages (Projects, Agents, AgentDetail,
ProjectPrompts, PersonalityDetail, Login) continue to work unchanged
inside the new Layout — Tailwind doesn't conflict with their inline
styles. A follow-up can migrate them incrementally.

## What's added

### Build infra (src/web/)
- package.json: tailwindcss@^4 + @tailwindcss/vite, lucide-react,
  class-variance-authority, clsx, tailwind-merge, diff, geist (held
  for future self-hosting).
- vite.config.ts: registers the @tailwindcss/vite plugin.
- src/index.css: Tailwind import + @theme tokens + @layer base.
- src/main.tsx: imports index.css.
- src/lib/utils.ts: shadcn-style cn() helper.
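
For reference, the cn() helper is the usual clsx + tailwind-merge
composition (sketch; the actual file may differ):

  import { clsx, type ClassValue } from "clsx";
  import { twMerge } from "tailwind-merge";

  // Merge conditional class names, letting tailwind-merge resolve
  // conflicting Tailwind utilities (e.g. "p-2" vs "p-4").
  export function cn(...inputs: ClassValue[]) {
    return twMerge(clsx(inputs));
  }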

### shadcn-style primitives (src/components/ui/)
Hand-written rather than generated via `npx shadcn` so the repo doesn't
depend on a CLI tool that needs an interactive runtime:

- button.tsx — variants: primary / secondary / ghost / danger / link;
  sizes: sm / md / lg / icon.
- card.tsx — Card + Header/Title/Description/Content/Footer subparts.
- badge.tsx — variants: default / info / success / warning / danger /
  outline.
- input.tsx — Input + Textarea + Label.
- tabs.tsx — no-dep accessible Tabs (no Radix needed for our use).
- separator.tsx — h/v separator with role=separator.
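
The variant tables follow the standard cva shape; a sketch of
button.tsx with illustrative (not the real) class strings:

  import { cva, type VariantProps } from "class-variance-authority";

  const buttonVariants = cva(
    // base classes shared by every variant
    "inline-flex items-center justify-center rounded-md text-sm font-medium transition-colors",
    {
      variants: {
        variant: {
          primary: "bg-primary text-primary-foreground hover:bg-primary/90",
          secondary: "bg-secondary text-secondary-foreground hover:bg-secondary/80",
          ghost: "hover:bg-accent hover:text-accent-foreground",
          danger: "bg-destructive text-destructive-foreground hover:bg-destructive/90",
          link: "text-primary underline-offset-4 hover:underline",
        },
        size: { sm: "h-8 px-3", md: "h-9 px-4", lg: "h-10 px-6", icon: "h-9 w-9" },
      },
      defaultVariants: { variant: "primary", size: "md" },
    }
  );

  export type ButtonVariantProps = VariantProps<typeof buttonVariants>;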

### Diff component (src/components/Diff.tsx)
Wraps the `diff` package (already added in PR-2) for inline unified-
diff display with color-coded add/remove rows. Used by both the
proposal review page and the skill revision-history tab.
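
A sketch of the wrapper's core around the package's diffLines API
(component props are illustrative):

  import { diffLines } from "diff";

  export function Diff({ before, after }: { before: string; after: string }) {
    return (
      <pre className="overflow-x-auto font-mono text-xs">
        {diffLines(before, after).map((part, i) => (
          <span
            key={i}
            className={
              part.added ? "bg-green-950 text-green-400"
              : part.removed ? "bg-red-950 text-red-400"
              : "text-muted-foreground"
            }
          >
            {part.value}
          </span>
        ))}
      </pre>
    );
  }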

### New pages (src/pages/)
- Dashboard.tsx — at-a-glance home. Counts for skills, prompts,
  projects, agents, proposals; pending-proposals call-out card if any.
- Skills.tsx — list view, separated into Global vs Project/Agent-
  scoped sections.
- SkillDetail.tsx — name + semver + description; tabs for SKILL.md /
  Files / Metadata / History. History tab shows revisions with
  click-to-diff against the live body.
- Proposals.tsx — queue with Pending/Approved/Rejected tabs. Pending
  count is highlighted in amber.
- ProposalDetail.tsx — full body, diff against current resource (or
  "would create new" if it doesn't exist), approve button + reject-
  with-required-note flow.

### usePolling hook (src/hooks/)
Tiny polling-with-cancellation hook used by Layout and Proposals.
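
One plausible shape for it (signature assumed):

  import { useEffect } from "react";

  export function usePolling(fn: () => Promise<void>, intervalMs: number) {
    useEffect(() => {
      let cancelled = false;
      const tick = () => { if (!cancelled) fn().catch(() => {}); };
      tick(); // fire immediately, then on the interval
      const id = setInterval(tick, intervalMs);
      return () => { cancelled = true; clearInterval(id); }; // cancel on unmount
    }, [fn, intervalMs]);
  }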

### Layout rewrite (src/components/Layout.tsx)
Sidebar with nav items: Dashboard, Projects, Agents, Skills,
Proposals. Lucide icons. Active-route highlighting via NavLink.
Pending-proposals warning badge on the Proposals item.

### Routes (src/App.tsx)
New routes: /dashboard, /skills, /skills/:name, /proposals,
/proposals/:id. Default redirects to /dashboard.

### API types (src/api.ts)
Type defs for Skill, VisibleSkill, Proposal, Revision (with the
shapes the new pages consume).

## Tests

Existing 7 web tests still pass (Login + api). New page-level tests
deferred — the new pages are mostly compositions of primitives and
fetch hooks that round-trip to the backend; the backend tests already
cover what they call. PR-7 polish can add render-and-click tests if
coverage drift surfaces.

## Verification

- `pnpm --filter @mcpctl/web build` clean, no warnings.
- `pnpm test:run` whole monorepo: 162 test files / 2157 tests green.
- Visual smoke deferred — needs a running mcpd to populate the
  fixtures. A manual local smoke test is the next step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:54:55 +01:00
Michal
58e8e956ce feat(cli+mcpd): mcpctl skills sync + config claude extension
Phase 5 of the Skills + Revisions + Proposals work. Skills are now
materialised onto disk under ~/.claude/skills/<name>/, with
hash-pinned diff against mcpd, atomic per-skill install, and
preservation of locally-modified files. `mcpctl config claude --project X`
now wires the full pickup chain: writes the .mcpctl-project marker, runs
the initial sync, and installs the SessionStart hook so subsequent Claude
invocations stay in sync transparently.

## Sync algorithm

1. Resolve project: `--project` flag overrides; else walk up from cwd
   looking for `.mcpctl-project`; else fall back to globals-only.
2. GET /api/v1/projects/:name/skills/visible (or
   /api/v1/skills?scope=global without a project). Server returns
   id + name + semver + scope + contentHash + metadata — no body, no
   files. The contentHash is sha256 of the canonicalised body, computed
   server-side; any reordering of keys produces the same hash, so it's
   a stable diff key.
3. Load ~/.mcpctl/skills-state.json (lives outside ~/.claude/skills/
   on purpose — Claude Code reads that tree and we don't want to
   pollute it with our bookkeeping).
4. Diff:
     - server skill not in state → INSTALL
     - server skill, state contentHash matches → SKIP (cheap path)
     - server skill, state contentHash differs → UPDATE (fetch full body)
     - state skill not in server → orphan, REMOVE (preserve if locally
       modified, unless --force)
5. Atomic per-skill install: write to <targetDir>.mcpctl-staging-<pid>/,
   rename existing tree to .mcpctl-trash-<pid>, swap staging in,
   rmtree the trash. A concurrent reader (Claude Code starting up)
   never sees a partial tree.
6. State file updated with new versions, per-file SHA-256, install
   path. saveState is atomic (temp + rename).
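
Step 4 in outline, as a pure decision function (state/server record
shapes assumed):

  type SyncAction = "install" | "skip" | "update" | "remove";

  function planOne(
    server?: { contentHash: string },
    local?: { contentHash: string },
  ): SyncAction {
    if (server && !local) return "install";
    if (server && local) {
      return server.contentHash === local.contentHash ? "skip" : "update";
    }
    return "remove"; // orphan; preserved if locally modified, unless --force
  }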

## Failure semantics

- `--quiet` mode (used by SessionStart hook): exit 0 on network /
  timeout / mcpd error. Fail-open is non-negotiable here — we never
  want a hung mcpd to block Claude Code starting up.
- Auth failure: exit 1, clear "run mcpctl login" message.
- Disk error during state save: exit 2.
- Per-skill errors are collected in the result and reported as a
  count; one bad skill doesn't stop the others.

Network fetches run with concurrency 5. The server-side
`/visible` endpoint is metadata-only so the cheap path (everything
unchanged) needs exactly one HTTP roundtrip total.

## Files added

### CLI utilities (src/cli/src/utils/)
- skills-state.ts — load/save state, per-file sha256, edit detection.
- project-marker.ts — walk-up to find `.mcpctl-project`, bounded by
  user home so we never search above $HOME.
- sessionhook.ts — install/remove a SessionStart hook entry tagged
  with `_mcpctl_managed: true`. Idempotent. Defensive against
  missing/empty/JSONC settings.json.
- skills-disk.ts — atomic install via staging-dir rename swap,
  symmetric atomic delete via trash-dir rename. Path-escape attempts
  in files{} are rejected.
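
The staging/trash swap presumably reduces to something like this
(helper signature illustrative):

  import fs from "node:fs";

  function atomicInstall(targetDir: string, writeTree: (dir: string) => void) {
    const staging = `${targetDir}.mcpctl-staging-${process.pid}`;
    const trash = `${targetDir}.mcpctl-trash-${process.pid}`;
    writeTree(staging);                                   // 1. full tree into staging
    if (fs.existsSync(targetDir)) fs.renameSync(targetDir, trash); // 2. old tree aside
    fs.renameSync(staging, targetDir);                    // 3. swap in (atomic on one fs)
    fs.rmSync(trash, { recursive: true, force: true });   // 4. rmtree the trash
  }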

### CLI command (src/cli/src/commands/)
- skills.ts — `mcpctl skills sync` Commander wrapper + the
  `runSkillsSync(opts, deps)` library function (also called from
  `mcpctl config claude --project`). Supports `--dry-run`, `--force`,
  `--quiet`, `--keep-orphans`. `--skip-postinstall` is reserved
  (postInstall execution lands in a follow-up PR, not this one).

### Wiring
- index.ts: registers `mcpctl skills` after `mcpctl review`.
- config.ts: `mcpctl config claude --project X` now writes the
  `.mcpctl-project` marker, runs `runSkillsSync` in-process, and calls
  `installManagedSessionHook('mcpctl skills sync --quiet')`. New flag
  `--skip-skills` opts out (used by tests; useful for CI).

## Server-side change

- src/mcpd/src/services/skill.service.ts: getVisibleSkills now
  computes contentHash on the fly from the canonical body shape the
  client will reconstruct. Cheap (sha256 of ~few KB per skill); no
  schema migration needed since the hash is derived, not stored.

## Tests

Four new utility test files (31 tests) under src/cli/tests/utils/:
- sessionhook.test.ts — creation, idempotency, command updates,
  preservation of user hooks, removal, empty/JSONC tolerance.
- skills-disk.test.ts — atomic write, replacement without leftovers,
  path-escape rejection, atomic delete, listing ignores
  staging/trash artifacts.
- skills-state.test.ts — sha256 determinism, state round-trip,
  schema-version drift handling, edit detection.
- project-marker.test.ts — cwd hit, walk-up, $HOME boundary, empty
  marker, write+read round-trip.

The existing `mcpctl config claude` test (claude.test.ts) was updated
to pass `--skip-skills` so it stays focused on .mcp.json generation;
the new sync flow is covered by the utility tests.

Full suite: 162 test files / 2157 tests green (up from 158 / 2127).

## Deferred to a follow-up

- `metadata.hooks` materialisation into `~/.claude/settings.json` —
  the data path exists, sync receives it; PR-7 or a focused follow-up
  will write the `_mcpctl_managed: true` entries for declarative
  hooks.
- `metadata.mcpServers` auto-attach via mcpd API — likewise.
- `metadata.postInstall` script execution — the most substantive
  deferred piece. Current sync logs a TODO and skips. The corporate
  trust model (publisher-side rigor, not client-side defence) means
  this is straightforward to add once we wire the curated env +
  timeout + audit emission. Orthogonal to file sync, easier to ship
  separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:26:35 +01:00
Michal
db57bb5856 feat(mcpd+mcplocal+cli): propose-learnings system skill, propose_skill MCP tool, mcpctl review
Phase 4 of the Skills + Revisions + Proposals work. Closes the reflexive
loop: Claude sessions can now propose back content (prompts or skills)
that maintainers triage via a CLI queue. The system documents itself
to Claude through the same mechanism it uses to document itself to humans.

## What's added

### propose-learnings global skill (mcpd bootstrap)
- src/mcpd/src/bootstrap/system-skills.ts — idempotent upsert, mirrors
  system-project.ts. Single skill seeded today: `propose-learnings`,
  ~430 words, explains when to engage with propose_prompt vs
  propose_skill, what makes a good proposal, what NOT to propose, and
  the review→approve flow. Priority 9, global scope.
- main.ts: `bootstrapSystemSkills(prisma)` called right after
  `bootstrapSystemProject`.

### gate-encouragement-propose system prompt
- system-project.ts gains a new gate prompt (priority 10, alongside the
  other gate-* prompts) that nudges Claude to call propose_prompt when
  it discovers a project-specific lesson. Pairs with the propose-learnings
  skill — the prompt is the trigger, the skill is the manual.

### propose_skill MCP tool (mcplocal)
- proxymodel/plugins/gate.ts: new virtual tool registered alongside
  propose_prompt. Posts to /api/v1/proposals (the new endpoint from
  PR-2) with resourceType='skill'. Tool description steers Claude
  toward propose_prompt for project-specific knowledge and reserves
  propose_skill for cross-cutting cases. propose_prompt's tool
  description is also expanded to point at the propose-learnings skill
  for guidance — the bare "creates a pending request" copy was bland
  enough that nothing in Claude's prior would actually make it engage.

### mcpctl review CLI
- New top-level command in src/cli/src/commands/review.ts.
  Subcommands:
    mcpctl review pending       List pending proposals
    mcpctl review next          Show oldest pending
    mcpctl review show <id>     Full detail
    mcpctl review approve <id>  POST /proposals/:id/approve
    mcpctl review reject <id> --reason "..."
    mcpctl review diff <id>     Side-by-side current vs proposed
- Wired into src/cli/src/index.ts. Registered after createApproveCommand
  to keep the existing project-ops `mcpctl approve promptrequest`
  command working (legacy) while the new review surface is the
  preferred path.
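
Roughly the Commander wiring, sketched (see review.ts for the real
thing; handler bodies elided to comments):

  import { Command } from "commander";

  export function createReviewCommand(): Command {
    const review = new Command("review").description("Triage resource proposals");
    review.command("pending").action(async () => {
      // GET /api/v1/proposals?status=pending, render as a table
    });
    review
      .command("reject <id>")
      .requiredOption("--reason <text>", "rejection note (required)")
      .action(async (id: string, opts: { reason: string }) => {
        // POST /api/v1/proposals/:id/reject with { reason: opts.reason }
      });
    return review;
  }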

## Tests touched

- bootstrap-system-project.test.ts already counts via
  getSystemPromptNames() length, so it picked up the new prompt
  automatically; no priority-assertion change was needed either — the
  new prompt starts with `gate-`, so the existing `gate-* → priority 10`
  invariant validates it.
- system-prompt-validation.test.ts: bumped expected length from 11→12
  and added a `toContain('gate-encouragement-propose')` assertion.

Full suite: 158 test files / 2127 tests green.

## What's NOT in this PR

- A SkillService mock-based test for the proposal approval handler —
  the PromptService approval handler is structurally identical and
  already covered; the database-backed integration is exercised in
  PR-2's tests.
- Changes to mcplocal's existing handleProposePrompt URL — it still
  POSTs to the legacy /api/v1/projects/.../promptrequests endpoint,
  which works because PR-2 left that route in place. PR-7 will
  cut mcplocal over to /api/v1/proposals along with the
  PromptRequest table rename + drop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 13:13:33 +01:00
Michal
20a541a5d6 feat(mcpd): Skill resource end-to-end (CRUD + backup + revision integration)
Phase 3 of the Skills + Revisions + Proposals work. Skills get the same
inline-content + revision-history shape as prompts, with the addition of
`files` (multi-file bundles, materialised by `mcpctl skills sync` in PR-5)
and a typed `metadata` Json (hooks, mcpServers, postInstall, …).

## What's added

### Validation (src/mcpd/src/validation/skill.schema.ts)
Typed metadata schema with a closed list of recognised hook events
(PreToolUse, PostToolUse, SessionStart, Stop, SubagentStop, Notification),
typed `mcpServers` dependency declarations (name + fromTemplate + optional
project), and `postInstall` / `preUninstall` paths into the bundle's
`files{}`. `.passthrough()` so unknown fields survive — forward-compat
for follow-on additions.
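
In outline (field shapes beyond those named above are assumptions):

  import { z } from "zod";

  const HookEvent = z.enum([
    "PreToolUse", "PostToolUse", "SessionStart",
    "Stop", "SubagentStop", "Notification",
  ]);

  export const SkillMetadataSchema = z
    .object({
      hooks: z.array(z.object({ event: HookEvent, command: z.string() })).optional(),
      mcpServers: z.array(z.object({
        name: z.string(),
        fromTemplate: z.string(),
        project: z.string().optional(),
      })).optional(),
      postInstall: z.string().optional(),  // path into files{}
      preUninstall: z.string().optional(), // path into files{}
    })
    .passthrough(); // unknown fields survive round-trips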

### Repository (src/mcpd/src/repositories/skill.repository.ts)
Mirrors PromptRepository exactly. Same `?? ''` workaround for nullable-FK
compound-key lookups.

### Service (src/mcpd/src/services/skill.service.ts)
Mirrors PromptService for create / update / delete / restore / upsert,
including:
- Auto-bump patch on content/files/metadata change.
- Revision recording (best-effort — failures don't block the save).
- 'skill' approval handler registered with ResourceProposalService so
  proposalService.approve dispatches to skills the same way it
  dispatches to prompts.
- `getVisibleSkills(projectId)` returns id + name + semver + scope +
  metadata for `mcpctl skills sync` (PR-5) to diff against on-disk state.

### Routes (src/mcpd/src/routes/skills.ts)
- GET /api/v1/skills (filters: ?project= ?projectId= ?agent= ?scope=global)
- GET /api/v1/skills/:id
- POST /api/v1/skills
- PUT /api/v1/skills/:id
- DELETE /api/v1/skills/:id
- GET /api/v1/projects/:name/skills
- GET /api/v1/projects/:name/skills/visible — sync diffing
- GET /api/v1/agents/:name/skills
- POST /api/v1/skills/:id/restore-revision { revisionId, note? }

### main.ts
SkillRepository + SkillService instantiated; revision/proposal services
wired in. `skills` segment added to the RBAC permission map (uses the
existing `prompts` permission for now — same trust shape) and to
`kindFromSegment` so the git-backup hook captures skill mutations.

### Backup integration
- yaml-serializer.ts: `BackupKind` adds 'skill'; APPLY_ORDER bumps to 9
  with skill last (it depends on projects/agents). `parseResourcePath`
  recognises the `skills/` directory.
- git-backup.service.ts: `serializeResource` adds the `case 'skill'`
  branch alongside prompts. The git-sync loop now round-trips skills
  on every change.
- (Bundle backup-service.ts is NOT updated in this PR — deferred to PR-7
  alongside the cutover. The git-based backup IS wired, which is the
  primary persistence path.)

### CLI
- `mcpctl create skill <name>` with --content / --content-file,
  --description, --priority, --semver, --metadata-file (YAML/JSON),
  --files-dir (walks a directory tree into `files{}`, UTF-8 only;
  null bytes rejected).
- shared.ts adds `skill` / `skills` / `sk` aliases.

### apply.ts
Not updated — `mcpctl apply -f skill.yaml` is deferred to PR-7. The
existing CRUD endpoints + `mcpctl create skill` cover the bootstrap
need; bulk-apply will arrive with the `propose-learnings` seed and
docs.

## Tests

158 test files / 2127 tests green across the workspace. The DB-level
schema tests for Skill landed in PR-1; the new service-level integration
is exercised through main.ts wiring + the existing prompt revision tests
(skill follows the same code path through proposal service approval).

A `describe('Skill service mocks')` test file was deliberately not added —
the PromptService mock-based tests already cover the revision/approval
handler shape, and the skill handler is structurally identical (same
upsert + record-revision + link-currentRevisionId pattern). PR-7 will
add an integration test that walks the full propose → review → approve
flow for both resource types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:48:40 +01:00
Michal
1ec286bb14 feat(mcpd): ResourceRevision + ResourceProposal services + Prompt revision integration
Phase 2 of the Skills + Revisions + Proposals work. Stands up the generic
revision/proposal layer and wires Prompt into it. Skills will plug into the
same infrastructure in PR-3 with no further service changes required.

This PR is intentionally additive: PromptRequest table and routes are
unchanged. The /api/v1/proposals API runs side-by-side with the legacy
/api/v1/promptrequests API. The PromptRequest cutover (rename + backfill +
mcplocal rewire) is deferred to a later PR so this one stays reviewable.

## What's added

### Repositories (src/mcpd/src/repositories/)
- resource-revision.repository.ts — append-only revision log keyed by
  (resourceType, resourceId). Soft FK; no relations declared. Supports
  history listing, semver lookup, and contentHash cross-resource search.
- resource-proposal.repository.ts — generic propose queue. Status lifecycle
  pending → approved | rejected. Mirrors Prompt's `?? ''` workaround for
  nullable-FK compound lookups.

### Services (src/mcpd/src/services/)
- resource-revision.service.ts — record() inserts a revision with a stable
  sha256 contentHash computed from canonicalised JSON (key-sorted at every
  level so reordered objects produce the same hash). Caller passes a
  pre-computed semver; service does NOT decide bump policy.
- resource-proposal.service.ts — propose / approve / reject / list, with a
  per-resourceType handler registry. PromptService registers the 'prompt'
  handler at construction; the SkillService will register 'skill' in PR-3.
  approve() runs in a Prisma $transaction so the resource update + revision
  insert + proposal status flip are atomic.
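
The canonicalise-then-hash step in miniature (a sketch; the production
helper in resource-revision.service.ts may differ):

  import { createHash } from "node:crypto";

  // Key-sort objects at every level so semantically-equal bodies
  // serialize, and therefore hash, identically.
  function canonicalise(v: unknown): unknown {
    if (Array.isArray(v)) return v.map(canonicalise);
    if (v !== null && typeof v === "object") {
      return Object.fromEntries(
        Object.entries(v as Record<string, unknown>)
          .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
          .map(([k, val]) => [k, canonicalise(val)]),
      );
    }
    return v;
  }

  export function contentHash(body: unknown): string {
    return createHash("sha256").update(JSON.stringify(canonicalise(body))).digest("hex");
  }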

### Pure utility (src/mcpd/src/utils/semver.ts)
- bumpSemver(current, kind) for major / minor / patch
- compareSemver(a, b) — numeric, not lex (10 > 9)
- isValidSemver(s)
- Invalid input falls back to '0.1.0' rather than throwing — keeps the
  audit-write path from blowing up the prompt update if a row's semver
  ever drifts out of MAJOR.MINOR.PATCH shape.
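
A sketch matching the documented behaviour:

  export function bumpSemver(current: string, kind: "major" | "minor" | "patch"): string {
    const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
    if (!m) return "0.1.0"; // invalid input falls back rather than throwing
    let [major, minor, patch] = m.slice(1).map(Number);
    if (kind === "major") { major += 1; minor = 0; patch = 0; }
    else if (kind === "minor") { minor += 1; patch = 0; }
    else patch += 1;
    return `${major}.${minor}.${patch}`;
  }

  export function compareSemver(a: string, b: string): number {
    const pa = a.split(".").map(Number);
    const pb = b.split(".").map(Number);
    for (let i = 0; i < 3; i++) {
      if (pa[i] !== pb[i]) return pa[i] - pb[i]; // numeric, not lex: 10 > 9
    }
    return 0;
  }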

### Routes (src/mcpd/src/routes/)
- revisions.ts — GET /api/v1/revisions?resourceType=&resourceId=,
  GET /api/v1/revisions/:id, GET /api/v1/revisions/:id/diff?against=<id|live>
  (unified-format diff via the `diff` package), and POST
  /api/v1/prompts/:id/restore-revision { revisionId, note? }.
- proposals.ts — GET / POST /api/v1/proposals,
  GET /api/v1/proposals/:id, PUT for body updates, POST .../approve and
  POST .../reject, plus DELETE.

## What's changed

- PromptService.create / update now record a ResourceRevision when the
  revision service is wired. Update auto-bumps patch on content change;
  authors can override via `--bump major|minor|patch` or `--semver X.Y.Z`
  on the CLI (forwarded into the PUT body). Best-effort: revision write
  failures are swallowed so the prompt save still succeeds (revision is
  audit, not source of truth).
- PromptService.setProposalService registers a 'prompt' approval handler
  with the proposal service. Approval runs in a Prisma transaction:
  upsert prompt → record revision → update currentRevisionId → flip
  proposal status. semver bumps to 0.1.0 on first approval, patch
  thereafter.
- New CLI flags on `mcpctl edit prompt`: --bump, --semver, --note. They're
  prompt-only (validated client-side); other resources reject them.
- Aliases in shared.ts: `proposal`/`prop` → proposals,
  `revision`/`rev` → revisions.
- diff dependency added to mcpd.

## Tests

- src/mcpd/tests/utils/semver.test.ts — covers bump/compare/validate
  including numeric (not lex) semver compare and invalid-input fallback.
- prompt-service.test.ts updated: makePrompt fixture now sets semver +
  agentId + currentRevisionId; updatePrompt assertion expects the
  auto-bumped patch in the same update call.
- prompt-routes.test.ts updated symmetrically.

## RBAC

`proposals` and `revisions` URL segments map to the existing `prompts`
permission for now. PR-7 may split if a "reviewer" role becomes useful.

## Verification

Full suite: 158 test files / 2127 tests green.
`pnpm build` clean across all 6 workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:38:35 +01:00
Michal
fbe68fa693 feat(db): schema for ResourceRevision, ResourceProposal, Skill
Phase 1 of the Skills + Revisions + Proposals work. Purely additive — no
existing rows are touched, no tables renamed, no columns dropped.

New tables:
- ResourceRevision — append-only audit + diff log keyed by
  (resourceType, resourceId). Both Prompt and Skill produce revisions on
  every change. Soft FK so revisions outlive the resources they describe.
  Indexed for history viewer (latest-first), semver lookup, and
  cross-resource sync diff via contentHash.
- ResourceProposal — generic propose/approve/reject queue. Drop-in
  replacement for the prompt-only PromptRequest. Created empty here;
  PR-2 will rename PromptRequest → _PromptRequest_legacy and backfill.
- Skill — new resource type that mirrors Prompt for everything CRUD-
  shaped. Adds `files` Json (multi-file bundles, materialised onto disk
  by `mcpctl skills sync` in PR-5) and `metadata` Json (typed app-layer
  in PR-3: hooks, mcpServers, postInstall, …).
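
For orientation, a rough TypeScript view of the revision row (field
names beyond those mentioned above are assumptions; schema.prisma is
authoritative):

  interface ResourceRevisionRow {
    id: string;
    resourceType: "prompt" | "skill"; // soft FK, no relation declared
    resourceId: string;               // outlives the resource it describes
    semver: string;
    contentHash: string;              // sha256 of the canonicalised body
    body: unknown;                    // snapshot for the history/diff viewer
    createdAt: Date;
  }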

New columns on Prompt:
- semver (semver string, default '0.1.0') — auto-bumped patch on save
  by PromptService.update once PR-2 wires it. Distinct from `version`,
  which stays as the optimistic-concurrency counter.
- currentRevisionId — soft pointer to the latest ResourceRevision row.

DB tests cover scope rules (project XOR agent XOR neither), name
uniqueness across both compound keys, cascade-on-delete, soft-FK
survival of deletion, and JSON column persistence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 00:18:21 +01:00
f8aa6c2f0d feat: mcpctl provider {enable,disable} — persistent on/off switch (#74)
2026-05-03 14:57:21 +00:00
Michal
d04adb5623 feat(cli+mcplocal): persistent provider disable/enable
Adds two new subcommands on top of v7's provider lifecycle CLI:

  mcpctl provider disable vllm-local   # release GPU + survive restart
  mcpctl provider enable  vllm-local   # clear the flag, ready to chat

Use case: vLLM keeps crashing on engine init. `down` works for "now"
but the next chat triggers a restart; `disable` writes
`disabled: true` into the provider's entry in ~/.mcpctl/config.json
and short-circuits complete()/ensureRunning() until you re-enable.

Implementation:
- LlmProviderEntry / LlmProviderFileEntry: new optional `disabled` field
- ManagedVllmProvider: setDisabled(bool), isDisabled(), gate in
  complete()/ensureRunning(), expose `disabled` in getStatus()
- mcplocal HTTP: POST /llm/providers/:name/{disable,enable} write the
  config file and apply the change live; /start returns 409 when the
  target is disabled instead of silently failing
- Boot: createSingleProvider honors `entry.disabled` so a known-bad
  vLLM doesn't auto-start on the first chat after mcplocal restart
- CLI: `disable` / `enable` subcommands on `mcpctl provider`; status
  output now shows `(disabled)` next to the state

`enable` is live — provider stays in the registry while disabled, so
flipping the flag back is enough; no mcplocal restart needed.

Tests: cli 437/437, mcplocal 731/731.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 15:57:01 +01:00
fe27947f80 feat: mcpctl provider <name> {up,down,status} for managed LLMs (#73)
2026-05-03 14:40:57 +00:00
Michal
356cbe87b5 feat(cli+mcplocal): mcpctl provider <name> {up,down,status} for managed LLMs
Adds lifecycle control for managed local LLM providers (vllm-managed)
without the nuclear option of restarting mcplocal. Practical use:

  mcpctl provider vllm-local down    # release GPU memory now
  mcpctl provider vllm-local up      # warm up before the next chat
  mcpctl provider vllm-local status  # see state, pid, uptime

mcplocal exposes three new endpoints:

  GET  /llm/providers/:name/status   → returns lifecycle state for
                                       managed providers, { managed: false }
                                       for unmanaged (anthropic, openai, …)
  POST /llm/providers/:name/start    → calls warmup() (202 + initial state)
  POST /llm/providers/:name/stop     → calls dispose() (200 + post-stop state)

Stop and start return 400 for non-managed providers — stopping an API-key
provider is meaningless. The CLI surfaces the error verbatim.

Restarting mcplocal would also free the GPU, but it drops the SSE
connection to mcpd and forces every virtual Llm to re-publish; the new
commands are the targeted, non-disruptive escape hatch.

The completions test gained a `topLevelMarkers` filter so a sub-command
named `status` (under `provider`) doesn't trip the existing "non-project
commands must guard with __mcpctl_has_project" rule.

Tests: cli 437/437, mcplocal 731/731.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:58:46 +01:00
3071bcee8e feat: v6 polish — per-publisher namespacing + auto-create project (#71)
2026-04-28 23:33:39 +00:00
46697f4f63 feat: v5 durable inference task queue (#70)
2026-04-28 23:33:36 +00:00
Michal
ee18c5107e feat(mcpd): auto-create project on virtual-agent register (v6 Stage 2)
Closes the v3-deferred "project must already exist" gap. When a
virtual agent declares `project: "my-team"` and no such project
exists, mcpd creates it idempotently with the publishing user as
owner (instead of throwing 404 from registerVirtualAgents).

ProjectService gains `ensureByName(name, ownerId, opts)` — find
the project or create it with sensible defaults (description carries
an audit note pointing at the registrar; proxyModel/gated take
their schema defaults). First publisher to land on a name owns the
row; subsequent publishers reuse the existing one.
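
A plausible shape for the find-or-create, using a Prisma upsert for the
idempotency (model/field names assumed):

  import { PrismaClient } from "@prisma/client";

  // Hypothetical standalone sketch; the real ensureByName lives on
  // ProjectService and also threads opts/defaults through.
  async function ensureByName(prisma: PrismaClient, name: string, ownerId: string) {
    return prisma.project.upsert({
      where: { name },
      update: {}, // first publisher to land on the name owns the row
      create: {
        name,
        ownerId,
        description: "auto-created on virtual-agent register",
      },
    });
  }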

AgentService.registerVirtualAgents calls ensureByName instead of
resolveAndGet, so the same agent register payload works regardless
of whether the project pre-existed or not.

Tests: 2 new tests (auto-creates a missing project on first publish;
reuses an existing project without re-creating). Mock projects
factory rebuilt to track _created names + maintain id→name reverse
lookup so the agent's toView returns the correct project name
(prior mock hardcoded 'mcpctl-dev').

Existing 13 virtual-agent tests + 870 mcpd suite green.
2026-04-28 15:54:27 +01:00
Michal
c346b93789 feat(mcplocal): per-publisher namespacing for virtual Llms/Agents (v6 Stage 1)
Two mcplocals sharing the same config template (`vllm-local-qwen3`)
no longer collide on mcpd's cluster-wide unique-name constraint.
Each publisher can append a suffix derived from hostname (or any
other stable per-host identifier) so the wire-side names become
distinct (`vllm-local-qwen3-alice`, `vllm-local-qwen3-bob`).
Pair with an explicit `poolName` (v4) and the rows still appear as
one logical pool — agents pinned to any member load-balance across
both.

Config (`~/.mcpctl/config.json`):

  {
    "publisher": { "suffix": "auto" }   // → os.hostname() sanitized
                  // or { "suffix": "alice" } for explicit override
  }

Or via env: `MCPCTL_PUBLISHER_SUFFIX=alice` (operations override).

Resolution order: env var → config.publisher.suffix → empty
(legacy behavior, no mangling). Sanitization lowercases, replaces
non-`[a-z0-9-]` runs with `-`, strips leading/trailing dashes —
the result must satisfy mcpd's name validation, otherwise the
register POST would 422.
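
The sanitization as one small function (name illustrative):

  import os from "node:os";

  function sanitizeSuffix(raw: string): string {
    return raw
      .toLowerCase()
      .replace(/[^a-z0-9-]+/g, "-") // collapse disallowed runs to a single dash
      .replace(/^-+|-+$/g, "");     // strip leading/trailing dashes
  }

  sanitizeSuffix(os.hostname()); // "Alice's-MBP.local" → "alice-s-mbp-local"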

Wire shape: RegistrarPublishedProvider gets an optional
`publishName` field. When set, the wire payload's `name` is
`publishName` (suffixed); when not, today's `provider.name`.
Inbound infer/wake task lookups match `publishName ?? provider.name`
so the local registry stays addressable by its original name —
SSE frames carrying the suffixed wire name still find their
provider.

Agents are forwarded with their own suffixed name AND a
`llmName` rewritten through the same per-local→wire map so the
agent rows pin to the suffixed Llm wire name (otherwise
registerVirtualAgents would 404).

Tests: 8 new tests covering applyPublisherSuffix (empty, normal,
length limit, exact-100) and loadPublisherSuffix (env override,
absent, sanitization, dash stripping). Existing registrar tests
untouched — no suffix means no behavior change.
2026-04-28 15:54:06 +01:00
Michal
7320b50dac feat(cli+docs+smoke): inference-task CLI + GC ticker + smoke + docs (v5 Stage 4)
CLI surface for the durable queue:

- `mcpctl get tasks` — table view (ID, STATUS, POOL, LLM, MODEL,
  STREAM, AGE, WORKER). Aliases `task`, `tasks`, `inference-task`,
  `inference-tasks` all normalize to the canonical plural so URL
  construction works uniformly. RESOURCE_ALIASES + completions
  generator updated.
- `mcpctl chat-llm <name> --async -m <msg>` — enqueue and exit. stdout
  is just the task id (pipeable into `xargs mcpctl get task`); stderr
  carries human-readable status. REPL mode is rejected for --async
  (fire-and-forget doesn't make sense without -m).

GC ticker in mcpd: 5-min interval. Pending tasks past 1 h queue
timeout flip to error with a clear message; terminal tasks past 7 d
retention get deleted. Both queries are index-backed.

Crash fix uncovered by the smoke: when the async route doesn't await
ref.done, a later cancel/error rejected the in-flight Promise as
unhandled and crashed mcpd. The route now attaches a no-op `.catch`
so the legacy `done` semantic still works for sync callers (chat,
direct infer) without taking out the process for async ones. The
EnqueueInferOptions also gained an explicit `ownerId` field so the
async API can stamp the authenticated user on the row instead of
inheriting 'system' from the constructor's resolveOwner — without
this, every GET/DELETE from the original caller would 404 due to
foreign-owner mismatch.
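
The hazard in miniature (self-contained sketch; the real fix attaches
the same no-op `.catch` to `ref.done`):

  // A promise that will reject later, with nobody awaiting it:
  const done = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("task cancelled")), 1_000),
  );

  // Without this line the rejection fires as an unhandledRejection and
  // (on modern Node) takes the process down. Other consumers that DO
  // await `done` still observe the rejection as before.
  done.catch(() => {});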

Smoke (tests/smoke/inference-task.smoke.test.ts):

  1. POST /inference-tasks while no worker bound → row=pending.
  2. Bring a registrar online → bindSession drain claims and
     dispatches → worker complete()s → row=completed → GET returns
     the assistant body.
  3. Stop worker, enqueue, DELETE → row=cancelled, persisted.

docs/inference-tasks.md (new): full data model, lifecycle diagram,
async API reference, CLI examples, RBAC table, GC defaults, and the
v5 limitations / v6 roadmap. Cross-linked from virtual-llms.md and
agents.md.

Tests + smoke: mcpd 893/893, mcplocal 723/723, cli 437/437, full
smoke 146/146 (was 144, +2 new task smoke). Live mcpd verified via
manual curl: enqueue → cancel → re-fetch — no crash, owner scoping
returns 404 on foreign ids, GC ticker logs at info when it sweeps.

v5 complete: durable queue (Stage 1) + VirtualLlmService rewire
(Stage 2) + async API & RBAC (Stage 3) + CLI/GC/smoke/docs (Stage 4).
2026-04-28 15:25:09 +01:00
Michal
1dcfdc8b05 feat(mcpd): async inference task API + tasks RBAC resource (v5 Stage 3)
Exposes the durable queue (Stage 1+2) as a first-class API so callers
can enqueue work, get a task id immediately, and poll/stream/cancel
without holding open the original HTTP connection.

New endpoints (`/api/v1/inference-tasks`):

  POST   /                  → enqueue, return task id (201 + row).
                              failFast:false — task stays pending if no
                              worker is up; future bindSession drains.
                              Rejects 400 for public Llms (the existing
                              /llms/<name>/infer is the right tool there)
                              and 404 for missing Llms.
  GET    /                  → list owner's tasks. Optional ?status,
                              ?poolName, ?agentId, ?limit query.
                              Owner-scoped at the route layer; cross-
                              user listing requires resource-wide grant.
  GET    /:id               → poll one task. 404 (not 403) on a
                              foreign-owner id to prevent enumeration.
  DELETE /:id               → cancel a non-terminal task. Already-
                              terminal rows return 200 + current shape
                              (no-op). 404 on foreign owner.
  GET    /:id/stream        → SSE feed of `chunk` and `terminal` events.
                              Re-fetches the row at subscribe time so
                              already-completed tasks emit one terminal
                              event and close immediately.

RBAC:

- New `tasks` resource added to RBAC_RESOURCES + the URL→permission
  map in main.ts. Default action mapping: GET=view, POST=create,
  DELETE=delete. The route layer enforces owner-scoping ON TOP of
  the hook (404 on foreign owner) — without this, anyone with
  `view:tasks` could list/peek every user's queued work.
- Singular alias `task` and the multi-word `inference-task` /
  `inference-tasks` all normalize to `tasks` so users can write
  `mcpctl create rbac-binding --resource task --role view ...` or
  any of the variants and have it map correctly.

Tests: 9 new route tests covering the wire shapes, owner scoping
(matching/foreign), public-Llm rejection, missing-Llm 404, list
filter, and cancel semantics (pending→cancelled, terminal→no-op).
mcpd 893/893 (was 884, +9). Live smoke: POST against a public Llm
returns the documented 400, POST against missing returns 404, GET
list returns [] cleanly.

Stage 4 (next): CLI surface (`mcpctl get tasks`, `--async` flag on
chat-llm), GC ticker, smoke test (enqueue → connect worker →
drain), docs.
2026-04-28 15:06:31 +01:00
Michal
7b18bb6d6b feat(mcpd): VirtualLlmService rewires through durable queue (v5 Stage 2)
The in-memory `tasksById` map for inference tasks is gone. Every
inference call lands as a row in `InferenceTask`; the result POST
updates the row + emits a wakeup; the in-flight HTTP handler unblocks
on the wake. mcpd surviving a restart no longer drops in-flight tasks,
and a worker disconnecting mid-task no longer fails the caller — the
row reverts to pending and a sibling worker on the same pool drains it.

Wake tasks (publisher control messages, not inference) keep their own
small in-memory map (`wakeTasks`). They're millisecond-scoped and
don't benefit from durability — a missed wake on restart just means
the next infer fires a fresh wake.

Behavioral changes worth flagging:

- Worker disconnect mid-task: WAS reject ref.done with "publisher
  disconnected"; NOW revert claimed/running rows to pending. Original
  caller's ref.done keeps waiting up to INFER_AWAIT_TIMEOUT_MS (10
  min); whichever worker delivers the result fulfills it.

- bindSession drains pending tasks for the session's pool keys. So
  tasks queued while no worker was up automatically get dispatched
  when one shows up. The drain matches by *effective pool key*
  (poolName ?? name) — tasks queued against vllm-alice get drained
  by any session whose owned Llms share alice's pool.

- New `failFast: true` option on enqueueInferTask (default: false).
  Existing callers that NEED fast-fail get it explicitly:
    - Direct `/api/v1/llms/<name>/infer` route: caller pinned a
      specific Llm and wants 503 immediately if the publisher is
      offline; queueing for an unknown future worker would surprise.
    - chat.service pool failover loop: it iterates pool candidates
      and needs each candidate's transport failure to surface fast.
      Without failFast, a downed pool member would absorb the call
      into the queue and the loop would wait 10 min before trying
      the next.
  The async API route (Stage 3) leaves failFast=false — that's the
  whole point of the durable queue path.

- VirtualLlmService now requires an InferenceTaskService dep at
  construction. Older test wirings that didn't pass it get a clear
  "InferenceTaskService not wired" error from enqueueInferTask
  rather than a confusing in-memory stub.

Tests:

- 12 existing virtual-llm-service tests updated for the new
  semantics: "rejects when no session" → "queues durably"; "rejects
  when row inactive" → "still queues (pool may have a sibling)";
  "unbindSession rejects in-flight tasks" → "reverts to pending".
  Wake-task probing now uses `wakeTasks` instead of `tasksById`.

- 3 new v5-specific tests: drain-on-bind matches by effective pool
  key (not just name); enqueue without a session keeps the row
  pending; completeTask via the result-route updates the DB and
  emits the wakeup that resolves ref.done.

- chat-service-virtual-llm + llm-infer-route assertions updated to
  expect the new {failFast: true} option arg.

mcpd 884/884 (was 881; +3 v5 cases). mcplocal 723/723. Full smoke
suite 144/144 against the deployed queue-backed mcpd.

Stage 3 (next): expose the durable queue via async API endpoints.
POST /api/v1/inference-tasks (enqueue with failFast=false), GET
/api/v1/inference-tasks/:id (poll), GET /api/v1/inference-tasks/:id/stream
(SSE), DELETE /api/v1/inference-tasks/:id (cancel). New `tasks` RBAC
resource.
2026-04-28 02:33:26 +01:00
Michal
ed21ad1b5a feat(mcpd+db): durable InferenceTask queue + state machine (v5 Stage 1)
The persistence + signaling layer for v5. No integration with the
existing in-flight inference path yet — that's Stage 2. This commit
just lands the durable queue underneath, with a state machine that
mcpd's HTTP handlers, the worker result-POST route, and the GC sweep
will all build on.

Schema (src/db/prisma/schema.prisma + migration):

- New `InferenceTask` model + `InferenceTaskStatus` enum
  (pending|claimed|running|completed|error|cancelled).
- Routing fields stored at enqueue time so a later rename of
  `Llm.poolName` doesn't reroute already-queued work: `poolName`
  (effective pool key), `llmName` (pinned target), `model`, `tier`.
- Worker tracking: `claimedBy` (providerSessionId) + `claimedAt`,
  cleared on revert.
- Bodies as `Json`: requestBody (always set), responseBody (set at
  completion). Streaming chunks are NOT persisted — too expensive at
  delta granularity. The final assembled body lands once per task.
- Lifecycle timestamps: createdAt, claimedAt, streamStartedAt,
  completedAt. Plus ownerId (RBAC + audit) and agentId (null for
  direct chat-llm calls).
- Indexes for the hot paths: (status, poolName) for the dispatcher's
  drain query, claimedBy for the disconnect revert, completedAt for
  the GC retention sweep, owner/agent for the async API listing.

Repository (src/mcpd/src/repositories/inference-task.repository.ts):

- CRUD + state transitions as conditional CAS via `updateMany`. Two
  workers racing to claim the same row both run the UPDATE; whichever
  the DB serializes first sees affected=1 and gets the row, the loser
  sees 0 and falls through to the next candidate. No application-
  level locking required.
- findPendingForPools(poolNames[]) for the worker drain on bind.
- findHeldBy(claimedBy) for the unbindSession revert.
- findStalePending + findExpiredTerminal for the GC sweep.
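
The claim step in outline (model/field names per the schema described
above):

  import { PrismaClient } from "@prisma/client";

  // Two racing workers both run this UPDATE; the DB serializes them and
  // only the winner sees count === 1. No application-level locking.
  async function tryClaim(prisma: PrismaClient, taskId: string, sessionId: string) {
    const { count } = await prisma.inferenceTask.updateMany({
      where: { id: taskId, status: "pending" }, // the CAS condition
      data: { status: "claimed", claimedBy: sessionId, claimedAt: new Date() },
    });
    return count === 1;
  }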

Service (src/mcpd/src/services/inference-task.service.ts):

- Owns the in-process EventEmitter that wakes blocked HTTP handlers
  when a worker POSTs results. The DB row is the source of truth for
  *state*; the EventEmitter just signals "go re-read row X" so we
  don't have to poll. Single-instance assumption for v5; pg
  LISTEN/NOTIFY is the v6 swap when scaling horizontally — no schema
  change needed, just replace the emitter wakeup.
- waitFor(taskId, timeoutMs) returns { done, chunks }: the terminal
  promise + an async iterator of streaming deltas. Throws on cancel
  (clear message) or error (worker's errorMessage propagates) or
  timeout. Polls the row once at subscribe time so an already-
  terminal task resolves immediately without waiting for an event
  that's never coming.
- gcSweep flips stale pending rows to error (with a clear message
  about the timeout) and deletes terminal rows past retention.
  Defaults: 1h pending timeout, 7d terminal retention; both
  configurable.

Tests:
- 6 db-level schema tests (defaults, json roundtrip, drain query
  shape, claimedBy filter, GC predicate, agentId nullable).
- 13 service tests covering enqueue, the CAS race on tryClaim,
  complete/fail/cancel, idempotent terminal transitions, revertHeldBy
  on disconnect, and the full waitFor signal lifecycle (immediate
  resolve, wake on event, chunk streaming, cancel/error/timeout
  paths). Plus a gcSweep test with a fixed clock.

mcpd 881/881 (was 868; +13). db pool-schema 14/14, +6 new
inference-task-schema. Pre-existing failures in models.test.ts
(Secret FK fixture issue, also fails on main HEAD) are unrelated.

Stage 2 (next): VirtualLlmService rewires through this — remove the
in-memory pendingTasks map; enqueue creates a row, dispatch picks an
active session, the result-route updates the row + emits the wakeup.
Worker disconnect reverts; worker bind drains.
2026-04-28 02:14:45 +01:00
256e117021 Merge pull request 'feat: v4 LB pools by shared poolName' (#69) from feat/llm-pool-by-name into main
Reviewed-on: #69
2026-04-28 01:02:45 +00:00
Michal
137711fdf6 feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)
Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:

  1. /api/v1/llms/<name>/members surfaces both with the correct
     effective pool key, size, activeCount, and per-member kind/status.
  2. Chat through an agent pinned to one pool member dispatches across
     the pool — verified by running 12 calls and asserting at least
     one response from each backend (the random-shuffle selection
     would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
  3. Failover: stop one publisher, the surviving member still serves
     chat. /members shows the stopped row as inactive immediately
     (unbindSession runs synchronously on SSE close).

docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.

Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
2026-04-27 23:22:15 +01:00
Michal
e21f96080d feat(mcpd+cli+mcplocal): /llms/<name>/members + POOL column + --pool-name (v4 Stage 2)
Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` —
  same as the single-Llm route. Members are full LlmView shapes so
  callers don't need a second roundtrip to render the pool block.

- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates
  the same character set as CreateLlmSchema.poolName. Backwards
  compatible — older mcplocals that don't send the field continue to
  publish solo Llms.

- CLI `get llm` table: new POOL column right after NAME. Solo rows
  show "-" so the "no pool / pool of 1" case is unambiguous (per
  user direction "make sure we see it, prominently visible and
  impossible to mistake").

- CLI `describe llm`: fetches /members and renders a Pool block at
  the top of the detail view when the row is in an explicit pool OR
  when its implicit pool has size > 1. Each member line shows
  kind/status; the anchor row gets "← this row". Block is suppressed
  for solo rows so describe stays compact in the common case.

- CLI `create llm --pool-name <name>` flag and apply schema both
  accept the new field. Yaml round-trip preserves it: get -o yaml
  emits `poolName: <name>`, apply -f re-imports it without diff.
  Verified end-to-end against the live mcpd.

- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
  and registrar.ts thread it through into the register payload. Use
  case for distributed inference: each user's mcplocal picks a
  unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
  (e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
  that auto-grows as workers come online.

- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in
  fish + bash for `mcpctl create llm`.

Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
  mcpd 868/868 (was 865, +3),
  mcplocal 723/723,
  cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
2026-04-27 23:18:53 +01:00
Michal
7949e1393d feat(mcpd+db): Llm.poolName + chat dispatcher pool failover (v4 Stage 1)
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.

Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm to a pool seed.

Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.
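
The retry loop in outline (types assumed; the real code lives in
runOneInference / streamInference):

  // Transport errors throw and move us to the next pool member;
  // result-level (auth/4xx) failures come back as values and are NOT retried.
  async function dispatchWithFailover<T>(
    candidates: string[],                  // pre-shuffled viable member names
    call: (llmName: string) => Promise<T>,
  ): Promise<T> {
    let lastErr: unknown = new Error("pool exhausted");
    for (const name of candidates) {
      try {
        return await call(name);
      } catch (err) {
        lastErr = err; // transport-level failure: try the next sibling
      }
    }
    throw lastErr;
  }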

When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool
  edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
2026-04-27 22:02:41 +01:00
c0b4dc89f3 Merge pull request 'chore: fulldeploy uses bao-backed pulumi wrapper for drift check' (#68) from chore/fulldeploy-pulumi-wrapper into main
Reviewed-on: #68
2026-04-27 20:21:33 +00:00
Michal
7f49294b36 chore(fulldeploy): use kubernetes-deployment/scripts/pulumi.sh wrapper
The pre-flight drift check now calls the bao-backed pulumi wrapper
that landed with the litellm key persistence work, so deploys no
longer need PULUMI_CONFIG_PASSPHRASE in .env or shell env. The
passphrase is fetched from OpenBao at runtime by the wrapper and passed
only to the exec'd pulumi process — it never touches the parent shell's
state.

Falls back to a clear warning if the wrapper isn't present (older
clone of kubernetes-deployment) instead of silently skipping the
check.
2026-04-27 19:14:36 +01:00
f5bdeea8e7 Merge pull request 'feat: virtual agents v3 (Stages 1-3) + real fixes for chat/adapter/CLI thread format' (#67) from feat/virtual-agent-v3 into main
Reviewed-on: #67
2026-04-27 18:06:59 +00:00
Michal
1998b733b2 feat(cli+docs): mcpctl get agent KIND/STATUS columns + virtual-agent smoke + docs (v3 Stage 4)
CLI: `mcpctl get agent` table view gains KIND and STATUS columns
mirroring the `get llm` shape from v1. Public agents render as
`public/active` (the AgentRow defaults) and virtual ones surface their
true lifecycle state, so `mcpctl get agent` becomes a single-pane view
for both manually-created and mcplocal-published personas.

Smoke: tests/smoke/virtual-agent.smoke.test.ts mirrors virtual-llm's
in-process registrar pattern — publishes a fake provider + agent in
one round-trip, confirms mcpd surfaces the agent kind=virtual /
status=active under /api/v1/agents, then disconnects and verifies the
paired Llm-and-Agent both flip to inactive (deletion is GC-driven, not
disconnect-driven, so the rows must still exist post-stop). Heartbeat-
stale and 4 h sweep paths are covered by the unit suite to keep smoke
duration in check.

Docs: docs/virtual-llms.md gets a "Virtual agents (v3)" section with a
config sample, lifecycle notes, listing example, and the cluster-wide
name-uniqueness caveat. The API surface block now mentions the new
`agents[]` field on _provider-register, the join-by-session heartbeat
behavior, and the `GET /api/v1/agents` lifecycle fields. docs/agents.md
gains a one-paragraph note pointing to the v3 publishing path.

Tests: full smoke suite 141/141 (was 139, +2 new), unit suites
unchanged (mcpd 860/860, mcplocal 723/723).
2026-04-27 18:47:03 +01:00
Michal
610808b9e7 fix(chat): real fixes for thinking-model + URL conventions, not test tweaks
Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize. `/v1beta` (Anthropic-style)
   is preserved.

2. Non-streaming chat returned an empty assistant when thinking models
   (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming
   fix: the chatStream loop accumulates reasoning and yields ONE
   synthesized `text` frame at the end when content stayed empty,
   keeping the CLI's stdout (which only prints `text` deltas) in sync
   with the persisted thread message.

3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
   lifecycle field) instead of `kind: agent` (apply envelope), so
   round-tripping through `apply -f` failed. Same fix shape as the v1
   Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
   inactiveSince/providerSessionId for the agents resource too.

4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests
   and any other regex watching for one form missed the other.
   Standardize on `thread: <cuid>` (single space) in both paths.

5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working.
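
Minimal sketches of fixes 1 and 2 (helper and field names are
illustrative where they aren't quoted above; the real chat.service
shapes differ in detail):

  // Fix 1: strip exactly one trailing "/v1" from the configured base
  // so both documented shapes canonicalize; "/v1beta" is left alone.
  const stripTrailingV1 = (base: string) =>
    base.replace(/\/+$/, '').replace(/\/v1$/, '');
  // stripTrailingV1('https://x/v1')     -> 'https://x'
  // stripTrailingV1('https://x')        -> 'https://x'
  // stripTrailingV1('https://x/v1beta') -> 'https://x/v1beta'

  // Fix 2: prefer content, fall back to reasoning when content stayed
  // empty, and mark max_tokens cut-offs. Field spellings here are the
  // common ones, not the exhaustive list the parser knows.
  function pickAssistantText(
    msg: { content?: string | null; reasoning_content?: string | null },
    finishReason?: string,
  ): string {
    let text = msg.content ?? '';
    if (!text && msg.reasoning_content) text = msg.reasoning_content;
    if (finishReason === 'length') text += '\n[response truncated by max_tokens]';
    return text;
  }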

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
2026-04-27 18:39:01 +01:00
Michal
58bc277242 feat(mcpd+mcplocal): register-agents endpoint + mcplocal agents block (v3 Stage 3)
Extends the existing `_provider-register` payload with an optional `agents`
array so a single round-trip atomically publishes both virtual Llms and
their pinned virtual Agents. v1/v2 publishers (providers-only) keep
working unchanged — the agents path is gated on the route receiving an
AgentService instance, otherwise it logs a warning and ignores the array.

mcplocal config gains a top-level `agents` block (loadLocalAgents)
mirroring the providers shape. The registrar reads it, builds
RegistrarPublishedAgent entries against the published provider names,
and folds them into the same register POST. mcpd routes the agents
through AgentService.registerVirtualAgents(sessionId, ..., ownerId),
which was added in Stage 2.
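
For orientation, a config shape along these lines (field names are an
assumption — only the block's existence and its mirroring of the
providers shape are established above):

  agents:
    - name: local-persona     # cluster-unique agent name
      llm: vllm-local         # must reference a published provider above
      # ...persona fields as the block defines them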

No CLI changes here — `mcpctl chat <virtual-agent>` already works once
chat.service has the kind=virtual branch (Stage 1) and the agents are
present in the Agent table. CLI columns + smoke land in Stage 4.
2026-04-27 18:38:37 +01:00
Michal
c7b1bd8e2c feat(mcpd): AgentService virtual methods + GC cascade (v3 Stage 2)
State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind, providerSessionId,
  status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals, findExpiredInactives.

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals
  from a foreign session can be adopted (sticky reconnect). Refuses
  to overwrite a public agent or a foreign session's still-active
  virtual (HTTP 409). Pinned LLM is resolved via LlmService — caller
  posts Llms first. (Upsert decision sketched after this list.)
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).
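
The upsert decision in miniature (a sketch of the rules above only —
the real method also resolves the pinned Llm and batches whole input
lists):

  type Existing =
    | { kind: 'public' | 'virtual';
        status: 'active' | 'inactive';
        providerSessionId?: string }
    | undefined;

  function registerAction(row: Existing, sessionId: string) {
    if (!row) return 'insert';                      // new name -> virtual/active
    if (row.kind === 'public') return 'conflict';   // refuse overwrite (409)
    if (row.providerSessionId === sessionId) return 'reactivate';
    return row.status === 'inactive'
      ? 'adopt'                                     // sticky reconnect
      : 'conflict';                                 // foreign still-active (409)
  }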

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the
  agent sweep FIRST (so any agent that would block an Llm delete via
  Restrict is already gone), and adds a defensive
  deleteVirtualAgentsForLlm step right before each Llm delete in case
  an agent's heartbeat lagged its Llm's just enough to escape this
  round's 4h cutoff.

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip
+ delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when agent's heartbeat lagged).

mcpd suite: 854/854 (was 841, +13). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
Michal
9afd24a3aa feat(db+mcpd): Agent lifecycle + chat.service kind=virtual branch (v3 Stage 1)
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
  mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
  new types. Existing rows backfill kind=public/status=active so v1
  CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups.
  Total agent-schema tests: 20/20.

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
  on ctx.llmKind: 'public' goes through the existing adapter
  registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
  (mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern (sketched after
  this list).
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.
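
The queue + wake bridge, in miniature (sketch: error propagation and
cleanup elided; the null end-of-stream sentinel is an assumption):

  function callbackToIterator<T>(
    subscribe: (onChunk: (chunk: T | null) => void) => void,
  ): AsyncGenerator<T> {
    const queue: (T | null)[] = [];
    let wake: (() => void) | null = null;
    subscribe((chunk) => {
      queue.push(chunk);     // producer side: enqueue...
      wake?.();              // ...and wake a parked consumer
      wake = null;
    });
    return (async function* () {
      for (;;) {
        while (queue.length === 0)
          await new Promise<void>((resolve) => (wake = resolve));
        const next = queue.shift()!;
        if (next === null) return;   // end of stream
        yield next;
      }
    })();
  }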

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Pre-this-stage, those agents 502'd against the
empty url field.

Tests: 4 new in chat-service-virtual-llm.test.ts cover the relay path
(non-streaming and streaming), the missing-dispatcher error, and
rejection surfacing. mcpd suite: 841/841 (was 833; +8 = 4 schema +
4 chat-service).
Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
9374a2652b perf: vitest threads pool + Dockerfile pnpm cache mount (#66)
Some checks failed
CI/CD / lint (push) Successful in 58s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m35s
CI/CD / smoke (push) Failing after 1m43s
CI/CD / build (push) Successful in 2m21s
CI/CD / publish (push) Has been skipped
2026-04-27 16:07:05 +00:00
Michal
18245be0c1 perf: vitest threads pool + Dockerfile pnpm cache mount
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / lint (pull_request) Successful in 2m40s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Failing after 7m6s
CI/CD / publish (pull_request) Has been skipped
Two tuning knobs that were leaving most of the host idle:

1) vitest.config.ts pool=threads with maxThreads ≈ cores/2.
   The default pool left this 64-core workstation at ~10% CPU during
   `pnpm test:run`. The threads pool uses the box: the same
   152-file/2050-test suite now runs at ~700% CPU instead of ~150%.
   Wall-time gain is modest (the workload is dominated by a handful of
   slow individual files that one thread must run serially), but the
   parallel headroom is there for when the suite grows. Cap =
   max(2, cores/2) keeps laptops reasonable; override with
   `VITEST_MAX_THREADS=N` in the env (config sketched after this list).

2) Dockerfile.mcpd uses BuildKit cache mounts on both pnpm install
   steps. Adds `# syntax=docker/dockerfile:1.6` and a
   `--mount=type=cache,target=/root/.local/share/pnpm/store` so
   pnpm's content-addressed store survives across image rebuilds.
   Cold rebuilds where the lockfile changed are unaffected; warm
   rebuilds where only source changed drop the install step from
   ~60s to <5s. fulldeploy.sh's mcpd image rebuild gets that saving
   back (minus the separate docker-push hash-mismatch issue).
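
The pool knob from 1), roughly (sketch — the exact option shape
varies by vitest version, and the real file carries the rest of the
project config):

  import os from 'node:os';
  import { defineConfig } from 'vitest/config';

  const maxThreads =
    Number(process.env.VITEST_MAX_THREADS) ||
    Math.max(2, Math.floor(os.cpus().length / 2));

  export default defineConfig({
    test: { pool: 'threads', poolOptions: { threads: { maxThreads } } },
  });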

Test parity: 2050/2050 across 152 files; per-package mcpd 837/837.
Both unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:06:39 +01:00
45c7737ee1 feat: virtual LLMs v2 (wake-on-demand) (#65)
Some checks failed
CI/CD / lint (push) Successful in 54s
CI/CD / test (push) Successful in 1m12s
CI/CD / typecheck (push) Successful in 2m42s
CI/CD / smoke (push) Failing after 1m43s
CI/CD / build (push) Successful in 2m33s
CI/CD / publish (push) Has been skipped
2026-04-27 14:20:59 +00:00
Michal
e0cfe0ba4d feat: virtual-LLM v2 smoke + docs (v2 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m43s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 5m28s
CI/CD / publish (pull_request) Has been skipped
Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.

Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins a tiny in-process HTTP "wake controller"; the published
  provider's isAvailable() returns false until the wake POST flips
  the bool. Asserts:
    1. Provider publishes as kind=virtual / status=hibernating.
    2. First inference triggers the wake recipe, the recipe POSTs
       to the controller, the provider becomes available, mcpd
       relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.

Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
  "reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
  recipe types (http + command), the wake-then-infer flow diagram,
  concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.

Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).

v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:20:18 +01:00
Michal
db839afc57 feat(mcpd): wake-before-infer for hibernating virtual LLMs (v2 Stage 2)
Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — the `wakeInFlight` map
dedupes by Llm name.

State machine in enqueueInferTask:
  active        → push infer task immediately (existing path).
  inactive      → 503, publisher offline (existing path).
  hibernating   → ensureAwake() → push infer task (new in v2).

ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
  active + bumps lastHeartbeatAt, so all queued + future infers
  hit the active path. On non-2xx or service.failTask, the row
  stays hibernating (next request retries).
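
The dedup in miniature (sketch — runWake stands in for the
taskId/SSE plumbing described above):

  const wakeInFlight = new Map<string, Promise<void>>();

  function ensureAwake(llmName: string): Promise<void> {
    let inflight = wakeInFlight.get(llmName);
    if (!inflight) {
      inflight = runWake(llmName)                      // one wake task...
        .finally(() => wakeInFlight.delete(llmName));  // ...cleared when done
      wakeInFlight.set(llmName, inflight);
    }
    return inflight;                                   // shared by all callers
  }
  declare function runWake(llmName: string): Promise<void>;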

Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:18:24 +01:00
Michal
af0fabd84f feat(mcplocal+mcpd): wake-recipe config + wake-task execution (v2 Stage 1)
First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.

Config (LlmProviderFileEntry):
- New `wake` block on a published provider:
    wake:
      type: http        # or: command
      url: ...           # http only
      method: POST       # http only, default POST
      headers: {...}     # http only
      body: ...          # http only
      command: ...       # command only
      args: [...]        # command only
      maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires

Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
  returns false report initialStatus=hibernating to mcpd. Without a
  wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
  spawn), then polls isAvailable() up to maxWaitSeconds, sending a
  heartbeat each loop so mcpd's GC sweep doesn't time us out
  mid-boot. Reports { ok, ms } on success or { error } on
  timeout/recipe failure via the existing _provider-task/:id/result.
- Replaces the v1 stub that rejected wake tasks with "not implemented".
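
The handler's shape (sketch — runRecipe/sendHeartbeat are stand-ins
for the mechanisms above, and the 2 s poll interval is an assumption):

  async function handleWakeTask(p: PublishedProvider, maxWaitSeconds: number) {
    const start = Date.now();
    await runRecipe(p.wake);                          // HTTP request OR child spawn
    while (!(await p.isAvailable())) {
      if (Date.now() - start > maxWaitSeconds * 1000)
        return { error: 'wake timed out' };
      await sendHeartbeat();                          // keep mcpd's GC at bay
      await new Promise((r) => setTimeout(r, 2000));
    }
    return { ok: true, ms: Date.now() - start };
  }
  declare function runRecipe(wake: unknown): Promise<void>;
  declare function sendHeartbeat(): Promise<void>;
  interface PublishedProvider { wake: unknown; isAvailable(): Promise<boolean>; }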

mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
  'hibernating'). The register/upsert path uses it for both new and
  reconnecting rows. Defaults to 'active' so v1 publishers still
  work unchanged.
- Provider-register route's coercer accepts the new field.

Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when a wake recipe is configured and the backend is down,
active when it's already up, active when no wake recipe exists even if
the backend is unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:15:46 +01:00
700d1683c2 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML (#64)
Some checks failed
CI/CD / lint (push) Successful in 56s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m49s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 3m10s
CI/CD / publish (push) Has been skipped
2026-04-27 13:47:18 +00:00
Michal
2a44f60785 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m59s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 6m35s
CI/CD / publish (pull_request) Has been skipped
The smoke test `llm.smoke > round-trips yaml output → apply -f` failed
after v1 of the virtual-LLM feature: `mcpctl get llm <name> -o yaml`
output now starts with `kind: public` (the new schema column) instead
of `kind: llm` (the apply-doc envelope), because toApplyDocs spread
the cleaned item AFTER setting the kind, so the cleaned item's `kind`
overwrote the envelope's.

Fix: in toApplyDocs, when serialising the `llms` resource, drop the
new lifecycle fields (kind, status, lastHeartbeatAt, inactiveSince,
providerSessionId) before merging. They collide with the apply-doc
envelope and aren't apply-able anyway — they're derived runtime state
owned by VirtualLlmService. Public-LLM round-trip is now byte-clean
(those fields default to public/active anyway). Virtual rows are
created by the registrar, not via apply -f, so dropping them on
output is the right call.
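
The spread-order bug in miniature (values illustrative):

  const cleaned = { name: 'vllm-local', kind: 'public', status: 'active' };
  const buggy = { kind: 'llm', ...cleaned };    // cleaned.kind wins -> 'public'
  const { kind, status, ...rest } = cleaned;    // drop lifecycle fields first...
  const doc = { kind: 'llm', ...rest };         // ...so the envelope kind wins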

CLI suite: 437/437. Smoke will re-run against the live mcpd via
scripts/release.sh after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:47:00 +01:00
65b6b265d9 feat: virtual LLMs v1 (registration skeleton) (#63)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m12s
CI/CD / typecheck (push) Successful in 2m13s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 4m50s
CI/CD / publish (push) Has been skipped
2026-04-27 13:38:50 +00:00
Michal
866f6abc88 feat: virtual-LLM smoke test + docs (v1 Stage 6)
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / lint (pull_request) Successful in 2m6s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 2m11s
CI/CD / publish (pull_request) Has been skipped
Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: row appears with kind=virtual / status=active, infer
  through /api/v1/llms/<name>/infer comes back through the SSE
  relay with the provider's content + finish_reason, and a 503
  appears immediately after registrar.stop() (publisher offline).
- Timeout and cleanup paths are idempotent, so re-runs against the
  same cluster don't litter rows. The 90-s heartbeat-stale flip and 4-h
  GC are unit-tested — too slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs creating a regular
  Llm row, how to opt-in via publish: true, the lifecycle table,
  the inference-relay sequence, the v1 streaming caveat, the v2-v5
  roadmap, and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.

Stage roadmap (each its own future PR):
  v2 wake-on-demand · v3 virtual agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:28:43 +01:00
Michal
7e6b0cab44 feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)
Closes the loop on user-facing surface:

  $ mcpctl get llm
  NAME             KIND     STATUS    TYPE     MODEL                       TIER  KEY  ID
  qwen3-thinking   public   active    openai   qwen3-thinking              fast  ...  ...
  vllm-local       virtual  active    openai   Qwen/Qwen2.5-7B-Instruct    fast  -    ...

  $ mcpctl chat-llm vllm-local
  ────────────────────────────────────────
  LLM: vllm-local  openai → Qwen/Qwen2.5-7B-Instruct-AWQ
  Kind: virtual    Status: active
  ────────────────────────────────────────
  > hello?
  Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools,
  no project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles relay-through-mcplocal transparently,
  so the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row tokens-per-
  second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows
  feel natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special-case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in
the regenerated completions golden). Workspace: 2043/2043 across
152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:25:38 +01:00
Michal
97174f450f feat(mcplocal): virtual-LLM registrar (v1 Stage 4)
The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping `publish: true` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.

Config:
- LlmProviderFileEntry gains optional `publish: boolean` (default false,
  so existing setups don't change).

Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
  /api/v1/llms/_provider-register with the publishable set, persists
  the returned providerSessionId to ~/.mcpctl/provider-session for
  sticky reconnects, then opens the SSE control channel and starts a
  30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
  task frames trigger handleInferTask: convert OpenAI body to
  CompletionOptions, call provider.complete(), POST the result back as
  either { status, body } (non-streaming) or two chunk POSTs
  (streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s
  (sketched after this list). On successful reconnect the persisted
  sessionId revives the same Llm rows in mcpd (mcpd flips them back
  to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; it's invoked
  cleanly from main.ts's existing shutdown handler.
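
The backoff in miniature (sketch — jitter and error handling elided):

  let reconnectDelayMs = 5_000;                 // reset to 5 s on success
  function scheduleReconnect(connect: () => void) {
    setTimeout(connect, reconnectDelayMs);
    reconnectDelayMs = Math.min(reconnectDelayMs * 2, 60_000);  // 60 s cap
  }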

Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
  the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
  best-effort skip (logs a warning, returns null) — never a boot
  blocker.

v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.

Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.

Workspace suite: 2043/2043 across 152 files (was 2006 across 149
files; +5 new tests here on top of the Stage 2/3 additions, plus the
new files getting discovered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:20:54 +01:00
Michal
192a3831df feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)
End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.

Routes (new file: routes/virtual-llms.ts):
  POST /api/v1/llms/_provider-register    → returns { providerSessionId, llms[] }
  GET  /api/v1/llms/_provider-stream      → SSE channel keyed by
                                            x-mcpctl-provider-session header.
                                            Emits `event: hello` on open,
                                            `event: task` on inference fan-out,
                                            `: ping` every 20 s for proxies.
  POST /api/v1/llms/_provider-heartbeat   → bumps lastHeartbeatAt
  POST /api/v1/llms/_provider-task/:id/result
                                          → mcplocal pushes result back;
                                            body shape is one of:
                                              { error: 'msg' }
                                              { chunk: { data, done? } }
                                              { status, body }

LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
  handlers + the upcoming `mcpctl get llm` columns can branch on
  kind without re-fetching the row.

llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
  VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
  supported; on 503 (publisher offline) the existing audit hook still
  fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
  absence in test fixtures returns a 500 with a clear "server
  misconfiguration" message rather than silently falling through to
  the public path against an empty URL.

main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
  shutdown alongside the existing reconcile timer.

Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.

The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:15:18 +01:00
Michal
2215922618 feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)
The state machine for kind=virtual Llm rows. Wires the schema added
in Stage 1 into something that can register, heartbeat, time out,
and relay inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId) — the reconnect lookup.
- findStaleVirtuals(cutoff) — heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff) — 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the
  session; revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back (sketched after this list). Streaming variant
  exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from
  the result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90s
  cutoff), deletes inactives past 4h. Idempotent.
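
The pending-task plumbing in miniature (sketch — the streaming onChunk
variant and session-scoped rejection are elided):

  type Pending = { resolve: (body: unknown) => void; reject: (e: Error) => void };
  const pending = new Map<string, Pending>();

  function enqueue(taskId: string): Promise<unknown> {
    return new Promise((resolve, reject) =>
      pending.set(taskId, { resolve, reject }));  // parked until the result POST
  }
  function completeTask(taskId: string, body: unknown) {
    pending.get(taskId)?.resolve(body);
    pending.delete(taskId);
  }
  function failTask(taskId: string, message: string) {
    pending.get(taskId)?.reject(new Error(message));
    pending.delete(taskId);
  }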

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy
path + 503 paths (no session, inactive, public-Llm), complete/fail/
streaming chunk fan-out, GC sweep flip + delete + idempotence.

mcpd suite: 819/819 (was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:05:19 +01:00
Michal
1acd8b58bc feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
\`mcpctl create llm\`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when
the publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.
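
In schema terms, roughly (sketch — surrounding model fields elided):

  enum LlmKind {
    public
    virtual
  }
  enum LlmStatus {
    active
    inactive
    hibernating
  }

  model Llm {
    // ...existing columns...
    kind              LlmKind   @default(public)
    status            LlmStatus @default(active)
    providerSessionId String?
    lastHeartbeatAt   DateTime?
    inactiveSince     DateTime?

    @@index([kind, status])        // GC sweep
    @@index([providerSessionId])   // reconnect lookup
  }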

All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index.

mcpd suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:59:44 +01:00
e65a396d3e fix(cli): status probe accepts reasoning_content for thinking models (#62)
Some checks failed
CI/CD / typecheck (push) Successful in 56s
CI/CD / test (push) Successful in 1m10s
CI/CD / lint (push) Successful in 2m40s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 5m5s
CI/CD / publish (push) Has been skipped
2026-04-27 11:10:15 +00:00
Michal
a84214dad1 fix(cli): status probe accepts reasoning_content for thinking models
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / lint (pull_request) Successful in 3m6s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / build (pull_request) Successful in 2m39s
CI/CD / smoke (pull_request) Failing after 3m58s
CI/CD / publish (pull_request) Has been skipped
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final \`content\` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.
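
The acceptance rule in miniature (sketch — preview truncation and
transport elided):

  function probeVerdict(msg: { content?: string; reasoning_content?: string }) {
    if (msg.content) return { ok: true, say: msg.content };
    if (msg.reasoning_content)
      return { ok: true, say: `[thinking] ${msg.reasoning_content}` };
    return { ok: false, error: 'empty content' };
  }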

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:09:42 +01:00
54e56f7b71 feat(cli): live "say hi" probe for server LLMs in mcpctl status (#61)
Some checks failed
CI/CD / lint (push) Successful in 57s
CI/CD / typecheck (push) Successful in 57s
CI/CD / test (push) Has been cancelled
CI/CD / smoke (push) Has been cancelled
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
2026-04-27 11:02:26 +00:00
Michal
e4af16477c feat(cli): live "say hi" probe for server LLMs in mcpctl status
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m13s
CI/CD / typecheck (pull_request) Successful in 3m10s
CI/CD / smoke (pull_request) Failing after 1m46s
CI/CD / build (pull_request) Successful in 3m24s
CI/CD / publish (pull_request) Has been skipped
Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:

  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0

Each registered LLM gets a one-line health line:

  Server LLMs: 2 registered (probing live "say hi"...)
    fast   qwen3-thinking  ✓ "hi" 312ms
              openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
    heavy  sonnet  ✗ upstream auth failed: 401
              anthropic → claude-sonnet-4-5  provider default  no key

Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
`health: { ok, ms, say?, error? }` field per server LLM so dashboards
get the same liveness signal.
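
The fan-out shape (sketch — auth headers and response-body parsing
elided; AbortSignal.timeout needs Node 17.3+):

  async function probeAll(names: string[], mcpdUrl: string) {
    return Promise.all(names.map(async (name) => {
      const t0 = Date.now();
      try {
        const res = await fetch(`${mcpdUrl}/api/v1/llms/${name}/infer`, {
          method: 'POST',
          headers: { 'content-type': 'application/json' },
          body: JSON.stringify({
            messages: [{ role: 'user',
              content: "Say exactly the word 'hi' and nothing else." }],
            max_tokens: 8, temperature: 0,
          }),
          signal: AbortSignal.timeout(15_000),   // per-probe 15 s timeout
        });
        return { name, ok: res.ok, ms: Date.now() - t0 };
      } catch (err) {
        return { name, ok: false, ms: Date.now() - t0, error: String(err) };
      }
    }));
  }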

Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:02:00 +01:00
de96af7bf6 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500 (#60)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m9s
CI/CD / typecheck (push) Failing after 7m9s
CI/CD / smoke (push) Has been skipped
CI/CD / build (push) Has been skipped
CI/CD / publish (push) Has been skipped
2026-04-27 10:28:10 +00:00