Commit Graph

318 Commits

Author SHA1 Message Date
700d1683c2 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML (#64)
Some checks failed
CI/CD / lint (push) Successful in 56s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m49s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 3m10s
CI/CD / publish (push) Has been skipped
2026-04-27 13:47:18 +00:00
Michal
2a44f60785 fix(cli): strip virtual-LLM lifecycle fields from llm apply-doc YAML
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m59s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 6m35s
CI/CD / publish (pull_request) Has been skipped
The smoke test \`llm.smoke > round-trips yaml output → apply -f\` failed
after v1 of the virtual-LLM feature: \`mcpctl get llm <name> -o yaml\`
output now starts with \`kind: public\` (the new schema column) instead
of \`kind: llm\` (the apply-doc envelope), because toApplyDocs spread
the cleaned item AFTER setting the kind, so the cleaned item's \`kind\`
overwrote the envelope's value.

Fix: in toApplyDocs, when serialising the \`llms\` resource, drop the
new lifecycle fields (kind, status, lastHeartbeatAt, inactiveSince,
providerSessionId) before merging. They collide with the apply-doc
envelope and aren't apply-able anyway — they're derived runtime state
owned by VirtualLlmService. Public-LLM round-trip is now byte-clean
(those fields default to public/active anyway). Virtual rows are
created by the registrar, not via apply -f, so dropping them on
output is the right call.
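
For reference, a minimal sketch of the shape of the fix (types and the
helper name are illustrative, not the real toApplyDocs code):

  // Strip the runtime-owned lifecycle fields before spreading the row under
  // the apply-doc envelope, so the envelope's `kind: llm` can no longer be
  // overwritten by the row's own `kind` column.
  interface LlmRowLike {
    kind?: string;
    status?: string;
    lastHeartbeatAt?: string | null;
    inactiveSince?: string | null;
    providerSessionId?: string | null;
    [field: string]: unknown;
  }

  function toLlmApplyDoc(row: LlmRowLike): Record<string, unknown> {
    const { kind, status, lastHeartbeatAt, inactiveSince, providerSessionId, ...rest } = row;
    return { kind: 'llm', ...rest }; // colliding fields removed first, so `kind: llm` survives
  }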

CLI suite: 437/437. Smoke will re-run against the live mcpd via
scripts/release.sh after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:47:00 +01:00
65b6b265d9 feat: virtual LLMs v1 (registration skeleton) (#63)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m12s
CI/CD / typecheck (push) Successful in 2m13s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 4m50s
CI/CD / publish (push) Has been skipped
2026-04-27 13:38:50 +00:00
Michal
866f6abc88 feat: virtual-LLM smoke test + docs (v1 Stage 6)
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / lint (pull_request) Successful in 2m6s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 2m11s
CI/CD / publish (pull_request) Has been skipped
Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: row appears with kind=virtual / status=active, infer
  through /api/v1/llms/<name>/infer comes back through the SSE
  relay with the provider's content + finish_reason, and a 503
  appears immediately after registrar.stop() (publisher offline).
- Timeout and cleanup paths are idempotent so re-runs against the same
  cluster don't litter rows. The 90-s heartbeat-stale flip and 4-h
  GC are unit-tested — too slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs creating a regular
  Llm row, how to opt-in via publish: true, the lifecycle table,
  the inference-relay sequence, the v1 streaming caveat, the v2-v5
  roadmap, and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.

Stage roadmap (each its own future PR):
  v2 wake-on-demand · v3 virtual agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:28:43 +01:00
Michal
7e6b0cab44 feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)
Closes the loop on user-facing surface:

  $ mcpctl get llm
  NAME             KIND     STATUS    TYPE     MODEL                       TIER  KEY  ID
  qwen3-thinking   public   active    openai   qwen3-thinking              fast  ...  ...
  vllm-local       virtual  active    openai   Qwen/Qwen2.5-7B-Instruct    fast  -    ...

  $ mcpctl chat-llm vllm-local
  ────────────────────────────────────────
  LLM: vllm-local  openai → Qwen/Qwen2.5-7B-Instruct-AWQ
  Kind: virtual    Status: active
  ────────────────────────────────────────
  > hello?
  Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools,
  no project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles relay-through-mcplocal transparently,
  so the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row tokens-per-
  second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows
  feel natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special-case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in
the regenerated completions golden). Workspace: 2043/2043 across
152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:25:38 +01:00
Michal
97174f450f feat(mcplocal): virtual-LLM registrar (v1 Stage 4)
The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping \`publish: true\` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.

Config:
- LlmProviderFileEntry gains optional \`publish: boolean\` (default false,
  so existing setups don't change).

Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
  /api/v1/llms/_provider-register with the publishable set, persists
  the returned providerSessionId to ~/.mcpctl/provider-session for
  sticky reconnects, then opens the SSE control channel and starts a
  30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
  task frames trigger handleInferTask: convert OpenAI body to
  CompletionOptions, call provider.complete(), POST the result back as
  either { status, body } (non-streaming) or two chunk POSTs
  (streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s. On
  successful reconnect the persisted sessionId revives the same Llm
  rows in mcpd (mcpd flips them back to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; it's invoked
  cleanly from main.ts's existing shutdown handler.

Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
  the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
  best-effort skip (logs a warning, returns null) — never a boot
  blocker.

v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.
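
In sketch form, what that caveat means for a streamed task (the result-POST
body shapes follow the Stage 3 route contract; the helper name and delta
payload details are illustrative):

  // The provider hands back one finalized CompletionResult, so a "streaming"
  // task is answered with a single delta chunk followed by a [DONE] marker.
  async function answerStreamingTask(
    taskId: string,
    text: string,
    postToMcpd: (path: string, body: unknown) => Promise<void>,
  ): Promise<void> {
    const path = `/api/v1/llms/_provider-task/${taskId}/result`;
    await postToMcpd(path, {
      chunk: { data: { choices: [{ delta: { content: text } }] } }, // one delta carrying the whole answer
    });
    await postToMcpd(path, { chunk: { data: '[DONE]', done: true } }); // terminator
  }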

Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.

Workspace suite: 2043/2043 across 152 files (was 2006/149; +5 new
tests, and the new test file is now discovered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:20:54 +01:00
Michal
192a3831df feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)
End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.

Routes (new file: routes/virtual-llms.ts):
  POST /api/v1/llms/_provider-register    → returns { providerSessionId, llms[] }
  GET  /api/v1/llms/_provider-stream      → SSE channel keyed by
                                            x-mcpctl-provider-session header.
                                            Emits `event: hello` on open,
                                            `event: task` on inference fan-out,
                                            `: ping` every 20 s for proxies.
  POST /api/v1/llms/_provider-heartbeat   → bumps lastHeartbeatAt
  POST /api/v1/llms/_provider-task/:id/result
                                          → mcplocal pushes result back;
                                            body shape is one of:
                                              { error: 'msg' }
                                              { chunk: { data, done? } }
                                              { status, body }
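
Roughly what the stream route writes on the wire, per the frame names above
(a sketch over Node's raw response, not the actual route code; event payload
shapes are assumptions):

  import type { ServerResponse } from 'node:http';

  function sseWrite(res: ServerResponse, event: string | null, data: unknown): void {
    if (event) res.write(`event: ${event}\n`);
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  }

  function openControlChannel(res: ServerResponse): NodeJS.Timeout {
    res.writeHead(200, { 'content-type': 'text/event-stream', 'cache-control': 'no-cache' });
    sseWrite(res, 'hello', { ok: true });                       // emitted on open
    return setInterval(() => res.write(': ping\n\n'), 20_000);  // comment frame keeps proxies happy
  }
  // Inference fan-out later calls sseWrite(res, 'task', ...) once per queued task.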

LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
  handlers + the upcoming `mcpctl get llm` columns can branch on
  kind without re-fetching the row.

llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
  VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
  supported; on 503 (publisher offline) the existing audit hook still
  fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
  absence in test fixtures returns a 500 with a clear "server
  misconfiguration" message rather than silently falling through to
  the public path against an empty URL.

main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
  shutdown alongside the existing reconcile timer.

Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.

The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:15:18 +01:00
Michal
2215922618 feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)
The state machine for kind=virtual Llm rows. Wires the schema added
in Stage 1 into something that can register, heartbeat, time out,
and relay inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId) — the reconnect lookup.
- findStaleVirtuals(cutoff) — heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff) — 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the
  session; revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back. Streaming variant exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from
  the result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90s
  cutoff), deletes inactives past 4h. Idempotent.

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.
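
A sketch of gcSweep against those constants (repository method names follow
the list above; the update/delete call shapes are assumptions):

  const HEARTBEAT_TIMEOUT_MS = 90_000;               // active → inactive cutoff
  const INACTIVE_RETENTION_MS = 4 * 60 * 60 * 1000;  // inactive → deleted cutoff

  async function gcSweep(repo: {
    findStaleVirtuals(cutoff: Date): Promise<Array<{ id: string }>>;
    findExpiredInactives(cutoff: Date): Promise<Array<{ id: string }>>;
    update(id: string, data: { status: 'inactive'; inactiveSince: Date }): Promise<unknown>;
    delete(id: string): Promise<unknown>;
  }, now = new Date()): Promise<void> {
    const stale = await repo.findStaleVirtuals(new Date(now.getTime() - HEARTBEAT_TIMEOUT_MS));
    for (const row of stale) {
      await repo.update(row.id, { status: 'inactive', inactiveSince: now });
    }
    const expired = await repo.findExpiredInactives(new Date(now.getTime() - INACTIVE_RETENTION_MS));
    for (const row of expired) {
      await repo.delete(row.id);
    }
  }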

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy
path + 503 paths (no session, inactive, public-Llm), complete/fail/
streaming chunk fan-out, GC sweep flip + delete + idempotence.

mcpd suite: 819/819 (was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:05:19 +01:00
Michal
1acd8b58bc feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
\`mcpctl create llm\`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when
the publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.

All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index.

mcpd suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 13:59:44 +01:00
e65a396d3e fix(cli): status probe accepts reasoning_content for thinking models (#62)
Some checks failed
CI/CD / typecheck (push) Successful in 56s
CI/CD / test (push) Successful in 1m10s
CI/CD / lint (push) Successful in 2m40s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / build (push) Successful in 5m5s
CI/CD / publish (push) Has been skipped
2026-04-27 11:10:15 +00:00
Michal
a84214dad1 fix(cli): status probe accepts reasoning_content for thinking models
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / lint (pull_request) Successful in 3m6s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / build (pull_request) Successful in 2m39s
CI/CD / smoke (pull_request) Failing after 3m58s
CI/CD / publish (pull_request) Has been skipped
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final \`content\` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If \`message.content\` is empty but \`reasoning_content\` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.
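
In sketch form, what the probe accepts after this change (field names assume
the OpenAI chat-completions message shape; the helper is illustrative):

  interface ProbeMessage {
    content?: string | null;
    reasoning_content?: string | null;
  }

  function probePreview(message: ProbeMessage): { alive: boolean; preview: string } {
    const content = message.content?.trim() ?? '';
    if (content) return { alive: true, preview: content };
    const reasoning = message.reasoning_content?.trim() ?? '';
    if (reasoning) return { alive: true, preview: `[thinking] ${reasoning}` }; // responsive, but no final answer
    return { alive: false, preview: 'empty content' };
  }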

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:09:42 +01:00
54e56f7b71 feat(cli): live "say hi" probe for server LLMs in mcpctl status (#61)
Some checks failed
CI/CD / lint (push) Successful in 57s
CI/CD / typecheck (push) Successful in 57s
CI/CD / test (push) Has been cancelled
CI/CD / smoke (push) Has been cancelled
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
2026-04-27 11:02:26 +00:00
Michal
e4af16477c feat(cli): live "say hi" probe for server LLMs in mcpctl status
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m13s
CI/CD / typecheck (pull_request) Successful in 3m10s
CI/CD / smoke (pull_request) Failing after 1m46s
CI/CD / build (pull_request) Successful in 3m24s
CI/CD / publish (pull_request) Has been skipped
Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:

  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0

Each registered LLM gets a one-line health line:

  Server LLMs: 2 registered (probing live "say hi"...)
    fast   qwen3-thinking  ✓ "hi" 312ms
              openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
    heavy  sonnet  ✗ upstream auth failed: 401
              anthropic → claude-sonnet-4-5  provider default  no key

Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
\`health: { ok, ms, say?, error? }\` field per server LLM so dashboards
get the same liveness signal.
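
A sketch of the fan-out (the per-LLM request helper is assumed; the health
shape matches the field documented above):

  interface LlmHealth { ok: boolean; ms: number; say?: string; error?: string }

  async function probeAll(
    names: string[],
    probeOne: (name: string, signal: AbortSignal) => Promise<{ say: string }>,
  ): Promise<Record<string, LlmHealth>> {
    const entries = await Promise.all(names.map(async (name): Promise<[string, LlmHealth]> => {
      const started = Date.now();
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), 15_000); // per-LLM timeout; slow ones don't gate the rest
      try {
        const { say } = await probeOne(name, controller.signal);
        return [name, { ok: true, ms: Date.now() - started, say }];
      } catch (err) {
        return [name, { ok: false, ms: Date.now() - started, error: String(err) }];
      } finally {
        clearTimeout(timer);
      }
    }));
    return Object.fromEntries(entries);
  }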

Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:02:00 +01:00
de96af7bf6 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500 (#60)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m9s
CI/CD / typecheck (push) Failing after 7m9s
CI/CD / smoke (push) Has been skipped
CI/CD / build (push) Has been skipped
CI/CD / publish (push) Has been skipped
2026-04-27 10:28:10 +00:00
Michal
0db37e92a4 feat(cli)+fix(mcpd): server-side LLM status + SPA fallback 500
Some checks failed
CI/CD / typecheck (pull_request) Successful in 58s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / lint (pull_request) Successful in 2m14s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 2m14s
CI/CD / publish (pull_request) Has been skipped
Two related fixes:

1. \`mcpctl status\` now lists mcpd-managed Llm rows (the ones created via
   \`mcpctl create llm\`) under a new "Server LLMs:" section, grouped by
   tier with type, model, upstream URL, and key reference. JSON/YAML
   output gains a \`serverLlms\` array.

   Bearer token (from \`mcpctl auth login\` / saved credentials) is
   passed through; if mcpd is unreachable or returns non-200 the
   section is silently omitted (the existing mcpd connectivity line
   already conveys that). 6 new tests cover happy path, empty list,
   token plumbing, and JSON shape.

2. SPA fallback at \`/ui/<deeplink>\` was returning 500 because we
   registered \`@fastify/static\` with \`decorateReply: false\` and then
   called \`reply.sendFile\`. Read index.html once at startup and serve
   it with \`reply.send(html)\` instead — also dodges a per-request
   stat call. Drop \`decorateReply: false\` so future code can use
   reply.sendFile if it ever needs to.
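
   In sketch form (route wiring simplified; the real registration sits next
   to the @fastify/static setup):

     import { readFileSync } from 'node:fs';
     import { join } from 'node:path';
     import type { FastifyInstance } from 'fastify';

     export function registerSpaFallback(app: FastifyInstance, webRoot: string): void {
       const indexHtml = readFileSync(join(webRoot, 'index.html'), 'utf8'); // read once at startup
       app.setNotFoundHandler((request, reply) => {
         if (request.raw.url?.startsWith('/ui/')) {
           return reply.type('text/html').send(indexHtml); // deep links resolve; no per-request stat
         }
         return reply.code(404).send({ error: 'not found' });
       });
     }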

Full suite: 2005/2005 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 11:27:45 +01:00
899f2c750c fix(test): vitest 4 projects + src/web jsdom env (#59)
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Successful in 1m10s
CI/CD / typecheck (push) Successful in 2m37s
CI/CD / smoke (push) Failing after 1m41s
CI/CD / build (push) Successful in 2m38s
CI/CD / publish (push) Has been skipped
2026-04-26 20:31:47 +00:00
Michal
bf0a60bc0a fix(test): switch workspace runner to vitest 4 \`projects\` field
Some checks failed
CI/CD / typecheck (pull_request) Successful in 57s
CI/CD / test (pull_request) Successful in 1m7s
CI/CD / lint (pull_request) Successful in 2m43s
CI/CD / smoke (pull_request) Failing after 1m45s
CI/CD / build (pull_request) Successful in 5m43s
CI/CD / publish (pull_request) Has been skipped
The workspace-level \`pnpm test:run\` (which fulldeploy.sh runs as a
gate) was failing with \`localStorage is not defined\` on the new
src/web tests. Two intertwined causes:

1. vitest 4 deprecated \`vitest.workspace.ts\`. The file was being
   silently ignored, so per-package configs (cli, mcpd, mcplocal)
   weren't being honored under workspace mode either — the root
   config was being used for all of them.

2. With the root config in charge, src/web/tests ran with the default
   Node environment, no \`localStorage\` global, so the api wrapper's
   test setup blew up.

Fix:
- Move workspace projects into the root \`vitest.config.ts\` under the
  new \`projects\` array (the vitest 4 replacement).
- Add a proper \`src/web/vitest.config.ts\` (vitest 4 doesn't auto-pick
  up vite.config.ts as a test config in workspace mode, even though
  per-package \`pnpm --filter\` does).
- Exclude \`src/web/tests/**\` from the root-level include so we don't
  double-run them under the wrong env.
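
The resulting root config has roughly this shape (project paths illustrative):

  // vitest.config.ts (root)
  import { defineConfig } from 'vitest/config';

  export default defineConfig({
    test: {
      projects: ['src/cli', 'src/mcpd', 'src/mcplocal', 'src/web'], // vitest 4 replacement for vitest.workspace.ts
      exclude: ['src/web/tests/**'],                                // don't also run web tests under the root (Node) env
    },
  });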

After: \`pnpm test:run\` runs 1999/1999 across 149 files (was 1992/1996
with 4 web failures). Per-package runs unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:31:27 +01:00
c0ba0a9040 feat: web prompt editor + agent personalities (#58)
Some checks failed
CI/CD / typecheck (push) Successful in 56s
CI/CD / test (push) Failing after 1m10s
CI/CD / lint (push) Successful in 2m34s
CI/CD / smoke (push) Has been skipped
CI/CD / build (push) Has been skipped
CI/CD / publish (push) Has been skipped
2026-04-26 20:21:53 +00:00
Michal
4cbf58d212 feat(mcpd+deploy): serve web UI at /ui + smoke tests + docs (Stage 6)
Some checks failed
CI/CD / lint (pull_request) Successful in 54s
CI/CD / test (pull_request) Failing after 1m8s
CI/CD / typecheck (pull_request) Successful in 2m35s
CI/CD / smoke (pull_request) Has been skipped
CI/CD / build (pull_request) Has been skipped
CI/CD / publish (pull_request) Has been skipped
The closing stage. mcpd now hosts the Stage 5 SPA, the Docker image
bundles the build artifact, a smoke test exercises the personality
HTTP surface end-to-end, and the user-facing docs spell out the
mental model.

mcpd:
- Add @fastify/static dep.
- New routes/web-ui.ts: registers /ui/* against a static bundle. Looks
  for the bundle at $MCPD_WEB_ROOT, then /usr/share/mcpd/web (the
  Docker image path), then a dev-tree fallback. Logs and skips
  cleanly if missing — API-only deploys keep working.
- SPA fallback: any /ui/<path> that doesn't match a file falls through
  to index.html so direct hits to react-router URLs work.
- /ui/* falls through to `kind: skip` in mapUrlToPermission, so the
  static assets are served unauthenticated. Each API call from the
  SPA still carries the bearer token.

Deploy:
- Dockerfile.mcpd builds the @mcpctl/web bundle in the same builder
  stage and copies dist/ to /usr/share/mcpd/web in the runtime image.

Smoke (personality.smoke.test.ts):
- Live mcpd flow: create secret/llm/agent/personality, attach an
  agent-direct prompt, verify the binding listing, reject double-
  attach (409) + foreign-agent prompt (400), set defaultPersonality
  by name, detach + delete cleanup.

Docs:
- New docs/personalities.md: VLAN-on-ethernet model, system-block
  ordering table, three prompt scopes, CLI walkthrough, web UI
  walkthrough, full API surface, RBAC notes.
- agents.md and chat.md cross-link.
- README's Agents section gains a Personalities subsection.

Test count after Stage 6:
  mcpd:   801/801      cli:  430/430
  web:    7/7          db:   58/62 (4 pre-existing failures)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:48:43 +01:00
Michal
0010cc18b7 feat(web): browser-based prompt + personality editor (Stage 5)
New workspace package @mcpctl/web — a Vite + React 19 SPA that talks
to mcpd's existing HTTP API. Bundles to a static dist/ which Stage 6
will bake into the RPM and serve from mcpd at /ui via @fastify/static.

Pages:
  /ui/projects                       list projects
  /ui/projects/:name/prompts         CRUD project prompts (Monaco editor)
  /ui/agents                         list agents
  /ui/agents/:name                   tabs: Direct prompts | Personalities
  /ui/personalities/:id              bind/unbind prompts to a personality

Auth: paste a session token (mcpctl auth login) or PAT (mcpctl_pat_*)
once on a login screen, kept in localStorage; logout clears it.

API client: 60-line fetch wrapper, attaches the bearer header from
storage, throws an ApiError with status + parsed body on non-2xx.
A 200-line useFetch hook provides loading/error/data without a
state-management library — we are not building Notion.
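
A minimal sketch of that wrapper (names and the storage key are illustrative):

  class ApiError extends Error {
    constructor(public status: number, public body: unknown) {
      super(`API error ${status}`);
    }
  }

  async function api<T>(path: string, init: RequestInit = {}): Promise<T> {
    const headers = new Headers(init.headers);
    const token = localStorage.getItem('mcpctl.token');   // storage key is an assumption
    if (token) headers.set('authorization', `Bearer ${token}`);
    const res = await fetch(path, { ...init, headers });
    const body = res.status === 204 ? undefined : await res.json().catch(() => undefined);
    if (!res.ok) throw new ApiError(res.status, body);
    return body as T;
  }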

UX:
  - Dark terminal-adjacent theme so the page feels like the CLI.
  - Monaco @monaco-editor/react for prompt content (markdown mode,
    word-wrap, search, multi-cursor).
  - Personality detail's "attach prompt" picker filters in-scope
    candidates: agent-direct + same-project + globals.

Dev loop:  pnpm --filter @mcpctl/web dev   (vite at :5173, proxies
  /api to https://mcpctl.ad.itaz.eu — override with MCPCTL_API_URL).
Build:     pnpm --filter @mcpctl/web build → src/web/dist/.

Tests: 7 vitest cases covering the bearer header / 4xx body / 204
no-content path on the api wrapper, and the login storage round-trip
+ help toggle. Production build green: 269 KB JS / 84 KB gzipped.
Typecheck clean (TS strict + exactOptionalPropertyTypes carried over).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:41:57 +01:00
Michal
9050918a83 feat(cli): personality flag + create/get/edit/delete personalities (Stage 4)
End-to-end CLI surface for the personality overlay:

  mcpctl create personality grumpy --agent reviewer --description "be terse"
  mcpctl create prompt tone --agent reviewer --content "Be very terse."
  mcpctl get personalities
  mcpctl get personalities --agent reviewer
  mcpctl edit personality <id>
  mcpctl delete personality grumpy --agent reviewer
  mcpctl chat reviewer --personality grumpy

Chat banner gains a "Personality:" line that shows either the active
flag value or the agent's `defaultPersonality` (when no flag given),
so the user knows which overlay is in effect before sending a message.

`--personality` is stripped from `/save` (it's a per-turn override,
not a `defaultParams` field — the agent's defaultPersonality lives on
its own column and is set via PUT /agents).

Backend (small additions to land Stage 4 cleanly):
- `GET /api/v1/personalities[?agent=name]` so `mcpctl get
  personalities` doesn't require an agent filter.
- PersonalityService.listAll() aggregates across agents.

Completions: regenerated fish + bash. `personalities` added as a
canonical resource with `personality` alias; edit-resource list
extended; the per-resource argument completers pick up the new
type automatically.

CLI suite: 430/430. mcpd: 801/801. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:32:48 +01:00
Michal
faef1e732d feat(mcpd): personality routes + chat system block overlay (Stage 3)
End-to-end backend wiring for the agents-feature evolution. After
this stage you can curl all the endpoints; CLI + Web UI follow.

Routes (new):
  GET    /api/v1/agents/:agentName/personalities
  POST   /api/v1/agents/:agentName/personalities
  GET    /api/v1/personalities/:id
  PUT    /api/v1/personalities/:id
  DELETE /api/v1/personalities/:id
  GET    /api/v1/personalities/:id/prompts
  POST   /api/v1/personalities/:id/prompts
  DELETE /api/v1/personalities/:id/prompts/:promptId
  GET    /api/v1/agents/:agentName/prompts            (agent-direct)

Routes (extended):
  POST /api/v1/prompts now resolves `agent: <name>` like `project: <name>`
  POST /api/v1/agents/:name/chat accepts `personality: <name>`

RBAC: `personalities` segment maps to the `agents` resource so
view/edit/create/delete on the parent agent governs personality access.
No new RBAC roles — piggybacking keeps the surface flat.

System block (chat.service.ts):
  agent.systemPrompt
  + agent-direct prompts (Prompt.agentId === agent.id, priority desc)
  + project prompts        (existing behavior, priority desc)
  + personality prompts    (PersonalityPrompt[chosen], priority desc)
  + systemAppend
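
That ordering, as a sketch (prompt lists are assumed to arrive already sorted
priority-desc; the joiner is an assumption):

  function buildSystemBlock(parts: {
    systemPrompt: string;
    agentDirectPrompts: string[];   // Prompt.agentId === agent.id, priority desc
    projectPrompts: string[];       // existing behavior, priority desc
    personalityPrompts: string[];   // PersonalityPrompt[chosen], priority desc
    systemAppend?: string;
  }): string {
    return [
      parts.systemPrompt,
      ...parts.agentDirectPrompts,
      ...parts.projectPrompts,
      ...parts.personalityPrompts,
      ...(parts.systemAppend ? [parts.systemAppend] : []),
    ].join('\n\n');
  }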

Personality is selected by request body `personality: <name>`, falling
back to `agent.defaultPersonalityId` if unset. A typo'd flag throws
404 rather than silently dropping back to no overlay — failing loudly
on misconfiguration is the only way users learn it didn't apply.

Backwards-compatible by construction: when no agent-direct prompts
exist and no personality is selected, the resulting block is byte-
identical to the old layout (verified by a regression test).

Tests: 5 new chat-service.test cases cover ordering, default-
personality fallback, missing-personality 404, and the regression
guard. mcpd suite: 801/801 (was 796). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:27:59 +01:00
Michal
6b5bd78cfa feat(mcpd): personality + prompt-by-agent repos and services (Stage 2)
Wires the schema landed in Stage 1 into the service layer. No HTTP
routes yet — Stage 3 will register `/api/v1/...` endpoints and update
chat.service to read agent-direct + personality prompts when building
the system block.

Repositories:
- PersonalityRepository: CRUD + listPrompts/attach/detach bindings.
- PromptRepository: findByAgent + findByNameAndAgent; create/update
  accept the new agentId column. findGlobal now also filters
  agentId=null so agent-direct prompts don't leak into global lists.
- AgentRepository: defaultPersonalityId on create + connect/disconnect
  in update.

Services:
- PersonalityService: CRUD scoped per agent, plus attach/detach with
  scope enforcement — a prompt may bind only if it's agent-direct on
  the same agent, in the agent's project, or global. Foreign-project
  / foreign-agent attachments are rejected with 400.
- PromptService: createPrompt / upsertByName accept agentId and
  resolve `agent: <name>`, with XOR-with-project guard. Adds
  listPromptsForAgent.
- AgentService: defaultPersonality (by name on the agent's own
  personality set) round-trips through update + AgentView.

Validation:
- prompt.schema.ts: refine() rejects projectId+agentId together.
- personality.schema.ts: new Create/Update/AttachPrompt schemas.
- agent.schema.ts: defaultPersonality { name } | null on update.

Tests: 12 PersonalityService + 7 PromptService agent-scope tests
covering happy paths, XOR/scope enforcement, double-attach guard,
detach-not-bound. mcpd suite: 796/796 (was 777). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:20:51 +01:00
Michal
f60f00f1fd feat(db): add personalities + agent-direct prompts schema (Stage 1)
A Personality is a named overlay on top of an Agent — same agent,
same LLM, but a different bundle of prompts injected into the system
block at chat time. VLAN-on-ethernet semantics: ethernet still works
without VLAN; with a VLAN tag, frames are segmented but still ethernet.

Schema additions:
- Prompt.agentId (nullable FK + index, cascade on delete) so prompts
  can attach directly to an agent without going through a project.
- Personality { id, name, description, agentId, priority } with
  unique (name, agentId).
- PersonalityPrompt join table with per-binding priority override.
- Agent.defaultPersonalityId (SetNull on delete) so an agent can pick
  one personality as the default when no --personality flag is passed.

Backwards-compatible by construction: every new column is nullable;
existing rows are valid as-is; the chat.service systemBlock changes
land in Stage 3.

8 new prisma-level assertions in agent-schema.test.ts cover unique
constraints, cascade behavior, the SetNull on defaultPersonalityId,
and shared-prompt-across-personalities. All 16 db tests pass; mcpd
typecheck + 777 mcpd unit tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 19:12:22 +01:00
9389ffff3c feat(agents+chat): agents feature + live chat UX (#57)
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / test (push) Successful in 1m6s
CI/CD / typecheck (push) Successful in 2m17s
CI/CD / smoke (push) Failing after 1m38s
CI/CD / build (push) Successful in 2m35s
CI/CD / publish (push) Has been skipped
2026-04-26 17:53:27 +00:00
Michal
21f406037a feat(chat): print agent + system prompt banner at chat start
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m5s
CI/CD / lint (pull_request) Successful in 2m29s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 5m30s
CI/CD / publish (pull_request) Has been skipped
When you launch \`mcpctl chat <agent>\` it's not always obvious which
agent, LLM, project, or system prompt you're actually wired to,
especially when --system / --system-append flags are layered on top
of the agent's defaults. The session would just start at \`> \` with
no confirmation of the configuration.

Now both REPL and one-shot modes print a banner to stderr listing:
  - agent name + description
  - LLM + project (if attached)
  - effective system prompt (or --system override) and any
    --system-append addendum, indented for readability
  - active sampling overrides (temperature, top_p, etc.)

Goes through stderr so \`mcpctl chat ... -m "hi" 2>/dev/null\` keeps
piping clean. Best-effort: a metadata fetch failure logs and lets
the chat proceed rather than blocking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 18:37:06 +01:00
Michal
ae54210a52 fix(chat): pin live tokens/sec ticker to a bottom-row status bar
The previous ticker used cursor save/restore (\x1b[s / \x1b[u) to draw
a stats line one row below the cursor. Save/restore is unreliable when
content scrolls or wraps — the saved row drifts off the visible area
and the restore lands inside content lines, smearing the ticker into
mid-word positions:

  Here are the available tools you can
  ⏵ 7w · 56.5 w/s · 0.1s | thinking 41 use with Docmost:6s

Replace it with a DECSTBM scroll region. Lock the bottom row, scroll
rows 1..N-1 for content, redraw the locked row in place every 250 ms.
This is how htop / tig / mosh status pin their footers — content and
status physically can't overlap.
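
The escape sequences involved, roughly (row arithmetic is illustrative; the
real StatusBar also handles resize and teardown; with the region locked, the
bottom row has a fixed address, so the brief cursor jump can't land inside
scrolled content):

  const out = process.stderr;

  function installRegion(rows: number): void {
    out.write(`\x1b[1;${rows - 1}r`);   // DECSTBM: content scrolls in rows 1..N-1, bottom row is locked
  }

  function drawStatus(rows: number, text: string): void {
    // save cursor, jump to the locked bottom row, clear it, draw, restore cursor
    out.write(`\x1b7\x1b[${rows};1H\x1b[2K${text}\x1b8`);
  }

  function removeRegion(): void {
    out.write('\x1b[r');                // reset the scroll region on teardown
  }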

Lifecycle: install once per chat-session (REPL or one-shot), tear down
on close / Ctrl-D / /quit / SIGINT / SIGTERM / uncaughtException. Pipes
and small terminals (<5 rows) get a no-op StatusBar so output stays
clean. Resize re-emits the scroll region with the new height.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:49:26 +01:00
Michal
cc9822d38b feat(chat): live tokens/sec ticker + final stats footer
While streaming, the REPL now shows a live word/sec counter on a status
line one row below the cursor — refreshes every 250ms via ANSI cursor
save+restore so it floats with the content as the response grows.
After each response, a dim stats footer prints on stderr:

  (47w · 12.3 w/s · 3.9s | thinking 234w · 38 w/s · 6.2s)

The ticker is stderr-only and only emits when stderr is a TTY — pipes
to a file stay clean for grepping/redirect. Words are whitespace-
separated tokens (good enough across English/code/Markdown without a
tokenizer dependency; CJK under-counts but the rate is still
directional).

Both phases tracked separately:
  - thinking: reasoning_content from qwen3-thinking / deepseek-reasoner
    / o1, where the model's scratchpad is the long part
  - content: the actual assistant answer

Final stats also added to the --no-stream path: total HTTP duration
and word count, since we don't get per-token timing there.

CLI suite still 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:15:26 +01:00
Michal
7cfa449465 feat(chat): surface reasoning_content as thinking chunks; fix --no-stream timeout
Reasoning models (qwen3-thinking, deepseek-reasoner, OpenAI o1 family) emit
their scratchpad as `delta.reasoning_content` (or `delta.reasoning`,
or `delta.provider_specific_fields.reasoning_content` when LiteLLM passes
through from vLLM) — separate from `delta.content`. Before this commit
mcpd's parseStreamingChunk only watched `content`, so the model's 30-90s
reasoning phase looked like dead air to the REPL: streaming connection
open, no chunks, no progress. Caught during the agents-feature shakedown
when qwen3-thinking sat silent for 90s on a docmost__list_pages call.

mcpd
====
chat.service.ts
  - parseStreamingChunk extracts a `reasoningDelta` from the chunk body,
    accepting all four spellings (reasoning_content / reasoning /
    provider_specific_fields.{reasoning_content,reasoning}). Future
    providers can add their own field names by extending the
    fallback chain.
  - chatStream yields `{ type: 'thinking', delta }` chunks as reasoning
    arrives, alongside the existing `{ type: 'text', delta }` for content.
  - Reasoning is intentionally NOT persisted to the thread. It's the
    model's scratchpad, not part of the conversation. Subsequent turns
    don't see it.
  - Adds 'thinking' to the ChatStreamChunk.type union.
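
  The fallback chain above, in sketch form (delta typing kept loose on
  purpose; the real parsing lives in parseStreamingChunk):

    type StreamingDelta = {
      content?: string;
      reasoning_content?: string;
      reasoning?: string;
      provider_specific_fields?: { reasoning_content?: string; reasoning?: string };
    };

    function extractReasoningDelta(delta: StreamingDelta | undefined): string | undefined {
      return (
        delta?.reasoning_content ??
        delta?.reasoning ??
        delta?.provider_specific_fields?.reasoning_content ??
        delta?.provider_specific_fields?.reasoning
      );
    }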

CLI
===
chat.ts
  - streamOnce handles 'thinking' chunks: writes them dim+italic to
    stderr (ANSI 2;3m) so the model's reasoning visually flows like a
    quote block while the final answer streams to stdout. Plain text
    when stderr isn't a TTY (pipe to file → no escape codes leak).
  - chatRequestNonStream replaces the shared ApiClient.post() for the
    --no-stream path. ApiClient defaults to a 10s timeout, way too tight
    for any chat that calls a tool: LLM round + tool dispatch + LLM
    summary easily exceeds 10s. The new helper uses the same 600s timeout
    the streaming path has been using all along.

Tests:
  chat-service.test.ts (+2):
    - reasoning_content deltas surface as `thinking` chunks (not text);
      reasoning is NOT persisted to the assistant turn's content.
    - LiteLLM's provider_specific_fields.reasoning_content shape parses
      identically to the vendor-native shape.

mcpd 777/777, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 17:04:01 +01:00
Michal
cc225eb70f feat(llm): probe upstream auth at registration time
mcpd now runs a cheap auth probe whenever an Llm is created (or its
apiKeyRef/url is updated). Catches misconfigured tokens / wrong URLs at
registration with a 422 + structured error message, instead of silently
500-ing on first chat with a generic "fetch failed". Caught in the wild
today: the homelab Pulumi config exposed `MCPCTL_GATEWAY_TOKEN` (which
is mcpctl_pat_-prefixed, intended for LiteLLM→mcplocal direction) where
LiteLLM expects `LITELLM_MASTER_KEY` (sk-prefixed). The probe makes
this immediate.

Probe shape (LlmAdapter.verifyAuth):
  - OpenAI passthrough → GET <url>/v1/models. Cheap, idempotent, gated
    by the same auth as chat/completions.
  - Anthropic → POST /v1/messages with max_tokens:1, "ping". Anthropic
    has no list-models endpoint; this is the cheapest auth-exercising
    call.
  - Returns one of:
      { ok: true }
      { ok: false, reason: "auth", status, body }    — 401/403, fail hard
      { ok: false, reason: "unreachable", error }    — network, warn-only
      { ok: false, reason: "unexpected", status, body } — non-auth 4xx, warn-only

Behavior:
  - LlmService.create()/update() runs the probe after resolveApiKey.
    Throws LlmAuthVerificationError on `auth`, logs warn for
    unreachable/unexpected, swallows for offline registration.
  - Probe is skipped when there's no apiKeyRef (nothing to verify) or
    when the caller passes skipAuthCheck=true.
  - update() probes only when apiKeyRef OR url changes — pure
    description/tier updates don't trigger upstream calls.
  - Routes catch LlmAuthVerificationError and return 422 with
    `{ error, status }`. The CLI surfaces the message verbatim via
    ApiError.

Opt-out:
  - CLI: `mcpctl create llm ... --skip-auth-check` for offline
    registration before the upstream is reachable.
  - HTTP: side-channel body field `_skipAuthCheck: true` (stripped
    before validation, never persisted on the row).

Side fix in same commit (caught while testing): src/cli/src/index.ts
read `program.opts()` BEFORE `program.parse()`, so `--direct` was a
no-op for ApiClient — every command went to mcplocal regardless. Some
commands accidentally still worked because mcplocal forwards plain
`/api/v1/*` to mcpd, but flows that need direct SSE streaming (e.g.
`mcpctl chat`) couldn't reach mcpd. Fixed by peeking at process.argv
directly for the two global flags before Commander's parse runs.

Tests:
  - llm-adapters.test.ts (+8): OpenAI 200/401/403/404/network, Anthropic
    200/401/400 (typo'd model = unexpected, NOT auth — registration
    shouldn't block on bad model names that surface at chat time).
  - llm-service.test.ts (+6): create-throws-on-auth-fail (no row
    written), warn-only on unreachable/unexpected, skipAuthCheck
    bypass, no-key skip, update-only-probes-on-auth-affecting-change.

mcpd 775/775, mcplocal 715/715, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:51:55 +01:00
Michal
1f0be8a5c1 fix(agents): close gaps from /gstack-review
P1 — thread reads now enforce ownership
========================================
chat.service.ts / routes/agent-chat.ts
  GET /api/v1/threads/:id/messages was previously RBAC-mapped to
  view:agents (no resourceName scope) with the route comment promising
  "service-level owner check enforces fine-grained access" — but the
  service didn't actually check. Any caller with view:agents could read
  another user's thread by guessing/learning the threadId. CUIDs are
  hard to brute-force but they leak: SSE `final` chunks, agents-plugin
  `_meta.threadId`, and several response bodies surface them. Now
  ChatService.listMessages(threadId, ownerId) loads the thread, returns
  404 (not 403, to avoid id-enumeration via differential status codes)
  if ownerId doesn't match. Regression test in chat-service.test.ts
  covers Alice/Bob isolation + nonexistent-thread same-shape 404.
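
  The owner check, in sketch form (repository calls and error plumbing are
  illustrative):

    async function listMessages(
      threadId: string,
      ownerId: string,
      repo: {
        findThread(id: string): Promise<{ ownerId: string } | null>;
        listMessages(threadId: string): Promise<unknown[]>;
      },
    ): Promise<unknown[]> {
      const thread = await repo.findThread(threadId);
      if (!thread || thread.ownerId !== ownerId) {
        // 404 (not 403) so a foreign threadId is indistinguishable from a nonexistent one
        throw Object.assign(new Error('thread not found'), { statusCode: 404 });
      }
      return repo.listMessages(threadId);
    }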

P2 — AgentChatRequestSchema strict mode
========================================
validation/agent.schema.ts
  `.merge()` does NOT inherit `.strict()` from AgentChatParamsSchema.
  Typo'd fields (e.g. `temprature`) silently fell through and the agent
  silently used the default — debuggable only by reading the LLM call
  payload. Re-applied `.strict()` on the merged schema.

P2 — per-agent maxIterations override + clamp
==============================================
chat.service.ts
  Loop cap was a hard-coded module constant (12), wrong for both
  research-style agents (need higher) and cheap-probe agents (could opt
  lower). Now reads `agent.extras.maxIterations`, clamps 1..50, falls
  back to 12 default. The clamp is the soft-DoS guard: a hostile agent
  definition with `maxIterations:1000000` can't burn unbounded LLM calls
  per request. Both chat() and chatStream() use ctx.maxIterations now.
  Regression test covers low-cap override (rejects with `exceeded 2`)
  and hostile-value clamp (rejects with `exceeded 50`).
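
  The clamp itself is tiny (constants as described; helper name illustrative):

    const DEFAULT_MAX_ITERATIONS = 12;

    function resolveMaxIterations(extras?: { maxIterations?: unknown }): number {
      const requested = Number(extras?.maxIterations ?? DEFAULT_MAX_ITERATIONS);
      if (!Number.isFinite(requested)) return DEFAULT_MAX_ITERATIONS;
      return Math.min(50, Math.max(1, Math.floor(requested)));  // hostile values can't exceed 50
    }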

P3 — SSE write to closed socket
================================
routes/agent-chat.ts
  When the upstream adapter throws after some chunks were already
  written AND the client disconnected, the catch block tried to flush
  more chunks to a closed socket. Without an `on('error')` handler
  Node emits unhandled error events; once Pino is wired to alerts
  this'd page on every disconnect-mid-stream. writeSseChunk now
  checks `reply.raw.destroyed || writableEnded` before write.

P3 — BACKEND_TOKEN_DEAD preserves original stack
=================================================
services/secret-backend-rotator.service.ts
  When wrapping mintRoleToken/lookupSelf failures as
  BACKEND_TOKEN_DEAD, the new Error() discarded the original throw —
  hard to tell whether the inner failure was a network blip vs an
  OpenBao API mismatch vs DNS. Now uses `new Error(msg, { cause: err })`
  so the inner stack survives.

P3 — .gitignore .claude/scheduled_tasks.lock
=============================================
This persisted state file was leaking into every `git status`.

Tests
=====
mcpd 761/761 (+2 regression tests). mcplocal 715/715. cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 23:53:19 +01:00
Michal
2e266e318a fix(mcplocal): lower default token introspection TTL in serve.ts too
Followup to e51b924. The middleware default in token-auth.ts is 5s, but
serve.ts wraps the construction with its own env-fallback default of
30000ms — so when MCPLOCAL_TOKEN_POSITIVE_TTL_MS isn't set in the
environment, serve.ts always wins and revoked tokens still propagate
slowly. Lowered serve.ts to 5s for symmetry; operators wanting a longer
window can set the env var explicitly.

Caught by mcptoken.smoke continuing to fail after the previous redeploy:
verified the token-auth.js shipped with `?? 5_000`, but the wrapper was
overriding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 18:41:22 +01:00
Michal
e51b92473f fix(smoke,rotator,auth): repair smoke env + close failure modes that
caused 27 post-deploy smoke failures

This commit lands the durable side of the post-deploy investigation:
genuine bugs that let the upstream OpenBao re-init silently break every
secret write for 4 days, plus test-code bugs that masked the same
breakage in the smoke output.

mcpd — fail loud on dead OpenBao tokens
=======================================
secret-backend-rotator.service.ts
  When `mintRoleToken` or `lookupSelf` returns 403/401, classify it as
  BACKEND_TOKEN_DEAD (likely cause: upstream OpenBao re-init invalidated
  every pre-existing token), wrap the thrown error with explicit
  remediation (mint via root + `mcpctl create secret <name> --data
  <key>=<token> --force`), persist the same message to
  tokenMeta.lastRotationError, and emit a structured `level:fatal`
  console.error so it shows up in `kubectl logs deploy/mcpd` with grep-
  friendly `kind:BACKEND_TOKEN_DEAD`. Adds a `healthCheck(backendId)`
  method that runs lookup-self without minting — so the boot-time loop
  can detect the dead-token state immediately, not 24 hours later.

secret-backend-rotator-loop.ts
  Boot-time health check: in `start()`, for every rotatable backend, call
  `rotator.healthCheck(b.id)` and on failure log a structured fatal entry.
  This converts the prior silent failure mode (24h wait until scheduled
  rotation reveals the dead token, with secret writes failing under it
  the entire time) into "mcpd boots, immediately sees the dead token,
  alerts loudly". Existing isOverdue path is unchanged.

mcpd — Prisma userId crash on /me
=================================
routes/auth.ts
  GET /api/v1/auth/me used `request.userId!` which lied: an authenticated
  McpToken bearer satisfies the auth middleware but has no associated
  User row, so userId stayed undefined and `findUnique({ id: undefined })`
  threw PrismaClientValidationError. Now returns 401 with a clear
  "service-account/token-bound principal cannot be queried via /me"
  message instead of bubbling a 500.

mcplocal — token revocation propagation
=======================================
http/token-auth.ts
  Lowered default introspection positiveTtl from 30s → 5s. mcpd's
  introspection endpoint is a single DB lookup; the cache only protects
  against burst restart storms, not steady-state load. The 30s window
  let revoked tokens keep working for the full window after revocation
  (caught by mcptoken.smoke's negative-cache assertion). Aligns with the
  existing 5s negativeTtl and the test's `wait 7s after revoke` expectation.

smoke tests — read URL the same way the CLI does
================================================
mcp-client.ts
  Adds `loadMcpdAuth()`: URL from `~/.mcpctl/config.json`, token from
  `~/.mcpctl/credentials`. Critically, the URL does NOT come from
  credentials. credentials.mcpdUrl carries a stale field for legacy
  reasons and goes out of sync (left over from old `mcpctl login
  --mcpd-url localhost:3xxx` invocations) — tests reading it ended up
  hitting whatever URL the user last logged into rather than the URL
  the CLI is actually using right now. audit/security/system-prompts
  smoke now use loadMcpdAuth(), eliminating ~10 cascade failures.
  Also: switch httpRequest to https.request when scheme is https
  (matching audit/security/system-prompts/mcp-client/agent helpers).
  Bumps default callTool timeout from 30s → 60s; many tools that fetch
  external resources routinely run 10-30s.

agent.smoke.test.ts
  - readToken read from `credentials.json`; the file is `credentials`
    (no extension). Caused 401 on POST /threads.
  - `mcpctl get <resource> <name> -o json` returns an array, not a bare
    object. Round-trip yaml test now indexes [0] before reading
    description.

secretbackend.smoke.test.ts
  Two genuine assertion-drift fixes (env was right, test was stale):
  - "lists at least one secretbackend": stop hard-coding the default
    backend type as 'plaintext'; the invariant is "exactly one default
    exists". The seeded plaintext is the bootstrap default but operators
    routinely promote a remote backend (openbao etc.) once it's healthy.
  - "refuses to delete the seeded default": widen the regex from
    /default|in use|cannot delete/ to also accept "referenced" — the
    exact wording has shifted to "is still referenced by N secret(s);
    migrate them first".

audit.test.ts / system-prompts.test.ts / security.test.ts
  Switch http.request → https.request when URL is https (each had its
  own copy of the helper). Drop the now-orphan loadMcpdCredentials in
  favour of loadMcpdAuth from mcp-client.ts.

Tests
=====
mcpd 759/759, mcplocal 715/715 unit suites still green. Smoke (live):
  Run 1 (pre-commit, post bao-token rotation):  27 → 12 failures.
  Run 2 (after fixes-batch, pre-redeploy):      12 →  2 failures.
The remaining 2 (mcptoken cache TTL, proxy-pipeline timeout) are what
the durable code changes here address; verify after the next redeploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 18:35:13 +01:00
Michal
8b56f09f25 feat(agents): smoke tests + README + docs (Stage 6, final)
Closes the agents feature.

Smoke tests (run via `pnpm test:smoke` against a live mcpd at
$MCPD_URL, default https://mcpctl.ad.itaz.eu):

* tests/smoke/agent.smoke.test.ts — full CRUD round-trip:
  create secret + Llm + agent with sampling defaults; `get agents`
  surfaces it; `get agent foo -o yaml | apply -f` round-trips
  identically; create + list a thread via the HTTP API; agent delete
  leaves Llm + secret intact (Restrict + SetNull as designed). Self-
  skips with a warning when /healthz is unreachable.

* tests/smoke/agent-chat.smoke.test.ts — gated on
  MCPCTL_SMOKE_LLM_URL + MCPCTL_SMOKE_LLM_KEY. Provisions secret +
  Llm + agent against a real upstream, runs `mcpctl chat -m … --no-
  stream` (asserts a reply lands), then runs the streaming default
  (asserts text on stdout + `(thread: …)` on stderr). The fast path
  for verifying the in-cluster qwen3-thinking deployment:

      MCPCTL_SMOKE_LLM_URL=http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
      MCPCTL_SMOKE_LLM_MODEL=qwen3-thinking \
      MCPCTL_SMOKE_LLM_KEY=$(pulumi config get --stack homelab \
        secrets:litellmMcpctlGatewayToken) \
        pnpm test:smoke

Docs:

* README.md — new "Agents" section under Resources with the
  qwen3-thinking quickstart and links to docs/agents.md and
  docs/chat.md. Adds llm + agent rows to the resources table.

* docs/agents.md (new) — full reference: data model, chat-parameter
  table, HTTP API, RBAC mapping, tool-use loop semantics, yaml
  round-trip shorthand, the kubernetes-deployment wiring recipe,
  and a troubleshooting section (namespace collision, llm-in-use,
  pending-row recovery, Anthropic-tool limitation).

* docs/chat.md (new) — user-facing `mcpctl chat` walkthrough:
  modes, per-call flags, slash-commands, threads, and a
  troubleshooting section.

* CLAUDE.md — adds a "Resource types" cheatsheet with one-line
  pointers to each, including the new `agent` row that links to
  the docs.

All suites still green: mcpd 759/759, mcplocal 715/715, cli 430/430.
Smoke tests typecheck and self-skip when no live mcpd is reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:08:37 +01:00
Michal
727e7d628c feat(agents): mcpctl chat REPL + agent CRUD + completions (Stage 5)
This is the moment the user can actually talk to an agent end-to-end:

  mcpctl create llm qwen3-thinking --type openai --model qwen3-thinking \
    --url http://litellm.nvidia-nim.svc.cluster.local:4000/v1 \
    --api-key-ref litellm-key/API_KEY
  mcpctl create agent reviewer --llm qwen3-thinking --project mcpctl-dev \
    --description "I review security design — ask me after each major change."
  mcpctl chat reviewer

Pieces:

* src/cli/src/commands/chat.ts (new) — REPL + one-shot. Streams the SSE
  endpoint and prints text deltas to stdout as they arrive; tool_call /
  tool_result events go to stderr in dim-style brackets so the chat
  output stays clean. LiteLLM-style flags (--temperature / --top-p /
  --top-k / --max-tokens / --seed / --stop / --allow-tool / --extra)
  layer over agent.defaultParams. In-REPL slash-commands: /set KEY VAL,
  /system <text>, /tools (list project's MCP servers), /clear (new
  thread), /save (PATCH agent.defaultParams = current overrides),
  /quit.

* src/cli/src/commands/create.ts — `create agent` mirroring the llm
  pattern. Every yaml-applyable field has a corresponding flag (memory
  rule); --default-temperature / --default-top-p / --default-top-k /
  --default-max-tokens / --default-seed / --default-stop /
  --default-extra / --default-params-file all populate agent.defaultParams.

* src/cli/src/commands/apply.ts — AgentSpecSchema accepts both `llm:
  qwen3-thinking` shorthand and `llm: { name: ... }` long form; runs
  after llms in the apply order so apiKey/llm references resolve. Round-
  trips with `get agent foo -o yaml | apply -f -` (memory rule).

* src/cli/src/commands/get.ts — agentColumns (NAME, LLM, PROJECT,
  DESCRIPTION, ID); RESOURCE_KIND mapping for yaml export.

* src/cli/src/commands/shared.ts — `agent`/`agents`/`thread`/`threads`
  added to RESOURCE_ALIASES.

* src/cli/src/index.ts — wires createChatCommand into the program; passes
  the resolved baseUrl + token so chat can stream SSE without going
  through ApiClient (which only does buffered request/response).

* completions/mcpctl.{fish,bash} regenerated. scripts/generate-completions.ts
  knows about agents (canonical + aliases) and emits a special-case
  `chat)` block that completes the first arg with `mcpctl get agents`
  names. tests/completions.test.ts: +9 new assertions covering agents in
  the resource list, chat in the commands list, --llm flag for create
  agent, agent-name completion for chat, etc.

CLI suite: 430/430 (was 421). Completions --check is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:02:38 +01:00
Michal
285be11dd5 feat(agents): mcplocal agents plugin + composePlugins helper (Stage 4)
When a Claude (or any other MCP client) connects to a project's mcplocal
endpoint, every Agent attached to that project now appears in the
session's tools/list as a virtual MCP server named `agent-<agentName>`
with one tool `chat`. Calling that tool POSTs to the Stage 3 chat
endpoint and returns the assistant's reply as MCP content. The tool's
description is the agent's own description, so connecting clients see
prose like "I review security design — ask me after each major change."
This is what makes one agent reachable from another's MCP session.

Plumbing:
  * src/mcplocal/src/proxymodel/plugins/agents.ts (new) — the plugin.
    onSessionCreate fetches /api/v1/projects/:p/agents via mcpd, then
    registers a VirtualServer per agent. The chat tool's inputSchema
    mirrors the LiteLLM-style override surface (temperature, top_p,
    top_k, max_tokens, stop, seed, tools_allowlist, extra) plus
    threadId for follow-ups. Namespace collision with an existing
    upstream MCP server named `agent-<x>` is detected and skipped with
    a `ctx.log.warn` line — better to surface the conflict than to
    silently shadow real tool entries in the virtualTools map.
  * src/mcplocal/src/proxymodel/plugins/compose.ts (new) — generic
    N-plugin composition helper. Lifecycle hooks fan out in order;
    transform hooks (onToolsList, onResourcesList, onPromptsList,
    onToolCallAfter) pipeline; intercept hooks (onToolCallBefore,
    onResourceRead, onPromptGet, onInitialize) short-circuit on the
    first non-null. Generalizes what createDefaultPlugin does for
    two fixed parents. (The composition rules are sketched after this list.)
  * src/mcplocal/src/http/project-mcp-endpoint.ts — every project
    session now uses composePlugins([defaultPlugin, agentsPlugin]) so
    agents show up no matter which proxymodel the project is on.
  * Plugin context: added getFromMcpd(path) alongside postToMcpd. The
    existing postToMcpd was hard-coded to POST; the agents plugin
    needs GET to discover. Wired through plugin.ts → plugin-context.ts
    → router.ts.
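
A condensed sketch of the composition rules composePlugins implements (the
real Plugin interface carries more hooks and richer context types; the three
shown stand in for the lifecycle / transform / intercept categories):

  type Tool = { name: string; description?: string };
  interface Plugin {
    onSessionCreate?(ctx: unknown): Promise<void>;
    onToolsList?(ctx: unknown, tools: Tool[]): Promise<Tool[]>;
    onToolCallBefore?(ctx: unknown, call: { name: string }): Promise<unknown | null>;
  }

  function composePlugins(plugins: Plugin[]): Plugin {
    if (plugins.length === 0) throw new Error('composePlugins needs at least one plugin');
    return {
      async onSessionCreate(ctx) {
        for (const p of plugins) await p.onSessionCreate?.(ctx);                 // lifecycle: fan out in order
      },
      async onToolsList(ctx, tools) {
        let acc = tools;
        for (const p of plugins) acc = (await p.onToolsList?.(ctx, acc)) ?? acc; // transform: pipeline
        return acc;
      },
      async onToolCallBefore(ctx, call) {
        for (const p of plugins) {
          const hit = await p.onToolCallBefore?.(ctx, call);
          if (hit != null) return hit;                                           // intercept: first non-null wins
        }
        return null;
      },
    };
  }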

Tests:
  plugin-agents.test.ts (8) — registers per agent, falls back to a
    generic description, skips on namespace collision, no-ops with
    zero agents, logs and continues on mcpd error, chat handler
    POSTs correct body and returns content array, isError surfacing
    on mcpd error, onSessionDestroy unregisters everything.
  plugin-compose.test.ts (6) — single-plugin pass-through, empty
    rejection, lifecycle ordering, intercept short-circuit, list
    pipeline, no-op composition stays minimal.

mcplocal suite: 715/715. mcpd suite still 759/759.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:51:44 +01:00
Michal
03ae4e15f7 feat(agents): mcpd routes + RBAC + tool dispatcher (Stage 3)
Wires the Stage 2 services into HTTP. New routes:

  GET    /api/v1/agents                 — list
  GET    /api/v1/agents/:idOrName       — describe
  POST   /api/v1/agents                 — create
  PUT    /api/v1/agents/:idOrName       — update
  DELETE /api/v1/agents/:idOrName       — delete
  GET    /api/v1/projects/:p/agents     — project-scoped list (mcplocal disco)
  POST   /api/v1/agents/:name/chat      — chat (non-streaming or SSE stream)
  POST   /api/v1/agents/:name/threads   — create thread explicitly
  GET    /api/v1/agents/:name/threads   — list threads
  GET    /api/v1/threads/:id/messages   — replay history

The chat endpoint reuses the SSE pattern from llm-infer.ts (same headers
incl. X-Accel-Buffering:no, same `data: …\n\n` framing, same `[DONE]`
terminator). Each ChatService chunk is one frame. Non-streaming returns
{threadId, assistant, turnIndex} as JSON.
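
The producer side of that framing, sketched against a plain Node
ServerResponse (the real route goes through mcpd's router and ChatService
types, not raw http like this):

  import type { ServerResponse } from 'node:http';

  async function writeSse(res: ServerResponse, chunks: AsyncIterable<unknown>): Promise<void> {
    res.writeHead(200, {
      'content-type': 'text/event-stream',
      'cache-control': 'no-cache',
      connection: 'keep-alive',
      'x-accel-buffering': 'no',                       // stop nginx from buffering the stream
    });
    for await (const chunk of chunks) {
      res.write(`data: ${JSON.stringify(chunk)}\n\n`); // one ChatService chunk per frame
    }
    res.write('data: [DONE]\n\n');                     // terminator the client watches for
    res.end();
  }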

RBAC mapping in main.ts:mapUrlToPermission:
  - /agents/:name/{chat,threads*}  → run:agents:<name>
  - /threads/:id/*                 → view:agents (service-level owner check
    handles fine-grained access since the URL doesn't carry the agent name)
  - /agents and /agents/:idOrName  → default {GET:view, POST:create,
    PUT:edit, DELETE:delete} on resource 'agents'.
'agents' added to nameResolvers so RBAC's CUID→name lookup works.

ChatToolDispatcherImpl bridges ChatService to McpProxyService: it lists a
project's MCP servers, fans out tools/list calls to each, namespaces tool
names as `<server>__<tool>`, and routes tools/call back to the right
serverId on dispatch. tools/list errors on a single server are logged and
that server's tools are dropped from the turn's tool surface — one bad
server doesn't poison the whole list.
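
The namespacing convention in a nutshell (helper names are illustrative; the
real dispatcher also carries serverId and inputSchema alongside each tool):

  const SEP = '__';

  function namespaceTool(server: string, tool: string): string {
    return `${server}${SEP}${tool}`;                   // what the model sees in its tool list
  }

  function splitNamespacedTool(name: string): { server: string; tool: string } {
    const idx = name.indexOf(SEP);
    if (idx === -1) throw new Error(`tool name "${name}" is not namespaced`);
    return { server: name.slice(0, idx), tool: name.slice(idx + SEP.length) };
  }

  // dispatch: splitNamespacedTool(call.name) picks the owning server, then the
  // dispatcher routes tools/call to that server's serverId.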

Tests:
  agent-routes.test.ts (15) — full HTTP CRUD round-trip, 404/409 paths,
    project-scoped list, non-streaming + SSE chat, thread create/list,
    /threads/:id/messages replay, body-required 400.
  chat-tool-dispatcher.test.ts (7) — empty list when no project / no
    servers, namespacing + inputSchema forwarding, partial-failure
    skipping with audit log, callTool dispatch shape, missing-server
    rejection, JSON-RPC error surfacing.

All 22 new green; mcpd suite now 759/759 (was 737).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:45:15 +01:00
Michal
eda8e79712 feat(agents): mcpd repos + Agent/Chat services with tool-use loop (Stage 2)
Layers the persistence-side logic on top of the Stage 1 schema. AgentService
mirrors LlmService's CRUD shape with name-resolved llm/project references and
yaml round-trip support; ChatService is the orchestrator that drives one chat
turn end-to-end: build the merged system block (agent.systemPrompt + project
Prompts ordered by priority desc + per-call systemAppend), persist the user
turn, run the adapter, dispatch any tool_calls through an injected
ChatToolDispatcher, persist tool turns linked back via toolCallId, and loop
until the model returns terminal text.

Per-call params resolve LiteLLM-style: request body → agent.defaultParams →
adapter default. The escape hatch `extra` is forwarded as-is so each adapter
can cherry-pick provider-specific knobs (Anthropic metadata, vLLM
repetition_penalty, etc.) without code changes here.
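
A sketch of that precedence as a plain object merge (field names are the ones
listed above; assumes absent keys are omitted rather than set to undefined):

  type ChatParams = {
    temperature?: number; top_p?: number; top_k?: number; max_tokens?: number;
    seed?: number; stop?: string[]; extra?: Record<string, unknown>;
  };

  function resolveParams(body: ChatParams, agentDefaults: ChatParams, adapterDefaults: ChatParams): ChatParams {
    const merged: ChatParams = { ...adapterDefaults, ...agentDefaults, ...body };      // rightmost wins
    merged.extra = { ...adapterDefaults.extra, ...agentDefaults.extra, ...body.extra };
    return merged;                                     // `extra` is forwarded to the adapter untouched
  }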

Persistence is non-transactional across the loop because tool calls can take
minutes; long-held DB transactions would starve other writers. Instead each
in-flight assistant turn is written `pending` and flipped to `complete` only
after its tool results land. On failure or max-iter overrun, every `pending`
row in the thread is flipped to `error` so the trail is auditable.

Tools are namespaced on the wire as `<server>__<tool>`, unmarshalled at
dispatch time; `tools_allowlist` filters before the model sees the list.

Tests:
  agent-service.test.ts (7) — CRUD with name-resolved llm/project, conflict
    on duplicate, llm switch, project detach, listByProject filtering,
    upsertByName branch coverage.
  chat-service.test.ts (9) — plain text turn, full text→tool→text loop with
    toolCallId linkage, max-iter cap leaves zero pending, adapter-throws
    leaves zero pending, body→defaultParams merge, `extra` passthrough,
    project-Prompt priority ordering in the system block, tool-without-
    project rejection, tools_allowlist filtering.

All 16 green; full mcpd suite still 737/737.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:38:38 +01:00
Michal
3726a65f53 feat(agents): add Agent + ChatThread + ChatMessage schema (Stage 1)
Introduces the persistence layer for the upcoming Agent feature: an LLM
persona pinned to a specific Llm, optionally attached to a Project, with
persisted chat threads/messages so conversations survive REPL exits.

Constraint shape:
- Agent.llm uses ON DELETE RESTRICT — deleting an Llm in active use fails.
- Agent.project uses ON DELETE SET NULL — agents survive project deletion.
- ChatThread → ChatMessage cascade so deleting an agent purges its history.
- ChatMessage @@unique([threadId, turnIndex]) gives append ordering even
  under racing writers (services retry on collision).
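
The retry-on-collision idea behind that constraint, sketched with the Prisma
client (the role/content fields and the 3-attempt cap are placeholders, not
the actual service code):

  import { Prisma, PrismaClient } from '@prisma/client';

  const prisma = new PrismaClient();

  async function appendMessage(threadId: string, data: { role: string; content: string }) {
    for (let attempt = 0; attempt < 3; attempt++) {
      const last = await prisma.chatMessage.findFirst({
        where: { threadId },
        orderBy: { turnIndex: 'desc' },
        select: { turnIndex: true },
      });
      const turnIndex = (last?.turnIndex ?? -1) + 1;   // next slot in the thread
      try {
        return await prisma.chatMessage.create({ data: { threadId, turnIndex, ...data } });
      } catch (err) {
        // P2002 = unique-constraint violation: a racing writer took this turnIndex, try again
        if (err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002') continue;
        throw err;
      }
    }
    throw new Error('could not append message after 3 attempts');
  }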

LiteLLM-style per-call overrides will live in Agent.defaultParams (Json);
the loose extras Json field is reserved for future LoRA/tool-allowlist work.

Pinned vitest fileParallelism=false in @mcpctl/db: all suites share the
same Postgres, and adding a second suite exposed FK contention between a
clearAllTables in one file and a create in another. Per-test isolation
still comes from beforeEach.

Tests: 8/8 green in src/db/tests/agent-schema.test.ts (defaults, name
uniqueness, llm-in-use Restrict, project-delete SetNull, agent-delete
cascade, duplicate (threadId, turnIndex) blocked, tool-call payload
round-trip, lastTurnAt DESC ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 16:29:55 +01:00
Michal
6ac79de8a4 feat(secrets): one-shot startup backfill for keyNames on existing rows
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / test (push) Successful in 1m8s
CI/CD / typecheck (push) Successful in 2m20s
CI/CD / build (push) Successful in 2m49s
CI/CD / smoke (push) Failing after 3m16s
CI/CD / publish (push) Has been skipped
Lazy backfill in SecretService.getById already retries individual rows, but list
views still show 'KEYS: -' until each row is described. New
backfillSecretKeyNames bootstrap runs once at startup, finds Secrets
where keyNames=[] AND data={} (i.e. backend-stored, pre-existing rows),
calls resolveData to learn the keys, persists. Sequential to be kind to
the upstream backend on cold start. Idempotent + non-fatal.
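
Shape of the bootstrap pass, roughly (only resolveData is a method name from
this work; list/updateKeyNames stand in for whatever the service exposes):

  interface SecretRow { id: string; name: string; keyNames: string[]; data: Record<string, unknown> }
  interface SecretsApi {
    list(): Promise<SecretRow[]>;
    resolveData(id: string): Promise<Record<string, string>>;
    updateKeyNames(id: string, keyNames: string[]): Promise<void>;
  }

  async function backfillSecretKeyNames(secrets: SecretsApi, log: { warn(msg: string): void }): Promise<void> {
    const rows = await secrets.list();
    const candidates = rows.filter(
      (s) => s.keyNames.length === 0 && Object.keys(s.data).length === 0,   // backend-stored, pre-existing
    );
    for (const row of candidates) {                    // sequential on purpose: easy on a cold backend
      try {
        const data = await secrets.resolveData(row.id);
        await secrets.updateKeyNames(row.id, Object.keys(data).sort());
      } catch (err) {
        log.warn(`keyNames backfill skipped ${row.name}: ${String(err)}`);  // non-fatal, re-run next start
      }
    }
  }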
2026-04-24 01:01:40 +01:00
Michal
9a808877b5 feat(secrets): track key names so list/describe work for backend-stored secrets
Some checks failed
CI/CD / lint (push) Successful in 53s
CI/CD / test (push) Successful in 1m6s
CI/CD / typecheck (push) Successful in 2m11s
CI/CD / smoke (push) Failing after 1m42s
CI/CD / publish (push) Has been cancelled
CI/CD / build (push) Has been cancelled
Post-migration, every Secret on a non-plaintext backend had an empty `data`
column (values live in the backend; only externalRef on the row). The CLI's
\`get secrets\` showed \`KEYS: -\` and \`describe secret\` showed \`(empty)\` for
all 9 migrated secrets — useless without --show-values.

Fix: dedicated \`keyNames Json\` column on Secret that stores the sorted key
list independently from the values. Populated on every write path, lazily
backfilled on first read for pre-existing rows that pre-date the column.
Schema default \`[]\` keeps prisma db push self-healing on rolling upgrades.

- src/db/prisma/schema.prisma: add Secret.keyNames Json @default("[]")
- src/mcpd/src/repositories/secret.repository.ts: pipe keyNames through create
  + update
- src/mcpd/src/services/secret.service.ts:
  - create/update populate keyNames = sorted Object.keys(data)
  - getById lazy-backfills empty keyNames (cheap: derives from data for
    plaintext, single backend read for openbao); see the sketch after this list
- src/mcpd/src/services/secret-migrate.service.ts: migrate writes keyNames
  alongside the new backendId so freshly-migrated rows are populated without
  a follow-up read
- src/cli/src/commands/get.ts: KEYS column reads keyNames first, falls back
  to Object.keys(data) for older rows
- src/cli/src/commands/describe.ts: shows the Data section keys whenever
  keyNames OR data has entries (so backend-stored secrets render their key
  list); --show-values still resolves through the backend
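
The lazy-backfill branch in getById, reduced to its decision (the types and
helper name here are placeholders, not the actual service code):

  type Row = { id: string; keyNames: string[]; data: Record<string, unknown> };

  async function withKeyNames(row: Row, resolveData: (id: string) => Promise<Record<string, string>>): Promise<Row> {
    if (row.keyNames.length > 0) return row;           // already populated
    const keys =
      Object.keys(row.data).length > 0
        ? Object.keys(row.data)                        // plaintext rows: derive locally, no backend call
        : Object.keys(await resolveData(row.id));      // backend-stored rows: one read through the driver
    return { ...row, keyNames: [...keys].sort() };     // the real service also persists this
  }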

After deploy, the 9 already-migrated secrets backfill their keyNames on the
next describe-by-id, with no operator action needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:57:06 +01:00
Michal
b1bccee50d test(describe): mock the ?reveal=true path on --show-values
Some checks failed
CI/CD / lint (push) Successful in 54s
CI/CD / test (push) Successful in 1m7s
CI/CD / typecheck (push) Successful in 2m19s
CI/CD / smoke (push) Failing after 5m9s
CI/CD / publish (push) Has been cancelled
CI/CD / build (push) Has been cancelled
Follow-up to faccbb5: the describe-secret test for --show-values still used the
old fetchResource shape, so it broke once the route started going through
client.get directly with ?reveal=true.
2026-04-24 00:49:22 +01:00
Michal
faccbb58e7 fix(secrets): describe --show-values resolves through the backend driver
Some checks failed
CI/CD / lint (push) Successful in 55s
CI/CD / test (push) Failing after 1m5s
CI/CD / typecheck (push) Has started running
CI/CD / smoke (push) Has been cancelled
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
Post-migration, every Secret on a non-plaintext backend has empty `Secret.data`
(the actual value lives in the backend; only externalRef is on the row).
`describe secret --show-values` was reading the raw row, so the user saw
"Data: (empty)" for every migrated secret.

- Route GET /api/v1/secrets/:id accepts ?reveal=true; when set, resolves the
  value via SecretService.resolveData() so the response carries the actual
  data dispatched through the right driver (sketched after this list).
- CLI --show-values flips that query param. Without --show-values the route
  returns the raw row exactly as before (no leak risk).
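
The route change, sketched Express-style for brevity (mcpd's actual router
wiring differs; only the ?reveal=true contract matches the description above):

  import { Router } from 'express';

  interface SecretsSvc {
    getById(id: string): Promise<Record<string, unknown> | null>;
    resolveData(id: string): Promise<Record<string, string>>;
  }

  export function secretRoutes(secrets: SecretsSvc): Router {
    const router = Router();
    router.get('/api/v1/secrets/:id', async (req, res) => {
      const row = await secrets.getById(req.params.id);
      if (!row) {
        res.status(404).json({ error: 'not found' });
        return;
      }
      if (req.query.reveal !== 'true') {
        res.json(row);                                       // default: raw row, values stay hidden
        return;
      }
      const data = await secrets.resolveData(req.params.id); // resolve through the backend driver
      res.json({ ...row, data });
    });
    return router;
  }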

Caught running the wizard end-to-end on the live cluster after the
ClusterMesh fix on the kubernetes-deployment side made bao reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:46:54 +01:00
Michal
bf312850b5 fix(openbao): include response body in error messages
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / typecheck (push) Successful in 51s
CI/CD / test (push) Successful in 1m4s
CI/CD / smoke (push) Failing after 1m36s
CI/CD / build (push) Successful in 2m19s
CI/CD / publish (push) Has been skipped
During wizard-migration debugging, every OpenBao error surfaced as a bare
`HTTP 403` with no context. The response body often carries the actual
reason (missing capability, specific path, namespace mismatch), so
surfacing it makes operator debugging a one-step task. Added a shared
bodyText() helper that trims huge HTML error pages to 400 chars.
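
Roughly what the helper looks like (sketch; only the 400-char cap is from this
change):

  async function bodyText(res: Response, max = 400): Promise<string> {
    try {
      const text = (await res.text()).trim();
      return text.length > max ? text.slice(0, max) : text;   // trim giant HTML error pages
    } catch {
      return '';
    }
  }

  // usage: throw new Error(`openbao: HTTP ${res.status} ${res.statusText} ${await bodyText(res)}`);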

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 21:01:03 +01:00
Michal
72e49f719f fix(mcpd): skip bootstrap tokens on migrate + back-fill ops on existing admins
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / typecheck (push) Successful in 1m45s
CI/CD / test (push) Successful in 1m2s
CI/CD / build (push) Successful in 2m9s
CI/CD / smoke (push) Has started running
CI/CD / publish (push) Has been cancelled
Two production issues caught running the wizard end-to-end:

1. `mcpctl migrate secrets --from default --to bao` listed `bao-creds` as a
   candidate — the very token that lets mcpd reach bao. Moving it would
   brick the auth chain (destination backend needs its own bootstrap token
   to read its own bootstrap token). Fix: SecretMigrateService now calls
   backends.list() and filters out any Secret whose name matches ANY
   SecretBackend's `config.tokenSecretRef.name`. dryRun mirrors the same
   filter so candidates match reality. `--names` explicitly bypasses the
   filter for operators who really mean it. (The filter is sketched after this
   list.)

2. Initial rotation in the wizard 403'd because the global RBAC hook
   demands the `rotate-secretbackend` operation, which wasn't in
   bootstrap-admin — migrateAdminRole only added ops when processing a
   legacy `role: admin` entry, so already-migrated admin rows missed every
   new op added after their initial migration. Fix: migrateAdminRole now
   also runs a back-fill pass on rows that look admin-equivalent (have both
   `edit:*` and `run:*`), appending any missing op from ADMIN_OPS. Writes
   only when something actually changed, so restarts stay quiet. Same path
   also retroactively grants `migrate-secrets` which had the same problem
   yesterday.
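
The bootstrap-token filter from item 1, sketched (config.tokenSecretRef.name
is the real field; the surrounding types are illustrative):

  type BackendRow = { config: { tokenSecretRef?: { name: string } } };
  type SecretRow = { name: string };

  function filterBootstrapTokens(candidates: SecretRow[], backends: BackendRow[], explicitNames?: string[]): SecretRow[] {
    if (explicitNames && explicitNames.length > 0) {
      return candidates.filter((s) => explicitNames.includes(s.name));    // --names bypasses the filter
    }
    const bootstrapNames = new Set(
      backends.map((b) => b.config.tokenSecretRef?.name).filter((n): n is string => Boolean(n)),
    );
    return candidates.filter((s) => !bootstrapNames.has(s.name));         // never migrate a backend's own token
  }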

Tests: 4 new migrate-service cases (bootstrap filter on/off, dryRun parity,
--names bypass). Full suite 1889/1889.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:56:00 +01:00
Michal
56a4ff7f17 chore: regenerate completions after --setup-token rename
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / test (push) Successful in 1m4s
CI/CD / typecheck (push) Successful in 2m2s
CI/CD / smoke (push) Failing after 1m36s
CI/CD / build (push) Successful in 4m53s
CI/CD / publish (push) Has been skipped
2026-04-20 17:28:05 +01:00
Michal
1c5301289c refactor(wizard): rename --admin-token → --setup-token
Some checks failed
CI/CD / typecheck (push) Has been cancelled
CI/CD / test (push) Has been cancelled
CI/CD / smoke (push) Has been cancelled
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
CI/CD / lint (push) Has been cancelled
Any token with policy-write + auth/token admin works; root is a convenient
default but a scoped service account is fine too. The previous naming
misrepresented the permission floor as root-only.

- flag: --admin-token → --setup-token
- wizard field: adminToken → setupToken
- prompt label: "OpenBao admin / root token" → "OpenBao setup token (needs
  policy write + auth/token admin perms; root is fine)"
- file doc + one comment reworded
- tests updated for the new label
- regression test (token-absent-from-stdout) kept unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 17:27:09 +01:00
ba4129a1e4 Merge pull request 'feat(openbao): wizard + daily token rotation' (#56) from feat/openbao-wizard into main
Some checks failed
CI/CD / lint (push) Successful in 51s
CI/CD / test (push) Successful in 1m4s
CI/CD / typecheck (push) Successful in 1m56s
CI/CD / build (push) Has been cancelled
CI/CD / publish (push) Has been cancelled
CI/CD / smoke (push) Has been cancelled
2026-04-20 16:22:50 +00:00
Michal
dd4246878d feat(openbao): wizard-provisioning + daily token rotation
Some checks failed
CI/CD / typecheck (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m4s
CI/CD / lint (pull_request) Successful in 2m2s
CI/CD / smoke (pull_request) Failing after 1m36s
CI/CD / build (pull_request) Successful in 4m13s
CI/CD / publish (pull_request) Has been skipped
One-command setup replaces the 6-step manual flow — `mcpctl create
secretbackend bao --type openbao --wizard` takes the OpenBao admin token
once, provisions a narrow policy + token role, mints the first periodic
token, stores it on mcpd, verifies end-to-end, and prints the migration
command. The admin token is NEVER persisted.

The stored credential auto-rotates daily: mcpd mints a successor via the
token role (self-rotation capability is part of the policy it was issued
with), verifies the successor, writes it over the backing Secret, then
revokes the predecessor by accessor. TTL 720h means a week of rotation
failures still leaves 20+ days of runway.
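
The rotation order of operations, as a dependency-injected sketch (not the
actual SecretBackendRotator code; helper signatures are assumed):

  type Minted = { clientToken: string; accessor: string; expiresAt: string };

  async function rotateOne(deps: {
    mintRoleToken(): Promise<Minted>;
    lookupSelf(token: string): Promise<void>;
    persistToken(token: string): Promise<void>;
    revokeAccessor(accessor: string): Promise<void>;
    updateTokenMeta(meta: Minted): Promise<void>;
    recordError(message: string): Promise<void>;
  }, previousAccessor: string): Promise<void> {
    try {
      const next = await deps.mintRoleToken();        // 1. mint a successor via the token role
      await deps.lookupSelf(next.clientToken);        // 2. verify before trusting it
      await deps.persistToken(next.clientToken);      // 3. write it over the backing Secret
      await deps.revokeAccessor(previousAccessor);    // 4. only then revoke the predecessor
      await deps.updateTokenMeta(next);               // 5. stash accessor/expiry for the next round
    } catch (err) {
      await deps.recordError(String(err));            // lastRotationError; the old token keeps working
    }
  }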

Shared:
- New `@mcpctl/shared/vault` — pure HTTP wrappers (verifyHealth,
  ensureKvV2, writePolicy, ensureTokenRole, mintRoleToken, revokeAccessor,
  lookupSelf, testWriteReadDelete) and policy HCL builder.

mcpd:
- `tokenMeta Json @default("{}")` on SecretBackend. Self-healing schema
  migration — empty default lets `prisma db push` add the column cleanly.
- SecretBackendRotator.rotateOne: mint → verify → persist → revoke-old →
  update tokenMeta. Failures surface via `lastRotationError` on the row;
  the old token keeps working.
- SecretBackendRotatorLoop: on startup rotates overdue backends, schedules
  per-backend timers with ±10min jitter. Stops cleanly on shutdown.
- New `POST /api/v1/secretbackends/:id/rotate` (operation
  `rotate-secretbackend` — added to bootstrap-admin's auto-migrated ops
  alongside migrate-secrets, which was previously missing too).

CLI:
- `--wizard` on `create secretbackend` delegates to the interactive flow.
  All prompts can be pre-answered via flags (--url, --admin-token,
  --mount, --path-prefix, --policy-name, --token-role,
  --no-promote-default) for CI.
- `mcpctl rotate secretbackend <name>` — convenience verb; hits the new
  rotate endpoint.
- `describe secretbackend` renders a Token health section (healthy /
  STALE / WARNING / ERROR) with generated/renewal/expiry timestamps and
  last rotation error. Only shown when tokenMeta.rotatable is true — the
  existing k8s-auth + static-token backends don't surface it.

Tests: 15 vault-client unit tests (shared), 8 rotator unit tests (mcpd),
3 wizard flow tests (cli, including a regression test that the admin
token never appears in stdout). Full suite 1885/1885 (+32). Completions
regenerated for the new flags.

Out of scope (explicit): kubernetes-auth wizard, Vault Enterprise
namespaces in the wizard path, rotation for non-wizard static-token
backends. See plan file for details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 17:20:37 +01:00
Michal
515206685b feat(openbao): kubernetes ServiceAccount auth — no static token in DB
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / test (push) Successful in 1m5s
CI/CD / typecheck (push) Successful in 2m8s
CI/CD / smoke (push) Failing after 3m38s
CI/CD / build (push) Successful in 4m15s
CI/CD / publish (push) Has been skipped
Why: requiring a static OpenBao root token to live on the plaintext backend
(even just for bootstrap) is the weakest link in the chain. With the bao-side
Kubernetes auth method enabled, mcpd's pod can authenticate using its own
projected SA token, exchange it for a short-lived Vault client token, and
keep the database free of any vault credentials at all.

Driver changes (src/mcpd/src/services/secret-backends/openbao.ts):
- New `OpenBaoConfig.auth = 'token' | 'kubernetes'`. Defaults to 'token' so
  existing rows keep working. Both shapes share url + mount + pathPrefix +
  namespace; auth-specific fields are mutually exclusive in the config schema.
- Kubernetes auth flow: read JWT from /var/run/secrets/.../token, POST to
  /v1/auth/<authMount>/login {role, jwt}, cache the returned client_token
  for `lease_duration - 60s` (grace window), then re-login (sketched after
  this list).
- One-shot 403-retry: if a request comes back 403 (revoked / clock skew),
  purge cache and retry the original request once with a fresh login.
- Reads + writes go through the same getToken() path so token-auth is
  unchanged for existing deployments.
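
A reduced sketch of the kubernetes-auth token path (the login body and lease
handling follow the description above; everything else, including the
module-level cache, is illustrative):

  import { readFile } from 'node:fs/promises';

  // Standard projected SA token path; the driver makes this configurable via --sa-token-path.
  const SA_TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token';

  let cached: { token: string; expiresAt: number } | null = null;

  async function login(baseUrl: string, authMount: string, role: string): Promise<string> {
    const jwt = (await readFile(SA_TOKEN_PATH, 'utf8')).trim();
    const res = await fetch(`${baseUrl}/v1/auth/${authMount}/login`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ role, jwt }),
    });
    if (!res.ok) throw new Error(`openbao k8s login failed: HTTP ${res.status}`);
    const body = (await res.json()) as { auth: { client_token: string; lease_duration: number } };
    cached = {
      token: body.auth.client_token,
      expiresAt: Date.now() + (body.auth.lease_duration - 60) * 1000,   // 60 s grace window
    };
    return cached.token;
  }

  async function getToken(baseUrl: string, authMount: string, role: string): Promise<string> {
    if (cached && Date.now() < cached.expiresAt) return cached.token;   // reuse until the lease nears expiry
    return login(baseUrl, authMount, role);
  }

  // On a 403 from any later request (revoked token, clock skew): drop `cached`
  // and retry the original request exactly once after a fresh login().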

CLI (src/cli/src/commands/create.ts):
- `mcpctl create secretbackend bao --type openbao --auth kubernetes \
     --url https://bao.example:8200 --role mcpctl`
- Optional `--auth-mount` (default 'kubernetes') + `--sa-token-path` (default
  the standard projected-token path) for non-default deployments.
- Token-auth path unchanged: `--auth token --token-secret SECRET/KEY`
  (or omit `--auth` since 'token' is the default).

Validation (factory.ts) gates on the auth strategy: each path enforces its
own required fields and produces a clear error if misconfigured.

Tests: 6 new k8s-auth unit cases (login wire shape, lease-based caching,
custom authMount, 403-on-login, missing-role rejection, missing-tokenSecretRef
rejection). Full suite 1859/1859. Completions regenerated for the new flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 23:23:05 +01:00