mcpctl

Author	SHA1	Message	Date
Michal	e6cd73543a	fix(mcpd): fail-loud on env resolution + retry/backoff + readiness via proxy

Three connected issues with how instances came up and got reported as healthy when their secret backend was unreachable. The motivating case: gitea-mcp-server starts when mcpd can't read the gitea-creds secret from OpenBao, runs with an empty GITEA_ACCESS_TOKEN, replies fine to tools/list (so liveness passes), but every authed call fails with "token is required", and `mcpctl get instances` cheerfully reports the instance as healthy.

## What changed

### 1. Env resolution failures are now fatal for the start attempt

`src/mcpd/src/services/instance.service.ts`

The previous behaviour swallowed `resolveServerEnv` failures and let the container start anyway with whatever env survived ("non-fatal - container may still work if env vars are optional"). That's the bug: the gitea container started with no token, ran for weeks, and was reported healthy.

The catch now calls `markInstanceError(instance, "secret resolution failed: <reason>")` and returns. Optional/missing env vars should be modelled as `value: ""` entries on the server, not as silent secret-resolution failures.

### 2. ERROR instances retry with backoff, not blind churn

Adds Kubernetes-style escalation: 30 s × 5 attempts, then 5 min pauses thereafter. Retry state lives on `McpInstance.metadata` (no schema migration): `attemptCount`, `lastAttemptAt`, `nextRetryAt`, `error`.

The reconciler no longer tears down ERROR instances and creates fresh replacements (which would reset attemptCount and effectively loop at 30 s forever). Instead:

- ERROR rows whose `nextRetryAt` is in the future are LEFT ALONE and counted against the replica budget, preventing tight create-fail-create churn while a previous attempt is in its backoff window.
- ERROR rows whose `nextRetryAt` has elapsed are retried IN-PLACE via a new `retryInstance` method, which preserves attemptCount on the same row so the schedule actually escalates.

The work has been factored into `startOne` (creates + initial attempt) + `attemptStart` (env + container) + `retryInstance` (re-attempt the same row) + `markInstanceError` (write retry metadata).

### 3. STDIO readiness probe goes through mcpProxyService

`src/mcpd/src/services/health-probe.service.ts`

The legacy `probeStdio` (a `docker exec node -e '... spawn(packageName) ...'` invocation) only worked for packageName-based servers. Image-based STDIO servers like gitea-mcp-server fell through with "No packageName or command for STDIO server" and were reported unhealthy for the WRONG reason: they have no packageName because they are an image, not because anything's wrong.

New `probeReadinessViaProxy`: sends `tools/call` through the live running container via `mcpProxyService.execute`. Same code path as production traffic, so probe failures match real failures. Picks up:

- JSON-RPC errors (e.g. "token is required" when env is empty).
- Tool-level errors expressed as `result.isError: true`.
- Connection failures wrapped as exceptions.
- Hard timeouts via the deadline race.

After this PR, configuring `gitea` with `healthCheck: { tool: get_me, intervalSeconds: 60 }` makes `mcpctl get instances` report it as `unhealthy` whenever the auth token is missing or wrong, which is honest.

The dead `probeStdio` (~120 LOC) is removed; HTTP/SSE bespoke probe paths are kept for now (they work and the diff stays minimal).

## Tests

`src/mcpd/tests/instance-service.test.ts`:

- Replaces "cleans up ERROR instances and creates replacements" with "retries ERROR instances in-place when their backoff has elapsed".
- Adds "leaves ERROR instances alone while their nextRetryAt is in the future" and "escalates the backoff: attemptCount + nextRetryAt persist on retry failures".

`src/mcpd/tests/services/health-probe.test.ts`:

- Swaps STDIO probe mocks from `orchestrator.execInContainer` → `mcpProxyService.execute`.
- Adds "marks unhealthy when proxy returns a JSON-RPC error (e.g. broken-secret auth failure)", explicitly the gitea case.
- Adds "marks unhealthy when proxy returns a tool-level error in result.isError", which covers servers that report tool failures as isError instead of as JSON-RPC errors.
- Renames "handles exec timeout" → "handles probe timeout" and exercises the deadline race rather than an exec rejection.

Full suite: 162 test files / 2161 tests green (+4 new).

## Manual verification step (post-deploy)

```bash
mcpctl edit server gitea
# → add healthCheck:
#     tool: get_me
#     intervalSeconds: 60
#     timeoutSeconds: 10
#     failureThreshold: 3
```

If OpenBao is still down: gitea instance enters ERROR with attemptCount + nextRetryAt visible in `mcpctl describe instance`. Otherwise: gitea env resolves at next start, probe passes, instance is honestly healthy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 18:55:23 +01:00
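The escalation schedule described in this commit (30 s for the first five attempts, 5 min afterwards, with state kept in instance metadata) could be sketched roughly as follows. This is only an illustration: the helper names `retryDelayMs`, `nextRetryMetadata`, and `isRetryDue` are hypothetical, and both the attempt indexing and the ISO-string timestamps are assumptions rather than details taken from `instance.service.ts`.

```typescript
// Retry metadata fields named in the commit message; the types are assumed.
interface RetryMetadata {
  attemptCount: number;
  lastAttemptAt: string; // ISO timestamp (assumption)
  nextRetryAt: string; // ISO timestamp (assumption)
  error: string;
}

const FAST_RETRY_MS = 30_000; // 30 s
const FAST_RETRY_ATTEMPTS = 5; // first five attempts use the short delay
const SLOW_RETRY_MS = 5 * 60_000; // 5 min thereafter

// Delay to wait after the given (1-based) failed attempt.
function retryDelayMs(attemptCount: number): number {
  return attemptCount <= FAST_RETRY_ATTEMPTS ? FAST_RETRY_MS : SLOW_RETRY_MS;
}

// Builds the metadata written after a failed attempt. The previous
// attemptCount is carried forward so the schedule escalates instead of
// restarting at 30 s on every failure.
function nextRetryMetadata(
  prev: Partial<RetryMetadata> | undefined,
  error: string,
  now: Date,
): RetryMetadata {
  const attemptCount = (prev?.attemptCount ?? 0) + 1;
  return {
    attemptCount,
    lastAttemptAt: now.toISOString(),
    nextRetryAt: new Date(now.getTime() + retryDelayMs(attemptCount)).toISOString(),
    error,
  };
}

// True once the backoff window has elapsed and an in-place retry is allowed.
function isRetryDue(meta: RetryMetadata, now: Date): boolean {
  return new Date(meta.nextRetryAt).getTime() <= now.getTime();
}
```

Keeping the counter on the same row is what makes the delay grow; a freshly created replacement row would start again from `attemptCount = 0`, which is the churn the commit removes.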
Michal	1ec286bb14	feat(mcpd): ResourceRevision + ResourceProposal services + Prompt revision integration

Phase 2 of the Skills + Revisions + Proposals work. Stands up the generic revision/proposal layer and wires Prompt into it. Skills will plug into the same infrastructure in PR-3 with no further service changes required.

This PR is intentionally additive: PromptRequest table and routes are unchanged. The /api/v1/proposals API runs side-by-side with the legacy /api/v1/promptrequests API. The PromptRequest cutover (rename + backfill + mcplocal rewire) is deferred to a later PR so this one stays reviewable.

## What's added

### Repositories (src/mcpd/src/repositories/)

- resource-revision.repository.ts: append-only revision log keyed by (resourceType, resourceId). Soft FK; no relations declared. Supports history listing, semver lookup, and contentHash cross-resource search.
- resource-proposal.repository.ts: generic propose queue. Status lifecycle pending → approved | rejected. Mirrors Prompt's `?? ''` workaround for nullable-FK compound lookups.

### Services (src/mcpd/src/services/)

- resource-revision.service.ts: record() inserts a revision with a stable sha256 contentHash computed from canonicalised JSON (key-sorted at every level so reordered objects produce the same hash). Caller passes a pre-computed semver; service does NOT decide bump policy.
- resource-proposal.service.ts: propose / approve / reject / list, with a per-resourceType handler registry. PromptService registers the 'prompt' handler at construction; the SkillService will register 'skill' in PR-3. approve() runs in a Prisma $transaction so the resource update + revision insert + proposal status flip are atomic.

### Pure utility (src/mcpd/src/utils/semver.ts)

- bumpSemver(current, kind) for major / minor / patch
- compareSemver(a, b): numeric, not lex (10 > 9)
- isValidSemver(s)
- Invalid input falls back to '0.1.0' rather than throwing, which keeps the audit-write path from blowing up the prompt update if a row's semver ever drifts out of MAJOR.MINOR.PATCH shape.

### Routes (src/mcpd/src/routes/)

- revisions.ts: GET /api/v1/revisions?resourceType=&resourceId=, GET /api/v1/revisions/:id, GET /api/v1/revisions/:id/diff?against=<id|live> (unified-format diff via the `diff` package), and POST /api/v1/prompts/:id/restore-revision { revisionId, note? }.
- proposals.ts: GET / POST /api/v1/proposals, GET /api/v1/proposals/:id, PUT for body updates, POST .../approve and POST .../reject, plus DELETE.

## What's changed

- PromptService.create / update now record a ResourceRevision when the revision service is wired. Update auto-bumps patch on content change; authors can override via `--bump major|minor|patch` or `--semver X.Y.Z` on the CLI (forwarded into the PUT body). Best-effort: revision write failures are swallowed so the prompt save still succeeds (revision is audit, not source of truth).
- PromptService.setProposalService registers a 'prompt' approval handler with the proposal service. Approval runs in a Prisma transaction: upsert prompt → record revision → update currentRevisionId → flip proposal status. semver bumps to 0.1.0 on first approval, patch thereafter.
- New CLI flags on `mcpctl edit prompt`: --bump, --semver, --note. They're prompt-only (validated client-side); other resources reject them.
- Aliases in shared.ts: `proposal`/`prop` → proposals, `revision`/`rev` → revisions.
- diff dependency added to mcpd.

## Tests

- src/mcpd/tests/utils/semver.test.ts: covers bump/compare/validate including numeric (not lex) semver compare and invalid-input fallback.
- prompt-service.test.ts updated: makePrompt fixture now sets semver + agentId + currentRevisionId; updatePrompt assertion expects the auto-bumped patch in the same update call.
- prompt-routes.test.ts updated symmetrically.

## RBAC

`proposals` and `revisions` URL segments map to the existing `prompts` permission for now. PR-7 may split if a "reviewer" role becomes useful.

## Verification

Full suite: 158 test files / 2127 tests green. `pnpm build` clean across all 6 workspace packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 00:38:35 +01:00
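The semver helpers listed under "Pure utility" might look something like the sketch below. The function names and the '0.1.0' fallback come from the commit message; the bodies, the `BumpKind` type, the `parseSemver` helper, and the choice to treat invalid input to `compareSemver` as the fallback version are assumptions made for illustration, not the contents of `semver.ts`.

```typescript
type BumpKind = "major" | "minor" | "patch";

const FALLBACK_SEMVER = "0.1.0";
// Plain MAJOR.MINOR.PATCH with no leading zeros; pre-release tags are out of scope here.
const SEMVER_RE = /^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$/;

function isValidSemver(s: string): boolean {
  return SEMVER_RE.test(s);
}

// Hypothetical helper: parses into numbers, using 0.1.0 for invalid input.
function parseSemver(s: string): [number, number, number] {
  const m = SEMVER_RE.exec(s);
  if (m === null) return [0, 1, 0];
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Invalid input yields the fallback instead of throwing, so an audit write
// cannot break the surrounding update.
function bumpSemver(current: string, kind: BumpKind): string {
  if (!isValidSemver(current)) return FALLBACK_SEMVER;
  const [major, minor, patch] = parseSemver(current);
  switch (kind) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}

// Numeric comparison: negative if a < b, zero if equal, positive if a > b.
function compareSemver(a: string, b: string): number {
  const [aMajor, aMinor, aPatch] = parseSemver(a);
  const [bMajor, bMinor, bPatch] = parseSemver(b);
  return aMajor - bMajor || aMinor - bMinor || aPatch - bPatch;
}
```

Comparing parsed numbers rather than strings is what gives `1.10.0 > 1.9.0`; a lexical compare would order them the other way round.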
Michal	6b5bd78cfa	feat(mcpd): personality + prompt-by-agent repos and services (Stage 2)

Wires the schema landed in Stage 1 into the service layer. No HTTP routes yet; Stage 3 will register `/api/v1/...` endpoints and update chat.service to read agent-direct + personality prompts when building the system block.

Repositories:

- PersonalityRepository: CRUD + listPrompts/attach/detach bindings.
- PromptRepository: findByAgent + findByNameAndAgent; create/update accept the new agentId column. findGlobal now also filters agentId=null so agent-direct prompts don't leak into global lists.
- AgentRepository: defaultPersonalityId on create + connect/disconnect in update.

Services:

- PersonalityService: CRUD scoped per agent, plus attach/detach with scope enforcement: a prompt may bind only if it's agent-direct on the same agent, in the agent's project, or global. Foreign-project / foreign-agent attachments are rejected with 400.
- PromptService: createPrompt / upsertByName accept agentId and resolve `agent: <name>`, with XOR-with-project guard. Adds listPromptsForAgent.
- AgentService: defaultPersonality (by name on the agent's own personality set) round-trips through update + AgentView.

Validation:

- prompt.schema.ts: refine() rejects projectId+agentId together.
- personality.schema.ts: new Create/Update/AttachPrompt schemas.
- agent.schema.ts: defaultPersonality { name } | null on update.

Tests: 12 PersonalityService + 7 PromptService agent-scope tests covering happy paths, XOR/scope enforcement, double-attach guard, detach-not-bound. mcpd suite: 796/796 (was 777). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 19:20:51 +01:00
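The two scope rules in this commit (a prompt cannot carry both projectId and agentId, and a personality may only bind prompts that are agent-direct on the same agent, in the agent's project, or global) can be expressed as two small predicates. The sketch below is an assumption-laden illustration: `PromptScope`, `AgentRef`, `hasExclusiveScope`, and `canAttach` are hypothetical names, and the real services work against Prisma rows rather than these plain shapes.

```typescript
// Minimal shapes for illustration only; the real models have more fields.
interface PromptScope {
  agentId: string | null;
  projectId: string | null;
}

interface AgentRef {
  id: string;
  projectId: string | null;
}

// XOR-with-project guard: a prompt may be agent-direct, project-scoped,
// or global (both null), but never agent-direct and project-scoped at once.
function hasExclusiveScope(prompt: PromptScope): boolean {
  return !(prompt.agentId !== null && prompt.projectId !== null);
}

// Attach rule: same agent, the agent's own project, or a global prompt.
function canAttach(prompt: PromptScope, agent: AgentRef): boolean {
  if (prompt.agentId !== null) return prompt.agentId === agent.id;
  if (prompt.projectId !== null) return prompt.projectId === agent.projectId;
  return true; // global prompt
}
```

In a service, a `false` result from `canAttach` would presumably map to the 400 response the commit mentions for foreign-project and foreign-agent attachments.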
Michal	3149ea3ae7	fix: MCP proxy resilience - discovery cache, default liveness probes

Adds a per-server tools/list cache in McpRouter (positive + negative TTL) so a slow or dead upstream only stalls the first discovery call, not every subsequent client request. Invalidated on upstream add/remove.

Health probes now apply a default liveness spec (tools/list via the real production path) to any RUNNING instance without an explicit healthCheck, so synthetic and real failures converge on the same signal.

Includes supporting updates in mcpd-client, discovery, upstream/mcpd, seeder, and fulldeploy/release scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 00:48:57 +01:00
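A cache with separate positive and negative TTLs, as this commit describes for per-server tools/list results, might be structured like the sketch below. Everything here is an assumption for illustration: the `DiscoveryCache` class, its method names, and the TTL values in the usage note are hypothetical, and the actual McpRouter code may differ substantially.

```typescript
// A cached discovery outcome: either a tool list or a remembered failure.
type CacheEntry<T> =
  | { kind: "ok"; value: T; expiresAt: number }
  | { kind: "error"; message: string; expiresAt: number };

class DiscoveryCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly positiveTtlMs: number;
  private readonly negativeTtlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested deterministically.
  constructor(positiveTtlMs: number, negativeTtlMs: number, now: () => number = () => Date.now()) {
    this.positiveTtlMs = positiveTtlMs;
    this.negativeTtlMs = negativeTtlMs;
    this.now = now;
  }

  // Returns a still-fresh entry, or undefined when the caller must ask upstream.
  lookup(server: string): CacheEntry<T> | undefined {
    const entry = this.entries.get(server);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(server);
      return undefined;
    }
    return entry;
  }

  recordSuccess(server: string, value: T): void {
    this.entries.set(server, { kind: "ok", value, expiresAt: this.now() + this.positiveTtlMs });
  }

  // Remembering failures briefly keeps a dead upstream from stalling every request.
  recordFailure(server: string, message: string): void {
    this.entries.set(server, { kind: "error", message, expiresAt: this.now() + this.negativeTtlMs });
  }

  // Called when an upstream is added or removed.
  invalidate(server: string): void {
    this.entries.delete(server);
  }
}
```

A caller would check `lookup` first, return or rethrow from the cached entry, and only contact the upstream on a miss. A negative TTL shorter than the positive one (say 5 s versus 60 s, both made-up values) lets a recovered server be rediscovered quickly.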
Michal	0995851810	feat: remove proxyMode - all traffic goes through mcplocal proxy

proxyMode "direct" was a security hole (leaked secrets as plaintext env vars in .mcp.json) and bypassed all mcplocal features (gating, audit, RBAC, content pipeline, namespacing). Removed from schema, API, CLI, and all tests. Old configs with proxyMode are accepted but silently stripped via Zod .transform() for backward compatibility.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 23:36:36 +00:00
Michal	5d859ca7d8	feat: audit console TUI, system prompt management, and CLI improvements

Audit Console Phase 1: tool_call_trace emission from mcplocal router, session_bind/rbac_decision event kinds, GET /audit/sessions endpoint, full Ink TUI with session sidebar, event timeline, and detail view (mcpctl console --audit).

System prompts: move 6 hardcoded LLM prompts to mcpctl-system project with extensible ResourceRuleRegistry validation framework, template variable enforcement ({{maxTokens}}, {{pageCount}}), and delete-resets-to-default behavior. All consumers fetch via SystemPromptFetcher with hardcoded fallbacks.

CLI: -p shorthand for --project across get/create/delete/config commands, console auto-scroll improvements, shell completions regenerated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 23:50:54 +00:00
Michal	03827f11e4	feat: eager vLLM warmup and smart page titles in paginate stage

- Add warmup() to LlmProvider interface for eager subprocess startup
- ManagedVllmProvider.warmup() starts vLLM in background on project load
- ProviderRegistry.warmupAll() triggers all managed providers
- NamedProvider proxies warmup() to inner provider
- paginate stage generates LLM-powered descriptive page titles when available, cached by content hash, falls back to generic "Page N"
- project-mcp-endpoint calls warmupAll() on router creation so vLLM is loading while the session initializes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 19:07:39 +00:00
Michal	ecc9c48597	feat: gated project experience & prompt intelligence

Implements the full gated session flow and prompt intelligence system:

- Prisma schema: add gated, priority, summary, chapters, linkTarget fields
- Session gate: state machine (gated → begin_session → ungated) with LLM-powered tool selection based on prompt index
- Tag matcher: intelligent prompt-to-tool matching with project/server/action tags
- LLM selector: tiered provider selection (fast for gating, heavy for complex tasks)
- Link resolver: cross-project MCP resource references (project/server:uri format)
- Prompt summary service: LLM-generated summaries and chapter extraction
- System project bootstrap: ensures default project exists on startup
- Structural link health checks: enrichWithLinkStatus on prompt GET endpoints
- CLI: create prompt --priority/--link, create project --gated/--no-gated, describe project shows prompts section, get prompts shows PRI/LINK/STATUS
- Apply/edit: priority, linkTarget, gated fields supported
- Shell completions: fish updated with new flags
- 1,253 tests passing across all packages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 23:22:42 +00:00
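The `project/server:uri` link format mentioned for the link resolver could be parsed along the lines of the sketch below. This is a guess at the shape only: `LinkTarget` and `parseLinkTarget` are hypothetical names, and the sketch assumes that project names contain no `/` and server names contain no `:`, so the first occurrence of each acts as the separator while the URI itself may contain both.

```typescript
interface LinkTarget {
  project: string;
  server: string;
  uri: string;
}

// Splits "project/server:uri" at the first "/" and then the first ":" after it.
// Returns null when any of the three parts is missing.
function parseLinkTarget(raw: string): LinkTarget | null {
  const slash = raw.indexOf("/");
  if (slash <= 0) return null; // no project part
  const rest = raw.slice(slash + 1);
  const colon = rest.indexOf(":");
  if (colon <= 0 || colon === rest.length - 1) return null; // no server or no uri
  return {
    project: raw.slice(0, slash),
    server: rest.slice(0, colon),
    uri: rest.slice(colon + 1),
  };
}
```

For example, under these assumptions `docs/gitea:file:///README.md` would split into project `docs`, server `gitea`, and URI `file:///README.md`.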
Michal	b025ade2b0	feat: add prompt resources, fix MCP proxy transport, enrich tool descriptions

- Fix MCP proxy to support SSE and STDIO transports (not just HTTP POST)
- Enrich tool descriptions with server context for LLM clarity
- Add Prompt and PromptRequest resources with two-resource RBAC model
- Add propose_prompt MCP tool for LLM to create pending prompt requests
- Add prompt resources visible in MCP resources/list (approved + session's pending)
- Add project-level prompt/instructions in MCP initialize response
- Add ServiceAccount subject type for RBAC (SA identity from X-Service-Account header)
- Add CLI commands: create prompt, get prompts/promptrequests, approve promptrequest
- Add prompts to apply config schema
- 956 tests passing across all packages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 14:53:00 +00:00
Michal	738bfafd46	feat: MCP health probe runner - periodic tool-call probes for instances

Implements Kubernetes-style liveness probes that call MCP tools defined in server healthCheck configs. For STDIO servers, uses docker exec to spawn a disposable MCP client that sends initialize + tool call. For HTTP/SSE servers, sends JSON-RPC directly.

- HealthProbeRunner service with configurable interval/threshold/timeout
- execInContainer added to orchestrator interface + Docker implementation
- Instance findById now includes server relation (fixes describe showing IDs)
- Events appended to instance (last 50), healthStatus tracked as healthy/degraded/unhealthy
- 12 unit tests covering probing, thresholds, intervals, cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:38:48 +00:00
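The threshold and event-log behaviour described here (a failure threshold, the last 50 events, and a healthy/degraded/unhealthy status) might be modelled as a small pure state update. The sketch below is illustrative only: `ProbeState` and `applyProbeResult` are hypothetical names, and the reading of "degraded" as "failing, but still below the threshold" is an assumption that the commit message does not confirm.

```typescript
type HealthStatus = "healthy" | "degraded" | "unhealthy";

interface ProbeState {
  consecutiveFailures: number;
  healthStatus: HealthStatus;
  events: string[];
}

const MAX_EVENTS = 50; // "last 50" per the commit message

// Folds one probe result into the instance's health state.
function applyProbeResult(
  state: ProbeState,
  ok: boolean,
  message: string,
  failureThreshold: number,
): ProbeState {
  const consecutiveFailures = ok ? 0 : state.consecutiveFailures + 1;
  // Assumed mapping: zero failures is healthy, reaching the threshold is
  // unhealthy, anything in between is degraded.
  const healthStatus: HealthStatus =
    consecutiveFailures === 0
      ? "healthy"
      : consecutiveFailures >= failureThreshold
        ? "unhealthy"
        : "degraded";
  // Append the event and keep only the most recent MAX_EVENTS entries.
  const events = [...state.events, message].slice(-MAX_EVENTS);
  return { consecutiveFailures, healthStatus, events };
}
```

Because the function returns a new state rather than mutating its input, a runner could call it once per probe interval and persist the result on the instance record.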

10 Commits