The persistence + signaling layer for v5. No integration with the
existing in-flight inference path yet — that's Stage 2. This commit
just lands the durable queue underneath, with a state machine that
mcpd's HTTP handlers, the worker result-POST route, and the GC sweep
will all build on.
Schema (src/db/prisma/schema.prisma + migration):
- New `InferenceTask` model + `InferenceTaskStatus` enum
(pending|claimed|running|completed|error|cancelled).
- Routing fields stored at enqueue time so a later rename of
`Llm.poolName` doesn't reroute already-queued work: `poolName`
(effective pool key), `llmName` (pinned target), `model`, `tier`.
- Worker tracking: `claimedBy` (providerSessionId) + `claimedAt`,
cleared on revert.
- Bodies as `Json`: requestBody (always set), responseBody (set at
completion). Streaming chunks are NOT persisted — too expensive at
delta granularity. The final assembled body lands once per task.
- Lifecycle timestamps: createdAt, claimedAt, streamStartedAt,
completedAt. Plus ownerId (RBAC + audit) and agentId (null for
direct chat-llm calls).
- Indexes for the hot paths: (status, poolName) for the dispatcher's
drain query, claimedBy for the disconnect revert, completedAt for
the GC retention sweep, owner/agent for the async API listing.
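For orientation, the row as the generated client would expose it: a
hand-written TypeScript sketch, not the schema file itself (the exact
`tier` type is an assumption):

    interface InferenceTask {
      id: string;
      status: 'pending' | 'claimed' | 'running' | 'completed' | 'error' | 'cancelled';
      poolName: string;               // effective pool key, frozen at enqueue
      llmName: string;                // pinned target
      model: string;
      tier: string;                   // assumed string, captured at enqueue
      claimedBy: string | null;       // providerSessionId, cleared on revert
      requestBody: unknown;           // Json, always set
      responseBody: unknown | null;   // Json, set at completion
      ownerId: string;                // RBAC + audit
      agentId: string | null;         // null for direct chat-llm calls
      createdAt: Date;
      claimedAt: Date | null;
      streamStartedAt: Date | null;
      completedAt: Date | null;
    }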
Repository (src/mcpd/src/repositories/inference-task.repository.ts):
- CRUD + state transitions as conditional CAS via `updateMany` (sketch
  after this list). Two workers racing to claim the same row both run
  the UPDATE; whichever the DB serializes first sees affected=1 and
  gets the row, the loser sees 0 and falls through to the next
  candidate. No application-level locking required.
- findPendingForPools(poolNames[]) for the worker drain on bind.
- findHeldBy(claimedBy) for the unbindSession revert.
- findStalePending + findExpiredTerminal for the GC sweep.
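The CAS claim from the first bullet, sketched against the generated
Prisma client (field names as in the schema above; error handling
omitted):

    import { PrismaClient } from '@prisma/client';

    const prisma = new PrismaClient();

    // Compare-and-swap: the WHERE clause carries the expected state, so of
    // two racing workers only the one the DB serializes first sees count=1.
    async function tryClaim(taskId: string, providerSessionId: string): Promise<boolean> {
      const { count } = await prisma.inferenceTask.updateMany({
        where: { id: taskId, status: 'pending' },
        data: { status: 'claimed', claimedBy: providerSessionId, claimedAt: new Date() },
      });
      return count === 1; // 0 = lost the race; fall through to the next candidate
    }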
Service (src/mcpd/src/services/inference-task.service.ts):
- Owns the in-process EventEmitter that wakes blocked HTTP handlers
when a worker POSTs results. The DB row is the source of truth for
*state*; the EventEmitter just signals "go re-read row X" so we
don't have to poll. Single-instance assumption for v5; pg
LISTEN/NOTIFY is the v6 swap when scaling horizontally — no schema
change needed, just replace the emitter wakeup.
- waitFor(taskId, timeoutMs) returns { done, chunks }: the terminal
  promise + an async iterator of streaming deltas (sketch after this
  list). Throws on cancel (clear message), on error (the worker's
  errorMessage propagates), and on timeout. Polls the row once at
  subscribe time so an already-terminal task resolves immediately
  instead of waiting for an event that's never coming.
- gcSweep flips stale pending rows to error (with a clear message
about the timeout) and deletes terminal rows past retention.
Defaults: 1h pending timeout, 7d terminal retention; both
configurable.
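The done half of waitFor, sketched (chunks omitted; the repo lookup and
per-task emitter events are assumptions about shape, not the real
service API):

    import { EventEmitter } from 'node:events';

    type Row = { status: string; responseBody?: unknown; errorMessage?: string | null };

    function waitForDone(
      emitter: EventEmitter,
      findById: (id: string) => Promise<Row | null>,
      taskId: string,
      timeoutMs: number,
    ): Promise<unknown> {
      return new Promise((resolve, reject) => {
        const settle = (fn: () => void) => { clearTimeout(timer); emitter.off(taskId, onWake); fn(); };
        const check = async () => {
          const row = await findById(taskId); // the DB row is the source of truth for state
          if (!row) return;
          if (row.status === 'completed') settle(() => resolve(row.responseBody));
          else if (row.status === 'cancelled') settle(() => reject(new Error('task cancelled')));
          else if (row.status === 'error') settle(() => reject(new Error(row.errorMessage ?? 'task failed')));
        };
        const onWake = () => { void check(); }; // the event carries no state, just "re-read row X"
        const timer = setTimeout(() => settle(() => reject(new Error('timeout'))), timeoutMs);
        emitter.on(taskId, onWake);
        void check(); // poll once up front: an already-terminal task resolves immediately
      });
    }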
Tests:
- 6 db-level schema tests (defaults, json roundtrip, drain query
shape, claimedBy filter, GC predicate, agentId nullable).
- 13 service tests covering enqueue, the CAS race on tryClaim,
complete/fail/cancel, idempotent terminal transitions, revertHeldBy
on disconnect, and the full waitFor signal lifecycle (immediate
resolve, wake on event, chunk streaming, cancel/error/timeout
paths). Plus a gcSweep test with a fixed clock.
mcpd 881/881 (was 868; +13). db: pool-schema 14/14, plus the 6 new
inference-task-schema tests. Pre-existing failures in models.test.ts
(a Secret FK fixture issue that also fails on main HEAD) are unrelated.
Stage 2 (next): VirtualLlmService rewires through this — remove the
in-memory pendingTasks map; enqueue creates a row, dispatch picks an
active session, the result-route updates the row + emits the wakeup.
Worker disconnect reverts; worker bind drains.
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.
Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm into a pool seed.
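The key resolution in miniature (the LlmView shape here is assumed for
illustration):

    type LlmView = { name: string; poolName: string | null; status: string };

    // Effective pool key: explicit poolName wins; solo rows fall back to
    // their globally-unique name and behave as a pool of 1.
    const poolKey = (llm: Pick<LlmView, 'name' | 'poolName'>): string =>
      llm.poolName ?? llm.name;

    // Expand the pinned Llm into its viable pool siblings at dispatch time.
    function poolCandidates(pinned: LlmView, all: LlmView[]): LlmView[] {
      const key = poolKey(pinned);
      return all.filter(l => poolKey(l) === key && l.status !== 'inactive');
    }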
Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.
When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.
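The dispatch loop in miniature: a sketch of the shape, not the actual
runOneInference code. It assumes transport failures surface as thrown
errors while auth/4xx come back inside the result:

    async function dispatchWithFailover<L, R>(
      candidates: L[],                      // pre-shuffled by prepareContext
      callOne: (llm: L) => Promise<R>,      // throws only on transport-level failure
    ): Promise<R> {
      let lastErr: unknown = new Error('no viable pool members'); // prepareContext throws first in practice
      for (const llm of candidates) {
        try {
          return await callOne(llm); // auth/4xx arrive as result.status: returned, NOT retried
        } catch (err) {
          lastErr = err;             // network / publisher disconnect: try the next sibling
        }
      }
      throw lastErr;                 // list exhausted
    }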
Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
fallback-to-name semantics, and the solo-name-joins-explicit-pool
edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.
Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.
Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
new types. Existing rows backfill kind=public/status=active so v1
CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
the (kind, status) GC index, and providerSessionId lookups.
Total agent-schema tests: 20/20.
chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
on ctx.llmKind: 'public' goes through the existing adapter
registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
(mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern (sketch after this
  list).
- ChatService gains an optional virtualLlms constructor parameter;
main.ts wires it in. Older test wirings without it raise a clear
"virtualLlms dispatcher not wired" error when the row is virtual,
rather than silently falling through to the public path against an
empty URL.
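The queue + wake bridge, sketched generically (push() stands in for the
onChunk callback; names are illustrative):

    function chunkBridge<T>() {
      const queue: T[] = [];
      let wake: (() => void) | null = null;
      let done = false;
      const signal = () => { wake?.(); wake = null; };
      return {
        push(chunk: T) { queue.push(chunk); signal(); }, // wire this to onChunk
        end() { done = true; signal(); },                // terminal result arrived
        async *iterate(): AsyncGenerator<T> {
          for (;;) {
            while (queue.length > 0) yield queue.shift() as T;
            if (done) return;
            await new Promise<void>(r => { wake = r; }); // sleep until push/end
          }
        },
      };
    }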
This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Before this stage, those agents 502'd against the
empty url field.
Tests: 4 new chat-service-virtual-llm.test.ts cover the relay path
non-streaming, streaming, missing-dispatcher error, and rejection
surfacing. mcpd suite: 841/841 (was 833, +8 across stages 1+v3-Stage-1).
Workspace: 2054/2054 across 153 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
`mcpctl create llm`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in
Stage 3). The lifecycle fields below let mcpd reap stale rows when
the publisher goes away.
Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.
All existing rows backfill with kind=public/status=active so v1 is
purely additive — public LLMs ignore the lifecycle columns entirely.
7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index.
mcpd suite still 801/801 (regenerated client) and typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A Personality is a named overlay on top of an Agent — same agent,
same LLM, but a different bundle of prompts injected into the system
block at chat time. VLAN-on-Ethernet semantics: Ethernet still works
without VLANs; with a VLAN tag, frames are segmented but still Ethernet.
Schema additions:
- Prompt.agentId (nullable FK + index, cascade on delete) so prompts
can attach directly to an agent without going through a project.
- Personality { id, name, description, agentId, priority } with
unique (name, agentId).
- PersonalityPrompt join table with per-binding priority override.
- Agent.defaultPersonalityId (SetNull on delete) so an agent can pick
one personality as the default when no --personality flag is passed.
Backwards-compatible by construction: every new column is nullable;
existing rows are valid as-is; the chat.service systemBlock changes
land in Stage 3.
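A plausible shape for the Stage-3 lookup (nothing below lands in this
commit; the merge rule, per-binding override else personality priority,
is an assumption read off the schema, as is lower-number-first ordering):

    type PersonalityPrompt = { promptId: string; priority: number | null }; // per-binding override
    type Personality = { priority: number; prompts: PersonalityPrompt[] };

    // Order a personality's prompts for system-block injection: a binding's
    // own priority wins when set, otherwise the personality-level priority.
    function orderedPromptIds(p: Personality): string[] {
      return [...p.prompts]
        .map(b => ({ id: b.promptId, prio: b.priority ?? p.priority }))
        .sort((a, b) => a.prio - b.prio)
        .map(x => x.id);
    }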
8 new prisma-level assertions in agent-schema.test.ts cover unique
constraints, cascade behavior, the SetNull on defaultPersonalityId,
and shared-prompt-across-personalities. All 16 db tests pass; mcpd
typecheck + 777 mcpd unit tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces the persistence layer for the upcoming Agent feature: an LLM
persona pinned to a specific Llm, optionally attached to a Project, with
persisted chat threads/messages so conversations survive REPL exits.
Constraint shape:
- Agent.llm uses ON DELETE RESTRICT — deleting an Llm in active use fails.
- Agent.project uses ON DELETE SET NULL — agents survive project deletion.
- ChatThread → ChatMessage cascade so deleting an agent purges its history.
- ChatMessage @@unique([threadId, turnIndex]) gives append ordering even
  under racing writers; services retry on collision (sketch below).
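The retry-on-collision append, sketched with Prisma (columns trimmed to
the ones that matter here; the rest of the message shape is assumed):

    import { PrismaClient, Prisma } from '@prisma/client';

    const prisma = new PrismaClient();

    async function appendMessage(threadId: string, content: string): Promise<void> {
      for (;;) {
        const last = await prisma.chatMessage.findFirst({
          where: { threadId },
          orderBy: { turnIndex: 'desc' },
          select: { turnIndex: true },
        });
        try {
          await prisma.chatMessage.create({
            data: { threadId, turnIndex: (last?.turnIndex ?? -1) + 1, content },
          });
          return;
        } catch (e) {
          // P2002 = unique constraint violation: a racing writer took our slot
          if (e instanceof Prisma.PrismaClientKnownRequestError && e.code === 'P2002') continue;
          throw e;
        }
      }
    }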
LiteLLM-style per-call overrides will live in Agent.defaultParams (Json);
the loose extras Json field is reserved for future LoRA/tool-allowlist work.
Pinned vitest fileParallelism=false in @mcpctl/db: all suites share the
same Postgres, and adding a second suite exposed FK contention between a
clearAllTables in one file and a create in another. Per-test isolation
still comes from beforeEach.
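The pin itself, roughly as it would sit in @mcpctl/db's vitest config:

    import { defineConfig } from 'vitest/config';

    export default defineConfig({
      test: {
        fileParallelism: false, // every suite hits the same Postgres; run files serially
      },
    });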
Tests: 8/8 green in src/db/tests/agent-schema.test.ts (defaults, name
uniqueness, llm-in-use Restrict, project-delete SetNull, agent-delete
cascade, duplicate (threadId, turnIndex) blocked, tool-call payload
round-trip, lastTurnAt DESC ordering).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
proxyMode "direct" was a security hole (leaked secrets as plaintext env
vars in .mcp.json) and bypassed all mcplocal features (gating, audit,
RBAC, content pipeline, namespacing). Removed from schema, API, CLI,
and all tests. Old configs with proxyMode are accepted but silently
stripped via Zod .transform() for backward compatibility.
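The strip in miniature (a Zod sketch with illustrative fields, not the
real config schema):

    import { z } from 'zod';

    const serverConfig = z
      .object({ name: z.string(), proxyMode: z.string().optional() })
      .passthrough()
      .transform(({ proxyMode: _dropped, ...rest }) => rest); // accepted, then silently removed

    serverConfig.parse({ name: 'fs', proxyMode: 'direct' }); // => { name: 'fs' }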
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Exclude src/db/tests from workspace vitest config (needs test DB)
- Make global-setup.ts gracefully skip when test DB unavailable
- Fix exactOptionalPropertyTypes issues in proxymodel-endpoint.ts
- Use proper ProxyModelPlugin type for getPluginHooks function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add warmup() to LlmProvider interface for eager subprocess startup
- ManagedVllmProvider.warmup() starts vLLM in background on project load
- ProviderRegistry.warmupAll() triggers all managed providers
- NamedProvider proxies warmup() to inner provider
- paginate stage generates LLM-powered descriptive page titles when an
  LLM is available, caches them by content hash, and falls back to a
  generic "Page N" otherwise
- project-mcp-endpoint calls warmupAll() on router creation so vLLM
is loading while the session initializes
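The surface in miniature (only the method names come from this change;
bodies and types are illustrative):

    interface LlmProvider {
      warmup(): Promise<void>; // eager startup; a no-op for providers with nothing to warm
    }

    class ProviderRegistry {
      constructor(private readonly providers: LlmProvider[]) {}
      // Fire-and-forget: project load is not blocked while vLLM spins up.
      warmupAll(): void {
        for (const p of this.providers) {
          void p.warmup().catch(() => { /* failure surfaces on the first real call */ });
        }
      }
    }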
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive MCP server management with kubectl-style CLI.
Key features in this release:
- Declarative YAML apply/get round-trip with project cloning support
- Gated sessions with prompt intelligence for Claude
- Interactive MCP console with traffic inspector
- Persistent STDIO connections for containerized servers
- RBAC with name-scoped bindings
- Shell completions (fish + bash) auto-generated
- Rate-limit retry with exponential backoff in apply
- Project-scoped prompt management
- Credential scrubbing from git history
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace admin role with granular roles: view, create, delete, edit, run
- Two binding types: resource bindings (role+resource+optional name) and
operation bindings (role:run + action like backup, logs, impersonate)
- Name-scoped resource bindings for per-instance access control
- Remove role from project members (all permissions via RBAC)
- Add users, groups, RBAC CRUD endpoints and CLI commands
- describe user/group shows all RBAC access (direct + inherited)
- create rbac supports --subject, --binding, --operation flags
- Backup/restore handles users, groups, RBAC definitions
- mcplocal project-based MCP endpoint discovery
- Full test coverage for all new functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a Helm-chart-like template system for MCP servers. Templates are
YAML files in templates/ that get seeded into the DB on startup. Users can
browse them with `mcpctl get templates`, inspect with `mcpctl describe
template`, and instantiate with `mcpctl create server --from-template=`.
Also adds Portainer deployment scripts, mcplocal systemd service,
Streamable HTTP MCP endpoint, and RPM packaging for mcpctl-local.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the confused Profile abstraction with a dedicated Secret resource
following Kubernetes conventions. Servers now have env entries with inline
values or secretRef references. Env vars are resolved and passed to
containers at startup (fixes existing gap).
- Add Secret CRUD (model, repo, service, routes, CLI commands)
- Server env: {name, value} or {name, valueFrom: {secretRef: {name, key}}}
- Add env-resolver utility shared by instance startup and config generation
- Remove all profile-related code (models, services, routes, CLI, tests)
- Update backup/restore for secrets instead of profiles
- describe secret masks values by default, --show-values to reveal
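A sketch of the resolver's core (the secret lookup is an assumed
callback, not the real repository API):

    type EnvEntry =
      | { name: string; value: string }
      | { name: string; valueFrom: { secretRef: { name: string; key: string } } };

    async function resolveEnv(
      entries: EnvEntry[],
      getSecretValue: (secret: string, key: string) => Promise<string>,
    ): Promise<Record<string, string>> {
      const out: Record<string, string> = {};
      for (const e of entries) {
        out[e.name] = 'value' in e
          ? e.value
          : await getSecretValue(e.valueFrom.secretRef.name, e.valueFrom.secretRef.key);
      }
      return out;
    }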
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add PostgreSQL schema with 8 models (User, Session, McpServer, McpProfile,
Project, ProjectMcpProfile, McpInstance, AuditLog), comprehensive model
tests (31 passing), seed data for default MCP servers, and package exports.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>