feat: virtual LLMs v1 (registration skeleton) #63
Merged
michal merged 6 commits from feat/virtual-llm-v1 into main on 2026-04-27 13:38:51 +00:00

6 Commits
866f6abc88
feat: virtual-LLM smoke test + docs (v1 Stage 6)
Some checks failed
CI/CD / typecheck (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / lint (pull_request) Successful in 2m6s
CI/CD / smoke (pull_request) Failing after 1m39s
CI/CD / build (pull_request) Successful in 2m11s
CI/CD / publish (pull_request) Has been skipped
Final stage of v1.

Smoke (mcplocal/tests/smoke/virtual-llm.smoke.test.ts):
- Spins up an in-process LlmProvider that returns canned content.
- Runs the registrar against the live mcpd in fulldeploy.
- Asserts: the row appears with kind=virtual / status=active; an infer
  through /api/v1/llms/<name>/infer comes back through the SSE relay
  with the provider's content + finish_reason; and a 503 appears
  immediately after registrar.stop() (publisher offline).
- Timeout and cleanup paths are idempotent, so re-runs against the same
  cluster don't litter rows.

The 90-s heartbeat-stale flip and 4-h GC are unit-tested; they're too
slow for smoke.

Docs:
- New docs/virtual-llms.md: when to use this vs. creating a regular Llm
  row, how to opt in via publish: true, the lifecycle table, the
  inference-relay sequence, the v1 streaming caveat, the v2-v5 roadmap,
  and the full /api/v1/llms/_provider-* surface.
- agents.md cross-links virtual-llms.md alongside personalities/chat.
- README's Agents section gains a "Virtual LLMs" subsection.

Workspace suite: 2043/2043 (smoke files run separately). v1 closes.

Stage roadmap (each its own future PR): v2 wake-on-demand · v3 virtual
agents · v4 LB pool · v5 task queue

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
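The canned provider the smoke test spins up can be sketched roughly like this. The `CompletionOptions`/`CompletionResult` shapes here are simplified assumptions for illustration, not the real mcplocal interfaces:

```typescript
// Simplified stand-ins for the real mcplocal types (assumptions).
interface CompletionOptions {
  messages: { role: string; content: string }[];
}
interface CompletionResult {
  content: string;
  finish_reason: string;
}

// In-process provider that always returns the same canned answer, so
// the smoke test can assert exact content after the full relay round
// trip (registrar → mcpd → SSE task → result POST).
class CannedLlmProvider {
  constructor(private readonly canned: string) {}

  async complete(_opts: CompletionOptions): Promise<CompletionResult> {
    return { content: this.canned, finish_reason: "stop" };
  }
}
```

A fixed answer makes the end-to-end assertion exact: whatever comes back through the relay must byte-for-byte equal the canned string.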
7e6b0cab44
feat(cli): mcpctl chat-llm + KIND/STATUS columns (v1 Stage 5)
Closes the loop on the user-facing surface:

    $ mcpctl get llm
    NAME            KIND     STATUS  TYPE    MODEL                     TIER  KEY  ID
    qwen3-thinking  public   active  openai  qwen3-thinking            fast  ...  ...
    vllm-local      virtual  active  openai  Qwen/Qwen2.5-7B-Instruct  fast  -    ...

    $ mcpctl chat-llm vllm-local
    ────────────────────────────────────────
    LLM: vllm-local
    openai → Qwen/Qwen2.5-7B-Instruct-AWQ
    Kind: virtual   Status: active
    ────────────────────────────────────────
    > hello?
    Hi! …

New: chat-llm command (commands/chat-llm.ts)
- Stateless chat with any mcpd-registered LLM. No threads, no tools, no
  project prompts. POSTs to /api/v1/llms/<name>/infer; mcpd's
  kind=virtual branch handles the relay through mcplocal transparently,
  so the same CLI command works for both public and virtual LLMs.
- Reuses installStatusBar / formatStats / recordDelta / styleStats /
  PhaseStats from chat.ts (now exported) so the bottom-row
  tokens-per-second ticker behaves identically to mcpctl chat.
- Flags: --message (one-shot), --system, --temperature, --max-tokens,
  --no-stream. Streaming uses OpenAI chat.completion.chunk SSE.
- REPL mode keeps a per-session history array so multi-turn flows feel
  natural; each turn is an independent inference call.

Updated: get.ts
- LlmRow gains optional kind/status fields.
- llmColumns layout: NAME, KIND, STATUS, TYPE, MODEL, TIER, KEY, ID.
  Defaults gracefully when older mcpd responses don't return them.

Updated: chat.ts
- Re-exports the helpers chat-llm.ts needs (PhaseStats, newPhase,
  recordDelta, formatStats, styleStats, styleThinking, STDERR_IS_TTY,
  StatusBar, installStatusBar). No behavior change.

Completions: chat-llm picks up the standard option enumeration
automatically; bash gets a special case for first-arg LLM-name
completion via _mcpctl_resource_names "llms".

CLI suite: 437/437 (was 430, +7 from auto-discovered test cases in the
regenerated completions golden). Workspace: 2043/2043 across 152 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
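The streaming path consumes OpenAI-style chat.completion.chunk SSE frames. A minimal collector for that wire format might look like this; it's a sketch that assumes each frame is a single `data:` line separated by blank lines, not the actual chat-llm.ts code:

```typescript
// Accumulates assistant text from an OpenAI chat.completion.chunk SSE
// body: each frame is a `data:` line holding a JSON chunk, and the
// stream ends with the `data: [DONE]` sentinel.
function collectSseContent(raw: string): string {
  let out = "";
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data:")) continue; // skip blanks / comments
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;         // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    out += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return out;
}
```

In the CLI each delta would also feed the tokens-per-second ticker before being appended, but the accumulation logic is the same.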
97174f450f
feat(mcplocal): virtual-LLM registrar (v1 Stage 4)
The mcplocal counterpart to mcpd's VirtualLlmService. After this stage,
flipping `publish: true` on a provider in ~/.mcpctl/config.json makes
the provider show up in mcpctl get llm with kind=virtual the next time
mcplocal restarts; running an inference against it relays through this
client back to the local LlmProvider.
Config:
- LlmProviderFileEntry gains optional `publish: boolean` (default false,
so existing setups don't change).
Registrar (new file: providers/registrar.ts):
- start(): if any provider is opted-in, POSTs to
/api/v1/llms/_provider-register with the publishable set, persists
the returned providerSessionId to ~/.mcpctl/provider-session for
sticky reconnects, then opens the SSE control channel and starts a
30-s heartbeat ticker.
- SSE listener parses event/data lines from text/event-stream frames.
task frames trigger handleInferTask: convert OpenAI body to
CompletionOptions, call provider.complete(), POST the result back as
either { status, body } (non-streaming) or two chunk POSTs
(streaming: one delta + a [DONE] marker).
- Disconnect → exponential backoff reconnect from 5 s up to 60 s. On
successful reconnect the persisted sessionId revives the same Llm
rows in mcpd (mcpd flips them back to active on heartbeat).
- stop() destroys the SSE socket and clears the timer; cleanly handed
off from main.ts's existing shutdown handler.
Wired into mcplocal main.ts via maybeStartVirtualLlmRegistrar:
- Filters opted-in providers, looks up their LlmProvider instances in
the registry.
- Reads ~/.mcpctl/credentials for mcpdUrl + bearer; absence is a
best-effort skip (logs a warning, returns null) — never a boot
blocker.
v1 caveat documented in the file header: LlmProvider returns a
finalized CompletionResult, not a token stream, so streaming requests
get a single delta chunk + [DONE]. Real per-token streaming is a v2
concern.
Tests: 5 new in tests/registrar.test.ts using a tiny in-process HTTP
server. Cover: no-op when nothing opted-in, register POST + sticky
sessionId persistence, sticky reconnect from disk, heartbeat ticker
fires at the configured interval, register HTTP error surfaces.
Workspace suite: 2043/2043 across 152 files (was 2006/149, +5
new tests + the new file gets discovered).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
192a3831df
feat(mcpd): virtual-LLM routes + GC ticker (v1 Stage 3)
End-to-end backend wiring. After this stage, an mcplocal client can
register a provider, hold the SSE channel open, heartbeat, and have
its inference requests fanned through the relay — all without
touching the agent layer or the public-LLM path.
Routes (new file: routes/virtual-llms.ts):
POST /api/v1/llms/_provider-register → returns { providerSessionId, llms[] }
GET /api/v1/llms/_provider-stream → SSE channel keyed by
x-mcpctl-provider-session header.
Emits `event: hello` on open,
`event: task` on inference fan-out,
`: ping` every 20 s for proxies.
POST /api/v1/llms/_provider-heartbeat → bumps lastHeartbeatAt
POST /api/v1/llms/_provider-task/:id/result
→ mcplocal pushes result back;
body shape is one of:
{ error: 'msg' }
{ chunk: { data, done? } }
{ status, body }
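The three result-POST body shapes can be expressed as a discriminated union keyed on which property is present. This is an illustration of how the result handler might branch, not the actual mcpd code:

```typescript
// The three bodies mcplocal may POST back for a task, per the route
// description above: a failure, a streaming chunk, or a final
// non-streaming response.
type TaskResultBody =
  | { error: string }                          // task failed on the publisher
  | { chunk: { data: string; done?: boolean } } // one streaming chunk
  | { status: number; body: unknown };          // final non-streaming result

// Sketch: branch on the discriminating key.
function classifyResult(body: TaskResultBody): "error" | "chunk" | "final" {
  if ("error" in body) return "error";
  if ("chunk" in body) return "chunk";
  return "final";
}
```

Keying on property presence rather than an explicit `type` field keeps the happy-path bodies minimal on the wire.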
LlmService:
- LlmView gains kind/status/lastHeartbeatAt/inactiveSince so route
handlers + the upcoming `mcpctl get llm` columns can branch on
kind without re-fetching the row.
llm-infer.ts:
- Detects llm.kind === 'virtual' and delegates to
VirtualLlmService.enqueueInferTask. Streaming + non-streaming both
supported; on 503 (publisher offline) the existing audit hook still
fires with the right status code.
- Adds optional `virtualLlms: VirtualLlmService` to LlmInferDeps;
absence in test fixtures returns a 500 with a clear "server
misconfiguration" message rather than silently falling through to
the public path against an empty URL.
main.ts:
- Constructs VirtualLlmService(llmRepo).
- Passes it to registerLlmInferRoutes.
- Calls registerVirtualLlmRoutes(app, virtualLlmService).
- 60-s GC ticker started after app.listen; clears on graceful
shutdown alongside the existing reconcile timer.
Tests: 11 new virtual-LLM route assertions (validation paths,
service plumbing for register/heartbeat/task-result) + 3 new
infer-route assertions (kind=virtual non-streaming relay, 503 path,
500 when virtualLlms dep missing). mcpd suite: 833/833 (was 819,
+14). Typecheck clean.
The full SSE handshake is exercised by the smoke test in Stage 6;
under app.inject the keep-alive blocks until close so unit-level
SSE testing isn't worth the complexity here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2215922618
feat(mcpd): VirtualLlmService + repo lifecycle helpers (v1 Stage 2)
The state machine for kind=virtual Llm rows. Wires the schema added in
Stage 1 into something that can register, heartbeat, time out, and
relay inference tasks. The HTTP routes (Stage 3) plug into this.

Repository (extends ILlmRepository):
- create/update accept kind/providerSessionId/lastHeartbeatAt/status/
  inactiveSince/type so VirtualLlmService can drive the lifecycle.
- findBySessionId(sessionId): the reconnect lookup.
- findStaleVirtuals(cutoff): heartbeat-stale rows for the GC sweep.
- findExpiredInactives(cutoff): 4h-expired rows for deletion.

VirtualLlmService:
- register(): sticky-id-aware upsert. New names insert as kind=virtual/
  status=active. Existing virtual rows from the same session reactivate
  in place; existing inactive virtuals from a foreign session can be
  adopted (sticky reconnect). Refuses to overwrite a public row or a
  foreign session's still-active virtual.
- heartbeat(): bumps lastHeartbeatAt for every row owned by the
  session; revives inactive rows.
- bindSession()/unbindSession(): in-memory map of sessionId → SSE
  handle. Disconnect immediately flips owned rows to inactive AND
  rejects any in-flight tasks for that session.
- enqueueInferTask(): pushes an `infer` task frame to the SSE handle,
  returns a PendingTaskRef whose `done` resolves when the publisher
  POSTs the result back. Streaming variant exposes onChunk(cb).
- completeTask/pushTaskChunk/failTask: route-side hooks called from the
  result POST handler (lands in Stage 3).
- gcSweep(): flips heartbeat-stale active virtuals to inactive (90-s
  cutoff), deletes inactives past 4 h. Idempotent.

Lifecycle constants live in this file (HEARTBEAT_TIMEOUT_MS=90s,
INACTIVE_RETENTION_MS=4h) so future stages can tune in one place.

18 new mocked-repo tests cover: register variants (insert, sticky
reconnect, refuse public-overwrite, refuse foreign-session, adopt
inactive-foreign), heartbeat-revive, unbind cascade, enqueue happy path
+ 503 paths (no session, inactive, public Llm), complete/fail/streaming
chunk fan-out, GC sweep flip + delete + idempotence. mcpd suite:
819/819 (was 801, +18). Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
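The decision core of a gcSweep pass can be sketched as a pure function. The two cutoffs come from the commit (90 s heartbeat timeout, 4 h inactive retention); the row shape is a simplified assumption:

```typescript
// Lifecycle constants as described in the commit.
const HEARTBEAT_TIMEOUT_MS = 90_000;
const INACTIVE_RETENTION_MS = 4 * 60 * 60 * 1000;

// Simplified stand-in for a kind=virtual Llm row (assumption).
interface VirtualRow {
  status: "active" | "inactive";
  lastHeartbeatAt: number; // epoch ms
  inactiveSince?: number;  // epoch ms, set when the row flips inactive
}

// One sweep decision per row. Idempotent: a row already flipped stays
// "keep" until its retention window expires, then becomes "delete".
function gcAction(row: VirtualRow, now: number): "flip-inactive" | "delete" | "keep" {
  if (row.status === "active" && now - row.lastHeartbeatAt > HEARTBEAT_TIMEOUT_MS) {
    return "flip-inactive";
  }
  if (
    row.status === "inactive" &&
    row.inactiveSince !== undefined &&
    now - row.inactiveSince > INACTIVE_RETENTION_MS
  ) {
    return "delete";
  }
  return "keep";
}
```

Keeping the decision pure means the flip/delete repo calls stay trivially testable against the mocked repository, which is how the 18 tests above exercise it.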
1acd8b58bc
feat(db): Llm.kind discriminator + virtual-provider lifecycle (v1 Stage 1)
First step of the virtual-LLM feature. A virtual Llm row is one that
gets *registered by an mcplocal client* rather than created via
`mcpctl create llm`. Its inference is relayed back through an SSE
control channel to the publishing session (mcpd routes added in Stage
3). The lifecycle fields below let mcpd reap stale rows when the
publisher goes away.

Schema additions:
- enum LlmKind (public | virtual). Default public.
- enum LlmStatus (active | inactive | hibernating). Default active.
  hibernating is reserved for v2 wake-on-demand.
- Llm.kind, providerSessionId, lastHeartbeatAt, status, inactiveSince.
- @@index([kind, status]) for the GC sweep.
- @@index([providerSessionId]) for the reconnect lookup.

All existing rows backfill with kind=public/status=active so v1 is
purely additive; public LLMs ignore the lifecycle columns entirely.

7 new prisma-level assertions in tests/llm-virtual-schema.test.ts
cover: defaults, persisting kind=virtual + lifecycle together, the
active→inactive flip, the hibernating value, enum rejection, the
(kind,status) GC index, the providerSessionId reconnect index. mcpd
suite still 801/801 (regenerated client) and typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
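Assembled from the field list above, the Prisma additions could look roughly like this. The enum and field names come from the commit; attribute details (optionality, placement among existing fields) are assumptions:

```prisma
enum LlmKind {
  public
  virtual
}

enum LlmStatus {
  active
  inactive
  hibernating // reserved for v2 wake-on-demand
}

model Llm {
  // ...existing fields unchanged...
  kind              LlmKind   @default(public)
  status            LlmStatus @default(active)
  providerSessionId String?
  lastHeartbeatAt   DateTime?
  inactiveSince     DateTime?

  @@index([kind, status])      // GC sweep
  @@index([providerSessionId]) // sticky-reconnect lookup
}
```

Because both new columns default to the public/active pair, the backfill for existing rows falls out of the schema itself.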