feat: virtual LLMs v2 (wake-on-demand) #65

Merged
michal merged 3 commits from feat/virtual-llm-v2-wake into main 2026-04-27 14:21:02 +00:00
Owner

Summary

v2 of the virtual-LLM feature. A provider with a hibernating backend (a vLLM server that suspends when idle, an Ollama instance that exits when nothing is connected, …) declares a wake recipe in mcplocal's local config. When that provider's `isAvailable()` returns false at boot, the row is published as `status=hibernating`. The next inference request triggers a `wake` SSE task; mcplocal runs the recipe, polls until the backend comes up, then mcpd flips the row to `active` and relays the inference.

Two recipe types:

  • `http` — POST to an external "wake controller" (a sleep-monitor sidecar, etc.).
  • `command` — spawn a local process (`systemctl --user start vllm`, `wakeonlan ...`).
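For illustration, a hypothetical pair of provider entries covering both shapes (the provider names, URL, and top-level key are made up; the `wake` fields match the Stage 1 schema in the commits below):

    # Sketch only; provider names, URL, and surrounding shape are illustrative.
    providers:
      - name: vllm-local
        wake:
          type: http
          url: http://sleep-monitor:8080/wake   # external wake controller
          maxWaitSeconds: 60
      - name: ollama-local
        wake:
          type: command
          command: systemctl
          args: ["--user", "start", "ollama"]
          maxWaitSeconds: 60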

Concurrent infers for the same hibernating Llm share one wake task — the `wakeInFlight` map dedupes by Llm name.

Stages

  • v2 Stage 1 — wake recipe + execution (`af0fabd`): config schema; the registrar publishes `initialStatus=hibernating` when `isAvailable() === false` AND a recipe is configured, and runs the recipe + polls availability when a wake task arrives. 3 new registrar tests.
  • v2 Stage 2 — wake-before-infer (`db839af`): mcpd's `enqueueInferTask` sees `status=hibernating`, dispatches a `wake` task, awaits its result, flips the row to active, then dispatches the original infer. Concurrent dedup. 4 new service tests.
  • v2 Stage 3 — smoke + docs (`e0cfe0b`): live-cluster smoke spins a fake "wake controller" and asserts the full hibernating → wake → active → infer path. `docs/virtual-llms.md` gains the wake-recipe section.

Test plan

  • [x] mcpd: 833/833
  • [x] mcplocal: registrar 8/8 (was 5, +3 v2)
  • [x] Workspace: 2050/2050 across 152 files (was 2043, +7 from v2 stages)
  • [x] Typecheck clean across all packages
  • [ ] Smoke (live cluster): `virtual-llm.smoke > wake-on-demand` after deploy.

🤖 Generated with Claude Code

michal added 3 commits 2026-04-27 14:20:41 +00:00
First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.

Config (LlmProviderFileEntry):
- New `wake` block on a published provider:
    wake:
      type: http        # or: command
      url: ...           # http only
      method: POST       # http only, default POST
      headers: {...}     # http only
      body: ...          # http only
      command: ...       # command only
      args: [...]        # command only
      maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires
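
A rough TypeScript shape for that block (a sketch inferred from the fields above, not the actual schema source):

    // Sketch; inferred from the YAML fields above, not the real schema code.
    type WakeRecipe =
      | {
          type: "http";
          url: string;
          method?: string;                  // default POST
          headers?: Record<string, string>;
          body?: string;
          maxWaitSeconds?: number;          // poll budget after the recipe fires
        }
      | {
          type: "command";
          command: string;
          args?: string[];
          maxWaitSeconds?: number;
        };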

Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
  returns false report initialStatus=hibernating to mcpd. Without a
  wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
  spawn), then polls isAvailable() up to maxWaitSeconds, sending a
  heartbeat each loop so mcpd's GC sweep doesn't time us out
  mid-boot. Reports { ok, ms } on success or { error } on
  timeout/recipe failure via the existing _provider-task/:id/result.
- Replaces the v1 stub that rejected wake tasks with "not implemented".
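
Roughly, the two behaviors above look like this (a sketch; `runRecipe`, `sendHeartbeat`, and `postResult` are assumed helper names, not the real registrar API):

    // Sketch; the declared names are assumptions, not the real registrar API.
    declare const provider: { isAvailable(): Promise<boolean> };
    declare const entry: { wake?: { maxWaitSeconds?: number } };
    declare function runRecipe(recipe: NonNullable<typeof entry.wake>): Promise<void>;
    declare function sendHeartbeat(): Promise<void>;
    declare function postResult(taskId: string, body: unknown): Promise<void>;

    // Publish-time selection: hibernating only when a recipe exists AND the
    // backend is down; everything else stays active (v1 behavior).
    async function chooseInitialStatus(): Promise<"active" | "hibernating"> {
      return entry.wake && !(await provider.isAvailable()) ? "hibernating" : "active";
    }

    async function handleWakeTask(taskId: string): Promise<void> {
      if (!entry.wake) return postResult(taskId, { error: "no wake recipe" });
      const started = Date.now();
      try {
        await runRecipe(entry.wake); // HTTP request or child-process spawn
        const deadline = started + (entry.wake.maxWaitSeconds ?? 60) * 1000;
        while (Date.now() < deadline) {
          await sendHeartbeat(); // keep mcpd's GC sweep from reaping us mid-boot
          if (await provider.isAvailable()) {
            return postResult(taskId, { ok: true, ms: Date.now() - started });
          }
          await new Promise((r) => setTimeout(r, 1_000));
        }
        return postResult(taskId, { error: "timed out waiting for backend" });
      } catch (err) {
        return postResult(taskId, { error: String(err) });
      }
    }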

mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
  'hibernating'). The register/upsert path uses it for both new and
  reconnecting rows. Defaults to 'active' so v1 publishers still
  work unchanged.
- Provider-register route's coercer accepts the new field.
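
In type terms the change is small (sketch; the existing fields are elided):

    // Sketch; existing fields elided.
    interface RegisterProviderInput {
      // ...existing fields...
      /** Defaults to "active", so v1 publishers register unchanged. */
      initialStatus?: "active" | "hibernating";
    }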

Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when wake configured + unavailable, active otherwise,
active when no wake even if unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — the `wakeInFlight` map
dedupes by Llm name.

State machine in enqueueInferTask:
  active        → push infer task immediately (existing path).
  inactive      → 503, publisher offline (existing path).
  hibernating   → ensureAwake() → push infer task (new in v2).
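
As a self-contained sketch (names approximate the real service code; the 503 is shown as a plain error):

    // Sketch of the dispatch; names approximate the real service code.
    type RowStatus = "active" | "inactive" | "hibernating";

    async function enqueueInferTask(
      row: { status: RowStatus; llmName: string },
      pushInferTask: () => Promise<void>,
      ensureAwake: (llmName: string) => Promise<void>,
    ): Promise<void> {
      switch (row.status) {
        case "active":
          return pushInferTask();                      // existing path
        case "inactive":
          throw new Error("503: publisher offline");   // existing path
        case "hibernating":
          await ensureAwake(row.llmName);              // new in v2: wake first
          return pushInferTask();
      }
    }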

ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
  active + bumps lastHeartbeatAt, so all queued + future infers
  hit the active path. On non-2xx or service.failTask, the row
  stays hibernating (next request retries).
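
A minimal sketch of the dedup and the wake round-trip (`allocateTask`, `sseHandle`, `awaitResult`, and `markActive` are stand-ins for the real PendingTask plumbing):

    // Sketch; the declared names stand in for the real PendingTask plumbing.
    declare function allocateTask(): string;
    declare const sseHandle: { push(msg: unknown): void };
    declare function awaitResult(taskId: string): Promise<void>;
    declare function markActive(llmName: string): void;

    const wakeInFlight = new Map<string, Promise<void>>();

    function ensureAwake(llmName: string): Promise<void> {
      let pending = wakeInFlight.get(llmName);
      if (!pending) {
        pending = runWake(llmName).finally(() => wakeInFlight.delete(llmName));
        wakeInFlight.set(llmName, pending);
      }
      return pending; // concurrent infers for the same Llm await one wake
    }

    async function runWake(llmName: string): Promise<void> {
      const taskId = allocateTask();
      sseHandle.push({ kind: "wake", taskId, llmName });
      await awaitResult(taskId); // rejects on failure → row stays hibernating
      markActive(llmName);       // flip to active + bump lastHeartbeatAt
    }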

Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat: virtual-LLM v2 smoke + docs (v2 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m43s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 5m28s
CI/CD / publish (pull_request) Has been skipped
e0cfe0ba4d
Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.

Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins a tiny in-process HTTP "wake controller"; the published
  provider's isAvailable() returns false until the wake POST flips
  the bool. Asserts:
    1. Provider publishes as kind=virtual / status=hibernating.
    2. First inference triggers the wake recipe, the recipe POSTs
       to the controller, the provider becomes available, mcpd
       relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.
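
The fake controller is only a few lines (sketch; the route and wiring are illustrative, not the actual smoke-test code):

    // Sketch of the in-process wake controller; route/wiring illustrative.
    import { createServer } from "node:http";

    let backendUp = false;

    // What the published provider's isAvailable() returns in the smoke test:
    export const isAvailable = async (): Promise<boolean> => backendUp;

    const wakeController = createServer((req, res) => {
      if (req.method === "POST" && req.url === "/wake") {
        backendUp = true; // the provider's wake recipe POSTs here
        res.writeHead(200).end("ok");
      } else {
        res.writeHead(404).end();
      }
    });
    wakeController.listen(0); // ephemeral port; closed again in afterAll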

Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
  "reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
  recipe types (http + command), the wake-then-infer flow diagram,
  concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.

Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).

v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit 45c7737ee1 into main 2026-04-27 14:21:02 +00:00

Reference: michal/mcpctl#65