feat: virtual LLMs v2 (wake-on-demand) #65

Merged
michal merged 3 commits from feat/virtual-llm-v2-wake into main 2026-04-27 14:21:02 +00:00

3 Commits

Author SHA1 Message Date
Michal
e0cfe0ba4d feat: virtual-LLM v2 smoke + docs (v2 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m43s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Successful in 5m28s
CI/CD / publish (pull_request) Has been skipped
Closes v2 (wake-on-demand). Same shape as v1's stage 6: smoke
exercises the live-cluster path, docs lose the "v2 reserved" caveat
and gain a full wake-recipe section.

Smoke (virtual-llm.smoke.test.ts):
- New "wake-on-demand" describe block runs alongside the v1 tests.
- Spins up a tiny in-process HTTP "wake controller"; the published
  provider's isAvailable() returns false until the wake POST flips
  the bool. Asserts:
    1. Provider publishes as kind=virtual / status=hibernating.
    2. First inference triggers the wake recipe, the recipe POSTs
       to the controller, the provider becomes available, mcpd
       relays the inference, and the row settles to active.
- Cleans up the row + wake server in afterAll.
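A rough sketch of such an in-process wake controller (startWakeController and the /wake route are hypothetical names; the actual smoke test's helper may be shaped differently):

```typescript
import http from "node:http";

// Tiny HTTP server whose single POST endpoint flips an availability flag,
// standing in for the cluster-side wake target in the smoke test.
export function startWakeController() {
  let awake = false;
  const server = http.createServer((req, res) => {
    if (req.method === "POST" && req.url === "/wake") {
      awake = true; // the wake recipe's POST lands here
      res.writeHead(200).end("ok");
      return;
    }
    res.writeHead(404).end();
  });
  return {
    // the published provider's isAvailable() would delegate to this
    isAvailable: () => awake,
    listen: () =>
      new Promise<number>((resolve) => {
        server.listen(0, () => {
          const addr = server.address();
          resolve(typeof addr === "object" && addr !== null ? addr.port : 0);
        });
      }),
    close: () => new Promise<void>((r) => server.close(() => r())),
  };
}
```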

Docs (docs/virtual-llms.md):
- Lifecycle table updates the `hibernating` description from
  "reserved for v2" to the actual v2 semantics.
- New "Wake-on-demand (v2)" section: configuration shapes for both
  recipe types (http + command), the wake-then-infer flow diagram,
  concurrent-infer dedup, failure semantics.
- Roadmap drops v2; v3-v5 still listed.

Workspace: 2050/2050 (smoke runs separately; the new SSE-based wake
test runs only against a live cluster, not under `pnpm test:run`).

v2 closes. v3 = virtual agents, v4 = LB pool by model, v5 = queue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:20:18 +01:00
Michal
db839afc57 feat(mcpd): wake-before-infer for hibernating virtual LLMs (v2 Stage 2)
Second half of v2. mcpd now dispatches a `wake` task on the SSE
control channel when an inference request hits a row whose
status=hibernating, waits for the publisher to confirm readiness,
then proceeds with the infer task. Concurrent infers for the same
hibernating Llm share a single wake task — `wakeInFlight` map
dedupes by Llm name.

State machine in enqueueInferTask:
  active        → push infer task immediately (existing path).
  inactive      → 503, publisher offline (existing path).
  hibernating   → ensureAwake() → push infer task (new in v2).
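The three-way branch can be sketched as a pure decision function (decideInferPath and the decision shape are illustrative names, not the actual mcpd API):

```typescript
type LlmStatus = "active" | "inactive" | "hibernating";

type InferDecision =
  | { action: "infer" }                   // push the infer task now
  | { action: "reject"; httpStatus: 503 } // publisher offline
  | { action: "wake-then-infer" };        // new v2 path

// Mirrors the state machine above: only hibernating rows take the wake detour.
export function decideInferPath(status: LlmStatus): InferDecision {
  switch (status) {
    case "active":
      return { action: "infer" };
    case "inactive":
      return { action: "reject", httpStatus: 503 };
    case "hibernating":
      return { action: "wake-then-infer" };
  }
}
```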

ensureAwake/runWake (private):
- Allocates a fresh taskId on the existing PendingTask plumbing.
- Pushes `{ kind: "wake", taskId, llmName }` on the SSE handle.
- Awaits the publisher's result POST. On 2xx, flips the row to
  active + bumps lastHeartbeatAt, so all queued + future infers
  hit the active path. On non-2xx or service.failTask, the row
  stays hibernating (next request retries).

Tests: 4 new in virtual-llm-service.test.ts cover happy path
(wake → infer in order), concurrent dedup (3 parallel infers, 1
wake task), wake failure surfaces to all queued infers and leaves
the row hibernating, inactive ≠ hibernating (still rejects with 503,
no wake attempt). 22/22 service tests, 2050/2050 workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:18:24 +01:00
Michal
af0fabd84f feat(mcplocal+mcpd): wake-recipe config + wake-task execution (v2 Stage 1)
First half of v2 — mcplocal can now declare a hibernating backend and
respond to a `wake` task by running a configured recipe. v2 Stage 2
will wire mcpd to dispatch the wake task before relaying inference.

Config (LlmProviderFileEntry):
- New `wake` block on a published provider:
    wake:
      type: http        # or: command
      url: ...           # http only
      method: POST       # http only, default POST
      headers: {...}     # http only
      body: ...          # http only
      command: ...       # command only
      args: [...]        # command only
      maxWaitSeconds: 60 # how long to poll isAvailable() after wake fires
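One plausible TypeScript shape for that block (field names from the YAML above; the discriminated-union split and the defaulting helper are assumptions, not the actual LlmProviderFileEntry types):

```typescript
type HttpWake = {
  type: "http";
  url: string;
  method?: string; // default POST
  headers?: Record<string, string>;
  body?: string;
};

type CommandWake = {
  type: "command";
  command: string;
  args?: string[];
};

export type WakeRecipe = (HttpWake | CommandWake) & {
  maxWaitSeconds?: number; // default 60: how long to poll isAvailable()
};

// Apply the documented defaults (method=POST, maxWaitSeconds=60).
export function withWakeDefaults(w: WakeRecipe) {
  const base = { ...w, maxWaitSeconds: w.maxWaitSeconds ?? 60 };
  if (base.type === "http") base.method = base.method ?? "POST";
  return base;
}
```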

Registrar (mcplocal):
- At publish time, providers with a wake recipe whose isAvailable()
  returns false report initialStatus=hibernating to mcpd. Without a
  wake recipe (legacy v1) or when already up, status stays active.
- handleWakeTask: runs the recipe (HTTP request OR child-process
  spawn), then polls isAvailable() up to maxWaitSeconds, sending a
  heartbeat each loop so mcpd's GC sweep doesn't time us out
  mid-boot. Reports { ok, ms } on success or { error } on
  timeout/recipe failure via the existing _provider-task/:id/result.
- Replaces the v1 stub that rejected wake tasks with "not implemented".
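The poll-with-heartbeat loop can be sketched like this (waitUntilAvailable and all parameter names are assumptions; the real handleWakeTask also runs the recipe first and posts the result to _provider-task/:id/result):

```typescript
// After firing the wake recipe, poll isAvailable() up to maxWaitSeconds,
// sending a heartbeat each iteration so mcpd's GC sweep doesn't reap the
// provider mid-boot. Returns the { ok, ms } / { error } shapes described above.
export async function waitUntilAvailable(opts: {
  isAvailable: () => Promise<boolean> | boolean;
  sendHeartbeat: () => Promise<void>;
  maxWaitSeconds: number;
  pollMs?: number;
}): Promise<{ ok: true; ms: number } | { error: string }> {
  const start = Date.now();
  const deadline = start + opts.maxWaitSeconds * 1000;
  while (Date.now() <= deadline) {
    await opts.sendHeartbeat(); // keep the row alive during boot
    if (await opts.isAvailable()) {
      return { ok: true, ms: Date.now() - start };
    }
    await new Promise((r) => setTimeout(r, opts.pollMs ?? 500));
  }
  return { error: "wake timed out before isAvailable() returned true" };
}
```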

mcpd VirtualLlmService:
- RegisterProviderInput gains optional initialStatus ('active' |
  'hibernating'). The register/upsert path uses it for both new and
  reconnecting rows. Defaults to 'active' so v1 publishers still
  work unchanged.
- Provider-register route's coercer accepts the new field.

Tests: 3 new in registrar.test.ts cover initialStatus selection
(hibernating when wake configured + unavailable, active otherwise,
active when no wake even if unavailable). 8/8 registrar tests, 833/833
mcpd unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 15:15:46 +01:00