feat(cli+docs+smoke): inference-task CLI + GC ticker + smoke + docs (v5 Stage 4)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m46s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Failing after 7m0s
CI/CD / publish (pull_request) Has been skipped

CLI surface for the durable queue:

- `mcpctl get tasks` — table view (ID, STATUS, POOL, LLM, MODEL,
  STREAM, AGE, WORKER). Aliases `task`, `tasks`, `inference-task`,
  `inference-tasks` all normalize to the canonical plural so URL
  construction works uniformly. RESOURCE_ALIASES + completions
  generator updated.
- `mcpctl chat-llm <name> --async -m <msg>` — enqueue and exit. stdout
  is just the task id (pipeable into `xargs mcpctl get task`); stderr
  carries human-readable status. REPL mode is rejected for --async
  (fire-and-forget doesn't make sense without -m).
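
The two CLI behaviors above can be sketched as follows. This is a hypothetical sketch: `normalizeResource`, `validateAsyncChat`, and the exact map contents are illustrative names, not the actual mcpctl internals.

```typescript
// Sketch of resource-alias normalization: every task alias resolves to
// one canonical plural so URL paths are built uniformly.
const RESOURCE_ALIASES: Record<string, string> = {
  "task": "inference-tasks",
  "tasks": "inference-tasks",
  "inference-task": "inference-tasks",
  "inference-tasks": "inference-tasks",
};

function normalizeResource(name: string): string {
  const canonical = RESOURCE_ALIASES[name.toLowerCase()];
  if (!canonical) throw new Error(`unknown resource: ${name}`);
  return canonical;
}

// --async is fire-and-forget, so an inline -m message is mandatory;
// REPL mode is rejected up front.
function validateAsyncChat(opts: { async: boolean; message?: string }): void {
  if (opts.async && !opts.message) {
    throw new Error("--async requires -m; REPL mode is not supported");
  }
}
```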

GC ticker in mcpd: 5-min interval. Pending tasks older than the 1 h
queue timeout flip to error with a clear message; terminal tasks older
than the 7 d retention window are deleted. Both queries are
index-backed.
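
One GC tick amounts to the following classification (a minimal sketch; `sweepTasks` and the `QueueTask` shape are assumed names, and the real sweep runs two index-backed SQL queries rather than scanning rows in memory):

```typescript
type Status = "pending" | "running" | "completed" | "error" | "cancelled";

interface QueueTask {
  id: string;
  status: Status;
  updatedAt: number; // epoch ms
}

const QUEUE_TIMEOUT_MS = 60 * 60 * 1000;       // 1 h: pending -> error
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000;  // 7 d: terminal rows deleted
const TERMINAL: Status[] = ["completed", "error", "cancelled"];

function sweepTasks(tasks: QueueTask[], now: number) {
  // Pending rows past the queue timeout are flipped to error...
  const timedOut = tasks.filter(
    (t) => t.status === "pending" && now - t.updatedAt > QUEUE_TIMEOUT_MS,
  );
  // ...and terminal rows past the retention window are deleted.
  const expired = tasks.filter(
    (t) => TERMINAL.includes(t.status) && now - t.updatedAt > RETENTION_MS,
  );
  return { timedOut, expired };
}
```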

Crash fix uncovered by the smoke: when the async route doesn't await
ref.done, a later cancel/error rejects the in-flight Promise with no
handler attached, and the resulting unhandledRejection crashed mcpd.
The route now attaches a no-op `.catch`, so the legacy `done` semantic
still works for sync callers (chat, direct infer) without taking out
the process for async ones. EnqueueInferOptions also gained an
explicit `ownerId` field so the async API can stamp the authenticated
user on the row instead of inheriting 'system' from the constructor's
resolveOwner; without this, every GET/DELETE from the original caller
would 404 due to foreign-owner mismatch.
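
The fix boils down to the standard no-op-handler pattern (a sketch with assumed names; `TaskRef` and `detachForAsync` are illustrative, not the mcpd source):

```typescript
interface TaskRef {
  id: string;
  done: Promise<string>;
}

// For async callers that will never await `done`, mark any future
// rejection as observed so a later cancel/error does not surface as an
// unhandledRejection and kill the process.
function detachForAsync(ref: TaskRef): string {
  // .catch(() => {}) only marks the rejection handled; anyone who
  // separately awaits ref.done still sees the rejection normally.
  ref.done.catch(() => {});
  return ref.id;
}
```

A sync caller keeps awaiting `ref.done` and observes the rejection; the no-op handler just guarantees at least one handler exists.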

Smoke (tests/smoke/inference-task.smoke.test.ts):

  1. POST /inference-tasks while no worker bound → row=pending.
  2. Bring a registrar online → bindSession drain claims and
     dispatches → worker complete()s → row=completed → GET returns
     the assistant body.
  3. Stop worker, enqueue, DELETE → row=cancelled, persisted.
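
The transitions the smoke exercises (plus the worker-disconnect revert) form a small state machine; a hypothetical sketch, with the status names and `canTransition` assumed for illustration:

```typescript
type TaskStatus = "pending" | "running" | "completed" | "error" | "cancelled";

// pending -> running -> completed is the happy path; DELETE cancels a
// queued row; a worker dropping mid-task reverts running -> pending.
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ["running", "cancelled", "error"],
  running: ["completed", "error", "cancelled", "pending"],
  completed: [],
  error: [],
  cancelled: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```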

docs/inference-tasks.md (new): full data model, lifecycle diagram,
async API reference, CLI examples, RBAC table, GC defaults, and the
v5 limitations / v6 roadmap. Cross-linked from virtual-llms.md and
agents.md.

Tests + smoke: mcpd 893/893, mcplocal 723/723, cli 437/437, full
smoke 146/146 (was 144, +2 new task smoke). Live mcpd verified via
manual curl: enqueue → cancel → re-fetch — no crash, owner scoping
returns 404 on foreign ids, GC ticker logs at info when it sweeps.

v5 complete: durable queue (Stage 1) + VirtualLlmService rewire
(Stage 2) + async API & RBAC (Stage 3) + CLI/GC/smoke/docs (Stage 4).
This commit is contained in:
Michal
2026-04-28 15:25:09 +01:00
parent 1dcfdc8b05
commit 7320b50dac
14 changed files with 654 additions and 27 deletions

@@ -418,10 +418,25 @@ unset. The agent's pool is then size 1 and dispatch is deterministic.
Pool membership is opt-in via `poolName` — the default behavior is
single-Llm.
## Durable inference task queue (v5)

Every infer call (sync `/llms/<name>/infer`, agent chat, async
`POST /inference-tasks`) now lands as a row in `InferenceTask`. mcpd's
in-memory request map is gone — the DB is the source of truth.

Workers (mcplocal SSE sessions) drain queued rows when they bind, so
tasks queued while a pool was empty drain when the first worker shows
up. mcpd restart no longer drops in-flight work; worker disconnect
mid-task reverts the row to pending instead of failing the caller.

See [inference-tasks.md](./inference-tasks.md) for the full data
model, async API, lifecycle, RBAC, and CLI surface.

## Roadmap (later stages)

- **v5 — Task queue**: persisted requests for hibernating/saturated
  pools. Workers pull tasks of their model when they come online.
  (LB pool by name landed in v4; durable task queue landed in v5.)
- **v6** — multi-instance mcpd via pg `LISTEN/NOTIFY` (replaces the
  per-instance EventEmitter wakeup), per-session worker capacity,
  remote cancel protocol over the SSE channel.

## API surface (v1)