feat(cli+docs+smoke): inference-task CLI + GC ticker + smoke + docs (v5 Stage 4)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m46s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Failing after 7m0s
CI/CD / publish (pull_request) Has been skipped

CLI surface for the durable queue:

- `mcpctl get tasks` — table view (ID, STATUS, POOL, LLM, MODEL,
  STREAM, AGE, WORKER). Aliases `task`, `tasks`, `inference-task`,
  `inference-tasks` all normalize to the canonical plural so URL
  construction works uniformly. RESOURCE_ALIASES + completions
  generator updated.
- `mcpctl chat-llm <name> --async -m <msg>` — enqueue and exit. stdout
  is just the task id (pipeable into `xargs mcpctl get task`); stderr
  carries human-readable status. REPL mode is rejected for --async
  (fire-and-forget doesn't make sense without -m).
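
The two CLI behaviors above can be sketched as follows. This is a hypothetical sketch: `normalizeResource`, `validateAsyncChat`, and the exact map contents are illustrative names, not the actual mcpctl internals.

```typescript
// Sketch of resource-alias normalization: every task alias resolves to
// one canonical plural so URL paths are built uniformly.
const RESOURCE_ALIASES: Record<string, string> = {
  "task": "inference-tasks",
  "tasks": "inference-tasks",
  "inference-task": "inference-tasks",
  "inference-tasks": "inference-tasks",
};

function normalizeResource(name: string): string {
  const canonical = RESOURCE_ALIASES[name.toLowerCase()];
  if (!canonical) throw new Error(`unknown resource: ${name}`);
  return canonical;
}

// --async is fire-and-forget, so an inline -m message is mandatory;
// REPL mode is rejected up front.
function validateAsyncChat(opts: { async: boolean; message?: string }): void {
  if (opts.async && !opts.message) {
    throw new Error("--async requires -m; REPL mode is not supported");
  }
}
```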

GC ticker in mcpd: 5-min interval. Pending tasks older than the 1 h
queue timeout flip to error with a clear message; terminal tasks older
than the 7 d retention window are deleted. Both queries are
index-backed.
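
One GC tick amounts to the following classification (a minimal sketch; `sweepTasks` and the `QueueTask` shape are assumed names, and the real sweep runs two index-backed SQL queries rather than scanning rows in memory):

```typescript
type Status = "pending" | "running" | "completed" | "error" | "cancelled";

interface QueueTask {
  id: string;
  status: Status;
  updatedAt: number; // epoch ms
}

const QUEUE_TIMEOUT_MS = 60 * 60 * 1000;       // 1 h: pending -> error
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000;  // 7 d: terminal rows deleted
const TERMINAL: Status[] = ["completed", "error", "cancelled"];

function sweepTasks(tasks: QueueTask[], now: number) {
  // Pending rows past the queue timeout are flipped to error...
  const timedOut = tasks.filter(
    (t) => t.status === "pending" && now - t.updatedAt > QUEUE_TIMEOUT_MS,
  );
  // ...and terminal rows past the retention window are deleted.
  const expired = tasks.filter(
    (t) => TERMINAL.includes(t.status) && now - t.updatedAt > RETENTION_MS,
  );
  return { timedOut, expired };
}
```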

Crash fix uncovered by the smoke: when the async route doesn't await
ref.done, a later cancel/error rejects the in-flight Promise with no
handler attached, and the resulting unhandledRejection crashed mcpd.
The route now attaches a no-op `.catch`, so the legacy `done` semantic
still works for sync callers (chat, direct infer) without taking out
the process for async ones. EnqueueInferOptions also gained an
explicit `ownerId` field so the async API can stamp the authenticated
user on the row instead of inheriting 'system' from the constructor's
resolveOwner; without this, every GET/DELETE from the original caller
would 404 due to foreign-owner mismatch.
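
The fix boils down to the standard no-op-handler pattern (a sketch with assumed names; `TaskRef` and `detachForAsync` are illustrative, not the mcpd source):

```typescript
interface TaskRef {
  id: string;
  done: Promise<string>;
}

// For async callers that will never await `done`, mark any future
// rejection as observed so a later cancel/error does not surface as an
// unhandledRejection and kill the process.
function detachForAsync(ref: TaskRef): string {
  // .catch(() => {}) only marks the rejection handled; anyone who
  // separately awaits ref.done still sees the rejection normally.
  ref.done.catch(() => {});
  return ref.id;
}
```

A sync caller keeps awaiting `ref.done` and observes the rejection; the no-op handler just guarantees at least one handler exists.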

Smoke (tests/smoke/inference-task.smoke.test.ts):

  1. POST /inference-tasks while no worker bound → row=pending.
  2. Bring a registrar online → bindSession drain claims and
     dispatches → worker complete()s → row=completed → GET returns
     the assistant body.
  3. Stop worker, enqueue, DELETE → row=cancelled, persisted.
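
The transitions the smoke exercises (plus the worker-disconnect revert) form a small state machine; a hypothetical sketch, with the status names and `canTransition` assumed for illustration:

```typescript
type TaskStatus = "pending" | "running" | "completed" | "error" | "cancelled";

// pending -> running -> completed is the happy path; DELETE cancels a
// queued row; a worker dropping mid-task reverts running -> pending.
const TRANSITIONS: Record<TaskStatus, TaskStatus[]> = {
  pending: ["running", "cancelled", "error"],
  running: ["completed", "error", "cancelled", "pending"],
  completed: [],
  error: [],
  cancelled: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```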

docs/inference-tasks.md (new): full data model, lifecycle diagram,
async API reference, CLI examples, RBAC table, GC defaults, and the
v5 limitations / v6 roadmap. Cross-linked from virtual-llms.md and
agents.md.

Tests + smoke: mcpd 893/893, mcplocal 723/723, cli 437/437, full
smoke 146/146 (was 144, +2 new task smoke). Live mcpd verified via
manual curl: enqueue → cancel → re-fetch — no crash, owner scoping
returns 404 on foreign ids, GC ticker logs at info when it sweeps.

v5 complete: durable queue (Stage 1) + VirtualLlmService rewire
(Stage 2) + async API & RBAC (Stage 3) + CLI/GC/smoke/docs (Stage 4).
This commit is contained in:
Michal
2026-04-28 15:25:09 +01:00
parent 1dcfdc8b05
commit 7320b50dac
14 changed files with 654 additions and 27 deletions

@@ -418,10 +418,25 @@ unset. The agent's pool is then size 1 and dispatch is deterministic.
Pool membership is opt-in via `poolName` — the default behavior is
single-Llm.
## Durable inference task queue (v5)

Every infer call (sync `/llms/<name>/infer`, agent chat, async
`POST /inference-tasks`) now lands as a row in `InferenceTask`. mcpd's
in-memory request map is gone — the DB is the source of truth.

Workers (mcplocal SSE sessions) drain queued rows when they bind, so
tasks queued while a pool was empty drain when the first worker shows
up. mcpd restart no longer drops in-flight work; worker disconnect
mid-task reverts the row to pending instead of failing the caller.

See [inference-tasks.md](./inference-tasks.md) for the full data
model, async API, lifecycle, RBAC, and CLI surface.

## Roadmap (later stages)

- **v5 — Task queue**: persisted requests for hibernating/saturated
  pools. Workers pull tasks of their model when they come online.
  (LB pool by name landed in v4; durable task queue landed in v5.)
- **v6** — multi-instance mcpd via pg `LISTEN/NOTIFY` (replaces the
  per-instance EventEmitter wakeup), per-session worker capacity,
  remote cancel protocol over the SSE channel.

## API surface (v1)