feat(cli+docs+smoke): inference-task CLI + GC ticker + smoke + docs (v5 Stage 4)
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m12s
CI/CD / typecheck (pull_request) Successful in 2m46s
CI/CD / smoke (pull_request) Failing after 1m44s
CI/CD / build (pull_request) Failing after 7m0s
CI/CD / publish (pull_request) Has been skipped
CLI surface for the durable queue:
- `mcpctl get tasks` — table view (ID, STATUS, POOL, LLM, MODEL,
STREAM, AGE, WORKER). Aliases `task`, `tasks`, `inference-task`,
`inference-tasks` all normalize to the canonical plural so URL
construction works uniformly. RESOURCE_ALIASES + completions
generator updated.
- `mcpctl chat-llm <name> --async -m <msg>` — enqueue and exit. stdout
is just the task id (pipeable into `xargs mcpctl get task`); stderr
carries human-readable status. REPL mode is rejected for --async
(fire-and-forget doesn't make sense without -m).
GC ticker in mcpd: 5-min interval. Pending tasks past 1 h queue
timeout flip to error with a clear message; terminal tasks past 7 d
retention get deleted. Both queries are index-backed.
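
The two GC rules above can be sketched as a pure function over an in-memory list. This is only an illustration: the real sweep runs as index-backed DB queries, and the `TaskRow` shape, the `sweep` name, the error text, and the choice to measure the timeout from `createdAt` and retention from `finishedAt` are all assumptions, not taken from the mcpd source.

```typescript
// Hypothetical row shape; the actual InferenceTask columns may differ.
type TaskStatus = "pending" | "completed" | "error" | "cancelled";

interface TaskRow {
  id: string;
  status: TaskStatus;
  createdAt: number;     // epoch ms
  finishedAt?: number;   // epoch ms, set once the task is terminal
  errorMessage?: string;
}

const QUEUE_TIMEOUT_MS = 60 * 60 * 1000;      // 1 h queue timeout
const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // 7 d retention

function isTerminal(status: TaskStatus): boolean {
  return status === "completed" || status === "error" || status === "cancelled";
}

// One GC pass: expire stale pending rows, drop old terminal rows.
function sweep(
  rows: TaskRow[],
  now: number,
): { kept: TaskRow[]; timedOut: number; deleted: number } {
  const kept: TaskRow[] = [];
  let timedOut = 0;
  let deleted = 0;
  for (const row of rows) {
    if (row.status === "pending" && now - row.createdAt > QUEUE_TIMEOUT_MS) {
      // Pending past the queue timeout: flip to error with a clear message.
      kept.push({
        id: row.id,
        status: "error",
        createdAt: row.createdAt,
        finishedAt: now,
        errorMessage: "queue timeout: no worker claimed the task within 1 h",
      });
      timedOut++;
    } else if (
      isTerminal(row.status) &&
      row.finishedAt !== undefined &&
      now - row.finishedAt > RETENTION_MS
    ) {
      // Terminal past retention: delete by not keeping the row.
      deleted++;
    } else {
      kept.push(row);
    }
  }
  return { kept, timedOut, deleted };
}
```

In mcpd a pass like this would run on the 5-min timer; the sketch leaves the timer out so the rules can be checked with a fixed `now`.
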
Crash fix uncovered by the smoke: when the async route doesn't await
ref.done, a later cancel/error rejected the in-flight Promise as
unhandled and crashed mcpd. The route now attaches a no-op `.catch`
so the legacy `done` semantic still works for sync callers (chat,
direct infer) without taking out the process for async ones. The
EnqueueInferOptions also gained an explicit `ownerId` field so the
async API can stamp the authenticated user on the row instead of
inheriting 'system' from the constructor's resolveOwner — without
this, every GET/DELETE from the original caller would 404 due to
foreign-owner mismatch.
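
A minimal sketch of the no-op `.catch` pattern described above. The `TaskRef` shape and the `enqueue` / `handleAsyncEnqueue` names are hypothetical stand-ins, not the actual mcpd code. The point is that `.catch(() => {})` marks the rejection as handled on a derived promise, while the original `done` promise still rejects for any sync caller that awaits it.

```typescript
// Hypothetical reference returned by the queue for an enqueued task.
interface TaskRef {
  id: string;
  done: Promise<string>;          // settles when the task finishes
  cancel(reason: string): void;   // rejects `done`
}

function enqueue(id: string): TaskRef {
  let reject!: (err: Error) => void;
  // The executor runs synchronously, so `reject` is assigned before use.
  const done = new Promise<string>((_resolve, rej) => {
    reject = rej;
  });
  return { id, done, cancel: (reason) => reject(new Error(reason)) };
}

// Async route: hand back the task right away and never await `done`.
function handleAsyncEnqueue(id: string): TaskRef {
  const ref = enqueue(id);
  // Without this handler, a later cancel/error would surface as an
  // unhandled rejection, which can terminate a Node.js process.
  ref.done.catch(() => {});
  return ref;
}
```

Sync callers keep the old behavior: awaiting `ref.done` after a cancel still throws, because the no-op handler is attached to a separate promise chain.
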
Smoke (tests/smoke/inference-task.smoke.test.ts):
1. POST /inference-tasks while no worker bound → row=pending.
2. Bring a registrar online → bindSession drain claims and
dispatches → worker complete()s → row=completed → GET returns
the assistant body.
3. Stop worker, enqueue, DELETE → row=cancelled, persisted.
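
The three smoke steps can be modeled with a toy in-memory queue. Everything here is an assumption made for illustration: the `TaskQueue` class, its method names, the synchronous worker callback, and the `running` status are invented, and the real test goes through HTTP, the DB, and an SSE-bound registrar.

```typescript
type Status = "pending" | "running" | "completed" | "cancelled";

interface Task {
  id: string;
  status: Status;
  prompt: string;
  body?: string; // assistant reply, set on completion
}

type Worker = (prompt: string) => string;

class TaskQueue {
  private readonly tasks = new Map<string, Task>();
  private worker: Worker | undefined;
  private nextId = 1;

  // Every request lands as a pending row first.
  enqueue(prompt: string): Task {
    const task: Task = { id: `task-${this.nextId++}`, status: "pending", prompt };
    this.tasks.set(task.id, task);
    if (this.worker) this.dispatch(task, this.worker);
    return task;
  }

  // Binding a worker drains whatever queued up while the pool was empty.
  bindWorker(worker: Worker): void {
    this.worker = worker;
    this.tasks.forEach((task) => {
      if (task.status === "pending") this.dispatch(task, worker);
    });
  }

  unbindWorker(): void {
    this.worker = undefined;
  }

  // Only pending tasks can be cancelled in this toy model.
  cancel(id: string): boolean {
    const task = this.tasks.get(id);
    if (!task || task.status !== "pending") return false;
    task.status = "cancelled";
    return true;
  }

  get(id: string): Task | undefined {
    return this.tasks.get(id);
  }

  private dispatch(task: Task, worker: Worker): void {
    task.status = "running";
    task.body = worker(task.prompt);
    task.status = "completed";
  }
}

const queue = new TaskQueue();
const first = queue.enqueue("hello");           // step 1: no worker, stays pending
queue.bindWorker((prompt) => `echo: ${prompt}`); // step 2: drain on bind, completes
queue.unbindWorker();
const second = queue.enqueue("later");          // step 3: stop worker, enqueue,
queue.cancel(second.id);                        //         then cancel
```
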
docs/inference-tasks.md (new): full data model, lifecycle diagram,
async API reference, CLI examples, RBAC table, GC defaults, and the
v5 limitations / v6 roadmap. Cross-linked from virtual-llms.md and
agents.md.
Tests + smoke: mcpd 893/893, mcplocal 723/723, cli 437/437, full
smoke 146/146 (was 144, +2 new task smoke). Live mcpd verified via
manual curl: enqueue → cancel → re-fetch — no crash, owner scoping
returns 404 on foreign ids, GC ticker logs at info when it sweeps.
v5 complete: durable queue (Stage 1) + VirtualLlmService rewire
(Stage 2) + async API & RBAC (Stage 3) + CLI/GC/smoke/docs (Stage 4).
@@ -418,10 +418,25 @@ unset. The agent's pool is then size 1 and dispatch is deterministic.
Pool membership is opt-in via `poolName` — the default behavior is
single-Llm.

## Durable inference task queue (v5)

Every infer call (sync `/llms/<name>/infer`, agent chat, async
`POST /inference-tasks`) now lands as a row in `InferenceTask`. mcpd's
in-memory request map is gone — the DB is the source of truth.
Workers (mcplocal SSE sessions) drain queued rows when they bind, so
tasks queued while a pool was empty drain when the first worker shows
up. mcpd restart no longer drops in-flight work; worker disconnect
mid-task reverts the row to pending instead of failing the caller.

See [inference-tasks.md](./inference-tasks.md) for the full data
model, async API, lifecycle, RBAC, and CLI surface.

## Roadmap (later stages)

- **v5 — Task queue**: persisted requests for hibernating/saturated
pools. Workers pull tasks of their model when they come online.
(LB pool by name landed in v4; durable task queue landed in v5.)
- **v6** — multi-instance mcpd via pg `LISTEN/NOTIFY` (replaces the
per-instance EventEmitter wakeup), per-session worker capacity,
remote cancel protocol over the SSE channel.

## API surface (v1)