The persistence + signaling layer for v5. No integration with the
existing in-flight inference path yet — that's Stage 2. This commit
just lands the durable queue underneath, with a state machine that
mcpd's HTTP handlers, the worker result-POST route, and the GC sweep
will all build on.
Schema (src/db/prisma/schema.prisma + migration):
- New `InferenceTask` model + `InferenceTaskStatus` enum
(pending|claimed|running|completed|error|cancelled); a rough field
sketch in TypeScript terms follows this list.
- Routing fields stored at enqueue time so a later rename of
`Llm.poolName` doesn't reroute already-queued work: `poolName`
(effective pool key), `llmName` (pinned target), `model`, `tier`.
- Worker tracking: `claimedBy` (providerSessionId) + `claimedAt`,
cleared on revert.
- Bodies as `Json`: requestBody (always set), responseBody (set at
completion). Streaming chunks are NOT persisted — too expensive at
delta granularity. The final assembled body lands once per task.
- Lifecycle timestamps: createdAt, claimedAt, streamStartedAt,
completedAt. Plus ownerId (RBAC + audit) and agentId (null for
direct chat-llm calls).
- Indexes for the hot paths: (status, poolName) for the dispatcher's
drain query, claimedBy for the disconnect revert, completedAt for
the GC retention sweep, owner/agent for the async API listing.
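For orientation, an illustrative TypeScript mirror of the row built
from the bullets above. The `id` field and the exact nullability are
assumptions; the Prisma model and the generated client types are
authoritative.

```ts
// Illustrative shape only; the generated Prisma client types are
// authoritative.
type InferenceTaskStatus =
  | "pending" | "claimed" | "running"
  | "completed" | "error" | "cancelled";

interface InferenceTaskRow {
  id: string;                    // assumed primary key
  status: InferenceTaskStatus;
  // Routing pinned at enqueue time; a later Llm.poolName rename
  // does not reroute already-queued work.
  poolName: string;
  llmName: string;
  model: string;
  tier: string;
  // Worker tracking; cleared when a claim is reverted.
  claimedBy: string | null;      // providerSessionId
  claimedAt: Date | null;
  // Bodies as Json; streaming deltas are never persisted.
  requestBody: unknown;
  responseBody: unknown | null;  // set at completion
  errorMessage: string | null;   // set on failure, surfaced by waitFor
  // Lifecycle + ownership.
  createdAt: Date;
  streamStartedAt: Date | null;
  completedAt: Date | null;
  ownerId: string;               // RBAC + audit
  agentId: string | null;        // null for direct chat-llm calls
}
```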
Repository (src/mcpd/src/repositories/inference-task.repository.ts):
- CRUD + state transitions as conditional updates (CAS) via
`updateMany`. Two workers racing to claim the same row both run the
UPDATE; whichever the DB serializes first sees affected=1 and gets
the row, while the loser sees 0 and falls through to the next
candidate. No application-level locking required; see the sketch
after this list.
- findPendingForPools(poolNames[]) for the worker drain on bind.
- findHeldBy(claimedBy) for the unbindSession revert.
- findStalePending + findExpiredTerminal for the GC sweep.
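A minimal sketch of the claim CAS, assuming a generated
`prisma.inferenceTask` delegate and lowercase enum literals; the real
repository method name and signature may differ.

```ts
import { PrismaClient } from "@prisma/client";

// The WHERE clause only matches a row that is still pending, so of
// two racing workers exactly one sees count === 1; the loser sees 0
// and moves on to the next candidate.
async function tryClaim(
  prisma: PrismaClient,
  taskId: string,
  providerSessionId: string,
): Promise<boolean> {
  const { count } = await prisma.inferenceTask.updateMany({
    where: { id: taskId, status: "pending" }, // the CAS guard
    data: {
      status: "claimed",
      claimedBy: providerSessionId,
      claimedAt: new Date(),
    },
  });
  return count === 1;
}
```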
Service (src/mcpd/src/services/inference-task.service.ts):
- Owns the in-process EventEmitter that wakes blocked HTTP handlers
when a worker POSTs results. The DB row is the source of truth for
*state*; the EventEmitter just signals "go re-read row X" so we
don't have to poll. Single-instance assumption for v5; pg
LISTEN/NOTIFY is the v6 swap when scaling horizontally — no schema
change needed, just replace the emitter wakeup.
- waitFor(taskId, timeoutMs) returns { done, chunks }: the terminal
promise plus an async iterator of streaming deltas. Throws on cancel
(with a clear message), on error (the worker's errorMessage
propagates), and on timeout. Polls the row once at subscribe time so
an already-terminal task resolves immediately rather than waiting
for an event that will never come. Usage sketch after this list.
- gcSweep flips stale pending rows to error (with a clear message
about the timeout) and deletes terminal rows past retention.
Defaults: 1h pending timeout, 7d terminal retention; both
configurable. Rough sketch below.
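To make the waitFor contract concrete, a hedged sketch of how a
blocked HTTP handler might consume it. The { done, chunks } shape is
from the bullet above; the Express-style `res`, waitFor itself being
async, and `done` resolving to the row are assumptions.

```ts
import type { Response } from "express";

// Assumed shape of the service surface, per the bullets above.
interface WaitForResult {
  done: Promise<{ responseBody: unknown }>;
  chunks: AsyncIterable<string>;
}
interface InferenceTaskServiceLike {
  waitFor(taskId: string, timeoutMs: number): Promise<WaitForResult>;
}

async function streamTaskToClient(
  svc: InferenceTaskServiceLike,
  taskId: string,
  res: Response,
): Promise<void> {
  try {
    // An already-terminal task resolves immediately: waitFor polls
    // the row once at subscribe time.
    const { done, chunks } = await svc.waitFor(taskId, 30_000);

    // Stream deltas to the caller as the worker posts them.
    for await (const delta of chunks) {
      res.write(delta);
    }

    // Terminal state: the final assembled body comes from the row.
    const task = await done;
    res.end(JSON.stringify(task.responseBody));
  } catch (err) {
    // Cancel, worker error (errorMessage propagated), or timeout.
    if (res.headersSent) {
      res.destroy(err as Error);
    } else {
      res.status(502).json({ error: (err as Error).message });
    }
  }
}
```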
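And a rough shape of the two-phase sweep, under the same assumptions
as the claim sketch. Stamping completedAt on timed-out rows (so the
retention delete later picks them up) is an extra assumption, not
something stated above.

```ts
import { PrismaClient } from "@prisma/client";

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

async function gcSweep(
  prisma: PrismaClient,
  now: Date = new Date(),
  pendingTimeoutMs: number = 1 * HOUR_MS,    // default 1h
  terminalRetentionMs: number = 7 * DAY_MS,  // default 7d
): Promise<void> {
  // Phase 1: stale pending rows flip to error with a clear message.
  await prisma.inferenceTask.updateMany({
    where: {
      status: "pending",
      createdAt: { lt: new Date(now.getTime() - pendingTimeoutMs) },
    },
    data: {
      status: "error",
      errorMessage: "task timed out before any worker claimed it",
      completedAt: now,
    },
  });

  // Phase 2: terminal rows past the retention window are deleted.
  await prisma.inferenceTask.deleteMany({
    where: {
      status: { in: ["completed", "error", "cancelled"] },
      completedAt: { lt: new Date(now.getTime() - terminalRetentionMs) },
    },
  });
}
```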
Tests:
- 6 db-level schema tests (defaults, json roundtrip, drain query
shape, claimedBy filter, GC predicate, agentId nullable).
- 13 service tests covering enqueue, the CAS race on tryClaim,
complete/fail/cancel, idempotent terminal transitions, revertHeldBy
on disconnect, and the full waitFor signal lifecycle (immediate
resolve, wake on event, chunk streaming, cancel/error/timeout
paths). Plus a gcSweep test with a fixed clock.
mcpd 881/881 (was 868; +13). db: pool-schema 14/14, plus the 6 new
inference-task-schema tests. The pre-existing failures in
models.test.ts (a Secret FK fixture issue, also failing on main HEAD)
are unrelated.
Stage 2 (next): VirtualLlmService rewires through this. The in-memory
pendingTasks map goes away; enqueue creates a row, dispatch picks an
active session, and the result-POST route updates the row and emits
the wakeup. Worker disconnect reverts its held tasks; worker bind
drains pending ones for its pools.