Compare commits

..

4 Commits

Author SHA1 Message Date
Michal
c7b1bd8e2c feat(mcpd): AgentService virtual methods + GC cascade (v3 Stage 2)
State machine for kind=virtual Agent rows. Mirrors what
VirtualLlmService did for Llms in v1, then wires both lifecycles
together so disconnect/heartbeat/GC cascade through both at once.

AgentRepository:
- create/update accept the new lifecycle fields (kind, providerSessionId,
  status, lastHeartbeatAt, inactiveSince).
- Adds findBySessionId, findByLlmId, findStaleVirtuals, findExpiredInactives.
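
A hedged sketch of that surface in TypeScript; the row shape and exact
signatures are assumptions for illustration (the string unions below stand
in for the reused LlmKind/LlmStatus enums):

type LlmKind = 'public' | 'virtual';
type LlmStatus = 'active' | 'inactive';

interface AgentRow {
  id: string;
  name: string;
  llmId: string;
  kind: LlmKind;
  providerSessionId: string | null;
  status: LlmStatus;
  lastHeartbeatAt: Date | null;
  inactiveSince: Date | null;
}

interface AgentRepository {
  // create/update now accept the lifecycle fields.
  create(input: Partial<AgentRow> & { name: string; llmId: string }): Promise<AgentRow>;
  update(id: string, patch: Partial<AgentRow>): Promise<AgentRow>;
  // New finders backing the cascade and GC paths.
  findBySessionId(sessionId: string): Promise<AgentRow[]>;
  findByLlmId(llmId: string): Promise<AgentRow[]>;
  findStaleVirtuals(cutoff: Date): Promise<AgentRow[]>;    // active, heartbeat older than cutoff
  findExpiredInactives(cutoff: Date): Promise<AgentRow[]>; // inactive since before cutoff
}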

AgentService — new virtual-agent methods:
- registerVirtualAgents(sessionId, inputs, ownerId) — sticky upsert.
  New names insert as kind=virtual/status=active. Existing virtuals
  owned by the same session reactivate; existing inactive virtuals
  from a foreign session can be adopted (sticky reconnect). Refuses
  to overwrite a public agent or a foreign session's still-active
  virtual (HTTP 409). Pinned LLM is resolved via LlmService — caller
  posts Llms first.
- heartbeatVirtualAgents(sessionId) — bumps owned agents on a session
  heartbeat; revives inactive rows.
- markVirtualAgentsInactiveBySession(sessionId) — disconnect cascade.
- deleteVirtualAgentsForLlm(llmId) — defensive cascade for the GC's
  Llm-delete step (Agent.llmId is Restrict).
- gcSweepVirtualAgents() — same shape as VirtualLlmService.gcSweep
  (90s heartbeat-stale → inactive, 4h inactive → delete).
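
Of these, registerVirtualAgents carries the branching. A hedged sketch of
one name's worth of the sticky upsert, against the AgentRepository sketched
above; findByName is assumed from the existing v1 CRUD surface, and
HttpError stands in for whatever 409-carrying error the service really
throws:

class HttpError extends Error {
  constructor(public readonly status: number, message: string) {
    super(message);
  }
}

type VirtualAgentInput = { name: string; llmId: string };

async function upsertVirtualAgent(
  repo: AgentRepository & { findByName(name: string): Promise<AgentRow | null> },
  sessionId: string,
  input: VirtualAgentInput,
): Promise<AgentRow> {
  const existing = await repo.findByName(input.name);

  // New name: insert as kind=virtual / status=active, owned by this session.
  if (!existing) {
    return repo.create({
      ...input,
      kind: 'virtual',
      status: 'active',
      providerSessionId: sessionId,
      lastHeartbeatAt: new Date(),
    });
  }

  // Never overwrite a public agent.
  if (existing.kind === 'public') {
    throw new HttpError(409, `agent '${input.name}' already exists as public`);
  }

  // Same session reactivates; an inactive foreign virtual is adopted
  // (sticky reconnect).
  if (existing.providerSessionId === sessionId || existing.status === 'inactive') {
    return repo.update(existing.id, {
      llmId: input.llmId,
      status: 'active',
      providerSessionId: sessionId,
      lastHeartbeatAt: new Date(),
      inactiveSince: null,
    });
  }

  // A foreign session's still-active virtual is off limits.
  throw new HttpError(409, `agent '${input.name}' is active under another session`);
}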

VirtualLlmService:
- Optional AgentService dependency. heartbeat() now also bumps owned
  agents; unbindSession() flips them inactive. gcSweep() runs the
  agent sweep FIRST (so any agent that would block an Llm delete via
  Restrict is already gone), and adds a defensive
  deleteVirtualAgentsForLlm step right before each Llm delete in case
  an agent's heartbeat lagged its Llm's just enough to escape this
  round's 4h cutoff.
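
In rough TypeScript, with the Llm-side method names assumed, the ordering
looks like:

interface AgentCascade {
  gcSweepVirtualAgents(): Promise<void>;
  deleteVirtualAgentsForLlm(llmId: string): Promise<void>;
}

interface LlmSweepStore {
  findExpiredVirtualLlms(): Promise<Array<{ id: string }>>;
  deleteLlm(id: string): Promise<void>;
}

// Agents first, so nothing holds the Restrict FK when the Llm deletes run;
// then a defensive per-Llm cascade as the second net.
async function gcSweep(agents: AgentCascade, llms: LlmSweepStore): Promise<void> {
  // 1) Agent sweep FIRST: 90s heartbeat-stale to inactive, 4h inactive to delete.
  await agents.gcSweepVirtualAgents();

  // 2) Llm sweep. An agent whose heartbeat lagged its Llm's just enough to
  //    escape this round's 4h cutoff would still hold the FK, so cascade
  //    right before each delete.
  for (const llm of await llms.findExpiredVirtualLlms()) {
    await agents.deleteVirtualAgentsForLlm(llm.id);
    await llms.deleteLlm(llm.id);
  }
}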

main.ts:
- VirtualLlmService construction moves below AgentService so it can
  receive the cascade dependency.
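
Constructor shapes below are assumptions; only the ordering is the point:

// Ordering sketch only; signatures are illustrative, not the real API.
declare class AgentService { constructor(repo: unknown); }
declare class VirtualLlmService { constructor(repo: unknown, opts?: { agents?: AgentService }); }
declare const agentRepository: unknown, llmRepository: unknown;

const agentService = new AgentService(agentRepository);          // built first...
const virtualLlmService = new VirtualLlmService(llmRepository, { // ...so the cascade
  agents: agentService,                                          // dependency exists.
});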

Tests: 13 new in virtual-agent-service.test.ts cover all the register
variants (insert, sticky reconnect, adopt-inactive-foreign, refuse
public-overwrite, refuse foreign-session-active), heartbeat-revive,
disconnect-cascade, deleteVirtualAgentsForLlm scope, GC sweep flip
+ delete + idempotence, and three VirtualLlmService cascade scenarios
(unbindSession, gcSweep deleting agent before Llm, defensive cascade
when agent's heartbeat lagged).

mcpd suite: 854/854 (was 841 + 13 new). Workspace unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
Michal
9afd24a3aa feat(db+mcpd): Agent lifecycle + chat.service kind=virtual branch (v3 Stage 1)
Two pieces of v3 plumbing — schema + the latent v1 chat.service bug.

Schema (db):
- Agent gains kind/providerSessionId/lastHeartbeatAt/status/inactiveSince
  mirroring Llm's v1 lifecycle. Reuses LlmKind / LlmStatus enums; no
  new types. Existing rows backfill kind=public/status=active so v1
  CRUD is unaffected.
- @@index([kind, status]) for the GC sweep, @@index([providerSessionId])
  for disconnect-cascade lookups.
- 4 new prisma-level tests cover defaults, persisting virtual fields,
  the (kind, status) GC index, and providerSessionId lookups.
  Total agent-schema tests: 20/20.
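
For illustration, the two lookups those indexes serve, via the generated
Prisma client (exact enum literals assumed, since the schema reuses the
Llm enums):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function sweepLookups(sessionId: string) {
  // GC sweep scan: @@index([kind, status]) narrows to virtual+active rows
  // before the heartbeat comparison.
  const stale = await prisma.agent.findMany({
    where: {
      kind: 'virtual',
      status: 'active',
      lastHeartbeatAt: { lt: new Date(Date.now() - 90_000) },
    },
  });

  // Disconnect cascade: @@index([providerSessionId]) makes the
  // session-owned lookup an index scan.
  const owned = await prisma.agent.findMany({
    where: { providerSessionId: sessionId },
  });

  return { stale, owned };
}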

chat.service (mcpd) — fixes the v1 latent bug:
- LlmView's kind is now plumbed through prepareContext as ctx.llmKind.
- Two new private helpers, runOneInference / streamInference, branch
  on ctx.llmKind: 'public' goes through the existing adapter
  registry, 'virtual' relays through VirtualLlmService.enqueueInferTask
  (mirrors the route-handler branch from v1 Stage 3).
- Streaming bridges VirtualLlmService's onChunk callback API to an
  async iterator via a small queue + wake pattern (sketched after this list).
- ChatService gains an optional virtualLlms constructor parameter;
  main.ts wires it in. Older test wirings without it raise a clear
  "virtualLlms dispatcher not wired" error when the row is virtual,
  rather than silently falling through to the public path against an
  empty URL.
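
The queue + wake pattern, as a standalone hedged sketch (the real handler
shape on VirtualLlmService is assumed):

// Bridge a push-style onChunk callback to a pull-style async iterator.
// Chunks queue up while the consumer is busy; when the consumer awaits an
// empty queue, it parks on a promise that the next push resolves ("wake").
function chunksToAsyncIterable(
  start: (handlers: {
    onChunk: (text: string) => void;
    onDone: () => void;
    onError: (err: Error) => void;
  }) => void,
): AsyncIterable<string> {
  const queue: string[] = [];
  let done = false;
  let error: Error | null = null;
  let wake: (() => void) | null = null;

  const notify = () => {
    wake?.();
    wake = null;
  };

  start({
    onChunk: (text) => { queue.push(text); notify(); },
    onDone: () => { done = true; notify(); },
    onError: (err) => { error = err; notify(); },
  });

  return {
    async *[Symbol.asyncIterator]() {
      for (;;) {
        if (queue.length > 0) { yield queue.shift()!; continue; }
        if (error) throw error;
        if (done) return;
        await new Promise<void>((resolve) => { wake = resolve; });
      }
    },
  };
}

A consumer (here, streamInference) then just does
`for await (const chunk of bridge) { ... }`.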

This unblocks any Agent (public OR future v3-virtual) pinned to a
kind=virtual Llm. Before this stage, those agents 502'd against the
empty url field.

Tests: 4 new in chat-service-virtual-llm.test.ts cover the relay path
(non-streaming, streaming, missing-dispatcher error, and rejection
surfacing). mcpd suite: 841/841 (was 833; +8 across stage 1 + v3 Stage 1).
Workspace: 2054/2054 across 153 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:07:23 +01:00
9374a2652b perf: vitest threads pool + Dockerfile pnpm cache mount (#66)
Some checks failed
CI/CD / lint (push) Successful in 58s
CI/CD / test (push) Successful in 1m11s
CI/CD / typecheck (push) Successful in 2m35s
CI/CD / smoke (push) Failing after 1m43s
CI/CD / build (push) Successful in 2m21s
CI/CD / publish (push) Has been skipped
2026-04-27 16:07:05 +00:00
Michal
18245be0c1 perf: vitest threads pool + Dockerfile pnpm cache mount
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / lint (pull_request) Successful in 2m40s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Failing after 7m6s
CI/CD / publish (pull_request) Has been skipped
Two tuning knobs that were leaving most of the host idle:

1) vitest.config.ts pool=threads with maxThreads ≈ cores/2.
   The default left this 64-core workstation at ~10% CPU during
   `pnpm test:run`. The threads pool uses the box: the same
   152-file/2050-test suite now runs at ~700% CPU instead of ~150%.
   Wall-time gain is modest (the workload is dominated by a handful of
   slow individual files that one thread must run serially), but the
   parallel headroom is there for when the suite grows. Cap =
   max(2, cores/2) keeps laptops reasonable; override with
   `VITEST_MAX_THREADS=N` in the env.

2) Dockerfile.mcpd uses BuildKit cache mounts on both pnpm install
   steps. Adds `# syntax=docker/dockerfile:1.6` and a
   `--mount=type=cache,target=/root/.local/share/pnpm/store` so
   pnpm's content-addressed store survives across image rebuilds.
   Cold rebuilds where the lockfile changed are unaffected; warm
   rebuilds where only source changed drop the install step from
   ~60s to <5s. fulldeploy.sh's mcpd image rebuild gets that time
   back, minus whatever the docker push re-uploads when layer
   hashes change.

Test parity: 2050/2050 across 152 files; per-package mcpd 837/837.
Both unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 17:06:39 +01:00
2 changed files with 30 additions and 4 deletions

Dockerfile.mcpd

@@ -1,3 +1,9 @@
+# syntax=docker/dockerfile:1.6
+# `# syntax=...` enables BuildKit's --mount feature on the builder so we can
+# share the pnpm content-addressed store across image builds. Without it the
+# next two RUN steps fall back to plain mode and the cache mount is ignored
+# (build still works, just slower).
 # Stage 1: Build TypeScript
 FROM node:20-alpine AS builder
@@ -12,8 +18,12 @@ COPY src/db/package.json src/db/tsconfig.json src/db/
 COPY src/shared/package.json src/shared/tsconfig.json src/shared/
 COPY src/web/package.json src/web/tsconfig.json src/web/
-# Install all dependencies
-RUN pnpm install --frozen-lockfile
+# Install all dependencies. The cache mount keeps pnpm's CAS store warm
+# across builds: only newly-changed packages get downloaded; everything
+# else hardlinks from the cache. Drops install from ~60s to <5s on a
+# warm cache. `--frozen-lockfile` still guarantees lockfile fidelity.
+RUN --mount=type=cache,id=pnpm-store-mcpd-builder,target=/root/.local/share/pnpm/store \
+    pnpm install --frozen-lockfile
 # Copy source code
 COPY src/mcpd/src/ src/mcpd/src/
@@ -42,8 +52,11 @@ COPY src/mcpd/package.json src/mcpd/
 COPY src/db/package.json src/db/
 COPY src/shared/package.json src/shared/
-# Install all deps (prisma CLI needed at runtime for db push)
-RUN pnpm install --frozen-lockfile
+# Install all deps (prisma CLI needed at runtime for db push). Same
+# cache-mount trick as the builder stage; separate cache id so the two
+# stages don't compete for the same lock.
+RUN --mount=type=cache,id=pnpm-store-mcpd-runtime,target=/root/.local/share/pnpm/store \
+    pnpm install --frozen-lockfile
 # Copy prisma schema and generate client
 COPY src/db/prisma/ src/db/prisma/

vitest.config.ts

@@ -1,8 +1,21 @@
 import { defineConfig } from 'vitest/config';
+import { availableParallelism } from 'node:os';
+// Default vitest's pool to ~half the CPU threads we have. The previous
+// implicit default left this 64-thread workstation at ~10% utilization
+// during `pnpm test:run`. Half is a soft cap that stays kind to laptops
+// (8-thread → 4 workers) while letting beefy hosts push closer to the
+// box's actual capacity. Override at run time with VITEST_MAX_THREADS.
+const cores = availableParallelism();
+const maxThreads = Number(process.env['VITEST_MAX_THREADS'] ?? Math.max(2, Math.floor(cores / 2)));
+
 export default defineConfig({
   test: {
     globals: true,
+    pool: 'threads',
+    poolOptions: {
+      threads: { maxThreads, minThreads: 1 },
+    },
     coverage: {
       provider: 'v8',
       reporter: ['text', 'json', 'html'],