Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:
1. /api/v1/llms/<name>/members surfaces both with the correct
effective pool key, size, activeCount, and per-member kind/status.
2. Chat through an agent pinned to one pool member dispatches across
the pool — verified by running 12 calls and asserting at least
one response from each backend (the random-shuffle selection
would have to hit only-A or only-B in 12 fair coin flips, ~1/2048;
see the sketch after this list).
3. Failover: stop one publisher, the surviving member still serves
chat. /members shows the stopped row as inactive immediately
(unbindSession runs synchronously on SSE close).
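A minimal sketch of the scenario-2 assertion, assuming a vitest-style
runner and a hypothetical chatViaAgent helper (the real smoke test
also has to stand up both registrars first):

    import { expect, test } from "vitest"; // assumed runner

    // Assumed helper; stands in for however the smoke suite pushes a
    // chat turn through the pinned agent.
    declare function chatViaAgent(
      agent: string,
      prompt: string,
    ): Promise<{ backend: string }>;

    test("12 calls reach both pool members", async () => {
      const seen = new Set<string>();
      for (let i = 0; i < 12; i++) {
        const reply = await chatViaAgent("pool-agent", "ping");
        seen.add(reply.backend); // each publisher tags its replies
      }
      // P(all 12 land on one backend) = 2 / 2^12 = 1/2048
      expect(seen.size).toBe(2);
    });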
docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.
Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
Surfaces the v4 pool model end-to-end:
- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
named anchor belongs to, plus aggregate stats (size, activeCount,
explicit vs implicit pool key). RBAC inherits from `view:llms` —
same as the single-Llm route. Members are full LlmView shapes so
callers don't need a second roundtrip to render the pool block (the
payload shape is sketched after this list).
- mcpd: VirtualLlmService.register accepts an optional `poolName` on
RegisterProviderInput; the route's `coerceProviderInput` validates
the same character set as CreateLlmSchema.poolName. Backwards
compatible — older mcplocals that don't send the field continue to
publish solo Llms.
- CLI `get llm` table: new POOL column right after NAME. Solo rows
show "-" so the "no pool / pool of 1" case is unambiguous (per
user direction "make sure we see it, prominently visible and
impossible to mistake").
- CLI `describe llm`: fetches /members and renders a Pool block at
the top of the detail view when the row is in an explicit pool OR
when its implicit pool has size > 1. Each member line shows
kind/status; the anchor row gets "← this row". Block is suppressed
for solo rows so describe stays compact in the common case.
- CLI `create llm --pool-name <name>` flag and apply schema both
accept the new field. Yaml round-trip preserves it: get -o yaml
emits `poolName: <name>`, and apply -f re-imports it without a diff.
Verified end-to-end against the live mcpd.
- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
and registrar.ts thread it through into the register payload. Use
case for distributed inference: each user's mcplocal picks a
unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
(e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
that auto-grows as workers come online (entry type sketched after
this list).
- Shell completions: regenerated from source via the existing
scripts/generate-completions.ts. `--pool-name` is now suggested in
fish + bash completions for `mcpctl create llm`.
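Two type sketches for the pieces above. First the /members payload;
field names beyond size/activeCount are assumptions:

    // Stand-in for mcpd's real per-member view type.
    type LlmView = Record<string, unknown>;

    // Assumed shape of GET /api/v1/llms/:name/members.
    interface PoolMembersResponse {
      poolKey: string;    // effective key: poolName ?? name
      explicit: boolean;  // assumed flag: explicit pool vs pool of 1
      size: number;
      activeCount: number;
      members: LlmView[]; // full views, no second roundtrip to render
    }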
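Second, the mcplocal provider-file entry; only poolName is new, and
the rest of the entry is elided:

    // LlmProviderFileEntry with the new optional field.
    interface LlmProviderFileEntry {
      name: string;      // globally unique, e.g. "vllm-<host>-qwen3"
      poolName?: string; // shared, e.g. "user-vllm-qwen3-thinking"
      // ...existing provider fields unchanged
    }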
Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
mcpd 868/868 (was 865, +3),
mcplocal 723/723,
cli 437/437.
Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.
Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm into a pool seed.
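The key rule is effectively a one-liner (helper name illustrative,
not a real mcpd export):

    // Effective pool key: explicit poolName wins, otherwise the row
    // is addressable as its own pool of 1.
    const poolKey = (llm: { name: string; poolName: string | null }) =>
      llm.poolName ?? llm.name;
    // A solo row named "user-vllm-qwen3-thinking" therefore lands in
    // the same bucket as members with that explicit poolName.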
Dispatcher (chat.service): prepareContext now resolves a randomly
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.
When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.
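A condensed sketch of the dispatch loop described above; names and
signatures are illustrative, not the real chat.service code:

    // Hypothetical condensation of the runOneInference pool loop.
    async function dispatchAcrossPool<L, R>(
      candidates: L[],        // pre-shuffled, inactive rows excluded
      run: (llm: L) => Promise<R>,
    ): Promise<R> {
      let lastErr: unknown;
      for (const llm of candidates) {
        try {
          // Auth/4xx surface inside the result, not as throws, so
          // they do NOT trigger failover.
          return await run(llm);
        } catch (err) {
          // Transport-level failures throw; try the next sibling.
          lastErr = err;
        }
      }
      throw lastErr ?? new Error("pool exhausted");
    }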
Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
fallback-to-name semantics, and the solo-name-joins-explicit-pool
edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.
Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.