feat: v4 LB pools by shared poolName #69

Merged
michal merged 3 commits from feat/llm-pool-by-name into main 2026-04-28 01:02:46 +00:00

Summary

v4 adds a load-balanced pool model without introducing a new
resource. Llm.name stays globally unique (the apply key); a new
optional Llm.poolName declares membership. Multiple Llms sharing a
non-null poolName stack into one pool that the chat dispatcher
expands at request time and selects from with random + sequential
failover. Solo Llms (poolName=null) work exactly as pre-v4 — the
effective pool key falls back to the row's own name, the pool is
size 1, no failover.

The user direction was explicit: keep Llm.name unique, separate
"resource name" from "pool name", make pools impossible to mistake
in mcpctl get llm. That's what landed.

Three stages

  • Stage 1 (7949e13): poolName schema + repo + service +
    chat.service dispatcher with random selection + transport-failure
    failover. New findByPoolName returns members where
    poolName = $1 OR (poolName IS NULL AND name = $1) so solo rows
    stay addressable as pool-of-1. 5 new chat-service tests, 7 new db
    schema tests.

  • Stage 2 (e21f960): GET /api/v1/llms/<name>/members returns
    members + aggregate size / activeCount. CLI gains a POOL
    column right after NAME, a Pool: block in describe llm with
    member list and "← this row" indicator, --pool-name flag on
    create llm, yaml round-trip with poolName, and shell
    completions. mcplocal LlmProviderFileEntry + RegistrarPublishedProvider
    thread poolName through the register payload, validated
    server-side with the same regex as CreateLlmSchema.

  • Stage 3 (137711f): live smoke against the deployed mcpd —
    two in-process publishers share a poolName, agent pinned to one
    member dispatches across both (asserted across 12 calls), failover
    verified by stopping one publisher and confirming the survivor
    serves chat. New "LB pools (v4)" section in docs/virtual-llms.md
    with declaration examples for public + virtual, dispatcher
    semantics, and the API surface entry.
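The declaration model above can be sketched as a pair of apply manifests. Only `name` (globally unique) and `poolName` come from this PR; every other field and the overall manifest shape are placeholders for illustration:

```yaml
# Hypothetical apply manifests — only name/poolName are confirmed by this PR;
# the kind/model fields are illustrative placeholders.
kind: Llm
name: vllm-host1-qwen3        # unique resource name (the apply key)
poolName: qwen3-pool          # shared pool membership
---
kind: Llm
name: vllm-host2-qwen3        # distinct name, same pool
poolName: qwen3-pool          # same poolName → one load-balanced pool
```

With `poolName` omitted, each row behaves as a pool of 1 keyed by its own `name`, exactly as pre-v4.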

Test plan

  • mcpd unit suite: 868/868 (was 865; +3 for /members route)
  • mcplocal unit suite: 723/723
  • cli unit suite: 437/437
  • db schema suite: +7 new tests for poolName + findByPoolName
  • full smoke suite: 144/144 (was 141; +3 for live pool smoke)
  • End-to-end verified against live mcpd: create with --pool-name,
    get llm shows POOL column, describe shows Pool block + members,
    get -o yaml | apply -f - round-trips without diff
  • Solo (non-pooled) Llms verified to render as - in POOL column
    and to suppress the Pool block in describe
michal added 3 commits 2026-04-27 22:22:39 +00:00
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.

Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm to pool seed.
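The fallback and the solo-name-joins-pool edge case can be sketched as an in-memory equivalent of the `findByPoolName` predicate (`poolName = $1 OR (poolName IS NULL AND name = $1)`). The helper below is illustrative, not the repo code:

```typescript
// Sketch of the effective-pool-key semantics described above.
// Field names follow the PR; the functions are illustrative only.
interface LlmRow {
  name: string;            // globally unique apply key
  poolName: string | null; // optional pool membership
}

// Effective pool key = poolName ?? name
function effectivePoolKey(row: LlmRow): string {
  return row.poolName ?? row.name;
}

// In-memory equivalent of the SQL predicate:
// poolName = $1 OR (poolName IS NULL AND name = $1)
function findByPoolName(rows: LlmRow[], key: string): LlmRow[] {
  return rows.filter(
    (r) => r.poolName === key || (r.poolName === null && r.name === key),
  );
}
```

Note that a solo row named `p` is returned for pool key `p` alongside explicit members — the promote-to-pool-seed behavior the commit calls out.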

Dispatcher (chat.service): prepareContext now resolves a randomly
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.

When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.
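The selection and failover rules above can be condensed into a sketch. This is not the chat.service code — the `Candidate` and `TransportError` shapes are assumptions — but it shows the policy: shuffle once, retry siblings only on transport-level failure, rethrow everything else:

```typescript
// Illustrative dispatch loop — shuffle once per turn, sequential failover
// on transport errors only. Names here are hypothetical, not chat.service's.
type Candidate = { name: string; call: () => Promise<string> };

class TransportError extends Error {}

// Fisher-Yates shuffle of a copy; original list untouched.
function shuffle<T>(xs: T[]): T[] {
  const a = [...xs];
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

async function dispatchWithFailover(candidates: Candidate[]): Promise<string> {
  if (candidates.length === 0) {
    throw new Error("No active Llm in pool"); // thrown earlier by prepareContext in the real flow
  }
  let lastErr: unknown;
  for (const c of shuffle(candidates)) {
    try {
      return await c.call();
    } catch (err) {
      // Auth/4xx-style errors are NOT retried: siblings with the same
      // key/model would fail identically.
      if (!(err instanceof TransportError)) throw err;
      lastErr = err; // transport failure: try the next member
    }
  }
  throw lastErr; // pool exhausted
}
```

Streaming would wrap the same loop but stop failing over after the first yielded chunk, matching the "committed to that backend" rule.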

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool
  edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` —
  same as the single-Llm route. Members are full LlmView shapes so
  callers don't need a second roundtrip to render the pool block.

- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates
  the same character set as CreateLlmSchema.poolName. Backwards
  compatible — older mcplocals that don't send the field continue to
  publish solo Llms.

- CLI `get llm` table: new POOL column right after NAME. Solo rows
  show "-" so the "no pool / pool of 1" case is unambiguous (per
  user direction "make sure we see it, prominently visible and
  impossible to mistake").

- CLI `describe llm`: fetches /members and renders a Pool block at
  the top of the detail view when the row is in an explicit pool OR
  when its implicit pool has size > 1. Each member line shows
  kind/status; the anchor row gets "← this row". Block is suppressed
  for solo rows so describe stays compact in the common case.

- CLI `create llm --pool-name <name>` flag and apply schema both
  accept the new field. Yaml round-trip preserves it: get -o yaml
  emits `poolName: <name>`, apply -f re-imports it without diff.
  Verified end-to-end against the live mcpd.

- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
  and registrar.ts thread it through into the register payload. Use
  case for distributed inference: each user's mcplocal picks a
  unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
  (e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
  that auto-grows as workers come online.

- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in
  fish + bash for `mcpctl create llm`.
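Putting the route bullets together, the /members payload implied above might be shaped roughly as follows. `size`, `activeCount`, and full-member shapes are stated in the PR; the remaining field names are assumptions for illustration:

```typescript
// Hedged sketch of the GET /api/v1/llms/:name/members response shape.
// size / activeCount / full LlmView members come from the PR text;
// poolKey / explicit and the LlmView field list are assumptions.
interface LlmView {
  name: string;
  poolName: string | null;
  kind: string;   // e.g. "public" | "virtual"
  status: string; // e.g. "active" | "inactive"
}

interface PoolMembersResponse {
  poolKey: string;     // effective key: poolName ?? anchor name
  explicit: boolean;   // whether the anchor declares poolName itself
  size: number;        // total members
  activeCount: number; // members with status !== "inactive"
  members: LlmView[];  // full shapes — no second roundtrip to render the block
}

// Aggregate helper matching the activeCount semantics above.
const countActive = (members: LlmView[]): number =>
  members.filter((m) => m.status !== "inactive").length;
```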

Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
  mcpd 868/868 (was 865, +3),
  mcplocal 723/723,
  cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m53s
CI/CD / smoke (pull_request) Failing after 1m47s
CI/CD / build (pull_request) Successful in 6m20s
CI/CD / publish (pull_request) Has been skipped
137711fdf6
Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:

  1. /api/v1/llms/<name>/members surfaces both with the correct
     effective pool key, size, activeCount, and per-member kind/status.
  2. Chat through an agent pinned to one pool member dispatches across
     the pool — verified by running 12 calls and asserting at least
     one response from each backend (the random-shuffle selection
     would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
  3. Failover: stop one publisher, the surviving member still serves
     chat. /members shows the stopped row as inactive immediately
     (unbindSession runs synchronously on SSE close).
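The ~1/2048 flake bound in step 2 is just the probability that 12 uniform two-way picks all land on the same single backend:

```typescript
// Arithmetic behind the smoke test's ~1/2048 bound: with two active
// members and a uniform pick per call, P(all 12 calls hit only-A or
// only-B) = 2 * (1/2)^12 = 2/4096 = 1/2048.
const pAllSameBackend = 2 * Math.pow(0.5, 12);
```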

docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.

Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
michal added 1 commit 2026-04-27 22:22:39 +00:00
michal merged commit 256e117021 into main 2026-04-28 01:02:46 +00:00
michal deleted branch feat/llm-pool-by-name 2026-04-28 01:02:46 +00:00

Reference: michal/mcpctl#69