Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:
1. /api/v1/llms/<name>/members surfaces both with the correct
effective pool key, size, activeCount, and per-member kind/status.
2. Chat through an agent pinned to one pool member dispatches across
the pool — verified by running 12 calls and asserting at least
one response from each backend (the random-shuffle selection
would have to hit only-A or only-B in 12 fair coin flips, ~1/2048;
see the sketch after this list).
3. Failover: stop one publisher, the surviving member still serves
chat. /members shows the stopped row as inactive immediately
(unbindSession runs synchronously on SSE close).
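A minimal sketch of the scenario-2 assertion, assuming a vitest-style
runner and a hypothetical chatViaAgent helper (the real smoke test
also has to stand up both registrars first):

    import { expect, test } from "vitest"; // assumed runner

    // Assumed helper; stands in for however the smoke suite pushes a
    // chat turn through the pinned agent.
    declare function chatViaAgent(
      agent: string,
      prompt: string,
    ): Promise<{ backend: string }>;

    test("12 calls reach both pool members", async () => {
      const seen = new Set<string>();
      for (let i = 0; i < 12; i++) {
        const reply = await chatViaAgent("pool-agent", "ping");
        seen.add(reply.backend); // each publisher tags its replies
      }
      // P(all 12 land on one backend) = 2 / 2^12 = 1/2048
      expect(seen.size).toBe(2);
    });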
docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.
Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
Surfaces the v4 pool model end-to-end:
- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
named anchor belongs to, plus aggregate stats (size, activeCount,
explicit vs implicit pool key). RBAC inherits from `view:llms` —
same as the single-Llm route. Members are full LlmView shapes so
callers don't need a second roundtrip to render the pool block (the
payload shape is sketched after this list).
- mcpd: VirtualLlmService.register accepts an optional `poolName` on
RegisterProviderInput; the route's `coerceProviderInput` validates
the same character set as CreateLlmSchema.poolName. Backwards
compatible — older mcplocals that don't send the field continue to
publish solo Llms.
- CLI `get llm` table: new POOL column right after NAME. Solo rows
show "-" so the "no pool / pool of 1" case is unambiguous (per
user direction "make sure we see it, prominently visible and
impossible to mistake").
- CLI `describe llm`: fetches /members and renders a Pool block at
the top of the detail view when the row is in an explicit pool OR
when its implicit pool has size > 1. Each member line shows
kind/status; the anchor row gets "← this row". Block is suppressed
for solo rows so describe stays compact in the common case.
- CLI `create llm --pool-name <name>` flag and apply schema both
accept the new field. Yaml round-trip preserves it: get -o yaml
emits `poolName: <name>`, and apply -f re-imports it without a diff.
Verified end-to-end against the live mcpd.
- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
and registrar.ts thread it through into the register payload. Use
case for distributed inference: each user's mcplocal picks a
unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
(e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
that auto-grows as workers come online (entry type sketched after
this list).
- Shell completions: regenerated from source via the existing
scripts/generate-completions.ts. `--pool-name` is now suggested in
fish + bash completions for `mcpctl create llm`.
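Two type sketches for the pieces above. First the /members payload;
field names beyond size/activeCount are assumptions:

    // Stand-in for mcpd's real per-member view type.
    type LlmView = Record<string, unknown>;

    // Assumed shape of GET /api/v1/llms/:name/members.
    interface PoolMembersResponse {
      poolKey: string;    // effective key: poolName ?? name
      explicit: boolean;  // assumed flag: explicit pool vs pool of 1
      size: number;
      activeCount: number;
      members: LlmView[]; // full views, no second roundtrip to render
    }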
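Second, the mcplocal provider-file entry; only poolName is new, and
the rest of the entry is elided:

    // LlmProviderFileEntry with the new optional field.
    interface LlmProviderFileEntry {
      name: string;      // globally unique, e.g. "vllm-<host>-qwen3"
      poolName?: string; // shared, e.g. "user-vllm-qwen3-thinking"
      // ...existing provider fields unchanged
    }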
Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
mcpd 868/868 (was 865, +3),
mcplocal 723/723,
cli 437/437.
Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.
Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm into a pool seed.
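The key rule is effectively a one-liner (helper name illustrative,
not a real mcpd export):

    // Effective pool key: explicit poolName wins, otherwise the row
    // is addressable as its own pool of 1.
    const poolKey = (llm: { name: string; poolName: string | null }) =>
      llm.poolName ?? llm.name;
    // A solo row named "user-vllm-qwen3-thinking" therefore lands in
    // the same bucket as members with that explicit poolName.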
Dispatcher (chat.service): prepareContext now resolves a randomly
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.
When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.
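A condensed sketch of the dispatch loop described above; names and
signatures are illustrative, not the real chat.service code:

    // Hypothetical condensation of the runOneInference pool loop.
    async function dispatchAcrossPool<L, R>(
      candidates: L[],        // pre-shuffled, inactive rows excluded
      run: (llm: L) => Promise<R>,
    ): Promise<R> {
      let lastErr: unknown;
      for (const llm of candidates) {
        try {
          // Auth/4xx surface inside the result, not as throws, so
          // they do NOT trigger failover.
          return await run(llm);
        } catch (err) {
          // Transport-level failures throw; try the next sibling.
          lastErr = err;
        }
      }
      throw lastErr ?? new Error("pool exhausted");
    }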
Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
fallback-to-name semantics, and the solo-name-joins-explicit-pool
edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.
Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.