feat: v4 LB pools by shared poolName #69

Merged
michal merged 3 commits from feat/llm-pool-by-name into main 2026-04-28 01:02:46 +00:00

3 Commits

Author SHA1 Message Date
Michal
137711fdf6 feat(docs+smoke): LB pool live smoke + virtual-llms.md pool semantics (v4 Stage 3)
Some checks failed
CI/CD / lint (pull_request) Successful in 53s
CI/CD / test (pull_request) Successful in 1m8s
CI/CD / typecheck (pull_request) Successful in 2m53s
CI/CD / smoke (pull_request) Failing after 1m47s
CI/CD / build (pull_request) Successful in 6m20s
CI/CD / publish (pull_request) Has been skipped
Smoke (tests/smoke/llm-pool.smoke.test.ts): two in-process registrars
publish virtual Llms with distinct names but a shared poolName, then:

  1. /api/v1/llms/<name>/members surfaces both with the correct
     effective pool key, size, activeCount, and per-member kind/status.
  2. Chat through an agent pinned to one pool member dispatches across
     the pool — verified by running 12 calls and asserting at least
     one response from each backend (the random-shuffle selection
     would have to hit only-A or only-B in 12 fair coin flips, ~1/2048).
  3. Failover: stop one publisher, the surviving member still serves
     chat. /members shows the stopped row as inactive immediately
     (unbindSession runs synchronously on SSE close).

docs/virtual-llms.md gets a full "LB pools (v4)" section with the
two-field schema model, dispatcher selection + failover semantics,
public + virtual declaration examples, list/describe rendering, the
"pin to specific instance" escape hatch, and an API surface entry
for /members. docs/agents.md cross-link extended.

Tests: full smoke 144/144 (was 141, +3 for the new pool smoke).
Stages 1-3 ship the complete v4 — public and virtual Llms can both
join pools, agents transparently load-balance across them, yaml
round-trip preserves poolName, and the existing single-Llm world
keeps working byte-identically when poolName is null.
2026-04-27 23:22:15 +01:00
Michal
e21f96080d feat(mcpd+cli+mcplocal): /llms/<name>/members + POOL column + --pool-name (v4 Stage 2)
Surfaces the v4 pool model end-to-end:

- mcpd: GET /api/v1/llms/:name/members returns the effective pool the
  named anchor belongs to, plus aggregate stats (size, activeCount,
  explicit vs implicit pool key). RBAC inherits from `view:llms` —
  same as the single-Llm route. Members are full LlmView shapes so
  callers don't need a second roundtrip to render the pool block.

- mcpd: VirtualLlmService.register accepts an optional `poolName` on
  RegisterProviderInput; the route's `coerceProviderInput` validates
  the same character set as CreateLlmSchema.poolName. Backwards
  compatible — older mcplocals that don't send the field continue to
  publish solo Llms.

- CLI `get llm` table: new POOL column right after NAME. Solo rows
  show "-" so the "no pool / pool of 1" case is unambiguous (per
  user direction "make sure we see it, prominently visible and
  impossible to mistake").

- CLI `describe llm`: fetches /members and renders a Pool block at
  the top of the detail view when the row is in an explicit pool OR
  when its implicit pool has size > 1. Each member line shows
  kind/status; the anchor row gets "← this row". Block is suppressed
  for solo rows so describe stays compact in the common case.

- CLI `create llm --pool-name <name>` flag and apply schema both
  accept the new field. Yaml round-trip preserves it: get -o yaml
  emits `poolName: <name>`, apply -f re-imports it without diff.
  Verified end-to-end against the live mcpd.

- mcplocal: LlmProviderFileEntry gains optional `poolName`; main.ts
  and registrar.ts thread it through into the register payload. Use
  case for distributed inference: each user's mcplocal picks a
  unique `name` (e.g. `vllm-<host>-qwen3`) but a shared `poolName`
  (e.g. `user-vllm-qwen3-thinking`); agents see one logical pool
  that auto-grows as workers come online.

- Shell completions: regenerated from source via the existing
  scripts/generate-completions.ts. `--pool-name` now suggests in
  fish + bash for `mcpctl create llm`.

Tests: +3 new mcpd route tests for /members (explicit pool, solo
pool of 1, missing-anchor 404). All suites green:
  mcpd 868/868 (was 865, +3),
  mcplocal 723/723,
  cli 437/437.

Stage 3 (next): live smoke against 2 publishers sharing a pool name +
docs.
2026-04-27 23:18:53 +01:00
Michal
7949e1393d feat(mcpd+db): Llm.poolName + chat dispatcher pool failover (v4 Stage 1)
Adds LB-pool-by-shared-name without introducing a new resource. The
existing `Llm.name` stays globally unique; a new optional `poolName`
column declares membership in a pool. Multiple Llms sharing a non-null
`poolName` stack into one load-balanced pool that the chat dispatcher
expands at request time.

Effective pool key = `poolName ?? name`. Solo rows (poolName=null) are
addressable as a "pool of 1" via their own name, so existing single-Llm
agents and YAMLs keep working unchanged. A solo row whose name happens
to match an explicit poolName joins the same pool — by design — so an
operator can transparently promote an existing Llm to pool seed.

Dispatcher (chat.service): prepareContext now resolves a randomly-
shuffled list of viable pool candidates (status != inactive) once per
turn. runOneInference and streamInference iterate the list on
transport-level failure (network, virtual publisher disconnect) until
one succeeds or the list is exhausted. Streaming failover only covers
"failed before first chunk" — once we've yielded text, we're committed
to that backend. Auth/4xx errors surfaced as result.status are NOT
retried; siblings with the same key/model would fail identically.

When the agent's pinned Llm is itself inactive but a sibling pool
member is up, dispatch transparently uses the sibling — that's the
whole point. When every member is inactive, prepareContext throws a
clear "No active Llm in pool '<key>' (pinned: <name>)" error rather
than letting the dispatcher's "exhausted" branch surface it.

Tests:
- 5 new chat-service tests for pool dispatch / failover / pinned-down /
  all-inactive (chat-service.test.ts).
- 7 new db schema tests for the column, the unique-name invariant, the
  fallback-to-name semantics, and the solo-name-joins-explicit-pool
  edge case (llm-pool-schema.test.ts).
- mcpd 865/865 (was 860; +5), db pool-schema 7/7, no regressions.

Stage 2 (next): HTTP route /api/v1/llms/<name>/members + aggregate pool
stats on the existing single-Llm route, CLI POOL column + describe
block + --pool-name flag, yaml round-trip.
2026-04-27 22:02:41 +01:00