feat: v4 LB pools by shared poolName #69

Merged
michal merged 3 commits from feat/llm-pool-by-name into main 2026-04-28 01:02:46 +00:00
3 changed files with 432 additions and 5 deletions
Showing only changes of commit 137711fdf6

View File

@@ -208,5 +208,8 @@ mcpctl chat reviewer
extends the same publishing model to **virtual agents** declared in
mcplocal config — they show up in `mcpctl get agent` with
`KIND=virtual / STATUS=active` and become chat-able via
`mcpctl chat <name>` like any other agent. **v4** adds **pools**:
Llms sharing a `poolName` stack into one load-balanced pool that
the chat dispatcher transparently widens to at request time, with
random selection + sequential failover on transport errors.
- [chat.md](./chat.md) — `mcpctl chat` flow and LiteLLM-style flags.

View File

@@ -278,10 +278,148 @@ when the agent's pinned Llm is virtual.
same agent name collide on the second register with HTTP 409. Per-publisher
namespacing is a v4+ concern — same constraint as virtual Llms in v1.
## LB pools (v4)
Two or more `Llm` rows that share a `poolName` stack into one
load-balanced pool. Agents pin to a single Llm by id; the chat
dispatcher transparently widens to "all healthy Llms with the same
effective pool key" at request time and picks one. There is no new
`LlmPool` resource — `poolName` is just an optional column on `Llm`,
so RBAC, listing, yaml round-trip, and apply all work the same way
they did pre-v4.
### Pool semantics
| Field | Behavior |
|---|---|
| `Llm.name` | Globally unique (unchanged). The apply key. |
| `Llm.poolName` | Optional. When set, declares membership. When NULL, falls back to `name` ("solo Llm, pool of 1"). |
Effective pool key = `poolName ?? name`. The dispatcher's lookup is:
```sql
SELECT * FROM Llm
WHERE poolName = $1 OR (poolName IS NULL AND name = $1)
```
So a solo Llm whose `name` happens to equal an explicit `poolName`
joins that pool — by design, an existing single-row Llm can be
promoted to "pool seed" without a rename or migration.
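As a rough illustration of that rule (not mcpd's actual code — `LlmRow` and
`resolvePoolMembers` are hypothetical names), the same predicate can be written
as a plain filter over `Llm` rows:
```ts
// Sketch only: mirrors the SQL above. Names are illustrative, not real mcpd types.
interface LlmRow {
  name: string;                                    // globally unique apply key
  poolName: string | null;                         // optional pool membership
  status: 'active' | 'hibernating' | 'inactive';
}

// Effective pool key = poolName ?? name ("solo Llm, pool of 1").
const effectiveKey = (row: LlmRow): string => row.poolName ?? row.name;

// Same predicate as `poolName = $1 OR (poolName IS NULL AND name = $1)`.
function resolvePoolMembers(rows: LlmRow[], key: string): LlmRow[] {
  return rows.filter((row) => effectiveKey(row) === key);
}
```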
### Selection + failover
- **Selection**: random shuffle of all members whose `status` is
`active` (or `hibernating` — VirtualLlmService handles wake on
dispatch). `inactive` members are skipped.
- **Failover** (non-streaming): if dispatch throws on the first
candidate (transport failure, virtual publisher disconnect), the
dispatcher iterates the rest of the shuffled list until one
succeeds or the list is exhausted. Auth/4xx responses are NOT
retried — siblings with the same key/model would fail identically.
- **Failover** (streaming): only covers "couldn't establish stream"
failures (transport error before any chunk yielded). Once any
output has been streamed, we're committed to that backend.
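A minimal sketch of that selection + failover loop for the non-streaming path —
reusing the hypothetical `LlmRow` shape from the sketch above, with `dispatchTo`
and `isRetryable` as stand-ins for the real dispatch and error-classification logic:
```ts
// Illustrative only: random shuffle of healthy members, then sequential failover.
function shuffle<T>(items: T[]): T[] {
  const out = [...items];
  for (let i = out.length - 1; i > 0; i -= 1) {
    const j = Math.floor(Math.random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];            // Fisher–Yates swap
  }
  return out;
}

async function dispatchAcrossPool(
  members: LlmRow[],
  dispatchTo: (llm: LlmRow) => Promise<string>,     // hypothetical per-member dispatch
  isRetryable: (err: unknown) => boolean,           // true for transport failures only
): Promise<string> {
  // Skip `inactive` members; `hibernating` ones are woken on dispatch.
  const candidates = shuffle(members.filter((m) => m.status !== 'inactive'));
  let lastErr: unknown;
  for (const llm of candidates) {
    try {
      return await dispatchTo(llm);
    } catch (err) {
      if (!isRetryable(err)) throw err;             // auth/4xx: siblings would fail too
      lastErr = err;
    }
  }
  throw lastErr ?? new Error('no active pool members');
}
```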
### Declaring a pool
#### Public Llms
```bash
mcpctl create llm prod-qwen-1 --type openai --model qwen3-thinking \
--url https://prod-1.example.com --pool-name qwen-pool \
--api-key-ref qwen-key/API_KEY
mcpctl create llm prod-qwen-2 --type openai --model qwen3-thinking \
--url https://prod-2.example.com --pool-name qwen-pool \
--api-key-ref qwen-key/API_KEY
```
Or via apply (yaml round-trip preserves `poolName`):
```yaml
---
kind: llm
name: prod-qwen-1
type: openai
model: qwen3-thinking
url: https://prod-1.example.com
poolName: qwen-pool
apiKeyRef: { name: qwen-key, key: API_KEY }
---
kind: llm
name: prod-qwen-2
type: openai
model: qwen3-thinking
url: https://prod-2.example.com
poolName: qwen-pool
apiKeyRef: { name: qwen-key, key: API_KEY }
```
#### Virtual Llms (mcplocal-published)
```jsonc
// ~/.mcpctl/config.json
{
  "llm": {
    "providers": [
      {
        "name": "vllm-alice-qwen3",               // unique per publisher
        "type": "vllm-managed",
        "model": "Qwen/Qwen2.5-7B-Instruct-AWQ",
        "venvPath": "~/vllm_env",
        "publish": true,
        "poolName": "user-vllm-qwen3-thinking"    // shared pool key
      }
    ]
  }
}
```
Each user's mcplocal picks a unique `name` (e.g. include the hostname
to guarantee no collisions) but shares the `poolName`. Agents pinned
to any single member — or to `qwen3-thinking` (the public LiteLLM
endpoint, also given `poolName: user-vllm-qwen3-thinking` if mixing
public + virtual is desired) — see one logical pool that auto-grows
as more workers come online.
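If a publisher builds its provider entry programmatically, the unique-name
convention could be as simple as this hypothetical sketch (the prefix/suffix are
just examples, not a required format):
```ts
import { hostname } from 'node:os';

// Unique per publisher (hostname-suffixed), shared pool key across all publishers.
const providerName = `vllm-${hostname()}-qwen3`;     // e.g. "vllm-alice-laptop-qwen3"
const sharedPoolName = 'user-vllm-qwen3-thinking';   // same value in every config
```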
### Listing + describe
The `mcpctl get llm` table has a `POOL` column right after `NAME`.
Solo rows render as `-`; pool members show their explicit pool key:
```
NAME            POOL       KIND    STATUS  TYPE    MODEL           ID
qwen3-thinking  -          public  active  openai  qwen3-thinking  cmo...
prod-qwen-1     qwen-pool  public  active  openai  qwen3-thinking  cmo...
prod-qwen-2     qwen-pool  public  active  openai  qwen3-thinking  cmo...
```
`mcpctl describe llm <name>` adds a `Pool:` block at the top when the
row is in an explicit pool OR when its implicit pool has size > 1:
```
Pool:
  Pool name: qwen-pool
  Members:   2 (2 active)
    - prod-qwen-1 [public/active] ← this row
    - prod-qwen-2 [public/active]
```
`GET /api/v1/llms/<name>/members` is the API surface — returns full
`LlmView`s for every member plus aggregate `size` / `activeCount` so
operator tooling doesn't need a second roundtrip.
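For operator tooling, the call could look roughly like this — the field names
follow the response shape listed in the API appendix below; the base URL, token
handling, and helper name are placeholders:
```ts
// Sketch: look up pool membership via any member's name (or the pool name itself).
// Members are full LlmViews per the docs; only the fields used here are typed.
interface PoolMemberView {
  name: string;
  poolName: string | null;
  status: string;
}

interface PoolMembersResponse {
  poolName: string;                  // effective pool key
  explicitPoolName: string | null;   // null for an implicit (solo) pool
  size: number;
  activeCount: number;
  members: PoolMemberView[];
}

async function getPoolMembers(baseUrl: string, llmName: string, token: string): Promise<PoolMembersResponse> {
  const res = await fetch(`${baseUrl}/api/v1/llms/${encodeURIComponent(llmName)}/members`, {
    headers: { Accept: 'application/json', Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`GET /members failed: ${String(res.status)}`);
  return (await res.json()) as PoolMembersResponse;
}
```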
### Pinning to a specific instance
To pin an agent to one specific instance (e.g. for debugging,
RBAC-scoped routing, or "this agent must hit this model with this
key"), give that instance a unique name and leave its `poolName`
unset. The agent's pool is then size 1 and dispatch is deterministic.
Pool membership is opt-in via `poolName` — the default behavior is
single-Llm.
## Roadmap (later stages)
- **v4 — LB pool by model**: agents can target a model name instead of
a specific Llm; mcpd picks the healthiest pool member per request.
- **v5 — Task queue**: persisted requests for hibernating/saturated
pools. Workers pull tasks of their model when they come online.
@@ -301,8 +439,12 @@ POST /api/v1/llms/_provider-task/:id/result
{ chunk: { data, done? } }
{ status, body }
GET /api/v1/llms → list (includes kind, status, lastHeartbeatAt, inactiveSince, poolName)
GET /api/v1/llms/<name> → single Llm row (also accepts a CUID id)
GET /api/v1/llms/<name>/members → v4: pool members for the effective pool key:
{ poolName, explicitPoolName, size, activeCount, members[] }
POST /api/v1/llms/<virtual>/infer → routes through the SSE relay (v4: dispatcher
also expands by poolName when set)
DELETE /api/v1/llms/<virtual> → delete unconditionally (also runs GC's job)
GET /api/v1/agents → list (v3: includes kind, status, lastHeartbeatAt, inactiveSince)
```

View File

@@ -0,0 +1,282 @@
/**
* v4 smoke: LB pool by shared `poolName`. Spins up two in-process
* registrars publishing virtual Llms with distinct `name`s but a
* shared `poolName`. Verifies:
*
* - both rows show in `GET /api/v1/llms/<name>/members` with the
* correct effective pool key + active count.
* - chat through an agent pinned to one pool member dispatches
* across the pool (proven by the second member's content showing
* up at least once across N calls).
* - failover: stop one publisher, chat continues to work via the
* surviving member.
*
* Lifecycle (heartbeat-stale, 4 h GC) is unit-test territory; smoke
* just covers the path the operator-facing flow runs through.
*/
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import http from 'node:http';
import https from 'node:https';
import { mkdtempSync, rmSync, readFileSync, existsSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import {
VirtualLlmRegistrar,
type RegistrarPublishedProvider,
type RegistrarPublishedAgent,
} from '../../src/providers/registrar.js';
import type { LlmProvider, CompletionResult } from '../../src/providers/types.js';
const MCPD_URL = process.env.MCPD_URL ?? 'https://mcpctl.ad.itaz.eu';
const SUFFIX = Date.now().toString(36);
const POOL_NAME = `smoke-pool-${SUFFIX}`;
const PROVIDER_A = `smoke-pool-a-${SUFFIX}`;
const PROVIDER_B = `smoke-pool-b-${SUFFIX}`;
const AGENT_NAME = `smoke-pool-agent-${SUFFIX}`;
function makeFakeProvider(name: string, content: string): LlmProvider {
return {
name,
async complete(): Promise<CompletionResult> {
return {
content,
toolCalls: [],
usage: { promptTokens: 1, completionTokens: 4, totalTokens: 5 },
finishReason: 'stop',
};
},
async listModels() { return []; },
async isAvailable() { return true; },
};
}
function healthz(url: string, timeoutMs = 5000): Promise<boolean> {
return new Promise((resolve) => {
const parsed = new URL(`${url.replace(/\/$/, '')}/healthz`);
const driver = parsed.protocol === 'https:' ? https : http;
const req = driver.get({
hostname: parsed.hostname,
port: parsed.port || (parsed.protocol === 'https:' ? 443 : 80),
path: parsed.pathname,
timeout: timeoutMs,
}, (res) => { resolve((res.statusCode ?? 500) < 500); res.resume(); });
req.on('error', () => resolve(false));
req.on('timeout', () => { req.destroy(); resolve(false); });
});
}
function readToken(): string | null {
try {
const path = join(process.env.HOME ?? '', '.mcpctl', 'credentials');
if (!existsSync(path)) return null;
const parsed = JSON.parse(readFileSync(path, 'utf-8')) as { token?: string };
return parsed.token ?? null;
} catch {
return null;
}
}
interface HttpResponse { status: number; body: string }
function httpRequest(method: string, urlStr: string, body: unknown): Promise<HttpResponse> {
return new Promise((resolve, reject) => {
const tokenRaw = readToken();
const parsed = new URL(urlStr);
const driver = parsed.protocol === 'https:' ? https : http;
const headers: Record<string, string> = {
Accept: 'application/json',
...(body !== undefined ? { 'Content-Type': 'application/json' } : {}),
...(tokenRaw !== null ? { Authorization: `Bearer ${tokenRaw}` } : {}),
};
const req = driver.request({
hostname: parsed.hostname,
port: parsed.port || (parsed.protocol === 'https:' ? 443 : 80),
path: parsed.pathname + parsed.search,
method,
headers,
timeout: 30_000,
}, (res) => {
const chunks: Buffer[] = [];
res.on('data', (c: Buffer) => chunks.push(c));
res.on('end', () => {
resolve({ status: res.statusCode ?? 0, body: Buffer.concat(chunks).toString('utf-8') });
});
});
req.on('error', reject);
req.on('timeout', () => { req.destroy(); reject(new Error(`httpRequest timeout: ${method} ${urlStr}`)); });
if (body !== undefined) req.write(JSON.stringify(body));
req.end();
});
}
let mcpdUp = false;
let registrarA: VirtualLlmRegistrar | null = null;
let registrarB: VirtualLlmRegistrar | null = null;
let tempDir: string;
describe('llm-pool smoke (v4)', () => {
beforeAll(async () => {
mcpdUp = await healthz(MCPD_URL);
if (!mcpdUp) {
// eslint-disable-next-line no-console
console.warn(`\n ○ llm-pool smoke: skipped — ${MCPD_URL}/healthz unreachable.\n`);
return;
}
if (readToken() === null) {
mcpdUp = false;
// eslint-disable-next-line no-console
console.warn('\n ○ llm-pool smoke: skipped — no ~/.mcpctl/credentials.\n');
return;
}
tempDir = mkdtempSync(join(tmpdir(), 'mcpctl-llm-pool-smoke-'));
}, 20_000);
afterAll(async () => {
if (registrarA !== null) registrarA.stop();
if (registrarB !== null) registrarB.stop();
if (tempDir !== undefined) rmSync(tempDir, { recursive: true, force: true });
if (!mcpdUp) return;
// Best-effort cleanup. Agent first (Restrict FK), then both Llms.
const agents = await httpRequest('GET', `${MCPD_URL}/api/v1/agents`, undefined);
if (agents.status === 200) {
const rows = JSON.parse(agents.body) as Array<{ id: string; name: string }>;
const row = rows.find((r) => r.name === AGENT_NAME);
if (row !== undefined) await httpRequest('DELETE', `${MCPD_URL}/api/v1/agents/${row.id}`, undefined);
}
const llms = await httpRequest('GET', `${MCPD_URL}/api/v1/llms`, undefined);
if (llms.status === 200) {
const rows = JSON.parse(llms.body) as Array<{ id: string; name: string }>;
for (const r of rows) {
if (r.name === PROVIDER_A || r.name === PROVIDER_B) {
await httpRequest('DELETE', `${MCPD_URL}/api/v1/llms/${r.id}`, undefined);
}
}
}
});
it('two publishers with shared poolName show up in /api/v1/llms/<name>/members', async () => {
if (!mcpdUp) return;
const token = readToken();
if (token === null) return;
// Publisher A — also publishes the agent so we can chat through the pool.
const pubA: RegistrarPublishedProvider = {
provider: makeFakeProvider(PROVIDER_A, 'reply from A'),
type: 'openai',
model: 'fake-pool',
poolName: POOL_NAME,
};
const pubAgent: RegistrarPublishedAgent = {
name: AGENT_NAME,
// Agent pins to publisher A specifically — pool dispatch then widens
// at chat time. Demonstrates the v4 transparency: pinning to one
// member implicitly opts the agent into the whole pool.
llmName: PROVIDER_A,
description: 'v4 pool smoke',
systemPrompt: 'Reply with whatever the backend returns.',
};
registrarA = new VirtualLlmRegistrar({
mcpdUrl: MCPD_URL,
token,
publishedProviders: [pubA],
publishedAgents: [pubAgent],
sessionFilePath: join(tempDir, 'session-a'),
log: { info: () => {}, warn: () => {}, error: () => {} },
heartbeatIntervalMs: 60_000,
});
await registrarA.start();
// Publisher B — same poolName, different name. No agent.
const pubB: RegistrarPublishedProvider = {
provider: makeFakeProvider(PROVIDER_B, 'reply from B'),
type: 'openai',
model: 'fake-pool',
poolName: POOL_NAME,
};
registrarB = new VirtualLlmRegistrar({
mcpdUrl: MCPD_URL,
token,
publishedProviders: [pubB],
sessionFilePath: join(tempDir, 'session-b'),
log: { info: () => {}, warn: () => {}, error: () => {} },
heartbeatIntervalMs: 60_000,
});
await registrarB.start();
// Let both registrations settle on mcpd's side.
await new Promise((r) => setTimeout(r, 600));
// Hit the new /members endpoint via either pool member's name.
const res = await httpRequest('GET', `${MCPD_URL}/api/v1/llms/${PROVIDER_A}/members`, undefined);
expect(res.status).toBe(200);
const body = JSON.parse(res.body) as {
poolName: string;
explicitPoolName: string | null;
size: number;
activeCount: number;
members: Array<{ name: string; poolName: string | null; status: string }>;
};
expect(body.poolName).toBe(POOL_NAME);
expect(body.explicitPoolName).toBe(POOL_NAME);
expect(body.size).toBe(2);
expect(body.activeCount).toBe(2);
const names = body.members.map((m) => m.name).sort();
expect(names).toEqual([PROVIDER_A, PROVIDER_B].sort());
for (const m of body.members) {
expect(m.poolName).toBe(POOL_NAME);
}
}, 30_000);
it('chat through the agent dispatches across both pool members over multiple calls', async () => {
if (!mcpdUp) return;
// The chat dispatcher randomly shuffles candidates per call. Run
// enough turns that hitting only one member would be statistically
// suspicious (P(hit only A or only B) over 12 calls is ~1/2048 if the
// shuffle is fair). We assert >= one of each.
const seen = new Set<string>();
for (let i = 0; i < 12; i += 1) {
const res = await httpRequest('POST', `${MCPD_URL}/api/v1/agents/${AGENT_NAME}/chat`, {
message: `ping ${String(i)}`,
stream: false,
});
expect(res.status, res.body).toBe(200);
const body = JSON.parse(res.body) as { assistant: string };
seen.add(body.assistant);
if (seen.has('reply from A') && seen.has('reply from B')) break;
}
expect(seen.has('reply from A'), `pool dispatch should have hit A at least once; saw: ${[...seen].join(', ')}`).toBe(true);
expect(seen.has('reply from B'), `pool dispatch should have hit B at least once; saw: ${[...seen].join(', ')}`).toBe(true);
}, 90_000);
it('failover: stop one publisher, chat still succeeds via the surviving member', async () => {
if (!mcpdUp) return;
// Stop publisher B. mcpd's unbindSession flips B's row to inactive
// synchronously on SSE close, so the next chat's pool resolution
// skips it.
if (registrarB !== null) {
registrarB.stop();
registrarB = null;
}
await new Promise((r) => setTimeout(r, 400));
// Confirm B is inactive in /members.
const members = await httpRequest('GET', `${MCPD_URL}/api/v1/llms/${PROVIDER_A}/members`, undefined);
expect(members.status).toBe(200);
const body = JSON.parse(members.body) as {
members: Array<{ name: string; status: string }>;
};
const memB = body.members.find((m) => m.name === PROVIDER_B);
expect(memB?.status).toBe('inactive');
// Chat continues to work — only A responds now.
for (let i = 0; i < 3; i += 1) {
const res = await httpRequest('POST', `${MCPD_URL}/api/v1/agents/${AGENT_NAME}/chat`, {
message: `post-failover ping ${String(i)}`,
stream: false,
});
expect(res.status, res.body).toBe(200);
const out = JSON.parse(res.body) as { assistant: string };
expect(out.assistant).toBe('reply from A');
}
}, 30_000);
});