Compare commits
6 Commits
fix/vitest...fix/status

| Author | SHA1 | Date |
|---|---|---|
| | a84214dad1 | |
| | 54e56f7b71 | |
| | e4af16477c | |
| | de96af7bf6 | |
| | 0db37e92a4 | |
| | 899f2c750c | |
@@ -34,6 +34,29 @@ interface ProvidersInfo {
   details?: Record<string, ProviderDetail>;
 }
 
+interface ServerLlm {
+  id: string;
+  name: string;
+  type: string;
+  model: string;
+  tier: string;
+  url: string;
+  apiKeyRef?: { name: string; key: string } | null;
+}
+
+/**
+ * Result of a live "say hi" probe against a server LLM. `ok` says we got a
+ * 200 + non-empty content back; `say` is the trimmed first 16 chars of the
+ * reply for the user to spot-check (most LLMs say "hi", a misbehaving one
+ * says "Hello! How can I assist…"). `ms` is end-to-end including TLS.
+ */
+export interface ServerLlmHealth {
+  ok: boolean;
+  ms: number;
+  say?: string;
+  error?: string;
+}
+
 export interface StatusCommandDeps {
   configDeps: Partial<ConfigLoaderDeps>;
   credentialsDeps: Partial<CredentialsDeps>;
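For orientation, a minimal sketch (not part of the diff) of the three shapes a probe result can take under the `ServerLlmHealth` contract above; the values are hypothetical:

// Hypothetical sample values, assuming the ServerLlmHealth interface above.
const healthy: ServerLlmHealth = { ok: true, ms: 412, say: 'hi' };               // 200 + non-empty content
const thinking: ServerLlmHealth = { ok: true, ms: 980, say: '[thinking] Okay' }; // reasoning-only reply
const down: ServerLlmHealth = { ok: false, ms: 15000, error: 'timeout' };        // failures are data, never thrown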
@@ -46,6 +69,18 @@ export interface StatusCommandDeps {
   fetchModels: (mcplocalUrl: string) => Promise<string[]>;
   /** Fetch provider tier info from mcplocal's /llm/providers endpoint */
   fetchProviders: (mcplocalUrl: string) => Promise<ProvidersInfo | null>;
+  /**
+   * Fetch server-managed LLMs from mcpd (`mcpctl create llm` rows). Returns
+   * null on auth failure, network error, or any other unhappy path so the
+   * command stays printable even when mcpd is unreachable.
+   */
+  fetchServerLlms: (mcpdUrl: string, token: string | null) => Promise<ServerLlm[] | null>;
+  /**
+   * Probe a single server LLM with a tiny "say hi" prompt to check it's
+   * actually serving inference. Used per-LLM in parallel after the fetch.
+   * Always resolves (never throws) so one bad LLM doesn't sink the section.
+   */
+  probeServerLlm: (mcpdUrl: string, name: string, token: string | null) => Promise<ServerLlmHealth>;
   isTTY: boolean;
 }
 
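Because both new hooks live on `StatusCommandDeps`, callers and tests can stub them without touching the network. A minimal sketch, mirroring the merge-over-defaults pattern used later in this diff (the fixture values are illustrative):

// Sketch: injecting stub hooks; shapes mirror the test fixtures below.
const cmd = createStatusCommand({
  fetchServerLlms: async () => [
    { id: 'l1', name: 'demo', type: 'openai', model: 'gpt-4o', tier: 'fast', url: '', apiKeyRef: null },
  ],
  probeServerLlm: async () => ({ ok: true, ms: 7, say: 'hi' }),
});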
@@ -156,6 +191,137 @@ function defaultFetchProviders(mcplocalUrl: string): Promise<ProvidersInfo | nul
   });
 }
 
+/**
+ * Fetch server-managed LLMs (the rows created by `mcpctl create llm`).
+ * Goes directly to mcpd because mcplocal does not proxy /api/v1/llms.
+ * Returns null on any error so the caller can decide whether to render
+ * a "not available" line vs. spilling stack traces into the status view.
+ */
+function defaultFetchServerLlms(mcpdUrl: string, token: string | null): Promise<ServerLlm[] | null> {
+  return new Promise((resolve) => {
+    let req: http.ClientRequest;
+    const headers: Record<string, string> = { Accept: 'application/json' };
+    if (token !== null) headers['Authorization'] = `Bearer ${token}`;
+    try {
+      req = httpDriverFor(mcpdUrl).get(`${mcpdUrl}/api/v1/llms`, { timeout: 5000, headers }, (res) => {
+        if (res.statusCode !== 200) { resolve(null); res.resume(); return; }
+        const chunks: Buffer[] = [];
+        res.on('data', (chunk: Buffer) => chunks.push(chunk));
+        res.on('end', () => {
+          try {
+            resolve(JSON.parse(Buffer.concat(chunks).toString('utf-8')) as ServerLlm[]);
+          } catch {
+            resolve(null);
+          }
+        });
+      });
+    } catch {
+      resolve(null);
+      return;
+    }
+    req.on('error', () => resolve(null));
+    req.on('timeout', () => { req.destroy(); resolve(null); });
+  });
+}
+
+/**
+ * POST a tiny "say hi" prompt to /api/v1/llms/<name>/infer and decide if
+ * the LLM actually serves inference. Returns ok=true when the response is
+ * 200 with non-empty content OR reasoning_content (thinking models often
+ * spend their token budget on the reasoning trace and never emit a
+ * `content` block, but they're clearly alive if reasoning came back).
+ * Otherwise ok=false with an error string suitable for one-line display.
+ *
+ * `max_tokens: 64` gives reasoning models enough headroom to emit
+ * something visible while still capping latency at ~1-2 sec on cheap
+ * models. The exact wording — "Reply with just: hi" — is more terse and
+ * closer to what a thinking model can short-circuit on without burning
+ * its entire budget on reasoning.
+ */
+const PROBE_TIMEOUT_MS = 15_000;
+const PROBE_BODY = JSON.stringify({
+  messages: [{ role: 'user', content: 'Reply with just: hi' }],
+  max_tokens: 64,
+  temperature: 0,
+});
+
+function defaultProbeServerLlm(mcpdUrl: string, name: string, token: string | null): Promise<ServerLlmHealth> {
+  return new Promise((resolve) => {
+    const started = Date.now();
+    const u = new URL(`${mcpdUrl}/api/v1/llms/${encodeURIComponent(name)}/infer`);
+    const driver = u.protocol === 'https:' ? https : http;
+    const headers: Record<string, string> = {
+      'Content-Type': 'application/json',
+      Accept: 'application/json',
+      'Content-Length': String(Buffer.byteLength(PROBE_BODY)),
+    };
+    if (token !== null) headers['Authorization'] = `Bearer ${token}`;
+
+    let req: http.ClientRequest;
+    try {
+      req = driver.request({
+        hostname: u.hostname,
+        port: u.port || (u.protocol === 'https:' ? 443 : 80),
+        path: u.pathname + u.search,
+        method: 'POST',
+        headers,
+        timeout: PROBE_TIMEOUT_MS,
+      }, (res) => {
+        const chunks: Buffer[] = [];
+        res.on('data', (c: Buffer) => chunks.push(c));
+        res.on('end', () => {
+          const ms = Date.now() - started;
+          const body = Buffer.concat(chunks).toString('utf-8');
+          if ((res.statusCode ?? 0) !== 200) {
+            // Pull out just the error message if the body is JSON, else the
+            // raw status — keeps the line tidy.
+            let msg = `HTTP ${String(res.statusCode ?? 0)}`;
+            try {
+              const parsed = JSON.parse(body) as { error?: string };
+              if (typeof parsed.error === 'string') msg = parsed.error;
+            } catch { /* not JSON, fall through */ }
+            resolve({ ok: false, ms, error: msg.slice(0, 80) });
+            return;
+          }
+          let content = '';
+          let reasoning = '';
+          try {
+            const parsed = JSON.parse(body) as {
+              choices?: Array<{ message?: { content?: string; reasoning_content?: string } }>;
+            };
+            const msg = parsed.choices?.[0]?.message;
+            content = msg?.content?.trim() ?? '';
+            reasoning = msg?.reasoning_content?.trim() ?? '';
+          } catch {
+            resolve({ ok: false, ms, error: 'invalid response body' });
+            return;
+          }
+          if (content !== '') {
+            resolve({ ok: true, ms, say: content.slice(0, 16) });
+            return;
+          }
+          if (reasoning !== '') {
+            // Thinking model burned its budget on the reasoning trace
+            // before emitting `content`. The LLM is alive — flag it as
+            // ok and surface a short reasoning preview so the user can
+            // tell at a glance.
+            resolve({ ok: true, ms, say: `[thinking] ${reasoning.slice(0, 12)}` });
+            return;
+          }
+          resolve({ ok: false, ms, error: 'empty content' });
+        });
+      });
+    } catch {
+      resolve({ ok: false, ms: Date.now() - started, error: 'request failed' });
+      return;
+    }
+    req.on('error', (e) => resolve({ ok: false, ms: Date.now() - started, error: e.message.slice(0, 80) }));
+    req.on('timeout', () => { req.destroy(); resolve({ ok: false, ms: Date.now() - started, error: 'timeout' }); });
+    req.write(PROBE_BODY);
+    req.end();
+  });
+}
+
 const SPINNER_FRAMES = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
 
 const defaultDeps: StatusCommandDeps = {
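To make the branch handling above concrete, here is a sketch of the probe exchange, assuming an OpenAI-style chat-completions body on `/infer` (field names match what the parser reads; the reply text is invented):

// What the probe POSTs is PROBE_BODY above; these are the two 200-responses it accepts.
// A plain model answers in `content` → ok=true, say='hi':
const plainReply = { choices: [{ message: { content: 'hi' } }] };
// A thinking model may burn its budget on the trace and leave `content`
// empty → still ok=true, say='[thinking] …':
const thinkingReply = { choices: [{ message: { content: '', reasoning_content: 'The user wants a greeting' } }] };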
@@ -167,6 +333,8 @@ const defaultDeps: StatusCommandDeps = {
   checkLlm: defaultCheckLlm,
   fetchModels: defaultFetchModels,
   fetchProviders: defaultFetchProviders,
+  fetchServerLlms: defaultFetchServerLlms,
+  probeServerLlm: defaultProbeServerLlm,
   isTTY: process.stdout.isTTY ?? false,
 };
 
@@ -228,7 +396,7 @@ function formatProviderStatus(name: string, info: ProvidersInfo, ansi: boolean):
 }
 
 export function createStatusCommand(deps?: Partial<StatusCommandDeps>): Command {
-  const { configDeps, credentialsDeps, log, write, checkHealth, checkLlm, fetchModels, fetchProviders, isTTY } = { ...defaultDeps, ...deps };
+  const { configDeps, credentialsDeps, log, write, checkHealth, checkLlm, fetchModels, fetchProviders, fetchServerLlms, probeServerLlm, isTTY } = { ...defaultDeps, ...deps };
 
   return new Command('status')
     .description('Show mcpctl status and connectivity')
@@ -242,13 +410,25 @@ export function createStatusCommand(deps?: Partial<StatusCommandDeps>): Command
 
       if (opts.output !== 'table') {
         // JSON/YAML: run everything in parallel, wait, output at once
-        const [mcplocalReachable, mcpdReachable, llmStatus, providersInfo] = await Promise.all([
+        const token = creds?.token ?? null;
+        const [mcplocalReachable, mcpdReachable, llmStatus, providersInfo, serverLlms] = await Promise.all([
           checkHealth(config.mcplocalUrl),
           checkHealth(config.mcpdUrl),
           llmLabel ? checkLlm(config.mcplocalUrl) : Promise.resolve(null),
           multiProvider ? fetchProviders(config.mcplocalUrl) : Promise.resolve(null),
+          fetchServerLlms(config.mcpdUrl, token),
         ]);
 
+        // Probe each server LLM in parallel — adds 0-2 sec to JSON mode but
+        // gives consumers (scripts, dashboards) the same liveness signal as
+        // the table view.
+        const serverLlmsWithHealth = serverLlms !== null
+          ? await Promise.all(serverLlms.map(async (l) => ({
+            ...l,
+            health: await probeServerLlm(config.mcpdUrl, l.name, token),
+          })))
+          : null;
+
         const llm = llmLabel
           ? llmStatus === 'ok' ? llmLabel : `${llmLabel} (${llmStatus})`
           : null;
@@ -265,6 +445,7 @@ export function createStatusCommand(deps?: Partial<StatusCommandDeps>): Command
           llm,
           llmStatus,
           ...(providersInfo ? { providers: providersInfo } : {}),
+          ...(serverLlmsWithHealth !== null ? { serverLlms: serverLlmsWithHealth } : {}),
         };
 
         log(opts.output === 'json' ? formatJson(status) : formatYaml(status));
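A hypothetical fragment of the resulting `-o json` payload with one probed LLM; the values mirror the test fixtures added later in this diff:

// Sketch of the serverLlms key as a consumer would see it:
const sample = {
  serverLlms: [
    { id: 'l1', name: 'qwen3-thinking', type: 'openai', model: 'qwen3-thinking',
      tier: 'fast', url: 'http://x', apiKeyRef: null,
      health: { ok: true, ms: 99, say: 'hi' } },
  ],
};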
@@ -286,8 +467,15 @@ export function createStatusCommand(deps?: Partial<StatusCommandDeps>): Command
       log(`Registries: ${config.registries.join(', ')}`);
       log(`Output: ${config.outputFormat}`);
 
+      // Server LLMs (mcpd-managed) — fetched in parallel regardless of the
+      // local-LLM config, so the section renders even on machines without
+      // a configured client-side provider.
+      const token = creds?.token ?? null;
+      const serverLlmsPromise = fetchServerLlms(config.mcpdUrl, token);
+
       if (!llmLabel) {
         log(`LLM: not configured (run 'mcpctl config setup')`);
+        await renderServerLlmsSection(serverLlmsPromise, config.mcpdUrl, token, isTTY);
         return;
       }
 
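The point of `serverLlmsPromise` is ordering, not extra I/O: the fetch starts before the possibly slow local-LLM checks and is only awaited when the section is about to render. Distilled into a sketch:

// Fire early, await late: the request is in flight while other lines print.
const pending = fetchServerLlms(config.mcpdUrl, token); // note: no await here
// ... local-LLM status lines print while the request is in flight ...
await renderServerLlmsSection(pending, config.mcpdUrl, token, isTTY);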
@@ -350,5 +538,85 @@ export function createStatusCommand(deps?: Partial<StatusCommandDeps>): Command
         log(`${DIM}  Available: ${models.join(', ')}${RESET}`);
       }
     }
 
+    await renderServerLlmsSection(serverLlmsPromise, config.mcpdUrl, token, isTTY);
   });
 
+  /**
+   * Print a "Server LLMs:" section listing mcpd-managed Llm rows by tier
+   * with a per-LLM "say hi" liveness probe. Distinct from the mcplocal-side
+   * providers shown by the existing "LLM:" lines above. The caller awaits a
+   * pre-launched promise so this doesn't add fetch round-trips, but the
+   * probe itself runs here (after the user has the rest of the screen).
+   */
+  async function renderServerLlmsSection(
+    serverLlmsPromise: Promise<ServerLlm[] | null>,
+    mcpdUrl: string,
+    token: string | null,
+    ansi: boolean,
+  ): Promise<void> {
+    const llms = await serverLlmsPromise;
+    if (llms === null) {
+      // Auth failure / unreachable mcpd: fold into the existing mcpd-status
+      // signal we already printed; nothing more to say here.
+      return;
+    }
+    if (llms.length === 0) {
+      log(`Server LLMs: none registered ${ansi ? DIM : ''}(use 'mcpctl create llm')${ansi ? RESET : ''}`);
+      return;
+    }
+
+    log(`Server LLMs: ${String(llms.length)} registered ${ansi ? DIM : ''}(probing live "say hi"...)${ansi ? RESET : ''}`);
+
+    // Run all probes in parallel — one slow LLM doesn't block the others.
+    const healthByName = new Map<string, ServerLlmHealth>();
+    await Promise.all(llms.map(async (l) => {
+      healthByName.set(l.name, await probeServerLlm(mcpdUrl, l.name, token));
+    }));
+
+    const byTier = new Map<string, ServerLlm[]>();
+    for (const l of llms) {
+      const arr = byTier.get(l.tier) ?? [];
+      arr.push(l);
+      byTier.set(l.tier, arr);
+    }
+
+    // Print tiers in a stable order — fast/heavy first, then anything else.
+    const tierOrder = ['fast', 'heavy', ...[...byTier.keys()].filter((t) => t !== 'fast' && t !== 'heavy').sort()];
+    for (const tier of tierOrder) {
+      const rows = byTier.get(tier);
+      if (rows === undefined || rows.length === 0) continue;
+      const formatted = rows.map((r) => formatServerLlmLine(r, healthByName.get(r.name), ansi));
+      log(`  ${tier.padEnd(6)} ${formatted.join('\n         ')}`);
+    }
+  }
 }
 
+/**
+ * Format a single server-LLM row plus its health-probe outcome on one line.
+ * Exported via module scope (not closure) so it stays cheap to test in
+ * isolation; takes `ansi` rather than reading a TTY at call time.
+ */
+function formatServerLlmLine(r: ServerLlm, h: ServerLlmHealth | undefined, ansi: boolean): string {
+  const upstream = r.url !== '' ? r.url : 'provider default';
+  const auth = r.apiKeyRef ? `key:${r.apiKeyRef.name}/${r.apiKeyRef.key}` : 'no key';
+  let healthStr: string;
+  if (h === undefined) {
+    healthStr = ansi ? `${DIM}? probe skipped${RESET}` : '? probe skipped';
+  } else if (h.ok) {
+    const reply = h.say !== undefined ? `"${h.say}"` : 'ok';
+    const ms = `${String(h.ms)}ms`;
+    healthStr = ansi ? `${GREEN}✓ ${reply} ${DIM}${ms}${RESET}` : `✓ ${reply} ${ms}`;
+  } else {
+    const err = h.error ?? 'failed';
+    healthStr = ansi ? `${RED}✗ ${err}${RESET}` : `✗ ${err}`;
+  }
+  const meta = `${r.type} → ${r.model}`;
+  // Two-line layout: name + health on top, dim metadata indented below.
+  // Keeps the at-a-glance signal (✓/✗) close to the LLM name.
+  const head = `${r.name} ${healthStr}`;
+  const tail = ansi
+    ? `${DIM}${meta} ${upstream} ${auth}${RESET}`
+    : `${meta} ${upstream} ${auth}`;
+  return `${head}\n    ${tail}`;
+}
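For a concrete sense of the two-line layout, here is a hypothetical table-mode render using the fixtures from the tests below (ANSI stripped, spacing approximate):

Server LLMs: 2 registered (probing live "say hi"...)
  fast   qwen3-thinking ✓ "hi" 42ms
           openai → qwen3-thinking http://x:4000/v1 key:litellm/API_KEY
  heavy  sonnet ✓ "hi" 42ms
           anthropic → claude-sonnet-4-5 provider default no key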
@@ -27,6 +27,8 @@ function baseDeps(overrides?: Partial<StatusCommandDeps>): Partial<StatusCommand
     write,
     checkHealth: async () => true,
     fetchProviders: async () => null,
+    fetchServerLlms: async () => null,
+    probeServerLlm: async () => ({ ok: true, ms: 12, say: 'hi' }),
     isTTY: false,
     ...overrides,
   };
@@ -199,4 +201,95 @@ describe('status command', () => {
     expect(parsed['llm']).toBeNull();
     expect(parsed['llmStatus']).toBeNull();
   });
+
+  // ── Server LLMs (mcpd-managed Llm rows) ──
+
+  it('renders a "Server LLMs:" section grouped by tier in table mode', async () => {
+    saveCredentials({ token: 't', mcpdUrl: 'http://mcpd', user: 'u' }, { configDir: tempDir });
+    const cmd = createStatusCommand(baseDeps({
+      fetchServerLlms: async () => [
+        { id: 'l1', name: 'qwen3-thinking', type: 'openai', model: 'qwen3-thinking', tier: 'fast', url: 'http://x:4000/v1', apiKeyRef: { name: 'litellm', key: 'API_KEY' } },
+        { id: 'l2', name: 'sonnet', type: 'anthropic', model: 'claude-sonnet-4-5', tier: 'heavy', url: '', apiKeyRef: null },
+      ],
+      probeServerLlm: async () => ({ ok: true, ms: 42, say: 'hi' }),
+    }));
+    await cmd.parseAsync([], { from: 'user' });
+    const out = output.join('\n');
+    expect(out).toContain('Server LLMs: 2 registered');
+    expect(out).toContain('qwen3-thinking');
+    expect(out).toContain('openai → qwen3-thinking');
+    expect(out).toContain('sonnet');
+    expect(out).toContain('anthropic → claude-sonnet-4-5');
+    expect(out).toMatch(/fast\s+qwen3-thinking/);
+    expect(out).toMatch(/heavy\s+sonnet/);
+    // Health probe result rendered for each LLM
+    expect(out).toContain('✓ "hi" 42ms');
+  });
+
+  it('renders a failed "say hi" probe with the error message', async () => {
+    const cmd = createStatusCommand(baseDeps({
+      fetchServerLlms: async () => [
+        { id: 'l1', name: 'broken', type: 'openai', model: 'gpt-4o', tier: 'fast', url: 'http://x', apiKeyRef: null },
+      ],
+      probeServerLlm: async () => ({ ok: false, ms: 5000, error: 'upstream auth failed: 401' }),
+    }));
+    await cmd.parseAsync([], { from: 'user' });
+    const out = output.join('\n');
+    expect(out).toContain('Server LLMs: 1 registered');
+    expect(out).toContain('broken');
+    expect(out).toContain('✗ upstream auth failed: 401');
+  });
+
+  it('renders "none registered" when mcpd has no Llm rows', async () => {
+    const cmd = createStatusCommand(baseDeps({ fetchServerLlms: async () => [] }));
+    await cmd.parseAsync([], { from: 'user' });
+    const out = output.join('\n');
+    expect(out).toContain('Server LLMs: none registered');
+    expect(out).toContain("'mcpctl create llm'");
+  });
+
+  it('omits the section silently when mcpd is unreachable (fetcher returns null)', async () => {
+    const cmd = createStatusCommand(baseDeps({ fetchServerLlms: async () => null }));
+    await cmd.parseAsync([], { from: 'user' });
+    const out = output.join('\n');
+    expect(out).not.toContain('Server LLMs');
+  });
+
+  it('passes the bearer token from saved credentials to the fetcher', async () => {
+    saveCredentials({ token: 'tok-abc', mcpdUrl: 'http://mcpd', user: 'u' }, { configDir: tempDir });
+    let capturedToken: string | null = '<unseen>';
+    const cmd = createStatusCommand(baseDeps({
+      fetchServerLlms: async (_url, token) => { capturedToken = token; return []; },
+    }));
+    await cmd.parseAsync([], { from: 'user' });
+    expect(capturedToken).toBe('tok-abc');
+  });
+
+  it('passes null token when there are no saved credentials', async () => {
+    let capturedToken: string | null = '<unseen>';
+    const cmd = createStatusCommand(baseDeps({
+      fetchServerLlms: async (_url, token) => { capturedToken = token; return []; },
+    }));
+    await cmd.parseAsync([], { from: 'user' });
+    expect(capturedToken).toBeNull();
+  });
+
+  it('includes serverLlms with probed health in JSON output', async () => {
+    const llms = [
+      { id: 'l1', name: 'qwen3-thinking', type: 'openai', model: 'qwen3-thinking', tier: 'fast', url: 'http://x', apiKeyRef: null },
+    ];
+    const cmd = createStatusCommand(baseDeps({
+      fetchServerLlms: async () => llms,
+      probeServerLlm: async () => ({ ok: true, ms: 99, say: 'hi' }),
+    }));
+    await cmd.parseAsync(['-o', 'json'], { from: 'user' });
+    const parsed = JSON.parse(output[0]) as {
+      serverLlms?: Array<typeof llms[number] & { health: { ok: boolean; ms: number; say?: string } }>;
+    };
+    expect(parsed.serverLlms).toHaveLength(1);
+    expect(parsed.serverLlms![0]).toMatchObject({
+      name: 'qwen3-thinking',
+      health: { ok: true, ms: 99, say: 'hi' },
+    });
+  });
 });
@@ -13,7 +13,7 @@
  * index.html so client-side react-router routes work on direct hits.
  */
 import path from 'node:path';
-import { existsSync, statSync } from 'node:fs';
+import { existsSync, statSync, readFileSync } from 'node:fs';
 import { fileURLToPath } from 'node:url';
 import type { FastifyInstance } from 'fastify';
 import fastifyStatic from '@fastify/static';
@@ -57,13 +57,27 @@ export async function registerWebUi(app: FastifyInstance): Promise<void> {
     root,
     prefix: '/ui/',
     wildcard: false,
     decorateReply: false,
   });
 
-  // SPA fallback — react-router URLs like /ui/agents/foo/personalities/bar
-  // need index.html to bootstrap the app.
+  // Read index.html once at startup; the SPA fallback below serves it
+  // verbatim for every unmatched /ui/* path so client-side routing works
+  // on direct hits. Reading once also dodges a per-request `sendFile`
+  // call — there's only one file ever served from this handler.
+  const indexHtmlPath = path.join(root, 'index.html');
+  const indexHtml = existsSync(indexHtmlPath)
+    ? readFileSync(indexHtmlPath, 'utf-8')
+    : null;
+  if (indexHtml === null) {
+    app.log.warn({ root }, 'web UI index.html missing; deep links to /ui/<path> will 404');
+  }
   app.get('/ui/*', (_request, reply) => {
-    return reply.sendFile('index.html', root);
+    if (indexHtml === null) {
+      reply.code(404);
+      return { error: 'index.html missing from web UI bundle' };
+    }
+    reply.type('text/html').send(indexHtml);
+    return reply;
   });
   // Cover the bare /ui (no trailing slash) too.
   app.get('/ui', (_request, reply) => {
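A sketch (hypothetical, not part of this diff) of how the fallback could be exercised with Fastify's built-in request injection, assuming a built web UI bundle on disk:

import Fastify from 'fastify';

const app = Fastify();
await registerWebUi(app);
// A deep link that matches no static file should get the index.html shell
// (200, text/html) rather than a 404, so react-router can take over.
const res = await app.inject({ method: 'GET', url: '/ui/agents/foo/personalities/bar' });
console.log(res.statusCode, res.headers['content-type']);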