feat(cli): live "say hi" probe for server LLMs in mcpctl status #61

Merged
michal merged 1 commit from feat/status-llm-say-hi into main 2026-04-27 11:02:28 +00:00
Owner

Summary

`mcpctl status` was listing server-side LLMs but not telling you whether they actually serve inference. This adds a per-LLM "say hi" probe: POST an 8-token prompt to `/api/v1/llms/<name>/infer` and render the result inline.

```
Server LLMs: 2 registered (probing live "say hi"...)
  fast   qwen3-thinking  ✓ "hi" 312ms
            openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
  heavy  sonnet  ✗ upstream auth failed: 401
            anthropic → claude-sonnet-4-5  provider default  no key
```
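
Under the hood each probe is a single POST to the infer endpoint. A minimal sketch of one probe, assuming Node 18+ `fetch`; the `baseUrl`/`token` parameters, the `LlmHealth` type name, and the `{ content }` response shape are illustrative, not the actual mcpctl internals:

```ts
// Hypothetical sketch of one "say hi" probe. baseUrl, token, LlmHealth and the
// { content } response shape are illustrative assumptions, not mcpctl's real code.
interface LlmHealth {
  ok: boolean;
  ms: number;
  say?: string;   // first words of the model's reply, e.g. "hi"
  error?: string; // short reason when the probe fails
}

async function probeLlm(baseUrl: string, token: string, name: string): Promise<LlmHealth> {
  const started = Date.now();
  try {
    const res = await fetch(`${baseUrl}/api/v1/llms/${encodeURIComponent(name)}/infer`, {
      method: "POST",
      headers: { "content-type": "application/json", authorization: `Bearer ${token}` },
      body: JSON.stringify({
        messages: [{ role: "user", content: "Say exactly the word 'hi' and nothing else." }],
        max_tokens: 8,
        temperature: 0,
      }),
      signal: AbortSignal.timeout(15_000), // 15s per-probe timeout
    });
    const ms = Date.now() - started;
    if (!res.ok) {
      // The real CLI renders friendlier messages, e.g. "upstream auth failed: 401".
      return { ok: false, ms, error: `HTTP ${res.status}` };
    }
    const body = (await res.json()) as { content?: string };
    return { ok: true, ms, say: (body.content ?? "").trim() };
  } catch (err) {
    return { ok: false, ms: Date.now() - started, error: String(err) };
  }
}
```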

Probes run in parallel (one slow LLM doesn't block the others) with a 15s per-probe timeout. JSON/YAML output gains a `health: { ok, ms, say?, error? }` field so dashboards get the same signal.
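
The fan-out can be as simple as `Promise.allSettled` over the registered LLMs, attaching the result as the `health` field the JSON/YAML renderers emit. Again a sketch with illustrative names (`ServerLlm`, `probeAll`), not the actual implementation:

```ts
// Hypothetical fan-out: probe every registered LLM concurrently, so one slow
// model doesn't block the rest, and attach the result as `health`.
interface ServerLlm { name: string; model: string; health?: LlmHealth }

async function probeAll(baseUrl: string, token: string, llms: ServerLlm[]): Promise<ServerLlm[]> {
  const results = await Promise.allSettled(
    llms.map((llm) => probeLlm(baseUrl, token, llm.name)),
  );
  return llms.map((llm, i) => {
    const r = results[i];
    return {
      ...llm,
      health: r.status === "fulfilled" ? r.value : { ok: false, ms: 0, error: String(r.reason) },
    };
  });
}
```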

Test plan

  • CLI status: 25/25 (was 24, +1 for the failure-path render)
  • Workspace: 2006/2006 across 149 files
  • Typecheck clean
  • Manual: `mcpctl status` against the live cluster shows ✓ "hi" + ms for qwen3-thinking.

🤖 Generated with Claude Code

michal added 1 commit 2026-04-27 11:02:14 +00:00
feat(cli): live "say hi" probe for server LLMs in mcpctl status
Some checks failed
CI/CD / lint (pull_request) Successful in 55s
CI/CD / test (pull_request) Successful in 1m13s
CI/CD / typecheck (pull_request) Successful in 3m10s
CI/CD / smoke (pull_request) Failing after 1m46s
CI/CD / build (pull_request) Successful in 3m24s
CI/CD / publish (pull_request) Has been skipped
e4af16477c
Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:

  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0

Each registered LLM gets a one-line health line:

  Server LLMs: 2 registered (probing live "say hi"...)
    fast   qwen3-thinking  ✓ "hi" 312ms
              openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
    heavy  sonnet  ✗ upstream auth failed: 401
              anthropic → claude-sonnet-4-5  provider default  no key

Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
health: { ok, ms, say?, error? } field per server LLM so dashboards
get the same liveness signal.

Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit 54e56f7b71 into main 2026-04-27 11:02:28 +00:00
Reference: michal/mcpctl#61