feat(cli): live "say hi" probe for server LLMs in mcpctl status #61
Summary
`mcpctl status` was listing server-side LLMs but not telling you whether each one actually serves inference. This adds a per-LLM "say hi" probe: POST a tiny prompt (`messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]`, `max_tokens: 8`, `temperature: 0`) to `/api/v1/llms/<name>/infer` and render the result inline.
```
Server LLMs: 2 registered (probing live "say hi"...)
  fast   qwen3-thinking   ✓ "hi" 312ms
         openai → qwen3-thinking   http://litellm.../v1   key:litellm/API_KEY
  heavy  sonnet           ✗ upstream auth failed: 401
         anthropic → claude-sonnet-4-5   provider default   no key
```
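For reference, a rough sketch of what a single probe could look like, assuming a TypeScript CLI using the global `fetch`; `probeLlm`, `ProbeResult`, and the response shape are illustrative assumptions, not the actual mcpctl internals:

```
// Sketch only: one "say hi" probe against /api/v1/llms/<name>/infer.
// Names and the response shape are assumptions for illustration.
interface ProbeResult {
  ok: boolean;
  ms: number;
  say?: string;   // what the model replied, ideally "hi"
  error?: string; // short failure description, e.g. "upstream auth failed: 401"
}

async function probeLlm(baseUrl: string, name: string, timeoutMs = 15_000): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetch(`${baseUrl}/api/v1/llms/${encodeURIComponent(name)}/infer`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        messages: [{ role: "user", content: "Say exactly the word 'hi' and nothing else." }],
        max_tokens: 8,
        temperature: 0,
      }),
      signal: controller.signal,
    });
    const ms = Date.now() - started;
    if (!res.ok) {
      return { ok: false, ms, error: `HTTP ${res.status}` };
    }
    const body = await res.json();
    // Assumed response shape { content: string }; adapt to the real infer payload.
    return { ok: true, ms, say: String(body.content ?? "").trim() };
  } catch (err) {
    return { ok: false, ms: Date.now() - started, error: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```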
Probes run in parallel (one slow LLM doesn't block the others) with a 15s per-probe timeout. JSON/YAML output gains a `health: { ok, ms, say?, error? }` field so dashboards get the same signal.
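A matching sketch of the fan-out, under the same assumptions (`probeAll` and `ServerLlm` are hypothetical names):

```
// Sketch only: run one probe per registered LLM concurrently. Each probe
// carries its own 15s timeout inside probeLlm, so one slow LLM never blocks
// the others. The attached `health` field mirrors the JSON/YAML output shape.
interface ServerLlm {
  name: string;
  health?: ProbeResult; // { ok, ms, say?, error? }
}

async function probeAll(baseUrl: string, llms: ServerLlm[]): Promise<ServerLlm[]> {
  const results = await Promise.all(llms.map((llm) => probeLlm(baseUrl, llm.name)));
  return llms.map((llm, i) => ({ ...llm, health: results[i] }));
}
```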
Test plan
Tests: 25/25 (was 24; +1 new test covering the failure-path render). Workspace suite: 2006/2006 across 149 files.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>