Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:
  messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
  max_tokens: 8, temperature: 0
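
For reference, each probe amounts to roughly the following sketch; the probeLlm name, the baseUrl parameter, and the assumption that the infer response carries the completion in a `text` field are illustrative, not the actual implementation:

```ts
// Sketch of one probe: POST the tiny prompt and time the round trip.
// Assumes a fetch-capable runtime (Node 18+); the real response shape may differ.
type Health = { ok: boolean; ms: number; say?: string; error?: string };

async function probeLlm(baseUrl: string, name: string): Promise<Health> {
  const started = Date.now();
  try {
    const res = await fetch(`${baseUrl}/api/v1/llms/${encodeURIComponent(name)}/infer`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }],
        max_tokens: 8,
        temperature: 0,
      }),
    });
    const ms = Date.now() - started;
    if (!res.ok) return { ok: false, ms, error: `upstream returned ${res.status}` };
    const data = await res.json();
    // Assumes the infer response exposes the completion as `text`.
    return { ok: true, ms, say: String(data.text ?? '').trim() };
  } catch (err) {
    return { ok: false, ms: Date.now() - started, error: String(err) };
  }
}
```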
Each registered LLM gets its own health line:
Server LLMs: 2 registered (probing live "say hi"...)
  fast   qwen3-thinking  ✓ "hi"  312ms
         openai → qwen3-thinking  http://litellm.../v1  key:litellm/API_KEY
  heavy  sonnet          ✗ upstream auth failed: 401
         anthropic → claude-sonnet-4-5  provider default  no key
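
Rendering one of those lines from a probe result is then roughly the following; renderHealthLine and the exact spacing are illustrative, not the real formatter:

```ts
// Check mark with the echoed word and latency on success,
// cross with the upstream error on failure.
function renderHealthLine(
  name: string,
  model: string,
  h: { ok: boolean; ms: number; say?: string; error?: string },
): string {
  return h.ok
    ? `${name} ${model} ✓ "${h.say}" ${h.ms}ms`
    : `${name} ${model} ✗ ${h.error}`;
}
```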
Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
`health: { ok, ms, say?, error? }` field per server LLM so dashboards
get the same liveness signal.
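
The fan-out is roughly the following sketch, reusing the probeLlm shape from above; probeAll and withTimeout are illustrative names, and the error wording is not the real one:

```ts
// Every registered LLM is probed at once, and each probe races its own
// 15-second timeout, so one slow LLM can't delay the rest of the output.
const PROBE_TIMEOUT_MS = 15_000;

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error(`probe timed out after ${ms}ms`)), ms),
    ),
  ]);
}

async function probeAll(baseUrl: string, names: string[]) {
  const entries = await Promise.all(
    names.map(async (name) => {
      try {
        return [name, await withTimeout(probeLlm(baseUrl, name), PROBE_TIMEOUT_MS)] as const;
      } catch (err) {
        return [name, { ok: false, ms: PROBE_TIMEOUT_MS, error: String(err) }] as const;
      }
    }),
  );
  // Keyed by LLM name so it can be merged into the JSON/YAML `health` field.
  return Object.fromEntries(entries);
}
```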
Tests: 25/25 (was 24; the new test covers the failure-path render). Workspace
suite: 2006/2006 across 149 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>