fix(cli): status probe accepts reasoning_content for thinking models #62

Merged
michal merged 1 commit from fix/status-probe-reasoning-content into main 2026-04-27 11:10:18 +00:00
Owner

Summary

Live deploy of #61 showed qwen3-thinking failing the new probe with `✗ empty content` because it spends its 8-token budget on the reasoning trace before emitting any `content`. Fix: bump max_tokens, recognize `reasoning_content` as a liveness signal, and tighten the prompt.

  • max_tokens 8 → 64 (still ~1-2s latency on cheap models, but reasoning models get breathing room).
  • Empty `content` + non-empty `reasoning_content` → probe passes, with a `[thinking] <preview>` tag so the user sees the model is responsive but didn't follow instructions verbatim.
  • Prompt: `"Say exactly the word 'hi' and nothing else."` → `"Reply with just: hi"` — closer to what a thinking model can short-circuit on.
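The decision logic above can be sketched roughly as follows. This is an illustrative Python sketch assuming an OpenAI-style chat-completion response shape (`choices[0].message` with `content` and `reasoning_content` fields); `classify_probe` is a hypothetical name, not the actual mcpctl function.

```python
def classify_probe(response: dict) -> tuple[bool, str]:
    """Return (alive, detail) for one probe response.

    Sketch of the liveness rule described in this PR:
    - non-empty content        -> alive, show the answer
    - only reasoning_content   -> alive, show a "[thinking]" preview
    - both empty               -> dead, report "empty content"
    """
    msg = response["choices"][0]["message"]
    content = (msg.get("content") or "").strip()
    reasoning = (msg.get("reasoning_content") or "").strip()

    if content:
        return True, content
    if reasoning:
        # Model spent its token budget thinking; still counts as responsive.
        return True, "[thinking] " + reasoning[:40]
    return False, "empty content"
```

With this shape, a thinking model that never emits a final `content` block still shows ✓ instead of ✗ empty content.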

Test plan

  • CLI status: 25/25 (the failure-path test still asserts on "empty content" because reasoning_content is empty too).
  • Manual: `mcpctl status` against the live cluster shows ✓ for qwen3-thinking instead of ✗ empty content.
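The failure-path behavior noted above could be pinned down with a test along these lines. A hedged, self-contained Python sketch: `probe_detail` stands in for the real probe helper, whose name and signature are assumptions.

```python
def probe_detail(msg: dict) -> tuple[bool, str]:
    # Minimal stand-in for the probe's liveness rule (illustrative only).
    content = (msg.get("content") or "").strip()
    reasoning = (msg.get("reasoning_content") or "").strip()
    if content:
        return True, content
    if reasoning:
        return True, "[thinking] " + reasoning[:40]
    return False, "empty content"

def test_empty_content_still_fails():
    # When content AND reasoning_content are both empty, the probe must
    # keep reporting the original "empty content" failure.
    ok, detail = probe_detail({"content": "", "reasoning_content": ""})
    assert not ok
    assert detail == "empty content"
```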

🤖 Generated with Claude Code

michal added 1 commit 2026-04-27 11:10:03 +00:00
fix(cli): status probe accepts reasoning_content for thinking models
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / lint (pull_request) Successful in 3m6s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / build (pull_request) Successful in 2m39s
CI/CD / smoke (pull_request) Failing after 3m58s
CI/CD / publish (pull_request) Has been skipped
a84214dad1
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final `content` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit e65a396d3e into main 2026-04-27 11:10:18 +00:00
Reference: michal/mcpctl#62