fix(cli): status probe accepts reasoning_content for thinking models #62

Merged
michal merged 1 commit from fix/status-probe-reasoning-content into main 2026-04-27 11:10:18 +00:00
Owner

Summary

Live deploy of #61 showed qwen3-thinking failing the new probe with `✗ empty content` because it spends its 8-token budget on the reasoning trace before emitting any `content`. Fix: bump max_tokens, recognize `reasoning_content` as a liveness signal, and tighten the prompt.

  • max_tokens 8 → 64 (still ~1-2s latency on cheap models, but reasoning models get breathing room).
  • Empty `content` + non-empty `reasoning_content` → probe passes, with a `[thinking] <preview>` tag so the user sees the model is responsive but didn't follow instructions verbatim.
  • Prompt: `"Say exactly the word 'hi' and nothing else."` → `"Reply with just: hi"` — closer to what a thinking model can short-circuit on.
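The decision logic above can be sketched roughly as follows. This is an illustrative Python sketch assuming an OpenAI-style chat-completion response shape (`choices[0].message` with `content` and `reasoning_content` fields); `classify_probe` is a hypothetical name, not the actual mcpctl function.

```python
def classify_probe(response: dict) -> tuple[bool, str]:
    """Return (alive, detail) for one probe response.

    Sketch of the liveness rule described in this PR:
    - non-empty content        -> alive, show the answer
    - only reasoning_content   -> alive, show a "[thinking]" preview
    - both empty               -> dead, report "empty content"
    """
    msg = response["choices"][0]["message"]
    content = (msg.get("content") or "").strip()
    reasoning = (msg.get("reasoning_content") or "").strip()

    if content:
        return True, content
    if reasoning:
        # Model spent its token budget thinking; still counts as responsive.
        return True, "[thinking] " + reasoning[:40]
    return False, "empty content"
```

With this shape, a thinking model that never emits a final `content` block still shows ✓ instead of ✗ empty content.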

Test plan

  • CLI status: 25/25 (the failure-path test still asserts on "empty content" because reasoning_content is empty too).
  • Manual: `mcpctl status` against the live cluster shows ✓ for qwen3-thinking instead of ✗ empty content.
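The failure-path behavior noted above could be pinned down with a test along these lines. A hedged, self-contained Python sketch: `probe_detail` stands in for the real probe helper, whose name and signature are assumptions.

```python
def probe_detail(msg: dict) -> tuple[bool, str]:
    # Minimal stand-in for the probe's liveness rule (illustrative only).
    content = (msg.get("content") or "").strip()
    reasoning = (msg.get("reasoning_content") or "").strip()
    if content:
        return True, content
    if reasoning:
        return True, "[thinking] " + reasoning[:40]
    return False, "empty content"

def test_empty_content_still_fails():
    # When content AND reasoning_content are both empty, the probe must
    # keep reporting the original "empty content" failure.
    ok, detail = probe_detail({"content": "", "reasoning_content": ""})
    assert not ok
    assert detail == "empty content"
```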

🤖 Generated with Claude Code

michal added 1 commit 2026-04-27 11:10:03 +00:00
fix(cli): status probe accepts reasoning_content for thinking models
Some checks failed
CI/CD / typecheck (pull_request) Successful in 56s
CI/CD / lint (pull_request) Successful in 3m6s
CI/CD / test (pull_request) Successful in 1m9s
CI/CD / build (pull_request) Successful in 2m39s
CI/CD / smoke (pull_request) Failing after 3m58s
CI/CD / publish (pull_request) Has been skipped
a84214dad1
Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final `content` block.

Fix:
- Bump max_tokens to 64. Still caps latency at ~1-2 sec on cheap
  models but gives reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count it as alive and prefix the preview with "[thinking]" so the
  user knows the model didn't actually answer "hi" but is responsive.
- Replace the prompt with the terser "Reply with just: hi" — closer
  to what a thinking model can short-circuit on.

Tests: existing 25 pass; the failure-path test still asserts on the
"empty content" path because reasoning_content is empty there too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit e65a396d3e into main 2026-04-27 11:10:18 +00:00
Reference: michal/mcpctl#62