Live deploy showed qwen3-thinking failing the probe with "empty
content": at max_tokens=8 the model spent its entire budget on the
reasoning trace and never emitted a final `content` block.
Fix:
- Bump max_tokens to 64. This still caps latency at ~1-2 s on cheap
  models while giving reasoning models enough headroom.
- If `message.content` is empty but `reasoning_content` is non-empty,
  count the model as alive and prefix the preview with "[thinking]",
  so the user can see the model is responsive even though it never
  produced the final "hi" answer.
- Replace the prompt with the terser "Reply with just: hi", which is
  closer to something a thinking model can short-circuit on.
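The fallback logic in the second bullet can be sketched roughly as
follows. This is illustrative, not the actual probe code: the field
names (`content`, `reasoning_content`) follow the OpenAI-compatible
response schema that Qwen-style thinking models return, and the
function name and preview length are made up for the example.

```python
def classify_probe(message: dict, preview_len: int = 40):
    """Return (alive, preview) for a probe completion message.

    Sketch of the intended behavior: a non-empty `content` wins;
    otherwise a non-empty `reasoning_content` still counts as alive,
    with the preview flagged so the user knows the model only thought
    and never answered.
    """
    content = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()

    if content:
        return True, content[:preview_len]
    if reasoning:
        # Responsive, but the token budget went to the reasoning
        # trace; surface that instead of reporting "empty content".
        return True, "[thinking] " + reasoning[:preview_len]
    return False, ""
```

Note that the dead branch (both fields empty) is unchanged, which is
why the existing failure-path test still passes as-is.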
Tests: all 25 existing tests pass; the failure-path test still hits
the "empty content" branch because reasoning_content is empty there
too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>