fix(chat): real fixes for thinking-model + URL conventions, not test tweaks

Five real bugs surfaced by the agent-chat smoke against live
qwen3-thinking. None of these are fixed by changing the test — the
test was right to fail.

1. openai-passthrough adapter doubled `/v1` in the request URL. The
   adapter hard-codes `/v1/chat/completions` after the configured base,
   but every OpenAI-compat provider documents its base URL with a
   trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users
   pasting that conventional shape produced
   `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a
   trailing `/v1` so both forms canonicalize; `/v1beta` (Gemini-style)
   is preserved. A sketch follows the list.

2. Non-streaming chat returned an empty assistant message when thinking
   models (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only
   `reasoning_content` with `content: null`. extractChoice now also
   pulls reasoning (every spelling the streaming parser already knows
   about), and a new pickAssistantText helper falls back to it when
   content is empty. A `[response truncated by max_tokens]` marker is
   appended when finish_reason is `length`, so users see the cut-off
   instead of guessing why the answer is short. Symmetric streaming
   fix: the chatStream loop accumulates reasoning and yields ONE
   synthesized `text` frame at the end when content stayed empty,
   keeping the CLI's stdout (which only prints `text` deltas) in sync
   with the persisted thread message. Both paths are sketched after
   the list.

3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3
   lifecycle field) instead of `kind: agent` (apply envelope), so
   round-tripping through `apply -f` failed. Same fix shape as the v1
   Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/
   inactiveSince/providerSessionId for the agents resource too (sketched
   after the list).

4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on
   stderr; streaming printed `(thread: <cuid>)` (with space). Tests (and
   any other regex) watching for one form missed the other.
   Standardize on `thread: <cuid>` (single space) in both paths.

5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on
   success — making any `expect(stderr).toMatch(...)` assertion
   structurally impossible to satisfy in the happy path. Switch to
   `spawnSync` so stderr is actually captured. Includes a small
   shell-style argv splitter so the existing call sites with quoted
   multi-word values (`--system-prompt "..."`) keep working; a sketch
   follows the list.
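
Sketch of the fix-1 URL normalization. Helper names here are
illustrative; only the strip/preserve behaviour reflects what
endpointUrl now does:

    // Sketch only: not the adapter's actual endpointUrl signature.
    function normalizeBaseUrl(base: string): string {
      // "…/v1/" and "…/v1" behave the same once trailing slashes are gone.
      const trimmed = base.replace(/\/+$/, '');
      // Strip ONLY an exact trailing "/v1"; "/v1beta" and other suffixes survive.
      return trimmed.replace(/\/v1$/, '');
    }

    function chatCompletionsUrl(base: string): string {
      return `${normalizeBaseUrl(base)}/v1/chat/completions`;
    }

    // https://llm.example.com/v1     → …/v1/chat/completions
    // https://llm.example.com/v1/    → …/v1/chat/completions
    // https://api.example.com/v1beta → …/v1beta/v1/chat/completions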
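
Sketch of the fix-2 non-streaming fallback. The shapes below are
simplified and the names other than pickAssistantText are assumptions;
the real extractChoice also recognizes the other reasoning spellings
the streaming parser knows about:

    // Simplified choice shape, for illustration only.
    interface ExtractedChoice {
      content: string | null;
      reasoning: string | null;
      finishReason: string | null;
    }

    function pickAssistantText(choice: ExtractedChoice): string {
      // Prefer real content; fall back to reasoning only when content is empty.
      const content = choice.content ?? '';
      let text = content.trim() ? content : (choice.reasoning ?? '');
      // Make a hard cut-off visible instead of silently returning a short answer.
      if (choice.finishReason === 'length') {
        text = `${text}\n[response truncated by max_tokens]`;
      }
      return text;
    }

And the symmetric streaming shape, again as a sketch that assumes the
stream frames look roughly like { type, delta }:

    async function* withSynthesizedText(
      frames: AsyncIterable<{ type: 'text' | 'thinking'; delta: string }>,
    ) {
      let sawText = false;
      let reasoning = '';
      for await (const frame of frames) {
        if (frame.type === 'text') sawText = true;
        else if (frame.type === 'thinking') reasoning += frame.delta;
        yield frame;
      }
      // Content never arrived: emit ONE synthesized text frame so the CLI's
      // stdout and the persisted thread message agree.
      if (!sawText && reasoning) yield { type: 'text' as const, delta: reasoning };
    }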
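
Sketch of the fix-3 strip for the agents resource (the dropped field
names come from the list above; the real toApplyDocs branch may be
structured differently):

    // Drop server-managed lifecycle fields so `get agent -o yaml` output
    // round-trips through `apply -f`, and emit the apply-envelope kind.
    function toAgentApplyDoc(agent: Record<string, unknown>) {
      const {
        kind: _kind,
        status: _status,
        lastHeartbeatAt: _heartbeat,
        inactiveSince: _inactive,
        providerSessionId: _session,
        ...rest
      } = agent;
      return { kind: 'agent', ...rest };
    }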
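
Sketch of the fix-5 smoke-test helper. splitArgv is an illustrative
name, and the splitter only has to cover the quoting used by the
existing call sites:

    import { spawnSync } from 'node:child_process';

    // Split a command line into argv, honouring double- and single-quoted
    // multi-word values like --system-prompt "be terse".
    function splitArgv(command: string): string[] {
      const argv: string[] = [];
      const token = /"([^"]*)"|'([^']*)'|(\S+)/g;
      let m: RegExpExecArray | null;
      while ((m = token.exec(command)) !== null) {
        argv.push(m[1] ?? m[2] ?? m[3] ?? '');
      }
      return argv;
    }

    function run(command: string) {
      const [cmd = '', ...args] = splitArgv(command);
      // Unlike execSync, spawnSync hands back stderr even on success, so
      // expect(stderr).toMatch(...) can pass on the happy path.
      const res = spawnSync(cmd, args, { encoding: 'utf8' });
      return { stdout: res.stdout, stderr: res.stderr, status: res.status };
    }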

Tests: +6 new mcpd unit tests (4 chat-service for the reasoning
fallback / truncation marker / content-preference / streaming synth;
2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd
+ mcplocal + smoke green: 860/860 + 723/723 + 139/139.
Author: Michal
Date:   2026-04-27 18:39:01 +01:00
Commit: 610808b9e7 (parent 58bc277242)
7 changed files with 293 additions and 29 deletions


@@ -461,6 +461,121 @@ describe('ChatService', () => {
    expect(assistantTurn?.content).not.toContain('Let me think');
  });
  // Regression: thinking models with a tight max_tokens budget produce
  // `reasoning_content` only and leave `content` null. Without falling back
  // to reasoning, the assistant turn was empty and the smoke test saw an
  // empty stdout. This covers BOTH chat() (non-streaming) and chatStream()
  // (synthetic final text frame so the CLI's stdout matches what's
  // persisted to the thread).
  it('chat falls back to reasoning_content when content is null', async () => {
    const chatRepo = mockChatRepo();
    const adapter: LlmAdapter = {
      kind: 'thinking-truncated',
      infer: vi.fn(async () => ({
        status: 200,
        body: {
          id: 'cmpl-1',
          object: 'chat.completion',
          choices: [{
            index: 0,
            message: { role: 'assistant', content: null, reasoning_content: 'Thinking out loud about the answer' },
            finish_reason: 'stop',
          }],
        },
      })),
      stream: async function*() { yield { data: '[DONE]', done: true }; },
    };
    const svc = new ChatService(
      mockAgents(), mockLlms(), adapterRegistry(adapter),
      chatRepo, mockPromptRepo(), mockTools(),
    );
    const result = await svc.chat({ agentName: 'reviewer', userMessage: 'hi', ownerId: 'owner-1' });
    expect(result.assistant).toBe('Thinking out loud about the answer');
    const stored = chatRepo._msgs.find((m) => m.role === 'assistant');
    expect(stored?.content).toBe('Thinking out loud about the answer');
  });
  it('chat appends [response truncated by max_tokens] when finish_reason is "length"', async () => {
    const chatRepo = mockChatRepo();
    const adapter: LlmAdapter = {
      kind: 'thinking-clipped',
      infer: vi.fn(async () => ({
        status: 200,
        body: {
          choices: [{
            index: 0,
            message: { role: 'assistant', content: null, reasoning_content: 'partial reasoning that ran out of' },
            finish_reason: 'length',
          }],
        },
      })),
      stream: async function*() { yield { data: '[DONE]', done: true }; },
    };
    const svc = new ChatService(
      mockAgents(), mockLlms(), adapterRegistry(adapter),
      chatRepo, mockPromptRepo(), mockTools(),
    );
    const result = await svc.chat({ agentName: 'reviewer', userMessage: 'hi', ownerId: 'owner-1' });
    expect(result.assistant).toContain('partial reasoning that ran out of');
    expect(result.assistant).toContain('[response truncated by max_tokens]');
  });
  it('chat prefers content when both content and reasoning_content are present', async () => {
    // Thinking models that DO produce content shouldn't see the reasoning
    // bleed into the response — that's what the streaming path's
    // text/thinking split is for, and the non-streaming path should match.
    const chatRepo = mockChatRepo();
    const adapter: LlmAdapter = {
      kind: 'thinking-with-content',
      infer: vi.fn(async () => ({
        status: 200,
        body: {
          choices: [{
            index: 0,
            message: { role: 'assistant', content: 'real answer', reasoning_content: 'background thinking' },
            finish_reason: 'stop',
          }],
        },
      })),
      stream: async function*() { yield { data: '[DONE]', done: true }; },
    };
    const svc = new ChatService(
      mockAgents(), mockLlms(), adapterRegistry(adapter),
      chatRepo, mockPromptRepo(), mockTools(),
    );
    const result = await svc.chat({ agentName: 'reviewer', userMessage: 'hi', ownerId: 'owner-1' });
    expect(result.assistant).toBe('real answer');
    expect(result.assistant).not.toContain('background thinking');
  });
  it('chatStream emits a synthetic text frame and persists reasoning when content is empty', async () => {
    const chatRepo = mockChatRepo();
    const adapter: LlmAdapter = {
      kind: 'thinking-only-stream',
      infer: vi.fn(),
      stream: async function*() {
        yield { data: JSON.stringify({ choices: [{ delta: { reasoning_content: 'thinking ' }, finish_reason: null }] }) };
        yield { data: JSON.stringify({ choices: [{ delta: { reasoning_content: 'more.' }, finish_reason: 'stop' }] }) };
        yield { data: '[DONE]', done: true };
      },
    };
    const svc = new ChatService(
      mockAgents(), mockLlms(), adapterRegistry(adapter),
      chatRepo, mockPromptRepo(), mockTools(),
    );
    const chunks: Array<{ type: string; delta?: string }> = [];
    for await (const c of svc.chatStream({ agentName: 'reviewer', userMessage: 'hi', ownerId: 'owner-1' })) {
      chunks.push({ type: c.type, delta: c.delta });
    }
    // 2 thinking deltas (live), 1 synthesized text frame, 1 final.
    expect(chunks.filter((c) => c.type === 'thinking').map((c) => c.delta)).toEqual(['thinking ', 'more.']);
    expect(chunks.filter((c) => c.type === 'text').map((c) => c.delta)).toEqual(['thinking more.']);
    // The thread message captures the synthesized text so resumed chats see
    // a coherent assistant turn (rather than blank).
    const stored = chatRepo._msgs.find((m) => m.role === 'assistant');
    expect(stored?.content).toBe('thinking more.');
  });
  // Regression: provider_specific_fields.reasoning_content shape (LiteLLM
  // passthrough from vLLM) is also recognized.
  it('chatStream recognizes LiteLLM provider_specific_fields.reasoning_content', async () => {


@@ -71,6 +71,36 @@ describe('OpenAiPassthroughAdapter', () => {
    await expect(adapter.infer(makeCtx())).rejects.toThrow(/no default endpoint/);
  });
  it('infer: strips a trailing /v1 from the configured URL', async () => {
    // Users naturally paste the OpenAI-style base URL with /v1 because
    // every provider documents it that way (https://api.openai.com/v1,
    // https://llm.example.com/v1). The adapter then re-appends
    // /v1/chat/completions; without normalization this would produce a
    // doubled-/v1 404 against LiteLLM and friends.
    const fetchFn = mockFetch([{ match: /\/v1\/chat\/completions$/, status: 200, body: {} }]);
    const adapter = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn as unknown as typeof fetch });
    await adapter.infer(makeCtx({ url: 'https://llm.example.com/v1' }));
    const [url1] = fetchFn.mock.calls[0] as [string];
    expect(url1).toBe('https://llm.example.com/v1/chat/completions');
    // Trailing slash + /v1 should also normalize correctly.
    const fetchFn2 = mockFetch([{ match: /\/v1\/chat\/completions$/, status: 200, body: {} }]);
    const adapter2 = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn2 as unknown as typeof fetch });
    await adapter2.infer(makeCtx({ url: 'https://llm.example.com/v1/' }));
    const [url2] = fetchFn2.mock.calls[0] as [string];
    expect(url2).toBe('https://llm.example.com/v1/chat/completions');
  });
  it('infer: preserves a trailing /v1beta suffix (only exact /v1 is stripped)', async () => {
    // Some providers expose `/v1beta` as a parallel API surface — don't
    // accidentally rewrite that to `/v1` or strip it.
    const fetchFn = mockFetch([{ match: /\/v1beta\/v1\/chat\/completions$/, status: 200, body: {} }]);
    const adapter = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn as unknown as typeof fetch });
    await adapter.infer(makeCtx({ url: 'https://api.example.com/v1beta' }));
    const [url] = fetchFn.mock.calls[0] as [string];
    expect(url).toBe('https://api.example.com/v1beta/v1/chat/completions');
  });
  it('infer: omits Authorization when apiKey is empty', async () => {
    const fetchFn = mockFetch([{ match: /ollama/, status: 200, body: {} }]);
    const adapter = new OpenAiPassthroughAdapter('ollama', { fetch: fetchFn as unknown as typeof fetch });