fix(chat): real fixes for thinking-model + URL conventions, not test tweaks
Some checks failed
CI/CD / lint (pull_request) Successful in 54s
CI/CD / test (pull_request) Successful in 1m7s
CI/CD / typecheck (pull_request) Successful in 2m37s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Successful in 5m42s
CI/CD / publish (pull_request) Has been skipped
Some checks failed
CI/CD / lint (pull_request) Successful in 54s
CI/CD / test (pull_request) Successful in 1m7s
CI/CD / typecheck (pull_request) Successful in 2m37s
CI/CD / smoke (pull_request) Failing after 1m43s
CI/CD / build (pull_request) Successful in 5m42s
CI/CD / publish (pull_request) Has been skipped
Five real bugs surfaced by the agent-chat smoke against live qwen3-thinking. None of these are fixed by changing the test — the test was right to fail. 1. openai-passthrough adapter doubled `/v1` in the request URL. The adapter hard-codes `/v1/chat/completions` after the configured base, but every OpenAI-compat provider documents its base URL with a trailing `/v1` (api.openai.com/v1, llm.example.com/v1, …). Users pasting that conventional shape produced `https://x/v1/v1/chat/completions` → 404. endpointUrl now strips a trailing `/v1` so both forms canonicalize. `/v1beta` (Anthropic-style) is preserved. 2. Non-streaming chat returned an empty assistant when thinking models (qwen3-thinking, deepseek-reasoner, OpenAI o1) emitted only `reasoning_content` with `content: null`. extractChoice now also pulls reasoning (every spelling the streaming parser already knows about), and a new pickAssistantText helper falls back to it when content is empty. A `[response truncated by max_tokens]` marker is appended when finish_reason is `length`, so users see the cut-off instead of guessing why the answer is short. Symmetric streaming fix: the chatStream loop accumulates reasoning and yields ONE synthesized `text` frame at the end when content stayed empty, keeping the CLI's stdout (which only prints `text` deltas) in sync with the persisted thread message. 3. `mcpctl get agent X -o yaml` emitted `kind: public` (the v3 lifecycle field) instead of `kind: agent` (apply envelope), so round-tripping through `apply -f` failed. Same fix shape as the v1 Llm strip in toApplyDocs — drop kind/status/lastHeartbeatAt/ inactiveSince/providerSessionId for the agents resource too. 4. Non-streaming `mcpctl chat` printed `thread:<cuid>` (no space) on stderr; streaming printed `(thread: <cuid>)` (with space). Tests and any other regex watching for one form missed the other. Standardize on `thread: <cuid>` (single space) in both paths. 5. agent-chat.smoke's `run()` used `execSync`, which discards stderr on success — making any `expect(stderr).toMatch(...)` assertion structurally impossible to satisfy in the happy path. Switch to `spawnSync` so stderr is actually captured. Includes a small shell-style argv splitter so the existing call sites with quoted multi-word values (`--system-prompt "..."`) keep working. Tests: +6 new mcpd unit tests (4 chat-service for the reasoning fallback / truncation marker / content-preference / streaming synth; 2 llm-adapters for the URL strip + /v1beta preservation). Full mcpd + mcplocal + smoke green: 860/860 + 723/723 + 139/139.
This commit is contained in:
@@ -71,6 +71,36 @@ describe('OpenAiPassthroughAdapter', () => {
|
||||
await expect(adapter.infer(makeCtx())).rejects.toThrow(/no default endpoint/);
|
||||
});
|
||||
|
||||
it('infer: strips a trailing /v1 from the configured URL', async () => {
|
||||
// Users naturally paste the OpenAI-style base URL with /v1 because
|
||||
// every provider documents it that way (https://api.openai.com/v1,
|
||||
// https://llm.example.com/v1). The adapter then re-appends
|
||||
// /v1/chat/completions; without normalization this would produce a
|
||||
// doubled-/v1 404 against LiteLLM and friends.
|
||||
const fetchFn = mockFetch([{ match: /\/v1\/chat\/completions$/, status: 200, body: {} }]);
|
||||
const adapter = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn as unknown as typeof fetch });
|
||||
await adapter.infer(makeCtx({ url: 'https://llm.example.com/v1' }));
|
||||
const [url1] = fetchFn.mock.calls[0] as [string];
|
||||
expect(url1).toBe('https://llm.example.com/v1/chat/completions');
|
||||
|
||||
// Trailing slash + /v1 should also normalize correctly.
|
||||
const fetchFn2 = mockFetch([{ match: /\/v1\/chat\/completions$/, status: 200, body: {} }]);
|
||||
const adapter2 = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn2 as unknown as typeof fetch });
|
||||
await adapter2.infer(makeCtx({ url: 'https://llm.example.com/v1/' }));
|
||||
const [url2] = fetchFn2.mock.calls[0] as [string];
|
||||
expect(url2).toBe('https://llm.example.com/v1/chat/completions');
|
||||
});
|
||||
|
||||
it('infer: preserves a trailing /v1beta suffix (only exact /v1 is stripped)', async () => {
|
||||
// Some providers expose `/v1beta` as a parallel API surface — don't
|
||||
// accidentally rewrite that to `/v1` or strip it.
|
||||
const fetchFn = mockFetch([{ match: /\/v1beta\/v1\/chat\/completions$/, status: 200, body: {} }]);
|
||||
const adapter = new OpenAiPassthroughAdapter('openai', { fetch: fetchFn as unknown as typeof fetch });
|
||||
await adapter.infer(makeCtx({ url: 'https://api.example.com/v1beta' }));
|
||||
const [url] = fetchFn.mock.calls[0] as [string];
|
||||
expect(url).toBe('https://api.example.com/v1beta/v1/chat/completions');
|
||||
});
|
||||
|
||||
it('infer: omits Authorization when apiKey is empty', async () => {
|
||||
const fetchFn = mockFetch([{ match: /ollama/, status: 200, body: {} }]);
|
||||
const adapter = new OpenAiPassthroughAdapter('ollama', { fetch: fetchFn as unknown as typeof fetch });
|
||||
|
||||
Reference in New Issue
Block a user