Commit Graph

2 Commits

Michal
cc225eb70f feat(llm): probe upstream auth at registration time
mcpd now runs a cheap auth probe whenever an Llm is created (or its
apiKeyRef/url is updated). This catches misconfigured tokens and wrong
URLs at registration time with a 422 and a structured error message,
instead of silently 500-ing on first chat with a generic "fetch
failed". Caught in the wild today: the homelab Pulumi config supplied
`MCPCTL_GATEWAY_TOKEN` (mcpctl_pat_-prefixed, intended for the
LiteLLM→mcplocal direction) in the slot where LiteLLM expects
`LITELLM_MASTER_KEY` (sk-prefixed). The probe surfaces this
immediately.

Probe shape (LlmAdapter.verifyAuth; typed sketch after this list):
  - OpenAI passthrough → GET <url>/v1/models. Cheap, idempotent, gated
    by the same auth as chat/completions.
  - Anthropic → POST /v1/messages with max_tokens:1, "ping". Anthropic
    has no list-models endpoint; this is the cheapest auth-exercising
    call.
  - Returns one of:
      { ok: true }
      { ok: false, reason: "auth", status, body }    — 401/403, fail hard
      { ok: false, reason: "unreachable", error }    — network, warn-only
      { ok: false, reason: "unexpected", status, body } — non-auth 4xx, warn-only

Behavior:
  - LlmService.create()/update() runs the probe after resolveApiKey.
    Throws LlmAuthVerificationError on `auth`; logs a warning for
    unreachable/unexpected and swallows the failure so offline
    registration still succeeds (see the sketch after this list).
  - Probe is skipped when there's no apiKeyRef (nothing to verify) or
    when the caller passes skipAuthCheck=true.
  - update() probes only when apiKeyRef OR url changes — pure
    description/tier updates don't trigger upstream calls.
  - Routes catch LlmAuthVerificationError and return 422 with
    `{ error, status }`. The CLI surfaces the message verbatim via
    ApiError.
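
In sketch form, the create path (reusing AuthProbeResult from the probe
sketch above; the injected dependency names here are stand-ins, not the
real service wiring):

  class LlmAuthVerificationError extends Error {
    constructor(public status: number, public body: string) {
      super(`upstream auth verification failed (${status})`);
    }
  }

  // Hypothetical create path. resolveApiKey/verifyAuth/insert/warn
  // stand in for the real service internals.
  async function createLlm(
    spec: { name: string; url: string; apiKeyRef?: string },
    deps: {
      resolveApiKey: (ref: string) => Promise<string>;
      verifyAuth: (url: string, key: string) => Promise<AuthProbeResult>;
      insert: (spec: object) => Promise<void>;
      warn: (msg: string, ctx: object) => void;
    },
    skipAuthCheck = false,
  ): Promise<void> {
    if (spec.apiKeyRef && !skipAuthCheck) {
      const key = await deps.resolveApiKey(spec.apiKeyRef);
      const result = await deps.verifyAuth(spec.url, key);
      if (!result.ok && result.reason === 'auth') {
        // Fail hard: no row is written on an upstream 401/403.
        throw new LlmAuthVerificationError(result.status, result.body);
      }
      if (!result.ok) {
        // unreachable/unexpected: warn only, offline registration OK.
        deps.warn('llm auth probe inconclusive', {
          llm: spec.name,
          ...result,
        });
      }
    }
    await deps.insert(spec);
  }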

Opt-out:
  - CLI: `mcpctl create llm ... --skip-auth-check` for offline
    registration before the upstream is reachable.
  - HTTP: side-channel body field `_skipAuthCheck: true` (stripped
    before validation, never persisted on the row; example below).
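
For illustration, the HTTP opt-out as a client call (the endpoint port
and field values are assumed; only `_skipAuthCheck` itself comes from
this message):

  // _skipAuthCheck rides alongside the resource body; the server
  // strips it before validation and never persists it on the row.
  await fetch('http://localhost:8080/api/v1/llms', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'homelab-litellm',              // assumed example values
      provider: 'openai',
      url: 'http://litellm.internal:4000',
      apiKeyRef: 'secret/litellm/master-key',
      _skipAuthCheck: true,
    }),
  });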

Side fix in same commit (caught while testing): src/cli/src/index.ts
read `program.opts()` BEFORE `program.parse()`, so `--direct` was a
no-op for ApiClient — every command went to mcplocal regardless. Some
commands accidentally still worked because mcplocal forwards plain
`/api/v1/*` to mcpd, but flows that need direct SSE streaming (e.g.
`mcpctl chat`) couldn't reach mcpd. Fixed by peeking at process.argv
directly for the two global flags before Commander's parse runs.
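
The fix, in sketch form (the ApiClient constructor shape is an
assumption, and the second global flag is not named here, so it is
elided):

  import { program } from 'commander';

  // program.opts() is empty until program.parse() runs, so globals
  // needed to construct the ApiClient must be peeked from argv first.
  const direct = process.argv.includes('--direct');
  // (the second global flag is peeked the same way; name elided)
  const api = new ApiClient({ direct }); // ApiClient: project-local
  program.parse(process.argv);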

Tests:
  - llm-adapters.test.ts (+8): OpenAI 200/401/403/404/network, Anthropic
    200/401/400 (typo'd model = unexpected, NOT auth — registration
    shouldn't block on bad model names that surface at chat time).
  - llm-service.test.ts (+6): create-throws-on-auth-fail (no row
    written), warn-only on unreachable/unexpected, skipAuthCheck
    bypass, no-key skip, update-only-probes-on-auth-affecting-change.

mcpd 775/775, mcplocal 715/715, cli 430/430.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 16:51:55 +01:00
Michal
23f53a0798 feat(mcpd): inference proxy — POST /api/v1/llms/:name/infer
Why: the point of the Llm resource (Phase 1) is that credentials never leave
the server. This lands the proxy: clients POST an OpenAI chat/completions
request to mcpd; mcpd attaches the provider API key server-side and streams
the response back as OpenAI-format SSE.

Design:
- Wire format client-side is always OpenAI chat/completions — every existing
  SDK speaks it. Adapters translate on the provider side.
- `openai | vllm | deepseek | ollama` → pure passthrough (they already speak
  OpenAI). `anthropic` → translator to/from Anthropic Messages API
  (system-string extraction, content-block flattening, SSE event remap).
- Plain fetch; no @anthropic-ai/sdk dep. This is consistent with the OpenBao
  driver shape and keeps the proxy layer thin.
- `gemini-cli` intentionally rejected — subprocess providers need extra
  lifecycle plumbing; deferred to a follow-up.
- Streaming: adapters yield `StreamingChunk`s; the route frames them as
  `data: <json>\n\n` + terminal `data: [DONE]\n\n` so any OpenAI client
  works unchanged (sketched below).
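
The SSE framing is small enough to sketch in full (the chunk type is
left loose; the framing strings come from this message):

  // Frame adapter-yielded chunks as OpenAI-style SSE, terminated by
  // the [DONE] sentinel that stock OpenAI clients expect.
  async function* frameSse(
    chunks: AsyncIterable<unknown>,
  ): AsyncGenerator<string> {
    for await (const chunk of chunks) {
      yield `data: ${JSON.stringify(chunk)}\n\n`;
    }
    yield 'data: [DONE]\n\n';
  }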

RBAC:
- New URL special-case in mapUrlToPermission: `POST /api/v1/llms/:name/infer`
  → `run:llms:<name>` (not the default create:llms; sketch below). Users need
  an explicit `{role: 'run', resource: 'llms', [name: X]}` binding to call
  infer.
- Possession of `edit:llms` does NOT imply `run` — keeps catalogue
  management separate from spend.
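
The special case could look roughly like this (only the function name,
the path, and the run:llms:<name> permission string come from this
message; the default mapping is elided):

  function mapUrlToPermission(method: string, path: string): string {
    const infer = path.match(/^\/api\/v1\/llms\/([^/]+)\/infer$/);
    if (method === 'POST' && infer) {
      // Explicit run binding required; edit:llms does not imply run.
      return `run:llms:${decodeURIComponent(infer[1])}`;
    }
    // Default verb/resource mapping elided; shown only for the
    // collection-create case as a placeholder.
    return 'create:llms';
  }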

Audit: route emits an `llm_inference_call` event per request (llm name,
model, user/tokenSha, streaming, duration, status). main.ts wires it to the
structured logger for now; the hook is in place for a richer audit sink
later.
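
As a type sketch, the audit payload fields named above (exact field
names and the duration unit are assumptions):

  interface LlmInferenceCallEvent {
    type: 'llm_inference_call';
    llm: string;         // Llm resource name
    model: string;       // requested model
    user?: string;       // user for session callers...
    tokenSha?: string;   // ...or token hash for PAT callers
    streaming: boolean;
    durationMs: number;  // assumed unit
    status: number;      // HTTP status returned to the client
  }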

Tests:
- 11 adapter tests (passthrough POST shape + default URLs + no-auth ollama +
  SSE forwarding; anthropic translate request/response + non-2xx wrap + SSE
  event translation; registry dispatch + caching + unsupported-provider).
- 7 route tests (404, 400, non-streaming dispatch + audit, apiKey failure,
  null apiKeyRef path, streaming SSE output, 502 on adapter error).
- Full suite 1830/1830 (+18 from Phase 1's 1812). TypeScript clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 22:43:55 +01:00