feat(mcpd): LLM inference proxy + OpenAI/Anthropic adapters #53

Merged
michal merged 1 commits from feat/llm-infer into main 2026-04-19 21:39:40 +00:00

Summary

Phase 2 of the Llm plan. Ships the server-side proxy: clients POST OpenAI-format chat completions to mcpd at `/api/v1/llms/:name/infer`, mcpd resolves the API key from the referenced Secret (through Phase 0's backend dispatch) and forwards to the right provider. Credentials never leave the server.

Based on `feat/llm` (PR #52) — needs the Llm resource landed to dispatch against.

  • Wire format client-side: always OpenAI chat/completions (de-facto lingua franca — every SDK speaks it).
  • openai / vllm / deepseek / ollama: pure passthrough. Request + SSE response forwarded verbatim.
  • anthropic: full translator. System messages → `system` string, content blocks → flat string, Anthropic SSE events → OpenAI `chat.completion.chunk`. Plain `fetch()` — no `@anthropic-ai/sdk` dep.
  • gemini-cli: intentionally deferred — subprocess lifecycle is out of scope for this phase.
  • Streaming: adapters yield framed chunks; route writes `data: <json>\n\n` + terminal `data: [DONE]`.
  • RBAC: new URL special case maps `POST /api/v1/llms/:name/infer` → `run:llms:<name>`. `edit:llms` does NOT imply `run` — catalogue management stays separate from spend authorisation.
  • Audit: `llm_inference_call` events (model/user/tokenSha/streaming/duration/status) piped to the structured logger; sink hook is in place for a richer sink later.
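Because the proxy emits standard OpenAI SSE framing, consuming it client-side reduces to a small frame parser. A minimal sketch (the function name and return shape are illustrative, not part of the PR's code; only the `data: <json>\n\n` + `data: [DONE]` framing is from the PR):

```typescript
// Parse OpenAI-style SSE frames (`data: <json>\n\n`, terminated by
// `data: [DONE]`) and concatenate the streamed delta content, as any
// client of the infer endpoint would.
function collectDeltas(sse: string): { text: string; done: boolean } {
  let text = "";
  let done = false;
  for (const frame of sse.split("\n\n")) {
    const line = frame.trim();
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") {
      done = true;
      break;
    }
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (typeof delta === "string") text += delta;
  }
  return { text, done };
}
```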

Test plan

  • 11 adapter unit tests (OpenAI passthrough shape + default URLs + no-auth ollama + SSE framing; Anthropic request/response translation + non-2xx wrap + SSE event remap; registry dispatch + caching + unsupported-provider guard)
  • 7 route tests (404 missing, 400 no messages, non-streaming dispatch + audit, 500 on key resolve failure, null apiKeyRef skips resolution, streaming SSE output, 502 on upstream error)
  • Full workspace suite: 1830/1830 passing (+18 from Phase 1's 1812)
  • TypeScript clean across mcpd
  • End-to-end: deploy, grant `run:llms:claude` to a user, `curl -N` an inference call through mcpd, confirm streaming response

🤖 Generated with Claude Code

michal changed target branch from feat/llm to main 2026-04-19 21:39:34 +00:00
michal added 1 commit 2026-04-19 21:39:34 +00:00
Why: the point of the Llm resource (Phase 1) is that credentials never leave
the server. This lands the proxy: clients POST OpenAI chat/completions to
mcpd, mcpd attaches the provider API key server-side, and the response
streams back as OpenAI-format SSE.

Design:
- Wire format client-side is always OpenAI chat/completions — every existing
  SDK speaks it. Adapters translate on the provider side.
- `openai | vllm | deepseek | ollama` → pure passthrough (they already speak
  OpenAI). `anthropic` → translator to/from Anthropic Messages API
  (system-string extraction, content-block flattening, SSE event remap).
- Plain fetch; no @anthropic-ai/sdk dep. Consistent with the OpenBao driver
  shape and keeps the proxy layer thin.
- `gemini-cli` intentionally rejected — subprocess providers need extra
  lifecycle plumbing; deferred to a follow-up.
- Streaming: adapters yield `StreamingChunk`s; the route frames them as
  `data: <json>\n\n` + terminal `data: [DONE]\n\n` so any OpenAI client
  works unchanged.
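
The request-side translation above can be sketched roughly as follows. The helper name `toAnthropicRequest`, the inline types, and the `max_tokens` default are assumptions for illustration; only the system-string extraction and message passthrough come from the commit:

```typescript
type OpenAIMsg = { role: "system" | "user" | "assistant"; content: string };

// OpenAI chat/completions -> Anthropic Messages API: system messages are
// hoisted into the top-level `system` string; the remaining messages pass
// through with string content. (The response side flattens Anthropic
// content blocks back to a flat string the same way.)
function toAnthropicRequest(body: {
  model: string;
  messages: OpenAIMsg[];
  max_tokens?: number;
}): {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: string; content: string }[];
} {
  const system = body.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  return {
    model: body.model,
    max_tokens: body.max_tokens ?? 1024, // illustrative default
    ...(system ? { system } : {}),
    messages: body.messages
      .filter((m) => m.role !== "system")
      .map((m) => ({ role: m.role, content: m.content })),
  };
}
```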

RBAC:
- New URL special-case in mapUrlToPermission: `POST /api/v1/llms/:name/infer`
  → `run:llms:<name>` (not the default create:llms). Users need an explicit
  `{role: 'run', resource: 'llms', [name: X]}` binding to call infer.
- Possession of `edit:llms` does NOT imply `run` — keeps catalogue
  management separate from spend.
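
The special case could look roughly like this (the regex and null fallthrough are illustrative; only the mapping of `POST /api/v1/llms/:name/infer` to `run:llms:<name>` is from the PR):

```typescript
// Map an incoming request to its required permission. POST to the infer
// endpoint requires run:llms:<name> rather than the default create:llms.
function mapUrlToPermission(method: string, path: string): string | null {
  const infer = path.match(/^\/api\/v1\/llms\/([^/]+)\/infer$/);
  if (method === "POST" && infer) return `run:llms:${infer[1]}`;
  // ... default resource mapping (elided in this sketch) ...
  return null;
}
```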

Audit: route emits an `llm_inference_call` event per request (llm name,
model, user/tokenSha, streaming, duration, status). main.ts wires it to the
structured logger for now; hook is in place for a richer audit sink later.
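
The event fields listed above suggest a shape along these lines (the interface name, field names beyond those listed, and types are assumptions):

```typescript
// Sketch of the llm_inference_call audit event. Field list taken from the
// commit message; exact names/types are illustrative.
interface LlmInferenceCallEvent {
  type: "llm_inference_call";
  llm: string;        // Llm resource name
  model: string;      // model requested by the client
  user?: string;      // resolved user, if any
  tokenSha?: string;  // hash of the bearer token, never the token itself
  streaming: boolean;
  durationMs: number;
  status: number;     // HTTP status returned to the client
}

function makeAuditEvent(
  fields: Omit<LlmInferenceCallEvent, "type">
): LlmInferenceCallEvent {
  return { type: "llm_inference_call", ...fields };
}
```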

Tests:
- 11 adapter tests (passthrough POST shape + default URLs + no-auth ollama +
  SSE forwarding; anthropic translate request/response + non-2xx wrap + SSE
  event translation; registry dispatch + caching + unsupported-provider).
- 7 route tests (404, 400, non-streaming dispatch + audit, apiKey failure,
  null apiKeyRef path, streaming SSE output, 502 on adapter error).
- Full suite 1830/1830 (+18 from Phase 1's 1812). TypeScript clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
michal merged commit d217eadd13 into main 2026-04-19 21:39:40 +00:00

Reference: michal/mcpctl#53