The earlier plan recommended an MCPLOCAL_MCPD_TOKEN env var so the pod
would have a ServiceAccount session into mcpd. It's unnecessary: the
pod forwards every inbound client bearer (mcpctl_pat_...) verbatim to
mcpd for all downstream calls — both introspect and project discovery.
mcpd's auth middleware dispatches on the prefix and resolves the
McpToken principal directly. No pod secret, no rotation story.
Updates:
- serve.ts header: explicit "identity model" section calling this out
so future readers don't restore the env var thinking it's missing.
- docs/mcptoken-implementation.md: drop the "mount MCPLOCAL_MCPD_TOKEN"
Pulumi guidance and the "dedicated ServiceAccount" follow-up item;
state the correct image URL (internal 10.0.0.194 registry) and the
gated-vs-ungated rule for LLM config mounts.
No runtime code changes — serve.ts never actually required the token;
this just fixes the documentation and the header comment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verifies the HTTP-mode revocation lag ≤ 5s two ways:
1. Unit (tests/http/token-auth.test.ts, 8 cases): Fastify preHandler
with injected fetch stub exercises the positive/negative cache
directly — first call returns ok:true, we flip the stub to
revoked:true, wait past the short positive TTL, next call gets 401
with "revoked". Plus: non-Bearer 401, non-mcpctl_pat_ 401, wrong-
project 403, mcpd-unreachable 401, happy-path caching (1 fetch for N
requests within TTL), ok:false from mcpd 401.
2. End-to-end (smoke, run manually): added MCPLOCAL_TOKEN_POSITIVE_TTL_MS
and MCPLOCAL_TOKEN_NEGATIVE_TTL_MS env vars to serve.ts so the smoke
can shrink the 30s positive default for testing. Confirmed: with
positive TTL = 2s, the mcptoken.smoke.test.ts revocation case passes
against a local serve.js pointed at prod mcpd.
Operators get the same knobs in production — default behavior unchanged
(30s positive, 5s negative).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs found while trying to point MCPGW_URL=http://localhost:3200
(the systemd mcplocal) so we could get real smoke coverage before the
Pulumi stack for mcp.ad.itaz.eu lands:
1. describe.skipIf(!gatewayUp) was evaluated at parse time, before
beforeAll ran, so gatewayUp was always false and the whole suite
skipped. Switched to the vllm-managed.test.ts pattern: runtime
`if (!gatewayUp) return` at the start of each it().
2. The revocation 401 assertion only makes sense against the
containerized serve.ts entry, which has a 5s negative introspection
cache. Against systemd mcplocal the whole project router is cached
for minutes, so a deleted token with a warm session still succeeds.
Added IS_HTTP_MODE detection (hostname not localhost/127/0.0.0.0,
or MCPGW_IS_HTTP_MODE=true) and skip the assertion otherwise — still
revoking the token so cleanup runs identically.
Run against systemd mcplocal locally:
MCPGW_URL=http://localhost:3200 pnpm --filter @mcpctl/mcplocal \\
exec vitest run --config vitest.smoke.config.ts mcptoken
→ 6/6 pass (revocation case explicitly deferred).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delivers the final piece of the mcptoken stack: a containerized,
network-accessible mcplocal that serves Streamable-HTTP MCP to off-host
clients (the vLLM use case), authenticated by project-scoped McpTokens.
New binary (same package, new entry):
- src/mcplocal/src/serve.ts — HTTP-only entry. Reads MCPLOCAL_MCPD_URL,
MCPLOCAL_MCPD_TOKEN, MCPLOCAL_HTTP_HOST/PORT, MCPLOCAL_CACHE_DIR from
env. No StdioProxyServer, no --upstream.
- src/mcplocal/src/http/token-auth.ts — Fastify preHandler that
validates mcpctl_pat_ bearers via mcpd's /api/v1/mcptokens/introspect.
30s positive / 5s negative TTL. Rejects wrong-project with 403.
Shared HTTP MCP client:
- src/shared/src/mcp-http/ — reusable McpHttpSession with initialize,
listTools, callTool, close. Handles http+https, SSE, id correlation,
distinct McpProtocolError / McpTransportError. Plus mcpHealthCheck
and deriveBaseUrl helpers.
New CLI verb `mcpctl test mcp <url>`:
- Flags: --token (also $MCPCTL_TOKEN), --tool, --args (JSON),
--expect-tools, --timeout, -o text|json, --no-health.
- Exit codes: 0 PASS, 1 TRANSPORT/AUTH FAIL, 2 CONTRACT FAIL.
Container + deploy:
- deploy/Dockerfile.mcplocal (Node 20 alpine, multi-stage, pnpm
workspace, CMD node src/mcplocal/dist/serve.js, VOLUME
/var/lib/mcplocal/cache, HEALTHCHECK on :3200/healthz).
- scripts/build-mcplocal.sh mirrors build-mcpd.sh.
- fulldeploy.sh is now a 4-step pipeline that also builds + rolls out
mcplocal (gated on `kubectl get deployment/mcplocal` so the script
stays green before the Pulumi stack lands).
Audit + cache:
- project-mcp-endpoint.ts passes MCPLOCAL_CACHE_DIR into FileCache at
both construction sites and, when request.mcpToken is present, calls
collector.setSessionMcpToken(id, ...) so audit events carry the
tokenName/tokenSha.
Tests:
- 9 unit cases on `mcpctl test mcp` (happy path, health miss,
expect-tools hit/miss, transport throw, tool isError, json report,
$MCPCTL_TOKEN env fallback, invalid --args).
- Smoke test src/mcplocal/tests/smoke/mcptoken.smoke.test.ts —
gated on healthz($MCPGW_URL), skipped cleanly when unreachable.
Covers happy path, wrong-project 403, --expect-tools contract
failure, and revocation 401 within the negative-cache window.
1773/1773 workspace tests pass. Pulumi resources (Deployment, Service,
Ingress, PVC, Secret, NetworkPolicy) still need to land in
../kubernetes-deployment before the smoke gate flips on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the end-to-end CLI surface for McpTokens and the mcpd auth dispatch
that recognizes them.
mcpd auth middleware:
- Dispatch on the `mcpctl_pat_` bearer prefix. McpToken bearers resolve
through a new `findMcpToken(hash)` dep, populating `request.mcpToken`
and `request.userId = ownerId`. Everything else follows the existing
session path.
- Returns 401 for revoked / expired / unknown tokens.
- Global RBAC hook now threads `mcpTokenSha` into `canAccess` /
`canRunOperation` / `getAllowedScope`, and enforces a hard
project-scope check: a McpToken principal can only hit
`/api/v1/projects/<its-project>/...`.
CLI verbs:
- `mcpctl create mcptoken <name> -p <proj> [--rbac empty|clone]
[--bind role:view,resource:servers] [--ttl 30d|never|ISO]
[--description ...] [--force]` — returns the raw token once.
- `mcpctl get mcptokens [-p <proj>]` — table with
NAME/PROJECT/PREFIX/CREATED/LAST USED/EXPIRES/STATUS.
- `mcpctl get mcptoken <name> -p <proj>` and
`mcpctl describe mcptoken <name> -p <proj>` — describe surfaces the
auto-created RBAC bindings.
- `mcpctl delete mcptoken <name> -p <proj>`.
- `apply -f` support with `kind: mcptoken`. Tokens are immutable, so
apply creates if missing and skips if the name is already active.
Audit plumbing:
- `AuditEvent` / collector now carry optional `tokenName` / `tokenSha`.
`setSessionMcpToken` sits alongside `setSessionUserName`; both feed a
per-session principal map used at emit time.
- `AuditEventService` query accepts `tokenName` / `tokenSha` filters.
- Console `AuditEvent` type carries the new fields so a follow-up can
add a TOKEN column.
Completions regenerated. 1764/1764 tests pass workspace-wide.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-server tools/list cache in McpRouter (positive + negative TTL)
so a slow or dead upstream only stalls the first discovery call, not every
subsequent client request. Invalidated on upstream add/remove.
Health probes now apply a default liveness spec (tools/list via the real
production path) to any RUNNING instance without an explicit healthCheck,
so synthetic and real failures converge on the same signal.
Includes supporting updates in mcpd-client, discovery, upstream/mcpd,
seeder, and fulldeploy/release scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fan-out discovery methods (tools/list, prompts/list, resources/list)
used synthetic request IDs that couldn't be looked up in the
correlation map. This caused upstream_response events to have no
correlationId, making the console unable to find upstream content
for replay ("No content to replay").
Fix: pass correlationId through RouteContext → discovery methods →
onUpstreamCall callback, so the handler can use it directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Exempt /healthz and /health from rate limiter
- Increase rate limit from 500 to 2000 req/min
- Register backup routes even when disabled (status shows disabled)
- Guard restore endpoints with 503 when backup not configured
- Add retry with backoff on 429 in audit smoke tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Persistent file cache in ~/.mcpctl/cache/proxymodel/ with LRU eviction
- Pause queue for temporarily holding MCP traffic
- Hot-reload watcher for custom stages and proxymodel definitions
- CLI: mcpctl cache list/clear/stats commands
- HTTP endpoints for cache and pause management
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends section drill-down (previously tool-only) to work with
prompts/get using _resultId + _section arguments. Shares the same
section store as tool results, enabling cross-method drill-down.
Large prompts (>2000 chars) are automatically split into sections.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
proxyMode "direct" was a security hole (leaked secrets as plaintext env
vars in .mcp.json) and bypassed all mcplocal features (gating, audit,
RBAC, content pipeline, namespacing). Removed from schema, API, CLI,
and all tests. Old configs with proxyMode are accepted but silently
stripped via Zod .transform() for backward compatibility.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rewrite README Content Pipeline section as Plugin System section
documenting built-in plugins (default, gate, content-pipeline),
plugin hooks, and the relationship between gating and proxyModel
- Update all README examples to use --proxy-model instead of --gated
- Add unit tests: proxyModel normalization in JSON/YAML output (4 tests),
Plugin Config section in describe output (2 tests)
- Add smoke tests: yaml/json output shows resolved proxyModel without
gated field, round-trip compatibility (4 tests)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Exclude src/db/tests from workspace vitest config (needs test DB)
- Make global-setup.ts gracefully skip when test DB unavailable
- Fix exactOptionalPropertyTypes issues in proxymodel-endpoint.ts
- Use proper ProxyModelPlugin type for getPluginHooks function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- proxyModel field now determines both YAML pipeline stages AND plugin
gating behavior ('default'/'gate' = gated, 'content-pipeline' = not)
- Deprecate --gated/--no-gated CLI flags (backward compat preserved:
--no-gated maps to --proxy-model content-pipeline)
- Replace GATED column with PLUGIN in `get projects` output
- Update `describe project` to show "Plugin Config" section
- Unify proxymodel discovery: GET /proxymodels now returns both YAML
pipeline models and TypeScript plugins with type field
- `describe proxymodel gate` shows plugin hooks and extends info
- Update CLI apply schema: gated is now optional (not required)
- Regenerate shell completions
- Tests: proxymodel endpoint (5), smoke tests (8)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add userName column to AuditEvent schema with index and migration
- Add GET /api/v1/auth/me endpoint returning current user identity
- AuditCollector auto-fills userName from session→user map, resolved
lazily via /auth/me on first session creation
- Support userName and date range (from/to) filtering on audit events
and sessions endpoints
- Audit console sidebar groups sessions by project → user
- Add date filter presets (d key: all/today/1h/24h/7d) to console
- Add scrolling and page up/down to sidebar navigation
- Tests: auth-me (4), audit-username collector (4), route filters (2),
smoke tests (2)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Audit Console Phase 1: tool_call_trace emission from mcplocal router,
session_bind/rbac_decision event kinds, GET /audit/sessions endpoint,
full Ink TUI with session sidebar, event timeline, and detail view
(mcpctl console --audit).
System prompts: move 6 hardcoded LLM prompts to mcpctl-system project
with extensible ResourceRuleRegistry validation framework, template
variable enforcement ({{maxTokens}}, {{pageCount}}), and delete-resets-
to-default behavior. All consumers fetch via SystemPromptFetcher with
hardcoded fallbacks.
CLI: -p shorthand for --project across get/create/delete/config commands,
console auto-scroll improvements, shell completions regenerated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Qwen 7B sometimes returns fewer titles than pages (12 for 14).
Instead of rejecting the entire response, pad missing entries with
generic "Page N" titles and truncate extras. Also emphasize exact
count in the prompt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LLMProviderAdapter now tries all registered providers before giving up:
1. Named provider (if specified)
2. All 'fast' tier providers in order
3. All 'heavy' tier providers in order
4. Legacy active provider
Previously, if the first provider (e.g., vllm-local) failed, the adapter
threw immediately even though Anthropic and Gemini were available. Now it
logs the failure and tries the next candidate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add warmup() to LlmProvider interface for eager subprocess startup
- ManagedVllmProvider.warmup() starts vLLM in background on project load
- ProviderRegistry.warmupAll() triggers all managed providers
- NamedProvider proxies warmup() to inner provider
- paginate stage generates LLM-powered descriptive page titles when
available, cached by content hash, falls back to generic "Page N"
- project-mcp-endpoint calls warmupAll() on router creation so vLLM
is loading while the session initializes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive MCP server management with kubectl-style CLI.
Key features in this release:
- Declarative YAML apply/get round-trip with project cloning support
- Gated sessions with prompt intelligence for Claude
- Interactive MCP console with traffic inspector
- Persistent STDIO connections for containerized servers
- RBAC with name-scoped bindings
- Shell completions (fish + bash) auto-generated
- Rate-limit retry with exponential backoff in apply
- Project-scoped prompt management
- Credential scrubbing from git history
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements the full gated session flow and prompt intelligence system:
- Prisma schema: add gated, priority, summary, chapters, linkTarget fields
- Session gate: state machine (gated → begin_session → ungated) with LLM-powered
tool selection based on prompt index
- Tag matcher: intelligent prompt-to-tool matching with project/server/action tags
- LLM selector: tiered provider selection (fast for gating, heavy for complex tasks)
- Link resolver: cross-project MCP resource references (project/server:uri format)
- Prompt summary service: LLM-generated summaries and chapter extraction
- System project bootstrap: ensures default project exists on startup
- Structural link health checks: enrichWithLinkStatus on prompt GET endpoints
- CLI: create prompt --priority/--link, create project --gated/--no-gated,
describe project shows prompts section, get prompts shows PRI/LINK/STATUS
- Apply/edit: priority, linkTarget, gated fields supported
- Shell completions: fish updated with new flags
- 1,253 tests passing across all packages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /llm/providers endpoint now runs isAvailable() on each provider in
parallel and returns health status per provider. The status command shows
✓/✗ per provider based on actual availability, not just the fast tier.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds tier-based LLM routing so fast local models (vLLM, Ollama) handle
structured tasks while cloud models (Gemini, Anthropic) are reserved for
heavy reasoning. Single-provider configs continue to work via fallback.
- Tier type + ProviderRegistry with assignTier/getProvider/fallback chain
- Multi-provider config format: { providers: [{ name, type, tier, ... }] }
- NamedProvider wrapper for multiple instances of same provider type
- Setup wizard: Simple (legacy) / Advanced (fast+heavy tiers) modes
- Status display: tiered view with /llm/providers endpoint
- Call sites use getProvider('fast') instead of getActive()
- Full backward compatibility with existing single-provider configs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Avoids burning tokens on every `mcpctl status` call. The /llm/health
endpoint now caches successful results for 10min, errors for 1min.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pool refactor made ACP client creation lazy, causing the first
/llm/health call to spawn + initialize + prompt Gemini in one request
(30s+). Now warmup() eagerly starts the subprocess on mcplocal boot.
Also fetch models in parallel with LLM health check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ACP session pool with per-model subprocesses and 8h idle eviction
- Per-project LLM config: local override → mcpd recommendation → global default
- Model override support in ResponsePaginator
- /llm/models endpoint + available models in mcpctl status
- Remove --llm-provider/--llm-model from create project (use edit/apply)
- 8 new smart pagination integration tests (e2e flow)
- 260 mcplocal tests, 330 CLI tests passing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Status command now queries mcplocal's /llm/health endpoint instead of
spawning the gemini binary. This uses the persistent ACP connection
(fast) and works for any configured provider, not just gemini-cli.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace per-call gemini CLI spawning (~10s cold start each time) with
persistent ACP (Agent Client Protocol) subprocess. First call absorbs
the cold start, subsequent calls are near-instant over JSON-RPC stdio.
- Add AcpClient: manages persistent gemini --experimental-acp subprocess
with lazy init, auto-restart on crash/timeout, NDJSON framing
- Add GeminiAcpProvider: LlmProvider wrapper with serial queue for
concurrent calls, same interface as GeminiCliProvider
- Add dispose() to LlmProvider interface + disposeAll() to registry
- Wire provider disposal into mcplocal shutdown handler
- Add status command spinner with progressive output and color-coded
LLM health check results (green checkmark/red cross)
- 25 new tests (17 ACP client + 8 provider)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Intercepts oversized tool responses (>80K chars), caches them, and returns
a page index. LLM can fetch specific pages via _resultId/_page params.
Supports LLM-generated smart summaries with simple fallback.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mcplocal now reads ~/.mcpctl/credentials automatically when
MCPLOCAL_MCPD_TOKEN env var is not set, matching CLI behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wait for stdout.write callback before process.exit in STDIO transport
to prevent truncation of large responses (e.g. grafana tools/list)
- Handle MCP notification methods (notifications/initialized, etc.) in
router instead of returning "Method not found" error
- Use -p shorthand in config claude output
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix MCP proxy to support SSE and STDIO transports (not just HTTP POST)
- Enrich tool descriptions with server context for LLM clarity
- Add Prompt and PromptRequest resources with two-resource RBAC model
- Add propose_prompt MCP tool for LLM to create pending prompt requests
- Add prompt resources visible in MCP resources/list (approved + session's pending)
- Add project-level prompt/instructions in MCP initialize response
- Add ServiceAccount subject type for RBAC (SA identity from X-Service-Account header)
- Add CLI commands: create prompt, get prompts/promptrequests, approve promptrequest
- Add prompts to apply config schema
- 956 tests passing across all packages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New `mcpctl mcp -p PROJECT` command: STDIO-to-StreamableHTTP bridge
that reads JSON-RPC from stdin and forwards to mcplocal project endpoint
- Rework `config claude` to write mcpctl mcp entry instead of fetching
server configs from API (no secrets in .mcp.json)
- Keep `config claude-generate` as backward-compat alias
- Fix discovery.ts auth token not being forwarded to mcpd (RBAC bypass)
- Update fish/bash completions for new commands
- 10 new MCP bridge tests, updated claude tests, fixed project-discovery test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace admin role with granular roles: view, create, delete, edit, run
- Two binding types: resource bindings (role+resource+optional name) and
operation bindings (role:run + action like backup, logs, impersonate)
- Name-scoped resource bindings for per-instance access control
- Remove role from project members (all permissions via RBAC)
- Add users, groups, RBAC CRUD endpoints and CLI commands
- describe user/group shows all RBAC access (direct + inherited)
- create rbac supports --subject, --binding, --operation flags
- Backup/restore handles users, groups, RBAC definitions
- mcplocal project-based MCP endpoint discovery
- Full test coverage for all new functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a Helm-chart-like template system for MCP servers. Templates are
YAML files in templates/ that get seeded into the DB on startup. Users can
browse them with `mcpctl get templates`, inspect with `mcpctl describe
template`, and instantiate with `mcpctl create server --from-template=`.
Also adds Portainer deployment scripts, mcplocal systemd service,
Streamable HTTP MCP endpoint, and RPM packaging for mcpctl-local.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>