Status was showing the server-side LLM list but not whether each one
actually serves inference. This adds a per-LLM probe that POSTs a
tiny prompt to /api/v1/llms/<name>/infer:
messages: [{ role: 'user', content: "Say exactly the word 'hi' and nothing else." }]
max_tokens: 8, temperature: 0
Each registered LLM gets a one-line health line:
Server LLMs: 2 registered (probing live "say hi"...)
fast qwen3-thinking ✓ "hi" 312ms
openai → qwen3-thinking http://litellm.../v1 key:litellm/API_KEY
heavy sonnet ✗ upstream auth failed: 401
anthropic → claude-sonnet-4-5 provider default no key
Probes run in parallel so a single slow LLM doesn't gate the others;
each has its own 15-second timeout. JSON/YAML output gains a
\`health: { ok, ms, say?, error? }\` field per server LLM so dashboards
get the same liveness signal.
Tests: 25/25 (was 24, +1 new for the failure-path render). Workspace
suite: 2006/2006 across 149 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes:
1. \`mcpctl status\` now lists mcpd-managed Llm rows (the ones created via
\`mcpctl create llm\`) under a new "Server LLMs:" section, grouped by
tier with type, model, upstream URL, and key reference. JSON/YAML
output gains a \`serverLlms\` array.
Bearer token (from \`mcpctl auth login\` / saved credentials) is
passed through; if mcpd is unreachable or returns non-200 the
section is silently omitted (the existing mcpd connectivity line
already conveys that). 6 new tests cover happy path, empty list,
token plumbing, and JSON shape.
2. SPA fallback at \`/ui/<deeplink>\` was returning 500 because we
registered \`@fastify/static\` with \`decorateReply: false\` and then
called \`reply.sendFile\`. Read index.html once at startup and serve
it with \`reply.send(html)\` instead — also dodges a per-request
stat call. Drop \`decorateReply: false\` so future code can use
reply.sendFile if it ever needs to.
Full suite: 2005/2005 across 149 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive MCP server management with kubectl-style CLI.
Key features in this release:
- Declarative YAML apply/get round-trip with project cloning support
- Gated sessions with prompt intelligence for Claude
- Interactive MCP console with traffic inspector
- Persistent STDIO connections for containerized servers
- RBAC with name-scoped bindings
- Shell completions (fish + bash) auto-generated
- Rate-limit retry with exponential backoff in apply
- Project-scoped prompt management
- Credential scrubbing from git history
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds tier-based LLM routing so fast local models (vLLM, Ollama) handle
structured tasks while cloud models (Gemini, Anthropic) are reserved for
heavy reasoning. Single-provider configs continue to work via fallback.
- Tier type + ProviderRegistry with assignTier/getProvider/fallback chain
- Multi-provider config format: { providers: [{ name, type, tier, ... }] }
- NamedProvider wrapper for multiple instances of same provider type
- Setup wizard: Simple (legacy) / Advanced (fast+heavy tiers) modes
- Status display: tiered view with /llm/providers endpoint
- Call sites use getProvider('fast') instead of getActive()
- Full backward compatibility with existing single-provider configs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Status command now queries mcplocal's /llm/health endpoint instead of
spawning the gemini binary. This uses the persistent ACP connection
(fast) and works for any configured provider, not just gemini-cli.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace per-call gemini CLI spawning (~10s cold start each time) with
persistent ACP (Agent Client Protocol) subprocess. First call absorbs
the cold start, subsequent calls are near-instant over JSON-RPC stdio.
- Add AcpClient: manages persistent gemini --experimental-acp subprocess
with lazy init, auto-restart on crash/timeout, NDJSON framing
- Add GeminiAcpProvider: LlmProvider wrapper with serial queue for
concurrent calls, same interface as GeminiCliProvider
- Add dispose() to LlmProvider interface + disposeAll() to registry
- Wire provider disposal into mcplocal shutdown handler
- Add status command spinner with progressive output and color-coded
LLM health check results (green checkmark/red cross)
- 25 new tests (17 ACP client + 8 provider)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Setup wizard auto-detects gemini binary via `which`, saves full path
so systemd service can find it without user PATH
- `mcpctl status` tests LLM provider health (gemini: quick prompt test,
ollama: health check, API providers: key stored confirmation)
- Shows error details inline: "gemini-cli / gemini-2.5-flash (not authenticated)"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CLI entry point with Commander.js, config management (~/.mcpctl/config.json
with Zod validation), output formatters (table/json/yaml), config and status
commands with dependency injection for testing. Fix sanitizeString regex ordering.
67 tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>