fix(smoke,rotator,auth): repair smoke env + close failure modes that
caused 27 post-deploy smoke failures
This commit lands the durable side of the post-deploy investigation:
genuine bugs that let the upstream OpenBao re-init silently break every
secret write for 4 days, plus test-code bugs that masked the same
breakage in the smoke output.
mcpd — fail loud on dead OpenBao tokens
=======================================
secret-backend-rotator.service.ts
When `mintRoleToken` or `lookupSelf` returns 403/401, classify it as
BACKEND_TOKEN_DEAD (likely cause: upstream OpenBao re-init invalidated
every pre-existing token), wrap the thrown error with explicit
remediation (mint via root + `mcpctl create secret <name> --data
<key>=<token> --force`), persist the same message to
tokenMeta.lastRotationError, and emit a structured `level:fatal`
console.error so it shows up in `kubectl logs deploy/mcpd` with grep-
friendly `kind:BACKEND_TOKEN_DEAD`. Adds a `healthCheck(backendId)`
method that runs lookup-self without minting — so the boot-time loop
can detect the dead-token state immediately, not 24 hours later.
secret-backend-rotator-loop.ts
Boot-time health check: in `start()`, for every rotatable backend, call
`rotator.healthCheck(b.id)` and on failure log a structured fatal entry.
This converts the prior silent failure mode (24h wait until scheduled
rotation reveals the dead token, with secret writes failing under it
the entire time) into "mcpd boots, immediately sees the dead token,
alerts loudly". Existing isOverdue path is unchanged.
mcpd — Prisma userId crash on /me
=================================
routes/auth.ts
GET /api/v1/auth/me used `request.userId!` which lied: an authenticated
McpToken bearer satisfies the auth middleware but has no associated
User row, so userId stayed undefined and `findUnique({ id: undefined })`
threw PrismaClientValidationError. Now returns 401 with a clear
"service-account/token-bound principal cannot be queried via /me"
message instead of bubbling a 500.
mcplocal — token revocation propagation
=======================================
http/token-auth.ts
Lowered default introspection positiveTtl from 30s → 5s. mcpd's
introspection endpoint is a single DB lookup; the cache only protects
against burst restart storms, not steady-state load. The 30s window
let revoked tokens keep working for the full window after revocation
(caught by mcptoken.smoke's negative-cache assertion). Aligns with the
existing 5s negativeTtl and the test's `wait 7s after revoke` expectation.
smoke tests — read URL the same way the CLI does
================================================
mcp-client.ts
Adds `loadMcpdAuth()`: URL from `~/.mcpctl/config.json`, token from
`~/.mcpctl/credentials`. Critically, the URL does NOT come from
credentials. credentials.mcpdUrl carries a stale field for legacy
reasons and goes out of sync (left over from old `mcpctl login
--mcpd-url localhost:3xxx` invocations) — tests reading it ended up
hitting whatever URL the user last logged into rather than the URL
the CLI is actually using right now. audit/security/system-prompts
smoke now use loadMcpdAuth(), eliminating ~10 cascade failures.
Also: switch httpRequest to https.request when scheme is https
(matching audit/security/system-prompts/mcp-client/agent helpers).
Bumps default callTool timeout from 30s → 60s; many tools that fetch
external resources routinely run 10-30s.
agent.smoke.test.ts
- readToken read from `credentials.json`; the file is `credentials`
(no extension). Caused 401 on POST /threads.
- `mcpctl get <resource> <name> -o json` returns an array, not a bare
object. Round-trip yaml test now indexes [0] before reading
description.
secretbackend.smoke.test.ts
Two genuine assertion-drift fixes (env was right, test was stale):
- "lists at least one secretbackend": stop hard-coding the default
backend type as 'plaintext'; the invariant is "exactly one default
exists". The seeded plaintext is the bootstrap default but operators
routinely promote a remote backend (openbao etc.) once it's healthy.
- "refuses to delete the seeded default": widen the regex from
/default|in use|cannot delete/ to also accept "referenced" — the
exact wording has shifted to "is still referenced by N secret(s);
migrate them first".
audit.test.ts / system-prompts.test.ts / security.test.ts
Switch http.request → https.request when URL is https (each had its
own copy of the helper). Drop the now-orphan loadMcpdCredentials in
favour of loadMcpdAuth from mcp-client.ts.
Tests
=====
mcpd 759/759, mcplocal 715/715 unit suites still green. Smoke (live):
Run 1 (pre-commit, post bao-token rotation): 27 → 12 failures.
Run 2 (after fixes-batch, pre-redeploy): 12 → 2 failures.
The remaining 2 (mcptoken cache TTL, proxy-pipeline timeout) are what
the durable code changes here address; verify after the next redeploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -3,6 +3,10 @@
|
||||
* Sends JSON-RPC messages to mcplocal's HTTP endpoint and parses SSE responses.
|
||||
*/
|
||||
import http from 'node:http';
|
||||
import https from 'node:https';
|
||||
import { readFileSync, existsSync } from 'node:fs';
|
||||
import { join } from 'node:path';
|
||||
import { homedir } from 'node:os';
|
||||
|
||||
export interface McpResponse {
|
||||
status: number;
|
||||
@@ -21,6 +25,45 @@ export function getMcpdUrl(): string {
|
||||
return MCPD_URL;
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the live mcpd `{ token, url }` the way the CLI itself does:
|
||||
* - URL from `~/.mcpctl/config.json`'s `mcpdUrl` (with $MCPD_URL override)
|
||||
* - token from `~/.mcpctl/credentials`'s `token` field
|
||||
*
|
||||
* Critically, **the URL does NOT come from credentials**. credentials carries
|
||||
* an `mcpdUrl` field for legacy reasons that goes stale (left over from old
|
||||
* `mcpctl login --mcpd-url localhost:3xxx` invocations). Tests that read the
|
||||
* URL from credentials end up hitting whatever URL the user last logged into,
|
||||
* not the URL the CLI is actually using right now.
|
||||
*/
|
||||
export function loadMcpdAuth(): { token: string; url: string } {
|
||||
const url = readConfigMcpdUrl() ?? MCPD_URL;
|
||||
const token = readCredentialsToken() ?? '';
|
||||
return { token, url };
|
||||
}
|
||||
|
||||
function readConfigMcpdUrl(): string | null {
|
||||
const path = join(homedir(), '.mcpctl', 'config.json');
|
||||
if (!existsSync(path)) return null;
|
||||
try {
|
||||
const parsed = JSON.parse(readFileSync(path, 'utf-8')) as { mcpdUrl?: string };
|
||||
return typeof parsed.mcpdUrl === 'string' && parsed.mcpdUrl.length > 0 ? parsed.mcpdUrl : null;
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function readCredentialsToken(): string | null {
|
||||
const path = join(homedir(), '.mcpctl', 'credentials');
|
||||
if (!existsSync(path)) return null;
|
||||
try {
|
||||
const parsed = JSON.parse(readFileSync(path, 'utf-8')) as { token?: string };
|
||||
return typeof parsed.token === 'string' && parsed.token.length > 0 ? parsed.token : null;
|
||||
} catch {
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function httpRequest(opts: {
|
||||
url: string;
|
||||
method: string;
|
||||
@@ -30,10 +73,11 @@ function httpRequest(opts: {
|
||||
}): Promise<{ status: number; headers: http.IncomingHttpHeaders; body: string }> {
|
||||
return new Promise((resolve, reject) => {
|
||||
const parsed = new URL(opts.url);
|
||||
const req = http.request(
|
||||
const driver = parsed.protocol === 'https:' ? https : http;
|
||||
const req = driver.request(
|
||||
{
|
||||
hostname: parsed.hostname,
|
||||
port: parsed.port,
|
||||
port: parsed.port || (parsed.protocol === 'https:' ? 443 : 80),
|
||||
path: parsed.pathname + parsed.search,
|
||||
method: opts.method,
|
||||
headers: opts.headers,
|
||||
@@ -178,7 +222,12 @@ export class SmokeMcpSession {
|
||||
}
|
||||
|
||||
async callTool(name: string, args: Record<string, unknown> = {}, timeout?: number): Promise<{ content: Array<{ type: string; text?: string }>; isError?: boolean }> {
|
||||
return await this.send('tools/call', { name, arguments: args }, timeout) as { content: Array<{ type: string; text?: string }>; isError?: boolean };
|
||||
// Default 60s — many real MCP tools (web fetch, doc retrieval, query
|
||||
// execution) routinely take 10-30s under normal load. The previous 30s
|
||||
// floor was tight enough that occasional upstream latency tripped the
|
||||
// proxy-pipeline hot-reload smoke. Tests that need a tighter bound can
|
||||
// pass an explicit value.
|
||||
return await this.send('tools/call', { name, arguments: args }, timeout ?? 60_000) as { content: Array<{ type: string; text?: string }>; isError?: boolean };
|
||||
}
|
||||
|
||||
async close(): Promise<void> {
|
||||
|
||||
Reference in New Issue
Block a user