caused 27 post-deploy smoke failures
This commit lands the durable side of the post-deploy investigation:
fixes for the genuine bugs that let the upstream OpenBao re-init
silently break every secret write for 4 days, plus fixes for the
test-code bugs that masked the same breakage in the smoke output.
mcpd — fail loud on dead OpenBao tokens
=======================================
secret-backend-rotator.service.ts
When `mintRoleToken` or `lookupSelf` returns 401 or 403, classify the
failure as BACKEND_TOKEN_DEAD (likely cause: an upstream OpenBao re-init
invalidated every pre-existing token). Wrap the thrown error with
explicit remediation (mint a new token with the root token, then run
`mcpctl create secret <name> --data <key>=<token> --force`), persist the
same message to tokenMeta.lastRotationError, and emit a structured
`level:fatal` console.error so it shows up in `kubectl logs deploy/mcpd`
under the grep-friendly `kind:BACKEND_TOKEN_DEAD`. Also adds a
`healthCheck(backendId)` method that runs lookup-self without minting,
so the boot-time loop can detect the dead-token state immediately
instead of at the next scheduled rotation, up to 24 hours later.
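A minimal sketch of the classification and the structured log line follows. It assumes the failed OpenBao call surfaces its HTTP status on the thrown error. `BackendTokenDeadError`, `classifyBackendFailure`, `logFatal` and the `status` field are hypothetical names for illustration, not the rotator's actual API.

```typescript
// Hypothetical sketch: names and the error shape are assumptions, not the real rotator API.
interface HttpError extends Error {
  status?: number; // assumed: HTTP status returned by the OpenBao call
}

class BackendTokenDeadError extends Error {
  readonly kind = 'BACKEND_TOKEN_DEAD';

  constructor(readonly backendId: string, cause: HttpError) {
    super(
      `OpenBao rejected the token for backend ${backendId} (HTTP ${cause.status}); ` +
        'likely cause: an upstream OpenBao re-init invalidated every pre-existing token. ' +
        'Remediation: mint a new token with the root token, then run ' +
        '`mcpctl create secret <name> --data <key>=<token> --force`.',
    );
    this.name = 'BackendTokenDeadError';
  }
}

// 401/403 means the backend token itself is dead; every other failure passes through unchanged.
function classifyBackendFailure(backendId: string, err: HttpError): Error {
  if (err.status === 401 || err.status === 403) {
    return new BackendTokenDeadError(backendId, err);
  }
  return err;
}

// One JSON line per event, so grepping the pod logs for BACKEND_TOKEN_DEAD finds it.
function logFatal(err: BackendTokenDeadError): string {
  const line = JSON.stringify({
    level: 'fatal',
    kind: err.kind,
    backendId: err.backendId,
    message: err.message,
  });
  console.error(line);
  return line;
}
```

Returning the wrapped error, rather than throwing from the classifier, lets the caller persist the same message to `lastRotationError` before rethrowing.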
secret-backend-rotator-loop.ts
Boot-time health check: in `start()`, call `rotator.healthCheck(b.id)`
for every rotatable backend and log a structured fatal entry on failure.
This replaces the prior silent failure mode with "mcpd boots, sees the
dead token immediately, and alerts loudly". Before, there was a 24h wait
until the scheduled rotation revealed the dead token, with secret writes
failing the entire time. The existing isOverdue rotation path is unchanged.
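The boot-time check can be sketched as follows, assuming a rotator whose `healthCheck(backendId)` rejects when lookup-self fails. The `Backend` shape, the `rotatable` flag, the injected `log` function and the `kind` value are all hypothetical. The sketch runs the checks concurrently, and each check has its own try/catch, so one dead backend cannot hide failures on the others.

```typescript
// Hypothetical shapes; the real rotator loop's types may differ.
interface Backend {
  id: string;
  rotatable: boolean;
}

interface HealthCheckingRotator {
  // Assumed contract: resolves if lookup-self succeeds, rejects otherwise.
  healthCheck(backendId: string): Promise<void>;
}

type FatalLog = (entry: Record<string, unknown>) => void;

// Run lookup-self for every rotatable backend at boot and log each failure
// as a structured fatal entry. Resolves to the ids of the failed backends.
async function bootHealthCheck(
  backends: Backend[],
  rotator: HealthCheckingRotator,
  log: FatalLog,
): Promise<string[]> {
  const failed: string[] = [];
  await Promise.all(
    backends
      .filter((b) => b.rotatable)
      .map(async (b) => {
        try {
          await rotator.healthCheck(b.id);
        } catch (err) {
          failed.push(b.id);
          log({
            level: 'fatal',
            kind: 'BACKEND_HEALTH_CHECK_FAILED', // hypothetical kind name
            backendId: b.id,
            error: err instanceof Error ? err.message : String(err),
          });
        }
      }),
  );
  return failed;
}
```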
mcpd — Prisma userId crash on /me
=================================
routes/auth.ts
GET /api/v1/auth/me used the non-null assertion `request.userId!`,
which was wrong for token principals. An authenticated McpToken bearer
passes the auth middleware but has no associated User row, so userId
stayed undefined and `findUnique({ where: { id: undefined } })` threw
PrismaClientValidationError. The route now returns 401 with a clear
"service-account/token-bound principal cannot be queried via /me"
message instead of letting the error surface as a 500.
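Here is a framework-agnostic sketch of the guard. It assumes the auth middleware leaves `userId` unset for McpToken bearers. The `MeRequest` and `UserStore` shapes and the `{ status, body }` result are hypothetical stand-ins for the real route and Prisma client. The point is to check for a missing `userId` before the lookup, instead of asserting it away with `!`.

```typescript
// Hypothetical shapes standing in for the real request and Prisma client.
interface MeRequest {
  userId?: string; // unset when the bearer is an McpToken with no User row
}

interface UserStore {
  findUserById(id: string): Promise<{ id: string; email: string } | null>;
}

interface RouteResult {
  status: number;
  body: unknown;
}

async function handleMe(req: MeRequest, users: UserStore): Promise<RouteResult> {
  // Guard instead of `req.userId!`: the lookup never runs with an undefined id.
  if (!req.userId) {
    return {
      status: 401,
      body: { error: 'service-account/token-bound principal cannot be queried via /me' },
    };
  }
  const user = await users.findUserById(req.userId);
  // Assumed handling for a dangling userId; the commit does not describe this case.
  if (!user) return { status: 404, body: { error: 'user not found' } };
  return { status: 200, body: user };
}
```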
mcplocal — token revocation propagation
=======================================
http/token-auth.ts
Lowered the default introspection positiveTtl from 30s → 5s. mcpd's
introspection endpoint is a single DB lookup. The cache only protects
against request bursts during restart storms, not against steady-state
load. The 30s window let a revoked token keep working for up to 30s
after revocation (caught by mcptoken.smoke's negative-cache assertion).
5s matches the existing 5s negativeTtl and sits below the test's
`wait 7s after revoke` expectation. A positive entry cached before the
revocation therefore always expires before the test re-checks.
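Below is a minimal sketch of an introspection cache with separate positive and negative TTLs. It assumes an injectable clock so expiry can be tested without sleeping. `IntrospectionCache` and its methods are hypothetical, not the actual structure of token-auth.ts.

```typescript
// Hypothetical sketch; names and structure are assumptions.
interface CacheEntry {
  valid: boolean;
  expiresAt: number; // ms, on the injected clock
}

class IntrospectionCache {
  private readonly entries = new Map<string, CacheEntry>();

  constructor(
    private readonly positiveTtlMs = 5_000, // 30_000 before this change
    private readonly negativeTtlMs = 5_000,
    private readonly now: () => number = Date.now,
  ) {}

  // Cached verdict, or undefined on a miss/expiry (the caller then asks mcpd).
  get(token: string): boolean | undefined {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(token);
      return undefined;
    }
    return entry.valid;
  }

  set(token: string, valid: boolean): void {
    const ttl = valid ? this.positiveTtlMs : this.negativeTtlMs;
    this.entries.set(token, { valid, expiresAt: this.now() + ttl });
  }
}
```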
smoke tests — read URL the same way the CLI does
================================================
mcp-client.ts
Adds `loadMcpdAuth()`: URL from `~/.mcpctl/config.json`, token from
`~/.mcpctl/credentials`. Critically, the URL does NOT come from
credentials. The credentials file still carries an `mcpdUrl` field for
legacy reasons, and that field drifts out of sync: it is left over from
old `mcpctl login --mcpd-url localhost:3xxx` invocations. Tests that
read it ended up hitting whatever URL the user last logged into, rather
than the URL the CLI is actually using right now. The audit, security
and system-prompts smoke tests now use loadMcpdAuth(), eliminating ~10
cascade failures.
Also switches httpRequest to https.request when the URL scheme is https
(matching the audit/security/system-prompts/mcp-client/agent helpers).
Also bumps the default callTool timeout from 30s → 60s: many tools that
fetch external resources routinely take 10-30s, which left the old 30s
default almost no headroom.
agent.smoke.test.ts
- readToken read from `credentials.json`, but the file is `credentials`
(no extension). This caused a 401 on POST /threads.
- `mcpctl get <resource> <name> -o json` returns an array, not a bare
object. The YAML round-trip test now indexes [0] before reading
description.
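A small helper along these lines keeps the round-trip test tolerant of the array shape. It assumes only that `-o json` prints a JSON array of matching resources. `firstResource` is a hypothetical name, not an existing smoke-test helper.

```typescript
// Hypothetical helper: `mcpctl get <resource> <name> -o json` prints an array,
// so take its first element; a bare object is accepted too for robustness.
function firstResource<T = Record<string, unknown>>(json: string): T {
  const parsed: unknown = JSON.parse(json);
  const item: unknown = Array.isArray(parsed) ? parsed[0] : parsed;
  if (item === undefined || item === null || typeof item !== 'object') {
    throw new Error(`expected a JSON object or non-empty array, got: ${json.slice(0, 200)}`);
  }
  return item as T;
}
```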
secretbackend.smoke.test.ts
Two genuine assertion-drift fixes (env was right, test was stale):
- "lists at least one secretbackend": stop hard-coding the default
backend type as 'plaintext'; the invariant is "exactly one default
exists". The seeded plaintext backend is the bootstrap default, but
operators routinely promote a remote backend (e.g. openbao) once it is
healthy.
- "refuses to delete the seeded default": widen the regex from
/default|in use|cannot delete/ to also accept "referenced", since the
exact wording has shifted to "is still referenced by N secret(s);
migrate them first".
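The two relaxed assertions can be sketched as below. The sketch assumes the list output exposes a boolean `isDefault` field on each backend; that field name, the `SecretBackendSummary` shape and `assertExactlyOneDefault` are hypothetical. The regex is the widened one described above.

```typescript
// Hypothetical shape; the real secretbackend objects may name this field differently.
interface SecretBackendSummary {
  name: string;
  type: string;
  isDefault: boolean;
}

// Invariant: exactly one default exists, whatever its type (plaintext, openbao, ...).
function assertExactlyOneDefault(backends: SecretBackendSummary[]): SecretBackendSummary {
  const defaults = backends.filter((b) => b.isDefault);
  if (defaults.length !== 1) {
    throw new Error(`expected exactly one default backend, found ${defaults.length}`);
  }
  return defaults[0]!;
}

// Widened from /default|in use|cannot delete/ to accept the newer "referenced" wording.
const DELETE_REFUSED = /default|in use|cannot delete|referenced/;
```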
audit.test.ts / system-prompts.test.ts / security.test.ts
Switch http.request → https.request when the URL is https (each file
had its own copy of the helper). Drop the now-orphaned
loadMcpdCredentials in favour of loadMcpdAuth from mcp-client.ts.
Tests
=====
Unit suites still green: mcpd 759/759, mcplocal 715/715. Smoke (live):
Run 1 (pre-commit, post bao-token rotation): 27 → 12 failures.
Run 2 (after the fixes batch, pre-redeploy): 12 → 2 failures.
The remaining 2 (mcptoken cache TTL, proxy-pipeline timeout) are the
failures the durable code changes here address; they are still to be
verified after the next redeploy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
276 lines
9.0 KiB
TypeScript
/**
 * Lightweight MCP HTTP client for smoke tests.
 * Sends JSON-RPC messages to mcplocal's HTTP endpoint and parses SSE responses.
 */
import http from 'node:http';
import https from 'node:https';
import { execSync } from 'node:child_process';
import { readFileSync, existsSync } from 'node:fs';
import { join } from 'node:path';
import { homedir } from 'node:os';

export interface McpResponse {
  status: number;
  sessionId?: string;
  messages: unknown[];
}

/** A JSON-RPC reply as consumed by SmokeMcpSession.send(). */
interface JsonRpcReply {
  id?: unknown;
  result?: unknown;
  error?: { code: number; message: string };
}

const MCPLOCAL_URL = process.env.MCPLOCAL_URL ?? 'http://localhost:3200';
const MCPD_URL = process.env.MCPD_URL ?? 'http://localhost:3100';

export function getMcplocalUrl(): string {
  return MCPLOCAL_URL;
}

export function getMcpdUrl(): string {
  return MCPD_URL;
}

/**
 * Resolve the live mcpd `{ token, url }` the way the CLI itself does:
 * - URL from `~/.mcpctl/config.json`'s `mcpdUrl` (with $MCPD_URL override;
 *   http://localhost:3100 is the last-resort fallback)
 * - token from `~/.mcpctl/credentials`'s `token` field
 *
 * Critically, **the URL does NOT come from credentials**. credentials carries
 * an `mcpdUrl` field for legacy reasons that goes stale (left over from old
 * `mcpctl login --mcpd-url localhost:3xxx` invocations). Tests that read the
 * URL from credentials end up hitting whatever URL the user last logged into,
 * not the URL the CLI is actually using right now.
 */
export function loadMcpdAuth(): { token: string; url: string } {
  // Read the env var directly: MCPD_URL above already folds in the localhost
  // default, which must not shadow config.json.
  const url = process.env.MCPD_URL ?? readConfigMcpdUrl() ?? MCPD_URL;
  const token = readCredentialsToken() ?? '';
  return { token, url };
}

function readConfigMcpdUrl(): string | null {
  const path = join(homedir(), '.mcpctl', 'config.json');
  if (!existsSync(path)) return null;
  try {
    const parsed = JSON.parse(readFileSync(path, 'utf-8')) as { mcpdUrl?: string };
    return typeof parsed.mcpdUrl === 'string' && parsed.mcpdUrl.length > 0 ? parsed.mcpdUrl : null;
  } catch {
    return null;
  }
}

function readCredentialsToken(): string | null {
  const path = join(homedir(), '.mcpctl', 'credentials');
  if (!existsSync(path)) return null;
  try {
    const parsed = JSON.parse(readFileSync(path, 'utf-8')) as { token?: string };
    return typeof parsed.token === 'string' && parsed.token.length > 0 ? parsed.token : null;
  } catch {
    return null;
  }
}

function httpRequest(opts: {
  url: string;
  method: string;
  headers?: Record<string, string>;
  body?: string;
  timeout?: number;
}): Promise<{ status: number; headers: http.IncomingHttpHeaders; body: string }> {
  return new Promise((resolve, reject) => {
    const parsed = new URL(opts.url);
    const isHttps = parsed.protocol === 'https:';
    const options: http.RequestOptions = {
      hostname: parsed.hostname,
      port: parsed.port || (isHttps ? 443 : 80),
      path: parsed.pathname + parsed.search,
      method: opts.method,
      headers: opts.headers,
      // Socket inactivity timeout (not a total deadline); fires 'timeout' below.
      timeout: opts.timeout ?? 30_000,
    };
    const onResponse = (res: http.IncomingMessage): void => {
      const chunks: Buffer[] = [];
      res.on('data', (chunk: Buffer) => chunks.push(chunk));
      res.on('error', reject);
      res.on('end', () => {
        resolve({
          status: res.statusCode ?? 0,
          headers: res.headers,
          body: Buffer.concat(chunks).toString('utf-8'),
        });
      });
    };
    // Two explicit calls instead of a union-typed driver: calling a union of
    // overloaded `request` functions does not type-check.
    const req = isHttps ? https.request(options, onResponse) : http.request(options, onResponse);
    req.on('error', reject);
    req.on('timeout', () => {
      req.destroy();
      reject(new Error('Request timed out'));
    });
    if (opts.body) req.write(opts.body);
    req.end();
  });
}

function parseSSE(body: string): unknown[] {
  const messages: unknown[] = [];
  for (const line of body.split('\n')) {
    // Accept both `data: {...}` and `data:{...}`; JSON.parse ignores the
    // leading space and any trailing '\r' from CRLF line endings.
    if (line.startsWith('data:')) {
      try {
        messages.push(JSON.parse(line.slice(5)));
      } catch {
        // skip data lines that are not standalone JSON
      }
    }
  }
  return messages;
}

/**
 * MCP session for smoke tests.
 * Manages session ID and sends JSON-RPC requests.
 */
export class SmokeMcpSession {
  private sessionId?: string;
  private nextId = 1;

  constructor(
    private readonly projectName: string,
    private readonly token?: string,
  ) {}

  get endpoint(): string {
    return `${MCPLOCAL_URL}/projects/${encodeURIComponent(this.projectName)}/mcp`;
  }

  async send(method: string, params: Record<string, unknown> = {}, timeout?: number): Promise<unknown> {
    const id = this.nextId++;
    const request = { jsonrpc: '2.0', id, method, params };

    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
      'Accept': 'application/json, text/event-stream',
    };
    if (this.sessionId) headers['mcp-session-id'] = this.sessionId;
    if (this.token) headers['Authorization'] = `Bearer ${this.token}`;

    const result = await httpRequest({
      url: this.endpoint,
      method: 'POST',
      headers,
      body: JSON.stringify(request),
      timeout,
    });

    // Capture session ID
    if (!this.sessionId) {
      const sid = result.headers['mcp-session-id'];
      if (typeof sid === 'string') this.sessionId = sid;
    }

    // Handle HTTP-level errors (e.g. 502 for nonexistent project)
    if (result.status >= 400) {
      let errorMsg = `HTTP ${result.status}`;
      try {
        const body = JSON.parse(result.body) as { error?: string } | null;
        if (body?.error) errorMsg = body.error;
      } catch {
        errorMsg = `HTTP ${result.status}: ${result.body.slice(0, 200)}`;
      }
      throw new Error(errorMsg);
    }

    // Parse the response: SSE may carry several messages (notifications + response).
    // An empty plain body yields no messages rather than a JSON SyntaxError.
    let messages: unknown[];
    if (result.headers['content-type']?.includes('text/event-stream')) {
      messages = parseSSE(result.body);
    } else {
      messages = result.body.trim() ? [JSON.parse(result.body)] : [];
    }

    // Find the reply matching our request ID (skipping notifications); fall back
    // to the first object-shaped message if no ID matches (e.g. error responses).
    const replies = messages.filter((m): m is JsonRpcReply => typeof m === 'object' && m !== null);
    const parsed = replies.find((m) => m.id === id) ?? replies[0];
    if (!parsed) throw new Error(`No response for ${method}`);
    if (parsed.error) throw new Error(`MCP error ${parsed.error.code}: ${parsed.error.message}`);
    return parsed.result;
  }

  async initialize(): Promise<unknown> {
    return this.send('initialize', {
      protocolVersion: '2024-11-05',
      capabilities: {},
      clientInfo: { name: 'mcpctl-smoke-test', version: '1.0.0' },
    });
  }

  async sendNotification(method: string, params: Record<string, unknown> = {}): Promise<void> {
    const notification = { jsonrpc: '2.0', method, params };
    const headers: Record<string, string> = {
      'Content-Type': 'application/json',
      'Accept': 'application/json, text/event-stream',
    };
    if (this.sessionId) headers['mcp-session-id'] = this.sessionId;
    if (this.token) headers['Authorization'] = `Bearer ${this.token}`;

    await httpRequest({
      url: this.endpoint,
      method: 'POST',
      headers,
      body: JSON.stringify(notification),
    }).catch(() => {});
  }

  async listTools(): Promise<Array<{ name: string; description?: string; inputSchema?: unknown }>> {
    const result = await this.send('tools/list') as
      | { tools?: Array<{ name: string; description?: string; inputSchema?: unknown }> }
      | undefined;
    return result?.tools ?? [];
  }

  async callTool(name: string, args: Record<string, unknown> = {}, timeout?: number): Promise<{ content: Array<{ type: string; text?: string }>; isError?: boolean }> {
    // Default 60s: many real MCP tools (web fetch, doc retrieval, query
    // execution) routinely take 10-30s under normal load. The previous 30s
    // default was tight enough that occasional upstream latency tripped the
    // proxy-pipeline hot-reload smoke. Tests that need a tighter bound can
    // pass an explicit value.
    return await this.send('tools/call', { name, arguments: args }, timeout ?? 60_000) as { content: Array<{ type: string; text?: string }>; isError?: boolean };
  }

  async close(): Promise<void> {
    if (this.sessionId) {
      const headers: Record<string, string> = { 'mcp-session-id': this.sessionId };
      if (this.token) headers['Authorization'] = `Bearer ${this.token}`;
      await httpRequest({
        url: this.endpoint,
        method: 'DELETE',
        headers,
        timeout: 5_000,
      }).catch(() => {});
      this.sessionId = undefined;
    }
  }
}

/**
 * Check if mcplocal is reachable.
 */
export async function isMcplocalRunning(): Promise<boolean> {
  try {
    const result = await httpRequest({
      url: `${MCPLOCAL_URL}/health`,
      method: 'GET',
      timeout: 3_000,
    });
    return result.status < 500;
  } catch {
    return false;
  }
}

/**
 * Run an mcpctl CLI command and return its trimmed stdout.
 * `args` is interpolated into a shell command line; callers must quote values themselves.
 */
export function mcpctl(args: string): Promise<string> {
  try {
    return Promise.resolve(execSync(`mcpctl ${args}`, { encoding: 'utf-8', timeout: 30_000 }).trim());
  } catch (err) {
    const e = err as { stderr?: string; stdout?: string };
    // `||` rather than `??`: a failed command often leaves stderr as an empty string.
    return Promise.reject(new Error(e.stderr || e.stdout || String(err)));
  }
}
|