fix: MCP proxy resilience — discovery cache, default liveness probes
Some checks failed
CI/CD / lint (push) Successful in 52s
CI/CD / typecheck (push) Successful in 1m51s
CI/CD / test (push) Successful in 1m1s
CI/CD / smoke (push) Failing after 3m21s
CI/CD / build (push) Successful in 4m9s
CI/CD / publish (push) Has been skipped

Adds a per-server tools/list cache in McpRouter (positive + negative TTL)
so a slow or dead upstream only stalls the first discovery call, not every
subsequent client request. Invalidated on upstream add/remove.

Health probes now apply a default liveness spec (tools/list via the real
production path) to any RUNNING instance without an explicit healthCheck,
so synthetic and real failures converge on the same signal.

Includes supporting updates in mcpd-client, discovery, upstream/mcpd,
seeder, and fulldeploy/release scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Michal
2026-04-17 00:48:57 +01:00
parent c968d76e00
commit 3149ea3ae7
15 changed files with 499 additions and 32 deletions

View File

@@ -174,7 +174,7 @@ const promptRequestColumns: Column<PromptRequestRow>[] = [
const instanceColumns: Column<InstanceRow>[] = [
{ header: 'NAME', key: (r) => r.server?.name ?? '-', width: 20 },
{ header: 'STATUS', key: 'status', width: 10 },
{ header: 'HEALTH', key: (r) => r.healthStatus ?? '-', width: 10 },
{ header: 'HEALTH', key: (r) => r.healthStatus ?? 'unknown', width: 10 },
{ header: 'PORT', key: (r) => r.port != null ? String(r.port) : '-', width: 6 },
{ header: 'CONTAINER', key: (r) => r.containerId ? r.containerId.slice(0, 12) : '-', width: 14 },
{ header: 'ID', key: 'id' },