fix: MCP proxy resilience — timeouts, parallel discovery, error propagation #48

Merged
michal merged 2 commits from feat/k8s-operator into main 2026-04-10 17:29:34 +00:00

Summary

  • McpdClient timeout: 30s AbortSignal on all fetch calls — prevents indefinite hangs when upstream tool calls are stuck
  • CLI bridge error propagation: JSON-RPC error response written to stdout on failure — MCP client no longer hangs waiting
  • Parallel discovery: discoverTools() and discoverResources() use Promise.allSettled — one broken server can't block the entire project
  • vLLM error cooldown: 60s cooldown after error state — stops retry-on-every-call when vLLM is broken
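The parallel discovery change can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Upstream`, `Tool`, and `DiscoveryResult` types and the `discoverAll` function are hypothetical names, and only the use of `Promise.allSettled` is taken from the summary above.

```typescript
// Hypothetical shapes: the PR description does not show the real upstream or tool types.
interface Tool {
  name: string;
}

interface Upstream {
  name: string;
  listTools(): Promise<Tool[]>;
}

interface DiscoveryResult {
  tools: Tool[];
  failures: { server: string; reason: string }[];
}

// Query every upstream concurrently and keep whatever succeeded.
async function discoverAll(upstreams: Upstream[]): Promise<DiscoveryResult> {
  // allSettled never rejects; each entry reports "fulfilled" or "rejected".
  const settled = await Promise.allSettled(upstreams.map((u) => u.listTools()));

  const result: DiscoveryResult = { tools: [], failures: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      result.tools.push(...outcome.value);
    } else {
      // A rejected upstream is recorded, not rethrown, so the others still count.
      const reason = outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason);
      result.failures.push({ server: upstreams[i]?.name ?? "unknown", reason });
    }
  });
  return result;
}
```

Note that `Promise.allSettled` still waits for the slowest upstream to settle, so the total wait is the maximum of the individual calls rather than their sum. Bounding that maximum is the job of the per-request timeout.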

Root cause

The Docmost MCP server couldn't reach its API from k8s pods (a NetworkPolicy blocked port 3000, and the secrets used a Tailscale IP). Without timeouts, the entire MCP proxy chain hung indefinitely.
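For illustration, a request timeout of this kind can be built on the standard `AbortSignal.timeout()`. The sketch below is an assumption about the general shape, not the actual `McpdClient` code: `fetchWithTimeout`, its parameters, and the injectable `fetchImpl` are hypothetical, and only the 30s figure comes from the summary.

```typescript
// Hypothetical helper; the name and the injectable fetch are assumptions for illustration.
const DEFAULT_TIMEOUT_MS = 30_000; // the 30s figure comes from the PR summary

async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  try {
    // AbortSignal.timeout() produces a signal that aborts once timeoutMs has elapsed.
    return await fetchImpl(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
  } catch (err) {
    // A timed-out signal rejects with a DOMException named "TimeoutError".
    if (err instanceof Error && err.name === "TimeoutError") {
      throw new Error(`request to ${url} timed out after ${timeoutMs}ms`);
    }
    // Anything else (DNS failure, connection refused, ...) is passed through unchanged.
    throw err;
  }
}
```

With a bound like this on every call, an unreachable upstream such as the one described above would surface as an error after the timeout instead of stalling the whole chain.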

Tests

  • McpdClient: 9 new tests (timeout, error types, withHeaders)
  • Router: parallel discovery test (slow server doesn't block fast ones)
  • vLLM: error cooldown test (fast-fail + retry after cooldown)
  • CLI bridge: updated error response test
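The fast-fail and retry-after-cooldown behaviour exercised by the vLLM test could look something like the sketch below. It is an assumption about the general pattern rather than the vllm-managed code itself: `ErrorCooldown`, `callWithCooldown`, and the injectable clock are hypothetical, and only the 60s value comes from the summary.

```typescript
// Hypothetical class; the real vllm-managed provider is not shown in this PR description.
class ErrorCooldown {
  private lastErrorAt: number | null = null;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // cooldownMs defaults to 60s (per the PR summary); now is injectable so tests can fake time.
  constructor(cooldownMs: number = 60_000, now: () => number = Date.now) {
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True while the last failure is recent enough that callers should fail fast.
  isCoolingDown(): boolean {
    return this.lastErrorAt !== null && this.now() - this.lastErrorAt < this.cooldownMs;
  }

  recordError(): void {
    this.lastErrorAt = this.now();
  }

  recordSuccess(): void {
    this.lastErrorAt = null;
  }
}

// Wrap a call so it is skipped during the cooldown and attempted again afterwards.
async function callWithCooldown<T>(gate: ErrorCooldown, call: () => Promise<T>): Promise<T> {
  if (gate.isCoolingDown()) {
    throw new Error("backend is in error cooldown; failing fast");
  }
  try {
    const value = await call();
    gate.recordSuccess();
    return value;
  } catch (err) {
    gate.recordError();
    throw err;
  }
}
```

Injecting the clock keeps a test of this logic deterministic: it can advance time by assignment instead of sleeping for 60 seconds.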

1723/1723 tests passing.

michal added 2 commits 2026-04-10 17:29:06 +00:00
New smoke test file: backup-and-servers.test.ts
- Backup completeness: prompts, templates, runtime, command, containerPort, replicas
- SSE server proxy (my-home-assistant): 84 tools
- Docker-image STDIO proxy (docmost): 11 tools
- Package STDIO proxy (aws-docs): 4 tools
- Instance status accuracy: RUNNING instances must respond to proxy

These tests would have caught every migration bug:
- Missing runtime (python servers on node runner)
- Missing command (HA SSE in STDIO mode)
- Missing containerPort (SSE on wrong port)
- Backup data loss (prompts, templates, server fields)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: MCP proxy resilience — timeouts, parallel discovery, error propagation
All checks were successful
CI/CD / typecheck (pull_request) Successful in 49s
CI/CD / lint (pull_request) Successful in 1m49s
CI/CD / test (pull_request) Successful in 1m4s
CI/CD / build (pull_request) Successful in 1m49s
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
CI/CD / smoke (pull_request) Successful in 10m3s
857f8c72ae
- McpdClient: add 30s AbortSignal timeout to all fetch calls (was infinite)
- CLI bridge: return JSON-RPC error on stdout when HTTP fails (was silent)
- Router: parallel tool/resource discovery via Promise.allSettled (was sequential — one slow server blocked all)
- vllm-managed: 60s error cooldown prevents retry-on-every-call when vLLM is broken
- Tests: McpdClient timeout suite (9), parallel discovery, vllm cooldown, bridge error response
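As a rough illustration of the CLI bridge change, the sketch below turns a failure into a JSON-RPC 2.0 error response and writes it as a single line. The helper names are hypothetical, the choice of error code `-32603` (the spec's "Internal error") is an assumption, and the framing assumes newline-delimited JSON messages on stdout; the actual bridge code is not shown in this PR description.

```typescript
// Hypothetical helper names; only the JSON-RPC 2.0 envelope itself is standard.
type JsonRpcId = string | number | null;

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  error: { code: number; message: string; data?: unknown };
}

// -32603 is the JSON-RPC 2.0 "Internal error" code; using it here is an assumption.
const INTERNAL_ERROR = -32603;

// Build an error response that echoes the id of the request that failed.
function toErrorResponse(id: JsonRpcId, err: unknown): JsonRpcErrorResponse {
  const message = err instanceof Error ? err.message : String(err);
  return { jsonrpc: "2.0", id, error: { code: INTERNAL_ERROR, message } };
}

// Serialize the response as one line so the reader on the other end can frame it.
function writeErrorLine(write: (chunk: string) => void, id: JsonRpcId, err: unknown): void {
  write(JSON.stringify(toErrorResponse(id, err)) + "\n");
}
```

In a Node CLI, `write` would typically be `(s) => process.stdout.write(s)`. Echoing the request `id` is what lets the client match the error to its pending call and stop waiting.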

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
michal merged commit c62a350da1 into main 2026-04-10 17:29:34 +00:00
michal deleted branch feat/k8s-operator 2026-04-10 17:29:37 +00:00

Reference: michal/mcpctl#48