feat: Kubernetes operator for MCP server management #47

Merged
michal merged 7 commits from feat/k8s-operator into main 2026-04-09 22:46:22 +00:00

7 Commits

Author SHA1 Message Date
Michal
016f8abe68 fix: accurate instance status — STARTING until pod is actually running
All checks were successful
CI/CD / typecheck (pull_request) Successful in 52s
CI/CD / lint (pull_request) Successful in 1m53s
CI/CD / test (pull_request) Successful in 1m2s
CI/CD / build (pull_request) Successful in 4m0s
CI/CD / smoke (pull_request) Successful in 8m38s
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Instance status now reflects actual container state:
- startOne() sets STARTING (not RUNNING) after container creation
- syncStatus() promotes STARTING→RUNNING when pod is ready
- syncStatus() demotes RUNNING→STARTING if pod restarts (CrashLoop)
- External servers still get RUNNING immediately (no container)

Previously, CrashLooping pods showed as RUNNING in mcpctl get instances.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:45:10 +01:00
Michal
1bd5087052 fix: add prompts/templates to backup + STDIO attach for docker-image servers
Two bugs fixed:

1. Backup completeness: JSON backup API now includes prompts and
   templates. Previously these were silently dropped during
   backup/restore, causing data loss on migration.

2. STDIO proxy for docker-image servers: servers with dockerImage
   but no packageName/command (like docmost) now use k8s Attach
   to connect to the container's PID 1 stdin/stdout instead of
   exec. This fixes "has no packageName or command" errors.

Changes:
- backup-service.ts: add BackupPrompt/BackupTemplate types, export them
- restore-service.ts: restore prompts (with project FK) and templates
- mcp-proxy-service.ts: sendViaPersistentAttach for docker-image STDIO
- orchestrator.ts: add attachInteractive to McpOrchestrator interface
- kubernetes-orchestrator.ts: implement attachInteractive via k8s Attach
- k8s-client-official.ts: expose Attach client

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 23:37:16 +01:00
Michal
d293df738a feat: automatic reconciliation loop for MCP server instances
mcpd now runs a periodic reconcileAll() every 30s that:
- Detects crashed/missing containers (syncStatus)
- Cleans up ERROR instances
- Creates replacement pods to match desired replica count

This replaces the old syncStatus-only timer. Servers migrated
from another deployment or recovering from node failures will
automatically get their instances recreated.

6 new tests for reconcileAll covering: missing instances, skip
replicas=0, already-at-count, ERROR cleanup, multi-server,
error isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 19:00:19 +01:00
Michal
14be2fa18e feat: nodeSelector for MCP server pods + restore fix
- Add MCPD_NODE_SELECTOR env var support in manifest generator
  for mixed-arch clusters (e.g. arm64+amd64)
- Fix backup restore: resolve system user ID instead of
  hardcoded 'system' string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 13:04:34 +01:00
Michal
3663963a32 fix: resolve system user ID in backup restore for projects
The restore service hardcoded ownerId as the literal string 'system'
instead of looking up the actual system user ID. This caused FK
constraint violations when restoring projects to a fresh database.

Now resolves the system user by email, falling back to the first
available user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 02:04:32 +01:00
Michal
5e45960a18 feat: add Kubernetes orchestrator for MCP server pod management
mcpd can now deploy MCP server instances as Kubernetes pods instead of
Docker containers. Set MCPD_ORCHESTRATOR=kubernetes to enable.

- Add @kubernetes/client-node with thin wrapper (context enforcement
  via MCPD_K8S_CONTEXT to prevent multi-cluster mishaps)
- Rewrite KubernetesOrchestrator: pod CRUD, pod IP extraction,
  exec via SPDY (one-shot + interactive), log streaming
- Manifest generator: stdin:true for STDIO servers, args (not command)
  to preserve runner image entrypoint, security hardening
- Orchestrator selection in main.ts via MCPD_ORCHESTRATOR env var
- 25 unit tests for k8s orchestrator, all 624 tests pass

Tested end-to-end on local k3s:
- mcpd deployed via Pulumi, creates pods in mcpctl-servers namespace
- NetworkPolicy verified: only mcpd can reach MCP server pods
- Python runner (uvx) successfully runs aws-documentation-mcp-server

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 01:55:13 +01:00
Michal
f409952b0c chore: add gstack skill routing rules to CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 01:33:56 +01:00