Instance status now reflects actual container state:
- startOne() sets STARTING (not RUNNING) after container creation
- syncStatus() promotes STARTING→RUNNING when pod is ready
- syncStatus() demotes RUNNING→STARTING if pod restarts (CrashLoop)
- External servers still get RUNNING immediately (no container)
Previously, CrashLooping pods showed as RUNNING in mcpctl get instances.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs fixed:
1. Backup completeness: JSON backup API now includes prompts and
templates. Previously these were silently dropped during
backup/restore, causing data loss on migration.
2. STDIO proxy for docker-image servers: servers with dockerImage
but no packageName/command (like docmost) now use k8s Attach
to connect to the container's PID 1 stdin/stdout instead of
exec. This fixes "has no packageName or command" errors.
Changes:
- backup-service.ts: add BackupPrompt/BackupTemplate types, export them
- restore-service.ts: restore prompts (with project FK) and templates
- mcp-proxy-service.ts: sendViaPersistentAttach for docker-image STDIO
- orchestrator.ts: add attachInteractive to McpOrchestrator interface
- kubernetes-orchestrator.ts: implement attachInteractive via k8s Attach
- k8s-client-official.ts: expose Attach client
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcpd now runs a periodic reconcileAll() every 30s that:
- Detects crashed/missing containers (syncStatus)
- Cleans up ERROR instances
- Creates replacement pods to match desired replica count
This replaces the old syncStatus-only timer. Servers migrated
from another deployment or recovering from node failures will
automatically get their instances recreated.
6 new tests for reconcileAll covering: missing instances, skip
replicas=0, already-at-count, ERROR cleanup, multi-server,
error isolation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MCPD_NODE_SELECTOR env var support in manifest generator
for mixed-arch clusters (e.g. arm64+amd64)
- Fix backup restore: resolve system user ID instead of
hardcoded 'system' string
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The restore service hardcoded ownerId as the literal string 'system'
instead of looking up the actual system user ID. This caused FK
constraint violations when restoring projects to a fresh database.
Now resolves the system user by email, falling back to the first
available user.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcpd can now deploy MCP server instances as Kubernetes pods instead of
Docker containers. Set MCPD_ORCHESTRATOR=kubernetes to enable.
- Add @kubernetes/client-node with thin wrapper (context enforcement
via MCPD_K8S_CONTEXT to prevent multi-cluster mishaps)
- Rewrite KubernetesOrchestrator: pod CRUD, pod IP extraction,
exec via SPDY (one-shot + interactive), log streaming
- Manifest generator: stdin:true for STDIO servers, args (not command)
to preserve runner image entrypoint, security hardening
- Orchestrator selection in main.ts via MCPD_ORCHESTRATOR env var
- 25 unit tests for k8s orchestrator, all 624 tests pass
Tested end-to-end on local k3s:
- mcpd deployed via Pulumi, creates pods in mcpctl-servers namespace
- NetworkPolicy verified: only mcpd can reach MCP server pods
- Python runner (uvx) successfully runs aws-documentation-mcp-server
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>