# Task ID: 16
**Title:** Implement Instance Lifecycle Management
**Status:** pending
**Dependencies:** 6, 8
**Priority:** medium
**Description:** Create APIs and commands for managing MCP server instance lifecycle: start, stop, restart, status, and health monitoring.
**Details:**
Create instance management:
```typescript
// routes/instances.ts
app.post('/api/instances', async (req) => {
  // The CLI sends a profile name; direct API callers may send an id.
  const { profileId, profileName } = req.body;
  const profile = await prisma.mcpProfile.findFirst({
    where: profileId ? { id: profileId } : { name: profileName },
    include: { server: true },
  });
  if (!profile) throw new Error('Profile not found'); // 404 via framework error handler

  const containerManager = new ContainerManager();
  const containerId = await containerManager.startMcpServer(profile.server, profile.config);
  const instance = await prisma.mcpInstance.create({
    data: {
      serverId: profile.serverId,
      containerId,
      status: 'running',
      config: profile.config,
    },
  });
  await auditLogger.logServerAction({
    userId: req.user.id,
    action: 'start',
    serverId: profile.server.name,
    details: { instanceId: instance.id, containerId },
  });
  return instance;
});

app.delete('/api/instances/:id', async (req) => {
  const instance = await prisma.mcpInstance.findUnique({ where: { id: req.params.id } });
  if (!instance) throw new Error('Instance not found');

  const containerManager = new ContainerManager();
  await containerManager.stopMcpServer(instance.containerId);
  await prisma.mcpInstance.delete({ where: { id: req.params.id } });
});

app.post('/api/instances/:id/restart', async (req) => {
  const instance = await prisma.mcpInstance.findUnique({
    where: { id: req.params.id },
    include: { server: true },
  });
  if (!instance) throw new Error('Instance not found');

  const containerManager = new ContainerManager();
  await containerManager.stopMcpServer(instance.containerId);
  const newContainerId = await containerManager.startMcpServer(instance.server, instance.config);
  return prisma.mcpInstance.update({
    where: { id: req.params.id },
    data: { containerId: newContainerId, status: 'running' },
  });
});

// Health monitoring
app.get('/api/instances/:id/health', async (req) => {
  const instance = await prisma.mcpInstance.findUnique({ where: { id: req.params.id } });
  if (!instance) throw new Error('Instance not found');

  const containerManager = new ContainerManager();
  const status = await containerManager.getMcpServerStatus(instance.containerId);
  const logs = await containerManager.getContainerLogs(instance.containerId, { tail: 50 });
  return { status, logs, lastChecked: new Date() };
});

// CLI commands
program
  .command('start')
  .argument('<profile>', 'Profile name')
  .action(async (profile) => {
    const instance = await client.post('/api/instances', { profileName: profile });
    console.log(`Started instance ${instance.id}`);
  });

program
  .command('stop')
  .argument('<instance-id>', 'Instance ID')
  .action(async (id) => {
    await client.delete(`/api/instances/${id}`);
    console.log(`Stopped instance ${id}`);
  });

program
  .command('logs')
  .argument('<instance-id>', 'Instance ID')
  .option('-f, --follow', 'Follow logs')
  .action(async (id, options) => {
    if (options.follow) {
      // Stream logs over WebSocket/SSE (see subtask 16.4)
    } else {
      const { logs } = await client.get(`/api/instances/${id}/health`);
      console.log(logs);
    }
  });
```
**Test Strategy:**
Test instance start/stop/restart lifecycle. Test health monitoring updates status correctly. Test logs streaming. Integration test with real Docker containers.
## Subtasks
### 16.1. Write TDD test suites for Instance Lifecycle API endpoints
**Status:** pending
**Dependencies:** None
Create comprehensive Vitest test suites for all instance lifecycle endpoints (POST /api/instances, DELETE /api/instances/:id, POST /api/instances/:id/restart, GET /api/instances/:id/health, GET /api/instances/:id/logs) BEFORE implementation using mocked ContainerManager and Prisma.
**Details:**
Write comprehensive Vitest tests following TDD methodology for all instance lifecycle API endpoints. Tests must cover:

1. **POST /api/instances** — successful instance creation from a profile, invalid profileId handling, ContainerManager.startMcpServer mock expectations, audit-logging verification.
2. **DELETE /api/instances/:id** — successful stop and cleanup, non-existent instance handling, containerId validation to prevent targeting unmanaged containers.
3. **POST /api/instances/:id/restart** — graceful shutdown with drainTimeout for data pipelines, proper stop/start sequencing, config preservation.
4. **GET /api/instances/:id/health** — Prometheus-compatible metrics format, liveness/readiness probe responses, alerting-threshold configuration (unhealthy for N minutes), JSON health object structure.
5. **GET /api/instances/:id/logs** — cursor-based pagination, log-injection prevention (sanitize ANSI codes and control characters), tail parameter validation.

Use msw or vitest-fetch-mock for request mocking. All tests should fail initially (TDD red phase). Include security tests: validate containerId format (UUIDs only), reject path traversal in instance IDs, and verify that only containers with mcpctl labels can be controlled.
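The log-injection prevention these tests should exercise can be isolated into a small pure helper. This is a sketch only; the function name and regexes are illustrative, not from the codebase:

```typescript
// Hypothetical helper: sanitize a container log line before it is returned
// from GET /api/instances/:id/logs. Strips ANSI escape sequences (CSI and
// OSC forms) plus stray control characters, keeping \t, \n, and \r, so
// malicious container output cannot inject terminal escapes or null bytes.
function sanitizeLogLine(line: string): string {
  return line
    // CSI sequences, e.g. "\x1b[31m" (colors, cursor movement)
    .replace(/\x1b\[[0-9;?]*[ -\/]*[@-~]/g, '')
    // OSC sequences, e.g. "\x1b]0;title\x07" (terminal title, hyperlinks)
    .replace(/\x1b\][^\x07]*(\x07|\x1b\\)/g, '')
    // remaining C0 control characters and DEL, except \t, \n, \r
    .replace(/[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/g, '');
}
```

Because it is pure, the TDD suite can cover it directly with table-driven cases before any endpoint exists.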
### 16.2. Write TDD test suites for CLI instance management commands
**Status:** pending
**Dependencies:** None
Create Vitest test suites for CLI commands (start, stop, restart, logs, status) BEFORE implementation, testing argument parsing, API client calls, output formatting, and WebSocket/SSE log streaming.
**Details:**
Write comprehensive Vitest tests for all CLI commands following TDD methodology:

1. **mcpctl start \<profile\>** — profile name validation, API call to POST /api/instances, success/error output formatting, instance ID display.
2. **mcpctl stop \<instance-id\>** — instance ID format validation, API call to DELETE /api/instances/:id, graceful shutdown with a --drain-timeout flag for data-pipeline instances, confirmation prompt (--yes to skip).
3. **mcpctl restart \<instance-id\>** — restart with optional --drain-timeout, API call to POST /api/instances/:id/restart.
4. **mcpctl logs \<instance-id\>** — the -f/--follow flag for streaming, --tail N option, --since timestamp option, WebSocket connection for live streaming, graceful disconnect handling.
5. **mcpctl status \<instance-id\>** — health status display, readiness/liveness indicators, uptime calculation, JSON output format.

Test network-boundary scenarios: WebSocket reconnection on disconnect, SSE fallback when WebSocket is unavailable, and proxy-friendly streaming options. Include exit-code tests for scripting compatibility.
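The reconnection behavior under test is easiest to specify through a deterministic backoff helper. A minimal sketch, assuming the real client would add jitter and wire this into the WebSocket lifecycle:

```typescript
// Hypothetical backoff schedule for `mcpctl logs --follow` reconnects:
// exponential growth from baseMs, capped at maxMs, so attempts go
// 500ms, 1s, 2s, 4s, ... without hammering a server that is down.
function reconnectDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const delay = baseMs * 2 ** Math.max(0, attempt);
  return Math.min(delay, maxMs);
}
```

A pure function like this lets the TDD suite assert the whole schedule without opening a single socket.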
### 16.3. Implement Instance Lifecycle API endpoints with security and audit logging
**Status:** pending
**Dependencies:** 16.1
Implement all instance lifecycle API endpoints (create, stop, restart, health, logs) passing TDD tests from subtask 1, with security validation, graceful shutdown support, and comprehensive audit logging integration.
**Details:**
Implement routes/instances.ts with all lifecycle endpoints:

1. **POST /api/instances** — validate that the profileId exists, call ContainerManager.startMcpServer with the profile config, create an McpInstance record in Prisma, and emit an audit log via auditLogger.logServerAction({action: 'start', ...}).
2. **DELETE /api/instances/:id** — validate that the instance exists and the containerId is a UUID, verify the container carries mcpctl management labels before stopping it, call ContainerManager.stopMcpServer with a configurable drainTimeout for graceful shutdown of data pipelines, delete the McpInstance record, and emit an audit log.
3. **POST /api/instances/:id/restart** — implement an atomic stop-then-start restart, preserve config across the restart, and support a drainTimeout query parameter for a graceful drain before restarting.
4. **GET /api/instances/:id/health** — call ContainerManager.getMcpServerStatus and getHealthStatus, and return a structured health object ({status, lastChecked, readiness, liveness, consecutiveFailures, alertThreshold}) in a format compatible with Prometheus/Grafana alerting.
5. **GET /api/instances/:id/logs** — call ContainerManager.getContainerLogs with cursor-based pagination, sanitize log output to prevent log injection (strip ANSI escape sequences, null bytes, and control characters), and support an ELK/Loki-compatible structured JSON format.

Implement security middleware to validate that all containerIds are managed by mcpctl.
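The security checks described above reduce to two small predicates. This is a sketch under stated assumptions: the UUID format requirement comes from the task, but the `mcpctl.managed` label key is an illustrative name, not an existing contract:

```typescript
// Instance IDs must be UUIDs, which also rules out path traversal
// (e.g. "../other") in the :id route parameter.
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isValidInstanceId(id: string): boolean {
  return UUID_RE.test(id);
}

// Only containers carrying the management label (key assumed here as
// "mcpctl.managed") may be stopped or restarted through the API.
function isManagedContainer(labels: Record<string, string>): boolean {
  return labels['mcpctl.managed'] === 'true';
}
```

The security middleware would run both checks before any ContainerManager call, returning 400 for a malformed ID and 403 for an unmanaged container.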
### 16.4. Implement CLI commands for instance lifecycle with streaming log support
**Status:** pending
**Dependencies:** 16.2, 16.3
Implement CLI commands (start, stop, restart, logs, status) passing TDD tests from subtask 2, including WebSocket/SSE log streaming that works across network boundaries.
**Details:**
Implement commands/instances.ts with all CLI commands:

1. **start \<profile\>** — call client.post('/api/instances', {profileName: profile}), display the instance ID and status, exit code 0 on success.
2. **stop \<instance-id\>** — prompt for confirmation unless --yes is passed, support --drain-timeout \<seconds\> for graceful data-pipeline shutdown, call client.delete(`/api/instances/${id}`), and display a stop confirmation.
3. **restart \<instance-id\>** — support --drain-timeout, call client.post(`/api/instances/${id}/restart`), and display the new container ID.
4. **logs \<instance-id\>** — implement dual transport (WebSocket primary, SSE fallback for proxy-friendly environments); -f/--follow opens a WebSocket connection to /api/instances/:id/logs/stream; support --tail N (default 50) and a --since timestamp filter; reconnect on disconnect with exponential backoff; handle Ctrl+C gracefully.
5. **status \<instance-id\>** — call GET /api/instances/:id/health, display formatted health info with readiness/liveness indicators, and support -o json output.

Implement a WebSocket client that works through corporate proxies (HTTP upgrade with proper headers). For non-streaming logs, paginate through the cursor-based API.
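The uptime display for `mcpctl status` is worth factoring into a pure formatter so it can be unit-tested apart from the API client. A sketch; the two-unit output format is an illustrative choice, not a spec:

```typescript
// Hypothetical formatter: render uptime in seconds as a compact
// two-unit string next to the readiness/liveness indicators,
// e.g. 90061s -> "1d1h", 75s -> "1m15s".
function formatUptime(seconds: number): string {
  const d = Math.floor(seconds / 86_400);
  const h = Math.floor((seconds % 86_400) / 3_600);
  const m = Math.floor((seconds % 3_600) / 60);
  const s = Math.floor(seconds % 60);
  if (d > 0) return `${d}d${h}h`;
  if (h > 0) return `${h}h${m}m`;
  if (m > 0) return `${m}m${s}s`;
  return `${s}s`;
}
```

With -o json the raw seconds value would be emitted instead, leaving formatting to the consumer.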
### 16.5. Create integration tests and docker-compose environment for instance lifecycle
**Status:** pending
**Dependencies:** 16.3, 16.4
Build comprehensive integration test suite testing complete instance lifecycle against real Docker containers, including health monitoring with alerting thresholds and log streaming across network boundaries.
**Details:**
Create an integration test suite in tests/integration/instance-lifecycle.test.ts:

1. **Full lifecycle** — create an instance from a profile, verify the container is running with 'docker ps', check that the health endpoint returns a running status, stream logs in follow mode, restart the instance (verify the old container stopped and a new one is running), stop with a drain timeout, and verify the container is removed.
2. **Health monitoring** — configure an alerting threshold (e.g., 3 consecutive failures), simulate an unhealthy container, verify the health endpoint returns the correct consecutiveFailures count, test the readiness probe (container ready to serve) and the liveness probe (container process alive), and verify Prometheus-format metrics are exportable at /metrics.
3. **Log streaming** — test that WebSocket streaming receives live container output, that SSE is used as a fallback when WebSocket is unavailable, that the log format is ELK/Loki-compatible (JSON with timestamp, level, and message fields), and that log injection is prevented (send malicious log content, verify sanitized output).
4. **Data-pipeline graceful shutdown** — create a long-running instance simulating data processing, send stop with a drain timeout, verify the container receives SIGTERM, and verify it gets a grace period before SIGKILL.
5. **Network boundaries** — configure a proxy simulation and verify log streaming works through the proxy.

Update docker-compose.yml to include a test-mcp-server with configurable logging behavior.
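The alerting-threshold rule being tested ("unhealthy only after N consecutive failures") reduces to a small predicate the integration suite can assert against. A sketch; `ProbeResult` and `isAlerting` are illustrative names, not from the codebase:

```typescript
// Hypothetical alerting rule: an instance pages only once the most
// recent `threshold` health probes ALL failed, so a single transient
// failure never trips the alert.
type ProbeResult = 'pass' | 'fail';

function isAlerting(history: ProbeResult[], threshold: number): boolean {
  if (history.length < threshold) return false;
  return history.slice(-threshold).every((r) => r === 'fail');
}
```

The integration test would drive this indirectly: kill the test container, poll /api/instances/:id/health until consecutiveFailures reaches the threshold, and assert the reported status flips to unhealthy exactly then.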