lab/.taskmaster/tasks/tasks.json
Michal 46b017d77e
feat: install logging, error trapping, PXE/ISO integration tests
Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:26:33 +00:00


{
"master": {
"tasks": [
{
"id": 72,
"title": "Expand Prisma Schema with Resource Relationships",
"description": "Add Network, ServerNic, ServerDisk, and ClusterMember models to the Prisma schema. Add bastionId foreign key to Server model to track which bastion owns each server.",
"details": "Edit `bastion/src/labd/prisma/schema.prisma` to add:\n\n1. **Server model changes**:\n - Add `bastionId String?` with relation to Bastion\n - Add `hardwareInfo Json?` for storing raw HardwareInfo\n - Add `os String?` for installed OS\n\n2. **Network model**:\n```prisma\nmodel Network {\n id String @id @default(uuid())\n name String @unique\n cidr String\n vlan Int?\n gateway String?\n domain String?\n dhcpEnabled Boolean @default(false)\n createdAt DateTime @default(now())\n updatedAt DateTime @updatedAt\n \n nics ServerNic[]\n}\n```\n\n3. **ServerNic model**:\n```prisma\nmodel ServerNic {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n networkId String?\n network Network? @relation(fields: [networkId], references: [id])\n mac String\n ip String?\n name String\n state String @default(\"DOWN\")\n \n @@unique([serverId, mac])\n @@index([networkId])\n}\n```\n\n4. **ServerDisk model**:\n```prisma\nmodel ServerDisk {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n name String\n sizeGb Float\n model String?\n \n @@unique([serverId, name])\n}\n```\n\n5. **ClusterMember model**:\n```prisma\nmodel ClusterMember {\n id String @id @default(uuid())\n clusterId String\n cluster Cluster @relation(fields: [clusterId], references: [id], onDelete: Cascade)\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n role String @default(\"worker\") // control-plane, worker\n joinedAt DateTime @default(now())\n \n @@unique([clusterId, serverId])\n @@index([clusterId])\n @@index([serverId])\n}\n```\n\n6. Update Server model with relations to nics, disks, clusterMemberships, and bastion.\n\nRun `pnpm prisma generate` and `pnpm prisma migrate dev --name add-resource-models`.",
"testStrategy": "1. Run `pnpm prisma validate` to verify schema syntax\n2. Run `pnpm prisma generate` to confirm client generation\n3. Create migration and verify it applies cleanly to local CockroachDB\n4. Write unit tests that create/read/delete each new model\n5. Verify cascade deletes work (deleting Server removes its NICs and Disks)",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 73,
"title": "Implement State Persistence Service in labd",
"description": "Create a new service in labd that persists bastion state syncs to the Server table in CockroachDB. When bastion-state-sync messages arrive, upsert machines into Server with their hardware info, status, and ownership.",
"details": "Create `bastion/src/labd/src/services/state-persistence.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\nimport type { BastionState, HardwareInfo, InstallConfig, InstalledInfo } from \"@lab/shared\";\nimport { logger } from \"./logger.js\";\n\nexport class StatePersistence {\n constructor(private readonly db: PrismaClient) {}\n\n async syncBastionState(bastionId: string, state: BastionState): Promise<void> {\n // Process discovered machines\n for (const [mac, hw] of Object.entries(state.discovered)) {\n await this.upsertDiscoveredServer(bastionId, mac, hw);\n }\n \n // Process queued machines (update status to provisioning)\n for (const [mac, cfg] of Object.entries(state.install_queue)) {\n await this.upsertQueuedServer(bastionId, mac, cfg);\n }\n \n // Process installed machines\n for (const [mac, info] of Object.entries(state.installed)) {\n await this.upsertInstalledServer(bastionId, mac, info);\n }\n }\n\n private async upsertDiscoveredServer(bastionId: string, mac: string, hw: HardwareInfo): Promise<void> {\n const normalized = mac.toLowerCase();\n \n await this.db.server.upsert({\n where: { mac: normalized },\n create: {\n hostname: `unknown-${normalized.replace(/:/g, \"\").slice(-6)}`,\n mac: normalized,\n bastionId,\n status: \"discovered\",\n hardwareInfo: hw as any,\n labels: {\n arch: hw.arch,\n cpu_model: hw.cpu_model,\n cpu_cores: hw.cpu_cores,\n memory_gb: hw.memory_gb,\n },\n },\n update: {\n bastionId,\n status: \"discovered\", // only if not already provisioning/installed\n hardwareInfo: hw as any,\n },\n });\n \n // Sync NICs and Disks\n await this.syncServerHardware(normalized, hw);\n }\n \n private async syncServerHardware(mac: string, hw: HardwareInfo): Promise<void> {\n const server = await this.db.server.findUnique({ where: { mac } });\n if (!server) return;\n \n // Upsert NICs\n for (const nic of hw.nics) {\n await this.db.serverNic.upsert({\n where: { serverId_mac: { serverId: server.id, mac: 
nic.mac.toLowerCase() } },\n create: { serverId: server.id, mac: nic.mac.toLowerCase(), name: nic.name, state: nic.state },\n update: { name: nic.name, state: nic.state },\n });\n }\n \n // Upsert Disks\n for (const disk of hw.disks) {\n await this.db.serverDisk.upsert({\n where: { serverId_name: { serverId: server.id, name: disk.name } },\n create: { serverId: server.id, name: disk.name, sizeGb: disk.size_gb, model: disk.model },\n update: { sizeGb: disk.size_gb, model: disk.model },\n });\n }\n }\n \n // Similar methods for upsertQueuedServer and upsertInstalledServer...\n}\n```\n\nIntegrate into `server.ts` WebSocket handler by calling `statePersistence.syncBastionState()` when `bastion-state-sync` messages arrive.",
"testStrategy": "1. Unit test StatePersistence with mocked PrismaClient\n2. Integration test: simulate bastion-state-sync message, verify Server rows created\n3. Test idempotency: send same state twice, verify no duplicates\n4. Test status transitions: discovered -> provisioning -> installed\n5. Verify hardware info (NICs, Disks) is correctly persisted",
"priority": "high",
"dependencies": [
72
],
"status": "pending",
"subtasks": []
},
{
"id": 74,
"title": "Add State Loading from labd on Bastion Startup",
"description": "Modify bastion startup to request its persisted state from labd before using the local JSON cache. This ensures bastions restore their state after pod restarts.",
"details": "1. Add new labd API endpoint `GET /api/bastions/:id/state` that returns the aggregated state for a specific bastion from the Server table:\n\n```typescript\n// bastion/src/labd/src/routes/bastions.ts\napp.get<{ Params: { id: string } }>(\"/api/bastions/:id/state\", async (request, reply) => {\n const { id } = request.params;\n \n const servers = await db.server.findMany({\n where: { bastionId: id },\n include: { nics: true, disks: true },\n });\n \n // Transform back to BastionState format\n const state: BastionState = { discovered: {}, install_queue: {}, installed: {} };\n for (const server of servers) {\n const mac = server.mac;\n if (!mac) continue;\n \n switch (server.status) {\n case \"discovered\":\n state.discovered[mac] = transformToHardwareInfo(server);\n break;\n case \"provisioning\":\n state.install_queue[mac] = transformToInstallConfig(server);\n break;\n case \"installed\":\n state.installed[mac] = transformToInstalledInfo(server);\n break;\n }\n }\n \n return reply.send(state);\n});\n```\n\n2. Modify `BastionConnection.connect()` in `labd-connection.ts` to fetch state after enrollment:\n\n```typescript\nprivate async loadRemoteState(): Promise<BastionState | null> {\n if (!this.bastionId || !this.config.labdUrl) return null;\n try {\n const resp = await fetch(`${this.config.labdUrl}/api/bastions/${this.bastionId}/state`);\n if (resp.ok) return await resp.json();\n } catch { /* fall back to local */ }\n return null;\n}\n```\n\n3. In bastion `main.ts`, after establishing labd connection, merge remote state with local state (remote takes precedence for installed machines, local wins for in-progress installs).",
"testStrategy": "1. Integration test: start bastion, let it persist state, restart bastion, verify state restored\n2. Test merge logic: local has in-progress install, remote has discovered - verify install preserved\n3. Test offline mode: labd unavailable, bastion falls back to local JSON\n4. Test fresh start: no local state, no remote state - bastion starts with empty state",
"priority": "high",
"dependencies": [
73
],
"status": "pending",
"subtasks": []
},
{
"id": 75,
"title": "Fix Bastion --dir Environment Variable Default",
"description": "Fix the bug where CLI's --dir default overrides the BASTION_DIR environment variable. The CLI option should use the env var as its default.",
"details": "Edit `bastion/src/cli/src/commands/serve.ts`:\n\n```typescript\n// Before (line 14):\n.option(\"--dir <dir>\", \"Bastion data directory\", \"/tmp/lab-bastion\")\n\n// After:\n.option(\n \"--dir <dir>\",\n \"Bastion data directory\",\n process.env[\"BASTION_DIR\"] ?? \"/tmp/lab-bastion\"\n)\n```\n\nThis ensures:\n1. If `BASTION_DIR` env var is set (e.g., in k8s deployment), it's used as default\n2. Explicit `--dir` flag still overrides both\n3. Falls back to `/tmp/lab-bastion` if neither is set\n\nAlso update the k8s deployment manifest `bastion/deploy/k3s/deployment.yaml` to ensure `BASTION_DIR=/data` is properly set.",
"testStrategy": "1. Unit test: verify option default reads from process.env\n2. Integration test: set BASTION_DIR, run labctl without --dir, verify correct dir used\n3. Integration test: set BASTION_DIR, run labctl with --dir /custom, verify /custom used\n4. Test no env var: verify default /tmp/lab-bastion used",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 76,
"title": "Create Resource Type Registry with Aliases",
"description": "Create a centralized resource type registry that maps resource names, plurals, and short aliases to canonical types. This enables kubectl-style resource resolution.",
"details": "Create `bastion/src/cli/src/utils/resources.ts`:\n\n```typescript\nexport interface ResourceDefinition {\n kind: string; // Canonical type: \"Server\", \"Cluster\", etc.\n singular: string; // \"server\"\n plural: string; // \"servers\"\n aliases: string[]; // [\"srv\"]\n apiPath: string; // \"/api/servers\"\n columns: TableColumn[]; // Default columns for 'get' output\n wideColumns?: TableColumn[]; // Extra columns for -o wide\n}\n\nconst RESOURCE_DEFINITIONS: ResourceDefinition[] = [\n {\n kind: \"Server\",\n singular: \"server\",\n plural: \"servers\",\n aliases: [\"srv\"],\n apiPath: \"/api/servers\",\n columns: serverColumns,\n wideColumns: serverWideColumns,\n },\n {\n kind: \"Cluster\",\n singular: \"cluster\",\n plural: \"clusters\",\n aliases: [],\n apiPath: \"/api/clusters\",\n columns: clusterColumns,\n },\n {\n kind: \"Network\",\n singular: \"network\",\n plural: \"networks\",\n aliases: [\"net\"],\n apiPath: \"/api/networks\",\n columns: networkColumns,\n },\n // ... bastion, role, user, token, audit\n];\n\nconst aliasMap = new Map<string, ResourceDefinition>();\nfor (const def of RESOURCE_DEFINITIONS) {\n aliasMap.set(def.singular, def);\n aliasMap.set(def.plural, def);\n for (const alias of def.aliases) {\n aliasMap.set(alias, def);\n }\n}\n\nexport function resolveResourceType(input: string): ResourceDefinition {\n const normalized = input.toLowerCase();\n const def = aliasMap.get(normalized);\n if (!def) {\n const valid = RESOURCE_DEFINITIONS.map(d => d.plural).join(\", \");\n throw new Error(`Unknown resource type \"${input}\". Valid types: ${valid}`);\n }\n return def;\n}\n\nexport function resolveResourceIdentifier(input: string): {\n type: ResourceDefinition;\n name?: string;\n} {\n // Handle \"server/labmaster\" or just \"servers\"\n const parts = input.split(\"/\");\n const type = resolveResourceType(parts[0]);\n const name = parts.length > 1 ? 
parts.slice(1).join(\"/\") : undefined;\n return { type, name };\n}\n```\n\nUpdate `bastion/src/cli/src/utils/resource.ts` to use the new registry.",
"testStrategy": "1. Unit test resolveResourceType with all aliases: server, servers, srv -> Server\n2. Test unknown resource type throws descriptive error\n3. Test case insensitivity: SERVER, Server, server all resolve correctly\n4. Test resolveResourceIdentifier parses \"server/labmaster\" correctly",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 77,
"title": "Implement 'labctl get' Command",
"description": "Create the core 'labctl get <resource> [name]' command that lists resources with filtering and output format support. This is the foundation of the kubectl-style CLI.",
"details": "Create `bastion/src/cli/src/commands/get.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType, type ResourceDefinition } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\nimport { formatOutput, type TableColumn } from \"../utils/table.js\";\n\nexport function registerGetCommand(program: Command): void {\n program\n .command(\"get <resource> [name]\")\n .description(\"List resources or get a specific resource by name\")\n .option(\"--status <status>\", \"Filter by status\")\n .option(\"--role <role>\", \"Filter by role (servers only)\")\n .option(\"--cloud <cloud>\", \"Filter by cloud\")\n .option(\"--env <environment>\", \"Filter by environment\")\n .option(\"-l, --label <label>\", \"Filter by label (key=value)\")\n .option(\"-A, --all-namespaces\", \"List across all clouds/environments\")\n .action(async (resource: string, name: string | undefined, opts) => {\n const config = program.opts()[\"_config\"];\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n let data: unknown[];\n \n if (name) {\n // Get specific resource - could be name, ID, or MAC\n const item = await client.getResource(resourceDef, name);\n data = item ? [item] : [];\n } else {\n // List with filters\n data = await client.listResources(resourceDef, {\n status: opts.status,\n role: opts.role,\n cloud: opts.allNamespaces ? undefined : (opts.cloud ?? config.defaultCloud),\n environment: opts.allNamespaces ? undefined : (opts.env ?? config.defaultEnvironment),\n label: opts.label,\n });\n }\n \n if (data.length === 0) {\n console.log(`No ${resourceDef.plural} found.`);\n return;\n }\n \n const columns = config.outputFormat === \"wide\" && resourceDef.wideColumns\n ? [...resourceDef.columns, ...resourceDef.wideColumns]\n : resourceDef.columns;\n \n formatOutput(data, config.outputFormat, columns);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nAdd to `index.ts`: `registerGetCommand(program);`\n\nExtend LabdClient with generic resource methods.",
"testStrategy": "1. Integration test: `labctl get servers` returns list from labd\n2. Test filtering: `labctl get servers --status discovered` only shows discovered\n3. Test name lookup: `labctl get server labmaster` returns single server\n4. Test MAC lookup: `labctl get server 38:05:25:33:e2:e4` resolves by MAC\n5. Test output formats: -o json, -o yaml, -o wide produce correct output\n6. Test unknown resource: `labctl get foo` shows helpful error",
"priority": "high",
"dependencies": [
76
],
"status": "pending",
"subtasks": []
},
{
"id": 78,
"title": "Implement 'labctl describe' Command",
"description": "Create the 'labctl describe <resource> <name>' command that shows detailed information about a resource including relationships, hardware info, and history.",
"details": "Create `bastion/src/cli/src/commands/describe.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nconst BOLD = \"\\x1b[1m\";\nconst DIM = \"\\x1b[2m\";\nconst RESET = \"\\x1b[0m\";\n\ninterface DescribeSection {\n title: string;\n fields: Array<[string, string | undefined]>;\n}\n\nfunction printDescribe(name: string, sections: DescribeSection[]): void {\n console.log(`${BOLD}Name:${RESET} ${name}`);\n for (const section of sections) {\n console.log(`\\n${BOLD}${section.title}:${RESET}`);\n for (const [key, value] of section.fields) {\n if (value !== undefined) {\n console.log(` ${DIM}${key}:${RESET} ${value}`);\n }\n }\n }\n}\n\nexport function registerDescribeCommand(program: Command): void {\n program\n .command(\"describe <resource> <name>\")\n .description(\"Show detailed information about a resource\")\n .action(async (resource: string, name: string) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n const item = await client.describeResource(resourceDef, name);\n if (!item) {\n console.error(`${resourceDef.singular} \"${name}\" not found.`);\n process.exit(1);\n }\n \n // Resource-specific formatting\n switch (resourceDef.kind) {\n case \"Server\":\n printServerDescription(item);\n break;\n case \"Cluster\":\n printClusterDescription(item);\n break;\n default:\n console.log(JSON.stringify(item, null, 2));\n }\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n\nfunction printServerDescription(server: any): void {\n const sections: DescribeSection[] = [\n {\n title: \"Metadata\",\n fields: [\n [\"ID\", server.id],\n [\"Cloud\", server.cloud],\n [\"Environment\", server.environment],\n [\"Role\", server.role],\n [\"Status\", server.status],\n [\"Created\", server.createdAt],\n [\"Last Seen\", server.lastHeartbeat],\n ],\n },\n {\n title: \"Hardware\",\n fields: [\n [\"MAC\", server.mac],\n [\"IP\", server.ip],\n [\"Architecture\", server.hardwareInfo?.arch],\n [\"CPU\", server.hardwareInfo?.cpu_model],\n [\"Cores\", String(server.hardwareInfo?.cpu_cores)],\n [\"Memory\", `${server.hardwareInfo?.memory_gb}GB`],\n [\"Product\", server.hardwareInfo?.product],\n ],\n },\n ];\n \n if (server.nics?.length > 0) {\n sections.push({\n title: \"Network Interfaces\",\n fields: server.nics.map((n: any) => [n.name, `${n.mac} ${n.ip ?? \"\"} (${n.state})`]),\n });\n }\n \n if (server.disks?.length > 0) {\n sections.push({\n title: \"Disks\",\n fields: server.disks.map((d: any) => [d.name, `${d.sizeGb}GB ${d.model ?? \"\"}`]),\n });\n }\n \n if (server.clusterMemberships?.length > 0) {\n sections.push({\n title: \"Cluster Membership\",\n fields: server.clusterMemberships.map((m: any) => [m.cluster.name, m.role]),\n });\n }\n \n printDescribe(server.hostname, sections);\n}\n```",
"testStrategy": "1. Integration test: `labctl describe server labmaster` shows full details\n2. Test hardware info display: CPU, memory, disks, NICs all shown\n3. Test cluster membership: server in cluster shows membership section\n4. Test not found: `labctl describe server nonexistent` shows helpful error\n5. Test different resource types: describe cluster, network, bastion",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 79,
"title": "Implement 'labctl create/delete' Commands",
"description": "Create the 'labctl create <resource>' and 'labctl delete <resource> <name>' commands for creating and removing resources like networks, clusters, and tokens.",
"details": "Create `bastion/src/cli/src/commands/create.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nexport function registerCreateCommand(program: Command): void {\n const create = program\n .command(\"create <resource>\")\n .description(\"Create a resource\");\n \n // labctl create network --name lab --cidr 192.168.8.0/24\n create\n .command(\"network\")\n .description(\"Create a network\")\n .requiredOption(\"--name <name>\", \"Network name\")\n .requiredOption(\"--cidr <cidr>\", \"Network CIDR (e.g., 192.168.8.0/24)\")\n .option(\"--gateway <gateway>\", \"Gateway IP\")\n .option(\"--vlan <vlan>\", \"VLAN ID\", parseInt)\n .option(\"--domain <domain>\", \"DNS domain\")\n .option(\"--dhcp\", \"Enable DHCP\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const network = await client.createNetwork({\n name: opts.name,\n cidr: opts.cidr,\n gateway: opts.gateway,\n vlan: opts.vlan,\n domain: opts.domain,\n dhcpEnabled: opts.dhcp ?? false,\n });\n console.log(`network/${network.name} created`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n \n // labctl create token --label \"worker enrollment\" --type reusable\n create\n .command(\"token\")\n .description(\"Create a join token\")\n .option(\"--label <label>\", \"Token label/description\")\n .option(\"--type <type>\", \"Token type: one-time or reusable\", \"one-time\")\n .option(\"--expires <duration>\", \"Expiration (e.g., 24h, 7d)\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const token = await client.createToken(opts);\n console.log(`Token created: ${token.token}`);\n if (opts.label) console.log(`Label: ${opts.label}`);\n if (token.expiresAt) console.log(`Expires: ${token.expiresAt}`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nCreate `bastion/src/cli/src/commands/delete.ts`:\n\n```typescript\nexport function registerDeleteCommand(program: Command): void {\n program\n .command(\"delete <resource> <name>\")\n .description(\"Delete a resource\")\n .option(\"--force\", \"Skip confirmation\")\n .action(async (resource: string, name: string, opts) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n if (!opts.force) {\n const { confirm } = await import(\"../utils/prompts.js\");\n const yes = await confirm(`Delete ${resourceDef.singular} \"${name}\"?`);\n if (!yes) {\n console.log(\"Cancelled.\");\n return;\n }\n }\n \n try {\n await client.deleteResource(resourceDef, name);\n console.log(`${resourceDef.singular}/${name} deleted`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```",
"testStrategy": "1. Integration test: `labctl create network` creates network in DB\n2. Test validation: missing required flags shows helpful error\n3. Test token creation: token returned is valid UUID, stored in DB\n4. Test delete with confirmation: prompts user, respects --force\n5. Test delete cascade: deleting server removes NICs, disks\n6. Test delete protection: cannot delete bastion with connected servers",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 80,
"title": "Refactor Provision Commands to kubectl-style",
"description": "Refactor existing provision commands to use kubectl-style syntax: 'labctl provision <server>' instead of 'labctl provision install <mac>'.",
"details": "The new command structure should be:\n- `labctl provision <server> --os fedora-43 --role worker` (queue install)\n- `labctl reprovision <server>` (reinstall)\n- `labctl forget <server>` (remove from tracking)\n\nModify `bastion/src/cli/src/commands/install.ts` → rename to `provision.ts`:\n\n```typescript\nexport function registerProvisionCommand(program: Command): void {\n program\n .command(\"provision <server>\")\n .description(\"Queue a server for OS installation\")\n .requiredOption(\"--os <os>\", \"Operating system\", \"fedora-43\")\n .requiredOption(\"--role <role>\", \"Server role\", \"worker\")\n .option(\"--disk <disk>\", \"Target disk (auto-detected if not specified)\")\n .option(\"--hostname <hostname>\", \"Override hostname\")\n .action(async (server: string, opts) => {\n const client = getLabdClient();\n \n // Resolve server: could be hostname, MAC, or ID\n const resolved = await client.resolveServer(server);\n if (!resolved) {\n console.error(`Server \"${server}\" not found.`);\n console.error(\"Tip: Use 'labctl get servers' to see available servers.\");\n process.exit(1);\n }\n \n if (resolved.status === \"installed\") {\n console.error(`Server \"${resolved.hostname}\" is already installed.`);\n console.error(\"Tip: Use 'labctl reprovision' to reinstall.\");\n process.exit(1);\n }\n \n try {\n await client.provisionServer(resolved.mac, {\n hostname: opts.hostname ?? resolved.hostname,\n os: opts.os,\n role: opts.role,\n disk: opts.disk,\n });\n console.log(`Server ${resolved.hostname} queued for ${opts.os} installation as ${opts.role}.`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nSimilarly update reprovision.ts and forget.ts to accept server name/MAC/ID.\n\nUpdate index.ts to register commands at top level instead of under 'provision' subcommand.",
"testStrategy": "1. Test server resolution: provision by hostname, MAC, or UUID all work\n2. Test already installed: provisioning installed server shows reprovision hint\n3. Test unknown server: helpful error message with tip\n4. Test reprovision: reinstalls installed server\n5. Test forget: removes server from all state categories\n6. Backward compat: verify 'labctl provision list' still works (deprecation warning)",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 81,
"title": "Implement Server and Resource API Endpoints in labd",
"description": "Add REST API endpoints in labd for full resource CRUD operations: networks, clusters, tokens. Extend servers endpoint with filters and relationship includes.",
"details": "Create/extend labd route files:\n\n1. **Extend servers.ts**:\n```typescript\n// GET /api/servers - with extended filters and includes\napp.get(\"/api/servers\", async (request, reply) => {\n const { status, role, cloud, environment, label, include } = request.query;\n \n const where = {};\n if (status) where.status = status;\n if (role) where.role = role;\n if (cloud) where.cloud = cloud;\n if (environment) where.environment = environment;\n if (label) where.labels = { path: [labelKey], equals: labelValue };\n \n const servers = await db.server.findMany({\n where,\n include: {\n nics: include?.includes(\"nics\"),\n disks: include?.includes(\"disks\"),\n clusterMemberships: include?.includes(\"clusters\") ? { include: { cluster: true } } : false,\n bastion: include?.includes(\"bastion\"),\n },\n });\n return servers;\n});\n\n// GET /api/servers/:id - by ID, hostname, or MAC\napp.get(\"/api/servers/:identifier\", async (request, reply) => {\n const { identifier } = request.params;\n \n // Try UUID first\n let server = await db.server.findUnique({ where: { id: identifier }, include: fullInclude });\n // Try hostname\n if (!server) server = await db.server.findUnique({ where: { hostname: identifier }, include: fullInclude });\n // Try MAC\n if (!server) server = await db.server.findUnique({ where: { mac: identifier.toLowerCase() }, include: fullInclude });\n \n if (!server) return reply.code(404).send({ error: \"Server not found\" });\n return server;\n});\n```\n\n2. 
**Create networks.ts**:\n```typescript\n// GET /api/networks, POST /api/networks, DELETE /api/networks/:id\nexport function registerNetworkRoutes(app: FastifyInstance, db: DbClient): void {\n app.get(\"/api/networks\", async () => db.network.findMany());\n \n app.post(\"/api/networks\", async (request, reply) => {\n const { name, cidr, gateway, vlan, domain, dhcpEnabled } = request.body;\n // Validate CIDR format\n const network = await db.network.create({ data: { name, cidr, gateway, vlan, domain, dhcpEnabled } });\n return reply.code(201).send(network);\n });\n \n app.delete(\"/api/networks/:id\", async (request, reply) => {\n await db.network.delete({ where: { id: request.params.id } });\n return reply.code(204).send();\n });\n}\n```\n\n3. **Create clusters.ts**:\n```typescript\n// Similar CRUD for clusters with member management\napp.get(\"/api/clusters/:id/members\", ...);\napp.post(\"/api/clusters/:id/members\", ...);\napp.delete(\"/api/clusters/:id/members/:serverId\", ...);\n```",
"testStrategy": "1. Integration test all CRUD endpoints with HTTP client\n2. Test server resolution: by id, hostname, and MAC all return same server\n3. Test include parameter: nics, disks, clusters included when requested\n4. Test validation: invalid CIDR rejected, duplicate names rejected\n5. Test cascade: delete network with NICs fails or cascades appropriately",
"priority": "medium",
"dependencies": [
72,
73
],
"status": "pending",
"subtasks": []
},
{
"id": 82,
"title": "Implement RBAC Permission Checks in CLI",
"description": "Wire RBAC permission checks into CLI commands. Check user permissions before executing operations using the existing Permission model.",
"details": "1. Create `bastion/src/cli/src/middleware/rbac.ts`:\n\n```typescript\nimport { getLabdClient } from \"../api/config.js\";\n\nexport interface PermissionContext {\n action: string; // read, exec, apply, destroy, manage, admin\n cloud?: string;\n environment?: string;\n server?: string;\n}\n\nexport async function checkPermission(ctx: PermissionContext): Promise<boolean> {\n const client = getLabdClient();\n try {\n const result = await client.checkPermission(ctx);\n return result.allowed;\n } catch {\n // If can't reach labd, fail open for local operations\n return true;\n }\n}\n\nexport async function requirePermission(ctx: PermissionContext): Promise<void> {\n const allowed = await checkPermission(ctx);\n if (!allowed) {\n throw new Error(\n `Permission denied: ${ctx.action} on ${ctx.server ?? \"*\"}@${ctx.cloud ?? \"*\"}/${ctx.environment ?? \"*\"}`\n );\n }\n}\n```\n\n2. Add labd endpoint `POST /api/auth/check-permission`:\n```typescript\napp.post(\"/api/auth/check-permission\", async (request, reply) => {\n const user = await authenticateRequest(request); // from cert or token\n const { action, cloud, environment, server } = request.body;\n \n const permissions = await db.permission.findMany({\n where: {\n role: { userBindings: { some: { userId: user.id } } },\n },\n });\n \n const allowed = permissions.some(p => \n matchesPattern(p.action, action) &&\n matchesPattern(p.cloud, cloud ?? \"*\") &&\n matchesPattern(p.environment, environment ?? \"*\") &&\n matchesPattern(p.server, server ?? \"*\")\n );\n \n return { allowed };\n});\n```\n\n3. 
Integrate into commands:\n```typescript\n// In provision command\nawait requirePermission({ action: \"apply\", cloud, environment, server: resolved.hostname });\n\n// In delete command\nawait requirePermission({ action: \"destroy\", cloud, environment, server: name });\n\n// In get command (filter results)\nconst servers = await client.listServers(filters);\nconst visible = await filterByPermission(servers, \"read\");\n```",
"testStrategy": "1. Unit test permission matching logic with wildcards\n2. Test admin role: has access to all resources\n3. Test operator role: can read/exec but not destroy\n4. Test viewer role: can only read, provision denied\n5. Test scope matching: permission for cloud=aws doesn't grant access to cloud=baremetal\n6. Test denied action is audit-logged",
"priority": "medium",
"dependencies": [
77,
81
],
"status": "pending",
"subtasks": []
},
{
"id": 83,
"title": "Implement Audit Logging for Resource Operations",
"description": "Log all resource mutations to the AuditLog table. Include user, action, resource type/name, result, and source IP.",
"details": "1. Create `bastion/src/labd/src/services/audit.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\n\nexport interface AuditEntry {\n userId?: string;\n serverId?: string;\n sessionId?: string;\n action: string; // create, update, delete, provision, exec, rbac-denied\n resourceType: string; // server, cluster, network, token, etc.\n resourceName: string;\n args?: string; // sanitized args (no secrets)\n result: \"success\" | \"denied\" | \"error\";\n durationMs?: number;\n sourceIp?: string;\n}\n\nexport class AuditService {\n constructor(private readonly db: PrismaClient) {}\n \n async log(entry: AuditEntry): Promise<void> {\n await this.db.auditLog.create({\n data: {\n userId: entry.userId,\n serverId: entry.serverId,\n sessionId: entry.sessionId,\n action: entry.action,\n resourceType: entry.resourceType,\n resourceName: entry.resourceName,\n args: entry.args,\n result: entry.result,\n durationMs: entry.durationMs,\n sourceIp: entry.sourceIp,\n },\n });\n }\n \n async query(filters: {\n userId?: string;\n action?: string;\n resourceType?: string;\n since?: Date;\n limit?: number;\n }): Promise<AuditEntry[]> {\n return this.db.auditLog.findMany({\n where: {\n userId: filters.userId,\n action: filters.action,\n resourceType: filters.resourceType,\n timestamp: filters.since ? { gte: filters.since } : undefined,\n },\n orderBy: { timestamp: \"desc\" },\n take: filters.limit ?? 100,\n });\n }\n}\n```\n\n2. Add Fastify hook to wrap route handlers:\n```typescript\napp.addHook(\"onResponse\", async (request, reply) => {\n // Log mutations (POST, PUT, DELETE)\n if ([\"POST\", \"PUT\", \"DELETE\"].includes(request.method)) {\n const path = request.url;\n const resourceMatch = path.match(/\\/api\\/(\\w+)(?:\\/([^/]+))?/);\n if (resourceMatch) {\n await auditService.log({\n action: methodToAction(request.method),\n resourceType: resourceMatch[1],\n resourceName: resourceMatch[2] ?? \"\",\n result: reply.statusCode < 400 ? 
\"success\" : reply.statusCode === 403 ? \"denied\" : \"error\", // assumes RBAC denials surface as HTTP 403\n sourceIp: request.ip,\n });\n }\n }\n});\n```\n\n3. Add a `labctl get audit` command to view audit logs.",
"testStrategy": "1. Integration test: create network, verify audit log entry created\n2. Test RBAC denial is logged with result=denied\n3. Test sensitive data sanitization: tokens/passwords not in args\n4. Test query filters: by user, action, resourceType, time range\n5. Test `labctl get audit` displays recent entries correctly",
"priority": "medium",
"dependencies": [
81,
82
],
"status": "pending",
"subtasks": []
},
{
"id": 84,
"title": "Update CLI Entry Point and Help Text",
"description": "Update the CLI entry point to register all new commands and update the help text to reflect the kubectl-style interface. Add deprecation warnings for the old command structure.",
"details": "Update `bastion/src/cli/src/index.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { APP_VERSION } from \"@lab/shared\";\nimport { loadConfig } from \"./config/index.js\";\n\n// New kubectl-style commands\nimport { registerGetCommand } from \"./commands/get.js\";\nimport { registerDescribeCommand } from \"./commands/describe.js\";\nimport { registerCreateCommand } from \"./commands/create.js\";\nimport { registerDeleteCommand } from \"./commands/delete.js\";\nimport { registerApplyCommand } from \"./commands/apply.js\";\nimport { registerEditCommand } from \"./commands/edit.js\";\n\n// Action commands\nimport { registerProvisionCommand } from \"./commands/provision.js\";\nimport { registerReprovisionCommand } from \"./commands/reprovision.js\";\nimport { registerForgetCommand } from \"./commands/forget.js\";\n\n// Bastion management\nimport { registerBastionCommand } from \"./commands/bastion.js\"; // start/stop/status\n\n// App management (unchanged)\nimport { registerAppCommand } from \"./commands/app.js\";\n\n// Utility\nimport { registerConfigCommand } from \"./commands/config.js\";\nimport { registerLoginCommand } from \"./commands/login.js\";\nimport { registerDoctorCommand } from \"./commands/doctor.js\";\n\nexport function createProgram(): Command {\n const program = new Command();\n \n program\n .name(\"labctl\")\n .description(\"Lab infrastructure management CLI\")\n .version(APP_VERSION);\n \n // Global options\n program\n .option(\"-o, --output <format>\", \"output format (table, json, yaml, wide)\", \"table\")\n .option(\"--server <url>\", \"override labd server URL\")\n .option(\"--env <name>\", \"override default environment\")\n .option(\"--cloud <name>\", \"override default cloud\")\n .option(\"--debug\", \"enable debug output\")\n .option(\"--no-color\", \"disable colored output\");\n \n // Core CRUD commands\n registerGetCommand(program); // labctl get <resource> [name]\n registerDescribeCommand(program); // 
labctl describe <resource> <name>\n registerCreateCommand(program); // labctl create <resource>\n registerDeleteCommand(program); // labctl delete <resource> <name>\n registerApplyCommand(program); // labctl apply -f <file>\n registerEditCommand(program); // labctl edit <resource> <name>\n \n // Provisioning actions\n registerProvisionCommand(program); // labctl provision <server>\n registerReprovisionCommand(program);// labctl reprovision <server>\n registerForgetCommand(program); // labctl forget <server>\n \n // Bastion management\n registerBastionCommand(program); // labctl bastion start|stop|status\n \n // App management\n registerAppCommand(program); // labctl app install|health k3s\n \n // Utility\n registerConfigCommand(program);\n registerLoginCommand(program);\n registerDoctorCommand(program);\n \n // Legacy compatibility with deprecation warnings\n registerLegacyCommands(program);\n \n return program;\n}\n\nfunction registerLegacyCommands(program: Command): void {\n // labctl provision list -> labctl get servers (with warning)\n // Attach to the provision command registered above; a second\n // top-level \"provision\" would collide with it.\n const provision = program.commands.find((c) => c.name() === \"provision\");\n provision\n ?.command(\"list\")\n .action(() => {\n console.warn(\"DEPRECATED: Use 'labctl get servers' instead.\");\n // Delegate to get servers\n });\n}\n```\n\nUpdate shell completions in `scripts/generate-completions.ts` for the new command structure.",
"testStrategy": "1. Test --help shows all new commands with descriptions\n2. Test resource type help: `labctl get --help` lists valid resources\n3. Test deprecated commands show warning but still work\n4. Test shell completions generated for new commands\n5. Test global options: -o, --server, --env, --cloud all work",
"priority": "low",
"dependencies": [
77,
78,
79,
80
],
"status": "pending",
"subtasks": []
}
],
"metadata": {
"created": "2026-03-26T04:26:49.813Z",
"updated": "2026-03-26T04:26:49.813Z",
"description": "Tasks for master context"
}
}
}