feat: install logging, error trapping, PXE/ISO integration tests
Some checks failed
CI/CD / lint (pull_request) Failing after 13s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 36s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped

Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Michal
2026-03-26 22:26:33 +00:00
parent ffc4a782d2
commit 46b017d77e
189 changed files with 16241 additions and 432 deletions

View File

@@ -1,22 +1,21 @@
{
"models": {
"main": {
"provider": "anthropic",
"modelId": "claude-sonnet-4-20250514",
"maxTokens": 64000,
"provider": "claude-code",
"modelId": "opus",
"maxTokens": 32000,
"temperature": 0.2
},
"research": {
"provider": "anthropic",
"modelId": "claude-sonnet-4-20250514",
"maxTokens": 64000,
"provider": "claude-code",
"modelId": "opus",
"maxTokens": 32000,
"temperature": 0.2
},
"resolution": "main",
"fallback": {
"provider": "anthropic",
"modelId": "claude-3-7-sonnet-20250219",
"maxTokens": 120000,
"provider": "claude-code",
"modelId": "sonnet",
"maxTokens": 64000,
"temperature": 0.2
}
},

6
.taskmaster/state.json Normal file
View File

@@ -0,0 +1,6 @@
{
"currentTag": "master",
"lastSwitched": "2026-03-18T00:17:54.213Z",
"branchTagMapping": {},
"migrationNoticeShown": true
}

View File

@@ -0,0 +1,180 @@
{
"master": {
"tasks": [
{
"id": 72,
"title": "Expand Prisma Schema with Resource Relationships",
"description": "Add Network, ServerNic, ServerDisk, and ClusterMember models to the Prisma schema. Add bastionId foreign key to Server model to track which bastion owns each server.",
"details": "Edit `bastion/src/labd/prisma/schema.prisma` to add:\n\n1. **Server model changes**:\n - Add `bastionId String?` with relation to Bastion\n - Add `hardwareInfo Json?` for storing raw HardwareInfo\n - Add `os String?` for installed OS\n\n2. **Network model**:\n```prisma\nmodel Network {\n id String @id @default(uuid())\n name String @unique\n cidr String\n vlan Int?\n gateway String?\n domain String?\n dhcpEnabled Boolean @default(false)\n createdAt DateTime @default(now())\n updatedAt DateTime @updatedAt\n \n nics ServerNic[]\n}\n```\n\n3. **ServerNic model**:\n```prisma\nmodel ServerNic {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n networkId String?\n network Network? @relation(fields: [networkId], references: [id])\n mac String\n ip String?\n name String\n state String @default(\"DOWN\")\n \n @@unique([serverId, mac])\n @@index([networkId])\n}\n```\n\n4. **ServerDisk model**:\n```prisma\nmodel ServerDisk {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n name String\n sizeGb Float\n model String?\n \n @@unique([serverId, name])\n}\n```\n\n5. **ClusterMember model**:\n```prisma\nmodel ClusterMember {\n id String @id @default(uuid())\n clusterId String\n cluster Cluster @relation(fields: [clusterId], references: [id], onDelete: Cascade)\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n role String @default(\"worker\") // control-plane, worker\n joinedAt DateTime @default(now())\n \n @@unique([clusterId, serverId])\n @@index([clusterId])\n @@index([serverId])\n}\n```\n\n6. Update Server model with relations to nics, disks, clusterMemberships, and bastion.\n\nRun `pnpm prisma generate` and `pnpm prisma migrate dev --name add-resource-models`.",
"testStrategy": "1. Run `pnpm prisma validate` to verify schema syntax\n2. Run `pnpm prisma generate` to confirm client generation\n3. Create migration and verify it applies cleanly to local CockroachDB\n4. Write unit tests that create/read/delete each new model\n5. Verify cascade deletes work (deleting Server removes its NICs and Disks)",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 73,
"title": "Implement State Persistence Service in labd",
"description": "Create a new service in labd that persists bastion state syncs to the Server table in CockroachDB. When bastion-state-sync messages arrive, upsert machines into Server with their hardware info, status, and ownership.",
"details": "Create `bastion/src/labd/src/services/state-persistence.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\nimport type { BastionState, HardwareInfo, InstallConfig, InstalledInfo } from \"@lab/shared\";\nimport { logger } from \"./logger.js\";\n\nexport class StatePersistence {\n constructor(private readonly db: PrismaClient) {}\n\n async syncBastionState(bastionId: string, state: BastionState): Promise<void> {\n // Process discovered machines\n for (const [mac, hw] of Object.entries(state.discovered)) {\n await this.upsertDiscoveredServer(bastionId, mac, hw);\n }\n \n // Process queued machines (update status to provisioning)\n for (const [mac, cfg] of Object.entries(state.install_queue)) {\n await this.upsertQueuedServer(bastionId, mac, cfg);\n }\n \n // Process installed machines\n for (const [mac, info] of Object.entries(state.installed)) {\n await this.upsertInstalledServer(bastionId, mac, info);\n }\n }\n\n private async upsertDiscoveredServer(bastionId: string, mac: string, hw: HardwareInfo): Promise<void> {\n const normalized = mac.toLowerCase();\n \n await this.db.server.upsert({\n where: { mac: normalized },\n create: {\n hostname: `unknown-${normalized.replace(/:/g, \"\").slice(-6)}`,\n mac: normalized,\n bastionId,\n status: \"discovered\",\n hardwareInfo: hw as any,\n labels: {\n arch: hw.arch,\n cpu_model: hw.cpu_model,\n cpu_cores: hw.cpu_cores,\n memory_gb: hw.memory_gb,\n },\n },\n update: {\n bastionId,\n status: \"discovered\", // only if not already provisioning/installed\n hardwareInfo: hw as any,\n },\n });\n \n // Sync NICs and Disks\n await this.syncServerHardware(normalized, hw);\n }\n \n private async syncServerHardware(mac: string, hw: HardwareInfo): Promise<void> {\n const server = await this.db.server.findUnique({ where: { mac } });\n if (!server) return;\n \n // Upsert NICs\n for (const nic of hw.nics) {\n await this.db.serverNic.upsert({\n where: { serverId_mac: { serverId: server.id, mac: nic.mac.toLowerCase() } },\n create: { serverId: server.id, mac: nic.mac.toLowerCase(), name: nic.name, state: nic.state },\n update: { name: nic.name, state: nic.state },\n });\n }\n \n // Upsert Disks\n for (const disk of hw.disks) {\n await this.db.serverDisk.upsert({\n where: { serverId_name: { serverId: server.id, name: disk.name } },\n create: { serverId: server.id, name: disk.name, sizeGb: disk.size_gb, model: disk.model },\n update: { sizeGb: disk.size_gb, model: disk.model },\n });\n }\n }\n \n // Similar methods for upsertQueuedServer and upsertInstalledServer...\n}\n```\n\nIntegrate into `server.ts` WebSocket handler by calling `statePersistence.syncBastionState()` when `bastion-state-sync` messages arrive.",
"testStrategy": "1. Unit test StatePersistence with mocked PrismaClient\n2. Integration test: simulate bastion-state-sync message, verify Server rows created\n3. Test idempotency: send same state twice, verify no duplicates\n4. Test status transitions: discovered -> provisioning -> installed\n5. Verify hardware info (NICs, Disks) is correctly persisted",
"priority": "high",
"dependencies": [
72
],
"status": "pending",
"subtasks": []
},
{
"id": 74,
"title": "Add State Loading from labd on Bastion Startup",
"description": "Modify bastion startup to request its persisted state from labd before using the local JSON cache. This ensures bastions restore their state after pod restarts.",
"details": "1. Add new labd API endpoint `GET /api/bastions/:id/state` that returns the aggregated state for a specific bastion from the Server table:\n\n```typescript\n// bastion/src/labd/src/routes/bastions.ts\napp.get<{ Params: { id: string } }>(\"/api/bastions/:id/state\", async (request, reply) => {\n const { id } = request.params;\n \n const servers = await db.server.findMany({\n where: { bastionId: id },\n include: { nics: true, disks: true },\n });\n \n // Transform back to BastionState format\n const state: BastionState = { discovered: {}, install_queue: {}, installed: {} };\n for (const server of servers) {\n const mac = server.mac;\n if (!mac) continue;\n \n switch (server.status) {\n case \"discovered\":\n state.discovered[mac] = transformToHardwareInfo(server);\n break;\n case \"provisioning\":\n state.install_queue[mac] = transformToInstallConfig(server);\n break;\n case \"installed\":\n state.installed[mac] = transformToInstalledInfo(server);\n break;\n }\n }\n \n return reply.send(state);\n});\n```\n\n2. Modify `BastionConnection.connect()` in `labd-connection.ts` to fetch state after enrollment:\n\n```typescript\nprivate async loadRemoteState(): Promise<BastionState | null> {\n if (!this.bastionId || !this.config.labdUrl) return null;\n try {\n const resp = await fetch(`${this.config.labdUrl}/api/bastions/${this.bastionId}/state`);\n if (resp.ok) return await resp.json();\n } catch { /* fall back to local */ }\n return null;\n}\n```\n\n3. In bastion `main.ts`, after establishing labd connection, merge remote state with local state (remote takes precedence for installed machines, local wins for in-progress installs).",
"testStrategy": "1. Integration test: start bastion, let it persist state, restart bastion, verify state restored\n2. Test merge logic: local has in-progress install, remote has discovered - verify install preserved\n3. Test offline mode: labd unavailable, bastion falls back to local JSON\n4. Test fresh start: no local state, no remote state - bastion starts with empty state",
"priority": "high",
"dependencies": [
73
],
"status": "pending",
"subtasks": []
},
{
"id": 75,
"title": "Fix Bastion --dir Environment Variable Default",
"description": "Fix the bug where CLI's --dir default overrides the BASTION_DIR environment variable. The CLI option should use the env var as its default.",
"details": "Edit `bastion/src/cli/src/commands/serve.ts`:\n\n```typescript\n// Before (line 14):\n.option(\"--dir <dir>\", \"Bastion data directory\", \"/tmp/lab-bastion\")\n\n// After:\n.option(\n \"--dir <dir>\",\n \"Bastion data directory\",\n process.env[\"BASTION_DIR\"] ?? \"/tmp/lab-bastion\"\n)\n```\n\nThis ensures:\n1. If `BASTION_DIR` env var is set (e.g., in k8s deployment), it's used as default\n2. Explicit `--dir` flag still overrides both\n3. Falls back to `/tmp/lab-bastion` if neither is set\n\nAlso update the k8s deployment manifest `bastion/deploy/k3s/deployment.yaml` to ensure `BASTION_DIR=/data` is properly set.",
"testStrategy": "1. Unit test: verify option default reads from process.env\n2. Integration test: set BASTION_DIR, run labctl without --dir, verify correct dir used\n3. Integration test: set BASTION_DIR, run labctl with --dir /custom, verify /custom used\n4. Test no env var: verify default /tmp/lab-bastion used",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 76,
"title": "Create Resource Type Registry with Aliases",
"description": "Create a centralized resource type registry that maps resource names, plurals, and short aliases to canonical types. This enables kubectl-style resource resolution.",
"details": "Create `bastion/src/cli/src/utils/resources.ts`:\n\n```typescript\nexport interface ResourceDefinition {\n kind: string; // Canonical type: \"Server\", \"Cluster\", etc.\n singular: string; // \"server\"\n plural: string; // \"servers\"\n aliases: string[]; // [\"srv\"]\n apiPath: string; // \"/api/servers\"\n columns: TableColumn[]; // Default columns for 'get' output\n wideColumns?: TableColumn[]; // Extra columns for -o wide\n}\n\nconst RESOURCE_DEFINITIONS: ResourceDefinition[] = [\n {\n kind: \"Server\",\n singular: \"server\",\n plural: \"servers\",\n aliases: [\"srv\"],\n apiPath: \"/api/servers\",\n columns: serverColumns,\n wideColumns: serverWideColumns,\n },\n {\n kind: \"Cluster\",\n singular: \"cluster\",\n plural: \"clusters\",\n aliases: [],\n apiPath: \"/api/clusters\",\n columns: clusterColumns,\n },\n {\n kind: \"Network\",\n singular: \"network\",\n plural: \"networks\",\n aliases: [\"net\"],\n apiPath: \"/api/networks\",\n columns: networkColumns,\n },\n // ... bastion, role, user, token, audit\n];\n\nconst aliasMap = new Map<string, ResourceDefinition>();\nfor (const def of RESOURCE_DEFINITIONS) {\n aliasMap.set(def.singular, def);\n aliasMap.set(def.plural, def);\n for (const alias of def.aliases) {\n aliasMap.set(alias, def);\n }\n}\n\nexport function resolveResourceType(input: string): ResourceDefinition {\n const normalized = input.toLowerCase();\n const def = aliasMap.get(normalized);\n if (!def) {\n const valid = RESOURCE_DEFINITIONS.map(d => d.plural).join(\", \");\n throw new Error(`Unknown resource type \"${input}\". Valid types: ${valid}`);\n }\n return def;\n}\n\nexport function resolveResourceIdentifier(input: string): {\n type: ResourceDefinition;\n name?: string;\n} {\n // Handle \"server/labmaster\" or just \"servers\"\n const parts = input.split(\"/\");\n const type = resolveResourceType(parts[0]);\n const name = parts.length > 1 ? parts.slice(1).join(\"/\") : undefined;\n return { type, name };\n}\n```\n\nUpdate `bastion/src/cli/src/utils/resource.ts` to use the new registry.",
"testStrategy": "1. Unit test resolveResourceType with all aliases: server, servers, srv -> Server\n2. Test unknown resource type throws descriptive error\n3. Test case insensitivity: SERVER, Server, server all resolve correctly\n4. Test resolveResourceIdentifier parses \"server/labmaster\" correctly",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 77,
"title": "Implement 'labctl get' Command",
"description": "Create the core 'labctl get <resource> [name]' command that lists resources with filtering and output format support. This is the foundation of the kubectl-style CLI.",
"details": "Create `bastion/src/cli/src/commands/get.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType, type ResourceDefinition } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\nimport { formatOutput, type TableColumn } from \"../utils/table.js\";\n\nexport function registerGetCommand(program: Command): void {\n program\n .command(\"get <resource> [name]\")\n .description(\"List resources or get a specific resource by name\")\n .option(\"--status <status>\", \"Filter by status\")\n .option(\"--role <role>\", \"Filter by role (servers only)\")\n .option(\"--cloud <cloud>\", \"Filter by cloud\")\n .option(\"--env <environment>\", \"Filter by environment\")\n .option(\"-l, --label <label>\", \"Filter by label (key=value)\")\n .option(\"-A, --all-namespaces\", \"List across all clouds/environments\")\n .action(async (resource: string, name: string | undefined, opts) => {\n const config = program.opts()[\"_config\"];\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n let data: unknown[];\n \n if (name) {\n // Get specific resource - could be name, ID, or MAC\n const item = await client.getResource(resourceDef, name);\n data = item ? [item] : [];\n } else {\n // List with filters\n data = await client.listResources(resourceDef, {\n status: opts.status,\n role: opts.role,\n cloud: opts.allNamespaces ? undefined : (opts.cloud ?? config.defaultCloud),\n environment: opts.allNamespaces ? undefined : (opts.env ?? config.defaultEnvironment),\n label: opts.label,\n });\n }\n \n if (data.length === 0) {\n console.log(`No ${resourceDef.plural} found.`);\n return;\n }\n \n const columns = config.outputFormat === \"wide\" && resourceDef.wideColumns\n ? [...resourceDef.columns, ...resourceDef.wideColumns]\n : resourceDef.columns;\n \n formatOutput(data, config.outputFormat, columns);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nAdd to `index.ts`: `registerGetCommand(program);`\n\nExtend LabdClient with generic resource methods.",
"testStrategy": "1. Integration test: `labctl get servers` returns list from labd\n2. Test filtering: `labctl get servers --status discovered` only shows discovered\n3. Test name lookup: `labctl get server labmaster` returns single server\n4. Test MAC lookup: `labctl get server 38:05:25:33:e2:e4` resolves by MAC\n5. Test output formats: -o json, -o yaml, -o wide produce correct output\n6. Test unknown resource: `labctl get foo` shows helpful error",
"priority": "high",
"dependencies": [
76
],
"status": "pending",
"subtasks": []
},
{
"id": 78,
"title": "Implement 'labctl describe' Command",
"description": "Create the 'labctl describe <resource> <name>' command that shows detailed information about a resource including relationships, hardware info, and history.",
"details": "Create `bastion/src/cli/src/commands/describe.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nconst BOLD = \"\\x1b[1m\";\nconst DIM = \"\\x1b[2m\";\nconst RESET = \"\\x1b[0m\";\n\ninterface DescribeSection {\n title: string;\n fields: Array<[string, string | undefined]>;\n}\n\nfunction printDescribe(name: string, sections: DescribeSection[]): void {\n console.log(`${BOLD}Name:${RESET} ${name}`);\n for (const section of sections) {\n console.log(`\\n${BOLD}${section.title}:${RESET}`);\n for (const [key, value] of section.fields) {\n if (value !== undefined) {\n console.log(` ${DIM}${key}:${RESET} ${value}`);\n }\n }\n }\n}\n\nexport function registerDescribeCommand(program: Command): void {\n program\n .command(\"describe <resource> <name>\")\n .description(\"Show detailed information about a resource\")\n .action(async (resource: string, name: string) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n const item = await client.describeResource(resourceDef, name);\n if (!item) {\n console.error(`${resourceDef.singular} \"${name}\" not found.`);\n process.exit(1);\n }\n \n // Resource-specific formatting\n switch (resourceDef.kind) {\n case \"Server\":\n printServerDescription(item);\n break;\n case \"Cluster\":\n printClusterDescription(item);\n break;\n default:\n console.log(JSON.stringify(item, null, 2));\n }\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n\nfunction printServerDescription(server: any): void {\n const sections: DescribeSection[] = [\n {\n title: \"Metadata\",\n fields: [\n [\"ID\", server.id],\n [\"Cloud\", server.cloud],\n [\"Environment\", server.environment],\n [\"Role\", server.role],\n [\"Status\", server.status],\n [\"Created\", server.createdAt],\n [\"Last Seen\", server.lastHeartbeat],\n ],\n },\n {\n title: \"Hardware\",\n fields: [\n [\"MAC\", server.mac],\n [\"IP\", server.ip],\n [\"Architecture\", server.hardwareInfo?.arch],\n [\"CPU\", server.hardwareInfo?.cpu_model],\n [\"Cores\", String(server.hardwareInfo?.cpu_cores)],\n [\"Memory\", `${server.hardwareInfo?.memory_gb}GB`],\n [\"Product\", server.hardwareInfo?.product],\n ],\n },\n ];\n \n if (server.nics?.length > 0) {\n sections.push({\n title: \"Network Interfaces\",\n fields: server.nics.map((n: any) => [n.name, `${n.mac} ${n.ip ?? \"\"} (${n.state})`]),\n });\n }\n \n if (server.disks?.length > 0) {\n sections.push({\n title: \"Disks\",\n fields: server.disks.map((d: any) => [d.name, `${d.sizeGb}GB ${d.model ?? \"\"}`]),\n });\n }\n \n if (server.clusterMemberships?.length > 0) {\n sections.push({\n title: \"Cluster Membership\",\n fields: server.clusterMemberships.map((m: any) => [m.cluster.name, m.role]),\n });\n }\n \n printDescribe(server.hostname, sections);\n}\n```",
"testStrategy": "1. Integration test: `labctl describe server labmaster` shows full details\n2. Test hardware info display: CPU, memory, disks, NICs all shown\n3. Test cluster membership: server in cluster shows membership section\n4. Test not found: `labctl describe server nonexistent` shows helpful error\n5. Test different resource types: describe cluster, network, bastion",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 79,
"title": "Implement 'labctl create/delete' Commands",
"description": "Create the 'labctl create <resource>' and 'labctl delete <resource> <name>' commands for creating and removing resources like networks, clusters, and tokens.",
"details": "Create `bastion/src/cli/src/commands/create.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nexport function registerCreateCommand(program: Command): void {\n const create = program\n .command(\"create <resource>\")\n .description(\"Create a resource\");\n \n // labctl create network --name lab --cidr 192.168.8.0/24\n create\n .command(\"network\")\n .description(\"Create a network\")\n .requiredOption(\"--name <name>\", \"Network name\")\n .requiredOption(\"--cidr <cidr>\", \"Network CIDR (e.g., 192.168.8.0/24)\")\n .option(\"--gateway <gateway>\", \"Gateway IP\")\n .option(\"--vlan <vlan>\", \"VLAN ID\", parseInt)\n .option(\"--domain <domain>\", \"DNS domain\")\n .option(\"--dhcp\", \"Enable DHCP\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const network = await client.createNetwork({\n name: opts.name,\n cidr: opts.cidr,\n gateway: opts.gateway,\n vlan: opts.vlan,\n domain: opts.domain,\n dhcpEnabled: opts.dhcp ?? false,\n });\n console.log(`network/${network.name} created`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n \n // labctl create token --label \"worker enrollment\" --type reusable\n create\n .command(\"token\")\n .description(\"Create a join token\")\n .option(\"--label <label>\", \"Token label/description\")\n .option(\"--type <type>\", \"Token type: one-time or reusable\", \"one-time\")\n .option(\"--expires <duration>\", \"Expiration (e.g., 24h, 7d)\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const token = await client.createToken(opts);\n console.log(`Token created: ${token.token}`);\n if (opts.label) console.log(`Label: ${opts.label}`);\n if (token.expiresAt) console.log(`Expires: ${token.expiresAt}`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nCreate `bastion/src/cli/src/commands/delete.ts`:\n\n```typescript\nexport function registerDeleteCommand(program: Command): void {\n program\n .command(\"delete <resource> <name>\")\n .description(\"Delete a resource\")\n .option(\"--force\", \"Skip confirmation\")\n .action(async (resource: string, name: string, opts) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n if (!opts.force) {\n const { confirm } = await import(\"../utils/prompts.js\");\n const yes = await confirm(`Delete ${resourceDef.singular} \"${name}\"?`);\n if (!yes) {\n console.log(\"Cancelled.\");\n return;\n }\n }\n \n try {\n await client.deleteResource(resourceDef, name);\n console.log(`${resourceDef.singular}/${name} deleted`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```",
"testStrategy": "1. Integration test: `labctl create network` creates network in DB\n2. Test validation: missing required flags shows helpful error\n3. Test token creation: token returned is valid UUID, stored in DB\n4. Test delete with confirmation: prompts user, respects --force\n5. Test delete cascade: deleting server removes NICs, disks\n6. Test delete protection: cannot delete bastion with connected servers",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 80,
"title": "Refactor Provision Commands to kubectl-style",
"description": "Refactor existing provision commands to use kubectl-style syntax: 'labctl provision <server>' instead of 'labctl provision install <mac>'.",
"details": "The new command structure should be:\n- `labctl provision <server> --os fedora-43 --role worker` (queue install)\n- `labctl reprovision <server>` (reinstall)\n- `labctl forget <server>` (remove from tracking)\n\nModify `bastion/src/cli/src/commands/install.ts` → rename to `provision.ts`:\n\n```typescript\nexport function registerProvisionCommand(program: Command): void {\n program\n .command(\"provision <server>\")\n .description(\"Queue a server for OS installation\")\n .requiredOption(\"--os <os>\", \"Operating system\", \"fedora-43\")\n .requiredOption(\"--role <role>\", \"Server role\", \"worker\")\n .option(\"--disk <disk>\", \"Target disk (auto-detected if not specified)\")\n .option(\"--hostname <hostname>\", \"Override hostname\")\n .action(async (server: string, opts) => {\n const client = getLabdClient();\n \n // Resolve server: could be hostname, MAC, or ID\n const resolved = await client.resolveServer(server);\n if (!resolved) {\n console.error(`Server \"${server}\" not found.`);\n console.error(\"Tip: Use 'labctl get servers' to see available servers.\");\n process.exit(1);\n }\n \n if (resolved.status === \"installed\") {\n console.error(`Server \"${resolved.hostname}\" is already installed.`);\n console.error(\"Tip: Use 'labctl reprovision' to reinstall.\");\n process.exit(1);\n }\n \n try {\n await client.provisionServer(resolved.mac, {\n hostname: opts.hostname ?? resolved.hostname,\n os: opts.os,\n role: opts.role,\n disk: opts.disk,\n });\n console.log(`Server ${resolved.hostname} queued for ${opts.os} installation as ${opts.role}.`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nSimilarly update reprovision.ts and forget.ts to accept server name/MAC/ID.\n\nUpdate index.ts to register commands at top level instead of under 'provision' subcommand.",
"testStrategy": "1. Test server resolution: provision by hostname, MAC, or UUID all work\n2. Test already installed: provisioning installed server shows reprovision hint\n3. Test unknown server: helpful error message with tip\n4. Test reprovision: reinstalls installed server\n5. Test forget: removes server from all state categories\n6. Backward compat: verify 'labctl provision list' still works (deprecation warning)",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 81,
"title": "Implement Server and Resource API Endpoints in labd",
"description": "Add REST API endpoints in labd for full resource CRUD operations: networks, clusters, tokens. Extend servers endpoint with filters and relationship includes.",
"details": "Create/extend labd route files:\n\n1. **Extend servers.ts**:\n```typescript\n// GET /api/servers - with extended filters and includes\napp.get(\"/api/servers\", async (request, reply) => {\n const { status, role, cloud, environment, label, include } = request.query;\n \n const where = {};\n if (status) where.status = status;\n if (role) where.role = role;\n if (cloud) where.cloud = cloud;\n if (environment) where.environment = environment;\n if (label) where.labels = { path: [labelKey], equals: labelValue };\n \n const servers = await db.server.findMany({\n where,\n include: {\n nics: include?.includes(\"nics\"),\n disks: include?.includes(\"disks\"),\n clusterMemberships: include?.includes(\"clusters\") ? { include: { cluster: true } } : false,\n bastion: include?.includes(\"bastion\"),\n },\n });\n return servers;\n});\n\n// GET /api/servers/:id - by ID, hostname, or MAC\napp.get(\"/api/servers/:identifier\", async (request, reply) => {\n const { identifier } = request.params;\n \n // Try UUID first\n let server = await db.server.findUnique({ where: { id: identifier }, include: fullInclude });\n // Try hostname\n if (!server) server = await db.server.findUnique({ where: { hostname: identifier }, include: fullInclude });\n // Try MAC\n if (!server) server = await db.server.findUnique({ where: { mac: identifier.toLowerCase() }, include: fullInclude });\n \n if (!server) return reply.code(404).send({ error: \"Server not found\" });\n return server;\n});\n```\n\n2. **Create networks.ts**:\n```typescript\n// GET /api/networks, POST /api/networks, DELETE /api/networks/:id\nexport function registerNetworkRoutes(app: FastifyInstance, db: DbClient): void {\n app.get(\"/api/networks\", async () => db.network.findMany());\n \n app.post(\"/api/networks\", async (request, reply) => {\n const { name, cidr, gateway, vlan, domain, dhcpEnabled } = request.body;\n // Validate CIDR format\n const network = await db.network.create({ data: { name, cidr, gateway, vlan, domain, dhcpEnabled } });\n return reply.code(201).send(network);\n });\n \n app.delete(\"/api/networks/:id\", async (request, reply) => {\n await db.network.delete({ where: { id: request.params.id } });\n return reply.code(204).send();\n });\n}\n```\n\n3. **Create clusters.ts**:\n```typescript\n// Similar CRUD for clusters with member management\napp.get(\"/api/clusters/:id/members\", ...);\napp.post(\"/api/clusters/:id/members\", ...);\napp.delete(\"/api/clusters/:id/members/:serverId\", ...);\n```",
"testStrategy": "1. Integration test all CRUD endpoints with HTTP client\n2. Test server resolution: by id, hostname, and MAC all return same server\n3. Test include parameter: nics, disks, clusters included when requested\n4. Test validation: invalid CIDR rejected, duplicate names rejected\n5. Test cascade: delete network with NICs fails or cascades appropriately",
"priority": "medium",
"dependencies": [
72,
73
],
"status": "pending",
"subtasks": []
},
{
"id": 82,
"title": "Implement RBAC Permission Checks in CLI",
"description": "Wire RBAC permission checks into CLI commands. Check user permissions before executing operations using the existing Permission model.",
"details": "1. Create `bastion/src/cli/src/middleware/rbac.ts`:\n\n```typescript\nimport { getLabdClient } from \"../api/config.js\";\n\nexport interface PermissionContext {\n action: string; // read, exec, apply, destroy, manage, admin\n cloud?: string;\n environment?: string;\n server?: string;\n}\n\nexport async function checkPermission(ctx: PermissionContext): Promise<boolean> {\n const client = getLabdClient();\n try {\n const result = await client.checkPermission(ctx);\n return result.allowed;\n } catch {\n // If can't reach labd, fail open for local operations\n return true;\n }\n}\n\nexport async function requirePermission(ctx: PermissionContext): Promise<void> {\n const allowed = await checkPermission(ctx);\n if (!allowed) {\n throw new Error(\n `Permission denied: ${ctx.action} on ${ctx.server ?? \"*\"}@${ctx.cloud ?? \"*\"}/${ctx.environment ?? \"*\"}`\n );\n }\n}\n```\n\n2. Add labd endpoint `POST /api/auth/check-permission`:\n```typescript\napp.post(\"/api/auth/check-permission\", async (request, reply) => {\n const user = await authenticateRequest(request); // from cert or token\n const { action, cloud, environment, server } = request.body;\n \n const permissions = await db.permission.findMany({\n where: {\n role: { userBindings: { some: { userId: user.id } } },\n },\n });\n \n const allowed = permissions.some(p => \n matchesPattern(p.action, action) &&\n matchesPattern(p.cloud, cloud ?? \"*\") &&\n matchesPattern(p.environment, environment ?? \"*\") &&\n matchesPattern(p.server, server ?? \"*\")\n );\n \n return { allowed };\n});\n```\n\n3. Integrate into commands:\n```typescript\n// In provision command\nawait requirePermission({ action: \"apply\", cloud, environment, server: resolved.hostname });\n\n// In delete command\nawait requirePermission({ action: \"destroy\", cloud, environment, server: name });\n\n// In get command (filter results)\nconst servers = await client.listServers(filters);\nconst visible = await filterByPermission(servers, \"read\");\n```",
"testStrategy": "1. Unit test permission matching logic with wildcards\n2. Test admin role: has access to all resources\n3. Test operator role: can read/exec but not destroy\n4. Test viewer role: can only read, provision denied\n5. Test scope matching: permission for cloud=aws doesn't grant access to cloud=baremetal\n6. Test denied action is audit-logged",
"priority": "medium",
"dependencies": [
77,
81
],
"status": "pending",
"subtasks": []
},
{
"id": 83,
"title": "Implement Audit Logging for Resource Operations",
"description": "Log all resource mutations to the AuditLog table. Include user, action, resource type/name, result, and source IP.",
"details": "1. Create `bastion/src/labd/src/services/audit.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\n\nexport interface AuditEntry {\n userId?: string;\n serverId?: string;\n sessionId?: string;\n action: string; // create, update, delete, provision, exec, rbac-denied\n resourceType: string; // server, cluster, network, token, etc.\n resourceName: string;\n args?: string; // sanitized args (no secrets)\n result: \"success\" | \"denied\" | \"error\";\n durationMs?: number;\n sourceIp?: string;\n}\n\nexport class AuditService {\n constructor(private readonly db: PrismaClient) {}\n \n async log(entry: AuditEntry): Promise<void> {\n await this.db.auditLog.create({\n data: {\n userId: entry.userId,\n serverId: entry.serverId,\n sessionId: entry.sessionId,\n action: entry.action,\n resourceType: entry.resourceType,\n resourceName: entry.resourceName,\n args: entry.args,\n result: entry.result,\n durationMs: entry.durationMs,\n sourceIp: entry.sourceIp,\n },\n });\n }\n \n async query(filters: {\n userId?: string;\n action?: string;\n resourceType?: string;\n since?: Date;\n limit?: number;\n }): Promise<AuditEntry[]> {\n return this.db.auditLog.findMany({\n where: {\n userId: filters.userId,\n action: filters.action,\n resourceType: filters.resourceType,\n timestamp: filters.since ? { gte: filters.since } : undefined,\n },\n orderBy: { timestamp: \"desc\" },\n take: filters.limit ?? 100,\n });\n }\n}\n```\n\n2. Add Fastify hook to wrap route handlers:\n```typescript\napp.addHook(\"onResponse\", async (request, reply) => {\n // Log mutations (POST, PUT, DELETE)\n if ([\"POST\", \"PUT\", \"DELETE\"].includes(request.method)) {\n const path = request.url;\n const resourceMatch = path.match(/\\/api\\/(\\w+)(?:\\/([^/]+))?/);\n if (resourceMatch) {\n await auditService.log({\n action: methodToAction(request.method),\n resourceType: resourceMatch[1],\n resourceName: resourceMatch[2] ?? \"\",\n result: reply.statusCode < 400 ? \"success\" : \"error\",\n sourceIp: request.ip,\n });\n }\n }\n});\n```\n\n3. Add `labctl get audit` command to view audit logs.",
"testStrategy": "1. Integration test: create network, verify audit log entry created\n2. Test RBAC denial is logged with result=denied\n3. Test sensitive data sanitization: tokens/passwords not in args\n4. Test query filters: by user, action, resourceType, time range\n5. Test `labctl get audit` displays recent entries correctly",
"priority": "medium",
"dependencies": [
81,
82
],
"status": "pending",
"subtasks": []
},
{
"id": 84,
"title": "Update CLI Entry Point and Help Text",
"description": "Update the CLI entry point to register all new commands and update help text to reflect the kubectl-style interface. Add deprecation warnings for old command structure.",
"details": "Update `bastion/src/cli/src/index.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { APP_VERSION } from \"@lab/shared\";\nimport { loadConfig } from \"./config/index.js\";\n\n// New kubectl-style commands\nimport { registerGetCommand } from \"./commands/get.js\";\nimport { registerDescribeCommand } from \"./commands/describe.js\";\nimport { registerCreateCommand } from \"./commands/create.js\";\nimport { registerDeleteCommand } from \"./commands/delete.js\";\nimport { registerApplyCommand } from \"./commands/apply.js\";\nimport { registerEditCommand } from \"./commands/edit.js\";\n\n// Action commands\nimport { registerProvisionCommand } from \"./commands/provision.js\";\nimport { registerReprovisionCommand } from \"./commands/reprovision.js\";\nimport { registerForgetCommand } from \"./commands/forget.js\";\n\n// Bastion management\nimport { registerBastionCommand } from \"./commands/bastion.js\"; // start/stop/status\n\n// App management (unchanged)\nimport { registerAppCommand } from \"./commands/app.js\";\n\n// Utility\nimport { registerConfigCommand } from \"./commands/config.js\";\nimport { registerLoginCommand } from \"./commands/login.js\";\nimport { registerDoctorCommand } from \"./commands/doctor.js\";\n\nexport function createProgram(): Command {\n const program = new Command();\n \n program\n .name(\"labctl\")\n .description(\"Lab infrastructure management CLI\")\n .version(APP_VERSION);\n \n // Global options\n program\n .option(\"-o, --output <format>\", \"output format (table, json, yaml, wide)\", \"table\")\n .option(\"--server <url>\", \"override labd server URL\")\n .option(\"--env <name>\", \"override default environment\")\n .option(\"--cloud <name>\", \"override default cloud\")\n .option(\"--debug\", \"enable debug output\")\n .option(\"--no-color\", \"disable colored output\");\n \n // Core CRUD commands\n registerGetCommand(program); // labctl get <resource> [name]\n registerDescribeCommand(program); // labctl describe <resource> <name>\n registerCreateCommand(program); // labctl create <resource>\n registerDeleteCommand(program); // labctl delete <resource> <name>\n registerApplyCommand(program); // labctl apply -f <file>\n registerEditCommand(program); // labctl edit <resource> <name>\n \n // Provisioning actions\n registerProvisionCommand(program); // labctl provision <server>\n registerReprovisionCommand(program);// labctl reprovision <server>\n registerForgetCommand(program); // labctl forget <server>\n \n // Bastion management\n registerBastionCommand(program); // labctl bastion start|stop|status\n \n // App management\n registerAppCommand(program); // labctl app install|health k3s\n \n // Utility\n registerConfigCommand(program);\n registerLoginCommand(program);\n registerDoctorCommand(program);\n \n // Legacy compatibility with deprecation warnings\n registerLegacyCommands(program);\n \n return program;\n}\n\nfunction registerLegacyCommands(program: Command): void {\n // labctl provision list -> labctl get servers (with warning)\n program\n .command(\"provision\")\n .command(\"list\")\n .action(() => {\n console.warn(\"DEPRECATED: Use 'labctl get servers' instead.\");\n // Delegate to get servers\n });\n}\n```\n\nUpdate shell completions in `scripts/generate-completions.ts` for new command structure.",
"testStrategy": "1. Test --help shows all new commands with descriptions\n2. Test resource type help: `labctl get --help` lists valid resources\n3. Test deprecated commands show warning but still work\n4. Test shell completions generated for new commands\n5. Test global options: -o, --server, --env, --cloud all work",
"priority": "low",
"dependencies": [
77,
78,
79,
80
],
"status": "pending",
"subtasks": []
}
],
"metadata": {
"created": "2026-03-26T04:26:49.813Z",
"updated": "2026-03-26T04:26:49.813Z",
"description": "Tasks for master context"
}
}
}

View File

@@ -0,0 +1,47 @@
<context>
# Overview
[Provide a high-level overview of your product here. Explain what problem it solves, who it's for, and why it's valuable.]
# Core Features
[List and describe the main features of your product. For each feature, include:
- What it does
- Why it's important
- How it works at a high level]
# User Experience
[Describe the user journey and experience. Include:
- User personas
- Key user flows
- UI/UX considerations]
</context>
<PRD>
# Technical Architecture
[Outline the technical implementation details:
- System components
- Data models
- APIs and integrations
- Infrastructure requirements]
# Development Roadmap
[Break down the development process into phases:
- MVP requirements
- Future enhancements
- Do not think about timelines whatsoever -- all that matters is scope and detailing exactly what needs to be build in each phase so it can later be cut up into tasks]
# Logical Dependency Chain
[Define the logical order of development:
- Which features need to be built first (foundation)
- Getting as quickly as possible to something usable/visible front end that works
- Properly pacing and scoping each feature so it is atomic but can also be built upon and improved as development approaches]
# Risks and Mitigations
[Identify potential risks and how they'll be addressed:
- Technical challenges
- Figuring out the MVP that we can build upon
- Resource constraints]
# Appendix
[Include any additional information:
- Research findings
- Technical specifications]
</PRD>

View File

@@ -0,0 +1,511 @@
<rpg-method>
# Repository Planning Graph (RPG) Method - PRD Template
This template teaches you (AI or human) how to create structured, dependency-aware PRDs using the RPG methodology from Microsoft Research. The key insight: separate WHAT (functional) from HOW (structural), then connect them with explicit dependencies.
## Core Principles
1. **Dual-Semantics**: Think functional (capabilities) AND structural (code organization) separately, then map them
2. **Explicit Dependencies**: Never assume - always state what depends on what
3. **Topological Order**: Build foundation first, then layers on top
4. **Progressive Refinement**: Start broad, refine iteratively
## How to Use This Template
- Follow the instructions in each `<instruction>` block
- Look at `<example>` blocks to see good vs bad patterns
- Fill in the content sections with your project details
- The AI reading this will learn the RPG method by following along
- Task Master will parse the resulting PRD into dependency-aware tasks
## Recommended Tools for Creating PRDs
When using this template to **create** a PRD (not parse it), use **code-context-aware AI assistants** for best results:
**Why?** The AI needs to understand your existing codebase to make good architectural decisions about modules, dependencies, and integration points.
**Recommended tools:**
- **Claude Code** (claude-code CLI) - Best for structured reasoning and large contexts
- **Cursor/Windsurf** - IDE integration with full codebase context
- **Gemini CLI** (gemini-cli) - Massive context window for large codebases
- **Codex/Grok CLI** - Strong code generation with context awareness
**Note:** Once your PRD is created, `task-master parse-prd` works with any configured AI model - it just needs to read the PRD text itself, not your codebase.
</rpg-method>
---
<overview>
<instruction>
Start with the problem, not the solution. Be specific about:
- What pain point exists?
- Who experiences it?
- Why existing solutions don't work?
- What success looks like (measurable outcomes)?
Keep this section focused - don't jump into implementation details yet.
</instruction>
## Problem Statement
[Describe the core problem. Be concrete about user pain points.]
## Target Users
[Define personas, their workflows, and what they're trying to achieve.]
## Success Metrics
[Quantifiable outcomes. Examples: "80% task completion via autopilot", "< 5% manual intervention rate"]
</overview>
---
<functional-decomposition>
<instruction>
Now think about CAPABILITIES (what the system DOES), not code structure yet.
Step 1: Identify high-level capability domains
- Think: "What major things does this system do?"
- Examples: Data Management, Core Processing, Presentation Layer
Step 2: For each capability, enumerate specific features
- Use explore-exploit strategy:
* Exploit: What features are REQUIRED for core value?
* Explore: What features make this domain COMPLETE?
Step 3: For each feature, define:
- Description: What it does in one sentence
- Inputs: What data/context it needs
- Outputs: What it produces/returns
- Behavior: Key logic or transformations
<example type="good">
Capability: Data Validation
Feature: Schema validation
- Description: Validate JSON payloads against defined schemas
- Inputs: JSON object, schema definition
- Outputs: Validation result (pass/fail) + error details
- Behavior: Iterate fields, check types, enforce constraints
Feature: Business rule validation
- Description: Apply domain-specific validation rules
- Inputs: Validated data object, rule set
- Outputs: Boolean + list of violated rules
- Behavior: Execute rules sequentially, short-circuit on failure
</example>
<example type="bad">
Capability: validation.js
(Problem: This is a FILE, not a CAPABILITY. Mixing structure into functional thinking.)
Capability: Validation
Feature: Make sure data is good
(Problem: Too vague. No inputs/outputs. Not actionable.)
</example>
</instruction>
## Capability Tree
### Capability: [Name]
[Brief description of what this capability domain covers]
#### Feature: [Name]
- **Description**: [One sentence]
- **Inputs**: [What it needs]
- **Outputs**: [What it produces]
- **Behavior**: [Key logic]
#### Feature: [Name]
- **Description**:
- **Inputs**:
- **Outputs**:
- **Behavior**:
### Capability: [Name]
...
</functional-decomposition>
---
<structural-decomposition>
<instruction>
NOW think about code organization. Map capabilities to actual file/folder structure.
Rules:
1. Each capability maps to a module (folder or file)
2. Features within a capability map to functions/classes
3. Use clear module boundaries - each module has ONE responsibility
4. Define what each module exports (public interface)
The goal: Create a clear mapping between "what it does" (functional) and "where it lives" (structural).
<example type="good">
Capability: Data Validation
→ Maps to: src/validation/
├── schema-validator.js (Schema validation feature)
├── rule-validator.js (Business rule validation feature)
└── index.js (Public exports)
Exports:
- validateSchema(data, schema)
- validateRules(data, rules)
</example>
<example type="bad">
Capability: Data Validation
→ Maps to: src/utils.js
(Problem: "utils" is not a clear module boundary. Where do I find validation logic?)
Capability: Data Validation
→ Maps to: src/validation/everything.js
(Problem: One giant file. Features should map to separate files for maintainability.)
</example>
</instruction>
## Repository Structure
```
project-root/
├── src/
│ ├── [module-name]/ # Maps to: [Capability Name]
│ │ ├── [file].js # Maps to: [Feature Name]
│ │ └── index.js # Public exports
│ └── [module-name]/
├── tests/
└── docs/
```
## Module Definitions
### Module: [Name]
- **Maps to capability**: [Capability from functional decomposition]
- **Responsibility**: [Single clear purpose]
- **File structure**:
```
module-name/
├── feature1.js
├── feature2.js
└── index.js
```
- **Exports**:
- `functionName()` - [what it does]
- `ClassName` - [what it does]
</structural-decomposition>
---
<dependency-graph>
<instruction>
This is THE CRITICAL SECTION for Task Master parsing.
Define explicit dependencies between modules. This creates the topological order for task execution.
Rules:
1. List modules in dependency order (foundation first)
2. For each module, state what it depends on
3. Foundation modules should have NO dependencies
4. Every non-foundation module should depend on at least one other module
5. Think: "What must EXIST before I can build this module?"
<example type="good">
Foundation Layer (no dependencies):
- error-handling: No dependencies
- config-manager: No dependencies
- base-types: No dependencies
Data Layer:
- schema-validator: Depends on [base-types, error-handling]
- data-ingestion: Depends on [schema-validator, config-manager]
Core Layer:
- algorithm-engine: Depends on [base-types, error-handling]
- pipeline-orchestrator: Depends on [algorithm-engine, data-ingestion]
</example>
<example type="bad">
- validation: Depends on API
- API: Depends on validation
(Problem: Circular dependency. This will cause build/runtime issues.)
- user-auth: Depends on everything
(Problem: Too many dependencies. Should be more focused.)
</example>
</instruction>
## Dependency Chain
### Foundation Layer (Phase 0)
No dependencies - these are built first.
- **[Module Name]**: [What it provides]
- **[Module Name]**: [What it provides]
### [Layer Name] (Phase 1)
- **[Module Name]**: Depends on [[module-from-phase-0], [module-from-phase-0]]
- **[Module Name]**: Depends on [[module-from-phase-0]]
### [Layer Name] (Phase 2)
- **[Module Name]**: Depends on [[module-from-phase-1], [module-from-foundation]]
[Continue building up layers...]
</dependency-graph>
---
<implementation-roadmap>
<instruction>
Turn the dependency graph into concrete development phases.
Each phase should:
1. Have clear entry criteria (what must exist before starting)
2. Contain tasks that can be parallelized (no inter-dependencies within phase)
3. Have clear exit criteria (how do we know phase is complete?)
4. Build toward something USABLE (not just infrastructure)
Phase ordering follows topological sort of dependency graph.
<example type="good">
Phase 0: Foundation
Entry: Clean repository
Tasks:
- Implement error handling utilities
- Create base type definitions
- Setup configuration system
Exit: Other modules can import foundation without errors
Phase 1: Data Layer
Entry: Phase 0 complete
Tasks:
- Implement schema validator (uses: base types, error handling)
- Build data ingestion pipeline (uses: validator, config)
Exit: End-to-end data flow from input to validated output
</example>
<example type="bad">
Phase 1: Build Everything
Tasks:
- API
- Database
- UI
- Tests
(Problem: No clear focus. Too broad. Dependencies not considered.)
</example>
</instruction>
## Development Phases
### Phase 0: [Foundation Name]
**Goal**: [What foundational capability this establishes]
**Entry Criteria**: [What must be true before starting]
**Tasks**:
- [ ] [Task name] (depends on: [none or list])
- Acceptance criteria: [How we know it's done]
- Test strategy: [What tests prove it works]
- [ ] [Task name] (depends on: [none or list])
**Exit Criteria**: [Observable outcome that proves phase complete]
**Delivers**: [What can users/developers do after this phase?]
---
### Phase 1: [Layer Name]
**Goal**:
**Entry Criteria**: Phase 0 complete
**Tasks**:
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
**Exit Criteria**:
**Delivers**:
---
[Continue with more phases...]
</implementation-roadmap>
---
<test-strategy>
<instruction>
Define how testing will be integrated throughout development (TDD approach).
Specify:
1. Test pyramid ratios (unit vs integration vs e2e)
2. Coverage requirements
3. Critical test scenarios
4. Test generation guidelines for Surgical Test Generator
This section guides the AI when generating tests during the RED phase of TDD.
<example type="good">
Critical Test Scenarios for Data Validation module:
- Happy path: Valid data passes all checks
- Edge cases: Empty strings, null values, boundary numbers
- Error cases: Invalid types, missing required fields
- Integration: Validator works with ingestion pipeline
</example>
</instruction>
## Test Pyramid
```
/\
/E2E\ ← [X]% (End-to-end, slow, comprehensive)
/------\
/Integration\ ← [Y]% (Module interactions)
/------------\
/ Unit Tests \ ← [Z]% (Fast, isolated, deterministic)
/----------------\
```
## Coverage Requirements
- Line coverage: [X]% minimum
- Branch coverage: [X]% minimum
- Function coverage: [X]% minimum
- Statement coverage: [X]% minimum
## Critical Test Scenarios
### [Module/Feature Name]
**Happy path**:
- [Scenario description]
- Expected: [What should happen]
**Edge cases**:
- [Scenario description]
- Expected: [What should happen]
**Error cases**:
- [Scenario description]
- Expected: [How system handles failure]
**Integration points**:
- [What interactions to test]
- Expected: [End-to-end behavior]
## Test Generation Guidelines
[Specific instructions for Surgical Test Generator about what to focus on, what patterns to follow, project-specific test conventions]
</test-strategy>
---
<architecture>
<instruction>
Describe technical architecture, data models, and key design decisions.
Keep this section AFTER functional/structural decomposition - implementation details come after understanding structure.
</instruction>
## System Components
[Major architectural pieces and their responsibilities]
## Data Models
[Core data structures, schemas, database design]
## Technology Stack
[Languages, frameworks, key libraries]
**Decision: [Technology/Pattern]**
- **Rationale**: [Why chosen]
- **Trade-offs**: [What we're giving up]
- **Alternatives considered**: [What else we looked at]
</architecture>
---
<risks>
<instruction>
Identify risks that could derail development and how to mitigate them.
Categories:
- Technical risks (complexity, unknowns)
- Dependency risks (blocking issues)
- Scope risks (creep, underestimation)
</instruction>
## Technical Risks
**Risk**: [Description]
- **Impact**: [High/Medium/Low - effect on project]
- **Likelihood**: [High/Medium/Low]
- **Mitigation**: [How to address]
- **Fallback**: [Plan B if mitigation fails]
## Dependency Risks
[External dependencies, blocking issues]
## Scope Risks
[Scope creep, underestimation, unclear requirements]
</risks>
---
<appendix>
## References
[Papers, documentation, similar systems]
## Glossary
[Domain-specific terms]
## Open Questions
[Things to resolve during development]
</appendix>
---
<task-master-integration>
# How Task Master Uses This PRD
When you run `task-master parse-prd <file>.txt`, the parser:
1. **Extracts capabilities** → Main tasks
- Each `### Capability:` becomes a top-level task
2. **Extracts features** → Subtasks
- Each `#### Feature:` becomes a subtask under its capability
3. **Parses dependencies** → Task dependencies
- `Depends on: [X, Y]` sets task.dependencies = ["X", "Y"]
4. **Orders by phases** → Task priorities
- Phase 0 tasks = highest priority
- Phase N tasks = lower priority, properly sequenced
5. **Uses test strategy** → Test generation context
- Feeds test scenarios to Surgical Test Generator during implementation
**Result**: A dependency-aware task graph that can be executed in topological order.
## Why RPG Structure Matters
Traditional flat PRDs lead to:
- ❌ Unclear task dependencies
- ❌ Arbitrary task ordering
- ❌ Circular dependencies discovered late
- ❌ Poorly scoped tasks
RPG-structured PRDs provide:
- ✅ Explicit dependency chains
- ✅ Topological execution order
- ✅ Clear module boundaries
- ✅ Validated task graph before implementation
## Tips for Best Results
1. **Spend time on dependency graph** - This is the most valuable section for Task Master
2. **Keep features atomic** - Each feature should be independently testable
3. **Progressive refinement** - Start broad, use `task-master expand` to break down complex tasks
4. **Use research mode** - `task-master parse-prd --research` leverages AI for better task generation
</task-master-integration>