feat: install logging, error trapping, PXE/ISO integration tests
Some checks failed
CI/CD / lint (pull_request) Failing after 13s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 36s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Kickstart installs on real hardware failed silently: no error reporting, only 3 progress callbacks, zero log streaming. This overhaul makes every install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log - receives raw log lines from kickstart (single or batch)
- InstallLogBuffer - per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac - now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow - uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
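
Reviewer sketch (not part of the commit): the commit message describes InstallLogBuffer as a per-MAC ring buffer capped at 2000 lines whose contents are returned as log_lines and log_total. The code below is one minimal way such a buffer could behave. The class name `LineRingBuffer`, the helper `bufferFor`, and the reading of `log_total` as "every line ever received, including evicted ones" are assumptions for illustration and may not match the real implementation; file persistence is omitted.

```typescript
// Hypothetical sketch: names and semantics here are illustrative assumptions,
// not taken from the bastion source.
class LineRingBuffer {
  private readonly lines: string[] = [];
  private total = 0; // lines ever appended, including ones already evicted
  private readonly capacity: number;

  constructor(capacity: number = 2000) {
    this.capacity = capacity;
  }

  // Accepts a single line or a batch, mirroring "single or batch" in the message.
  append(batch: string | string[]): void {
    const incoming = Array.isArray(batch) ? batch : [batch];
    for (const line of incoming) {
      this.lines.push(line);
      this.total += 1;
    }
    // Drop the oldest lines once the buffer exceeds its capacity.
    const overflow = this.lines.length - this.capacity;
    if (overflow > 0) this.lines.splice(0, overflow);
  }

  snapshot(): { log_lines: string[]; log_total: number } {
    return { log_lines: [...this.lines], log_total: this.total };
  }
}

// One buffer per machine, keyed by the lowercased MAC address.
const buffers = new Map<string, LineRingBuffer>();

function bufferFor(mac: string, capacity: number = 2000): LineRingBuffer {
  const key = mac.toLowerCase();
  let buf = buffers.get(key);
  if (!buf) {
    buf = new LineRingBuffer(capacity);
    buffers.set(key, buf);
  }
  return buf;
}
```

Under these assumptions, a POST /api/log handler would call `bufferFor(mac).append(lines)` and GET /api/logs/:mac would return `bufferFor(mac).snapshot()` next to the stage list.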
@@ -1,22 +1,21 @@
 {
 "models": {
 "main": {
-"provider": "anthropic",
-"modelId": "claude-sonnet-4-20250514",
-"maxTokens": 64000,
+"provider": "claude-code",
+"modelId": "opus",
+"maxTokens": 32000,
 "temperature": 0.2
 },
 "research": {
-"provider": "anthropic",
-"modelId": "claude-sonnet-4-20250514",
-"maxTokens": 64000,
+"provider": "claude-code",
+"modelId": "opus",
+"maxTokens": 32000,
 "temperature": 0.2
 },
-"resolution": "main",
 "fallback": {
-"provider": "anthropic",
-"modelId": "claude-3-7-sonnet-20250219",
-"maxTokens": 120000,
+"provider": "claude-code",
+"modelId": "sonnet",
+"maxTokens": 64000,
 "temperature": 0.2
 }
 },
6
.taskmaster/state.json
Normal file
@@ -0,0 +1,6 @@
{
"currentTag": "master",
"lastSwitched": "2026-03-18T00:17:54.213Z",
"branchTagMapping": {},
"migrationNoticeShown": true
}
180
.taskmaster/tasks/tasks.json
Normal file
@@ -0,0 +1,180 @@
{
"master": {
"tasks": [
{
"id": 72,
"title": "Expand Prisma Schema with Resource Relationships",
"description": "Add Network, ServerNic, ServerDisk, and ClusterMember models to the Prisma schema. Add bastionId foreign key to Server model to track which bastion owns each server.",
"details": "Edit `bastion/src/labd/prisma/schema.prisma` to add:\n\n1. **Server model changes**:\n - Add `bastionId String?` with relation to Bastion\n - Add `hardwareInfo Json?` for storing raw HardwareInfo\n - Add `os String?` for installed OS\n\n2. **Network model**:\n```prisma\nmodel Network {\n id String @id @default(uuid())\n name String @unique\n cidr String\n vlan Int?\n gateway String?\n domain String?\n dhcpEnabled Boolean @default(false)\n createdAt DateTime @default(now())\n updatedAt DateTime @updatedAt\n \n nics ServerNic[]\n}\n```\n\n3. **ServerNic model**:\n```prisma\nmodel ServerNic {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n networkId String?\n network Network? @relation(fields: [networkId], references: [id])\n mac String\n ip String?\n name String\n state String @default(\"DOWN\")\n \n @@unique([serverId, mac])\n @@index([networkId])\n}\n```\n\n4. **ServerDisk model**:\n```prisma\nmodel ServerDisk {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n name String\n sizeGb Float\n model String?\n \n @@unique([serverId, name])\n}\n```\n\n5. **ClusterMember model**:\n```prisma\nmodel ClusterMember {\n id String @id @default(uuid())\n clusterId String\n cluster Cluster @relation(fields: [clusterId], references: [id], onDelete: Cascade)\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n role String @default(\"worker\") // control-plane, worker\n joinedAt DateTime @default(now())\n \n @@unique([clusterId, serverId])\n @@index([clusterId])\n @@index([serverId])\n}\n```\n\n6. Update Server model with relations to nics, disks, clusterMemberships, and bastion.\n\nRun `pnpm prisma generate` and `pnpm prisma migrate dev --name add-resource-models`.",
"testStrategy": "1. Run `pnpm prisma validate` to verify schema syntax\n2. Run `pnpm prisma generate` to confirm client generation\n3. Create migration and verify it applies cleanly to local CockroachDB\n4. Write unit tests that create/read/delete each new model\n5. Verify cascade deletes work (deleting Server removes its NICs and Disks)",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 73,
"title": "Implement State Persistence Service in labd",
"description": "Create a new service in labd that persists bastion state syncs to the Server table in CockroachDB. When bastion-state-sync messages arrive, upsert machines into Server with their hardware info, status, and ownership.",
"details": "Create `bastion/src/labd/src/services/state-persistence.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\nimport type { BastionState, HardwareInfo, InstallConfig, InstalledInfo } from \"@lab/shared\";\nimport { logger } from \"./logger.js\";\n\nexport class StatePersistence {\n constructor(private readonly db: PrismaClient) {}\n\n async syncBastionState(bastionId: string, state: BastionState): Promise<void> {\n // Process discovered machines\n for (const [mac, hw] of Object.entries(state.discovered)) {\n await this.upsertDiscoveredServer(bastionId, mac, hw);\n }\n \n // Process queued machines (update status to provisioning)\n for (const [mac, cfg] of Object.entries(state.install_queue)) {\n await this.upsertQueuedServer(bastionId, mac, cfg);\n }\n \n // Process installed machines\n for (const [mac, info] of Object.entries(state.installed)) {\n await this.upsertInstalledServer(bastionId, mac, info);\n }\n }\n\n private async upsertDiscoveredServer(bastionId: string, mac: string, hw: HardwareInfo): Promise<void> {\n const normalized = mac.toLowerCase();\n \n await this.db.server.upsert({\n where: { mac: normalized },\n create: {\n hostname: `unknown-${normalized.replace(/:/g, \"\").slice(-6)}`,\n mac: normalized,\n bastionId,\n status: \"discovered\",\n hardwareInfo: hw as any,\n labels: {\n arch: hw.arch,\n cpu_model: hw.cpu_model,\n cpu_cores: hw.cpu_cores,\n memory_gb: hw.memory_gb,\n },\n },\n update: {\n bastionId,\n status: \"discovered\", // only if not already provisioning/installed\n hardwareInfo: hw as any,\n },\n });\n \n // Sync NICs and Disks\n await this.syncServerHardware(normalized, hw);\n }\n \n private async syncServerHardware(mac: string, hw: HardwareInfo): Promise<void> {\n const server = await this.db.server.findUnique({ where: { mac } });\n if (!server) return;\n \n // Upsert NICs\n for (const nic of hw.nics) {\n await this.db.serverNic.upsert({\n where: { serverId_mac: { serverId: server.id, mac: 
nic.mac.toLowerCase() } },\n create: { serverId: server.id, mac: nic.mac.toLowerCase(), name: nic.name, state: nic.state },\n update: { name: nic.name, state: nic.state },\n });\n }\n \n // Upsert Disks\n for (const disk of hw.disks) {\n await this.db.serverDisk.upsert({\n where: { serverId_name: { serverId: server.id, name: disk.name } },\n create: { serverId: server.id, name: disk.name, sizeGb: disk.size_gb, model: disk.model },\n update: { sizeGb: disk.size_gb, model: disk.model },\n });\n }\n }\n \n // Similar methods for upsertQueuedServer and upsertInstalledServer...\n}\n```\n\nIntegrate into `server.ts` WebSocket handler by calling `statePersistence.syncBastionState()` when `bastion-state-sync` messages arrive.",
"testStrategy": "1. Unit test StatePersistence with mocked PrismaClient\n2. Integration test: simulate bastion-state-sync message, verify Server rows created\n3. Test idempotency: send same state twice, verify no duplicates\n4. Test status transitions: discovered -> provisioning -> installed\n5. Verify hardware info (NICs, Disks) is correctly persisted",
"priority": "high",
"dependencies": [
72
],
"status": "pending",
"subtasks": []
},
{
"id": 74,
"title": "Add State Loading from labd on Bastion Startup",
"description": "Modify bastion startup to request its persisted state from labd before using the local JSON cache. This ensures bastions restore their state after pod restarts.",
"details": "1. Add new labd API endpoint `GET /api/bastions/:id/state` that returns the aggregated state for a specific bastion from the Server table:\n\n```typescript\n// bastion/src/labd/src/routes/bastions.ts\napp.get<{ Params: { id: string } }>(\"/api/bastions/:id/state\", async (request, reply) => {\n const { id } = request.params;\n \n const servers = await db.server.findMany({\n where: { bastionId: id },\n include: { nics: true, disks: true },\n });\n \n // Transform back to BastionState format\n const state: BastionState = { discovered: {}, install_queue: {}, installed: {} };\n for (const server of servers) {\n const mac = server.mac;\n if (!mac) continue;\n \n switch (server.status) {\n case \"discovered\":\n state.discovered[mac] = transformToHardwareInfo(server);\n break;\n case \"provisioning\":\n state.install_queue[mac] = transformToInstallConfig(server);\n break;\n case \"installed\":\n state.installed[mac] = transformToInstalledInfo(server);\n break;\n }\n }\n \n return reply.send(state);\n});\n```\n\n2. Modify `BastionConnection.connect()` in `labd-connection.ts` to fetch state after enrollment:\n\n```typescript\nprivate async loadRemoteState(): Promise<BastionState | null> {\n if (!this.bastionId || !this.config.labdUrl) return null;\n try {\n const resp = await fetch(`${this.config.labdUrl}/api/bastions/${this.bastionId}/state`);\n if (resp.ok) return await resp.json();\n } catch { /* fall back to local */ }\n return null;\n}\n```\n\n3. In bastion `main.ts`, after establishing labd connection, merge remote state with local state (remote takes precedence for installed machines, local wins for in-progress installs).",
"testStrategy": "1. Integration test: start bastion, let it persist state, restart bastion, verify state restored\n2. Test merge logic: local has in-progress install, remote has discovered - verify install preserved\n3. Test offline mode: labd unavailable, bastion falls back to local JSON\n4. Test fresh start: no local state, no remote state - bastion starts with empty state",
"priority": "high",
"dependencies": [
73
],
"status": "pending",
"subtasks": []
},
{
"id": 75,
"title": "Fix Bastion --dir Environment Variable Default",
"description": "Fix the bug where CLI's --dir default overrides the BASTION_DIR environment variable. The CLI option should use the env var as its default.",
"details": "Edit `bastion/src/cli/src/commands/serve.ts`:\n\n```typescript\n// Before (line 14):\n.option(\"--dir <dir>\", \"Bastion data directory\", \"/tmp/lab-bastion\")\n\n// After:\n.option(\n \"--dir <dir>\",\n \"Bastion data directory\",\n process.env[\"BASTION_DIR\"] ?? \"/tmp/lab-bastion\"\n)\n```\n\nThis ensures:\n1. If `BASTION_DIR` env var is set (e.g., in k8s deployment), it's used as default\n2. Explicit `--dir` flag still overrides both\n3. Falls back to `/tmp/lab-bastion` if neither is set\n\nAlso update the k8s deployment manifest `bastion/deploy/k3s/deployment.yaml` to ensure `BASTION_DIR=/data` is properly set.",
"testStrategy": "1. Unit test: verify option default reads from process.env\n2. Integration test: set BASTION_DIR, run labctl without --dir, verify correct dir used\n3. Integration test: set BASTION_DIR, run labctl with --dir /custom, verify /custom used\n4. Test no env var: verify default /tmp/lab-bastion used",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 76,
"title": "Create Resource Type Registry with Aliases",
"description": "Create a centralized resource type registry that maps resource names, plurals, and short aliases to canonical types. This enables kubectl-style resource resolution.",
"details": "Create `bastion/src/cli/src/utils/resources.ts`:\n\n```typescript\nexport interface ResourceDefinition {\n kind: string; // Canonical type: \"Server\", \"Cluster\", etc.\n singular: string; // \"server\"\n plural: string; // \"servers\"\n aliases: string[]; // [\"srv\"]\n apiPath: string; // \"/api/servers\"\n columns: TableColumn[]; // Default columns for 'get' output\n wideColumns?: TableColumn[]; // Extra columns for -o wide\n}\n\nconst RESOURCE_DEFINITIONS: ResourceDefinition[] = [\n {\n kind: \"Server\",\n singular: \"server\",\n plural: \"servers\",\n aliases: [\"srv\"],\n apiPath: \"/api/servers\",\n columns: serverColumns,\n wideColumns: serverWideColumns,\n },\n {\n kind: \"Cluster\",\n singular: \"cluster\",\n plural: \"clusters\",\n aliases: [],\n apiPath: \"/api/clusters\",\n columns: clusterColumns,\n },\n {\n kind: \"Network\",\n singular: \"network\",\n plural: \"networks\",\n aliases: [\"net\"],\n apiPath: \"/api/networks\",\n columns: networkColumns,\n },\n // ... bastion, role, user, token, audit\n];\n\nconst aliasMap = new Map<string, ResourceDefinition>();\nfor (const def of RESOURCE_DEFINITIONS) {\n aliasMap.set(def.singular, def);\n aliasMap.set(def.plural, def);\n for (const alias of def.aliases) {\n aliasMap.set(alias, def);\n }\n}\n\nexport function resolveResourceType(input: string): ResourceDefinition {\n const normalized = input.toLowerCase();\n const def = aliasMap.get(normalized);\n if (!def) {\n const valid = RESOURCE_DEFINITIONS.map(d => d.plural).join(\", \");\n throw new Error(`Unknown resource type \"${input}\". Valid types: ${valid}`);\n }\n return def;\n}\n\nexport function resolveResourceIdentifier(input: string): {\n type: ResourceDefinition;\n name?: string;\n} {\n // Handle \"server/labmaster\" or just \"servers\"\n const parts = input.split(\"/\");\n const type = resolveResourceType(parts[0]);\n const name = parts.length > 1 ? 
parts.slice(1).join(\"/\") : undefined;\n return { type, name };\n}\n```\n\nUpdate `bastion/src/cli/src/utils/resource.ts` to use the new registry.",
"testStrategy": "1. Unit test resolveResourceType with all aliases: server, servers, srv -> Server\n2. Test unknown resource type throws descriptive error\n3. Test case insensitivity: SERVER, Server, server all resolve correctly\n4. Test resolveResourceIdentifier parses \"server/labmaster\" correctly",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 77,
"title": "Implement 'labctl get' Command",
"description": "Create the core 'labctl get <resource> [name]' command that lists resources with filtering and output format support. This is the foundation of the kubectl-style CLI.",
"details": "Create `bastion/src/cli/src/commands/get.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType, type ResourceDefinition } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\nimport { formatOutput, type TableColumn } from \"../utils/table.js\";\n\nexport function registerGetCommand(program: Command): void {\n program\n .command(\"get <resource> [name]\")\n .description(\"List resources or get a specific resource by name\")\n .option(\"--status <status>\", \"Filter by status\")\n .option(\"--role <role>\", \"Filter by role (servers only)\")\n .option(\"--cloud <cloud>\", \"Filter by cloud\")\n .option(\"--env <environment>\", \"Filter by environment\")\n .option(\"-l, --label <label>\", \"Filter by label (key=value)\")\n .option(\"-A, --all-namespaces\", \"List across all clouds/environments\")\n .action(async (resource: string, name: string | undefined, opts) => {\n const config = program.opts()[\"_config\"];\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n let data: unknown[];\n \n if (name) {\n // Get specific resource - could be name, ID, or MAC\n const item = await client.getResource(resourceDef, name);\n data = item ? [item] : [];\n } else {\n // List with filters\n data = await client.listResources(resourceDef, {\n status: opts.status,\n role: opts.role,\n cloud: opts.allNamespaces ? undefined : (opts.cloud ?? config.defaultCloud),\n environment: opts.allNamespaces ? undefined : (opts.env ?? config.defaultEnvironment),\n label: opts.label,\n });\n }\n \n if (data.length === 0) {\n console.log(`No ${resourceDef.plural} found.`);\n return;\n }\n \n const columns = config.outputFormat === \"wide\" && resourceDef.wideColumns\n ? [...resourceDef.columns, ...resourceDef.wideColumns]\n : resourceDef.columns;\n \n formatOutput(data, config.outputFormat, columns);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nAdd to `index.ts`: `registerGetCommand(program);`\n\nExtend LabdClient with generic resource methods.",
"testStrategy": "1. Integration test: `labctl get servers` returns list from labd\n2. Test filtering: `labctl get servers --status discovered` only shows discovered\n3. Test name lookup: `labctl get server labmaster` returns single server\n4. Test MAC lookup: `labctl get server 38:05:25:33:e2:e4` resolves by MAC\n5. Test output formats: -o json, -o yaml, -o wide produce correct output\n6. Test unknown resource: `labctl get foo` shows helpful error",
"priority": "high",
"dependencies": [
76
],
"status": "pending",
"subtasks": []
},
{
"id": 78,
"title": "Implement 'labctl describe' Command",
"description": "Create the 'labctl describe <resource> <name>' command that shows detailed information about a resource including relationships, hardware info, and history.",
"details": "Create `bastion/src/cli/src/commands/describe.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nconst BOLD = \"\\x1b[1m\";\nconst DIM = \"\\x1b[2m\";\nconst RESET = \"\\x1b[0m\";\n\ninterface DescribeSection {\n title: string;\n fields: Array<[string, string | undefined]>;\n}\n\nfunction printDescribe(name: string, sections: DescribeSection[]): void {\n console.log(`${BOLD}Name:${RESET} ${name}`);\n for (const section of sections) {\n console.log(`\\n${BOLD}${section.title}:${RESET}`);\n for (const [key, value] of section.fields) {\n if (value !== undefined) {\n console.log(` ${DIM}${key}:${RESET} ${value}`);\n }\n }\n }\n}\n\nexport function registerDescribeCommand(program: Command): void {\n program\n .command(\"describe <resource> <name>\")\n .description(\"Show detailed information about a resource\")\n .action(async (resource: string, name: string) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n const item = await client.describeResource(resourceDef, name);\n if (!item) {\n console.error(`${resourceDef.singular} \"${name}\" not found.`);\n process.exit(1);\n }\n \n // Resource-specific formatting\n switch (resourceDef.kind) {\n case \"Server\":\n printServerDescription(item);\n break;\n case \"Cluster\":\n printClusterDescription(item);\n break;\n default:\n console.log(JSON.stringify(item, null, 2));\n }\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n\nfunction printServerDescription(server: any): void {\n const sections: DescribeSection[] = [\n {\n title: \"Metadata\",\n fields: [\n [\"ID\", server.id],\n [\"Cloud\", server.cloud],\n [\"Environment\", server.environment],\n [\"Role\", server.role],\n [\"Status\", server.status],\n [\"Created\", server.createdAt],\n [\"Last Seen\", server.lastHeartbeat],\n ],\n },\n {\n title: \"Hardware\",\n fields: [\n [\"MAC\", server.mac],\n [\"IP\", server.ip],\n [\"Architecture\", server.hardwareInfo?.arch],\n [\"CPU\", server.hardwareInfo?.cpu_model],\n [\"Cores\", String(server.hardwareInfo?.cpu_cores)],\n [\"Memory\", `${server.hardwareInfo?.memory_gb}GB`],\n [\"Product\", server.hardwareInfo?.product],\n ],\n },\n ];\n \n if (server.nics?.length > 0) {\n sections.push({\n title: \"Network Interfaces\",\n fields: server.nics.map((n: any) => [n.name, `${n.mac} ${n.ip ?? \"\"} (${n.state})`]),\n });\n }\n \n if (server.disks?.length > 0) {\n sections.push({\n title: \"Disks\",\n fields: server.disks.map((d: any) => [d.name, `${d.sizeGb}GB ${d.model ?? \"\"}`]),\n });\n }\n \n if (server.clusterMemberships?.length > 0) {\n sections.push({\n title: \"Cluster Membership\",\n fields: server.clusterMemberships.map((m: any) => [m.cluster.name, m.role]),\n });\n }\n \n printDescribe(server.hostname, sections);\n}\n```",
"testStrategy": "1. Integration test: `labctl describe server labmaster` shows full details\n2. Test hardware info display: CPU, memory, disks, NICs all shown\n3. Test cluster membership: server in cluster shows membership section\n4. Test not found: `labctl describe server nonexistent` shows helpful error\n5. Test different resource types: describe cluster, network, bastion",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 79,
"title": "Implement 'labctl create/delete' Commands",
"description": "Create the 'labctl create <resource>' and 'labctl delete <resource> <name>' commands for creating and removing resources like networks, clusters, and tokens.",
"details": "Create `bastion/src/cli/src/commands/create.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nexport function registerCreateCommand(program: Command): void {\n const create = program\n .command(\"create <resource>\")\n .description(\"Create a resource\");\n \n // labctl create network --name lab --cidr 192.168.8.0/24\n create\n .command(\"network\")\n .description(\"Create a network\")\n .requiredOption(\"--name <name>\", \"Network name\")\n .requiredOption(\"--cidr <cidr>\", \"Network CIDR (e.g., 192.168.8.0/24)\")\n .option(\"--gateway <gateway>\", \"Gateway IP\")\n .option(\"--vlan <vlan>\", \"VLAN ID\", parseInt)\n .option(\"--domain <domain>\", \"DNS domain\")\n .option(\"--dhcp\", \"Enable DHCP\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const network = await client.createNetwork({\n name: opts.name,\n cidr: opts.cidr,\n gateway: opts.gateway,\n vlan: opts.vlan,\n domain: opts.domain,\n dhcpEnabled: opts.dhcp ?? false,\n });\n console.log(`network/${network.name} created`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? 
err.message : String(err)}`);\n process.exit(1);\n }\n });\n \n // labctl create token --label \"worker enrollment\" --type reusable\n create\n .command(\"token\")\n .description(\"Create a join token\")\n .option(\"--label <label>\", \"Token label/description\")\n .option(\"--type <type>\", \"Token type: one-time or reusable\", \"one-time\")\n .option(\"--expires <duration>\", \"Expiration (e.g., 24h, 7d)\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const token = await client.createToken(opts);\n console.log(`Token created: ${token.token}`);\n if (opts.label) console.log(`Label: ${opts.label}`);\n if (token.expiresAt) console.log(`Expires: ${token.expiresAt}`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nCreate `bastion/src/cli/src/commands/delete.ts`:\n\n```typescript\nexport function registerDeleteCommand(program: Command): void {\n program\n .command(\"delete <resource> <name>\")\n .description(\"Delete a resource\")\n .option(\"--force\", \"Skip confirmation\")\n .action(async (resource: string, name: string, opts) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n if (!opts.force) {\n const { confirm } = await import(\"../utils/prompts.js\");\n const yes = await confirm(`Delete ${resourceDef.singular} \"${name}\"?`);\n if (!yes) {\n console.log(\"Cancelled.\");\n return;\n }\n }\n \n try {\n await client.deleteResource(resourceDef, name);\n console.log(`${resourceDef.singular}/${name} deleted`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```",
"testStrategy": "1. Integration test: `labctl create network` creates network in DB\n2. Test validation: missing required flags shows helpful error\n3. Test token creation: token returned is valid UUID, stored in DB\n4. Test delete with confirmation: prompts user, respects --force\n5. Test delete cascade: deleting server removes NICs, disks\n6. Test delete protection: cannot delete bastion with connected servers",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 80,
"title": "Refactor Provision Commands to kubectl-style",
"description": "Refactor existing provision commands to use kubectl-style syntax: 'labctl provision <server>' instead of 'labctl provision install <mac>'.",
"details": "The new command structure should be:\n- `labctl provision <server> --os fedora-43 --role worker` (queue install)\n- `labctl reprovision <server>` (reinstall)\n- `labctl forget <server>` (remove from tracking)\n\nModify `bastion/src/cli/src/commands/install.ts` → rename to `provision.ts`:\n\n```typescript\nexport function registerProvisionCommand(program: Command): void {\n program\n .command(\"provision <server>\")\n .description(\"Queue a server for OS installation\")\n .requiredOption(\"--os <os>\", \"Operating system\", \"fedora-43\")\n .requiredOption(\"--role <role>\", \"Server role\", \"worker\")\n .option(\"--disk <disk>\", \"Target disk (auto-detected if not specified)\")\n .option(\"--hostname <hostname>\", \"Override hostname\")\n .action(async (server: string, opts) => {\n const client = getLabdClient();\n \n // Resolve server: could be hostname, MAC, or ID\n const resolved = await client.resolveServer(server);\n if (!resolved) {\n console.error(`Server \"${server}\" not found.`);\n console.error(\"Tip: Use 'labctl get servers' to see available servers.\");\n process.exit(1);\n }\n \n if (resolved.status === \"installed\") {\n console.error(`Server \"${resolved.hostname}\" is already installed.`);\n console.error(\"Tip: Use 'labctl reprovision' to reinstall.\");\n process.exit(1);\n }\n \n try {\n await client.provisionServer(resolved.mac, {\n hostname: opts.hostname ?? resolved.hostname,\n os: opts.os,\n role: opts.role,\n disk: opts.disk,\n });\n console.log(`Server ${resolved.hostname} queued for ${opts.os} installation as ${opts.role}.`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nSimilarly update reprovision.ts and forget.ts to accept server name/MAC/ID.\n\nUpdate index.ts to register commands at top level instead of under 'provision' subcommand.",
"testStrategy": "1. Test server resolution: provision by hostname, MAC, or UUID all work\n2. Test already installed: provisioning installed server shows reprovision hint\n3. Test unknown server: helpful error message with tip\n4. Test reprovision: reinstalls installed server\n5. Test forget: removes server from all state categories\n6. Backward compat: verify 'labctl provision list' still works (deprecation warning)",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 81,
"title": "Implement Server and Resource API Endpoints in labd",
"description": "Add REST API endpoints in labd for full resource CRUD operations: networks, clusters, tokens. Extend servers endpoint with filters and relationship includes.",
"details": "Create/extend labd route files:\n\n1. **Extend servers.ts**:\n```typescript\n// GET /api/servers - with extended filters and includes\napp.get(\"/api/servers\", async (request, reply) => {\n const { status, role, cloud, environment, label, include } = request.query;\n \n const where = {};\n if (status) where.status = status;\n if (role) where.role = role;\n if (cloud) where.cloud = cloud;\n if (environment) where.environment = environment;\n if (label) where.labels = { path: [labelKey], equals: labelValue };\n \n const servers = await db.server.findMany({\n where,\n include: {\n nics: include?.includes(\"nics\"),\n disks: include?.includes(\"disks\"),\n clusterMemberships: include?.includes(\"clusters\") ? { include: { cluster: true } } : false,\n bastion: include?.includes(\"bastion\"),\n },\n });\n return servers;\n});\n\n// GET /api/servers/:id - by ID, hostname, or MAC\napp.get(\"/api/servers/:identifier\", async (request, reply) => {\n const { identifier } = request.params;\n \n // Try UUID first\n let server = await db.server.findUnique({ where: { id: identifier }, include: fullInclude });\n // Try hostname\n if (!server) server = await db.server.findUnique({ where: { hostname: identifier }, include: fullInclude });\n // Try MAC\n if (!server) server = await db.server.findUnique({ where: { mac: identifier.toLowerCase() }, include: fullInclude });\n \n if (!server) return reply.code(404).send({ error: \"Server not found\" });\n return server;\n});\n```\n\n2. 
**Create networks.ts**:\n```typescript\n// GET /api/networks, POST /api/networks, DELETE /api/networks/:id\nexport function registerNetworkRoutes(app: FastifyInstance, db: DbClient): void {\n app.get(\"/api/networks\", async () => db.network.findMany());\n \n app.post(\"/api/networks\", async (request, reply) => {\n const { name, cidr, gateway, vlan, domain, dhcpEnabled } = request.body;\n // Validate CIDR format\n const network = await db.network.create({ data: { name, cidr, gateway, vlan, domain, dhcpEnabled } });\n return reply.code(201).send(network);\n });\n \n app.delete(\"/api/networks/:id\", async (request, reply) => {\n await db.network.delete({ where: { id: request.params.id } });\n return reply.code(204).send();\n });\n}\n```\n\n3. **Create clusters.ts**:\n```typescript\n// Similar CRUD for clusters with member management\napp.get(\"/api/clusters/:id/members\", ...);\napp.post(\"/api/clusters/:id/members\", ...);\napp.delete(\"/api/clusters/:id/members/:serverId\", ...);\n```",
|
||||
"testStrategy": "1. Integration test all CRUD endpoints with HTTP client\n2. Test server resolution: by id, hostname, and MAC all return same server\n3. Test include parameter: nics, disks, clusters included when requested\n4. Test validation: invalid CIDR rejected, duplicate names rejected\n5. Test cascade: delete network with NICs fails or cascades appropriately",
|
||||
"priority": "medium",
|
||||
"dependencies": [
|
||||
72,
|
||||
73
|
||||
],
|
||||
"status": "pending",
|
||||
"subtasks": []
|
||||
},
|
||||
{
|
||||
"id": 82,
|
||||
"title": "Implement RBAC Permission Checks in CLI",
|
||||
"description": "Wire RBAC permission checks into CLI commands. Check user permissions before executing operations using the existing Permission model.",
|
||||
"details": "1. Create `bastion/src/cli/src/middleware/rbac.ts`:\n\n```typescript\nimport { getLabdClient } from \"../api/config.js\";\n\nexport interface PermissionContext {\n action: string; // read, exec, apply, destroy, manage, admin\n cloud?: string;\n environment?: string;\n server?: string;\n}\n\nexport async function checkPermission(ctx: PermissionContext): Promise<boolean> {\n const client = getLabdClient();\n try {\n const result = await client.checkPermission(ctx);\n return result.allowed;\n } catch {\n // If can't reach labd, fail open for local operations\n return true;\n }\n}\n\nexport async function requirePermission(ctx: PermissionContext): Promise<void> {\n const allowed = await checkPermission(ctx);\n if (!allowed) {\n throw new Error(\n `Permission denied: ${ctx.action} on ${ctx.server ?? \"*\"}@${ctx.cloud ?? \"*\"}/${ctx.environment ?? \"*\"}`\n );\n }\n}\n```\n\n2. Add labd endpoint `POST /api/auth/check-permission`:\n```typescript\napp.post(\"/api/auth/check-permission\", async (request, reply) => {\n const user = await authenticateRequest(request); // from cert or token\n const { action, cloud, environment, server } = request.body;\n \n const permissions = await db.permission.findMany({\n where: {\n role: { userBindings: { some: { userId: user.id } } },\n },\n });\n \n const allowed = permissions.some(p => \n matchesPattern(p.action, action) &&\n matchesPattern(p.cloud, cloud ?? \"*\") &&\n matchesPattern(p.environment, environment ?? \"*\") &&\n matchesPattern(p.server, server ?? \"*\")\n );\n \n return { allowed };\n});\n```\n\n3. 
Integrate into commands:\n```typescript\n// In provision command\nawait requirePermission({ action: \"apply\", cloud, environment, server: resolved.hostname });\n\n// In delete command\nawait requirePermission({ action: \"destroy\", cloud, environment, server: name });\n\n// In get command (filter results)\nconst servers = await client.listServers(filters);\nconst visible = await filterByPermission(servers, \"read\");\n```",
|
||||
"testStrategy": "1. Unit test permission matching logic with wildcards\n2. Test admin role: has access to all resources\n3. Test operator role: can read/exec but not destroy\n4. Test viewer role: can only read, provision denied\n5. Test scope matching: permission for cloud=aws doesn't grant access to cloud=baremetal\n6. Test denied action is audit-logged",
|
||||
"priority": "medium",
|
||||
"dependencies": [
|
||||
77,
|
||||
81
|
||||
],
|
||||
"status": "pending",
|
||||
"subtasks": []
|
||||
},
|
||||
{
|
||||
"id": 83,
|
||||
"title": "Implement Audit Logging for Resource Operations",
|
||||
"description": "Log all resource mutations to the AuditLog table. Include user, action, resource type/name, result, and source IP.",
|
||||
"details": "1. Create `bastion/src/labd/src/services/audit.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\n\nexport interface AuditEntry {\n userId?: string;\n serverId?: string;\n sessionId?: string;\n action: string; // create, update, delete, provision, exec, rbac-denied\n resourceType: string; // server, cluster, network, token, etc.\n resourceName: string;\n args?: string; // sanitized args (no secrets)\n result: \"success\" | \"denied\" | \"error\";\n durationMs?: number;\n sourceIp?: string;\n}\n\nexport class AuditService {\n constructor(private readonly db: PrismaClient) {}\n \n async log(entry: AuditEntry): Promise<void> {\n await this.db.auditLog.create({\n data: {\n userId: entry.userId,\n serverId: entry.serverId,\n sessionId: entry.sessionId,\n action: entry.action,\n resourceType: entry.resourceType,\n resourceName: entry.resourceName,\n args: entry.args,\n result: entry.result,\n durationMs: entry.durationMs,\n sourceIp: entry.sourceIp,\n },\n });\n }\n \n async query(filters: {\n userId?: string;\n action?: string;\n resourceType?: string;\n since?: Date;\n limit?: number;\n }): Promise<AuditEntry[]> {\n return this.db.auditLog.findMany({\n where: {\n userId: filters.userId,\n action: filters.action,\n resourceType: filters.resourceType,\n timestamp: filters.since ? { gte: filters.since } : undefined,\n },\n orderBy: { timestamp: \"desc\" },\n take: filters.limit ?? 100,\n });\n }\n}\n```\n\n2. Add Fastify hook to wrap route handlers:\n```typescript\napp.addHook(\"onResponse\", async (request, reply) => {\n // Log mutations (POST, PUT, DELETE)\n if ([\"POST\", \"PUT\", \"DELETE\"].includes(request.method)) {\n const path = request.url;\n const resourceMatch = path.match(/\\/api\\/(\\w+)(?:\\/([^/]+))?/);\n if (resourceMatch) {\n await auditService.log({\n action: methodToAction(request.method),\n resourceType: resourceMatch[1],\n resourceName: resourceMatch[2] ?? \"\",\n result: reply.statusCode < 400 ? 
\"success\" : \"error\",\n sourceIp: request.ip,\n });\n }\n }\n});\n```\n\n3. Add `labctl get audit` command to view audit logs.",
|
||||
"testStrategy": "1. Integration test: create network, verify audit log entry created\n2. Test RBAC denial is logged with result=denied\n3. Test sensitive data sanitization: tokens/passwords not in args\n4. Test query filters: by user, action, resourceType, time range\n5. Test `labctl get audit` displays recent entries correctly",
|
||||
"priority": "medium",
|
||||
"dependencies": [
|
||||
81,
|
||||
82
|
||||
],
|
||||
"status": "pending",
|
||||
"subtasks": []
|
||||
},
|
||||
{
|
||||
"id": 84,
|
||||
"title": "Update CLI Entry Point and Help Text",
|
||||
"description": "Update the CLI entry point to register all new commands and update help text to reflect the kubectl-style interface. Add deprecation warnings for old command structure.",
|
||||
"details": "Update `bastion/src/cli/src/index.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { APP_VERSION } from \"@lab/shared\";\nimport { loadConfig } from \"./config/index.js\";\n\n// New kubectl-style commands\nimport { registerGetCommand } from \"./commands/get.js\";\nimport { registerDescribeCommand } from \"./commands/describe.js\";\nimport { registerCreateCommand } from \"./commands/create.js\";\nimport { registerDeleteCommand } from \"./commands/delete.js\";\nimport { registerApplyCommand } from \"./commands/apply.js\";\nimport { registerEditCommand } from \"./commands/edit.js\";\n\n// Action commands\nimport { registerProvisionCommand } from \"./commands/provision.js\";\nimport { registerReprovisionCommand } from \"./commands/reprovision.js\";\nimport { registerForgetCommand } from \"./commands/forget.js\";\n\n// Bastion management\nimport { registerBastionCommand } from \"./commands/bastion.js\"; // start/stop/status\n\n// App management (unchanged)\nimport { registerAppCommand } from \"./commands/app.js\";\n\n// Utility\nimport { registerConfigCommand } from \"./commands/config.js\";\nimport { registerLoginCommand } from \"./commands/login.js\";\nimport { registerDoctorCommand } from \"./commands/doctor.js\";\n\nexport function createProgram(): Command {\n const program = new Command();\n \n program\n .name(\"labctl\")\n .description(\"Lab infrastructure management CLI\")\n .version(APP_VERSION);\n \n // Global options\n program\n .option(\"-o, --output <format>\", \"output format (table, json, yaml, wide)\", \"table\")\n .option(\"--server <url>\", \"override labd server URL\")\n .option(\"--env <name>\", \"override default environment\")\n .option(\"--cloud <name>\", \"override default cloud\")\n .option(\"--debug\", \"enable debug output\")\n .option(\"--no-color\", \"disable colored output\");\n \n // Core CRUD commands\n registerGetCommand(program); // labctl get <resource> [name]\n registerDescribeCommand(program); // 
labctl describe <resource> <name>\n registerCreateCommand(program); // labctl create <resource>\n registerDeleteCommand(program); // labctl delete <resource> <name>\n registerApplyCommand(program); // labctl apply -f <file>\n registerEditCommand(program); // labctl edit <resource> <name>\n \n // Provisioning actions\n registerProvisionCommand(program); // labctl provision <server>\n registerReprovisionCommand(program);// labctl reprovision <server>\n registerForgetCommand(program); // labctl forget <server>\n \n // Bastion management\n registerBastionCommand(program); // labctl bastion start|stop|status\n \n // App management\n registerAppCommand(program); // labctl app install|health k3s\n \n // Utility\n registerConfigCommand(program);\n registerLoginCommand(program);\n registerDoctorCommand(program);\n \n // Legacy compatibility with deprecation warnings\n registerLegacyCommands(program);\n \n return program;\n}\n\nfunction registerLegacyCommands(program: Command): void {\n // labctl provision list -> labctl get servers (with warning)\n program\n .command(\"provision\")\n .command(\"list\")\n .action(() => {\n console.warn(\"DEPRECATED: Use 'labctl get servers' instead.\");\n // Delegate to get servers\n });\n}\n```\n\nUpdate shell completions in `scripts/generate-completions.ts` for new command structure.",
|
||||
"testStrategy": "1. Test --help shows all new commands with descriptions\n2. Test resource type help: `labctl get --help` lists valid resources\n3. Test deprecated commands show warning but still work\n4. Test shell completions generated for new commands\n5. Test global options: -o, --server, --env, --cloud all work",
|
||||
"priority": "low",
|
||||
"dependencies": [
|
||||
77,
|
||||
78,
|
||||
79,
|
||||
80
|
||||
],
|
||||
"status": "pending",
|
||||
"subtasks": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"created": "2026-03-26T04:26:49.813Z",
|
||||
"updated": "2026-03-26T04:26:49.813Z",
|
||||
"description": "Tasks for master context"
|
||||
}
|
||||
}
|
||||
}
|
||||
47
.taskmaster/templates/example_prd.txt
Normal file
@@ -0,0 +1,47 @@
<context>
# Overview
[Provide a high-level overview of your product here. Explain what problem it solves, who it's for, and why it's valuable.]

# Core Features
[List and describe the main features of your product. For each feature, include:
- What it does
- Why it's important
- How it works at a high level]

# User Experience
[Describe the user journey and experience. Include:
- User personas
- Key user flows
- UI/UX considerations]
</context>
<PRD>
# Technical Architecture
[Outline the technical implementation details:
- System components
- Data models
- APIs and integrations
- Infrastructure requirements]

# Development Roadmap
[Break down the development process into phases:
- MVP requirements
- Future enhancements
- Do not think about timelines whatsoever -- all that matters is scope and detailing exactly what needs to be built in each phase so it can later be cut up into tasks]

# Logical Dependency Chain
[Define the logical order of development:
- Which features need to be built first (foundation)
- Getting as quickly as possible to a usable, visible front end that works
- Properly pacing and scoping each feature so it is atomic but can also be built upon and improved as development progresses]

# Risks and Mitigations
[Identify potential risks and how they'll be addressed:
- Technical challenges
- Figuring out the MVP that we can build upon
- Resource constraints]

# Appendix
[Include any additional information:
- Research findings
- Technical specifications]
</PRD>
511
.taskmaster/templates/example_prd_rpg.txt
Normal file
@@ -0,0 +1,511 @@
|
||||
<rpg-method>
|
||||
# Repository Planning Graph (RPG) Method - PRD Template
|
||||
|
||||
This template teaches you (AI or human) how to create structured, dependency-aware PRDs using the RPG methodology from Microsoft Research. The key insight: separate WHAT (functional) from HOW (structural), then connect them with explicit dependencies.
|
||||
|
||||
## Core Principles
|
||||
|
||||
1. **Dual-Semantics**: Think functional (capabilities) AND structural (code organization) separately, then map them
|
||||
2. **Explicit Dependencies**: Never assume - always state what depends on what
|
||||
3. **Topological Order**: Build foundation first, then layers on top
|
||||
4. **Progressive Refinement**: Start broad, refine iteratively
|
||||
|
||||
## How to Use This Template
|
||||
|
||||
- Follow the instructions in each `<instruction>` block
|
||||
- Look at `<example>` blocks to see good vs bad patterns
|
||||
- Fill in the content sections with your project details
|
||||
- The AI reading this will learn the RPG method by following along
|
||||
- Task Master will parse the resulting PRD into dependency-aware tasks
|
||||
|
||||
## Recommended Tools for Creating PRDs
|
||||
|
||||
When using this template to **create** a PRD (not parse it), use **code-context-aware AI assistants** for best results:
|
||||
|
||||
**Why?** The AI needs to understand your existing codebase to make good architectural decisions about modules, dependencies, and integration points.
|
||||
|
||||
**Recommended tools:**
|
||||
- **Claude Code** (claude-code CLI) - Best for structured reasoning and large contexts
|
||||
- **Cursor/Windsurf** - IDE integration with full codebase context
|
||||
- **Gemini CLI** (gemini-cli) - Massive context window for large codebases
|
||||
- **Codex/Grok CLI** - Strong code generation with context awareness
|
||||
|
||||
**Note:** Once your PRD is created, `task-master parse-prd` works with any configured AI model - it just needs to read the PRD text itself, not your codebase.
|
||||
</rpg-method>
|
||||
|
||||
---
|
||||
|
||||
<overview>
|
||||
<instruction>
|
||||
Start with the problem, not the solution. Be specific about:
|
||||
- What pain point exists?
|
||||
- Who experiences it?
|
||||
- Why don't existing solutions work?
|
||||
- What does success look like (measurable outcomes)?
|
||||
|
||||
Keep this section focused - don't jump into implementation details yet.
|
||||
</instruction>
|
||||
|
||||
## Problem Statement
|
||||
[Describe the core problem. Be concrete about user pain points.]
|
||||
|
||||
## Target Users
|
||||
[Define personas, their workflows, and what they're trying to achieve.]
|
||||
|
||||
## Success Metrics
|
||||
[Quantifiable outcomes. Examples: "80% task completion via autopilot", "< 5% manual intervention rate"]
|
||||
|
||||
</overview>
|
||||
|
||||
---
|
||||
|
||||
<functional-decomposition>
|
||||
<instruction>
|
||||
Now think about CAPABILITIES (what the system DOES), not code structure yet.
|
||||
|
||||
Step 1: Identify high-level capability domains
|
||||
- Think: "What major things does this system do?"
|
||||
- Examples: Data Management, Core Processing, Presentation Layer
|
||||
|
||||
Step 2: For each capability, enumerate specific features
|
||||
- Use explore-exploit strategy:
|
||||
* Exploit: What features are REQUIRED for core value?
|
||||
* Explore: What features make this domain COMPLETE?
|
||||
|
||||
Step 3: For each feature, define:
|
||||
- Description: What it does in one sentence
|
||||
- Inputs: What data/context it needs
|
||||
- Outputs: What it produces/returns
|
||||
- Behavior: Key logic or transformations
|
||||
|
||||
<example type="good">
|
||||
Capability: Data Validation
|
||||
Feature: Schema validation
|
||||
- Description: Validate JSON payloads against defined schemas
|
||||
- Inputs: JSON object, schema definition
|
||||
- Outputs: Validation result (pass/fail) + error details
|
||||
- Behavior: Iterate fields, check types, enforce constraints
|
||||
|
||||
Feature: Business rule validation
|
||||
- Description: Apply domain-specific validation rules
|
||||
- Inputs: Validated data object, rule set
|
||||
- Outputs: Boolean + list of violated rules
|
||||
- Behavior: Execute rules sequentially, short-circuit on failure
|
||||
</example>
|
||||
|
||||
<example type="bad">
|
||||
Capability: validation.js
|
||||
(Problem: This is a FILE, not a CAPABILITY. Mixing structure into functional thinking.)
|
||||
|
||||
Capability: Validation
|
||||
Feature: Make sure data is good
|
||||
(Problem: Too vague. No inputs/outputs. Not actionable.)
|
||||
</example>
|
||||
</instruction>
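As an illustration only, the "Business rule validation" feature from the good example above could be sketched in TypeScript as follows. The `Rule` and `RuleResult` types and the sample rules are assumptions made for this sketch; only the described behavior (run rules sequentially, short-circuit on the first failure, return a boolean plus the violated rules) comes from the example.

```typescript
// Hypothetical types for this sketch; they are not part of the template.
interface Rule<T> {
  name: string;
  check: (data: T) => boolean;
}

interface RuleResult {
  valid: boolean;
  violated: string[];
}

// Inputs: validated data object, rule set.
// Outputs: boolean + list of violated rules.
// Behavior: execute rules sequentially, short-circuit on the first failure.
function validateRules<T>(data: T, rules: Rule<T>[]): RuleResult {
  for (const rule of rules) {
    if (!rule.check(data)) {
      // Short-circuit: later rules are not evaluated.
      return { valid: false, violated: [rule.name] };
    }
  }
  return { valid: true, violated: [] };
}

// Usage with two made-up rules for a hostname string.
const hostnameRules: Rule<string>[] = [
  { name: "non-empty", check: (value) => value.length > 0 },
  { name: "lowercase", check: (value) => value === value.toLowerCase() },
];

const passing = validateRules("node-01", hostnameRules);
const failing = validateRules("", hostnameRules);
```

Because the function stops at the first failing rule, `violated` holds at most one entry; a variant that collects every violation would drop the early return.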
|
||||
|
||||
## Capability Tree
|
||||
|
||||
### Capability: [Name]
|
||||
[Brief description of what this capability domain covers]
|
||||
|
||||
#### Feature: [Name]
|
||||
- **Description**: [One sentence]
|
||||
- **Inputs**: [What it needs]
|
||||
- **Outputs**: [What it produces]
|
||||
- **Behavior**: [Key logic]
|
||||
|
||||
#### Feature: [Name]
|
||||
- **Description**:
|
||||
- **Inputs**:
|
||||
- **Outputs**:
|
||||
- **Behavior**:
|
||||
|
||||
### Capability: [Name]
|
||||
...
|
||||
|
||||
</functional-decomposition>
|
||||
|
||||
---
|
||||
|
||||
<structural-decomposition>
|
||||
<instruction>
|
||||
NOW think about code organization. Map capabilities to actual file/folder structure.
|
||||
|
||||
Rules:
|
||||
1. Each capability maps to a module (folder or file)
|
||||
2. Features within a capability map to functions/classes
|
||||
3. Use clear module boundaries - each module has ONE responsibility
|
||||
4. Define what each module exports (public interface)
|
||||
|
||||
The goal: Create a clear mapping between "what it does" (functional) and "where it lives" (structural).
|
||||
|
||||
<example type="good">
|
||||
Capability: Data Validation
  → Maps to: src/validation/
    ├── schema-validator.js (Schema validation feature)
    ├── rule-validator.js (Business rule validation feature)
    └── index.js (Public exports)

Exports:
  - validateSchema(data, schema)
  - validateRules(data, rules)
|
||||
</example>
|
||||
|
||||
<example type="bad">
|
||||
Capability: Data Validation
|
||||
→ Maps to: src/utils.js
|
||||
(Problem: "utils" is not a clear module boundary. Where do I find validation logic?)
|
||||
|
||||
Capability: Data Validation
|
||||
→ Maps to: src/validation/everything.js
|
||||
(Problem: One giant file. Features should map to separate files for maintainability.)
|
||||
</example>
|
||||
</instruction>
|
||||
|
||||
## Repository Structure
|
||||
|
||||
```
project-root/
├── src/
│   ├── [module-name]/       # Maps to: [Capability Name]
│   │   ├── [file].js        # Maps to: [Feature Name]
│   │   └── index.js         # Public exports
│   └── [module-name]/
├── tests/
└── docs/
```
|
||||
|
||||
## Module Definitions
|
||||
|
||||
### Module: [Name]
|
||||
- **Maps to capability**: [Capability from functional decomposition]
|
||||
- **Responsibility**: [Single clear purpose]
|
||||
- **File structure**:
  ```
  module-name/
  ├── feature1.js
  ├── feature2.js
  └── index.js
  ```
- **Exports**:
  - `functionName()` - [what it does]
  - `ClassName` - [what it does]
|
||||
|
||||
</structural-decomposition>
|
||||
|
||||
---
|
||||
|
||||
<dependency-graph>
|
||||
<instruction>
|
||||
This is THE CRITICAL SECTION for Task Master parsing.
|
||||
|
||||
Define explicit dependencies between modules. This creates the topological order for task execution.
|
||||
|
||||
Rules:
|
||||
1. List modules in dependency order (foundation first)
|
||||
2. For each module, state what it depends on
|
||||
3. Foundation modules should have NO dependencies
|
||||
4. Every non-foundation module should depend on at least one other module
|
||||
5. Think: "What must EXIST before I can build this module?"
|
||||
|
||||
<example type="good">
|
||||
Foundation Layer (no dependencies):
|
||||
- error-handling: No dependencies
|
||||
- config-manager: No dependencies
|
||||
- base-types: No dependencies
|
||||
|
||||
Data Layer:
|
||||
- schema-validator: Depends on [base-types, error-handling]
|
||||
- data-ingestion: Depends on [schema-validator, config-manager]
|
||||
|
||||
Core Layer:
|
||||
- algorithm-engine: Depends on [base-types, error-handling]
|
||||
- pipeline-orchestrator: Depends on [algorithm-engine, data-ingestion]
|
||||
</example>
|
||||
|
||||
<example type="bad">
|
||||
- validation: Depends on API
|
||||
- API: Depends on validation
|
||||
(Problem: Circular dependency. This will cause build/runtime issues.)
|
||||
|
||||
- user-auth: Depends on everything
|
||||
(Problem: Too many dependencies. Should be more focused.)
|
||||
</example>
|
||||
</instruction>
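The rules above amount to requiring that the module graph can be topologically sorted. A minimal sketch of that check, assuming the graph is written as a plain object mapping each module to its dependencies, is shown below. The module names come from the good example; the `DependencyGraph` type and the `topologicalOrder` function are hypothetical and are not part of Task Master.

```typescript
// Each key is a module; each value lists the modules it depends on.
type DependencyGraph = Record<string, string[]>;

const exampleGraph: DependencyGraph = {
  "error-handling": [],
  "config-manager": [],
  "base-types": [],
  "schema-validator": ["base-types", "error-handling"],
  "data-ingestion": ["schema-validator", "config-manager"],
  "algorithm-engine": ["base-types", "error-handling"],
  "pipeline-orchestrator": ["algorithm-engine", "data-ingestion"],
};

// Round-based variant of Kahn's algorithm: repeatedly emit every module
// whose dependencies have all been emitted already.
function topologicalOrder(graph: DependencyGraph): string[] {
  const names = Object.keys(graph);
  for (const name of names) {
    for (const dep of graph[name]) {
      if (names.indexOf(dep) === -1) {
        throw new Error(`"${name}" depends on unknown module "${dep}"`);
      }
    }
  }

  const done: string[] = [];
  let pending = names.slice();
  while (pending.length > 0) {
    const ready = pending.filter((name) =>
      graph[name].every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      // Nothing can make progress, so the remaining modules contain a cycle.
      throw new Error(`Circular dependency detected; unresolved modules: ${pending.join(", ")}`);
    }
    for (const name of ready) done.push(name);
    pending = pending.filter((name) => ready.indexOf(name) === -1);
  }
  return done;
}

const buildOrder = topologicalOrder(exampleGraph);
```

Running the same function on the bad example (`validation` depending on `API` and `API` depending on `validation`) throws instead of returning an order, which is the kind of failure worth catching before any tasks are generated.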
|
||||
|
||||
## Dependency Chain
|
||||
|
||||
### Foundation Layer (Phase 0)
|
||||
No dependencies - these are built first.
|
||||
|
||||
- **[Module Name]**: [What it provides]
|
||||
- **[Module Name]**: [What it provides]
|
||||
|
||||
### [Layer Name] (Phase 1)
|
||||
- **[Module Name]**: Depends on [[module-from-phase-0], [module-from-phase-0]]
|
||||
- **[Module Name]**: Depends on [[module-from-phase-0]]
|
||||
|
||||
### [Layer Name] (Phase 2)
|
||||
- **[Module Name]**: Depends on [[module-from-phase-1], [module-from-foundation]]
|
||||
|
||||
[Continue building up layers...]
|
||||
|
||||
</dependency-graph>
|
||||
|
||||
---
|
||||
|
||||
<implementation-roadmap>
|
||||
<instruction>
|
||||
Turn the dependency graph into concrete development phases.
|
||||
|
||||
Each phase should:
|
||||
1. Have clear entry criteria (what must exist before starting)
|
||||
2. Contain tasks that can be parallelized (no inter-dependencies within phase)
|
||||
3. Have clear exit criteria (how do we know phase is complete?)
|
||||
4. Build toward something USABLE (not just infrastructure)
|
||||
|
||||
Phase ordering follows topological sort of dependency graph.
|
||||
|
||||
<example type="good">
|
||||
Phase 0: Foundation
|
||||
Entry: Clean repository
|
||||
Tasks:
|
||||
- Implement error handling utilities
|
||||
- Create base type definitions
|
||||
- Setup configuration system
|
||||
Exit: Other modules can import foundation without errors
|
||||
|
||||
Phase 1: Data Layer
|
||||
Entry: Phase 0 complete
|
||||
Tasks:
|
||||
- Implement schema validator (uses: base types, error handling)
|
||||
- Build data ingestion pipeline (uses: validator, config)
|
||||
Exit: End-to-end data flow from input to validated output
|
||||
</example>
|
||||
|
||||
<example type="bad">
|
||||
Phase 1: Build Everything
|
||||
Tasks:
|
||||
- API
|
||||
- Database
|
||||
- UI
|
||||
- Tests
|
||||
(Problem: No clear focus. Too broad. Dependencies not considered.)
|
||||
</example>
|
||||
</instruction>
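One way to derive phases that satisfy rule 2 (no inter-dependencies within a phase) is to give each module a phase number equal to the length of its longest dependency chain. The sketch below assumes the same plain-object graph shape as the dependency examples; `assignPhases` is a hypothetical helper for illustration, not something Task Master provides.

```typescript
// Each key is a module; each value lists the modules it depends on.
type DependencyGraph = Record<string, string[]>;

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Phase 0 = no dependencies; otherwise 1 + the highest phase among dependencies.
function assignPhases(graph: DependencyGraph): Record<string, number> {
  const phase: Record<string, number> = {};
  const visiting: Record<string, boolean> = {};

  function resolve(name: string): number {
    if (has(phase, name)) return phase[name];
    if (!has(graph, name)) throw new Error(`Unknown module "${name}"`);
    if (visiting[name]) throw new Error(`Circular dependency involving "${name}"`);

    visiting[name] = true;
    let result = 0;
    for (const dep of graph[name]) {
      result = Math.max(result, resolve(dep) + 1);
    }
    visiting[name] = false;

    phase[name] = result;
    return result;
  }

  for (const name of Object.keys(graph)) resolve(name);
  return phase;
}

const phases = assignPhases({
  "error-handling": [],
  "config-manager": [],
  "base-types": [],
  "schema-validator": ["base-types", "error-handling"],
  "data-ingestion": ["schema-validator", "config-manager"],
  "algorithm-engine": ["base-types", "error-handling"],
  "pipeline-orchestrator": ["algorithm-engine", "data-ingestion"],
});
```

Since a module's phase is always strictly greater than the phase of anything it depends on, two modules in the same phase can never depend on each other and can be worked on in parallel.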
|
||||
|
||||
## Development Phases
|
||||
|
||||
### Phase 0: [Foundation Name]
|
||||
**Goal**: [What foundational capability this establishes]
|
||||
|
||||
**Entry Criteria**: [What must be true before starting]
|
||||
|
||||
**Tasks**:
|
||||
- [ ] [Task name] (depends on: [none or list])
|
||||
- Acceptance criteria: [How we know it's done]
|
||||
- Test strategy: [What tests prove it works]
|
||||
|
||||
- [ ] [Task name] (depends on: [none or list])
|
||||
|
||||
**Exit Criteria**: [Observable outcome that proves phase complete]
|
||||
|
||||
**Delivers**: [What can users/developers do after this phase?]
|
||||
|
||||
---
|
||||
|
||||
### Phase 1: [Layer Name]
|
||||
**Goal**:
|
||||
|
||||
**Entry Criteria**: Phase 0 complete
|
||||
|
||||
**Tasks**:
|
||||
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
|
||||
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
|
||||
|
||||
**Exit Criteria**:
|
||||
|
||||
**Delivers**:
|
||||
|
||||
---
|
||||
|
||||
[Continue with more phases...]
|
||||
|
||||
</implementation-roadmap>
|
||||
|
||||
---
|
||||
|
||||
<test-strategy>
|
||||
<instruction>
|
||||
Define how testing will be integrated throughout development (TDD approach).
|
||||
|
||||
Specify:
|
||||
1. Test pyramid ratios (unit vs integration vs e2e)
|
||||
2. Coverage requirements
|
||||
3. Critical test scenarios
|
||||
4. Test generation guidelines for Surgical Test Generator
|
||||
|
||||
This section guides the AI when generating tests during the RED phase of TDD.
|
||||
|
||||
<example type="good">
|
||||
Critical Test Scenarios for Data Validation module:
|
||||
- Happy path: Valid data passes all checks
|
||||
- Edge cases: Empty strings, null values, boundary numbers
|
||||
- Error cases: Invalid types, missing required fields
|
||||
- Integration: Validator works with ingestion pipeline
|
||||
</example>
|
||||
</instruction>
|
||||
|
||||
## Test Pyramid
|
||||
|
||||
```
        /\
       /E2E\           ← [X]% (End-to-end, slow, comprehensive)
      /------\
     /Integration\     ← [Y]% (Module interactions)
    /------------\
   /  Unit Tests  \    ← [Z]% (Fast, isolated, deterministic)
  /----------------\
```
|
||||
|
||||
## Coverage Requirements
|
||||
- Line coverage: [X]% minimum
|
||||
- Branch coverage: [X]% minimum
|
||||
- Function coverage: [X]% minimum
|
||||
- Statement coverage: [X]% minimum
|
||||
|
||||
## Critical Test Scenarios
|
||||
|
||||
### [Module/Feature Name]
|
||||
**Happy path**:
|
||||
- [Scenario description]
|
||||
- Expected: [What should happen]
|
||||
|
||||
**Edge cases**:
|
||||
- [Scenario description]
|
||||
- Expected: [What should happen]
|
||||
|
||||
**Error cases**:
|
||||
- [Scenario description]
|
||||
- Expected: [How system handles failure]
|
||||
|
||||
**Integration points**:
|
||||
- [What interactions to test]
|
||||
- Expected: [End-to-end behavior]
|
||||
|
||||
## Test Generation Guidelines
|
||||
[Specific instructions for Surgical Test Generator about what to focus on, what patterns to follow, project-specific test conventions]
|
||||
|
||||
</test-strategy>
|
||||
|
||||
---
|
||||
|
||||
<architecture>
|
||||
<instruction>
|
||||
Describe technical architecture, data models, and key design decisions.
|
||||
|
||||
Keep this section AFTER functional/structural decomposition - implementation details come after understanding structure.
|
||||
</instruction>
|
||||
|
||||
## System Components
|
||||
[Major architectural pieces and their responsibilities]
|
||||
|
||||
## Data Models
|
||||
[Core data structures, schemas, database design]
|
||||
|
||||
## Technology Stack
|
||||
[Languages, frameworks, key libraries]
|
||||
|
||||
**Decision: [Technology/Pattern]**
|
||||
- **Rationale**: [Why chosen]
|
||||
- **Trade-offs**: [What we're giving up]
|
||||
- **Alternatives considered**: [What else we looked at]
|
||||
|
||||
</architecture>
|
||||
|
||||
---
|
||||
|
||||
<risks>
|
||||
<instruction>
|
||||
Identify risks that could derail development and how to mitigate them.
|
||||
|
||||
Categories:
|
||||
- Technical risks (complexity, unknowns)
|
||||
- Dependency risks (blocking issues)
|
||||
- Scope risks (creep, underestimation)
|
||||
</instruction>
|
||||
|
||||
## Technical Risks
|
||||
**Risk**: [Description]
|
||||
- **Impact**: [High/Medium/Low - effect on project]
|
||||
- **Likelihood**: [High/Medium/Low]
|
||||
- **Mitigation**: [How to address]
|
||||
- **Fallback**: [Plan B if mitigation fails]
|
||||
|
||||
## Dependency Risks
|
||||
[External dependencies, blocking issues]
|
||||
|
||||
## Scope Risks
|
||||
[Scope creep, underestimation, unclear requirements]
|
||||
|
||||
</risks>
|
||||
|
||||
---
|
||||
|
||||
<appendix>
|
||||
## References
|
||||
[Papers, documentation, similar systems]
|
||||
|
||||
## Glossary
|
||||
[Domain-specific terms]
|
||||
|
||||
## Open Questions
|
||||
[Things to resolve during development]
|
||||
</appendix>
|
||||
|
||||
---
|
||||
|
||||
<task-master-integration>
|
||||
# How Task Master Uses This PRD
|
||||
|
||||
When you run `task-master parse-prd <file>.txt`, the parser:
|
||||
|
||||
1. **Extracts capabilities** → Main tasks
|
||||
- Each `### Capability:` becomes a top-level task
|
||||
|
||||
2. **Extracts features** → Subtasks
|
||||
- Each `#### Feature:` becomes a subtask under its capability
|
||||
|
||||
3. **Parses dependencies** → Task dependencies
|
||||
- `Depends on: [X, Y]` sets task.dependencies = ["X", "Y"]
|
||||
|
||||
4. **Orders by phases** → Task priorities
|
||||
- Phase 0 tasks = highest priority
|
||||
- Phase N tasks = lower priority, properly sequenced
|
||||
|
||||
5. **Uses test strategy** → Test generation context
|
||||
- Feeds test scenarios to Surgical Test Generator during implementation
|
||||
|
||||
**Result**: A dependency-aware task graph that can be executed in topological order.
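To make the heading-to-task mapping concrete, here is a small TypeScript sketch. It is not Task Master's parser (`task-master parse-prd` relies on a configured AI model); it only mirrors the conventions listed above, and the `CapabilityOutline` type and both function names are assumptions made for illustration.

```typescript
// Hypothetical shape: one entry per "### Capability:" heading.
interface CapabilityOutline {
  name: string;
  features: string[];
}

// Collect capabilities (top-level tasks) and their features (subtasks).
function outlineCapabilities(prd: string): CapabilityOutline[] {
  const outline: CapabilityOutline[] = [];
  for (const rawLine of prd.split("\n")) {
    const line = rawLine.trim();
    const capability = /^### Capability:\s*(.+)$/.exec(line);
    const feature = /^#### Feature:\s*(.+)$/.exec(line);
    if (capability) {
      outline.push({ name: capability[1], features: [] });
    } else if (feature && outline.length > 0) {
      outline[outline.length - 1].features.push(feature[1]);
    }
  }
  return outline;
}

// Read "Depends on: [X, Y]" (colon optional) into a list of names.
function parseDependsOn(line: string): string[] {
  const match = /Depends on:?\s*\[(.*)\]/.exec(line);
  if (!match) return [];
  return match[1]
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

const sample = [
  "### Capability: Data Validation",
  "#### Feature: Schema validation",
  "#### Feature: Business rule validation",
].join("\n");

const outline = outlineCapabilities(sample);
const deps = parseDependsOn("- schema-validator: Depends on [base-types, error-handling]");
```

Keeping headings and `Depends on` lists in exactly this form makes the PRD easy to read for both a model and a simple script like the one above.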
|
||||
|
||||
## Why RPG Structure Matters
|
||||
|
||||
Traditional flat PRDs lead to:
|
||||
- ❌ Unclear task dependencies
|
||||
- ❌ Arbitrary task ordering
|
||||
- ❌ Circular dependencies discovered late
|
||||
- ❌ Poorly scoped tasks
|
||||
|
||||
RPG-structured PRDs provide:
|
||||
- ✅ Explicit dependency chains
|
||||
- ✅ Topological execution order
|
||||
- ✅ Clear module boundaries
|
||||
- ✅ Validated task graph before implementation
|
||||
|
||||
## Tips for Best Results
|
||||
|
||||
1. **Spend time on dependency graph** - This is the most valuable section for Task Master
|
||||
2. **Keep features atomic** - Each feature should be independently testable
|
||||
3. **Progressive refinement** - Start broad, use `task-master expand` to break down complex tasks
|
||||
4. **Use research mode** - `task-master parse-prd --research` leverages AI for better task generation
|
||||
</task-master-integration>
|
||||