feat: install logging, error trapping, PXE/ISO integration tests
Some checks failed
CI/CD / lint (pull_request) Failing after 13s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 36s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped

Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Michal
2026-03-26 22:26:33 +00:00
parent ffc4a782d2
commit 46b017d77e
189 changed files with 16241 additions and 432 deletions

12
.env.example Normal file
View File

@@ -0,0 +1,12 @@
# API Keys (Required to enable respective provider)
ANTHROPIC_API_KEY="your_anthropic_api_key_here" # Required: Format: sk-ant-api03-...
PERPLEXITY_API_KEY="your_perplexity_api_key_here" # Optional: Format: pplx-...
OPENAI_API_KEY="your_openai_api_key_here" # Optional, for OpenAI models. Format: sk-proj-...
GOOGLE_API_KEY="your_google_api_key_here" # Optional, for Google Gemini models.
MISTRAL_API_KEY="your_mistral_key_here" # Optional, for Mistral AI models.
XAI_API_KEY="YOUR_XAI_KEY_HERE" # Optional, for xAI AI models.
GROQ_API_KEY="YOUR_GROQ_KEY_HERE" # Optional, for Groq models.
OPENROUTER_API_KEY="YOUR_OPENROUTER_KEY_HERE" # Optional, for OpenRouter models.
AZURE_OPENAI_API_KEY="your_azure_key_here" # Optional, for Azure OpenAI models (requires endpoint in .taskmaster/config.json).
OLLAMA_API_KEY="your_ollama_api_key_here" # Optional: For remote Ollama servers that require authentication.
GITHUB_API_KEY="your_github_api_key_here" # Optional: For GitHub import/export features. Format: ghp_... or github_pat_...

25
.gitignore vendored Normal file
View File

@@ -0,0 +1,25 @@
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
dev-debug.log
# Dependency directories
node_modules/
# Environment variables
.env
# Editor directories and files
.idea
.vscode
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# OS specific
.DS_Store

12
.mcp.json Normal file
View File

@@ -0,0 +1,12 @@
{
"mcpServers": {
"labctl": {
"command": "mcpctl",
"args": [
"mcp",
"-p",
"labctl"
]
}
}
}

View File

@@ -1,22 +1,21 @@
{
"models": {
"main": {
"provider": "anthropic",
"modelId": "claude-sonnet-4-20250514",
"maxTokens": 64000,
"provider": "claude-code",
"modelId": "opus",
"maxTokens": 32000,
"temperature": 0.2
},
"research": {
"provider": "anthropic",
"modelId": "claude-sonnet-4-20250514",
"maxTokens": 64000,
"provider": "claude-code",
"modelId": "opus",
"maxTokens": 32000,
"temperature": 0.2
},
"resolution": "main",
"fallback": {
"provider": "anthropic",
"modelId": "claude-3-7-sonnet-20250219",
"maxTokens": 120000,
"provider": "claude-code",
"modelId": "sonnet",
"maxTokens": 64000,
"temperature": 0.2
}
},

6
.taskmaster/state.json Normal file
View File

@@ -0,0 +1,6 @@
{
"currentTag": "master",
"lastSwitched": "2026-03-18T00:17:54.213Z",
"branchTagMapping": {},
"migrationNoticeShown": true
}

View File

@@ -0,0 +1,180 @@
{
"master": {
"tasks": [
{
"id": 72,
"title": "Expand Prisma Schema with Resource Relationships",
"description": "Add Network, ServerNic, ServerDisk, and ClusterMember models to the Prisma schema. Add bastionId foreign key to Server model to track which bastion owns each server.",
"details": "Edit `bastion/src/labd/prisma/schema.prisma` to add:\n\n1. **Server model changes**:\n - Add `bastionId String?` with relation to Bastion\n - Add `hardwareInfo Json?` for storing raw HardwareInfo\n - Add `os String?` for installed OS\n\n2. **Network model**:\n```prisma\nmodel Network {\n id String @id @default(uuid())\n name String @unique\n cidr String\n vlan Int?\n gateway String?\n domain String?\n dhcpEnabled Boolean @default(false)\n createdAt DateTime @default(now())\n updatedAt DateTime @updatedAt\n \n nics ServerNic[]\n}\n```\n\n3. **ServerNic model**:\n```prisma\nmodel ServerNic {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n networkId String?\n network Network? @relation(fields: [networkId], references: [id])\n mac String\n ip String?\n name String\n state String @default(\"DOWN\")\n \n @@unique([serverId, mac])\n @@index([networkId])\n}\n```\n\n4. **ServerDisk model**:\n```prisma\nmodel ServerDisk {\n id String @id @default(uuid())\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n name String\n sizeGb Float\n model String?\n \n @@unique([serverId, name])\n}\n```\n\n5. **ClusterMember model**:\n```prisma\nmodel ClusterMember {\n id String @id @default(uuid())\n clusterId String\n cluster Cluster @relation(fields: [clusterId], references: [id], onDelete: Cascade)\n serverId String\n server Server @relation(fields: [serverId], references: [id], onDelete: Cascade)\n role String @default(\"worker\") // control-plane, worker\n joinedAt DateTime @default(now())\n \n @@unique([clusterId, serverId])\n @@index([clusterId])\n @@index([serverId])\n}\n```\n\n6. Update Server model with relations to nics, disks, clusterMemberships, and bastion.\n\nRun `pnpm prisma generate` and `pnpm prisma migrate dev --name add-resource-models`.",
"testStrategy": "1. Run `pnpm prisma validate` to verify schema syntax\n2. Run `pnpm prisma generate` to confirm client generation\n3. Create migration and verify it applies cleanly to local CockroachDB\n4. Write unit tests that create/read/delete each new model\n5. Verify cascade deletes work (deleting Server removes its NICs and Disks)",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 73,
"title": "Implement State Persistence Service in labd",
"description": "Create a new service in labd that persists bastion state syncs to the Server table in CockroachDB. When bastion-state-sync messages arrive, upsert machines into Server with their hardware info, status, and ownership.",
"details": "Create `bastion/src/labd/src/services/state-persistence.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\nimport type { BastionState, HardwareInfo, InstallConfig, InstalledInfo } from \"@lab/shared\";\nimport { logger } from \"./logger.js\";\n\nexport class StatePersistence {\n constructor(private readonly db: PrismaClient) {}\n\n async syncBastionState(bastionId: string, state: BastionState): Promise<void> {\n // Process discovered machines\n for (const [mac, hw] of Object.entries(state.discovered)) {\n await this.upsertDiscoveredServer(bastionId, mac, hw);\n }\n \n // Process queued machines (update status to provisioning)\n for (const [mac, cfg] of Object.entries(state.install_queue)) {\n await this.upsertQueuedServer(bastionId, mac, cfg);\n }\n \n // Process installed machines\n for (const [mac, info] of Object.entries(state.installed)) {\n await this.upsertInstalledServer(bastionId, mac, info);\n }\n }\n\n private async upsertDiscoveredServer(bastionId: string, mac: string, hw: HardwareInfo): Promise<void> {\n const normalized = mac.toLowerCase();\n \n await this.db.server.upsert({\n where: { mac: normalized },\n create: {\n hostname: `unknown-${normalized.replace(/:/g, \"\").slice(-6)}`,\n mac: normalized,\n bastionId,\n status: \"discovered\",\n hardwareInfo: hw as any,\n labels: {\n arch: hw.arch,\n cpu_model: hw.cpu_model,\n cpu_cores: hw.cpu_cores,\n memory_gb: hw.memory_gb,\n },\n },\n update: {\n bastionId,\n status: \"discovered\", // only if not already provisioning/installed\n hardwareInfo: hw as any,\n },\n });\n \n // Sync NICs and Disks\n await this.syncServerHardware(normalized, hw);\n }\n \n private async syncServerHardware(mac: string, hw: HardwareInfo): Promise<void> {\n const server = await this.db.server.findUnique({ where: { mac } });\n if (!server) return;\n \n // Upsert NICs\n for (const nic of hw.nics) {\n await this.db.serverNic.upsert({\n where: { serverId_mac: { serverId: server.id, mac: nic.mac.toLowerCase() } },\n create: { serverId: server.id, mac: nic.mac.toLowerCase(), name: nic.name, state: nic.state },\n update: { name: nic.name, state: nic.state },\n });\n }\n \n // Upsert Disks\n for (const disk of hw.disks) {\n await this.db.serverDisk.upsert({\n where: { serverId_name: { serverId: server.id, name: disk.name } },\n create: { serverId: server.id, name: disk.name, sizeGb: disk.size_gb, model: disk.model },\n update: { sizeGb: disk.size_gb, model: disk.model },\n });\n }\n }\n \n // Similar methods for upsertQueuedServer and upsertInstalledServer...\n}\n```\n\nIntegrate into `server.ts` WebSocket handler by calling `statePersistence.syncBastionState()` when `bastion-state-sync` messages arrive.",
"testStrategy": "1. Unit test StatePersistence with mocked PrismaClient\n2. Integration test: simulate bastion-state-sync message, verify Server rows created\n3. Test idempotency: send same state twice, verify no duplicates\n4. Test status transitions: discovered -> provisioning -> installed\n5. Verify hardware info (NICs, Disks) is correctly persisted",
"priority": "high",
"dependencies": [
72
],
"status": "pending",
"subtasks": []
},
{
"id": 74,
"title": "Add State Loading from labd on Bastion Startup",
"description": "Modify bastion startup to request its persisted state from labd before using the local JSON cache. This ensures bastions restore their state after pod restarts.",
"details": "1. Add new labd API endpoint `GET /api/bastions/:id/state` that returns the aggregated state for a specific bastion from the Server table:\n\n```typescript\n// bastion/src/labd/src/routes/bastions.ts\napp.get<{ Params: { id: string } }>(\"/api/bastions/:id/state\", async (request, reply) => {\n const { id } = request.params;\n \n const servers = await db.server.findMany({\n where: { bastionId: id },\n include: { nics: true, disks: true },\n });\n \n // Transform back to BastionState format\n const state: BastionState = { discovered: {}, install_queue: {}, installed: {} };\n for (const server of servers) {\n const mac = server.mac;\n if (!mac) continue;\n \n switch (server.status) {\n case \"discovered\":\n state.discovered[mac] = transformToHardwareInfo(server);\n break;\n case \"provisioning\":\n state.install_queue[mac] = transformToInstallConfig(server);\n break;\n case \"installed\":\n state.installed[mac] = transformToInstalledInfo(server);\n break;\n }\n }\n \n return reply.send(state);\n});\n```\n\n2. Modify `BastionConnection.connect()` in `labd-connection.ts` to fetch state after enrollment:\n\n```typescript\nprivate async loadRemoteState(): Promise<BastionState | null> {\n if (!this.bastionId || !this.config.labdUrl) return null;\n try {\n const resp = await fetch(`${this.config.labdUrl}/api/bastions/${this.bastionId}/state`);\n if (resp.ok) return await resp.json();\n } catch { /* fall back to local */ }\n return null;\n}\n```\n\n3. In bastion `main.ts`, after establishing labd connection, merge remote state with local state (remote takes precedence for installed machines, local wins for in-progress installs).",
"testStrategy": "1. Integration test: start bastion, let it persist state, restart bastion, verify state restored\n2. Test merge logic: local has in-progress install, remote has discovered - verify install preserved\n3. Test offline mode: labd unavailable, bastion falls back to local JSON\n4. Test fresh start: no local state, no remote state - bastion starts with empty state",
"priority": "high",
"dependencies": [
73
],
"status": "pending",
"subtasks": []
},
{
"id": 75,
"title": "Fix Bastion --dir Environment Variable Default",
"description": "Fix the bug where CLI's --dir default overrides the BASTION_DIR environment variable. The CLI option should use the env var as its default.",
"details": "Edit `bastion/src/cli/src/commands/serve.ts`:\n\n```typescript\n// Before (line 14):\n.option(\"--dir <dir>\", \"Bastion data directory\", \"/tmp/lab-bastion\")\n\n// After:\n.option(\n \"--dir <dir>\",\n \"Bastion data directory\",\n process.env[\"BASTION_DIR\"] ?? \"/tmp/lab-bastion\"\n)\n```\n\nThis ensures:\n1. If `BASTION_DIR` env var is set (e.g., in k8s deployment), it's used as default\n2. Explicit `--dir` flag still overrides both\n3. Falls back to `/tmp/lab-bastion` if neither is set\n\nAlso update the k8s deployment manifest `bastion/deploy/k3s/deployment.yaml` to ensure `BASTION_DIR=/data` is properly set.",
"testStrategy": "1. Unit test: verify option default reads from process.env\n2. Integration test: set BASTION_DIR, run labctl without --dir, verify correct dir used\n3. Integration test: set BASTION_DIR, run labctl with --dir /custom, verify /custom used\n4. Test no env var: verify default /tmp/lab-bastion used",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 76,
"title": "Create Resource Type Registry with Aliases",
"description": "Create a centralized resource type registry that maps resource names, plurals, and short aliases to canonical types. This enables kubectl-style resource resolution.",
"details": "Create `bastion/src/cli/src/utils/resources.ts`:\n\n```typescript\nexport interface ResourceDefinition {\n kind: string; // Canonical type: \"Server\", \"Cluster\", etc.\n singular: string; // \"server\"\n plural: string; // \"servers\"\n aliases: string[]; // [\"srv\"]\n apiPath: string; // \"/api/servers\"\n columns: TableColumn[]; // Default columns for 'get' output\n wideColumns?: TableColumn[]; // Extra columns for -o wide\n}\n\nconst RESOURCE_DEFINITIONS: ResourceDefinition[] = [\n {\n kind: \"Server\",\n singular: \"server\",\n plural: \"servers\",\n aliases: [\"srv\"],\n apiPath: \"/api/servers\",\n columns: serverColumns,\n wideColumns: serverWideColumns,\n },\n {\n kind: \"Cluster\",\n singular: \"cluster\",\n plural: \"clusters\",\n aliases: [],\n apiPath: \"/api/clusters\",\n columns: clusterColumns,\n },\n {\n kind: \"Network\",\n singular: \"network\",\n plural: \"networks\",\n aliases: [\"net\"],\n apiPath: \"/api/networks\",\n columns: networkColumns,\n },\n // ... bastion, role, user, token, audit\n];\n\nconst aliasMap = new Map<string, ResourceDefinition>();\nfor (const def of RESOURCE_DEFINITIONS) {\n aliasMap.set(def.singular, def);\n aliasMap.set(def.plural, def);\n for (const alias of def.aliases) {\n aliasMap.set(alias, def);\n }\n}\n\nexport function resolveResourceType(input: string): ResourceDefinition {\n const normalized = input.toLowerCase();\n const def = aliasMap.get(normalized);\n if (!def) {\n const valid = RESOURCE_DEFINITIONS.map(d => d.plural).join(\", \");\n throw new Error(`Unknown resource type \"${input}\". Valid types: ${valid}`);\n }\n return def;\n}\n\nexport function resolveResourceIdentifier(input: string): {\n type: ResourceDefinition;\n name?: string;\n} {\n // Handle \"server/labmaster\" or just \"servers\"\n const parts = input.split(\"/\");\n const type = resolveResourceType(parts[0]);\n const name = parts.length > 1 ? parts.slice(1).join(\"/\") : undefined;\n return { type, name };\n}\n```\n\nUpdate `bastion/src/cli/src/utils/resource.ts` to use the new registry.",
"testStrategy": "1. Unit test resolveResourceType with all aliases: server, servers, srv -> Server\n2. Test unknown resource type throws descriptive error\n3. Test case insensitivity: SERVER, Server, server all resolve correctly\n4. Test resolveResourceIdentifier parses \"server/labmaster\" correctly",
"priority": "high",
"dependencies": [],
"status": "pending",
"subtasks": []
},
{
"id": 77,
"title": "Implement 'labctl get' Command",
"description": "Create the core 'labctl get <resource> [name]' command that lists resources with filtering and output format support. This is the foundation of the kubectl-style CLI.",
"details": "Create `bastion/src/cli/src/commands/get.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType, type ResourceDefinition } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\nimport { formatOutput, type TableColumn } from \"../utils/table.js\";\n\nexport function registerGetCommand(program: Command): void {\n program\n .command(\"get <resource> [name]\")\n .description(\"List resources or get a specific resource by name\")\n .option(\"--status <status>\", \"Filter by status\")\n .option(\"--role <role>\", \"Filter by role (servers only)\")\n .option(\"--cloud <cloud>\", \"Filter by cloud\")\n .option(\"--env <environment>\", \"Filter by environment\")\n .option(\"-l, --label <label>\", \"Filter by label (key=value)\")\n .option(\"-A, --all-namespaces\", \"List across all clouds/environments\")\n .action(async (resource: string, name: string | undefined, opts) => {\n const config = program.opts()[\"_config\"];\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n let data: unknown[];\n \n if (name) {\n // Get specific resource - could be name, ID, or MAC\n const item = await client.getResource(resourceDef, name);\n data = item ? [item] : [];\n } else {\n // List with filters\n data = await client.listResources(resourceDef, {\n status: opts.status,\n role: opts.role,\n cloud: opts.allNamespaces ? undefined : (opts.cloud ?? config.defaultCloud),\n environment: opts.allNamespaces ? undefined : (opts.env ?? config.defaultEnvironment),\n label: opts.label,\n });\n }\n \n if (data.length === 0) {\n console.log(`No ${resourceDef.plural} found.`);\n return;\n }\n \n const columns = config.outputFormat === \"wide\" && resourceDef.wideColumns\n ? [...resourceDef.columns, ...resourceDef.wideColumns]\n : resourceDef.columns;\n \n formatOutput(data, config.outputFormat, columns);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nAdd to `index.ts`: `registerGetCommand(program);`\n\nExtend LabdClient with generic resource methods.",
"testStrategy": "1. Integration test: `labctl get servers` returns list from labd\n2. Test filtering: `labctl get servers --status discovered` only shows discovered\n3. Test name lookup: `labctl get server labmaster` returns single server\n4. Test MAC lookup: `labctl get server 38:05:25:33:e2:e4` resolves by MAC\n5. Test output formats: -o json, -o yaml, -o wide produce correct output\n6. Test unknown resource: `labctl get foo` shows helpful error",
"priority": "high",
"dependencies": [
76
],
"status": "pending",
"subtasks": []
},
{
"id": 78,
"title": "Implement 'labctl describe' Command",
"description": "Create the 'labctl describe <resource> <name>' command that shows detailed information about a resource including relationships, hardware info, and history.",
"details": "Create `bastion/src/cli/src/commands/describe.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nconst BOLD = \"\\x1b[1m\";\nconst DIM = \"\\x1b[2m\";\nconst RESET = \"\\x1b[0m\";\n\ninterface DescribeSection {\n title: string;\n fields: Array<[string, string | undefined]>;\n}\n\nfunction printDescribe(name: string, sections: DescribeSection[]): void {\n console.log(`${BOLD}Name:${RESET} ${name}`);\n for (const section of sections) {\n console.log(`\\n${BOLD}${section.title}:${RESET}`);\n for (const [key, value] of section.fields) {\n if (value !== undefined) {\n console.log(` ${DIM}${key}:${RESET} ${value}`);\n }\n }\n }\n}\n\nexport function registerDescribeCommand(program: Command): void {\n program\n .command(\"describe <resource> <name>\")\n .description(\"Show detailed information about a resource\")\n .action(async (resource: string, name: string) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n try {\n const item = await client.describeResource(resourceDef, name);\n if (!item) {\n console.error(`${resourceDef.singular} \"${name}\" not found.`);\n process.exit(1);\n }\n \n // Resource-specific formatting\n switch (resourceDef.kind) {\n case \"Server\":\n printServerDescription(item);\n break;\n case \"Cluster\":\n printClusterDescription(item);\n break;\n default:\n console.log(JSON.stringify(item, null, 2));\n }\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n\nfunction printServerDescription(server: any): void {\n const sections: DescribeSection[] = [\n {\n title: \"Metadata\",\n fields: [\n [\"ID\", server.id],\n [\"Cloud\", server.cloud],\n [\"Environment\", server.environment],\n [\"Role\", server.role],\n [\"Status\", server.status],\n [\"Created\", server.createdAt],\n [\"Last Seen\", server.lastHeartbeat],\n ],\n },\n {\n title: \"Hardware\",\n fields: [\n [\"MAC\", server.mac],\n [\"IP\", server.ip],\n [\"Architecture\", server.hardwareInfo?.arch],\n [\"CPU\", server.hardwareInfo?.cpu_model],\n [\"Cores\", String(server.hardwareInfo?.cpu_cores)],\n [\"Memory\", `${server.hardwareInfo?.memory_gb}GB`],\n [\"Product\", server.hardwareInfo?.product],\n ],\n },\n ];\n \n if (server.nics?.length > 0) {\n sections.push({\n title: \"Network Interfaces\",\n fields: server.nics.map((n: any) => [n.name, `${n.mac} ${n.ip ?? \"\"} (${n.state})`]),\n });\n }\n \n if (server.disks?.length > 0) {\n sections.push({\n title: \"Disks\",\n fields: server.disks.map((d: any) => [d.name, `${d.sizeGb}GB ${d.model ?? \"\"}`]),\n });\n }\n \n if (server.clusterMemberships?.length > 0) {\n sections.push({\n title: \"Cluster Membership\",\n fields: server.clusterMemberships.map((m: any) => [m.cluster.name, m.role]),\n });\n }\n \n printDescribe(server.hostname, sections);\n}\n```",
"testStrategy": "1. Integration test: `labctl describe server labmaster` shows full details\n2. Test hardware info display: CPU, memory, disks, NICs all shown\n3. Test cluster membership: server in cluster shows membership section\n4. Test not found: `labctl describe server nonexistent` shows helpful error\n5. Test different resource types: describe cluster, network, bastion",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 79,
"title": "Implement 'labctl create/delete' Commands",
"description": "Create the 'labctl create <resource>' and 'labctl delete <resource> <name>' commands for creating and removing resources like networks, clusters, and tokens.",
"details": "Create `bastion/src/cli/src/commands/create.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { resolveResourceType } from \"../utils/resources.js\";\nimport { getLabdClient } from \"../api/config.js\";\n\nexport function registerCreateCommand(program: Command): void {\n const create = program\n .command(\"create <resource>\")\n .description(\"Create a resource\");\n \n // labctl create network --name lab --cidr 192.168.8.0/24\n create\n .command(\"network\")\n .description(\"Create a network\")\n .requiredOption(\"--name <name>\", \"Network name\")\n .requiredOption(\"--cidr <cidr>\", \"Network CIDR (e.g., 192.168.8.0/24)\")\n .option(\"--gateway <gateway>\", \"Gateway IP\")\n .option(\"--vlan <vlan>\", \"VLAN ID\", parseInt)\n .option(\"--domain <domain>\", \"DNS domain\")\n .option(\"--dhcp\", \"Enable DHCP\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const network = await client.createNetwork({\n name: opts.name,\n cidr: opts.cidr,\n gateway: opts.gateway,\n vlan: opts.vlan,\n domain: opts.domain,\n dhcpEnabled: opts.dhcp ?? false,\n });\n console.log(`network/${network.name} created`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n \n // labctl create token --label \"worker enrollment\" --type reusable\n create\n .command(\"token\")\n .description(\"Create a join token\")\n .option(\"--label <label>\", \"Token label/description\")\n .option(\"--type <type>\", \"Token type: one-time or reusable\", \"one-time\")\n .option(\"--expires <duration>\", \"Expiration (e.g., 24h, 7d)\")\n .action(async (opts) => {\n const client = getLabdClient();\n try {\n const token = await client.createToken(opts);\n console.log(`Token created: ${token.token}`);\n if (opts.label) console.log(`Label: ${opts.label}`);\n if (token.expiresAt) console.log(`Expires: ${token.expiresAt}`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nCreate `bastion/src/cli/src/commands/delete.ts`:\n\n```typescript\nexport function registerDeleteCommand(program: Command): void {\n program\n .command(\"delete <resource> <name>\")\n .description(\"Delete a resource\")\n .option(\"--force\", \"Skip confirmation\")\n .action(async (resource: string, name: string, opts) => {\n const resourceDef = resolveResourceType(resource);\n const client = getLabdClient();\n \n if (!opts.force) {\n const { confirm } = await import(\"../utils/prompts.js\");\n const yes = await confirm(`Delete ${resourceDef.singular} \"${name}\"?`);\n if (!yes) {\n console.log(\"Cancelled.\");\n return;\n }\n }\n \n try {\n await client.deleteResource(resourceDef, name);\n console.log(`${resourceDef.singular}/${name} deleted`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```",
"testStrategy": "1. Integration test: `labctl create network` creates network in DB\n2. Test validation: missing required flags shows helpful error\n3. Test token creation: token returned is valid UUID, stored in DB\n4. Test delete with confirmation: prompts user, respects --force\n5. Test delete cascade: deleting server removes NICs, disks\n6. Test delete protection: cannot delete bastion with connected servers",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 80,
"title": "Refactor Provision Commands to kubectl-style",
"description": "Refactor existing provision commands to use kubectl-style syntax: 'labctl provision <server>' instead of 'labctl provision install <mac>'.",
"details": "The new command structure should be:\n- `labctl provision <server> --os fedora-43 --role worker` (queue install)\n- `labctl reprovision <server>` (reinstall)\n- `labctl forget <server>` (remove from tracking)\n\nModify `bastion/src/cli/src/commands/install.ts` → rename to `provision.ts`:\n\n```typescript\nexport function registerProvisionCommand(program: Command): void {\n program\n .command(\"provision <server>\")\n .description(\"Queue a server for OS installation\")\n .requiredOption(\"--os <os>\", \"Operating system\", \"fedora-43\")\n .requiredOption(\"--role <role>\", \"Server role\", \"worker\")\n .option(\"--disk <disk>\", \"Target disk (auto-detected if not specified)\")\n .option(\"--hostname <hostname>\", \"Override hostname\")\n .action(async (server: string, opts) => {\n const client = getLabdClient();\n \n // Resolve server: could be hostname, MAC, or ID\n const resolved = await client.resolveServer(server);\n if (!resolved) {\n console.error(`Server \"${server}\" not found.`);\n console.error(\"Tip: Use 'labctl get servers' to see available servers.\");\n process.exit(1);\n }\n \n if (resolved.status === \"installed\") {\n console.error(`Server \"${resolved.hostname}\" is already installed.`);\n console.error(\"Tip: Use 'labctl reprovision' to reinstall.\");\n process.exit(1);\n }\n \n try {\n await client.provisionServer(resolved.mac, {\n hostname: opts.hostname ?? resolved.hostname,\n os: opts.os,\n role: opts.role,\n disk: opts.disk,\n });\n console.log(`Server ${resolved.hostname} queued for ${opts.os} installation as ${opts.role}.`);\n } catch (err) {\n console.error(`Error: ${err instanceof Error ? err.message : String(err)}`);\n process.exit(1);\n }\n });\n}\n```\n\nSimilarly update reprovision.ts and forget.ts to accept server name/MAC/ID.\n\nUpdate index.ts to register commands at top level instead of under 'provision' subcommand.",
"testStrategy": "1. Test server resolution: provision by hostname, MAC, or UUID all work\n2. Test already installed: provisioning installed server shows reprovision hint\n3. Test unknown server: helpful error message with tip\n4. Test reprovision: reinstalls installed server\n5. Test forget: removes server from all state categories\n6. Backward compat: verify 'labctl provision list' still works (deprecation warning)",
"priority": "medium",
"dependencies": [
77
],
"status": "pending",
"subtasks": []
},
{
"id": 81,
"title": "Implement Server and Resource API Endpoints in labd",
"description": "Add REST API endpoints in labd for full resource CRUD operations: networks, clusters, tokens. Extend servers endpoint with filters and relationship includes.",
"details": "Create/extend labd route files:\n\n1. **Extend servers.ts**:\n```typescript\n// GET /api/servers - with extended filters and includes\napp.get(\"/api/servers\", async (request, reply) => {\n const { status, role, cloud, environment, label, include } = request.query;\n \n const where = {};\n if (status) where.status = status;\n if (role) where.role = role;\n if (cloud) where.cloud = cloud;\n if (environment) where.environment = environment;\n if (label) where.labels = { path: [labelKey], equals: labelValue };\n \n const servers = await db.server.findMany({\n where,\n include: {\n nics: include?.includes(\"nics\"),\n disks: include?.includes(\"disks\"),\n clusterMemberships: include?.includes(\"clusters\") ? { include: { cluster: true } } : false,\n bastion: include?.includes(\"bastion\"),\n },\n });\n return servers;\n});\n\n// GET /api/servers/:id - by ID, hostname, or MAC\napp.get(\"/api/servers/:identifier\", async (request, reply) => {\n const { identifier } = request.params;\n \n // Try UUID first\n let server = await db.server.findUnique({ where: { id: identifier }, include: fullInclude });\n // Try hostname\n if (!server) server = await db.server.findUnique({ where: { hostname: identifier }, include: fullInclude });\n // Try MAC\n if (!server) server = await db.server.findUnique({ where: { mac: identifier.toLowerCase() }, include: fullInclude });\n \n if (!server) return reply.code(404).send({ error: \"Server not found\" });\n return server;\n});\n```\n\n2. **Create networks.ts**:\n```typescript\n// GET /api/networks, POST /api/networks, DELETE /api/networks/:id\nexport function registerNetworkRoutes(app: FastifyInstance, db: DbClient): void {\n app.get(\"/api/networks\", async () => db.network.findMany());\n \n app.post(\"/api/networks\", async (request, reply) => {\n const { name, cidr, gateway, vlan, domain, dhcpEnabled } = request.body;\n // Validate CIDR format\n const network = await db.network.create({ data: { name, cidr, gateway, vlan, domain, dhcpEnabled } });\n return reply.code(201).send(network);\n });\n \n app.delete(\"/api/networks/:id\", async (request, reply) => {\n await db.network.delete({ where: { id: request.params.id } });\n return reply.code(204).send();\n });\n}\n```\n\n3. **Create clusters.ts**:\n```typescript\n// Similar CRUD for clusters with member management\napp.get(\"/api/clusters/:id/members\", ...);\napp.post(\"/api/clusters/:id/members\", ...);\napp.delete(\"/api/clusters/:id/members/:serverId\", ...);\n```",
"testStrategy": "1. Integration test all CRUD endpoints with HTTP client\n2. Test server resolution: by id, hostname, and MAC all return same server\n3. Test include parameter: nics, disks, clusters included when requested\n4. Test validation: invalid CIDR rejected, duplicate names rejected\n5. Test cascade: delete network with NICs fails or cascades appropriately",
"priority": "medium",
"dependencies": [
72,
73
],
"status": "pending",
"subtasks": []
},
{
"id": 82,
"title": "Implement RBAC Permission Checks in CLI",
"description": "Wire RBAC permission checks into CLI commands. Check user permissions before executing operations using the existing Permission model.",
"details": "1. Create `bastion/src/cli/src/middleware/rbac.ts`:\n\n```typescript\nimport { getLabdClient } from \"../api/config.js\";\n\nexport interface PermissionContext {\n action: string; // read, exec, apply, destroy, manage, admin\n cloud?: string;\n environment?: string;\n server?: string;\n}\n\nexport async function checkPermission(ctx: PermissionContext): Promise<boolean> {\n const client = getLabdClient();\n try {\n const result = await client.checkPermission(ctx);\n return result.allowed;\n } catch {\n // If can't reach labd, fail open for local operations\n return true;\n }\n}\n\nexport async function requirePermission(ctx: PermissionContext): Promise<void> {\n const allowed = await checkPermission(ctx);\n if (!allowed) {\n throw new Error(\n `Permission denied: ${ctx.action} on ${ctx.server ?? \"*\"}@${ctx.cloud ?? \"*\"}/${ctx.environment ?? \"*\"}`\n );\n }\n}\n```\n\n2. Add labd endpoint `POST /api/auth/check-permission`:\n```typescript\napp.post(\"/api/auth/check-permission\", async (request, reply) => {\n const user = await authenticateRequest(request); // from cert or token\n const { action, cloud, environment, server } = request.body;\n \n const permissions = await db.permission.findMany({\n where: {\n role: { userBindings: { some: { userId: user.id } } },\n },\n });\n \n const allowed = permissions.some(p => \n matchesPattern(p.action, action) &&\n matchesPattern(p.cloud, cloud ?? \"*\") &&\n matchesPattern(p.environment, environment ?? \"*\") &&\n matchesPattern(p.server, server ?? \"*\")\n );\n \n return { allowed };\n});\n```\n\n3. Integrate into commands:\n```typescript\n// In provision command\nawait requirePermission({ action: \"apply\", cloud, environment, server: resolved.hostname });\n\n// In delete command\nawait requirePermission({ action: \"destroy\", cloud, environment, server: name });\n\n// In get command (filter results)\nconst servers = await client.listServers(filters);\nconst visible = await filterByPermission(servers, \"read\");\n```",
"testStrategy": "1. Unit test permission matching logic with wildcards\n2. Test admin role: has access to all resources\n3. Test operator role: can read/exec but not destroy\n4. Test viewer role: can only read, provision denied\n5. Test scope matching: permission for cloud=aws doesn't grant access to cloud=baremetal\n6. Test denied action is audit-logged",
"priority": "medium",
"dependencies": [
77,
81
],
"status": "pending",
"subtasks": []
},
{
"id": 83,
"title": "Implement Audit Logging for Resource Operations",
"description": "Log all resource mutations to the AuditLog table. Include user, action, resource type/name, result, and source IP.",
"details": "1. Create `bastion/src/labd/src/services/audit.ts`:\n\n```typescript\nimport type { PrismaClient } from \"@prisma/client\";\n\nexport interface AuditEntry {\n userId?: string;\n serverId?: string;\n sessionId?: string;\n action: string; // create, update, delete, provision, exec, rbac-denied\n resourceType: string; // server, cluster, network, token, etc.\n resourceName: string;\n args?: string; // sanitized args (no secrets)\n result: \"success\" | \"denied\" | \"error\";\n durationMs?: number;\n sourceIp?: string;\n}\n\nexport class AuditService {\n constructor(private readonly db: PrismaClient) {}\n \n async log(entry: AuditEntry): Promise<void> {\n await this.db.auditLog.create({\n data: {\n userId: entry.userId,\n serverId: entry.serverId,\n sessionId: entry.sessionId,\n action: entry.action,\n resourceType: entry.resourceType,\n resourceName: entry.resourceName,\n args: entry.args,\n result: entry.result,\n durationMs: entry.durationMs,\n sourceIp: entry.sourceIp,\n },\n });\n }\n \n async query(filters: {\n userId?: string;\n action?: string;\n resourceType?: string;\n since?: Date;\n limit?: number;\n }): Promise<AuditEntry[]> {\n return this.db.auditLog.findMany({\n where: {\n userId: filters.userId,\n action: filters.action,\n resourceType: filters.resourceType,\n timestamp: filters.since ? { gte: filters.since } : undefined,\n },\n orderBy: { timestamp: \"desc\" },\n take: filters.limit ?? 100,\n });\n }\n}\n```\n\n2. Add Fastify hook to wrap route handlers:\n```typescript\napp.addHook(\"onResponse\", async (request, reply) => {\n // Log mutations (POST, PUT, DELETE)\n if ([\"POST\", \"PUT\", \"DELETE\"].includes(request.method)) {\n const path = request.url;\n const resourceMatch = path.match(/\\/api\\/(\\w+)(?:\\/([^/]+))?/);\n if (resourceMatch) {\n await auditService.log({\n action: methodToAction(request.method),\n resourceType: resourceMatch[1],\n resourceName: resourceMatch[2] ?? \"\",\n result: reply.statusCode < 400 ? \"success\" : \"error\",\n sourceIp: request.ip,\n });\n }\n }\n});\n```\n\n3. Add `labctl get audit` command to view audit logs.",
"testStrategy": "1. Integration test: create network, verify audit log entry created\n2. Test RBAC denial is logged with result=denied\n3. Test sensitive data sanitization: tokens/passwords not in args\n4. Test query filters: by user, action, resourceType, time range\n5. Test `labctl get audit` displays recent entries correctly",
"priority": "medium",
"dependencies": [
81,
82
],
"status": "pending",
"subtasks": []
},
{
"id": 84,
"title": "Update CLI Entry Point and Help Text",
"description": "Update the CLI entry point to register all new commands and update help text to reflect the kubectl-style interface. Add deprecation warnings for old command structure.",
"details": "Update `bastion/src/cli/src/index.ts`:\n\n```typescript\nimport { Command } from \"commander\";\nimport { APP_VERSION } from \"@lab/shared\";\nimport { loadConfig } from \"./config/index.js\";\n\n// New kubectl-style commands\nimport { registerGetCommand } from \"./commands/get.js\";\nimport { registerDescribeCommand } from \"./commands/describe.js\";\nimport { registerCreateCommand } from \"./commands/create.js\";\nimport { registerDeleteCommand } from \"./commands/delete.js\";\nimport { registerApplyCommand } from \"./commands/apply.js\";\nimport { registerEditCommand } from \"./commands/edit.js\";\n\n// Action commands\nimport { registerProvisionCommand } from \"./commands/provision.js\";\nimport { registerReprovisionCommand } from \"./commands/reprovision.js\";\nimport { registerForgetCommand } from \"./commands/forget.js\";\n\n// Bastion management\nimport { registerBastionCommand } from \"./commands/bastion.js\"; // start/stop/status\n\n// App management (unchanged)\nimport { registerAppCommand } from \"./commands/app.js\";\n\n// Utility\nimport { registerConfigCommand } from \"./commands/config.js\";\nimport { registerLoginCommand } from \"./commands/login.js\";\nimport { registerDoctorCommand } from \"./commands/doctor.js\";\n\nexport function createProgram(): Command {\n const program = new Command();\n \n program\n .name(\"labctl\")\n .description(\"Lab infrastructure management CLI\")\n .version(APP_VERSION);\n \n // Global options\n program\n .option(\"-o, --output <format>\", \"output format (table, json, yaml, wide)\", \"table\")\n .option(\"--server <url>\", \"override labd server URL\")\n .option(\"--env <name>\", \"override default environment\")\n .option(\"--cloud <name>\", \"override default cloud\")\n .option(\"--debug\", \"enable debug output\")\n .option(\"--no-color\", \"disable colored output\");\n \n // Core CRUD commands\n registerGetCommand(program); // labctl get <resource> [name]\n registerDescribeCommand(program); // labctl describe <resource> <name>\n registerCreateCommand(program); // labctl create <resource>\n registerDeleteCommand(program); // labctl delete <resource> <name>\n registerApplyCommand(program); // labctl apply -f <file>\n registerEditCommand(program); // labctl edit <resource> <name>\n \n // Provisioning actions\n registerProvisionCommand(program); // labctl provision <server>\n registerReprovisionCommand(program);// labctl reprovision <server>\n registerForgetCommand(program); // labctl forget <server>\n \n // Bastion management\n registerBastionCommand(program); // labctl bastion start|stop|status\n \n // App management\n registerAppCommand(program); // labctl app install|health k3s\n \n // Utility\n registerConfigCommand(program);\n registerLoginCommand(program);\n registerDoctorCommand(program);\n \n // Legacy compatibility with deprecation warnings\n registerLegacyCommands(program);\n \n return program;\n}\n\nfunction registerLegacyCommands(program: Command): void {\n // labctl provision list -> labctl get servers (with warning)\n program\n .command(\"provision\")\n .command(\"list\")\n .action(() => {\n console.warn(\"DEPRECATED: Use 'labctl get servers' instead.\");\n // Delegate to get servers\n });\n}\n```\n\nUpdate shell completions in `scripts/generate-completions.ts` for new command structure.",
"testStrategy": "1. Test --help shows all new commands with descriptions\n2. Test resource type help: `labctl get --help` lists valid resources\n3. Test deprecated commands show warning but still work\n4. Test shell completions generated for new commands\n5. Test global options: -o, --server, --env, --cloud all work",
"priority": "low",
"dependencies": [
77,
78,
79,
80
],
"status": "pending",
"subtasks": []
}
],
"metadata": {
"created": "2026-03-26T04:26:49.813Z",
"updated": "2026-03-26T04:26:49.813Z",
"description": "Tasks for master context"
}
}
}

View File

@@ -0,0 +1,47 @@
<context>
# Overview
[Provide a high-level overview of your product here. Explain what problem it solves, who it's for, and why it's valuable.]
# Core Features
[List and describe the main features of your product. For each feature, include:
- What it does
- Why it's important
- How it works at a high level]
# User Experience
[Describe the user journey and experience. Include:
- User personas
- Key user flows
- UI/UX considerations]
</context>
<PRD>
# Technical Architecture
[Outline the technical implementation details:
- System components
- Data models
- APIs and integrations
- Infrastructure requirements]
# Development Roadmap
[Break down the development process into phases:
- MVP requirements
- Future enhancements
- Do not think about timelines whatsoever -- all that matters is scope and detailing exactly what needs to be build in each phase so it can later be cut up into tasks]
# Logical Dependency Chain
[Define the logical order of development:
- Which features need to be built first (foundation)
- Getting as quickly as possible to something usable/visible front end that works
- Properly pacing and scoping each feature so it is atomic but can also be built upon and improved as development approaches]
# Risks and Mitigations
[Identify potential risks and how they'll be addressed:
- Technical challenges
- Figuring out the MVP that we can build upon
- Resource constraints]
# Appendix
[Include any additional information:
- Research findings
- Technical specifications]
</PRD>

View File

@@ -0,0 +1,511 @@
<rpg-method>
# Repository Planning Graph (RPG) Method - PRD Template
This template teaches you (AI or human) how to create structured, dependency-aware PRDs using the RPG methodology from Microsoft Research. The key insight: separate WHAT (functional) from HOW (structural), then connect them with explicit dependencies.
## Core Principles
1. **Dual-Semantics**: Think functional (capabilities) AND structural (code organization) separately, then map them
2. **Explicit Dependencies**: Never assume - always state what depends on what
3. **Topological Order**: Build foundation first, then layers on top
4. **Progressive Refinement**: Start broad, refine iteratively
## How to Use This Template
- Follow the instructions in each `<instruction>` block
- Look at `<example>` blocks to see good vs bad patterns
- Fill in the content sections with your project details
- The AI reading this will learn the RPG method by following along
- Task Master will parse the resulting PRD into dependency-aware tasks
## Recommended Tools for Creating PRDs
When using this template to **create** a PRD (not parse it), use **code-context-aware AI assistants** for best results:
**Why?** The AI needs to understand your existing codebase to make good architectural decisions about modules, dependencies, and integration points.
**Recommended tools:**
- **Claude Code** (claude-code CLI) - Best for structured reasoning and large contexts
- **Cursor/Windsurf** - IDE integration with full codebase context
- **Gemini CLI** (gemini-cli) - Massive context window for large codebases
- **Codex/Grok CLI** - Strong code generation with context awareness
**Note:** Once your PRD is created, `task-master parse-prd` works with any configured AI model - it just needs to read the PRD text itself, not your codebase.
</rpg-method>
---
<overview>
<instruction>
Start with the problem, not the solution. Be specific about:
- What pain point exists?
- Who experiences it?
- Why existing solutions don't work?
- What success looks like (measurable outcomes)?
Keep this section focused - don't jump into implementation details yet.
</instruction>
## Problem Statement
[Describe the core problem. Be concrete about user pain points.]
## Target Users
[Define personas, their workflows, and what they're trying to achieve.]
## Success Metrics
[Quantifiable outcomes. Examples: "80% task completion via autopilot", "< 5% manual intervention rate"]
</overview>
---
<functional-decomposition>
<instruction>
Now think about CAPABILITIES (what the system DOES), not code structure yet.
Step 1: Identify high-level capability domains
- Think: "What major things does this system do?"
- Examples: Data Management, Core Processing, Presentation Layer
Step 2: For each capability, enumerate specific features
- Use explore-exploit strategy:
* Exploit: What features are REQUIRED for core value?
* Explore: What features make this domain COMPLETE?
Step 3: For each feature, define:
- Description: What it does in one sentence
- Inputs: What data/context it needs
- Outputs: What it produces/returns
- Behavior: Key logic or transformations
<example type="good">
Capability: Data Validation
Feature: Schema validation
- Description: Validate JSON payloads against defined schemas
- Inputs: JSON object, schema definition
- Outputs: Validation result (pass/fail) + error details
- Behavior: Iterate fields, check types, enforce constraints
Feature: Business rule validation
- Description: Apply domain-specific validation rules
- Inputs: Validated data object, rule set
- Outputs: Boolean + list of violated rules
- Behavior: Execute rules sequentially, short-circuit on failure
</example>
<example type="bad">
Capability: validation.js
(Problem: This is a FILE, not a CAPABILITY. Mixing structure into functional thinking.)
Capability: Validation
Feature: Make sure data is good
(Problem: Too vague. No inputs/outputs. Not actionable.)
</example>
</instruction>
## Capability Tree
### Capability: [Name]
[Brief description of what this capability domain covers]
#### Feature: [Name]
- **Description**: [One sentence]
- **Inputs**: [What it needs]
- **Outputs**: [What it produces]
- **Behavior**: [Key logic]
#### Feature: [Name]
- **Description**:
- **Inputs**:
- **Outputs**:
- **Behavior**:
### Capability: [Name]
...
</functional-decomposition>
---
<structural-decomposition>
<instruction>
NOW think about code organization. Map capabilities to actual file/folder structure.
Rules:
1. Each capability maps to a module (folder or file)
2. Features within a capability map to functions/classes
3. Use clear module boundaries - each module has ONE responsibility
4. Define what each module exports (public interface)
The goal: Create a clear mapping between "what it does" (functional) and "where it lives" (structural).
<example type="good">
Capability: Data Validation
→ Maps to: src/validation/
├── schema-validator.js (Schema validation feature)
├── rule-validator.js (Business rule validation feature)
└── index.js (Public exports)
Exports:
- validateSchema(data, schema)
- validateRules(data, rules)
</example>
<example type="bad">
Capability: Data Validation
→ Maps to: src/utils.js
(Problem: "utils" is not a clear module boundary. Where do I find validation logic?)
Capability: Data Validation
→ Maps to: src/validation/everything.js
(Problem: One giant file. Features should map to separate files for maintainability.)
</example>
</instruction>
## Repository Structure
```
project-root/
├── src/
│ ├── [module-name]/ # Maps to: [Capability Name]
│ │ ├── [file].js # Maps to: [Feature Name]
│ │ └── index.js # Public exports
│ └── [module-name]/
├── tests/
└── docs/
```
## Module Definitions
### Module: [Name]
- **Maps to capability**: [Capability from functional decomposition]
- **Responsibility**: [Single clear purpose]
- **File structure**:
```
module-name/
├── feature1.js
├── feature2.js
└── index.js
```
- **Exports**:
- `functionName()` - [what it does]
- `ClassName` - [what it does]
</structural-decomposition>
---
<dependency-graph>
<instruction>
This is THE CRITICAL SECTION for Task Master parsing.
Define explicit dependencies between modules. This creates the topological order for task execution.
Rules:
1. List modules in dependency order (foundation first)
2. For each module, state what it depends on
3. Foundation modules should have NO dependencies
4. Every non-foundation module should depend on at least one other module
5. Think: "What must EXIST before I can build this module?"
<example type="good">
Foundation Layer (no dependencies):
- error-handling: No dependencies
- config-manager: No dependencies
- base-types: No dependencies
Data Layer:
- schema-validator: Depends on [base-types, error-handling]
- data-ingestion: Depends on [schema-validator, config-manager]
Core Layer:
- algorithm-engine: Depends on [base-types, error-handling]
- pipeline-orchestrator: Depends on [algorithm-engine, data-ingestion]
</example>
<example type="bad">
- validation: Depends on API
- API: Depends on validation
(Problem: Circular dependency. This will cause build/runtime issues.)
- user-auth: Depends on everything
(Problem: Too many dependencies. Should be more focused.)
</example>
</instruction>
## Dependency Chain
### Foundation Layer (Phase 0)
No dependencies - these are built first.
- **[Module Name]**: [What it provides]
- **[Module Name]**: [What it provides]
### [Layer Name] (Phase 1)
- **[Module Name]**: Depends on [[module-from-phase-0], [module-from-phase-0]]
- **[Module Name]**: Depends on [[module-from-phase-0]]
### [Layer Name] (Phase 2)
- **[Module Name]**: Depends on [[module-from-phase-1], [module-from-foundation]]
[Continue building up layers...]
</dependency-graph>
---
<implementation-roadmap>
<instruction>
Turn the dependency graph into concrete development phases.
Each phase should:
1. Have clear entry criteria (what must exist before starting)
2. Contain tasks that can be parallelized (no inter-dependencies within phase)
3. Have clear exit criteria (how do we know phase is complete?)
4. Build toward something USABLE (not just infrastructure)
Phase ordering follows topological sort of dependency graph.
<example type="good">
Phase 0: Foundation
Entry: Clean repository
Tasks:
- Implement error handling utilities
- Create base type definitions
- Setup configuration system
Exit: Other modules can import foundation without errors
Phase 1: Data Layer
Entry: Phase 0 complete
Tasks:
- Implement schema validator (uses: base types, error handling)
- Build data ingestion pipeline (uses: validator, config)
Exit: End-to-end data flow from input to validated output
</example>
<example type="bad">
Phase 1: Build Everything
Tasks:
- API
- Database
- UI
- Tests
(Problem: No clear focus. Too broad. Dependencies not considered.)
</example>
</instruction>
## Development Phases
### Phase 0: [Foundation Name]
**Goal**: [What foundational capability this establishes]
**Entry Criteria**: [What must be true before starting]
**Tasks**:
- [ ] [Task name] (depends on: [none or list])
- Acceptance criteria: [How we know it's done]
- Test strategy: [What tests prove it works]
- [ ] [Task name] (depends on: [none or list])
**Exit Criteria**: [Observable outcome that proves phase complete]
**Delivers**: [What can users/developers do after this phase?]
---
### Phase 1: [Layer Name]
**Goal**:
**Entry Criteria**: Phase 0 complete
**Tasks**:
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
**Exit Criteria**:
**Delivers**:
---
[Continue with more phases...]
</implementation-roadmap>
---
<test-strategy>
<instruction>
Define how testing will be integrated throughout development (TDD approach).
Specify:
1. Test pyramid ratios (unit vs integration vs e2e)
2. Coverage requirements
3. Critical test scenarios
4. Test generation guidelines for Surgical Test Generator
This section guides the AI when generating tests during the RED phase of TDD.
<example type="good">
Critical Test Scenarios for Data Validation module:
- Happy path: Valid data passes all checks
- Edge cases: Empty strings, null values, boundary numbers
- Error cases: Invalid types, missing required fields
- Integration: Validator works with ingestion pipeline
</example>
</instruction>
## Test Pyramid
```
/\
/E2E\ ← [X]% (End-to-end, slow, comprehensive)
/------\
/Integration\ ← [Y]% (Module interactions)
/------------\
/ Unit Tests \ ← [Z]% (Fast, isolated, deterministic)
/----------------\
```
## Coverage Requirements
- Line coverage: [X]% minimum
- Branch coverage: [X]% minimum
- Function coverage: [X]% minimum
- Statement coverage: [X]% minimum
## Critical Test Scenarios
### [Module/Feature Name]
**Happy path**:
- [Scenario description]
- Expected: [What should happen]
**Edge cases**:
- [Scenario description]
- Expected: [What should happen]
**Error cases**:
- [Scenario description]
- Expected: [How system handles failure]
**Integration points**:
- [What interactions to test]
- Expected: [End-to-end behavior]
## Test Generation Guidelines
[Specific instructions for Surgical Test Generator about what to focus on, what patterns to follow, project-specific test conventions]
</test-strategy>
---
<architecture>
<instruction>
Describe technical architecture, data models, and key design decisions.
Keep this section AFTER functional/structural decomposition - implementation details come after understanding structure.
</instruction>
## System Components
[Major architectural pieces and their responsibilities]
## Data Models
[Core data structures, schemas, database design]
## Technology Stack
[Languages, frameworks, key libraries]
**Decision: [Technology/Pattern]**
- **Rationale**: [Why chosen]
- **Trade-offs**: [What we're giving up]
- **Alternatives considered**: [What else we looked at]
</architecture>
---
<risks>
<instruction>
Identify risks that could derail development and how to mitigate them.
Categories:
- Technical risks (complexity, unknowns)
- Dependency risks (blocking issues)
- Scope risks (creep, underestimation)
</instruction>
## Technical Risks
**Risk**: [Description]
- **Impact**: [High/Medium/Low - effect on project]
- **Likelihood**: [High/Medium/Low]
- **Mitigation**: [How to address]
- **Fallback**: [Plan B if mitigation fails]
## Dependency Risks
[External dependencies, blocking issues]
## Scope Risks
[Scope creep, underestimation, unclear requirements]
</risks>
---
<appendix>
## References
[Papers, documentation, similar systems]
## Glossary
[Domain-specific terms]
## Open Questions
[Things to resolve during development]
</appendix>
---
<task-master-integration>
# How Task Master Uses This PRD
When you run `task-master parse-prd <file>.txt`, the parser:
1. **Extracts capabilities** → Main tasks
- Each `### Capability:` becomes a top-level task
2. **Extracts features** → Subtasks
- Each `#### Feature:` becomes a subtask under its capability
3. **Parses dependencies** → Task dependencies
- `Depends on: [X, Y]` sets task.dependencies = ["X", "Y"]
4. **Orders by phases** → Task priorities
- Phase 0 tasks = highest priority
- Phase N tasks = lower priority, properly sequenced
5. **Uses test strategy** → Test generation context
- Feeds test scenarios to Surgical Test Generator during implementation
**Result**: A dependency-aware task graph that can be executed in topological order.
## Why RPG Structure Matters
Traditional flat PRDs lead to:
- ❌ Unclear task dependencies
- ❌ Arbitrary task ordering
- ❌ Circular dependencies discovered late
- ❌ Poorly scoped tasks
RPG-structured PRDs provide:
- ✅ Explicit dependency chains
- ✅ Topological execution order
- ✅ Clear module boundaries
- ✅ Validated task graph before implementation
## Tips for Best Results
1. **Spend time on dependency graph** - This is the most valuable section for Task Master
2. **Keep features atomic** - Each feature should be independently testable
3. **Progressive refinement** - Start broad, use `task-master expand` to break down complex tasks
4. **Use research mode** - `task-master parse-prd --research` leverages AI for better task generation
</task-master-integration>

244
STATUS.md Normal file
View File

@@ -0,0 +1,244 @@
# labctl Platform — Implementation Status
## What This Document Is
An honest assessment of what code exists, what works, what is stubbed, and what
hasn't been started — measured against the PRD phases.
---
## Architecture Overview (as built)
```
labctl CLI ──HTTP──▶ bastion (PXE server) ← WORKING
labctl CLI ──HTTP──▶ labd (master daemon) ← PARTIALLY WORKING
├── CockroachDB/Prisma ← SCHEMA DEFINED, NOT DEPLOYED
├── /ws/agent WebSocket ← ACCEPTS CONNECTIONS, DOES NOT ROUTE
└── mTLS CA ← NOT IMPLEMENTED
lab-agent ──WS──▶ labd ← LIBRARY CODE, NO DAEMON BINARY
```
---
## Package Inventory
| Package | Lines of Source | Tests | Status |
|---------|---------------|-------|--------|
| @lab/shared | ~200 | 0 | Complete — types, protocol, errors |
| @lab/bastion | ~800 | 32 | **Production-ready** — PXE discovery, install, reprovision |
| @lab/cli | ~600 | 0 (uses bastion tests) | Complete — all commands implemented |
| @lab/labd | ~500 | 2 | Partial — routes exist, core features stubbed |
| @lab/agent | ~300 | 0 | Library only — no daemon binary |
All 5 packages compile. 32 tests pass.
---
## Phase 1: Foundation
### DONE — Working in production
| Feature | Code | How It Works |
|---------|------|-------------|
| PXE bastion server | `src/bastion/` | Fastify HTTP + dnsmasq DHCP/TFTP. Machines PXE boot, get iPXE script from `/dispatch?mac=XX`, chain to discovery or install kickstart. State persisted to JSON file. |
| Machine discovery | `routes/dispatch.ts`, `templates/discover.ks.ts` | Unknown MACs get a mini-kickstart that boots a RAM-only Fedora, scrapes hardware via `/proc`, `/sys`, `dmidecode`, POSTs to `/api/discover`, then reboots. No disk touch. |
| Machine installation | `routes/api.ts`, `templates/install.ks.ts` | Queue a MAC via `POST /api/install`. Next PXE boot gets a full Kickstart with LVM partitioning (worker: longhorn LV, infra: rancher LV), SSH keys, k3s kernel prereqs, progress callbacks. |
| Reprovision with data preservation | `commands/reprovision.ts`, `install.ks.ts` | `%pre` script detects existing LVM. Reformats `/`, `/var`, `/boot` but preserves `/home`, `/srv`, `/var/lib/longhorn`, `/var/lib/rancher`. |
| CLI: init/provision commands | `src/cli/src/commands/` | `labctl init bastion standalone start/stop/status`, `labctl provision list/install/reprovision/forget`. All talk to bastion HTTP API. |
| CLI: config management | `config/index.ts`, `commands/config.ts` | `labctl config list/get/set/path`. YAML config at `~/.labctl/config.yaml` with env var overrides. |
| labd scaffold | `src/labd/` | Fastify server with health, server listing, token management routes. Prisma schema for all models. Starts with or without database. |
| Prisma schema | `prisma/schema.prisma` | 10 models: Server, Agent, User, Role, Permission, UserRole, JoinToken, AuditLog, PulumiRun, Cluster. CockroachDB provider. |
| Database seeding | `prisma/seed.ts` | Creates admin/viewer/operator roles with proper allow/deny permissions. Idempotent via upsert. |
| Multi-arch builds + packaging | `nfpm.yaml`, `scripts/` | nfpm config for RPM/DEB. Bun compile for standalone binary (102MB labctl in `dist/`). |
| Gitea CI/CD | `.gitea/` (on remote) | Lint → typecheck → test → build → publish pipeline on mysources.co.uk. |
### DONE — Code exists, not yet connected end-to-end
| Feature | Code | What's Real | What's Missing |
|---------|------|------------|----------------|
| lab-agent connection library | `lab-agent/src/services/connection.ts` | `AgentConnection` class: WebSocket to labd, heartbeat (10s), exponential backoff reconnect (1-30s), state machine (disconnected/connecting/connected/reconnecting), handles server-shutdown messages. | **No daemon binary.** This is a library — nothing starts it. No systemd unit. No enrollment flow. |
| lab-agent command executor | `lab-agent/src/services/executor.ts` | `CommandExecutor` class: `spawn()` with timeout handling (SIGTERM then SIGKILL after 5s), stdout/stderr streaming via EventEmitter, stdin writing, signal forwarding. | **Not wired to WebSocket.** The executor and connection don't talk to each other. No message dispatch. |
| Agent registry (labd) | `labd/src/services/agent-registry.ts` | `AgentRegistry`: in-memory Map tracking by serverId and hostname, lifecycle events, heartbeat updates. Singleton exported. | **Not used by /ws/agent handler.** The WebSocket handler in `server.ts` just logs messages — it doesn't call `agentRegistry.register()`. |
| Message router (labd) | `labd/src/services/message-router.ts` | `MessageRouter`: handler registration, pending request tracking with timeouts, streaming support, log subscription, agent cleanup on disconnect. | **Not used.** `server.ts` doesn't call `messageRouter.handleMessage()`. The router exists but is dead code. |
| Token management | `labd/src/routes/auth.ts` | Create, list, revoke join tokens. Validates one-time vs reusable, expiry, revocation. Marks tokens as used. | Token validation works. **But enrollment returns `certificatePem: null`** — no actual certificate is issued. |
| CLI API client | `cli/src/api/client.ts` | `LabdClient` with mTLS support, typed methods for servers/tokens/health/enrollment. | Works for REST endpoints. **No CLI commands use it yet** — existing commands still talk directly to bastion HTTP. |
| CLI WebSocket streaming | `cli/src/api/websocket.ts` | `streamExec()` and `streamLogs()` functions. | **No `labctl exec` or `labctl logs` commands exist.** The streaming code has no consumer. |
| Zod validation | `labd/src/validation/` | Schemas for createToken, enrollment, serverFilters, createRole, permission patterns. Middleware for body/query validation. | **Not applied to routes.** The schemas and middleware exist but no route uses `preHandler: [validateBody(schema)]`. |
| Encryption service | `labd/src/services/encryption.ts` | AES-256-GCM with scrypt key derivation. Encrypt/decrypt roundtrip. Singleton from `CA_ENCRYPTION_KEY` env var. | **Not used anywhere.** No CA key is encrypted, no kubeconfig is stored. |
| Graceful shutdown | `labd/src/services/shutdown.ts` | SIGTERM/SIGINT handlers, agent notification, message router cleanup, DB disconnect, force exit timer. | Works but agent notification is a no-op since no agents are registered (see above). |
| Rate limiting | `labd/src/middleware/rate-limit.ts` | `@fastify/rate-limit`: 100/min global, 10/min for enrollment, 20/min for tokens. | **Wired up in `server.ts`.** This actually works. |
| Health checks | `labd/src/routes/health.ts` | `/healthz`, `/health`, `/health/detailed`, `/health/live`, `/health/ready`. Checks DB latency and agent count. | Works. Returns `agents: { connected: 0 }` since no agents ever register. |
| Error hierarchy | `shared/src/errors/` | `LabError`, `NotFoundError`, `PermissionDeniedError`, `ValidationError`, `AgentNotConnectedError`. | **Not used in routes.** Routes still use inline `reply.code(404).send({error: ...})`. |
| Table formatting | `cli/src/utils/table.ts` | `printTable`, `formatStatus`, `formatRelativeTime`, predefined column sets. | **Not used by existing commands.** `provision list` has its own inline formatting. |
| Resource parsing | `cli/src/utils/resource.ts` | Parse `server/labmaster`, `app/kube-system/nginx` format. | **Not used.** No commands accept `type/name` arguments yet. |
| Doctor command | `cli/src/commands/doctor.ts` | Config, cert, connectivity diagnostics. | Works standalone. |
| Login command | `cli/src/commands/login.ts` | Generates EC keypair, prompts for token, POSTs to `/api/auth/user-enroll`. | **labd has no `/api/auth/user-enroll` endpoint.** Only `/api/auth/enroll` exists (for agents). Login will 404. |
### NOT DONE — Phase 1 items from PRD with no code
| Feature | PRD Description | Status |
|---------|----------------|--------|
| Certificate Authority | Built-in CA in labd. Generate root CA, sign CSRs, revoke certs, rotate. | **Nothing.** No CA code. No X.509 operations. No `@peculiar/x509` dependency. `EncryptionService` exists but it's for data-at-rest, not PKI. |
| RBAC engine | Middleware that checks permissions on every request. Deny overrides allow. | **Nothing.** `auth.ts` middleware is a placeholder. No route checks permissions. Anyone can call any endpoint. |
| Audit logging | Log every action with user, session, action, resource, result, duration. | **Nothing.** `AuditLog` Prisma model exists but nothing writes to it. No audit middleware. |
| `labctl exec` | Remote command execution via labd → agent WebSocket relay. | **Nothing.** No `exec` CLI command. The executor library exists in lab-agent but isn't connected. |
| `labctl logs` | Resource-scoped log streaming (server, app, bastion, audit). | **Nothing.** No `logs` CLI command. |
| `labctl get servers` | List servers from labd with filters. | **Nothing.** No `get` CLI command. The API client has `getServers()` but no command calls it. |
| Smoke test stack | `podman-compose` with CockroachDB + labd + 2 agents, testing enrollment/heartbeat/exec/RBAC. | **Nothing.** `stack/docker-compose.yml` exists but only runs bastion + CockroachDB, not labd or agents. |
| Agent enrollment during PXE | Embed join token in kickstart, agent auto-enrolls on first boot. | **Nothing.** Kickstart installs k3s prereqs but doesn't install or start lab-agent. |
---
## Phase 2: Deployment
**Nothing from Phase 2 has been built.**
| Feature | Status |
|---------|--------|
| Reprovision labmaster as labmaster.ad.itaz.eu | Not done — manual operation |
| Deploy k3s with Cilium CNI | Not done — kickstart only sets up kernel prereqs, leaves a comment "run `curl -sfL https://get.k3s.io`" |
| Deploy CockroachDB on k3s | Not done — `docker-compose.yml` runs it in-memory for dev, no k8s manifests for CRDB |
| Deploy labd on k3s | **K8s manifests exist** (`deploy/k8s/labd/base/`) — Deployment, Service, ConfigMap, HPA, PDB. But no CockroachDB to connect to and no TLS configured. |
| Deploy bastion as managed app | Not done — bastion runs standalone, no Pulumi chart |
| Auto-enroll agents during PXE | Not done — no agent install in kickstart, no token embedding |
---
## Phase 3: Infrastructure as Code
**Nothing from Phase 3 has been built.**
| Feature | Status |
|---------|--------|
| Module system | Not done — no `module.yaml`, no module loader |
| Pulumi charts | Not done — no Pulumi dependency, no chart structure |
| `labctl apps install/upgrade/rollback` | Not done — no `apps` command |
| `labctl apply -f` | Not done — no `apply` command |
| `kubectl proxy` (audited) | Not done — no kubectl proxy |
| Kubeconfig store (encrypted) | `EncryptionService` exists but nothing uses it. `Cluster.kubeconfigEnc` field exists in Prisma but nothing reads/writes it. |
---
## Phase 4: Multi-Cloud
**Nothing from Phase 4 has been built.**
| Feature | Status |
|---------|--------|
| AWS provider | Not done |
| Reusable join tokens for ASGs | Token model supports `reusable` type, but no AWS integration |
| Cilium Cluster Mesh | Not done |
| Ephemeral test environments | Not done |
| Grafana Loki | Not done |
---
## Infrastructure Files
| File | Status |
|------|--------|
| `Dockerfile.labd` | Exists. Multi-stage Alpine build. Would work if you `docker build` it. |
| `Dockerfile.bastion` | Exists. Multi-stage Fedora build. Would work. |
| `.dockerignore` | Exists. |
| `deploy/k8s/labd/base/` | Kustomize manifests for labd (Deployment, Service, ConfigMap, HPA, PDB). Points at a non-existent CockroachDB and has no TLS. |
| `stack/docker-compose.yml` | Runs bastion + CockroachDB for local dev. Works. |
| `nfpm.yaml` | RPM/DEB packaging config. Works with `nfpm pkg`. |
---
## The Disconnection Problem
The core issue is that many services were built in isolation but never wired together:
```
┌─────────────────────────────────────────────────────────┐
│ BUILT BUT NOT CONNECTED │
│ │
│ AgentConnection ──✗──▶ /ws/agent handler │
│ CommandExecutor ──✗──▶ MessageRouter │
│ MessageRouter ──✗──▶ /ws/agent handler │
│ AgentRegistry ──✗──▶ /ws/agent handler │
│ Zod schemas ──✗──▶ Route preHandlers │
│ Error classes ──✗──▶ Route error handling │
│ LabdClient ──✗──▶ CLI commands (get/exec/logs) │
│ Table formatting──✗──▶ CLI commands │
│ Resource parsing──✗──▶ CLI commands │
│ EncryptionService──✗──▶ CA / kubeconfig storage │
│ Login command ──✗──▶ /api/auth/user-enroll (missing) │
│ Audit logging ──✗──▶ Any middleware │
│ RBAC engine ──✗──▶ Any middleware │
└─────────────────────────────────────────────────────────┘
```
---
## What Actually Works End-to-End Today
1. **PXE boot a bare-metal machine:**
```
labctl init bastion standalone start
# Machine PXE boots → discovered automatically
labctl provision list
labctl provision install AA:BB:CC:DD:EE:FF worker-1 --role worker
# Machine reboots → installs Fedora → reports complete
```
2. **Manage bastion lifecycle:**
```
labctl init bastion standalone status
labctl init bastion standalone stop
```
3. **Start labd (without database):**
```
LABD_PORT=3100 tsx src/labd/src/main.ts
# Starts with stub DB, health endpoint works, token/server routes return errors
```
4. **Start labd (with CockroachDB):**
```
docker-compose -f stack/docker-compose.yml up cockroachdb
DATABASE_URL=postgresql://root@localhost:26257/lab tsx src/labd/src/main.ts
# Token creation/listing/revocation works
# Server listing works (empty until agents register)
```
5. **CLI diagnostics:**
```
labctl doctor
labctl config list
labctl version
```
That's it. No agent communication, no remote exec, no log streaming, no RBAC, no certificates.
---
## Recommended Next Steps (to make Phase 1 actually work)
### Priority 1: Wire up the agent connection
1. Update `/ws/agent` handler to use `agentRegistry.register()` and `messageRouter.handleMessage()`
2. Create lab-agent daemon binary that uses `AgentConnection` + `CommandExecutor`
3. Create systemd unit for lab-agent
### Priority 2: Certificate Authority
1. Add `@peculiar/x509` dependency
2. Implement CA service: generate root CA, sign CSRs
3. Wire enrollment route to actually sign and return certificates
4. Store CA key encrypted using `EncryptionService`
### Priority 3: RBAC + Audit
1. Create RBAC middleware that checks `Permission` table
2. Create audit middleware that writes to `AuditLog`
3. Apply both to all routes
### Priority 4: CLI commands for labd
1. `labctl get servers` using `LabdClient.getServers()`
2. `labctl exec server/<name>` using `streamExec()`
3. `labctl logs server/<name>` using `streamLogs()`
### Priority 5: Smoke test stack
1. Update `docker-compose.yml` to include labd + 2 agents
2. Write integration tests for enrollment → heartbeat → exec → logs

8
bastion/.dockerignore Normal file
View File

@@ -0,0 +1,8 @@
node_modules
dist
.git
*.log
.env
.env.*
*.tsbuildinfo
.taskmaster

View File

@@ -0,0 +1,132 @@
# PRD: Refactor K3s Module from Bash Heredocs to Pulumi TypeScript
## Problem
The k3s install/configure/health module currently generates ~300 lines of bash heredoc strings embedded in TypeScript files (`install.ts`, `configure.ts`, `health.ts`). These are unmaintainable, untestable, and impossible to compose. This is the same bash-in-code problem that drove the bastion TypeScript rewrite.
## Vision
The lab platform uses Pulumi as its IaC engine:
- **Central execution**: labd runs Pulumi programs in labcontroller k8s for cloud/remote resources with RBAC, global state, and audit trail (PulumiRun table already exists in CockroachDB)
- **Local execution**: lab-agents run Pulumi programs directly on bare-metal nodes
- **Multi-environment**: supports multiple datacenters, clouds (baremetal, AWS, GCP), production/dev/ephemeral environments
## Current State
### Files to replace
- `src/modules/modules/k3s/src/install.ts` — 275 lines, generates bash for 10 install phases
- `src/modules/modules/k3s/src/configure.ts` — 118 lines, generates bash for 5 configure phases
- `src/modules/modules/k3s/src/health.ts` — 57 lines, generates bash for 6 health checks
### Existing infrastructure
- `sshExec(ip, user, command, opts)` and `sshExecStreaming()` — SSH execution primitives in `src/modules/src/ssh.ts`
- Module system: `ModuleRunner`, `ModuleRegistry`, `Module` interface with install/configure/health phases
- `@lab/shared` types: `BastionConfig`, `K3sInstallContext`, roles, OS types
- PulumiRun model in Prisma schema (labd) — tracks Pulumi execution state
- labcontroller module generates k8s manifests (cockroachdb.ts, labd.ts, bastion.ts) — these also need Pulumi migration eventually
### 32 distinct operations currently in bash
**Install phase (10 steps):**
1. Load kernel modules (br_netfilter, overlay, ip_conntrack)
2. Apply CIS sysctl hardening (9 params)
3. Disable swap
4. Disable firewall (firewalld/ufw — mask to survive reboot)
5. Set SELinux permissive
6. Write k3s server config (flannel=none, secrets-encryption, audit, CIS hardened)
7. Write audit policy YAML
8. Clean up stale CNI (flannel.1 vxlan, cilium interfaces, port 8472 conflicts)
9. Install k3s binary (curl | sh)
10. Install Cilium CNI (detect arch, detect interface, kubeProxyReplacement)
**Configure phase (5 steps):**
1. Fix CoreDNS upstream DNS (systemd-resolved 127.0.0.53 unreachable from pod netns)
2. Configure log rotation
3. Check certificate expiry
4. Apply default network policies (deny-ingress, allow-dns-egress)
5. Apply Pod Security Standards (restricted)
**Health checks (6 checks):**
1. k3s service active
2. Node Ready condition
3. API server /healthz
4. Secrets encryption enabled
5. Cilium status
6. kube-system pod status
## Requirements
### Architecture decisions needed (discuss with user via task-master)
1. **Pulumi structure**: micro-stacks vs monorepo-by-env vs component-library vs GitOps operator
2. **Multi-cloud support**: how stacks are organized across baremetal/AWS/GCP
3. **Environment model**: how prod/dev/ephemeral environments are represented
4. **State backend**: Pulumi Cloud vs self-hosted (S3/CockroachDB)
5. **Execution model**: who runs `pulumi up` — labd central, lab-agent local, or both?
### Operation design
- Each operation is a typed TypeScript async function using `sshExec()`
- Standard interface: `OperationContext` in, `OperationResult` out
- **Idempotent**: check before act, report `changed: boolean`
- **Composable**: operations grouped into logical units (host-prep, networking, hardening)
- **Testable**: mock sshExec for unit tests
- **Future Pulumi-ready**: each function maps 1:1 to a `remote.Command` resource
### Groups (logical composition)
- `host-prep`: kernel-modules + sysctl + swap + firewall + selinux
- `k3s-server`: k3s-config + audit-policy + cni-cleanup + k3s-install
- `k3s-agent`: k3s-config (agent) + k3s-install (agent mode)
- `networking`: cilium + dns-fix + network-policy
- `hardening`: pod-security + cert-check + log-rotation
### Pulumi integration (when added)
- Add `@pulumi/pulumi` and `@pulumi/command` as dependencies
- Each operation becomes a `command.remote.Command` resource
- Groups become `pulumi.ComponentResource` classes
- K3sCluster becomes a top-level ComponentResource that composes groups
- Stacks per environment: `lab-baremetal`, `aws-prod`, `dev`, `ephemeral-pr-123`
## File structure
```
src/modules/modules/k3s/src/
├── types.ts # K3sConfig, OperationContext, OperationResult
├── utils.ts # sshOpts(), runSequential(), file helpers
├── operations/ # ~15 atomic operations
│ ├── kernel-modules.ts
│ ├── sysctl.ts
│ ├── swap.ts
│ ├── firewall.ts
│ ├── selinux.ts
│ ├── k3s-config.ts
│ ├── audit-policy.ts
│ ├── cni-cleanup.ts
│ ├── k3s-install.ts
│ ├── cilium.ts
│ ├── dns-fix.ts
│ ├── log-rotation.ts
│ ├── network-policy.ts
│ ├── pod-security.ts
│ └── cert-check.ts
├── groups/ # Logical groupings
│ ├── host-prep.ts
│ ├── k3s-server.ts
│ ├── k3s-agent.ts
│ ├── networking.ts
│ └── hardening.ts
├── health/ # Health checks
│ ├── k3s-service.ts
│ ├── node-ready.ts
│ ├── api-health.ts
│ ├── secrets-encryption.ts
│ ├── cilium-status.ts
│ └── pod-status.ts
├── k3s-module.ts # Module implementation
└── index.ts # Public exports
```
## Success criteria
- Zero bash heredoc strings in the k3s module
- Every operation independently testable with mocked sshExec
- `labctl app k3s install <target>` works end-to-end
- `labctl app k3s health` works end-to-end
- Existing test suite passes (updated for new API)
- Clear path to wrapping operations as Pulumi resources

View File

@@ -0,0 +1,172 @@
# PRD: Resource Tracking & kubectl-style CLI
## Problem
The lab platform currently has fragmented state management:
- Bastion keeps machine state in an ephemeral JSON file (`/tmp/lab-bastion/state.json`) that is lost on pod restart
- labd receives state syncs from bastions but only stores them in memory — the `Server` table in CockroachDB is never written to
- There is no system to track relationships between resources (servers belong to clusters, clusters run on servers, networks connect servers)
- The CLI (`labctl`) uses an inconsistent verb-noun structure (`labctl provision list`, `labctl app k3s install`) instead of a uniform resource-oriented pattern
- RBAC permissions reference resources (server, cloud, environment) but there is no resource registry to validate against
## Vision
A unified resource tracking system where all infrastructure objects (servers, clusters, networks, bastions, VMs) are persisted in CockroachDB via labd, with relationships between them, and managed through a kubectl-style CLI. This replaces the ephemeral JSON state and becomes the single source of truth for the platform.
## Current State
### Database (CockroachDB via Prisma)
Existing models that are scaffolded but mostly unused:
- `Server` — hostname, mac, cloud, environment, role, labels, ip, status (0 rows)
- `Agent` — mTLS certificate enrollment per server (0 rows)
- `Bastion` — PXE server registration (1 row, labmaster)
- `Cluster` — k8s cluster metadata (0 rows)
- `User`, `Role`, `Permission`, `UserRole` — RBAC framework (seeded with 3 roles, 6 permissions)
- `JoinToken` — agent/bastion enrollment tokens
- `AuditLog` — action audit trail
### Bastion State (ephemeral JSON)
Three categories tracked per-bastion:
- `discovered` — machines found via PXE with hardware info (CPU, RAM, disks, NICs, arch)
- `install_queue` — machines queued for OS install with progress tracking
- `installed` — machines with OS installed (hostname, role, IP, OS)
### CLI Structure (current)
```
labctl init bastion standalone [start|stop|status]
labctl provision [list|install|reprovision|forget|logs]
labctl app [k3s|labcontroller]
labctl config [list|get|set]
labctl roles
labctl doctor
labctl login
labctl logs
```
## Requirements
### 1. Persist Bastion State to Database
When labd receives `bastion-state-sync` messages, it must upsert machines into the `Server` table:
- Discovered machines → create/update Server with status "discovered", store HardwareInfo as JSON labels
- Queued machines → update Server status to "provisioning"
- Installed machines → update Server with hostname, IP, role, OS, status "installed"
- Track which bastion owns which server (add `bastionId` to Server model)
- Track hardware info: arch, cpu_model, cpu_cores, memory_gb, disks, nics
The bastion's local JSON state becomes a cache; labd's database is the source of truth. On bastion startup, it should load its state from labd if available.
### 2. Resource Model Expansion
Add new models to the Prisma schema for tracking infrastructure:
**Network** — L2/L3 network segments
- name, cidr, vlan, gateway, domain, dhcpEnabled
- Servers have NICs on networks
**ServerNic** — NIC-to-network mapping
- serverId, networkId, mac, ip, name, state (UP/DOWN)
- Derived from HardwareInfo during discovery
**ServerDisk** — Disk inventory per server
- serverId, name, sizeGb, model
- Derived from HardwareInfo during discovery
**ClusterMember** — Server-to-cluster membership
- clusterId, serverId, role (control-plane, worker)
### 3. kubectl-style CLI Redesign
Restructure labctl to follow the `mcpctl` / `kubectl` pattern:
```
# Core CRUD verbs that work on any resource
labctl get <resource> [name] # List or get specific resource
labctl describe <resource> <name> # Detailed view with relationships
labctl create <resource> [flags] # Create a resource
labctl delete <resource> <name> # Delete a resource
labctl edit <resource> <name> # Edit in $EDITOR
labctl apply -f <file> # Declarative apply from YAML
# Resource types (with aliases)
servers (server, srv)
clusters (cluster)
networks (network, net)
bastions (bastion)
roles (role)
users (user)
tokens (token)
audit (audit)
# Output formats
-o table (default), -o json, -o yaml, -o wide
# Examples
labctl get servers # List all servers
labctl get servers -o wide # With extra columns (disks, NICs)
labctl get server labmaster # Get specific server
labctl describe server labmaster # Full details + relationships
labctl get servers --role worker # Filter by role
labctl get servers --status discovered # Filter by status
labctl get clusters # List clusters
labctl describe cluster lab-k3s # Cluster members, health
labctl get networks # List networks
labctl create network --name lab --cidr 192.168.8.0/24 --gateway 192.168.8.1
# Provisioning becomes actions on server resources
labctl provision <server> --os fedora-43 --role worker # Queue install
labctl reprovision <server> # Reinstall
labctl forget <server> # Remove from tracking
# App management stays as-is but simplified
labctl app install k3s <server>
labctl app health k3s [server]
# Admin
labctl bastion start [--foreground] # Start local bastion
labctl bastion status # Bastion health
labctl login # Auth
labctl doctor # Diagnostics
```
### 4. Resource Aliases & Resolution
Follow mcpctl's pattern from `shared.ts`:
- Accept singular, plural, and short aliases: `server`, `servers`, `srv` all resolve to the same resource
- Accept name or ID: `labctl get server labmaster` or `labctl get server <uuid>`
- Accept MAC address for servers: `labctl get server 38:05:25:33:e2:e4`
### 5. RBAC Integration
The existing Permission model uses `action:cloud:environment:server` patterns. Wire this into the resource system:
- CLI commands check permissions before executing
- `labctl get` respects read permissions (only show resources the user can see)
- `labctl provision` requires `apply` permission on the target server
- `labctl delete` requires `destroy` permission
- Audit all resource operations to the AuditLog table
### 6. Bastion State Directory Fix
Fix the bug where the CLI's `--dir` default (`/tmp/lab-bastion`) overrides the `BASTION_DIR=/data` environment variable. The CLI option should use the env var as its default:
```typescript
.option("--dir <dir>", "Bastion data directory", process.env["BASTION_DIR"] ?? "/tmp/lab-bastion")
```
## Technical Constraints
- Database: CockroachDB with Prisma ORM (already deployed)
- API: Fastify + WebSocket (labd)
- CLI: Commander.js (labctl)
- Auth: mTLS certificates (planned), join tokens (implemented)
- Monorepo: pnpm workspace with @lab/shared, @lab/bastion, @lab/cli, @lab/labd
- The bastion-to-labd WebSocket protocol is defined in @lab/shared/protocol
## Success Criteria
1. `labctl get servers` shows all machines (discovered, provisioning, installed) from the database
2. Server state survives bastion and labd pod restarts
3. `labctl describe server <name>` shows hardware info, network, cluster membership
4. Resources have tracked relationships (server→cluster, server→network, bastion→server)
5. RBAC permissions are enforced on CLI operations
6. All resource mutations are audit-logged
7. CLI follows consistent kubectl-style `verb resource [name] [flags]` pattern

View File

@@ -0,0 +1,93 @@
# Dockerfile.bastion -- PXE boot server (dnsmasq DHCP/TFTP + HTTP)
# Requires host networking and NET_ADMIN/NET_RAW capabilities.
# ── Stage 1: Build ───────────────────────────────────────────────
FROM node:22-alpine AS builder
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
WORKDIR /app
# Copy workspace config and package manifests first (layer cache)
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json tsconfig.json ./
COPY src/shared/package.json src/shared/tsconfig.json src/shared/
COPY src/bastion/package.json src/bastion/tsconfig.json src/bastion/
COPY src/cli/package.json src/cli/tsconfig.json src/cli/
COPY src/modules/package.json src/modules/tsconfig.json src/modules/
# Install all dependencies (dev included -- needed for build)
RUN pnpm install --frozen-lockfile
# Copy source code
COPY src/shared/src/ src/shared/src/
COPY src/bastion/src/ src/bastion/src/
COPY src/cli/src/ src/cli/src/
COPY src/modules/src/ src/modules/src/
COPY src/modules/modules/ src/modules/modules/
# Build TypeScript
RUN pnpm build
# ── Stage 1b: Build iPXE snp.efi (uses UEFI SNP protocol for ISO boot) ──
FROM fedora:43 AS ipxe-builder
RUN dnf install -y git gcc make perl-interpreter xz-devel gcc-aarch64-linux-gnu && dnf clean all
RUN git clone --depth=1 https://github.com/ipxe/ipxe.git /tmp/ipxe
RUN cd /tmp/ipxe/src && make bin-x86_64-efi/snp.efi && \
make CROSS_COMPILE=aarch64-linux-gnu- bin-arm64-efi/snp.efi
# ── Stage 2: Production runtime (Fedora -- needs dnsmasq) ───────
FROM fedora:43
RUN dnf install -y \
dnsmasq \
ipxe-bootimgs-x86 \
ipxe-bootimgs-aarch64 \
iproute \
curl \
openssh-clients \
nodejs \
npm \
xorriso \
mtools \
&& dnf clean all
# iPXE snp.efi built from source (Fedora only ships snponly, which can't
# boot from CD-ROM/USB -- it requires PXE chainloading)
COPY --from=ipxe-builder /tmp/ipxe/src/bin-x86_64-efi/snp.efi /usr/share/ipxe/ipxe-snp-x86_64.efi
COPY --from=ipxe-builder /tmp/ipxe/src/bin-arm64-efi/snp.efi /usr/share/ipxe/arm64-efi/ipxe-snp.efi
# Install pnpm
RUN npm install -g pnpm@9
WORKDIR /app
# Copy workspace config and package manifests
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json ./
COPY src/shared/package.json src/shared/
COPY src/bastion/package.json src/bastion/
COPY src/cli/package.json src/cli/
COPY src/modules/package.json src/modules/
# Install production dependencies
RUN pnpm install --frozen-lockfile --prod 2>/dev/null || pnpm install --prod
# Copy built output from builder
COPY --from=builder /app/src/shared/dist/ src/shared/dist/
COPY --from=builder /app/src/bastion/dist/ src/bastion/dist/
COPY --from=builder /app/src/cli/dist/ src/cli/dist/
COPY --from=builder /app/src/modules/dist/ src/modules/dist/
# Create data directories
RUN mkdir -p /data/state /data/tftp /data/http
ENV NODE_ENV=production
ENV BASTION_DIR=/data
ENV HTTP_PORT=8080
EXPOSE 8080/tcp
EXPOSE 67/udp
EXPOSE 69/udp
EXPOSE 4011/udp
ENTRYPOINT ["node", "src/cli/dist/index.js", "init", "bastion", "standalone", "start", "--foreground"]

73
bastion/Dockerfile.labd Normal file
View File

@@ -0,0 +1,73 @@
# Dockerfile.labd -- multi-stage build for the labd master daemon
# Runs the Fastify API server with Prisma/CockroachDB backend.
# ── Stage 1: Build ───────────────────────────────────────────────
FROM node:22-alpine AS builder
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
WORKDIR /app
# Copy workspace config and package manifests first (layer cache)
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json tsconfig.base.json tsconfig.json ./
COPY src/shared/package.json src/shared/tsconfig.json src/shared/
COPY src/labd/package.json src/labd/tsconfig.json src/labd/
# Install all dependencies (dev included -- needed for build)
RUN pnpm install --frozen-lockfile
# Copy Prisma schema and generate client
COPY src/labd/prisma/ src/labd/prisma/
RUN pnpm --filter @lab/labd exec prisma generate
# Copy source code
COPY src/shared/src/ src/shared/src/
COPY src/labd/src/ src/labd/src/
# Build TypeScript (shared first via project references)
RUN pnpm --filter @lab/shared build && pnpm --filter @lab/labd build
# Hoist the generated Prisma client so stage 2 can COPY it from a stable path
RUN mkdir -p /app/_prisma && \
cp -r $(find /app/node_modules/.pnpm -path '*/.prisma/client' -type d | head -1) /app/_prisma/client
# ── Stage 2: Production runtime ─────────────────────────────────
FROM node:22-alpine
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
WORKDIR /app
# Copy workspace config and package manifests
COPY pnpm-workspace.yaml pnpm-lock.yaml package.json ./
COPY src/shared/package.json src/shared/
COPY src/labd/package.json src/labd/
# Install production dependencies only
RUN pnpm install --frozen-lockfile --prod 2>/dev/null || pnpm install --prod
# Copy built output from builder
COPY --from=builder /app/src/shared/dist/ src/shared/dist/
COPY --from=builder /app/src/labd/dist/ src/labd/dist/
# Copy Prisma schema + generated client into pnpm store location
# Prisma expects .prisma/client as a sibling of @prisma/ in the same node_modules
COPY --from=builder /app/src/labd/prisma/ src/labd/prisma/
COPY --from=builder /app/_prisma/client/ /tmp/_prisma_client/
RUN PRISMA_CLIENT_DIR=$(find /app/node_modules/.pnpm -path '*/@prisma/client' -type d | head -1) && \
NM_DIR="$(dirname "$(dirname "$PRISMA_CLIENT_DIR")")" && \
mkdir -p "$NM_DIR/.prisma/client" && \
cp -r /tmp/_prisma_client/* "$NM_DIR/.prisma/client/" && \
echo "Installed Prisma generated client at: $NM_DIR/.prisma/client/" && \
rm -rf /tmp/_prisma_client
ENV NODE_ENV=production
ENV DATABASE_URL=postgresql://root@cockroachdb:26257/labctl?sslmode=disable
ENV LABD_PORT=3100
ENV LABD_HOST=0.0.0.0
EXPOSE 3100
USER node
ENTRYPOINT ["node", "src/labd/dist/main.js"]

View File

@@ -5,7 +5,7 @@ _labctl() {
local cur prev words cword
_init_completion || return
local top_commands="init provision"
local top_commands="version init provision config login doctor app roles"
# Extract the subcommand chain (skip options and their values)
local -a subcmd_chain=()
@@ -23,7 +23,7 @@ _labctl() {
case "$chain_str" in
"init bastion standalone start")
COMPREPLY=($(compgen -W "--port --dir --domain --dhcp-mode --fedora --arch --timezone --locale --skip-dnsmasq --skip-artifacts -h --help" -- "$cur"))
COMPREPLY=($(compgen -W "--port --dir --domain --dhcp-mode --fedora --arch --timezone --locale --skip-dnsmasq --skip-artifacts --foreground -h --help" -- "$cur"))
return ;;
"init bastion standalone stop")
COMPREPLY=($(compgen -W "--dir -h --help" -- "$cur"))
@@ -34,6 +34,21 @@ _labctl() {
"init bastion standalone")
COMPREPLY=($(compgen -W "start stop status -h --help" -- "$cur"))
return ;;
"app labcontroller deploy")
COMPREPLY=($(compgen -W "--user --port --crdb-replicas -h --help" -- "$cur"))
return ;;
"app labcontroller status")
COMPREPLY=($(compgen -W "--user --port -h --help" -- "$cur"))
return ;;
"app k3s install")
COMPREPLY=($(compgen -W "--role --user --port --k3s-server --k3s-token -h --help" -- "$cur"))
return ;;
"app k3s health")
COMPREPLY=($(compgen -W "--user --port -h --help" -- "$cur"))
return ;;
"app k3s list")
COMPREPLY=($(compgen -W "--user --port -h --help" -- "$cur"))
return ;;
"init bastion")
COMPREPLY=($(compgen -W "standalone -h --help" -- "$cur"))
return ;;
@@ -41,19 +56,58 @@ _labctl() {
COMPREPLY=($(compgen -W "--port -h --help" -- "$cur"))
return ;;
"provision install")
COMPREPLY=($(compgen -W "--role --disk --port -h --help" -- "$cur"))
COMPREPLY=($(compgen -W "--role --os --disk --port -h --help" -- "$cur"))
return ;;
"provision reprovision")
COMPREPLY=($(compgen -W "--role --disk --port -h --help" -- "$cur"))
COMPREPLY=($(compgen -W "--role --os --disk --port -h --help" -- "$cur"))
return ;;
"provision forget")
COMPREPLY=($(compgen -W "--port -h --help" -- "$cur"))
return ;;
"provision logs")
COMPREPLY=($(compgen -W "-f --follow --port -h --help" -- "$cur"))
return ;;
"config list")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"config get")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"config set")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"config path")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"app labcontroller")
COMPREPLY=($(compgen -W "deploy status -h --help" -- "$cur"))
return ;;
"app k3s")
COMPREPLY=($(compgen -W "install health list -h --help" -- "$cur"))
return ;;
"version")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"init")
COMPREPLY=($(compgen -W "bastion -h --help" -- "$cur"))
return ;;
"provision")
COMPREPLY=($(compgen -W "list install reprovision forget -h --help" -- "$cur"))
COMPREPLY=($(compgen -W "list install reprovision forget logs -h --help" -- "$cur"))
return ;;
"config")
COMPREPLY=($(compgen -W "list get set path -h --help" -- "$cur"))
return ;;
"login")
COMPREPLY=($(compgen -W "--server -h --help" -- "$cur"))
return ;;
"doctor")
COMPREPLY=($(compgen -W "--json -h --help" -- "$cur"))
return ;;
"app")
COMPREPLY=($(compgen -W "labcontroller k3s -h --help" -- "$cur"))
return ;;
"roles")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"")
COMPREPLY=($(compgen -W "$top_commands -h --help -v --version" -- "$cur"))

View File

@@ -8,7 +8,7 @@ complete -c labctl -f
complete -c labctl -s v -l version -d 'Show version'
complete -c labctl -s h -l help -d 'Show help'
# Helper: test if a subcommand chain is active
# Helper: test if exactly a subcommand chain is active (no extra positional args)
function __labctl_using_cmd
set -l tokens (commandline -opc)
set -l expected $argv
@@ -33,9 +33,63 @@ function __labctl_using_cmd
test $found -eq $depth
end
# Helper: test if command starts with a subcommand chain (options still apply after args)
function __labctl_in_cmd
set -l tokens (commandline -opc)
set -l expected $argv
set -l depth (count $expected)
set -l found 0
for tok in $tokens[2..]
if string match -q -- "-*" $tok
continue
end
set found (math $found + 1)
if test $found -le $depth
if test "$tok" != "$expected[$found]"
return 1
end
end
end
test $found -ge $depth
end
# Dynamic: fetch machine hostnames from bastion (installed + queued)
function __labctl_installed_hosts
curl -s http://localhost:8080/api/machines 2>/dev/null |
python3 -c 'import sys,json; d=json.load(sys.stdin); hosts=[v.get("hostname","") for v in {**d.get("install_queue",{}), **d.get("installed",{})}.values() if v.get("hostname")]; [print(h) for h in set(hosts)]' 2>/dev/null
end
# Dynamic: fetch all known MAC addresses (discovered + queue + installed)
function __labctl_known_macs
curl -s http://localhost:8080/api/machines 2>/dev/null |
python3 -c 'import sys,json; d=json.load(sys.stdin); [print(k) for k in {**d.get("discovered",{}), **d.get("install_queue",{}), **d.get("installed",{})}]' 2>/dev/null
end
# Dynamic: fetch hostnames and MACs from all states
function __labctl_hosts_and_macs
curl -s http://localhost:8080/api/machines 2>/dev/null |
python3 -c 'import sys,json; d=json.load(sys.stdin); a={**d.get("discovered",{}), **d.get("install_queue",{}), **d.get("installed",{})}; macs=list(a.keys()); hosts=[v.get("hostname","") for v in {**d.get("install_queue",{}), **d.get("installed",{})}.values() if v.get("hostname")]; [print(x) for x in set(macs+hosts)]' 2>/dev/null
end
# Target argument completions
complete -c labctl -n "__labctl_using_cmd app k3s install" -a "(__labctl_installed_hosts)" -d 'installed host'
complete -c labctl -n "__labctl_using_cmd app k3s health" -a "(__labctl_installed_hosts)" -d 'installed host'
complete -c labctl -n "__labctl_using_cmd app labcontroller deploy" -a "(__labctl_installed_hosts)" -d 'installed host'
complete -c labctl -n "__labctl_using_cmd app labcontroller status" -a "(__labctl_installed_hosts)" -d 'installed host'
complete -c labctl -n "__labctl_using_cmd provision install" -a "(__labctl_known_macs)" -d 'MAC address'
complete -c labctl -n "__labctl_using_cmd provision reprovision" -a "(__labctl_hosts_and_macs)" -d 'host or MAC'
complete -c labctl -n "__labctl_using_cmd provision forget" -a "(__labctl_hosts_and_macs)" -d 'host or MAC'
complete -c labctl -n "__labctl_using_cmd provision logs" -a "(__labctl_hosts_and_macs)" -d 'host or MAC'
# Top-level commands
complete -c labctl -n "not __fish_seen_subcommand_from init provision" -a init -d 'Initialise infrastructure components'
complete -c labctl -n "not __fish_seen_subcommand_from init provision" -a provision -d 'Machine provisioning operations'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a version -d 'Show version information'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a init -d 'Initialise infrastructure components'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a provision -d 'Machine provisioning operations'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a config -d 'View and modify CLI configuration'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a login -d 'Authenticate with labd and obtain client certificate'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a doctor -d 'Diagnose configuration and connectivity issues'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a app -d 'Application management'
complete -c labctl -n "not __fish_seen_subcommand_from version init provision config login doctor app roles" -a roles -d 'List available machine roles'
# init subcommands
complete -c labctl -n "__labctl_using_cmd init" -a bastion -d 'Bastion PXE server management'
@@ -49,43 +103,100 @@ complete -c labctl -n "__labctl_using_cmd init bastion standalone" -a stop -d 'S
complete -c labctl -n "__labctl_using_cmd init bastion standalone" -a status -d 'Show bastion server status'
# init bastion standalone start options
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l port -d 'HTTP port' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l dir -d 'Bastion data directory' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l domain -d 'Internal domain for hostnames' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l dhcp-mode -d 'DHCP mode: proxy or full' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l fedora -d 'Fedora version' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l arch -d 'Architecture' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l timezone -d 'Timezone' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l locale -d 'Locale' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l skip-dnsmasq -d 'Skip starting dnsmasq (for testing)'
complete -c labctl -n "__labctl_using_cmd init bastion standalone start" -l skip-artifacts -d 'Skip downloading boot artifacts (for testing)'
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l port -d 'HTTP port' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l dir -d 'Bastion data directory' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l domain -d 'Internal domain for hostnames' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l dhcp-mode -d 'DHCP mode: proxy or full' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l fedora -d 'Fedora version' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l arch -d 'Architecture' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l timezone -d 'Timezone' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l locale -d 'Locale' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l skip-dnsmasq -d 'Skip starting dnsmasq (for testing)'
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l skip-artifacts -d 'Skip downloading boot artifacts (for testing)'
complete -c labctl -n "__labctl_in_cmd init bastion standalone start" -l foreground -d 'Run in foreground (default: daemonize)'
# init bastion standalone stop options
complete -c labctl -n "__labctl_using_cmd init bastion standalone stop" -l dir -d 'Bastion data directory' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone stop" -l dir -d 'Bastion data directory' -x
# init bastion standalone status options
complete -c labctl -n "__labctl_using_cmd init bastion standalone status" -l dir -d 'Bastion data directory' -x
complete -c labctl -n "__labctl_using_cmd init bastion standalone status" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone status" -l dir -d 'Bastion data directory' -x
complete -c labctl -n "__labctl_in_cmd init bastion standalone status" -l port -d 'Bastion HTTP port' -x
# provision subcommands
complete -c labctl -n "__labctl_using_cmd provision" -a list -d 'List all known machines'
complete -c labctl -n "__labctl_using_cmd provision" -a install -d 'Queue a discovered machine for Fedora installation'
complete -c labctl -n "__labctl_using_cmd provision" -a reprovision -d 'Queue install + SSH reboot into PXE for reprovision'
complete -c labctl -n "__labctl_using_cmd provision" -a install -d 'Queue a discovered machine for OS installation'
complete -c labctl -n "__labctl_using_cmd provision" -a reprovision -d 'Queue install + SSH reboot into PXE (target: hostname, MAC, or IP)'
complete -c labctl -n "__labctl_using_cmd provision" -a forget -d 'Remove a machine from bastion state'
complete -c labctl -n "__labctl_using_cmd provision" -a logs -d 'Show provisioning logs for a machine (hostname, MAC, or IP)'
# provision list options
complete -c labctl -n "__labctl_using_cmd provision list" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd provision list" -l port -d 'Bastion HTTP port' -x
# provision install options
complete -c labctl -n "__labctl_using_cmd provision install" -l role -d 'Machine role: worker or infra' -x
complete -c labctl -n "__labctl_using_cmd provision install" -l disk -d 'Target disk device (auto-detect if omitted)' -x
complete -c labctl -n "__labctl_using_cmd provision install" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd provision install" -l role -d 'Machine role (see below)' -xa 'vanilla worker infra labcontroller'
complete -c labctl -n "__labctl_in_cmd provision install" -l os -d 'Operating system' -xa 'fedora-43 ubuntu-26.04'
complete -c labctl -n "__labctl_in_cmd provision install" -l disk -d 'Target disk device (auto-detect if omitted)' -x
complete -c labctl -n "__labctl_in_cmd provision install" -l port -d 'Bastion HTTP port' -x
# provision reprovision options
complete -c labctl -n "__labctl_using_cmd provision reprovision" -l role -d 'Machine role: worker or infra' -x
complete -c labctl -n "__labctl_using_cmd provision reprovision" -l disk -d 'Target disk device (auto-detect if omitted)' -x
complete -c labctl -n "__labctl_using_cmd provision reprovision" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd provision reprovision" -l role -d 'Machine role (see below)' -xa 'vanilla worker infra labcontroller'
complete -c labctl -n "__labctl_in_cmd provision reprovision" -l os -d 'Operating system' -xa 'fedora-43 ubuntu-26.04'
complete -c labctl -n "__labctl_in_cmd provision reprovision" -l disk -d 'Target disk device (auto-detect if omitted)' -x
complete -c labctl -n "__labctl_in_cmd provision reprovision" -l port -d 'Bastion HTTP port' -x
# provision forget options
complete -c labctl -n "__labctl_using_cmd provision forget" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd provision forget" -l port -d 'Bastion HTTP port' -x
# provision logs options
complete -c labctl -n "__labctl_in_cmd provision logs" -s f -l follow -d 'Follow logs in real-time (SSE stream)'
complete -c labctl -n "__labctl_in_cmd provision logs" -l port -d 'Bastion HTTP port' -x
# config subcommands
complete -c labctl -n "__labctl_using_cmd config" -a list -d 'Show all configuration values'
complete -c labctl -n "__labctl_using_cmd config" -a get -d 'Get a configuration value'
complete -c labctl -n "__labctl_using_cmd config" -a set -d 'Set a configuration value'
complete -c labctl -n "__labctl_using_cmd config" -a path -d 'Show configuration file path'
# login options
complete -c labctl -n "__labctl_in_cmd login" -l server -d 'labd server URL' -x
# doctor options
complete -c labctl -n "__labctl_in_cmd doctor" -l json -d 'Output results as JSON'
# app subcommands
complete -c labctl -n "__labctl_using_cmd app" -a labcontroller -d 'Labcontroller deployment (bastion + labd + CockroachDB)'
complete -c labctl -n "__labctl_using_cmd app" -a k3s -d 'k3s cluster management'
# app labcontroller subcommands
complete -c labctl -n "__labctl_using_cmd app labcontroller" -a deploy -d 'Deploy labcontroller stack to a k3s node'
complete -c labctl -n "__labctl_using_cmd app labcontroller" -a status -d 'Check labcontroller deployment status (all hosts if no target)'
# app labcontroller deploy options
complete -c labctl -n "__labctl_in_cmd app labcontroller deploy" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd app labcontroller deploy" -l port -d 'Bastion HTTP port' -x
complete -c labctl -n "__labctl_in_cmd app labcontroller deploy" -l crdb-replicas -d 'CockroachDB replicas' -x
# app labcontroller status options
complete -c labctl -n "__labctl_in_cmd app labcontroller status" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd app labcontroller status" -l port -d 'Bastion HTTP port' -x
# app k3s subcommands
complete -c labctl -n "__labctl_using_cmd app k3s" -a install -d 'Install k3s on a target machine (hostname, IP, or MAC)'
complete -c labctl -n "__labctl_using_cmd app k3s" -a health -d 'Check k3s health (all hosts if no target given)'
complete -c labctl -n "__labctl_using_cmd app k3s" -a list -d 'List installed machines and their k3s status'
# app k3s install options
complete -c labctl -n "__labctl_in_cmd app k3s install" -l role -d 'k3s role: infra (server) or worker (agent)' -x
complete -c labctl -n "__labctl_in_cmd app k3s install" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd app k3s install" -l port -d 'Bastion HTTP port (for resolving target)' -x
complete -c labctl -n "__labctl_in_cmd app k3s install" -l k3s-server -d 'k3s server URL (required for worker role)' -x
complete -c labctl -n "__labctl_in_cmd app k3s install" -l k3s-token -d 'k3s join token (required for worker role)' -x
# app k3s health options
complete -c labctl -n "__labctl_in_cmd app k3s health" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd app k3s health" -l port -d 'Bastion HTTP port' -x
# app k3s list options
complete -c labctl -n "__labctl_in_cmd app k3s list" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd app k3s list" -l port -d 'Bastion HTTP port' -x

View File

@@ -10,3 +10,4 @@ data:
DHCP_MODE: "proxy"
TIMEZONE: "Europe/London"
LOCALE: "en_GB.UTF-8"
LABD_URL: "http://labd.lab-system.svc.cluster.local:3100"

View File

@@ -7,6 +7,8 @@ metadata:
app: bastion
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: bastion
@@ -15,10 +17,18 @@ spec:
labels:
app: bastion
spec:
imagePullSecrets:
- name: gitea-registry
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
dnsConfig:
options:
- name: ndots
value: "1"
containers:
- name: bastion
image: mysources.co.uk/michal/lab-bastion:latest
image: mysources.co.uk/michal/lab/bastion:latest
imagePullPolicy: Always
command:
- node
- src/cli/dist/index.js
@@ -26,9 +36,16 @@ spec:
- bastion
- standalone
- start
- --foreground
envFrom:
- configMapRef:
name: bastion-config
env:
- name: BASTION_JOIN_TOKEN
valueFrom:
secretKeyRef:
name: bastion-join-token
key: token
ports:
- containerPort: 8080
name: http
@@ -43,17 +60,21 @@ spec:
add:
- NET_ADMIN
- NET_RAW
startupProbe:
httpGet:
path: /api/machines
port: 8080
failureThreshold: 60
periodSeconds: 10
livenessProbe:
httpGet:
path: /api/machines
port: 8080
initialDelaySeconds: 15
periodSeconds: 30
readinessProbe:
httpGet:
path: /api/machines
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
volumes:
- name: state

View File

@@ -0,0 +1,8 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: labd-config
data:
LABD_PORT: "3100"
LABD_HOST: "0.0.0.0"
LABD_LOG_LEVEL: "info"

View File

@@ -0,0 +1,44 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: labd
spec:
replicas: 1
selector:
matchLabels:
app: labd
template:
metadata:
labels:
app: labd
spec:
containers:
- name: labd
image: mysources.co.uk/michal/lab/labd:latest
imagePullPolicy: Always
ports:
- containerPort: 3100
envFrom:
- configMapRef:
name: labd-config
- secretRef:
name: labd-secrets
livenessProbe:
httpGet:
path: /health/live
port: 3100
initialDelaySeconds: 10
periodSeconds: 15
readinessProbe:
httpGet:
path: /health/ready
port: 3100
initialDelaySeconds: 5
periodSeconds: 10
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi

View File

@@ -0,0 +1,18 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: labd
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: labd
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70

View File

@@ -0,0 +1,14 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: lab-infra
commonLabels:
app: labd
resources:
- deployment.yaml
- service.yaml
- configmap.yaml
- hpa.yaml
- pdb.yaml

View File

@@ -0,0 +1,9 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: labd
spec:
maxUnavailable: 1
selector:
matchLabels:
app: labd

View File

@@ -0,0 +1,12 @@
apiVersion: v1
kind: Service
metadata:
name: labd
spec:
type: ClusterIP
selector:
app: labd
ports:
- port: 3100
targetPort: 3100
protocol: TCP

View File

@@ -13,7 +13,14 @@
"lint": "eslint 'src/*/src/**/*.ts'",
"lint:fix": "eslint 'src/*/src/**/*.ts' --fix",
"completions:generate": "tsx scripts/generate-completions.ts --write",
"completions:check": "tsx scripts/generate-completions.ts --check"
"completions:check": "tsx scripts/generate-completions.ts --check",
"test:integration": "vitest run -c tests/integration/vitest.config.ts",
"test:integration:k3s": "vitest run -c tests/integration/vitest.config.ts -t k3s",
"test:integration:k3s:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t k3s",
"test:integration:pxe": "vitest run -c tests/integration/vitest.config.ts -t 'PXE boot'",
"test:integration:pxe:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'PXE boot'",
"test:integration:iso": "vitest run -c tests/integration/vitest.config.ts -t 'ISO boot'",
"test:integration:iso:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'ISO boot'"
},
"engines": {
"node": ">=20.0.0",

679
bastion/pnpm-lock.yaml generated

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,5 @@
#!/bin/bash
# Build bastion container image and push to Gitea container registry
# Build bastion container image (multi-arch) and push to Gitea container registry
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
@@ -12,20 +12,28 @@ if [ -f .env ]; then
fi
# ── Argument parsing ───────────────────────────────────────────────
TARGET_ARCH=""
PUSH=false
PLATFORMS="linux/amd64,linux/arm64"
usage() {
cat <<EOF
Usage: $(basename "$0") [OPTIONS] [TAG]
Build bastion container image and optionally push to registry.
Build bastion container image (multi-arch) and optionally push to registry.
Options:
--arch ARCH Target platform: x86_64 or arm64 (default: host arch)
-h, --help Show this help message
--push Push to registry after building
--platforms LIST Comma-separated platforms (default: linux/amd64,linux/arm64)
-h, --help Show this help message
Arguments:
TAG Image tag (default: version from package.json)
TAG Image tag (default: version from package.json)
Examples:
$(basename "$0") # build multi-arch, no push
$(basename "$0") --push # build + push with version tag
$(basename "$0") --push latest # build + push as :latest
$(basename "$0") --platforms linux/amd64 # build amd64 only
EOF
exit 0
}
@@ -33,8 +41,12 @@ EOF
POSITIONAL_ARGS=()
while [[ $# -gt 0 ]]; do
case "$1" in
--arch)
TARGET_ARCH="$2"
--push)
PUSH=true
shift
;;
--platforms)
PLATFORMS="$2"
shift 2
;;
-h|--help)
@@ -47,56 +59,69 @@ while [[ $# -gt 0 ]]; do
esac
done
# Registry defaults to internal address (external proxy has body size limit)
REGISTRY="${GITEA_REGISTRY:-mysources.co.uk}"
IMAGE="lab-bastion"
REPO="michal/lab/bastion"
FULL_IMAGE="$REGISTRY/$REPO"
VERSION=$(node -p "require('./package.json').version")
TAG="${POSITIONAL_ARGS[0]:-$VERSION}"
# ── Resolve target platform ───────────────────────────────────────
detect_host_arch() {
local machine
machine="$(uname -m)"
case "$machine" in
x86_64) echo "x86_64" ;;
aarch64) echo "arm64" ;;
arm64) echo "arm64" ;;
*) echo "$machine" ;;
esac
}
echo "==> Building bastion image"
echo " Tag: $TAG"
echo " Platforms: $PLATFORMS"
echo " Registry: $FULL_IMAGE"
docker_platform_for() {
case "$1" in
x86_64) echo "linux/amd64" ;;
arm64) echo "linux/arm64" ;;
esac
}
# ── Build multi-arch manifest ────────────────────────────────────
MANIFEST="lab-bastion:$TAG"
ARCH="${TARGET_ARCH:-$(detect_host_arch)}"
PLATFORM="$(docker_platform_for "$ARCH")"
# Remove existing manifest/image with the same tag
podman manifest rm "$MANIFEST" 2>/dev/null || true
podman rmi "$MANIFEST" 2>/dev/null || true
echo "==> Building bastion image (tag: $TAG, platform: $PLATFORM)..."
podman build --platform "$PLATFORM" -t "$IMAGE:$TAG" -f stack/Dockerfile .
echo "==> Building for platforms: $PLATFORMS..."
podman build \
--platform "$PLATFORMS" \
--manifest "$MANIFEST" \
-f Dockerfile.bastion \
.
echo "==> Tagging as $REGISTRY/michal/$IMAGE:$TAG..."
podman tag "$IMAGE:$TAG" "$REGISTRY/michal/$IMAGE:$TAG"
echo "==> Build complete. Manifest:"
podman manifest inspect "$MANIFEST" | grep -E '"(architecture|os)"'
# ── Push ─────────────────────────────────────────────────────────
if [ "$PUSH" = true ]; then
if [ -z "$GITEA_TOKEN" ]; then
# Try reading from ~/.gitea-token
if [ -f "$HOME/.gitea-token" ]; then
GITEA_TOKEN="$(cat "$HOME/.gitea-token")"
else
echo "ERROR: GITEA_TOKEN not set and ~/.gitea-token not found"
exit 1
fi
fi
if [ -n "$GITEA_TOKEN" ]; then
echo "==> Logging in to $REGISTRY..."
podman login --tls-verify=false -u michal -p "$GITEA_TOKEN" "$REGISTRY"
podman login -u michal -p "$GITEA_TOKEN" "$REGISTRY"
echo "==> Pushing to $REGISTRY/michal/$IMAGE:$TAG..."
podman push --tls-verify=false "$REGISTRY/michal/$IMAGE:$TAG"
echo "==> Pushing $FULL_IMAGE:$TAG..."
podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
# Ensure package is linked to the repository
# Also tag as :latest if not already
if [ "$TAG" != "latest" ]; then
echo "==> Also pushing as :latest..."
podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:latest"
fi
# Link package to repository if script exists
if [ -f "$SCRIPT_DIR/link-package.sh" ]; then
source "$SCRIPT_DIR/link-package.sh"
link_package "container" "$IMAGE"
link_package "container" "bastion"
fi
echo "==> Pushed successfully!"
else
echo "==> GITEA_TOKEN not set, skipping push."
echo "==> Skipping push (use --push to push to registry)"
fi
echo "==> Done!"
echo " Image: $REGISTRY/michal/$IMAGE:$TAG"
echo " Platform: $PLATFORM"
echo " Image: $FULL_IMAGE:$TAG"
echo " Platforms: $PLATFORMS"

118
bastion/scripts/build-labd.sh Executable file
View File

@@ -0,0 +1,118 @@
#!/bin/bash
# Build labd container image (multi-arch) and push to Gitea container registry
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
cd "$PROJECT_ROOT"
# Load .env for GITEA_TOKEN
if [ -f .env ]; then
set -a; source .env; set +a
fi
# ── Argument parsing ───────────────────────────────────────────────
PUSH=false
PLATFORMS="linux/amd64,linux/arm64"
usage() {
cat <<EOF
Usage: $(basename "$0") [OPTIONS] [TAG]
Build labd container image (multi-arch) and optionally push to registry.
Options:
--push Push to registry after building
--platforms LIST Comma-separated platforms (default: linux/amd64,linux/arm64)
-h, --help Show this help message
Arguments:
TAG Image tag (default: version from package.json)
EOF
exit 0
}
POSITIONAL_ARGS=()
while [[ $# -gt 0 ]]; do
case "$1" in
--push)
PUSH=true
shift
;;
--platforms)
PLATFORMS="$2"
shift 2
;;
-h|--help)
usage
;;
*)
POSITIONAL_ARGS+=("$1")
shift
;;
esac
done
REGISTRY="${GITEA_REGISTRY:-mysources.co.uk}"
REPO="michal/lab/labd"
FULL_IMAGE="$REGISTRY/$REPO"
VERSION=$(node -p "require('./package.json').version")
TAG="${POSITIONAL_ARGS[0]:-$VERSION}"
echo "==> Building labd image"
echo " Tag: $TAG"
echo " Platforms: $PLATFORMS"
echo " Registry: $FULL_IMAGE"
# ── Build multi-arch manifest ────────────────────────────────────
MANIFEST="lab-labd:$TAG"
# Remove existing manifest/image with the same tag
podman manifest rm "$MANIFEST" 2>/dev/null || true
podman rmi "$MANIFEST" 2>/dev/null || true
echo "==> Building for platforms: $PLATFORMS..."
podman build \
--platform "$PLATFORMS" \
--manifest "$MANIFEST" \
-f Dockerfile.labd \
.
echo "==> Build complete. Manifest:"
podman manifest inspect "$MANIFEST" | grep -E '"(architecture|os)"'
# ── Push ─────────────────────────────────────────────────────────
if [ "$PUSH" = true ]; then
if [ -z "$GITEA_TOKEN" ]; then
if [ -f "$HOME/.gitea-token" ]; then
GITEA_TOKEN="$(cat "$HOME/.gitea-token")"
else
echo "ERROR: GITEA_TOKEN not set and ~/.gitea-token not found"
exit 1
fi
fi
echo "==> Logging in to $REGISTRY..."
podman login -u michal -p "$GITEA_TOKEN" "$REGISTRY"
echo "==> Pushing $FULL_IMAGE:$TAG..."
podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
if [ "$TAG" != "latest" ]; then
echo "==> Also pushing as :latest..."
podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:latest"
fi
if [ -f "$SCRIPT_DIR/link-package.sh" ]; then
source "$SCRIPT_DIR/link-package.sh"
link_package "container" "labd"
fi
echo "==> Pushed successfully!"
else
echo "==> Skipping push (use --push to push to registry)"
fi
echo "==> Done!"
echo " Image: $FULL_IMAGE:$TAG"
echo " Platforms: $PLATFORMS"

View File

@@ -154,8 +154,8 @@ function generateFish(root: CmdInfo): string {
const allCmds = collectCommands(root);
// Helper function for fish: test if exactly the given subcommand chain is present
emit('# Helper: test if a subcommand chain is active');
// Helper: test if EXACTLY the given subcommand chain is present (for subcommand suggestions)
emit('# Helper: test if exactly a subcommand chain is active (no extra positional args)');
emit(`function __${BIN}_using_cmd`);
emit(' set -l tokens (commandline -opc)');
emit(' set -l expected $argv');
@@ -181,6 +181,65 @@ function generateFish(root: CmdInfo): string {
emit('end');
emit('');
// Helper: test if command chain STARTS WITH the given prefix (for options that apply after args)
emit('# Helper: test if command starts with a subcommand chain (options still apply after args)');
emit(`function __${BIN}_in_cmd`);
emit(' set -l tokens (commandline -opc)');
emit(' set -l expected $argv');
emit(' set -l depth (count $expected)');
emit(' set -l found 0');
emit(' for tok in $tokens[2..]');
emit(' if string match -q -- "-*" $tok');
emit(' continue');
emit(' end');
emit(' set found (math $found + 1)');
emit(' if test $found -le $depth');
emit(' if test "$tok" != "$expected[$found]"');
emit(' return 1');
emit(' end');
emit(' end');
emit(' end');
emit(' test $found -ge $depth');
emit('end');
emit('');
// Dynamic completions: fetch machine data from bastion API
emit('# Dynamic: fetch machine hostnames from bastion (installed + queued)');
emit(`function __${BIN}_installed_hosts`);
emit(' curl -s http://localhost:8080/api/machines 2>/dev/null | ');
emit(" python3 -c 'import sys,json; d=json.load(sys.stdin); hosts=[v.get(\"hostname\",\"\") for v in {**d.get(\"install_queue\",{}), **d.get(\"installed\",{})}.values() if v.get(\"hostname\")]; [print(h) for h in set(hosts)]' 2>/dev/null");
emit('end');
emit('');
emit('# Dynamic: fetch all known MAC addresses (discovered + queue + installed)');
emit(`function __${BIN}_known_macs`);
emit(' curl -s http://localhost:8080/api/machines 2>/dev/null | ');
emit(" python3 -c 'import sys,json; d=json.load(sys.stdin); [print(k) for k in {**d.get(\"discovered\",{}), **d.get(\"install_queue\",{}), **d.get(\"installed\",{})}]' 2>/dev/null");
emit('end');
emit('');
emit('# Dynamic: fetch hostnames and MACs from all states');
emit(`function __${BIN}_hosts_and_macs`);
emit(' curl -s http://localhost:8080/api/machines 2>/dev/null | ');
emit(" python3 -c 'import sys,json; d=json.load(sys.stdin); a={**d.get(\"discovered\",{}), **d.get(\"install_queue\",{}), **d.get(\"installed\",{})}; macs=list(a.keys()); hosts=[v.get(\"hostname\",\"\") for v in {**d.get(\"install_queue\",{}), **d.get(\"installed\",{})}.values() if v.get(\"hostname\")]; [print(x) for x in set(macs+hosts)]' 2>/dev/null");
emit('end');
emit('');
// Target completions for commands that accept hostname/IP/MAC
emit('# Target argument completions');
// app k3s — takes hostname/IP
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd app k3s install" -a "(__${BIN}_installed_hosts)" -d 'installed host'`);
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd app k3s health" -a "(__${BIN}_installed_hosts)" -d 'installed host'`);
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd app labcontroller deploy" -a "(__${BIN}_installed_hosts)" -d 'installed host'`);
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd app labcontroller status" -a "(__${BIN}_installed_hosts)" -d 'installed host'`);
// provision install — takes MAC then hostname
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd provision install" -a "(__${BIN}_known_macs)" -d 'MAC address'`);
// provision reprovision/forget/logs — takes MAC or hostname
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd provision reprovision" -a "(__${BIN}_hosts_and_macs)" -d 'host or MAC'`);
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd provision forget" -a "(__${BIN}_hosts_and_macs)" -d 'host or MAC'`);
emit(`complete -c ${BIN} -n "__${BIN}_using_cmd provision logs" -a "(__${BIN}_hosts_and_macs)" -d 'host or MAC'`);
emit('');
// Top-level commands
const topCmds = root.subcommands.filter((c) => !c.hidden);
emit('# Top-level commands');
@@ -204,9 +263,9 @@ function generateFish(root: CmdInfo): string {
emit('');
}
// Options for this command
// Options for this command (use __in_cmd so options complete even after positional args)
if (cmd.options.length > 0) {
const condition = `__${BIN}_using_cmd ${path.join(' ')}`;
const condition = `__${BIN}_in_cmd ${path.join(' ')}`;
emit(`# ${path.join(' ')} options`);
for (const opt of cmd.options) {
const parts = [`complete -c ${BIN} -n "${condition}"`];

View File

@@ -0,0 +1,71 @@
#!/bin/bash
# Run integration tests inside a Node container with access to host libvirt.
#
# Usage: sudo ./scripts/test-integration.sh [vitest args...]
# Example: sudo ./scripts/test-integration.sh -t k3s
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
# Detect real user (even when running via sudo)
REAL_USER="${SUDO_USER:-$(whoami)}"
REAL_HOME="/home/${REAL_USER}"
echo "==> Running integration tests in container"
echo " Project: ${PROJECT_ROOT}"
echo " User: ${REAL_USER}"
echo " SSH key: ${REAL_HOME}/.ssh/"
echo ""
# Check prerequisites
if ! command -v podman &>/dev/null && ! command -v docker &>/dev/null; then
echo "ERROR: podman or docker required"
exit 1
fi
RUNTIME="podman"
if ! command -v podman &>/dev/null; then
RUNTIME="docker"
fi
# Check libvirt socket
if [ ! -S /var/run/libvirt/libvirt-sock ]; then
echo "ERROR: libvirt socket not found at /var/run/libvirt/libvirt-sock"
echo " Is libvirtd running? Try: sudo systemctl start libvirtd"
exit 1
fi
# Create a temp dir for cloud-init artifacts (avoids SELinux /tmp relabel)
WORK_TMP="/var/tmp/lab-integration-$$"
mkdir -p "${WORK_TMP}"
trap "rm -rf ${WORK_TMP}" EXIT
exec $RUNTIME run --rm \
--name lab-integration-test \
--privileged \
--security-opt label=disable \
--network=host \
-v "${PROJECT_ROOT}:${PROJECT_ROOT}" \
-v "${REAL_HOME}/.ssh:${REAL_HOME}/.ssh:ro" \
-v "/var/run/libvirt/libvirt-sock:/var/run/libvirt/libvirt-sock" \
-v "/var/lib/libvirt/images:/var/lib/libvirt/images" \
-v "${WORK_TMP}:/tmp/lab-integration-tests" \
-w "${PROJECT_ROOT}" \
-e "SSH_KEY_PATH=${REAL_HOME}/.ssh/id_rsa" \
-e "HOME=${REAL_HOME}" \
node:22-bookworm \
bash -c "
# Install system deps for libvirt client + cloud-init ISO creation
apt-get update -qq && apt-get install -y -qq libvirt-clients virtinst genisoimage openssh-client qemu-utils sudo >/dev/null 2>&1
# Install pnpm
corepack enable && corepack prepare pnpm@9 --activate >/dev/null 2>&1
echo '==> Installing project dependencies...'
pnpm install --frozen-lockfile 2>/dev/null
echo '==> Running integration tests...'
echo ''
pnpm run test:integration $*
"

131
bastion/scripts/test-provision.sh Executable file
View File

@@ -0,0 +1,131 @@
#!/bin/bash
# Run PXE and/or ISO boot integration tests.
#
# Usage:
# sudo ./scripts/test-provision.sh # run both PXE + ISO tests
# sudo ./scripts/test-provision.sh pxe # PXE only
# sudo ./scripts/test-provision.sh iso # ISO only
#
# Prerequisites:
# libvirtd, OVMF (edk2-ovmf), iPXE (ipxe-bootimgs-x86),
# dnsmasq, xorriso, mtools, virt-install, qemu-img
set -e
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
cd "$PROJECT_ROOT"
# Detect real user for SSH keys
REAL_USER="${SUDO_USER:-$(whoami)}"
REAL_HOME=$(getent passwd "$REAL_USER" | cut -d: -f6)
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BOLD='\033[1m'
RESET='\033[0m'
echo ""
echo -e "${BOLD}Lab Bastion -- Provision Integration Tests${RESET}"
echo "==========================================="
echo ""
# --- Prerequisite checks ---
MISSING=""
for cmd in virsh virt-install qemu-img dnsmasq xorriso mformat mcopy curl; do
if ! command -v "$cmd" &>/dev/null; then
MISSING="$MISSING $cmd"
fi
done
if [ -n "$MISSING" ]; then
echo -e "${RED}Missing tools:${RESET}$MISSING"
echo "Install: sudo dnf install libvirt virt-install qemu-img dnsmasq xorriso mtools curl"
exit 1
fi
if ! systemctl is-active libvirtd &>/dev/null; then
echo -e "${RED}libvirtd not running.${RESET} Start with: sudo systemctl start libvirtd"
exit 1
fi
if [ ! -f /usr/share/edk2/ovmf/OVMF_CODE.fd ]; then
echo -e "${RED}OVMF firmware not found.${RESET} Install: sudo dnf install edk2-ovmf"
exit 1
fi
IPXE_EFI=""
for f in /usr/share/ipxe/ipxe-snponly-x86_64.efi /usr/share/ipxe/ipxe-snp-x86_64.efi /usr/share/ipxe/ipxe-x86_64.efi; do
[ -f "$f" ] && IPXE_EFI="$f" && break
done
if [ -z "$IPXE_EFI" ]; then
echo -e "${RED}iPXE EFI binary not found.${RESET} Install: sudo dnf install ipxe-bootimgs-x86"
exit 1
fi
# Find SSH key
SSH_KEY=""
for name in id_ed25519 id_ecdsa id_rsa; do
if [ -f "$REAL_HOME/.ssh/$name" ] && [ -f "$REAL_HOME/.ssh/$name.pub" ]; then
SSH_KEY="$REAL_HOME/.ssh/$name"
break
fi
done
if [ -z "$SSH_KEY" ]; then
echo -e "${RED}No SSH key found in $REAL_HOME/.ssh/${RESET}"
exit 1
fi
echo -e " User: ${BOLD}$REAL_USER${RESET}"
echo -e " SSH key: ${BOLD}$SSH_KEY${RESET}"
echo -e " iPXE: ${BOLD}$IPXE_EFI${RESET}"
echo ""
# --- Determine which tests to run ---
MODE="${1:-both}"
run_test() {
local name="$1" pattern="$2"
echo ""
echo -e "${YELLOW}━━━ Running $name test ━━━${RESET}"
echo ""
if SSH_KEY_PATH="$SSH_KEY" HOME="$REAL_HOME" \
npx vitest run -c tests/integration/vitest.config.ts -t "$pattern" 2>&1; then
echo ""
echo -e "${GREEN}$name test passed${RESET}"
return 0
else
echo ""
echo -e "${RED}$name test failed${RESET}"
return 1
fi
}
FAILED=0
case "$MODE" in
pxe)
run_test "PXE boot" "PXE boot" || FAILED=1
;;
iso)
run_test "ISO boot" "ISO boot" || FAILED=1
;;
both|all)
run_test "PXE boot" "PXE boot" || FAILED=1
run_test "ISO boot" "ISO boot" || FAILED=1
;;
*)
echo "Usage: $0 [pxe|iso|both]"
exit 1
;;
esac
echo ""
if [ "$FAILED" -eq 0 ]; then
echo -e "${GREEN}${BOLD}All provision tests passed.${RESET}"
else
echo -e "${RED}${BOLD}Some tests failed.${RESET}"
exit 1
fi

View File

@@ -9,6 +9,10 @@
".": {
"import": "./dist/main.js",
"types": "./dist/main.d.ts"
},
"./iso-builder": {
"import": "./dist/services/iso-builder.js",
"types": "./dist/services/iso-builder.d.ts"
}
},
"scripts": {
@@ -20,12 +24,15 @@
},
"dependencies": {
"@fastify/static": "^8.0.0",
"@lab/modules": "workspace:*",
"@lab/shared": "workspace:*",
"execa": "^9.5.0",
"fastify": "^5.0.0",
"winston": "^3.17.0"
"winston": "^3.17.0",
"ws": "^8.19.0"
},
"devDependencies": {
"@types/node": "^22.10.0"
"@types/node": "^22.10.0",
"@types/ws": "^8.18.0"
}
}

View File

@@ -14,6 +14,10 @@ export function loadConfig(overrides: Partial<BastionConfig> = {}): BastionConfi
const dhcpRangeStart = overrides.dhcpRangeStart ?? process.env["DHCP_RANGE_START"] ?? "";
const dhcpRangeEnd = overrides.dhcpRangeEnd ?? process.env["DHCP_RANGE_END"] ?? "";
const ubuntuVersion = overrides.ubuntuVersion ?? process.env["UBUNTU_VERSION"] ?? "26.04";
const ubuntuMirror = overrides.ubuntuMirror ?? process.env["UBUNTU_MIRROR"]
?? `https://releases.ubuntu.com/${ubuntuVersion}`;
const fedoraMirror = `https://download.fedoraproject.org/pub/fedora/linux/releases/${fedoraVersion}/Everything/${arch}/os`;
const tftpDir = `${bastionDir}/tftp`;
const httpDir = `${bastionDir}/http`;
@@ -30,6 +34,8 @@ export function loadConfig(overrides: Partial<BastionConfig> = {}): BastionConfi
dhcpMode,
dhcpRangeStart,
dhcpRangeEnd,
ubuntuVersion,
ubuntuMirror,
// These are populated at runtime by the network service
iface: overrides.iface ?? "",
serverIp: overrides.serverIp ?? "",
@@ -39,6 +45,8 @@ export function loadConfig(overrides: Partial<BastionConfig> = {}): BastionConfi
adminUser: overrides.adminUser ?? "",
skipDnsmasq: overrides.skipDnsmasq,
skipArtifacts: overrides.skipArtifacts,
labdUrl: overrides.labdUrl ?? process.env["LABD_URL"],
bastionJoinToken: overrides.bastionJoinToken ?? process.env["BASTION_JOIN_TOKEN"],
fedoraMirror,
tftpDir,
httpDir,

View File

@@ -11,6 +11,9 @@ import { startDnsmasq, stopDnsmasq, generateDnsmasqConf } from "./services/dnsma
import { generateDiscoverKickstart } from "./services/kickstart-generator.js";
import { renderBootIpxe } from "./templates/boot.ipxe.js";
import { logger } from "./services/logger.js";
import { BastionConnection } from "./services/labd-connection.js";
import { progressBus } from "./services/progress-events.js";
import { ensureBootIso } from "./routes/boot-iso.js";
function copyIfMissing(src: string, dest: string, label: string): void {
if (existsSync(dest)) {
@@ -91,11 +94,9 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
let config = loadConfig(overrides);
config = populateNetworkConfig(config);
// PID file management: kill old instance if running
// Bastion needs root for dnsmasq (DHCP port 67)
if (!config.skipDnsmasq && process.getuid?.() !== 0) {
logger.error("Must run as root (dnsmasq needs DHCP/TFTP ports). Use: sudo labctl init bastion standalone start");
process.exit(1);
throw new Error("Must run as root (dnsmasq needs DHCP/TFTP ports). Use: sudo labctl init bastion standalone start");
}
mkdirSync(config.bastionDir, { recursive: true, mode: 0o755 });
@@ -164,6 +165,23 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
"Fedora initrd",
);
// Ubuntu netboot artifacts (non-fatal — Ubuntu version may not be released yet)
try {
logger.info(`Preparing Ubuntu ${config.ubuntuVersion} netboot artifacts...`);
download(
`${config.ubuntuMirror}/casper/vmlinuz`,
`${config.httpDir}/ubuntu-vmlinuz`,
"Ubuntu kernel",
);
download(
`${config.ubuntuMirror}/casper/initrd`,
`${config.httpDir}/ubuntu-initrd`,
"Ubuntu initrd",
);
} catch {
logger.warn(`Ubuntu ${config.ubuntuVersion} artifacts not available -- Ubuntu provisioning disabled`);
}
// Symlink iPXE binaries into HTTP dir for UEFI HTTP Boot
for (const name of ["ipxe.efi", "ipxe-arm64.efi"]) {
const src = `${config.tftpDir}/${name}`;
@@ -172,6 +190,13 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
symlinkSafe(src, dest);
}
}
// Generate boot ISO (served as static file for Range request support)
try {
ensureBootIso(config);
} catch (err) {
logger.warn(`Boot ISO generation failed: ${err instanceof Error ? err.message : String(err)}`);
}
} else {
logger.info("Skipping boot artifacts (--skip-artifacts)");
}
@@ -196,7 +221,7 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
}
// Start HTTP server
const { app } = createApp(config);
const { app, state } = createApp(config);
await app.listen({ port: config.httpPort, host: "0.0.0.0" });
logger.info(`HTTP server listening on :${config.httpPort}`);
@@ -220,12 +245,72 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
logger.info("Skipping dnsmasq (--skip-dnsmasq)");
}
// Connect to labd if configured (otherwise run standalone)
let labdConn: BastionConnection | null = null;
if (config.labdUrl) {
labdConn = new BastionConnection(config, () => state.load());
// Wire up command handlers so labd can send install/forget/role commands
labdConn.onCommand("command-install", async (msg) => {
if (msg.type !== "command-install") throw new Error("unexpected");
state.update((s) => {
s.install_queue[msg.mac] = {
hostname: msg.hostname,
disk: msg.disk ?? "/dev/sda",
role: msg.role as import("@lab/shared").Role,
os: msg.os as import("@lab/shared").OsId,
queued_at: new Date().toISOString(),
};
});
return { status: "ok", data: { mac: msg.mac, hostname: msg.hostname } };
});
labdConn.onCommand("command-forget", async (msg) => {
if (msg.type !== "command-forget") throw new Error("unexpected");
const mac = msg.mac.toLowerCase();
state.update((s) => {
delete s.discovered[mac];
delete s.install_queue[mac];
delete s.installed[mac];
});
return { status: "ok", data: { mac } };
});
labdConn.onCommand("command-role-update", async (msg) => {
if (msg.type !== "command-role-update") throw new Error("unexpected");
const mac = msg.mac.toLowerCase();
const current = state.load();
if (!current.installed[mac]) {
return { status: "error", error: `MAC ${mac} not found in installed machines` };
}
state.update((s) => {
const inst = s.installed[mac];
if (inst) inst.role = msg.role;
});
return { status: "ok", data: { mac, role: msg.role } };
});
// Push state to labd on every local state change
state.onChange(() => labdConn?.syncState());
// Forward progress events (stages only, not raw log lines) to labd
progressBus.on((event) => {
if (event.stage !== "log") {
labdConn?.sendProgress(event.mac, event.stage, event.detail);
}
});
labdConn.connect();
logger.info(`Registering with labd at ${config.labdUrl}`);
}
// Print banner
printBanner(config);
// Graceful shutdown
const shutdown = async (): Promise<void> => {
logger.info("Shutting down...");
if (labdConn) labdConn.close();
if (config.skipDnsmasq !== true) stopDnsmasq();
closeFirewall(config);
await app.close();

View File

@@ -5,13 +5,19 @@
// /api/discover - receive hardware discovery reports from PXE-booted machines
import type { FastifyInstance } from "fastify";
import type { HardwareInfo, InstalledInfo } from "@lab/shared";
import type { HardwareInfo, InstalledInfo, Role } from "@lab/shared";
import { isValidOsId, SUPPORTED_ROLES } from "@lab/shared";
import type { StateManager } from "../services/state.js";
import { logger } from "../services/logger.js";
import { triggerPostProvisionK3s } from "../services/post-provision.js";
import { progressBus } from "../services/progress-events.js";
import type { ProgressEvent } from "../services/progress-events.js";
import type { InstallLogBuffer } from "../services/install-log.js";
export function registerApiRoutes(
app: FastifyInstance,
state: StateManager,
installLog: InstallLogBuffer,
): void {
// List all machines
app.get("/api/machines", async (_request, reply) => {
@@ -25,9 +31,10 @@ export function registerApiRoutes(
hostname?: string;
disk?: string;
role?: string;
os?: string;
};
}>("/api/install", async (request, reply) => {
const { mac: rawMac, hostname, disk, role } = request.body ?? {};
const { mac: rawMac, hostname, disk, role, os } = request.body ?? {};
const mac = (rawMac ?? "").toLowerCase().replace(/-/g, ":");
if (mac === "") {
@@ -35,27 +42,34 @@ export function registerApiRoutes(
}
const validRole = role ?? "worker";
if (validRole !== "worker" && validRole !== "infra") {
return reply.status(400).send({ error: "role must be 'worker' or 'infra'" });
if (!(SUPPORTED_ROLES as readonly string[]).includes(validRole)) {
return reply.status(400).send({ error: `invalid role: '${validRole}'. Supported: ${SUPPORTED_ROLES.join(", ")}` });
}
const osId = os ?? "fedora-43";
if (!isValidOsId(osId)) {
return reply.status(400).send({ error: `invalid os: '${osId}'. Supported: fedora-43, ubuntu-26.04` });
}
state.update((s) => {
s.install_queue[mac] = {
hostname: hostname ?? "lab-node",
disk: disk ?? "",
role: validRole as "worker" | "infra",
role: validRole as Role,
os: osId,
queued_at: new Date().toISOString(),
};
});
logger.info(`INSTALL QUEUED: ${mac} -> hostname=${hostname ?? "lab-node"} role=${validRole}`);
logger.info(`INSTALL QUEUED: ${mac} -> hostname=${hostname ?? "lab-node"} role=${validRole} os=${osId}`);
return reply.send({
status: "queued",
mac,
hostname: hostname ?? "lab-node",
role: validRole,
message: `PXE boot the machine to start installation (role=${validRole})`,
os: osId,
message: `PXE boot the machine to start installation (role=${validRole}, os=${osId})`,
});
});
@@ -85,6 +99,13 @@ export function registerApiRoutes(
const color = stageName === "complete" ? GREEN : stageName === "error" ? RED : YELLOW;
console.log(` ${color}${icon}${RESET} ${mac} ${BOLD}${stageName}${RESET}${detailStr ? ` -- ${detailStr}` : ""}`);
// Emit progress event for SSE clients
const hostname = state.load().install_queue[mac]?.hostname ?? mac;
progressBus.emit({
mac, hostname, stage: stageName, detail: detailStr,
timestamp: new Date().toISOString(),
});
state.update((s) => {
const queueEntry = s.install_queue[mac];
if (queueEntry) {
@@ -94,6 +115,14 @@ export function registerApiRoutes(
queueEntry.progress_detail = detailStr;
}
// Append to progress log history
if (!queueEntry.log) queueEntry.log = [];
queueEntry.log.push({
stage: stageName,
detail: detailStr,
timestamp: new Date().toISOString(),
});
// Move to installed on completion
if (stageName === "complete") {
const cfg = s.install_queue[mac];
@@ -106,14 +135,19 @@ export function registerApiRoutes(
const installedInfo: InstalledInfo = {
hostname: cfg?.hostname ?? "?",
role: cfg?.role ?? "?",
...(cfg?.os !== undefined ? { os: cfg.os } : {}),
ip,
installed_at: new Date().toISOString(),
};
s.installed[mac] = installedInfo;
const installedRole = state.load().installed[mac]?.role;
const admin = installedRole !== undefined && installedRole !== "" ? "michal" : "root";
const admin = installedInfo.role !== "vanilla" && installedInfo.role !== "" ? "michal" : "root";
console.log(`\n \x1b[0;32m\x1b[1m ssh ${admin}@${ip}\x1b[0m\n`); // eslint-disable-line no-console
// Auto-install k3s for non-vanilla roles
if (installedInfo.role !== "vanilla" && ip !== "") {
void triggerPostProvisionK3s(installedInfo.hostname, ip, installedInfo.role, admin, mac);
}
}
}
});
@@ -121,6 +155,40 @@ export function registerApiRoutes(
return reply.send({ status: "ok" });
});
// Receive raw log lines from kickstart scripts
app.post<{
Body: {
mac?: string;
line?: string;
lines?: string[];
tail?: string;
};
}>("/api/log", async (request, reply) => {
const { mac: rawMac, line, lines: rawLines, tail } = request.body ?? {};
const mac = (rawMac ?? "unknown").toLowerCase();
// Collect all lines from the various input formats
const allLines: string[] = [];
if (line) allLines.push(line);
if (rawLines) allLines.push(...rawLines);
if (tail) {
// tail is a string with escaped \n — split it into lines
allLines.push(...tail.split("\\n").filter(Boolean));
}
if (allLines.length === 0) {
return reply.send({ status: "ok", lines: 0 });
}
// Look up hostname from install queue for enriching events
const hostname = state.load().install_queue[mac]?.hostname ?? mac;
// Append to the install log buffer (this also emits to progressBus)
installLog.append(mac, allLines, hostname);
return reply.send({ status: "ok", lines: allLines.length });
});
// Delete a machine from all state
app.delete<{
Params: { mac: string };
@@ -209,4 +277,125 @@ export function registerApiRoutes(
return reply.send({ status: "ok", mac, new: isNew });
});
// Update a machine's role (e.g. promote infra -> labcontroller)
app.post<{
Body: {
mac?: string;
role?: string;
};
}>("/api/role", async (request, reply) => {
const { mac: rawMac, role } = request.body ?? {};
const mac = (rawMac ?? "").toLowerCase().replace(/-/g, ":");
if (mac === "") {
return reply.status(400).send({ error: "mac is required" });
}
if (!role) {
return reply.status(400).send({ error: "role is required" });
}
let found = false;
state.update((s) => {
if (s.installed[mac]) {
const oldRole = s.installed[mac].role;
s.installed[mac].role = role;
found = true;
logger.info(`ROLE UPDATED: ${mac} (${s.installed[mac].hostname}) ${oldRole} -> ${role}`);
}
});
if (!found) {
return reply.status(404).send({ error: "machine not found in installed state", mac });
}
return reply.send({ status: "updated", mac, role });
});
// Get provision logs for a machine (current state snapshot + raw log lines)
app.get<{
Params: { mac: string };
Querystring: { lines?: string; offset?: string };
}>("/api/logs/:mac", async (request, reply) => {
const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
const logLimit = parseInt(request.query.lines ?? "200", 10);
const logOffset = parseInt(request.query.offset ?? "0", 10);
const currentState = state.load();
const queueEntry = currentState.install_queue[mac];
const installedEntry = currentState.installed[mac];
if (queueEntry) {
return reply.send({
mac,
hostname: queueEntry.hostname,
status: "installing",
progress: queueEntry.progress ?? "queued",
progress_detail: queueEntry.progress_detail ?? "",
progress_at: queueEntry.progress_at ?? queueEntry.queued_at,
role: queueEntry.role,
os: queueEntry.os,
stages: queueEntry.log ?? [],
log_lines: installLog.getLines(mac, logOffset, logLimit),
log_total: installLog.lineCount(mac),
});
}
if (installedEntry) {
return reply.send({
mac,
hostname: installedEntry.hostname,
status: "installed",
progress: "complete",
progress_detail: `ready at ${installedEntry.ip}`,
progress_at: installedEntry.installed_at,
role: installedEntry.role,
ip: installedEntry.ip,
log_lines: installLog.getLines(mac, logOffset, logLimit),
log_total: installLog.lineCount(mac),
});
}
return reply.status(404).send({ error: "machine not found", mac });
});
// SSE stream: follow provision progress for a machine (or all machines)
app.get<{
Params: { mac: string };
}>("/api/logs/:mac/follow", async (request, reply) => {
const filterMac = request.params.mac === "all"
? null
: request.params.mac.toLowerCase().replace(/-/g, ":");
void reply.raw.writeHead(200, {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
"Connection": "keep-alive",
});
// Send current state as first event
const currentState = state.load();
const queueEntry = filterMac ? currentState.install_queue[filterMac] : undefined;
if (queueEntry) {
const initData = JSON.stringify({
mac: filterMac, hostname: queueEntry.hostname,
stage: queueEntry.progress ?? "queued",
detail: queueEntry.progress_detail ?? "",
timestamp: queueEntry.progress_at ?? queueEntry.queued_at,
});
reply.raw.write(`data: ${initData}\n\n`);
}
const onProgress = (event: ProgressEvent): void => {
if (filterMac && event.mac !== filterMac) return;
// Use SSE event types so clients can filter: "stage" for progress, "log" for raw lines
const eventType = event.stage === "log" ? "log" : "stage";
reply.raw.write(`event: ${eventType}\ndata: ${JSON.stringify(event)}\n\n`);
};
progressBus.on(onProgress);
request.raw.on("close", () => {
progressBus.off(onProgress);
});
});
}

View File

@@ -0,0 +1,249 @@
// Boot ISO generation.
// Generates a UEFI-bootable iPXE ISO using xorriso+mtools.
// The ISO is placed in httpDir so @fastify/static serves it with Range request
// support (required by JetKVM, which streams via HTTP Range + NBD).
//
// The ISO embeds kernel + initrd so machines without UEFI NIC support
// (no SNP protocol) can still boot. iPXE loads them from file:/ and the
// Linux kernel handles networking with its own drivers.
import { createHash } from "node:crypto";
import { execSync } from "node:child_process";
import { existsSync, readFileSync, statSync, writeFileSync, mkdirSync, rmSync, unlinkSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import type { BastionConfig } from "@lab/shared";
import { logger } from "../services/logger.js";
// iPXE SNP variant (scans all UEFI SNP handles, works from CD-ROM/USB boot).
const IPXE_ISO_PATHS: Record<string, { src: string[]; efiName: string }> = {
x86_64: {
src: [
"/usr/share/ipxe/ipxe-snp-x86_64.efi",
"/usr/share/ipxe/ipxe-x86_64.efi",
],
efiName: "BOOTX64.EFI",
},
aarch64: {
src: [
"/usr/share/ipxe/arm64-efi/ipxe-snp.efi",
"/usr/share/ipxe/arm64-efi/ipxe.efi",
],
efiName: "BOOTAA64.EFI",
},
};
// Fedora PXE kernel/initrd paths per architecture
const FEDORA_MIRROR_BASE = "https://download.fedoraproject.org/pub/fedora/linux/releases";
interface BootPayload {
arch: string;
vmlinuz: string;
initrd: string;
}
function downloadIfMissing(url: string, dest: string, label: string): void {
if (existsSync(dest)) {
logger.info(` ${label} -- cached`);
return;
}
logger.info(` ${label} -- downloading...`);
execSync(`curl -# -L -f -o "${dest}" "${url}"`, { stdio: "inherit" });
}
function generateIso(config: BastionConfig, outputPath: string): void {
const work = join(tmpdir(), `bastion-iso-${process.pid}`);
mkdirSync(join(work, "EFI", "BOOT"), { recursive: true });
const bastionUrl = `http://${config.serverIp}:${config.httpPort}`;
// Copy available iPXE EFI binaries
const archs: string[] = [];
for (const [arch, paths] of Object.entries(IPXE_ISO_PATHS)) {
const srcFile = paths.src.find((s) => existsSync(s));
if (srcFile) {
execSync(`cp "${srcFile}" "${join(work, "EFI", "BOOT", paths.efiName)}"`, { stdio: "pipe" });
archs.push(arch);
logger.info(` iPXE ISO ${arch}: ${srcFile}`);
}
}
if (archs.length === 0) throw new Error("No iPXE EFI binaries found");
// Download and stage kernel/initrd for each architecture.
// These are embedded in the ISO so machines without UEFI NIC support
// can boot the Linux installer (which has its own NIC drivers).
const cacheDir = join(config.bastionDir, "iso-cache");
mkdirSync(cacheDir, { recursive: true });
const payloads: BootPayload[] = [];
for (const arch of ["x86_64", "aarch64"]) {
const mirror = `${FEDORA_MIRROR_BASE}/${config.fedoraVersion}/Everything/${arch}/os`;
const vmlinuzCache = join(cacheDir, `vmlinuz-${arch}`);
const initrdCache = join(cacheDir, `initrd-${arch}`);
try {
downloadIfMissing(
`${mirror}/images/pxeboot/vmlinuz`,
vmlinuzCache,
`Fedora ${arch} kernel`,
);
downloadIfMissing(
`${mirror}/images/pxeboot/initrd.img`,
initrdCache,
`Fedora ${arch} initrd`,
);
payloads.push({ arch, vmlinuz: vmlinuzCache, initrd: initrdCache });
} catch {
logger.warn(` Fedora ${arch} kernel/initrd not available -- skipping`);
}
}
// Write iPXE autoexec script.
// Strategy: try DHCP (for machines with UEFI NIC support), then fall back
// to booting the embedded kernel/initrd from the ISO filesystem.
// iPXE's ${buildarch} resolves to "x86_64" or "arm64".
const ipxeScript = [
"#!ipxe",
"",
"echo",
"echo =============================================",
"echo Lab PXE Bastion -- ISO Boot",
"echo =============================================",
"echo",
"",
"# Try DHCP (works if UEFI has NIC driver / SNP support)",
"set attempts:int32 0",
":retry",
"dhcp && goto netboot ||",
"inc attempts",
"iseq ${attempts} 3 || goto retry_wait",
"goto localboot",
":retry_wait",
"echo DHCP failed (attempt ${attempts}/3), retrying...",
"sleep 2",
"goto retry",
"",
"# Network available -- chain to bastion for dynamic dispatch",
":netboot",
"echo Network OK. Chaining to bastion...",
`chain ${bastionUrl}/boot.ipxe || shell`,
"",
"# No network -- boot embedded kernel (Linux has its own NIC drivers)",
":localboot",
"echo No UEFI network support. Booting embedded installer...",
"echo Linux will configure networking with its own drivers.",
"echo",
"# Map iPXE arch names to Fedora mirror paths (arm64 -> aarch64)",
"set fedarch ${buildarch}",
"iseq ${buildarch} arm64 && set fedarch aarch64 ||",
`kernel file:/vmlinuz-\${buildarch} inst.ks=${bastionUrl}/discover.ks inst.repo=${FEDORA_MIRROR_BASE}/${config.fedoraVersion}/Everything/\${fedarch}/os inst.text || goto no_kernel`,
`initrd file:/initrd-\${buildarch} || goto no_kernel`,
"boot || shell",
"",
":no_kernel",
"echo ERROR: kernel not found for this architecture. Dropping to shell.",
"shell",
].join("\n");
writeFileSync(join(work, "autoexec.ipxe"), ipxeScript);
// Calculate EFI partition size: iPXE binaries + autoexec + kernel/initrd + margin
let payloadSize = 2 * 1024 * 1024; // 2MB base for iPXE + autoexec + FAT overhead
for (const p of payloads) {
payloadSize += statSync(p.vmlinuz).size;
payloadSize += statSync(p.initrd).size;
}
const efiSizeMB = Math.ceil(payloadSize / (1024 * 1024)) + 4; // +4MB margin
logger.info(` EFI partition: ${efiSizeMB}MB (${payloads.length} arch payloads)`);
// Create FAT EFI system partition
const efiImg = join(work, "efi.img");
execSync(`dd if=/dev/zero of="${efiImg}" bs=1M count=${efiSizeMB} 2>/dev/null`, { stdio: "pipe" });
execSync(`mformat -i "${efiImg}" -v LABBOOT ::`, { stdio: "pipe" });
execSync(`mmd -i "${efiImg}" ::/EFI`, { stdio: "pipe" });
execSync(`mmd -i "${efiImg}" ::/EFI/BOOT`, { stdio: "pipe" });
for (const arch of archs) {
const paths = IPXE_ISO_PATHS[arch]!;
execSync(`mcopy -i "${efiImg}" "${join(work, "EFI", "BOOT", paths.efiName)}" ::/EFI/BOOT/${paths.efiName}`, { stdio: "pipe" });
}
execSync(`mcopy -i "${efiImg}" "${join(work, "autoexec.ipxe")}" ::/autoexec.ipxe`, { stdio: "pipe" });
// Copy kernel/initrd onto EFI partition with arch-specific names
for (const p of payloads) {
// iPXE ${buildarch} returns "x86_64" or "arm64"
const archLabel = p.arch === "aarch64" ? "arm64" : p.arch;
execSync(`mcopy -i "${efiImg}" "${p.vmlinuz}" ::/vmlinuz-${archLabel}`, { stdio: "pipe" });
execSync(`mcopy -i "${efiImg}" "${p.initrd}" ::/initrd-${archLabel}`, { stdio: "pipe" });
logger.info(` Embedded ${archLabel}: vmlinuz + initrd`);
}
// Build hybrid ISO: El Torito EFI boot + GPT EFI partition
execSync([
`xorriso -as mkisofs`,
`-o "${outputPath}"`,
`-R`,
`-V LAB_BOOT`,
`-e efi.img`,
`-no-emul-boot`,
`-partition_offset 16`,
`-append_partition 2 0xEF "${efiImg}"`,
`-appended_part_as_gpt`,
`"${work}"`,
].join(" "), { stdio: "pipe" });
rmSync(work, { recursive: true, force: true });
logger.info(`Generated boot ISO (${archs.join(", ")}): ${outputPath}`);
}
/** Compute a short hash of all inputs that affect ISO content. */
function computeIsoHash(config: BastionConfig): string {
const h = createHash("sha256");
h.update(`${config.serverIp}:${config.httpPort}`);
h.update(config.fedoraVersion);
for (const paths of Object.values(IPXE_ISO_PATHS)) {
const srcFile = paths.src.find((s) => existsSync(s));
if (srcFile) {
const st = statSync(srcFile);
h.update(`${srcFile}:${st.size}:${st.mtimeMs}`);
}
}
// Include kernel/initrd cache state
const cacheDir = join(config.bastionDir, "iso-cache");
for (const arch of ["x86_64", "aarch64"]) {
const vmlinuz = join(cacheDir, `vmlinuz-${arch}`);
if (existsSync(vmlinuz)) {
const st = statSync(vmlinuz);
h.update(`${vmlinuz}:${st.size}`);
}
}
return h.digest("hex").slice(0, 16);
}
/**
* Ensure boot.iso exists and is up-to-date in httpDir.
* Called during startup so @fastify/static can serve it with Range support.
*/
export function ensureBootIso(config: BastionConfig): void {
const isoPath = join(config.httpDir, "boot.iso");
const hashPath = join(config.httpDir, "boot.iso.hash");
const currentHash = computeIsoHash(config);
const cachedHash = existsSync(hashPath) ? readFileSync(hashPath, "utf-8").trim() : "";
if (existsSync(isoPath) && currentHash === cachedHash) {
logger.info(" Boot ISO -- cached (up to date)");
return;
}
if (existsSync(isoPath)) {
logger.info(" Boot ISO -- inputs changed, regenerating...");
try { unlinkSync(isoPath); } catch { /* ignore */ }
} else {
logger.info(" Boot ISO -- generating...");
}
generateIso(config, isoPath);
writeFileSync(hashPath, currentHash);
}

View File

@@ -12,6 +12,7 @@ import {
renderInstallIpxe,
renderLocalBootIpxe,
} from "../templates/boot.ipxe.js";
import { renderUbuntuInstallIpxe } from "../templates/ubuntu-boot.ipxe.js";
import { logger } from "../services/logger.js";
export function registerDispatchRoutes(
@@ -26,16 +27,28 @@ export function registerDispatchRoutes(
const queueEntry = currentState.install_queue[mac];
if (queueEntry) {
const hostname = queueEntry.hostname ?? "lab-node";
logger.info(`INSTALL STARTED: ${mac} -> ${hostname}`);
const os = queueEntry.os ?? "fedora-43";
logger.info(`INSTALL STARTED: ${mac} -> ${hostname} (${os})`);
const script = renderInstallIpxe({
mac,
hostname,
serverIp: config.serverIp,
httpPort: config.httpPort,
fedoraVersion: config.fedoraVersion,
fedoraMirror: config.fedoraMirror,
});
let script: string;
if (os.startsWith("ubuntu")) {
script = renderUbuntuInstallIpxe({
mac,
hostname,
serverIp: config.serverIp,
httpPort: config.httpPort,
ubuntuVersion: config.ubuntuVersion,
});
} else {
script = renderInstallIpxe({
mac,
hostname,
serverIp: config.serverIp,
httpPort: config.httpPort,
fedoraVersion: config.fedoraVersion,
fedoraMirror: config.fedoraMirror,
});
}
return reply.type("text/plain").send(script);
}

View File

@@ -1,10 +1,12 @@
// Kickstart generation routes.
// Serves per-MAC install kickstart and the static discovery kickstart.
// Serves per-MAC install kickstart, static discovery kickstart,
// and Ubuntu autoinstall cloud-init endpoints.
import type { FastifyInstance } from "fastify";
import type { BastionConfig } from "@lab/shared";
import type { StateManager } from "../services/state.js";
import { generateInstallKickstart, generateDiscoverKickstart } from "../services/kickstart-generator.js";
import { renderUbuntuAutoinstall, renderUbuntuMetaData, type UbuntuAutoinstallParams } from "../templates/ubuntu-autoinstall.js";
export function registerKickstartRoutes(
app: FastifyInstance,
@@ -31,4 +33,39 @@ export function registerKickstartRoutes(
const ks = generateDiscoverKickstart(config);
return reply.type("text/plain").send(ks);
});
// Ubuntu autoinstall user-data (cloud-init)
app.get<{ Params: { mac: string } }>("/autoinstall/:mac/user-data", async (request, reply) => {
const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
const currentState = state.load();
const queueEntry = currentState.install_queue[mac];
const aiParams: UbuntuAutoinstallParams = {
hostname: queueEntry?.hostname ?? "lab-node",
disk: queueEntry?.disk ?? "",
role: queueEntry?.role ?? "worker",
domain: config.domain,
ubuntuVersion: config.ubuntuVersion,
timezone: config.timezone,
locale: config.locale,
serverIp: config.serverIp,
httpPort: config.httpPort,
sshKeys: config.sshKeys,
adminUser: config.adminUser,
};
const userData = renderUbuntuAutoinstall(aiParams);
return reply.type("text/plain").send(userData);
});
// Ubuntu autoinstall meta-data (cloud-init)
app.get<{ Params: { mac: string } }>("/autoinstall/:mac/meta-data", async (request, reply) => {
const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
const currentState = state.load();
const queueEntry = currentState.install_queue[mac];
const hostname = queueEntry?.hostname ?? "lab-node";
const metaData = renderUbuntuMetaData(hostname);
return reply.type("text/plain").send(metaData);
});
}

View File

@@ -5,12 +5,14 @@ import fastifyStatic from "@fastify/static";
import { mkdirSync, existsSync } from "node:fs";
import type { BastionConfig } from "@lab/shared";
import { StateManager } from "./services/state.js";
import { InstallLogBuffer } from "./services/install-log.js";
import { logger } from "./services/logger.js";
import { registerDispatchRoutes } from "./routes/dispatch.js";
import { registerKickstartRoutes } from "./routes/kickstart.js";
import { registerApiRoutes } from "./routes/api.js";
export function createApp(config: BastionConfig): { app: ReturnType<typeof Fastify>; state: StateManager } {
export function createApp(config: BastionConfig): { app: ReturnType<typeof Fastify>; state: StateManager; installLog: InstallLogBuffer } {
const app = Fastify({
logger: false, // We use winston instead
});
@@ -18,6 +20,8 @@ export function createApp(config: BastionConfig): { app: ReturnType<typeof Fasti
const state = new StateManager(config.stateFile);
state.init();
const installLog = new InstallLogBuffer(config.bastionDir);
// Serve static files (vmlinuz, initrd.img, iPXE binaries) from the HTTP directory
mkdirSync(config.httpDir, { recursive: true });
app.register(fastifyStatic, {
@@ -38,14 +42,16 @@ export function createApp(config: BastionConfig): { app: ReturnType<typeof Fasti
// Register route handlers
registerDispatchRoutes(app, config, state);
registerKickstartRoutes(app, config, state);
registerApiRoutes(app, state);
registerApiRoutes(app, state, installLog);
// boot.iso is generated at startup and served as a static file from httpDir
// (static serving supports HTTP Range requests, required by JetKVM streaming)
// Log all requests
app.addHook("onRequest", async (request) => {
logger.info(`HTTP: ${request.ip} ${request.method} ${request.url}`);
});
return { app, state };
return { app, state, installLog };
}
export async function startServer(config: BastionConfig): Promise<void> {

View File

@@ -0,0 +1,86 @@
// Per-machine install log buffer.
// Stores raw log lines in memory (ring buffer) and persists to disk.
// Used by /api/log for ingestion and /api/logs/:mac/follow for SSE streaming.
import { mkdirSync, appendFileSync, readFileSync, existsSync } from "node:fs";
import { join } from "node:path";
import { progressBus } from "./progress-events.js";
const MAX_LINES_IN_MEMORY = 2000;
export interface LogLine {
line: string;
timestamp: string;
}
export class InstallLogBuffer {
/** In-memory ring buffer per MAC */
private buffers = new Map<string, LogLine[]>();
private logDir: string;
constructor(bastionDir: string) {
this.logDir = join(bastionDir, "logs");
mkdirSync(this.logDir, { recursive: true });
}
/** Append log lines for a machine. Stores in memory + appends to file. */
append(mac: string, lines: string[], hostname?: string): void {
const now = new Date().toISOString();
const buffer = this.buffers.get(mac) ?? [];
const newEntries: LogLine[] = lines.map((line) => ({ line, timestamp: now }));
buffer.push(...newEntries);
// Trim to ring buffer size
if (buffer.length > MAX_LINES_IN_MEMORY) {
buffer.splice(0, buffer.length - MAX_LINES_IN_MEMORY);
}
this.buffers.set(mac, buffer);
// Persist to file
const filePath = this.logFilePath(mac);
const fileContent = lines.map((l) => `${now} ${l}`).join("\n") + "\n";
appendFileSync(filePath, fileContent);
// Emit to SSE via progressBus (use "log" stage for log lines)
const host = hostname ?? mac;
for (const line of lines) {
progressBus.emit({
mac,
hostname: host,
stage: "log",
detail: line,
timestamp: now,
});
}
}
/** Get buffered log lines for a machine. */
getLines(mac: string, offset = 0, limit = 500): LogLine[] {
const buffer = this.buffers.get(mac) ?? [];
return buffer.slice(offset, offset + limit);
}
/** Get total line count for a machine. */
lineCount(mac: string): number {
return this.buffers.get(mac)?.length ?? 0;
}
/** Read full log from disk (for machines no longer in memory). */
readFromDisk(mac: string): string | null {
const filePath = this.logFilePath(mac);
if (!existsSync(filePath)) return null;
return readFileSync(filePath, "utf-8");
}
/** Clear log for a machine (after install complete or forget). */
clear(mac: string): void {
this.buffers.delete(mac);
}
private logFilePath(mac: string): string {
// Replace colons with dashes for filesystem safety
return join(this.logDir, `${mac.replace(/:/g, "-")}.log`);
}
}

View File

@@ -0,0 +1,437 @@
// Pure TypeScript UEFI-bootable ISO builder.
// Creates an ISO 9660 image with an embedded FAT EFI system partition
// containing iPXE EFI binaries and an autoexec script.
// No external tools required (no xorriso, mtools).
import { readFileSync } from "node:fs";
const SECTOR_SIZE = 2048; // ISO 9660 logical sector
const FAT_SECTOR_SIZE = 512;
// --- Utility helpers ---
function asciiPad(s: string, len: number, pad = " "): Buffer {
const buf = Buffer.alloc(len, pad.charCodeAt(0));
buf.write(s, 0, Math.min(s.length, len), "ascii");
return buf;
}
function u16le(n: number): Buffer {
const buf = Buffer.alloc(2);
buf.writeUInt16LE(n);
return buf;
}
function u32le(n: number): Buffer {
const buf = Buffer.alloc(4);
buf.writeUInt32LE(n);
return buf;
}
function u16be(n: number): Buffer {
const buf = Buffer.alloc(2);
buf.writeUInt16BE(n);
return buf;
}
function u32be(n: number): Buffer {
const buf = Buffer.alloc(4);
buf.writeUInt32BE(n);
return buf;
}
/** Both-endian 16-bit (ISO 9660 "both-byte" format) */
function u16both(n: number): Buffer {
return Buffer.concat([u16le(n), u16be(n)]);
}
/** Both-endian 32-bit */
function u32both(n: number): Buffer {
return Buffer.concat([u32le(n), u32be(n)]);
}
function isoDate(d: Date): Buffer {
// ISO 9660 date: 17 bytes ASCII "YYYYMMDDHHMMSSCC" + timezone offset
const s =
d.getUTCFullYear().toString().padStart(4, "0") +
(d.getUTCMonth() + 1).toString().padStart(2, "0") +
d.getUTCDate().toString().padStart(2, "0") +
d.getUTCHours().toString().padStart(2, "0") +
d.getUTCMinutes().toString().padStart(2, "0") +
d.getUTCSeconds().toString().padStart(2, "0") +
"00"; // hundredths
const buf = Buffer.alloc(17, 0);
buf.write(s, 0, 16, "ascii");
buf[16] = 0; // UTC offset (0 = UTC)
return buf;
}
function dirRecordDate(d: Date): Buffer {
// 7-byte recording date
const buf = Buffer.alloc(7, 0);
buf[0] = d.getUTCFullYear() - 1900;
buf[1] = d.getUTCMonth() + 1;
buf[2] = d.getUTCDate();
buf[3] = d.getUTCHours();
buf[4] = d.getUTCMinutes();
buf[5] = d.getUTCSeconds();
buf[6] = 0; // UTC
return buf;
}
// --- FAT12 filesystem builder ---
function buildFatImage(files: Array<{ path: string; data: Buffer }>): Buffer {
// Build a minimal FAT12 filesystem in memory
// Layout: BPB | FAT | FAT copy | Root dir | Data clusters
const bytesPerSector = FAT_SECTOR_SIZE;
const sectorsPerCluster = 4; // 2KB clusters
const clusterSize = bytesPerSector * sectorsPerCluster;
const reservedSectors = 1;
const numFats = 2;
const rootEntryCount = 64; // 64 * 32 = 2048 bytes = 4 sectors
const rootDirSectors = Math.ceil((rootEntryCount * 32) / bytesPerSector);
// Calculate data size needed
let totalDataBytes = 0;
for (const f of files) totalDataBytes += Math.ceil(f.data.length / clusterSize) * clusterSize;
// Add directory clusters for EFI and EFI/BOOT
totalDataBytes += clusterSize * 2;
const dataClusters = Math.ceil(totalDataBytes / clusterSize) + 2; // +2 safety
const fatEntries = dataClusters + 2; // clusters start at 2
const fatBytes = Math.ceil((fatEntries * 3) / 2); // FAT12: 1.5 bytes per entry
const sectorsPerFat = Math.ceil(fatBytes / bytesPerSector);
const totalSectors = reservedSectors + (numFats * sectorsPerFat) + rootDirSectors + (dataClusters * sectorsPerCluster);
const image = Buffer.alloc(totalSectors * bytesPerSector, 0);
// --- BPB (BIOS Parameter Block) ---
image[0] = 0xEB; image[1] = 0x3C; image[2] = 0x90; // Jump + NOP
image.write("LABCTL ", 3, 8, "ascii"); // OEM
image.writeUInt16LE(bytesPerSector, 11);
image[13] = sectorsPerCluster;
image.writeUInt16LE(reservedSectors, 14);
image[16] = numFats;
image.writeUInt16LE(rootEntryCount, 17);
image.writeUInt16LE(totalSectors < 0x10000 ? totalSectors : 0, 19);
image[21] = 0xF0; // media descriptor (removable)
image.writeUInt16LE(sectorsPerFat, 22);
image.writeUInt16LE(1, 24); // sectors per track
image.writeUInt16LE(1, 26); // heads
image[38] = 0x29; // Extended boot sig
image.writeUInt32LE(0x12345678, 39); // volume serial
image.write("IPXE BOOT ", 43, 11, "ascii"); // volume label
image.write("FAT12 ", 54, 8, "ascii"); // filesystem type
image[510] = 0x55; image[511] = 0xAA; // Boot signature
// --- FAT table ---
const fatOffset = reservedSectors * bytesPerSector;
const rootDirOffset = fatOffset + (numFats * sectorsPerFat * bytesPerSector);
const dataOffset = rootDirOffset + (rootDirSectors * bytesPerSector);
// FAT12 helper: write a 12-bit entry
function fatSet(fat: number, cluster: number, value: number): void {
const off = fatOffset + (fat * sectorsPerFat * bytesPerSector);
const byteIdx = Math.floor(cluster * 3 / 2);
if (cluster % 2 === 0) {
image[off + byteIdx] = value & 0xFF;
image[off + byteIdx + 1] = (image[off + byteIdx + 1]! & 0xF0) | ((value >> 8) & 0x0F);
} else {
image[off + byteIdx] = (image[off + byteIdx]! & 0x0F) | ((value & 0x0F) << 4);
image[off + byteIdx + 1] = (value >> 4) & 0xFF;
}
}
// Media descriptor in FAT
for (let f = 0; f < numFats; f++) {
fatSet(f, 0, 0xFF0);
fatSet(f, 1, 0xFFF);
}
let nextCluster = 2;
function allocClusters(size: number): number {
const needed = Math.max(1, Math.ceil(size / clusterSize));
const startCluster = nextCluster;
for (let i = 0; i < needed; i++) {
const c = nextCluster++;
const next = (i === needed - 1) ? 0xFFF : c + 1;
for (let f = 0; f < numFats; f++) fatSet(f, c, next);
}
return startCluster;
}
function clusterOffset(cluster: number): number {
return dataOffset + (cluster - 2) * clusterSize;
}
function writeDirEntry(dirBuf: Buffer, entryIdx: number, name: string, ext: string, cluster: number, size: number, isDir: boolean): void {
const off = entryIdx * 32;
dirBuf.write(name.toUpperCase().padEnd(8, " "), off, 8, "ascii");
dirBuf.write(ext.toUpperCase().padEnd(3, " "), off + 8, 3, "ascii");
dirBuf[off + 11] = isDir ? 0x10 : 0x20; // attributes
dirBuf.writeUInt16LE(cluster & 0xFFFF, off + 26); // first cluster low
dirBuf.writeUInt32LE(isDir ? 0 : size, off + 28); // file size
}
// --- Create directory structure ---
// Root: EFI dir + autoexec.ipxe
// EFI: BOOT dir
// BOOT: BOOTX64.EFI, BOOTAA64.EFI
// EFI directory cluster
const efiDirCluster = allocClusters(clusterSize);
const efiDirBuf = Buffer.alloc(clusterSize, 0);
// BOOT directory cluster
const bootDirCluster = allocClusters(clusterSize);
const bootDirBuf = Buffer.alloc(clusterSize, 0);
// Write . and .. entries for EFI
writeDirEntry(efiDirBuf, 0, ".", "", efiDirCluster, 0, true);
writeDirEntry(efiDirBuf, 1, "..", "", 0, 0, true);
// BOOT subdir in EFI
writeDirEntry(efiDirBuf, 2, "BOOT", "", bootDirCluster, 0, true);
// Write . and .. entries for BOOT
writeDirEntry(bootDirBuf, 0, ".", "", bootDirCluster, 0, true);
writeDirEntry(bootDirBuf, 1, "..", "", efiDirCluster, 0, true);
let bootEntryIdx = 2;
// Root directory entries
let rootEntryIdx = 0;
// Volume label
const rootBuf = image.subarray(rootDirOffset, rootDirOffset + rootDirSectors * bytesPerSector);
rootBuf.write("IPXE BOOT ", rootEntryIdx * 32, 11, "ascii");
rootBuf[rootEntryIdx * 32 + 11] = 0x08; // volume label attribute
rootEntryIdx++;
// EFI directory in root
writeDirEntry(rootBuf, rootEntryIdx++, "EFI", "", efiDirCluster, 0, true);
// Write files
for (const file of files) {
const parts = file.path.toUpperCase().split("/").filter(Boolean);
const fileName = parts[parts.length - 1]!;
const nameParts = fileName.split(".");
const name = nameParts[0]!.substring(0, 8);
const ext = (nameParts[1] ?? "").substring(0, 3);
const fileCluster = allocClusters(file.data.length);
file.data.copy(image, clusterOffset(fileCluster));
if (parts.length === 1) {
// Root level file
writeDirEntry(rootBuf, rootEntryIdx++, name, ext, fileCluster, file.data.length, false);
} else if (parts.length === 3 && parts[0] === "EFI" && parts[1] === "BOOT") {
// EFI/BOOT/ file
writeDirEntry(bootDirBuf, bootEntryIdx++, name, ext, fileCluster, file.data.length, false);
}
}
// Write directory clusters to image
efiDirBuf.copy(image, clusterOffset(efiDirCluster));
bootDirBuf.copy(image, clusterOffset(bootDirCluster));
return image;
}
// --- ISO 9660 builder ---
export function buildBootIso(efiFiles: Array<{ path: string; data: Buffer }>, scriptContent?: string): Buffer {
const now = new Date();
// Build FAT image with all files
const allFiles = [...efiFiles];
if (scriptContent) {
allFiles.push({ path: "autoexec.ipxe", data: Buffer.from(scriptContent, "utf-8") });
}
const fatImage = buildFatImage(allFiles);
// ISO layout:
// Sector 0-15: System area (unused)
// Sector 16: Primary Volume Descriptor
// Sector 17: Boot Record Volume Descriptor (El Torito)
// Sector 18: Volume Descriptor Set Terminator
// Sector 19: Root directory record
// Sector 20: El Torito boot catalog
// Sector 21: El Torito boot image (the FAT image, this gets large)
// After FAT: EFI boot image reference for files visible in ISO
const fatSectors = Math.ceil(fatImage.length / SECTOR_SIZE);
const rootDirSector = 19;
const bootCatalogSector = 20;
const efiImageSector = 21;
const totalSectors = efiImageSector + fatSectors + 1;
const iso = Buffer.alloc(totalSectors * SECTOR_SIZE, 0);
// --- Primary Volume Descriptor (sector 16) ---
const pvd = iso.subarray(16 * SECTOR_SIZE, 17 * SECTOR_SIZE);
pvd[0] = 1; // type: Primary
pvd.write("CD001", 1, 5, "ascii"); // standard identifier
pvd[6] = 1; // version
asciiPad("LABCTL", 32).copy(pvd, 8); // system identifier
asciiPad("IPXE_BOOT", 32).copy(pvd, 40); // volume identifier
u32both(totalSectors).copy(pvd, 80); // volume space size
u16both(1).copy(pvd, 120); // volume set size
u16both(1).copy(pvd, 124); // volume sequence number
u16both(SECTOR_SIZE).copy(pvd, 128); // logical block size
// Root directory record (34 bytes)
const rootRec = Buffer.alloc(34, 0);
rootRec[0] = 34; // length
rootRec[1] = 0; // extended attribute length
u32both(rootDirSector).copy(rootRec, 2); // extent location
u32both(SECTOR_SIZE).copy(rootRec, 10); // data length
dirRecordDate(now).copy(rootRec, 18);
rootRec[25] = 0x02; // flags: directory
rootRec[28] = 1; // file unit size
u16both(1).copy(rootRec, 30); // volume sequence
rootRec[32] = 1; // name length
rootRec[33] = 0; // name: root
rootRec.copy(pvd, 156); // copy to PVD
// Volume dates
isoDate(now).copy(pvd, 813); // creation
isoDate(now).copy(pvd, 830); // modification
Buffer.alloc(17, 0x30).copy(pvd, 847); // expiration (none)
isoDate(now).copy(pvd, 864); // effective
pvd[881] = 1; // file structure version
// --- Boot Record Volume Descriptor (El Torito, sector 17) ---
const brvd = iso.subarray(17 * SECTOR_SIZE, 18 * SECTOR_SIZE);
brvd[0] = 0; // type: Boot Record
brvd.write("CD001", 1, 5, "ascii");
brvd[6] = 1; // version
brvd.write("EL TORITO SPECIFICATION", 7, 32, "ascii");
u32le(bootCatalogSector).copy(brvd, 0x47); // boot catalog pointer
// --- Volume Descriptor Set Terminator (sector 18) ---
const vdst = iso.subarray(18 * SECTOR_SIZE, 19 * SECTOR_SIZE);
vdst[0] = 255; // type: terminator
vdst.write("CD001", 1, 5, "ascii");
vdst[6] = 1;
// --- Root Directory (sector 19) ---
const rootDir = iso.subarray(rootDirSector * SECTOR_SIZE, (rootDirSector + 1) * SECTOR_SIZE);
let offset = 0;
// "." entry
const dotRec = Buffer.alloc(34, 0);
dotRec[0] = 34;
u32both(rootDirSector).copy(dotRec, 2);
u32both(SECTOR_SIZE).copy(dotRec, 10);
dirRecordDate(now).copy(dotRec, 18);
dotRec[25] = 0x02;
u16both(1).copy(dotRec, 28);
dotRec[32] = 1;
dotRec[33] = 0;
dotRec.copy(rootDir, offset);
offset += 34;
// ".." entry
const dotdotRec = Buffer.alloc(34, 0);
dotdotRec[0] = 34;
u32both(rootDirSector).copy(dotdotRec, 2);
u32both(SECTOR_SIZE).copy(dotdotRec, 10);
dirRecordDate(now).copy(dotdotRec, 18);
dotdotRec[25] = 0x02;
u16both(1).copy(dotdotRec, 28);
dotdotRec[32] = 1;
dotdotRec[33] = 1;
dotdotRec.copy(rootDir, offset);
offset += 34;
// EFI boot image file entry (the FAT image visible as a file)
const efiFileName = "EFI.IMG;1";
const efiRec = Buffer.alloc(33 + efiFileName.length + ((efiFileName.length % 2 === 0) ? 1 : 0), 0);
efiRec[0] = efiRec.length;
u32both(efiImageSector).copy(efiRec, 2);
u32both(fatImage.length).copy(efiRec, 10);
dirRecordDate(now).copy(efiRec, 18);
efiRec[25] = 0x00; // flags: file
u16both(1).copy(efiRec, 28);
efiRec[32] = efiFileName.length;
efiRec.write(efiFileName, 33, efiFileName.length, "ascii");
efiRec.copy(rootDir, offset);
offset += efiRec.length;
// Boot catalog file entry
const catFileName = "BOOT.CAT;1";
const catRec = Buffer.alloc(33 + catFileName.length + ((catFileName.length % 2 === 0) ? 1 : 0), 0);
catRec[0] = catRec.length;
u32both(bootCatalogSector).copy(catRec, 2);
u32both(SECTOR_SIZE).copy(catRec, 10);
dirRecordDate(now).copy(catRec, 18);
catRec[25] = 0x01; // flags: hidden
u16both(1).copy(catRec, 28);
catRec[32] = catFileName.length;
catRec.write(catFileName, 33, catFileName.length, "ascii");
catRec.copy(rootDir, offset);
// --- El Torito Boot Catalog (sector 20) ---
const catalog = iso.subarray(bootCatalogSector * SECTOR_SIZE, (bootCatalogSector + 1) * SECTOR_SIZE);
// Validation entry (32 bytes)
catalog[0] = 1; // header ID
catalog[1] = 0xEF; // platform: EFI
catalog.write("LABCTL", 4, 24, "ascii"); // ID string
// Calculate checksum for validation entry
let cksum = 0;
for (let i = 0; i < 32; i += 2) {
cksum += catalog[i]! + (catalog[i + 1]! << 8);
}
catalog.writeUInt16LE((0x10000 - (cksum & 0xFFFF)) & 0xFFFF, 28); // checksum
catalog[30] = 0x55;
catalog[31] = 0xAA;
// Default/Initial entry (32 bytes, offset 32)
catalog[32] = 0x88; // bootable
catalog[33] = 0xEF; // type: EFI
catalog.writeUInt16LE(0, 34); // load segment
catalog[36] = 0; // system type
const efiImageSectors512 = Math.ceil(fatImage.length / FAT_SECTOR_SIZE);
catalog.writeUInt16LE(efiImageSectors512 & 0xFFFF, 38); // sector count
catalog.writeUInt32LE(efiImageSector, 40); // load LBA
// --- EFI boot image (FAT filesystem, starting at sector 21) ---
fatImage.copy(iso, efiImageSector * SECTOR_SIZE);
return iso;
}
/** Build a ready-to-serve iPXE boot ISO from system iPXE binaries. */
export function buildBastionBootIso(bastionUrl: string): Buffer {
const efiFiles: Array<{ path: string; data: Buffer }> = [];
const PATHS: Record<string, { src: string; dest: string }> = {
x86_64: { src: "/usr/share/ipxe/ipxe-snponly-x86_64.efi", dest: "EFI/BOOT/BOOTX64.EFI" },
aarch64: { src: "/usr/share/ipxe/arm64-efi/snponly.efi", dest: "EFI/BOOT/BOOTAA64.EFI" },
};
for (const [, paths] of Object.entries(PATHS)) {
try {
efiFiles.push({ path: paths.dest, data: readFileSync(paths.src) });
} catch {
// Architecture not available, skip
}
}
if (efiFiles.length === 0) {
throw new Error("No iPXE EFI binaries found on system");
}
const script = [
"#!ipxe",
"",
"echo Booting from iPXE ISO -- connecting to bastion...",
"dhcp || ( echo DHCP failed, retrying... && sleep 3 && dhcp )",
`chain ${bastionUrl}/boot.ipxe || shell`,
].join("\n");
return buildBootIso(efiFiles, script);
}

View File

@@ -1,7 +1,7 @@
// Generate kickstart content for discovery and install modes.
// Uses template literal functions -- no external template engine.
import type { BastionConfig } from "@lab/shared";
import type { BastionConfig, Role } from "@lab/shared";
import { renderDiscoverKickstart } from "../templates/discover.ks.js";
import { renderInstallKickstart, type InstallKickstartParams } from "../templates/install.ks.js";
@@ -23,7 +23,7 @@ export function generateInstallKickstart(
params: {
hostname: string;
disk: string;
role: "worker" | "infra";
role: Role;
},
): string {
const ksParams: InstallKickstartParams = {

View File

@@ -0,0 +1,252 @@
// WebSocket connection from bastion to labd for registration and state sync.
// If LABD_URL is configured, bastion registers with labd on startup and pushes
// state changes. If not configured, bastion runs standalone (backward compatible).
import WebSocket from "ws";
import { readFileSync, writeFileSync, existsSync } from "node:fs";
import { hostname as osHostname } from "node:os";
import type { BastionState, BastionConfig } from "@lab/shared";
import {
type BastionMessage,
type LabdBastionMessage,
isLabdBastionMessage,
} from "@lab/shared";
import { logger } from "./logger.js";
const HEARTBEAT_INTERVAL_MS = 10_000;
const RECONNECT_BASE_DELAY_MS = 1_000;
const RECONNECT_MAX_DELAY_MS = 30_000;
type CommandHandler = (msg: LabdBastionMessage) => Promise<{ status: "ok" | "error"; data?: unknown; error?: string }>;
export class BastionConnection {
private ws: WebSocket | null = null;
private bastionId: string | null = null;
private heartbeatTimer: NodeJS.Timeout | null = null;
private reconnectTimer: NodeJS.Timeout | null = null;
private retryCount = 0;
private closed = false;
private startTime = Date.now();
private commandHandlers = new Map<string, CommandHandler>();
constructor(
private readonly config: BastionConfig,
private readonly getState: () => BastionState,
) {
// Load persisted bastionId if we've enrolled before
const idFile = `${config.bastionDir}/bastion-id`;
if (existsSync(idFile)) {
this.bastionId = readFileSync(idFile, "utf-8").trim();
}
}
/** Register a handler for incoming commands from labd. */
onCommand(type: string, handler: CommandHandler): void {
this.commandHandlers.set(type, handler);
}
connect(): void {
if (this.closed) return;
if (!this.config.labdUrl) return;
const wsUrl = this.config.labdUrl
.replace(/^https:/, "wss:")
.replace(/^http:/, "ws:");
const token = this.config.bastionJoinToken ?? "";
const url = `${wsUrl}/ws/bastion?token=${encodeURIComponent(token)}`;
logger.info(`Connecting to labd at ${this.config.labdUrl}...`);
this.ws = new WebSocket(url);
this.ws.on("open", () => {
logger.info("Connected to labd");
this.retryCount = 0;
// Send enrollment or re-registration
if (this.bastionId) {
// Already enrolled — send state sync immediately
this.sendStateSync();
} else {
// First time — enroll
this.send({
type: "bastion-enroll",
token,
hostname: osHostname(),
network: this.config.network,
serverIp: this.config.serverIp,
});
}
this.startHeartbeat();
});
this.ws.on("message", (data: WebSocket.Data) => {
try {
const raw = data.toString();
const msg: unknown = JSON.parse(raw);
if (!isLabdBastionMessage(msg)) {
logger.warn(`Unknown message from labd: ${(msg as { type?: string }).type}`);
return;
}
this.handleMessage(msg);
} catch (err) {
logger.error(`Failed to parse labd message: ${err instanceof Error ? err.message : String(err)}`);
}
});
this.ws.on("close", () => {
logger.warn("Disconnected from labd");
this.stopHeartbeat();
this.scheduleReconnect();
});
this.ws.on("error", (err) => {
logger.error(`WebSocket error: ${err.message}`);
// close event will fire after this, triggering reconnect
});
}
/** Push current state to labd. Call this after any state change. */
syncState(): void {
if (!this.bastionId || !this.ws || this.ws.readyState !== WebSocket.OPEN) return;
this.sendStateSync();
}
/** Forward a progress event to labd. */
sendProgress(mac: string, stage: string, detail: string): void {
if (!this.bastionId || !this.ws || this.ws.readyState !== WebSocket.OPEN) return;
this.send({
type: "bastion-progress",
bastionId: this.bastionId,
mac,
stage,
detail,
timestamp: new Date().toISOString(),
});
}
close(): void {
this.closed = true;
this.stopHeartbeat();
if (this.reconnectTimer) {
clearTimeout(this.reconnectTimer);
this.reconnectTimer = null;
}
if (this.ws) {
this.ws.close();
this.ws = null;
}
}
private handleMessage(msg: LabdBastionMessage): void {
switch (msg.type) {
case "bastion-enrolled":
this.bastionId = msg.bastionId;
// Persist for reconnects
writeFileSync(`${this.config.bastionDir}/bastion-id`, msg.bastionId);
logger.info(`Enrolled with labd as bastion ${msg.bastionId}`);
// Send initial state
this.sendStateSync();
break;
case "bastion-heartbeat-ack":
// No-op, confirms labd is alive
break;
case "server-shutdown":
logger.info(`labd shutting down, will reconnect in ${msg.reconnectAfter}ms`);
break;
case "command-install":
case "command-forget":
case "command-role-update":
void this.handleCommand(msg);
break;
}
}
private async handleCommand(msg: LabdBastionMessage & { requestId: string }): Promise<void> {
const handler = this.commandHandlers.get(msg.type);
if (!handler) {
this.send({
type: "command-response",
requestId: msg.requestId,
status: "error",
error: `No handler for command: ${msg.type}`,
});
return;
}
try {
const result = await handler(msg);
this.send({
type: "command-response",
requestId: msg.requestId,
...result,
});
} catch (err) {
this.send({
type: "command-response",
requestId: msg.requestId,
status: "error",
error: err instanceof Error ? err.message : String(err),
});
}
}
private sendStateSync(): void {
if (!this.bastionId) return;
this.send({
type: "bastion-state-sync",
bastionId: this.bastionId,
state: this.getState(),
});
}
private startHeartbeat(): void {
this.stopHeartbeat();
this.heartbeatTimer = setInterval(() => {
if (!this.bastionId) return;
const state = this.getState();
const machineCount =
Object.keys(state.discovered).length +
Object.keys(state.install_queue).length +
Object.keys(state.installed).length;
this.send({
type: "bastion-heartbeat",
bastionId: this.bastionId,
uptime: Math.floor((Date.now() - this.startTime) / 1000),
machineCount,
});
}, HEARTBEAT_INTERVAL_MS);
}
private stopHeartbeat(): void {
if (this.heartbeatTimer) {
clearInterval(this.heartbeatTimer);
this.heartbeatTimer = null;
}
}
private scheduleReconnect(): void {
if (this.closed) return;
const delay = Math.min(
RECONNECT_BASE_DELAY_MS * Math.pow(2, this.retryCount),
RECONNECT_MAX_DELAY_MS,
);
this.retryCount++;
logger.info(`Reconnecting to labd in ${delay}ms (attempt ${this.retryCount})...`);
this.reconnectTimer = setTimeout(() => this.connect(), delay);
}
private send(msg: BastionMessage): void {
if (this.ws && this.ws.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(msg));
}
}
}

View File

@@ -0,0 +1,233 @@
// Post-provision automation: installs k3s after OS provisioning completes.
// Runs asynchronously — does not block the progress callback.
import { spawn } from "node:child_process";
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import { logger } from "./logger.js";
import { progressBus } from "./progress-events.js";
function findSshKey(): string | undefined {
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser ? join("/home", sudoUser) : homedir();
for (const name of ["id_ed25519", "id_ecdsa", "id_rsa"]) {
const p = join(realHome, ".ssh", name);
if (existsSync(p)) return p;
}
return undefined;
}
/** Wait for SSH to become available, with retries. */
async function waitForSsh(ip: string, user: string, keyPath: string | undefined, timeoutMs: number): Promise<boolean> {
const start = Date.now();
while (Date.now() - start < timeoutMs) {
try {
const result = await sshExec(ip, user, "echo ok", keyPath);
if (result.includes("ok")) return true;
} catch { /* retry */ }
await new Promise((r) => setTimeout(r, 5000));
}
return false;
}
function sshExec(ip: string, user: string, command: string, keyPath: string | undefined): Promise<string> {
return new Promise((resolve, reject) => {
const args = [
"-o", "StrictHostKeyChecking=no",
"-o", "ConnectTimeout=10",
"-o", "BatchMode=yes",
...(keyPath ? ["-i", keyPath] : []),
`${user}@${ip}`,
command,
];
const proc = spawn("ssh", args, { stdio: ["ignore", "pipe", "pipe"] });
let stdout = "";
proc.stdout.on("data", (d: Buffer) => { stdout += d.toString(); });
proc.on("close", (code) => {
if (code === 0) resolve(stdout);
else reject(new Error(`SSH exit ${code}`));
});
proc.on("error", reject);
});
}
function sshRunStreaming(ip: string, user: string, command: string, keyPath: string | undefined, label: string, mac?: string): Promise<number> {
return new Promise((resolve) => {
const args = [
"-o", "StrictHostKeyChecking=no",
"-o", "ConnectTimeout=10",
"-o", "BatchMode=yes",
...(keyPath ? ["-i", keyPath] : []),
`${user}@${ip}`,
command,
];
const proc = spawn("ssh", args, { stdio: ["ignore", "pipe", "pipe"] });
proc.stdout.on("data", (d: Buffer) => {
for (const line of d.toString().split("\n").filter(Boolean)) {
logger.info(`[k3s:${label}] ${line}`);
if (mac) {
progressBus.emit({ mac, hostname: label, stage: "log", detail: `[k3s] ${line}`, timestamp: new Date().toISOString() });
}
}
});
proc.stderr.on("data", (d: Buffer) => {
for (const line of d.toString().split("\n").filter(Boolean)) {
logger.info(`[k3s:${label}] ${line}`);
if (mac) {
progressBus.emit({ mac, hostname: label, stage: "log", detail: `[k3s] ${line}`, timestamp: new Date().toISOString() });
}
}
});
proc.on("close", (code) => resolve(code ?? 1));
proc.on("error", () => resolve(1));
});
}
/**
* Trigger k3s installation on a freshly provisioned machine.
* Runs in the background — logs progress to bastion console and progressBus.
*/
export async function triggerPostProvisionK3s(
hostname: string,
ip: string,
role: string,
sshUser: string,
mac?: string,
): Promise<void> {
const keyPath = findSshKey();
const emitStage = (stage: string, detail: string): void => {
logger.info(`[k3s] ${detail}`);
if (mac) {
progressBus.emit({ mac, hostname, stage, detail, timestamp: new Date().toISOString() });
}
};
emitStage("post-provision", `auto-installing k3s on ${hostname} (${ip}) role=${role}`);
emitStage("post-provision", "waiting for SSH (machine may still be rebooting)");
// Wait up to 5 minutes for SSH (machine just finished kickstart and is rebooting)
const sshReady = await waitForSsh(ip, sshUser, keyPath, 300_000);
if (!sshReady) {
emitStage("error", `SSH not available on ${hostname} (${ip}) after 5 minutes`);
logger.error(`[k3s] Run manually: labctl app k3s install ${hostname}`);
return;
}
emitStage("post-provision", "SSH ready, installing k3s prerequisites");
// Step 1: Prerequisites
await sshRunStreaming(ip, sshUser, "sudo modprobe br_netfilter overlay 2>/dev/null; sudo swapoff -a", keyPath, hostname, mac);
// Step 2: Sysctl
emitStage("post-provision", "configuring sysctl for k3s");
await sshRunStreaming(ip, sshUser, `sudo bash -c 'cat > /etc/sysctl.d/90-k3s.conf << EOF
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
vm.panic_on_oom=0
vm.overcommit_memory=1
kernel.panic=10
kernel.panic_on_oops=1
EOF
sysctl --system > /dev/null'`, keyPath, hostname, mac);
// Step 3: SELinux + firewalld + stale CNI cleanup
emitStage("post-provision", "disabling firewalld and cleaning stale CNI");
await sshRunStreaming(ip, sshUser, [
"sudo setenforce 0 2>/dev/null || true",
"sudo systemctl disable --now firewalld 2>/dev/null || true",
"sudo systemctl mask firewalld 2>/dev/null || true",
// Clean stale CNI interfaces that conflict with Cilium (flannel.1 uses same vxlan port 8472)
"sudo systemctl stop k3s 2>/dev/null || true",
"sudo ip link delete flannel.1 2>/dev/null || true",
"sudo ip link delete cilium_vxlan 2>/dev/null || true",
"sudo ip link delete cilium_host 2>/dev/null || true",
"sudo ip link delete cilium_net 2>/dev/null || true",
"sudo rm -rf /etc/cni/net.d/* /var/lib/cni/ 2>/dev/null || true",
].join("; "), keyPath, hostname, mac);
// Step 4: Install k3s
// labcontroller extends infra — both are k3s servers
const k3sRole = (role === "infra" || role === "labcontroller") ? "server" : "agent";
emitStage("post-provision", `installing k3s ${k3sRole}`);
const code = await sshRunStreaming(ip, sshUser,
`curl -sfL https://get.k3s.io | sudo INSTALL_K3S_EXEC="${k3sRole}" INSTALL_K3S_SKIP_SELINUX_RPM=true sh -`,
keyPath, hostname, mac,
);
if (code !== 0) {
emitStage("error", `k3s install failed on ${hostname} (exit ${code})`);
logger.error(`[k3s] Run manually: labctl app k3s install ${hostname}`);
return;
}
// Step 5: Wait for ready
emitStage("post-provision", "waiting for k3s node to become Ready");
await sshRunStreaming(ip, sshUser,
"for i in $(seq 1 60); do sudo k3s kubectl get nodes 2>/dev/null | grep -q Ready && break; sleep 2; done",
keyPath, hostname, mac,
);
emitStage("post-provision", `k3s ${k3sRole} installed on ${hostname} (${ip})`);
// Step 6: Deploy role-specific apps from ROLE_REGISTRY chain
const { ROLE_REGISTRY } = await import("@lab/shared");
const roleInfo = ROLE_REGISTRY.find((r: { name: string }) => r.name === role);
if (roleInfo && roleInfo.apps.length > 0) {
emitStage("post-provision", `deploying apps: ${roleInfo.apps.join(", ")}`);
if (roleInfo.apps.includes("cockroachdb") || roleInfo.apps.includes("labd") || roleInfo.apps.includes("bastion")) {
// This is a labcontroller — deploy the full stack
emitStage("post-provision", `deploying labcontroller stack on ${hostname}`);
try {
const { cockroachDbManifests } = await import("@lab/modules/dist/modules/labcontroller/src/cockroachdb.js");
const { labdManifests } = await import("@lab/modules/dist/modules/labcontroller/src/labd.js");
const { bastionManifests } = await import("@lab/modules/dist/modules/labcontroller/src/bastion.js");
const crdb = cockroachDbManifests();
const labd = labdManifests({ databaseUrl: crdb.connectionString });
const bastion = bastionManifests();
const manifests = [
crdb.namespace, crdb.headlessService, crdb.clientService, crdb.statefulSet,
labd.service, labd.deployment,
bastion.daemonSet,
];
for (const manifest of manifests) {
const json = JSON.stringify(manifest);
const kind = (manifest as { kind?: string }).kind ?? "?";
const name = ((manifest as { metadata?: { name?: string } }).metadata)?.name ?? "?";
const result = await sshRunStreaming(ip, sshUser,
`echo '${json.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f -`,
keyPath, hostname, mac,
);
if (result === 0) {
emitStage("post-provision", `applied ${kind}/${name}`);
} else {
emitStage("error", `failed to apply ${kind}/${name}`);
}
}
// Init CockroachDB
const initJson = JSON.stringify(crdb.initJob);
await sshRunStreaming(ip, sshUser,
`echo '${initJson.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f - 2>/dev/null; sleep 30; sudo k3s kubectl exec cockroachdb-0 -n lab-system -- /cockroach/cockroach sql --insecure -e 'CREATE DATABASE IF NOT EXISTS lab' 2>/dev/null || true`,
keyPath, hostname, mac,
);
emitStage("post-provision", `labcontroller stack deployed on ${hostname}`);
} catch (err) {
const errMsg = err instanceof Error ? err.message : String(err);
emitStage("error", `failed to deploy labcontroller stack: ${errMsg}`);
logger.error(`[post-provision] Run manually: labctl app labcontroller deploy ${hostname}`);
}
}
}
emitStage("post-provision", `${hostname} (${ip}) provisioning complete (role: ${role})`);
}

View File

@@ -0,0 +1,28 @@
// In-memory event bus for provision progress updates.
// Allows SSE clients to subscribe to real-time progress and log lines.
import { EventEmitter } from "node:events";
export interface ProgressEvent {
mac: string;
hostname: string;
/** "log" for raw log lines, anything else is a progress stage name */
stage: string;
detail: string;
timestamp: string;
}
// Simple typed wrapper around EventEmitter for progress events.
const _bus = new EventEmitter();
export const progressBus = {
emit(event: ProgressEvent): void {
_bus.emit("progress", event);
},
on(listener: (event: ProgressEvent) => void): void {
_bus.on("progress", listener);
},
off(listener: (event: ProgressEvent) => void): void {
_bus.off("progress", listener);
},
};

View File

@@ -13,9 +13,18 @@ const EMPTY_STATE: BastionState = {
installed: {},
};
export type StateChangeListener = (state: BastionState) => void;
export class StateManager {
private changeListeners: StateChangeListener[] = [];
constructor(private readonly stateFile: string) {}
/** Register a listener that fires after every state update. */
onChange(listener: StateChangeListener): void {
this.changeListeners.push(listener);
}
load(): BastionState {
try {
const raw = readFileSync(this.stateFile, "utf-8");
@@ -52,6 +61,9 @@ export class StateManager {
const state = this.load();
fn(state);
this.save(state);
for (const listener of this.changeListeners) {
try { listener(state); } catch { /* don't let listener errors break state updates */ }
}
return state;
}
}

View File

@@ -62,7 +62,7 @@ dhcp-match=set:httpboot-arm64,option:client-arch,20
dhcp-userclass=set:ipxe,iPXE
# UEFI HTTP Boot -> serve full iPXE EFI via HTTP (no TFTP size limit)
dhcp-boot=tag:httpboot-x86_64,http://${serverIp}:${httpPort}/ipxe-real.efi
dhcp-boot=tag:httpboot-x86_64,http://${serverIp}:${httpPort}/ipxe.efi
dhcp-boot=tag:httpboot-arm64,http://${serverIp}:${httpPort}/ipxe-arm64.efi
# Echo vendor class back to HTTP Boot clients (required by UEFI HTTP Boot spec)
dhcp-option-force=tag:httpboot-x86_64,60,HTTPClient
@@ -72,15 +72,21 @@ dhcp-option-force=tag:httpboot-arm64,60,HTTPClient
dhcp-boot=tag:bios,tag:!ipxe,undionly.kpxe
dhcp-boot=tag:efi-x86_64,tag:!ipxe,ipxe.efi
dhcp-boot=tag:efi-arm64,tag:!ipxe,ipxe-arm64.efi
# Echo vendor class back to PXE clients (OVMF requires this, real hardware usually doesn't)
dhcp-option-force=tag:efi-x86_64,60,PXEClient
dhcp-option-force=tag:efi-arm64,60,PXEClient
dhcp-option-force=tag:bios,60,PXEClient
# iPXE clients -> chain to boot script via HTTP
dhcp-boot=tag:ipxe,http://${serverIp}:${httpPort}/boot.ipxe
# PXE service directives (needed for proxy DHCP to respond properly)
${dhcpMode === "proxy" ? `# PXE service directives (proxy DHCP needs these to respond on port 4011)
pxe-service=tag:!ipxe,x86PC,"PXE Boot",undionly.kpxe
pxe-service=tag:!ipxe,X86-64_EFI,"PXE Boot",ipxe.efi
pxe-service=tag:!ipxe,BC_EFI,"PXE Boot",ipxe.efi
pxe-service=tag:!ipxe,ARM64_EFI,"PXE Boot",ipxe-arm64.efi
pxe-service=tag:!ipxe,ARM64_EFI,"PXE Boot",ipxe-arm64.efi` : `# Full DHCP mode -- pxe-service directives omitted (they trigger PXE Boot Server
# Discovery protocol which some UEFI implementations don't support). The dhcp-boot
# directives above provide the boot filename directly in the DHCP offer.`}
# Verbose logging
log-dhcp

View File

@@ -2,10 +2,12 @@
// Full Fedora server install with LVM partitioning, %pre for reprovision detection,
// packages, and %post with SSH keys, user creation, k3s prereqs, progress callbacks.
import type { Role } from "@lab/shared";
export interface InstallKickstartParams {
hostname: string;
disk: string;
role: "worker" | "infra";
role: Role;
domain: string;
fedoraVersion: string;
timezone: string;
@@ -36,6 +38,7 @@ export function renderInstallKickstart(params: InstallKickstartParams): string {
const now = new Date().toISOString();
const hasLonghorn = role === "worker";
const hasRancher = role === "infra";
const isVanilla = role === "vanilla";
// -- Auth section --
const auth = sshKeys.length > 0
@@ -97,6 +100,48 @@ done
? `logvol /var/lib/rancher --vgname=${vg} --name=rancher --fstype=xfs --size=20480`
: "";
// Helper: the bastion callback functions used in both %pre and %post.
// Defined as a template so each section gets its own copy (they run in different shells).
const bastionHelpers = `
# Detect MAC address (first real ethernet MAC, skip loopback/veth)
_BASTION_MAC=$(ip link show | awk '/ether/ && !/00:00:00:00/ {print $2; exit}')
_BASTION_URL="http://${serverIp}:${httpPort}"
# Send a structured progress stage to bastion
bastion_progress() {
local stage="$1" detail="\${2:-}"
curl -sf -X POST "\${_BASTION_URL}/api/progress" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$_BASTION_MAC\\",\\"stage\\":\\"$stage\\",\\"detail\\":\\"$detail\\"}" \\
--connect-timeout 5 --max-time 10 2>/dev/null || true
}
# Send log lines to bastion (batched)
bastion_log() {
local line="$1"
curl -sf -X POST "\${_BASTION_URL}/api/log" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$_BASTION_MAC\\",\\"line\\":\\"$(echo "$line" | sed 's/\\\\/\\\\\\\\/g; s/"/\\\\"/g')\\"}\" \\
--connect-timeout 5 --max-time 10 2>/dev/null || true
}
# Send an error stage to bastion with context
bastion_error() {
local detail="$1"
bastion_progress "error" "$detail"
# Also send the last 50 lines of any log file as context
for logfile in /root/bastion-post-install.log /tmp/pre-partition.log; do
if [ -f "$logfile" ]; then
local tail_content
tail_content=$(tail -50 "$logfile" 2>/dev/null | sed 's/\\\\/\\\\\\\\/g; s/"/\\\\"/g; s/$/\\\\n/' | tr -d '\\n')
curl -sf -X POST "\${_BASTION_URL}/api/log" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$_BASTION_MAC\\",\\"lines\\":[\\"--- $logfile (last 50 lines) ---\\"],\\"tail\\":\\"$tail_content\\"}" \\
--connect-timeout 5 --max-time 10 2>/dev/null || true
fi
done
}`;
return `# Lab Bastion -- Fedora ${fedoraVersion} server install
# Generated: ${now}
# Target: ${fqdn} (role=${role})
@@ -123,27 +168,25 @@ url --mirrorlist=https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-$relea
%pre --log=/tmp/pre-partition.log
#!/bin/bash
set -x
${bastionHelpers}
# Progress callback helper
bastion_progress() {
local stage="$1" detail="\${2:-}"
local mac=$(ip link show | awk '/ether/ && !/00:00:00:00/ {print $2; exit}')
curl -sf -X POST "http://${serverIp}:${httpPort}/api/progress" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$mac\\",\\"stage\\":\\"$stage\\",\\"detail\\":\\"$detail\\"}" 2>/dev/null || true
}
# Error trap: report failures back to bastion
trap 'bastion_error "%pre failed at line $LINENO: $(tail -1 /tmp/pre-partition.log 2>/dev/null)"' ERR
bastion_progress "partitioning" "preparing disk layout"
bastion_progress "partitioning" "detecting disk"
VG="${vg}"
${diskLine}
bastion_log "disk detected: $DISK"
REPROVISION=no
# Check if VG exists (reprovision scenario)
if vgs $VG &>/dev/null; then
echo "=== Existing VG found - reprovision mode ==="
REPROVISION=yes
bastion_progress "partitioning" "reprovision mode -- preserving data volumes"
# Detect which data LVs to preserve
PRESERVE_LONGHORN=no; PRESERVE_SRV=no; PRESERVE_HOME=no; PRESERVE_RANCHER=no
@@ -153,11 +196,14 @@ if vgs $VG &>/dev/null; then
lvs $VG/rancher &>/dev/null && PRESERVE_RANCHER=yes
echo "Preserving: longhorn=$PRESERVE_LONGHORN srv=$PRESERVE_SRV home=$PRESERVE_HOME rancher=$PRESERVE_RANCHER"
bastion_log "preserving LVs: longhorn=$PRESERVE_LONGHORN srv=$PRESERVE_SRV home=$PRESERVE_HOME rancher=$PRESERVE_RANCHER"
# Remove only OS logical volumes (keep data LVs)
for lv in root var varlog swap; do
lvremove -f $VG/$lv 2>/dev/null || true
done
else
bastion_progress "partitioning" "fresh install on $DISK"
fi
if [ "$REPROVISION" = "yes" ]; then
@@ -226,7 +272,8 @@ echo "=== Generated partition config ==="
cat /tmp/part.ks
echo "==================================="
bastion_progress "partitioning" "layout ready, starting install"
bastion_progress "partitioning" "disk layout ready"
bastion_log "partition config written to /tmp/part.ks"
%end
@@ -256,7 +303,7 @@ iotop
strace
jq
# k3s prerequisites
${isVanilla ? "# vanilla role -- skipping k3s prerequisites" : `# k3s prerequisites
container-selinux
iptables-nft
nftables
@@ -265,7 +312,7 @@ chrony
tar
socat
conntrack-tools
ethtool
ethtool`}
# Boot management
efibootmgr
@@ -286,31 +333,87 @@ ruby-libs
%post --log=/root/bastion-post-install.log
#!/bin/bash
set -x
${bastionHelpers}
# Progress callback helper
bastion_progress() {
local stage="$1" detail="\${2:-}"
local mac=$(ip link show | awk '/ether/ && !/00:00:00:00/ {print $2; exit}')
curl -sf -X POST "http://${serverIp}:${httpPort}/api/progress" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$mac\\",\\"stage\\":\\"$stage\\",\\"detail\\":\\"$detail\\"}" 2>/dev/null || true
# --- Error trap: catch any failure and report to bastion ---
_post_error_handler() {
local exit_code=$? lineno=$1
bastion_error "%post failed at line $lineno (exit $exit_code)"
}
trap '_post_error_handler $LINENO' ERR
# --- Background log streamer: sends %post output to bastion in real-time ---
_LOG_FILE=/root/bastion-post-install.log
_LOG_STREAMER_PID=""
(
# Wait for the log file to exist
while [ ! -f "$_LOG_FILE" ]; do sleep 1; done
# Tail and batch-send lines every 3 seconds
_batch=""
_count=0
tail -f "$_LOG_FILE" 2>/dev/null | while IFS= read -r _line; do
# Escape for JSON
_escaped=$(echo "$_line" | sed 's/\\\\/\\\\\\\\/g; s/"/\\\\"/g; s/\\t/\\\\t/g')
if [ -z "$_batch" ]; then
_batch="\\"$_escaped\\""
else
_batch="$_batch,\\"$_escaped\\""
fi
_count=$((_count + 1))
# Send batch every 10 lines
if [ "$_count" -ge 10 ]; then
curl -sf -X POST "\${_BASTION_URL}/api/log" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$_BASTION_MAC\\",\\"lines\\":[$_batch]}" \\
--connect-timeout 5 --max-time 10 2>/dev/null || true
_batch=""
_count=0
fi
done
) &
_LOG_STREAMER_PID=$!
# Flush remaining log lines helper
_flush_log_streamer() {
if [ -n "$_LOG_STREAMER_PID" ]; then
kill "$_LOG_STREAMER_PID" 2>/dev/null || true
wait "$_LOG_STREAMER_PID" 2>/dev/null || true
fi
# Send any remaining lines from the log
if [ -f "$_LOG_FILE" ]; then
local remaining
remaining=$(tail -20 "$_LOG_FILE" 2>/dev/null | sed 's/\\\\/\\\\\\\\/g; s/"/\\\\"/g; s/\\t/\\\\t/g; s/^/"/; s/$/"/' | paste -sd, -)
if [ -n "$remaining" ]; then
curl -sf -X POST "\${_BASTION_URL}/api/log" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$_BASTION_MAC\\",\\"lines\\":[$remaining]}" \\
--connect-timeout 5 --max-time 10 2>/dev/null || true
fi
fi
}
bastion_progress "post-install" "configuring system"
bastion_progress "installing" "packages installed, starting post-install"
# -- SSH --
bastion_progress "post-install" "configuring SSH"
systemctl enable --now sshd
sed -i 's/^#\\?PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i 's/^#\\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
${sshPostBlock}
bastion_log "SSH configured: root login by key only, password auth disabled"
# -- Hostname and domain --
bastion_progress "post-install" "setting hostname to ${fqdn}"
hostnamectl set-hostname ${fqdn}
# -- tmpfs for /tmp --
echo "tmpfs /tmp tmpfs defaults,noatime,nosuid,nodev,size=4G 0 0" >> /etc/fstab
# -- Kernel modules for k3s --
${isVanilla ? `# -- vanilla role: skip k3s kernel/sysctl/firewall setup --
bastion_progress "post-install" "vanilla role -- skipping k3s setup"
# -- Enable chronyd for time sync --
systemctl enable chronyd || true` : `# -- Kernel modules for k3s --
bastion_progress "post-install" "loading k3s kernel modules"
cat > /etc/modules-load.d/k3s.conf << 'MODULES'
br_netfilter
overlay
@@ -320,6 +423,7 @@ modprobe br_netfilter || true
modprobe overlay || true
# -- Sysctl for k3s networking --
bastion_progress "post-install" "configuring k3s sysctl"
cat > /etc/sysctl.d/90-k3s.conf << 'SYSCTL'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
@@ -330,29 +434,41 @@ fs.inotify.max_user_watches = 1048576
SYSCTL
sysctl --system || true
# -- Disable firewalld (k3s manages its own iptables rules) --
# -- Disable firewalld permanently (k3s/Cilium manage iptables directly) --
bastion_progress "post-install" "disabling firewalld"
# Must be masked to prevent re-enable on updates
systemctl disable --now firewalld || true
systemctl mask firewalld || true
# -- Enable chronyd for time sync --
systemctl enable --now chronyd
systemctl enable chronyd || true`}
# -- Set boot order: local disk first, PXE after --
bastion_progress "post-install" "configuring EFI boot order"
if command -v efibootmgr >/dev/null 2>&1; then
FEDORA_ENTRY=$(efibootmgr | grep -i fedora | head -1 | grep -oP 'Boot\\K[0-9A-F]+')
if [ -n "$FEDORA_ENTRY" ]; then
CURRENT_ORDER=$(efibootmgr | grep BootOrder | cut -d: -f2 | tr -d ' ')
NEW_ORDER="$FEDORA_ENTRY,$(echo "$CURRENT_ORDER" | sed "s/$FEDORA_ENTRY,\\\\?//;s/,$//")"
efibootmgr -o "$NEW_ORDER" || true
echo "Boot order set: Fedora first ($NEW_ORDER)"
bastion_log "boot order set: Fedora first ($NEW_ORDER)"
else
bastion_log "no Fedora EFI entry found, boot order unchanged"
fi
else
bastion_log "efibootmgr not available, skipping boot order config"
fi
# -- Provisioning metadata --
bastion_progress "post-install" "writing provisioning metadata"
IP_ADDR=$(ip -4 addr show | awk '/inet / && !/127.0.0/ {split($2,a,"/"); print a[1]; exit}')
cat > /etc/lab-provisioned << PROVEOF
hostname: ${fqdn}
role: ${role}
provisioned: $(date -Iseconds)
bastion: ${serverIp}
ip: $IP_ADDR
PROVEOF
cat > /root/README << 'README'
@@ -370,8 +486,13 @@ cat > /root/README << 'README'
README
${hasRancher ? `# Install k3s server (skip start - will be configured manually)
bastion_progress "post-install" "pre-installing k3s server"
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -
` : ""}IP_ADDR=$(ip -4 addr show | awk '/inet / && !/127.0.0/ {split($2,a,"/"); print a[1]; exit}')
bastion_log "k3s server pre-installed (not started)"
` : ""}
# Stop log streamer and flush remaining lines
_flush_log_streamer
bastion_progress "complete" "ready at $IP_ADDR"
%end

View File

@@ -0,0 +1,299 @@
// Ubuntu autoinstall template (cloud-init).
// Equivalent of the Fedora kickstart: LVM partitioning, packages,
// SSH keys, k3s prereqs, progress callbacks.
export interface UbuntuAutoinstallParams {
hostname: string;
disk: string;
role: string; // "vanilla" | "worker" | "infra"
domain: string;
ubuntuVersion: string;
timezone: string;
locale: string;
serverIp: string;
httpPort: number;
sshKeys: string[];
adminUser: string;
}
export function renderUbuntuAutoinstall(params: UbuntuAutoinstallParams): string {
const {
hostname,
disk,
role,
domain,
timezone,
serverIp,
httpPort,
sshKeys,
adminUser,
} = params;
const fqdn = domain ? `${hostname}.${domain}` : hostname;
const vg = "labvg";
const hasLonghorn = role === "worker";
const hasRancher = role === "infra";
// Determine disk device -- default to biggest NVMe/SCSI/virtio
const diskDevice = disk || "/dev/sda";
// Build the LVM layout to match Fedora kickstart sizes
const extraLvs: string[] = [];
if (hasLonghorn) {
extraLvs.push(` - id: lv-longhorn
name: longhorn
type: lvm_partition
volgroup: vg0
size: -1
- id: fs-longhorn
type: format
volume: lv-longhorn
fstype: xfs
- id: mount-longhorn
type: mount
device: fs-longhorn
path: /var/lib/longhorn`);
}
if (hasRancher) {
extraLvs.push(` - id: lv-rancher
name: rancher
type: lvm_partition
volgroup: vg0
size: 20G
- id: fs-rancher
type: format
volume: lv-rancher
fstype: xfs
- id: mount-rancher
type: mount
device: fs-rancher
path: /var/lib/rancher`);
}
const extraLvsBlock = extraLvs.length > 0 ? "\n" + extraLvs.join("\n") : "";
// SSH keys YAML list
const sshKeysYaml = sshKeys.map((k) => ` - "${k}"`).join("\n");
// late-commands for k3s prereqs, firewall, chrony, admin user, progress callback
const lateCommands: string[] = [
// Kernel modules for k3s
`curtin in-target -- bash -c 'cat > /etc/modules-load.d/k3s.conf << EOF\nbr_netfilter\noverlay\nip_conntrack\nEOF'`,
// Sysctl for k3s networking
`curtin in-target -- bash -c 'cat > /etc/sysctl.d/90-k3s.conf << EOF\nnet.bridge.bridge-nf-call-iptables = 1\nnet.bridge.bridge-nf-call-ip6tables = 1\nnet.ipv4.ip_forward = 1\nnet.ipv6.conf.all.forwarding = 1\nfs.inotify.max_user_instances = 524288\nfs.inotify.max_user_watches = 1048576\nEOF'`,
// Disable ufw firewall
`curtin in-target -- systemctl disable ufw || true`,
// Enable chrony/ntp
`curtin in-target -- systemctl enable chrony || true`,
// tmpfs for /tmp
`curtin in-target -- bash -c 'echo "tmpfs /tmp tmpfs defaults,noatime,nosuid,nodev,size=4G 0 0" >> /etc/fstab'`,
];
// Admin user creation + SSH keys + sudoers
if (adminUser) {
lateCommands.push(
`curtin in-target -- useradd -m -G sudo -s /bin/bash ${adminUser}`,
`curtin in-target -- usermod -L ${adminUser}`,
`curtin in-target -- mkdir -p /home/${adminUser}/.ssh`,
`curtin in-target -- bash -c 'cat > /home/${adminUser}/.ssh/authorized_keys << EOF\n${sshKeys.join("\n")}\nEOF'`,
`curtin in-target -- chmod 700 /home/${adminUser}/.ssh`,
`curtin in-target -- chmod 600 /home/${adminUser}/.ssh/authorized_keys`,
`curtin in-target -- chown -R ${adminUser}:${adminUser} /home/${adminUser}/.ssh`,
`curtin in-target -- bash -c 'echo "${adminUser} ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/${adminUser}'`,
`curtin in-target -- chmod 440 /etc/sudoers.d/${adminUser}`,
);
}
// Provisioning metadata
lateCommands.push(
`curtin in-target -- bash -c 'cat > /etc/lab-provisioned << EOF\nhostname: ${fqdn}\nrole: ${role}\nprovisioned: $(date -Iseconds)\nbastion: ${serverIp}\nEOF'`,
);
// k3s install for infra role
if (hasRancher) {
lateCommands.push(
`curtin in-target -- bash -c 'curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -'`,
);
}
// Progress callback (complete)
lateCommands.push(
`curtin in-target -- bash -c 'IP_ADDR=$(ip -4 addr show | awk "/inet / && !/127.0.0/ {split(\\$2,a,\\"/\\"); print a[1]; exit}"); curl -sf -X POST "http://${serverIp}:${httpPort}/api/progress" -H "Content-Type: application/json" -d "{\\"mac\\":\\"$(ip link show | awk "/ether/ && !/00:00:00:00/ {print \\$2; exit}")\\",\\"stage\\":\\"complete\\",\\"detail\\":\\"ready at $IP_ADDR\\"}" || true'`,
);
const lateCommandsYaml = lateCommands.map((c) => ` - "${c}"`).join("\n");
return `#cloud-config
autoinstall:
version: 1
locale: ${params.locale}
keyboard:
layout: gb
timezone: ${timezone}
identity:
hostname: ${fqdn}
username: ${adminUser || "root"}
password: "!"
ssh:
install-server: true
allow-pw: false
authorized-keys:
${sshKeysYaml}
storage:
config:
- id: disk0
type: disk
ptable: gpt
path: ${diskDevice}
wipe: superblock-recursive
grub_device: true
- id: part-efi
type: partition
device: disk0
size: 600M
flag: boot
grub_device: true
- id: fs-efi
type: format
volume: part-efi
fstype: fat32
- id: mount-efi
type: mount
device: fs-efi
path: /boot/efi
- id: part-boot
type: partition
device: disk0
size: 3G
- id: fs-boot
type: format
volume: part-boot
fstype: ext4
- id: mount-boot
type: mount
device: fs-boot
path: /boot
- id: part-pv
type: partition
device: disk0
size: -1
- id: vg0
type: lvm_volgroup
name: ${vg}
devices:
- part-pv
- id: lv-swap
name: swap
type: lvm_partition
volgroup: vg0
size: 27G
- id: fs-swap
type: format
volume: lv-swap
fstype: swap
- id: mount-swap
type: mount
device: fs-swap
path: none
- id: lv-root
name: root
type: lvm_partition
volgroup: vg0
size: 33G
- id: fs-root
type: format
volume: lv-root
fstype: xfs
- id: mount-root
type: mount
device: fs-root
path: /
- id: lv-var
name: var
type: lvm_partition
volgroup: vg0
size: 100G
- id: fs-var
type: format
volume: lv-var
fstype: xfs
- id: mount-var
type: mount
device: fs-var
path: /var
- id: lv-varlog
name: varlog
type: lvm_partition
volgroup: vg0
size: 10G
- id: fs-varlog
type: format
volume: lv-varlog
fstype: xfs
- id: mount-varlog
type: mount
device: fs-varlog
path: /var/log
- id: lv-home
name: home
type: lvm_partition
volgroup: vg0
size: 10G
- id: fs-home
type: format
volume: lv-home
fstype: xfs
- id: mount-home
type: mount
device: fs-home
path: /home
- id: lv-srv
name: srv
type: lvm_partition
volgroup: vg0
size: 20G
- id: fs-srv
type: format
volume: lv-srv
fstype: xfs
- id: mount-srv
type: mount
device: fs-srv
path: /srv${extraLvsBlock}
packages:
- openssh-server
- curl
- wget
- git
- jq
- htop
- vim
- tmux
- python3
- lshw
- dmidecode
- net-tools
- iproute2
- iputils-ping
- traceroute
- tcpdump
- iotop
- strace
- tar
- containerd
- socat
- conntrack
- ethtool
- iptables
- chrony
- efibootmgr
late-commands:
${lateCommandsYaml}
`;
}
export function renderUbuntuMetaData(hostname: string): string {
return `instance-id: ${hostname}
local-hostname: ${hostname}
`;
}

View File

@@ -0,0 +1,24 @@
// iPXE boot script template for Ubuntu autoinstall.
export function renderUbuntuInstallIpxe(params: {
mac: string;
hostname: string;
serverIp: string;
httpPort: number;
ubuntuVersion: string;
}): string {
return `#!ipxe
echo
echo =============================================
echo Lab PXE Bastion - INSTALLING Ubuntu ${params.ubuntuVersion}
echo Target: ${params.hostname}
echo MAC: ${params.mac}
echo =============================================
echo
kernel http://${params.serverIp}:${params.httpPort}/ubuntu-vmlinuz autoinstall ds=nocloud-net;seedfrom=http://${params.serverIp}:${params.httpPort}/autoinstall/${params.mac}/ ---
initrd http://${params.serverIp}:${params.httpPort}/ubuntu-initrd
boot
`;
}

View File

@@ -6,6 +6,7 @@ import type { BastionConfig } from "@lab/shared";
import { createApp } from "../src/server.js";
import type { FastifyInstance } from "fastify";
import type { StateManager } from "../src/services/state.js";
import type { InstallLogBuffer } from "../src/services/install-log.js";
function createTestConfig(testDir: string): BastionConfig {
return {
@@ -19,6 +20,8 @@ function createTestConfig(testDir: string): BastionConfig {
dhcpMode: "proxy",
dhcpRangeStart: "",
dhcpRangeEnd: "",
ubuntuVersion: "26.04",
ubuntuMirror: "https://releases.ubuntu.com/26.04",
iface: "eth0",
serverIp: "10.0.0.1",
network: "10.0.0.0",
@@ -38,6 +41,7 @@ describe("dispatch routes", () => {
let testDir: string;
let app: FastifyInstance;
let state: StateManager;
let installLog: InstallLogBuffer;
beforeEach(() => {
testDir = join(tmpdir(), `bastion-dispatch-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
@@ -49,6 +53,7 @@ describe("dispatch routes", () => {
const result = createApp(config);
app = result.app;
state = result.state;
installLog = result.installLog;
});
afterEach(async () => {
@@ -224,4 +229,100 @@ describe("dispatch routes", () => {
const result = JSON.parse(response.body);
expect(result.error).toBe("machine not found");
});
it("POST /api/log accepts a single line", async () => {
const mac = "aa:bb:cc:dd:ee:ff";
const response = await app.inject({
method: "POST",
url: "/api/log",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ mac, line: "hello from kickstart" }),
});
expect(response.statusCode).toBe(200);
const result = JSON.parse(response.body);
expect(result.status).toBe("ok");
expect(result.lines).toBe(1);
// Verify line is stored
const lines = installLog.getLines(mac);
expect(lines).toHaveLength(1);
expect(lines[0]!.line).toBe("hello from kickstart");
});
it("POST /api/log accepts multiple lines", async () => {
const mac = "aa:bb:cc:dd:ee:ff";
const response = await app.inject({
method: "POST",
url: "/api/log",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ mac, lines: ["line 1", "line 2", "line 3"] }),
});
expect(response.statusCode).toBe(200);
const result = JSON.parse(response.body);
expect(result.lines).toBe(3);
const lines = installLog.getLines(mac);
expect(lines).toHaveLength(3);
});
it("GET /api/logs/:mac includes log lines for installing machine", async () => {
const mac = "aa:bb:cc:dd:ee:ff";
state.update((s) => {
s.install_queue[mac] = {
hostname: "test-node",
disk: "/dev/sda",
role: "worker",
queued_at: new Date().toISOString(),
};
});
// Add some log lines
installLog.append(mac, ["log line 1", "log line 2"], "test-node");
const response = await app.inject({
method: "GET",
url: `/api/logs/${encodeURIComponent(mac)}`,
});
expect(response.statusCode).toBe(200);
const result = JSON.parse(response.body);
expect(result.status).toBe("installing");
expect(result.log_lines).toHaveLength(2);
expect(result.log_total).toBe(2);
expect(result.log_lines[0].line).toBe("log line 1");
});
it("progress endpoint with 'error' stage keeps machine in install_queue", async () => {
const mac = "aa:bb:cc:dd:ee:ff";
state.update((s) => {
s.install_queue[mac] = {
hostname: "failing-node",
disk: "/dev/sda",
role: "worker",
queued_at: new Date().toISOString(),
};
});
const response = await app.inject({
method: "POST",
url: "/api/progress",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
mac,
stage: "error",
detail: "%post failed at line 42",
}),
});
expect(response.statusCode).toBe(200);
// Machine should still be in install_queue (not moved to installed)
const currentState = state.load();
expect(currentState.install_queue[mac]).toBeDefined();
expect(currentState.install_queue[mac]?.progress).toBe("error");
expect(currentState.install_queue[mac]?.progress_detail).toBe("%post failed at line 42");
expect(currentState.installed[mac]).toBeUndefined();
});
});

View File

@@ -90,7 +90,9 @@ describe("renderInstallKickstart", () => {
serverIp: "10.0.0.5",
httpPort: 9090,
}));
expect(ks).toContain("http://10.0.0.5:9090/api/progress");
expect(ks).toContain('_BASTION_URL="http://10.0.0.5:9090"');
expect(ks).toContain("/api/progress");
expect(ks).toContain("/api/log");
});
it("infra role has /var/lib/rancher partition", () => {
@@ -137,4 +139,52 @@ describe("renderInstallKickstart", () => {
// swap = 27648
expect(ks).toContain("--name=swap --fstype=swap --size=27648");
});
it("%pre has error trap", () => {
const ks = renderInstallKickstart(baseParams());
expect(ks).toContain("trap");
expect(ks).toContain("bastion_error");
expect(ks).toContain("%pre failed");
});
it("%post has error trap", () => {
const ks = renderInstallKickstart(baseParams());
expect(ks).toContain("_post_error_handler");
expect(ks).toContain("%post failed");
});
it("has granular progress stages in %post", () => {
const ks = renderInstallKickstart(baseParams());
expect(ks).toContain('"configuring SSH"');
expect(ks).toContain('"setting hostname');
expect(ks).toContain('"configuring EFI boot order"');
expect(ks).toContain('"writing provisioning metadata"');
});
it("has background log streamer in %post", () => {
const ks = renderInstallKickstart(baseParams());
expect(ks).toContain("_LOG_STREAMER_PID");
expect(ks).toContain("_flush_log_streamer");
expect(ks).toContain("tail -f");
});
it("has bastion_log function for sending log lines", () => {
const ks = renderInstallKickstart(baseParams());
expect(ks).toContain("bastion_log()");
expect(ks).toContain("/api/log");
});
it("vanilla role skips k3s progress stages", () => {
const ks = renderInstallKickstart(baseParams({ role: "vanilla" }));
expect(ks).toContain("vanilla role");
expect(ks).not.toContain('"loading k3s kernel modules"');
expect(ks).not.toContain('"disabling firewalld"');
});
it("worker role has k3s-related progress stages", () => {
const ks = renderInstallKickstart(baseParams({ role: "worker" }));
expect(ks).toContain('"loading k3s kernel modules"');
expect(ks).toContain('"configuring k3s sysctl"');
expect(ks).toContain('"disabling firewalld"');
});
});

View File

@@ -7,6 +7,7 @@
},
"include": ["src/**/*.ts"],
"references": [
{ "path": "../shared" }
{ "path": "../shared" },
{ "path": "../modules" }
]
}

View File

@@ -17,10 +17,13 @@
},
"dependencies": {
"@lab/bastion": "workspace:*",
"@lab/modules": "workspace:*",
"@lab/shared": "workspace:*",
"commander": "^13.0.0"
"commander": "^13.0.0",
"ws": "^8.19.0"
},
"devDependencies": {
"@types/node": "^22.10.0"
"@types/node": "^22.10.0",
"@types/ws": "^8.18.1"
}
}

View File

@@ -0,0 +1,161 @@
// Typed API client for communicating with labd.
import https from "node:https";
import { readFileSync } from "node:fs";
import { LabdApiError } from "./errors.js";
import type {
Server,
ServerFilters,
JoinToken,
CreateTokenOpts,
EnrollmentRequest,
EnrollmentResponse,
HealthStatus,
RequestOpts,
} from "./types.js";
export interface LabdClientConfig {
baseUrl: string;
certPath?: string;
keyPath?: string;
caPath?: string;
timeoutMs?: number;
}
export class LabdClient {
private config: LabdClientConfig;
private agent: https.Agent | undefined;
private sessionId: string | undefined;
constructor(config: LabdClientConfig) {
this.config = config;
if (config.certPath && config.keyPath) {
this.agent = new https.Agent({
cert: readFileSync(config.certPath),
key: readFileSync(config.keyPath),
ca: config.caPath ? readFileSync(config.caPath) : undefined,
rejectUnauthorized: true,
});
}
}
setSessionId(id: string): void {
this.sessionId = id;
}
// --- Server endpoints ---
async getServers(filters?: ServerFilters): Promise<Server[]> {
return this.request("GET", "/api/servers", { query: filters as Record<string, string | undefined> });
}
async getServer(id: string): Promise<Server> {
return this.request("GET", `/api/servers/${encodeURIComponent(id)}`);
}
// --- Token endpoints ---
async createJoinToken(opts: CreateTokenOpts): Promise<JoinToken> {
return this.request("POST", "/api/tokens", { body: opts });
}
async listTokens(): Promise<JoinToken[]> {
return this.request("GET", "/api/tokens");
}
async revokeToken(id: string): Promise<{ status: string; id: string }> {
return this.request("DELETE", `/api/tokens/${encodeURIComponent(id)}`);
}
// --- Auth endpoints ---
async enroll(req: EnrollmentRequest): Promise<EnrollmentResponse> {
return this.request("POST", "/api/auth/enroll", { body: req });
}
// --- Bastion endpoints ---
async getBastions(): Promise<Array<{
id: string; hostname: string; network: string; serverIp: string;
status: string; machineCount: number; lastHeartbeat?: string; connectedAt?: string;
}>> {
return this.request("GET", "/api/bastions");
}
// --- Machine endpoints (aggregated through labd from bastions) ---
async getMachines(): Promise<import("@lab/shared").BastionState> {
return this.request("GET", "/api/machines");
}
async installMachine(opts: {
mac: string; hostname: string; disk?: string; role?: string; os?: string;
}): Promise<{ status: string; data?: unknown; error?: string }> {
return this.request("POST", "/api/machines/install", { body: opts });
}
async forgetMachine(mac: string): Promise<{ status: string }> {
return this.request("DELETE", `/api/machines/${encodeURIComponent(mac)}`);
}
async updateRole(mac: string, role: string): Promise<{ status: string }> {
return this.request("POST", "/api/machines/role", { body: { mac, role } });
}
async getMachineLogs(mac: string): Promise<Record<string, unknown>> {
return this.request("GET", `/api/machines/${encodeURIComponent(mac)}/logs`);
}
// --- Health endpoints ---
async getHealth(): Promise<HealthStatus> {
return this.request("GET", "/healthz");
}
// --- Internal ---
private async request<T>(method: string, path: string, opts?: RequestOpts): Promise<T> {
const url = new URL(path, this.config.baseUrl);
if (opts?.query) {
for (const [k, v] of Object.entries(opts.query)) {
if (v !== undefined) url.searchParams.set(k, String(v));
}
}
const headers: Record<string, string> = {
"Content-Type": "application/json",
};
if (this.sessionId) {
headers["X-Session-ID"] = this.sessionId;
}
const timeoutMs = this.config.timeoutMs ?? 30_000;
try {
const resp = await fetch(url.toString(), {
method,
headers,
body: opts?.body ? JSON.stringify(opts.body) : undefined,
signal: AbortSignal.timeout(timeoutMs),
// @ts-expect-error -- Node fetch supports dispatcher/agent
agent: this.agent,
});
if (!resp.ok) {
const body = await resp.json().catch(() => ({ error: resp.statusText }));
throw LabdApiError.fromResponse(resp.status, body);
}
return (await resp.json()) as T;
} catch (err) {
if (err instanceof LabdApiError) throw err;
if (err instanceof TypeError && (err.message.includes("fetch") || err.message.includes("ECONNREFUSED"))) {
throw LabdApiError.notConnected(this.config.baseUrl);
}
if (err instanceof DOMException && err.name === "TimeoutError") {
throw LabdApiError.timeout(timeoutMs);
}
throw err;
}
}
}

View File

@@ -0,0 +1,47 @@
// CLI configuration loading for labd client.
// Bridges the CLI config module into LabdClient configuration.
import { loadConfig, CONFIG_DIR, CONFIG_FILE, CERT_DIR } from "../config/index.js";
import { LabdClient, type LabdClientConfig } from "./client.js";
export { CONFIG_DIR, CONFIG_FILE, CERT_DIR };
export function loadClientConfig(
overrides?: Partial<LabdClientConfig>,
): LabdClientConfig {
const cliConfig = loadConfig();
let config: LabdClientConfig = {
baseUrl: cliConfig.labdUrl,
...(cliConfig.certPath ? { certPath: cliConfig.certPath } : {}),
...(cliConfig.keyPath ? { keyPath: cliConfig.keyPath } : {}),
...(cliConfig.caPath ? { caPath: cliConfig.caPath } : {}),
};
// Environment variable overrides (cert paths)
if (process.env["LABCTL_CERT_PATH"]) config.certPath = process.env["LABCTL_CERT_PATH"];
if (process.env["LABCTL_KEY_PATH"]) config.keyPath = process.env["LABCTL_KEY_PATH"];
if (process.env["LABCTL_CA_PATH"]) config.caPath = process.env["LABCTL_CA_PATH"];
if (overrides) {
config = { ...config, ...overrides };
}
return config;
}
export function createLabdClient(
overrides?: Partial<LabdClientConfig>,
): LabdClient {
const config = loadClientConfig(overrides);
return new LabdClient(config);
}
let _singleton: LabdClient | undefined;
export function getLabdClient(): LabdClient {
if (!_singleton) {
_singleton = createLabdClient();
}
return _singleton;
}

View File

@@ -0,0 +1,59 @@
// Structured API error class for labd communication.
export class LabdApiError extends Error {
readonly statusCode: number;
readonly errorCode: string;
readonly detail: string | undefined;
constructor(statusCode: number, message: string, detail?: string) {
super(message);
this.name = "LabdApiError";
this.statusCode = statusCode;
this.errorCode = statusCodeToErrorCode(statusCode);
this.detail = detail;
}
static fromResponse(statusCode: number, body: unknown): LabdApiError {
if (typeof body === "object" && body !== null) {
const b = body as Record<string, unknown>;
const message = typeof b["error"] === "string" ? b["error"] : `HTTP ${statusCode}`;
const detail = typeof b["detail"] === "string" ? b["detail"] : undefined;
return new LabdApiError(statusCode, message, detail);
}
return new LabdApiError(statusCode, `HTTP ${statusCode}`);
}
static notConnected(url: string): LabdApiError {
return new LabdApiError(
0,
`Cannot connect to labd at ${url}`,
"Check that labd is running and the URL is correct.",
);
}
static timeout(timeoutMs: number): LabdApiError {
return new LabdApiError(
0,
`Request timed out after ${timeoutMs}ms`,
"The server may be overloaded. Try again later.",
);
}
}
export function isLabdApiError(err: unknown): err is LabdApiError {
return err instanceof LabdApiError;
}
function statusCodeToErrorCode(code: number): string {
switch (code) {
case 400: return "BAD_REQUEST";
case 401: return "UNAUTHORIZED";
case 403: return "FORBIDDEN";
case 404: return "NOT_FOUND";
case 409: return "CONFLICT";
case 429: return "RATE_LIMITED";
case 500: return "INTERNAL_ERROR";
case 503: return "UNAVAILABLE";
default: return code === 0 ? "CONNECTION_ERROR" : "UNKNOWN";
}
}

View File

@@ -0,0 +1,18 @@
// Public API for labd client.
export { LabdClient, type LabdClientConfig } from "./client.js";
export { LabdApiError, isLabdApiError } from "./errors.js";
export { loadClientConfig, createLabdClient, getLabdClient, CONFIG_DIR, CONFIG_FILE, CERT_DIR } from "./config.js";
export type {
Server,
ServerFilters,
Agent,
JoinToken,
CreateTokenOpts,
EnrollmentRequest,
EnrollmentResponse,
HealthStatus,
ApiErrorBody,
RequestOpts,
} from "./types.js";
export { createLabdWebSocket, streamExec, streamLogs, type StreamOptions } from "./websocket.js";

View File

@@ -0,0 +1,96 @@
// Typed interfaces for labd API requests and responses.
// Matches Prisma schema models and labd route contracts.
// --- Server ---
export interface Server {
id: string;
hostname: string;
mac: string | null;
cloud: string;
environment: string;
role: string;
labels: Record<string, string>;
ip: string | null;
agentVersion: string | null;
status: string;
lastHeartbeat: string | null;
createdAt: string;
updatedAt: string;
agent?: Agent | null;
}
export interface Agent {
id: string;
serverId: string;
certificatePem: string | null;
enrolledAt: string;
lastSeen: string | null;
}
export interface ServerFilters {
cloud?: string;
environment?: string;
status?: string;
}
// --- Join Tokens ---
export interface JoinToken {
id: string;
token?: string; // Only present on creation
type: string;
label: string | null;
usedBy: string | null;
usedAt: string | null;
revokedAt: string | null;
createdAt: string;
expiresAt: string | null;
}
export interface CreateTokenOpts {
type?: "one-time" | "reusable";
label?: string;
expiresInHours?: number;
}
// --- Auth / Enrollment ---
export interface EnrollmentRequest {
token: string;
hostname: string;
csr?: string;
}
export interface EnrollmentResponse {
status: string;
hostname: string;
message: string;
certificatePem: string | null;
}
// --- Health ---
export interface HealthStatus {
status: "healthy" | "degraded";
uptime: number;
timestamp: string;
checks: {
database: "ok" | "error";
};
}
// --- API Error ---
export interface ApiErrorBody {
error: string;
detail?: string;
code?: string;
}
// --- Request helpers ---
export interface RequestOpts {
query?: Record<string, string | number | boolean | undefined>;
body?: unknown;
}

View File

@@ -0,0 +1,160 @@
// WebSocket client for real-time streaming operations (exec, logs).
import { WebSocket } from "ws";
import { loadConfig } from "../config/index.js";
import { readFileSync } from "node:fs";
import { LabdApiError } from "./errors.js";
export interface StreamOptions {
onData: (data: string) => void;
onError: (error: Error) => void;
onClose: () => void;
}
export async function createLabdWebSocket(path: string): Promise<WebSocket> {
const config = loadConfig();
const baseUrl = config.labdUrl.replace("https:", "wss:").replace("http:", "ws:");
const url = new URL(path, baseUrl);
const wsOptions: WebSocket.ClientOptions = {};
if (config.certPath && config.keyPath) {
wsOptions.cert = readFileSync(config.certPath);
wsOptions.key = readFileSync(config.keyPath);
if (config.caPath) wsOptions.ca = readFileSync(config.caPath);
}
return new Promise((resolve, reject) => {
const timeout = setTimeout(() => {
ws.terminate();
reject(LabdApiError.timeout(10_000));
}, 10_000);
const ws = new WebSocket(url.toString(), wsOptions);
ws.on("open", () => {
clearTimeout(timeout);
resolve(ws);
});
ws.on("error", (err: Error) => {
clearTimeout(timeout);
reject(
LabdApiError.notConnected(config.labdUrl + " — " + err.message),
);
});
});
}
export async function streamExec(
serverName: string,
command: string[],
options: StreamOptions & { tty?: boolean; timeout?: number },
): Promise<number> {
const ws = await createLabdWebSocket("/ws/exec");
const requestId = crypto.randomUUID();
return new Promise<number>((resolve, reject) => {
ws.on("message", (raw: Buffer) => {
try {
const msg = JSON.parse(raw.toString()) as {
type: string;
data?: string;
exitCode?: number;
message?: string;
};
switch (msg.type) {
case "exec-stdout":
case "exec-stderr":
if (msg.data) options.onData(msg.data);
break;
case "exec-exit":
ws.close();
resolve(msg.exitCode ?? 1);
break;
case "error":
ws.close();
reject(new Error(msg.message ?? "Remote execution error"));
break;
}
} catch (err) {
options.onError(err instanceof Error ? err : new Error(String(err)));
}
});
ws.on("close", () => {
options.onClose();
});
ws.on("error", (err: Error) => {
options.onError(err);
});
ws.send(
JSON.stringify({
type: "exec",
requestId,
server: serverName,
command,
tty: options.tty ?? false,
timeout: options.timeout ?? 30_000,
}),
);
});
}
export async function streamLogs(
serverName: string,
logOptions: {
follow?: boolean;
lines?: number;
unit?: string;
since?: string;
priority?: string;
kernel?: boolean;
},
options: StreamOptions,
): Promise<void> {
const ws = await createLabdWebSocket("/ws/logs");
const requestId = crypto.randomUUID();
ws.on("message", (raw: Buffer) => {
try {
const msg = JSON.parse(raw.toString()) as {
type: string;
line?: string;
message?: string;
};
switch (msg.type) {
case "log-line":
if (msg.line) options.onData(msg.line);
break;
case "log-end":
ws.close();
break;
case "error":
ws.close();
options.onError(new Error(msg.message ?? "Log streaming error"));
break;
}
} catch (err) {
options.onError(err instanceof Error ? err : new Error(String(err)));
}
});
ws.on("close", () => {
options.onClose();
});
ws.on("error", (err) => {
options.onError(err);
});
ws.send(
JSON.stringify({
type: "log-subscribe",
requestId,
server: serverName,
options: logOptions,
}),
);
}

View File

@@ -0,0 +1,403 @@
// CLI command: labctl app k3s install/health <target>
// Install or check k3s on a target machine via SSH.
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import type { Command } from "commander";
import type { BastionState } from "@lab/shared";
import { K3sModule, sshExec } from "@lab/modules";
import { getLabdClient } from "../api/config.js";
function resolveTarget(
target: string,
state: BastionState | null,
): { ip: string; hostname: string; role: string } | null {
// Direct IP
if (/^\d+\.\d+\.\d+\.\d+$/.test(target)) {
return { ip: target, hostname: target, role: "infra" };
}
if (!state) return null;
// Check by MAC
const mac = target.toLowerCase().replace(/-/g, ":");
const installed = state.installed[mac];
if (installed?.ip) {
return { ip: installed.ip, hostname: installed.hostname, role: installed.role };
}
// Check by hostname
for (const [, info] of Object.entries(state.installed)) {
if (info.hostname === target || info.hostname.startsWith(target + ".")) {
return { ip: info.ip, hostname: info.hostname, role: info.role };
}
}
return null;
}
function findSshKey(): string | undefined {
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser ? join("/home", sudoUser) : homedir();
for (const name of ["id_ed25519", "id_ecdsa", "id_rsa"]) {
const keyPath = join(realHome, ".ssh", name);
if (existsSync(keyPath)) return keyPath;
}
return undefined;
}
async function fetchState(): Promise<BastionState | null> {
try {
return await getLabdClient().getMachines();
} catch {
return null;
}
}
import { registerLabcontrollerCommands } from "./labcontroller.js";
export function registerAppCommand(program: Command): void {
const appCmd = program.command("app").description("Application management");
// labcontroller subcommands
registerLabcontrollerCommands(appCmd);
const k3sCmd = appCmd.command("k3s").description("k3s cluster management");
k3sCmd
.command("install <target>")
.description("Install k3s on a target machine (hostname, IP, or MAC)")
.option("--role <role>", "k3s role: infra (server) or worker (agent)", "infra")
.option("--user <user>", "SSH user", "michal")
.option("--k3s-server <url>", "k3s server URL (required for worker role)")
.option("--k3s-token <token>", "k3s join token (required for worker role)")
.action(async (target: string, opts: {
role: string;
user: string;
k3sServer?: string;
k3sToken?: string;
}) => {
const state = await fetchState();
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot resolve target: ${target}`);
console.error("Provide an IP address, hostname, or MAC of an installed machine.");
process.exit(1);
}
const role = opts.role === "worker" ? "worker" : "infra";
const sshKey = findSshKey();
console.log(`Installing k3s on ${resolved.hostname} (${resolved.ip}) as ${role}...`);
console.log("");
const k3s = new K3sModule();
const moduleCtx = {
hostname: resolved.hostname,
ip: resolved.ip,
role,
os: "fedora-43" as const,
arch: "x86_64" as const,
sshUser: opts.user,
...(sshKey ? { sshKeyPath: sshKey } : {}),
config: {
...(opts.k3sServer ? { k3sServerUrl: opts.k3sServer } : {}),
...(opts.k3sToken ? { k3sToken: opts.k3sToken } : {}),
},
};
const installResult = await k3s.install(moduleCtx);
for (const line of installResult.output) {
console.log(` ${line}`);
}
if (!installResult.success) {
console.error(`\nk3s install failed: ${installResult.errors.join(", ")}`);
process.exit(1);
}
console.log("\nRunning post-install configuration...\n");
const configResult = await k3s.configure(moduleCtx);
for (const line of configResult.output) {
console.log(` ${line}`);
}
if (!configResult.success) {
console.error(`\nk3s configure failed: ${configResult.errors.join(", ")}`);
process.exit(1);
}
console.log("\nk3s installed successfully.");
// Check if the machine's role requires additional app deployments
try {
const { ROLE_REGISTRY } = await import("@lab/shared");
const freshState = await fetchState();
if (freshState) {
for (const [, info] of Object.entries(freshState.installed)) {
if (info.ip === resolved.ip || info.hostname === resolved.hostname) {
const roleInfo = ROLE_REGISTRY.find((r: { name: string }) => r.name === info.role);
if (roleInfo && roleInfo.apps.length > 0) {
console.log(`\nRole ${info.role} requires: ${roleInfo.apps.join(", ")}`);
console.log(`Deploying automatically...`);
const { execFileSync } = await import("node:child_process");
try {
execFileSync("node", [
process.argv[1] ?? "",
"app", "labcontroller", "deploy", resolved.hostname,
"--user", opts.user,
], { stdio: "inherit" });
} catch {
console.error(`\nAuto-deploy failed. Run manually: labctl app labcontroller deploy ${resolved.hostname}`);
}
}
break;
}
}
}
} catch { /* best-effort chain */ }
console.log(`\nTo get kubeconfig: ssh ${opts.user}@${resolved.ip} sudo cat /etc/rancher/k3s/k3s.yaml`);
});
k3sCmd
.command("health [target]")
.description("Check k3s health (all hosts if no target given)")
.option("--user <user>", "SSH user", "michal")
.action(async (target: string | undefined, opts: { user: string }) => {
const sshKey = findSshKey();
if (!target) {
let state: BastionState;
try {
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const entries = Object.entries(state.installed);
if (entries.length === 0) {
console.log("No installed machines.");
return;
}
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const pad = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${pad("HOST", 22)}${pad("IP", 16)}${pad("ROLE", 8)}${pad("K3S", 14)}${pad("NODE", 10)}${pad("ENCRYPT", 10)}${pad("CNI", 14)}${pad("PODS", 6)}${RESET}`,
);
interface HealthRow {
host: string; ip: string; role: string;
k3s: string; node: string; encrypt: string; cni: string; pods: string;
k3sC: string; nodeC: string; encC: string; cniC: string;
}
const probes = entries.map(async ([_mac, info]): Promise<HealthRow> => {
const r: HealthRow = {
host: info.hostname, ip: info.ip, role: info.role,
k3s: "—", node: "—", encrypt: "—", cni: "—", pods: "—",
k3sC: DIM, nodeC: DIM, encC: DIM, cniC: DIM,
};
if (!info.ip || info.role === "vanilla") {
r.k3s = info.role === "vanilla" ? "n/a" : "no ip";
return r;
}
try {
const svc = await sshExec(info.ip, opts.user, "systemctl is-active k3s 2>/dev/null || systemctl is-active k3s-agent 2>/dev/null", {
...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000,
});
if (svc.stdout.trim() !== "active") {
r.k3s = svc.stdout.trim() === "inactive" ? "stopped" : "not installed";
r.k3sC = svc.stdout.trim() === "inactive" ? RED : DIM;
return r;
}
r.k3s = "running"; r.k3sC = GREEN;
const [nodeRes, encRes, cniRes, podRes] = await Promise.all([
sshExec(info.ip, opts.user,
"sudo k3s kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type==\"Ready\")].status}' 2>/dev/null",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s secrets-encrypt status 2>/dev/null | head -1",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -n kube-system -l k8s-app=cilium --no-headers 2>/dev/null | head -1",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -A --no-headers 2>/dev/null | wc -l",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
]);
r.node = nodeRes.stdout.includes("True") ? "Ready" : "NotReady";
r.nodeC = nodeRes.stdout.includes("True") ? GREEN : RED;
r.encrypt = encRes.stdout.includes("Enabled") ? "yes" : "no";
r.encC = encRes.stdout.includes("Enabled") ? GREEN : RED;
r.cni = cniRes.stdout.includes("Running") ? "cilium" : "flannel";
r.cniC = cniRes.stdout.includes("Running") ? GREEN : DIM;
r.pods = podRes.stdout.trim() || "?";
} catch {
r.k3s = "unreachable"; r.k3sC = RED;
}
return r;
});
const results = await Promise.all(probes);
for (const r of results) {
console.log(
`${pad(r.host, 22)}${pad(r.ip, 16)}${pad(r.role, 8)}${r.k3sC}${pad(r.k3s, 14)}${RESET}${r.nodeC}${pad(r.node, 10)}${RESET}${r.encC}${pad(r.encrypt, 10)}${RESET}${r.cniC}${pad(r.cni, 14)}${RESET}${pad(r.pods, 6)}`,
);
}
return;
}
// Single target: detailed health check
const state = await fetchState();
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot resolve target: ${target}`);
process.exit(1);
}
console.log(`Checking k3s health on ${resolved.hostname} (${resolved.ip})...\n`);
const k3s = new K3sModule();
const healthResult = await k3s.health({
hostname: resolved.hostname,
ip: resolved.ip,
role: resolved.role,
os: "fedora-43" as const,
arch: "x86_64" as const,
sshUser: opts.user,
...(sshKey ? { sshKeyPath: sshKey } : {}),
config: {},
});
for (const line of healthResult.output) {
console.log(` ${line}`);
}
if (healthResult.errors.length > 0) {
for (const err of healthResult.errors) {
console.error(` ERROR: ${err}`);
}
}
process.exit(healthResult.success ? 0 : 1);
});
k3sCmd
.command("list")
.description("List installed machines and their k3s status")
.option("--user <user>", "SSH user", "michal")
.action(async (opts: { user: string }) => {
let state: BastionState;
try {
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const entries = Object.entries(state.installed);
if (entries.length === 0) {
console.log("No installed machines.");
return;
}
const sshKey = findSshKey();
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const hdr = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${hdr("HOSTNAME", 28)}${hdr("IP", 18)}${hdr("ROLE", 10)}${hdr("K3S", 16)}${hdr("NODE", 12)}${hdr("PODS", 6)}${RESET}`,
);
const probes = entries.map(async ([_mac, info]) => {
const row = {
hostname: info.hostname,
ip: info.ip,
role: info.role,
k3s: "—",
node: "—",
pods: "—",
k3sColor: DIM,
nodeColor: DIM,
};
if (!info.ip || info.role === "vanilla") {
row.k3s = info.role === "vanilla" ? "n/a" : "no ip";
return row;
}
try {
const svcResult = await sshExec(info.ip, opts.user, "systemctl is-active k3s 2>/dev/null || systemctl is-active k3s-agent 2>/dev/null", {
...(sshKey ? { keyPath: sshKey } : {}),
timeoutMs: 8_000,
});
const svcStatus = svcResult.stdout.trim();
if (svcStatus === "active") {
row.k3s = "running";
row.k3sColor = GREEN;
const nodeResult = await sshExec(info.ip, opts.user,
"sudo k3s kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type==\"Ready\")].status}' 2>/dev/null || echo unknown",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 },
);
const nodeReady = nodeResult.stdout.trim();
if (nodeReady.includes("True")) {
row.node = "Ready";
row.nodeColor = GREEN;
} else {
row.node = "NotReady";
row.nodeColor = RED;
}
const podResult = await sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -A --no-headers 2>/dev/null | wc -l",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 },
);
row.pods = podResult.stdout.trim() || "?";
} else if (svcStatus === "inactive" || svcStatus === "dead") {
row.k3s = "stopped";
row.k3sColor = RED;
} else {
row.k3s = "not installed";
row.k3sColor = DIM;
}
} catch {
row.k3s = "unreachable";
row.k3sColor = RED;
}
return row;
});
const results = await Promise.all(probes);
for (const r of results) {
console.log(
`${hdr(r.hostname, 28)}${hdr(r.ip, 18)}${hdr(r.role, 10)}${r.k3sColor}${hdr(r.k3s, 16)}${RESET}${r.nodeColor}${hdr(r.node, 12)}${RESET}${hdr(r.pods, 6)}`,
);
}
});
}

View File

@@ -0,0 +1,76 @@
// labctl config — view and modify CLI configuration.
import type { Command } from "commander";
import {
loadConfig,
saveConfig,
getConfigValue,
setConfigValue,
isValidConfigKey,
CONFIG_FILE,
} from "../config/index.js";
export function registerConfigCommand(parent: Command): void {
const configCmd = parent
.command("config")
.description("View and modify CLI configuration");
// config list
configCmd
.command("list")
.description("Show all configuration values")
.action(() => {
const config = loadConfig();
console.log(`# Configuration (${CONFIG_FILE})\n`);
for (const [k, v] of Object.entries(config)) {
if (v !== undefined) {
console.log(`${k}: ${v}`);
}
}
});
// config get <key>
configCmd
.command("get <key>")
.description("Get a configuration value")
.action((key: string) => {
if (!isValidConfigKey(key)) {
console.error(`Unknown config key: ${key}`);
console.error(`Valid keys: labdUrl, certPath, keyPath, caPath, defaultEnvironment, defaultCloud, outputFormat`);
process.exit(1);
}
const config = loadConfig();
const value = getConfigValue(config, key);
if (value) {
console.log(value);
}
});
// config set <key> <value>
configCmd
.command("set <key> <value>")
.description("Set a configuration value")
.action((key: string, value: string) => {
if (!isValidConfigKey(key)) {
console.error(`Unknown config key: ${key}`);
console.error(`Valid keys: labdUrl, certPath, keyPath, caPath, defaultEnvironment, defaultCloud, outputFormat`);
process.exit(1);
}
if (key === "outputFormat" && !["table", "json", "yaml"].includes(value)) {
console.error(`Invalid output format: ${value}. Must be table, json, or yaml.`);
process.exit(1);
}
let config = loadConfig();
config = setConfigValue(config, key, value);
saveConfig(config);
console.log(`Set ${key} = ${value}`);
});
// config path
configCmd
.command("path")
.description("Show configuration file path")
.action(() => {
console.log(CONFIG_FILE);
});
}

View File

@@ -0,0 +1,126 @@
// labctl doctor — diagnose configuration and connectivity issues.
import { existsSync, readFileSync } from "node:fs";
import { X509Certificate } from "node:crypto";
import type { Command } from "commander";
import { loadConfig, CONFIG_FILE, CERT_DIR } from "../config/index.js";
interface DiagnosticResult {
name: string;
status: "ok" | "warn" | "error";
message: string;
}
const GREEN = "\x1b[32m";
const YELLOW = "\x1b[33m";
const RED = "\x1b[31m";
const RESET = "\x1b[0m";
export function registerDoctorCommand(program: Command): void {
program
.command("doctor")
.description("Diagnose configuration and connectivity issues")
.option("--json", "Output results as JSON")
.action(async (opts: { json?: boolean }) => {
const results: DiagnosticResult[] = [];
const config = loadConfig();
// Check config file
results.push({
name: "Configuration file",
status: existsSync(CONFIG_FILE) ? "ok" : "warn",
message: existsSync(CONFIG_FILE) ? CONFIG_FILE : "Using defaults — run 'labctl config set labdUrl <url>'",
});
// Check labd URL
results.push({
name: "labd URL",
status: config.labdUrl ? "ok" : "error",
message: config.labdUrl || "Not configured",
});
// Check client certificate
if (config.certPath && existsSync(config.certPath)) {
try {
const certPem = readFileSync(config.certPath, "utf-8");
const cert = new X509Certificate(certPem);
const expiresIn = new Date(cert.validTo).getTime() - Date.now();
const daysLeft = Math.floor(expiresIn / (1000 * 60 * 60 * 24));
results.push({
name: "Client certificate",
status: daysLeft > 7 ? "ok" : daysLeft > 0 ? "warn" : "error",
message: daysLeft > 0 ? `Valid for ${daysLeft} days` : "Expired!",
});
} catch {
results.push({
name: "Client certificate",
status: "error",
message: "Failed to parse certificate",
});
}
} else {
results.push({
name: "Client certificate",
status: "warn",
message: `Not configured — run 'labctl login'`,
});
}
// Check cert directory
results.push({
name: "Certificate directory",
status: existsSync(CERT_DIR) ? "ok" : "warn",
message: existsSync(CERT_DIR) ? CERT_DIR : "Not created yet",
});
// Test labd connectivity
try {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 5000);
const resp = await fetch(`${config.labdUrl}/healthz`, {
signal: controller.signal,
});
clearTimeout(timeout);
const body = (await resp.json()) as { status?: string };
results.push({
name: "labd connectivity",
status: resp.ok ? "ok" : "warn",
message: resp.ok
? `Connected — ${body.status ?? "ok"}`
: `HTTP ${resp.status}: ${body.status ?? "unknown"}`,
});
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
results.push({
name: "labd connectivity",
status: "error",
message: msg.includes("abort")
? "Connection timed out (5s)"
: msg.includes("ECONNREFUSED")
? "Connection refused"
: msg,
});
}
// Output
if (opts.json) {
console.log(JSON.stringify(results, null, 2));
} else {
console.log("Running diagnostics...\n");
for (const r of results) {
const icon = r.status === "ok" ? "\u2713" : r.status === "warn" ? "!" : "\u2717";
const color = r.status === "ok" ? GREEN : r.status === "warn" ? YELLOW : RED;
console.log(`${color}${icon}${RESET} ${r.name}: ${r.message}`);
}
const errors = results.filter((r) => r.status === "error").length;
const warns = results.filter((r) => r.status === "warn").length;
const oks = results.filter((r) => r.status === "ok").length;
console.log(`\n${oks} passed, ${warns} warnings, ${errors} errors`);
if (errors > 0) process.exitCode = 1;
}
});
}

View File

@@ -1,35 +1,21 @@
// CLI command: provision forget
// Remove a machine from all bastion state.
// Remove a machine from all bastion state via labd.
import type { Command } from "commander";
import { getLabdClient } from "../api/config.js";
export function registerForgetCommand(parent: Command): void {
parent
.command("forget <mac>")
.description("Remove a machine from bastion state")
.option("--port <port>", "Bastion HTTP port", "8080")
.action(async (mac: string, opts: { port: string }) => {
const port = parseInt(opts.port, 10);
.action(async (mac: string) => {
const normalizedMac = mac.toLowerCase().replace(/-/g, ":");
try {
const response = await fetch(
`http://localhost:${port}/api/machines/${encodeURIComponent(normalizedMac)}`,
{ method: "DELETE" },
);
const result = await response.json() as Record<string, unknown>;
if (!response.ok) {
console.error(
`Error: ${result["error"] ?? `HTTP ${response.status}`}`,
);
process.exit(1);
}
const result = await getLabdClient().forgetMachine(normalizedMac);
console.log(JSON.stringify(result, null, 2));
} catch {
console.error(`Cannot reach bastion at localhost:${port}. Is it running?`);
} catch (err) {
console.error(`Failed: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
});

View File

@@ -1,43 +1,68 @@
// CLI command: provision install
// Queue a discovered machine for Fedora installation.
// Queue a discovered machine for OS installation via labd.
import type { Command } from "commander";
import { Command, Option } from "commander";
import { isValidOsId, SUPPORTED_OS, SUPPORTED_ROLES, ROLE_REGISTRY } from "@lab/shared";
import { getLabdClient } from "../api/config.js";
function roleTable(): string {
const lines: string[] = ["", "Available roles:"];
for (const r of ROLE_REGISTRY) {
const parent = r.parent ? ` (extends ${r.parent})` : "";
const apps = r.apps.length > 0 ? ` [auto: ${r.apps.join(", ")}]` : "";
lines.push(` ${r.name.padEnd(16)} ${r.description}${parent}${apps}`);
}
return lines.join("\n");
}
export function registerInstallCommand(parent: Command): void {
parent
.command("install <mac> <hostname>")
.description("Queue a discovered machine for Fedora installation")
.option("--role <role>", "Machine role: worker or infra", "worker")
.description("Queue a discovered machine for OS installation")
.showHelpAfterError(true)
.addHelpText("after", roleTable())
.addOption(new Option("--role <role>", "Machine role (see below)").choices([...SUPPORTED_ROLES]).default("worker"))
.addOption(new Option("--os <os>", "Operating system").choices([...SUPPORTED_OS]).default("fedora-43"))
.option("--disk <device>", "Target disk device (auto-detect if omitted)")
.option("--port <port>", "Bastion HTTP port", "8080")
.action(async (mac: string, hostname: string, opts: {
role: string;
os: string;
disk?: string;
port: string;
}) => {
const port = parseInt(opts.port, 10);
const payload: Record<string, string> = {
mac,
hostname,
role: opts.role,
};
if (opts.disk !== undefined) {
payload["disk"] = opts.disk;
if (!isValidOsId(opts.os)) {
console.error(`Unknown OS: ${opts.os}. Supported: ${SUPPORTED_OS.join(", ")}`);
process.exit(1);
}
if (!(SUPPORTED_ROLES as readonly string[]).includes(opts.role)) {
console.error(`Unknown role: ${opts.role}`);
console.error(roleTable());
process.exit(1);
}
try {
const response = await fetch(`http://localhost:${port}/api/install`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(payload),
const result = await getLabdClient().installMachine({
mac,
hostname,
role: opts.role,
os: opts.os,
...(opts.disk ? { disk: opts.disk } : {}),
});
const result = await response.json() as Record<string, unknown>;
console.log(JSON.stringify(result, null, 2));
console.log("");
console.log("Power on the machine to start Fedora installation.");
} catch {
console.error(`Cannot reach bastion at localhost:${port}. Is it running?`);
const osLabel = opts.os.startsWith("ubuntu") ? "Ubuntu" : "Fedora";
console.log(`Power on the machine to start ${osLabel} installation.`);
const roleInfo = ROLE_REGISTRY.find(r => r.name === opts.role);
if (roleInfo?.k3s) {
console.log(`After install completes, k3s will be installed automatically (role=${opts.role}).`);
if (roleInfo.apps.length > 0) {
console.log(`Then: ${roleInfo.apps.join(", ")} will be deployed.`);
}
console.log(`To install k3s manually later: labctl app k3s install ${hostname}`);
}
} catch (err) {
console.error(`Failed: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
});

View File

@@ -0,0 +1,298 @@
// CLI command: labctl app labcontroller deploy/status
// Deploy bastion + labd + CockroachDB to a k3s labcontroller node.
import { existsSync, writeFileSync, mkdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import type { Command } from "commander";
import type { BastionState } from "@lab/shared";
import { sshExec } from "@lab/modules";
import { getLabdClient } from "../api/config.js";
function findSshKey(): string | undefined {
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser ? join("/home", sudoUser) : homedir();
for (const name of ["id_ed25519", "id_ecdsa", "id_rsa"]) {
const p = join(realHome, ".ssh", name);
if (existsSync(p)) return p;
}
return undefined;
}
async function resolveIp(target: string): Promise<string> {
if (/^\d+\.\d+\.\d+\.\d+$/.test(target)) return target;
try {
const state = await getLabdClient().getMachines();
for (const [, info] of Object.entries(state.installed)) {
if (info.hostname === target || info.hostname.startsWith(target + ".")) {
return info.ip;
}
}
} catch { /* use target as-is */ }
return target;
}
export function registerLabcontrollerCommands(appCmd: Command): void {
const lcCmd = appCmd.command("labcontroller").description("Labcontroller deployment (bastion + labd + CockroachDB)");
lcCmd
.command("deploy <target>")
.description("Deploy labcontroller stack to a k3s node")
.option("--user <user>", "SSH user", "michal")
.option("--crdb-replicas <n>", "CockroachDB replicas", "1")
.action(async (target: string, opts: {
user: string;
crdbReplicas: string;
}) => {
const ip = await resolveIp(target);
const sshKey = findSshKey();
const sshOpts = sshKey ? { keyPath: sshKey } : {};
console.log(`Deploying labcontroller stack to ${target} (${ip})...\n`);
// 1. Fetch kubeconfig from target
console.log("[1/4] Fetching kubeconfig...");
const kcResult = await sshExec(ip, opts.user, "sudo cat /etc/rancher/k3s/k3s.yaml", { ...sshOpts, timeoutMs: 10_000 });
if (kcResult.exitCode !== 0) {
console.error(" Failed to fetch kubeconfig. Is k3s running?");
process.exit(1);
}
const kubeconfigDir = join(homedir(), ".kube");
mkdirSync(kubeconfigDir, { recursive: true });
const contextName = `lab-${target}`;
const kubeconfig = kcResult.stdout
.replace(/server:\s*https:\/\/127\.0\.0\.1:6443/, `server: https://${ip}:6443`)
.replace(/name:\s*default/g, `name: ${contextName}`)
.replace(/cluster:\s*default/g, `cluster: ${contextName}`)
.replace(/user:\s*default/g, `user: ${contextName}`);
const tmpPath = join(kubeconfigDir, `.lab-${target}-tmp`);
writeFileSync(tmpPath, kubeconfig, { mode: 0o600 });
const mainConfig = join(kubeconfigDir, "config");
const { spawnSync } = await import("node:child_process");
const mergeResult = spawnSync("kubectl", ["config", "view", "--flatten"], {
encoding: "utf-8",
stdio: ["pipe", "pipe", "pipe"],
env: { ...process.env, KUBECONFIG: `${mainConfig}:${tmpPath}` },
});
if (mergeResult.status === 0 && mergeResult.stdout) {
writeFileSync(mainConfig, mergeResult.stdout, { mode: 0o600 });
spawnSync("kubectl", ["config", "use-context", contextName], {
stdio: "pipe",
env: { ...process.env, KUBECONFIG: mainConfig },
});
console.log(` Merged into ~/.kube/config as context "${contextName}"`);
console.log(` Active context set to "${contextName}"`);
} else {
writeFileSync(join(kubeconfigDir, `lab-${target}`), kubeconfig, { mode: 0o600 });
console.log(` Saved to ~/.kube/lab-${target} (merge failed, use KUBECONFIG=~/.kube/lab-${target})`);
}
try { const { unlinkSync } = await import("node:fs"); unlinkSync(tmpPath); } catch { /* ignore */ }
console.log("");
// 2. Apply CockroachDB manifests
console.log("[2/4] Deploying CockroachDB...");
const { cockroachDbManifests } = await import("@lab/modules/dist/modules/labcontroller/src/cockroachdb.js");
const crdb = cockroachDbManifests({ replicas: parseInt(opts.crdbReplicas, 10) });
const manifests = [crdb.namespace, crdb.headlessService, crdb.clientService, crdb.statefulSet];
for (const manifest of manifests) {
const json = JSON.stringify(manifest);
const kind = (manifest as { kind?: string }).kind ?? "?";
const name = ((manifest as { metadata?: { name?: string } }).metadata)?.name ?? "?";
const result = await sshExec(ip, opts.user,
`echo '${json.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f -`,
{ ...sshOpts, timeoutMs: 15_000 },
);
if (result.exitCode === 0) {
console.log(` applied ${kind}/${name}`);
} else {
console.error(` FAILED ${kind}/${name}: ${result.stderr.trim()}`);
}
}
console.log(" Waiting for CockroachDB pod...");
const waitResult = await sshExec(ip, opts.user,
"sudo k3s kubectl wait --for=condition=Ready pod -l app=cockroachdb -n lab-system --timeout=120s 2>/dev/null || echo 'still starting'",
{ ...sshOpts, timeoutMs: 130_000 },
);
console.log(` ${waitResult.stdout.trim()}`);
console.log(" Initializing CockroachDB cluster...");
const initJson = JSON.stringify(crdb.initJob);
await sshExec(ip, opts.user,
`echo '${initJson.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f - 2>/dev/null; sudo k3s kubectl wait --for=condition=Complete job/cockroachdb-init -n lab-system --timeout=60s 2>/dev/null || echo 'init may already be done'`,
{ ...sshOpts, timeoutMs: 70_000 },
);
await sshExec(ip, opts.user,
"sudo k3s kubectl exec cockroachdb-0 -n lab-system -- /cockroach/cockroach sql --insecure -e 'CREATE DATABASE IF NOT EXISTS lab' 2>/dev/null || echo 'db may already exist'",
{ ...sshOpts, timeoutMs: 15_000 },
);
console.log(" CockroachDB ready\n");
// 3. Deploy labd
console.log("[3/4] Deploying labd...");
const { labdManifests } = await import("@lab/modules/dist/modules/labcontroller/src/labd.js");
const labd = labdManifests({ databaseUrl: crdb.connectionString });
for (const manifest of [labd.service, labd.deployment]) {
const json = JSON.stringify(manifest);
const kind = (manifest as { kind?: string }).kind ?? "?";
const name = ((manifest as { metadata?: { name?: string } }).metadata)?.name ?? "?";
const result = await sshExec(ip, opts.user,
`echo '${json.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f -`,
{ ...sshOpts, timeoutMs: 15_000 },
);
console.log(` ${result.exitCode === 0 ? "applied" : "FAILED"} ${kind}/${name}`);
}
console.log("");
// 4. Deploy bastion
console.log("[4/4] Deploying bastion (hostNetwork)...");
const { bastionManifests } = await import("@lab/modules/dist/modules/labcontroller/src/bastion.js");
const bastion = bastionManifests();
const bJson = JSON.stringify(bastion.daemonSet);
const bResult = await sshExec(ip, opts.user,
`echo '${bJson.replace(/'/g, "'\\''")}' | sudo k3s kubectl apply -f -`,
{ ...sshOpts, timeoutMs: 15_000 },
);
console.log(` ${bResult.exitCode === 0 ? "applied" : "FAILED"} DaemonSet/bastion`);
// 5. Promote host role to labcontroller via labd
console.log("Promoting host role to labcontroller...");
try {
const state = await getLabdClient().getMachines();
for (const [mac, info] of Object.entries(state.installed)) {
if (info.ip === ip || info.hostname === target) {
await getLabdClient().updateRole(mac, "labcontroller");
console.log(` ${info.hostname}: infra -> labcontroller`);
break;
}
}
} catch {
console.log(" Could not update role (labd may not be running yet)");
}
console.log("\n=== Labcontroller deployed ===");
console.log(` CockroachDB: cockroachdb-client.lab-system:26257`);
console.log(` labd: ${ip}:30100`);
console.log(` bastion: ${ip}:8080 (hostNetwork)`);
console.log(` context: lab-${target}`);
console.log(`\n Switch context: kubectl ctx lab-${target}`);
console.log(` View pods: kubectl get pods -n lab-system`);
});
lcCmd
.command("status [target]")
.description("Check labcontroller deployment status (all hosts if no target)")
.option("--user <user>", "SSH user", "michal")
.action(async (target: string | undefined, opts: { user: string }) => {
const sshKey = findSshKey();
const sshOpts = sshKey ? { keyPath: sshKey } : {};
if (!target) {
let state: BastionState;
try {
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const entries = Object.entries(state.installed);
if (entries.length === 0) {
console.log("No installed machines.");
return;
}
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const pad = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${pad("HOST", 22)}${pad("IP", 16)}${pad("ROLE", 14)}${pad("CRDB", 12)}${pad("LABD", 12)}${pad("BASTION", 12)}${pad("NS", 8)}${RESET}`,
);
interface StatusRow {
host: string; ip: string; role: string;
crdb: string; labd: string; bastion: string; ns: string;
crdbC: string; labdC: string; bastionC: string;
}
const probes = entries.map(async ([_mac, info]): Promise<StatusRow> => {
const r: StatusRow = {
host: info.hostname, ip: info.ip, role: info.role ?? "?",
crdb: "—", labd: "—", bastion: "—", ns: "—",
crdbC: DIM, labdC: DIM, bastionC: DIM,
};
if (!info.ip) return r;
try {
const result = await sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -n lab-system --no-headers -o custom-columns='NAME:.metadata.name,STATUS:.status.phase' 2>/dev/null || echo 'NO_NS'",
{ ...sshOpts, timeoutMs: 10_000 },
);
if (result.stdout.includes("NO_NS") || result.exitCode !== 0) {
r.ns = "none";
return r;
}
r.ns = "ok";
const lines = result.stdout.trim().split("\n").filter(Boolean);
for (const line of lines) {
const [name, status] = line.trim().split(/\s+/);
if (!name) continue;
const running = status === "Running" || status === "Succeeded";
const color = running ? GREEN : RED;
const label = running ? "running" : (status ?? "?").toLowerCase();
if (name.startsWith("cockroachdb-") && !name.includes("init")) {
r.crdb = label; r.crdbC = color;
} else if (name.startsWith("labd-")) {
r.labd = label; r.labdC = color;
} else if (name.startsWith("bastion-")) {
r.bastion = label; r.bastionC = color;
}
}
} catch {
r.crdb = "ssh err"; r.crdbC = RED;
}
return r;
});
const results = await Promise.all(probes);
for (const r of results) {
console.log(
`${pad(r.host, 22)}${pad(r.ip, 16)}${pad(r.role, 14)}${r.crdbC}${pad(r.crdb, 12)}${RESET}${r.labdC}${pad(r.labd, 12)}${RESET}${r.bastionC}${pad(r.bastion, 12)}${RESET}${pad(r.ns, 8)}`,
);
}
return;
}
// Specific target: show detailed pod list
const ip = await resolveIp(target);
console.log(`Labcontroller status on ${target} (${ip}):\n`);
const result = await sshExec(ip, opts.user,
"sudo k3s kubectl get pods -n lab-system -o wide 2>/dev/null || echo 'lab-system namespace not found'",
{ ...sshOpts, timeoutMs: 10_000 },
);
console.log(result.stdout);
});
}

View File

@@ -3,6 +3,7 @@
import type { Command } from "commander";
import type { BastionState } from "@lab/shared";
import { getLabdClient } from "../api/config.js";
const BOLD = "\x1b[1m";
const GREEN = "\x1b[0;32m";
@@ -24,16 +25,12 @@ export function registerListCommand(parent: Command): void {
parent
.command("list")
.description("List all known machines")
.option("--port <port>", "Bastion HTTP port", "8080")
.action(async (opts: { port: string }) => {
const port = parseInt(opts.port, 10);
.action(async () => {
let state: BastionState;
try {
const response = await fetch(`http://localhost:${port}/api/machines`);
state = (await response.json()) as BastionState;
} catch {
console.error(`Cannot reach bastion at localhost:${port}. Is it running?`);
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}

View File

@@ -0,0 +1,120 @@
// labctl login — authenticate with labd and obtain client certificate.
import { generateKeyPairSync } from "node:crypto";
import { writeFileSync, existsSync, mkdirSync, readFileSync } from "node:fs";
import { createInterface } from "node:readline";
import type { Command } from "commander";
import { loadConfig, saveConfig, CERT_DIR } from "../config/index.js";
import { join } from "node:path";
export function registerLoginCommand(program: Command): void {
program
.command("login")
.description("Authenticate with labd and obtain client certificate")
.option("--server <url>", "labd server URL")
.action(async (options: { server?: string }) => {
if (!existsSync(CERT_DIR)) {
mkdirSync(CERT_DIR, { recursive: true, mode: 0o700 });
}
const config = loadConfig();
const serverUrl = options.server ?? config.labdUrl;
const keyPath = join(CERT_DIR, "client.key");
const certPath = join(CERT_DIR, "client.crt");
const caPath = join(CERT_DIR, "ca.crt");
// 1. Generate keypair if not exists
if (!existsSync(keyPath)) {
console.log("Generating client keypair...");
const { privateKey } = generateKeyPairSync("ec", {
namedCurve: "P-256",
privateKeyEncoding: { type: "pkcs8", format: "pem" },
publicKeyEncoding: { type: "spki", format: "pem" },
});
writeFileSync(keyPath, privateKey, { mode: 0o600 });
console.log(`Private key saved to ${keyPath}`);
} else {
console.log(`Using existing keypair at ${keyPath}`);
}
// 2. Read public key for CSR (simplified — send public key, labd signs)
const publicKey = readFileSync(keyPath, "utf-8");
// 3. Prompt for token
const token = await promptPassword("Enter join token: ");
if (!token) {
console.error("Token is required.");
process.exit(1);
}
// 4. Submit enrollment request
console.log(`Authenticating with ${serverUrl}...`);
try {
const resp = await fetch(`${serverUrl}/api/auth/user-enroll`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
token,
hostname: `cli-${process.env["USER"] ?? "unknown"}`,
csr: publicKey,
}),
});
if (!resp.ok) {
const body = (await resp.json().catch(() => ({}))) as Record<string, string>;
console.error(`Login failed: ${body["error"] ?? resp.statusText}`);
process.exit(1);
}
const result = (await resp.json()) as {
certificatePem?: string | null;
caPem?: string | null;
status: string;
};
if (result.certificatePem) {
writeFileSync(certPath, result.certificatePem, { mode: 0o600 });
console.log(`Client certificate saved to ${certPath}`);
}
if (result.caPem) {
writeFileSync(caPath, result.caPem, { mode: 0o644 });
console.log(`CA certificate saved to ${caPath}`);
}
// 5. Update config
saveConfig({
...config,
labdUrl: serverUrl,
certPath,
keyPath,
...(existsSync(caPath) ? { caPath } : {}),
});
console.log(`\nLogin successful! Configuration updated.`);
console.log(`Server: ${serverUrl}`);
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
if (message.includes("ECONNREFUSED") || message.includes("fetch")) {
console.error(`Cannot connect to labd at ${serverUrl}`);
console.error("Check that labd is running and the URL is correct.");
} else {
console.error(`Login failed: ${message}`);
}
process.exit(1);
}
});
}
function promptPassword(message: string): Promise<string> {
return new Promise((resolve) => {
const rl = createInterface({
input: process.stdin,
output: process.stdout,
});
rl.question(message, (answer) => {
rl.close();
resolve(answer.trim());
});
});
}

View File

@@ -0,0 +1,85 @@
// CLI command: provision logs
// Show provisioning logs for a machine via labd.
import type { Command } from "commander";
import { getLabdClient } from "../api/config.js";
/** Resolve a target (hostname, MAC, IP) to a MAC address. */
async function resolveToMac(target: string): Promise<string> {
const normalized = target.toLowerCase().replace(/-/g, ":");
// Looks like a MAC already
if (/^([0-9a-f]{2}:){5}[0-9a-f]{2}$/.test(normalized)) {
return normalized;
}
// Resolve from labd aggregated state
try {
const state = await getLabdClient().getMachines();
for (const [mac, info] of Object.entries(state.installed)) {
if (info.hostname === target || info.hostname.startsWith(target + ".") || info.ip === target) {
return mac;
}
}
for (const [mac, info] of Object.entries(state.install_queue)) {
if (info.hostname === target || info.hostname.startsWith(target + ".")) {
return mac;
}
}
for (const mac of Object.keys(state.discovered)) {
if (mac === normalized) return mac;
}
} catch { /* can't reach labd */ }
return normalized;
}
export function registerLogsCommand(parent: Command): void {
parent
.command("logs <target>")
.description("Show provisioning logs for a machine (hostname, MAC, or IP)")
.action(async (target: string) => {
const mac = await resolveToMac(target);
try {
const data = await getLabdClient().getMachineLogs(mac);
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const YELLOW = "\x1b[33m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
console.log(`${BOLD}${data["hostname"]}${RESET} (${mac})`);
console.log(` Status: ${data["status"] === "installed" ? GREEN : YELLOW}${data["status"]}${RESET}`);
console.log(` Role: ${data["role"]}`);
if (data["os"]) console.log(` OS: ${data["os"]}`);
if (data["ip"]) console.log(` IP: ${data["ip"]}`);
console.log("");
const log = data["log"] as Array<{ stage: string; detail: string; timestamp: string }> | undefined;
if (log && log.length > 0) {
console.log(`${BOLD} Log:${RESET}`);
for (const entry of log) {
const time = entry.timestamp.slice(11, 19);
const color = entry.stage === "complete" ? GREEN : entry.stage === "error" ? RED : YELLOW;
const detail = entry.detail ? ` ${DIM}-- ${entry.detail}${RESET}` : "";
console.log(` ${DIM}${time}${RESET} ${color}${entry.stage}${RESET}${detail}`);
}
} else {
console.log(` ${DIM}No progress events yet (queued, waiting for PXE boot)${RESET}`);
}
} catch (err) {
const msg = err instanceof Error ? err.message : String(err);
if (msg.includes("404") || msg.includes("not found")) {
console.error(`Machine not found: ${target}`);
console.error("Run 'labctl provision list' to see available machines.");
} else {
console.error(`Cannot reach labd: ${msg}`);
}
process.exit(1);
}
});
}

View File

@@ -0,0 +1,114 @@
// CLI command: provision makeiso
// Generate/serve a UEFI-bootable iPXE ISO for machines that don't support PXE boot.
// Queries labd for connected bastions and provides the download URL.
import { readFileSync, writeFileSync, existsSync } from "node:fs";
import { createInterface } from "node:readline";
import { Command, Option } from "commander";
import { getLabdClient } from "../api/config.js";
import { buildBootIso } from "@lab/bastion/iso-builder";
function prompt(question: string): Promise<string> {
const rl = createInterface({ input: process.stdin, output: process.stdout });
return new Promise((resolve) => {
rl.question(question, (answer) => {
rl.close();
resolve(answer.trim());
});
});
}
const IPXE_PATHS: Record<string, { src: string; dest: string }> = {
x86_64: { src: "/usr/share/ipxe/ipxe-snponly-x86_64.efi", dest: "EFI/BOOT/BOOTX64.EFI" },
aarch64: { src: "/usr/share/ipxe/arm64-efi/snponly.efi", dest: "EFI/BOOT/BOOTAA64.EFI" },
};
async function selectBastion(): Promise<{ hostname: string; serverIp: string; httpPort: number }> {
const bastions = await getLabdClient().getBastions();
const online = bastions.filter(b => b.status === "online");
if (online.length === 0) {
console.error("No bastions online. Start a bastion first.");
process.exit(1);
}
if (online.length === 1) {
const b = online[0]!;
console.log(`Using bastion: ${b.hostname} (${b.serverIp})`);
return { hostname: b.hostname, serverIp: b.serverIp, httpPort: 8080 };
}
console.log("Available bastions:\n");
for (let i = 0; i < online.length; i++) {
const b = online[i]!;
console.log(` ${i + 1}) ${b.hostname} ${b.serverIp} (${b.network})`);
}
console.log("");
const answer = await prompt(`Select bastion [1-${online.length}]: `);
const idx = parseInt(answer, 10) - 1;
if (isNaN(idx) || idx < 0 || idx >= online.length) {
console.error("Invalid selection.");
process.exit(1);
}
const selected = online[idx]!;
return { hostname: selected.hostname, serverIp: selected.serverIp, httpPort: 8080 };
}
export function registerMakeIsoCommand(parent: Command): void {
parent
.command("makeiso")
.description("Generate a UEFI-bootable iPXE ISO for network provisioning")
.addOption(
new Option("--arch <arch...>", "Target architecture(s)")
.choices(["x86_64", "aarch64"])
.default(["x86_64", "aarch64"]),
)
.option("--local", "Build ISO locally instead of using bastion-hosted URL")
.option("--out <path>", "Output path for local ISO build", "ipxe-bastion.iso")
.action(async (opts: { arch: string[]; local?: boolean; out: string }) => {
const bastion = await selectBastion();
const bastionUrl = `http://${bastion.serverIp}:${bastion.httpPort}`;
if (opts.local) {
console.log(`\nGenerating iPXE boot ISO...`);
console.log(` Architectures: ${opts.arch.join(", ")}`);
console.log(` Bastion: ${bastionUrl}`);
const efiFiles: Array<{ path: string; data: Buffer }> = [];
for (const arch of opts.arch) {
const paths = IPXE_PATHS[arch];
if (!paths) {
console.error(`Unknown architecture: ${arch}`);
process.exit(1);
}
if (!existsSync(paths.src)) {
console.error(`iPXE binary not found: ${paths.src}`);
console.error(`Install: sudo dnf install ipxe-bootimgs-${arch === "aarch64" ? "aarch64" : "x86"}`);
process.exit(1);
}
efiFiles.push({ path: paths.dest, data: readFileSync(paths.src) });
console.log(` ${arch}: ${paths.dest.split("/").pop()}`);
}
const script = [
"#!ipxe",
"",
"echo Booting from iPXE ISO -- connecting to bastion...",
"dhcp || ( echo DHCP failed, retrying... && sleep 3 && dhcp )",
`chain ${bastionUrl}/boot.ipxe || shell`,
].join("\n");
const iso = buildBootIso(efiFiles, script);
writeFileSync(opts.out, iso);
console.log(`\nISO written to: ${opts.out} (${(iso.length / 1024 / 1024).toFixed(1)}MB)`);
} else {
console.log(`\nThe bastion serves a boot ISO with the correct URL embedded.`);
console.log(`Use this URL in JetKVM or any BMC virtual media:\n`);
console.log(` ${bastionUrl}/boot.iso`);
}
console.log(`\nMount as virtual CD, boot from it. iPXE will chainload from bastion.`);
});
}

View File

@@ -1,100 +1,161 @@
// CLI command: provision reprovision
// Queue a machine for reinstall and attempt SSH reboot into PXE.
// Queue a machine for reinstall and attempt SSH reboot into PXE via labd.
import { execFileSync } from "node:child_process";
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import type { Command } from "commander";
import { Command, Option } from "commander";
import type { BastionState } from "@lab/shared";
import { isValidOsId, SUPPORTED_OS, SUPPORTED_ROLES, ROLE_REGISTRY } from "@lab/shared";
import { getLabdClient } from "../api/config.js";
function roleTable(): string {
const lines: string[] = ["", "Available roles:"];
for (const r of ROLE_REGISTRY) {
const parent = r.parent ? ` (extends ${r.parent})` : "";
const apps = r.apps.length > 0 ? ` [auto: ${r.apps.join(", ")}]` : "";
lines.push(` ${r.name.padEnd(16)} ${r.description}${parent}${apps}`);
}
return lines.join("\n");
}
/** Resolve a target (hostname, MAC, or IP) to {mac, hostname, ip} from state. */
function resolveTarget(
target: string,
state: BastionState,
): { mac: string; hostname: string; ip: string } | null {
const normalized = target.toLowerCase().replace(/-/g, ":");
if (state.installed[normalized]) {
const info = state.installed[normalized];
return { mac: normalized, hostname: info.hostname, ip: info.ip };
}
if (state.discovered[normalized]) {
return { mac: normalized, hostname: normalized, ip: "" };
}
for (const [mac, info] of Object.entries(state.installed)) {
if (info.hostname === target || info.hostname.startsWith(target + ".")) {
return { mac, hostname: info.hostname, ip: info.ip };
}
}
for (const [mac, info] of Object.entries(state.installed)) {
if (info.ip === target) {
return { mac, hostname: info.hostname, ip: info.ip };
}
}
return null;
}
export function registerReprovisionCommand(parent: Command): void {
parent
.command("reprovision <mac> <hostname>")
.description("Queue install + SSH reboot into PXE for reprovision")
.option("--role <role>", "Machine role: worker or infra", "worker")
.command("reprovision <target> [hostname]")
.description("Queue install + SSH reboot into PXE (target: hostname, MAC, or IP)")
.showHelpAfterError(true)
.addHelpText("after", roleTable())
.addOption(new Option("--role <role>", "Machine role (see below)").choices([...SUPPORTED_ROLES]).default("worker"))
.addOption(new Option("--os <os>", "Operating system").choices([...SUPPORTED_OS]).default("fedora-43"))
.option("--disk <device>", "Target disk device (auto-detect if omitted)")
.option("--port <port>", "Bastion HTTP port", "8080")
.action(async (mac: string, hostname: string, opts: {
.action(async (target: string, hostnameOverride: string | undefined, opts: {
role: string;
os: string;
disk?: string;
port: string;
}) => {
const port = parseInt(opts.port, 10);
// Queue the install
const payload: Record<string, string> = {
mac,
hostname,
role: opts.role,
};
if (opts.disk !== undefined) {
payload["disk"] = opts.disk;
if (!isValidOsId(opts.os)) {
console.error(`Unknown OS: ${opts.os}. Supported: ${SUPPORTED_OS.join(", ")}`);
process.exit(1);
}
let state: BastionState;
try {
const installResponse = await fetch(`http://localhost:${port}/api/install`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(payload),
});
const result = await installResponse.json() as Record<string, unknown>;
console.log(JSON.stringify(result, null, 2));
} catch {
console.error(`Cannot reach bastion at localhost:${port}. Is it running?`);
if (!(SUPPORTED_ROLES as readonly string[]).includes(opts.role)) {
console.error(`Unknown role: ${opts.role}`);
console.error(roleTable());
process.exit(1);
}
// Try to find IP from installed state and SSH in to trigger PXE reboot
const client = getLabdClient();
// Resolve target from labd aggregated state
let state: BastionState;
try {
const machinesResponse = await fetch(`http://localhost:${port}/api/machines`);
state = (await machinesResponse.json()) as BastionState;
} catch {
console.log("");
console.log("Could not fetch machine state. Reboot the machine manually into PXE.");
state = await client.getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot find machine: ${target}`);
console.error("Provide a hostname, MAC, or IP of a known machine.");
console.error("Run 'labctl provision list' to see available machines.");
process.exit(1);
}
const mac = resolved.mac;
const hostname = hostnameOverride ?? resolved.hostname;
const ip = resolved.ip;
console.log(`Reprovisioning ${hostname} (${mac})${ip ? ` at ${ip}` : ""}...`);
console.log(` Role: ${opts.role} OS: ${opts.os}`);
console.log("");
// Queue the install via labd
try {
const result = await client.installMachine({
mac,
hostname,
role: opts.role,
os: opts.os,
...(opts.disk ? { disk: opts.disk } : {}),
});
console.log(JSON.stringify(result, null, 2));
} catch (err) {
console.error(`Failed to queue install: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
// Try SSH reboot into PXE
if (ip === "") {
console.log("\nNo IP known. Reboot the machine manually into PXE.");
return;
}
const installedEntry = state.installed[mac.toLowerCase().replace(/-/g, ":")];
const ip = installedEntry?.ip ?? "";
const adminUser = process.env["SUDO_USER"] ?? process.env["USER"] ?? "";
const effectiveUser = adminUser === "root" ? "" : adminUser;
if (ip !== "" && effectiveUser !== "") {
console.log("");
console.log(`Attempting SSH reboot into PXE (${effectiveUser}@${ip})...`);
// Find SSH key
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser !== undefined
? join("/home", sudoUser)
: homedir();
const keyPaths = [
join(realHome, ".ssh", "id_ed25519"),
join(realHome, ".ssh", "id_rsa"),
join(realHome, ".ssh", "id_ecdsa"),
];
const sshKey = keyPaths.find(k => existsSync(k));
const sshArgs = [
"-o", "StrictHostKeyChecking=no",
"-o", "ConnectTimeout=10",
...(sshKey !== undefined ? ["-i", sshKey] : []),
`${effectiveUser}@${ip}`,
'PXE_ENTRY=$(sudo efibootmgr | grep -iE "pxe|network|ipv4" | head -1 | grep -oP "Boot\\K[0-9A-F]+"); if [ -n "$PXE_ENTRY" ]; then sudo efibootmgr --bootnext "$PXE_ENTRY" && echo "PXE set as next boot" && sudo reboot; else echo "No PXE boot entry found, rebooting anyway..." && sudo reboot; fi',
];
try {
execFileSync("ssh", sshArgs, { stdio: "inherit" });
} catch {
// SSH connection closing during reboot is expected
}
console.log("");
console.log("Machine is rebooting into PXE. Install will start automatically.");
} else {
console.log("");
console.log("No IP known for this machine. Reboot it manually into PXE.");
if (effectiveUser === "") {
console.log("\nReboot the machine manually into PXE.");
return;
}
console.log(`\nAttempting SSH reboot into PXE (${effectiveUser}@${ip})...`);
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser !== undefined ? join("/home", sudoUser) : homedir();
const keyPaths = [
join(realHome, ".ssh", "id_ed25519"),
join(realHome, ".ssh", "id_rsa"),
join(realHome, ".ssh", "id_ecdsa"),
];
const sshKey = keyPaths.find(k => existsSync(k));
const sshArgs = [
"-o", "StrictHostKeyChecking=no",
"-o", "ConnectTimeout=10",
...(sshKey !== undefined ? ["-i", sshKey] : []),
`${effectiveUser}@${ip}`,
'PXE_ENTRY=$(sudo efibootmgr | grep -iE "pxe|network|ipv4" | head -1 | grep -oP "Boot\\K[0-9A-F]+"); if [ -n "$PXE_ENTRY" ]; then sudo efibootmgr --bootnext "$PXE_ENTRY" && echo "PXE set as next boot" && sudo reboot; else echo "No PXE boot entry found, rebooting anyway..." && sudo reboot; fi',
];
try {
execFileSync("ssh", sshArgs, { stdio: "inherit" });
} catch {
// SSH connection closing during reboot is expected
}
console.log("");
console.log("Machine is rebooting into PXE. Install will start automatically.");
});
}

View File

@@ -2,7 +2,7 @@
// Start the bastion server (HTTP + dnsmasq), daemonized by default.
import { spawn, type ChildProcess } from "node:child_process";
import { existsSync, readFileSync } from "node:fs";
import { existsSync, readFileSync, openSync, mkdirSync } from "node:fs";
import type { Command } from "commander";
import { startBastion } from "@lab/bastion";
@@ -34,6 +34,13 @@ export function registerStartCommand(parent: Command): void {
skipArtifacts?: boolean;
foreground?: boolean;
}) => {
// Check root early (before daemonize) so the error is visible
if (!opts.skipDnsmasq && process.getuid?.() !== 0) {
console.error("Must run as root (dnsmasq needs DHCP/TFTP ports).");
console.error("Usage: sudo labctl init bastion standalone start");
process.exit(1);
}
if (opts.foreground === true) {
// Run in foreground
await startBastion({
@@ -51,55 +58,88 @@ export function registerStartCommand(parent: Command): void {
return;
}
// Daemonize: spawn ourselves with --foreground and detach
// Daemonize: re-run with --foreground, redirect output to log file
mkdirSync(opts.dir, { recursive: true });
const logFile = `${opts.dir}/bastion.log`;
const args = process.argv.slice(1);
// Add --foreground flag
args.push("--foreground");
const child: ChildProcess = spawn(process.argv[0] ?? "labctl", args, {
// Build explicit argument list instead of re-using process.argv
// (which breaks with bun-compiled binaries)
const fgArgs = [
"init", "bastion", "standalone", "start", "--foreground",
"--port", opts.port,
"--dir", opts.dir,
"--domain", opts.domain,
"--dhcp-mode", opts.dhcpMode,
"--fedora", opts.fedora,
"--arch", opts.arch,
"--timezone", opts.timezone,
"--locale", opts.locale,
];
if (opts.skipDnsmasq) fgArgs.push("--skip-dnsmasq");
if (opts.skipArtifacts) fgArgs.push("--skip-artifacts");
// Determine how to re-invoke ourselves
const execPath = process.argv[0] ?? "labctl";
let spawnCmd: string;
let spawnArgs: string[];
if (execPath.includes("node") || execPath.includes("tsx")) {
const scriptPath = process.argv[1];
spawnCmd = execPath;
spawnArgs = scriptPath ? [scriptPath, ...fgArgs] : fgArgs;
} else {
spawnCmd = execPath;
spawnArgs = fgArgs;
}
// Open log file for the child's stdout/stderr so it survives parent exit
const logFd = openSync(logFile, "a");
const child: ChildProcess = spawn(spawnCmd, spawnArgs, {
detached: true,
stdio: ["ignore", "pipe", "pipe"],
stdio: ["ignore", logFd, logFd],
});
// Collect initial output to confirm startup
let output = "";
const timeout = setTimeout(() => {
child.stdout?.removeAllListeners();
child.stderr?.removeAllListeners();
child.unref();
console.log(`Bastion starting in background (PID ${child.pid})`);
console.log(`Log: ${logFile}`);
process.exit(0);
}, 3000);
// Wait briefly for the child to start, then check it's alive
await new Promise((resolve) => setTimeout(resolve, 3000));
child.stdout?.on("data", (data: Buffer) => {
output += data.toString();
process.stdout.write(data);
if (output.includes("Waiting for PXE boot requests")) {
clearTimeout(timeout);
child.stdout?.removeAllListeners();
child.stderr?.removeAllListeners();
child.unref();
// Check PID file
const pidFile = `${opts.dir}/bastion.pid`;
const pid = existsSync(pidFile) ? readFileSync(pidFile, "utf-8").trim() : String(child.pid);
console.log("");
console.log(`Bastion running in background (PID ${pid})`);
console.log(`Log: ${logFile}`);
process.exit(0);
// Check if child is still running
try {
process.kill(child.pid!, 0); // signal 0 = check existence
} catch {
// Child already died — show the log
console.error("Bastion failed to start. Log output:");
console.error("");
try {
const log = readFileSync(logFile, "utf-8");
const lines = log.trim().split("\n").slice(-20);
for (const line of lines) {
console.error(" " + line);
}
} catch {
console.error(" (no log output)");
}
});
process.exit(1);
}
child.stderr?.on("data", (data: Buffer) => {
process.stderr.write(data);
});
child.unref();
child.on("exit", (code) => {
clearTimeout(timeout);
console.error(`Bastion exited with code ${code}`);
process.exit(code ?? 1);
});
// Print startup info from the log
try {
const log = readFileSync(logFile, "utf-8");
process.stdout.write(log);
} catch {
// No log yet
}
const pidFile = `${opts.dir}/bastion.pid`;
const pid = existsSync(pidFile)
? readFileSync(pidFile, "utf-8").trim()
: String(child.pid);
console.log("");
console.log(`Bastion running in background (PID ${pid})`);
console.log(`Log: ${logFile}`);
process.exit(0);
});
}

View File

@@ -1,67 +1,42 @@
// CLI command: init bastion standalone status
// Check if bastion is running, show port/uptime/machine count.
// Show connected bastions and their machine counts via labd.
import { readFileSync, existsSync, statSync } from "node:fs";
import type { Command } from "commander";
import type { BastionState } from "@lab/shared";
import { getLabdClient } from "../api/config.js";
import { execSync } from "node:child_process";
function isProcessAlive(pid: number): boolean {
try {
// process.kill(pid, 0) fails for root-owned processes when run as non-root
// Use kill -0 which works across users, or check /proc
execSync(`kill -0 ${pid} 2>/dev/null || test -d /proc/${pid}`, { stdio: "pipe" });
return true;
} catch {
return false;
}
}
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
export function registerStatusCommand(parent: Command): void {
parent
.command("status")
.description("Show bastion server status")
.option("--dir <dir>", "Bastion data directory", "/tmp/lab-bastion")
.option("--port <port>", "Bastion HTTP port", "8080")
.action(async (opts: { dir: string; port: string }) => {
const pidFile = `${opts.dir}/bastion.pid`;
const port = parseInt(opts.port, 10);
if (!existsSync(pidFile)) {
console.log("Bastion is not running (no PID file).");
return;
}
const pid = parseInt(readFileSync(pidFile, "utf-8").trim(), 10);
if (isNaN(pid) || !isProcessAlive(pid)) {
console.log("Bastion is not running (stale PID file).");
return;
}
// Calculate uptime from PID file mtime
const pidStat = statSync(pidFile);
const uptimeMs = Date.now() - pidStat.mtimeMs;
const uptimeMin = Math.floor(uptimeMs / 60_000);
const uptimeHr = Math.floor(uptimeMin / 60);
const uptimeStr = uptimeHr > 0
? `${uptimeHr}h ${uptimeMin % 60}m`
: `${uptimeMin}m`;
console.log(`Bastion is running (PID ${pid})`);
console.log(` Port: ${port}`);
console.log(` Uptime: ${uptimeStr}`);
// Try to fetch machine count
.action(async () => {
try {
const response = await fetch(`http://localhost:${port}/api/machines`);
const state = (await response.json()) as BastionState;
const discovered = Object.keys(state.discovered).length;
const queued = Object.keys(state.install_queue).length;
const installed = Object.keys(state.installed).length;
console.log(` Machines: ${discovered} discovered, ${queued} queued, ${installed} installed`);
} catch {
console.log(" Machines: (could not reach API)");
const bastions = await getLabdClient().getBastions();
if (bastions.length === 0) {
console.log("No bastions registered.");
return;
}
const pad = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${pad("HOSTNAME", 24)}${pad("NETWORK", 18)}${pad("IP", 18)}${pad("STATUS", 10)}${pad("MACHINES", 10)}${RESET}`,
);
for (const b of bastions) {
const statusColor = b.status === "online" ? GREEN : RED;
console.log(
`${pad(b.hostname, 24)}${DIM}${pad(b.network, 18)}${RESET}${pad(b.serverIp, 18)}${statusColor}${pad(b.status, 10)}${RESET}${pad(String(b.machineCount), 10)}`,
);
}
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
});
}

View File

@@ -0,0 +1,111 @@
// CLI configuration management.
// Loads from: defaults -> ~/.labctl/config.yaml -> env vars -> CLI flags.
import { existsSync, readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
export interface CliConfig {
labdUrl: string;
certPath?: string;
keyPath?: string;
caPath?: string;
defaultEnvironment?: string;
defaultCloud?: string;
outputFormat?: "table" | "json" | "yaml";
}
export const CONFIG_DIR = join(homedir(), ".labctl");
export const CONFIG_FILE = join(CONFIG_DIR, "config.yaml");
export const CERT_DIR = join(CONFIG_DIR, "certs");
const VALID_KEYS = new Set<keyof CliConfig>([
"labdUrl",
"certPath",
"keyPath",
"caPath",
"defaultEnvironment",
"defaultCloud",
"outputFormat",
]);
export function isValidConfigKey(key: string): key is keyof CliConfig {
return VALID_KEYS.has(key as keyof CliConfig);
}
export function loadConfig(): CliConfig {
// 1. Defaults
const config: CliConfig = {
labdUrl: "http://localhost:3100",
};
// 2. Config file overrides
if (existsSync(CONFIG_FILE)) {
try {
const raw = readFileSync(CONFIG_FILE, "utf-8");
const parsed = parseSimpleYaml(raw);
for (const [k, v] of Object.entries(parsed)) {
if (isValidConfigKey(k) && v !== "") {
(config as unknown as Record<string, string>)[k] = v;
}
}
} catch {
// Ignore malformed config
}
}
// 3. Environment variable overrides
if (process.env["LABD_URL"]) config.labdUrl = process.env["LABD_URL"];
if (process.env["LABCTL_ENV"]) config.defaultEnvironment = process.env["LABCTL_ENV"];
if (process.env["LABCTL_CLOUD"]) config.defaultCloud = process.env["LABCTL_CLOUD"];
if (process.env["LABCTL_OUTPUT"]) {
const fmt = process.env["LABCTL_OUTPUT"];
if (fmt === "table" || fmt === "json" || fmt === "yaml") {
config.outputFormat = fmt;
}
}
return config;
}
export function saveConfig(config: CliConfig): void {
if (!existsSync(CONFIG_DIR)) {
mkdirSync(CONFIG_DIR, { recursive: true, mode: 0o700 });
}
const lines: string[] = [];
for (const [k, v] of Object.entries(config)) {
if (v !== undefined) {
lines.push(`${k}: ${v}`);
}
}
writeFileSync(CONFIG_FILE, lines.join("\n") + "\n", { mode: 0o600 });
}
export function getConfigValue(config: CliConfig, key: keyof CliConfig): string {
return String(config[key] ?? "");
}
export function setConfigValue(config: CliConfig, key: keyof CliConfig, value: string): CliConfig {
return { ...config, [key]: value };
}
/** Minimal YAML parser for flat key: value files (no nested structures). */
function parseSimpleYaml(raw: string): Record<string, string> {
const result: Record<string, string> = {};
for (const line of raw.split("\n")) {
const trimmed = line.trim();
if (!trimmed || trimmed.startsWith("#")) continue;
const idx = trimmed.indexOf(":");
if (idx === -1) continue;
const key = trimmed.slice(0, idx).trim();
let value = trimmed.slice(idx + 1).trim();
if (
(value.startsWith('"') && value.endsWith('"')) ||
(value.startsWith("'") && value.endsWith("'"))
) {
value = value.slice(1, -1);
}
result[key] = value;
}
return result;
}

View File

@@ -5,8 +5,9 @@
// provision list/install/reprovision/forget
import { fileURLToPath } from "node:url";
import { Command } from "commander";
import { Command, Option } from "commander";
import { APP_VERSION } from "@lab/shared";
import { loadConfig } from "./config/index.js";
import { registerStartCommand } from "./commands/serve.js";
import { registerStopCommand } from "./commands/stop.js";
import { registerStatusCommand } from "./commands/status.js";
@@ -14,15 +15,65 @@ import { registerInstallCommand } from "./commands/install.js";
import { registerListCommand } from "./commands/list.js";
import { registerReprovisionCommand } from "./commands/reprovision.js";
import { registerForgetCommand } from "./commands/forget.js";
import { registerLogsCommand } from "./commands/logs.js";
import { registerMakeIsoCommand } from "./commands/makeiso.js";
import { registerConfigCommand } from "./commands/config.js";
import { registerLoginCommand } from "./commands/login.js";
import { registerDoctorCommand } from "./commands/doctor.js";
import { registerAppCommand } from "./commands/app.js";
import { ROLE_REGISTRY } from "@lab/shared";
export function createProgram(): Command {
const program = new Command();
program
.name("labctl")
.description("Lab PXE Bastion -- discover-first bare-metal provisioning")
.description("Lab infrastructure management CLI")
.version(APP_VERSION);
// Global options
program
.addOption(
new Option("-o, --output <format>", "output format")
.choices(["table", "json", "yaml"])
.default("table"),
)
.option("--server <url>", "override labd server URL")
.option("--env <name>", "override default environment")
.option("--cloud <name>", "override default cloud")
.option("--debug", "enable debug output")
.option("--no-color", "disable colored output");
// preAction hook: load config, apply CLI overrides, store merged config
program.hook("preAction", (thisCommand) => {
const config = loadConfig();
const opts = thisCommand.opts();
if (opts.output) config.outputFormat = opts.output;
if (opts.server) config.labdUrl = opts.server;
if (opts.env) config.defaultEnvironment = opts.env;
if (opts.cloud) config.defaultCloud = opts.cloud;
if (opts.debug) {
process.env["DEBUG"] = "1";
}
if (opts.color === false) {
process.env["NO_COLOR"] = "1";
}
thisCommand.setOptionValue("_config", config);
});
// version subcommand
program
.command("version")
.description("Show version information")
.action(() => {
console.log(`labctl ${APP_VERSION}`);
console.log(`node ${process.version}`);
console.log(`platform ${process.platform} ${process.arch}`);
});
// init bastion standalone start/stop/status
const initCmd = program.command("init");
initCmd.description("Initialise infrastructure components");
@@ -45,6 +96,39 @@ export function createProgram(): Command {
registerInstallCommand(provisionCmd);
registerReprovisionCommand(provisionCmd);
registerForgetCommand(provisionCmd);
registerLogsCommand(provisionCmd);
registerMakeIsoCommand(provisionCmd);
// config list/get/set/path
registerConfigCommand(program);
// login
registerLoginCommand(program);
// doctor
registerDoctorCommand(program);
// app k3s install/health + labcontroller
registerAppCommand(program);
// roles — quick reference
program
.command("roles")
.description("List available machine roles")
.action(() => {
const BOLD = "\x1b[1m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const pad = (s: string, w: number) => s.padEnd(w);
console.log(`${BOLD}${pad("ROLE", 18)}${pad("EXTENDS", 12)}${pad("K3S", 6)}${pad("AUTO-DEPLOY", 30)}DESCRIPTION${RESET}`);
for (const r of ROLE_REGISTRY) {
const k3s = r.k3s ? "yes" : "no";
const apps = r.apps.length > 0 ? r.apps.join(", ") : "—";
const parent = r.parent ?? "—";
console.log(`${pad(r.name, 18)}${DIM}${pad(parent, 12)}${RESET}${pad(k3s, 6)}${pad(apps, 30)}${r.description}`);
}
});
return program;
}

View File

@@ -0,0 +1,27 @@
// Public API for CLI utility functions.
export {
parseResource,
formatResource,
validateServerName,
type ResourceType,
type ResourceIdentifier,
} from "./resource.js";
export {
printTable,
formatStatus,
formatRelativeTime,
formatOutput,
serverColumns,
roleColumns,
type TableColumn,
type Role,
} from "./table.js";
export {
isInteractive,
confirmAction,
promptInput,
promptPassword,
} from "./prompts.js";

View File

@@ -0,0 +1,48 @@
// Interactive CLI prompts with non-interactive mode support.
import { createInterface } from "node:readline";
export function isInteractive(): boolean {
if (process.env["LABCTL_YES"] === "true") return false;
if (process.env["CI"] === "true") return false;
return Boolean(process.stdin.isTTY);
}
export async function confirmAction(
message: string,
defaultValue = false,
): Promise<boolean> {
if (!isInteractive()) return true;
const hint = defaultValue ? "[Y/n]" : "[y/N]";
const answer = await prompt(`${message} ${hint} `);
if (answer === "") return defaultValue;
return answer.toLowerCase().startsWith("y");
}
export async function promptInput(
message: string,
defaultValue?: string,
): Promise<string> {
if (!isInteractive() && defaultValue !== undefined) return defaultValue;
const suffix = defaultValue ? ` (${defaultValue})` : "";
const answer = await prompt(`${message}${suffix}: `);
return answer || defaultValue || "";
}
export async function promptPassword(message: string): Promise<string> {
return prompt(message);
}
function prompt(message: string): Promise<string> {
return new Promise((resolve) => {
const rl = createInterface({
input: process.stdin,
output: process.stdout,
});
rl.question(message, (answer) => {
rl.close();
resolve(answer.trim());
});
});
}

View File

@@ -0,0 +1,129 @@
// Resource name parsing and validation utilities.
// Handles "type/name" and "type/namespace/name" resource identifiers
// used throughout the CLI for addressing lab platform objects.
/** All valid resource types in the lab platform. */
export type ResourceType =
| "server"
| "app"
| "cluster"
| "role"
| "user"
| "pulumi"
| "bastion"
| "agent"
| "audit";
const VALID_RESOURCE_TYPES: ReadonlySet<string> = new Set<ResourceType>([
"server",
"app",
"cluster",
"role",
"user",
"pulumi",
"bastion",
"agent",
"audit",
]);
/** A parsed resource identifier: type with name and optional namespace. */
export interface ResourceIdentifier {
type: ResourceType;
name: string;
namespace?: string;
}
/**
* Parse a resource string into a structured identifier.
*
* Accepted formats:
* "server/myhost" -> { type: "server", name: "myhost" }
* "app/production/frontend" -> { type: "app", name: "frontend", namespace: "production" }
*
* @throws Error if the input does not match the expected format or the type is unknown.
*/
export function parseResource(input: string): ResourceIdentifier {
const match = /^([a-z-]+)\/(.+)$/.exec(input);
if (!match) {
throw new Error(
`Invalid resource format: "${input}". Expected "type/name" or "type/namespace/name".`,
);
}
const rawType = match[1]!;
const rest = match[2]!;
if (!VALID_RESOURCE_TYPES.has(rawType)) {
const valid = [...VALID_RESOURCE_TYPES].join(", ");
throw new Error(
`Unknown resource type "${rawType}". Valid types: ${valid}.`,
);
}
const type = rawType as ResourceType;
// If rest contains a slash, split into namespace/name
const slashIndex = rest.indexOf("/");
if (slashIndex !== -1) {
const namespace = rest.slice(0, slashIndex);
const name = rest.slice(slashIndex + 1);
if (!namespace || !name) {
throw new Error(
`Invalid resource format: "${input}". Namespace and name must not be empty.`,
);
}
return { type, name, namespace };
}
return { type, name: rest };
}
/**
* Format a resource identifier back into its string representation.
*
* Returns "type/name" or "type/namespace/name" depending on whether
* a namespace is present.
*/
export function formatResource(resource: ResourceIdentifier): string {
if (resource.namespace !== undefined) {
return `${resource.type}/${resource.namespace}/${resource.name}`;
}
return `${resource.type}/${resource.name}`;
}
/** Hostname validation pattern: lowercase alphanumeric with dots and hyphens. */
const HOSTNAME_PATTERN = /^[a-z0-9][a-z0-9.-]*[a-z0-9]$/;
/**
* Validate that a server name is a legal hostname.
*
* Rules:
* - Must start and end with a lowercase letter or digit
* - May contain lowercase letters, digits, dots, and hyphens
* - Single-character names (one lowercase letter or digit) are allowed
*
* @throws Error if the name is not a valid hostname.
*/
export function validateServerName(name: string): void {
if (name.length === 0) {
throw new Error("Server name must not be empty.");
}
// Single character: just check it's alphanumeric
if (name.length === 1) {
if (!/^[a-z0-9]$/.test(name)) {
throw new Error(
`Invalid server name "${name}". Must contain only lowercase letters, digits, dots, and hyphens, ` +
"and must start and end with a letter or digit.",
);
}
return;
}
if (!HOSTNAME_PATTERN.test(name)) {
throw new Error(
`Invalid server name "${name}". Must contain only lowercase letters, digits, dots, and hyphens, ` +
"and must start and end with a letter or digit.",
);
}
}

View File

@@ -0,0 +1,267 @@
// Table formatting utilities for CLI output.
// Uses plain ANSI escape codes and string padding — no external dependencies.
import type { Server } from "../api/types.js";
// ---------------------------------------------------------------------------
// ANSI escape codes
// ---------------------------------------------------------------------------
const BOLD = "\x1b[1m";
const RESET = "\x1b[0m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const YELLOW = "\x1b[33m";
const CYAN = "\x1b[36m";
// ---------------------------------------------------------------------------
// TableColumn interface
// ---------------------------------------------------------------------------
/** Describes a single column in a formatted table. */
export interface TableColumn<T> {
/** Column header label. */
header: string;
/** Property key on T, or a function that extracts the cell value from a row. */
accessor: keyof T | ((row: T) => string);
/** Fixed column width (defaults to max of header width and widest cell). */
width?: number;
/** Text alignment within the cell. Defaults to 'left'. */
align?: "left" | "center" | "right";
}
// ---------------------------------------------------------------------------
// printTable
// ---------------------------------------------------------------------------
/** Resolve a cell value from a row using a column's accessor. */
function cellValue<T>(row: T, accessor: TableColumn<T>["accessor"]): string {
if (typeof accessor === "function") {
return accessor(row);
}
const raw = row[accessor];
if (raw === null || raw === undefined) return "-";
return String(raw);
}
/** Pad a string to a given width respecting the requested alignment. */
function padCell(text: string, width: number, align: "left" | "center" | "right"): string {
if (text.length >= width) return text.slice(0, width);
switch (align) {
case "right":
return text.padStart(width);
case "center": {
const total = width - text.length;
const left = Math.floor(total / 2);
return " ".repeat(left) + text + " ".repeat(total - left);
}
case "left":
default:
return text.padEnd(width);
}
}
/**
* Format and print a table to stdout.
*
* Columns are separated by two spaces. The header row is bold. Column widths
* are auto-calculated from the data unless explicitly set via `width`.
*/
export function printTable<T>(data: T[], columns: TableColumn<T>[]): void {
// Compute effective widths: max(header, longest cell, explicit width).
const widths = columns.map((col) => {
const headerLen = col.header.length;
const maxCell = data.reduce((max, row) => {
const len = cellValue(row, col.accessor).length;
return len > max ? len : max;
}, 0);
const auto = Math.max(headerLen, maxCell);
return col.width !== undefined ? Math.max(col.width, headerLen) : auto;
});
const gap = " ";
// Header
const headerLine = columns
.map((col, i) => padCell(col.header, widths[i]!, col.align ?? "left"))
.join(gap);
console.log(`${BOLD}${headerLine}${RESET}`);
// Rows
for (const row of data) {
const line = columns
.map((col, i) => padCell(cellValue(row, col.accessor), widths[i]!, col.align ?? "left"))
.join(gap);
console.log(line);
}
}
// ---------------------------------------------------------------------------
// formatStatus
// ---------------------------------------------------------------------------
/**
* Return a status string wrapped in the appropriate ANSI colour code.
*
* - green: online, installed
* - red: offline, error
* - yellow: queued, installing, provisioning
* - cyan: discovered
*/
export function formatStatus(status: string): string {
const lower = status.toLowerCase();
switch (lower) {
case "online":
case "installed":
return `${GREEN}${status}${RESET}`;
case "offline":
case "error":
return `${RED}${status}${RESET}`;
case "queued":
case "installing":
case "provisioning":
return `${YELLOW}${status}${RESET}`;
case "discovered":
return `${CYAN}${status}${RESET}`;
default:
return status;
}
}
// ---------------------------------------------------------------------------
// formatRelativeTime
// ---------------------------------------------------------------------------
/**
* Convert a timestamp into a human-friendly relative string such as
* "2m ago", "3h ago", or "5d ago". Returns "-" for null / undefined.
*/
export function formatRelativeTime(timestamp: Date | string | null): string {
if (timestamp === null || timestamp === undefined) return "-";
const date = typeof timestamp === "string" ? new Date(timestamp) : timestamp;
const now = Date.now();
const diffMs = now - date.getTime();
if (diffMs < 0) return "just now";
const seconds = Math.floor(diffMs / 1000);
if (seconds < 60) return `${seconds}s ago`;
const minutes = Math.floor(seconds / 60);
if (minutes < 60) return `${minutes}m ago`;
const hours = Math.floor(minutes / 60);
if (hours < 24) return `${hours}h ago`;
const days = Math.floor(hours / 24);
return `${days}d ago`;
}
// ---------------------------------------------------------------------------
// Predefined column sets
// ---------------------------------------------------------------------------
/** Role-like object used for predefined roleColumns. */
export interface Role {
name: string;
description: string;
permissions: string[];
}
/** Predefined columns for listing Server objects. */
export const serverColumns: TableColumn<Server>[] = [
{ header: "NAME", accessor: "hostname" },
{ header: "CLOUD", accessor: "cloud" },
{ header: "ENV", accessor: "environment" },
{ header: "ROLE", accessor: "role" },
{ header: "STATUS", accessor: (s) => formatStatus(s.status) },
{ header: "LAST SEEN", accessor: (s) => formatRelativeTime(s.lastHeartbeat) },
];
/** Predefined columns for listing Role objects. */
export const roleColumns: TableColumn<Role>[] = [
{ header: "NAME", accessor: "name" },
{ header: "DESCRIPTION", accessor: "description" },
{ header: "PERMISSIONS", accessor: (r) => String(r.permissions.length) },
];
// ---------------------------------------------------------------------------
// formatOutput — multi-format output dispatcher
// ---------------------------------------------------------------------------
/**
* Render an array of objects in the requested output format.
*
* - `table`: delegates to {@link printTable}. Requires `columns`.
* - `json`: pretty-prints with 2-space indent.
* - `yaml`: simple key/value serialisation (no external dependency).
*/
export function formatOutput<T>(
data: T[],
format: "table" | "json" | "yaml",
columns?: TableColumn<T>[],
): void {
switch (format) {
case "json":
console.log(JSON.stringify(data, null, 2));
break;
case "yaml":
for (const [idx, item] of data.entries()) {
if (idx > 0) console.log("---");
serializeYaml(item as Record<string, unknown>, 0);
}
break;
case "table":
if (!columns || columns.length === 0) {
// Fallback: dump as JSON when no columns provided.
console.log(JSON.stringify(data, null, 2));
return;
}
printTable(data, columns);
break;
}
}
// ---------------------------------------------------------------------------
// Minimal YAML serialiser (no dependency)
// ---------------------------------------------------------------------------
function serializeYaml(obj: unknown, indent: number): void {
const prefix = " ".repeat(indent);
if (obj === null || obj === undefined) {
console.log(`${prefix}null`);
return;
}
if (typeof obj !== "object") {
console.log(`${prefix}${String(obj)}`);
return;
}
if (Array.isArray(obj)) {
for (const item of obj) {
if (typeof item === "object" && item !== null) {
console.log(`${prefix}-`);
serializeYaml(item, indent + 1);
} else {
console.log(`${prefix}- ${String(item)}`);
}
}
return;
}
for (const [key, value] of Object.entries(obj as Record<string, unknown>)) {
if (value === null || value === undefined) {
console.log(`${prefix}${key}: null`);
} else if (typeof value === "object") {
console.log(`${prefix}${key}:`);
serializeYaml(value, indent + 1);
} else {
console.log(`${prefix}${key}: ${String(value)}`);
}
}
}

View File

@@ -0,0 +1,56 @@
// Tests for LabdApiError.
import { describe, it, expect } from "vitest";
import { LabdApiError, isLabdApiError } from "../src/api/errors.js";
describe("LabdApiError", () => {
it("constructs with status code and message", () => {
const err = new LabdApiError(404, "Not found");
expect(err.statusCode).toBe(404);
expect(err.message).toBe("Not found");
expect(err.errorCode).toBe("NOT_FOUND");
});
it("fromResponse parses error body", () => {
const err = LabdApiError.fromResponse(400, {
error: "Invalid input",
detail: "hostname required",
});
expect(err.statusCode).toBe(400);
expect(err.message).toBe("Invalid input");
expect(err.detail).toBe("hostname required");
});
it("fromResponse handles non-object body", () => {
const err = LabdApiError.fromResponse(500, "plain text");
expect(err.statusCode).toBe(500);
expect(err.message).toBe("HTTP 500");
});
it("notConnected creates connection error", () => {
const err = LabdApiError.notConnected("https://localhost:8443");
expect(err.statusCode).toBe(0);
expect(err.errorCode).toBe("CONNECTION_ERROR");
expect(err.message).toContain("localhost:8443");
});
it("timeout creates timeout error", () => {
const err = LabdApiError.timeout(30000);
expect(err.message).toContain("30000ms");
});
});
describe("isLabdApiError", () => {
it("returns true for LabdApiError", () => {
expect(isLabdApiError(new LabdApiError(500, "err"))).toBe(true);
});
it("returns false for regular Error", () => {
expect(isLabdApiError(new Error("nope"))).toBe(false);
});
it("returns false for non-errors", () => {
expect(isLabdApiError(null)).toBe(false);
expect(isLabdApiError("string")).toBe(false);
});
});

View File

@@ -0,0 +1,53 @@
// Tests for CLI configuration management.
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { mkdirSync, writeFileSync, rmSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
// We can't easily test loadConfig() because it uses homedir() for paths.
// Instead, test the parsing and validation logic directly.
import { isValidConfigKey } from "../src/config/index.js";
describe("isValidConfigKey", () => {
it("accepts valid keys", () => {
expect(isValidConfigKey("labdUrl")).toBe(true);
expect(isValidConfigKey("certPath")).toBe(true);
expect(isValidConfigKey("keyPath")).toBe(true);
expect(isValidConfigKey("caPath")).toBe(true);
expect(isValidConfigKey("defaultEnvironment")).toBe(true);
expect(isValidConfigKey("defaultCloud")).toBe(true);
expect(isValidConfigKey("outputFormat")).toBe(true);
});
it("rejects invalid keys", () => {
expect(isValidConfigKey("password")).toBe(false);
expect(isValidConfigKey("")).toBe(false);
expect(isValidConfigKey("LABD_URL")).toBe(false);
});
});
describe("CLI program creation", () => {
it("creates program with all commands", async () => {
const { createProgram } = await import("../src/index.js");
const program = createProgram();
// Check top-level commands exist
const commandNames = program.commands.map((c) => c.name());
expect(commandNames).toContain("version");
expect(commandNames).toContain("init");
expect(commandNames).toContain("provision");
expect(commandNames).toContain("config");
expect(commandNames).toContain("login");
expect(commandNames).toContain("doctor");
});
it("has global options", async () => {
const { createProgram } = await import("../src/index.js");
const program = createProgram();
const optionNames = program.options.map((o) => o.long);
expect(optionNames).toContain("--output");
expect(optionNames).toContain("--server");
expect(optionNames).toContain("--debug");
});
});

View File

@@ -0,0 +1,71 @@
// Tests for resource name parsing utilities.
import { describe, it, expect } from "vitest";
import {
parseResource,
formatResource,
validateServerName,
} from "../src/utils/resource.js";
describe("parseResource", () => {
it("parses server/name", () => {
const r = parseResource("server/labmaster");
expect(r.type).toBe("server");
expect(r.name).toBe("labmaster");
expect(r.namespace).toBeUndefined();
});
it("parses app/namespace/name", () => {
const r = parseResource("app/kube-system/nginx");
expect(r.type).toBe("app");
expect(r.namespace).toBe("kube-system");
expect(r.name).toBe("nginx");
});
it("parses all valid types", () => {
const types = ["server", "app", "cluster", "role", "user", "pulumi", "bastion", "agent", "audit"];
for (const t of types) {
expect(parseResource(`${t}/name`).type).toBe(t);
}
});
it("throws on invalid format", () => {
expect(() => parseResource("noslash")).toThrow("Invalid resource format");
});
it("throws on unknown type", () => {
expect(() => parseResource("unknown/name")).toThrow("Unknown resource type");
});
});
describe("formatResource", () => {
it("formats simple resource", () => {
expect(formatResource({ type: "server", name: "w1" })).toBe("server/w1");
});
it("formats namespaced resource", () => {
expect(
formatResource({ type: "app", namespace: "default", name: "nginx" }),
).toBe("app/default/nginx");
});
it("roundtrips with parseResource", () => {
const input = "server/labmaster";
expect(formatResource(parseResource(input))).toBe(input);
});
});
describe("validateServerName", () => {
it("accepts valid hostnames", () => {
expect(() => validateServerName("worker-1")).not.toThrow();
expect(() => validateServerName("web.cluster.local")).not.toThrow();
expect(() => validateServerName("a")).not.toThrow();
});
it("rejects invalid hostnames", () => {
expect(() => validateServerName("-start")).toThrow();
expect(() => validateServerName("end-")).toThrow();
expect(() => validateServerName("")).toThrow();
expect(() => validateServerName("has space")).toThrow();
});
});

View File

@@ -0,0 +1,197 @@
// Smoke tests for bastion CLI commands.
// These tests spawn real processes and verify they work end-to-end.
import { describe, it, expect, afterEach } from "vitest";
import { spawn, execSync, type ChildProcess } from "node:child_process";
import { existsSync, readFileSync, mkdirSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
const CLI_PATH = join(import.meta.dirname, "..", "src", "index.ts");
const TEST_DIR = join(tmpdir(), `lab-bastion-smoke-${process.pid}`);
const PID_FILE = join(TEST_DIR, "bastion.pid");
const LOG_FILE = join(TEST_DIR, "bastion.log");
const TEST_PORT = 18932; // Unlikely to conflict
function runCli(args: string[], timeoutMs = 10_000): Promise<{ code: number; stdout: string; stderr: string }> {
return new Promise((resolve, reject) => {
const child = spawn("node", ["--import", "tsx", CLI_PATH, ...args], {
timeout: timeoutMs,
env: { ...process.env, NODE_NO_WARNINGS: "1" },
});
let stdout = "";
let stderr = "";
child.stdout.on("data", (d: Buffer) => { stdout += d.toString(); });
child.stderr.on("data", (d: Buffer) => { stderr += d.toString(); });
child.on("close", (code) => {
resolve({ code: code ?? 1, stdout, stderr });
});
child.on("error", reject);
});
}
function sleep(ms: number): Promise<void> {
return new Promise((r) => setTimeout(r, ms));
}
function killPid(pid: number): void {
try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
}
describe("bastion smoke tests", () => {
let daemonPid: number | undefined;
afterEach(() => {
// Kill any daemon we started
if (daemonPid) {
killPid(daemonPid);
daemonPid = undefined;
}
// Also try PID file
try {
const pid = parseInt(readFileSync(PID_FILE, "utf-8").trim(), 10);
if (!isNaN(pid)) killPid(pid);
} catch { /* no pid file */ }
// Clean up test directory
try { rmSync(TEST_DIR, { recursive: true, force: true }); } catch { /* ignore */ }
});
it("--help prints usage without error", async () => {
const result = await runCli(["--help"]);
expect(result.code).toBe(0);
expect(result.stdout).toContain("labctl");
expect(result.stdout).toContain("Commands:");
});
it("--version prints version", async () => {
const result = await runCli(["--version"]);
expect(result.code).toBe(0);
expect(result.stdout.trim()).toMatch(/^\d+\.\d+\.\d+$/);
});
it("version subcommand prints detailed info", async () => {
const result = await runCli(["version"]);
expect(result.code).toBe(0);
expect(result.stdout).toContain("labctl");
expect(result.stdout).toContain("node");
expect(result.stdout).toContain("platform");
});
it("config list works without config file", async () => {
const result = await runCli(["config", "list"]);
expect(result.code).toBe(0);
expect(result.stdout).toContain("labdUrl");
});
it("config path prints a path", async () => {
const result = await runCli(["config", "path"]);
expect(result.code).toBe(0);
expect(result.stdout.trim()).toContain(".labctl");
});
it("start without root prints helpful error", async () => {
// Only run if we're NOT root (CI may run as root)
if (process.getuid?.() === 0) return;
const result = await runCli([
"init", "bastion", "standalone", "start",
"--dir", TEST_DIR,
"--port", String(TEST_PORT),
]);
expect(result.code).toBe(1);
expect(result.stderr).toContain("root");
expect(result.stderr).toContain("sudo");
});
it("foreground start with --skip-dnsmasq --skip-artifacts works and stays alive", async () => {
mkdirSync(TEST_DIR, { recursive: true });
// Start in foreground as a child process
const child = spawn(
"node",
[
"--import", "tsx",
CLI_PATH,
"init", "bastion", "standalone", "start", "--foreground",
"--skip-dnsmasq", "--skip-artifacts",
"--dir", TEST_DIR,
"--port", String(TEST_PORT),
],
{
env: { ...process.env, NODE_NO_WARNINGS: "1" },
stdio: ["ignore", "pipe", "pipe"],
},
);
daemonPid = child.pid;
// Collect output
let stdout = "";
child.stdout.on("data", (d: Buffer) => { stdout += d.toString(); });
let stderr = "";
child.stderr.on("data", (d: Buffer) => { stderr += d.toString(); });
// Wait for the server to start (look for the banner)
const startedAt = Date.now();
const maxWait = 10_000;
while (Date.now() - startedAt < maxWait) {
if (stdout.includes("Waiting for PXE boot requests")) break;
await sleep(200);
}
expect(stdout).toContain("Waiting for PXE boot requests");
expect(stdout).toContain("HTTP server listening");
// Verify the process is still alive after startup
await sleep(1000);
let alive = false;
try {
process.kill(child.pid!, 0);
alive = true;
} catch { /* dead */ }
expect(alive).toBe(true);
// Verify PID file was created
expect(existsSync(PID_FILE)).toBe(true);
const pidFromFile = parseInt(readFileSync(PID_FILE, "utf-8").trim(), 10);
expect(pidFromFile).toBe(child.pid);
// Verify HTTP server responds
try {
const resp = await fetch(`http://127.0.0.1:${TEST_PORT}/api/machines`);
expect(resp.ok).toBe(true);
} catch (err) {
// If fetch fails, that's a real problem
throw new Error(`HTTP server not responding: ${err}`);
}
// Clean shutdown
child.kill("SIGTERM");
await new Promise<void>((resolve) => {
child.on("close", () => resolve());
setTimeout(resolve, 3000);
});
daemonPid = undefined;
}, 20_000);
it("status shows bastion info or reports labd unreachable", async () => {
const result = await runCli([
"init", "bastion", "standalone", "status",
]);
// Status queries labd — may show bastions (if labd running) or error (if not)
const output = result.stdout + result.stderr;
expect(output).toMatch(/HOSTNAME|Cannot reach labd|No bastions/i);
});
it("doctor runs without crashing", async () => {
const result = await runCli(["doctor"]);
// Doctor may report errors (no labd running) but should not crash
expect(result.code).toBeLessThanOrEqual(1); // 0 = all ok, 1 = errors found
expect(result.stdout).toContain("diagnostics");
});
});

View File

@@ -8,6 +8,7 @@
"include": ["src/**/*.ts"],
"references": [
{ "path": "../shared" },
{ "path": "../bastion" }
{ "path": "../bastion" },
{ "path": "../modules" }
]
}

View File

@@ -0,0 +1,24 @@
{
"name": "@lab/agent",
"version": "0.1.0",
"private": true,
"type": "module",
"main": "./dist/main.js",
"types": "./dist/main.d.ts",
"scripts": {
"build": "tsc --build",
"clean": "rimraf dist"
},
"dependencies": {
"@lab/shared": "workspace:*",
"winston": "^3.17.0",
"winston-daily-rotate-file": "^5.0.0",
"ws": "^8.19.0"
},
"devDependencies": {
"@types/node": "^22.14.1",
"@types/ws": "^8.18.1",
"rimraf": "^6.1.3",
"typescript": "^5.9.3"
}
}

View File

@@ -0,0 +1,10 @@
/**
* @lab/agent — Lab agent daemon entry point.
*
* For now this module re-exports the command executor so it can be consumed
* by other packages in the monorepo.
*/
export { CommandExecutor } from "./services/executor.js";
export type { ExecOptions, ExecResult } from "./services/executor.js";
export { AgentConnection, type ConnectionConfig, type ConnectionState, DEFAULT_CONNECTION_CONFIG } from "./services/connection.js";

View File

@@ -0,0 +1,157 @@
// Agent WebSocket connection to labd with heartbeat and reconnection.
import { EventEmitter } from "node:events";
import { hostname } from "node:os";
import { readFileSync } from "node:fs";
import WebSocket from "ws";
import type { AgentMessage, ServerMessage } from "@lab/shared";
import { parseServerMessage } from "@lab/shared";
export type ConnectionState = "disconnected" | "connecting" | "connected" | "reconnecting";
export interface ConnectionConfig {
labdUrl: string;
certPath: string;
keyPath: string;
caPath?: string;
heartbeatIntervalMs: number;
reconnectBaseDelayMs: number;
reconnectMaxDelayMs: number;
}
export const DEFAULT_CONNECTION_CONFIG: Partial<ConnectionConfig> = {
heartbeatIntervalMs: 10_000,
reconnectBaseDelayMs: 1_000,
reconnectMaxDelayMs: 30_000,
};
export class AgentConnection extends EventEmitter {
private ws: WebSocket | null = null;
private heartbeatTimer: NodeJS.Timeout | null = null;
private reconnectAttempts = 0;
private isClosing = false;
private _state: ConnectionState = "disconnected";
constructor(private config: ConnectionConfig) {
super();
}
get state(): ConnectionState {
return this._state;
}
isConnected(): boolean {
return this._state === "connected";
}
async connect(): Promise<void> {
if (this.isClosing) return;
this.setState(this.reconnectAttempts > 0 ? "reconnecting" : "connecting");
const wsUrl = this.config.labdUrl.replace("https:", "wss:").replace("http:", "ws:") + "/ws/agent";
try {
this.ws = new WebSocket(wsUrl, {
cert: readFileSync(this.config.certPath),
key: readFileSync(this.config.keyPath),
ca: this.config.caPath ? readFileSync(this.config.caPath) : undefined,
rejectUnauthorized: true,
});
this.ws.on("open", () => {
this.reconnectAttempts = 0;
this.setState("connected");
this.startHeartbeat();
this.emit("connected");
});
this.ws.on("message", (data: Buffer) => {
try {
const message = parseServerMessage(data.toString());
this.handleMessage(message);
this.emit("message", message);
} catch {
// Ignore unparseable messages
}
});
this.ws.on("close", (_code: number, _reason: Buffer) => {
this.stopHeartbeat();
this.setState("disconnected");
this.emit("disconnected");
this.scheduleReconnect();
});
this.ws.on("error", (_error: Error) => {
// Error is followed by close event, so reconnect happens there
});
} catch {
this.scheduleReconnect();
}
}
send(message: AgentMessage): void {
if (this.ws?.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(message));
}
}
close(): void {
this.isClosing = true;
this.stopHeartbeat();
this.ws?.close();
this.setState("disconnected");
}
private handleMessage(message: ServerMessage): void {
if (message.type === "server-shutdown") {
this.isClosing = true; // Don't reconnect
this.emit("shutdown", message.reconnectAfter);
}
}
private startHeartbeat(): void {
this.stopHeartbeat();
this.heartbeatTimer = setInterval(() => {
this.send({
type: "heartbeat",
hostname: hostname(),
uptime: process.uptime(),
version: process.env["npm_package_version"] ?? "0.0.0",
memUsage: process.memoryUsage().heapUsed,
cpuUsage: 0, // Simplified — os.loadavg() not available everywhere
});
}, this.config.heartbeatIntervalMs);
}
private stopHeartbeat(): void {
if (this.heartbeatTimer) {
clearInterval(this.heartbeatTimer);
this.heartbeatTimer = null;
}
}
private scheduleReconnect(): void {
if (this.isClosing) return;
const delay = Math.min(
this.config.reconnectBaseDelayMs * Math.pow(2, this.reconnectAttempts),
this.config.reconnectMaxDelayMs,
);
this.reconnectAttempts++;
this.setState("reconnecting");
setTimeout(() => {
void this.connect();
}, delay);
}
private setState(state: ConnectionState): void {
if (this._state !== state) {
this._state = state;
this.emit("stateChange", state);
}
}
}

View File

@@ -0,0 +1,161 @@
import { EventEmitter } from "node:events";
import { spawn, type ChildProcess } from "node:child_process";
/** Options for executing a command. */
export interface ExecOptions {
/** The command and its arguments, e.g. ["ls", "-la"]. */
command: string[];
/** Maximum execution time in milliseconds. */
timeout: number;
/** Whether to allocate a pseudo-TTY. */
tty: boolean;
/** Optional environment variables (merged with process.env). */
env?: Record<string, string>;
/** Optional working directory. */
cwd?: string;
}
/** Result returned after a command finishes. */
export interface ExecResult {
exitCode: number;
stdout: string;
stderr: string;
timedOut: boolean;
signal?: string | undefined;
}
export interface CommandExecutorEvents {
stdout: [requestId: string, chunk: Buffer];
stderr: [requestId: string, chunk: Buffer];
}
/**
* Executes commands in a sandboxed child process with timeout handling
* and streaming output via events.
*/
export class CommandExecutor extends EventEmitter<CommandExecutorEvents> {
private readonly processes = new Map<string, ChildProcess>();
/** Grace period between SIGTERM and SIGKILL when a timeout fires (ms). */
private static readonly KILL_GRACE_MS = 5_000;
/**
* Execute a command and return its result once it exits.
*
* While the process is running, `stdout` and `stderr` events are emitted
* with `(requestId, chunk)` so callers can stream output in real time.
*/
execute(requestId: string, options: ExecOptions): Promise<ExecResult> {
const { command, timeout, tty, env, cwd } = options;
const [cmd, ...args] = command;
if (cmd === undefined) {
return Promise.resolve({
exitCode: 1,
stdout: "",
stderr: "Empty command",
timedOut: false,
});
}
return new Promise<ExecResult>((resolve) => {
const child = spawn(cmd, args, {
cwd,
env: env ? { ...process.env, ...env } : undefined,
stdio: tty ? ["pipe", "pipe", "pipe"] : ["pipe", "pipe", "pipe"],
// When TTY support is needed the caller should use node-pty or
// similar; for now we always use pipe-based stdio.
});
this.processes.set(requestId, child);
let stdoutBuf = "";
let stderrBuf = "";
let timedOut = false;
let killTimer: ReturnType<typeof setTimeout> | undefined;
// -- Streaming output ------------------------------------------------
child.stdout?.on("data", (chunk: Buffer) => {
stdoutBuf += chunk.toString();
this.emit("stdout", requestId, chunk);
});
child.stderr?.on("data", (chunk: Buffer) => {
stderrBuf += chunk.toString();
this.emit("stderr", requestId, chunk);
});
// -- Timeout handling -------------------------------------------------
const timeoutTimer = setTimeout(() => {
timedOut = true;
// Graceful shutdown first.
child.kill("SIGTERM");
// If the process does not exit within the grace period, force-kill.
killTimer = setTimeout(() => {
child.kill("SIGKILL");
}, CommandExecutor.KILL_GRACE_MS);
}, timeout);
// -- Completion -------------------------------------------------------
child.on("close", (code, signal) => {
clearTimeout(timeoutTimer);
if (killTimer !== undefined) {
clearTimeout(killTimer);
}
this.processes.delete(requestId);
resolve({
exitCode: code ?? 1,
stdout: stdoutBuf,
stderr: stderrBuf,
timedOut,
signal: signal ?? undefined,
});
});
child.on("error", (err) => {
clearTimeout(timeoutTimer);
if (killTimer !== undefined) {
clearTimeout(killTimer);
}
this.processes.delete(requestId);
resolve({
exitCode: 1,
stdout: stdoutBuf,
stderr: err.message,
timedOut: false,
});
});
});
}
/**
* Send a signal to a running process.
*
* @returns `true` if the process was found and the signal was sent.
*/
sendSignal(requestId: string, signal: NodeJS.Signals): boolean {
const child = this.processes.get(requestId);
if (!child) {
return false;
}
return child.kill(signal);
}
/**
* Write data to the stdin of a running process.
*
* @returns `true` if the process was found and stdin was writable.
*/
writeStdin(requestId: string, data: string): boolean {
const child = this.processes.get(requestId);
if (!child?.stdin || child.stdin.destroyed) {
return false;
}
return child.stdin.write(data);
}
}

View File

@@ -0,0 +1,38 @@
import winston from "winston";
import DailyRotateFile from "winston-daily-rotate-file";
const LOG_DIR = process.env["LOG_DIR"] ?? "/var/log/lab-agent";
const logger = winston.createLogger({
level: process.env["LOG_LEVEL"] ?? "info",
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json(),
),
transports: [
new winston.transports.Console({
format: winston.format.combine(
winston.format.colorize(),
winston.format.simple(),
),
}),
new DailyRotateFile({
dirname: LOG_DIR,
filename: "agent-%DATE%.log",
maxSize: "20m",
maxFiles: "14d",
}),
],
});
/**
* Create a child logger scoped to a specific component.
*
* The returned logger inherits all transports and configuration from the root
* logger but attaches a `component` metadata field to every log entry.
*/
export function createChildLogger(component: string): winston.Logger {
return logger.child({ component });
}
export { logger };

View File

@@ -0,0 +1,111 @@
// Tests for CommandExecutor.
import { describe, it, expect } from "vitest";
import { CommandExecutor } from "../src/services/executor.js";
describe("CommandExecutor", () => {
it("executes a simple command", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-1", {
command: ["echo", "hello"],
timeout: 5000,
tty: false,
});
expect(result.exitCode).toBe(0);
expect(result.stdout.trim()).toBe("hello");
expect(result.timedOut).toBe(false);
});
it("captures stderr", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-2", {
command: ["sh", "-c", "echo err >&2"],
timeout: 5000,
tty: false,
});
expect(result.exitCode).toBe(0);
expect(result.stderr.trim()).toBe("err");
});
it("returns non-zero exit code", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-3", {
command: ["sh", "-c", "exit 42"],
timeout: 5000,
tty: false,
});
expect(result.exitCode).toBe(42);
});
it("times out long-running commands", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-4", {
command: ["sleep", "60"],
timeout: 200,
tty: false,
});
expect(result.timedOut).toBe(true);
}, 10_000);
it("emits stdout events for streaming", async () => {
const exec = new CommandExecutor();
const chunks: string[] = [];
exec.on("stdout", (_reqId: string, chunk: string) => {
chunks.push(chunk);
});
await exec.execute("req-5", {
command: ["echo", "streamed"],
timeout: 5000,
tty: false,
});
expect(chunks.join("").trim()).toBe("streamed");
});
it("sends signal to running process", async () => {
const exec = new CommandExecutor();
// Start a long process
const promise = exec.execute("req-6", {
command: ["sleep", "60"],
timeout: 30000,
tty: false,
});
// Give it time to start
await new Promise((r) => setTimeout(r, 100));
const sent = exec.sendSignal("req-6", "SIGTERM");
expect(sent).toBe(true);
const result = await promise;
expect(result.exitCode).not.toBe(0);
}, 10_000);
it("sendSignal returns false for unknown request", () => {
const exec = new CommandExecutor();
expect(exec.sendSignal("nonexistent", "SIGTERM")).toBe(false);
});
it("uses custom cwd", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-7", {
command: ["pwd"],
timeout: 5000,
tty: false,
cwd: "/tmp",
});
expect(result.stdout.trim()).toBe("/tmp");
});
it("uses custom env", async () => {
const exec = new CommandExecutor();
const result = await exec.execute("req-8", {
command: ["sh", "-c", "echo $MY_VAR"],
timeout: 5000,
tty: false,
env: { MY_VAR: "test_value" },
});
expect(result.stdout.trim()).toBe("test_value");
});
});

View File

@@ -0,0 +1,12 @@
{
"extends": "../../tsconfig.base.json",
"compilerOptions": {
"rootDir": "src",
"outDir": "dist",
"composite": true
},
"include": ["src/**/*.ts"],
"references": [
{ "path": "../shared" }
]
}

View File

@@ -16,18 +16,29 @@
"clean": "rimraf dist",
"dev": "tsx src/main.ts",
"db:push": "prisma db push",
"db:migrate": "prisma migrate dev",
"db:generate": "prisma generate"
"db:generate": "prisma generate",
"db:migrate:dev": "prisma migrate dev",
"db:migrate:deploy": "prisma migrate deploy",
"db:migrate:reset": "prisma migrate reset",
"db:seed": "tsx prisma/seed.ts",
"db:studio": "prisma studio"
},
"dependencies": {
"@fastify/rate-limit": "^10.3.0",
"@fastify/websocket": "^11.0.2",
"@lab/shared": "workspace:*",
"@prisma/client": "^6.9.0",
"fastify": "^5.3.3",
"@fastify/websocket": "^11.0.2",
"winston": "^3.17.0"
"winston": "^3.17.0",
"ws": "^8.19.0",
"zod": "^4.3.6"
},
"prisma": {
"seed": "tsx prisma/seed.ts"
},
"devDependencies": {
"@types/node": "^22.14.1",
"@types/ws": "^8.18.1",
"prisma": "^6.9.0",
"rimraf": "^6.1.3",
"tsx": "^4.21.0",

View File

@@ -133,6 +133,17 @@ model PulumiRun {
@@index([stackName])
}
model Bastion {
id String @id @default(uuid())
hostname String @unique
network String
serverIp String
status String @default("offline") // online, offline
lastHeartbeat DateTime?
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model Cluster {
id String @id @default(uuid())
name String @unique

View File

@@ -0,0 +1,113 @@
import { PrismaClient } from "@prisma/client";
const db = new PrismaClient();
async function main() {
console.log("Seeding database with default RBAC roles and permissions...");
// --- Admin role: full wildcard access ---
const admin = await db.role.upsert({
where: { name: "admin" },
update: {
description: "Full administrative access to all resources",
},
create: {
name: "admin",
description: "Full administrative access to all resources",
permissions: {
create: [
{
type: "allow",
action: "*",
cloud: "*",
environment: "*",
server: "*",
},
],
},
},
});
console.log(` Upserted role: ${admin.name} (${admin.id})`);
// --- Viewer role: read-only access ---
const viewer = await db.role.upsert({
where: { name: "viewer" },
update: {
description: "Read-only access to all resources",
},
create: {
name: "viewer",
description: "Read-only access to all resources",
permissions: {
create: [
{
type: "allow",
action: "read",
cloud: "*",
environment: "*",
server: "*",
},
],
},
},
});
console.log(` Upserted role: ${viewer.name} (${viewer.id})`);
// --- Operator role: read/exec/kubectl allowed, destroy denied ---
const operator = await db.role.upsert({
where: { name: "operator" },
update: {
description:
"Operational access: read, exec, and kubectl — destroy denied",
},
create: {
name: "operator",
description:
"Operational access: read, exec, and kubectl — destroy denied",
permissions: {
create: [
{
type: "allow",
action: "read",
cloud: "*",
environment: "*",
server: "*",
},
{
type: "allow",
action: "exec",
cloud: "*",
environment: "*",
server: "*",
},
{
type: "allow",
action: "kubectl",
cloud: "*",
environment: "*",
server: "*",
},
{
type: "deny",
action: "destroy",
cloud: "*",
environment: "*",
server: "*",
},
],
},
},
});
console.log(` Upserted role: ${operator.name} (${operator.id})`);
console.log("Seed complete.");
}
main()
.catch((error) => {
console.error("Seed failed:", error);
process.exit(1);
})
.finally(async () => {
await db.$disconnect();
});

View File

@@ -4,59 +4,54 @@
import { loadConfig } from "./config.js";
import { createApp } from "./server.js";
import { logger } from "./services/logger.js";
import { setupGracefulShutdown } from "./services/shutdown.js";
async function main(): Promise<void> {
const config = loadConfig();
// Initialize Prisma client (wrapped in try/catch for when DB isn't available)
let db;
let db: import("./server.js").DbClient;
try {
const { PrismaClient } = await import("@prisma/client");
const prisma = new PrismaClient({
datasources: config.databaseUrl
? { db: { url: config.databaseUrl } }
: undefined,
});
const prisma = config.databaseUrl
? new PrismaClient({ datasources: { db: { url: config.databaseUrl } } })
: new PrismaClient();
await prisma.$connect();
logger.info("Database connected");
db = prisma;
db = prisma as unknown as import("./server.js").DbClient;
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
logger.warn(`Database not available: ${message}`);
logger.warn("Running without database -- some features will be unavailable");
// Create a stub db client that returns errors for all operations
const dbError = (): never => {
throw new Error("Database not connected");
};
db = {
$queryRaw: async () => {
throw new Error("Database not connected");
},
$queryRaw: () => dbError(),
$disconnect: async () => {},
server: {
findMany: async () => {
throw new Error("Database not connected");
},
findUnique: async () => {
throw new Error("Database not connected");
},
findMany: () => dbError(),
findUnique: () => dbError(),
},
joinToken: {
findUnique: async () => {
throw new Error("Database not connected");
},
findMany: async () => {
throw new Error("Database not connected");
},
create: async () => {
throw new Error("Database not connected");
},
update: async () => {
throw new Error("Database not connected");
},
findUnique: () => dbError(),
findMany: () => dbError(),
create: () => dbError(),
update: () => dbError(),
},
bastion: {
upsert: () => dbError(),
findMany: () => dbError(),
findUnique: () => dbError(),
update: () => dbError(),
},
};
}
// Create Fastify app
const { app } = createApp(config, db);
const { app } = await createApp(config, db);
// Start server
try {
@@ -68,18 +63,7 @@ async function main(): Promise<void> {
}
// Graceful shutdown
const shutdown = async (): Promise<void> => {
logger.info("Shutting down...");
await app.close();
if (db !== null && "$disconnect" in db) {
await (db as { $disconnect: () => Promise<void> }).$disconnect();
}
logger.info("Goodbye");
process.exit(0);
};
process.on("SIGINT", () => void shutdown());
process.on("SIGTERM", () => void shutdown());
setupGracefulShutdown({ app, db });
// Keep process alive
await new Promise(() => {});

View File

@@ -0,0 +1,50 @@
// Rate limiting middleware for labd API.
// Applies global rate limits and stricter limits for sensitive routes.
import type { FastifyInstance } from "fastify";
import rateLimit from "@fastify/rate-limit";
import { logger } from "../services/logger.js";
/** Routes that require stricter rate limiting. */
const SENSITIVE_ROUTE_LIMITS: Record<string, number> = {
"/api/auth/enroll": 10,
"/api/tokens": 20,
};
/**
* Register the @fastify/rate-limit plugin with global defaults
* and apply stricter limits to sensitive routes.
*/
export async function setupRateLimiting(
app: FastifyInstance,
): Promise<void> {
await app.register(rateLimit, {
max: 100,
timeWindow: "1 minute",
keyGenerator: (request) => request.ip,
errorResponseBuilder: (_request, context) => ({
error: "Too many requests",
code: "RATE_LIMITED",
retryAfter: context.after,
}),
});
// Apply stricter per-route limits for sensitive endpoints.
app.addHook("onRoute", (routeOptions) => {
const url = routeOptions.url;
for (const [prefix, max] of Object.entries(SENSITIVE_ROUTE_LIMITS)) {
if (url.startsWith(prefix)) {
routeOptions.config = {
...routeOptions.config,
rateLimit: {
max,
timeWindow: "1 minute",
},
};
logger.info(`Rate limit: ${url} -> ${max} req/min`);
break;
}
}
});
}

View File

@@ -0,0 +1,20 @@
// Agent connection routes.
// GET /api/agents — list currently connected agents (excludes raw socket)
import type { FastifyInstance } from "fastify";
import { agentRegistry } from "../services/agent-registry.js";
export function registerAgentRoutes(app: FastifyInstance): void {
app.get("/api/agents", async (_request, reply) => {
const agents = agentRegistry.getAllConnected().map((agent) => ({
serverId: agent.serverId,
hostname: agent.hostname,
connectedAt: agent.connectedAt,
lastHeartbeat: agent.lastHeartbeat,
version: agent.version,
certFingerprint: agent.certFingerprint,
}));
return reply.send(agents);
});
}

View File

@@ -0,0 +1,207 @@
// Bastion management routes.
// GET /api/bastions — list connected bastions
// GET /api/machines — aggregated machines from all bastions
// POST /api/machines/install — queue install on correct bastion
// DELETE /api/machines/:mac — forget machine on correct bastion
// POST /api/machines/role — update role on correct bastion
// GET /api/machines/:mac/logs — get provision logs from correct bastion
import type { FastifyInstance } from "fastify";
import type { DbClient } from "../server.js";
import { bastionRegistry } from "../services/bastion-registry.js";
import { generateRequestId } from "@lab/shared";
const COMMAND_TIMEOUT_MS = 15_000;
/** Send a command to a bastion and wait for the response. */
function sendCommand(
bastionId: string,
msg: Record<string, unknown>,
): Promise<{ status: string; data?: unknown; error?: string | undefined }> {
const bastion = bastionRegistry.getById(bastionId);
if (!bastion) {
return Promise.reject(new Error(`Bastion ${bastionId} not connected`));
}
const requestId = generateRequestId();
const fullMsg = { ...msg, requestId };
return new Promise((resolve, reject) => {
const timeout = setTimeout(() => {
cleanup();
reject(new Error("Command timed out"));
}, COMMAND_TIMEOUT_MS);
const handler = (data: Buffer) => {
try {
const parsed = JSON.parse(data.toString()) as { type: string; requestId?: string; status?: string; data?: unknown; error?: string };
if (parsed.type === "command-response" && parsed.requestId === requestId) {
cleanup();
resolve({ status: parsed.status ?? "ok", data: parsed.data, error: parsed.error });
}
} catch { /* not our message */ }
};
const cleanup = () => {
clearTimeout(timeout);
bastion.socket.off("message", handler);
};
bastion.socket.on("message", handler);
bastion.socket.send(JSON.stringify(fullMsg));
});
}
export function registerBastionRoutes(app: FastifyInstance, db: DbClient): void {
// List all bastions (DB records enriched with online status from registry)
app.get("/api/bastions", async () => {
const dbBastions = await db.bastion.findMany() as Array<{
id: string; hostname: string; network: string; serverIp: string;
status: string; lastHeartbeat: Date | null; createdAt: Date;
}>;
return dbBastions.map((b) => {
const connected = bastionRegistry.getById(b.id);
return {
id: b.id,
hostname: b.hostname,
network: b.network,
serverIp: b.serverIp,
status: connected ? "online" : "offline",
lastHeartbeat: connected?.lastHeartbeat ?? b.lastHeartbeat,
connectedAt: connected?.connectedAt,
machineCount: connected
? Object.keys(connected.state.discovered).length +
Object.keys(connected.state.install_queue).length +
Object.keys(connected.state.installed).length
: 0,
createdAt: b.createdAt,
};
});
});
// Aggregated machines from all connected bastions
app.get("/api/machines", async () => {
return bastionRegistry.getAggregatedState();
});
// Queue install — route to correct bastion by MAC
app.post<{
Body: { mac?: string; hostname?: string; disk?: string; role?: string; os?: string };
}>("/api/machines/install", async (request, reply) => {
const { mac, hostname, disk, role, os } = request.body ?? {};
if (!mac || !hostname) {
return reply.code(400).send({ error: "mac and hostname are required" });
}
// Find bastion that knows this MAC, or let caller specify
const bastion = bastionRegistry.findBastionByMac(mac);
if (!bastion) {
// If only one bastion is connected, use it
const all = bastionRegistry.getAll();
if (all.length === 0) {
return reply.code(503).send({ error: "No bastions connected" });
}
if (all.length === 1) {
try {
const result = await sendCommand(all[0]!.bastionId, {
type: "command-install",
mac, hostname, disk: disk ?? "/dev/sda", role: role ?? "infra", os: os ?? "fedora-43",
});
return reply.code(result.status === "ok" ? 200 : 500).send(result);
} catch (err) {
return reply.code(500).send({ error: err instanceof Error ? err.message : String(err) });
}
}
return reply.code(404).send({ error: `MAC ${mac} not found on any bastion` });
}
try {
const result = await sendCommand(bastion.bastionId, {
type: "command-install",
mac, hostname, disk: disk ?? "/dev/sda", role: role ?? "infra", os: os ?? "fedora-43",
});
return reply.code(result.status === "ok" ? 200 : 500).send(result);
} catch (err) {
return reply.code(500).send({ error: err instanceof Error ? err.message : String(err) });
}
});
// Forget machine
app.delete<{ Params: { mac: string } }>("/api/machines/:mac", async (request, reply) => {
const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
const bastion = bastionRegistry.findBastionByMac(mac);
if (!bastion) {
return reply.code(404).send({ error: `MAC ${mac} not found on any bastion` });
}
try {
const result = await sendCommand(bastion.bastionId, { type: "command-forget", mac });
return reply.send(result);
} catch (err) {
return reply.code(500).send({ error: err instanceof Error ? err.message : String(err) });
}
});
// Update role
app.post<{
Body: { mac?: string; role?: string };
}>("/api/machines/role", async (request, reply) => {
const { mac, role } = request.body ?? {};
if (!mac || !role) {
return reply.code(400).send({ error: "mac and role are required" });
}
const normalized = mac.toLowerCase().replace(/-/g, ":");
const bastion = bastionRegistry.findBastionByMac(normalized);
if (!bastion) {
return reply.code(404).send({ error: `MAC ${normalized} not found on any bastion` });
}
try {
const result = await sendCommand(bastion.bastionId, { type: "command-role-update", mac: normalized, role });
return reply.send(result);
} catch (err) {
return reply.code(500).send({ error: err instanceof Error ? err.message : String(err) });
}
});
// Machine logs (snapshot from bastion's state)
app.get<{ Params: { mac: string } }>("/api/machines/:mac/logs", async (request, reply) => {
const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
const bastion = bastionRegistry.findBastionByMac(mac);
if (!bastion) {
return reply.code(404).send({ error: `MAC ${mac} not found` });
}
const queued = bastion.state.install_queue[mac];
const installed = bastion.state.installed[mac];
if (installed) {
return {
mac,
hostname: installed.hostname,
status: "installed",
role: installed.role,
ip: installed.ip,
installed_at: installed.installed_at,
};
}
if (queued) {
return {
mac,
hostname: queued.hostname,
status: queued.progress ? "installing" : "queued",
progress: queued.progress,
progress_detail: queued.progress_detail,
progress_at: queued.progress_at,
role: queued.role,
os: queued.os,
log: queued.log,
};
}
return reply.code(404).send({ error: `MAC ${mac} not found in install queue or installed` });
});
}

View File

@@ -1,10 +1,85 @@
// Health check routes.
import type { FastifyInstance } from "fastify";
import { performance } from "node:perf_hooks";
import type { DbClient } from "../server.js";
import { agentRegistry } from "../services/agent-registry.js";
import { isShuttingDownNow } from "../services/shutdown.js";
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
export interface ComponentStatus {
status: "up" | "down" | "degraded";
latency?: number;
message?: string;
}
export interface HealthStatus {
status: "healthy" | "degraded" | "unhealthy";
version: string;
uptime: number;
timestamp: string;
components: {
database: ComponentStatus;
agents: ComponentStatus;
};
}
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
async function checkDatabase(db: DbClient): Promise<ComponentStatus> {
const start = performance.now();
try {
await db.$queryRaw`SELECT 1`;
const latency = Math.round((performance.now() - start) * 100) / 100;
return { status: "up", latency };
} catch (err) {
const latency = Math.round((performance.now() - start) * 100) / 100;
const message = err instanceof Error ? err.message : "Unknown error";
return { status: "down", latency, message };
}
}
function checkAgents(): ComponentStatus {
const count = agentRegistry.getConnectedCount();
return {
status: count > 0 ? "up" : "degraded",
message: `${count} agent(s) connected`,
};
}
function aggregateStatus(
components: HealthStatus["components"],
): { status: HealthStatus["status"]; statusCode: number } {
const statuses = Object.values(components);
if (statuses.some((c) => c.status === "down")) {
return { status: "unhealthy", statusCode: 503 };
}
if (statuses.some((c) => c.status === "degraded")) {
return { status: "degraded", statusCode: 200 };
}
return { status: "healthy", statusCode: 200 };
}
// ---------------------------------------------------------------------------
// Route registration
// ---------------------------------------------------------------------------
export function registerHealthRoutes(app: FastifyInstance, db: DbClient): void {
// ---- existing /healthz (preserved for backward compat) ------------------
app.get("/healthz", async (_request, reply) => {
if (isShuttingDownNow()) {
return reply.code(503).send({
status: "shutting_down",
uptime: process.uptime(),
timestamp: new Date().toISOString(),
});
}
let dbOk = false;
try {
await db.$queryRaw`SELECT 1`;
@@ -25,4 +100,45 @@ export function registerHealthRoutes(app: FastifyInstance, db: DbClient): void {
},
});
});
// ---- GET /health — simple probe for k8s --------------------------------
app.get("/health", async (_request, reply) => {
return reply.code(200).send({ status: "ok" });
});
// ---- GET /health/detailed — full component check -----------------------
app.get("/health/detailed", async (_request, reply) => {
const database = await checkDatabase(db);
const agents = checkAgents();
const components = { database, agents };
const { status, statusCode } = aggregateStatus(components);
const body: HealthStatus = {
status,
version: process.env["LABD_VERSION"] ?? "0.1.0",
uptime: process.uptime(),
timestamp: new Date().toISOString(),
components,
};
return reply.code(statusCode).send(body);
});
// ---- GET /health/live — liveness probe ---------------------------------
app.get("/health/live", async (_request, reply) => {
return reply.code(200).send({ status: "alive" });
});
// ---- GET /health/ready — readiness probe (needs DB) --------------------
app.get("/health/ready", async (_request, reply) => {
const database = await checkDatabase(db);
if (database.status === "down") {
return reply.code(503).send({
status: "not_ready",
reason: database.message,
});
}
return reply.code(200).send({ status: "ready" });
});
}

View File

@@ -7,28 +7,43 @@ import { logger } from "./services/logger.js";
import { registerHealthRoutes } from "./routes/health.js";
import { registerServerRoutes } from "./routes/servers.js";
import { registerAuthRoutes } from "./routes/auth.js";
import { registerAgentRoutes } from "./routes/agents.js";
import { registerBastionRoutes } from "./routes/bastions.js";
import { setupRateLimiting } from "./middleware/rate-limit.js";
import { bastionRegistry } from "./services/bastion-registry.js";
import { isBastionMessage } from "@lab/shared";
export interface DbClient {
$queryRaw: (query: TemplateStringsArray) => Promise<unknown>;
$queryRaw: (...args: unknown[]) => Promise<unknown>;
$disconnect?: () => Promise<void>;
server: {
findMany: (args?: unknown) => Promise<unknown[]>;
findUnique: (args: unknown) => Promise<unknown>;
findMany: (...args: unknown[]) => Promise<unknown[]>;
findUnique: (...args: unknown[]) => Promise<unknown>;
};
joinToken: {
findUnique: (args: unknown) => Promise<unknown>;
findMany: (args?: unknown) => Promise<unknown[]>;
create: (args: unknown) => Promise<unknown>;
update: (args: unknown) => Promise<unknown>;
findUnique: (...args: unknown[]) => Promise<unknown>;
findMany: (...args: unknown[]) => Promise<unknown[]>;
create: (...args: unknown[]) => Promise<unknown>;
update: (...args: unknown[]) => Promise<unknown>;
};
bastion: {
upsert: (...args: unknown[]) => Promise<unknown>;
findMany: (...args: unknown[]) => Promise<unknown[]>;
findUnique: (...args: unknown[]) => Promise<unknown>;
update: (...args: unknown[]) => Promise<unknown>;
};
}
export function createApp(_config: LabdConfig, db: DbClient): {
export async function createApp(_config: LabdConfig, db: DbClient): Promise<{
app: ReturnType<typeof Fastify>;
} {
}> {
const app = Fastify({
logger: false, // We use winston instead
});
// Register rate limiting before routes
await setupRateLimiting(app);
// Register WebSocket support
void app.register(websocket);
@@ -36,6 +51,8 @@ export function createApp(_config: LabdConfig, db: DbClient): {
registerHealthRoutes(app, db);
registerServerRoutes(app, db);
registerAuthRoutes(app, db);
registerAgentRoutes(app);
registerBastionRoutes(app, db);
// WebSocket handler for agent connections
app.register(async (fastify) => {
@@ -54,6 +71,148 @@ export function createApp(_config: LabdConfig, db: DbClient): {
});
});
// WebSocket handler for bastion connections
app.register(async (fastify) => {
fastify.get("/ws/bastion", { websocket: true }, (socket, _request) => {
let bastionId: string | null = null;
logger.info("Bastion WebSocket connection established");
socket.on("message", (data: Buffer) => {
try {
const raw = data.toString();
const msg: unknown = JSON.parse(raw);
if (!isBastionMessage(msg)) {
logger.warn(`Unknown bastion message: ${(msg as { type?: string }).type}`);
return;
}
switch (msg.type) {
case "bastion-enroll": {
// Validate the join token
void (async () => {
try {
const joinToken = await db.joinToken.findUnique({ where: { token: msg.token } }) as {
id: string; type: string; usedBy: string | null; revokedAt: Date | null; expiresAt: Date | null;
} | null;
if (!joinToken || joinToken.revokedAt !== null) {
logger.warn(`Bastion enrollment rejected: invalid/revoked token from ${msg.hostname}`);
socket.send(JSON.stringify({ type: "error", error: "Invalid or revoked token" }));
socket.close();
return;
}
if (joinToken.expiresAt !== null && joinToken.expiresAt < new Date()) {
logger.warn(`Bastion enrollment rejected: expired token from ${msg.hostname}`);
socket.send(JSON.stringify({ type: "error", error: "Token expired" }));
socket.close();
return;
}
if (joinToken.type === "one-time" && joinToken.usedBy !== null) {
logger.warn(`Bastion enrollment rejected: already-used token from ${msg.hostname}`);
socket.send(JSON.stringify({ type: "error", error: "Token already used" }));
socket.close();
return;
}
// Mark token as used
await db.joinToken.update({
where: { id: joinToken.id },
data: { usedBy: `bastion:${msg.hostname}`, usedAt: new Date() },
});
// Upsert bastion record
const record = await db.bastion.upsert({
where: { hostname: msg.hostname },
create: { hostname: msg.hostname, network: msg.network, serverIp: msg.serverIp, status: "online" },
update: { network: msg.network, serverIp: msg.serverIp, status: "online", lastHeartbeat: new Date() },
}) as { id: string };
bastionId = record.id;
bastionRegistry.register({
bastionId: record.id,
hostname: msg.hostname,
network: msg.network,
serverIp: msg.serverIp,
socket,
connectedAt: new Date(),
lastHeartbeat: new Date(),
state: { discovered: {}, install_queue: {}, installed: {} },
});
socket.send(JSON.stringify({ type: "bastion-enrolled", bastionId: record.id }));
logger.info(`BASTION ENROLLED: ${msg.hostname} (${msg.network}) as ${record.id.slice(0, 8)}...`);
} catch (err) {
logger.error(`Bastion enrollment error: ${err instanceof Error ? err.message : String(err)}`);
socket.close();
}
})();
break;
}
case "bastion-state-sync": {
if (!bastionId && msg.bastionId) {
// Reconnection with known bastionId — re-register
bastionId = msg.bastionId;
const existing = bastionRegistry.getById(bastionId);
if (!existing) {
bastionRegistry.register({
bastionId,
hostname: "reconnecting",
network: "",
serverIp: "",
socket,
connectedAt: new Date(),
lastHeartbeat: new Date(),
state: msg.state,
});
// Update DB status
void db.bastion.update({ where: { id: bastionId }, data: { status: "online", lastHeartbeat: new Date() } });
}
}
if (bastionId) {
bastionRegistry.updateState(bastionId, msg.state);
logger.info(`Bastion ${bastionId.slice(0, 8)} state sync: ${Object.keys(msg.state.discovered).length} discovered, ${Object.keys(msg.state.installed).length} installed`);
}
break;
}
case "bastion-heartbeat": {
if (bastionId) {
bastionRegistry.updateHeartbeat(bastionId);
socket.send(JSON.stringify({ type: "bastion-heartbeat-ack", serverTime: new Date().toISOString() }));
}
break;
}
case "bastion-progress": {
// Forward to any SSE subscribers (future)
logger.info(`Bastion progress: ${msg.mac} -> ${msg.stage}: ${msg.detail}`);
break;
}
case "command-response": {
// Handled by the pending command listener in bastions.ts routes
break;
}
}
} catch (err) {
logger.error(`Failed to parse bastion message: ${err instanceof Error ? err.message : String(err)}`);
}
});
socket.on("close", () => {
if (bastionId) {
logger.info(`Bastion ${bastionId.slice(0, 8)} disconnected`);
bastionRegistry.unregister(bastionId);
void db.bastion.update({ where: { id: bastionId }, data: { status: "offline" } }).catch(() => {});
}
});
});
});
// Log all requests
app.addHook("onRequest", async (request) => {
logger.info(`HTTP: ${request.ip} ${request.method} ${request.url}`);

Some files were not shown because too many files have changed in this diff Show More