# PRD: Resource Tracking & kubectl-style CLI

## Problem

The lab platform currently has fragmented state management:

- Bastion keeps machine state in an ephemeral JSON file (`/tmp/lab-bastion/state.json`) that is lost on pod restart
- labd receives state syncs from bastions but only stores them in memory; the `Server` table in CockroachDB is never written to
- There is no system to track relationships between resources (servers belong to clusters, clusters run on servers, networks connect servers)
- The CLI (`labctl`) uses an inconsistent verb-noun structure (`labctl provision list`, `labctl app k3s install`) instead of a uniform resource-oriented pattern
- RBAC permissions reference resources (server, cloud, environment) but there is no resource registry to validate against

## Vision

A unified resource tracking system where all infrastructure objects (servers, clusters, networks, bastions, VMs) are persisted in CockroachDB via labd, with relationships between them, and managed through a kubectl-style CLI. This replaces the ephemeral JSON state and becomes the single source of truth for the platform.

## Current State

### Database (CockroachDB via Prisma)

Existing models that are scaffolded but mostly unused:

- `Server`: hostname, mac, cloud, environment, role, labels, ip, status (0 rows)
- `Agent`: mTLS certificate enrollment per server (0 rows)
- `Bastion`: PXE server registration (1 row, labmaster)
- `Cluster`: k8s cluster metadata (0 rows)
- `User`, `Role`, `Permission`, `UserRole`: RBAC framework (seeded with 3 roles, 6 permissions)
- `JoinToken`: agent/bastion enrollment tokens
- `AuditLog`: action audit trail

### Bastion State (ephemeral JSON)

Three categories tracked per-bastion:

- `discovered`: machines found via PXE with hardware info (CPU, RAM, disks, NICs, arch)
- `install_queue`: machines queued for OS install with progress tracking
- `installed`: machines with OS installed (hostname, role, IP, OS)

### CLI Structure (current)

```
labctl init bastion standalone [start|stop|status]
labctl provision [list|install|reprovision|forget|logs]
labctl app [k3s|labcontroller]
labctl config [list|get|set]
labctl roles
labctl doctor
labctl login
labctl logs
```

## Requirements

### 1. Persist Bastion State to Database

When labd receives `bastion-state-sync` messages, it must upsert machines into the `Server` table:

- Discovered machines → create/update Server with status "discovered", store HardwareInfo as JSON labels
- Queued machines → update Server status to "provisioning"
- Installed machines → update Server with hostname, IP, role, OS, status "installed"
- Track which bastion owns which server (add `bastionId` to Server model)
- Track hardware info: arch, cpu_model, cpu_cores, memory_gb, disks, nics

The bastion's local JSON state becomes a cache; labd's database is the source of truth. On startup, the bastion should load its state from labd if available.

### 2. Resource Model Expansion

Add new models to the Prisma schema for tracking infrastructure:

**Network**: L2/L3 network segments

- name, cidr, vlan, gateway, domain, dhcpEnabled
- Servers have NICs on networks

**ServerNic**: NIC-to-network mapping

- serverId, networkId, mac, ip, name, state (UP/DOWN)
- Derived from HardwareInfo during discovery

**ServerDisk**: Disk inventory per server

- serverId, name, sizeGb, model
- Derived from HardwareInfo during discovery

**ClusterMember**: Server-to-cluster membership

- clusterId, serverId, role (control-plane, worker)

### 3. kubectl-style CLI Redesign

Restructure labctl to follow the `mcpctl` / `kubectl` pattern:

```
# Core CRUD verbs that work on any resource
labctl get <resource> [name]             # List or get specific resource
labctl describe <resource> <name>        # Detailed view with relationships
labctl create <resource> [flags]         # Create a resource
labctl delete <resource> <name>          # Delete a resource
labctl edit <resource> <name>            # Edit in $EDITOR
labctl apply -f <file>                   # Declarative apply from YAML

# Resource types (with aliases)
servers (server, srv)
clusters (cluster)
networks (network, net)
bastions (bastion)
roles (role)
users (user)
tokens (token)
audit (audit)

# Output formats
-o table (default), -o json, -o yaml, -o wide

# Examples
labctl get servers                       # List all servers
labctl get servers -o wide               # With extra columns (disks, NICs)
labctl get server labmaster              # Get specific server
labctl describe server labmaster         # Full details + relationships
labctl get servers --role worker         # Filter by role
labctl get servers --status discovered   # Filter by status
labctl get clusters                      # List clusters
labctl describe cluster lab-k3s          # Cluster members, health
labctl get networks                      # List networks
labctl create network --name lab --cidr 192.168.8.0/24 --gateway 192.168.8.1

# Provisioning becomes actions on server resources
labctl provision <server> --os fedora-43 --role worker   # Queue install
labctl reprovision <server>              # Reinstall
labctl forget <server>                   # Remove from tracking

# App management stays as-is but simplified
labctl app install k3s
labctl app health k3s [server]

# Admin
labctl bastion start [--foreground]      # Start local bastion
labctl bastion status                    # Bastion health
labctl login                             # Auth
labctl doctor                            # Diagnostics
```

### 4. Resource Aliases & Resolution

Follow mcpctl's pattern from `shared.ts`:

- Accept singular, plural, and short aliases: `server`, `servers`, `srv` all resolve to the same resource
- Accept name or ID: `labctl get server labmaster` or `labctl get server <id>`
- Accept MAC address for servers: `labctl get server 38:05:25:33:e2:e4`

### 5. RBAC Integration

The existing Permission model uses `action:cloud:environment:server` patterns. Wire this into the resource system:

- CLI commands check permissions before executing
- `labctl get` respects read permissions (only show resources the user can see)
- `labctl provision` requires `apply` permission on the target server
- `labctl delete` requires `destroy` permission
- Audit all resource operations to the AuditLog table

### 6. Bastion State Directory Fix

Fix the bug where the CLI's `--dir` default (`/tmp/lab-bastion`) overrides the `BASTION_DIR=/data` environment variable. The CLI option should use the env var as its default, falling back to `/tmp/lab-bastion` only when the variable is unset:

```typescript
// Use BASTION_DIR when set; otherwise fall back to the previous default.
.option("--dir <dir>", "Bastion data directory", process.env["BASTION_DIR"] ?? "/tmp/lab-bastion")
```

## Technical Constraints

- Database: CockroachDB with Prisma ORM (already deployed)
- API: Fastify + WebSocket (labd)
- CLI: Commander.js (labctl)
- Auth: mTLS certificates (planned), join tokens (implemented)
- Monorepo: pnpm workspace with @lab/shared, @lab/bastion, @lab/cli, @lab/labd
- The bastion-to-labd WebSocket protocol is defined in @lab/shared/protocol

## Success Criteria

1. `labctl get servers` shows all machines (discovered, provisioning, installed) from the database
2. Server state survives bastion and labd pod restarts
3. `labctl describe server <name>` shows hardware info, network, cluster membership
4. Resources have tracked relationships (server→cluster, server→network, bastion→server)
5. RBAC permissions are enforced on CLI operations
6. All resource mutations are audit-logged
7. CLI follows consistent kubectl-style `verb resource [name] [flags]` pattern
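
To illustrate Requirement 1, the following is a minimal sketch of how a `bastion-state-sync` payload might be flattened into `Server` upsert records before being written to the database. The type names (`BastionStateSync`, `ServerUpsert`) and the function `toServerUpserts` are hypothetical; the real message types live in @lab/shared/protocol and may differ. It also assumes that servers are keyed by lowercase MAC address and that a machine appearing in several categories takes the most advanced status. Neither assumption is stated in the requirements. HardwareInfo is left out for brevity.

```typescript
// Hypothetical shapes for illustration only; the real message types are
// defined in @lab/shared/protocol and may differ.
type ServerStatus = "discovered" | "provisioning" | "installed";

interface MachineRef {
  mac: string;
}

interface InstalledMachine extends MachineRef {
  hostname: string;
  ip: string;
  role: string;
  os: string;
}

interface BastionStateSync {
  discovered: MachineRef[];
  install_queue: MachineRef[];
  installed: InstalledMachine[];
}

interface ServerUpsert {
  mac: string;
  bastionId: string;
  status: ServerStatus;
  hostname?: string;
  ip?: string;
  role?: string;
  os?: string;
}

// Flatten the three bastion categories into one record per MAC address.
// Assumption: categories are applied in order, so a later category
// (installed) overwrites an earlier one (discovered) for the same machine.
function toServerUpserts(bastionId: string, state: BastionStateSync): ServerUpsert[] {
  const byMac: Record<string, ServerUpsert> = {};

  const put = (mac: string, status: ServerStatus, extra: Partial<ServerUpsert> = {}): void => {
    const key = mac.toLowerCase(); // normalise so "AA:BB" and "aa:bb" match
    byMac[key] = { ...extra, mac: key, bastionId, status };
  };

  for (const m of state.discovered) put(m.mac, "discovered");
  for (const m of state.install_queue) put(m.mac, "provisioning");
  for (const m of state.installed) {
    put(m.mac, "installed", { hostname: m.hostname, ip: m.ip, role: m.role, os: m.os });
  }

  return Object.keys(byMac).map((key) => byMac[key] as ServerUpsert);
}
```

Each returned record could then be passed to a database upsert keyed on `mac`. Keeping the mapping as a pure function separates it from the persistence layer, so it can be unit-tested without a running CockroachDB instance.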
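
The alias and identifier resolution described in Requirement 4 could look roughly like the sketch below. This is not mcpctl's `shared.ts` code. The names `RESOURCE_ALIASES`, `resolveResource`, and `classifyServerRef` are hypothetical, and the alias table simply mirrors the resource list in Requirement 3. Because the ID format is not specified in this document, the sketch only separates MAC addresses from everything else and leaves the name-versus-ID decision to the lookup layer.

```typescript
// Canonical (plural) resource name -> accepted aliases, mirroring the
// resource list in Requirement 3. Hypothetical table for illustration.
const RESOURCE_ALIASES: Record<string, string[]> = {
  servers: ["server", "srv"],
  clusters: ["cluster"],
  networks: ["network", "net"],
  bastions: ["bastion"],
  roles: ["role"],
  users: ["user"],
  tokens: ["token"],
  audit: ["audit"],
};

// Resolve user input such as "srv" or "Servers" to the canonical name,
// or undefined when the resource type is unknown.
function resolveResource(input: string): string | undefined {
  const key = input.trim().toLowerCase();
  for (const canonical of Object.keys(RESOURCE_ALIASES)) {
    const aliases = RESOURCE_ALIASES[canonical] ?? [];
    if (canonical === key || aliases.indexOf(key) !== -1) {
      return canonical;
    }
  }
  return undefined;
}

// Six colon-separated hex octets, e.g. 38:05:25:33:e2:e4.
const MAC_PATTERN = /^([0-9a-f]{2}:){5}[0-9a-f]{2}$/i;

interface ServerRef {
  kind: "mac" | "nameOrId";
  value: string;
}

// Classify a server reference from the command line. MACs are lowercased
// so lookups are case-insensitive; anything else is passed through as a
// name or ID for the caller to resolve.
function classifyServerRef(ref: string): ServerRef {
  const trimmed = ref.trim();
  if (MAC_PATTERN.test(trimmed)) {
    return { kind: "mac", value: trimmed.toLowerCase() };
  }
  return { kind: "nameOrId", value: trimmed };
}
```

For example, `resolveResource("srv")` yields `"servers"`, and `classifyServerRef("38:05:25:33:e2:e4")` yields a reference of kind `"mac"`, so `labctl get srv 38:05:25:33:e2:e4` and `labctl get servers labmaster` would reach the same handler.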