# labctl Platform — Implementation Status

## What This Document Is

An honest assessment of what code exists, what works, what is stubbed, and what hasn't been started — measured against the PRD phases.

---

## Architecture Overview (as built)

```
labctl CLI ──HTTP──▶ bastion (PXE server)       ← WORKING
labctl CLI ──HTTP──▶ labd (master daemon)       ← PARTIALLY WORKING
                       │
                       ├── CockroachDB/Prisma   ← SCHEMA DEFINED, NOT DEPLOYED
                       ├── /ws/agent WebSocket  ← ACCEPTS CONNECTIONS, DOES NOT ROUTE
                       └── mTLS CA              ← NOT IMPLEMENTED

lab-agent ──WS──▶ labd                          ← LIBRARY CODE, NO DAEMON BINARY
```

---

## Package Inventory

| Package | Lines of Source | Tests | Status |
|---------|---------------|-------|--------|
| @lab/shared | ~200 | 0 | Complete — types, protocol, errors |
| @lab/bastion | ~800 | 32 | **Production-ready** — PXE discovery, install, reprovision |
| @lab/cli | ~600 | 0 (uses bastion tests) | Complete — all commands implemented |
| @lab/labd | ~500 | 2 | Partial — routes exist, core features stubbed |
| @lab/agent | ~300 | 0 | Library only — no daemon binary |

All 5 packages compile. 32 tests pass.

---

## Phase 1: Foundation

### DONE — Working in production

| Feature | Code | How It Works |
|---------|------|-------------|
| PXE bastion server | `src/bastion/` | Fastify HTTP + dnsmasq DHCP/TFTP. Machines PXE boot, get iPXE script from `/dispatch?mac=XX`, chain to discovery or install kickstart. State persisted to JSON file. |
| Machine discovery | `routes/dispatch.ts`, `templates/discover.ks.ts` | Unknown MACs get a mini-kickstart that boots a RAM-only Fedora, scrapes hardware via `/proc`, `/sys`, `dmidecode`, POSTs to `/api/discover`, then reboots. No disk touch. |
| Machine installation | `routes/api.ts`, `templates/install.ks.ts` | Queue a MAC via `POST /api/install`. Next PXE boot gets a full Kickstart with LVM partitioning (worker: longhorn LV, infra: rancher LV), SSH keys, k3s kernel prereqs, progress callbacks. |
| Reprovision with data preservation | `commands/reprovision.ts`, `install.ks.ts` | `%pre` script detects existing LVM. Reformats `/`, `/var`, `/boot` but preserves `/home`, `/srv`, `/var/lib/longhorn`, `/var/lib/rancher`. |
| CLI: init/provision commands | `src/cli/src/commands/` | `labctl init bastion standalone start/stop/status`, `labctl provision list/install/reprovision/forget`. All talk to bastion HTTP API. |
| CLI: config management | `config/index.ts`, `commands/config.ts` | `labctl config list/get/set/path`. YAML config at `~/.labctl/config.yaml` with env var overrides. |
| labd scaffold | `src/labd/` | Fastify server with health, server listing, token management routes. Prisma schema for all models. Starts with or without database. |
| Prisma schema | `prisma/schema.prisma` | 10 models: Server, Agent, User, Role, Permission, UserRole, JoinToken, AuditLog, PulumiRun, Cluster. CockroachDB provider. |
| Database seeding | `prisma/seed.ts` | Creates admin/viewer/operator roles with proper allow/deny permissions. Idempotent via upsert. |
| Multi-arch builds + packaging | `nfpm.yaml`, `scripts/` | nfpm config for RPM/DEB. Bun compile for standalone binary (102MB labctl in `dist/`). |
| Gitea CI/CD | `.gitea/` (on remote) | Lint → typecheck → test → build → publish pipeline on mysources.co.uk. |

### DONE — Code exists, not yet connected end-to-end

| Feature | Code | What's Real | What's Missing |
|---------|------|------------|----------------|
| lab-agent connection library | `lab-agent/src/services/connection.ts` | `AgentConnection` class: WebSocket to labd, heartbeat (10s), exponential backoff reconnect (1-30s), state machine (disconnected/connecting/connected/reconnecting), handles server-shutdown messages. | **No daemon binary.** This is a library — nothing starts it. No systemd unit. No enrollment flow. |
| lab-agent command executor | `lab-agent/src/services/executor.ts` | `CommandExecutor` class: `spawn()` with timeout handling (SIGTERM then SIGKILL after 5s), stdout/stderr streaming via EventEmitter, stdin writing, signal forwarding. | **Not wired to WebSocket.** The executor and connection don't talk to each other. No message dispatch. |
| Agent registry (labd) | `labd/src/services/agent-registry.ts` | `AgentRegistry`: in-memory Map tracking by serverId and hostname, lifecycle events, heartbeat updates. Singleton exported. | **Not used by /ws/agent handler.** The WebSocket handler in `server.ts` just logs messages — it doesn't call `agentRegistry.register()`. |
| Message router (labd) | `labd/src/services/message-router.ts` | `MessageRouter`: handler registration, pending request tracking with timeouts, streaming support, log subscription, agent cleanup on disconnect. | **Not used.** `server.ts` doesn't call `messageRouter.handleMessage()`. The router exists but is dead code. |
| Token management | `labd/src/routes/auth.ts` | Create, list, revoke join tokens. Validates one-time vs reusable, expiry, revocation. Marks tokens as used. | Token validation works. **But enrollment returns `certificatePem: null`** — no actual certificate is issued. |
| CLI API client | `cli/src/api/client.ts` | `LabdClient` with mTLS support, typed methods for servers/tokens/health/enrollment. | Works for REST endpoints. **No CLI commands use it yet** — existing commands still talk directly to bastion HTTP. |
| CLI WebSocket streaming | `cli/src/api/websocket.ts` | `streamExec()` and `streamLogs()` functions. | **No `labctl exec` or `labctl logs` commands exist.** The streaming code has no consumer. |
| Zod validation | `labd/src/validation/` | Schemas for createToken, enrollment, serverFilters, createRole, permission patterns. Middleware for body/query validation. | **Not applied to routes.** The schemas and middleware exist but no route uses `preHandler: [validateBody(schema)]`. |
| Encryption service | `labd/src/services/encryption.ts` | AES-256-GCM with scrypt key derivation. Encrypt/decrypt roundtrip. Singleton from `CA_ENCRYPTION_KEY` env var. | **Not used anywhere.** No CA key is encrypted, no kubeconfig is stored. |
| Graceful shutdown | `labd/src/services/shutdown.ts` | SIGTERM/SIGINT handlers, agent notification, message router cleanup, DB disconnect, force exit timer. | Works but agent notification is a no-op since no agents are registered (see above). |
| Rate limiting | `labd/src/middleware/rate-limit.ts` | `@fastify/rate-limit`: 100/min global, 10/min for enrollment, 20/min for tokens. | **Wired up in `server.ts`.** This actually works. |
| Health checks | `labd/src/routes/health.ts` | `/healthz`, `/health`, `/health/detailed`, `/health/live`, `/health/ready`. Checks DB latency and agent count. | Works. Returns `agents: { connected: 0 }` since no agents ever register. |
| Error hierarchy | `shared/src/errors/` | `LabError`, `NotFoundError`, `PermissionDeniedError`, `ValidationError`, `AgentNotConnectedError`. | **Not used in routes.** Routes still use inline `reply.code(404).send({error: ...})`. |
| Table formatting | `cli/src/utils/table.ts` | `printTable`, `formatStatus`, `formatRelativeTime`, predefined column sets. | **Not used by existing commands.** `provision list` has its own inline formatting. |
| Resource parsing | `cli/src/utils/resource.ts` | Parse `server/labmaster`, `app/kube-system/nginx` format. | **Not used.** No commands accept `type/name` arguments yet. |
| Doctor command | `cli/src/commands/doctor.ts` | Config, cert, connectivity diagnostics. | Works standalone. |
| Login command | `cli/src/commands/login.ts` | Generates EC keypair, prompts for token, POSTs to `/api/auth/user-enroll`. | **labd has no `/api/auth/user-enroll` endpoint.** Only `/api/auth/enroll` exists (for agents). Login will 404. |

### NOT DONE — Phase 1 items from PRD with no code

| Feature | PRD Description | Status |
|---------|----------------|--------|
| Certificate Authority | Built-in CA in labd. Generate root CA, sign CSRs, revoke certs, rotate. | **Nothing.** No CA code. No X.509 operations. No `@peculiar/x509` dependency. `EncryptionService` exists but it's for data-at-rest, not PKI. |
| RBAC engine | Middleware that checks permissions on every request. Deny overrides allow. | **Nothing.** `auth.ts` middleware is a placeholder. No route checks permissions. Anyone can call any endpoint. |
| Audit logging | Log every action with user, session, action, resource, result, duration. | **Nothing.** `AuditLog` Prisma model exists but nothing writes to it. No audit middleware. |
| `labctl exec` | Remote command execution via labd → agent WebSocket relay. | **Nothing.** No `exec` CLI command. The executor library exists in lab-agent but isn't connected. |
| `labctl logs` | Resource-scoped log streaming (server, app, bastion, audit). | **Nothing.** No `logs` CLI command. |
| `labctl get servers` | List servers from labd with filters. | **Nothing.** No `get` CLI command. The API client has `getServers()` but no command calls it. |
| Smoke test stack | `podman-compose` with CockroachDB + labd + 2 agents, testing enrollment/heartbeat/exec/RBAC. | **Nothing.** `stack/docker-compose.yml` exists but only runs bastion + CockroachDB, not labd or agents. |
| Agent enrollment during PXE | Embed join token in kickstart, agent auto-enrolls on first boot. | **Nothing.** Kickstart installs k3s prereqs but doesn't install or start lab-agent. |

---

## Phase 2: Deployment

**Nothing from Phase 2 has been built.**

| Feature | Status |
|---------|--------|
| Reprovision labmaster as labmaster.ad.itaz.eu | Not done — manual operation |
| Deploy k3s with Cilium CNI | Not done — kickstart only sets up kernel prereqs, leaves a comment "run `curl -sfL https://get.k3s.io`" |
| Deploy CockroachDB on k3s | Not done — `docker-compose.yml` runs it in-memory for dev, no k8s manifests for CRDB |
| Deploy labd on k3s | **K8s manifests exist** (`deploy/k8s/labd/base/`) — Deployment, Service, ConfigMap, HPA, PDB. But no CockroachDB to connect to and no TLS configured. |
| Deploy bastion as managed app | Not done — bastion runs standalone, no Pulumi chart |
| Auto-enroll agents during PXE | Not done — no agent install in kickstart, no token embedding |

---

## Phase 3: Infrastructure as Code

**Nothing from Phase 3 has been built.**

| Feature | Status |
|---------|--------|
| Module system | Not done — no `module.yaml`, no module loader |
| Pulumi charts | Not done — no Pulumi dependency, no chart structure |
| `labctl apps install/upgrade/rollback` | Not done — no `apps` command |
| `labctl apply -f` | Not done — no `apply` command |
| `kubectl proxy` (audited) | Not done — no kubectl proxy |
| Kubeconfig store (encrypted) | `EncryptionService` exists but nothing uses it. `Cluster.kubeconfigEnc` field exists in Prisma but nothing reads/writes it. |

---

## Phase 4: Multi-Cloud

**Nothing from Phase 4 has been built.**

| Feature | Status |
|---------|--------|
| AWS provider | Not done |
| Reusable join tokens for ASGs | Token model supports `reusable` type, but no AWS integration |
| Cilium Cluster Mesh | Not done |
| Ephemeral test environments | Not done |
| Grafana Loki | Not done |

---

## Infrastructure Files

| File | Status |
|------|--------|
| `Dockerfile.labd` | Exists. Multi-stage Alpine build. Would work if you `docker build` it. |
| `Dockerfile.bastion` | Exists. Multi-stage Fedora build. Would work. |
| `.dockerignore` | Exists. |
| `deploy/k8s/labd/base/` | Kustomize manifests for labd (Deployment, Service, ConfigMap, HPA, PDB). Points at a non-existent CockroachDB and has no TLS. |
| `stack/docker-compose.yml` | Runs bastion + CockroachDB for local dev. Works. |
| `nfpm.yaml` | RPM/DEB packaging config. Works with `nfpm pkg`. |

---

## The Disconnection Problem

The core issue is that many services were built in isolation but never wired together:

```
┌─────────────────────────────────────────────────────────┐
│                 BUILT BUT NOT CONNECTED                 │
│                                                         │
│  AgentConnection ──✗──▶ /ws/agent handler               │
│  CommandExecutor ──✗──▶ MessageRouter                   │
│  MessageRouter   ──✗──▶ /ws/agent handler               │
│  AgentRegistry   ──✗──▶ /ws/agent handler               │
│  Zod schemas     ──✗──▶ Route preHandlers               │
│  Error classes   ──✗──▶ Route error handling            │
│  LabdClient      ──✗──▶ CLI commands (get/exec/logs)    │
│  Table formatting──✗──▶ CLI commands                    │
│  Resource parsing──✗──▶ CLI commands                    │
│  EncryptionService──✗──▶ CA / kubeconfig storage        │
│  Login command   ──✗──▶ /api/auth/user-enroll (missing) │
│  Audit logging   ──✗──▶ Any middleware                  │
│  RBAC engine     ──✗──▶ Any middleware                  │
└─────────────────────────────────────────────────────────┘
```

---

## What Actually Works End-to-End Today

1. **PXE boot a bare-metal machine:**
   ```
   labctl init bastion standalone start
   # Machine PXE boots → discovered automatically
   labctl provision list
   labctl provision install AA:BB:CC:DD:EE:FF worker-1 --role worker
   # Machine reboots → installs Fedora → reports complete
   ```

2. **Manage bastion lifecycle:**
   ```
   labctl init bastion standalone status
   labctl init bastion standalone stop
   ```

3. **Start labd (without database):**
   ```
   LABD_PORT=3100 tsx src/labd/src/main.ts
   # Starts with stub DB, health endpoint works, token/server routes return errors
   ```

4. **Start labd (with CockroachDB):**
   ```
   docker-compose -f stack/docker-compose.yml up cockroachdb
   DATABASE_URL=postgresql://root@localhost:26257/lab tsx src/labd/src/main.ts
   # Token creation/listing/revocation works
   # Server listing works (empty until agents register)
   ```

5. **CLI diagnostics:**
   ```
   labctl doctor
   labctl config list
   labctl version
   ```

That's it. No agent communication, no remote exec, no log streaming, no RBAC, no certificates.

---

## Recommended Next Steps (to make Phase 1 actually work)

### Priority 1: Wire up the agent connection

1. Update `/ws/agent` handler to use `agentRegistry.register()` and `messageRouter.handleMessage()`
2. Create lab-agent daemon binary that uses `AgentConnection` + `CommandExecutor`
3. Create systemd unit for lab-agent

### Priority 2: Certificate Authority

1. Add `@peculiar/x509` dependency
2. Implement CA service: generate root CA, sign CSRs
3. Wire enrollment route to actually sign and return certificates
4. Store CA key encrypted using `EncryptionService`

### Priority 3: RBAC + Audit

1. Create RBAC middleware that checks `Permission` table
2. Create audit middleware that writes to `AuditLog`
3. Apply both to all routes

### Priority 4: CLI commands for labd

1. `labctl get servers` using `LabdClient.getServers()`
2. `labctl exec server/` using `streamExec()`
3. `labctl logs server/` using `streamLogs()`

### Priority 5: Smoke test stack

1. Update `docker-compose.yml` to include labd + 2 agents
2. Write integration tests for enrollment → heartbeat → exec → logs
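
As a rough illustration of Priority 1, step 1, the wiring between the `/ws/agent` handler, the registry, and the router could look something like the sketch below. It is not the labd code. Only the names `agentRegistry.register()` and `messageRouter.handleMessage()` come from this document. Everything else is an assumption made for the example: the `createAgentSession` helper, the `RegistryLike`/`RouterLike` method signatures (including `heartbeat`, `unregister`, and `cleanupAgent`), and the `register`/`heartbeat` message types.

```typescript
// Hypothetical message shape; the real protocol types live in @lab/shared
// and are not shown in this document.
interface AgentMessage {
  type: string;
  serverId?: string;
  hostname?: string;
  [key: string]: unknown;
}

// Assumed minimal interfaces. The real AgentRegistry and MessageRouter
// signatures may differ.
interface RegistryLike {
  register(serverId: string, hostname: string, send: (data: string) => void): void;
  heartbeat(serverId: string): void;
  unregister(serverId: string): void;
}

interface RouterLike {
  handleMessage(serverId: string, message: AgentMessage): void;
  cleanupAgent(serverId: string): void;
}

// Parse a raw frame; return null for anything that is not a JSON object
// carrying a string `type` field.
function parseMessage(raw: string): AgentMessage | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as AgentMessage).type === "string"
    ) {
      return parsed as AgentMessage;
    }
  } catch {
    // Malformed JSON falls through to the null return below.
  }
  return null;
}

// Per-connection session: the WebSocket handler would forward each incoming
// frame to onMessage and the socket's close event to onClose.
function createAgentSession(
  send: (data: string) => void,
  registry: RegistryLike,
  router: RouterLike,
) {
  let serverId: string | null = null;

  return {
    onMessage(raw: string): void {
      const msg = parseMessage(raw);
      if (msg === null) {
        send(JSON.stringify({ type: "error", error: "invalid message" }));
        return;
      }
      // The first valid "register" frame binds this socket to a server.
      if (
        msg.type === "register" &&
        typeof msg.serverId === "string" &&
        typeof msg.hostname === "string"
      ) {
        serverId = msg.serverId;
        registry.register(serverId, msg.hostname, send);
        return;
      }
      // Nothing else is accepted until the agent has registered.
      if (serverId === null) {
        send(JSON.stringify({ type: "error", error: "not registered" }));
        return;
      }
      if (msg.type === "heartbeat") {
        registry.heartbeat(serverId);
        return;
      }
      // Everything else (exec output, log lines, replies) goes to the router.
      router.handleMessage(serverId, msg);
    },

    onClose(): void {
      if (serverId === null) return;
      // Router cleanup runs first, then the agent is dropped from the registry.
      router.cleanupAgent(serverId);
      registry.unregister(serverId);
      serverId = null;
    },
  };
}
```

In `server.ts`, the existing `/ws/agent` handler would presumably create one session per socket and call `session.onMessage(...)` and `session.onClose()` from the socket's message and close events, instead of only logging. Keeping the session free of any WebSocket types makes it testable with plain fakes, which would also help the Priority 5 integration tests.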