michal/lab

Author	SHA1	Message	Date
Michal	cdf3b5c045	fix(labd): wire v2.0 Phase 1 routes into createApp + smoke tests The v2.0 Phase 1 commit (`04faa07`) added AuthService, RbacService, ResourceStore, AuditService, the bearer auth middleware, and the v2-auth/environments/resources route files, but createApp() never registered any of them. They sat in the codebase as dead code: a running labd would 404 on /api/auth/login, /api/resources, /api/events, etc. Wiring (server.ts) - Instantiate AuthService, RbacService, ResourceStore, AuditService at app creation. Cast DbClient to PrismaClient (the runtime db is a real PrismaClient; DbClient is a structural shim). - Start AuditService timer, register an onClose hook to stop it on shutdown so we never lose the last batch. - Register v2 routes inside a Fastify scope with the bearer-auth middleware as preHandler. v1 routes (registered on the root scope) are unaffected so existing labd clients keep working. AuditService (audit.ts) - Expose flushPending() so tests can deterministically observe events without leaning on the 5-second flush interval. Implementation delegates to the existing private flush(). Smoke tests (v2-smoke.test.ts, 11 cases) - Bootstrap: first POST /api/auth/login with empty users creates the admin (role=ADMIN, hashed password), returns a 64-hex token, marks isBootstrap=true, emits an auth_bootstrap audit event. Second login uses the normal flow. Wrong password returns 401 and audits failure. Missing credentials returns 400. - RBAC: missing/empty/invalid bearer tokens return 401. ADMIN role bypasses RBAC. A non-admin with no role bindings gets 403 with "no matching role binding". A user with an env-A binding is denied for env-B resources. - Audit: bootstrap event is queryable via /api/events?correlation=... Explicit parent/child chain (shared correlationId, parentEventId) is preserved across emits. All 246 workspace tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:18:18 +01:00
michal	f3c50f71ef	Merge pull request 'feat: v2.0 Phase 1 foundation + bastion-restart identity fix + Dockerfile + BASTION_DIR' (#14) from feat/v2-phase1-foundation into main	2026-05-05 21:10:25 +00:00
Michal	98b0ccc6c9	feat(cli): honor BASTION_DIR env var as default for --dir bastion serve/stop default for --dir was hardcoded to /tmp/lab-bastion. Now reads BASTION_DIR from env if set, so a deployed bastion daemon can run from a persistent directory without callers having to pass --dir on every invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:09:24 +01:00
Michal	37a3b51e57	build(labd): include @lab/core in the Dockerfile build chain The v2.0 Phase 1 commit (`04faa07`) introduced the @lab/core package but the labd Dockerfile still only copied @lab/shared and @lab/labd, so the container build would fail to resolve @lab/core imports. Both stages updated: - Builder: copy @lab/core package.json/tsconfig + src, add it to the build order between @lab/shared and @lab/labd. - Runtime: copy @lab/core dist and package.json into the final image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:09:24 +01:00
Michal	d6e1f3c74d	fix(labd): preserve machine identity across bastion restarts The worker0-k8s0 bug: when labd restarts, the in-memory installed map is lost. The next DHCP/PXE re-discovery for that MAC ran an upsert that wrote status="discovered", silently downgrading the DB record from "online" or "offline" and erasing the machine's known hostname/role identity from the CLI view. - server.ts: drop status="discovered" from the upsert update branch so re-discovery cannot downgrade an installed record. - routes/bastions.ts (/api/machines): when the DB knows a real hostname+role for a MAC currently only in live.discovered, promote it back to live.installed so the CLI sees the right state. Also reordered the live-vs-DB fallback so DB online/offline maps to live.installed and the discovered branch is the else. - tests: 3 new vitest cases covering promotion, fresh-discovery, and unknown-MAC fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 22:09:24 +01:00
Michal	52e831b8c1	Merge branch 'main' into feat/v2-phase1-foundation	2026-05-05 22:06:34 +01:00
michal	f5af24699a	Merge pull request 'fix(k3s): audit logs via journald + etcd recovery' (#13) from fix/k3s-audit-via-journald into main	2026-05-05 20:29:51 +00:00
Michal	dd92147341	fix(k3s): route audit logs through journald, codify etcd member recovery Two changes prompted by today's etcd raft panic on worker1-k8s0 (tocommit out of range, lost-write on follower) and the cascading disk pressure that surfaced underneath it. Audit logs to journald - kube-apiserver now uses audit-log-path=- so audit events flow to k3s.service stdout and into journald instead of growing files in /var/log/kubernetes. The previous setup combined apiserver's internal rotation with a logrotate *.log glob that double-rotated the rotated files into permanent orphans (observed: 7+ GB). - New journald-limits operation writes a SystemMaxUse=2G drop-in so audit volume cannot fill /var/log even under bursty load. - log-rotation operation repurposed to decommission the obsolete logrotate rule and reap leftover audit files. Idempotent: no-op on fresh installs. Etcd member recovery - New recoverEtcdMember(broken, peer, hostname) codifies the documented k3s recovery: stop k3s, etcdctl member remove, wipe /var/lib/rancher/k3s/server/{db,tls,cred}, restart, poll for rejoin. Refuses to operate when cluster size < 3 to preserve quorum. Tests - 7 new unit tests covering both decommission paths and the recovery procedure (54 total, all green). - install.test.ts asserts the file-based audit args are gone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 21:29:16 +01:00
Michal	04faa079e2	feat: v2.0 Phase 1 foundation — @lab/core, auth, RBAC, audit, resource store New packages: - @lab/core: Resource types, Output<T> (Pulumi), audit event types, auth types, environment/account types, resource kind registry New Prisma schema (mcpctl pattern): - User (email/password/bcrypt), Session (bearer tokens), Group, GroupMember - ServiceAccount, RbacDefinition (JSON subjects + roleBindings) - AuditEvent (correlation IDs, causal chains, fire-and-forget batching) - Environment, Account (driver config, Infisical secret path), Binding - Resource (generic, kind/name/env unique, origin/managedBy tracking) - Secret, Fleet, FleetMember, GitSource - Keeps v1.0 models: Server, Agent, Bastion, Cluster, JoinToken New services: - AuthService: bearer token login, bootstrap (first login creates admin), session management with 30-day expiry - RbacService: environment-scoped permission checks, group membership, role hierarchy (admin > edit > view) - AuditService: fire-and-forget event collection, batch 50 / flush 5s, correlation IDs for causal chains - ResourceStore: CRUD with origin/managedBy, RBAC-enforced routes New routes: - POST /api/auth/login, POST /api/auth/logout (bearer token auth) - GET/POST/PUT/DELETE /api/resources (RBAC-enforced CRUD) - GET/POST /api/environments, GET/POST /api/accounts - POST /api/accounts/bind, GET /api/bindings - GET /api/events (audit query with --last, --kind, --env, --correlation) New middleware: - Bearer token auth (validates Authorization header, resolves user identity) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-02 01:42:28 +01:00
michal	95c99cb4d5	Merge pull request 'docs: CLAUDE.md routing rules + TODOS.md from v2.0 review' (#12) from feat/recheck-and-fixes into main Reviewed-on: #12	2026-04-02 00:31:44 +00:00
Michal	2eda926d4c	docs: add TODOS.md from v2.0 CEO review Project tracking for labctl v2.0 platform design. Includes P1 (arch doc update), P2 (SSH emergency mode, Prometheus metrics), and P3 (graph viz, import, secrets rotation) items from the CEO and eng review sessions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-02 01:29:30 +01:00
Michal	70258a0cc3	Merge remote-tracking branch 'origin/main' into feat/recheck-and-fixes	2026-04-02 01:27:45 +01:00
Michal	e9944c5413	chore: add gstack skill routing rules to CLAUDE.md	2026-04-01 23:56:47 +01:00
michal	22e2946e95	Merge pull request 'feat: provision recheck, hardware info preservation, ISO boot fixes' (#11) from feat/recheck-and-fixes into main Reviewed-on: #11	2026-04-01 17:11:33 +00:00
Michal	9ddab24931	feat: provision recheck, hardware info preservation, ISO boot fixes - Add `labctl provision recheck` to refresh hardware info via SSH - Preserve hardware info in InstalledInfo when install completes - Fix /ks-auto: run nested %pre scripts from included kickstarts - Add command-discover WebSocket routing for hw info updates - Fix k3s join: clean stale TLS/cred when joining existing cluster - Add --tls-verify=false for internal HTTP registry pushes - Add fix-ssh-root.sh script for root SSH access on all nodes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 17:59:39 +01:00
Michal	ae91f2895e	feat: dynamic /ks-auto kickstart for ISO boot (R1 ARM support) Add state-aware kickstart dispatch for machines that boot from ISO (no PXE/network at UEFI level). Replaces hardcoded discover.ks. - /ks-auto: %pre detects MAC, queries /api/machine-state/<mac>, writes discover or install kickstart to /tmp/dynamic.ks, main body %include's it - /api/machine-state/<mac>: simple state endpoint returning unknown|discovered|queued|installing|installed|debug - ISO kernel cmdline updated: discover.ks → ks-auto - Handles: discovery (first boot), install (queued), debug modes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 16:17:08 +01:00
Michal	06fc40a857	fix: k3s install automation — skip Cilium on join, Longhorn via server, default root user - Skip Cilium install for joining servers (already in cluster via daemonset) - Longhorn annotation for workers: SSH to server node from CLI to apply kubectl annotation (workers don't have kubectl access) - Default SSH user for k3s/app commands changed to 'root' (operations need root privileges, using 'lab' user broke installs) - k3s server config: cluster-init for initial server, server+token for joins Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 16:02:19 +01:00
Michal	a68d6d617e	feat: k3s cluster-init for etcd HA, fix Cilium duplicate install - Server config now uses cluster-init: true for initial server (enables embedded etcd). Joining servers get server: + token: in config. - Cilium install already checks for existing installation, so joining servers skip it gracefully (the "release name in use" error is non-fatal) Cluster rebuilt as etcd HA: worker0-k8s0 control-plane,etcd (initial server, cluster-init) worker1-k8s0 control-plane,etcd (joined server, Mac Studio aarch64) spark-2935 worker (DGX Spark, aarch64) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 15:53:18 +01:00
Michal	c49a650888	fix: firstboot fstab handling — no duplicates, compatible with Asahi sed - Replace sed with grep -v / awk for fstab manipulation (Asahi Fedora's sed doesn't support \| delimiter or \? quantifier) - Use idempotent write_lab_fstab function: removes all old entries first, comments out conflicting btrfs subvol entries, adds fresh LVM entries - Fix sed for SSH hardening: use #* instead of \? (POSIX compatible) - Tested on Mac Studio: no duplicate fstab entries after multiple runs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 15:40:29 +01:00
Michal	87e09af941	fix: default admin user to 'lab', case-insensitive OS detection for iSCSI - Firstboot script defaults admin user to 'lab' instead of bastion's config.adminUser (which was 'michal' from host system) - iSCSI OS detection uses case-insensitive match for 'fedora' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 15:13:53 +01:00
Michal	6f13e284fd	fix: firstboot script auto-detects hostname and MAC, no query params needed The firstboot script now auto-detects hostname (from hostnamectl) and MAC address (from first UP interface) at runtime. No URL query parameters required — just `curl bastion/asahi/firstboot.sh | sudo bash`. Fixes the shell escaping issue where `&` in query params broke curl piping. Updated labctl provision asahi instructions accordingly. Tested on Mac Studio (worker1-k8s0): hostname, MAC, and bastion registration all auto-detected correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 15:05:25 +01:00
Michal	6c963a15bd	fix: firstboot reprovision path now runs hostname, user, and registration Previously the reprovision path exited early after re-mounting LVs, skipping hostname setup, admin user creation, metadata, and bastion registration. Now both paths fall through to the common post-setup code. Tested on Mac Studio (worker1-k8s0) — reprovision + self-registration confirmed working via curl | bash pipe. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 09:59:02 +01:00
michal	8c737d163d	Merge pull request 'feat: Asahi Linux provisioning for Apple Silicon' (#10) from feat/asahi-provisioning into main	2026-03-31 23:30:41 +00:00
Michal	17bae7ddbf	fix: pre-download rootfs ZIP to avoid macOS Python HTTP streaming issues The Asahi installer's urlcache.py fails with AssertionError on macOS when streaming ZIP via HTTP Range requests from Fastify. Fix: download the ZIP with curl first (reliable on macOS), then set REPO_BASE to the local directory so the installer opens it as a local file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:30:29 +01:00
Michal	bb8f37ef7d	feat: iSCSI, Longhorn disk labels, labctl asahi command, ZIP32 fix k3s host prep: - Add iSCSI initiator install+enable (Fedora: iscsi-initiator-utils, Ubuntu: open-iscsi) — required by Longhorn - Add Longhorn disk label to k3s server+agent configs - Add Longhorn disk annotation operation in post-install hardening CLI: - Add `labctl provision asahi` command with interactive install guide - Change default SSH user from "michal" to "lab" in all commands - Change admin user in bastion progress callback to "lab" Asahi provisioning fixes: - Download installer_data.json locally (installer reads it as file) - Use REPO_BASE to serve upstream ZIP from bastion (LAN speed) - Fix ZIP32 vs ZIP64: serve original upstream ZIP unmodified (our repackaged ZIP used ZIP64 which breaks Asahi urlcache) - Add /data/asahi-repo fallback path for k3s container PVC mount - Deploy script syncs asahi-repo to bastion pod after deployment Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 23:32:38 +01:00
Michal	a8dc79bc5a	feat: Asahi validation tests, rootfs build fixes, shellcheck-clean scripts - Add 16 validation tests: shellcheck (3 roles), installer_data.json schema (8), Python parser validation, ZIP structure (3), rootfs mount - Fix empty SSH keys generating invalid bash (SC1073) - Fix __dirname crash in ESM modules (use import.meta.url) - Fix rootfs build: mkdir -p before writing, correct binary paths - Add .gitignore for large build artifacts (.asahi-cache, *.zip) - Bump smoke test timeout for additional static plugin registration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 13:22:24 +01:00
Michal	ad76c74020	fix: rootfs build script — mkdir before write, fix package path checks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 03:26:26 +01:00
Michal	6807632d46	feat: Asahi rootfs build pipeline + serve from bastion - Add scripts/build-asahi-rootfs.sh: downloads upstream Fedora Asahi Remix Server, injects lab firstboot script + systemd service + SSH keys, repackages with installer_data.json that adds LVM Data partition - Bastion serves built artifacts at /asahi/repo/* via fastify-static - installer_data.json prefers built config, falls back to minimal - Fix __dirname crash in ESM module (use import.meta.url) - Fix smoke test timeout (was crashing due to __dirname) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 03:20:12 +01:00
Michal	53265bb18c	test: integration test for Asahi firstboot LVM setup VM-based end-to-end test using Fedora cloud image with two disks: root (20GB) + data (200GB). Verifies the firstboot script creates labvg with correct LV sizes, mounts volumes, migrates /home content, sets hostname, creates admin user, and handles reprovision. Fixes to firstboot script: - Detect whole disks (not just partitions) for LVM PV - Handle btrfs subvolume paths in root device detection - Copy /home content before mounting LV (preserves SSH keys) - Don't restart sshd (config takes effect on reboot) - Make swapon and mount operations resilient to failures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 03:07:38 +01:00
Michal	863c7f2b83	feat: Asahi Linux provisioning for Apple Silicon (Mac Studio) Add bastion endpoints for provisioning Apple Silicon machines via the Asahi Linux installer with custom LVM partitioning: - GET /asahi — wrapper script (curl bastion:8080/asahi | sh) - GET /asahi/installer_data.json — custom partition layout (60GB root + LVM data) - GET /asahi/firstboot.sh — first-boot LVM setup matching kickstart layout - GET /asahi/firstboot.service — systemd oneshot unit The firstboot script creates labvg with role-specific LVs (var, varlog, home, srv, rancher, longhorn) and handles reprovision by detecting existing VGs. Includes 19 new tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 02:46:27 +01:00
michal	906f93f6f2	Merge pull request 'fix: Cilium multi-node support' (#9) from fix/cilium-multi-node into main	2026-03-31 00:36:17 +00:00
Michal	aea28b5a0f	fix: Cilium multi-node support — auto-detect NIC, k3s agent API port, worker label - Remove hardcoded devices/directRoutingDevice from Cilium install (let Cilium auto-detect per node — needed for heterogeneous NICs like eno1 vs enP7s7) - Set k8sServiceHost=127.0.0.1 k8sServicePort=6444 so Cilium init containers can reach the API via k3s agent's local LB proxy - Add node-role.kubernetes.io/worker label to agent config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 01:35:51 +01:00
michal	f3f0ea48e7	Merge pull request 'feat: provision register + k3s kubeconfig' (#8) from feat/register-and-kubeconfig into main	2026-03-31 00:16:06 +00:00
Michal	49d747db98	feat: provision register command and k3s kubeconfig merge Add `labctl provision register` to re-add machines to installed state without reprovisioning (e.g. after bastion state loss). Full stack: protocol type, bastion API + WS handler, labd route, CLI command. Add `labctl app k3s kubeconfig <target>` to fetch kubeconfig from a k3s node via SSH, rewrite server URL, and merge into ~/.kube/config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 01:15:31 +01:00
michal	8635da08a6	Merge pull request 'fix: reprovision workflow bugs' (#7) from fix/reprovision-bugs into main Reviewed-on: #7	2026-03-30 22:44:44 +00:00
Michal	6a5f23c0f5	fix: reprovision workflow bugs — SSH host key warnings, log following, status priority - Add UserKnownHostsFile=/dev/null to SSH in debug and reprovision commands - Track install state in log follower so it doesn't exit prematurely on "installed" - Reorder bastion status check to prioritize active queue over stale installed state - Update .gitignore with task file entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 22:59:45 +01:00
michal	63cc033e3e	Merge pull request 'docs: comprehensive architecture document' (#6) from docs/architecture into main	2026-03-30 16:31:41 +00:00
Michal	d7a25066bd	docs: comprehensive architecture document Covers all components (bastion, labd, labctl, agent, modules), data flow, machine lifecycle, disk layout, kickstart features, deployment, testing, security, known issues, and planned work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 17:31:29 +01:00
michal	a0f6161533	Merge pull request 'docs: PXE boot debugging post-mortem' (#5) from docs/pxe-boot-debugging into main	2026-03-30 03:01:12 +00:00
Michal	87c1a34232	docs: PXE boot debugging post-mortem — serial console root cause Documents the 2026-03-30 debugging session: root cause (console=ttyS0 on UART-less hardware), what was tried, what was fixed, and remaining work items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 04:00:51 +01:00
michal	84afe7d5e4	Merge pull request 'feat: PXE debug boot mode for rescue/diagnostics' (#4 ) from wip/ks-debugging into main Some checks failed CI/CD / lint (push) Failing after 9s Details CI/CD / test (push) Failing after 10s Details CI/CD / typecheck (push) Failing after 22s Details CI/CD / build (push) Has been skipped Details CI/CD / publish-rpm (push) Has been skipped Details CI/CD / publish-deb (push) Has been skipped Details	2026-03-30 02:59:34 +00:00
Michal	0a4916d3c9	fix: remove serial console (root cause of 30s boot delay), enable syslog logging, disk auto-detect Some checks failed CI/CD / typecheck (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 9s Details CI/CD / lint (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Root cause found: console=ttyS0,115200n8 causes 30-second timeout at every systemd boot phase on hardware without a physical serial UART. Each phase transition blocks waiting for the non-existent UART. Changes: - Remove console=ttyS0 from kickstart bootloader args and %post setup - Enable Anaconda syslog forwarding (logging --host --port) for install visibility - Improve syslog IP→MAC resolution (register from kickstart fetch + progress) - Fix disk auto-detect: default to empty string (not /dev/sda) for NVMe support - Enable SysRq magic keys (kernel.sysrq=1) for emergency reboot via JetKVM - Simplify debug command: remove --sshd flag (inst.sshd always available), add /debug-setup.sh HTTP endpoint for nc listener setup - Add labctl provision logs -f (follow mode with polling) - Add syslog listener unit tests - Enable syslog log capture test in integration suite Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 03:58:51 +01:00
Michal	a4a4840930	feat: debug --pxe-boot flag, boot installed system via PXE Some checks failed CI/CD / lint (pull_request) Failing after 10s Details CI/CD / test (pull_request) Failing after 10s Details CI/CD / typecheck (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Loads kernel+initrd from bastion HTTP server, mounts root from local NVMe. Workaround for UEFI firmware bugs that make local disk boot 100x slower. One-time use, auto-clears after boot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 00:49:44 +01:00
Michal	8da947a1c3	fix: use %pre instead of %post for debug --sshd (rescue mode skips %post) Some checks failed CI/CD / typecheck (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 10s Details CI/CD / lint (pull_request) Failing after 23s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 00:25:19 +01:00
Michal	92c65b4672	fix: generic rescue instructions in debug command output Some checks failed CI/CD / typecheck (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 9s Details CI/CD / lint (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 23:59:38 +01:00
Michal	3835fefba1	feat: debug --sshd flag, auto SSH + nc listener + IP callback Some checks failed CI/CD / lint (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 9s Details CI/CD / typecheck (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details When using `labctl provision debug <target> --sshd`, the rescue kickstart generates host keys, starts sshd (pw: debug) and nc listener (port 2323), and reports the IP back to bastion via /api/progress callback. Fully self-contained, no mounted FS needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 23:54:22 +01:00
Michal	d7a59665ad	fix: route command-debug through bastion WebSocket handler Some checks failed CI/CD / typecheck (pull_request) Failing after 9s Details CI/CD / lint (pull_request) Failing after 23s Details CI/CD / test (pull_request) Failing after 6m53s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 23:01:16 +01:00
Michal	82ca93f4d7	fix: add debug field to inline BastionState in labd server Some checks failed CI/CD / typecheck (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 8s Details CI/CD / lint (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 22:54:02 +01:00
Michal	52150fd955	fix: add command-debug to LabdBastionMessage protocol types Some checks failed CI/CD / lint (pull_request) Failing after 9s Details CI/CD / test (pull_request) Failing after 9s Details CI/CD / typecheck (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 22:42:52 +01:00
Michal	e87edfcfbd	feat: PXE debug boot mode for rescue/diagnostics Some checks failed CI/CD / lint (pull_request) Failing after 11s Details CI/CD / test (pull_request) Failing after 9s Details CI/CD / typecheck (pull_request) Failing after 22s Details CI/CD / build (pull_request) Has been skipped Details CI/CD / publish-rpm (pull_request) Has been skipped Details CI/CD / publish-deb (pull_request) Has been skipped Details New `labctl provision debug <target>` command that PXE boots a machine into Fedora rescue mode (inst.rescue) for live debugging. Auto-clears after one boot so next reboot returns to normal. Adds debug state to BastionState, dispatch routing, API endpoints, labd command routing, and CLI with rescue workflow guide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 22:25:44 +01:00

94 Commits