- Add scripts/build-asahi-rootfs.sh: downloads upstream Fedora Asahi
  Remix Server, injects lab firstboot script + systemd service + SSH
  keys, repackages with installer_data.json that adds LVM Data partition
- Bastion serves built artifacts at /asahi/repo/* via fastify-static
- installer_data.json prefers built config, falls back to minimal
- Fix __dirname crash in ESM module (use import.meta.url)
- Fix smoke test timeout (was crashing due to __dirname)
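For reference, a minimal sketch of the `import.meta.url` pattern behind the __dirname fix. The helper name `moduleDir` is hypothetical and not taken from the bastion source; it assumes a Node.js runtime on a POSIX host.

```typescript
import { dirname } from "node:path";
import { fileURLToPath } from "node:url";

// Hypothetical helper: ES modules have no __dirname, so derive the
// directory from the module's own file URL instead.
function moduleDir(moduleUrl: string): string {
  return dirname(fileURLToPath(moduleUrl));
}

// Inside an ES module this would be called as: moduleDir(import.meta.url)
```

Taking the URL as a parameter keeps the helper testable without depending on where the calling file lives.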
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VM-based end-to-end test using Fedora cloud image with two disks:
root (20GB) + data (200GB). Verifies the firstboot script creates
labvg with correct LV sizes, mounts volumes, migrates /home content,
sets hostname, creates admin user, and handles reprovision.
Fixes to firstboot script:
- Detect whole disks (not just partitions) for LVM PV
- Handle btrfs subvolume paths in root device detection
- Copy /home content before mounting LV (preserves SSH keys)
- Don't restart sshd (config takes effect on reboot)
- Make swapon and mount operations resilient to failures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add bastion endpoints for provisioning Apple Silicon machines via the
Asahi Linux installer with custom LVM partitioning:
- GET /asahi — wrapper script (curl bastion:8080/asahi | sh)
- GET /asahi/installer_data.json — custom partition layout (60GB root + LVM data)
- GET /asahi/firstboot.sh — first-boot LVM setup matching kickstart layout
- GET /asahi/firstboot.service — systemd oneshot unit
The firstboot script creates labvg with role-specific LVs (var, varlog,
home, srv, rancher, longhorn) and handles reprovision by detecting
existing VGs. Includes 19 new tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove hardcoded devices/directRoutingDevice from Cilium install (let
  Cilium auto-detect per node; needed for heterogeneous NICs like eno1 vs enP7s7)
- Set k8sServiceHost=127.0.0.1 k8sServicePort=6444 so Cilium init
  containers can reach the API via k3s agent's local LB proxy
- Add node-role.kubernetes.io/worker label to agent config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add `labctl provision register` to re-add machines to installed state
without reprovisioning (e.g. after bastion state loss). Full stack:
protocol type, bastion API + WS handler, labd route, CLI command.
Add `labctl app k3s kubeconfig <target>` to fetch kubeconfig from a
k3s node via SSH, rewrite server URL, and merge into ~/.kube/config.
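The server URL rewrite can be sketched roughly as follows. This is an illustration only: the function name `rewriteServerUrl` is hypothetical, and it assumes the fetched kubeconfig points at a loopback address (k3s writes `https://127.0.0.1:6443` by default) and that the API stays on port 6443.

```typescript
// Hypothetical helper: replace the loopback API address in a kubeconfig
// with the address of the node it was fetched from.
function rewriteServerUrl(kubeconfigYaml: string, host: string, port: number = 6443): string {
  // Match only the "server:" line; indentation is captured and preserved.
  const serverLine = /^([ \t]*server:[ \t]*)https:\/\/(?:127\.0\.0\.1|localhost)(?::\d+)?[ \t]*$/m;
  return kubeconfigYaml.replace(serverLine, `$1https://${host}:${port}`);
}
```

A kubeconfig whose server line is not a loopback address is returned unchanged, so the rewrite is safe to apply more than once.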
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add UserKnownHostsFile=/dev/null to SSH in debug and reprovision commands
- Track install state in log follower so it doesn't exit prematurely on "installed"
- Reorder bastion status check to prioritize active queue over stale installed state
- Update .gitignore with task file entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Covers all components (bastion, labd, labctl, agent, modules),
data flow, machine lifecycle, disk layout, kickstart features,
deployment, testing, security, known issues, and planned work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the 2026-03-30 debugging session: root cause (console=ttyS0
on UART-less hardware), what was tried, what was fixed, and remaining
work items.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause found: console=ttyS0,115200n8 causes a 30-second timeout at every
systemd boot phase on hardware without a physical serial UART. Each phase
transition blocks waiting for the non-existent UART.
Changes:
- Remove console=ttyS0 from kickstart bootloader args and %post setup
- Enable Anaconda syslog forwarding (logging --host --port) for install visibility
- Improve syslog IP→MAC resolution (register from kickstart fetch + progress)
- Fix disk auto-detect: default to empty string (not /dev/sda) for NVMe support
- Enable SysRq magic keys (kernel.sysrq=1) for emergency reboot via JetKVM
- Simplify debug command: remove --sshd flag (inst.sshd always available),
  add /debug-setup.sh HTTP endpoint for nc listener setup
- Add labctl provision logs -f (follow mode with polling)
- Add syslog listener unit tests
- Enable syslog log capture test in integration suite
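As an illustration of the console=ttyS0 removal, a sketch of filtering serial-console arguments out of a kernel command line. The function name `stripSerialConsole` is hypothetical; the actual change edits the kickstart template rather than filtering at runtime.

```typescript
// Hypothetical helper: drop console=ttyS<N>[,options] arguments and keep
// everything else (including console=tty0) in the original order.
function stripSerialConsole(cmdline: string): string {
  return cmdline
    .split(/\s+/)
    .filter((arg) => arg !== "" && !/^console=ttyS\d+(,|$)/.test(arg))
    .join(" ");
}
```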
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loads kernel+initrd from bastion HTTP server, mounts root from local
NVMe. Workaround for UEFI firmware bugs that make local disk boot
100x slower. One-time use, auto-clears after boot.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When using `labctl provision debug <target> --sshd`, the rescue
kickstart generates host keys, starts sshd (password: debug) and an nc
listener (port 2323), and reports the IP back to bastion via the
/api/progress callback. Fully self-contained, no mounted filesystem needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New `labctl provision debug <target>` command that PXE boots a machine
into Fedora rescue mode (inst.rescue) for live debugging. Auto-clears
after one boot so next reboot returns to normal.
Adds debug state to BastionState, dispatch routing, API endpoints,
labd command routing, and CLI with rescue workflow guide.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Check the sysfs device path for 'usb' to skip JetKVM virtual media, which
appears as /dev/sda but is not a real install target.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JetKVM virtual media appears as /dev/sda before NVMe initializes.
Now: wait up to 10s for disks, skip removable disks and anything
under 20GB. Fixes "ignoredisk: sda does not exist" on SER9MAX.
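The filter can be modelled as below. This is a sketch only: the real logic runs in the kickstart %pre script, the `DiskInfo` shape and `pickInstallDisk` name are hypothetical, the 20GB threshold is assumed to mean 20 GiB, and the 10s wait for disks is not modelled.

```typescript
// Hypothetical description of one block device as seen in sysfs.
interface DiskInfo {
  name: string;       // e.g. "sda" or "nvme0n1"
  sizeBytes: number;
  removable: boolean; // assumed to mirror /sys/block/<name>/removable
}

// Assumption: "20GB" is treated as 20 GiB.
const MIN_INSTALL_BYTES = 20 * 1024 ** 3;

// Return the first disk that is neither removable nor too small,
// or undefined when nothing qualifies.
function pickInstallDisk(disks: DiskInfo[]): string | undefined {
  const candidates = disks.filter((d) => !d.removable && d.sizeBytes >= MIN_INSTALL_BYTES);
  return candidates.length > 0 ? candidates[0].name : undefined;
}
```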
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
console=ttyS0 on the iPXE kernel line may hang the kernel on
hardware without a physical serial port. Removed from both the discover
and install iPXE scripts. The serial console stays in the bootloader
config for the installed system only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- logging --host blocks Anaconda when the syslog UDP port is not reachable
- nomodeset prevents amdgpu hang on SER9MAX (Radeon 780M)
- JetKVM helper script for device control (status, reboot, power)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Radeon 780M GPU driver initialization hangs during Anaconda boot
on SER9MAX. nomodeset disables kernel modesetting so the installer
doesn't try to initialize the GPU.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When bastion syncs state, labd now upserts discovered and installed
machines into the Server table. /api/machines merges live bastion
state with DB records, so machines survive pod restarts.
Discovered machines get status=discovered with hardware labels.
Installed machines get status=online with hostname, role, IP.
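A rough sketch of the merge, under stated assumptions: records are keyed by lower-cased MAC address and live bastion fields win over stored ones. The `MachineRecord` shape and `mergeMachines` name are hypothetical and do not reflect the actual Server table schema.

```typescript
// Hypothetical record shape; only the statuses named above are modelled.
interface MachineRecord {
  mac: string;
  status: "discovered" | "online";
  hostname?: string;
  role?: string;
  ip?: string;
  labels?: Record<string, string>;
}

// Start from the persisted records, then overlay live bastion state.
// Machines known only to the DB are kept, which is what lets them
// survive a restart that wipes in-memory state.
function mergeMachines(dbRecords: MachineRecord[], live: MachineRecord[]): MachineRecord[] {
  const byMac = new Map<string, MachineRecord>();
  for (const rec of dbRecords) {
    byMac.set(rec.mac.toLowerCase(), rec);
  }
  for (const rec of live) {
    const key = rec.mac.toLowerCase();
    byMac.set(key, { ...byMac.get(key), ...rec });
  }
  return Array.from(byMac.values());
}
```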
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add console=ttyS0,115200n8 to both discover and install iPXE kernel
lines so Anaconda output is visible on serial during install phase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ksvalidator caught the issue: --level=info is not valid for F43.
Correct syntax is just: logging --host=<ip> --port=<port>
Also added ksvalidator syntax check to unit tests — validates
rendered kickstart for all roles (vanilla, worker, infra) against
F43 pykickstart. This catches kickstart syntax errors at test time
instead of during a 12-minute VM install.
Integration test passes: 21/22 (1 skipped: log lines capture).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The kickstart `logging --host` directive stalls the Anaconda install,
likely because a firewall blocks UDP syslog or because Fedora 43 Anaconda
has issues with it. Commented out for now. The syslog listener infrastructure
is in place and ready once the Anaconda/firewall issue is resolved.
Added vitest.integration.config.ts for running integration tests:
pnpm exec vitest run --config vitest.integration.config.ts
All 21 integration tests pass, serial console rsyslog forwarding works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Changed admin user from 'michal' to generic 'lab' user.
SSH key auth works for both root and lab user.
21/22 tests pass (1 skipped: log lines, needs log streamer redesign).
Bisection complete — all features work except background log streamer
which prevents Anaconda from syncing filesystem writes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bisection results:
- Step 2: bastion_log/bastion_error helpers — PASS
- Step 3: ERR trap in %post — PASS
- Step 4: background log streamer — FAIL (breaks boot, NOT included)
- Step 5: serial console on ttyS0 — PASS
The background log streamer (tail -f subprocess in %post) prevents
Anaconda from properly syncing the installed filesystem. This was
the root cause of all boot failures. Will need a different approach
for real-time log streaming.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All accumulated changes to kickstart template, test infrastructure,
and dnsmasq config. None of these produce a clean boot yet — saving
state before reverting to baseline for bisection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- VMs get serial console on TCP (PXE: port 4555, ISO: port 4556)
- serialExec() helper: runs commands via telnet when SSH/network is down
- PXE test: on SSH failure, dumps hostname, IP, NetworkManager, sshd,
  failed units, and fstab via serial console before failing
- Kickstart enables serial-getty@ttyS0 for auto-login on serial
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kickstart %post now restores network-first EFI boot order (undoes
  Anaconda's disk-first default). Grep pattern includes HTTP boot entries.
- Test force-restarts VM after install so OVMF rereads NVRAM.
- VM successfully network-boots after install, hits /dispatch, bastion
  returns exit (local boot). Confirmed in test logs.
- nofail on /boot/efi fstab entry prevents emergency mode.
- Remaining: Fedora disk boot after iPXE exit may still fail.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARM integration test:
- arm-iso-provision.test.ts: aarch64 VM boots from bastion-generated ISO
- Uses QEMU aarch64 emulation (slow but validates the R1 scenario)
- Generous timeouts for emulated boot (15min discovery, 60min install)
- test-provision.sh updated: `sudo ./scripts/test-provision.sh arm`
VM boot fixes:
- setBootDisk() preserves UEFI loader/nvram when switching to disk boot
- /boot/efi mount gets nofail in fstab (prevents emergency mode in VMs)
- chronyd enable uses || true (fails in kickstart chroot)
- createIsoVm supports arch parameter for ARM VMs
Note: SSH-after-reboot in OVMF VMs still fails — OVMF doesn't respect
efibootmgr changes and loops PXE/HTTP Boot. Real hardware works fine.
The install flow itself (discovery → kickstart → complete) is validated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.
Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts
Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
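A per-MAC ring buffer along these lines might look like the sketch below. The class names are hypothetical (this is not the InstallLogBuffer source), file persistence is not modelled, and only the `log_lines`/`log_total` field names are taken from the API description above.

```typescript
// Fixed-capacity circular buffer of log lines; the oldest lines are overwritten.
class LineRingBuffer {
  private readonly capacity: number;
  private readonly slots: string[] = [];
  private next = 0;  // slot the next line is written to
  private total = 0; // lines ever pushed, including overwritten ones

  constructor(capacity: number = 2000) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(line: string): void {
    this.slots[this.next] = line;
    this.next = (this.next + 1) % this.capacity;
    this.total += 1;
  }

  // Retained lines, oldest first, plus the running total.
  snapshot(): { log_lines: string[]; log_total: number } {
    const lines =
      this.total <= this.capacity
        ? this.slots.slice(0, this.total)
        : this.slots.slice(this.next).concat(this.slots.slice(0, this.next));
    return { log_lines: lines, log_total: this.total };
  }
}

// One ring buffer per MAC address, keyed case-insensitively.
class PerMacLogBuffer {
  private readonly buffers = new Map<string, LineRingBuffer>();
  private readonly capacity: number;

  constructor(capacity: number = 2000) {
    this.capacity = capacity;
  }

  append(mac: string, lines: string[]): void {
    const key = mac.toLowerCase();
    let buf = this.buffers.get(key);
    if (!buf) {
      buf = new LineRingBuffer(this.capacity);
      this.buffers.set(key, buf);
    }
    for (const line of lines) buf.push(line);
  }

  get(mac: string): { log_lines: string[]; log_total: number } {
    const buf = this.buffers.get(mac.toLowerCase());
    return buf ? buf.snapshot() : { log_lines: [], log_total: 0 };
  }
}
```

Keeping a running total alongside the retained lines lets a client tell how many lines were dropped once the buffer wraps.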
dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility
Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)
201 unit tests passing (11 new).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- `lab init bastion standalone start` now runs in background by default
- `--foreground` flag for running in foreground (debugging/containers)
- Shows startup output then detaches with PID + log path
- Status command uses /proc check instead of kill -0 (works cross-user)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Require root when dnsmasq is needed (clear error message)
- Handle stale PID files owned by different user (remove + recreate)
- Create bastion dir with 755 permissions
- 3 new PID file tests (30 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- build-rpm.sh: --arch flag for targeting x86_64 or arm64, --all for both
  Uses bun cross-compile with --target=bun-linux-x64/arm64
- build-bastion.sh: --arch flag for Docker platform targeting
- release.sh: builds both architectures by default
- CI: builds + publishes RPM/DEB for both architectures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scripts/release.sh: full release orchestration (build, publish, install)
- deploy/k3s/: Deployment, ConfigMap, PVC, Namespace with kustomize
  hostNetwork for dnsmasq, NET_ADMIN caps, local-path PVC
- Infra role gets /var/lib/rancher partition (20GB, preserved on reprovision)
  for k3s etcd data persistence across reinstalls
- Infra %post installs k3s server (INSTALL_K3S_SKIP_START=true)
- 5 new kickstart tests (27 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Progress callbacks from kickstart now show:
    ◆ 78:55:36:08:35:14 partitioning -- preparing disk layout
    ◆◆◆ 78:55:36:08:35:14 post-install -- configuring system
    ✔ 78:55:36:08:35:14 complete -- ready at 10.0.1.88
        ssh michal@10.0.1.88
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Open firewall ports (dhcp, tftp, http, 4011) on bastion start
- Close firewall ports on bastion shutdown
- Auto-detect firewall zone for interface
- Fix reprovision SSH to use execFileSync with explicit key path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>