Radeon 780M GPU driver initialization hangs during Anaconda boot
on SER9MAX. nomodeset disables kernel modesetting so the installer
doesn't try to initialize the GPU.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When bastion syncs state, labd now upserts discovered and installed
machines into the Server table. /api/machines merges live bastion
state with DB records, so machines survive pod restarts.
Discovered machines get status=discovered with hardware labels.
Installed machines get status=online with hostname, role, IP.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add console=ttyS0,115200n8 to both discover and install iPXE kernel
lines so Anaconda output is visible on serial during install phase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ksvalidator caught the issue: --level=info is not valid for F43.
Correct syntax is just: logging --host=<ip> --port=<port>
Also added ksvalidator syntax check to unit tests — validates
rendered kickstart for all roles (vanilla, worker, infra) against
F43 pykickstart. This catches kickstart syntax errors at test time
instead of during a 12-minute VM install.
Integration test passes: 21/22 (1 skipped: log lines capture).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The kickstart `logging --host` directive stalls Anaconda install —
likely firewall blocks UDP syslog or Fedora 43 Anaconda has issues
with it. Commented out for now. Syslog listener infrastructure is
in place and ready once we resolve the Anaconda/firewall issue.
Added vitest.integration.config.ts for running integration tests:
pnpm exec vitest run --config vitest.integration.config.ts
All 21 integration tests pass, serial console rsyslog forwarding works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bisection results:
- Step 2: bastion_log/bastion_error helpers — PASS
- Step 3: ERR trap in %post — PASS
- Step 4: background log streamer — FAIL (breaks boot, NOT included)
- Step 5: serial console on ttyS0 — PASS
The background log streamer (tail -f subprocess in %post) prevents
Anaconda from properly syncing the installed filesystem. This was
the root cause of all boot failures. Will need a different approach
for real-time log streaming.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All accumulated changes to kickstart template, test infrastructure,
and dnsmasq config. None of these produce a clean boot yet — saving
state before reverting to baseline for bisection.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- VMs get serial console on TCP (PXE: port 4555, ISO: port 4556)
- serialExec() helper: runs commands via telnet when SSH/network is down
- PXE test: on SSH failure, dumps hostname, IP, NetworkManager, sshd,
failed units, and fstab via serial console before failing
- Kickstart enables serial-getty@ttyS0 for auto-login on serial
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kickstart %post now restores network-first EFI boot order (undoes
Anaconda's disk-first default). Grep pattern includes HTTP boot entries.
- Test force-restarts VM after install so OVMF rereads NVRAM.
- VM successfully network-boots after install, hits /dispatch, bastion
returns exit (local boot). Confirmed in test logs.
- nofail on /boot/efi fstab entry prevents emergency mode.
- Remaining: Fedora disk boot after iPXE exit may still fail.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARM integration test:
- arm-iso-provision.test.ts: aarch64 VM boots from bastion-generated ISO
- Uses QEMU aarch64 emulation (slow but validates the R1 scenario)
- Generous timeouts for emulated boot (15min discovery, 60min install)
- test-provision.sh updated: `sudo ./scripts/test-provision.sh arm`
VM boot fixes:
- setBootDisk() preserves UEFI loader/nvram when switching to disk boot
- /boot/efi mount gets nofail in fstab (prevents emergency mode in VMs)
- chronyd enable uses || true (fails in kickstart chroot)
- createIsoVm supports arch parameter for ARM VMs
Note: SSH-after-reboot in OVMF VMs still fails — OVMF doesn't respect
efibootmgr changes and loops PXE/HTTP Boot. Real hardware works fine.
The install flow itself (discovery → kickstart → complete) is validated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.
Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts
Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility
Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)
201 unit tests passing (11 new).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- `lab init bastion standalone start` now runs in background by default
- `--foreground` flag for running in foreground (debugging/containers)
- Shows startup output then detaches with PID + log path
- Status command uses /proc check instead of kill -0 (works cross-user)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Require root when dnsmasq is needed (clear error message)
- Handle stale PID files owned by different user (remove + recreate)
- Create bastion dir with 755 permissions
- 3 new PID file tests (30 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scripts/release.sh: full release orchestration (build, publish, install)
- deploy/k3s/: Deployment, ConfigMap, PVC, Namespace with kustomize
hostNetwork for dnsmasq, NET_ADMIN caps, local-path PVC
- Infra role gets /var/lib/rancher partition (20GB, preserved on reprovision)
for k3s etcd data persistence across reinstalls
- Infra %post installs k3s server (INSTALL_K3S_SKIP_START=true)
- 5 new kickstart tests (27 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Progress callbacks from kickstart now show:
◆ 78:55:36:08:35:14 partitioning -- preparing disk layout
◆◆◆ 78:55:36:08:35:14 post-install -- configuring system
✔ 78:55:36:08:35:14 complete -- ready at 10.0.1.88
ssh michal@10.0.1.88
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Open firewall ports (dhcp, tftp, http, 4011) on bastion start
- Close firewall ports on bastion shutdown
- Auto-detect firewall zone for interface
- Fix reprovision SSH to use execFileSync with explicit key path
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables running the TS bastion without dnsmasq for testing.
VM-tested: SSH works, partitions correct, k3s prereqs configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full rewrite of the bash bastion.sh into a TypeScript application:
- Fastify HTTP server with typed routes (dispatch, kickstart, API)
- Commander CLI (serve, install, list, reprovision)
- Kickstart templates as TypeScript template literals (no more heredoc hell)
- dnsmasq management via execa subprocess
- Merged machine list view (hardware + install info in one table)
- Containerized via podman-compose (Dockerfile + docker-compose.yml)
- All partition logic preserved (LVM, reprovision detection, role-based)
Not yet tested end-to-end — needs VM validation before replacing bash version.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>