- VMs get serial console on TCP (PXE: port 4555, ISO: port 4556)
- serialExec() helper: runs commands via telnet when SSH/network is down
- PXE test: on SSH failure, dumps hostname, IP, NetworkManager, sshd,
failed units, and fstab via serial console before failing
- Kickstart enables serial-getty@ttyS0 for auto-login on serial
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kickstart %post now restores network-first EFI boot order (undoes
Anaconda's disk-first default). The grep pattern that selects network
entries also matches HTTP Boot entries.
- Test force-restarts VM after install so OVMF rereads NVRAM.
- VM successfully network-boots after install and hits /dispatch; the bastion
returns an iPXE exit script (local boot). Confirmed in test logs.
- nofail on /boot/efi fstab entry prevents emergency mode.
- Remaining: Fedora disk boot after iPXE exit may still fail.
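
The network-first reordering described above can be sketched as follows. This is a minimal illustration, not the actual %post logic: the function names and the label pattern are assumptions, and only the general shape of `efibootmgr` output (a `BootOrder:` line plus `BootXXXX*` entries) is relied on.

```typescript
// Hypothetical sketch: compute a network-first boot order from efibootmgr output.
interface BootEntry {
  id: string;    // four hex digits, e.g. "0001"
  label: string; // firmware description, e.g. "UEFI PXEv4 (MAC:...)"
}

// Assumption: network entries are recognisable by PXE/HTTP/Network in the label.
const NETWORK_PATTERN = /PXE|HTTP|Network/i;

function parseEfibootmgr(output: string): { order: string[]; entries: BootEntry[] } {
  const entries: BootEntry[] = [];
  let order: string[] = [];
  for (const line of output.split("\n")) {
    const orderMatch = line.match(/^BootOrder:\s*(.+)$/);
    if (orderMatch) {
      order = (orderMatch[1] ?? "").split(",").map((s) => s.trim());
      continue;
    }
    const entryMatch = line.match(/^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$/);
    if (entryMatch) {
      entries.push({ id: entryMatch[1] ?? "", label: entryMatch[2] ?? "" });
    }
  }
  return { order, entries };
}

// Keeps the relative order inside each group; network entries move to the front.
function networkFirstOrder(output: string): string[] {
  const { order, entries } = parseEfibootmgr(output);
  const networkIds = new Set(
    entries.filter((e) => NETWORK_PATTERN.test(e.label)).map((e) => e.id),
  );
  const network = order.filter((id) => networkIds.has(id));
  const rest = order.filter((id) => !networkIds.has(id));
  return [...network, ...rest];
}
```

The resulting list, joined with commas, is the kind of value that `efibootmgr -o` accepts as a new boot order.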
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ARM integration test:
- arm-iso-provision.test.ts: aarch64 VM boots from bastion-generated ISO
- Uses QEMU aarch64 emulation (slow but validates the R1 scenario)
- Generous timeouts for emulated boot (15min discovery, 60min install)
- test-provision.sh updated: `sudo ./scripts/test-provision.sh arm`
VM boot fixes:
- setBootDisk() preserves UEFI loader/nvram when switching to disk boot
- /boot/efi mount gets nofail in fstab (prevents emergency mode in VMs)
- chronyd enable is followed by || true (the enable step fails in the kickstart chroot)
- createIsoVm supports arch parameter for ARM VMs
Note: SSH-after-reboot in OVMF VMs still fails — OVMF doesn't respect
efibootmgr changes and loops PXE/HTTP Boot. Real hardware works fine.
The install flow itself (discovery → kickstart → complete) is validated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.
Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts
Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
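
A per-MAC ring buffer like the one described for InstallLogBuffer could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, file persistence is left out, and only the 2000-line capacity and the `log_lines` / `log_total` field names come from the notes above.

```typescript
// Hypothetical sketch of a fixed-capacity ring buffer keyed by MAC address.
class LineRingBuffer {
  private readonly slots: string[];
  private readonly capacity: number;
  private total = 0; // number of lines ever appended, including overwritten ones

  constructor(capacity: number = 2000) {
    this.capacity = capacity;
    this.slots = new Array<string>(capacity);
  }

  push(line: string): void {
    // Overwrite the oldest slot once the buffer is full.
    this.slots[this.total % this.capacity] = line;
    this.total += 1;
  }

  snapshot(): { log_lines: string[]; log_total: number } {
    const count = Math.min(this.total, this.capacity);
    const log_lines: string[] = [];
    for (let i = this.total - count; i < this.total; i++) {
      log_lines.push(this.slots[i % this.capacity] as string);
    }
    return { log_lines, log_total: this.total };
  }
}

class InstallLogSketch {
  private readonly buffers = new Map<string, LineRingBuffer>();
  private readonly capacity: number;

  constructor(capacity: number = 2000) {
    this.capacity = capacity;
  }

  // Accepts a batch of lines; a single line is just a batch of one.
  append(mac: string, lines: string[]): void {
    const key = mac.toLowerCase();
    let buffer = this.buffers.get(key);
    if (!buffer) {
      buffer = new LineRingBuffer(this.capacity);
      this.buffers.set(key, buffer);
    }
    for (const line of lines) buffer.push(line);
  }

  get(mac: string): { log_lines: string[]; log_total: number } {
    const buffer = this.buffers.get(mac.toLowerCase());
    return buffer ? buffer.snapshot() : { log_lines: [], log_total: 0 };
  }
}
```

Tracking the running total separately from the retained lines lets a client tell how many lines were dropped once the buffer wraps.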
dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: emitted only in proxy mode (they break OVMF PXE in full DHCP mode)
- PXEClient vendor class echo for UEFI firmware compatibility
Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)
201 unit tests passing (11 new).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- `lab init bastion standalone start` now runs in background by default
- `--foreground` flag for running in foreground (debugging/containers)
- Shows startup output then detaches with PID + log path
- Status command checks /proc/<pid> instead of kill -0, which fails with EPERM
for a process owned by another user (so status now works cross-user)
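
A liveness check along these lines might be written as below. The function name and the injectable `pathExists` parameter are assumptions made for illustration and testing; the real status command may differ.

```typescript
import { existsSync } from "node:fs";

// Hypothetical sketch: a PID counts as running if /proc/<pid> exists.
// Unlike signalling the process, this needs no permission on the target,
// so it also works when the bastion was started by a different user.
function isPidRunning(
  pid: number,
  pathExists: (path: string) => boolean = existsSync,
): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false;
  return pathExists(`/proc/${pid}`);
}
```

This only applies on Linux, where /proc is available.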
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Require root when dnsmasq is needed (clear error message)
- Handle stale PID files owned by different user (remove + recreate)
- Create bastion dir with 755 permissions
- 3 new PID file tests (30 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- build-rpm.sh: --arch flag for targeting x86_64 or arm64, --all for both.
Uses bun cross-compile with --target=bun-linux-x64 or --target=bun-linux-arm64
- build-bastion.sh: --arch flag for Docker platform targeting
- release.sh: builds both architectures by default
- CI: builds + publishes RPM/DEB for both architectures
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scripts/release.sh: full release orchestration (build, publish, install)
- deploy/k3s/: Deployment, ConfigMap, PVC, Namespace with kustomize
hostNetwork for dnsmasq, NET_ADMIN caps, local-path PVC
- Infra role gets /var/lib/rancher partition (20GB, preserved on reprovision)
for k3s etcd data persistence across reinstalls
- Infra %post installs k3s server (INSTALL_K3S_SKIP_START=true)
- 5 new kickstart tests (27 total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Progress callbacks from kickstart now show:
◆ 78:55:36:08:35:14 partitioning -- preparing disk layout
◆◆◆ 78:55:36:08:35:14 post-install -- configuring system
✔ 78:55:36:08:35:14 complete -- ready at 10.0.1.88
ssh michal@10.0.1.88
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Open firewall ports (dhcp, tftp, http, 4011) on bastion start
- Close firewall ports on bastion shutdown
- Auto-detect firewall zone for interface
- Fix reprovision SSH to use execFileSync with explicit key path
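
The firewall handling above could be approximated as follows. This is a sketch, not the shipped code: the function names are hypothetical, and treating 4011 as a UDP port (the usual proxyDHCP port) is an assumption, since the notes only give the number.

```typescript
import { execFileSync } from "node:child_process";

// Looks up the firewalld zone bound to an interface.
// Not exercised below; it requires firewalld to be installed and running.
function detectZone(iface: string): string {
  return execFileSync("firewall-cmd", [`--get-zone-of-interface=${iface}`], {
    encoding: "utf8",
  }).trim();
}

// Builds one firewall-cmd argument list per rule, for opening ("add")
// on start or closing ("remove") on shutdown.
function firewallArgs(zone: string, action: "add" | "remove"): string[][] {
  const services = ["dhcp", "tftp", "http"];
  const rules = services.map((service) => [
    `--zone=${zone}`,
    `--${action}-service=${service}`,
  ]);
  rules.push([`--zone=${zone}`, `--${action}-port=4011/udp`]); // assumed UDP
  return rules;
}
```

Each returned list would be passed to `execFileSync("firewall-cmd", args)`; building the arguments separately keeps the start and shutdown paths symmetric.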
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables running the TS bastion without dnsmasq for testing.
VM-tested: SSH works, partitions correct, k3s prereqs configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full rewrite of the bash bastion.sh into a TypeScript application:
- Fastify HTTP server with typed routes (dispatch, kickstart, API)
- Commander CLI (serve, install, list, reprovision)
- Kickstart templates as TypeScript template literals (no more heredoc hell)
- dnsmasq management via execa subprocess
- Merged machine list view (hardware + install info in one table)
- Containerized via podman-compose (Dockerfile + docker-compose.yml)
- All partition logic preserved (LVM, reprovision detection, role-based)
Not yet tested end-to-end — needs VM validation before replacing bash version.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- LVM partition layout: /, /var, /var/log, /home, /srv, swap, tmpfs /tmp
plus /var/lib/longhorn for worker role (grows to fill disk)
- Reprovision preserves /home, /srv, /var/lib/longhorn via %pre detection
- Admin user created matching the user running the bastion script
with SSH keys from authorized_keys + local pubkeys, passwordless sudo
- Progress callbacks from %pre and %post to /api/progress endpoint
with IP reported on completion (ssh command printed)
- Installed machines boot from local disk (iPXE exit) instead of
re-entering discovery mode
- --role worker|infra flag (infra skips longhorn partition)
- reprovision subcommand: queues install + SSH reboot into PXE
- Self-cleanup: kills old bastion instances on start
- Domain config (DOMAIN env, default ad.itaz.eu)
- efibootmgr in %post to set local disk first in boot order
- k3s prereqs: kernel modules, sysctl, firewalld disabled, chrony
- VM reprovision test script (test-reprovision.sh)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix Content-Length to use the byte count instead of the character count
(a multi-byte em dash in iPXE scripts caused a mismatch, breaking the iPXE chain)
- Use firewall zone-aware commands matching interface zone
- Add UEFI HTTP Boot support (arch 16/20) alongside PXE TFTP
- Add pxe-service directives for proper proxy DHCP responses
- Use bind-dynamic instead of bind-interfaces for bridge compat
- Add tftp-no-blocksize for UEFI firmware compatibility
- Use local ipxe packages instead of downloading from internet
- Add custom UEFI PXE loader stub (pxeloader.c) for chainloading
- Enable HTTP request logging for debugging boot issues
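
The Content-Length fix in the first item comes down to measuring encoded bytes rather than string length. A minimal TypeScript illustration (the function name is hypothetical, and the server at this point in the history was not TypeScript):

```typescript
import { Buffer } from "node:buffer";

// Content-Length must be the size of the encoded body in bytes.
// String.length counts UTF-16 code units, which undercounts any
// character that takes more than one byte in UTF-8.
function contentLengthFor(body: string): number {
  return Buffer.byteLength(body, "utf8");
}

// Example iPXE script containing an em dash (U+2014, three bytes in UTF-8).
const exampleScript = "#!ipxe\necho bastion \u2014 booting\nexit\n";
const byteCount = contentLengthFor(exampleScript);
const charCount = exampleScript.length;
```

Here `byteCount` is two larger than `charCount`, which is exactly the kind of mismatch that makes a client stop reading before the script is complete.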
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrote bastion from install-only to discover-first flow:
- Default mode discovers hardware (PXE boot → inventory → poweroff)
- Discovered machines promoted to install via subcommand
- Per-MAC iPXE dispatch (/dispatch?mac=) routes discover vs install
- Python HTTP server with discovery API, state management, kickstart gen
- Added full DHCP mode (DHCP_MODE=full) for isolated/test networks
- Added arm64 UEFI support (client-arch 11, iPXE arm64 binary)
- Added QEMU test script (aarch64+KVM on Asahi Linux)
- All API endpoints unit tested and working
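
The per-MAC dispatch decision can be sketched as a pure function. This is an illustration in TypeScript rather than the Python server described here; the state names, URL paths, and function names are assumptions, and the "installed" branch reflects the local-boot behaviour mentioned elsewhere in this log.

```typescript
// Hypothetical machine states tracked by the bastion.
type MachineState = "unknown" | "install-queued" | "installed";

// Normalise MACs such as "52-54-00-AA-BB-CC" to "52:54:00:aa:bb:cc".
function normalizeMac(mac: string): string {
  return mac.trim().toLowerCase().replace(/-/g, ":");
}

// Returns the iPXE script served from /dispatch?mac=... for one machine.
function dispatchScript(mac: string, state: MachineState, baseUrl: string): string {
  const query = `mac=${encodeURIComponent(normalizeMac(mac))}`;
  switch (state) {
    case "installed":
      // Leave iPXE so the firmware continues to the local disk.
      return "#!ipxe\nexit\n";
    case "install-queued":
      // Assumed path for the install boot script.
      return `#!ipxe\nchain ${baseUrl}/install.ipxe?${query}\n`;
    default:
      // Unknown machines go through discovery first (assumed path).
      return `#!ipxe\nchain ${baseUrl}/discover.ipxe?${query}\n`;
  }
}
```

Keeping the decision free of I/O makes it straightforward to unit test each state without booting a machine.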
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>