- Server config now uses cluster-init: true for initial server (enables
embedded etcd). Joining servers get server: + token: in config.
- Cilium install already checks for existing installation, so joining
servers skip it gracefully (the "release name in use" error is non-fatal)
Cluster rebuilt as etcd HA:
worker0-k8s0 control-plane,etcd (initial server, cluster-init)
worker1-k8s0 control-plane,etcd (joined server, Mac Studio aarch64)
spark-2935 worker (DGX Spark, aarch64)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace sed with grep -v / awk for fstab manipulation (Asahi Fedora's
sed doesn't support \| delimiter or \? quantifier)
- Use idempotent write_lab_fstab function: removes all old entries first,
comments out conflicting btrfs subvol entries, adds fresh LVM entries
- Fix sed for SSH hardening: use #* instead of \? (POSIX compatible)
- Tested on Mac Studio: no duplicate fstab entries after multiple runs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Firstboot script defaults admin user to 'lab' instead of bastion's
config.adminUser (which was 'michal' from host system)
- iSCSI OS detection uses case-insensitive match for 'fedora'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
k3s host prep:
- Add iSCSI initiator install+enable (Fedora: iscsi-initiator-utils,
Ubuntu: open-iscsi) — required by Longhorn
- Add Longhorn disk label to k3s server+agent configs
- Add Longhorn disk annotation operation in post-install hardening
CLI:
- Add `labctl provision asahi` command with interactive install guide
- Change default SSH user from "michal" to "lab" in all commands
- Change admin user in bastion progress callback to "lab"
Asahi provisioning fixes:
- Download installer_data.json locally (installer reads it as file)
- Use REPO_BASE to serve upstream ZIP from bastion (LAN speed)
- Fix ZIP32 vs ZIP64: serve original upstream ZIP unmodified
(our repackaged ZIP used ZIP64 which breaks Asahi urlcache)
- Add /data/asahi-repo fallback path for k3s container PVC mount
- Deploy script syncs asahi-repo to bastion pod after deployment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove hardcoded devices/directRoutingDevice from Cilium install (let
Cilium auto-detect per node — needed for heterogeneous NICs like eno1 vs enP7s7)
- Set k8sServiceHost=127.0.0.1 k8sServicePort=6444 so Cilium init
containers can reach the API via k3s agent's local LB proxy
- Add node-role.kubernetes.io/worker label to agent config
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.
Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts
Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility
Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)
201 unit tests passing (11 new).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>