# Kickstart Reference: Lessons Learned

This documents pitfalls discovered during PXE boot testing. Read before modifying the kickstart template (`src/bastion/src/templates/install.ks.ts`).

## Package requirements

### `kernel-modules` is mandatory

`@core` only installs `kernel-modules-core`, which lacks common modules like `vfat`, `zram`, and many network/filesystem drivers. Without `kernel-modules`:

- `/boot/efi` (FAT32) cannot mount → `systemd-remount-fs` fails → **root stays read-only** → sshd-keygen can't write host keys → SSH unreachable
- `zram-generator` fails → can trigger emergency mode

**Always include `kernel-modules` in %packages.** This matches what the real labmaster (192.168.8.11) has installed.

Regression introduced in commit `fac14b6`, which removed `@server-product` (that group pulled in `kernel-modules` via `fedora-release-server`).

### `dosfstools` is needed

Provides `mkfs.vfat` and ensures FAT filesystem support is available. The real labmaster has it installed.

### Verify against the real machine

Before changing the package list, SSH to the labmaster and compare:

```bash
ssh 192.168.8.11 "rpm -q "
```

## Anaconda %post execution order

This is critical and not well documented:

1. `%pre` scripts run
2. Disk partitioning and formatting
3. Package installation
4. **Anaconda writes system config (fstab, hostname, etc.)**
5. `%post` scripts run (in chroot of installed system)
6. `%post --nochroot` scripts run
7. **Anaconda MAY overwrite fstab again after %post scripts**

**Consequence:** You cannot reliably modify `/etc/fstab` from `%post` or `%post --nochroot`. Anaconda overwrites it. Tested and confirmed: both `sed` in %post and %post --nochroot had no effect on the final fstab.
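One way to re-check this on a test install is to snapshot `/etc/fstab` from `%post` and compare it with the file on the booted system. The sketch below is illustrative only: the snapshot path `/root/fstab.post-snapshot` and the function name `fstab_was_rewritten` are assumptions, not part of the kickstart template.

```bash
#!/usr/bin/env bash
# Sketch only. Assumes %post left a copy of its edited fstab behind, e.g.:
#   cp /etc/fstab /root/fstab.post-snapshot   # hypothetical path
# Run this on the booted system to see whether the %post edit survived.

# Returns 0 if the two files differ (fstab changed after the snapshot),
# 1 if they are identical, 2 if either file is missing or unreadable.
fstab_was_rewritten() {
  local snapshot="$1" current="$2"
  [ -r "$snapshot" ] && [ -r "$current" ] || return 2
  if cmp -s "$snapshot" "$current"; then
    return 1
  fi
  return 0
}
```

On the installed machine, `fstab_was_rewritten /root/fstab.post-snapshot /etc/fstab && echo "fstab changed after %post"` would report whether the final file differs from what `%post` wrote.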

What DOES work from %post:

- Writing files to `/etc/` (systemd units, config files, SSH keys)
- Enabling/disabling systemd services
- Installing additional packages
- Running `systemctl enable/mask`

What does NOT work from %post:

- Modifying `/etc/fstab` (Anaconda overwrites it)
- `--fsoptions` on `part /boot/efi` (Anaconda ignores it for EFI partitions)

## UEFI / EFI partition

- Anaconda always creates an EFI System Partition for UEFI installs
- The EFI partition is FAT32 (requires `vfat` kernel module to mount)
- If `/boot/efi` fails to mount, `systemd-remount-fs` fails, which leaves root as read-only. This cascades to break ALL services that need to write
- The EFI partition is used by firmware directly for bootloader; the OS doesn't strictly need it mounted, but Anaconda adds it to fstab

## VM-specific issues (libvirt/QEMU/OVMF)

### iPXE exit behavior

- `exit` (no args) returns EFI_SUCCESS → OVMF retries PXE, never reaches disk
- `exit 1` returns EFI_ABORTED → OVMF moves to next boot device (disk)
- VM boot order needs both `network` and `hd`: `--boot=uefi,network,hd`

### nftables

- libvirt creates reject rules for NAT networks in table `ip libvirt_network` (NOT `inet libvirt`; this wrong table name cost hours of debugging)
- These rules block new host→VM connections (SSH)
- Rules are recreated on every `virsh start`; must delete after each VM restart
- Chains: `guest_input` and `guest_output`

### Serial console

- VM serial port: `--serial=tcp,host=127.0.0.1:4555,mode=bind,protocol=telnet`
- Use `virsh console ` for interactive access (handles telnet protocol)
- Raw `socat` works for reading but pagers/readline break interactive use
- Add `console=ttyS0,115200n8` to kernel args for boot output on serial

### SELinux on labmaster

- Set to **permissive**: this is for k3s/kubernetes, NOT because SSH needs it
- SSH works fine with SELinux enforcing on a properly installed Fedora system
- The `ld.so.cache` AVC denials seen during debugging were caused by the
  read-only root filesystem, not by SELinux policy

## Testing checklist

Before merging kickstart changes:

1. Check the real labmaster has the same packages: `ssh 192.168.8.11 "rpm -q "`
2. Run the PXE integration test: `sudo pnpm run test:integration:pxe`
3. Verify via serial console (root / `lab-root-pw`) if SSH fails
4. Check `mount | grep " / "`: must show `rw`, not `ro`
5. Check `systemctl --failed`: no critical failures
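
Steps 4 and 5 can be scripted so they are run the same way each time. The following is a minimal sketch, not part of the repository: the function names `opts_are_rw` and `check_install` are hypothetical, and it uses `findmnt` instead of `mount | grep` so the root mount options can be parsed exactly.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not part of the repo.

# Succeeds if a comma-separated mount option list (as printed by
# `findmnt -no OPTIONS /`) contains the `rw` flag.
opts_are_rw() {
  case ",$1," in
    *,rw,*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Runs checklist steps 4 and 5 on the machine it is executed on.
check_install() {
  local opts failed
  opts="$(findmnt -no OPTIONS /)" || return 2
  if ! opts_are_rw "$opts"; then
    echo "FAIL: / is mounted with: $opts" >&2
    return 1
  fi
  echo "OK: / is mounted rw"
  # Print failed units for manual review; deciding which are critical
  # is left to the person running the checklist.
  failed="$(systemctl --failed --no-legend --plain)"
  if [ -n "$failed" ]; then
    printf 'Failed units:\n%s\n' "$failed"
  fi
}
```

Calling `check_install` on the installed VM (over SSH, or from the serial console if SSH is down) prints the root mount state and any failed units.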