# Kickstart Reference — Lessons Learned

This documents pitfalls discovered during PXE boot testing. Read before modifying
the kickstart template (`src/bastion/src/templates/install.ks.ts`).

## Package requirements

### `kernel-modules` is mandatory

`@core` only installs `kernel-modules-core`, which lacks common modules like `vfat`,
`zram`, and many network/filesystem drivers. Without `kernel-modules`:

- `/boot/efi` (FAT32) cannot mount → `systemd-remount-fs` fails → **root stays
  read-only** → `sshd-keygen` can't write host keys → SSH unreachable
- `zram-generator` fails → can trigger emergency mode

**Always include `kernel-modules` in `%packages`.** This matches what the real
labmaster (192.168.8.11) has installed.

The regression was introduced in commit `fac14b6`, which removed `@server-product`
(that group pulled in `kernel-modules` via `fedora-release-server`).
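
You can check directly on a Fedora machine which package actually ships a given module. Below is a minimal sketch. It assumes an RPM-based system with kmod's `modinfo`. `module_owner` is a hypothetical helper name and does not exist in the repo.

```bash
#!/usr/bin/env bash
# module_owner MODULE...
#   For each kernel module, print "MODULE<TAB>OWNER", where OWNER is:
#     - the RPM that installed the module file (looked up with `rpm -qf`),
#     - "builtin" if the module is compiled into the kernel,
#     - "missing" if modinfo cannot find it at all (per the notes above, what
#       vfat looks like on a system that only has kernel-modules-core).
module_owner() {
  local mod path pkg
  for mod in "$@"; do
    # modinfo -n prints only the module's filename, or "(builtin)".
    if ! path=$(modinfo -n "$mod" 2>/dev/null); then
      printf '%s\t%s\n' "$mod" missing
      continue
    fi
    if [ "$path" = "(builtin)" ]; then
      printf '%s\t%s\n' "$mod" builtin
      continue
    fi
    # Map the .ko(.xz) file back to the package that owns it.
    pkg=$(rpm -qf --qf '%{NAME}\n' "$path" 2>/dev/null) || pkg=unowned
    printf '%s\t%s\n' "$mod" "$pkg"
  done
}
```

Run `module_owner vfat zram` on the labmaster and on a freshly PXE-installed VM, then compare the output. The comparison should show whether the new install is missing a module that the labmaster has.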

### `dosfstools` is needed

Provides the FAT userspace tools (`mkfs.vfat`, `fsck.vfat`, `fatlabel`). Kernel
support for mounting FAT itself comes from the `vfat` module in `kernel-modules`.
The real labmaster has it installed.

### Verify against the real machine

Before changing the package list, SSH to the labmaster and compare:
```bash
ssh 192.168.8.11 "rpm -q <package>"
```
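
Querying packages one at a time gets tedious when several change at once. The sketch below extracts plain package names from the `%packages` section and asks the labmaster about all of them in a single SSH call. It assumes you have saved the *rendered* kickstart to a local file, because `install.ks.ts` is a TypeScript template and will not parse directly. `ks_packages` and `missing_on_host` are hypothetical names.

```bash
#!/usr/bin/env bash
# ks_packages FILE
#   Print the plain package names from the %packages ... %end section of a
#   rendered kickstart, one per line. Groups (@core), exclusions (-pkg),
#   comments and blank lines are skipped.
ks_packages() {
  awk '
    /^%packages/ { inside = 1; next }
    /^%end/      { inside = 0; next }
    inside {
      sub(/#.*/, "")                # strip trailing comments
      gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
      if ($0 == "" || $0 ~ /^[@-]/) next
      print
    }
  ' "$1"
}

# missing_on_host HOST FILE
#   Query HOST once with `rpm -q` for every package in FILE and print the
#   ones that are not installed there.
missing_on_host() {
  local host="$1" file="$2" pkgs out
  pkgs=$(ks_packages "$file" | tr '\n' ' ')
  # rpm -q exits non-zero whenever a package is missing, so its exit status
  # cannot tell "missing package" from "ssh failed". Require some output,
  # then pick out the "package NAME is not installed" lines.
  out=$(ssh "$host" "rpm -q $pkgs" 2>/dev/null)
  if [ -z "$out" ]; then
    echo "no output from rpm on $host (ssh failure?)" >&2
    return 2
  fi
  printf '%s\n' "$out" | awk '/ is not installed$/ { print $2 }'
}
```

`missing_on_host 192.168.8.11 install.ks` prints the packages that the kickstart installs but the labmaster lacks. It cannot catch the opposite case: a package the labmaster has that the kickstart omits (the `fac14b6` regression). Catching that needs a diff of `rpm -qa --qf '%{NAME}\n'` from both machines.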

## Anaconda %post execution order

This is critical and not well documented:

1. `%pre` scripts run
2. Disk partitioning and formatting
3. Package installation
4. **Anaconda writes system config (fstab, hostname, etc.)**
5. `%post` scripts run (in a chroot of the installed system)
6. `%post --nochroot` scripts run
7. **Anaconda MAY overwrite fstab again after `%post` scripts**

**Consequence:** You cannot reliably modify `/etc/fstab` from `%post` or
`%post --nochroot`, because Anaconda overwrites it. Tested and confirmed: `sed`
edits in both `%post` and `%post --nochroot` had no effect on the final fstab.

What DOES work from `%post`:
- Writing files to `/etc/` (systemd units, config files, SSH keys)
- Enabling, disabling, and masking systemd services (`systemctl enable`/`disable`/`mask`)
- Installing additional packages
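
The following minimal sketch makes the list above concrete. It shows `%post` helpers that write an SSH key and a systemd unit under the target root, then enable the unit. It assumes `%post` runs under bash (Fedora's `/bin/sh` is bash). The function names, unit name, and key are placeholders and are not taken from `install.ks.ts`.

```bash
#!/usr/bin/env bash
# Helpers for a %post body that use only operations listed above as working.
# ROOT is "" inside a normal (chrooted) %post; the parameter exists so the
# functions can also be exercised against a scratch directory.

# install_root_key ROOT KEY
#   Append KEY to root's authorized_keys with the permissions sshd expects.
install_root_key() {
  local root="$1" key="$2"
  install -d -m 700 "$root/root/.ssh"
  printf '%s\n' "$key" >> "$root/root/.ssh/authorized_keys"
  chmod 600 "$root/root/.ssh/authorized_keys"
  # Reset SELinux labels when SELinux is active; skipped otherwise.
  if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
    restorecon -R "$root/root/.ssh" || true
  fi
}

# install_oneshot_unit ROOT NAME EXEC_START
#   Write a oneshot unit to /etc/systemd/system and enable it.
install_oneshot_unit() {
  local root="$1" name="$2" exec_start="$3"
  install -d "$root/etc/systemd/system"
  cat > "$root/etc/systemd/system/$name" <<EOF
[Unit]
Description=$name (written from kickstart %post)

[Service]
Type=oneshot
ExecStart=$exec_start

[Install]
WantedBy=multi-user.target
EOF
  if [ -n "$root" ]; then
    systemctl --root="$root" enable "$name"
  else
    systemctl enable "$name"
  fi
}
```

Inside a normal chrooted `%post`, pass an empty root. For example, `install_root_key "" "ssh-ed25519 AAAA... admin@bastion"` or `install_oneshot_unit "" lab-firstboot.service /usr/local/bin/lab-firstboot`, where the unit and script are hypothetical.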

What does NOT work:
- Modifying `/etc/fstab` from `%post` or `%post --nochroot` (Anaconda overwrites it)
- `--fsoptions` on `part /boot/efi` (Anaconda ignores it for EFI partitions)

## UEFI / EFI partition

- Anaconda always creates an EFI System Partition for UEFI installs
- The EFI partition is FAT32, so mounting it requires the `vfat` kernel module
- If `/boot/efi` fails to mount, `systemd-remount-fs` fails, which leaves
  root read-only. This cascades to break ALL services that need to write to the root filesystem
- The firmware reads the EFI partition directly to load the bootloader, so the OS
  doesn't strictly need it mounted, but Anaconda adds it to fstab anyway
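
When SSH is unreachable, you can walk the chain above from the serial console to see where it broke. The sketch below assumes util-linux `findmnt` and systemd on the installed system. `efi_cascade_report` is an illustrative name. Because the function writes nothing to disk, you can paste it straight into the console shell.

```bash
#!/usr/bin/env bash
# efi_cascade_report
#   On the installed system, print one line per link in the failure chain
#   described above, so it is obvious where the chain broke.
efi_cascade_report() {
  local fstype opts
  # 1. Is /boot/efi mounted? findmnt exits non-zero when it is not.
  if fstype=$(findmnt -n -o FSTYPE /boot/efi 2>/dev/null); then
    echo "boot-efi: mounted ($fstype)"
  else
    echo "boot-efi: NOT mounted"
  fi
  # 2. Did systemd-remount-fs fail? `is-failed` exits 0 only for failed units.
  if systemctl is-failed --quiet systemd-remount-fs.service; then
    echo "remount-fs: FAILED"
  else
    echo "remount-fs: ok"
  fi
  # 3. Is / read-write? The first listed mount option is normally rw or ro.
  opts=$(findmnt -n -o OPTIONS /)
  echo "root: ${opts%%,*}"
}
```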

## VM-specific issues (libvirt/QEMU/OVMF)

### iPXE exit behavior

- `exit` (no args) returns EFI_SUCCESS → OVMF retries PXE, never reaches disk
- `exit 1` returns EFI_ABORTED → OVMF moves to next boot device (disk)
- VM boot order needs both `network` and `hd`: `--boot=uefi,network,hd`
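
To confirm that a VM actually has the network-then-disk order, you can inspect the libvirt domain XML. This sketch assumes `--boot=uefi,network,hd` is stored as `<boot dev=.../>` entries under `<os>`. A domain that uses per-device `<boot order=.../>` elements instead will report no devices. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# boot_devices VM
#   Print the <boot dev='...'/> entries from the domain's <os> element, in
#   order. Per-device <boot order='N'/> elements are not handled.
boot_devices() {
  virsh dumpxml "$1" \
    | grep -oE "<boot dev=['\"][a-z]+" \
    | sed -E "s/.*dev=['\"]//"
}

# check_pxe_then_disk VM
#   Succeed only if "network" is listed before "hd", so that iPXE's `exit 1`
#   has a disk to fall through to.
check_pxe_then_disk() {
  local order
  order=$(boot_devices "$1" | tr '\n' ' ')
  case "$order" in
    *network*hd*) return 0 ;;
    *) echo "unexpected boot order: ${order:-<none>}" >&2; return 1 ;;
  esac
}
```

`check_pxe_then_disk <vm-name>` exits non-zero with a message when `hd` is missing or comes first.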

### nftables

- libvirt creates reject rules for NAT networks in table `ip libvirt_network`
  (NOT `inet libvirt`; this wrong table name cost hours of debugging)
- These rules block new host→VM connections (SSH)
- The rules are recreated on every `virsh start`, so they must be deleted again after each VM restart
- Chains: `guest_input` and `guest_output`
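
The rules come back on every `virsh start`, so deleting them is worth scripting. This sketch assumes libvirt's nftables backend and the table and chain names above; check them with `nft list table ip libvirt_network` first. It deletes only `reject` rules by handle instead of flushing the chains, so libvirt's accept rules stay in place. `delete_libvirt_rejects` is a hypothetical name.

```bash
#!/usr/bin/env bash
# delete_libvirt_rejects [CHAIN...]
#   Delete only the `reject` rules in libvirt's `ip libvirt_network` table,
#   leaving its accept rules alone. Defaults to the guest_input and
#   guest_output chains. Must run as root.
delete_libvirt_rejects() {
  local table=libvirt_network chain handle
  local chains=("$@")
  if [ "${#chains[@]}" -eq 0 ]; then
    chains=(guest_input guest_output)
  fi
  for chain in "${chains[@]}"; do
    # `nft -a` appends "# handle N" to each rule; keep handles of reject rules.
    for handle in $(nft -a list chain ip "$table" "$chain" |
      awk '/ reject/ { for (i = 1; i < NF; i++) if ($i == "handle") print $(i + 1) }'); do
      nft delete rule ip "$table" "$chain" handle "$handle"
    done
  done
}
```

Run it as root right after each `virsh start`.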

### Serial console

- VM serial port: `--serial=tcp,host=127.0.0.1:4555,mode=bind,protocol=telnet`
- Use `virsh console <vm-name>` for interactive access (handles telnet protocol)
- Raw `socat` works for reading but pagers/readline break interactive use
- Add `console=ttyS0,115200n8` to kernel args for boot output on serial
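
On the installed system (as opposed to the kernel line that iPXE boots), `grubby` is one way to make the serial console argument persistent on Fedora. This sketch modifies the boot entries only when the running kernel's command line lacks a `ttyS0` console. `ensure_serial_console` is a hypothetical name. In the kickstart itself, the same argument could instead go on the `bootloader --append=` line.

```bash
#!/usr/bin/env bash
# ensure_serial_console [CMDLINE_FILE]
#   If the running kernel was not booted with a ttyS0 console, add
#   console=ttyS0,115200n8 to all installed kernel entries with grubby.
#   The change takes effect on the next boot. CMDLINE_FILE defaults to
#   /proc/cmdline and exists mainly so the check can be tested.
ensure_serial_console() {
  local cmdline_file="${1:-/proc/cmdline}"
  if grep -q 'console=ttyS0' "$cmdline_file"; then
    echo "serial console already on the kernel command line"
    return 0
  fi
  grubby --update-kernel=ALL --args="console=ttyS0,115200n8"
  echo "added console=ttyS0,115200n8; reboot to apply"
}
```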

### SELinux on labmaster

- Set to **permissive** for k3s/kubernetes, NOT because SSH needs it
- SSH works fine with SELinux enforcing on a properly installed Fedora system
- The `ld.so.cache` AVC denials seen during debugging were caused by the
  read-only root filesystem, not by SELinux policy

## Testing checklist

Before merging kickstart changes:

1. Check that the real labmaster has the same packages: `ssh 192.168.8.11 "rpm -q <pkg>"`
2. Run the PXE integration test: `sudo pnpm run test:integration:pxe`
3. If SSH fails, verify via the serial console (root / `lab-root-pw`)
4. Check `mount | grep " / "`: it must show `rw`, not `ro`
5. Check `systemctl --failed`: there must be no critical failures
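
Items 4 and 5 can be scripted once SSH works. This sketch assumes key-based SSH to the freshly installed VM. `post_install_checks` is a hypothetical name, and the host argument is whatever address the VM received.

```bash
#!/usr/bin/env bash
# post_install_checks HOST
#   Run checklist items 4 and 5 against HOST over SSH.
#   Returns 1 if / is not mounted read-write, and 2 if SSH itself fails
#   (fall back to the serial console, item 3). Failed units are listed for
#   review but do not change the exit code, since only critical ones matter.
post_install_checks() {
  local host="$1" opts failed
  opts=$(ssh "$host" findmnt -n -o OPTIONS /) || {
    echo "ssh to $host failed; use the serial console" >&2
    return 2
  }
  # --no-legend --plain: one plain line per failed unit, no header or footer.
  failed=$(ssh "$host" systemctl --failed --no-legend --plain)
  if [ -n "$failed" ]; then
    echo "failed units:"
    printf '%s\n' "$failed"
  else
    echo "failed units: none"
  fi
  case "$opts" in
    rw | rw,*) echo "root: rw" ;;
    *) echo "root: NOT rw ($opts)"; return 1 ;;
  esac
}
```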