lab/bastion/docs/kickstart-reference.md
Michal 816736793d
feat: debug --sshd flag, auto SSH + nc listener + IP callback
When using `labctl provision debug <target> --sshd`, the rescue
kickstart generates host keys, starts sshd (pw: debug) and nc
listener (port 2323), and reports the IP back to bastion via
/api/progress callback. Fully self-contained, no mounted FS needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:53:19 +01:00

Kickstart Reference — Lessons Learned

This document records pitfalls discovered during PXE boot testing. Read it before modifying the kickstart template (src/bastion/src/templates/install.ks.ts).

Package requirements

kernel-modules is mandatory

@core only installs kernel-modules-core, which lacks common modules like vfat, zram, and many network/filesystem drivers. Without kernel-modules:

  • /boot/efi (FAT32) cannot mount → systemd-remount-fs fails → root stays read-only → sshd-keygen can't write host keys → SSH unreachable
  • zram-generator fails → can trigger emergency mode

Always include kernel-modules in %packages. This matches what the real labmaster (192.168.8.11) has installed.

The regression was introduced in commit fac14b6, which removed @server-product. That group pulled in kernel-modules via fedora-release-server.

dosfstools is needed

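dosfstools provides mkfs.vfat and fsck.vfat, the userspace tools for creating and checking FAT filesystems. Mounting FAT still depends on the vfat module from kernel-modules. The real labmaster has dosfstools installed.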
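
A rendered kickstart can be checked for both packages before it is served. The following is a minimal sketch, assuming the rendered file is available on disk; the function names ks_packages and ks_has_required_packages are illustrative and not part of the bastion code.

```shell
# Print the entries of the %packages section of the kickstart file in $1.
ks_packages() {
    awk '
        /^%packages/ { in_pkgs = 1; next }   # section starts
        /^%end/      { in_pkgs = 0 }         # section ends
        in_pkgs && NF && $1 !~ /^#/ { print $1 }
    ' "$1"
}

# Succeed only if kernel-modules and dosfstools are both listed explicitly.
ks_has_required_packages() {
    ks_file=$1
    rc=0
    for pkg in kernel-modules dosfstools; do
        if ! ks_packages "$ks_file" | grep -qx -- "$pkg"; then
            echo "missing from %packages: $pkg" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Group entries such as @core are printed as-is and never count as a match, so a package that is only pulled in indirectly by a group is still reported as missing.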

Verify against the real machine

Before changing the package list, SSH to the labmaster and compare:

ssh 192.168.8.11 "rpm -q <package>"
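
To compare several packages at once, the output of rpm -q can be filtered for the ones it reports as absent. This is a sketch that assumes key-based SSH access to the labmaster and English rpm messages; missing_packages is an illustrative name.

```shell
# Read `rpm -q` output on stdin and print the names rpm reports as absent.
# rpm prints "package NAME is not installed" for each missing package.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Example use:
#   ssh 192.168.8.11 "rpm -q kernel-modules dosfstools" | missing_packages
```

Empty output means every queried package is installed on the labmaster.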

Anaconda %post execution order

This is critical and not well documented:

  1. %pre scripts run
  2. Disk partitioning and formatting
  3. Package installation
  4. Anaconda writes system config (fstab, hostname, etc.)
  5. %post scripts run (in chroot of installed system)
  6. %post --nochroot scripts run
  7. Anaconda MAY overwrite fstab again after %post scripts

Consequence: /etc/fstab cannot be reliably modified from %post or %post --nochroot, because Anaconda can rewrite it afterwards. Tested and confirmed: sed edits made from both %post and %post --nochroot had no effect on the final fstab.

What DOES work from %post:

  • Writing files to /etc/ (systemd units, config files, SSH keys)
  • Enabling, disabling, or masking systemd services with systemctl
  • Installing additional packages

What does NOT work from %post:

  • Modifying /etc/fstab (Anaconda overwrites it)
  • Setting --fsoptions on part /boot/efi (a kickstart command rather than a %post action; Anaconda ignores it for EFI partitions)
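
As an illustration of the kind of change that does survive, the sketch below writes a systemd unit under /etc from a chrooted %post. The unit name lab-firstboot.service and the script path /usr/local/sbin/lab-firstboot are hypothetical and not taken from the actual template.

```shell
# Write a oneshot unit under the given root directory and print its path.
# In a chrooted %post the root is "/"; a temp directory works for dry runs.
# NOTE: lab-firstboot.service and /usr/local/sbin/lab-firstboot are
# hypothetical names used only for this example.
write_firstboot_unit() {
    root=${1:-/}
    unit_dir="${root%/}/etc/systemd/system"
    unit="$unit_dir/lab-firstboot.service"
    mkdir -p "$unit_dir"
    cat > "$unit" <<'EOF'
[Unit]
Description=Hypothetical first-boot task written from %post
ConditionPathExists=!/var/lib/lab-firstboot.done

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/lab-firstboot
ExecStartPost=/usr/bin/touch /var/lib/lab-firstboot.done

[Install]
WantedBy=multi-user.target
EOF
    echo "$unit"
}
```

Inside %post this would be followed by systemctl enable lab-firstboot.service, which is one of the operations listed above as working.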

UEFI / EFI partition

  • Anaconda always creates an EFI System Partition for UEFI installs
  • The EFI partition is FAT32 — requires vfat kernel module to mount
  • If /boot/efi fails to mount, systemd-remount-fs fails, which leaves root read-only. This cascades to break every service that needs to write to disk.
  • The firmware reads the EFI partition directly to load the bootloader. The OS does not strictly need it mounted, but Anaconda adds it to fstab.

VM-specific issues (libvirt/QEMU/OVMF)

iPXE exit behavior

  • exit (no args) returns EFI_SUCCESS → OVMF retries PXE, never reaches disk
  • exit 1 returns EFI_ABORTED → OVMF moves to next boot device (disk)
  • VM boot order needs both network and hd: --boot=uefi,network,hd
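
The exit behavior above can be captured in the script served to hosts that should boot from disk. This is a minimal sketch: the function name and the echo text are illustrative, and how bastion actually renders its iPXE scripts is not shown in this document.

```shell
# Print an iPXE script that hands control back to the firmware with a
# non-zero status, so that OVMF falls through to the next boot device.
ipxe_boot_from_disk_script() {
    cat <<'EOF'
#!ipxe
echo Falling through to local disk
# A bare "exit" reports success and OVMF retries PXE; use a non-zero status.
exit 1
EOF
}
```

The script only helps if the VM was created with a boot order that lists hd after network, as in --boot=uefi,network,hd.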

nftables

  • libvirt creates reject rules for NAT networks in table ip libvirt_network (NOT inet libvirt — this wrong table name cost hours of debugging)
  • These rules block new host→VM connections (SSH)
  • Rules are recreated on every virsh start — must delete after each VM restart
  • Chains: guest_input and guest_output
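
Because the rules come back on every start, deleting them is worth scripting. The sketch below assumes the rules to remove are the ones containing a reject statement and that nft -a prints each rule with a trailing "# handle N" comment; the function names are illustrative.

```shell
# Read `nft -a list chain ...` output on stdin and print the handle of
# every rule that contains a reject statement.
reject_rule_handles() {
    awk '/reject/ && /# handle [0-9]+[[:space:]]*$/ { print $NF }'
}

# Delete all reject rules from both libvirt chains (run as root).
delete_libvirt_rejects() {
    for chain in guest_input guest_output; do
        nft -a list chain ip libvirt_network "$chain" |
            reject_rule_handles |
            while read -r handle; do
                nft delete rule ip libvirt_network "$chain" handle "$handle"
            done
    done
}
```

Run delete_libvirt_rejects after each virsh start. Removing these rules opens host to VM traffic for every guest on that libvirt network, which is acceptable on a lab host but should be a deliberate choice.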

Serial console

  • VM serial port: --serial=tcp,host=127.0.0.1:4555,mode=bind,protocol=telnet
  • Use virsh console <vm-name> for interactive access (handles telnet protocol)
  • Raw socat works for reading but pagers/readline break interactive use
  • Add console=ttyS0,115200n8 to kernel args for boot output on serial

SELinux on labmaster

  • Set to permissive — this is for k3s/kubernetes, NOT because SSH needs it
  • SSH works fine with SELinux enforcing on a properly installed Fedora system
  • The ld.so.cache AVC denials seen during debugging were caused by the read-only root filesystem, not by SELinux policy

Testing checklist

Before merging kickstart changes:

  1. Check the real labmaster has the same packages: ssh 192.168.8.11 "rpm -q <pkg>"
  2. Run the PXE integration test: sudo pnpm run test:integration:pxe
  3. Verify via serial console (root / lab-root-pw) if SSH fails
  4. Check mount | grep " / " — must show rw, not ro
  5. Check systemctl --failed — no critical failures
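
Steps 4 and 5 can be scripted on the installed host. The sketch below parses the usual mount output format ("DEVICE on / type FSTYPE (OPTIONS)") and assumes rw or ro is the first option listed; root_mount_mode is an illustrative name.

```shell
# Read `mount` output on stdin and print "rw" or "ro" for the root filesystem.
root_mount_mode() {
    awk '$2 == "on" && $3 == "/" {
        opts = $NF                  # e.g. "(rw,relatime,seclabel)"
        gsub(/[()]/, "", opts)
        split(opts, o, ",")
        print o[1]                  # first option is rw or ro
        exit
    }'
}

# Example use on the installed host:
#   [ "$(mount | root_mount_mode)" = "rw" ] || echo "root is read-only" >&2
#   systemctl --failed --no-legend
```

An empty result from systemctl --failed --no-legend means no unit is in the failed state.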