# PXE Boot Debugging Session: 2026-03-30

## Problem

Beelink SER Mini Pro (AMD Ryzen 7 255, Radeon 780M, 64GB DDR5, 1TB NVMe) boots Fedora 43 100x slower than normal after a PXE kickstart install. Every systemd boot phase takes ~30 seconds. The Anaconda installer/rescue mode boots fast on the same hardware.

## Root Cause

**`console=ttyS0,115200n8` in the kernel cmdline**, added via kickstart `bootloader --append` during install.

This mini PC has **no physical serial UART**. When systemd writes to ttyS0, each log write blocks for ~30 seconds waiting for the non-existent UART hardware. Since systemd logs at every phase transition, the total boot time was 10+ minutes.

The Anaconda installer was unaffected because it uses a different init flow that doesn't go through the same systemd phase transitions.

## How We Found It

Hours of systematic elimination:

| What we tried | Result | Ruled out |
|---|---|---|
| `modprobe.blacklist=amdgpu` | No change | GPU driver |
| `amd_iommu=off` | No change | IOMMU |
| Rebuild initramfs without plymouth/drm/fips | No change | Initramfs bloat |
| systemd-boot instead of GRUB | Still slow | Bootloader |
| PXE-boot kernel+initrd (skip local GRUB entirely) | Still slow | Local bootloader/firmware |
| Disable TPM in BIOS | No change | TPM |
| Remove `resume=` + resume dracut module | No change | Hibernate resume |
| Manual LVM activation in rescue shell | **Fast** | NVMe/LVM themselves |
| Remove `console=ttyS0,115200n8` from GRUB | **FAST BOOT** | **This was it** |

The key breakthrough was noticing that the timestamps showed **exactly 30-second gaps** between boot phases, which is a timeout pattern rather than general slowness. The next step was realising that the serial console had been added during install and the machine had never been booted without it.

## What Was Fixed (PR #4, merged)

### 1. Removed serial console from kickstart

- Removed `console=ttyS0,115200n8` from `bootloader --append`
- Removed `serial-getty@ttyS0.service` enablement
- Removed rsyslog serial forwarding

### 2. Enabled Anaconda syslog forwarding

- Uncommented `logging --host --port` directive in kickstart
- Bastion's SyslogListener was already built; it just needed an IP→MAC resolution improvement
- Added `registerIp()` calls from kickstart fetch and progress callbacks
- Added syslog listener unit tests

### 3. Fixed disk auto-detection

- Default disk changed from `/dev/sda` to `""` (auto-detect) in labd route and bastion command handler
- The kickstart `%pre` auto-detect logic probes nvme0n1, sda, sdb, vda in order
- Without this fix, NVMe-only machines (like the SER Mini Pro) fail immediately

### 4. SysRq magic keys

- Added `kernel.sysrq=1` sysctl to kickstart `%post`
- Enables Alt+SysRq+REISUB via JetKVM for emergency reboot of stuck machines

### 5. Simplified debug command

- Removed `--sshd` flag (SSH always available via `inst.sshd` + `sshpw` in rescue mode)
- Added `/debug-setup.sh` HTTP endpoint for nc listener setup from rescue shell
- Cleaned up `sshd` field from DebugConfig, protocol types, all routes

### 6. Added `labctl provision logs -f`

- Follow mode with 5-second polling for real-time install monitoring

## What Works

- **PXE discovery → install → boot**: full flow works end-to-end
- **Anaconda syslog forwarding**: install logs stream to bastion
- **Progress callbacks**: stage-by-stage install tracking via curl
- **Auto disk detection**: works for NVMe and SATA
- **Debug rescue mode**: `labctl provision debug` boots Anaconda rescue with `inst.sshd` + `sshpw` set (SSH login itself is still unverified, see Known Issues)
- **Network-first boot order**: bastion controls every reboot via efibootmgr
- **SysRq keys**: emergency reboot via JetKVM keyboard

## What Doesn't Work / Known Issues

- **SSH in rescue mode (formerly `--sshd`)**: Anaconda rescue mode skips both `%pre` and `%post` kickstart sections, so SSH cannot be set up from kickstart. `inst.sshd` + `sshpw` should provide SSH access, but this hasn't been verified end-to-end yet. The `/debug-setup.sh` curl workaround exists for nc.
- **arm64 container build**: iPXE cross-compilation fails on arm64 (GCC flag incompatibility). Workaround: build with `--platforms linux/amd64` only.
- **Integration test SSH timeout**: the VM boots fine, but SSH times out due to libvirt nftables reject rules after VM restart. This is a test infrastructure issue, not a code bug.

## What Was Skipped / Left To Do

1. **Syslog UDP port in k3s**: works because bastion uses `hostNetwork: true`, but should be documented properly
2. **Background log streamer**: the old `tail -f` approach broke Anaconda filesystem sync and was replaced with syslog forwarding. If more granular `%post` logging is needed, a synchronous log push at the end of `%post` would be safe.
3. **Per-machine hardware overrides**: turned out not to be needed (serial console was the only "special" setting, and removing it is universal)
4. **Ubuntu autoinstall disk default**: `ubuntu-autoinstall.ts` still has the `disk || "/dev/sda"` fallback (line 38), which should be changed to auto-detect
5. **Verify `inst.sshd` works in rescue mode**: test SSH with password "debug" next time debug mode is used
6. **Re-enable TPM in BIOS**: it was disabled during debugging and will be restored by a factory reset (user plans to reset BIOS to factory)

## Key Learnings

1. **`console=ttyS0` on hardware without UART = 30s timeout per boot phase.** Never add serial console to kernel cmdline unless the hardware has a verified physical UART.
2. **Exactly-N-second gaps in boot logs = timeout, not slowness.** Look for the timeout source, not performance issues.
3. **The bisection approach works.** Systematically removing features one at a time found the root cause. But it took hours because the serial console was added early and seemed harmless.
4. **Anaconda rescue mode is limited.** It skips `%pre` and `%post`, so you can't automate setup via kickstart. Use `inst.sshd` + `sshpw` for SSH, and serve helper scripts via HTTP for everything else.
5. **Default disk paths break NVMe machines.** Always default to auto-detect (empty string) rather than `/dev/sda`.
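
The "exactly-N-second gaps" learning can be turned into a quick check. The sketch below is not part of labctl or bastion. `findTimeoutGaps`, `BootEvent` and the sample events are hypothetical, and it assumes the boot log has already been parsed into monotonic timestamps in seconds. It flags any gap between consecutive events that lands close to a suspected timeout value.

```typescript
// Hypothetical helper for spotting timeout-shaped gaps in a boot log.
interface BootEvent {
  seconds: number; // monotonic timestamp in seconds since boot
  message: string;
}

interface Gap {
  after: string;   // message logged before the stall
  before: string;  // message logged once the stall ended
  seconds: number; // length of the stall
}

// Return every gap between consecutive events that is within `tolerance`
// seconds of `timeout`. Many near-identical gaps suggest a blocking timeout.
function findTimeoutGaps(events: BootEvent[], timeout = 30, tolerance = 0.5): Gap[] {
  const gaps: Gap[] = [];
  let prev: BootEvent | undefined;
  for (const ev of events) {
    if (prev !== undefined) {
      const delta = ev.seconds - prev.seconds;
      if (Math.abs(delta - timeout) <= tolerance) {
        gaps.push({ after: prev.message, before: ev.message, seconds: delta });
      }
    }
    prev = ev;
  }
  return gaps;
}

// Illustrative values only, not taken from the actual session logs.
const sampleEvents: BootEvent[] = [
  { seconds: 1.2, message: "phase A" },
  { seconds: 31.3, message: "phase B" },
  { seconds: 61.4, message: "phase C" },
  { seconds: 62.0, message: "phase D" },
];

console.log(findTimeoutGaps(sampleEvents).length); // 2
```

If most phase transitions show up in the result with near-identical lengths, the next step is to search for what is timing out rather than to profile for general slowness.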
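
To guard against the first learning recurring, a provisioning tool could check the kernel command line for serial console arguments before applying it to hardware without a verified UART. This is a minimal sketch with a hypothetical function name (`serialConsoleArgs`) and a made-up example cmdline. Only the `console=ttyS0,115200n8` argument comes from this session.

```typescript
// Hypothetical check: list kernel arguments that direct the console to a
// serial port (console=ttyS<N>, with or without baud/parity options).
function serialConsoleArgs(cmdline: string): string[] {
  return cmdline
    .split(/\s+/)
    .filter((arg) => /^console=ttyS\d+(,|$)/.test(arg));
}

// Example cmdline is an assumption for illustration.
const exampleCmdline = "ro console=tty0 console=ttyS0,115200n8 rhgb quiet";

console.log(serialConsoleArgs(exampleCmdline)); // [ "console=ttyS0,115200n8" ]
```

A non-empty result would be a reason to stop and confirm the target actually has a UART. `console=tty0` is left alone because it refers to the virtual terminal, not a serial port.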
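
The disk default described in the last learning can be restated in TypeScript. The real auto-detect logic lives in the kickstart `%pre` section, so this is only a sketch of the same decision under assumptions: `pickInstallDisk` and `PROBE_ORDER` are hypothetical names, and the set of present devices is passed in rather than read from the system.

```typescript
// Probe order as described for the kickstart %pre logic.
const PROBE_ORDER = ["nvme0n1", "sda", "sdb", "vda"];

// An empty `configured` string means auto-detect: return the first device
// from PROBE_ORDER that is present, or undefined if none is found.
function pickInstallDisk(configured: string, present: ReadonlySet<string>): string | undefined {
  if (configured !== "") {
    return configured; // an explicit choice always wins
  }
  const found = PROBE_ORDER.find((name) => present.has(name));
  return found === undefined ? undefined : `/dev/${found}`;
}

console.log(pickInstallDisk("", new Set(["nvme0n1"])));    // /dev/nvme0n1
console.log(pickInstallDisk("", new Set(["sdb", "sda"]))); // /dev/sda
```

Returning `undefined` instead of falling back to `/dev/sda` keeps the failure explicit on machines where none of the probed devices exist.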