Documents the 2026-03-30 debugging session: root cause (console=ttyS0 on UART-less hardware), what was tried, what was fixed, and remaining work items. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PXE Boot Debugging Session — 2026-03-30
Problem
Beelink SER Mini Pro (AMD Ryzen 7 255, Radeon 780M, 64GB DDR5, 1TB NVMe) boots Fedora 43 100x slower than normal after PXE kickstart install. Every systemd boot phase takes ~30 seconds. The Anaconda installer/rescue mode boots fast on the same hardware.
Root Cause
console=ttyS0,115200n8 in kernel cmdline — added via kickstart bootloader --append during install.
This mini PC has no physical serial UART. When systemd writes to ttyS0, each log write blocks for ~30 seconds waiting for the non-existent UART hardware. Since systemd logs at every phase transition, the total boot time was 10+ minutes.
The Anaconda installer was unaffected because it uses a different init flow that doesn't go through the same systemd phase transitions.
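A check like the following could flag serial console arguments before they reach a machine without a UART. This is only a sketch: `findSerialConsoles` is a hypothetical helper, not part of the existing codebase, and the sample command line is made up for illustration.

```typescript
// Hypothetical helper: list console= arguments on a kernel command line
// that point at a serial device (ttyS*), so they can be reviewed before install.
function findSerialConsoles(cmdline: string): string[] {
  return cmdline
    .split(/\s+/)
    .filter((arg) => arg.startsWith("console=ttyS"))
    .map((arg) => arg.slice("console=".length));
}

// Illustrative command line (not copied from the affected machine).
const cmdline = "ro rhgb quiet console=tty0 console=ttyS0,115200n8";
console.log(findSerialConsoles(cmdline)); // one entry: "ttyS0,115200n8"
```

An empty result means no serial console is requested; a non-empty one is a prompt to confirm that the hardware really has that UART.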
How We Found It
Hours of systematic elimination:
| What we tried | Result | Ruled out |
|---|---|---|
| `modprobe.blacklist=amdgpu` | No change | GPU driver |
| `amd_iommu=off` | No change | IOMMU |
| Rebuild initramfs without plymouth/drm/fips | No change | Initramfs bloat |
| systemd-boot instead of GRUB | Still slow | Bootloader |
| PXE-boot kernel+initrd (skip local GRUB entirely) | Still slow | Local bootloader/firmware |
| Disable TPM in BIOS | No change | TPM |
| Remove `resume=` + resume dracut module | No change | Hibernate resume |
| Manual LVM activation in rescue shell | Fast | NVMe/LVM themselves |
| Remove `console=ttyS0,115200n8` from GRUB | FAST BOOT | This was it |
The key breakthrough was noticing that the timestamps showed exactly 30-second gaps between boot phases, which is a timeout pattern rather than general slowness. The second step was realising that the serial console had been added during install and the machine had never been booted without it.
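The "same gap repeated" pattern can be checked mechanically rather than by eye. The sketch below assumes the boot timestamps have already been extracted as seconds since boot; `findRepeatedGaps`, its thresholds, and the sample values are all hypothetical.

```typescript
interface Gap {
  after: number;   // timestamp (seconds since boot) before the stall
  seconds: number; // length of the stall
}

// Returns gaps of at least minGap seconds whose length matches
// at least one other gap within the given tolerance.
function findRepeatedGaps(timestamps: number[], minGap = 5, tolerance = 0.5): Gap[] {
  const gaps: Gap[] = [];
  let prev: number | undefined;
  for (const t of timestamps) {
    if (prev !== undefined && t - prev >= minGap) {
      gaps.push({ after: prev, seconds: t - prev });
    }
    prev = t;
  }
  return gaps.filter((g) =>
    gaps.some((other) => other !== g && Math.abs(other.seconds - g.seconds) <= tolerance),
  );
}

// Made-up timestamps with four stalls of roughly 30 s each.
const stalls = findRepeatedGaps([0.5, 30.6, 60.7, 61.0, 91.1, 121.2]);
console.log(stalls.length); // 4
```

Several gaps of nearly identical length point at a fixed timeout somewhere in the boot path; gaps of varying length are more likely plain slowness.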
What Was Fixed (PR #4, merged)
1. Removed serial console from kickstart
- Removed `console=ttyS0,115200n8` from `bootloader --append`
- Removed `serial-getty@ttyS0.service` enablement
- Removed rsyslog serial forwarding
2. Enabled Anaconda syslog forwarding
- Uncommented `logging --host --port` directive in kickstart
- Bastion's SyslogListener was already built; it just needed better IP→MAC resolution
- Added `registerIp()` calls from kickstart fetch and progress callbacks
- Added syslog listener unit tests
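The IP→MAC resolution idea can be illustrated with a small in-memory registry. This is a sketch only: the class name, the `registerIp(ip, mac)` signature, and the `resolve` method are assumptions, since the real SyslogListener code is not shown in these notes.

```typescript
// Hypothetical registry: remembers which MAC address last used a given IP,
// so syslog packets (which only carry a source IP) can be tied to a machine.
class IpMacRegistry {
  private readonly byIp = new Map<string, string>();

  // Called whenever a request (kickstart fetch, progress callback) reveals
  // which MAC currently holds an IP. Assumed signature.
  registerIp(ip: string, mac: string): void {
    this.byIp.set(ip, mac.toLowerCase());
  }

  // Returns the MAC for a syslog sender, or undefined if the IP was never seen.
  resolve(ip: string): string | undefined {
    return this.byIp.get(ip);
  }
}

const registry = new IpMacRegistry();
registry.registerIp("192.0.2.10", "AA:BB:CC:DD:EE:FF"); // documentation-range IP, made-up MAC
console.log(registry.resolve("192.0.2.10")); // "aa:bb:cc:dd:ee:ff"
```

Registering from every HTTP request the installer makes keeps the mapping fresh even if the DHCP lease changes between boots.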
3. Fixed disk auto-detection
- Default disk changed from `/dev/sda` to `""` (auto-detect) in labd route and bastion command handler
- The kickstart `%pre` auto-detect logic probes nvme0n1, sda, sdb, vda in order
- Without this fix, NVMe-only machines (like the SER Mini Pro) fail immediately
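The selection rule above (explicit disk wins, otherwise first match in probe order) can be sketched as follows. The real logic runs in the kickstart `%pre` section; `pickInstallDisk` and `PROBE_ORDER` here are hypothetical names that only model the decision.

```typescript
// Probe order taken from the notes above.
const PROBE_ORDER = ["nvme0n1", "sda", "sdb", "vda"] as const;

// requested: disk path from the provision request, "" meaning auto-detect.
// present: block device names found on the machine (assumed input).
function pickInstallDisk(requested: string, present: string[]): string | undefined {
  if (requested !== "") return requested; // explicit override wins
  const name = PROBE_ORDER.find((d) => present.includes(d));
  return name === undefined ? undefined : `/dev/${name}`;
}

console.log(pickInstallDisk("", ["sda", "nvme0n1"])); // "/dev/nvme0n1"
console.log(pickInstallDisk("", ["mmcblk0"]));        // undefined: nothing matched
```

Returning `undefined` instead of falling back to `/dev/sda` makes the "no known disk" case visible rather than failing later in the installer.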
4. SysRq magic keys
- Added `kernel.sysrq=1` sysctl to kickstart `%post`
- Enables Alt+SysRq+REISUB via JetKVM for emergency reboot of stuck machines
5. Simplified debug command
- Removed `--sshd` flag (SSH always available via `inst.sshd` + `sshpw` in rescue mode)
- Added `/debug-setup.sh` HTTP endpoint for nc listener setup from rescue shell
- Cleaned up `sshd` field from DebugConfig, protocol types, all routes
6. Added labctl provision logs -f
- Follow mode with 5-second polling for real-time install monitoring
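Follow mode by polling amounts to remembering how many lines have been shown and printing only the rest. The sketch below is an assumption about how such a loop might look, not the actual `labctl` code; `LogFollower`, `follow`, and the `fetchLines` callback are hypothetical.

```typescript
// Tracks how many log lines have already been shown.
class LogFollower {
  private seen = 0;

  // Given the full log so far, returns only the lines added since the last poll.
  // If the log shrank (for example a new install started), start over.
  poll(allLines: string[]): string[] {
    const start = allLines.length < this.seen ? 0 : this.seen;
    this.seen = allLines.length;
    return allLines.slice(start);
  }
}

// Polls every intervalMs (5 seconds by default) until shouldStop() returns true.
async function follow(
  fetchLines: () => Promise<string[]>, // assumed: returns the whole log so far
  onLine: (line: string) => void,
  shouldStop: () => boolean,
  intervalMs = 5000,
): Promise<void> {
  const follower = new LogFollower();
  while (!shouldStop()) {
    for (const line of follower.poll(await fetchLines())) onLine(line);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Keeping the offset logic in a separate class makes it testable without timers or network access.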
What Works
- PXE discovery → install → boot: full flow works end-to-end
- Anaconda syslog forwarding: install logs stream to bastion
- Progress callbacks: stage-by-stage install tracking via curl
- Auto disk detection: works for NVMe and SATA
- Debug rescue mode: `labctl provision debug <target>` boots Anaconda rescue with SSH
- Network-first boot order: bastion controls every reboot via efibootmgr
- SysRq keys: emergency reboot via JetKVM keyboard
What Doesn't Work / Known Issues
- `--sshd` in rescue mode: Anaconda rescue mode skips both `%pre` and `%post` kickstart sections. `inst.sshd` + `sshpw` should provide SSH access, but this hasn't been verified end-to-end yet. The `/debug-setup.sh` curl workaround exists for nc.
- arm64 container build: iPXE cross-compilation fails on arm64 (GCC flag incompatibility). Workaround: build with `--platforms linux/amd64` only.
- Integration test SSH timeout: VM boots fine but SSH times out due to libvirt nftables reject rules after VM restart. Test infrastructure issue, not a code bug.
What Was Skipped / Left To Do
- Syslog UDP port in k3s: works because bastion uses `hostNetwork: true`, but should be documented properly
- Background log streamer: the old `tail -f` approach broke Anaconda filesystem sync. Replaced with syslog forwarding. If more granular `%post` logging is needed, a synchronous log push at end of `%post` would be safe.
- Per-machine hardware overrides: turned out not to be needed (serial console was the only "special" setting, and removing it is universal)
- Ubuntu autoinstall disk default: `ubuntu-autoinstall.ts` still has `disk || "/dev/sda"` fallback (line 38), should be changed to auto-detect
- Verify `inst.sshd` works in rescue mode: test SSH with password "debug" next time debug mode is used
- Re-enable TPM in BIOS: was disabled during debugging, should be factory-reset (user plans to reset BIOS to factory)
Key Learnings
- `console=ttyS0` on hardware without UART = 30s timeout per boot phase. Never add serial console to kernel cmdline unless the hardware has a verified physical UART.
- Exactly-N-second gaps in boot logs = timeout, not slowness. Look for the timeout source, not performance issues.
- The bisection approach works. Systematically removing features one at a time found the root cause. But it took hours because the serial console was added early and seemed harmless.
- Anaconda rescue mode is limited. It skips `%pre` and `%post`, so you can't automate setup via kickstart. Use `inst.sshd` + `sshpw` for SSH, and serve helper scripts via HTTP for everything else.
- Default disk paths break NVMe machines. Always default to auto-detect (empty string) rather than `/dev/sda`.