feat: install logging, error trapping, PXE/ISO integration tests

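Kickstart installs on real hardware failed silently: there was no error
reporting, only 3 progress callbacks, and no log streaming. This overhaul
adds error trapping, finer-grained progress stages, and live log streaming
so that install failures are reported to the bastion.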

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
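
A minimal sketch of the per-MAC ring buffer and the named SSE events listed
above, not the code in this commit. The names LineRingBuffer, appendLog and
sseFrame are hypothetical, file persistence is left out, and the 2000-line
default is taken from the description of InstallLogBuffer.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the project's code.

/** Fixed-capacity ring buffer that keeps only the newest `capacity` lines. */
class LineRingBuffer {
  private readonly capacity: number;
  private readonly slots: string[] = [];
  private start = 0; // index of the oldest retained line once the buffer is full
  private totalSeen = 0; // every line ever pushed, including evicted ones

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(line: string): void {
    if (this.slots.length < this.capacity) {
      this.slots.push(line);
    } else {
      // Overwrite the oldest line and advance the start pointer.
      this.slots[this.start] = line;
      this.start = (this.start + 1) % this.capacity;
    }
    this.totalSeen += 1;
  }

  /** Retained lines, oldest first. */
  lines(): string[] {
    return [...this.slots.slice(this.start), ...this.slots.slice(0, this.start)];
  }

  /** Count of all lines received, analogous to a `log_total` field. */
  get total(): number {
    return this.totalSeen;
  }
}

// One buffer per machine, keyed by lower-cased MAC address.
const buffers = new Map<string, LineRingBuffer>();

/** Accepts a batch of lines (a single line is a batch of one). */
function appendLog(mac: string, lines: string[], capacity = 2000): LineRingBuffer {
  const key = mac.toLowerCase();
  let buf = buffers.get(key);
  if (buf === undefined) {
    buf = new LineRingBuffer(capacity);
    buffers.set(key, buf);
  }
  for (const line of lines) {
    buf.push(line);
  }
  return buf;
}

/** Formats one Server-Sent Events frame with a named event type. */
function sseFrame(event: "stage" | "log", data: string): string {
  // Each line of a multi-line payload needs its own `data:` field.
  const body = data
    .split("\n")
    .map((l) => `data: ${l}`)
    .join("\n");
  return `event: ${event}\n${body}\n\n`;
}
```

With named events a browser client can subscribe separately, for example
addEventListener("stage", ...) and addEventListener("log", ...) on one
EventSource, instead of parsing a type field out of every message.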

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: now emitted only in proxy mode (they break OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Michal
2026-03-26 22:26:33 +00:00
parent ffc4a782d2
commit 46b017d77e
189 changed files with 16241 additions and 432 deletions


@@ -10,3 +10,4 @@ data:
DHCP_MODE: "proxy"
TIMEZONE: "Europe/London"
LOCALE: "en_GB.UTF-8"
LABD_URL: "http://labd.lab-system.svc.cluster.local:3100"


@@ -7,6 +7,8 @@ metadata:
app: bastion
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: bastion
@@ -15,10 +17,18 @@ spec:
labels:
app: bastion
spec:
imagePullSecrets:
- name: gitea-registry
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
dnsConfig:
options:
- name: ndots
value: "1"
containers:
- name: bastion
-image: mysources.co.uk/michal/lab-bastion:latest
+image: mysources.co.uk/michal/lab/bastion:latest
imagePullPolicy: Always
command:
- node
- src/cli/dist/index.js
@@ -26,9 +36,16 @@ spec:
- bastion
- standalone
- start
- --foreground
envFrom:
- configMapRef:
name: bastion-config
env:
- name: BASTION_JOIN_TOKEN
valueFrom:
secretKeyRef:
name: bastion-join-token
key: token
ports:
- containerPort: 8080
name: http
@@ -43,17 +60,21 @@ spec:
add:
- NET_ADMIN
- NET_RAW
startupProbe:
httpGet:
path: /api/machines
port: 8080
failureThreshold: 60
periodSeconds: 10
livenessProbe:
httpGet:
path: /api/machines
port: 8080
initialDelaySeconds: 15
periodSeconds: 30
readinessProbe:
httpGet:
path: /api/machines
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
volumes:
- name: state