feat: install logging, error trapping, PXE/ISO integration tests

Kickstart installs on real hardware failed silently — no error reporting,
only 3 progress callbacks, zero log streaming. This overhaul makes every
install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log — receives raw log lines from kickstart (single or batch)
- InstallLogBuffer — per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac — now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow — uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)
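
A minimal sketch of the per-MAC ring buffer and the named SSE events described above. This is not the bastion source: the class and function names (LineRingBuffer, PerMacLogStore, sseFrame) are hypothetical, and it assumes log_total counts every line ever received, including lines already evicted from the buffer. File persistence is left out.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the bastion code.

/** Fixed-capacity ring buffer that keeps only the newest `capacity` lines. */
class LineRingBuffer {
  private readonly slots: string[] = [];
  private start = 0; // index of the oldest retained line once the buffer is full
  private totalSeen = 0; // every line ever pushed, including evicted ones

  constructor(private readonly capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
  }

  push(line: string): void {
    if (this.slots.length < this.capacity) {
      this.slots.push(line);
    } else {
      this.slots[this.start] = line; // overwrite the oldest slot
      this.start = (this.start + 1) % this.capacity;
    }
    this.totalSeen += 1;
  }

  /** Retained lines, oldest first. */
  lines(): string[] {
    return [...this.slots.slice(this.start), ...this.slots.slice(0, this.start)];
  }

  get total(): number {
    return this.totalSeen;
  }
}

/** One buffer per MAC address, created lazily on first append. */
class PerMacLogStore {
  private readonly buffers = new Map<string, LineRingBuffer>();

  constructor(private readonly capacity = 2000) {}

  /** Accepts a single line or a batch, mirroring "single or batch" above. */
  append(mac: string, input: string | string[]): void {
    const key = mac.toLowerCase();
    let buf = this.buffers.get(key);
    if (buf === undefined) {
      buf = new LineRingBuffer(this.capacity);
      this.buffers.set(key, buf);
    }
    for (const line of Array.isArray(input) ? input : [input]) {
      buf.push(line);
    }
  }

  snapshot(mac: string): { log_lines: string[]; log_total: number } {
    const buf = this.buffers.get(mac.toLowerCase());
    return { log_lines: buf ? buf.lines() : [], log_total: buf ? buf.total : 0 };
  }
}

/** Formats one Server-Sent Events frame carrying a named event. */
function sseFrame(event: "stage" | "log", data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

With named events, a browser client would typically subscribe via EventSource.addEventListener("stage", ...) and addEventListener("log", ...) rather than onmessage, which only fires for unnamed events.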

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Michal
2026-03-26 22:26:33 +00:00
parent ffc4a782d2
commit 46b017d77e
189 changed files with 16241 additions and 432 deletions

@@ -10,3 +10,4 @@ data:
  DHCP_MODE: "proxy"
  TIMEZONE: "Europe/London"
  LOCALE: "en_GB.UTF-8"
  LABD_URL: "http://labd.lab-system.svc.cluster.local:3100"

@@ -7,6 +7,8 @@ metadata:
    app: bastion
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: bastion
@@ -15,10 +17,18 @@ spec:
      labels:
        app: bastion
    spec:
      imagePullSecrets:
        - name: gitea-registry
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      dnsConfig:
        options:
          - name: ndots
            value: "1"
      containers:
        - name: bastion
          image: mysources.co.uk/michal/lab-bastion:latest
          image: mysources.co.uk/michal/lab/bastion:latest
          imagePullPolicy: Always
          command:
            - node
            - src/cli/dist/index.js
@@ -26,9 +36,16 @@ spec:
            - bastion
            - standalone
            - start
            - --foreground
          envFrom:
            - configMapRef:
                name: bastion-config
          env:
            - name: BASTION_JOIN_TOKEN
              valueFrom:
                secretKeyRef:
                  name: bastion-join-token
                  key: token
          ports:
            - containerPort: 8080
              name: http
@@ -43,17 +60,21 @@ spec:
              add:
                - NET_ADMIN
                - NET_RAW
          startupProbe:
            httpGet:
              path: /api/machines
              port: 8080
            failureThreshold: 60
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /api/machines
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /api/machines
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
      volumes:
        - name: state

@@ -0,0 +1,8 @@
apiVersion: v1
kind: ConfigMap
metadata:
  name: labd-config
data:
  LABD_PORT: "3100"
  LABD_HOST: "0.0.0.0"
  LABD_LOG_LEVEL: "info"

@@ -0,0 +1,44 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: labd
spec:
  replicas: 1
  selector:
    matchLabels:
      app: labd
  template:
    metadata:
      labels:
        app: labd
    spec:
      containers:
        - name: labd
          image: mysources.co.uk/michal/lab/labd:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 3100
          envFrom:
            - configMapRef:
                name: labd-config
            - secretRef:
                name: labd-secrets
          livenessProbe:
            httpGet:
              path: /health/live
              port: 3100
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 3100
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

@@ -0,0 +1,18 @@
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: labd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: labd
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

@@ -0,0 +1,14 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: lab-infra
commonLabels:
  app: labd
resources:
  - deployment.yaml
  - service.yaml
  - configmap.yaml
  - hpa.yaml
  - pdb.yaml

@@ -0,0 +1,9 @@
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: labd
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: labd

@@ -0,0 +1,12 @@
apiVersion: v1
kind: Service
metadata:
  name: labd
spec:
  type: ClusterIP
  selector:
    app: labd
  ports:
    - port: 3100
      targetPort: 3100
      protocol: TCP