23 Commits

Author SHA1 Message Date
Michal
dd92147341 fix(k3s): route audit logs through journald, codify etcd member recovery
Some checks failed
CI/CD / typecheck (pull_request) Failing after 13s
CI/CD / lint (pull_request) Failing after 23s
CI/CD / test (pull_request) Failing after 10s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Two changes prompted by today's etcd raft panic on worker1-k8s0
(tocommit out of range, lost-write on follower) and the cascading
disk pressure that surfaced underneath it.

Audit logs to journald
- kube-apiserver now uses audit-log-path=- so audit events flow to
  k3s.service stdout and into journald instead of growing files in
  /var/log/kubernetes. The previous setup combined apiserver's
  internal rotation with a logrotate *.log glob that double-rotated
  the rotated files into permanent orphans (observed: 7+ GB).
- New journald-limits operation writes a SystemMaxUse=2G drop-in so
  audit volume cannot fill /var/log even under bursty load.
- log-rotation operation repurposed to decommission the obsolete
  logrotate rule and reap leftover audit files. Idempotent: no-op
  on fresh installs.
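Below is a minimal TypeScript sketch of the drop-in that the journald-limits operation writes. The helper name `renderJournaldLimits` and the file name `50-lab-limits.conf` are hypothetical. Only the `SystemMaxUse=2G` value and the `audit-log-path=-` argument come from the commit. `[Journal]` and `SystemMaxUse=` are standard journald.conf settings, and journald reads drop-ins from `/etc/systemd/journald.conf.d/*.conf`.

```typescript
// Hypothetical helper: render the journald size-cap drop-in described above.
interface DropIn {
  path: string;
  content: string;
}

function renderJournaldLimits(systemMaxUse: string = "2G"): DropIn {
  // Accept simple journald sizes such as "512M" or "2G" (K/M/G/T suffixes only here).
  if (!/^\d+[KMGT]?$/.test(systemMaxUse)) {
    throw new Error(`invalid SystemMaxUse value: ${systemMaxUse}`);
  }
  return {
    // Assumed file name; journald reads any *.conf in this directory.
    path: "/etc/systemd/journald.conf.d/50-lab-limits.conf",
    content: ["[Journal]", `SystemMaxUse=${systemMaxUse}`, ""].join("\n"),
  };
}

// From the commit: "-" makes kube-apiserver write audit events to stdout (and so to journald).
const apiserverAuditArgs: string[] = ["audit-log-path=-"];
```

To apply it on a node, write `content` to `path` and then run `systemctl restart systemd-journald`.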

Etcd member recovery
- New recoverEtcdMember(broken, peer, hostname) codifies the
  documented k3s recovery: stop k3s, etcdctl member remove, wipe
  /var/lib/rancher/k3s/server/{db,tls,cred}, restart, poll for
  rejoin. Refuses to operate when cluster size < 3 to preserve
  quorum.
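The recovery sequence can be sketched as a pure planner that returns the ordered commands instead of running them. `planEtcdMemberRecovery` is a hypothetical name; the commit's real entry point is `recoverEtcdMember(broken, peer, hostname)`. The sketch makes three further assumptions. The member list comes from `etcdctl member list`. Member names follow a k3s-style `<hostname>-<suffix>` form. The endpoint and certificate flags that etcdctl needs against k3s's embedded etcd are left out.

```typescript
// Hypothetical planner for the recovery procedure above.
// It returns ordered commands; executing them over SSH is out of scope.
interface EtcdMember {
  id: string;   // hex member ID as shown by `etcdctl member list`
  name: string; // assumed k3s form: <hostname>-<suffix>
}

interface Step {
  host: string;
  command: string;
}

const K3S_SERVER_DIR = "/var/lib/rancher/k3s/server";

function memberFor(members: EtcdMember[], hostname: string): EtcdMember | undefined {
  return members.find((m) => m.name === hostname || m.name.startsWith(`${hostname}-`));
}

function planEtcdMemberRecovery(members: EtcdMember[], broken: string, peer: string): Step[] {
  // Same guard as the commit: refuse below three members to preserve quorum.
  if (members.length < 3) {
    throw new Error(`refusing recovery: ${members.length} etcd members, need at least 3`);
  }
  const target = memberFor(members, broken);
  if (!target) throw new Error(`no etcd member for host ${broken}`);
  if (broken === peer || !memberFor(members, peer)) {
    throw new Error(`peer ${peer} must be a different, existing member`);
  }
  const wipe = ["db", "tls", "cred"].map((d) => `${K3S_SERVER_DIR}/${d}`).join(" ");
  return [
    { host: broken, command: "systemctl stop k3s" },
    { host: peer, command: `etcdctl member remove ${target.id}` },
    { host: broken, command: `rm -rf ${wipe}` },
    { host: broken, command: "systemctl start k3s" },
    // The caller then polls `etcdctl member list` on the peer until the host rejoins.
  ];
}
```

Because the planner is pure, the quorum guard and the step order can be unit-tested without a live cluster. Polling for the rejoin is left to the caller.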

Tests
- 7 new unit tests covering both decommission paths and the
  recovery procedure (54 total, all green).
- install.test.ts asserts the file-based audit args are gone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 21:29:16 +01:00
95c99cb4d5 Merge pull request 'docs: CLAUDE.md routing rules + TODOS.md from v2.0 review' (#12) from feat/recheck-and-fixes into main
Some checks failed
CI/CD / lint (push) Failing after 12s
CI/CD / typecheck (push) Failing after 22s
CI/CD / test (push) Failing after 12s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
Reviewed-on: #12
2026-04-02 00:31:44 +00:00
Michal
2eda926d4c docs: add TODOS.md from v2.0 CEO review
Some checks failed
CI/CD / typecheck (pull_request) Failing after 12s
CI/CD / lint (pull_request) Failing after 21s
CI/CD / test (pull_request) Failing after 11s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Project tracking for labctl v2.0 platform design. Includes P1 (arch doc update),
P2 (SSH emergency mode, Prometheus metrics), and P3 (graph viz, import, secrets rotation)
items from the CEO and eng review sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 01:29:30 +01:00
Michal
70258a0cc3 Merge remote-tracking branch 'origin/main' into feat/recheck-and-fixes 2026-04-02 01:27:45 +01:00
Michal
e9944c5413 chore: add gstack skill routing rules to CLAUDE.md 2026-04-01 23:56:47 +01:00
22e2946e95 Merge pull request 'feat: provision recheck, hardware info preservation, ISO boot fixes' (#11) from feat/recheck-and-fixes into main
Some checks failed
CI/CD / typecheck (push) Failing after 11s
CI/CD / lint (push) Failing after 22s
CI/CD / test (push) Failing after 11s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
Reviewed-on: #11
2026-04-01 17:11:33 +00:00
Michal
9ddab24931 feat: provision recheck, hardware info preservation, ISO boot fixes
Some checks failed
CI/CD / lint (pull_request) Failing after 1m26s
CI/CD / typecheck (pull_request) Failing after 11s
CI/CD / test (pull_request) Failing after 11s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
- Add `labctl provision recheck` to refresh hardware info via SSH
- Preserve hardware info in InstalledInfo when install completes
- Fix /ks-auto: run nested %pre scripts from included kickstarts
- Add command-discover WebSocket routing for hw info updates
- Fix k3s join: clean stale TLS/cred when joining existing cluster
- Add --tls-verify=false for internal HTTP registry pushes
- Add fix-ssh-root.sh script for root SSH access on all nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 17:59:39 +01:00
Michal
ae91f2895e feat: dynamic /ks-auto kickstart for ISO boot (R1 ARM support)
Some checks failed
CI/CD / lint (push) Failing after 11s
CI/CD / typecheck (push) Failing after 22s
CI/CD / test (push) Failing after 7m5s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
Add state-aware kickstart dispatch for machines that boot from ISO
(no PXE/network at UEFI level). Replaces hardcoded discover.ks.

- /ks-auto: %pre detects MAC, queries /api/machine-state/<mac>,
  writes discover or install kickstart to /tmp/dynamic.ks,
  and the main body pulls it in with %include
- /api/machine-state/<mac>: simple state endpoint returning
  unknown|discovered|queued|installing|installed|debug
- ISO kernel cmdline updated: discover.ks → ks-auto
- Handles: discovery (first boot), install (queued), debug modes
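The state-to-kickstart dispatch that `/ks-auto`'s %pre performs can be sketched as a lookup. This assumes the endpoint returns the bare state string. The commit only names the discovery, install (queued) and debug cases, so sending `installing` and `installed` to discovery is a guess. `kickstartFor` and `parseMachineState` are hypothetical names.

```typescript
// States returned by /api/machine-state/<mac>, as listed in the commit.
type MachineState = "unknown" | "discovered" | "queued" | "installing" | "installed" | "debug";
type KickstartKind = "discover" | "install" | "debug";

const STATES: readonly MachineState[] = ["unknown", "discovered", "queued", "installing", "installed", "debug"];

// Treat any unexpected response body as "unknown".
function parseMachineState(body: string): MachineState {
  const s = body.trim();
  return STATES.find((state) => state === s) ?? "unknown";
}

function kickstartFor(state: MachineState): KickstartKind {
  switch (state) {
    case "queued":
      return "install"; // machine is queued: write the install kickstart
    case "debug":
      return "debug";
    default:
      // unknown/discovered (and, in this sketch only, installing/installed) go to discovery
      return "discover";
  }
}
```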

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:17:08 +01:00
Michal
06fc40a857 fix: k3s install automation — skip Cilium on join, Longhorn via server, default root user
Some checks failed
CI/CD / typecheck (push) Failing after 10s
CI/CD / test (push) Failing after 9s
CI/CD / lint (push) Failing after 22s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
- Skip Cilium install for joining servers (already in cluster via daemonset)
- Longhorn annotation for workers: SSH to server node from CLI to apply
  kubectl annotation (workers don't have kubectl access)
- Default SSH user for k3s/app commands changed to 'root' (operations
  need root privileges, using 'lab' user broke installs)
- k3s server config: cluster-init for initial server, server+token for joins

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:02:19 +01:00
Michal
a68d6d617e feat: k3s cluster-init for etcd HA, fix Cilium duplicate install
Some checks failed
CI/CD / lint (push) Failing after 11s
CI/CD / test (push) Failing after 10s
CI/CD / typecheck (push) Failing after 22s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
- Server config now uses cluster-init: true for initial server (enables
  embedded etcd). Joining servers get server: + token: in config.
- Cilium install already checks for existing installation, so joining
  servers skip it gracefully (the "release name in use" error is non-fatal)
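Below is a sketch of how the two server roles map onto k3s's `/etc/rancher/k3s/config.yaml`. `cluster-init`, `server` and `token` are real k3s config keys. `renderK3sServerConfig` and its input shape are hypothetical. The example uses the documentation address 192.0.2.10 on port 6443, not a real lab IP.

```typescript
interface ServerConfigInput {
  initial: boolean;   // true only for the first server (worker0-k8s0 above)
  serverUrl?: string; // e.g. https://<initial-server-ip>:6443, required for joins
  token?: string;     // cluster join token, required for joins
}

function renderK3sServerConfig(input: ServerConfigInput): string {
  if (input.initial) {
    // cluster-init makes k3s start embedded etcd instead of its default SQLite datastore
    return "cluster-init: true\n";
  }
  if (!input.serverUrl || !input.token) {
    throw new Error("joining servers need both serverUrl and token");
  }
  // JSON.stringify yields a double-quoted string, which is also a valid YAML scalar
  return `server: ${JSON.stringify(input.serverUrl)}\ntoken: ${JSON.stringify(input.token)}\n`;
}
```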

Cluster rebuilt as etcd HA:
  worker0-k8s0  control-plane,etcd  (initial server, cluster-init)
  worker1-k8s0  control-plane,etcd  (joined server, Mac Studio aarch64)
  spark-2935    worker              (DGX Spark, aarch64)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:53:18 +01:00
Michal
c49a650888 fix: firstboot fstab handling — no duplicates, compatible with Asahi sed
Some checks failed
CI/CD / typecheck (push) Failing after 10s
CI/CD / test (push) Failing after 11s
CI/CD / lint (push) Failing after 23s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
- Replace sed with grep -v / awk for fstab manipulation (Asahi Fedora's
  sed doesn't support the \| alternation or \? quantifier)
- Use idempotent write_lab_fstab function: removes all old entries first,
  comments out conflicting btrfs subvol entries, adds fresh LVM entries
- Fix sed for SSH hardening: use #* instead of \? (POSIX compatible)
- Tested on Mac Studio: no duplicate fstab entries after multiple runs
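The idempotent `write_lab_fstab` behavior can be modeled as a pure string transform: running it twice must produce the same file. This TypeScript sketch is not the shell function from the commit. It assumes that lab entries can be recognized by a `/dev/labvg/` or `/dev/mapper/labvg-` device path, and it uses `defaults 0 2` as placeholder options.

```typescript
interface LabMount {
  device: string;     // e.g. /dev/labvg/var
  mountpoint: string; // e.g. /var
  fstype: string;     // e.g. xfs
}

// Assumed way to recognize entries written by a previous run.
const LAB_DEVICE_PREFIXES = ["/dev/labvg/", "/dev/mapper/labvg-"];

function writeLabFstab(current: string, mounts: LabMount[]): string {
  const labMountpoints = new Set(mounts.map((m) => m.mountpoint));
  const kept: string[] = [];
  for (const line of current.split("\n")) {
    const trimmed = line.trim();
    const fields = trimmed.split(/\s+/);
    const isComment = trimmed.startsWith("#");
    // 1. Drop every lab entry from earlier runs so none are duplicated.
    if (!isComment && LAB_DEVICE_PREFIXES.some((p) => fields[0].startsWith(p))) continue;
    // 2. Comment out btrfs subvolume entries that would shadow a lab mountpoint.
    if (!isComment && fields.length >= 3 && fields[2] === "btrfs" && labMountpoints.has(fields[1])) {
      kept.push(`#${line}`);
      continue;
    }
    kept.push(line);
  }
  // Trim trailing blank lines so repeated runs do not accumulate them.
  while (kept.length > 0 && kept[kept.length - 1].trim() === "") kept.pop();
  // 3. Append fresh LVM entries.
  for (const m of mounts) kept.push(`${m.device} ${m.mountpoint} ${m.fstype} defaults 0 2`);
  return kept.join("\n") + "\n";
}
```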

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:40:29 +01:00
Michal
87e09af941 fix: default admin user to 'lab', case-insensitive OS detection for iSCSI
Some checks failed
CI/CD / typecheck (push) Failing after 10s
CI/CD / test (push) Failing after 10s
CI/CD / lint (push) Failing after 22s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
- Firstboot script defaults admin user to 'lab' instead of bastion's
  config.adminUser (which was 'michal' from host system)
- iSCSI OS detection uses case-insensitive match for 'fedora'
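Case-insensitive OS detection for the iSCSI step might look like the sketch below. It uses the package names from the iSCSI commit: iscsi-initiator-utils on Fedora and open-iscsi on Ubuntu. It assumes detection reads `/etc/os-release`. Matching against `ID`, `ID_LIKE` and `NAME` together is also an assumption, chosen to catch both `ID=fedora` and a capitalized `NAME="Fedora ..."` value. `parseOsRelease` and `iscsiPackage` are hypothetical names.

```typescript
// Parse /etc/os-release: KEY=value lines, values optionally quoted.
function parseOsRelease(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const m = /^([A-Z0-9_]+)=(.*)$/.exec(line.trim());
    if (!m) continue;
    out[m[1]] = m[2].replace(/^["']|["']$/g, "");
  }
  return out;
}

function iscsiPackage(osRelease: string): string {
  const info = parseOsRelease(osRelease);
  // Lower-case everything so "Fedora" and "fedora" both match.
  const haystack = `${info.ID ?? ""} ${info.ID_LIKE ?? ""} ${info.NAME ?? ""}`.toLowerCase();
  if (haystack.includes("fedora")) return "iscsi-initiator-utils";
  if (haystack.includes("ubuntu")) return "open-iscsi";
  throw new Error(`unsupported OS for iSCSI setup: ${info.NAME ?? "unknown"}`);
}
```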

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:13:53 +01:00
Michal
6f13e284fd fix: firstboot script auto-detects hostname and MAC, no query params needed
Some checks failed
CI/CD / typecheck (push) Failing after 10s
CI/CD / test (push) Failing after 10s
CI/CD / lint (push) Failing after 23s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
The firstboot script now auto-detects hostname (from hostnamectl) and
MAC address (from first UP interface) at runtime. No URL query parameters
required — just `curl bastion/asahi/firstboot.sh | sudo bash`.

Fixes the shell escaping issue where `&` in query params broke curl piping.
Updated labctl provision asahi instructions accordingly.
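MAC detection from the first UP interface can be sketched against the output of `ip -o link show`, which prints one interface per line. The sketch reads "UP" as the operational `state UP` field rather than the `UP` flag; that reading is an assumption, and so is the `firstUpMac` name. The real firstboot script does this in bash.

```typescript
// Return the MAC of the first interface whose operstate is UP and that has an
// Ethernet address; the loopback device ("link/loopback") never matches.
function firstUpMac(ipLinkOutput: string): string | undefined {
  for (const line of ipLinkOutput.split("\n")) {
    if (!/\bstate UP\b/.test(line)) continue;
    const m = /link\/ether ([0-9a-f]{2}(?::[0-9a-f]{2}){5})/i.exec(line);
    if (m) return m[1].toLowerCase();
  }
  return undefined;
}
```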

Tested on Mac Studio (worker1-k8s0): hostname, MAC, and bastion
registration all auto-detected correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 15:05:25 +01:00
Michal
6c963a15bd fix: firstboot reprovision path now runs hostname, user, and registration
Some checks failed
CI/CD / lint (push) Failing after 12s
CI/CD / test (push) Failing after 10s
CI/CD / typecheck (push) Failing after 29s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
Previously the reprovision path exited early after re-mounting LVs,
skipping hostname setup, admin user creation, metadata, and bastion
registration. Now both paths fall through to the common post-setup code.

Tested on Mac Studio (worker1-k8s0) — reprovision + self-registration
confirmed working via curl | bash pipe.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 09:59:02 +01:00
8c737d163d Merge pull request 'feat: Asahi Linux provisioning for Apple Silicon' (#10) from feat/asahi-provisioning into main
Some checks failed
CI/CD / lint (push) Failing after 11s
CI/CD / typecheck (push) Failing after 22s
CI/CD / test (push) Failing after 7m7s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
2026-03-31 23:30:41 +00:00
Michal
17bae7ddbf fix: pre-download rootfs ZIP to avoid macOS Python HTTP streaming issues
Some checks failed
CI/CD / lint (pull_request) Failing after 11s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 22s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
The Asahi installer's urlcache.py fails with AssertionError on macOS
when streaming ZIP via HTTP Range requests from Fastify. Fix: download
the ZIP with curl first (reliable on macOS), then set REPO_BASE to the
local directory so the installer opens it as a local file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:30:29 +01:00
Michal
bb8f37ef7d feat: iSCSI, Longhorn disk labels, labctl asahi command, ZIP32 fix
Some checks failed
CI/CD / typecheck (pull_request) Failing after 12s
CI/CD / lint (pull_request) Failing after 22s
CI/CD / test (pull_request) Failing after 10s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
k3s host prep:
- Add iSCSI initiator install+enable (Fedora: iscsi-initiator-utils,
  Ubuntu: open-iscsi) — required by Longhorn
- Add Longhorn disk label to k3s server+agent configs
- Add Longhorn disk annotation operation in post-install hardening

CLI:
- Add `labctl provision asahi` command with interactive install guide
- Change default SSH user from "michal" to "lab" in all commands
- Change admin user in bastion progress callback to "lab"

Asahi provisioning fixes:
- Download installer_data.json locally (installer reads it as file)
- Use REPO_BASE to serve upstream ZIP from bastion (LAN speed)
- Fix ZIP32 vs ZIP64: serve original upstream ZIP unmodified
  (our repackaged ZIP used ZIP64 which breaks Asahi urlcache)
- Add /data/asahi-repo fallback path for k3s container PVC mount
- Deploy script syncs asahi-repo to bastion pod after deployment
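The ZIP32/ZIP64 distinction can be checked before a package is served. A ZIP64 archive places a 20-byte "ZIP64 end of central directory locator" (signature `0x07064b50`) immediately before the classic end-of-central-directory record (signature `0x06054b50`). This sketch finds that record by scanning backwards from the end of the file and then inspects the 20 bytes in front of it. `isZip64` is a hypothetical helper. The backwards scan trusts the last signature it finds, so a crafted archive comment could fool it.

```typescript
const EOCD_SIG = 0x06054b50;
const ZIP64_LOCATOR_SIG = 0x07064b50;
const EOCD_MIN_SIZE = 22;       // EOCD record without a comment
const ZIP64_LOCATOR_SIZE = 20;
const MAX_COMMENT = 0xffff;

function readU32LE(buf: Uint8Array, off: number): number {
  return (buf[off] | (buf[off + 1] << 8) | (buf[off + 2] << 16) | (buf[off + 3] << 24)) >>> 0;
}

// Offset of the end-of-central-directory record, or -1 if none is found.
function findEocd(buf: Uint8Array): number {
  const start = buf.length - EOCD_MIN_SIZE;
  const stop = Math.max(0, start - MAX_COMMENT);
  for (let off = start; off >= stop; off--) {
    if (readU32LE(buf, off) === EOCD_SIG) return off;
  }
  return -1;
}

function isZip64(buf: Uint8Array): boolean {
  const eocd = findEocd(buf);
  if (eocd < 0) throw new Error("not a ZIP: end of central directory record not found");
  const loc = eocd - ZIP64_LOCATOR_SIZE;
  return loc >= 0 && readU32LE(buf, loc) === ZIP64_LOCATOR_SIG;
}
```

A Node `Buffer` is a `Uint8Array`, so `isZip64(readFileSync("fedora-asahi-lab.zip"))` works as-is.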

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:32:38 +01:00
Michal
a8dc79bc5a feat: Asahi validation tests, rootfs build fixes, shellcheck-clean scripts
Some checks failed
CI/CD / lint (pull_request) Failing after 12s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 22s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
- Add 16 validation tests: shellcheck (3 roles), installer_data.json
  schema (8), Python parser validation, ZIP structure (3), rootfs mount
- Fix empty SSH keys generating invalid bash (SC1073)
- Fix __dirname crash in ESM modules (use import.meta.url)
- Fix rootfs build: mkdir -p before writing, correct binary paths
- Add .gitignore for large build artifacts (.asahi-cache, *.zip)
- Bump smoke test timeout for additional static plugin registration
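ES modules have no `__dirname`. The usual replacement derives it from `import.meta.url`. This helper takes the URL as a parameter so it can be exercised outside a module; `moduleDir` is a hypothetical name. On Node 20.11 and later, `import.meta.dirname` gives the same value directly.

```typescript
import { dirname } from "node:path";
import { fileURLToPath } from "node:url";

// In an ESM module: const here = moduleDir(import.meta.url);
function moduleDir(moduleUrl: string): string {
  return dirname(fileURLToPath(moduleUrl));
}
```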

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:22:24 +01:00
Michal
ad76c74020 fix: rootfs build script — mkdir before write, fix package path checks
Some checks failed
CI/CD / typecheck (pull_request) Failing after 10s
CI/CD / lint (pull_request) Failing after 21s
CI/CD / test (pull_request) Failing after 11s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 03:26:26 +01:00
Michal
6807632d46 feat: Asahi rootfs build pipeline + serve from bastion
Some checks failed
CI/CD / lint (pull_request) Failing after 10s
CI/CD / test (pull_request) Failing after 10s
CI/CD / typecheck (pull_request) Failing after 22s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
- Add scripts/build-asahi-rootfs.sh: downloads upstream Fedora Asahi
  Remix Server, injects lab firstboot script + systemd service + SSH
  keys, repackages with installer_data.json that adds LVM Data partition
- Bastion serves built artifacts at /asahi/repo/* via fastify-static
- installer_data.json prefers built config, falls back to minimal
- Fix __dirname crash in ESM module (use import.meta.url)
- Fix smoke test timeout (was crashing due to __dirname)
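The "prefer built config, fall back to minimal" lookup could be sketched as an ordered search over candidate repo directories. One example candidate is the `/data/asahi-repo` PVC path that bb8f37ef7d adds later. `loadInstallerData`, the injectable `read` parameter and the candidate order are assumptions, not the bastion's actual code.

```typescript
import { posix } from "node:path";
import { existsSync, readFileSync } from "node:fs";

interface InstallerData {
  os_list: Array<Record<string, unknown>>;
}

type ReadFile = (path: string) => string | undefined;

// Default reader: the file's text, or undefined when it does not exist.
const readIfExists: ReadFile = (p) => (existsSync(p) ? readFileSync(p, "utf8") : undefined);

// Return the first installer_data.json found, else the minimal built-in layout.
function loadInstallerData(repoDirs: string[], minimal: InstallerData, read: ReadFile = readIfExists): InstallerData {
  for (const dir of repoDirs) {
    const text = read(posix.join(dir, "installer_data.json"));
    if (text !== undefined) return JSON.parse(text) as InstallerData;
  }
  return minimal;
}
```

Injecting `read` lets tests exercise the fallback order without touching the filesystem.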

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 03:20:12 +01:00
Michal
53265bb18c test: integration test for Asahi firstboot LVM setup
Some checks failed
CI/CD / lint (pull_request) Failing after 21s
CI/CD / typecheck (pull_request) Failing after 22s
CI/CD / test (pull_request) Failing after 22s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
VM-based end-to-end test using Fedora cloud image with two disks:
root (20GB) + data (200GB). Verifies the firstboot script creates
labvg with correct LV sizes, mounts volumes, migrates /home content,
sets hostname, creates admin user, and handles reprovision.

Fixes to firstboot script:
- Detect whole disks (not just partitions) for LVM PV
- Handle btrfs subvolume paths in root device detection
- Copy /home content before mounting LV (preserves SSH keys)
- Don't restart sshd (config takes effect on reboot)
- Make swapon and mount operations resilient to failures
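The two device-detection fixes can be sketched as small parsers. `findmnt -n -o SOURCE /` reports a btrfs root with its subvolume in brackets, for example `/dev/nvme0n1p6[/root]`. A second data disk may also be a whole disk rather than a partition. The function names are hypothetical, and the regexes only cover `nvme`, `sd`, `vd` and `xvd` naming. lsblk's `PKNAME` column is a more general way to find a parent disk.

```typescript
// Strip a trailing btrfs subvolume suffix such as "[/root]".
function rootDeviceFromFindmnt(source: string): string {
  return source.trim().replace(/\[[^\]]*\]$/, "");
}

// Map a partition to its parent disk; whole disks map to themselves.
function parentDisk(dev: string): string {
  const nvme = /^(\/dev\/nvme\d+n\d+)p\d+$/.exec(dev);
  if (nvme) return nvme[1];
  const scsiLike = /^(\/dev\/(?:sd|vd|xvd)[a-z]+)\d+$/.exec(dev);
  if (scsiLike) return scsiLike[1];
  return dev;
}
```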

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 03:07:38 +01:00
Michal
863c7f2b83 feat: Asahi Linux provisioning for Apple Silicon (Mac Studio)
Some checks failed
CI/CD / typecheck (pull_request) Failing after 11s
CI/CD / lint (pull_request) Failing after 22s
CI/CD / test (pull_request) Failing after 11s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped
Add bastion endpoints for provisioning Apple Silicon machines via the
Asahi Linux installer with custom LVM partitioning:

- GET /asahi — wrapper script (curl bastion:8080/asahi | sh)
- GET /asahi/installer_data.json — custom partition layout (60GB root + LVM data)
- GET /asahi/firstboot.sh — first-boot LVM setup matching kickstart layout
- GET /asahi/firstboot.service — systemd oneshot unit

The firstboot script creates labvg with role-specific LVs (var, varlog,
home, srv, rancher, longhorn) and handles reprovision by detecting
existing VGs. Includes 19 new tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 02:46:27 +01:00
906f93f6f2 Merge pull request 'fix: Cilium multi-node support' (#9) from fix/cilium-multi-node into main
Some checks failed
CI/CD / lint (push) Failing after 22s
CI/CD / typecheck (push) Failing after 21s
CI/CD / test (push) Failing after 22s
CI/CD / build (push) Has been skipped
CI/CD / publish-rpm (push) Has been skipped
CI/CD / publish-deb (push) Has been skipped
2026-03-31 00:36:17 +00:00
49 changed files with 2999 additions and 58 deletions

.gitignore (+4)

@@ -27,3 +27,7 @@ node_modules/
# Task files
# tasks.json
# tasks/
# Asahi build artifacts (large)
bastion/.asahi-cache/
bastion/asahi-repo/*.zip

CLAUDE.md (new file, +19)

@@ -0,0 +1,19 @@
## Skill routing
When the user's request matches an available skill, ALWAYS invoke it using the Skill
tool as your FIRST action. Do NOT answer directly, do NOT use other tools first.
The skill has specialized workflows that produce better results than ad-hoc answers.
Key routing rules:
- Product ideas, "is this worth building", brainstorming → invoke gstack-office-hours
- Bugs, errors, "why is this broken", 500 errors → invoke gstack-investigate
- Ship, deploy, push, create PR → invoke gstack-ship
- QA, test the site, find bugs → invoke gstack-qa
- Code review, check my diff → invoke gstack-review
- Update docs after shipping → invoke gstack-document-release
- Weekly retro → invoke gstack-retro
- Design system, brand → invoke gstack-design-consultation
- Visual audit, design polish → invoke gstack-design-review
- Architecture review → invoke gstack-plan-eng-review
- Save progress, checkpoint, resume → invoke gstack-checkpoint
- Code quality, health check → invoke gstack-health

TODOS.md (new file, +47)

@@ -0,0 +1,47 @@
# TODOS
## P1 — Ship with Phase 1
### v2.0 Architecture Document Update
Update `bastion/docs/ARCHITECTURE.md` to cover v2.0: driver model, fleet system,
Pulumi integration, Vault secrets, Deno evaluator, new CLI grammar. The existing
doc covers v1.0 comprehensively (432 lines). v2.0 adds 5+ major subsystems.
**Effort:** M (human: 1 week / CC: 1-2 days)
**Depends on:** Phase 1 complete
**Source:** CEO review 2026-04-01
## P2 — Post-v2.0 Core
### SSH Emergency Mode (scoped)
SSH-based operations limited to: (1) earliest necessary box provisioning before agent
is installed, and (2) emergency debugging/fixing operations that can't be done via agent.
NOT a general-purpose DeploymentTarget alternative. The v1.0 `recheck` and `fix-ssh-root.sh`
patterns are the model. Agent stays the primary management path.
**Effort:** S (human: 1 week / CC: 1 day)
**Depends on:** Phase 2 complete (DeploymentTarget interface exists)
**Source:** CEO review 2026-04-01
### Prometheus Metrics Endpoint
Add `/metrics` endpoint to labd: resource counts by status, apply duration histograms,
driver operation latency, fleet pipeline completion rates. Standard Prometheus scraping
for Grafana dashboards and alerting.
**Effort:** S (human: 2-3 days / CC: 2-3 hours)
**Depends on:** Phase 1 (labd exists with resource store)
**Source:** CEO review 2026-04-01 (observability gap)
## P3 — Future Enhancements
### Infrastructure Graph Visualization
Visual representation of resource dependencies, environment topology, fleet status.
Could be a web UI or terminal-based (like `kubectl tree`).
**Source:** CEO review 2026-04-01
### `labctl import` for Existing Cloud Resources
Discover and import existing AWS/GCP resources into the state store.
Pulumi's import functionality could be leveraged.
**Source:** CEO review 2026-04-01
### Built-in Secrets Rotation
Automatic rotation of managed secrets (database passwords, API keys).
Vault handles rotation but a labctl-native workflow could simplify.
**Source:** CEO review 2026-04-01
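For the Prometheus item, below is a sketch of the text exposition format for one of the listed series: resource counts by status. labd does not expose metrics yet, so the metric name `labd_resources` and the `status` label are hypothetical. The `# HELP`/`# TYPE` lines and the label-value escaping follow the Prometheus text format.

```typescript
// Render resource counts by status as a Prometheus gauge (hypothetical metric name).
function renderResourceCounts(counts: Record<string, number>): string {
  const name = "labd_resources";
  const lines = [
    `# HELP ${name} Number of managed resources by status.`,
    `# TYPE ${name} gauge`,
  ];
  for (const status of Object.keys(counts).sort()) {
    // Label values must escape backslash, double quote and newline.
    const v = status.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
    lines.push(`${name}{status="${v}"} ${counts[status]}`);
  }
  return lines.join("\n") + "\n";
}
```

The endpoint would serve this body with `Content-Type: text/plain; version=0.0.4`.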


@@ -0,0 +1,47 @@
{
  "os_list": [
    {
      "name": "Fedora Asahi Lab (infra)",
      "default_os_name": "Fedora Linux Lab",
      "boot_object": "m1n1.bin",
      "next_object": "m1n1/boot.bin",
      "package": "fedora-asahi-lab.zip",
      "supported_fw": [
        "12.3",
        "12.3.1",
        "13.5"
      ],
      "partitions": [
        {
          "name": "EFI",
          "type": "EFI",
          "size": "524288000B",
          "format": "fat",
          "volume_id": "0x804be8a6",
          "copy_firmware": true,
          "copy_installer_data": true,
          "source": "esp"
        },
        {
          "name": "Boot",
          "type": "Linux",
          "size": "1073741824B",
          "image": "boot.img"
        },
        {
          "name": "Root",
          "type": "Linux",
          "size": "4626296832B",
          "expand": false,
          "image": "root.img"
        },
        {
          "name": "Data",
          "type": "Linux",
          "size": "1073741824B",
          "expand": true
        }
      ]
    }
  ]
}
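The partition sizes in this layout are plain byte counts with a `B` suffix. The sketch below parses and sums them. The helper names are hypothetical, and only the `NNNB` form seen here is handled. The sum shows that the layout declares about 6.8 GiB in total: 500 MiB EFI, 1 GiB Boot, roughly 4.3 GiB Root, and a 1 GiB Data partition marked `expand`.

```typescript
interface Partition {
  name: string;
  size: string;     // e.g. "1073741824B"
  expand?: boolean;
}

const GiB = 1024 ** 3;

// Only the "<digits>B" form used in this layout is supported.
function parseBytes(size: string): number {
  const m = /^(\d+)B$/.exec(size);
  if (!m) throw new Error(`unsupported size format: ${size}`);
  return Number(m[1]);
}

function declaredTotal(parts: Partition[]): number {
  return parts.reduce((sum, p) => sum + parseBytes(p.size), 0);
}

// The four partitions from the layout above.
const layout: Partition[] = [
  { name: "EFI", size: "524288000B" },
  { name: "Boot", size: "1073741824B" },
  { name: "Root", size: "4626296832B", expand: false },
  { name: "Data", size: "1073741824B", expand: true },
];
```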

bastion/bastion/.gitignore (new file, +4)

@@ -0,0 +1,4 @@
# Asahi build artifacts (large)
.asahi-cache/
asahi-repo/*.zip


@@ -73,12 +73,18 @@ _labctl() {
"provision register")
COMPREPLY=($(compgen -W "--role --ip -h --help" -- "$cur"))
return ;;
"provision asahi")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
"provision logs")
COMPREPLY=($(compgen -W "-f --follow -h --help" -- "$cur"))
return ;;
"provision makeiso")
COMPREPLY=($(compgen -W "--arch --local --out -h --help" -- "$cur"))
return ;;
"provision recheck")
COMPREPLY=($(compgen -W "--user --target -h --help" -- "$cur"))
return ;;
"config list")
COMPREPLY=($(compgen -W "-h --help" -- "$cur"))
return ;;
@@ -104,7 +110,7 @@ _labctl() {
COMPREPLY=($(compgen -W "bastion -h --help" -- "$cur"))
return ;;
"provision")
COMPREPLY=($(compgen -W "list install reprovision debug forget register logs makeiso -h --help" -- "$cur")) COMPREPLY=($(compgen -W "list install reprovision debug forget register asahi logs makeiso recheck -h --help" -- "$cur"))
return ;;
"config")
COMPREPLY=($(compgen -W "list get set path -h --help" -- "$cur"))


@@ -125,8 +125,10 @@ complete -c labctl -n "__labctl_using_cmd provision" -a reprovision -d 'Queue in
complete -c labctl -n "__labctl_using_cmd provision" -a debug -d 'PXE boot into Fedora rescue mode for debugging (target: hostname, MAC, or IP)'
complete -c labctl -n "__labctl_using_cmd provision" -a forget -d 'Remove a machine from bastion state'
complete -c labctl -n "__labctl_using_cmd provision" -a register -d 'Register an already-installed machine (e.g. after state loss)'
complete -c labctl -n "__labctl_using_cmd provision" -a asahi -d 'Show instructions to provision an Apple Silicon Mac with Asahi Linux'
complete -c labctl -n "__labctl_using_cmd provision" -a logs -d 'Show provisioning logs for a machine (hostname, MAC, or IP)'
complete -c labctl -n "__labctl_using_cmd provision" -a makeiso -d 'Generate a UEFI-bootable iPXE ISO for network provisioning'
complete -c labctl -n "__labctl_using_cmd provision" -a recheck -d 'Refresh hardware info for all installed machines via SSH'
# provision install options
complete -c labctl -n "__labctl_in_cmd provision install" -l role -d 'Machine role (see below)' -xa 'vanilla worker infra labcontroller'
@@ -153,6 +155,10 @@ complete -c labctl -n "__labctl_in_cmd provision makeiso" -l arch -d 'Target arc
complete -c labctl -n "__labctl_in_cmd provision makeiso" -l local -d 'Build ISO locally instead of using bastion-hosted URL'
complete -c labctl -n "__labctl_in_cmd provision makeiso" -l out -d 'Output path for local ISO build' -x
# provision recheck options
complete -c labctl -n "__labctl_in_cmd provision recheck" -l user -d 'SSH user' -x
complete -c labctl -n "__labctl_in_cmd provision recheck" -l target -d 'Only recheck a specific machine (by hostname or MAC)' -x
# config subcommands
complete -c labctl -n "__labctl_using_cmd config" -a list -d 'Show all configuration values'
complete -c labctl -n "__labctl_using_cmd config" -a get -d 'Get a configuration value'


@@ -22,7 +22,11 @@
"test:integration:iso": "vitest run -c tests/integration/vitest.config.ts -t 'ISO boot'",
"test:integration:iso:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'ISO boot'",
"test:integration:arm-iso": "vitest run -c tests/integration/vitest.config.ts -t 'ARM ISO'",
"test:integration:arm-iso:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'ARM ISO'" "test:integration:arm-iso:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'ARM ISO'",
"test:integration:asahi": "vitest run -c tests/integration/vitest.config.ts -t 'asahi firstboot'",
"test:integration:asahi:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'asahi firstboot'",
"test:integration:asahi-validate": "vitest run -c tests/integration/vitest.config.ts -t 'asahi.*validation'",
"test:integration:asahi-validate:host": "sudo -E $(which npx) vitest run -c tests/integration/vitest.config.ts -t 'asahi.*validation'"
},
"engines": {
"node": ">=20.0.0",


@@ -0,0 +1,302 @@
#!/bin/bash
# Build a custom Fedora Asahi Remix rootfs with lab firstboot LVM setup.
#
# Downloads the upstream Fedora Asahi Remix Server package, injects our
# firstboot script + systemd service, and repackages it for the bastion.
#
# Requirements: root, curl, unzip, mount (loop), zip
# Output: bastion/asahi-repo/ directory with package + installer_data.json
#
# Usage: sudo ./scripts/build-asahi-rootfs.sh [--bastion-ip IP] [--http-port PORT]
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
ASAHI_DIR="$PROJECT_DIR/asahi-repo"
CACHE_DIR="$PROJECT_DIR/.asahi-cache"
WORK_DIR=""
# Defaults
BASTION_IP="${BASTION_IP:-192.168.8.23}"
HTTP_PORT="${HTTP_PORT:-8080}"
ROLE="${ROLE:-infra}"
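# bash always provides $HOSTNAME (the build host's name), so a ${HOSTNAME:-...} default never applies; read LAB_HOSTNAME instead.
HOSTNAME="${LAB_HOSTNAME:-mac-studio}"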
MAC="${MAC:-00:00:00:00:00:00}"
ADMIN_USER="${ADMIN_USER:-michal}"
# Parse args
while [[ $# -gt 0 ]]; do
    case "$1" in
        --bastion-ip) BASTION_IP="$2"; shift 2 ;;
        --http-port) HTTP_PORT="$2"; shift 2 ;;
        --role) ROLE="$2"; shift 2 ;;
        --hostname) HOSTNAME="$2"; shift 2 ;;
        --mac) MAC="$2"; shift 2 ;;
        --admin-user) ADMIN_USER="$2"; shift 2 ;;
        *) echo "Unknown option: $1"; exit 1 ;;
    esac
done
# ── Resolve upstream package URL ─────────────────────────────────
echo "==> Fetching Asahi installer data..."
INSTALLER_DATA=$(curl -sfL "https://cdn.asahilinux.org/installer/installer_data.json")
# Find the Server variant package URL
SERVER_URL=$(echo "$INSTALLER_DATA" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for os in data.get('os_list', []):
    name = os.get('name', '').lower()
    if 'server' in name and 'uefi' not in name and not os.get('expert'):
        print(os['package'])
        break
" 2>/dev/null)
if [ -z "$SERVER_URL" ]; then
    echo "ERROR: Could not find Fedora Asahi Remix Server in installer data."
    echo "Available variants:"
    echo "$INSTALLER_DATA" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for os in data.get('os_list', []):
    print(f\" - {os.get('name', '?')}\")" 2>/dev/null
    exit 1
fi
PACKAGE_NAME=$(basename "$SERVER_URL")
echo " Variant: Fedora Asahi Remix Server"
echo " Package: $PACKAGE_NAME"
# Also extract the partition layout and supported_fw from upstream
UPSTREAM_CONFIG=$(echo "$INSTALLER_DATA" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for os in data.get('os_list', []):
    name = os.get('name', '').lower()
    if 'server' in name and 'uefi' not in name and not os.get('expert'):
        json.dump(os, sys.stdout)
        break
")
# ── Download upstream package ────────────────────────────────────
mkdir -p "$CACHE_DIR" "$ASAHI_DIR"
CACHED_PKG="$CACHE_DIR/$PACKAGE_NAME"
if [ -f "$CACHED_PKG" ]; then
    echo "==> Using cached package: $CACHED_PKG"
else
    echo "==> Downloading $SERVER_URL..."
    curl -# -L -o "$CACHED_PKG" "$SERVER_URL"
fi
# ── Extract and modify rootfs ────────────────────────────────────
WORK_DIR=$(mktemp -d)
trap 'echo "==> Cleaning up..."; umount "$WORK_DIR/rootfs" 2>/dev/null || true; rm -rf "$WORK_DIR"' EXIT
echo "==> Extracting package..."
unzip -q -o "$CACHED_PKG" -d "$WORK_DIR/pkg"
# List contents
echo " Package contents:"
ls -lh "$WORK_DIR/pkg/" | grep -v ^total | while read -r line; do echo " $line"; done
# Find root.img
ROOT_IMG=$(find "$WORK_DIR/pkg" -name "root.img" -type f | head -1)
if [ -z "$ROOT_IMG" ]; then
    echo "ERROR: root.img not found in package."
    echo "Contents: $(ls "$WORK_DIR/pkg/")"
    exit 1
fi
echo "==> Mounting root.img..."
mkdir -p "$WORK_DIR/rootfs"
mount -o loop "$ROOT_IMG" "$WORK_DIR/rootfs"
# ── Read SSH keys from the system ────────────────────────────────
SSH_KEYS=""
REAL_USER="${SUDO_USER:-$USER}"
REAL_HOME=$(eval echo "~$REAL_USER")
for keyfile in "$REAL_HOME/.ssh/id_ed25519.pub" "$REAL_HOME/.ssh/id_ecdsa.pub" "$REAL_HOME/.ssh/id_rsa.pub"; do
    if [ -f "$keyfile" ]; then
        SSH_KEYS=$(cat "$keyfile")
        echo " SSH key: $keyfile"
        break
    fi
done
if [ -z "$SSH_KEYS" ]; then
    echo "WARNING: No SSH public key found. You'll need to add keys manually."
fi
# ── Generate firstboot script from bastion ───────────────────────
echo "==> Generating firstboot script..."
# Try to get the script from a running bastion, fall back to local generation
FIRSTBOOT_SCRIPT=""
FIRSTBOOT_URL="http://$BASTION_IP:$HTTP_PORT/asahi/firstboot.sh?hostname=$HOSTNAME&role=$ROLE&mac=$MAC&user=$ADMIN_USER"
FIRSTBOOT_SCRIPT=$(curl -sf "$FIRSTBOOT_URL" 2>/dev/null || echo "")
if [ -z "$FIRSTBOOT_SCRIPT" ]; then
echo " Bastion not reachable, generating script locally..."
# Generate a basic firstboot script inline
FIRSTBOOT_SCRIPT=$(cd "$PROJECT_DIR" && node -e "
const { renderFirstbootScript } = require('./src/bastion/dist/templates/asahi-firstboot.sh.js');
process.stdout.write(renderFirstbootScript({
hostname: '$HOSTNAME',
role: '$ROLE',
serverIp: '$BASTION_IP',
httpPort: $HTTP_PORT,
sshKeys: $([ -n "$SSH_KEYS" ] && echo "[\"$SSH_KEYS\"]" || echo "[]"),
adminUser: '$ADMIN_USER',
mac: '$MAC',
}));
" 2>/dev/null) || {
echo " ERROR: Could not generate firstboot script. Build the project first: npm run build"
exit 1
}
fi
# ── Inject files into rootfs ─────────────────────────────────────
echo "==> Injecting lab configuration into rootfs..."
# Firstboot script
mkdir -p "$WORK_DIR/rootfs/usr/local/bin"
echo "$FIRSTBOOT_SCRIPT" > "$WORK_DIR/rootfs/usr/local/bin/lab-firstboot.sh"
chmod 755 "$WORK_DIR/rootfs/usr/local/bin/lab-firstboot.sh"
echo " Installed: /usr/local/bin/lab-firstboot.sh"
# Systemd service
mkdir -p "$WORK_DIR/rootfs/etc/systemd/system"
cat > "$WORK_DIR/rootfs/etc/systemd/system/lab-firstboot.service" << 'UNIT'
[Unit]
Description=Lab first-boot LVM setup
After=local-fs.target network-online.target
Wants=network-online.target
ConditionPathExists=!/etc/lab-lvm-setup-done
[Service]
Type=oneshot
ExecStart=/usr/local/bin/lab-firstboot.sh
RemainAfterExit=yes
StandardOutput=journal+console
StandardError=journal+console
[Install]
WantedBy=multi-user.target
UNIT
echo " Installed: /etc/systemd/system/lab-firstboot.service"
# Enable the service
mkdir -p "$WORK_DIR/rootfs/etc/systemd/system/multi-user.target.wants"
ln -sf /etc/systemd/system/lab-firstboot.service \
"$WORK_DIR/rootfs/etc/systemd/system/multi-user.target.wants/lab-firstboot.service"
echo " Enabled: lab-firstboot.service"
# SSH authorized keys for root (for initial access before firstboot runs user creation)
if [ -n "$SSH_KEYS" ]; then
mkdir -p "$WORK_DIR/rootfs/root/.ssh"
chmod 700 "$WORK_DIR/rootfs/root/.ssh"
echo "$SSH_KEYS" > "$WORK_DIR/rootfs/root/.ssh/authorized_keys"
chmod 600 "$WORK_DIR/rootfs/root/.ssh/authorized_keys"
echo " Installed: /root/.ssh/authorized_keys"
fi
# Check that lvm2 and xfsprogs are present (they should already be in the Server image)
echo " Checking required packages..."
if [ -f "$WORK_DIR/rootfs/usr/sbin/pvcreate" ] || [ -f "$WORK_DIR/rootfs/usr/bin/pvcreate" ]; then
echo " lvm2: present"
else
echo " WARNING: lvm2 not found in rootfs. LVM setup may fail."
fi
if [ -f "$WORK_DIR/rootfs/usr/sbin/mkfs.xfs" ] || [ -f "$WORK_DIR/rootfs/usr/bin/mkfs.xfs" ]; then
echo " xfsprogs: present"
else
echo " WARNING: xfsprogs not found in rootfs. LVM setup may fail."
fi
# ── Unmount and repackage ────────────────────────────────────────
echo "==> Unmounting rootfs..."
umount "$WORK_DIR/rootfs"
echo "==> Repackaging..."
OUTPUT_PKG="$ASAHI_DIR/fedora-asahi-lab.zip"
rm -f "$OUTPUT_PKG"
(cd "$WORK_DIR/pkg" && zip -q "$OUTPUT_PKG" *)
echo " Output: $OUTPUT_PKG ($(du -sh "$OUTPUT_PKG" | cut -f1))"
# ── Generate installer_data.json ─────────────────────────────────
echo "==> Generating installer_data.json..."
# Parse upstream config to get supported_fw, boot_object, next_object, and partition details
python3 << PYEOF > "$ASAHI_DIR/installer_data.json"
import json, sys
upstream = json.loads('''$UPSTREAM_CONFIG''')
# Build our custom installer data based on upstream
# Keep EFI and Boot partitions identical, modify Root to not expand,
# add Data partition that expands for LVM.
partitions = []
for p in upstream.get('partitions', []):
    if p.get('type') == 'EFI':
        partitions.append(p)
    elif p.get('name') == 'Boot':
        partitions.append(p)
    elif p.get('name') == 'Root':
        # Fixed size root, no expand
        root_p = dict(p)
        root_p['expand'] = False
        # Keep the original size (it's the minimum needed for the rootfs)
        partitions.append(root_p)
# Add Data partition for LVM
partitions.append({
    "name": "Data",
    "type": "Linux",
    "size": "1073741824B",  # 1GB minimum, will expand
    "expand": True
})
data = {
"os_list": [{
"name": "Fedora Asahi Lab (${ROLE})",
"default_os_name": "Fedora Linux Lab",
"boot_object": upstream.get("boot_object", "m1n1.bin"),
"next_object": upstream.get("next_object", "m1n1/boot.bin"),
"package": "fedora-asahi-lab.zip",
"supported_fw": upstream.get("supported_fw", ["13.5"]),
"partitions": partitions,
}]
}
json.dump(data, sys.stdout, indent=2)
print()
PYEOF
echo " Generated: $ASAHI_DIR/installer_data.json"
# Pretty-print the partition layout
echo ""
echo " Partition layout:"
python3 -c "
import json
with open('$ASAHI_DIR/installer_data.json') as f:
    data = json.load(f)
for p in data['os_list'][0]['partitions']:
    size = p.get('size', '?')
    expand = ' (expand)' if p.get('expand') else ''
    image = f\" [{p['image']}]\" if 'image' in p else ''
    print(f\" {p['name']:8s} {p['type']:8s} {size:>16s}{expand}{image}\")
"
echo ""
echo "==> Build complete!"
echo ""
echo " Package: $ASAHI_DIR/fedora-asahi-lab.zip"
echo " Config: $ASAHI_DIR/installer_data.json"
echo ""
echo " To serve from bastion, copy to the bastion's HTTP directory"
echo " or configure REPO_BASE to point here."
echo ""
echo " To install on Mac Studio:"
echo " curl http://$BASTION_IP:$HTTP_PORT/asahi | sh"
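
The installer_data.json step keeps the upstream EFI and Boot partitions, pins Root to its upstream size, and appends a growable Data partition for LVM. A minimal TypeScript sketch of that transform, assuming a simplified `Partition` shape (the field names mirror the JSON in this diff; the type, the function name, and the sample sizes are hypothetical):

```typescript
// Hypothetical, simplified view of an Asahi installer_data.json partition entry.
interface Partition {
  name: string;
  type: string;
  size: string; // byte count with a trailing "B", e.g. "524288000B"
  expand?: boolean;
  image?: string;
}

// Mirrors the Python heredoc: keep EFI and Boot as-is, stop Root from
// expanding, and append a 1 GiB Data partition that grows into the free space.
function labPartitions(upstream: Partition[]): Partition[] {
  const out: Partition[] = [];
  for (const p of upstream) {
    if (p.type === "EFI" || p.name === "Boot") {
      out.push(p);
    } else if (p.name === "Root") {
      out.push({ ...p, expand: false });
    }
    // Any other upstream partition is dropped, as in the shell script.
  }
  out.push({ name: "Data", type: "Linux", size: `${1024 ** 3}B`, expand: true });
  return out;
}

// Illustrative upstream layout (sizes are made up).
const layout = labPartitions([
  { name: "EFI", type: "EFI", size: "524288000B" },
  { name: "Boot", type: "Linux", size: "1073741824B", image: "boot.img" },
  { name: "Root", type: "Linux", size: "5368709120B", image: "root.img", expand: true },
]);
console.log(layout.map((p) => `${p.name}${p.expand ? " (expand)" : ""}`).join(", "));
// EFI, Boot, Root, Data (expand)
```

Only the Data partition is marked `expand`, so the installer gives all remaining disk space to the LVM volume group rather than to the root filesystem.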

@@ -99,16 +99,22 @@ if [ "$PUSH" = true ]; then
 fi
 fi
+# Use --tls-verify=false for plain HTTP registries (e.g. 10.0.0.194:3012)
+TLS_FLAG=""
+if [[ "$REGISTRY" =~ ^[0-9] ]] || [[ "$REGISTRY" =~ ^localhost ]]; then
+TLS_FLAG="--tls-verify=false"
+fi
 echo "==> Logging in to $REGISTRY..."
-podman login -u michal -p "$GITEA_TOKEN" "$REGISTRY"
+podman login $TLS_FLAG -u michal -p "$GITEA_TOKEN" "$REGISTRY"
 echo "==> Pushing $FULL_IMAGE:$TAG..."
-podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
+podman manifest push --all $TLS_FLAG "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
 # Also tag as :latest if not already
 if [ "$TAG" != "latest" ]; then
 echo "==> Also pushing as :latest..."
-podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:latest"
+podman manifest push --all $TLS_FLAG "$MANIFEST" "docker://$FULL_IMAGE:latest"
 fi
 # Link package to repository if script exists
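
The push scripts skip TLS verification when the registry name starts with a digit or with `localhost`, on the assumption that such registries speak plain HTTP. The same heuristic as a TypeScript predicate (the helper name is hypothetical), returning an argument array so no word-splitting of an unquoted variable is involved:

```typescript
// Hypothetical helper: same test as the bash
//   [[ "$REGISTRY" =~ ^[0-9] ]] || [[ "$REGISTRY" =~ ^localhost ]]
// A matching registry is treated as plain HTTP, so podman gets --tls-verify=false.
function tlsFlagFor(registry: string): string[] {
  return /^[0-9]/.test(registry) || /^localhost/.test(registry) ? ["--tls-verify=false"] : [];
}

// Build the argument list the way the script does (flag before positional args).
const registry = "10.0.0.194:3012";
console.log(["podman", "login", ...tlsFlagFor(registry), registry].join(" "));
// podman login --tls-verify=false 10.0.0.194:3012
```

The heuristic is the script's, not a general rule: a DNS name that happens to start with a digit would also lose TLS verification.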

@@ -92,15 +92,21 @@ if [ "$PUSH" = true ]; then
 fi
 fi
+# Use --tls-verify=false for plain HTTP registries (e.g. 10.0.0.194:3012)
+TLS_FLAG=""
+if [[ "$REGISTRY" =~ ^[0-9] ]] || [[ "$REGISTRY" =~ ^localhost ]]; then
+TLS_FLAG="--tls-verify=false"
+fi
 echo "==> Logging in to $REGISTRY..."
-podman login -u michal -p "$GITEA_TOKEN" "$REGISTRY"
+podman login $TLS_FLAG -u michal -p "$GITEA_TOKEN" "$REGISTRY"
 echo "==> Pushing $FULL_IMAGE:$TAG..."
-podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
+podman manifest push --all $TLS_FLAG "$MANIFEST" "docker://$FULL_IMAGE:$TAG"
 if [ "$TAG" != "latest" ]; then
 echo "==> Also pushing as :latest..."
-podman manifest push --all "$MANIFEST" "docker://$FULL_IMAGE:latest"
+podman manifest push --all $TLS_FLAG "$MANIFEST" "docker://$FULL_IMAGE:latest"
 fi
 if [ -f "$SCRIPT_DIR/link-package.sh" ]; then

@@ -24,6 +24,21 @@ deploy_bastion() {
 kubectl rollout restart deployment/bastion -n lab-infra
 kubectl rollout status deployment/bastion -n lab-infra --timeout=180s
 echo "✓ Bastion deployed"
+# Sync Asahi rootfs package to bastion pod's persistent volume
+if [ -d "$PROJECT_DIR/asahi-repo" ] && [ -f "$PROJECT_DIR/asahi-repo/fedora-asahi-lab.zip" ]; then
+echo ""
+echo "=== Syncing Asahi rootfs to bastion pod ==="
+BASTION_POD=$(kubectl get pods -n lab-infra -l app=bastion -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
+if [ -n "$BASTION_POD" ]; then
+kubectl exec -n lab-infra "$BASTION_POD" -- mkdir -p /data/asahi-repo
+kubectl cp "$PROJECT_DIR/asahi-repo/installer_data.json" "lab-infra/$BASTION_POD:/data/asahi-repo/installer_data.json"
+kubectl cp "$PROJECT_DIR/asahi-repo/fedora-asahi-lab.zip" "lab-infra/$BASTION_POD:/data/asahi-repo/fedora-asahi-lab.zip"
+echo "✓ Asahi rootfs synced ($(du -sh "$PROJECT_DIR/asahi-repo/fedora-asahi-lab.zip" | cut -f1))"
+else
+echo "WARNING: Could not find bastion pod — Asahi rootfs not synced"
+fi
+fi
 }
 deploy_labd() {

@@ -0,0 +1,131 @@
#!/bin/bash
# Fix root SSH access on all provisioned machines.
# Tries root, lab, michal users to find one that works,
# then ensures root has the SSH key and PermitRootLogin is enabled.
set -euo pipefail
SSH_KEY="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDMJ3FkUGbG174eoO5RjZd2eNV680FM5pgp0AgpW/QwlJExK3qxMk0DJSr4ICmzGUx4yujAXcrqU1otcOMPzzFzwc5heWpSmlNHU3TIW6NHEt0sF9ZTAbGLw2zSw3si5UouqFkCcENA40mePFJqY+Q9R8N1uvLgu4m/do+Zrn/mk5Ewc1V7OCRE5Acrnaec4T7LTB0BuVXcjPUfAmZ0q5fI+bKPR1q2Kc3+IeGhVkBuZ9OJVeXXhnpedm0uEbLeriK/jUYKYw/1QhsNDM8Tyty+UIGr9QVnWwzCMHB+wuQcDYC9mPGTqg0fYwX8Mp8xMi1PPxdsh1G7bj/cpWMAF43KswWORF2ul8ICGbaE1zEgIYXO790SuBjpBHhaC6Iegqi58hmCuP+a9893q/EU9HyrWTJHCZXC5E4kP1MsM57KrhEpszM6I3sW9f9zMTPd5QsCXFi4si4OMwX4kYNVu3fQGQPpseDPlTTSrT6uUdqj4Irm0c1m9cYTmK0vYgsM3ss= michal@fedora"
SSH_OPTS="-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR -o ConnectTimeout=5"
USERS_TO_TRY=(root lab michal)
# Machines: hostname ip
MACHINES=(
"labmaster 192.168.8.11"
"worker0-k8s0 192.168.8.23"
"worker1-k8s0 192.168.8.13"
"worker2-k8s0 192.168.8.25"
"spark-2935 192.168.8.12"
)
BOLD="\033[1m"
GREEN="\033[0;32m"
RED="\033[0;31m"
DIM="\033[2m"
RESET="\033[0m"
# Script to run on each machine (via sudo if needed)
read -r -d '' FIX_SCRIPT << 'FIXEOF' || true
#!/bin/bash
set -e
KEY="$1"
# 1. Ensure root .ssh dir exists
mkdir -p /root/.ssh
chmod 700 /root/.ssh
touch /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
# 2. Add key if not present
if ! grep -qF "$KEY" /root/.ssh/authorized_keys 2>/dev/null; then
echo "$KEY" >> /root/.ssh/authorized_keys
echo "KEY_ADDED"
else
echo "KEY_EXISTS"
fi
# 3. Fix sshd_config for root login with keys
SSHD_CONF="/etc/ssh/sshd_config"
CHANGED=0
# Ensure PermitRootLogin allows key auth
CURRENT=$(grep -E "^PermitRootLogin" "$SSHD_CONF" 2>/dev/null | tail -1 || true)
if [ "$CURRENT" = "PermitRootLogin prohibit-password" ] || [ "$CURRENT" = "PermitRootLogin without-password" ]; then
echo "SSHD_OK"
elif [ "$CURRENT" = "PermitRootLogin yes" ]; then
echo "SSHD_OK"
else
# Remove any existing PermitRootLogin lines
sed -i '/^#*PermitRootLogin/d' "$SSHD_CONF"
echo "PermitRootLogin prohibit-password" >> "$SSHD_CONF"
CHANGED=1
echo "SSHD_FIXED"
fi
# Ensure PubkeyAuthentication is enabled
if grep -qE "^PubkeyAuthentication no" "$SSHD_CONF" 2>/dev/null; then
sed -i 's/^PubkeyAuthentication no/PubkeyAuthentication yes/' "$SSHD_CONF"
CHANGED=1
echo "PUBKEY_FIXED"
else
echo "PUBKEY_OK"
fi
# Restart sshd if changed
if [ "$CHANGED" -eq 1 ]; then
systemctl restart sshd 2>/dev/null || systemctl restart ssh 2>/dev/null || true
echo "SSHD_RESTARTED"
fi
# 4. Signal completion (the caller verifies root login afterwards)
echo "DONE"
FIXEOF
echo ""
echo -e "${BOLD}Fixing root SSH access on all machines...${RESET}"
echo ""
for entry in "${MACHINES[@]}"; do
read -r hostname ip <<< "$entry"
printf " %-24s ${DIM}(%s)${RESET} " "$hostname" "$ip"
# Try each user until one works
WORKING_USER=""
for user in "${USERS_TO_TRY[@]}"; do
if ssh $SSH_OPTS "$user@$ip" "true" 2>/dev/null; then
WORKING_USER="$user"
break
fi
done
if [ -z "$WORKING_USER" ]; then
echo -e "${RED}UNREACHABLE${RESET} (tried: ${USERS_TO_TRY[*]})"
continue
fi
# Run fix script (with sudo if not root)
if [ "$WORKING_USER" = "root" ]; then
RESULT=$(ssh $SSH_OPTS "root@$ip" "bash -s -- '$SSH_KEY'" <<< "$FIX_SCRIPT" 2>&1)
else
RESULT=$(ssh $SSH_OPTS "$WORKING_USER@$ip" "sudo bash -s -- '$SSH_KEY'" <<< "$FIX_SCRIPT" 2>&1)
fi
# Parse result
DETAILS=""
if echo "$RESULT" | grep -q "KEY_ADDED"; then DETAILS="key added"; fi
if echo "$RESULT" | grep -q "KEY_EXISTS"; then DETAILS="key ok"; fi
if echo "$RESULT" | grep -q "SSHD_FIXED"; then DETAILS="$DETAILS, sshd fixed"; fi
if echo "$RESULT" | grep -q "SSHD_OK"; then DETAILS="$DETAILS, sshd ok"; fi
if echo "$RESULT" | grep -q "SSHD_RESTARTED"; then DETAILS="$DETAILS, restarted"; fi
# Verify root works now
if ssh $SSH_OPTS "root@$ip" "true" 2>/dev/null; then
echo -e "${GREEN}OK${RESET} ${DIM}(via $WORKING_USER: $DETAILS)${RESET}"
else
echo -e "${RED}PARTIAL${RESET} ${DIM}(via $WORKING_USER: $DETAILS -- root still blocked)${RESET}"
fi
done
echo ""
echo -e "${BOLD}Done.${RESET} Verify: labctl provision recheck --user root"
echo ""
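
The remote fix script leaves `PermitRootLogin` alone when the last uncommented directive already allows key-based root login, and otherwise deletes every `PermitRootLogin` line and appends `prohibit-password`. A pure TypeScript sketch of that decision over the config text, useful for unit-testing the logic without SSH (the function name and return shape are hypothetical; like the script, it ignores `sshd_config.d` drop-ins and `Match` blocks):

```typescript
type RootLoginFix = { status: "SSHD_OK" | "SSHD_FIXED"; config: string };

// Values the script treats as "root can already log in with a key".
const ACCEPTED = new Set([
  "PermitRootLogin prohibit-password",
  "PermitRootLogin without-password",
  "PermitRootLogin yes",
]);

// Hypothetical helper: step 3 of FIX_SCRIPT applied to an in-memory sshd_config.
function fixPermitRootLogin(config: string): RootLoginFix {
  const lines = config.split("\n");
  // Like `grep -E "^PermitRootLogin" | tail -1`: look at the last uncommented directive.
  const current = lines.filter((l) => /^PermitRootLogin/.test(l)).pop();
  if (current !== undefined && ACCEPTED.has(current)) {
    return { status: "SSHD_OK", config };
  }
  // Like `sed -i '/^#*PermitRootLogin/d'`: drop commented and uncommented directives,
  // then append the key-only setting on its own line.
  const body = lines.filter((l) => !/^#*PermitRootLogin/.test(l)).join("\n");
  const sep = body === "" || body.endsWith("\n") ? "" : "\n";
  return { status: "SSHD_FIXED", config: `${body}${sep}PermitRootLogin prohibit-password\n` };
}

const result = fixPermitRootLogin("Port 22\nPermitRootLogin no\n");
console.log(result.status); // SSHD_FIXED
```

One caveat carried over from the script: sshd uses the first value it reads for each keyword, so checking the last line can disagree with what sshd actually applies when a config contains duplicate directives.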

@@ -309,6 +309,32 @@ export async function startBastion(overrides: Partial<BastionConfig> = {}): Prom
 return { status: "ok", data: { mac, hostname: msg.hostname } };
 });
+labdConn.onCommand("command-discover", async (msg) => {
+if (msg.type !== "command-discover") throw new Error("unexpected");
+const mac = (msg.mac as string).toLowerCase();
+const now = new Date().toISOString();
+const existing = state.load().discovered[mac];
+state.update((s) => {
+s.discovered[mac] = {
+mac,
+product: (msg.product as string) ?? "unknown",
+board: (msg.board as string) ?? "unknown",
+serial: (msg.serial as string) ?? "unknown",
+manufacturer: (msg.manufacturer as string) ?? "unknown",
+cpu_model: (msg.cpu_model as string) ?? "unknown",
+cpu_cores: (msg.cpu_cores as number) ?? 0,
+memory_gb: (msg.memory_gb as number) ?? 0,
+arch: (msg.arch as string) ?? "unknown",
+disks: (msg.disks as Array<{ name: string; size_gb: number; model: string }>) ?? [],
+nics: (msg.nics as Array<{ name: string; mac: string; state: string }>) ?? [],
+first_seen: existing?.first_seen ?? now,
+last_seen: now,
+};
+});
+logger.info(`HARDWARE UPDATED: ${mac} -- ${msg.manufacturer ?? "?"} ${msg.product ?? "?"} (${msg.cpu_model ?? "?"}, ${msg.cpu_cores ?? "?"} cores, ${msg.memory_gb ?? "?"}GB RAM)`);
+return { status: "ok", data: { mac } };
+});
 labdConn.onCommand("command-role-update", async (msg) => {
 if (msg.type !== "command-role-update") throw new Error("unexpected");
 const mac = msg.mac.toLowerCase();
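
The `command-discover` handler overwrites the hardware fields on every report but carries `first_seen` over from any earlier record for the same MAC. A reduced sketch of that upsert, using a plain object in place of the bastion's `state.update()` (the `DiscoveredRecord` type is a hypothetical trim of the real record to a few fields):

```typescript
// Hypothetical, trimmed discovery record.
interface DiscoveredRecord {
  mac: string;
  product: string;
  cpu_cores: number;
  first_seen: string;
  last_seen: string;
}

type Discovered = Record<string, DiscoveredRecord>;

// Replace the record for the (lower-cased) MAC, keeping first_seen from any earlier sighting.
function upsertDiscovered(
  discovered: Discovered,
  report: { mac: string; product?: string; cpu_cores?: number },
  now: string = new Date().toISOString(),
): void {
  const mac = report.mac.toLowerCase();
  const existing = discovered[mac];
  discovered[mac] = {
    mac,
    product: report.product ?? "unknown",
    cpu_cores: report.cpu_cores ?? 0,
    first_seen: existing?.first_seen ?? now,
    last_seen: now,
  };
}

const discovered: Discovered = {};
upsertDiscovered(discovered, { mac: "AA:BB:CC:DD:EE:FF", product: "example-board" }, "2026-05-01T00:00:00Z");
upsertDiscovered(discovered, { mac: "aa:bb:cc:dd:ee:ff", cpu_cores: 8 }, "2026-05-05T00:00:00Z");
console.log(discovered["aa:bb:cc:dd:ee:ff"]);
```

As in the handler, a later report that omits a field resets it to its default rather than keeping the earlier value; only `first_seen` survives.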

@@ -139,16 +139,26 @@ export function registerApiRoutes(
 ? detailStr.replace("ready at ", "").trim()
 : "";
+const hw = s.discovered[mac];
 const installedInfo: InstalledInfo = {
 hostname: cfg?.hostname ?? "?",
 role: cfg?.role ?? "?",
 ...(cfg?.os !== undefined ? { os: cfg.os } : {}),
 ip,
 installed_at: new Date().toISOString(),
+// Preserve hardware info from discovery
+...(hw ? {
+product: hw.product,
+manufacturer: hw.manufacturer,
+cpu_model: hw.cpu_model,
+cpu_cores: hw.cpu_cores,
+memory_gb: hw.memory_gb,
+arch: hw.arch,
+} : {}),
 };
 s.installed[mac] = installedInfo;
-const admin = installedInfo.role !== "vanilla" && installedInfo.role !== "" ? "michal" : "root";
+const admin = installedInfo.role !== "vanilla" && installedInfo.role !== "" ? "lab" : "root";
 console.log(`\n \x1b[0;32m\x1b[1m ssh ${admin}@${ip}\x1b[0m\n`); // eslint-disable-line no-console
 // Auto-install k3s for non-vanilla roles
@@ -359,6 +369,23 @@ export function registerApiRoutes(
 });
 });
+// Simple machine state query (used by ks-auto for ISO boot dispatch)
+app.get<{
+Params: { mac: string };
+}>("/api/machine-state/:mac", async (request, reply) => {
+const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
+const currentState = state.load();
+if (currentState.debug[mac]) return reply.send("debug");
+if (currentState.install_queue[mac]) {
+const progress = currentState.install_queue[mac].progress;
+return reply.send(progress ? "installing" : "queued");
+}
+if (currentState.installed[mac]) return reply.send("installed");
+if (currentState.discovered[mac]) return reply.send("discovered");
+return reply.send("unknown");
+});
 // Update a machine's role (e.g. promote infra -> labcontroller)
 app.post<{
 Body: {
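
`/api/machine-state/:mac` answers with a single word, checked in a fixed order: debug, then the install queue (`installing` once progress has been reported, `queued` before that), then installed, then discovered. The `%pre` section of `/ks-auto` branches on this word. A sketch of the same precedence as a pure function over a trimmed-down state shape (the `MachineTables` type and `machineState` name are hypothetical):

```typescript
// Hypothetical subset of the bastion state used by the route.
interface MachineTables {
  debug: Record<string, unknown>;
  install_queue: Record<string, { progress?: unknown }>;
  installed: Record<string, unknown>;
  discovered: Record<string, unknown>;
}

type MachineState = "debug" | "installing" | "queued" | "installed" | "discovered" | "unknown";

// Accepts "AA-BB-..." or "aa:bb:..." and applies the route's checks in the same order.
function machineState(tables: MachineTables, rawMac: string): MachineState {
  const mac = rawMac.toLowerCase().replace(/-/g, ":");
  if (tables.debug[mac]) return "debug";
  const queued = tables.install_queue[mac];
  if (queued) return queued.progress ? "installing" : "queued";
  if (tables.installed[mac]) return "installed";
  if (tables.discovered[mac]) return "discovered";
  return "unknown";
}

const tables: MachineTables = {
  debug: {},
  install_queue: { "aa:bb:cc:dd:ee:ff": {} },
  installed: {},
  discovered: { "aa:bb:cc:dd:ee:ff": { product: "example" } },
};
console.log(machineState(tables, "AA-BB-CC-DD-EE-FF")); // queued
```

Because the queue is checked before `discovered`, a machine that has been discovered and then queued reports `queued`, which is what sends `/ks-auto` down the install branch on the next boot.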

@@ -0,0 +1,176 @@
// Routes for Asahi Linux provisioning.
// GET /asahi — wrapper script (curl bastion:8080/asahi | sh)
// GET /asahi/installer_data.json — custom installer config (built or fallback)
// GET /asahi/repo/* — serves built rootfs package (fedora-asahi-lab.zip)
// GET /asahi/firstboot.sh — first-boot LVM setup script (for manual use)
import type { FastifyInstance } from "fastify";
import fastifyStatic from "@fastify/static";
import { existsSync, readFileSync } from "node:fs";
import { join, dirname } from "node:path";
import { fileURLToPath } from "node:url";
import type { BastionConfig } from "@lab/shared";
import { renderFirstbootScript, renderFirstbootUnit } from "../templates/asahi-firstboot.sh.js";
import type { Role } from "@lab/shared";
/** Find the asahi-repo directory (built by scripts/build-asahi-rootfs.sh). */
function findAsahiRepo(config: BastionConfig): string | null {
// Check relative to bastionDir (container deploy)
const inBastionDir = join(config.bastionDir, "asahi-repo");
if (existsSync(inBastionDir)) return inBastionDir;
// Check /data/asahi-repo (PVC mount in k3s container)
if (existsSync("/data/asahi-repo")) return "/data/asahi-repo";
// Check relative to project root (dev mode)
try {
const thisDir = dirname(fileURLToPath(import.meta.url));
const projectRoot = join(thisDir, "..", "..", "..", "..");
const inProjectRoot = join(projectRoot, "asahi-repo");
if (existsSync(inProjectRoot)) return inProjectRoot;
} catch { /* import.meta.url not available in tests */ }
return null;
}
export function registerAsahiRoutes(app: FastifyInstance, config: BastionConfig): void {
const repoDir = findAsahiRepo(config);
// Serve built rootfs package files (fedora-asahi-lab.zip, etc.)
if (repoDir) {
app.register(fastifyStatic, {
root: repoDir,
prefix: "/asahi/repo/",
decorateReply: false,
});
}
// Wrapper script — user runs: curl http://bastion:8080/asahi | sh
app.get("/asahi", async (_request, reply) => {
const script = `#!/bin/bash
# Lab Asahi provisioner — sets up Apple Silicon machines with lab LVM layout.
# This wraps the standard Asahi installer with custom installer_data.json
# that creates a separate LVM data partition.
set -euo pipefail
BASTION="http://${config.serverIp}:${config.httpPort}"
echo ""
echo " ╔══════════════════════════════════════════════╗"
echo " ║ Lab Asahi Provisioner ║"
echo " ║ Bastion: \${BASTION} ║"
echo " ╚══════════════════════════════════════════════╝"
echo ""
# Check we're on macOS
if [ "$(uname)" != "Darwin" ]; then
echo "ERROR: This script must be run from macOS on the target Mac."
echo " It uses the Asahi Linux installer to set up Apple Silicon boot."
exit 1
fi
# Download the standard Asahi installer
echo "Downloading Asahi Linux installer..."
WORKDIR=$(mktemp -d)
cd "$WORKDIR"
INSTALLER_BASE="https://cdn.asahilinux.org/installer"
PKG_VER=$(curl -s "\${INSTALLER_BASE}/latest")
echo " Version: \${PKG_VER}"
curl -# -L -o "installer-\${PKG_VER}.tar.gz" "\${INSTALLER_BASE}/installer-\${PKG_VER}.tar.gz"
echo " Extracting..."
tar xf "installer-\${PKG_VER}.tar.gz"
# Download our custom installer_data.json (installer reads it as a local file)
echo " Downloading custom installer data from bastion..."
curl -sfL -o installer_data.json "\${BASTION}/asahi/installer_data.json"
# Pre-download the rootfs package (avoids Python HTTP streaming issues on macOS)
echo " Downloading rootfs package from bastion..."
mkdir -p os
curl -# -L -o os/fedora-asahi-lab.zip "\${BASTION}/asahi/repo/fedora-asahi-lab.zip"
# Point installer to local directory (REPO_BASE + /os/ + package name)
export REPO_BASE="\${PWD}"
echo ""
echo " Using custom partition layout + rootfs from bastion."
echo " This will create:"
echo " - Standard Asahi boot infrastructure (m1n1 + U-Boot)"
echo " - Fedora Asahi Remix root partition"
echo " - LVM data partition (remaining space)"
echo ""
echo " After first boot, SSH in and set up LVM:"
echo " ssh lab@<ip> 'curl -sf \${BASTION}/asahi/firstboot.sh | sudo bash'"
echo ""
# Run the installer
if [ "$USER" != "root" ]; then
echo "The installer needs root. Enter your sudo password if prompted."
exec caffeinate -dis sudo -E ./install.sh "$@"
else
exec caffeinate -dis ./install.sh "$@"
fi
`;
return reply.type("text/x-shellscript").send(script);
});
// Custom installer_data.json — serves built config or fallback
app.get("/asahi/installer_data.json", async (_request, reply) => {
// Prefer the built installer_data.json (from build-asahi-rootfs.sh)
if (repoDir) {
const builtConfig = join(repoDir, "installer_data.json");
if (existsSync(builtConfig)) {
const data = JSON.parse(readFileSync(builtConfig, "utf-8"));
return reply.type("application/json").send(data);
}
}
// Fallback: minimal config (won't have boot.img, for testing only)
return reply.type("application/json").send({
os_list: [{
name: "Fedora Asahi Lab",
default_os_name: "Fedora Linux with Lab LVM",
boot_object: "m1n1.bin",
next_object: "m1n1/boot.bin",
package: "fedora-asahi-lab.zip",
supported_fw: ["13.5"],
partitions: [
{ name: "EFI", type: "EFI", size: "524288000B", format: "fat",
copy_firmware: true, copy_installer_data: true, source: "esp" },
{ name: "Root", type: "Linux", size: "5368709120B", image: "root.img", expand: false },
{ name: "Data", type: "Linux", size: "1073741824B", expand: true },
],
}],
});
});
// First-boot script — for manual download or embedding in rootfs
app.get<{
Querystring: { hostname?: string; role?: string; mac?: string; user?: string };
}>("/asahi/firstboot.sh", async (request, reply) => {
const hostname = request.query.hostname ?? "unknown";
const role = (request.query.role ?? "infra") as Role;
const mac = request.query.mac ?? "unknown";
const user = request.query.user ?? "lab";
const script = renderFirstbootScript({
hostname,
role,
serverIp: config.serverIp,
httpPort: config.httpPort,
sshKeys: config.sshKeys ?? [],
adminUser: user,
mac,
});
return reply.type("text/x-shellscript").send(script);
});
// Systemd unit file for first-boot service
app.get("/asahi/firstboot.service", async (_request, reply) => {
return reply.type("text/plain").send(renderFirstbootUnit());
});
}
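
`findAsahiRepo()` tries three locations in order: `<bastionDir>/asahi-repo`, the `/data/asahi-repo` PVC mount, and `asahi-repo` under the project root derived from `import.meta.url`. A testable sketch of that search with the filesystem check and project root injected (the `findRepoDir` name, its parameters, and the sample paths are assumptions for illustration):

```typescript
import { join } from "node:path";

// Hypothetical variant of findAsahiRepo(): same search order, but the existence
// check and project root are parameters so the order can be tested without disk access.
function findRepoDir(
  bastionDir: string,
  projectRoot: string | null,
  exists: (path: string) => boolean,
): string | null {
  const candidates = [
    join(bastionDir, "asahi-repo"), // container deploy
    "/data/asahi-repo", // PVC mount in the k3s pod
    ...(projectRoot ? [join(projectRoot, "asahi-repo")] : []), // dev mode
  ];
  return candidates.find((p) => exists(p)) ?? null;
}

const present = new Set(["/data/asahi-repo", "/home/dev/lab/asahi-repo"]);
console.log(findRepoDir("/srv/bastion", "/home/dev/lab", (p) => present.has(p))); // /data/asahi-repo
```

In the route module the lookup runs once, inside `registerAsahiRoutes()`, and the static `/asahi/repo/` handler is only registered when a directory was found, so a repo directory created after startup is only picked up on the next restart.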

@@ -137,7 +137,7 @@ function generateIso(config: BastionConfig, outputPath: string): void {
 "# Map iPXE arch names to Fedora mirror paths (arm64 -> aarch64)",
 "set fedarch ${buildarch}",
 "iseq ${buildarch} arm64 && set fedarch aarch64 ||",
-`kernel file:/vmlinuz-\${buildarch} inst.ks=${bastionUrl}/discover.ks inst.repo=${FEDORA_MIRROR_BASE}/${config.fedoraVersion}/Everything/\${fedarch}/os inst.text || goto no_kernel`,
+`kernel file:/vmlinuz-\${buildarch} inst.ks=${bastionUrl}/ks-auto inst.repo=${FEDORA_MIRROR_BASE}/${config.fedoraVersion}/Everything/\${fedarch}/os inst.text || goto no_kernel`,
 `initrd file:/initrd-\${buildarch} || goto no_kernel`,
 "boot || shell",
 "",
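
The iPXE snippet maps iPXE's `${buildarch}` value to the architecture name used in Fedora mirror paths; only `arm64` is rewritten (to `aarch64`), and every other value passes through unchanged. The same mapping as TypeScript helpers for building the `inst.repo` URL (both helper names and the mirror base below are hypothetical placeholders; the real code uses `FEDORA_MIRROR_BASE` and `config.fedoraVersion`):

```typescript
// Hypothetical helper: iPXE reports arm64, Fedora mirror trees use aarch64.
function fedoraArch(ipxeArch: string): string {
  return ipxeArch === "arm64" ? "aarch64" : ipxeArch;
}

// Hypothetical helper: same path layout as the inst.repo argument above.
function instRepo(mirrorBase: string, fedoraVersion: string | number, ipxeArch: string): string {
  return `${mirrorBase}/${fedoraVersion}/Everything/${fedoraArch(ipxeArch)}/os`;
}

console.log(instRepo("https://mirror.example.org/fedora/linux/releases", 42, "arm64"));
// https://mirror.example.org/fedora/linux/releases/42/Everything/aarch64/os
```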

@@ -41,6 +41,150 @@ export function registerKickstartRoutes(
 return reply.type("text/plain").send(ks);
 });
+// Auto-detecting kickstart for ISO boot (machines without network boot, like R1 ARM).
+// %pre detects the MAC, queries bastion state, and writes a dynamic kickstart to /tmp.
+// The main body %includes it, so Anaconda gets either discovery or install content.
+app.get("/ks-auto", async (_request, reply) => {
+const bastionUrl = `http://${config.serverIp}:${config.httpPort}`;
+const ks = `# Lab Bastion -- Auto-detect kickstart (ISO boot)
+# %pre detects MAC, queries bastion state, writes /tmp/dynamic.ks.
+# The main body %includes it to get either a discovery reboot or a full install.
+%pre --erroronfail --log=/tmp/ks-auto.log
+#!/bin/bash
+set -x
+# -- Detect MAC address --
+MAC=$(ip link show | awk '/ether/ && !/00:00:00:00/ {print $2; exit}')
+echo "Detected MAC: $MAC"
+# -- Wait for network (Linux drivers may take a moment) --
+for i in $(seq 1 30); do
+if curl -sf "${bastionUrl}/healthz" >/dev/null 2>&1; then
+echo "Bastion reachable at ${bastionUrl}"
+break
+fi
+echo "Waiting for network... ($i/30)"
+sleep 2
+done
+# -- Query bastion for machine state --
+STATE=$(curl -sf "${bastionUrl}/api/machine-state/$MAC" 2>/dev/null || echo "unknown")
+echo "Machine state: $STATE"
+case "$STATE" in
+queued|installing)
+echo "=== Machine queued for install. Fetching install kickstart... ==="
+curl -sf "${bastionUrl}/ks?mac=$MAC" > /tmp/dynamic.ks
+if [ -s /tmp/dynamic.ks ]; then
+echo "Install kickstart downloaded ($(wc -l < /tmp/dynamic.ks) lines)"
+else
+echo "ERROR: Failed to download install kickstart"
+exit 1
+fi
+# Run any %pre scripts from the downloaded kickstart.
+# Anaconda only runs %pre from the top-level file, not from %include'd files.
+python3 -c "
+import re, subprocess
+content = open('/tmp/dynamic.ks').read()
+blocks = re.findall(r'%pre[^\\n]*\\n(.*?)%end', content, re.DOTALL)
+for i, script in enumerate(blocks):
+    path = f'/tmp/inner-pre-{i}.sh'
+    with open(path, 'w') as f:
+        f.write(script)
+    print(f'Running inner %pre script {i} ({len(script.splitlines())} lines)')
+    subprocess.run(['bash', path], check=False)
+"
+;;
+debug)
+echo "=== Debug mode ==="
+curl -sf "${bastionUrl}/debug.ks?mac=$MAC" > /tmp/dynamic.ks 2>/dev/null
+if [ ! -s /tmp/dynamic.ks ]; then
+echo "rescue" > /tmp/dynamic.ks
+fi
+;;
+*)
+echo "=== Running hardware discovery ==="
+# Collect hardware info
+PRODUCT=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "unknown")
+BOARD=$(cat /sys/class/dmi/id/board_name 2>/dev/null || echo "unknown")
+SERIAL=$(cat /sys/class/dmi/id/product_serial 2>/dev/null || echo "unknown")
+MANUFACTURER=$(cat /sys/class/dmi/id/sys_vendor 2>/dev/null || echo "unknown")
+CPUMODEL=$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2 | sed 's/^ //')
+CPUCORES=$(grep -c '^processor' /proc/cpuinfo)
+MEMGB=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
+ARCHTYPE=$(uname -m)
+DISKS_JSON=$(lsblk -Jb -o NAME,SIZE,TYPE,MODEL 2>/dev/null | python3 -c "
+import sys, json
+data = json.load(sys.stdin)
+disks = [d for d in data.get('blockdevices', []) if d.get('type') == 'disk']
+result = []
+for d in disks:
+    size_gb = round(int(d.get('size', 0)) / 1073741824, 1)
+    result.append({'name': d.get('name', '?'), 'size_gb': size_gb, 'model': (d.get('model') or 'unknown').strip()})
+print(json.dumps(result))
+" 2>/dev/null || echo '[]')
+NICS_JSON=$(ip -j link show 2>/dev/null | python3 -c "
+import sys, json
+nics = json.load(sys.stdin)
+result = []
+for n in nics:
+    if n.get('link_type') == 'loopback': continue
+    result.append({'name': n.get('ifname', '?'), 'mac': n.get('address', '?'), 'state': n.get('operstate', '?')})
+print(json.dumps(result))
+" 2>/dev/null || echo '[]')
+PAYLOAD=$(python3 -c "
+import json
+print(json.dumps({
+    'mac': '$MAC', 'product': '$PRODUCT', 'board': '$BOARD', 'serial': '$SERIAL',
+    'manufacturer': '$MANUFACTURER', 'cpu_model': '$CPUMODEL',
+    'cpu_cores': int('$CPUCORES' or 0), 'memory_gb': int('$MEMGB' or 0),
+    'arch': '$ARCHTYPE', 'disks': $DISKS_JSON, 'nics': $NICS_JSON
+}))
+")
+curl -sf -X POST "${bastionUrl}/api/discover" \\
+-H "Content-Type: application/json" \\
+-d "$PAYLOAD" || true
+echo ""
+echo "=== Discovery complete ==="
+echo "Machine MAC: $MAC"
+echo "Queue for install: labctl provision install $MAC <hostname> --role infra"
+echo "Then reboot to start installation."
+echo ""
+# Write a minimal kickstart that just reboots
+cat > /tmp/dynamic.ks << 'DISCOVER_KS'
+# Discovery mode -- reboot to allow install queue
+reboot
+DISCOVER_KS
+# Force reboot now (don't wait for Anaconda)
+sleep 3
+echo 1 > /proc/sys/kernel/sysrq
+echo b > /proc/sysrq-trigger
+sleep 5
+reboot -f
+;;
+esac
+%end
+# Include the dynamically chosen kickstart
+%include /tmp/dynamic.ks
+`;
+return reply.type("text/plain").send(ks);
+});
 // Ubuntu autoinstall user-data (cloud-init)
 app.get<{ Params: { mac: string } }>("/autoinstall/:mac/user-data", async (request, reply) => {
 const mac = request.params.mac.toLowerCase().replace(/-/g, ":");
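
Because Anaconda only runs `%pre` sections from the top-level kickstart, the queued/installing branch pulls the `%pre` bodies out of the downloaded file and runs them itself. Its Python pattern, `%pre[^\n]*\n(.*?)%end` with `re.DOTALL`, carries over directly to a JavaScript regex with the `s` flag. A sketch for checking what would be extracted (the helper name and sample kickstart are made up):

```typescript
// Hypothetical helper equivalent to:
//   re.findall(r'%pre[^\n]*\n(.*?)%end', content, re.DOTALL)
// The "s" flag lets "." match newlines; "g" is required by matchAll.
function extractPreBlocks(kickstart: string): string[] {
  const re = /%pre[^\n]*\n(.*?)%end/gs;
  return Array.from(kickstart.matchAll(re), (m) => m[1]);
}

const sample = [
  "text",
  "%pre --log=/tmp/pre.log",
  "echo partitioning prep",
  "%end",
  "%post",
  "echo not extracted",
  "%end",
].join("\n");

const blocks = extractPreBlocks(sample);
console.log(blocks.length); // 1
console.log(JSON.stringify(blocks[0])); // "echo partitioning prep\n"
```

Note that the pattern also matches `%pre-install` sections, since `[^\n]*` absorbs the `-install` suffix; if a downloaded kickstart ever uses one, it would be run early by this wrapper as well.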

@@ -11,6 +11,7 @@ import { logger } from "./services/logger.js";
 import { registerDispatchRoutes } from "./routes/dispatch.js";
 import { registerKickstartRoutes } from "./routes/kickstart.js";
 import { registerApiRoutes } from "./routes/api.js";
+import { registerAsahiRoutes } from "./routes/asahi.js";
 export function createApp(config: BastionConfig): { app: ReturnType<typeof Fastify>; state: StateManager; installLog: InstallLogBuffer; syslog: SyslogListener } {
@@ -45,6 +46,7 @@ export function createApp(config: BastionConfig): { app: ReturnType<typeof Fasti
 registerDispatchRoutes(app, config, state);
 registerKickstartRoutes(app, config, state, syslog);
 registerApiRoutes(app, state, installLog, syslog);
+registerAsahiRoutes(app, config);
 // boot.iso is generated at startup and served as a static file from httpDir
 // (static serving supports HTTP Range requests, required by JetKVM streaming)

@@ -166,6 +166,7 @@ export class BastionConnection {
 case "command-role-update":
 case "command-debug":
 case "command-register":
+case "command-discover":
 void this.handleCommand(msg);
 break;
 }
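
The first-boot template in the next file derives its extra logical volumes from the machine role: `infra` and `labcontroller` get a `rancher` LV (`lvcreate -L 20480M`), and `worker`, `infra`, and `labcontroller` get a `longhorn` LV that takes the rest of `labvg` (`-l 100%FREE`). A sketch of that mapping as data (the `LabRole` union lists only the role names that appear in this diff and is hypothetical, as are `LvPlan` and `roleLvs`):

```typescript
// Hypothetical subset of the shared Role type: the names used in this diff.
type LabRole = "worker" | "infra" | "labcontroller" | "vanilla";

interface LvPlan {
  name: string;
  sizeArgs: string;
  mountpoint: string;
}

// Mirrors the isWorker/isInfra checks in renderFirstbootScript().
// Order matters: rancher must be created before longhorn grabs 100%FREE.
function roleLvs(role: LabRole): LvPlan[] {
  const isWorker = role === "worker";
  const isInfra = role === "infra" || role === "labcontroller";
  const lvs: LvPlan[] = [];
  if (isInfra) lvs.push({ name: "rancher", sizeArgs: "-L 20480M", mountpoint: "/var/lib/rancher" });
  if (isWorker || isInfra) lvs.push({ name: "longhorn", sizeArgs: "-l 100%FREE", mountpoint: "/var/lib/longhorn" });
  return lvs;
}

for (const role of ["worker", "infra", "vanilla"] as const) {
  console.log(role, roleLvs(role).map((lv) => `lvcreate ${lv.sizeArgs} -n ${lv.name} labvg -y`));
}
```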

@@ -0,0 +1,311 @@
// First-boot LVM setup script for Asahi-provisioned machines.
// Embedded in the custom rootfs as a systemd service that runs once on first boot.
// Creates the standard lab LVM layout on the data partition, matching install.ks.ts.
import type { Role } from "@lab/shared";
export interface AsahiFirstbootParams {
hostname: string;
role: Role;
serverIp: string;
httpPort: number;
sshKeys: string[];
adminUser: string;
mac: string;
}
export function renderFirstbootScript(params: AsahiFirstbootParams): string {
const { hostname, role, serverIp, httpPort, sshKeys, adminUser, mac } = params;
const isWorker = role === "worker";
const isInfra = role === "infra" || role === "labcontroller";
// Role-specific LV creation commands
const roleLvLines: string[] = [];
const roleFormatLines: string[] = [];
const roleMountLines: string[] = [];
const roleFstabLines: string[] = [];
if (isInfra) {
roleLvLines.push('lvcreate -L 20480M -n rancher labvg -y');
roleFormatLines.push('mkfs.xfs /dev/labvg/rancher');
roleMountLines.push('mount_lv rancher /var/lib/rancher');
roleFstabLines.push('echo "/dev/labvg/rancher /var/lib/rancher xfs defaults 0 0" >> /etc/fstab');
}
if (isWorker || isInfra) {
roleLvLines.push('lvcreate -l 100%FREE -n longhorn labvg -y');
roleFormatLines.push('mkfs.xfs /dev/labvg/longhorn');
roleMountLines.push('mount_lv longhorn /var/lib/longhorn');
roleFstabLines.push('echo "/dev/labvg/longhorn /var/lib/longhorn xfs defaults 0 0" >> /etc/fstab');
}
// SSH key injection block (empty if no keys)
const sshKeyBlock = sshKeys.length > 0
? sshKeys.map(k => `echo '${k}' >> "$ADMIN_SSH/authorized_keys"`).join('\n')
: 'true # no SSH keys configured';
const rootSshKeyBlock = sshKeys.length > 0
? sshKeys.map(k => `echo '${k}' >> /root/.ssh/authorized_keys`).join('\n')
: 'true # no SSH keys configured';
// NOTE: Bash variables are written as $VAR, never ${VAR}, because ${...} in a TS template literal is interpolation.
// If bash ever needs ${...}, write it as \${...}. Backslashes meant for bash must be doubled (\\[ emits \[).
return `#!/bin/bash
# Lab first-boot LVM setup — generated by bastion
# Runs once on first boot via systemd; the marker file written at the end makes later runs exit early.
set -euo pipefail
MARKER="/etc/lab-lvm-setup-done"
LOG="/var/log/lab-firstboot.log"
exec > >(tee -a "$LOG") 2>&1
echo "=== Lab first-boot LVM setup ==="
date
# Already done?
if [ -f "$MARKER" ]; then
echo "LVM setup already completed, skipping."
exit 0
fi
# ── Find the data partition ──────────────────────────────────────
# The data partition/disk is a large block device that is NOT the root filesystem.
# Handles: NVMe partitions, SCSI partitions, whole unpartitioned disks.
ROOT_DEV=$(findmnt -n -o SOURCE / | sed 's/\\[.*\\]//') # strip btrfs subvol
ROOT_DISK=$(lsblk -n -o PKNAME "$ROOT_DEV" 2>/dev/null | head -1)
echo "Root device: $ROOT_DEV (disk: $ROOT_DISK)"
DATA_PART=""
# Scan partitions first, then whole disks
for part in /dev/nvme*n*p* /dev/sd*[0-9] /dev/vd*[0-9] /dev/nvme*n* /dev/sd[b-z] /dev/vd[b-z]; do
[ -b "$part" ] || continue
# Skip root device and root disk
[ "$part" = "$ROOT_DEV" ] && continue
PART_DISK=$(basename "$part" | sed 's/p[0-9]*$//' | sed 's/[0-9]*$//')
[ "$PART_DISK" = "$ROOT_DISK" ] && continue
# Skip small devices (<50GB) — EFI, boot, APFS stubs
SIZE_BYTES=$(blockdev --getsize64 "$part" 2>/dev/null || echo 0)
SIZE_GB=$((SIZE_BYTES / 1073741824))
[ "$SIZE_GB" -lt 50 ] && continue
# Use if unformatted or already LVM
FSTYPE=$(blkid -o value -s TYPE "$part" 2>/dev/null || echo "")
if [ -z "$FSTYPE" ] || [ "$FSTYPE" = "LVM2_member" ]; then
DATA_PART="$part"
echo "Found data device: $DATA_PART ($SIZE_GB GB)"
break
fi
done
if [ -z "$DATA_PART" ]; then
echo "ERROR: No suitable data partition found for LVM."
echo "Expected a large (>50GB) unformatted partition."
exit 1
fi
# ── Helper function ──────────────────────────────────────────────
mount_lv() {
local lv="$1" mp="$2"
if lvs "labvg/$lv" &>/dev/null; then
mkdir -p "$mp"
mount "/dev/labvg/$lv" "$mp" 2>/dev/null || true
echo " Mounted $lv -> $mp"
fi
}
# ── Write fstab function (idempotent) ────────────────────────────
write_lab_fstab() {
# Remove any previous lab LVM entries (clean slate)
sed -i '/# lab-lvm:/d' /etc/fstab
sed -i '/# Lab LVM volumes/d' /etc/fstab
grep -v "/dev/labvg/" /etc/fstab > /etc/fstab.tmp && mv /etc/fstab.tmp /etc/fstab
# Comment out non-LVM entries for mount points we manage
for mp in "/var " "/var/log " "/home " "/srv "; do
if grep -q "$mp" /etc/fstab; then
awk -v m="$mp" '{if($0 !~ /^#/ && index($0,m)) print "# lab-lvm: " $0; else print}' /etc/fstab > /etc/fstab.tmp
mv /etc/fstab.tmp /etc/fstab
fi
done
# Add fresh LVM entries
echo "# Lab LVM volumes" >> /etc/fstab
echo "/dev/labvg/swap none swap defaults 0 0" >> /etc/fstab
echo "/dev/labvg/var /var xfs defaults 0 0" >> /etc/fstab
echo "/dev/labvg/varlog /var/log xfs defaults 0 0" >> /etc/fstab
echo "/dev/labvg/home /home xfs defaults 0 0" >> /etc/fstab
echo "/dev/labvg/srv /srv xfs defaults 0 0" >> /etc/fstab
${roleFstabLines.join('\n ')}
}
# ── Check for existing VG ────────────────────────────────────────
if vgs labvg &>/dev/null; then
echo "Volume group 'labvg' already exists — reprovision detected."
echo "Activating existing volumes..."
vgchange -ay labvg
mount_lv var /var
mount_lv varlog /var/log
mount_lv home /home
mount_lv srv /srv
${roleMountLines.map(l => ` ${l}`).join('\n')}
# Enable swap
if lvs labvg/swap &>/dev/null; then
swapon /dev/labvg/swap 2>/dev/null || true
echo " Enabled swap"
fi
# Ensure fstab entries exist — comment out conflicting btrfs subvol entries
write_lab_fstab
echo "Existing LVM volumes re-mounted."
else
# ── Fresh install: create LVM ────────────────────────────────────
echo "Creating LVM on $DATA_PART..."
pvcreate "$DATA_PART"
vgcreate labvg "$DATA_PART"
# Create LVs — sizes match install.ks.ts (in MiB)
echo "Creating logical volumes..."
lvcreate -L 27648M -n swap labvg -y # 27GB swap
lvcreate -L 102400M -n var labvg -y # 100GB /var
lvcreate -L 10240M -n varlog labvg -y # 10GB /var/log
lvcreate -L 10240M -n home labvg -y # 10GB /home
lvcreate -L 20480M -n srv labvg -y # 20GB /srv
${roleLvLines.join('\n')}
# Format
echo "Formatting volumes..."
mkswap /dev/labvg/swap
mkfs.xfs /dev/labvg/var
mkfs.xfs /dev/labvg/varlog
mkfs.xfs /dev/labvg/home
mkfs.xfs /dev/labvg/srv
${roleFormatLines.join('\n')}
# Migrate and mount volumes that can be switched live.
# Copy existing content first so we don't shadow files (e.g. /home/user/.ssh).
for LV_MOUNT in "home /home" "srv /srv"; do
LV_NAME=$(echo "$LV_MOUNT" | awk '{print $1}')
MOUNT_PT=$(echo "$LV_MOUNT" | awk '{print $2}')
STAGING="/mnt/labvg-$LV_NAME-staging"
mkdir -p "$STAGING"
mount "/dev/labvg/$LV_NAME" "$STAGING"
cp -a "$MOUNT_PT"/. "$STAGING/" 2>/dev/null || true
umount "$STAGING"
rmdir "$STAGING"
mount_lv "$LV_NAME" "$MOUNT_PT"
done
# Mount role-specific volumes (empty, no content to preserve)
set +e
${roleMountLines.join('\n')}
set -e
# Copy existing /var content into the LV for next boot
echo "Preparing /var LV for next boot..."
TMPVAR="/mnt/labvg-var-staging"
mkdir -p "$TMPVAR"
mount /dev/labvg/var "$TMPVAR"
cp -a /var/. "$TMPVAR/" 2>/dev/null || true
umount "$TMPVAR"
rmdir "$TMPVAR"
# Same for /var/log
TMPVARLOG="/mnt/labvg-varlog-staging"
mkdir -p "$TMPVARLOG"
mount /dev/labvg/varlog "$TMPVARLOG"
cp -a /var/log/. "$TMPVARLOG/" 2>/dev/null || true
umount "$TMPVARLOG"
rmdir "$TMPVARLOG"
echo "NOTE: /var and /var/log will switch to LVM on next reboot."
# Enable swap
swapon /dev/labvg/swap 2>/dev/null || true
write_lab_fstab
echo "LVM setup complete."
lvs labvg
fi # end if/else for reprovision vs fresh install
# ── Set hostname (use configured value, or keep existing) ────────
CONF_HOSTNAME="${hostname}"
if [ "$CONF_HOSTNAME" != "unknown" ] && [ -n "$CONF_HOSTNAME" ]; then
hostnamectl set-hostname "$CONF_HOSTNAME"
fi
ACTUAL_HOSTNAME=$(hostname)
# ── Detect MAC address ───────────────────────────────────────────
CONF_MAC="${mac}"
if [ "$CONF_MAC" = "unknown" ] || [ -z "$CONF_MAC" ]; then
CONF_MAC=$(ip -o link show | grep -v "lo:" | grep "state UP" | head -1 | grep -oP 'link/ether \\K[^ ]+' || echo "unknown")
fi
# ── Configure admin user ─────────────────────────────────────────
ADMIN="${adminUser}"
if ! id "$ADMIN" &>/dev/null; then
useradd -m -G wheel "$ADMIN"
echo "$ADMIN ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$ADMIN
chmod 440 /etc/sudoers.d/$ADMIN
fi
ADMIN_SSH="/home/$ADMIN/.ssh"
mkdir -p "$ADMIN_SSH"
chmod 700 "$ADMIN_SSH"
${sshKeyBlock}
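touch "$ADMIN_SSH/authorized_keys" # exists even with no keys, so chmod cannot abort the script under set -e
chmod 600 "$ADMIN_SSH/authorized_keys"
chown -R "$ADMIN:$ADMIN" "$ADMIN_SSH"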
# Also authorize root
mkdir -p /root/.ssh
chmod 700 /root/.ssh
${rootSshKeyBlock}
touch /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
# ── Harden SSH (takes effect on next sshd restart/reboot) ────────
sed -i 's/^#*PermitRootLogin.*/PermitRootLogin prohibit-password/' /etc/ssh/sshd_config
sed -i 's/^#*PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# ── Write provisioning metadata ──────────────────────────────────
cat > /etc/lab-provisioned << LABMETA
hostname=$ACTUAL_HOSTNAME
role=${role}
mac=$CONF_MAC
provisioned_at=$(date -Iseconds)
method=asahi-firstboot
LABMETA
# ── Register with bastion ─────────────────────────────────────────
IP=$(hostname -I | awk '{print $1}')
echo "Registering with bastion at ${serverIp}:${httpPort}..."
curl -sf -X POST "http://${serverIp}:${httpPort}/api/register" \\
-H "Content-Type: application/json" \\
-d "{\\"mac\\":\\"$CONF_MAC\\",\\"hostname\\":\\"$ACTUAL_HOSTNAME\\",\\"role\\":\\"${role}\\",\\"ip\\":\\"$IP\\"}" \\
2>/dev/null && echo " Registered as $ACTUAL_HOSTNAME ($IP)" \\
|| echo " WARNING: Could not reach bastion — register manually with: labctl provision register $CONF_MAC $ACTUAL_HOSTNAME --role ${role} --ip $IP"
# ── Mark done ────────────────────────────────────────────────────
touch "$MARKER"
echo "=== First-boot setup complete ==="
`;
}
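The fixed-size LVs in this script add up to much more than its 50 GB "Skip small devices" threshold, so a partition between the two sizes would be accepted as `DATA_PART` and then fail at `lvcreate`, aborting the script under `set -e`. A minimal TypeScript sketch of that arithmetic, with sizes copied from the `lvcreate` lines; `fixedLvMiB`, `BASE_LVS_MIB`, and the local `Role` union are hypothetical names, not part of the codebase:

```typescript
// Hypothetical helper: LV sizes (MiB) copied from the firstboot template's lvcreate lines.
type Role = "worker" | "infra" | "labcontroller" | "vanilla"; // assumed subset of @lab/shared's Role

const BASE_LVS_MIB = { swap: 27648, var: 102400, varlog: 10240, home: 10240, srv: 20480 };
const RANCHER_MIB = 20480; // created only for infra and labcontroller
const DETECTION_MIN_GIB = 50; // the script's "Skip small devices" threshold

/** MiB allocated before longhorn takes 100%FREE (longhorn itself excluded). */
function fixedLvMiB(role: Role): number {
  const base = Object.values(BASE_LVS_MIB).reduce((sum, mib) => sum + mib, 0);
  const isInfra = role === "infra" || role === "labcontroller";
  return base + (isInfra ? RANCHER_MIB : 0);
}

for (const role of ["vanilla", "worker", "infra"] as const) {
  const gib = fixedLvMiB(role) / 1024;
  console.log(`${role}: ${gib} GiB of fixed LVs (detection accepts >= ${DETECTION_MIN_GIB} GiB)`);
}
```

By this arithmetic, vanilla and worker nodes need 167 GiB of fixed LVs and infra/labcontroller nodes 187 GiB, before LVM metadata and before longhorn gets any space. Comparing `SIZE_GB` against the per-role total rather than 50 would turn the late `lvcreate` failure into the clearer "No suitable data partition" error.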
/** Systemd unit file for the first-boot service */
export function renderFirstbootUnit(): string {
return `[Unit]
Description=Lab first-boot LVM setup
After=local-fs.target network-online.target
Wants=network-online.target
ConditionPathExists=!/etc/lab-lvm-setup-done
[Service]
Type=oneshot
ExecStart=/usr/local/bin/lab-firstboot.sh
RemainAfterExit=yes
StandardOutput=journal+console
StandardError=journal+console
[Install]
WantedBy=multi-user.target
`;
}
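`write_lab_fstab` is easiest to reason about as a pure text transform. Below is a hypothetical TypeScript mirror of its shell steps (`rewriteFstab`, `MANAGED_MOUNTS`, and `LAB_ENTRIES` are not part of the codebase), which makes the "idempotent" claim testable. By my reading of the `sed`/`grep` steps, the active entries are stable from the first run, but a second run deletes the `# lab-lvm:` backup lines the first run created:

```typescript
// Hypothetical TypeScript mirror of write_lab_fstab (not part of the codebase).
const MANAGED_MOUNTS = ["/var ", "/var/log ", "/home ", "/srv "]; // trailing space, as in the shell loop

const LAB_ENTRIES = [
  "# Lab LVM volumes",
  "/dev/labvg/swap none swap defaults 0 0",
  "/dev/labvg/var /var xfs defaults 0 0",
  "/dev/labvg/varlog /var/log xfs defaults 0 0",
  "/dev/labvg/home /home xfs defaults 0 0",
  "/dev/labvg/srv /srv xfs defaults 0 0",
];

function rewriteFstab(fstab: string, roleEntries: string[] = []): string {
  const lines = fstab.split("\n");
  if (lines[lines.length - 1] === "") lines.pop(); // trailing newline, not a real line
  const kept = lines
    // sed/grep steps: drop everything a previous run wrote (backups, header, labvg entries)
    .filter((l) => !l.includes("# lab-lvm:") && !l.includes("# Lab LVM volumes") && !l.includes("/dev/labvg/"))
    // awk step: comment out active entries for mount points that LVM now owns
    .map((l) => (!l.startsWith("#") && MANAGED_MOUNTS.some((m) => l.includes(m)) ? `# lab-lvm: ${l}` : l));
  // echo steps: append the fresh entries, then any role-specific ones
  return [...kept, ...LAB_ENTRIES, ...roleEntries].join("\n") + "\n";
}

const original = "UUID=x / btrfs subvol=root 0 0\nUUID=x /var btrfs subvol=var 0 0\n";
const pass1 = rewriteFstab(original); // keeps "# lab-lvm: UUID=x /var ..." as a backup
const pass2 = rewriteFstab(pass1); // backup line is gone, active lines unchanged
console.log(pass1 === pass2, rewriteFstab(pass2) === pass2); // false true
```

In the reprovision path this rarely matters, because the fstab being rewritten comes from a fresh rootfs; it only shows up if the function runs twice on the same system.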

@@ -0,0 +1,225 @@
import { describe, it, expect, beforeEach, afterEach } from "vitest";
import { mkdirSync, rmSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";
import type { BastionConfig } from "@lab/shared";
import { createApp } from "../src/server.js";
import type { FastifyInstance } from "fastify";
import { renderFirstbootScript, renderFirstbootUnit } from "../src/templates/asahi-firstboot.sh.js";
function createTestConfig(testDir: string): BastionConfig {
return {
fedoraVersion: "43",
arch: "x86_64",
httpPort: 0,
timezone: "Europe/London",
locale: "en_GB.UTF-8",
bastionDir: testDir,
domain: "test.local",
dhcpMode: "proxy",
dhcpRangeStart: "",
dhcpRangeEnd: "",
ubuntuVersion: "26.04",
ubuntuMirror: "https://releases.ubuntu.com/26.04",
iface: "eth0",
serverIp: "192.168.8.1",
network: "192.168.8.0",
gateway: "192.168.8.1",
sshKeys: ["ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAITEST test@lab"],
adminUser: "michal",
syslogPort: 15514,
skipDnsmasq: true,
skipArtifacts: true,
fedoraMirror: "https://download.fedoraproject.org/pub/fedora/linux/releases/43/Everything/x86_64/os",
tftpDir: join(testDir, "tftp"),
httpDir: join(testDir, "http"),
stateFile: join(testDir, "state.json"),
};
}
describe("asahi routes", () => {
let testDir: string;
let app: FastifyInstance;
beforeEach(() => {
testDir = join(tmpdir(), `bastion-asahi-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
mkdirSync(testDir, { recursive: true });
mkdirSync(join(testDir, "http"), { recursive: true });
mkdirSync(join(testDir, "tftp"), { recursive: true });
const config = createTestConfig(testDir);
const result = createApp(config);
app = result.app;
});
afterEach(async () => {
await app.close();
rmSync(testDir, { recursive: true, force: true });
});
it("GET /asahi returns wrapper shell script", async () => {
const resp = await app.inject({ method: "GET", url: "/asahi" });
expect(resp.statusCode).toBe(200);
expect(resp.headers["content-type"]).toContain("text/x-shellscript");
expect(resp.body).toContain("#!/bin/bash");
expect(resp.body).toContain("installer_data.json");
expect(resp.body).toContain("192.168.8.1");
expect(resp.body).toContain("install.sh");
});
it("GET /asahi/installer_data.json returns valid config", async () => {
const resp = await app.inject({ method: "GET", url: "/asahi/installer_data.json" });
expect(resp.statusCode).toBe(200);
const data = JSON.parse(resp.body);
expect(data.os_list).toHaveLength(1);
const os = data.os_list[0];
expect(os.name).toContain("Fedora Asahi Lab");
// 3 partitions (fallback) or 4 (built: EFI + Boot + Root + Data)
expect(os.partitions.length).toBeGreaterThanOrEqual(3);
expect(os.partitions[0].type).toBe("EFI");
// Last partition should be the expanding Data partition
const lastPart = os.partitions[os.partitions.length - 1];
expect(lastPart.type).toBe("Linux");
expect(lastPart.expand).toBe(true);
// Root partition (second-to-last) should NOT expand
const rootPart = os.partitions[os.partitions.length - 2];
expect(rootPart.expand).toBe(false);
expect(rootPart.image).toBe("root.img");
});
it("GET /asahi/firstboot.sh returns parameterized script", async () => {
const resp = await app.inject({
method: "GET",
url: "/asahi/firstboot.sh?hostname=mac-studio&role=infra&mac=00:11:22:33:44:55",
});
expect(resp.statusCode).toBe(200);
expect(resp.body).toContain("#!/bin/bash");
expect(resp.body).toContain("mac-studio");
expect(resp.body).toContain("labvg");
expect(resp.body).toContain("rancher"); // infra gets rancher LV
expect(resp.body).toContain("longhorn"); // infra also gets longhorn
expect(resp.body).toContain("ssh-ed25519"); // SSH key injected
});
it("GET /asahi/firstboot.service returns systemd unit", async () => {
const resp = await app.inject({ method: "GET", url: "/asahi/firstboot.service" });
expect(resp.statusCode).toBe(200);
expect(resp.body).toContain("[Unit]");
expect(resp.body).toContain("lab-firstboot.sh");
expect(resp.body).toContain("ConditionPathExists=!/etc/lab-lvm-setup-done");
});
});
describe("renderFirstbootScript", () => {
const baseParams = {
hostname: "test-node",
serverIp: "10.0.0.1",
httpPort: 8080,
sshKeys: ["ssh-ed25519 AAAA... user@host"],
adminUser: "testadmin",
mac: "aa:bb:cc:dd:ee:ff",
};
it("generates valid bash with shebang", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script.startsWith("#!/bin/bash")).toBe(true);
});
it("includes LVM creation commands", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("pvcreate");
expect(script).toContain("vgcreate labvg");
expect(script).toContain("lvcreate");
});
it("uses correct LV sizes from kickstart layout", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("27648M"); // swap
expect(script).toContain("102400M"); // /var
expect(script).toContain("10240M"); // /var/log and /home
expect(script).toContain("20480M"); // /srv and the rancher LV
});
it("includes rancher LV for infra role", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("rancher");
expect(script).toContain("/var/lib/rancher");
});
it("includes longhorn for worker role", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script).toContain("longhorn");
expect(script).toContain("/var/lib/longhorn");
// Worker should NOT have rancher
expect(script).not.toContain("rancher");
});
it("includes longhorn for infra role", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("longhorn");
expect(script).toContain("/var/lib/longhorn");
});
it("vanilla role gets no role-specific LVs", () => {
const script = renderFirstbootScript({ ...baseParams, role: "vanilla" });
expect(script).not.toContain("rancher");
expect(script).not.toContain("longhorn");
});
it("handles reprovision (existing labvg)", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("reprovision detected");
expect(script).toContain("vgchange -ay labvg");
expect(script).toContain("mount_lv var /var");
});
it("injects SSH keys for admin user and root", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script).toContain("ssh-ed25519 AAAA...");
expect(script).toContain("testadmin");
expect(script).toContain("/root/.ssh/authorized_keys");
});
it("sets hostname", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script).toContain('CONF_HOSTNAME="test-node"');
expect(script).toContain("hostnamectl set-hostname");
});
it("includes bastion self-registration", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script).toContain("/api/register");
expect(script).toContain("aa:bb:cc:dd:ee:ff");
expect(script).toContain("test-node");
});
it("writes provisioning metadata", () => {
const script = renderFirstbootScript({ ...baseParams, role: "infra" });
expect(script).toContain("/etc/lab-provisioned");
expect(script).toContain("method=asahi-firstboot");
});
it("creates marker file to prevent re-run", () => {
const script = renderFirstbootScript({ ...baseParams, role: "worker" });
expect(script).toContain("/etc/lab-lvm-setup-done");
expect(script).toContain('touch "$MARKER"');
});
});
describe("renderFirstbootUnit", () => {
it("generates valid systemd unit", () => {
const unit = renderFirstbootUnit();
expect(unit).toContain("[Unit]");
expect(unit).toContain("[Service]");
expect(unit).toContain("[Install]");
expect(unit).toContain("Type=oneshot");
expect(unit).toContain("WantedBy=multi-user.target");
});
it("only runs when marker is missing", () => {
const unit = renderFirstbootUnit();
expect(unit).toContain("ConditionPathExists=!/etc/lab-lvm-setup-done");
});
});

@@ -104,6 +104,16 @@ export class LabdClient {
return this.request("POST", "/api/machines/debug", { body: { mac, pxeBoot: opts?.pxeBoot } });
}
async discoverMachine(data: {
mac: string; product?: string; board?: string; serial?: string;
manufacturer?: string; cpu_model?: string; cpu_cores?: number;
memory_gb?: number; arch?: string;
disks?: Array<{ name: string; size_gb: number; model: string }>;
nics?: Array<{ name: string; mac: string; state: string }>;
}): Promise<{ status: string; error?: string }> {
return this.request("POST", "/api/machines/discover", { body: data });
}
async forgetMachine(mac: string): Promise<{ status: string }> {
return this.request("DELETE", `/api/machines/${encodeURIComponent(mac)}`);
}

@@ -70,7 +70,7 @@ export function registerAppCommand(program: Command): void {
.command("install <target>")
.description("Install k3s on a target machine (hostname, IP, or MAC)")
.option("--role <role>", "k3s role: infra (server) or worker (agent)", "infra")
.option("--user <user>", "SSH user", "michal") .option("--user <user>", "SSH user", "root")
.option("--k3s-server <url>", "k3s server URL (required for worker role)")
.option("--k3s-token <token>", "k3s join token (required for worker role)")
.action(async (target: string, opts: {
@@ -164,7 +164,7 @@ export function registerAppCommand(program: Command): void {
k3sCmd
.command("health [target]")
.description("Check k3s health (all hosts if no target given)")
.option("--user <user>", "SSH user", "michal") .option("--user <user>", "SSH user", "root")
.action(async (target: string | undefined, opts: { user: string }) => {
const sshKey = findSshKey();
@@ -304,7 +304,7 @@ export function registerAppCommand(program: Command): void {
k3sCmd
.command("list")
.description("List installed machines and their k3s status")
.option("--user <user>", "SSH user", "michal") .option("--user <user>", "SSH user", "root")
.action(async (opts: { user: string }) => {
let state: BastionState;
try {

@@ -0,0 +1,69 @@
// CLI command: provision asahi
// Prints the curl command to run on the Mac Studio (macOS) to install
// Fedora Asahi Remix with lab LVM layout.
import type { Command } from "commander";
import { getLabdClient } from "../api/config.js";
export function registerAsahiCommand(parent: Command): void {
parent
.command("asahi")
.description("Show instructions to provision an Apple Silicon Mac with Asahi Linux")
.action(async () => {
// Try to get bastion info to determine the correct URL
let bastionUrl = "";
try {
const bastions = await getLabdClient().getBastions();
const online = bastions.find(b => b.status === "online");
if (online) {
bastionUrl = `http://${online.serverIp}:8080`;
}
} catch { /* labd not reachable */ }
if (!bastionUrl) {
// Fall back to config
const { loadConfig } = await import("../config/index.js");
const config = loadConfig();
bastionUrl = config.labdUrl ?? "http://<bastion-ip>:8080";
// Derive the bastion URL from the labd URL by swapping the port for 8080 (assumes bastion shares labd's host)
bastionUrl = bastionUrl.replace(/:\d+$/, ":8080");
}
const BOLD = "\x1b[1m";
const CYAN = "\x1b[36m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
console.log("");
console.log(`${BOLD} Asahi Linux Provisioning${RESET}`);
console.log(`${DIM} For Apple Silicon Macs (Mac Studio, MacBook, etc.)${RESET}`);
console.log("");
console.log(` Run this command ${BOLD}on the Mac${RESET} (from macOS Terminal):`);
console.log("");
console.log(` ${CYAN}${BOLD}curl ${bastionUrl}/asahi | sh${RESET}`);
console.log("");
console.log(` The installer will ask a few interactive questions:`);
console.log(` ${BOLD}1.${RESET} Action: press ${BOLD}r${RESET} to resize macOS`);
console.log(` ${BOLD}2.${RESET} How much space for Linux: choose maximum`);
console.log(` ${BOLD}3.${RESET} Confirm the resize operation`);
console.log(` ${BOLD}4.${RESET} macOS password for firmware authentication`);
console.log("");
console.log(` After that, everything is automatic:`);
console.log(` - Asahi boot infrastructure (m1n1 + U-Boot)`);
console.log(` - Fedora Asahi Remix root partition`);
console.log(` - LVM data partition (remaining space)`);
console.log("");
console.log(` On first boot, LVM volumes are created automatically:`);
console.log(` ${DIM}labvg/swap (27GB), labvg/var (100GB), labvg/varlog (10GB),`);
console.log(` labvg/home (10GB), labvg/srv (20GB),`);
console.log(` labvg/rancher (20GB, infra/labcontroller only),`);
console.log(` labvg/longhorn (remaining space, infra/labcontroller/worker only)${RESET}`);
console.log("");
console.log(` If the volumes were not created on first boot, SSH in and run the firstboot script manually:`);
console.log(` ${BOLD}ssh root@<ip> 'curl -sf ${bastionUrl}/asahi/firstboot.sh | bash'${RESET}`);
console.log("");
console.log(` This sets up LVM, detects hostname/MAC, and self-registers.`);
console.log(` Then install k3s:`);
console.log(` ${BOLD}labctl app k3s install <hostname> --role infra${RESET}`);
console.log("");
});
}
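The fallback above swaps the port with `/:\d+$/`, which leaves the labd port untouched when `labdUrl` has no explicit port or ends in a path or trailing slash (e.g. `http://10.0.0.5:3000/`). A sketch of the same derivation using the WHATWG `URL` API; `deriveBastionUrl` is a hypothetical helper, and it keeps the command's hard-coded 8080 and its same-host assumption:

```typescript
// Hypothetical helper (not part of the codebase); 8080 mirrors the port the command hard-codes.
const BASTION_HTTP_PORT = 8080;
const PLACEHOLDER = `http://<bastion-ip>:${BASTION_HTTP_PORT}`;

function deriveBastionUrl(labdUrl: string | undefined, port: number = BASTION_HTTP_PORT): string {
  if (!labdUrl) return PLACEHOLDER;
  let url: URL;
  try {
    url = new URL(labdUrl);
  } catch {
    return PLACEHOLDER; // not an absolute URL
  }
  url.port = String(port); // replaces an explicit port, or adds one where none was given
  return url.origin; // scheme://host:port, dropping any path or trailing slash
}

console.log(deriveBastionUrl("http://10.0.0.5:3000/")); // http://10.0.0.5:8080
console.log(deriveBastionUrl("http://labd.lab")); // http://labd.lab:8080
```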

@@ -38,7 +38,7 @@ export function registerLabcontrollerCommands(appCmd: Command): void {
lcCmd
.command("deploy <target>")
.description("Deploy labcontroller stack to a k3s node")
.option("--user <user>", "SSH user", "michal") .option("--user <user>", "SSH user", "root")
.option("--crdb-replicas <n>", "CockroachDB replicas", "1")
.action(async (target: string, opts: {
user: string;
@@ -193,7 +193,7 @@ export function registerLabcontrollerCommands(appCmd: Command): void {
lcCmd
.command("status [target]")
.description("Check labcontroller deployment status (all hosts if no target)")
.option("--user <user>", "SSH user", "michal") .option("--user <user>", "SSH user", "root")
.action(async (target: string | undefined, opts: { user: string }) => {
const sshKey = findSshKey();
const sshOpts = sshKey ? { keyPath: sshKey } : {};

@@ -69,10 +69,10 @@ export function registerListCommand(parent: Command): void {
const hostname = inst?.hostname ?? queued?.hostname ?? "-";
const role = inst?.role ?? queued?.role ?? "-";
const ip = inst?.ip ?? "-";
const cpu = hw?.cpu_model ?? "-"; const cpu = hw?.cpu_model ?? inst?.cpu_model ?? "-";
const cores = hw?.cpu_cores != null ? String(hw.cpu_cores) : "-"; const cores = (hw?.cpu_cores ?? inst?.cpu_cores) != null ? String(hw?.cpu_cores ?? inst?.cpu_cores) : "-";
const ram = hw?.memory_gb != null ? `${hw.memory_gb}GB` : "-"; const ram = (hw?.memory_gb ?? inst?.memory_gb) != null ? `${hw?.memory_gb ?? inst?.memory_gb}GB` : "-";
const product = hw?.product ?? "-"; const product = hw?.product ?? inst?.product ?? "-";
const color = statusColor(status);

@@ -0,0 +1,94 @@
// CLI command: provision recheck
// SSH into all installed machines, collect hardware info, update bastion state.
import type { Command } from "commander";
import { sshExec } from "@lab/modules";
import { getLabdClient } from "../api/config.js";
const BOLD = "\x1b[1m";
const GREEN = "\x1b[0;32m";
const RED = "\x1b[0;31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const SSH_OPTS = { timeoutMs: 30_000 };
// Shell script that collects hardware info as JSON.
// Kept simple — no Python, pure shell + awk.
const HW_COLLECT_SCRIPT = [
'P=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo unknown)',
'B=$(cat /sys/class/dmi/id/board_name 2>/dev/null || echo unknown)',
'S=$(cat /sys/class/dmi/id/product_serial 2>/dev/null || echo unknown)',
'M=$(cat /sys/class/dmi/id/sys_vendor 2>/dev/null || echo unknown)',
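// Without pipefail a pipeline's status is its last command's (sed), so an "||" fallback never fires; test for empty output instead.
'C=$(grep -m1 "model name" /proc/cpuinfo 2>/dev/null | cut -d: -f2 | sed "s/^ //")',
'[ -n "$C" ] || C=$(grep -m1 Model /proc/cpuinfo 2>/dev/null | cut -d: -f2 | sed "s/^ //")',
'[ -n "$C" ] || C=unknown',
// grep -c prints 0 (and exits 1) on no match, so "|| echo 0" would emit a second 0 and break the JSON.
'N=$(grep -c "^processor" /proc/cpuinfo 2>/dev/null); [ -n "$N" ] || N=0',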
'R=$(awk "/MemTotal/ {printf \\"%d\\", \\$2/1024/1024}" /proc/meminfo 2>/dev/null || echo 0)',
'A=$(uname -m)',
'printf \'{"product":"%s","board":"%s","serial":"%s","manufacturer":"%s","cpu_model":"%s","cpu_cores":%s,"memory_gb":%s,"arch":"%s"}\\n\' "$P" "$B" "$S" "$M" "$C" "$N" "$R" "$A"',
].join("; ");
export function registerRecheckCommand(parent: Command): void {
parent
.command("recheck")
.description("Refresh hardware info for all installed machines via SSH")
.option("--user <user>", "SSH user", "root")
.option("--target <hostname>", "Only recheck a specific machine (by hostname or MAC)")
.action(async (opts: { user: string; target?: string }) => {
const client = getLabdClient();
let state;
try {
state = await client.getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
// Build list of machines to check
const targets: Array<{ mac: string; hostname: string; ip: string }> = [];
for (const [mac, info] of Object.entries(state.installed)) {
if (!info.ip) continue;
if (opts.target && info.hostname !== opts.target && mac !== opts.target) continue;
targets.push({ mac, hostname: info.hostname, ip: info.ip });
}
if (targets.length === 0) {
console.log("No installed machines with IPs to check.");
return;
}
console.log(`\n${BOLD}Rechecking ${targets.length} machine(s)...${RESET}\n`);
let updated = 0;
let failed = 0;
for (const { mac, hostname, ip } of targets) {
process.stdout.write(` ${hostname.padEnd(24)} ${DIM}(${ip})${RESET} `);
try {
const t0 = Date.now();
const result = await sshExec(ip, opts.user, HW_COLLECT_SCRIPT, SSH_OPTS);
const elapsed = Date.now() - t0;
if (result.exitCode !== 0) {
console.log(`${RED}SSH failed (exit ${result.exitCode}, ${elapsed}ms)${RESET}`);
if (result.stderr) console.log(` ${DIM}${result.stderr.substring(0, 200)}${RESET}`);
failed++;
continue;
}
const hwData = JSON.parse(result.stdout.trim());
await client.discoverMachine({ mac, ...hwData });
const cpu = hwData.cpu_model || "?";
const cores = hwData.cpu_cores || "?";
const mem = hwData.memory_gb || "?";
console.log(`${GREEN}OK${RESET} ${DIM}${cpu}, ${cores} cores, ${mem}GB${RESET}`);
updated++;
} catch (err) {
console.log(`${RED}FAIL${RESET} ${DIM}${err instanceof Error ? err.message : String(err)}${RESET}`);
failed++;
}
}
console.log(`\n${BOLD}Done:${RESET} ${updated} updated, ${failed} failed\n`);
});
}
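`recheck` hands the raw `JSON.parse` result straight to `discoverMachine`. Because the probe interpolates strings with `printf "%s"`, a product or CPU name containing a double quote or backslash produces invalid JSON, and an empty numeric field does too. A sketch of a stricter parser that names the offending field; `parseHwInfo` and `HwInfo` are hypothetical, with field names taken from `HW_COLLECT_SCRIPT`:

```typescript
// Hypothetical parser for the JSON line HW_COLLECT_SCRIPT prints (not part of the codebase).
interface HwInfo {
  product: string; board: string; serial: string; manufacturer: string;
  cpu_model: string; cpu_cores: number; memory_gb: number; arch: string;
}

const STRING_FIELDS = ["product", "board", "serial", "manufacturer", "cpu_model", "arch"] as const;
const NUMBER_FIELDS = ["cpu_cores", "memory_gb"] as const;

function parseHwInfo(stdout: string): HwInfo {
  // Use the last non-empty line so stray shell output before the JSON is ignored.
  const line = stdout.trim().split("\n").filter(Boolean).pop() ?? "";
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    throw new Error(`probe output is not valid JSON: ${line.slice(0, 80)}`);
  }
  if (typeof raw !== "object" || raw === null) throw new Error("probe output is not a JSON object");
  const obj = raw as Record<string, unknown>;
  for (const key of STRING_FIELDS) {
    const value = obj[key];
    if (typeof value !== "string") throw new Error(`${key} is not a string`);
  }
  for (const key of NUMBER_FIELDS) {
    const value = obj[key];
    if (typeof value !== "number" || !Number.isFinite(value)) throw new Error(`${key} is not a number`);
  }
  return obj as unknown as HwInfo;
}
```

In `recheck`, `const hwData = parseHwInfo(result.stdout);` could replace the bare `JSON.parse`; the existing `catch` already prints the thrown message next to the host name.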

@@ -17,8 +17,10 @@ import { registerReprovisionCommand } from "./commands/reprovision.js";
import { registerDebugCommand } from "./commands/debug.js";
import { registerForgetCommand } from "./commands/forget.js";
import { registerRegisterCommand } from "./commands/register.js";
import { registerAsahiCommand } from "./commands/asahi.js";
import { registerLogsCommand } from "./commands/logs.js";
import { registerMakeIsoCommand } from "./commands/makeiso.js";
import { registerRecheckCommand } from "./commands/recheck.js";
import { registerConfigCommand } from "./commands/config.js";
import { registerLoginCommand } from "./commands/login.js";
import { registerDoctorCommand } from "./commands/doctor.js";
@@ -100,8 +102,10 @@ export function createProgram(): Command {
registerDebugCommand(provisionCmd);
registerForgetCommand(provisionCmd);
registerRegisterCommand(provisionCmd);
registerAsahiCommand(provisionCmd);
registerLogsCommand(provisionCmd);
registerMakeIsoCommand(provisionCmd);
registerRecheckCommand(provisionCmd);
// config list/get/set/path
registerConfigCommand(program);

@@ -137,7 +137,7 @@ describe("bastion smoke tests", () => {
// Wait for the server to start (look for the banner)
const startedAt = Date.now();
const maxWait = 10_000; const maxWait = 15_000;
while (Date.now() - startedAt < maxWait) {
if (stdout.includes("Waiting for PXE boot requests")) break;
await sleep(200);

@@ -260,6 +260,37 @@ export function registerBastionRoutes(app: FastifyInstance, db: DbClient): void
}
});
// Update hardware info (discovery data) for a machine
app.post<{
Body: {
mac?: string; product?: string; board?: string; serial?: string;
manufacturer?: string; cpu_model?: string; cpu_cores?: number;
memory_gb?: number; arch?: string;
disks?: Array<{ name: string; size_gb: number; model: string }>;
nics?: Array<{ name: string; mac: string; state: string }>;
};
}>("/api/machines/discover", async (request, reply) => {
const data = request.body ?? {};
const mac = (data.mac ?? "").toLowerCase().replace(/-/g, ":");
if (!mac) {
return reply.code(400).send({ error: "mac is required" });
}
const bastion = bastionRegistry.findBastionByMac(mac);
const target = bastion ?? (bastionRegistry.getAll().length === 1 ? bastionRegistry.getAll()[0] : null);
if (!target) {
return reply.code(503).send({ error: "No bastion found for this MAC" });
}
try {
const result = await sendCommand(target.bastionId, { type: "command-discover", ...data, mac });
return reply.code(result.status === "ok" ? 200 : 500).send(result);
} catch (err) {
return reply.code(500).send({ error: err instanceof Error ? err.message : String(err) });
}
});
// Update role
app.post<{
Body: { mac?: string; role?: string };

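The discover route normalizes the incoming MAC and, when no bastion claims it, falls back to the only registered bastion. That selection logic can be sketched standalone as follows; the `Bastion` shape and its `macs` field are hypothetical stand-ins, since the real `bastionRegistry` API is not part of this diff.

```typescript
// Hypothetical, simplified bastion record; the real registry type is not shown.
interface Bastion {
  bastionId: string;
  macs: string[];
}

// Same normalization as the route: lowercase, "-" separators become ":".
function normalizeMac(raw: string | undefined): string {
  return (raw ?? "").toLowerCase().replace(/-/g, ":");
}

// Prefer the bastion that owns this MAC; otherwise use the single registered
// bastion if there is exactly one. Null means the route should answer 503.
function pickTarget(bastions: Bastion[], mac: string): Bastion | null {
  const owner = bastions.find((b) => b.macs.includes(mac));
  if (owner) return owner;
  return bastions.length === 1 ? (bastions[0] ?? null) : null;
}

const fleet: Bastion[] = [
  { bastionId: "b1", macs: ["aa:bb:cc:dd:ee:01"] },
  { bastionId: "b2", macs: ["aa:bb:cc:dd:ee:02"] },
];

console.log(pickTarget(fleet, normalizeMac("AA-BB-CC-DD-EE-02"))?.bastionId); // b2
console.log(pickTarget(fleet, "aa:bb:cc:dd:ee:99")); // null: two bastions, no owner
```

The single-bastion fallback keeps small labs working without MAC registration. With two or more bastions, an unknown MAC is rejected rather than guessed.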
@@ -1,18 +1,22 @@
-// Hardening: Pod Security Standards, certificate check, log rotation.
+// Hardening: Pod Security Standards, certificate check, journald cap, storage.
 import type { OperationContext, OperationResult, OperationGroup } from "../types.js";
 import { runSequential } from "../utils.js";
 import { applyPodSecurityStandards } from "../operations/pod-security.js";
 import { checkCertExpiry } from "../operations/cert-check.js";
 import { configureLogRotation } from "../operations/log-rotation.js";
+import { configureJournaldLimits } from "../operations/journald-limits.js";
+import { configureLonghornDisk } from "../operations/longhorn-disk.js";
 export const hardeningGroup: OperationGroup = {
 name: "hardening",
-description: "Pod security, certificate check, log rotation",
+description: "Pod security, certificate check, journald cap, storage",
 operations: [
 { name: "Apply Pod Security Standards", fn: applyPodSecurityStandards },
 { name: "Check certificate expiry", fn: checkCertExpiry },
-{ name: "Configure log rotation", fn: configureLogRotation },
+{ name: "Decommission file-based audit logs", fn: configureLogRotation },
+{ name: "Configure journald disk cap", fn: configureJournaldLimits },
+{ name: "Configure Longhorn disk", fn: configureLonghornDisk },
 ],
 };

@@ -7,16 +7,18 @@ import { applyCisHardening } from "../operations/sysctl.js";
 import { disableSwap } from "../operations/swap.js";
 import { disableFirewall } from "../operations/firewall.js";
 import { setSelinuxPermissive } from "../operations/selinux.js";
+import { enableIscsi } from "../operations/iscsi.js";
 export const hostPrepGroup: OperationGroup = {
 name: "host-prep",
-description: "Prepare host for k3s: kernel modules, sysctl, swap, firewall, SELinux",
+description: "Prepare host for k3s: kernel modules, sysctl, swap, firewall, SELinux, iSCSI",
 operations: [
 { name: "Load kernel modules", fn: loadKernelModules },
 { name: "Apply CIS sysctl", fn: applyCisHardening },
 { name: "Disable swap", fn: disableSwap },
 { name: "Disable firewall", fn: disableFirewall },
 { name: "Set SELinux permissive", fn: setSelinuxPermissive },
+{ name: "Enable iSCSI", fn: enableIscsi },
 ],
 };

@@ -76,7 +76,6 @@ sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config 2>/dev/nul
 # ── 5b. Create k3s config directory ──
 echo "[5/10] Writing k3s server configuration..."
 mkdir -p /etc/rancher/k3s
-mkdir -p /var/log/kubernetes
 cat > /etc/rancher/k3s/config.yaml << 'K3S_CONFIG'
 # k3s server configuration — CIS hardened
@@ -91,13 +90,10 @@ disable:
 - servicelb
 - traefik
-# API server hardening
+# API server hardening (audit-log-path=- routes audit to journald via stdout)
 kube-apiserver-arg:
 - "anonymous-auth=false"
-- "audit-log-path=/var/log/kubernetes/audit.log"
+- "audit-log-path=-"
-- "audit-log-maxage=30"
-- "audit-log-maxbackup=10"
-- "audit-log-maxsize=100"
 - "audit-policy-file=/etc/rancher/k3s/audit-policy.yaml"
 - "enable-admission-plugins=NodeRestriction,PodSecurity"
 - "request-timeout=300s"

@@ -78,9 +78,10 @@ export class K3sModule implements Module {
 return toModuleResult("install", [...prepResults, ...k3sResults], start);
 }
-// Phase 3: Networking (server only — agents don't install Cilium)
+// Phase 3: Networking (initial server only — joining servers get Cilium via daemonset)
 let netResults: OperationResult[] = [];
-if (isServer) {
+const isJoiningServer = isServer && !!opCtx.config.k3sServerUrl;
+if (isServer && !isJoiningServer) {
 netResults = await runNetworking(opCtx);
 }

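After this change, the networking phase runs only on the server that initialises the cluster. Below is a minimal sketch of that decision. It assumes a reduced `NodeConfig` that has only `role` and `k3sServerUrl`; the real config type and `isServerRole` may accept more roles.

```typescript
// Reduced config shape (assumption): only the fields the decision reads.
interface NodeConfig {
  role: "server" | "agent";
  k3sServerUrl?: string;
}

// Only the server that bootstraps the cluster (no k3sServerUrl) installs
// Cilium; joining servers and agents receive it through the DaemonSet.
function shouldRunNetworking(cfg: NodeConfig): boolean {
  const isServer = cfg.role === "server";
  const isJoiningServer = isServer && !!cfg.k3sServerUrl;
  return isServer && !isJoiningServer;
}

console.log(shouldRunNetworking({ role: "server" })); // true
console.log(shouldRunNetworking({ role: "server", k3sServerUrl: "https://10.0.0.1:6443" })); // false
console.log(shouldRunNetworking({ role: "agent" })); // false
```

As in the diff, an empty-string `k3sServerUrl` counts as "not joining", because `!!""` is `false`.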
@@ -0,0 +1,194 @@
// Recover a broken etcd member by removing it from the cluster, wiping its
// local state, and restarting k3s so it rejoins as a fresh member.
//
// Use case: a node panics on startup with
// "tocommit(N+1) is out of range [lastIndex(N)]. Was the raft log corrupted,
// truncated, or lost?"
// This means the local raft WAL is missing the last entry the leader thinks
// the follower acknowledged (lost write, unclean shutdown, etc). The fix is
// always the same and well-documented; this codifies it so we don't fumble
// the procedure under pressure.
//
// Preconditions:
// - Enough healthy peers remain to keep quorum once the broken member is
//   removed (3-member cluster: 2 healthy; 5-member cluster: 3 healthy).
//   The function only enforces a coarse guard: it refuses when the cluster
//   has fewer than 3 members. It does not check the health of the peers.
// - SSH access to both the broken node and a healthy peer.
// - etcdctl available on the healthy peer (k3s does not bundle it; the
//   procedure installs it on demand with dnf, so only on Fedora-like hosts).
import type { SshClient } from "../types.js";
const ETCD_TLS = {
ca: "/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt",
cert: "/var/lib/rancher/k3s/server/tls/etcd/server-client.crt",
key: "/var/lib/rancher/k3s/server/tls/etcd/server-client.key",
} as const;
const SSH_TIMEOUT = 60_000;
export interface RecoverEtcdMemberOptions {
/** SSH client for the broken node (the one panicking). */
broken: SshClient;
/** SSH client for any healthy server peer in the same cluster. */
peer: SshClient;
/** Hostname (k8s node name) of the broken node. Used to find its etcd member id. */
brokenHostname: string;
/** Logger for progress output. */
log?: (msg: string) => void;
}
export interface RecoverEtcdMemberResult {
success: boolean;
changed: boolean;
message: string;
/** New etcd member id assigned after rejoin (when known). */
newMemberId?: string;
/** Old etcd member id that was removed. */
removedMemberId?: string;
error?: string;
}
function etcdctl(subcmd: string): string {
return [
"ETCDCTL_API=3 etcdctl",
`--cacert=${ETCD_TLS.ca}`,
`--cert=${ETCD_TLS.cert}`,
`--key=${ETCD_TLS.key}`,
"--endpoints=https://127.0.0.1:2379",
"--command-timeout=10s",
subcmd,
].join(" ");
}
async function ensureEtcdctl(peer: SshClient): Promise<void> {
const probe = await peer.exec("command -v etcdctl 2>/dev/null", { timeoutMs: 5_000 });
if (probe.exitCode === 0 && probe.stdout.trim()) return;
// Best-effort install on Fedora. If the host isn't dnf-based, surface the
// error to the caller via the next etcdctl invocation.
await peer.exec("dnf install -y etcd 2>&1", { timeoutMs: 120_000 });
}
async function getMemberList(peer: SshClient): Promise<Array<{ id: string; name: string }>> {
const result = await peer.exec(etcdctl("member list"), { timeoutMs: SSH_TIMEOUT });
if (result.exitCode !== 0) {
throw new Error(`etcdctl member list failed: ${result.stderr || result.stdout}`);
}
// Format: <hex-id>, started, <name>, <peer-urls>, <client-urls>, <isLearner>
return result.stdout
.split("\n")
.map((line) => line.trim())
.filter(Boolean)
.map((line) => {
const [id, , name] = line.split(",").map((p) => p.trim());
return { id: id ?? "", name: name ?? "" };
})
.filter((m) => m.id);
}
export async function recoverEtcdMember(
opts: RecoverEtcdMemberOptions,
): Promise<RecoverEtcdMemberResult> {
const log = opts.log ?? (() => {});
try {
log(`Looking up etcd member id for ${opts.brokenHostname} via peer...`);
await ensureEtcdctl(opts.peer);
const members = await getMemberList(opts.peer);
if (members.length < 3) {
return {
success: false,
changed: false,
message: "Refusing to remove a member from a cluster with <3 members (quorum would be lost)",
error: `member count = ${members.length}`,
};
}
// Member names are <hostname>-<random-suffix>; match on "<hostname>-" so
// that, e.g., "worker1" does not also match "worker10-<suffix>".
const broken = members.find((m) => m.name.startsWith(`${opts.brokenHostname}-`));
if (!broken) {
return {
success: false,
changed: false,
message: `No etcd member found matching hostname ${opts.brokenHostname}`,
error: `members: ${members.map((m) => m.name).join(", ")}`,
};
}
log(`Broken member: ${broken.id} (${broken.name})`);
log("Step 1/4: stopping k3s on broken node");
await opts.broken.exec("systemctl stop k3s 2>&1", { timeoutMs: SSH_TIMEOUT });
log("Step 2/4: removing broken etcd member from cluster");
const remove = await opts.peer.exec(
etcdctl(`member remove ${broken.id}`),
{ timeoutMs: SSH_TIMEOUT },
);
if (remove.exitCode !== 0) {
return {
success: false,
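changed: true, // k3s was stopped on the broken node in step 1 and is not restarted here
message: "etcdctl member remove failed; k3s on the broken node is left stopped",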
error: remove.stderr || remove.stdout,
removedMemberId: broken.id,
};
}
log("Step 3/4: archiving corrupt etcd state and stale TLS/cred dirs on broken node");
const ts = Math.floor(Date.now() / 1000);
await opts.broken.exec(
[
`mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db.corrupt-${ts} 2>/dev/null || true`,
"rm -rf /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/cred",
].join(" && "),
{ timeoutMs: SSH_TIMEOUT },
);
log("Step 4/4: starting k3s on broken node — it will rejoin");
await opts.broken.exec("systemctl start k3s 2>&1", { timeoutMs: SSH_TIMEOUT });
// Poll for rejoin. The new member-id is what the cluster assigns on join.
let newMemberId: string | undefined;
for (let i = 0; i < 60; i++) {
await new Promise((r) => setTimeout(r, 5_000));
try {
const after = await getMemberList(opts.peer);
const rejoined = after.find(
(m) => m.name.startsWith(`${opts.brokenHostname}-`) && m.id !== broken.id,
);
if (rejoined) {
newMemberId = rejoined.id;
break;
}
} catch {
// peer may briefly be unreachable mid-rejoin — keep polling
}
}
if (!newMemberId) {
return {
success: false,
changed: true,
message: "k3s started but new member did not appear in cluster within 5 minutes",
removedMemberId: broken.id,
};
}
log(`Rejoined as ${newMemberId}`);
return {
success: true,
changed: true,
message: `Recovered: removed ${broken.id}, rejoined as ${newMemberId}`,
removedMemberId: broken.id,
newMemberId,
};
} catch (err) {
return {
success: false,
changed: false,
message: "Recovery failed",
error: err instanceof Error ? err.message : String(err),
};
}
}

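The recovery refuses to run below 3 members, and its preconditions are stated in quorum terms. The sketch below makes that arithmetic explicit. It also includes member-list parsing and a hostname match anchored on the `-` separator. `quorum`, `safeToRemove`, `parseMemberList` and `findByHostname` are hypothetical helpers, not exports of this module. The column layout follows the comment in `getMemberList`, and the `<hostname>-<suffix>` naming follows the comment in `recoverEtcdMember`.

```typescript
interface EtcdMember {
  id: string;
  name: string;
}

// Majority of voting members etcd needs to commit writes.
function quorum(members: number): number {
  return Math.floor(members / 2) + 1;
}

// Removing one member leaves total - 1 voters, which need quorum(total - 1)
// healthy members. `healthyOthers` excludes the broken node itself.
function safeToRemove(total: number, healthyOthers: number): boolean {
  if (total < 3) return false; // same coarse guard as recoverEtcdMember
  return healthyOthers >= quorum(total - 1);
}

// `etcdctl member list` simple output, one member per line:
// <hex-id>, <status>, <name>, <peer-urls>, <client-urls>, <isLearner>
function parseMemberList(stdout: string): EtcdMember[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean)
    .map((line) => {
      const [id = "", , name = ""] = line.split(",").map((p) => p.trim());
      return { id, name };
    })
    .filter((m) => m.id !== "");
}

// Names look like "<hostname>-<suffix>", so anchor on the "-" separator.
function findByHostname(members: EtcdMember[], hostname: string): EtcdMember | undefined {
  return members.find((m) => m.name.startsWith(`${hostname}-`));
}

const listing =
  "111, started, worker1-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
  "222, started, worker10-bbb, https://10.0.0.10:2380, https://10.0.0.10:2379, false\n";

console.log(findByHostname(parseMemberList(listing), "worker1")?.id); // 111
console.log(safeToRemove(3, 2), safeToRemove(3, 1), safeToRemove(5, 3)); // true false true
```

A health-aware check like `safeToRemove` would need a per-member health probe (for example `etcdctl endpoint health` against each peer). The function in the diff does not do this.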
@@ -1,6 +1,7 @@
 export { loadKernelModules } from "./kernel-modules.js";
 export { applyCisHardening } from "./sysctl.js";
 export { disableSwap } from "./swap.js";
+export { enableIscsi } from "./iscsi.js";
 export { disableFirewall } from "./firewall.js";
 export { setSelinuxPermissive } from "./selinux.js";
 export { writeK3sConfig } from "./k3s-config.js";
@@ -10,6 +11,13 @@ export { installK3sBinary } from "./k3s-install.js";
 export { installCilium } from "./cilium.js";
 export { fixCoreDnsUpstream } from "./dns-fix.js";
 export { configureLogRotation } from "./log-rotation.js";
+export { configureJournaldLimits } from "./journald-limits.js";
 export { applyDefaultNetworkPolicies } from "./network-policy.js";
 export { applyPodSecurityStandards } from "./pod-security.js";
 export { checkCertExpiry } from "./cert-check.js";
+export { configureLonghornDisk } from "./longhorn-disk.js";
+export { recoverEtcdMember } from "./etcd-recover.js";
+export type {
+RecoverEtcdMemberOptions,
+RecoverEtcdMemberResult,
+} from "./etcd-recover.js";

@@ -0,0 +1,31 @@
// Install and enable iSCSI initiator (required by Longhorn storage).
// Fedora: iscsi-initiator-utils, Ubuntu: open-iscsi
import type { Operation, OperationResult } from "../types.js";
import { sshOpts } from "../utils.js";
export const enableIscsi: Operation = async (ctx): Promise<OperationResult> => {
// Check if iscsid is already running
const check = await ctx.ssh.exec("systemctl is-active iscsid 2>/dev/null", sshOpts(ctx));
if (check.stdout.trim() === "active") {
return { success: true, changed: false, message: "iSCSI already active" };
}
// Install the package (detect distro)
const osRelease = await ctx.ssh.exec("cat /etc/os-release", sshOpts(ctx));
const osLower = osRelease.stdout.toLowerCase();
const isFedora = osLower.includes("fedora") || osLower.includes("rhel") || osLower.includes("centos");
const pkg = isFedora ? "iscsi-initiator-utils" : "open-iscsi";
const installCmd = isFedora ? `sudo dnf install -y ${pkg}` : `sudo apt-get install -y ${pkg}`;
const install = await ctx.ssh.exec(installCmd, { timeoutMs: 120_000 });
if (install.exitCode !== 0) {
return { success: false, changed: false, message: `Failed to install ${pkg}`, error: install.stderr.trim() };
}
// Enable and start
await ctx.ssh.exec("sudo systemctl enable --now iscsid", sshOpts(ctx));
return { success: true, changed: true, message: `Installed ${pkg} and enabled iscsid` };
};

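`enableIscsi` picks the package by substring-matching the whole of `/etc/os-release`. A stricter variant reads the `ID` and `ID_LIKE` fields that os-release(5) defines and decides from those. It is sketched here as an assumption, not the module's behavior. `parseOsRelease` and `iscsiPackage` are hypothetical names.

```typescript
// Parse KEY=value lines from /etc/os-release; values may be quoted.
function parseOsRelease(text: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    let value = line.slice(eq + 1);
    const q = value[0];
    if (value.length >= 2 && (q === '"' || q === "'") && value.endsWith(q)) {
      value = value.slice(1, -1);
    }
    fields[line.slice(0, eq)] = value;
  }
  return fields;
}

// Decide the initiator package from ID and ID_LIKE only.
function iscsiPackage(osRelease: string): { manager: "dnf" | "apt"; pkg: string } {
  const info = parseOsRelease(osRelease);
  const ids = [info["ID"] ?? "", ...(info["ID_LIKE"] ?? "").split(/\s+/)].map((s) => s.toLowerCase());
  const rhelFamily = ids.some((id) => id === "fedora" || id === "rhel" || id === "centos");
  return rhelFamily
    ? { manager: "dnf", pkg: "iscsi-initiator-utils" }
    : { manager: "apt", pkg: "open-iscsi" };
}

console.log(iscsiPackage('ID="rocky"\nID_LIKE="rhel centos fedora"\n').pkg); // iscsi-initiator-utils
console.log(iscsiPackage("ID=ubuntu\nID_LIKE=debian\n").pkg); // open-iscsi
```

As in the diff, any host outside the Fedora/RHEL family falls back to `apt` and `open-iscsi`.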
@@ -0,0 +1,33 @@
// Cap journald disk usage so audit logs (which now flow through journald via
// kube-apiserver's stdout) cannot fill /var/log. Default journald uses up to
// 10% of the filesystem, capped at 4 GB. In a /var/log of ~10 GB shared with
// other services, that's still room for audit volume to evict useful logs.
// 2 GB / 200 MB-per-file is a comfortable middle.
import type { Operation, OperationResult } from "../types.js";
import { sshOpts, writeRemoteFile } from "../utils.js";
const DROPIN_CONTENT = `[Journal]
SystemMaxUse=2G
SystemKeepFree=1G
SystemMaxFileSize=200M
`;
const DROPIN_PATH = "/etc/systemd/journald.conf.d/10-k3s-audit-cap.conf";
export const configureJournaldLimits: Operation = async (ctx): Promise<OperationResult> => {
const changed = await writeRemoteFile(ctx, DROPIN_PATH, DROPIN_CONTENT);
if (changed) {
// Rotate the active journal files (SIGUSR2) and restart journald so the new
// limits apply without a reboot.
await ctx.ssh.exec(
"systemctl kill --signal=SIGUSR2 systemd-journald 2>/dev/null; " +
"systemctl restart systemd-journald 2>&1 || true",
sshOpts(ctx),
);
}
return {
success: true,
changed,
message: changed ? "journald limits configured (2 GB cap)" : "journald limits already configured",
};
};

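The header comment relies on journald's defaults. Per journald.conf(5), `SystemMaxUse=` defaults to 10% of the filesystem that holds the journal, capped at 4G. The sketch below shows that arithmetic. The helpers are hypothetical, and the sketch ignores `SystemKeepFree=`, which journald also enforces.

```typescript
const GiB = 1024 ** 3;

// journald.conf(5): SystemMaxUse= defaults to 10% of the journal's
// filesystem, capped at 4G.
function defaultSystemMaxUse(fsBytes: number): number {
  return Math.min(Math.floor(fsBytes / 10), 4 * GiB);
}

// With an explicit SystemMaxUse= drop-in, the configured value replaces the
// default. (SystemKeepFree= is a separate limit and is not modelled here.)
function systemMaxUse(fsBytes: number, configuredBytes?: number): number {
  return configuredBytes ?? defaultSystemMaxUse(fsBytes);
}

for (const fsGiB of [10, 30, 100]) {
  const fs = fsGiB * GiB;
  console.log(`${fsGiB} GiB fs: default ${defaultSystemMaxUse(fs) / GiB} GiB, with drop-in ${systemMaxUse(fs, 2 * GiB) / GiB} GiB`);
}
// 10 GiB fs: default 1 GiB, with drop-in 2 GiB
// 30 GiB fs: default 3 GiB, with drop-in 2 GiB
// 100 GiB fs: default 4 GiB, with drop-in 2 GiB
```

The explicit 2G cap is tighter than the default only on filesystems larger than 20 GiB. On smaller ones it raises the ceiling, while `SystemKeepFree=1G` still applies.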
@@ -9,7 +9,18 @@ function isServerRole(role: string): boolean {
 function generateServerConfig(config: K3sConfig): string {
 const tlsSans = [config.hostname, config.ip, ...(config.tlsSans ?? [])];
-return `# k3s server configuration — CIS hardened
+const isJoining = !!config.k3sServerUrl;
+const clusterLines = isJoining
+? `server: "${config.k3sServerUrl}"\ntoken: "${config.k3sToken}"`
+: "cluster-init: true";
+// audit-log-path=- routes audit events to k3s.service's stdout, which systemd
+// forwards to journald. journald enforces its own size caps (see
+// configureJournaldLimits) so audit volume cannot fill the disk. File-based
+// audit logs led to /var/log/kubernetes growing to 7+ GB because apiserver's
+// own rotation produced files that any logrotate glob would double-rotate
+// and never expire.
+return `# k3s server configuration — CIS hardened, etcd HA
+${clusterLines}
 protect-kernel-defaults: true
 secrets-encryption: true
 write-kubeconfig-mode: "0640"
@@ -20,12 +31,12 @@ disable:
 - servicelb
 - traefik
+node-label:
+- "node.longhorn.io/create-default-disk=config"
 kube-apiserver-arg:
 - "anonymous-auth=false"
-- "audit-log-path=/var/log/kubernetes/audit.log"
+- "audit-log-path=-"
-- "audit-log-maxage=30"
-- "audit-log-maxbackup=10"
-- "audit-log-maxsize=100"
 - "audit-policy-file=/etc/rancher/k3s/audit-policy.yaml"
 - "enable-admission-plugins=NodeRestriction,PodSecurity"
 - "request-timeout=300s"
@@ -44,6 +55,7 @@ function generateAgentConfig(): string {
 return `protect-kernel-defaults: true
 node-label:
 - "node-role.kubernetes.io/worker=true"
+- "node.longhorn.io/create-default-disk=config"
 kubelet-arg:
 - "protect-kernel-defaults=true"
 - "streaming-connection-idle-timeout=5m"
@@ -52,7 +64,7 @@ kubelet-arg:
 }
 export const writeK3sConfig: Operation = async (ctx): Promise<OperationResult> => {
-await ctx.ssh.exec("mkdir -p /etc/rancher/k3s /var/log/kubernetes", sshOpts(ctx));
+await ctx.ssh.exec("mkdir -p /etc/rancher/k3s", sshOpts(ctx));
 const content = isServerRole(ctx.config.role)
 ? generateServerConfig(ctx.config)

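`generateServerConfig` now switches between `cluster-init: true` and a `server`/`token` pair. Below is a sketch of that switch with two assumed changes. First, values pass through `JSON.stringify`, so quotes or backslashes in a token cannot break the YAML (YAML 1.2 double-quoted scalars accept JSON string escapes). Second, a missing token throws instead of rendering the literal string `undefined`. `clusterLines` is a hypothetical standalone helper.

```typescript
interface ClusterConfig {
  k3sServerUrl?: string;
  k3sToken?: string;
}

// First server: bootstrap embedded etcd. Joining server: point at an existing
// server. JSON.stringify yields a valid YAML double-quoted scalar, so quotes or
// backslashes inside the token cannot break the generated config.
function clusterLines(cfg: ClusterConfig): string {
  if (!cfg.k3sServerUrl) return "cluster-init: true";
  if (!cfg.k3sToken) {
    throw new Error("k3sToken is required when k3sServerUrl is set");
  }
  return `server: ${JSON.stringify(cfg.k3sServerUrl)}\ntoken: ${JSON.stringify(cfg.k3sToken)}`;
}

console.log(clusterLines({}));
// cluster-init: true
console.log(clusterLines({ k3sServerUrl: "https://10.0.0.1:6443", k3sToken: 'K10"x' }));
// server: "https://10.0.0.1:6443"
// token: "K10\"x"
```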
@@ -15,8 +15,21 @@ export const installK3sBinary: Operation = async (ctx): Promise<OperationResult>
 const alreadyInstalled = version.exitCode === 0;
 if (isServer) {
+// Clean stale server state when joining an existing cluster
+// (TLS certs from a previous run cause "newer than datastore" fatal error)
+if (ctx.config.k3sServerUrl && ctx.config.k3sToken) {
+await ctx.ssh.exec(
+"rm -rf /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/cred /var/lib/rancher/k3s/server/db",
+sshOpts(ctx),
+);
+}
+// If joining an existing cluster, pass K3S_URL and K3S_TOKEN
+const joinEnv = ctx.config.k3sServerUrl && ctx.config.k3sToken
+? `K3S_URL="${ctx.config.k3sServerUrl}" K3S_TOKEN="${ctx.config.k3sToken}"`
+: "";
 const result = await ctx.ssh.exec(
-'curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" INSTALL_K3S_SKIP_SELINUX_RPM=true sh -',
+`curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" INSTALL_K3S_SKIP_SELINUX_RPM=true ${joinEnv} sh -`,
 { timeoutMs: 300_000 },
 );
 if (result.exitCode !== 0) {

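The install command puts `K3S_URL` and `K3S_TOKEN` inside double quotes, so a `$`, backtick or `"` in either value would be interpreted by the remote shell. The sketch below builds the same command with POSIX single-quote escaping instead. `shQuote` and `k3sServerInstallCommand` are hypothetical helpers; the pipeline string itself is copied from the diff.

```typescript
// POSIX single-quote escaping: wrap in '...' and turn each ' into '\''.
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

interface JoinConfig {
  k3sServerUrl?: string;
  k3sToken?: string;
}

// Same pipeline as installK3sBinary; join variables are added only when both
// URL and token are set, and are quoted so the remote shell expands nothing.
function k3sServerInstallCommand(cfg: JoinConfig): string {
  const env = ['INSTALL_K3S_EXEC="server"', "INSTALL_K3S_SKIP_SELINUX_RPM=true"];
  if (cfg.k3sServerUrl && cfg.k3sToken) {
    env.push(`K3S_URL=${shQuote(cfg.k3sServerUrl)}`, `K3S_TOKEN=${shQuote(cfg.k3sToken)}`);
  }
  return `curl -sfL https://get.k3s.io | ${env.join(" ")} sh -`;
}

console.log(k3sServerInstallCommand({ k3sServerUrl: "https://10.0.0.1:6443", k3sToken: "K10$abc" }));
// curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server" INSTALL_K3S_SKIP_SELINUX_RPM=true K3S_URL='https://10.0.0.1:6443' K3S_TOKEN='K10$abc' sh -
```

The token still appears on the remote command line. Keeping it out of process listings would need a different transport, such as a root-only file referenced by k3s's `token-file` option.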
@@ -1,25 +1,44 @@
-// Configure log rotation for k3s.
+// Decommission file-based k8s audit logging in favor of journald.
+//
+// Earlier versions wrote audit events to /var/log/kubernetes/audit.log and
+// rotated them with a logrotate rule. The two rotations compounded:
+// kube-apiserver rotated internally (audit-{ts}.log), the logrotate *.log glob
+// rotated those files again (-{date}), and the resulting filenames matched no
+// retention policy, so the directory grew without bound (we observed 7+ GB).
+//
+// k3s now sets audit-log-path=- so audit goes to stdout → journald, which
+// enforces SystemMaxUse caps. This operation removes the obsolete logrotate
+// rule and reaps any audit files left behind by the old setup. Idempotent: on
+// fresh installs everything is already absent and the operation is a no-op.
 import type { Operation, OperationResult } from "../types.js";
-import { writeRemoteFile } from "../utils.js";
+import { sshOpts } from "../utils.js";
-const LOGROTATE_CONFIG = `/var/log/kubernetes/*.log {
-daily
-rotate 14
-compress
-delaycompress
-missingok
-notifempty
-copytruncate
-maxsize 100M
-}`;
+const REMOVE_LOGROTATE = "rm -f /etc/logrotate.d/k3s";
+// Bounded by a max-depth and explicit name pattern so we never reach outside
+// the deprecated audit-log directory.
+const REAP_OLD_AUDIT_FILES =
+"find /var/log/kubernetes -maxdepth 1 -type f " +
+"\\( -name 'audit*.log*' -o -name 'audit-*.log' \\) " +
+"-delete 2>/dev/null; " +
+"rmdir /var/log/kubernetes 2>/dev/null; true";
 export const configureLogRotation: Operation = async (ctx): Promise<OperationResult> => {
-const changed = await writeRemoteFile(ctx, "/etc/logrotate.d/k3s", LOGROTATE_CONFIG);
+const before = await ctx.ssh.exec(
+"test -e /etc/logrotate.d/k3s -o -d /var/log/kubernetes && echo present || echo absent",
+sshOpts(ctx),
+);
+const wasPresent = before.stdout.trim() === "present";
+await ctx.ssh.exec(REMOVE_LOGROTATE, sshOpts(ctx));
+await ctx.ssh.exec(REAP_OLD_AUDIT_FILES, sshOpts(ctx));
 return {
 success: true,
-changed,
-message: changed ? "Log rotation configured" : "Log rotation already configured",
+changed: wasPresent,
+message: wasPresent
+? "Removed legacy file-based audit logging (now via journald)"
+: "No legacy audit log artifacts present",
 };
 };

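You can check the failure described in the log-rotation header comment by matching filenames against both patterns. The sketch converts the old logrotate glob `*.log` and the new `find -name 'audit*.log*'` pattern into regular expressions. The example filenames are assumptions, modeled on the `audit-{ts}.log` and `-{date}` suffixes the comment describes. `globToRegExp` is a hypothetical helper that handles only `*` and `?`.

```typescript
// Convert a shell glob with only * and ? wildcards into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*").replace(/\?/g, ".")}$`);
}

const oldLogrotate = globToRegExp("*.log"); // /etc/logrotate.d/k3s
const reap = globToRegExp("audit*.log*"); // find -name in REAP_OLD_AUDIT_FILES

// Assumed example names: live file, apiserver-rotated copy, logrotate's dated copy.
const files = [
  "audit.log",
  "audit-2026-05-01T10-00-00.000.log",
  "audit-2026-05-01T10-00-00.000.log-20260502",
];

for (const f of files) {
  console.log(`${f}: logrotate=${oldLogrotate.test(f)} reap=${reap.test(f)}`);
}
// audit.log: logrotate=true reap=true
// audit-2026-05-01T10-00-00.000.log: logrotate=true reap=true
// audit-2026-05-01T10-00-00.000.log-20260502: logrotate=false reap=true
```

Only the reap pattern matches the dated copy, which is consistent with the header comment's account of files that no rule ever expired.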
@@ -0,0 +1,50 @@
// Annotate nodes with Longhorn default disk config when /var/lib/longhorn exists.
// The label is set in k3s config (node-label), but the annotation must be applied via kubectl.
import type { Operation, OperationResult } from "../types.js";
import { sshOpts } from "../utils.js";
import { sshExec as remoteSshExec } from "../../../../src/ssh.js";
export const configureLonghornDisk: Operation = async (ctx): Promise<OperationResult> => {
// Check if /var/lib/longhorn exists on this node
const check = await ctx.ssh.exec("test -d /var/lib/longhorn && echo yes || echo no", sshOpts(ctx));
if (check.stdout.trim() !== "yes") {
return { success: true, changed: false, message: "No /var/lib/longhorn directory — skipping Longhorn disk config" };
}
// Find the node name. k3s registers a node under the system hostname unless
// --node-name is set, so use plain `hostname` rather than the FQDN.
const nodeNameResult = await ctx.ssh.exec("hostname", sshOpts(ctx));
const nodeName = nodeNameResult.stdout.trim();
const annotation = JSON.stringify([{ path: "/var/lib/longhorn", allowScheduling: true }]);
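// Try kubectl locally first (works on server nodes). Single-quote the
// annotation: its JSON value contains double quotes, which the shell would
// otherwise treat as quoting and strip before kubectl sees them.
const result = await ctx.ssh.exec(
`k3s kubectl annotate node "${nodeName}" 'node.longhorn.io/default-disks-config=${annotation}' --overwrite 2>&1 || true`,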
sshOpts(ctx),
);
if (result.stdout.includes("annotated") || result.stdout.includes("unchanged")) {
return { success: true, changed: true, message: `Longhorn disk annotation applied to ${nodeName}` };
}
// For worker/agent nodes without local kubectl: apply via the server
if (ctx.config.k3sServerUrl) {
// The CLI has SSH access to the server — use sshExec from there
const serverHost = new URL(ctx.config.k3sServerUrl).hostname;
try {
const remoteResult = await remoteSshExec(
serverHost, "root",
`k3s kubectl annotate node "${nodeName}" 'node.longhorn.io/default-disks-config=${annotation}' --overwrite`,
{ ...(ctx.ssh.keyPath ? { keyPath: ctx.ssh.keyPath } : {}), timeoutMs: 15_000 },
);
if (remoteResult.stdout.includes("annotated") || remoteResult.stdout.includes("unchanged")) {
return { success: true, changed: true, message: `Longhorn disk annotation applied to ${nodeName} (via server)` };
}
} catch {
// Fall through to manual instruction
}
}
return { success: true, changed: false, message: "Longhorn disk label set (annotation requires server kubectl)" };
};

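If the annotation value is wrapped in double quotes, the shell removes the JSON's own double quotes, and kubectl receives `[{path:/var/lib/longhorn,allowScheduling:true}]`. The sketch below single-quotes each argument instead. `shQuote` and `annotateCommand` are hypothetical helpers; the annotation key and JSON shape come from the diff.

```typescript
// POSIX single-quote escaping: wrap in '...' and turn each ' into '\''.
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Build the annotate command with every argument single-quoted so the JSON's
// double quotes reach kubectl unchanged.
function annotateCommand(nodeName: string, diskPath: string): string {
  const disks = JSON.stringify([{ path: diskPath, allowScheduling: true }]);
  return [
    "k3s kubectl annotate node",
    shQuote(nodeName),
    shQuote(`node.longhorn.io/default-disks-config=${disks}`),
    "--overwrite",
  ].join(" ");
}

console.log(annotateCommand("worker1", "/var/lib/longhorn"));
// k3s kubectl annotate node 'worker1' 'node.longhorn.io/default-disks-config=[{"path":"/var/lib/longhorn","allowScheduling":true}]' --overwrite
```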
@@ -71,9 +71,14 @@ describe("k3s install script — server role", () => {
 expect(script).toContain("enable-admission-plugins=NodeRestriction,PodSecurity");
 });
-it("configures audit logging", () => {
+it("configures audit logging via journald (stdout)", () => {
-expect(script).toContain("audit-log-path=/var/log/kubernetes/audit.log");
+expect(script).toContain("audit-log-path=-");
-expect(script).toContain("audit-log-maxage=30");
+// file-based fields and the now-obsolete log directory must be gone
+expect(script).not.toContain("/var/log/kubernetes/audit.log");
+expect(script).not.toContain("audit-log-maxage");
+expect(script).not.toContain("audit-log-maxbackup");
+expect(script).not.toContain("audit-log-maxsize");
+expect(script).not.toContain("mkdir -p /var/log/kubernetes");
 });
 it("cleans stale flannel vxlan before Cilium install", () => {

@@ -348,3 +348,143 @@ describe("applyPodSecurityStandards", () => {
expectCommand(ctx.ssh, "pod-security.kubernetes.io/audit=restricted");
});
});
// --- Audit Logging Decommission (file-based → journald) ---
import { configureLogRotation } from "../src/operations/log-rotation.js";
import { configureJournaldLimits } from "../src/operations/journald-limits.js";
describe("configureLogRotation (decommission file-based audit logs)", () => {
it("removes the legacy logrotate rule and reaps obsolete audit files", async () => {
const ctx = mockCtx();
ctx.ssh.exec.mockResolvedValueOnce(stdout("present")); // probe: legacy artifacts exist
ctx.ssh.exec.mockResolvedValue(OK);
const result = await configureLogRotation(ctx);
expect(result.success).toBe(true);
expect(result.changed).toBe(true);
expectCommand(ctx.ssh, "rm -f /etc/logrotate.d/k3s");
expectCommand(ctx.ssh, /find \/var\/log\/kubernetes.*audit.*-delete/);
expectCommand(ctx.ssh, "rmdir /var/log/kubernetes");
});
it("is a no-op when nothing legacy is present", async () => {
const ctx = mockCtx();
ctx.ssh.exec.mockResolvedValueOnce(stdout("absent"));
ctx.ssh.exec.mockResolvedValue(OK);
const result = await configureLogRotation(ctx);
expect(result.success).toBe(true);
expect(result.changed).toBe(false);
});
});
describe("configureJournaldLimits", () => {
it("writes a 2 GB SystemMaxUse drop-in and reloads journald when changed", async () => {
const ctx = mockCtx();
ctx.ssh.exec.mockResolvedValueOnce(stdout("__LABCTL_NOT_FOUND__")); // no existing drop-in
ctx.ssh.exec.mockResolvedValue(OK);
const result = await configureJournaldLimits(ctx);
expect(result.success).toBe(true);
expect(result.changed).toBe(true);
const writeCall = ctx.ssh.exec.mock.calls.find((c) => {
const cmd = c[0] as string;
return cmd.includes("10-k3s-audit-cap.conf") && cmd.includes("LABCTL_EOF");
});
expect(writeCall).toBeTruthy();
const written = writeCall?.[0] as string;
expect(written).toContain("SystemMaxUse=2G");
expect(written).toContain("SystemKeepFree=1G");
expectCommand(ctx.ssh, "systemctl restart systemd-journald");
});
it("does not restart journald when the drop-in is already correct", async () => {
const ctx = mockCtx();
const existing =
"[Journal]\nSystemMaxUse=2G\nSystemKeepFree=1G\nSystemMaxFileSize=200M\n";
ctx.ssh.exec.mockResolvedValueOnce(stdout(existing));
ctx.ssh.exec.mockResolvedValue(OK);
const result = await configureJournaldLimits(ctx);
expect(result.success).toBe(true);
expect(result.changed).toBe(false);
expectNoCommand(ctx.ssh, "systemctl restart systemd-journald");
});
});
// --- Etcd Recovery ---
import { recoverEtcdMember } from "../src/operations/etcd-recover.js";
import { mockSsh } from "./helpers.js";
describe("recoverEtcdMember", () => {
it("refuses to operate when cluster is below 3 members (quorum risk)", async () => {
const broken = mockSsh();
const peer = mockSsh();
peer.exec.mockResolvedValueOnce(stdout("/usr/bin/etcdctl")); // etcdctl present
peer.exec.mockResolvedValueOnce(stdout(
"111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
"222, started, host-b-bbb, https://10.0.0.2:2380, https://10.0.0.2:2379, false",
));
const result = await recoverEtcdMember({ broken, peer, brokenHostname: "host-b" });
expect(result.success).toBe(false);
expect(result.message).toMatch(/quorum/i);
// Critically: must NOT have stopped k3s or removed anything
expect(broken.exec).not.toHaveBeenCalledWith(expect.stringContaining("systemctl stop k3s"), expect.anything());
});
it("performs full procedure when quorum is preserved", async () => {
const broken = mockSsh();
const peer = mockSsh();
// ensureEtcdctl: present
peer.exec.mockResolvedValueOnce(stdout("/usr/bin/etcdctl"));
// member list (3 members, target = host-b)
peer.exec.mockResolvedValueOnce(stdout(
"111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
"222, started, host-b-bbb, https://10.0.0.2:2380, https://10.0.0.2:2379, false\n" +
"333, started, host-c-ccc, https://10.0.0.3:2380, https://10.0.0.3:2379, false",
));
// member remove
peer.exec.mockResolvedValueOnce(stdout("Member 222 removed"));
// post-rejoin member list — new id 444 for host-b
peer.exec.mockResolvedValueOnce(stdout(
"111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
"333, started, host-c-ccc, https://10.0.0.3:2380, https://10.0.0.3:2379, false\n" +
"444, started, host-b-zzz, https://10.0.0.2:2380, https://10.0.0.2:2379, false",
));
const result = await recoverEtcdMember({ broken, peer, brokenHostname: "host-b" });
expect(result.success).toBe(true);
expect(result.removedMemberId).toBe("222");
expect(result.newMemberId).toBe("444");
expectCommand(broken, "systemctl stop k3s");
expectCommand(peer, "member remove 222");
expectCommand(broken, /db\.corrupt-/);
expectCommand(broken, /rm -rf .*\/server\/tls/);
expectCommand(broken, "systemctl start k3s");
});
it("fails clearly when no member matches the broken hostname", async () => {
const broken = mockSsh();
const peer = mockSsh();
peer.exec.mockResolvedValueOnce(stdout("/usr/bin/etcdctl"));
peer.exec.mockResolvedValueOnce(stdout(
"111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
"222, started, host-b-bbb, https://10.0.0.2:2380, https://10.0.0.2:2379, false\n" +
"333, started, host-c-ccc, https://10.0.0.3:2380, https://10.0.0.3:2379, false",
));
const result = await recoverEtcdMember({ broken, peer, brokenHostname: "host-d" });
expect(result.success).toBe(false);
expect(result.message).toMatch(/No etcd member found/);
expect(broken.exec).not.toHaveBeenCalledWith(expect.stringContaining("systemctl stop k3s"), expect.anything());
});
});

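The tests above fix the member-lookup contract of `recoverEtcdMember`. It parses `etcdctl member list`, matches the broken host by member name, and afterwards picks up the re-added member's new ID. The function body is not in this chunk, so the following is only a minimal sketch that rests on two assumptions. First, the output uses the default simple format seen in the mocks (`id, status, name, peer URLs, client URLs, is learner`, fields separated by `, `). Second, k3s names each member `<hostname>-<suffix>`, as the mocked `host-b-bbb` suggests. `parseMemberList`, `memberHostname`, `findMemberByHostname`, and `canRemoveMember` are hypothetical names. The three-member floor comes from the commit message.

```typescript
// Hypothetical helpers, not the project's recoverEtcdMember internals.
// Assumes the default `etcdctl member list` simple format, one member per line:
//   <id>, <status>, <name>, <peer URLs>, <client URLs>, <is learner>
// with fields joined by ", " (several URLs inside one field are joined by ",").

interface EtcdMember {
  id: string;
  status: string;
  name: string;
  peerUrls: string;
  clientUrls: string;
  isLearner: boolean;
}

function parseMemberList(stdout: string): EtcdMember[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [id = "", status = "", name = "", peerUrls = "", clientUrls = "", learner = ""] = line.split(", ");
      return { id, status, name, peerUrls, clientUrls, isLearner: learner === "true" };
    });
}

// Assumes k3s member names are "<hostname>-<suffix>": drop only the last segment.
function memberHostname(memberName: string): string {
  const dash = memberName.lastIndexOf("-");
  return dash > 0 ? memberName.slice(0, dash) : memberName;
}

function findMemberByHostname(members: EtcdMember[], hostname: string): EtcdMember | undefined {
  return members.find((m) => memberHostname(m.name) === hostname);
}

// Per the commit message, recovery refuses clusters smaller than three members.
function canRemoveMember(members: EtcdMember[]): boolean {
  return members.length >= 3;
}

const before = parseMemberList(
  "111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
    "222, started, host-b-bbb, https://10.0.0.2:2380, https://10.0.0.2:2379, false\n" +
    "333, started, host-c-ccc, https://10.0.0.3:2380, https://10.0.0.3:2379, false",
);
const after = parseMemberList(
  "111, started, host-a-aaa, https://10.0.0.1:2380, https://10.0.0.1:2379, false\n" +
    "333, started, host-c-ccc, https://10.0.0.3:2380, https://10.0.0.3:2379, false\n" +
    "444, started, host-b-zzz, https://10.0.0.2:2380, https://10.0.0.2:2379, false",
);

console.log(canRemoveMember(before)); // true
console.log(findMemberByHostname(before, "host-b")?.id); // 222
console.log(findMemberByHostname(after, "host-b")?.id); // 444
console.log(findMemberByHostname(before, "host-d")); // undefined
```

The lookup strips only the final `-<suffix>` segment and then compares exactly. A plain `startsWith("host-b")` check would also accept members belonging to hosts such as `host-b-2`.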

@@ -113,6 +113,7 @@ export type LabdBastionMessage =
| { type: "command-role-update"; requestId: string; mac: string; role: string }
| { type: "command-debug"; requestId: string; mac: string; pxeBoot?: boolean }
| { type: "command-register"; requestId: string; mac: string; hostname: string; role: string; ip: string }
| { type: "command-discover"; requestId: string; mac: string; product?: string; board?: string; serial?: string; manufacturer?: string; cpu_model?: string; cpu_cores?: number; memory_gb?: number; arch?: string; disks?: Array<{ name: string; size_gb: number; model: string }>; nics?: Array<{ name: string; mac: string; state: string }> }
| { type: "server-shutdown"; reconnectAfter: number };
export type BastionMessageType = BastionMessage["type"];
@@ -127,7 +128,7 @@ const BASTION_MESSAGE_TYPES = new Set<string>([
const LABD_BASTION_MESSAGE_TYPES = new Set<string>([
"bastion-enrolled", "bastion-heartbeat-ack", "command-install",
"command-forget", "command-role-update", "command-debug", "command-register", "command-discover", "server-shutdown",
]);
export function isBastionMessage(msg: unknown): msg is BastionMessage {

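This hunk adds `command-discover` in two places: the `LabdBastionMessage` union and `LABD_BASTION_MESSAGE_TYPES`. Both are needed because TypeScript types are erased at runtime, so a runtime guard can only consult a value such as the Set. The guard bodies are not shown here. The following is therefore a hypothetical sketch of one way to let the compiler catch a type that was added to one list but not the other: key a `Record` by the union's discriminant and build the Set from its keys. It uses a reduced union, and the names `LabdMessage`, `LABD_TYPES`, `LABD_TYPE_SET`, and `isLabdMessage` are invented for illustration.

```typescript
// Hypothetical, reduced stand-in for LabdBastionMessage (the real union has more variants).
type LabdMessage =
  | { type: "command-register"; requestId: string; mac: string; hostname: string; role: string; ip: string }
  | { type: "command-discover"; requestId: string; mac: string; product?: string; cpu_cores?: number }
  | { type: "server-shutdown"; reconnectAfter: number };

// A Record keyed by the discriminant must list every variant: omitting one is a
// "missing property" error, and an unknown key is an excess-property error here.
const LABD_TYPES: Record<LabdMessage["type"], true> = {
  "command-register": true,
  "command-discover": true,
  "server-shutdown": true,
};

// Runtime allow-list derived from the Record, so the two cannot drift apart.
const LABD_TYPE_SET: ReadonlySet<string> = new Set(Object.keys(LABD_TYPES));

// Checks only the discriminant, like a plain Set lookup; payload fields are not validated.
function isLabdMessage(msg: unknown): msg is LabdMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const type = (msg as { type?: unknown }).type;
  return typeof type === "string" && LABD_TYPE_SET.has(type);
}

console.log(isLabdMessage({ type: "command-discover", requestId: "r1", mac: "aa:bb:cc:dd:ee:ff" })); // true
console.log(isLabdMessage({ type: "command-unknown" })); // false
console.log(isLabdMessage(null)); // false
```

With this shape, a new variant only needs to be added to the union. The compiler then points at the one other place that must change.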

@@ -96,6 +96,13 @@ export interface InstalledInfo {
ip: string;
installed_at: string;
bastionId?: string; // set when aggregated through labd
// Hardware info (copied from discovered on install completion)
product?: string;
manufacturer?: string;
cpu_model?: string;
cpu_cores?: number;
memory_gb?: number;
arch?: string;
}
export interface DebugConfig {

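The new `InstalledInfo` fields are described as copied from the discovered record when an install completes. That copy is not in this chunk. A minimal sketch, assuming the discovered record uses the same field names as the `command-discover` message, would whitelist exactly the six fields `InstalledInfo` gained and skip absent ones. `DiscoveredHardware`, `InstalledHardware`, and `copyHardware` are hypothetical names, and the sample values are illustrative.

```typescript
// Hypothetical shapes; field names follow the command-discover message and InstalledInfo.
interface DiscoveredHardware {
  product?: string;
  board?: string;
  serial?: string;
  manufacturer?: string;
  cpu_model?: string;
  cpu_cores?: number;
  memory_gb?: number;
  arch?: string;
}

type InstalledHardware = Pick<
  DiscoveredHardware,
  "product" | "manufacturer" | "cpu_model" | "cpu_cores" | "memory_gb" | "arch"
>;

function copyHardware(d: DiscoveredHardware): InstalledHardware {
  const out: InstalledHardware = {};
  // Copy only the six fields InstalledInfo gained; skip absent values so the
  // record has no present-but-undefined keys.
  if (d.product !== undefined) out.product = d.product;
  if (d.manufacturer !== undefined) out.manufacturer = d.manufacturer;
  if (d.cpu_model !== undefined) out.cpu_model = d.cpu_model;
  if (d.cpu_cores !== undefined) out.cpu_cores = d.cpu_cores;
  if (d.memory_gb !== undefined) out.memory_gb = d.memory_gb;
  if (d.arch !== undefined) out.arch = d.arch;
  return out;
}

const installed = copyHardware({ product: "Mac mini", serial: "ABC123", cpu_cores: 8, arch: "aarch64" });
console.log(JSON.stringify(installed)); // {"product":"Mac mini","cpu_cores":8,"arch":"aarch64"}
```

The explicit field list keeps discovery-only data out of the installed record. That includes `serial`, `board`, `disks`, and `nics`, none of which `InstalledInfo` declares.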

@@ -0,0 +1,355 @@
// Integration test: Asahi first-boot LVM setup.
//
// Tests the first-boot script that creates the standard lab LVM layout
// on a separate data disk — simulating the Asahi provisioning flow where
// the root partition is pre-installed and a data partition is left for LVM.
//
// Uses a Fedora cloud VM with two disks:
// disk0: 20GB root (Fedora cloud image)
// disk1: 200GB empty (simulates the Asahi "Data" partition)
//
// The firstboot script should detect disk1, create labvg + LVs, mount them.
// Then we test reprovision: wipe marker, re-run, verify existing VG reused.
//
// Prerequisites: libvirt, virsh, virt-install, qemu, sudo access, lvm2
// Run: sudo pnpm run test:integration:asahi
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { readFileSync, existsSync } from "node:fs";
import { execSync } from "node:child_process";
import { join } from "node:path";
import { homedir } from "node:os";
import { destroyVm, waitForVmIp, waitForSsh, log, ensureCloudImage, createCloudInitIso } from "./helpers/libvirt.js";
import { ensureTestNetwork, TEST_NETWORK_NAME } from "./helpers/network.js";
import { sshExec, sshRun } from "./helpers/ssh.js";
import { renderFirstbootScript } from "../../src/bastion/src/templates/asahi-firstboot.sh.js";
const VM_NAME = "lab-asahi-firstboot-test";
const VM_MEMORY = 4096;
const VM_VCPUS = 2;
const VM_ROOT_DISK_GB = 20;
const VM_DATA_DISK_GB = 200; // Simulates the Asahi "Data" partition
const SSH_USER = "fedora";
const IMAGE_DIR = "/var/lib/libvirt/images";
const IS_ROOT = process.getuid?.() === 0;
const FEDORA_CLOUD_IMAGE = "https://download.fedoraproject.org/pub/fedora/linux/releases/43/Cloud/x86_64/images/Fedora-Cloud-Base-Generic-43-1.6.x86_64.qcow2";
function run(cmd: string, opts?: { timeout?: number }): string {
const full = IS_ROOT ? cmd : `sudo ${cmd}`;
return execSync(full, { encoding: "utf-8", stdio: "pipe", timeout: opts?.timeout ?? 60_000 });
}
function findSshKey(): { pubKey: string; keyPath: string } {
const homes = [homedir()];
const sudoUser = process.env["SUDO_USER"];
if (sudoUser) homes.push(join("/home", sudoUser));
if (process.env["SSH_KEY_PATH"]) {
const keyPath = process.env["SSH_KEY_PATH"];
const pubPath = `${keyPath}.pub`;
if (existsSync(keyPath) && existsSync(pubPath)) {
return { pubKey: readFileSync(pubPath, "utf-8").trim(), keyPath };
}
}
for (const home of homes) {
for (const name of ["id_ed25519", "id_ecdsa", "id_rsa"]) {
const keyPath = join(home, ".ssh", name);
const pubPath = `${keyPath}.pub`;
if (existsSync(keyPath) && existsSync(pubPath)) {
return { pubKey: readFileSync(pubPath, "utf-8").trim(), keyPath };
}
}
}
throw new Error("No SSH key found");
}
/** Create a VM with two disks: root (cloud image) + empty data disk. */
function createTwoDiskVm(config: {
name: string;
memory: number;
vcpus: number;
rootDiskGb: number;
dataDiskGb: number;
network: string;
cloudImageUrl: string;
sshPubKey: string;
}): void {
destroyVm(config.name);
log(`Creating two-disk VM: ${config.name} (root=${config.rootDiskGb}GB, data=${config.dataDiskGb}GB)`);
const baseImage = ensureCloudImage(config.cloudImageUrl, `${config.name}-base`);
const rootDiskPath = join(IMAGE_DIR, `${config.name}.qcow2`);
const dataDiskPath = join(IMAGE_DIR, `${config.name}-data.qcow2`);
// Root disk from cloud image
run(`cp "${baseImage}" "${rootDiskPath}"`);
run(`qemu-img resize "${rootDiskPath}" ${config.rootDiskGb}G`);
// Empty data disk
run(`qemu-img create -f qcow2 "${dataDiskPath}" ${config.dataDiskGb}G`);
// Cloud-init with LVM tools
const cloudInitIso = createCloudInitIso(config.name, {
name: config.name,
memory: config.memory,
vcpus: config.vcpus,
diskSize: config.rootDiskGb,
network: config.network,
cloudImageUrl: config.cloudImageUrl,
sshPubKey: config.sshPubKey,
userData: `#cloud-config
hostname: ${config.name}
manage_etc_hosts: true
users:
- default
- name: fedora
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- ${config.sshPubKey}
ssh_pwauth: false
package_update: false
packages:
- lvm2
- xfsprogs
`,
});
const virtInstallArgs = [
"virt-install",
`--name=${config.name}`,
`--memory=${config.memory}`,
`--vcpus=${config.vcpus}`,
`--disk=path=${rootDiskPath},format=qcow2`,
`--disk=path=${dataDiskPath},format=qcow2`, // Second disk for LVM
`--disk=path=${cloudInitIso},device=cdrom`,
`--network=network=${config.network},model=virtio`,
"--os-variant=generic",
"--import",
"--noautoconsole",
"--wait=0",
];
run(virtInstallArgs.join(" "));
log(`Two-disk VM ${config.name} created`);
}
describe("asahi firstboot LVM integration", () => {
let vmIp: string;
let sshKeyPath: string;
let sshPubKey: string;
beforeAll(async () => {
const keys = findSshKey();
sshKeyPath = keys.keyPath;
sshPubKey = keys.pubKey;
log("Setting up test network...");
ensureTestNetwork();
log("Creating two-disk VM...");
createTwoDiskVm({
name: VM_NAME,
memory: VM_MEMORY,
vcpus: VM_VCPUS,
rootDiskGb: VM_ROOT_DISK_GB,
dataDiskGb: VM_DATA_DISK_GB,
network: TEST_NETWORK_NAME,
cloudImageUrl: FEDORA_CLOUD_IMAGE,
sshPubKey,
});
log("Waiting for VM IP...");
vmIp = await waitForVmIp(VM_NAME, 120_000);
log("Waiting for SSH...");
await waitForSsh(vmIp, SSH_USER, 180_000, sshKeyPath);
log("Waiting for cloud-init to finish...");
await sshRun(vmIp, SSH_USER, "sudo cloud-init status --wait 2>/dev/null || sleep 30", "cloud-init", { keyPath: sshKeyPath });
// Log the block devices for debugging; the disk-count assertion is in the first test
const disks = sshExec(vmIp, SSH_USER, "lsblk -d -n -o NAME,SIZE", { keyPath: sshKeyPath });
log(`Disks:\n${disks.stdout}`);
}, 300_000);
afterAll(async () => {
log("Cleaning up VM...");
destroyVm(VM_NAME);
// Also remove data disk
try { run(`rm -f "${join(IMAGE_DIR, `${VM_NAME}-data.qcow2`)}"`); } catch { /* ignore */ }
});
it("second disk is visible and unformatted", () => {
const result = sshExec(vmIp, SSH_USER, "lsblk -d -n -o NAME,SIZE,TYPE | grep disk", { keyPath: sshKeyPath });
const disks = result.stdout.trim().split("\n");
expect(disks.length).toBeGreaterThanOrEqual(2);
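// The data disk is expected at /dev/vdb (second virtio disk)
const vdb = sshExec(vmIp, SSH_USER, "sudo blkid /dev/vdb 2>/dev/null; echo exit=$?", { keyPath: sshKeyPath });
// Expect no filesystem: blkid exits 2 when it identifies nothing (it also does so for a missing device, so the disk count above is what proves the disk exists)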
expect(vdb.stdout).toContain("exit=2");
});
it("firstboot script creates LVM on data disk", async () => {
// Generate the firstboot script
const script = renderFirstbootScript({
hostname: "asahi-test",
role: "infra",
serverIp: "10.0.0.1",
httpPort: 8080,
sshKeys: [sshPubKey],
adminUser: "testadmin",
mac: "52:54:00:aa:bb:cc",
});
// Upload and run
log("Uploading firstboot script...");
await sshRun(vmIp, SSH_USER,
`cat > /tmp/firstboot.sh << 'SCRIPT_EOF'\n${script}\nSCRIPT_EOF\nchmod +x /tmp/firstboot.sh`,
"upload script", { keyPath: sshKeyPath });
log("Running firstboot script...");
const result = await sshRun(vmIp, SSH_USER,
"sudo /tmp/firstboot.sh 2>&1",
"firstboot", { keyPath: sshKeyPath, timeout: 120_000 });
expect(result).toBe(0);
}, 180_000);
it("SSH still works after firstboot script", () => {
const result = sshExec(vmIp, SSH_USER, "echo hello", { keyPath: sshKeyPath });
if (result.stdout.trim() !== "hello") {
log(`SSH debug: exitCode=${result.exitCode} stdout='${result.stdout}' stderr='${result.stderr}'`);
}
expect(result.stdout.trim()).toBe("hello");
});
it("volume group labvg exists", () => {
const result = sshExec(vmIp, SSH_USER, "sudo vgs labvg --noheadings -o vg_name", { keyPath: sshKeyPath });
expect(result.stdout.trim()).toBe("labvg");
});
it("all expected logical volumes exist", () => {
const result = sshExec(vmIp, SSH_USER,
"sudo lvs labvg --noheadings -o lv_name --sort lv_name",
{ keyPath: sshKeyPath });
const lvs = result.stdout.trim().split("\n").map(l => l.trim()).sort();
expect(lvs).toContain("home");
expect(lvs).toContain("longhorn");
expect(lvs).toContain("rancher"); // infra role
expect(lvs).toContain("srv");
expect(lvs).toContain("swap");
expect(lvs).toContain("var");
expect(lvs).toContain("varlog");
});
it("LV sizes match kickstart layout", () => {
const result = sshExec(vmIp, SSH_USER,
"sudo lvs labvg --noheadings -o lv_name,lv_size --units m --nosuffix",
{ keyPath: sshKeyPath });
const lvMap = new Map<string, number>();
for (const line of result.stdout.trim().split("\n")) {
const [name, size] = line.trim().split(/\s+/);
if (name && size) lvMap.set(name, Math.round(parseFloat(size)));
}
expect(lvMap.get("swap")).toBe(27648);
expect(lvMap.get("var")).toBe(102400);
expect(lvMap.get("varlog")).toBe(10240);
expect(lvMap.get("home")).toBe(10240);
expect(lvMap.get("srv")).toBe(20480);
expect(lvMap.get("rancher")).toBe(20480);
// longhorn gets the remainder: 200 GiB minus 187 GiB (191,488 MiB) of fixed LVs leaves ~13 GiB, so require > 5000 MiB
expect(lvMap.get("longhorn")).toBeGreaterThan(5000);
});
it("non-var volumes are mounted with XFS", () => {
const mounts = sshExec(vmIp, SSH_USER, "mount | grep labvg", { keyPath: sshKeyPath });
// /var and /var/log deferred to next reboot (can't migrate live)
expect(mounts.stdout).toContain("/home ");
expect(mounts.stdout).toContain("/srv ");
expect(mounts.stdout).toContain("/var/lib/rancher ");
expect(mounts.stdout).toContain("/var/lib/longhorn ");
expect(mounts.stdout).toContain("xfs");
});
it("swap is active", () => {
const result = sshExec(vmIp, SSH_USER, "swapon --show --noheadings", { keyPath: sshKeyPath });
// swapon may show /dev/dm-X or /dev/labvg/swap
expect(result.stdout.length).toBeGreaterThan(0);
});
it("fstab has LVM entries", () => {
const result = sshExec(vmIp, SSH_USER, "grep labvg /etc/fstab", { keyPath: sshKeyPath });
const lines = result.stdout.trim().split("\n");
expect(lines.length).toBeGreaterThanOrEqual(7); // swap + var + varlog + home + srv + rancher + longhorn
});
it("hostname was set", () => {
const result = sshExec(vmIp, SSH_USER, "hostname", { keyPath: sshKeyPath });
expect(result.stdout.trim()).toBe("asahi-test");
});
it("admin user was created with sudo", () => {
const result = sshExec(vmIp, SSH_USER, "sudo id testadmin", { keyPath: sshKeyPath });
expect(result.stdout).toContain("testadmin");
expect(result.stdout).toContain("wheel");
});
it("provisioning metadata file exists", () => {
const result = sshExec(vmIp, SSH_USER, "cat /etc/lab-provisioned", { keyPath: sshKeyPath });
expect(result.stdout).toContain("hostname=asahi-test");
expect(result.stdout).toContain("role=infra");
expect(result.stdout).toContain("method=asahi-firstboot");
});
it("marker file prevents re-run", () => {
const result = sshExec(vmIp, SSH_USER, "test -f /etc/lab-lvm-setup-done && echo yes", { keyPath: sshKeyPath });
expect(result.stdout.trim()).toBe("yes");
});
// ── Reprovision test ──────────────────────────────────────────────
it("reprovision: detects existing labvg and re-mounts", async () => {
// Write a test file to a preserved LV
await sshRun(vmIp, SSH_USER,
"echo 'precious-data' | sudo tee /var/lib/rancher/test-preserve.txt",
"write test data", { keyPath: sshKeyPath });
// Remove marker to simulate fresh boot after reinstall
await sshRun(vmIp, SSH_USER, "sudo rm /etc/lab-lvm-setup-done", "remove marker", { keyPath: sshKeyPath });
// Unmount everything (simulate reinstall wiping root)
await sshRun(vmIp, SSH_USER, `
sudo umount /var/lib/longhorn 2>/dev/null || true
sudo umount /var/lib/rancher 2>/dev/null || true
sudo umount /srv 2>/dev/null || true
sudo umount /home 2>/dev/null || true
sudo umount /var/log 2>/dev/null || true
# Don't unmount /var — it's in use
sudo swapoff /dev/labvg/swap 2>/dev/null || true
sudo sed -i '/labvg/d' /etc/fstab
`, "unmount LVs", { keyPath: sshKeyPath });
// Re-run firstboot script — should detect existing VG
log("Re-running firstboot (reprovision)...");
const result = await sshRun(vmIp, SSH_USER,
"sudo /tmp/firstboot.sh 2>&1",
"firstboot reprovision", { keyPath: sshKeyPath });
expect(result).toBe(0);
// Verify data was preserved
const data = sshExec(vmIp, SSH_USER, "cat /var/lib/rancher/test-preserve.txt", { keyPath: sshKeyPath });
expect(data.stdout.trim()).toBe("precious-data");
// Verify marker was re-created
const marker = sshExec(vmIp, SSH_USER, "test -f /etc/lab-lvm-setup-done && echo yes", { keyPath: sshKeyPath });
expect(marker.stdout.trim()).toBe("yes");
// Verify fstab was re-populated
const fstab = sshExec(vmIp, SSH_USER, "grep labvg /etc/fstab", { keyPath: sshKeyPath });
expect(fstab.stdout).toContain("/var/lib/rancher");
}, 60_000);
});

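The `longhorn > 5000` floor in "LV sizes match kickstart layout" follows from simple arithmetic. `lvs --units m` reports base-1024 mebibytes, and `qemu-img create ... 200G` allocates 200 GiB. The fixed-size LVs therefore take 191,488 MiB (187 GiB) and leave at most 13,312 MiB (13 GiB) for longhorn, before LVM metadata and extent rounding. The sketch below recomputes this from the sizes the test asserts for the infra role. `FIXED_LVS_MIB`, `fixedTotalMiB`, `longhornCeilingMiB`, and `minDataDiskGiB` are hypothetical helpers.

```typescript
// Fixed-size LVs asserted by "LV sizes match kickstart layout" (infra role), in MiB.
// `lvs --units m` uses base-1024 units, and qemu-img's "200G" means 200 GiB.
const FIXED_LVS_MIB: Record<string, number> = {
  swap: 27648,
  var: 102400,
  varlog: 10240,
  home: 10240,
  srv: 20480,
  rancher: 20480,
};

const MIB_PER_GIB = 1024;

function fixedTotalMiB(fixed: Record<string, number>): number {
  return Object.values(fixed).reduce((sum, mib) => sum + mib, 0);
}

// Upper bound for the longhorn LV; LVM metadata and extent rounding take a little more.
function longhornCeilingMiB(dataDiskGiB: number, fixed: Record<string, number>): number {
  return dataDiskGiB * MIB_PER_GIB - fixedTotalMiB(fixed);
}

// Smallest whole-GiB data disk whose ceiling is at least the floor (before LVM overhead).
function minDataDiskGiB(fixed: Record<string, number>, longhornFloorMiB: number): number {
  return Math.ceil((fixedTotalMiB(fixed) + longhornFloorMiB) / MIB_PER_GIB);
}

console.log(fixedTotalMiB(FIXED_LVS_MIB)); // 191488 (= 187 GiB)
console.log(longhornCeilingMiB(200, FIXED_LVS_MIB)); // 13312 (= 13 GiB)
console.log(minDataDiskGiB(FIXED_LVS_MIB, 5000)); // 192
```

So lowering `VM_DATA_DISK_GB` below about 192 would make the longhorn assertion fail even if the firstboot script is correct.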

@@ -0,0 +1,353 @@
// Validation tests for Asahi provisioning artifacts.
//
// Tests that can run WITHOUT Apple Silicon hardware:
// 1. Shellcheck the generated firstboot script
// 2. Verify the built rootfs ZIP structure
// 3. Mount the rootfs and verify injected files
// 4. Validate installer_data.json with an inline Python check of the fields and partition rules the Asahi installer expects (not the installer's own parser)
// 5. Verify partition layout arithmetic
//
// Prerequisites:
// - Run scripts/build-asahi-rootfs.sh first (creates asahi-repo/)
// - shellcheck installed (dnf install ShellCheck)
// - python3 installed
// - root for loop mount (sudo)
//
// Run: sudo pnpm run test:integration:asahi-validate
import { describe, it, expect, beforeAll, afterAll } from "vitest";
import { existsSync, lstatSync, readFileSync, writeFileSync, mkdirSync, rmSync } from "node:fs";
import { execSync, spawnSync } from "node:child_process";
import { join } from "node:path";
import { tmpdir } from "node:os";
import { renderFirstbootScript } from "../../src/bastion/src/templates/asahi-firstboot.sh.js";
const PROJECT_ROOT = join(import.meta.dirname, "..", "..");
const ASAHI_REPO = join(PROJECT_ROOT, "asahi-repo");
const ASAHI_CACHE = join(PROJECT_ROOT, ".asahi-cache");
const IS_ROOT = process.getuid?.() === 0;
function run(cmd: string, opts?: { timeout?: number }): string {
const full = IS_ROOT ? cmd : `sudo ${cmd}`;
return execSync(full, { encoding: "utf-8", stdio: "pipe", timeout: opts?.timeout ?? 60_000 });
}
function hasBuiltArtifacts(): boolean {
return existsSync(join(ASAHI_REPO, "fedora-asahi-lab.zip")) &&
existsSync(join(ASAHI_REPO, "installer_data.json"));
}
describe("asahi script validation", () => {
it("firstboot script passes shellcheck", () => {
const script = renderFirstbootScript({
hostname: "test-node",
role: "infra",
serverIp: "10.0.0.1",
httpPort: 8080,
sshKeys: ["ssh-ed25519 AAAA... user@host"],
adminUser: "testadmin",
mac: "aa:bb:cc:dd:ee:ff",
});
const tmpFile = join(tmpdir(), `asahi-shellcheck-${Date.now()}.sh`);
writeFileSync(tmpFile, script);
try {
const result = spawnSync("shellcheck", [
"-s", "bash",
"-e", "SC2086,SC2164", // SC2086: unquoted variables (intentional in some LVM commands); SC2164: cd without || exit
tmpFile,
], { encoding: "utf-8", stdio: "pipe", timeout: 30_000 });
if (result.status !== 0) {
console.log("Shellcheck warnings/errors:");
console.log(result.stdout);
}
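// shellcheck exits 1 when it reports findings of any severity (errors included) and 2+ when it cannot process the file, so this fails only if shellcheck itself cannot run or read the script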
expect(result.status).toBeLessThan(2);
} finally {
try { rmSync(tmpFile); } catch { /* ignore */ }
}
});
it("firstboot script for worker role passes shellcheck", () => {
const script = renderFirstbootScript({
hostname: "worker-node",
role: "worker",
serverIp: "10.0.0.1",
httpPort: 8080,
sshKeys: [],
adminUser: "michal",
mac: "00:11:22:33:44:55",
});
const tmpFile = join(tmpdir(), `asahi-shellcheck-worker-${Date.now()}.sh`);
writeFileSync(tmpFile, script);
try {
const result = spawnSync("shellcheck", ["-s", "bash", "-e", "SC2086,SC2164", tmpFile],
{ encoding: "utf-8", stdio: "pipe", timeout: 30_000 });
if (result.status !== 0) console.log(result.stdout);
expect(result.status).toBeLessThan(2);
} finally {
try { rmSync(tmpFile); } catch { /* ignore */ }
}
});
it("firstboot script for vanilla role passes shellcheck", () => {
const script = renderFirstbootScript({
hostname: "vanilla-node",
role: "vanilla",
serverIp: "10.0.0.1",
httpPort: 8080,
sshKeys: ["ssh-rsa AAAA... user@host"],
adminUser: "admin",
mac: "ff:ee:dd:cc:bb:aa",
});
const tmpFile = join(tmpdir(), `asahi-shellcheck-vanilla-${Date.now()}.sh`);
writeFileSync(tmpFile, script);
try {
const result = spawnSync("shellcheck", ["-s", "bash", "-e", "SC2086,SC2164", tmpFile],
{ encoding: "utf-8", stdio: "pipe", timeout: 30_000 });
if (result.status !== 0) console.log(result.stdout);
expect(result.status).toBeLessThan(2);
} finally {
try { rmSync(tmpFile); } catch { /* ignore */ }
}
});
});
describe("asahi installer_data.json validation", () => {
let installerData: Record<string, unknown>;
beforeAll(() => {
if (!hasBuiltArtifacts()) {
throw new Error("Run scripts/build-asahi-rootfs.sh first to generate artifacts");
}
installerData = JSON.parse(readFileSync(join(ASAHI_REPO, "installer_data.json"), "utf-8"));
});
it("has os_list with one entry", () => {
const osList = installerData["os_list"] as unknown[];
expect(osList).toBeInstanceOf(Array);
expect(osList.length).toBe(1);
});
it("has required top-level fields", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
expect(os["name"]).toBeDefined();
expect(os["default_os_name"]).toBeDefined();
expect(os["boot_object"]).toBeDefined();
expect(os["next_object"]).toBeDefined();
expect(os["package"]).toBe("fedora-asahi-lab.zip");
expect(os["supported_fw"]).toBeInstanceOf(Array);
expect((os["supported_fw"] as string[]).length).toBeGreaterThan(0);
});
it("has 4 partitions (EFI + Boot + Root + Data)", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const partitions = os["partitions"] as Record<string, unknown>[];
expect(partitions).toHaveLength(4);
expect(partitions[0]!["name"]).toBe("EFI");
expect(partitions[1]!["name"]).toBe("Boot");
expect(partitions[2]!["name"]).toBe("Root");
expect(partitions[3]!["name"]).toBe("Data");
});
it("EFI partition has correct format", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const efi = (os["partitions"] as Record<string, unknown>[])[0]!;
expect(efi["type"]).toBe("EFI");
expect(efi["format"]).toBe("fat");
expect(efi["copy_firmware"]).toBe(true);
// Size should be ~500MB in bytes
const size = parseInt(String(efi["size"]).replace("B", ""), 10);
expect(size).toBeGreaterThanOrEqual(500 * 1024 * 1024);
});
it("Boot partition references boot.img", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const boot = (os["partitions"] as Record<string, unknown>[])[1]!;
expect(boot["type"]).toBe("Linux");
expect(boot["image"]).toBe("boot.img");
});
it("Root partition does NOT expand", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const root = (os["partitions"] as Record<string, unknown>[])[2]!;
expect(root["type"]).toBe("Linux");
expect(root["image"]).toBe("root.img");
expect(root["expand"]).toBe(false);
});
it("Data partition expands for LVM", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const data = (os["partitions"] as Record<string, unknown>[])[3]!;
expect(data["type"]).toBe("Linux");
expect(data["expand"]).toBe(true);
expect(data["image"]).toBeUndefined(); // No image — empty partition for LVM
});
it("partition sizes use bytes format (NB suffix)", () => {
const os = (installerData["os_list"] as Record<string, unknown>[])[0]!;
const partitions = os["partitions"] as Record<string, unknown>[];
for (const p of partitions) {
const size = String(p["size"]);
expect(size).toMatch(/^\d+B$/);
}
});
it("validates against Asahi installer Python parser", () => {
// Run an inline Python check of installer_data.json (a local reimplementation of the installer's expectations; nothing is downloaded)
const validation = spawnSync("python3", ["-c", `
import json, sys
with open("${join(ASAHI_REPO, "installer_data.json")}") as f:
data = json.load(f)
errors = []
os_list = data.get("os_list", [])
if not os_list:
errors.append("Empty os_list")
for os_entry in os_list:
required = ["name", "default_os_name", "boot_object", "next_object", "package", "supported_fw", "partitions"]
for field in required:
if field not in os_entry:
errors.append(f"Missing field: {field}")
partitions = os_entry.get("partitions", [])
if not partitions:
errors.append("No partitions defined")
has_efi = False
has_root_image = False
expand_count = 0
for p in partitions:
if "name" not in p or "type" not in p or "size" not in p:
errors.append(f"Partition missing name/type/size: {p}")
if p.get("type") == "EFI":
has_efi = True
if p.get("format") != "fat":
errors.append("EFI partition must be FAT format")
if p.get("image"):
has_root_image = True
if p.get("expand"):
expand_count += 1
# Validate size format
size_str = str(p.get("size", ""))
if not size_str.endswith("B") or not size_str[:-1].isdigit():
errors.append(f"Invalid size format: {size_str} (expected NB)")
if not has_efi:
errors.append("No EFI partition found")
if not has_root_image:
errors.append("No partition with root image found")
if expand_count > 1:
errors.append(f"Multiple expanding partitions ({expand_count}) — only one should expand")
# Verify supported_fw is a list of strings
fw = os_entry.get("supported_fw", [])
if not isinstance(fw, list) or not all(isinstance(v, str) for v in fw):
errors.append("supported_fw must be a list of strings")
if errors:
print("ERRORS:")
for e in errors:
print(f" - {e}")
sys.exit(1)
else:
print("OK: installer_data.json is valid")
`], { encoding: "utf-8", stdio: "pipe", timeout: 10_000 });
if (validation.status !== 0) {
console.log(validation.stdout);
console.log(validation.stderr);
}
expect(validation.stdout).toContain("OK");
expect(validation.status).toBe(0);
});
});
describe("asahi rootfs ZIP validation", () => {
beforeAll(() => {
if (!hasBuiltArtifacts()) {
throw new Error("Run scripts/build-asahi-rootfs.sh first to generate artifacts");
}
});
it("ZIP contains required files", () => {
const result = spawnSync("unzip", ["-l", join(ASAHI_REPO, "fedora-asahi-lab.zip")],
{ encoding: "utf-8", stdio: "pipe", timeout: 10_000 });
expect(result.stdout).toContain("boot.img");
expect(result.stdout).toContain("root.img");
expect(result.stdout).toContain("esp/");
});
it("boot.img is ~1GB", () => {
const result = spawnSync("unzip", ["-l", join(ASAHI_REPO, "fedora-asahi-lab.zip")],
{ encoding: "utf-8", stdio: "pipe", timeout: 10_000 });
const bootLine = result.stdout.split("\n").find(l => l.includes("boot.img") && !l.includes("/"));
expect(bootLine).toBeDefined();
const size = parseInt(bootLine!.trim().split(/\s+/)[0]!, 10);
expect(size).toBeGreaterThan(500 * 1024 * 1024); // > 500MB
expect(size).toBeLessThan(2 * 1024 * 1024 * 1024); // < 2GB
});
it("root.img is > 3GB", () => {
const result = spawnSync("unzip", ["-l", join(ASAHI_REPO, "fedora-asahi-lab.zip")],
{ encoding: "utf-8", stdio: "pipe", timeout: 10_000 });
const rootLine = result.stdout.split("\n").find(l => l.includes("root.img"));
expect(rootLine).toBeDefined();
const size = parseInt(rootLine!.trim().split(/\s+/)[0]!, 10);
expect(size).toBeGreaterThan(3 * 1024 * 1024 * 1024); // > 3GB
});
it("rootfs contains lab-firstboot.sh", () => {
const mountDir = join(tmpdir(), `asahi-rootfs-check-${Date.now()}`);
const extractDir = join(tmpdir(), `asahi-rootfs-extract-${Date.now()}`);
mkdirSync(mountDir);
mkdirSync(extractDir);
try {
// Extract root.img from ZIP
run(`unzip -o -j "${join(ASAHI_REPO, "fedora-asahi-lab.zip")}" root.img -d "${extractDir}"`);
// Mount and check
run(`mount -o loop,ro "${join(extractDir, "root.img")}" "${mountDir}"`);
// Verify firstboot script
expect(existsSync(join(mountDir, "usr/local/bin/lab-firstboot.sh"))).toBe(true);
const script = readFileSync(join(mountDir, "usr/local/bin/lab-firstboot.sh"), "utf-8");
expect(script).toContain("#!/bin/bash");
expect(script).toContain("labvg");
expect(script).toContain("pvcreate");
// Verify systemd service
expect(existsSync(join(mountDir, "etc/systemd/system/lab-firstboot.service"))).toBe(true);
const service = readFileSync(join(mountDir, "etc/systemd/system/lab-firstboot.service"), "utf-8");
expect(service).toContain("lab-firstboot.sh");
// Verify service is enabled (symlink exists)
const symlinkPath = join(mountDir, "etc/systemd/system/multi-user.target.wants/lab-firstboot.service");
let symlinkExists = false;
try { lstatSync(symlinkPath); symlinkExists = true; } catch { /* not found */ }
expect(symlinkExists).toBe(true);
// Verify SSH keys
expect(existsSync(join(mountDir, "root/.ssh/authorized_keys"))).toBe(true);
// Verify lvm2 + xfsprogs are in the image
const hasLvm = existsSync(join(mountDir, "usr/bin/pvcreate")) || existsSync(join(mountDir, "usr/sbin/pvcreate"));
const hasXfs = existsSync(join(mountDir, "usr/bin/mkfs.xfs")) || existsSync(join(mountDir, "usr/sbin/mkfs.xfs"));
expect(hasLvm).toBe(true);
expect(hasXfs).toBe(true);
} finally {
run(`umount "${mountDir}" 2>/dev/null || true`);
rmSync(mountDir, { recursive: true, force: true });
rmSync(extractDir, { recursive: true, force: true });
}
}, 120_000);
});
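The ZIP tests locate entries with substring matches on raw `unzip -l` lines, for example `l.includes("root.img")`. Such a match also accepts any other entry whose name contains that string. The sketch below parses the listing into `{ size, name }` rows and looks entries up by exact name. It assumes Info-ZIP's usual `Length Date Time Name` layout. `parseUnzipList` and `entrySize` are hypothetical helpers, and the sample listing is illustrative rather than captured from the real artifact.

```typescript
// Hypothetical helper; assumes Info-ZIP `unzip -l` rows of the form
//   <length> <date> <time> <name>
interface ZipEntry {
  size: number;
  name: string;
}

// Header, "-----" separators, and the "<total> <n> files" summary do not match.
const LISTING_ROW = /^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.+)$/;

function parseUnzipList(stdout: string): ZipEntry[] {
  const entries: ZipEntry[] = [];
  for (const line of stdout.split("\n")) {
    const m = LISTING_ROW.exec(line.trimEnd());
    if (m) entries.push({ size: Number(m[1]), name: m[4] ?? "" });
  }
  return entries;
}

// Exact-name lookup instead of a substring match on the raw line.
function entrySize(entries: ZipEntry[], name: string): number | undefined {
  return entries.find((e) => e.name === name)?.size;
}

// Illustrative listing, not captured from the real artifact.
const sample = [
  "Archive:  fedora-asahi-lab.zip",
  "  Length      Date    Time    Name",
  "---------  ---------- -----   ----",
  "1073741824  2026-05-05 21:29   boot.img",
  "4294967296  2026-05-05 21:29   root.img",
  "        0  2026-05-05 21:29   esp/",
  "---------                     -------",
  "5368709120                     3 files",
].join("\n");

const entries = parseUnzipList(sample);
console.log(entries.length); // 3
console.log(entrySize(entries, "root.img")); // 4294967296
console.log(entrySize(entries, "missing.img")); // undefined
```

The header, separator, and summary lines do not fit the four-column pattern, so the parser skips them without special cases.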