Commit Graph

69 Commits

Author SHA1 Message Date
Michal
7cfd8fe1b8 feat: daemonize bastion start, fix status for root-owned processes
- `lab init bastion standalone start` now runs in background by default
- `--foreground` flag for running in foreground (debugging/containers)
- Shows startup output then detaches with PID + log path
- Status command uses /proc check instead of kill -0 (works cross-user)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 22:38:46 +00:00
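The move from `kill -0` to a /proc check described in 7cfd8fe1b8 can be sketched roughly as below. Sending signal 0 to a process owned by another user fails with EPERM, so an unprivileged `status` call can misreport a root-owned bastion as stopped unless it treats that error specially. Checking for `/proc/<pid>` needs no signal permission. The function name and the `procRoot` parameter are hypothetical and not taken from the repository, and the sketch assumes a Linux procfs.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: reports whether a process with this PID exists.
// Assumes a Linux-style procfs mounted at procRoot.
function isProcessAlive(pid: number, procRoot = "/proc"): boolean {
  // Reject values that can never be a real PID.
  if (!Number.isInteger(pid) || pid <= 0) return false;
  // /proc/<pid> exists for every live process regardless of its owner,
  // so no permission to signal the process is required.
  return existsSync(join(procRoot, String(pid)));
}
```

A caller would typically read the PID from the PID file, parse it with `Number.parseInt`, and pass it to this check before printing the running state.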
Michal
4d2e8677d4 fix: PID file permission handling + root check
- Require root when dnsmasq is needed (clear error message)
- Handle stale PID files owned by different user (remove + recreate)
- Create bastion dir with 755 permissions
- 3 new PID file tests (30 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 22:31:17 +00:00
Michal
52e1932bde feat: multi-architecture builds (x86_64 + arm64)
- build-rpm.sh: --arch flag for targeting x86_64 or arm64, --all for both
  Uses bun cross-compile with --target=bun-linux-x64/arm64
- build-bastion.sh: --arch flag for Docker platform targeting
- release.sh: builds both architectures by default
- CI: builds + publishes RPM/DEB for both architectures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 22:02:52 +00:00
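As an illustration of the `--arch` handling in 52e1932bde, the mapping from an architecture name to the bun cross-compile target might look like the sketch below. The target strings come from the commit message. The type, the function names, and the entry and output paths are hypothetical, since the real logic lives in build-rpm.sh and is not shown here.

```typescript
type Arch = "x86_64" | "arm64";

// Maps the --arch value to the bun target string named in the commit message.
function bunTarget(arch: Arch): string {
  return arch === "x86_64" ? "bun-linux-x64" : "bun-linux-arm64";
}

// Hypothetical argv builder for `bun build --compile`;
// the entry and outfile values are placeholders supplied by the caller.
function bunBuildArgs(arch: Arch, entry: string, outfile: string): string[] {
  return ["build", entry, "--compile", `--target=${bunTarget(arch)}`, "--outfile", outfile];
}
```

Building for both architectures, as `--all` does, would then amount to calling `bunBuildArgs` once per `Arch` value.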
Michal
86cd961ee4 feat: release pipeline, k3s manifests, infra k3s bootstrap
- scripts/release.sh: full release orchestration (build, publish, install)
- deploy/k3s/: Deployment, ConfigMap, PVC, Namespace with kustomize
  hostNetwork for dnsmasq, NET_ADMIN caps, local-path PVC
- Infra role gets /var/lib/rancher partition (20GB, preserved on reprovision)
  for k3s etcd data persistence across reinstalls
- Infra %post installs k3s server (INSTALL_K3S_SKIP_START=true)
- 5 new kickstart tests (27 total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 21:56:39 +00:00
Michal
ed1df8a77c feat: ESLint, shell completions, Docker, nfpm packaging, CI/CD
- ESLint with typescript-eslint + prettier (eslint.config.js)
- Shell completions for bash and fish (scripts/generate-completions.ts)
- Multi-stage Dockerfile for bastion (fedora:43 + dnsmasq + node)
- nfpm.yaml for RPM/DEB packaging with bun-compiled binary
- Build scripts: build-rpm.sh, build-bastion.sh, publish-rpm/deb.sh
- Gitea Actions CI/CD: lint, typecheck, test, build, publish

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 21:51:01 +00:00
Michal
520af41a52 feat: colorful progress output with icons and SSH command
Progress callbacks from kickstart now show:
  ◆ 78:55:36:08:35:14  partitioning -- preparing disk layout
  ◆◆◆ 78:55:36:08:35:14  post-install -- configuring system
  ✔ 78:55:36:08:35:14  complete -- ready at 10.0.1.88

    ssh michal@10.0.1.88

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 12:04:52 +00:00
Michal
9803817004 fix: reprovision SSH reboot is expected to close connection
The SSH connection closing during reboot is normal, not an error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:45:16 +00:00
Michal
db26c5ecb1 docs: architecture design document
ARCHITECTURE.md covering: CLI structure (lab init/provision), project layout,
boot flow, partition scheme, container architecture, CI/CD pipeline,
state management, and technology stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:42:16 +00:00
Michal
d01b675cca feat: firewall management + reprovision SSH key fix
- Open firewall ports (dhcp, tftp, http, 4011) on bastion start
- Close firewall ports on bastion shutdown
- Auto-detect firewall zone for interface
- Fix reprovision SSH to use execFileSync with explicit key path

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:39:57 +00:00
Michal
62f896593d feat: CLI subcommands, PID self-restart, unit tests (22 passing)
CLI restructured:
  lab init bastion standalone start/stop/status
  lab provision list/install/reprovision/forget

- Nested commander subcommand groups (init > bastion > standalone, provision)
- PID file management: auto-kills old bastion on start, cleans up on stop
- stop command reads PID file and sends SIGTERM
- status command shows running state, port, machine counts
- forget command (DELETE /api/machines/:mac) removes from all state

Unit tests (22 tests, 3 files):
- kickstart.test.ts: worker/infra roles, SSH keys, partitions, admin user
- state.test.ts: load/save, atomic writes
- dispatch.test.ts: install/discover/local-boot routing, progress, forget

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:12:17 +00:00
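The "atomic writes" covered by state.test.ts in 62f896593d are not shown in the log. A common way to get that property, assumed here rather than confirmed by the source, is to write to a temporary file in the same directory and then rename it over the target. All names and the state shape below are hypothetical.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical helper: saves JSON state so readers never observe a partial file.
function saveStateAtomic(path: string, state: unknown): void {
  // Keep the temp file next to the target so the rename stays on one filesystem.
  const tmp = join(dirname(path), `.state-${process.pid}.tmp`);
  writeFileSync(tmp, JSON.stringify(state, null, 2));
  // rename() swaps the new file into place in a single step on POSIX filesystems.
  renameSync(tmp, path);
}

// Hypothetical counterpart that reads the state back.
function loadState<T>(path: string): T {
  return JSON.parse(readFileSync(path, "utf8")) as T;
}

// Usage: round-trip a small state object through a scratch directory.
const dir = mkdtempSync(join(tmpdir(), "lab-state-"));
const file = join(dir, "state.json");
saveStateAtomic(file, { machines: { "78:55:36:08:35:14": "installed" } });
const loaded = loadState<{ machines: Record<string, string> }>(file);
```

If the process dies between the write and the rename, the previous state file is left untouched and only the stray temp file needs cleaning up.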
Michal
64533b2dcf refactor: restructure bastion as pnpm monorepo (@lab/shared, @lab/bastion, @lab/cli)
- Split into 3 workspace packages: shared (types/constants), bastion (server), cli
- CLI binary renamed from "bastion" to "lab"
- Cross-package imports via @lab/shared and @lab/bastion workspace references
- Extracted BastionConfig, BastionState, HardwareInfo types into @lab/shared
- Added APP_NAME/APP_VERSION constants
- tsconfig.base.json with project references for build ordering
- Root workspace scripts: build, test, typecheck, clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:05:41 +00:00
Michal
937c01f5d9 fix: add --skip-dnsmasq/--skip-artifacts flags, fix config propagation
Enables running the TS bastion without dnsmasq for testing.
VM-tested: SSH works, partitions correct, k3s prereqs configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 03:11:29 +00:00
Michal
177e993736 feat: TypeScript bastion rewrite (initial scaffold)
Full rewrite of the bash bastion.sh into a TypeScript application:
- Fastify HTTP server with typed routes (dispatch, kickstart, API)
- Commander CLI (serve, install, list, reprovision)
- Kickstart templates as TypeScript template literals (no more heredoc hell)
- dnsmasq management via execa subprocess
- Merged machine list view (hardware + install info in one table)
- Containerized via podman-compose (Dockerfile + docker-compose.yml)
- All partition logic preserved (LVM, reprovision detection, role-based)

Not yet tested end-to-end — needs VM validation before replacing bash version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 02:55:52 +00:00
Michal
fac14b6d4a feat: server kickstart with LVM, user creation, progress callbacks, reprovision
- LVM partition layout: /, /var, /var/log, /home, /srv, swap, tmpfs /tmp
  plus /var/lib/longhorn for worker role (grows to fill disk)
- Reprovision preserves /home, /srv, /var/lib/longhorn via %pre detection
- Admin user created matching the user running the bastion script
  with SSH keys from authorized_keys + local pubkeys, passwordless sudo
- Progress callbacks from %pre and %post to /api/progress endpoint
  with IP reported on completion (ssh command printed)
- Installed machines boot from local disk (iPXE exit) instead of
  re-entering discovery mode
- --role worker|infra flag (infra skips longhorn partition)
- reprovision subcommand: queues install + SSH reboot into PXE
- Self-cleanup: kills old bastion instances on start
- Domain config (DOMAIN env, default ad.itaz.eu)
- efibootmgr in %post to set local disk first in boot order
- k3s prereqs: kernel modules, sysctl, firewalld disabled, chrony
- VM reprovision test script (test-reprovision.sh)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 02:40:40 +00:00
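With fac14b6d4a, the per-MAC dispatch has three outcomes: install, local boot (iPXE exit) for machines that are already installed, and discovery for everything else. The routing decision could be sketched as follows. The state shape and all names are hypothetical. Checking the install queue first is an assumption, made so that a reprovision of an already installed machine is not short-circuited into a local boot.

```typescript
type Route = "discover" | "install" | "local-boot";

// Hypothetical in-memory view of bastion state, keyed by normalized MAC.
interface MachineState {
  queuedInstall: Set<string>; // MACs promoted to install (or queued for reprovision)
  installed: Set<string>;     // MACs whose install has completed
}

// Normalizes "78-55-36-08-35-14" or upper-case input to "78:55:36:08:35:14".
function normalizeMac(mac: string): string {
  return mac.trim().toLowerCase().replace(/-/g, ":");
}

// Decides which iPXE script a machine should receive.
function routeFor(state: MachineState, mac: string): Route {
  const key = normalizeMac(mac);
  if (state.queuedInstall.has(key)) return "install";
  if (state.installed.has(key)) return "local-boot"; // served as an iPXE `exit`
  return "discover"; // unknown machines fall back to hardware discovery
}
```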
014e8a6e72 Merge pull request 'fix: PXE boot Content-Length, firewall zones, UEFI improvements' (#1) from fix/pxe-boot-issues into main
Reviewed-on: #1
2026-03-17 01:03:37 +00:00
Michal
75d17eb87c fix: HTTP Content-Length, firewall zones, UEFI boot improvements
- Fix Content-Length using byte count instead of character count
  (em dash in iPXE scripts caused mismatch, breaking iPXE chain)
- Use firewall zone-aware commands matching interface zone
- Add UEFI HTTP Boot support (arch 16/20) alongside PXE TFTP
- Add pxe-service directives for proper proxy DHCP responses
- Use bind-dynamic instead of bind-interfaces for bridge compat
- Add tftp-no-blocksize for UEFI firmware compatibility
- Use local ipxe packages instead of downloading from internet
- Add custom UEFI PXE loader stub (pxeloader.c) for chainloading
- Enable HTTP request logging for debugging boot issues

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 00:59:27 +00:00
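The Content-Length bug fixed in 75d17eb87c comes down to the difference between characters and encoded bytes. The fix itself predates the TypeScript rewrite, so the snippet below is only an illustration in TypeScript, with a made-up script body, of why a single em dash throws the header off.

```typescript
// Hypothetical iPXE script body; "\u2014" is the em dash the commit mentions.
const script = "#!ipxe\necho bastion \u2014 chainloading\n";

// Wrong for a header: counts UTF-16 code units, not bytes on the wire.
const charCount = script.length;

// Right: counts the UTF-8 bytes that are actually sent.
const byteCount = Buffer.byteLength(script, "utf8");

// Value to put in the Content-Length header for a UTF-8 encoded body.
function contentLength(body: string): number {
  return Buffer.byteLength(body, "utf8");
}
```

The em dash occupies one code unit in a JavaScript string but three bytes in UTF-8, so `byteCount` exceeds `charCount` by two. A client that trusts the smaller value stops reading two bytes early and receives a truncated script.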
Michal Rydlikowski
2a429088c5 bastion: discover-first PXE provisioning with multi-arch support
Rewrote bastion from install-only to discover-first flow:
- Default mode discovers hardware (PXE boot → inventory → poweroff)
- Discovered machines promoted to install via subcommand
- Per-MAC iPXE dispatch (/dispatch?mac=) routes discover vs install
- Python HTTP server with discovery API, state management, kickstart gen
- Added full DHCP mode (DHCP_MODE=full) for isolated/test networks
- Added arm64 UEFI support (client-arch 11, iPXE arm64 binary)
- Added QEMU test script (aarch64+KVM on Asahi Linux)
- All API endpoints unit tested and working

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 00:06:04 +00:00
Michal Rydlikowski
5ba22b94ea first commit
2026-03-16 00:00:13 +00:00
Michal Rydlikowski
ac695f506f first commit
2026-03-15 23:50:43 +00:00