lab/bastion/src/cli/src/commands/app.ts

feat: install logging, error trapping, PXE/ISO integration tests

Kickstart installs on real hardware failed silently: no error reporting, only 3 progress callbacks, zero log streaming. This overhaul makes every install fully observable.

Kickstart improvements:
- Error trapping in %pre and %post (trap ERR sends failure details to bastion)
- 12+ granular progress stages (was 3): SSH, hostname, k3s prep, EFI boot, metadata
- Background log streamer: tails %post output and batch-sends to /api/log
- bastion_log() function for explicit log lines from kickstart scripts

Bastion API:
- POST /api/log: receives raw log lines from kickstart (single or batch)
- InstallLogBuffer: per-MAC ring buffer (2000 lines) + file persistence
- GET /api/logs/:mac: now returns log_lines + log_total alongside stages
- SSE /api/logs/:mac/follow: uses named events (event: stage vs event: log)
- Progress events forwarded to labd via bastion-progress WebSocket message
- Post-provision k3s logs routed through progressBus (was console-only)

dnsmasq fixes found during VM testing:
- HTTP Boot filename: ipxe-real.efi → ipxe.efi (leftover from old 2-stage approach)
- pxe-service directives: only in proxy mode (breaks OVMF PXE in full mode)
- PXEClient vendor class echo for UEFI firmware compatibility

Integration tests:
- PXE boot test: blank UEFI VM → dnsmasq → HTTP Boot → iPXE → bastion → install
- ISO boot test: blank VM boots from bastion-generated ISO → same flow
- Shared helpers: pxe-network (no DHCP, nftables fix), pxe-vm (UEFI + ISO boot)
- test-provision.sh: runs both PXE + ISO tests with prerequisite checks
- 250GB sparse QCOW2 disk (LVM layout needs ~204GB)

201 unit tests passing (11 new).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 22:26:33 +00:00
// CLI commands: labctl app k3s install | health | list | kubeconfig
// Install k3s on a target machine, check its health, list installed machines, or fetch a kubeconfig, all via SSH.
import { existsSync, writeFileSync, mkdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
import { execSync } from "node:child_process";
import type { Command } from "commander";
import type { BastionState } from "@lab/shared";
import { K3sModule, sshExec } from "@lab/modules";
import { getLabdClient } from "../api/config.js";
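/**
 * Resolve a CLI target (IPv4 address, MAC, or hostname) to connection details.
 * A literal IPv4 address is used as-is with role "infra"; MACs and hostnames
 * (full or short name) are looked up among the installed machines in state.
 * Returns null when nothing matches.
 */
function resolveTarget(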
target: string,
state: BastionState | null,
): { ip: string; hostname: string; role: string } | null {
// Direct IP
if (/^\d+\.\d+\.\d+\.\d+$/.test(target)) {
return { ip: target, hostname: target, role: "infra" };
}
if (!state) return null;
// Check by MAC
const mac = target.toLowerCase().replace(/-/g, ":");
const installed = state.installed[mac];
if (installed?.ip) {
return { ip: installed.ip, hostname: installed.hostname, role: installed.role };
}
// Check by hostname
for (const [, info] of Object.entries(state.installed)) {
if (info.hostname === target || info.hostname.startsWith(target + ".")) {
return { ip: info.ip, hostname: info.hostname, role: info.role };
}
}
return null;
}
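/**
 * Return the first private key found in ~/.ssh, trying ed25519, then ecdsa, then rsa.
 * Under sudo, the invoking user's home is assumed to be /home/$SUDO_USER.
 */
function findSshKey(): string | undefined {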
const sudoUser = process.env["SUDO_USER"];
const realHome = sudoUser ? join("/home", sudoUser) : homedir();
for (const name of ["id_ed25519", "id_ecdsa", "id_rsa"]) {
const keyPath = join(realHome, ".ssh", name);
if (existsSync(keyPath)) return keyPath;
}
return undefined;
}
/** Fetch machine state from labd; returns null if the request fails for any reason. */
async function fetchState(): Promise<BastionState | null> {
try {
return await getLabdClient().getMachines();
} catch {
return null;
}
}
import { registerLabcontrollerCommands } from "./labcontroller.js";
export function registerAppCommand(program: Command): void {
const appCmd = program.command("app").description("Application management");
// labcontroller subcommands
registerLabcontrollerCommands(appCmd);
const k3sCmd = appCmd.command("k3s").description("k3s cluster management");
k3sCmd
.command("install <target>")
.description("Install k3s on a target machine (hostname, IP, or MAC)")
.option("--role <role>", "k3s role: infra (server) or worker (agent)", "infra")
.option("--user <user>", "SSH user", "michal")
.option("--k3s-server <url>", "k3s server URL (required for worker role)")
.option("--k3s-token <token>", "k3s join token (required for worker role)")
.action(async (target: string, opts: {
role: string;
user: string;
k3sServer?: string;
k3sToken?: string;
}) => {
const state = await fetchState();
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot resolve target: ${target}`);
console.error("Provide an IP address, hostname, or MAC of an installed machine.");
process.exit(1);
}
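// Reject unknown roles instead of silently falling back to "infra".
if (opts.role !== "infra" && opts.role !== "worker") {
console.error(`Invalid role: ${opts.role} (expected "infra" or "worker")`);
process.exit(1);
}
const role = opts.role === "worker" ? "worker" : "infra";
// The option help marks both flags as required for the worker role.
if (role === "worker" && (!opts.k3sServer || !opts.k3sToken)) {
console.error("Worker role requires --k3s-server and --k3s-token.");
process.exit(1);
}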
const sshKey = findSshKey();
console.log(`Installing k3s on ${resolved.hostname} (${resolved.ip}) as ${role}...`);
console.log("");
const k3s = new K3sModule();
const moduleCtx = {
hostname: resolved.hostname,
ip: resolved.ip,
role,
os: "fedora-43" as const,
arch: "x86_64" as const,
sshUser: opts.user,
...(sshKey ? { sshKeyPath: sshKey } : {}),
config: {
...(opts.k3sServer ? { k3sServerUrl: opts.k3sServer } : {}),
...(opts.k3sToken ? { k3sToken: opts.k3sToken } : {}),
},
};
const installResult = await k3s.install(moduleCtx);
for (const line of installResult.output) {
console.log(` ${line}`);
}
if (!installResult.success) {
console.error(`\nk3s install failed: ${installResult.errors.join(", ")}`);
process.exit(1);
}
console.log("\nRunning post-install configuration...\n");
const configResult = await k3s.configure(moduleCtx);
for (const line of configResult.output) {
console.log(` ${line}`);
}
if (!configResult.success) {
console.error(`\nk3s configure failed: ${configResult.errors.join(", ")}`);
process.exit(1);
}
console.log("\nk3s installed successfully.");
// Check if the machine's role requires additional app deployments
try {
const { ROLE_REGISTRY } = await import("@lab/shared");
const freshState = await fetchState();
if (freshState) {
for (const [, info] of Object.entries(freshState.installed)) {
if (info.ip === resolved.ip || info.hostname === resolved.hostname) {
const roleInfo = ROLE_REGISTRY.find((r: { name: string }) => r.name === info.role);
if (roleInfo && roleInfo.apps.length > 0) {
console.log(`\nRole ${info.role} requires: ${roleInfo.apps.join(", ")}`);
console.log(`Deploying automatically...`);
const { execFileSync } = await import("node:child_process");
try {
execFileSync("node", [
process.argv[1] ?? "",
"app", "labcontroller", "deploy", resolved.hostname,
"--user", opts.user,
], { stdio: "inherit" });
} catch {
console.error(`\nAuto-deploy failed. Run manually: labctl app labcontroller deploy ${resolved.hostname}`);
}
}
break;
}
}
}
} catch { /* best-effort chain */ }
console.log(`\nTo get kubeconfig: ssh ${opts.user}@${resolved.ip} sudo cat /etc/rancher/k3s/k3s.yaml`);
});
k3sCmd
.command("health [target]")
.description("Check k3s health (all hosts if no target given)")
.option("--user <user>", "SSH user", "michal")
.action(async (target: string | undefined, opts: { user: string }) => {
const sshKey = findSshKey();
if (!target) {
let state: BastionState;
try {
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const entries = Object.entries(state.installed);
if (entries.length === 0) {
console.log("No installed machines.");
return;
}
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const pad = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${pad("HOST", 22)}${pad("IP", 16)}${pad("ROLE", 8)}${pad("K3S", 14)}${pad("NODE", 10)}${pad("ENCRYPT", 10)}${pad("CNI", 14)}${pad("PODS", 6)}${RESET}`,
);
interface HealthRow {
host: string; ip: string; role: string;
k3s: string; node: string; encrypt: string; cni: string; pods: string;
k3sC: string; nodeC: string; encC: string; cniC: string;
}
const probes = entries.map(async ([_mac, info]): Promise<HealthRow> => {
const r: HealthRow = {
host: info.hostname, ip: info.ip, role: info.role,
k3s: "—", node: "—", encrypt: "—", cni: "—", pods: "—",
k3sC: DIM, nodeC: DIM, encC: DIM, cniC: DIM,
};
if (!info.ip || info.role === "vanilla") {
r.k3s = info.role === "vanilla" ? "n/a" : "no ip";
return r;
}
try {
const svc = await sshExec(info.ip, opts.user, "systemctl is-active k3s 2>/dev/null || systemctl is-active k3s-agent 2>/dev/null", {
...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000,
});
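// When the k3s unit is not active, the command prints one line per unit
// (e.g. "inactive" for k3s, then "active" for k3s-agent), so check every line.
const svcStatus = svc.stdout.trim();
const isActive = svcStatus.split("\n").some((line) => line.trim() === "active");
if (!isActive) {
r.k3s = svcStatus === "inactive" ? "stopped" : "not installed";
r.k3sC = svcStatus === "inactive" ? RED : DIM;
return r;
}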
r.k3s = "running"; r.k3sC = GREEN;
const [nodeRes, encRes, cniRes, podRes] = await Promise.all([
sshExec(info.ip, opts.user,
"sudo k3s kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type==\"Ready\")].status}' 2>/dev/null",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s secrets-encrypt status 2>/dev/null | head -1",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -n kube-system -l k8s-app=cilium --no-headers 2>/dev/null | head -1",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -A --no-headers 2>/dev/null | wc -l",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 }),
]);
r.node = nodeRes.stdout.includes("True") ? "Ready" : "NotReady";
r.nodeC = nodeRes.stdout.includes("True") ? GREEN : RED;
r.encrypt = encRes.stdout.includes("Enabled") ? "yes" : "no";
r.encC = encRes.stdout.includes("Enabled") ? GREEN : RED;
r.cni = cniRes.stdout.includes("Running") ? "cilium" : "flannel";
r.cniC = cniRes.stdout.includes("Running") ? GREEN : DIM;
r.pods = podRes.stdout.trim() || "?";
} catch {
r.k3s = "unreachable"; r.k3sC = RED;
}
return r;
});
const results = await Promise.all(probes);
for (const r of results) {
console.log(
`${pad(r.host, 22)}${pad(r.ip, 16)}${pad(r.role, 8)}${r.k3sC}${pad(r.k3s, 14)}${RESET}${r.nodeC}${pad(r.node, 10)}${RESET}${r.encC}${pad(r.encrypt, 10)}${RESET}${r.cniC}${pad(r.cni, 14)}${RESET}${pad(r.pods, 6)}`,
);
}
return;
}
// Single target: detailed health check
const state = await fetchState();
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot resolve target: ${target}`);
process.exit(1);
}
console.log(`Checking k3s health on ${resolved.hostname} (${resolved.ip})...\n`);
const k3s = new K3sModule();
const healthResult = await k3s.health({
hostname: resolved.hostname,
ip: resolved.ip,
role: resolved.role,
os: "fedora-43" as const,
arch: "x86_64" as const,
sshUser: opts.user,
...(sshKey ? { sshKeyPath: sshKey } : {}),
config: {},
});
for (const line of healthResult.output) {
console.log(` ${line}`);
}
if (healthResult.errors.length > 0) {
for (const err of healthResult.errors) {
console.error(` ERROR: ${err}`);
}
}
process.exit(healthResult.success ? 0 : 1);
});
k3sCmd
.command("list")
.description("List installed machines and their k3s status")
.option("--user <user>", "SSH user", "michal")
.action(async (opts: { user: string }) => {
let state: BastionState;
try {
state = await getLabdClient().getMachines();
} catch (err) {
console.error(`Cannot reach labd: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const entries = Object.entries(state.installed);
if (entries.length === 0) {
console.log("No installed machines.");
return;
}
const sshKey = findSshKey();
const BOLD = "\x1b[1m";
const GREEN = "\x1b[32m";
const RED = "\x1b[31m";
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";
const hdr = (s: string, w: number) => s.padEnd(w);
console.log(
`${BOLD}${hdr("HOSTNAME", 28)}${hdr("IP", 18)}${hdr("ROLE", 10)}${hdr("K3S", 16)}${hdr("NODE", 12)}${hdr("PODS", 6)}${RESET}`,
);
const probes = entries.map(async ([_mac, info]) => {
const row = {
hostname: info.hostname,
ip: info.ip,
role: info.role,
k3s: "—",
node: "—",
pods: "—",
k3sColor: DIM,
nodeColor: DIM,
};
if (!info.ip || info.role === "vanilla") {
row.k3s = info.role === "vanilla" ? "n/a" : "no ip";
return row;
}
try {
const svcResult = await sshExec(info.ip, opts.user, "systemctl is-active k3s 2>/dev/null || systemctl is-active k3s-agent 2>/dev/null", {
...(sshKey ? { keyPath: sshKey } : {}),
timeoutMs: 8_000,
});
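const svcStatus = svcResult.stdout.trim();
// When the k3s unit is not active, the command prints one line per unit
// (e.g. "inactive" for k3s, then "active" for k3s-agent), so check every line.
const isActive = svcStatus.split("\n").some((line) => line.trim() === "active");
if (isActive) {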
row.k3s = "running";
row.k3sColor = GREEN;
const nodeResult = await sshExec(info.ip, opts.user,
"sudo k3s kubectl get nodes -o jsonpath='{.items[0].status.conditions[?(@.type==\"Ready\")].status}' 2>/dev/null || echo unknown",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 },
);
const nodeReady = nodeResult.stdout.trim();
if (nodeReady.includes("True")) {
row.node = "Ready";
row.nodeColor = GREEN;
} else {
row.node = "NotReady";
row.nodeColor = RED;
}
const podResult = await sshExec(info.ip, opts.user,
"sudo k3s kubectl get pods -A --no-headers 2>/dev/null | wc -l",
{ ...(sshKey ? { keyPath: sshKey } : {}), timeoutMs: 8_000 },
);
row.pods = podResult.stdout.trim() || "?";
} else if (svcStatus === "inactive" || svcStatus === "dead") {
row.k3s = "stopped";
row.k3sColor = RED;
} else {
row.k3s = "not installed";
row.k3sColor = DIM;
}
} catch {
row.k3s = "unreachable";
row.k3sColor = RED;
}
return row;
});
const results = await Promise.all(probes);
for (const r of results) {
console.log(
`${hdr(r.hostname, 28)}${hdr(r.ip, 18)}${hdr(r.role, 10)}${r.k3sColor}${hdr(r.k3s, 16)}${RESET}${r.nodeColor}${hdr(r.node, 12)}${RESET}${hdr(r.pods, 6)}`,
);
}
});
k3sCmd
.command("kubeconfig <target>")
.description("Fetch kubeconfig from a target and merge into ~/.kube/config")
.option("--user <user>", "SSH user", "root")
.option("--context <name>", "Context name (defaults to hostname)")
.option("--print", "Print kubeconfig to stdout instead of merging")
.action(async (target: string, opts: {
user: string;
context?: string;
print?: boolean;
}) => {
const state = await fetchState();
const resolved = resolveTarget(target, state);
if (!resolved) {
console.error(`Cannot resolve target: ${target}`);
console.error("Provide an IP address, hostname, or MAC of an installed machine.");
process.exit(1);
}
const sshKey = findSshKey();
// Fetch kubeconfig via SSH
let raw: string;
try {
const result = await sshExec(resolved.ip, opts.user, "cat /etc/rancher/k3s/k3s.yaml", {
...(sshKey ? { keyPath: sshKey } : {}),
timeoutMs: 10_000,
});
raw = result.stdout;
} catch (err) {
console.error(`Failed to fetch kubeconfig: ${err instanceof Error ? err.message : String(err)}`);
process.exit(1);
}
const contextName = opts.context ?? resolved.hostname;
// Rewrite: replace 127.0.0.1 with actual IP, rename cluster/user/context
const rewritten = raw
.replace(/server:\s*https:\/\/127\.0\.0\.1:/, `server: https://${resolved.ip}:`)
.replace(/name:\s*default/g, `name: ${contextName}`)
.replace(/cluster:\s*default/g, `cluster: ${contextName}`)
.replace(/user:\s*default/g, `user: ${contextName}`)
.replace(/current-context:\s*default/, `current-context: ${contextName}`);
if (opts.print) {
process.stdout.write(rewritten);
return;
}
// Merge into ~/.kube/config using kubectl
const kubeDir = join(homedir(), ".kube");
mkdirSync(kubeDir, { recursive: true });
const mainConfig = join(kubeDir, "config");
const tmpFile = join(kubeDir, `.labctl-${contextName}.tmp`);
writeFileSync(tmpFile, rewritten, { mode: 0o600 });
try {
if (existsSync(mainConfig)) {
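// Pass KUBECONFIG through the environment rather than interpolating paths into a shell string.
const merged = execSync("kubectl config view --flatten", {
encoding: "utf-8",
env: { ...process.env, KUBECONFIG: `${mainConfig}:${tmpFile}` },
});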
writeFileSync(mainConfig, merged, { mode: 0o600 });
} else {
writeFileSync(mainConfig, rewritten, { mode: 0o600 });
}
// Set current context
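// Use execFileSync with an argument array so the context name is never parsed by a shell.
const { execFileSync } = await import("node:child_process");
execFileSync("kubectl", ["config", "use-context", contextName], { stdio: "pipe" });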
console.log(`Merged kubeconfig for ${contextName} (${resolved.ip})`);
console.log(`Context set to: ${contextName}`);
console.log(`\nSwitch contexts: kubectl config use-context <name>`);
} catch (err) {
console.error(`Failed to merge kubeconfig: ${err instanceof Error ? err.message : String(err)}`);
console.error(`Standalone config saved at: ${tmpFile}`);
process.exit(1);
} finally {
try { const { unlinkSync } = await import("node:fs"); unlinkSync(tmpFile); } catch { /* ignore */ }
}
});
}