fix(k3s): audit logs via journald + etcd recovery #13

Merged

michal merged 1 commit from fix/k3s-audit-via-journald into main

2026-05-05 20:29:52 +00:00


Author: Michal
SHA1: dd92147341

fix(k3s): route audit logs through journald, codify etcd member recovery

CI/CD / typecheck (pull_request) Failing after 13s
CI/CD / lint (pull_request) Failing after 23s
CI/CD / test (pull_request) Failing after 10s
CI/CD / build (pull_request) Has been skipped
CI/CD / publish-rpm (pull_request) Has been skipped
CI/CD / publish-deb (pull_request) Has been skipped

Two changes prompted by today's etcd raft panic on worker1-k8s0
(tocommit out of range, lost-write on follower) and the cascading
disk pressure that surfaced underneath it.

Audit logs to journald
- kube-apiserver now uses audit-log-path=- so audit events flow to
  k3s.service stdout and into journald instead of growing files in
  /var/log/kubernetes. The previous setup combined apiserver's
  internal rotation with a logrotate *.log glob that double-rotated
  the rotated files into permanent orphans (observed: 7+ GB).
- New journald-limits operation writes a SystemMaxUse=2G drop-in so
  audit volume cannot fill /var/log even under bursty load.
- log-rotation operation repurposed to decommission the obsolete
  logrotate rule and reap leftover audit files. Idempotent: no-op
  on fresh installs.
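
The first two bullets above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this change: the function names, the drop-in file name `10-k3s-audit.conf`, and the policy file parameter are hypothetical. It assumes k3s forwards apiserver flags via `--kube-apiserver-arg` and that journald reads drop-ins from `/etc/systemd/journald.conf.d/`.

```typescript
// Sketch only. Function names, the drop-in file name, and the policy path
// parameter are hypothetical and not taken from this change.

interface DropIn {
  path: string;
  content: string;
}

/** Build a journald drop-in that caps persistent journal disk usage. */
function journaldLimitsDropIn(systemMaxUse: string): DropIn {
  return {
    // journald reads every *.conf file in this directory; the name is assumed.
    path: "/etc/systemd/journald.conf.d/10-k3s-audit.conf",
    content: ["[Journal]", `SystemMaxUse=${systemMaxUse}`, ""].join("\n"),
  };
}

/** Build k3s server flags that send apiserver audit events to stdout. */
function auditToStdoutArgs(policyFile: string): string[] {
  return [
    // "-" tells kube-apiserver to write audit events to standard output.
    "--kube-apiserver-arg=audit-log-path=-",
    `--kube-apiserver-arg=audit-policy-file=${policyFile}`,
  ];
}
```

Calling `journaldLimitsDropIn("2G")` yields the `SystemMaxUse=2G` setting named above. A new drop-in would typically take effect only after systemd-journald is restarted.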

Etcd member recovery
- New recoverEtcdMember(broken, peer, hostname) codifies the
  documented k3s recovery: stop k3s, etcdctl member remove, wipe
  /var/lib/rancher/k3s/server/{db,tls,cred}, restart, poll for
  rejoin. Refuses to operate when cluster size < 3 to preserve
  quorum.
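
The order of operations and the quorum guard can be sketched as a pure planning function. This is not the recoverEtcdMember implementation from this change: the type names, `planEtcdMemberRecovery`, and the idea of returning a list of steps are assumptions made for illustration. The etcdctl endpoint and certificate flags are omitted, and the final poll for rejoin is left out.

```typescript
// Sketch only: a pure planner for the recovery steps. All type and function
// names here are hypothetical, not the ones used in this change.

type Target = "broken" | "peer";

interface Step {
  target: Target; // which node the command is meant to run on
  command: string; // shell command, shown for illustration
}

// Per the description above, recovery is refused below this cluster size.
const MIN_CLUSTER_SIZE = 3;

function planEtcdMemberRecovery(memberId: string, clusterSize: number): Step[] {
  if (clusterSize < MIN_CLUSTER_SIZE) {
    throw new Error(
      `refusing recovery: cluster size ${clusterSize} < ${MIN_CLUSTER_SIZE}`,
    );
  }
  const base = "/var/lib/rancher/k3s/server";
  return [
    { target: "broken", command: "systemctl stop k3s" },
    // Endpoint and TLS flags for etcdctl are omitted in this sketch.
    { target: "peer", command: `etcdctl member remove ${memberId}` },
    { target: "broken", command: `rm -rf ${base}/db ${base}/tls ${base}/cred` },
    { target: "broken", command: "systemctl start k3s" },
  ];
}
```

Keeping the plan separate from execution makes the ordering and the size check easy to unit test without a live cluster. A real implementation would still have to resolve the member ID from `etcdctl member list` and wait for the node to rejoin.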

Tests
- 7 new unit tests covering both decommission paths and the
  recovery procedure (54 total, all green).
- install.test.ts asserts the file-based audit args are gone.
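
A check like the one in the last bullet might look like the sketch below. The helper name and the list of flags treated as file-based are assumptions. The actual assertions in install.test.ts are not shown here and may differ.

```typescript
// Sketch only. The flag list is an assumption about which apiserver
// arguments count as "file-based"; the real test may check others.

const FILE_ONLY_AUDIT_FLAGS = [
  "audit-log-maxage",
  "audit-log-maxbackup",
  "audit-log-maxsize",
];

/** Return the arguments that only make sense when audit logs go to a file. */
function fileBasedAuditArgs(args: string[]): string[] {
  return args.filter((arg) => {
    // An audit-log-path other than "-" points at a file on disk.
    const pathMatch = /audit-log-path=(.*)$/.exec(arg);
    if (pathMatch !== null) {
      return pathMatch[1] !== "-";
    }
    return FILE_ONLY_AUDIT_FLAGS.some((flag) => arg.indexOf(`${flag}=`) !== -1);
  });
}
```

A test could then assert that `fileBasedAuditArgs(generatedArgs)` returns an empty array, where `generatedArgs` stands for whatever argument list the installer produces.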

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-05 21:29:16 +01:00
