fix(deploy): self-healing pre-migrate bootstrap for SecretBackend rollout

Why: clusters upgrading from the pre-SecretBackend schema crash-loop on the
first rollout. `prisma db push` applies the Phase 0 migration as three
sequential steps — add Secret.backendId column (default ''), create
SecretBackend table, add FK — and the FK fails because empty-string values
reference no row in the empty SecretBackend table. This happened on the live
cluster today; I fixed it by hand with psql. This PR makes the fix
automatic so a fresh cluster or anyone replaying the migration doesn't hit
the same trap.
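
A minimal sketch of why the third step fails, using a hypothetical in-memory model rather than the real schema. The `SecretRow`/`BackendRow` types and the `fkViolations` helper are illustrative names that do not exist in the repo; the sketch only mimics the check Postgres performs when the FK is added.

```typescript
// Hypothetical in-memory model of the Phase 0 upgrade; not the real schema.
interface SecretRow { id: string; backendId: string }
interface BackendRow { id: string }

// Returns the Secret rows that would violate Secret.backendId -> SecretBackend.id.
function fkViolations(secretRows: SecretRow[], backendRows: BackendRow[]): SecretRow[] {
  const known = new Set(backendRows.map((b) => b.id));
  return secretRows.filter((s) => !known.has(s.backendId));
}

// Step 1: pre-existing rows receive the column default ''.
const upgradedSecrets: SecretRow[] = [
  { id: 's1', backendId: '' },
  { id: 's2', backendId: '' },
];
// Step 2: the new table starts out empty.
const upgradedBackends: BackendRow[] = [];
// Step 3: adding the FK fails while any row violates it.
const blocking = fkViolations(upgradedSecrets, upgradedBackends);
console.log(`rows blocking the FK: ${blocking.length}`);
```

With no Secret rows at all (the fresh-install case) the same check returns an empty list, which is why only upgrades hit the failure.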

- New `src/db/src/scripts/pre-migrate-bootstrap.ts`: an idempotent Node
  script. Checks if the SecretBackend table exists; if so, ensures a default
  row exists (INSERT ... ON CONFLICT DO NOTHING), then backfills any
  Secret.backendId = '' to point at it. Uses Prisma raw queries so it runs
  against a partially-migrated schema.

- `deploy/entrypoint.sh` now catches a failed first push, runs the
  bootstrap, and retries. Fresh installs and fully-migrated clusters take
  the happy path (one push, no bootstrap needed). Pre-Phase-0 upgrades take
  the healing path (push fails → bootstrap seeds → retry succeeds).

- The bootstrap is deliberately non-fatal — even on unexpected errors it
  logs and exits 0 so the retry still runs. If that retry also fails, the
  push error surfaces normally and the pod crash-loops visibly rather than
  silently starting in a half-migrated state.
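
The control flow described in the bullets above can be sketched as follows. This is an illustration only: the real logic lives in the shell entrypoint, and `selfHealingPush`, `push`, `bootstrap`, and `log` are hypothetical stand-ins for `prisma db push`, the bootstrap script, and `echo`.

```typescript
// Hypothetical stand-ins: `push` models `prisma db push`, `bootstrap` models the script.
type Step = () => void;

function selfHealingPush(push: Step, bootstrap: Step, log: (msg: string) => void): void {
  try {
    push(); // happy path: fresh installs and fully migrated clusters stop here
    return;
  } catch {
    log('schema push failed; running bootstrap and retrying');
  }
  try {
    bootstrap();
  } catch (err) {
    // Non-fatal by design, mirroring `|| true` in the entrypoint.
    log(`bootstrap error ignored: ${err instanceof Error ? err.message : String(err)}`);
  }
  push(); // a second failure propagates to the caller, like the crash-looping pod
}

// Usage: the first push fails until the bootstrap has seeded the default row.
const state = { seeded: false };
const steps: string[] = [];
const logs: string[] = [];
selfHealingPush(
  () => { steps.push('push'); if (!state.seeded) throw new Error('FK violation'); },
  () => { steps.push('bootstrap'); state.seeded = true; },
  (msg) => { logs.push(msg); },
);
console.log(steps.join(' -> '));
```

Keeping the retry outside any `try` is the point of the design: a failure there is not swallowed.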

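Verified the idempotent path logically: on the already-bootstrapped cluster
(1 backend row, 0 empty-backendId Secrets), the script's UPDATE matches
zero rows, and the INSERT is skipped when a row flagged isDefault already
exists and otherwise hits ON CONFLICT (name) DO NOTHING. Either way no row
is inserted or backfilled.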
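
The idempotency argument can be illustrated with a small in-memory sketch. It is not the real script: the `Db`, `Backend`, and `Secret` types and the `bootstrap` function below are hypothetical stand-ins for the two tables and the raw SQL, under the assumption that `name` is unique on SecretBackend.

```typescript
// Hypothetical in-memory stand-in for the two tables; the real script uses raw SQL.
interface Backend { id: string; name: string; isDefault: boolean }
interface Secret { id: string; backendId: string }
interface Db { backends: Backend[]; secrets: Secret[] }

const DEFAULT_ID = 'cdefault000backend00000001';

// Returns the number of Secret rows backfilled.
function bootstrap(db: Db): number {
  let def = db.backends.find((b) => b.isDefault);
  if (!def) {
    // Mirrors INSERT ... ON CONFLICT (name) DO NOTHING followed by a re-read.
    def = db.backends.find((b) => b.name === 'default');
    if (!def) {
      def = { id: DEFAULT_ID, name: 'default', isDefault: true };
      db.backends.push(def);
    }
    def.isDefault = true;
  }
  const defaultId = def.id;
  let updated = 0;
  for (const s of db.secrets) {
    if (s.backendId === '') {
      s.backendId = defaultId;
      updated++;
    }
  }
  return updated;
}

// Usage: the second run finds nothing left to change.
const db: Db = { backends: [], secrets: [{ id: 's1', backendId: '' }] };
const firstRun = bootstrap(db);  // seeds the default row and backfills one Secret
const secondRun = bootstrap(db); // no insert, no backfill
console.log(firstRun, secondRun);
```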

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Michal
2026-04-19 22:59:07 +01:00
parent d5236171cc
commit a21220b6f6
2 changed files with 121 additions and 1 deletions

deploy/entrypoint.sh

@@ -1,8 +1,23 @@
#!/bin/sh
set -e
# Self-healing schema push:
#   1. Try once. For fresh installs and already-migrated clusters this is all
#      that's needed.
#   2. On failure (typically a Phase 0 upgrade where the new SecretBackend FK
#      can't attach because pre-existing Secret rows reference nothing), run
#      the pre-migrate bootstrap to seed a default SecretBackend + backfill
#      Secret.backendId, then retry.
#   3. If the retry still fails, let the error surface so the pod crashes
#      visibly rather than starting in a half-migrated state.
echo "mcpd: pushing database schema..."
if pnpm -F @mcpctl/db exec prisma db push --schema=prisma/schema.prisma --accept-data-loss 2>&1; then
  :
else
  echo "mcpd: schema push failed — running pre-migrate bootstrap + retrying..."
  node src/db/dist/scripts/pre-migrate-bootstrap.js || true
  pnpm -F @mcpctl/db exec prisma db push --schema=prisma/schema.prisma --accept-data-loss 2>&1
fi
echo "mcpd: seeding templates..."
TEMPLATES_DIR=templates node src/mcpd/dist/seed-runner.js

src/db/src/scripts/pre-migrate-bootstrap.ts

@@ -0,0 +1,105 @@
/**
 * Self-healing pre-migration step for the SecretBackend rollout (Phase 0).
 *
 * Why this exists: `prisma db push` applies schema changes sequentially. When
 * a cluster upgrades from a pre-SecretBackend DB:
 *   1. `Secret.backendId` column is added with `DEFAULT ''`
 *   2. `SecretBackend` table is created (empty)
 *   3. The FK `Secret.backendId → SecretBackend.id` is added, and FAILS
 *      because every Secret row now has `backendId = ''` which references no
 *      row in SecretBackend.
 *
 * This script runs AFTER a failed `prisma db push` attempt:
 *   - If SecretBackend table doesn't exist yet → noop (fresh install case;
 *     db push will create everything and the FK succeeds because there are
 *     no Secret rows to violate it).
 *   - If SecretBackend exists but is empty → insert a default plaintext row.
 *   - If any Secret rows have `backendId = ''` → point them at the default.
 *
 * Idempotent: safe to run multiple times. No-op on a fully-migrated cluster.
 * Never throws; logs and exits 0 even on errors so the subsequent
 * `prisma db push` retry is still attempted.
 */
import { PrismaClient, Prisma } from '@prisma/client';

const DEFAULT_ID = 'cdefault000backend00000001';

async function main(): Promise<void> {
  const prisma = new PrismaClient();
  try {
    // Does the SecretBackend table exist yet? We check by querying the
    // information_schema rather than catching Prisma's error: cleaner, and
    // lets us distinguish "table missing" from "query succeeded but empty".
    const tableExists = await prisma.$queryRaw<Array<{ exists: boolean }>>`
      SELECT EXISTS (
        SELECT 1 FROM information_schema.tables
        WHERE table_schema = 'public' AND table_name = 'SecretBackend'
      ) AS exists
    `;
    if (!tableExists[0]?.exists) {
      console.log('bootstrap: SecretBackend table not present yet — skipping');
      return;
    }

    // Ensure at least one row exists, marked isDefault.
    const existingDefault = await prisma.$queryRaw<Array<{ id: string }>>`
      SELECT id FROM "SecretBackend" WHERE "isDefault" = true LIMIT 1
    `;
    let defaultId: string;
    if (existingDefault.length === 0) {
      await prisma.$executeRaw`
        INSERT INTO "SecretBackend"
          ("id", "name", "type", "config", "isDefault", "description", "version", "createdAt", "updatedAt")
        VALUES (
          ${DEFAULT_ID},
          'default',
          'plaintext',
          '{}'::jsonb,
          true,
          'Default in-database plaintext backend. Seeded by pre-migrate-bootstrap.',
          1,
          CURRENT_TIMESTAMP,
          CURRENT_TIMESTAMP
        )
        ON CONFLICT (name) DO NOTHING
      `;
      // Re-read: if there was an existing row with the same name but no
      // isDefault flag we need its id, not the one we tried to insert.
      const afterInsert = await prisma.$queryRaw<Array<{ id: string }>>`
        SELECT id FROM "SecretBackend" WHERE name = 'default' LIMIT 1
      `;
      if (afterInsert.length === 0) {
        console.log('bootstrap: could not establish a default SecretBackend — bailing');
        return;
      }
      defaultId = afterInsert[0]!.id;
      // Make sure it's flagged default.
      await prisma.$executeRaw`
        UPDATE "SecretBackend" SET "isDefault" = true WHERE id = ${defaultId}
      `;
      console.log(`bootstrap: seeded default SecretBackend (id=${defaultId})`);
    } else {
      defaultId = existingDefault[0]!.id;
    }

    // Backfill Secret.backendId for any rows left with an empty value.
    // Using $executeRaw returns affected row count.
    const updated = await prisma.$executeRaw(
      Prisma.sql`UPDATE "Secret" SET "backendId" = ${defaultId} WHERE "backendId" = ''`,
    );
    if (updated > 0) {
      console.log(`bootstrap: backfilled ${updated} Secret row(s) with default backendId`);
    }
  } catch (err) {
    // Never fail the deploy; worst case prisma db push tries again anyway.
    // Log the error so it's visible in pod logs.
    console.error('bootstrap: non-fatal error:', err instanceof Error ? err.message : err);
  } finally {
    await prisma.$disconnect();
  }
}

main().catch((err: unknown) => {
  console.error('bootstrap: fatal error (ignored):', err);
  // Intentionally exit 0: we don't want to block the deploy on this.
});