Behind the Scenes · 2026-05-01 · 7 min read · By the SongForgeAI team

We claimed signed seals for 390 builds. They weren’t actually signed.

A real engineering postmortem. The cryptographic seal infrastructure shipped at Build 1431. The env var that activated it was set in Build 1817 — 390 builds and six months later. Here is what we found, what it cost, and what we changed so this class of bug can't hide again.

This is a postmortem. We are publishing it because this is the kind of bug that is uncomfortable to write about, and that is precisely when writing about it matters most.

The claim and the reality

For roughly six months, the SongForgeAI homepage, the /scoring/standard/whitepaper, the SDK README, and the developer reference all carried some version of the same statement: every score response is signed with an ed25519 cryptographic seal so any third party can verify the score independently.

The infrastructure was real. signSeal() existed in src/lib/seal-signer.ts and used @noble/ed25519. The public key was published at /.well-known/songforgeai-pubkey.json. The verifier code was importable. The TypeScript types had a signature field on the seal payload.

What was missing was a single line in our Vercel project configuration: the SEAL_SIGNING_PRIVATE_KEY environment variable was never set. signSeal() begins with a guard that returns null when the env var is absent. Score responses kept their seal block (rubric version, model, temperature, build SHA) but the signature field was always omitted.
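The failure mode is small enough to sketch in full. This is an illustrative reconstruction, not the actual seal-signer.ts source: the real module signs with @noble/ed25519, and the exact function signatures here are assumptions.

```typescript
interface Seal {
  rubricVersion: string;
  model: string;
  temperature: number;
  buildSha: string;
  build: number;
}

// Stand-in for the real @noble/ed25519 signing call.
function fakeSign(seal: Seal, key: string): string {
  return `sig(${seal.buildSha},${key.slice(0, 6)})`;
}

// The guard as described in the post: an absent env var means null, not an error.
function signSeal(seal: Seal): string | null {
  const key = process.env.SEAL_SIGNING_PRIVATE_KEY;
  if (!key) return null;
  return fakeSign(seal, key);
}

// The caller omits the field when signing returns null, so the seal block
// still ships with the right shape and no signature — silently.
function buildResponseSeal(seal: Seal): Seal & { signature?: string } {
  const signature = signSeal(seal);
  return signature ? { ...seal, signature } : { ...seal };
}
```

Note that nothing in this path ever throws: the unsigned response is structurally valid, which is exactly why it survived every downstream check.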

For 390 builds.

How we missed it

We have a CI gate called check:trust-claims. It runs every push and asserts that 31 specific public claims about the codebase remain true at runtime. One of those claims is reproducibility-seal-shape, which validates that score responses include a seal block with the right structure: rubricVersion, model, temperature, buildSha, build.

The check passed. Every time. Because the seal block was the right shape. The signature was an optional field outside that required shape, so its absence never tripped the assertion. The gate kept passing because the bug it was designed to prevent was a different bug. The gap between "infrastructure deployed" and "runtime activated" sat in a place no test was looking.
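A sketch of why the gate stayed green. The assertion below is ours, not the actual check:trust-claims source, but it captures the structural point: the required keys are all present, and signature is not one of them.

```typescript
// Illustrative version of the reproducibility-seal-shape assertion.
const REQUIRED_SEAL_KEYS = ["rubricVersion", "model", "temperature", "buildSha", "build"];

function sealShapeOk(response: { seal?: Record<string, unknown> }): boolean {
  const seal = response.seal ?? {};
  return REQUIRED_SEAL_KEYS.every((key) => key in seal);
}

// An unsigned seal passes: the shape is right, the guarantee is missing.
const unsigned = {
  seal: { rubricVersion: "2.1", model: "m", temperature: 0, buildSha: "abc123", build: 1431 },
};
```

A check like this is not wrong; it is just answering a different question than "is the feature on."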

This is the most quietly dangerous category of bug in any product. Code review caught the seal-signer module the day it was added. TypeScript caught every type error in the seal payload. CI caught every regression in the seal block's shape. Nothing caught "you forgot to set the env var that makes the entire thing work."

How we found it

The discovery came indirectly. We were shipping /verify, a public-facing page where any visitor can paste a seal + signature + public key and watch ed25519 verification run in their browser. The page needed a "use the bundled example" button that loaded a known-good fixture. To generate the fixture we needed to sign a payload locally with the production private key.

The operator opened Vercel's environment variables panel to retrieve the key. The variable was not in the list. We assumed user error: different account, wrong project, search filter. It wasn't. The variable was not set and had never been set; Vercel's UI had merely rendered the field as if a value existed (the placeholder text was sk_live_a12…, which made it look like a Stripe-style key was redacted, when in fact the field was empty).

We generated a new keypair locally, wrote the public key to the well-known file, set the private key in Vercel, redeployed, and verified that the next score response carried a real signature.
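The generate-sign-verify sequence above can be sketched with Node's built-in Ed25519 support. The production code uses @noble/ed25519; node:crypto is an illustrative stand-in, and the payload fields are placeholders.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generate a fresh Ed25519 keypair, mirroring the rotation step.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// A placeholder seal payload; the real one carries the full seal block.
const sealBytes = Buffer.from(
  JSON.stringify({ rubricVersion: "2.1", buildSha: "abc123" })
);

// Ed25519 is a pure signature scheme: pass null for the digest algorithm.
const signature = sign(null, sealBytes, privateKey);
const ok = verify(null, sealBytes, publicKey, signature);
```

This is also essentially what the /verify page runs in the browser, minus the key generation: paste seal, signature, and public key, and watch verify() agree.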

The total time from discovery to fix was 13 minutes.

What it cost

Let us be honest about this:

  • No user was harmed. The seal block still existed; it just didn't carry a signature. Anyone reading a score response had the same data they had before, minus a field that was advertised but never delivered.
  • No invariant was broken. Scores remained reproducible against the seal's metadata (rubric version, model, temperature). The signature was a layer of cryptographic guarantee on top of that, not the underlying determinism.
  • The marketing claim was false. "Every score is signed" was true in code, false at runtime, for six months. We told visitors something we weren't actually doing. That is the cost.

We are not going to soften this. The honest description of what happened is: we shipped a public claim about a feature we hadn't fully turned on. The fact that the feature was halfway-on (the seal block existed, the runtime wasn't signing) made it harder to detect, but easier to defend, which is exactly the wrong combination.

What we changed

One commit, one new CI gate.

The new gate is check:runtime-activation. It runs every 30 minutes against the production deployment. It hits /api/health, reads a new runtimeFlags object that we added to the response, and asserts that every env-gated optional feature — seal signing, funnel events writes, the synthetic Crucible monitor's bypass token, Stripe, Anthropic, Upstash — reports true. If any feature is supposed to be active and isn't, the gate fails red and pages the operator within 30 minutes.

The discipline this enforces is: when you ship a new env-gated feature, you add the flag to runtimeFlags in /api/health, and you add the flag name to the EXPECTED_ACTIVE_FLAGS array in the gate. If you forget to set the env var in production after that, the gate catches it within half an hour.
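The gate's core loop can be sketched as follows. The flag names are inferred from the post's prose and the real EXPECTED_ACTIVE_FLAGS list may differ; the URL handling and error path are illustrative, not the actual check-runtime-activation.ts source.

```typescript
// Hedged sketch of check:runtime-activation. Flag names are assumptions.
const EXPECTED_ACTIVE_FLAGS = [
  "seal_signing",
  "funnel_events",
  "crucible_bypass",
  "stripe",
  "anthropic",
  "upstash",
];

// Pure function so it is easy to test: return every flag that should be
// active but reports false (or is missing) in /api/health's runtimeFlags.
function inactiveFlags(runtimeFlags: Record<string, boolean>): string[] {
  return EXPECTED_ACTIVE_FLAGS.filter((flag) => runtimeFlags[flag] !== true);
}

async function checkRuntimeActivation(healthUrl: string): Promise<void> {
  const res = await fetch(healthUrl);
  const { runtimeFlags } = (await res.json()) as {
    runtimeFlags: Record<string, boolean>;
  };
  const missing = inactiveFlags(runtimeFlags);
  if (missing.length > 0) {
    // Fails red; the scheduler running this every 30 minutes pages the operator.
    throw new Error(`env-gated features inactive in production: ${missing.join(", ")}`);
  }
}
```

Keeping the comparison in a pure function means the gate itself is trivially unit-testable, which matters for a check whose whole job is to not silently pass.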

The class of bug that hid for 390 builds can no longer hide for more than 30 minutes.

Why this is on the blog

Two reasons.

The first reason is selfish. Trust is the moat in a measurement product. We claim "every score is signed" and "the rubric is open" and "the standard is published." When one of those claims turns out to have been false, the right move is not to quietly fix it. The right move is to tell people exactly what happened, what it cost, and what we changed. The cost of the disclosure is small. The cost of being caught hiding it later would be terminal.

The second reason is field-level. Most AI products ship trust claims that are true in code, false at runtime, and impossible to verify externally. The norm is "we use a multi-stage scoring pipeline" or "outputs are validated against an internal rubric" or "every response carries a verifiable signature" — with no public artifact behind any of those claims. We were operating in that norm for six months without realizing it.

The fix is not "trust us more." The fix is to publish the runtime check. Anyone who wants to verify, today, that seal_signing is active in production can hit /api/health and read the runtimeFlags field. Anyone who wants to verify that a score's seal is real can paste it into /verify and watch the math agree. Anyone who wants to read what we just shipped to prevent this class of bug can read the source for check-runtime-activation.ts.

"Trust us" was always the wrong frame. "Verify us" is the only honest one.

The single sentence

If your AI product makes a public claim about a feature, you owe the world a way to verify the feature is on. We failed that test for six months. The fix took 13 minutes. The discipline that makes sure it can't happen again took two hours and is in the codebase under scripts/check-runtime-activation.ts. If you build AI products, copy the pattern.