high · 2026-04-26 · duration unknown: present since at least Build 1075 (changelog) + 1201 (engineering); discovered + fixed Build 1425

/engineering rendered 0 commits + /changelog said "Build log unavailable"

For an unknown stretch of weeks, the public /engineering report rendered every velocity stat as 0 (0 commits, 0 punch-list shipped, 0 files touched, mean 0/day) and /changelog displayed "Build log is unavailable on this deploy." Both pages depend on `git log` at request time. Vercel's runtime container has no git binary; the catch blocks silently returned empty data; the all-zeros UI rendered as if the engineering culture had stopped.

Impact

Two public trust surfaces actively contradicted the brand promise of measurable rigor. Anyone evaluating SongForgeAI's discipline by visiting /engineering or /changelog saw "no activity", the opposite of the truth (335 commits in the prior 3 days alone). The operator detected it during an external audit walkthrough that surfaced the discrepancy.

Severity

high

Prevention

Pre-build snapshot pattern: scripts/generate-git-snapshot.ts runs during `next build` (where git IS available), writes the parsed log to a JSON file, and both pages read the JSON at runtime. New smoke checks at scripts/smoke.ts assert /engineering shows non-zero commit data and /changelog shows real Build N references — the next regression trips the alarm immediately.

Timeline

  • **Build 1075** (~weeks ago): `/changelog` shipped, calling `execSync('git log ...')` at request time. The page was tested locally where git IS available. On Vercel runtime, the catch block silently returned []. The page rendered the "Build log is unavailable on this deploy" branch.
  • **Build 1201** (~weeks ago): `/engineering` shipped with the same pattern. Same silent failure: every per-day count rendered 0, the heatmap was empty, the highlights list was empty.
  • **2026-04-26 ~17:30 UTC**: Operator noticed the discrepancy while reviewing audit feedback. The /engineering page showed "0 Commits" while `git log` from a local clone showed 803 commits in the last 14 days.
  • **2026-04-26 ~17:50 UTC** (Build 1425): root-caused (Vercel runtime has no git), fix designed (pre-build snapshot pattern), shipped, pushed.
  • **2026-04-26 ~18:22 UTC** (Build 1426): the Build 1425 build script invoked `tsx` directly instead of `npx tsx`; the Vercel build container could not find `tsx` on PATH, so the deploy failed. Fixed within 5 minutes by matching the existing `npx tsx` pattern used by other scripts in package.json.
  • **2026-04-26 ~18:30 UTC**: Both pages confirmed populating correctly on production.

Root cause

Architectural mismatch: both pages were authored as if Vercel's request-time runtime container were identical to the build container. It is not. Specifically:

  • The build container (`vercel build`) runs in a Linux environment with the full `.git` repo and the `git` binary on PATH. `execSync('git log ...')` works.
  • The runtime container (where pages render per-request) is a stripped serverless function. The `.git` directory is not present + `git` is not installed. `execSync('git log ...')` throws.

Both pages caught the exception with `return []` and continued. The catch block was the silent-failure mode — every render produced empty data without surfacing a warning.
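
The difference between the silent catch and a surfaced failure can be sketched as follows. This is a minimal illustration, not the pages' actual code: `readGitLogSilently`, `readGitLog`, and the `GitLogResult` type are hypothetical names introduced here.

```typescript
import { execSync } from "node:child_process";

// The incident pattern: any failure collapses into "no commits".
// The caller cannot tell an empty repo from a missing git binary.
function readGitLogSilently(command: string): string[] {
  try {
    return execSync(command, { encoding: "utf8" }).split(/\r?\n/).filter(Boolean);
  } catch {
    return [];
  }
}

// Hypothetical alternative: keep the failure in the return type so the
// page has to decide what to render when the data source is broken.
type GitLogResult =
  | { ok: true; lines: string[] }
  | { ok: false; reason: string };

function readGitLog(command: string): GitLogResult {
  try {
    const out = execSync(command, {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "pipe"], // capture stderr instead of leaking it
    });
    return { ok: true, lines: out.split(/\r?\n/).filter(Boolean) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Log loudly so the failure shows up in function logs.
    console.error(`[git-log] command failed: ${reason}`);
    return { ok: false, reason };
  }
}
```

With the second shape, a page could render an explicit error state or throw on `ok: false`, either of which is easier to alarm on than a page full of zeros.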

The /changelog page is dynamic because it reads `?category=` search params. The /engineering page was explicitly marked `force-dynamic` (per a comment "so each request reflects the actual deploy state"). Both decisions forced runtime evaluation, where git is unavailable.

Detection

The bug was visible from day one to anyone visiting either page — the rendered content (zeros + "unavailable" text) was the symptom. But:

  • No automated test exercised the page content. The CI gates checked typecheck + tests + bundle-size, none of which caught a runtime data-source failure.
  • No smoke check asserted the /engineering or /changelog page content.
  • The pages were "shipped" + assumed working because they returned 200 OK and rendered SOMETHING.

Detection took weeks because the operator + the existing CI surface both treated 200 OK + rendered HTML as "working."

Mitigation

Pre-build snapshot pattern, shipped at B1425:

1. `scripts/generate-git-snapshot.ts` runs at build time (where git IS available). Captures last 300 commits + per-commit file touches as JSON to `src/lib/.git-snapshot.json` (gitignored).
2. `src/lib/git-snapshot-placeholder.ts` is a typed loader that reads the JSON via `fs.readFileSync` (works in the runtime container because the JSON is bundled with the deploy).
3. `/changelog` and `/engineering` data layers swapped from `execSync` → `loadGitSnapshot()`.
4. `package.json` build script: `"build": "npx tsx scripts/generate-git-snapshot.ts && next build"`.
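
The steps above can be sketched roughly as below. Only the name `loadGitSnapshot` and the 300-commit limit come from this write-up; the `SnapshotCommit` and `GitSnapshot` shapes, the `git log` format string, and the `parseGitLog` and `generateGitSnapshot` helpers are assumptions made for illustration, and the real scripts may differ.

```typescript
import { execSync } from "node:child_process";
import { readFileSync, writeFileSync } from "node:fs";

// Assumed snapshot shape; the real JSON layout is not shown in this report.
interface SnapshotCommit {
  hash: string;
  date: string;
  subject: string;
  files: string[];
}

interface GitSnapshot {
  generatedAt: string;
  commits: SnapshotCommit[];
}

// Assumed format: byte 0x1e starts each commit, 0x1f separates header fields,
// and --name-only lists the touched files on the lines that follow.
const GIT_LOG_COMMAND =
  'git log -n 300 --name-only --pretty=format:"%x1e%H%x1f%cI%x1f%s"';

function parseGitLog(raw: string): SnapshotCommit[] {
  return raw
    .split("\x1e")
    .filter((record) => record.trim().length > 0)
    .map((record) => {
      const lines = record.split(/\r?\n/);
      const [hash, date, subject] = (lines[0] ?? "").split("\x1f");
      return {
        hash: hash ?? "",
        date: date ?? "",
        subject: subject ?? "",
        files: lines.slice(1).filter((line) => line.trim().length > 0),
      };
    });
}

// Build time: git is on PATH, so shelling out is safe here.
function generateGitSnapshot(outputPath: string): GitSnapshot {
  const raw = execSync(GIT_LOG_COMMAND, {
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024,
  });
  const snapshot: GitSnapshot = {
    generatedAt: new Date().toISOString(),
    commits: parseGitLog(raw),
  };
  writeFileSync(outputPath, JSON.stringify(snapshot, null, 2));
  return snapshot;
}

// Runtime: no git, only the JSON file bundled with the deploy.
function loadGitSnapshot(inputPath: string): GitSnapshot {
  return JSON.parse(readFileSync(inputPath, "utf8")) as GitSnapshot;
}
```

In this sketch the loader deliberately has no catch block: a missing or corrupt snapshot throws instead of degrading to an empty list, which avoids repeating the original silent-failure mode.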

The data shape is unchanged. Pages render exactly as they were designed to render the day they shipped.

Prevention

Three layers of defense against recurrence:

1. **Architectural**: the pre-build snapshot pattern documents that any data dependency on shell binaries (git, ffmpeg, etc.) must be resolved at build time, not runtime. The codebase comment in git-snapshot-placeholder.ts pins this for future maintainers.

2. **Smoke checks** (scripts/smoke.ts, B1426): explicit assertions that

  • `/changelog` body contains a Build N reference (catches "Build log unavailable" regression)
  • `/engineering` body does not show "0 Commits" (catches snapshot-script failure)

The smoke script runs as `npm run smoke` and is wired into post-deploy checks.
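
A rough sketch of content assertions of this kind is shown below. It is not the contents of scripts/smoke.ts: the `SmokeResult` type, the helper names, and the tag-stripping approach are assumptions, and `runSmoke` relies on the global `fetch` available in Node 18 and later.

```typescript
interface SmokeResult {
  name: string;
  passed: boolean;
  detail: string;
}

// Strip tags so "<span>0</span> Commits" is compared as "0 Commits".
function visibleText(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

function checkChangelog(html: string): SmokeResult {
  const text = visibleText(html);
  const hasBuildRef = /\bBuild \d+\b/.test(text);
  const showsFallback = text.includes("Build log is unavailable");
  return {
    name: "/changelog",
    passed: hasBuildRef && !showsFallback,
    detail: showsFallback
      ? "fallback message rendered"
      : hasBuildRef
        ? "found a Build N reference"
        : "no Build N reference found",
  };
}

function checkEngineering(html: string): SmokeResult {
  const text = visibleText(html);
  // Match a standalone 0, not the trailing digit of 10 or 800.
  const zeroCommits = /(^|\D)0 Commits\b/i.test(text);
  return {
    name: "/engineering",
    passed: !zeroCommits,
    detail: zeroCommits ? 'page shows "0 Commits"' : "commit count is non-zero",
  };
}

// Fetch both pages from a deployed base URL and apply the checks.
async function runSmoke(baseUrl: string): Promise<SmokeResult[]> {
  const pages: Array<[string, (html: string) => SmokeResult]> = [
    ["/changelog", checkChangelog],
    ["/engineering", checkEngineering],
  ];
  const results: SmokeResult[] = [];
  for (const [path, check] of pages) {
    const res = await fetch(new URL(path, baseUrl));
    results.push(
      res.ok
        ? check(await res.text())
        : { name: path, passed: false, detail: `HTTP ${res.status}` },
    );
  }
  return results;
}
```

A caller would exit non-zero if any result has `passed: false`, so that a post-deploy step fails on wrong content even when the page itself returned 200.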

3. **Build-script visibility**: the snapshot script logs `[git-snapshot] wrote N commits ...` on every build. Deploy logs make any "wrote 0 commits" visible to the operator at deploy time, not weeks later when an external auditor catches it.
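
One possible extension of this layer is to make an empty snapshot fail the build instead of only logging it. The sketch below is an assumption, not the shipped behavior: `reportSnapshot` is a hypothetical helper, and only the `[git-snapshot] wrote N commits` prefix of the log line comes from this report.

```typescript
// Log the commit count, and refuse to continue when the snapshot is empty.
// An uncaught throw makes the script exit non-zero, so a build command of
// the form `npx tsx scripts/generate-git-snapshot.ts && next build` stops
// before `next build` runs.
function reportSnapshot(commitCount: number, outputPath: string): string {
  if (!Number.isInteger(commitCount) || commitCount <= 0) {
    throw new Error(
      `[git-snapshot] wrote 0 commits to ${outputPath}; refusing to continue the build`,
    );
  }
  const message = `[git-snapshot] wrote ${commitCount} commits to ${outputPath}`;
  console.log(message);
  return message;
}
```

The trade-off is that a brand-new repository, or a build environment that legitimately has no history, would fail the build, so a hard stop like this only makes sense where an empty log is always a bug.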

What this incident cost

Hard to quantify. Two of the highest-trust public surfaces — the engineering report + the changelog — actively undermined the brand for an unknown number of weeks. Every visitor who looked for proof of velocity saw "0/day" and reasonably concluded the project was inactive. The opposite was true (335 commits in the prior 3 days). The credibility lost on those visits is unrecoverable.

What we got right

  • The /incidents page existed before this incident (B1207). The publication discipline did not require negotiation in the heat of the fix.
  • B1425 root-caused + shipped within 30 minutes of operator detection.
  • B1426 hotfix for the broken build script shipped within 5 minutes of CI failure.
  • The smoke checks added at B1426 ensure this exact regression cannot ship silently again.

What we got wrong

  • The "force-dynamic" decision on /engineering was made for the right reason (per-deploy freshness) but executed against the wrong assumption (runtime parity with build). The reasoning in the code comment was correct; the implementation was wrong.
  • No smoke check for /engineering or /changelog content existed. Both pages were treated as "shipped" without any contract about what their rendered content must contain. Smoke checks added retroactively.
  • Detection relied on an external audit walkthrough, not an internal alarm. If the audit hadn't surfaced it, the pages would still be lying about the engineering culture today.

Updated discipline

Going forward: any page that depends on a shell binary, a network call, or an environment variable to render real content gets a smoke check that asserts the real content is present. 200 OK + rendered HTML is not "working" — working means the page tells the truth.