Behind the Scenes · 2026-04-27 · 11 min read · By Todd Nigro

We rewrote the forge as a state machine. The cutover took five builds.

A two-pipeline migration that most teams take a quarter to complete shipped in five builds. Not because it was easy — because three years of strangler-fig discipline factored V1 into reusable modules years before V2 needed them. A retrospective on the build sequence, the 3:1 reuse ratio, and the architectural debt we deliberately did not pay.

The headline

SongForgeAI's forge page was rewritten as a state machine. The cutover from the legacy imperative path (V1) to the state-machine architecture (V2) shipped over five builds in a single working day. V1 still exists as the rollback target; the bake window is 14 days and then it gets deleted.

Most cutovers of this shape — replacing the spine of a multi-thousand-LOC product surface — take a quarter. Some never finish. This one shipped in a day because the actual rewrite work had been done in the background for three years, distributed across roughly 100 commits, none of which knew it was preparing for a V2.

This post is the trail. The principle worth taking from it is simple: refactoring compounds. Extract-method discipline that looks like local cleanup at the time accumulates into reuse ratios that make the next architectural shift cheap.

The build sequence

B1532 — Batch mode

V2 had been admin-only since B1047. The shell handled single-song forges end-to-end with a clean state-machine loop. Batch mode (5–10 songs back-to-back, queue-builder + cooldowns + wake-lock + per-song persistence + 429 retry) had zero coverage.
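To make the batch requirements concrete, here is a minimal sketch of a sequential batch loop with a cooldown between songs, a per-song persistence hook, and retry on HTTP 429. Every name in it (runBatch, backoffDelay, BatchOptions, the result shape) is hypothetical, and the doubling backoff is an assumption. It is not the code behind processBatchSong or createBatchForgeHandler.

```typescript
// Hypothetical sketch: names, shapes and the backoff policy are illustrative only.
type ForgeResult = { status: number; title?: string };
type ForgeFn = (prompt: string) => Promise<ForgeResult>;
type Sleep = (ms: number) => Promise<void>;

interface BatchOptions {
  cooldownMs: number;   // pause between consecutive songs
  maxRetries: number;   // retries per song when the API answers 429
  retryDelayMs: number; // base delay, doubled on each retry (assumption)
}

// Exponential backoff: base, 2x base, 4x base, ...
function backoffDelay(baseMs: number, attempt: number): number {
  return baseMs * Math.pow(2, attempt);
}

async function runBatch(
  queue: string[],
  forge: ForgeFn,
  sleep: Sleep,
  opts: BatchOptions,
  onSongDone: (index: number, result: ForgeResult) => void, // per-song persistence hook
): Promise<ForgeResult[]> {
  const results: ForgeResult[] = [];
  for (let i = 0; i < queue.length; i++) {
    let attempt = 0;
    let result = await forge(queue[i]);
    // Retry only rate-limit responses; any other status is recorded as-is.
    while (result.status === 429 && attempt < opts.maxRetries) {
      await sleep(backoffDelay(opts.retryDelayMs, attempt));
      attempt++;
      result = await forge(queue[i]);
    }
    results.push(result);
    onSongDone(i, result); // persist before moving on, so a crash loses at most one song
    if (i < queue.length - 1) await sleep(opts.cooldownMs);
  }
  return results;
}
```

Injecting forge and sleep keeps the loop testable without a network or real timers, which is the same property that makes a primitive cheap to remount under a different shell.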

The B1532 build added ForgeV2BatchPanel.tsx at ~365 LOC. The panel mounts six existing V1 primitives:

  • useBatchState (96 LOC, B1127) — refs + state for the queue
  • useBatchGenerators (104 LOC, B1466) — Random / Prompt-Inspired / Artist-Inspired queue builders
  • useForgeUsage (68 LOC, B1118) — usage gating
  • createBatchForgeHandler (266 LOC, B1352 factory) — orchestration
  • processBatchSong (424 LOC, B1034) — per-song flow
  • BatchResultsGrid — result view

None of these were extracted "for the V2 cutover." They were extracted across builds B873 through B1466 because the original forge page was getting too big and EXTRACT-1 had a per-build ratchet to lift pure functions out. Every single one happened to be the right shape for V2 to remount as a thin wrapper.

Total V2-specific code added: ~365 LOC. Total V1 code reused: ~1,500 LOC. Build shipped clean against 123 V1 batch tests + 49 V2 state-machine tests.

B1533 — Surprise mode

The "click 🎲 with no prompt and the platform rolls a random ghost / splice / voltage / vocal / scenario / emotion / cleanMode for you" flow. Audit had flagged it as the second cutover gap because the homepage CTA depends on it.

This build did require some plumbing: V2's SUBMIT event didn't carry creative-palette fields. A new CreativePalette interface was threaded through the state base and every transition via the existing base() helper. The runForge effect contract gained an optional fourth palette parameter. The production adapter detects surprise mode via palette.surpriseLabel and emits the right body shape.
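A rough sketch of what that plumbing could look like. The names CreativePalette, SUBMIT, base() and palette.surpriseLabel come from this post; every field, state name, event other than SUBMIT, and the buildRequestBody helper are assumptions made up for illustration.

```typescript
// Hypothetical shapes: only CreativePalette, SUBMIT, base() and surpriseLabel
// are named in the post. Everything else here is assumed.
interface CreativePalette {
  surpriseLabel?: string;
  voltage?: number;
}

interface StateBase {
  prompt: string;
  palette?: CreativePalette;
}

type ForgeState =
  | ({ kind: "idle" } & StateBase)
  | ({ kind: "forging" } & StateBase)
  | ({ kind: "done"; title: string } & StateBase)
  | ({ kind: "failed"; error: string } & StateBase);

type ForgeEvent =
  | { type: "SUBMIT"; prompt: string; palette?: CreativePalette }
  | { type: "RESOLVE"; title: string }
  | { type: "REJECT"; error: string }
  | { type: "RESET" };

// Copies the shared fields forward, so adding a field here reaches every transition.
function base(state: ForgeState): StateBase {
  return { prompt: state.prompt, palette: state.palette };
}

function transition(state: ForgeState, event: ForgeEvent): ForgeState {
  switch (event.type) {
    case "SUBMIT":
      if (state.kind === "forging") return state; // ignore a double submit
      return { kind: "forging", prompt: event.prompt, palette: event.palette };
    case "RESOLVE":
      if (state.kind !== "forging") return state;
      return { ...base(state), kind: "done", title: event.title };
    case "REJECT":
      if (state.kind !== "forging") return state;
      return { ...base(state), kind: "failed", error: event.error };
    case "RESET":
      return { kind: "idle", prompt: "" };
  }
}

// Assumed adapter step: surprise mode is detected purely from the palette.
function buildRequestBody(state: StateBase): Record<string, unknown> {
  const body: Record<string, unknown> = { prompt: state.prompt };
  if (state.palette && state.palette.surpriseLabel) {
    body["mode"] = "surprise";
    body["palette"] = state.palette;
  }
  return body;
}
```

Because the palette lives on the shared base, a later feature that needs the same fields only has to populate them on SUBMIT; no transition changes.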

The actual rolling logic? Reused V1's rollSurpriseMode + buildSurpriseLabel exports verbatim. Both pure functions. Both tested. Zero adapter layer needed.
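Why pure functions remount without an adapter is easier to see in code. The sketch below is not rollSurpriseMode or buildSurpriseLabel; the function names, option lists, and return shape are all invented for illustration. The only idea taken from the post is that the roll and the label are pure, here by injecting the random source.

```typescript
// Hypothetical stand-ins; the real V1 exports and their signatures are not shown in this post.
type Rng = () => number; // returns a float in [0, 1)

const GHOSTS = ["none", "balladeer", "punk poet"];   // invented option list
const EMOTIONS = ["joy", "dread", "longing"];         // invented option list

interface SurpriseRoll {
  ghost: string;
  emotion: string;
  voltage: number; // 1..10, assumed range
}

function pick(items: string[], rng: Rng): string {
  return items[Math.floor(rng() * items.length)];
}

// Pure given its rng: same sequence in, same roll out.
function rollSurprise(rng: Rng): SurpriseRoll {
  const ghost = pick(GHOSTS, rng);
  const emotion = pick(EMOTIONS, rng);
  const voltage = Math.floor(rng() * 10) + 1;
  return { ghost, emotion, voltage };
}

function buildLabel(roll: SurpriseRoll): string {
  return roll.ghost + " / " + roll.emotion + " / voltage " + roll.voltage;
}

// Deterministic rng for tests: replays a fixed sequence.
function fixedRng(values: number[]): Rng {
  let i = 0;
  return () => values[i++ % values.length];
}
```

In production the caller would pass Math.random; in tests, fixedRng([0, 0.5, 0.99]) yields the label "none / dread / voltage 10". Neither function touches component state, so either shell can call it directly.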

Critical detail: the palette plumbing added in B1533 is exactly what B1534 (Customize panel) needed. Two birds, one wiring pass. This wasn't planning — it was the natural shape the state machine wanted.

B1534 — Customize panel

Voltage slider, ghost picker, genre splice toggle, clean-mode segmented control. The power-user surface.

Mounted V1's useGenreConfig hook (91 LOC) and ForgeAdvancedOptionsPanel component (446 LOC) directly. The panel is presentational — takes setters as props, owns no state. It doesn't know or care which shell mounts it.

The paletteFromCustomize() helper returns undefined when every knob is at default, which means users who never open the panel get a body shape byte-identical to pre-1534. This is what kept the prompt-drift golden-eval gate (12 snapshots) green by construction.
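A minimal sketch of that "undefined at defaults" trick. The knob names, default values, and buildForgeBody helper are assumptions; the function below is a stand-in that shares only the name and the described behaviour of the real paletteFromCustomize().

```typescript
// Hypothetical knobs and defaults; only the undefined-at-defaults behaviour comes from the post.
interface CustomizeKnobs {
  voltage: number;
  ghost: string | null;
  genreSplice: boolean;
  cleanMode: "off" | "radio" | "strict";
}

const DEFAULT_KNOBS: CustomizeKnobs = {
  voltage: 5,
  ghost: null,
  genreSplice: false,
  cleanMode: "off",
};

function paletteFromCustomize(knobs: CustomizeKnobs): CustomizeKnobs | undefined {
  const untouched =
    knobs.voltage === DEFAULT_KNOBS.voltage &&
    knobs.ghost === DEFAULT_KNOBS.ghost &&
    knobs.genreSplice === DEFAULT_KNOBS.genreSplice &&
    knobs.cleanMode === DEFAULT_KNOBS.cleanMode;
  return untouched ? undefined : { ...knobs };
}

// JSON.stringify drops object properties whose value is undefined,
// so an untouched panel serializes exactly like a body with no palette key.
function buildForgeBody(prompt: string, knobs: CustomizeKnobs): string {
  return JSON.stringify({ prompt, palette: paletteFromCustomize(knobs) });
}
```

With default knobs, buildForgeBody("x", DEFAULT_KNOBS) produces the same string as JSON.stringify({ prompt: "x" }), which is why a snapshot gate on the request body would not need regenerating.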

B1535 — Refine flow

Refine is the second-most-common entry point — paste lyrics, lock the lines you love, set a preservation level, get a before/after comparison. Used by the dashboard's "Refine This Song" button.

Same play. useRefineState (84 LOC), createRefineHandler (168 LOC factory), RefineInputSection (195 LOC), RefineResultView (188 LOC), useForgeUsage, plus the preflightRefine + runRefineWithRetries pair. Six pure primitives, each extracted across earlier builds for unrelated reasons. V2's ForgeV2RefinePanel (~245 LOC) wired them together.

The mounting pattern preserved a contract that mattered: the dashboard "Refine This Song" button stashes lyrics in sessionStorage.refine_lyrics then navigates to /forge?mode=refine. V2 reads + clears the key on mount. So the dashboard's existing button kept working when V2 became the default — no dashboard code change required.
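The read-and-clear handoff can be sketched as below. The refine_lyrics key is from this post; the consumeRefineLyrics helper, the StorageLike interface, and the in-memory storage are hypothetical and exist so the example runs outside a browser. In the app, window.sessionStorage would satisfy the same interface.

```typescript
// Minimal subset of the Web Storage API so the sketch runs without a browser.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const REFINE_KEY = "refine_lyrics"; // key name taken from the post

// Hypothetical helper: read the stashed lyrics once, then clear them so a
// later visit to /forge?mode=refine does not resurrect stale lyrics.
function consumeRefineLyrics(storage: StorageLike): string | null {
  const lyrics = storage.getItem(REFINE_KEY);
  if (lyrics !== null) storage.removeItem(REFINE_KEY);
  return lyrics;
}

// In-memory stand-in for sessionStorage, for tests and server-side rendering.
function createMemoryStorage(): StorageLike {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setItem: (key, value) => { data[key] = value; },
    removeItem: (key) => { delete data[key]; },
  };
}
```

The contract lives entirely in the key name and the clear-on-read behaviour, so whichever shell calls the helper on mount honours it.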

B1536 — The flip

The default flag inverted. NEXT_PUBLIC_FORGE_V2_DISABLED=1 became the operator's kill switch (60-second redeploy rolls back the entire user base). ?forge_v2=0 on any URL became the user-facing escape. The amber "admin preview" banner downgraded to a footer-only legacy escape link. The 12-test flag suite was rewritten to lock the new precedence.
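A sketch of how that precedence might be resolved. The flag names are from this post; the resolveForgeVersion function and the ordering (operator kill switch first, then the user's URL param, then the V2 default) are assumptions, since the post says the precedence is locked by tests without spelling it out.

```typescript
type ForgeVersion = "v1" | "v2";

interface FlagInputs {
  env: Record<string, string | undefined>;   // e.g. build-time environment variables
  query: Record<string, string | undefined>; // parsed URL query parameters
}

// Hypothetical resolver; the precedence order is assumed, not confirmed by the post.
function resolveForgeVersion(inputs: FlagInputs): ForgeVersion {
  // 1. Operator kill switch: rolls the whole user base back to V1.
  if (inputs.env["NEXT_PUBLIC_FORGE_V2_DISABLED"] === "1") return "v1";
  // 2. User-facing escape hatch: ?forge_v2=0 on any URL.
  if (inputs.query["forge_v2"] === "0") return "v1";
  // 3. Default after the flip.
  return "v2";
}
```

Keeping the resolver a pure function of its inputs is what makes a small table-driven test suite enough to lock the behaviour.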

Zero prompt builders touched. Zero API routes changed. The pipeline was frozen by design across all five builds.

The numbers

Metric | Value
--- | ---
Builds in cutover | 5 (B1532–B1536)
V2-specific code added | ~1,250 LOC
V1 code reused unchanged | ~3,400 LOC
Reuse ratio | 3:1
Tests at start | ~210 files, all green
Tests at end | Same files, all green
Prompt builders touched | 0
API routes touched | 0
Golden-eval prompt-drift snapshots regenerated | 0 of 12

The single most useful number is the reuse ratio: roughly three lines of V1 code remounted for every line of V2 code written (3,400 / 1,250 ≈ 2.7). That ratio is not a coincidence and it is not a result of skill. It is the cumulative payoff of an extract-method discipline that ratcheted the forge page.tsx from 2,800 LOC to 1,909 LOC across 34 EXTRACT-1 passes, and SongDetail.tsx from ~1,400 to ~1,160 LOC across 7 passes. That work happened across builds B580 through B1466, none of which were planning a V2 rewrite.

What didn't get paid for

Two debts deliberately survived the cutover:

1. The ?forge_v2=0 escape hatch

Still wired. Will be deleted in B1538+ after the 14-day bake window. Until then it's the user-facing safety net — anyone hitting a V2 regression appends the param and gets V1.

This is technical debt, but it's also auditable debt. The deletion is on the punch list. The kill-switch path is documented. Operators who need to roll back can do so in a single redeploy.

2. The legacy V1 forge code path (~600 LOC)

Lives at src/app/forge/page.tsx + src/app/forge/handle-forge-v1.ts + ~10 helpers that V2 replaced. Will be deleted in B1538+ after the bake window. The reason it survives the cutover: deleting it in the same shot would only be safe with proof that V2 has zero regressions, and only the bake window can supply that proof. It hasn't yet (~12 hours of production traffic at the time of writing). Two weeks of real users + the kill switch as insurance is the right pace.

The principle

Most product surfaces accumulate accidental coupling. A new feature lands in the page component. The component grows. The next feature has to wedge between the existing ones. By the time anyone wants to "rewrite the spine," the spine is glued to every leaf and the rewrite touches everything.

Extract-method discipline pulls leaves off as they grow. Each extraction looks like local cleanup. None of them announce themselves as "preparing for a future architectural shift." But the cumulative effect is that when a future shift becomes worthwhile, the leaves are already free-standing — they remount onto a new spine without rewrites.

The compounding cost: every extraction takes a build. The compounding payoff: every shift that comes later costs less than it would have. The trade is the right one because shifts always come.

What I'd do differently

Catch the alert() shim earlier. Two of the new V2 panels (Batch, Refine) shipped with window.alert() as the upgrade-modal fallback because V2 didn't have a modal system at the time. Both got replaced in B1540 with the V1 UpgradeModal component. Should have been caught in code review on B1532 / B1535.

Move trust signals to the result view immediately. The old forge result was a green-bordered debug card showing title + score + lyrics in a <pre> tag. The attribution receipt download, the deterministic-analyzer panels, the 12-metric breakdown — all invisible from the result. Fixed in the design council's B1538 spec, but should have been caught in the B1535 result extraction.

Mount a real CommandPalette in B1532, not B1539. The mode toggle (Single | Batch | Refine) should never have been visible tabs. Should have been a ⌘K command palette + footer hints from the first batch port. Cosmetic mistake; cost a build to fix.

What's next

The 14-day bake. Telemetry is watching V2-versus-V1 latency, score-distribution drift, and new error signatures. If clean, B1538+ deletes the legacy path. If a regression surfaces, the kill switch puts everyone back on V1 in 60 seconds and we investigate.

The reuse ratio for the next architectural shift will be higher again because B1532–B1540 added new extracted modules: TypographyMoment, VoiceControl, LyricsRender, SurprisePreRoll, CommandPalette, commit-subject-parser. Each one is a leaf the next refactor will remount for free.

The discipline is boring. The compounding is not.