Methodology

How the forge actually works.

SongForgeAI isn’t one prompt and one model. It’s a seven-stage pipeline built on four Berklee-grade songwriting techniques, a 12-metric open-standard rubric, and a severity-routed refinement loop. Everything below is live in production on every song — no toggles, no A/B, no “premium tier” gate on craft.

The pipeline

Prompt enrichment

Optional destination-writing enrichment (Stolpe). Turns "a song about heartbreak" into a lyric-ready brief: destination phrase, angle, sensory anchors, temporal frame, narrator stance.


SuperPrompt

A 20-expert panel runs across the enriched prompt to tighten direction, identify tensions worth writing into, and choose the structural move the song needs. Always active, always cheap (Haiku-routed).

50-voice war room forge

The full writing session. 50 legendary songwriters in five panels argue across rounds — ghost collaborators, structure debates, imagery critique, voice auditions. Single-phase Sonnet run; temperature 1.0. Output is Suno-ready lyrics with performance directives.

12-metric eval

The Lyric Scoring Standard v1.2 (CC-BY-4.0). 12 metrics across Craft (25%), Expression (40%), Impact (35%). Anti-inflation rules: Gravity, Burden of Proof, Antagonist Ceiling, Historical Context, Anti-Platitude. A 50 is average; 90+ is rare.
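
The category weighting above can be read as a weighted sum. This is a minimal sketch, assuming each category has already been rolled up to a 0-100 score; how the 12 metrics map into categories and how the anti-inflation rules are applied is not shown here, and the function name is hypothetical.

```python
# Category weights as stated above: Craft 25%, Expression 40%, Impact 35%.
CATEGORY_WEIGHTS = {"craft": 0.25, "expression": 0.40, "impact": 0.35}


def composite_score(category_scores: dict) -> float:
    """Weighted sum of per-category scores (each assumed to be on a 0-100 scale)."""
    missing = CATEGORY_WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    return sum(CATEGORY_WEIGHTS[name] * category_scores[name] for name in CATEGORY_WEIGHTS)


# Hypothetical category scores, for illustration only.
example = composite_score({"craft": 60, "expression": 50, "impact": 40})
```

With these made-up inputs the composite lands at roughly 49, just under the "50 is average" line.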


Gauntlet refinement

Severity-routed auto-refinement — Polish, Targeted Surgery, or Structural Rebuild, chosen by the composite score. Gauntlet receives the eval wounds AND the client-side prosody lint (Pattison) AND the Kintsugi assessment (protect vs. replace) as its fix roadmap. Re-scored after; if it didn’t beat the original within a 3-point noise band, we revert.
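
The routing and revert logic might look something like the sketch below. The score cut-offs are hypothetical (the thresholds are not published here), and the keep rule encodes one assumed reading of the noise band: the refined draft must win by more than 3 points.

```python
NOISE_BAND = 3.0  # points, from the text above

# Hypothetical cut-offs: the route is chosen by composite score,
# but the actual thresholds are not stated.
POLISH_FLOOR = 75.0
SURGERY_FLOOR = 55.0


def choose_route(composite: float) -> str:
    """Pick a refinement severity from the composite score."""
    if composite >= POLISH_FLOOR:
        return "polish"
    if composite >= SURGERY_FLOOR:
        return "targeted_surgery"
    return "structural_rebuild"


def keep_refined(original: float, refined: float, noise_band: float = NOISE_BAND) -> bool:
    """Assumed reading: keep the refined draft only if it wins by more than the band."""
    return refined - original > noise_band
```

Under this reading, a 70 that comes back as a 72 is treated as scoring noise and reverted, while a 74 is kept.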

SuperStyle

Haiku-routed style-string optimizer. Produces the Suno "styles" field (700–900 chars) with Hayes/Brindell arrangement directives baked in when the env gate is set. Formatting task, not creative — cheap by design.

Personal Style Memory

Star any line on any song and it becomes a style anchor in your next forge. Capped at 50 stored lines; top 8 inject into the prompt. The model is instructed NOT to quote them verbatim — the goal is stylistic resonance, not repetition.
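
A minimal sketch of the store, under two labeled assumptions: when the 50-line cap is hit the oldest line is dropped, and "top 8" means the eight most recently starred lines. The class and function names are hypothetical.

```python
from collections import deque

MAX_STORED = 50    # cap on starred lines, from the text above
MAX_INJECTED = 8   # how many anchors reach the next forge prompt


class StyleMemory:
    def __init__(self) -> None:
        # Assumption: once the cap is hit, the oldest starred line is dropped.
        self._lines = deque(maxlen=MAX_STORED)

    def __len__(self) -> int:
        return len(self._lines)

    def star(self, line: str) -> None:
        line = line.strip()
        if line and line not in self._lines:
            self._lines.append(line)

    def anchors(self) -> list:
        # Assumption: "top 8" means the 8 most recently starred, newest first.
        return list(self._lines)[-MAX_INJECTED:][::-1]


def anchor_block(memory: StyleMemory) -> str:
    """Render anchors with the do-not-quote instruction the text describes."""
    header = "Style anchors (echo the voice, do NOT quote verbatim):"
    return "\n".join([header] + [f"- {line}" for line in memory.anchors()])
```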

Four Berklee-grade techniques, baked in

The techniques below come from published Berklee faculty work: Pat Pattison’s Setting Your Words to Music, Andrea Stolpe’s Popular Lyric Writing: 10 Steps to Effective Storytelling, and the Hayes & Brindell arrangement curriculum. No competitor we’ve found cites its pedagogy. We do, because the pedagogy is what makes the output actually work.

Destination writing (Stolpe)

Every song has a "destination" — the single emotional line the entire lyric walks the listener toward. The enrich button computes this upfront so the forge writes TOWARD something, not just AROUND a topic.

Prosody (Pattison)

Preserve the natural shape of the language. Stressed content words land on strong beats; unstressed function words recede. The forge prompt enforces this; the gauntlet has a line-stress classifier that flags weak endings and stress clusters as wounds to fix.

Arrangement directives (Hayes / Brindell)

Not just WHAT the song says but HOW it unfolds sonically. The SuperStyle step encodes arrangement intent into the Suno style string when enabled — sparse-to-dense verse-to-chorus lift, backing-vocal layering cues, instrumental entry staging.

External/Internal detail (Stolpe)

Verses lean external (concrete, image-evoking, grounds the listener). Choruses lean internal (thought, emotion, names the narrator’s state). The client-side classifier surfaces an E/I chip per line so you can see when a chorus is all abstraction or a verse is all thought.

Diagnostic axes outside the rubric

The 12-metric Lyric Scoring Standard deliberately doesn’t cover every axis a model could judge. Some craft signals are too fragile for a model to judge consistently but reliable enough for a deterministic heuristic, so we run those client-side, for free, and surface them as diagnostics. They don’t move the composite. They DO feed the gauntlet fix list so weak lines get rewritten, not just flagged.

Singability lint

Per-line warnings: long line (>14 syllables), very-long (>20), variance outlier, weak ending (function-word close), stress cluster (5+ consecutive content words). All fire client-side at zero cost; all feed the gauntlet fix list.
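
The per-line checks can be sketched with the thresholds listed above. Everything else here is an assumption: the syllable counter is a rough vowel-group estimate, the function-word list is a small illustrative sample, and the variance-outlier check is left out because it needs the whole block rather than a single line.

```python
import re

# Small illustrative function-word list; a real lint would use a fuller one.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at", "for",
    "with", "by", "from", "as", "is", "am", "are", "was", "were", "be", "it",
    "i", "you", "he", "she", "we", "they", "me", "my", "your", "that", "this",
}


def count_syllables(word: str) -> int:
    """Rough estimate: count vowel groups, discounting a silent trailing 'e'."""
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def lint_line(line: str) -> list:
    words = re.findall(r"[a-z']+", line.lower())
    if not words:
        return []
    warnings = []
    syllables = sum(count_syllables(w) for w in words)
    if syllables > 20:
        warnings.append("very-long")
    elif syllables > 14:
        warnings.append("long")
    if words[-1] in FUNCTION_WORDS:
        warnings.append("weak-ending")  # line closes on a function word
    run = longest = 0
    for w in words:
        run = 0 if w in FUNCTION_WORDS else run + 1
        longest = max(longest, run)
    if longest >= 5:
        warnings.append("stress-cluster")  # 5+ consecutive content words
    return warnings
```

For example, "I left the porch light on for you" closes on a function word and would come back with a single weak-ending warning.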

Meter variance

Coefficient of variation on syllable counts within each section. Tight blocks read as "locked in"; loose blocks read as drafts. Surfaced as a grade chip on the result screen.

POV stability

Per-line first-/second-/third-person classification. Tracks mid-song narrator drift — one of the highest-signal craft problems AI lyrics tend to produce and the eval panel tends to miss.

Banned terms

362 AI clichés (neon, echo, shatter, tapestry, etc.) scanned post-generation. Violations trigger a targeted Haiku cleanup pass; the song never ships with them.
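
A post-generation scan of this kind could be sketched as below. Only the four terms named above are included, and catching simple inflections (echoes, shattered) is an assumption about how matching works, not a documented behavior.

```python
import re

# Four of the terms named above; the full list is cited as 362 entries.
BANNED_TERMS = ["neon", "echo", "shatter", "tapestry"]

# Assumption: simple inflections (echoes, shattered) should also be caught.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BANNED_TERMS)) + r")(?:s|es|ed|ing)?\b",
    re.IGNORECASE,
)


def find_banned(lyrics: str) -> list:
    """Return (1-based line number, matched base term) for every hit."""
    hits = []
    for number, line in enumerate(lyrics.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((number, match.group(1).lower()))
    return hits
```

The line numbers are what a targeted cleanup pass would need in order to rewrite only the offending lines.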

Refine: one family, five flavors

Every refinement flavor in SongForgeAI belongs to one family. Auto-Refine runs after every forge. Boost and Fix Wounds are on-demand upgrades. Cold Reader and Refine from Paste are specialized tools.

Auto-Refine

automatic·all tiers

Runs automatically after every forge. Severity-routed (polish / targeted / structural) based on the eval score. Keeps the refined version only if it beat the original within a 3-point noise band.

Refine from Paste

on-demand·all tiers

Paste existing lyrics (yours or anyone’s). Lock lines you want preserved; set a preservation level. Get a before/after score comparison. Starting point for working on a song the forge didn’t write.

Boost

on-demand·paid tiers

Iterative lift toward 90+. Runs up to 5 additional refinement rounds with plateau detection, keeping the highest-scoring version across all iterations. Use when a good song should be a great one.
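
One way to picture the loop: refine, re-score, remember the best version, and stop early when scores stop improving. This is a sketch under assumptions. The `refine` and `score` callables stand in for the real model calls, the `patience` parameter is a hypothetical plateau rule, and each round here refines the previous round's output rather than the best-so-far.

```python
from typing import Callable

MAX_ROUNDS = 5  # from the text above


def boost(
    lyrics: str,
    refine: Callable[[str], str],
    score: Callable[[str], float],
    patience: int = 2,  # hypothetical: rounds without a new best before stopping
) -> tuple:
    """Return (best lyrics, best score) across all rounds."""
    best, best_score = lyrics, score(lyrics)
    current, stale = lyrics, 0
    for _ in range(MAX_ROUNDS):
        current = refine(current)
        current_score = score(current)
        if current_score > best_score:
            best, best_score, stale = current, current_score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # plateau: recent rounds stopped improving
    return best, best_score
```

Because the best version is tracked separately from the current draft, a bad round late in the run never costs the user an earlier, better result.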

Fix Wounds

on-demand·paid tiers

Targeted rewrite of the specific lines the eval panel flagged as weak. Preserves everything else. Use when the song is mostly right but one or two lines are dragging it down.

Cold Reader

on-demand·admin only

Adversarial re-read. Reads the lyric as if hearing it for the first time, with no context on what you meant. Surfaces the lines that DO NOT land with a stranger.

Craft analyzers

Deterministic, zero-API-cost diagnostics that run alongside the eval panel. They describe the lyric's shape, meter, POV, rhyme, and singability — but never change the composite score. The Lyric Scoring Standard is the rubric; these are the instruments.

Emotional Arc

Shape·v1.0·shipped b945

Analyzes section-by-section valence + arousal to classify the song's emotional trajectory as flat, single-axis, shaped, or volatile.

Fire Line Classifier

Score·v1.0·shipped b943

Deterministic 7-feature score (0-100) for whether a line is memorable enough to be quoted back months later. Complements the eval panel's subjective Transcendent Lines judgment.

Detail Balance (Stolpe)

Axis·v1.1·shipped b929

Measures external/internal detail ratio per section and checks it against the Stolpe rule: verses lean external (world/body), choruses lean internal (feeling/meaning).

Singability Lint

Lint·v1.0·shipped b903

Flags lines likely to be awkward to sing: lines with too many or too few syllables, and extreme variance within a block. Warning-only; does not affect the composite.

Rhyme Scheme Analyzer

Axis·v1.1·shipped b937

Classifies line-end rhymes as perfect / family / slant / none using a curated CMU-dict lookup with heuristic fallback. Powers the "Rhyme Intelligence" eval input.
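
The four-way classification can be illustrated with a toy lookup. This is a sketch, not the shipped analyzer: the tiny ARPAbet table stands in for the curated CMU-dict lookup, the heuristic fallback is omitted, the consonant families follow Pattison's plosive/fricative/nasal grouping, and treating "same vowel, unrelated consonants" as slant is an assumption.

```python
# Tiny hand-entered ARPAbet table standing in for the curated CMU-dict lookup.
PRONUNCIATIONS = {
    "love": ["L", "AH1", "V"],
    "above": ["AH0", "B", "AH1", "V"],
    "enough": ["IH0", "N", "AH1", "F"],
    "mud": ["M", "AH1", "D"],
    "rain": ["R", "EY1", "N"],
}

# Consonant families in Pattison's sense: plosives, fricatives, nasals.
FAMILIES = [
    {"B", "D", "G", "P", "T", "K"},
    {"V", "DH", "Z", "ZH", "JH", "F", "TH", "S", "SH", "CH"},
    {"M", "N", "NG"},
]


def rhyme_tail(word: str) -> list:
    """Phonemes from the last primary-stressed vowel to the end of the word."""
    phones = PRONUNCIATIONS[word]
    start = max(i for i, p in enumerate(phones) if p.endswith("1"))
    return phones[start:]


def classify_rhyme(a: str, b: str) -> str:
    tail_a, tail_b = rhyme_tail(a), rhyme_tail(b)
    if tail_a == tail_b:
        return "perfect"
    if tail_a[0] != tail_b[0]:
        return "none"  # different stressed vowel
    same_family = (
        len(tail_a) == len(tail_b) == 2
        and any(tail_a[1] in fam and tail_b[1] in fam for fam in FAMILIES)
    )
    return "family" if same_family else "slant"
```

In this toy table, love/above is perfect, love/enough is family (V and F are both fricatives), love/mud is slant, and love/rain is none.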

Meter Variance

Axis·v1.0·shipped b929

Syllable-count consistency within sections. Tight blocks (8/8/10/8) grade well; drift blocks (8/14/5/12) flag as drafts.
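
The tight and drift blocks above make a convenient worked example for the coefficient of variation (standard deviation divided by mean). Using the population standard deviation is an assumption, and no grade cut-offs are shown because none are published here.

```python
from statistics import mean, pstdev


def meter_cv(syllable_counts: list) -> float:
    """Coefficient of variation: standard deviation divided by the mean."""
    if not syllable_counts:
        raise ValueError("need at least one line")
    return pstdev(syllable_counts) / mean(syllable_counts)


tight = meter_cv([8, 8, 10, 8])    # about 0.10
drift = meter_cv([8, 14, 5, 12])   # about 0.36
```

Dividing by the mean is what lets a section of short lines and a section of long lines be graded on the same scale.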

POV Stability

Axis·v1.0·shipped b929

Detects mid-song narrator pronoun switches (I → he, we → you) that almost always read as craft problems, not style choices.
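
A deliberately naive sketch of the idea: tag each line by the first pronoun it contains, then report where the tag changes. Treating the first pronoun as the narrator is an assumption that will produce false positives (a line addressed to "you" inside a first-person song, for instance), and the function names are hypothetical.

```python
import re
from typing import Optional

PRONOUNS = {
    "first": {"i", "me", "my", "mine", "we", "us", "our", "ours"},
    "second": {"you", "your", "yours"},
    "third": {"he", "him", "his", "she", "her", "hers", "they", "them", "their", "theirs"},
}


def classify_line(line: str) -> Optional[str]:
    """Return the person of the first pronoun in the line, or None if there is none."""
    for word in re.findall(r"[a-z']+", line.lower()):
        base = word.split("'")[0]  # "i'm" -> "i", "she's" -> "she"
        for person, pronouns in PRONOUNS.items():
            if base in pronouns:
                return person
    return None


def narrator_shifts(lines: list) -> list:
    """List (1-based line number, previous person, new person) for each switch."""
    shifts, previous = [], None
    for number, line in enumerate(lines, start=1):
        person = classify_line(line)
        if person is None:
            continue  # pronoun-free lines neither set nor break the POV
        if previous is not None and person != previous:
            shifts.append((number, previous, person))
        previous = person
    return shifts
```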

Ops discipline

Versioned every deploy. Every push to main ships a new build number. The footer carries it live. Bug reports arrive with a build number attached; we can reproduce the exact production state.

Craft regression floor. CI gates on a test-count floor (930+ as of Build 929). Prompt-assembly invariants, warning-kind coverage, and detail-classifier golden cases are all pinned in one fixture; silent regressions fail the build.

Per-song craft telemetry. Every completed forge logs composite score, per-metric breakdown, prosody-warning count, external/internal detail ratio, and which craft levers were active. “Did feature X actually move output quality” becomes a SQL query instead of a vibe check.

Route-export guardrail. A CI validator prevents the class of Vercel failure where a Next.js App Router route accidentally exports a helper (seen Build 906). One class of bug, one rule, permanent closure.
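
A validator of this kind could be as small as the sketch below, which scans a route file's source for named exports and flags anything outside an allow-list. The allow-list reflects the HTTP method handlers and route segment config names as we understand the Next.js conventions, so it should be checked against the docs for the version in use. The regex is a simplification: it ignores `export { ... }` lists, default exports, and type-only exports.

```python
import re

HTTP_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"}
SEGMENT_CONFIG = {
    "dynamic", "dynamicParams", "revalidate", "fetchCache",
    "runtime", "preferredRegion", "maxDuration", "generateStaticParams",
}
ALLOWED = HTTP_METHODS | SEGMENT_CONFIG

# Matches `export const X`, `export function X`, `export async function X`, etc.
_EXPORT = re.compile(
    r"^\s*export\s+(?:async\s+)?(?:const|let|var|function|class)\s+([A-Za-z_$][\w$]*)",
    re.MULTILINE,
)


def illegal_exports(route_source: str) -> list:
    """Names exported from a route file that are not handlers or segment config."""
    return [name for name in _EXPORT.findall(route_source) if name not in ALLOWED]
```

Run in CI over every route file, a non-empty result fails the build before the deploy ever reaches the platform.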

SSE safety budgets. All four streaming endpoints share a 285s safety timeout (Vercel kills at 300s). Every stream has a streamOpen flag, retry-on-critical plumbing, and graceful client-side 2x retry on network failure.
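
The budget idea is language-independent: wrap the stream so it closes itself cleanly before the platform's hard kill. The sketch below uses Python's asyncio for illustration only. The wrapper name and the `timeout` event frame are hypothetical, and the retry plumbing is not shown.

```python
import asyncio
import time
from typing import AsyncIterator

PLATFORM_LIMIT_S = 300   # hard kill, per the text above
SAFETY_TIMEOUT_S = 285   # shared budget, leaving 15s to close cleanly

TIMEOUT_FRAME = "event: timeout\ndata: {}\n\n"  # hypothetical SSE frame


async def with_budget(
    source: AsyncIterator[str], budget_s: float = SAFETY_TIMEOUT_S
) -> AsyncIterator[str]:
    """Relay chunks from source, ending with a timeout frame if the budget runs out."""
    deadline = time.monotonic() + budget_s
    iterator = source.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            yield TIMEOUT_FRAME
            return
        try:
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return  # source finished inside the budget
        except asyncio.TimeoutError:
            yield TIMEOUT_FRAME
            return
        yield chunk
```

Sending an explicit final frame is what lets the client tell a budget cut-off from a dropped connection and decide whether to retry.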

The methodology is the product.

No competitor we’ve found cites its pedagogy, publishes its rubric, or gates itself on a craft regression floor. The song you get out of this pipeline doesn’t win because the model is bigger; it wins because the system around the model was built by people who read Pattison and Stolpe and kept score.
