Craft · 2026-05-19 · 7 min read

SA#17 — The Orthogonal Question: Why Fidelity Is Different From Quality

You hand the AI a brief: "a song about my grandmother's last summer at the lake house, country, 3 verses + chorus + bridge, no clichés about hands or hearts." The AI gives you back a craft-clean, image-rich, banned-cliché-free song about... a breakup. Or about generic loneliness. The quality is real. The brief was ignored. This is the failure mode Sacred Accident #17 named, and it's the reason SongForgeAI now publishes a second standard alongside the Lyric Scoring Standard: the Fidelity Standard.

The accident

In early 2026 we ran a five-song stress test. Five hard prompts, each with explicit anchors: an old water tower in a dying Midwest town, seventeen years of maintenance, a mother's disappearance, the tower as a character, the last place she felt safe. We watched the songs come back.

The 12-metric quality rubric scored all five in the 80s. They were craft-clean. Image-rich. Banned-cliché-free. Anti-platitude-clean. By every measure our existing rubric tracked, they were good songs.

One of them was a breakup song. Another was a generic loneliness song. A third drifted into a Hozier-tier arena chorus that mentioned the tower exactly once. Only the fourth and fifth songs were actually about what the prompt asked for.

The quality scores were not wrong. The lyrics were good. They just weren't the songs the prompt asked for.

This is the accident SA#17 names: "The system can write a good song without writing the same song."

Two questions, not one

The Lyric Scoring Standard (v1.0, published 2026-04-23) answers "is the song good?" Twelve metrics across craft, expression, and impact. Weighted into one composite. Anchored at 95 against a 1949 country song that survived seventy-five years.

The Fidelity Standard (v0.1.0, published 2026-05-19) answers "did the system write the song you asked for?" Seven components, separately weighted, producing a different composite. Both standards are CC BY 4.0 licensed; both are published at /scoring/standard.

Critically: these are orthogonal questions. A song can:

  • Score 90 on quality + 35 on fidelity. (Great-sounding song, wrong prompt.)
  • Score 70 on quality + 92 on fidelity. (Faithful execution of a hard brief, still has craft work left to do.)
  • Score 88 on quality + 88 on fidelity. (Good song that actually served the prompt — the goal.)

The two numbers don't substitute for each other. The 12-metric quality rubric doesn't measure brief-adherence; the 7-component fidelity composite doesn't measure craft. You need both axes to answer "is this lyric usable?"

What the seven fidelity components measure

The fidelity composite weighs seven components:

  • Premise match (30%). Did the lyric serve the named premise? This is the load-bearing component. A Haiku-tier judgment with cited evidence lines.
  • Anchor coverage (25%). Did each required detail land in a vivid line? Heuristic — finds the best-matching sung line per anchor + scores 0/33/66/100 on specificity-signal density.
  • Structure compliance (15%). Does the section map match what was asked? Regex — compares brief.structure to actual section markers with partial credit for adjacent-equivalent sections.
  • Style constraints (15%). Were the style rules honored? Thesis-language detector + per-section specificity scorer.
  • Forbidden language (5%). Did the lyric avoid the banned words? Pure regex.
  • Chorus evolution (5%). Did the chorus actually shift across the song? Detects byte-shift across positions + verse-to-chorus content overlap.
  • Earned transcendence (5%). Did a V1 image return in the final chorus with transformed meaning? Jaccard similarity of non-image content words.
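Two of these checks are simple enough to sketch. The Python below is an illustration only, not the SongForgeAI audit code: `find_forbidden`, `content_words`, and `jaccard` are hypothetical names, the stopword list is a placeholder, and the standard's own tokenization and scoring rules are not reproduced here.

```python
import re

# Placeholder stopword list (an assumption; the standard's real list is not shown here).
STOPWORDS = {"a", "an", "the", "and", "of", "in", "on", "to", "my", "is", "was"}


def find_forbidden(lyric: str, banned: list[str]) -> list[str]:
    """Return each banned word that appears in the lyric as a whole word."""
    hits = []
    for word in banned:
        # \b keeps "hands" from matching inside "handsome"; IGNORECASE catches "Hands".
        if re.search(rf"\b{re.escape(word)}\b", lyric, flags=re.IGNORECASE):
            hits.append(word)
    return hits


def content_words(line: str) -> set[str]:
    """Lowercase the line, split it into words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOPWORDS}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = content_words(a), content_words(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0
```

For the lake-house brief, `find_forbidden("Her hands on the dock rail", ["hands", "hearts"])` returns `["hands"]`, and `jaccard("cold water tower", "cold water rising")` returns `0.5` because the two lines share two of their four distinct content words. How the audit turns a similarity value into a component score is not covered by this sketch.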

Premise and anchors together carry 55% of the weight. The premise is the question; the anchors are the supporting details. Everything else is secondary discipline.
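The weighting can be sketched as a plain weighted average. This is a minimal illustration, assuming every component is scored 0-100 and none is N/A; the key names and the `fidelity_composite` function are hypothetical, and the published formula may treat caps and N/A verdicts differently.

```python
# Weights in percent, as listed above. They sum to 100.
FIDELITY_WEIGHTS = {
    "premise_match": 30,
    "anchor_coverage": 25,
    "structure_compliance": 15,
    "style_constraints": 15,
    "forbidden_language": 5,
    "chorus_evolution": 5,
    "earned_transcendence": 5,
}


def fidelity_composite(scores: dict[str, float]) -> float:
    """Weighted average of the seven component scores (each assumed 0-100)."""
    missing = set(FIDELITY_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    total = sum(FIDELITY_WEIGHTS[name] * scores[name] for name in FIDELITY_WEIGHTS)
    return total / sum(FIDELITY_WEIGHTS.values())
```

With made-up numbers, a lyric that scores 20 on premise match and 33 on anchor coverage but 100 on every other component comes out at 59.25. Perfect secondary discipline cannot rescue a song that missed the premise and its anchors.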

Why this matters for working songwriters

If you're writing for a sync placement, a film project, a country radio target, or a worship anthem with theological constraints — the brief IS the assignment. A great-sounding song that ignored the brief isn't usable. You can't ship it. You can't even iterate from it without restarting against the actual prompt.

Most AI lyric tools optimize the generation step and let "does the output match the prompt?" become a vibes-based human check. That's fine for hobbyist generation. It's not fine for assignment work.

Two-grade scoring lets you treat the brief as the contract. When the fidelity grade is low and the quality grade is high, you know exactly what happened: the system wrote well, but off-prompt. The fix is targeted: re-forge with sharper anchors, or use Refine Mode to rewrite the specific sections that drifted from the premise.

When the quality grade is low and the fidelity grade is high, the opposite happened: the system served the prompt, but the craft is weak. Run the gauntlet, target the weak metrics, and keep the structural choices.
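The two-grade reading can be sketched as a small decision function. This is an illustration, not product code: the `triage` name, the case labels, and the cutoff of 80 between "high" and "low" are all assumptions.

```python
def triage(quality: float, fidelity: float, threshold: float = 80.0) -> str:
    """Classify a song by its two grades. The threshold is an assumed cutoff."""
    high_quality = quality >= threshold
    high_fidelity = fidelity >= threshold
    if high_quality and high_fidelity:
        return "on_target"    # good song that served the prompt
    if high_quality:
        return "off_prompt"   # wrote well, wrong song: re-forge or refine drifted sections
    if high_fidelity:
        return "weak_craft"   # served the prompt: run the gauntlet, keep the structure
    return "both_low"         # case not discussed above; label added for completeness
```

Under that assumed cutoff, `triage(90, 35)` returns `"off_prompt"`, `triage(70, 92)` returns `"weak_craft"`, and `triage(88, 88)` returns `"on_target"`.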

The dashboard signals

Every song forged after Build 2771 carries a persisted fidelity audit. The dashboard surfaces it in three places:

  • Hero band, side-by-side with quality. Two pills, both color-coded. Complexity-gated: a light brief (just "a song about morning") hides the fidelity chip unless the fidelity score falls below 80; a heavy brief (5 anchors + style constraints + structure requirement) makes fidelity the headline.
  • FidelityPanel deep disclosure. Click into a song; the panel breaks down all seven components, shows the premise audit's verdict + evidence, lists any sensory-rewrite pairs the gauntlet executed, and shows the planned-vs-shipped chorus evolution.
  • /dashboard?tab=fidelity histogram. Aggregate across your catalog: how many faithfully-served / mostly-served / drifts / wrong-song verdicts. Coverage percentage. Recent win + recent miss.
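The complexity gate on the hero band can be sketched as follows. Only the below-80 rule and the 0-10 complexity scale come from this guide; the `fidelity_chip` name, the `"hidden"` and `"secondary"` labels, and both complexity cutoffs are placeholders, not values from the standard.

```python
def fidelity_chip(complexity: int, fidelity: float,
                  light_max: int = 3, primary_min: int = 7) -> str:
    """Return 'hidden', 'secondary', or 'primary' for the hero-band fidelity chip.

    complexity is the 0-10 brief-complexity score. light_max and primary_min
    are assumed cutoffs chosen for illustration only.
    """
    if not 0 <= complexity <= 10:
        raise ValueError("complexity must be between 0 and 10")
    if complexity >= primary_min:
        return "primary"      # heavy brief: fidelity is the headline
    if complexity <= light_max and fidelity >= 80:
        return "hidden"       # light brief and no fidelity problem to flag
    return "secondary"        # shown beside quality without leading
```

With these placeholder cutoffs, a light brief that scores 92 on fidelity hides the chip, the same brief at 60 shows it, and a heavy brief always leads with it.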

The leaderboard ranks by composite too: 0.6 × quality + 0.4 × fidelity. A great-quality song that ignored the brief drops; a faithful song with mediocre craft rises.
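A short sketch of that ranking, applied to the three score pairs from the orthogonality examples earlier in this guide. The function name and song labels are illustrative.

```python
def leaderboard_score(quality: float, fidelity: float) -> float:
    """Leaderboard composite: 0.6 x quality + 0.4 x fidelity."""
    return 0.6 * quality + 0.4 * fidelity


# (quality, fidelity) pairs taken from the examples earlier in this guide.
songs = {
    "great song, wrong prompt": (90, 35),
    "faithful, craft work left": (70, 92),
    "good and on-brief": (88, 88),
}

# Highest composite first.
ranked = sorted(songs, key=lambda name: leaderboard_score(*songs[name]), reverse=True)
```

The composites work out to roughly 88 for the on-brief song, 78.8 for the faithful one, and 68 for the off-prompt one, so the quality-90 song that ignored its brief ranks last of the three.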

When fidelity matters more than quality

Some assignments make fidelity the dominant axis:

  • Sync placement briefs. The director wrote the brief. The brief IS the assignment. A lyrically beautiful off-prompt cut fails the brief, not the director's taste.
  • Theological constraint work. Worship songs, hymn-form work, scripture-reframe songs. The lyric must satisfy doctrinal precision. A craft-clean off-doctrine line is unusable even at quality 95.
  • Genre-canonical work. A country radio song must sound like country radio. A pop-punk song must hit the pop-punk register. Mode-specific style constraints make fidelity load-bearing.
  • Story-twist songwriting. If the brief asks for a specific narrative reveal at the bridge, the lyric must execute it. Beautiful execution of a different twist fails the assignment.

Some assignments are looser. Surprise Mode forges with no brief at all, so the fidelity composite scores trivially: there is no brief to honor. Hobbyist generation often falls in this bucket. The two-grade system gates which questions matter; the complexity-bucket math handles the gating mechanically.

Read the standard

The Fidelity Standard v0.1.0 lives at /scoring/standard/fidelity. It's a public, citable, CC BY 4.0-licensed document covering the seven components, the composite formula, the grade calibration (A+ through F, checklist-completion semantics), the brief-complexity 0-10 formula, and the three UX prominence buckets.

RFC-0010 keeps the public comment window open through 2026-05-26. The five open questions for the comment phase live in the RFC body: weight allocation, constraint-mode caps, complexity threshold for the 'primary' bucket, chorus + transcendence weight ceilings, and the semantics of an N/A verdict. Send comments by email to todd@songforgeai.com with the subject "RFC-0010 comment".

After the comment window closes, v1.0.0 ships with any incorporated changes. The audit implementation tracks separately in code at FIDELITY_AUDIT_VERSION (currently 1.5.0).
