The Fidelity Standard
A public open standard for measuring AI lyric fidelity.
Version 0.2.0 · Published 2026-05-20 · CC BY 4.0
1. Fidelity is the orthogonal question
The Lyric Scoring Standard measures lyric quality — is the song good? Twelve metrics across craft, expression, and impact, weighted into one composite.
This standard measures the orthogonal axis: fidelity — did the lyric serve the brief the user gave it? A song can score 90 on quality and 35 on fidelity (a great-sounding song about something other than what was asked for). A song can score 70 on quality and 92 on fidelity (a faithful execution of a hard brief that still has craft work left to do). Both numbers matter, and they answer different questions.
The SongForgeAI Constraint-Aware Forge measures fidelity through eight components, weighted into one composite score (0–100) with a letter grade (A+ through F). The composite is complexity-gated at the UX layer: a light brief gets the chip hidden by default (low signal-to-noise); a heavy brief gets the chip displayed as the headline (the user asked for a hard target — the first answer they want is whether the system hit it).
2. The eight components
Each component scores 0–100. Null components (the brief didn't ask for that constraint OR the lyric lacks the structural prerequisite) are excluded from the composite, and their weight redistributes proportionally across the remaining components. The weighted sum is the composite, with constraint-mode multipliers applied last (see §3).
| Component | Weight | Question it answers |
|---|---|---|
| Premise match | 30% | Did the lyric serve the named premise? |
| Anchor coverage | 25% | Did each required detail land in a vivid line? |
| Structure compliance | 15% | Does the section map match what was asked? |
| Style constraints | 15% | Were the style rules honored (sensory-only, etc.)? |
| Forbidden language | 5% | Did the lyric avoid the banned words? |
| Chorus evolution | 5% | Did the chorus actually shift across the song? |
| Earned transcendence | 5% | Did a V1 image return with transformed meaning? |
| Register adherence | 6% | Did the lyric match the requested register (clean / mature / explicit)? |
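Premise (30%) + anchor coverage (25%) = 55% load-bearing weight per Sacred Accident #17. Structure, style, and the three smaller craft-discipline checks carry the remaining 45%, so the seven components that are active today sum to 100%. Register adherence (6%) is null by default (see Register adherence in §2.1); when a brief does set it, the listed weights total 106% and are renormalized to 1.0 across contributing components, as described in §3.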
2.1 Component-by-component
Premise match (30%) — Haiku judgment
A Haiku-tier judge reads the brief premise and the lyric, then returns a verdict (faithfully-served / mostly-served / drifts / wrong-song) with reasoning, a one-sentence echo of what the lyric IS about, and 2–4 cited evidence lines. The verdict is paired with a 0–100 score in four bands:
- 81–100: faithfully-served — premise is unmistakably the through-line
- 61–80: mostly-served — premise is the through-line with weaknesses
- 31–60: drifts — premise served partially with significant deviation
- 0–30: wrong-song — lyric is about something other than the brief
The premise-echo field is load-bearing: if the model's 1-sentence read of the lyric's topic doesn't match the brief premise (in subject, not phrasing), the verdict cannot be faithfully-served.
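The band table and the premise-echo gate above can be sketched as two small functions. This is a minimal illustration, not the audit's actual code: the function names are hypothetical, and capping a failed echo at 80 (the top of the mostly-served band) is an assumption, since the standard only says the verdict cannot be faithfully-served.

```typescript
// Hypothetical names; not the published package's API.
type PremiseVerdict =
  | "faithfully-served"
  | "mostly-served"
  | "drifts"
  | "wrong-song";

// Map a 0–100 premise score onto the four verdict bands.
function verdictForPremiseScore(score: number): PremiseVerdict {
  if (score >= 81) return "faithfully-served";
  if (score >= 61) return "mostly-served";
  if (score >= 31) return "drifts";
  return "wrong-song";
}

// Premise-echo gate: if the judge's one-sentence echo does not match the
// brief's subject, the verdict may not be "faithfully-served".
// Assumption: a failed gate caps the score at 80.
function applyEchoGate(score: number, echoMatchesBrief: boolean): number {
  return echoMatchesBrief ? score : Math.min(score, 80);
}
```

For example, a judge score of 95 with a mismatched echo would be read as `verdictForPremiseScore(applyEchoGate(95, false))`, which lands in the mostly-served band under this assumption.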
Anchor coverage (25%) — heuristic
Each anchor phrase from the brief (e.g. “17 years maintenance,” “the kitchen at midnight”) is scored 0–100 in four bands by finding the best matching sung line and counting specificity signals (concrete nouns, physical verbs, sensory words) in it:
- 100: delivered — key tokens appear in a line with 3+ specificity signals
- 66: partial — key tokens appear in a line with 1–2 signals
- 33: mentioned-bare — key tokens appear in an abstract line
- 0: missing — key tokens don't appear in any sung line
The 4-band scale matches the verdict shape used by chorus evolution + transcendence so the dashboard's per-component panels read consistently.
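The four anchor bands reduce to a small decision function once the best matching line and its specificity-signal count are known. The sketch below assumes both are computed elsewhere; the names are hypothetical, and averaging per-anchor scores into the component score is an assumption, because the standard does not state the aggregation rule.

```typescript
// Hypothetical names; token matching and signal counting are out of scope.
type AnchorBand = {
  score: 0 | 33 | 66 | 100;
  label: "missing" | "mentioned-bare" | "partial" | "delivered";
};

// Band a single anchor from (a) whether its key tokens appear in any sung
// line and (b) the specificity-signal count of the best matching line.
function bandForAnchor(keyTokensFound: boolean, specificitySignals: number): AnchorBand {
  if (!keyTokensFound) return { score: 0, label: "missing" };
  if (specificitySignals >= 3) return { score: 100, label: "delivered" };
  if (specificitySignals >= 1) return { score: 66, label: "partial" };
  return { score: 33, label: "mentioned-bare" };
}

// Assumption: the component score is the mean of the per-anchor scores,
// and the component is null when the brief names no anchors.
function anchorCoverage(bands: AnchorBand[]): number | null {
  if (bands.length === 0) return null;
  return bands.reduce((sum, band) => sum + band.score, 0) / bands.length;
}
```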
Structure compliance (15%) — heuristic
The brief's requested section sequence (e.g. verse-chorus-verse-chorus-bridge-chorus) is compared against the actual section markers in the lyric. Exact-position matches earn full credit; adjacent-equivalent substitutions (chorus ↔ hook ↔ refrain) earn 75%; additive sections (pre-chorus inserted between requested verse + chorus) don't penalize. Per-section credit is averaged into the component score; extras subtract up to 30%.
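The per-section credit rule can be illustrated with a deliberately simplified sketch. It covers only the exact-match and chorus-family substitution credits, compared position by position. It assumes additive sections such as a pre-chorus have already been removed from the actual sequence, and it omits the extras penalty, whose exact rule is not given here. All names are hypothetical.

```typescript
// Sections treated as adjacent-equivalent per the text above.
const CHORUS_FAMILY = new Set(["chorus", "hook", "refrain"]);

// Credit for one requested position: 1 for an exact match, 0.75 for a
// chorus/hook/refrain substitution, 0 otherwise (including a missing slot).
function sectionCredit(requested: string, actual: string | undefined): number {
  if (actual === undefined) return 0;
  if (requested === actual) return 1;
  if (CHORUS_FAMILY.has(requested) && CHORUS_FAMILY.has(actual)) return 0.75;
  return 0;
}

// Average the per-position credits into a 0–100 score.
// Null when the brief requested no structure.
function structureScore(requested: string[], actual: string[]): number | null {
  if (requested.length === 0) return null;
  const total = requested.reduce(
    (sum, section, i) => sum + sectionCredit(section, actual[i]),
    0,
  );
  return (100 * total) / requested.length;
}
```

Under these assumptions, a request for verse-chorus-verse-chorus answered with verse-hook-verse-chorus earns (1 + 0.75 + 1 + 1) / 4, or 93.75.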
Style constraints (15%) — heuristic
When the brief carries a sensory-only / show-don't-tell / no-thesis constraint, every thesis-shaped line (declarative analysis, moral summary, therapy register, essay thesis) is a wound. Score = 100 minus density penalty. When the constraint isn't set, thesis flags are informational only; the style component scores from per-section image-density alone.
Forbidden language (5%) — heuristic
The user can name phrases the lyric must avoid (on top of the universal banned-terms list). Score = 100 − 10 × violations.
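As a rough sketch of the formula above: the code below counts violations and applies the 10-point penalty. Two things are assumptions rather than part of the standard: every occurrence counts as a separate violation, and matching is a case-insensitive substring search. The floor at 0 follows from components scoring 0–100. Function names are hypothetical.

```typescript
// Count case-insensitive, non-overlapping occurrences of each banned phrase.
// Assumption: each occurrence is one violation.
function countViolations(lyric: string, bannedPhrases: string[]): number {
  const haystack = lyric.toLowerCase();
  let violations = 0;
  for (const phrase of bannedPhrases) {
    const needle = phrase.trim().toLowerCase();
    if (needle.length === 0) continue; // ignore empty entries
    let from = haystack.indexOf(needle);
    while (from !== -1) {
      violations += 1;
      from = haystack.indexOf(needle, from + needle.length);
    }
  }
  return violations;
}

// Score = 100 − 10 × violations, floored at 0.
function forbiddenLanguageScore(violations: number): number {
  return Math.max(0, 100 - 10 * violations);
}
```

A production check would likely match on word boundaries so that a banned "art" does not fire on "heart"; the substring search here is only for brevity.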
Chorus evolution (5%) — heuristic
Hit songs don't repeat the chorus verbatim — even when the words stay identical, the context shifts so the listener hears a different chorus by the final position. The audit detects (a) whether the chorus body actually byte-shifted across positions, and (b) whether verses reference back to chorus content (a signal that context-shift execution is happening even on verbatim repeats). Same 4-band scale as anchor coverage.
Earned transcendence (5%) — heuristic
A song earns its transcendence when a concrete image planted in verse 1 returns in the final chorus with transformed surrounding words — same physical detail, new weight. The audit identifies shared concrete-noun image tokens between V1 and the final chorus, then scores the surrounding-word similarity (Jaccard, non-image content words):
- 100: transformed — image returns with significant word-shift
- 66: partial — image returns with some shift
- 33: verbatim — image returns but lines are near-identical
- 0: missing — no V1 concrete image returns
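The surrounding-word comparison rests on Jaccard similarity: the size of the intersection of two word sets divided by the size of their union. The sketch below shows only that calculation. The helper names are hypothetical, stop-word filtering is omitted, and the similarity thresholds that separate the transformed, partial, and verbatim bands are not specified in this document, so they are left out.

```typescript
// Jaccard similarity |A ∩ B| / |A ∪ B| over two word sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  // Assumption: two empty contexts are treated as identical.
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Lower-cased words of a line with the shared image tokens removed,
// so that only the surrounding words are compared.
function contextWords(line: string, imageTokens: Set<string>): Set<string> {
  const words: string[] = line.toLowerCase().match(/[a-z']+/g) ?? [];
  return new Set(words.filter((word) => !imageTokens.has(word)));
}
```

A low similarity between the verse 1 line and the final-chorus line means the image came back in different company, which is the signal the transformed band looks for.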
Register adherence (6%) — heuristic
A brief can specify one of three registers: clean (radio-safe, no profanity), mature (adult themes, profanity as story serves), or explicit (full creative latitude). The audit detects strong-language markers in the lyric body, infers the de-facto register from marker density, and scores the gap against what the brief requested. Today’s forge surfaces do not set a register constraint — the component is null-by-default until the Explicit-Mode toggle (a separate, deferred work item) lands. The framework is documented in advance so implementers can prepare:
- 95: register-matched — requested and inferred align
- 70: mild-drift — one band off (clean↔mature, mature↔explicit)
- 35: wrong-register — two-band drift (clean↔explicit)
- null: brief did not specify a register
The four hard nevers (sexual content involving minors, hate or slurs targeting protected groups, credible threats / doxxing, instructions facilitating illegal harm) remain blocked regardless of register — this component controls REGISTER, not the prohibition perimeter.
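For implementers preparing ahead of the toggle, the register scoring table can be read as a distance on an ordered scale. The sketch below assumes the de-facto register has already been inferred from marker density, which is not specified here. Names are hypothetical.

```typescript
type Register = "clean" | "mature" | "explicit";

// Registers in order of increasing latitude; drift is the index distance.
const REGISTER_ORDER: Register[] = ["clean", "mature", "explicit"];

// Returns null when the brief set no register, so the component drops out
// of the composite and its weight is redistributed.
function registerAdherenceScore(
  requested: Register | null,
  inferred: Register,
): number | null {
  if (requested === null) return null;
  const drift = Math.abs(
    REGISTER_ORDER.indexOf(requested) - REGISTER_ORDER.indexOf(inferred),
  );
  if (drift === 0) return 95; // register-matched
  if (drift === 1) return 70; // mild-drift
  return 35; // wrong-register (clean vs. explicit)
}
```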
3. The composite formula
    composite = constraintMultiplier × Σ (componentScore_i × weight_i)
                for i ∈ contributing components

    where:
      contributing = components with non-null scores
      weight_i normalized so Σ weights = 1.0 across contributing
      constraintMultiplier ∈ { strict: 0.95, standard: 1.0, loose: 1.15 }
      result clamped to [0, 100]

Null components (the brief didn't ask for that constraint OR the lyric lacks the prerequisite) are excluded and their weight redistributes proportionally, so a sparse brief is never penalized for things the user never asked for. The three constraint-mode multipliers encode the user's stance on how strictly fidelity is graded: strict (0.95) caps a perfect score below 100 to reflect that the user demanded zero deviation; standard (1.0) is neutral; loose (1.15) lets the user opt into deviation and rewards the result, with the clamp keeping the composite at or below 100.
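The composite formula can be sketched as follows. This is an illustrative reading of the formula, not the code in the published package: the function and key names are hypothetical, and returning null when every component is null is an assumption. The weights are the raw percentages from §2 and are normalized over whichever components contribute.

```typescript
type ConstraintMode = "strict" | "standard" | "loose";

// Raw weights from the component table (hypothetical key names).
const WEIGHTS: Record<string, number> = {
  premiseMatch: 30,
  anchorCoverage: 25,
  structureCompliance: 15,
  styleConstraints: 15,
  forbiddenLanguage: 5,
  chorusEvolution: 5,
  earnedTranscendence: 5,
  registerAdherence: 6,
};

const MULTIPLIERS: Record<ConstraintMode, number> = {
  strict: 0.95,
  standard: 1.0,
  loose: 1.15,
};

function fidelityComposite(
  scores: Record<string, number | null>,
  mode: ConstraintMode,
): number | null {
  let weightTotal = 0;
  let weightedSum = 0;
  for (const name of Object.keys(WEIGHTS)) {
    const score = scores[name];
    if (score === null || score === undefined) continue; // null components drop out
    const weight = WEIGHTS[name] ?? 0;
    weightTotal += weight;
    weightedSum += score * weight;
  }
  // Assumption: with no contributing components there is nothing to grade.
  if (weightTotal === 0) return null;
  // Dividing by weightTotal normalizes the contributing weights to 1.0.
  const raw = MULTIPLIERS[mode] * (weightedSum / weightTotal);
  return Math.min(100, Math.max(0, raw)); // clamp to [0, 100]
}
```

With only premise (90) and anchor coverage (100) contributing in standard mode, the normalized weights become 30/55 and 25/55, giving a composite of roughly 94.5.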
4. Grade calibration
| Grade | Range | Meaning |
|---|---|---|
| A+ | 95–100 | Every constraint met |
| A | 90–94 | Every major constraint met; maybe one weak anchor |
| B+ | 85–89 | All anchors landed; one structural or style miss |
| B | 80–84 | Most anchors landed; recognizable as requested |
| C+ | 70–79 | Premise served; multiple anchors weak or missed |
| C | 60–69 | Premise served but reads as a cousin of the request |
| D | 50–59 | Half the constraints landed |
| F | 0–49 | Wrong song |
Checklist-completion semantics, not anti-inflation. 100 means every constraint met; less than 100 means specific misses. The bar IS the brief.
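The grade table translates directly into a threshold lookup. The sketch below uses a hypothetical function name and is not the package's `computeFidelityGrade`, whose signature is not documented here. Rounding the composite to the nearest integer before banding is an assumption, made because the table lists integer ranges.

```typescript
type FidelityGrade = "A+" | "A" | "B+" | "B" | "C+" | "C" | "D" | "F";

function gradeForComposite(composite: number): FidelityGrade {
  // Assumption: round first, so 94.4 grades as 94 (A) and 94.5 as 95 (A+).
  const score = Math.round(composite);
  if (score >= 95) return "A+";
  if (score >= 90) return "A";
  if (score >= 85) return "B+";
  if (score >= 80) return "B";
  if (score >= 70) return "C+";
  if (score >= 60) return "C";
  if (score >= 50) return "D";
  return "F";
}
```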
5. Brief complexity gating
The fidelity composite carries a complexity bucket that controls UX prominence on the dashboard. Briefs with weak signal don't benefit from a loud fidelity chip; briefs with heavy constraints benefit from fidelity being the headline answer.
| Bucket | Complexity | UX behavior |
|---|---|---|
| hide | 0–2 (light brief) | Suppress chip unless score < 80 (rescue path) |
| secondary | 3–6 (standard brief) | Equal-weight with quality chip |
| primary | 7+ (heavy brief) | Fidelity becomes the headline; quality secondary |
Complexity = anchors (max 5) + style constraints (max 3) + structure (if set) + forbidden language (if set) + premise (if ≥40 chars).
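The complexity sum and bucket rules can be sketched as below. The `Brief` shape and function names are hypothetical and are not the package's `computeBriefComplexity` or `computeFidelityBucket`. Treating an empty structure or forbidden-phrase list as "not set" is an assumption.

```typescript
// Hypothetical brief shape, for illustration only.
interface Brief {
  premise: string;
  anchors: string[];
  styleConstraints: string[];
  structure?: string[];
  forbiddenPhrases?: string[];
}

type Bucket = "hide" | "secondary" | "primary";

// anchors (max 5) + style constraints (max 3) + structure (if set)
// + forbidden language (if set) + premise (if at least 40 characters).
function briefComplexity(brief: Brief): number {
  return (
    Math.min(brief.anchors.length, 5) +
    Math.min(brief.styleConstraints.length, 3) +
    (brief.structure && brief.structure.length > 0 ? 1 : 0) +
    (brief.forbiddenPhrases && brief.forbiddenPhrases.length > 0 ? 1 : 0) +
    (brief.premise.length >= 40 ? 1 : 0)
  );
}

function bucketFor(complexity: number): Bucket {
  if (complexity >= 7) return "primary";
  if (complexity >= 3) return "secondary";
  return "hide";
}

// In the "hide" bucket the chip appears only on the rescue path (score < 80).
function showFidelityChip(bucket: Bucket, composite: number): boolean {
  return bucket !== "hide" || composite < 80;
}
```

Under this reading the maximum complexity is 5 + 3 + 1 + 1 + 1 = 11, so a brief needs constraints from at least two categories to reach the primary bucket.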
6. Version history
- v0.2.0 (2026-05-20) — initial public draft. Opens 7-day RFC comment window via RFC-0010.
Full version history at /scoring/standard/fidelity/changelog.
The audit implementation version (in code) tracks separately at FIDELITY_AUDIT_VERSION in src/lib/fidelity-audit.ts. The current implementation version is 1.4.0 (post-B2788).
The Fidelity Standard ships as an installable package with the eight-component JSON + helper functions (computeFidelityGrade, computeFidelityBucket, computeBriefComplexity, applyConstraintMultiplier). MIT for the package code; CC BY 4.0 for the standard JSON. Companion to @songforgeai/scoring-rubric.
Source on GitHub: packages/fidelity-standard · View on npm
7. How to cite
The Fidelity Standard is released under CC BY 4.0. Use it, fork it, build on it. The only requirement is attribution. For all six common citation formats (BibTeX, APA 7, MLA 9, Chicago, RIS, plain text), see the dedicated cite page.
Plain-text citation:

    Nigro, T. (2026). The Fidelity Standard v0.2.0.
    SongForgeAI. https://songforgeai.com/scoring/standard/fidelity
    CC BY 4.0.

BibTeX:

    @techreport{nigro2026fidelity,
      author      = {Nigro, Todd},
      title       = {The Fidelity Standard v0.2.0},
      institution = {SongForgeAI},
      year        = {2026},
      url         = {https://songforgeai.com/scoring/standard/fidelity},
      note        = {CC BY 4.0}
    }

8. Related standards
- The Lyric Scoring Standard — 12-metric quality rubric. Fidelity's orthogonal sibling.
- The Lyric Scoring Standard whitepaper v1.0 — the founder-voice publishing edition.
The Fidelity Standard is a living document. Version 0.2.0 opens the public comment window via RFC-0010. Contributions, critiques, and forks all welcome under CC BY 4.0.
Annual Fidelity Index targeted for 2027-01-15 — the first issue will publish full-corpus aggregate scores + orthogonality study + methodology, under CC BY 4.0.