The Fidelity Standard v0.2.0 · open standard · CC BY 4.0 · v0.2.0 in RFC · closes 2026-05-26

The Fidelity Standard

A public open standard for measuring AI lyric fidelity.

Version 0.2.0 · Published 2026-05-20 · CC BY 4.0

1. Fidelity is the orthogonal question

The Lyric Scoring Standard measures lyric quality — is the song good? Twelve metrics across craft, expression, and impact, weighted into one composite.

This standard measures the orthogonal axis: fidelity — did the lyric serve the brief the user gave it? A song can score 90 on quality and 35 on fidelity (a great-sounding song about something other than what was asked for). A song can score 70 on quality and 92 on fidelity (a faithful execution of a hard brief that still has craft work left to do). Both numbers matter, and they answer different questions.

The SongForgeAI Constraint-Aware Forge measures fidelity through eight components, weighted into one composite score (0–100) with a letter grade (A+ through F). The composite is complexity-gated at the UX layer (see §5). For a light brief the fidelity chip is hidden by default (low signal-to-noise). For a heavy brief the chip is displayed as the headline, because the user asked for a hard target and the first answer they want is whether the system hit it.

2. The eight components

Each component scores 0–100. Null components (the brief didn't ask for that constraint OR the lyric lacks the structural prerequisite) are excluded from the composite, and their weight redistributes proportionally across the remaining components. The weighted sum is the composite, with constraint-mode multipliers applied last (see §3).

| Component | Weight | Question it answers |
|---|---|---|
| Premise match | 30% | Did the lyric serve the named premise? |
| Anchor coverage | 25% | Did each required detail land in a vivid line? |
| Structure compliance | 15% | Does the section map match what was asked? |
| Style constraints | 15% | Were the style rules honored (sensory-only, etc.)? |
| Forbidden language | 5% | Did the lyric avoid the banned words? |
| Chorus evolution | 5% | Did the chorus actually shift across the song? |
| Earned transcendence | 5% | Did a V1 image return with transformed meaning? |
| Register adherence | 6% | Did the lyric match the requested register (clean / mature / explicit)? |

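Premise (30%) + anchor coverage (25%) = 55% load-bearing weight per Sacred Accident #17. The remaining 45% distributes across structure (15%), style (15%), and the three smaller craft-discipline checks (5% each). Register adherence (6%) sits on top of that: it is null by default, and when a brief does set a register the nominal weights sum to 106% and are renormalized to 1.0 across contributing components (see §3).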

2.1 Component-by-component

Premise match (30%) — Haiku judgment

A Haiku-tier judge reads the brief premise and the lyric, then returns a verdict (faithfully-served / mostly-served / drifts / wrong-song) with reasoning, a one-sentence echo of what the lyric IS about, and 2–4 cited evidence lines. The verdict is paired with a 0–100 score in four bands:

  • 81–100: faithfully-served — premise is unmistakably the through-line
  • 61–80: mostly-served — premise is the through-line with weaknesses
  • 31–60: drifts — premise served partially with significant deviation
  • 0–30: wrong-song — lyric is about something other than the brief

The premise-echo field is load-bearing: if the judge's one-sentence read of the lyric's topic doesn't match the brief premise (in subject, not phrasing), the verdict cannot be faithfully-served.
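A minimal TypeScript sketch of the band mapping plus the premise-echo rule, assuming the judge returns a numeric score and a boolean saying whether its echo matches the premise in subject. The names `premiseVerdict`, `applyEchoGuard`, and `echoMatchesPremise` are hypothetical, and capping a mismatched score at 80 (the top of the mostly-served band) is an assumption; the standard only says the verdict cannot be faithfully-served.

```typescript
type PremiseVerdict = "faithfully-served" | "mostly-served" | "drifts" | "wrong-song";

// Band floors taken from the four-band table above.
function premiseVerdict(score: number): PremiseVerdict {
  if (score >= 81) return "faithfully-served";
  if (score >= 61) return "mostly-served";
  if (score >= 31) return "drifts";
  return "wrong-song";
}

// Hypothetical guard: echoMatchesPremise is assumed to come from the judge
// (or a separate comparison step). Capping at 80 is an illustrative choice
// that keeps score and verdict in the same band.
function applyEchoGuard(
  score: number,
  echoMatchesPremise: boolean,
): { score: number; verdict: PremiseVerdict } {
  const guarded = echoMatchesPremise ? score : Math.min(score, 80);
  return { score: guarded, verdict: premiseVerdict(guarded) };
}
```

Keeping the score and verdict derived from one number avoids a state where the chip says 92 but the panel says mostly-served.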

Anchor coverage (25%) — heuristic

Each anchor phrase from the brief (e.g. “17 years maintenance,” “the kitchen at midnight”) is scored 0–100 in four bands by finding the best matching sung line and counting specificity signals (concrete nouns, physical verbs, sensory words) in it:

  • 100: delivered — key tokens appear in a line with 3+ specificity signals
  • 66: partial — key tokens appear in a line with 1–2 signals
  • 33: mentioned-bare — key tokens appear in an abstract line
  • 0: missing — key tokens don't appear in any sung line

The 4-band scale matches the verdict shape used by chorus evolution + transcendence so the dashboard's per-component panels read consistently.
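The anchor band logic can be sketched as follows. This is illustrative only: the three signal lexicons are tiny hypothetical word lists, the key tokens are assumed to be pre-extracted from the anchor phrase (e.g. "the kitchen at midnight" → `kitchen`, `midnight`), and "key tokens appear" is read as "every key token appears in the line". None of these names come from the reference implementation.

```typescript
// Hypothetical signal lexicons; a real audit would use far larger lists.
const CONCRETE_NOUNS = new Set(["kitchen", "sink", "table", "wrench", "truck", "coffee"]);
const PHYSICAL_VERBS = new Set(["scrub", "scrubbed", "grip", "gripped", "slammed", "fixed"]);
const SENSORY_WORDS = new Set(["cold", "bitter", "humming", "bright", "rusted"]);

type AnchorBand = 100 | 66 | 33 | 0;

function words(line: string): string[] {
  return line.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Count specificity signals (concrete nouns, physical verbs, sensory words).
function countSignals(line: string): number {
  return words(line).filter(
    (w) => CONCRETE_NOUNS.has(w) || PHYSICAL_VERBS.has(w) || SENSORY_WORDS.has(w),
  ).length;
}

function scoreAnchor(keyTokens: string[], sungLines: string[]): AnchorBand {
  const keys = keyTokens.map((t) => t.toLowerCase());
  let best = -1; // -1 means no sung line contains all key tokens
  for (const line of sungLines) {
    const present = new Set(words(line));
    if (keys.every((k) => present.has(k))) {
      best = Math.max(best, countSignals(line)); // keep the best matching line
    }
  }
  if (best < 0) return 0;    // missing
  if (best >= 3) return 100; // delivered
  if (best >= 1) return 66;  // partial
  return 33;                 // mentioned-bare
}
```

Taking the maximum over matching lines means one vivid delivery is enough, even if the anchor is also mentioned abstractly elsewhere.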

Structure compliance (15%) — heuristic

The brief's requested section sequence (e.g. verse-chorus-verse-chorus-bridge-chorus) is compared against the actual section markers in the lyric. Exact-position matches earn full credit; adjacent-equivalent substitutions (chorus ↔ hook ↔ refrain) earn 75%; additive sections (a pre-chorus inserted between a requested verse and chorus) are not penalized. Per-section credit is averaged into the component score; extra sections beyond those allowances subtract up to 30%.
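A rough sketch of the structure check, assuming a simple position-by-position alignment after additive sections are dropped. `EQUIVALENT` and `ADDITIVE` hold only the examples named above, `scoreStructure` is a hypothetical name, and the 10%-per-extra penalty is an invented rate; only the 30% cap comes from the standard.

```typescript
const EQUIVALENT = new Set(["chorus", "hook", "refrain"]); // adjacent-equivalent group
const ADDITIVE = new Set(["pre-chorus"]);                  // inserted without penalty
const PENALTY_PER_EXTRA = 0.1; // assumption: 10% per extra section
const MAX_EXTRA_PENALTY = 0.3; // from the standard: extras subtract up to 30%

function sectionCredit(requested: string, actual: string | undefined): number {
  if (actual === undefined) return 0;             // requested section never appears
  if (requested === actual) return 1;             // exact-position match
  if (EQUIVALENT.has(requested) && EQUIVALENT.has(actual)) return 0.75;
  return 0;
}

function scoreStructure(requested: string[], actualMarkers: string[]): number | null {
  if (requested.length === 0) return null; // brief set no structure: null component
  const req = requested.map((s) => s.toLowerCase());
  const actual = actualMarkers.map((s) => s.toLowerCase()).filter((s) => !ADDITIVE.has(s));
  const credits = req.map((r, i) => sectionCredit(r, actual[i]));
  const avg = credits.reduce((a, b) => a + b, 0) / req.length;
  const extras = Math.max(0, actual.length - req.length);
  const penalty = Math.min(MAX_EXTRA_PENALTY, extras * PENALTY_PER_EXTRA);
  return Math.round(100 * avg * (1 - penalty));
}
```

A production audit would likely need a real sequence alignment (so one missing verse doesn't shift every later position), which this sketch deliberately omits.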

Style constraints (15%) — heuristic

When the brief carries a sensory-only / show-don't-tell / no-thesis constraint, every thesis-shaped line (declarative analysis, moral summary, therapy register, essay thesis) is a wound. Score = 100 minus density penalty. When the constraint isn't set, thesis flags are informational only; the style component scores from per-section image-density alone.

Forbidden language (5%) — heuristic

The user can name phrases the lyric must avoid (on top of the universal banned-terms list). Score = 100 − 10 × violations.
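The forbidden-language formula is simple enough to sketch directly. This assumes case-insensitive whole-word matching, counts every occurrence as a violation, clamps at 0 to keep the component in its 0–100 range, and returns null when the user named no phrases. The universal banned-terms list is left out, and the function names are hypothetical.

```typescript
// Escape regex metacharacters so user phrases are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countViolations(lyric: string, forbidden: string[]): number {
  let total = 0;
  for (const phrase of forbidden) {
    const re = new RegExp(`\\b${escapeRegExp(phrase)}\\b`, "gi");
    total += (lyric.match(re) ?? []).length; // every occurrence counts
  }
  return total;
}

function scoreForbidden(lyric: string, forbidden: string[]): number | null {
  if (forbidden.length === 0) return null; // brief named no phrases
  return Math.max(0, 100 - 10 * countViolations(lyric, forbidden));
}
```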

Chorus evolution (5%) — heuristic

Hit songs don't repeat the chorus verbatim — even when the words stay identical, the context shifts so the listener hears a different chorus by the final position. The audit detects (a) whether the chorus body actually byte-shifted across positions, and (b) whether verses reference back to chorus content (a signal that context-shift execution is happening even on verbatim repeats). Same 4-band scale as anchor coverage.

Earned transcendence (5%) — heuristic

A song earns its transcendence when a concrete image planted in verse 1 returns in the final chorus with transformed surrounding words — same physical detail, new weight. The audit identifies shared concrete-noun image tokens between V1 and the final chorus, then scores the surrounding-word similarity (Jaccard, non-image content words):

  • 100: transformed — image returns with significant word-shift
  • 66: partial — image returns with some shift
  • 33: verbatim — image returns but lines are near-identical
  • 0: missing — no V1 concrete image returns
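A sketch of the image-return check, under loud assumptions: concrete nouns come from a small hypothetical lexicon, context words are the non-stopword, non-image words on lines containing the shared image, the best-scoring shared image wins, and the Jaccard cutoffs (0.3 and 0.7) are invented for illustration, since the standard does not publish thresholds.

```typescript
// Hypothetical lexicons for illustration only.
const CONCRETE_NOUNS = new Set(["porch", "light", "truck", "keys", "coat", "river"]);
const STOPWORDS = new Set(["the", "a", "an", "and", "of", "on", "in", "to", "my", "is", "was"]);

function tokens(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // nothing to compare: treat as identical
  let inter = 0;
  for (const x of a) if (b.has(x)) inter++;
  return inter / (a.size + b.size - inter);
}

// Non-image content words on every line that contains the image token.
function contextWords(lines: string[], image: string): Set<string> {
  const out = new Set<string>();
  for (const line of lines) {
    const ws = tokens(line);
    if (!ws.includes(image)) continue;
    for (const w of ws) if (!STOPWORDS.has(w) && !CONCRETE_NOUNS.has(w)) out.add(w);
  }
  return out;
}

function scoreTranscendence(verse1: string[], finalChorus: string[]): 100 | 66 | 33 | 0 {
  const v1Images = new Set(tokens(verse1.join(" ")).filter((w) => CONCRETE_NOUNS.has(w)));
  const shared = new Set(tokens(finalChorus.join(" ")).filter((w) => v1Images.has(w)));
  if (shared.size === 0) return 0; // missing: no V1 image returns
  let minSim = 1;
  for (const image of shared) {
    minSim = Math.min(minSim, jaccard(contextWords(verse1, image), contextWords(finalChorus, image)));
  }
  if (minSim <= 0.3) return 100; // transformed (assumed cutoff)
  if (minSim <= 0.7) return 66;  // partial (assumed cutoff)
  return 33;                     // verbatim
}
```

Low surrounding-word similarity is what signals "same physical detail, new weight"; high similarity means the image came back but the line didn't move.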

Register adherence (6%) — heuristic

A brief can specify one of three registers: clean (radio-safe, no profanity), mature (adult themes, profanity as story serves), or explicit (full creative latitude). The audit detects strong-language markers in the lyric body, infers the de-facto register from marker density, and scores the gap against what the brief requested. Today’s forge surfaces do not set a register constraint — the component is null-by-default until the Explicit-Mode toggle (a separate, deferred work item) lands. The framework is documented in advance so implementers can prepare:

  • 95: register-matched — requested and inferred align
  • 70: mild-drift — one band off (clean↔mature, mature↔explicit)
  • 35: wrong-register — two-band drift (clean↔explicit)
  • null: brief did not specify a register

The four hard nevers (sexual content involving minors, hate or slurs targeting protected groups, credible threats / doxxing, instructions facilitating illegal harm) remain blocked regardless of register — this component controls REGISTER, not the prohibition perimeter.
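Given a requested register and an inferred one, the register bands listed above reduce to the distance between them on the clean < mature < explicit scale. Marker-density inference is not shown; `inferred` is assumed to arrive already computed, and `scoreRegister` is a hypothetical name.

```typescript
type Register = "clean" | "mature" | "explicit";

// Position on the clean < mature < explicit scale.
const REGISTER_ORDER: Record<Register, number> = { clean: 0, mature: 1, explicit: 2 };

function scoreRegister(requested: Register | null, inferred: Register): number | null {
  if (requested === null) return null; // brief did not specify a register
  const drift = Math.abs(REGISTER_ORDER[requested] - REGISTER_ORDER[inferred]);
  if (drift === 0) return 95; // register-matched
  if (drift === 1) return 70; // mild-drift
  return 35;                  // wrong-register (two-band drift)
}
```

Note that this score never touches the hard-never perimeter; those checks run separately and block regardless of the result.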

3. The composite formula

composite = constraintMultiplier × Σ (componentScore_i × weight_i)
            for i ∈ contributing components

where:
  contributing = components with non-null scores
  weight_i normalized so Σ weights = 1.0 across contributing
  constraintMultiplier ∈ { strict: 0.95, standard: 1.0, loose: 1.15 }
  result clamped to [0, 100]

Null components (the brief didn't ask for that constraint OR the lyric lacks the prerequisite) are excluded and their weight redistributes proportionally, so a sparse brief is never penalized for things the user never asked for. Three constraint-mode multipliers apply the user's stance on how strictly fidelity is graded: strict (0.95) caps a perfect score at 95 to signal that the user demanded zero deviation; standard (1.0) is neutral; loose (1.15) lets the user opt into deviation and rewards the result, with the clamp keeping the composite at or below 100.
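The composite formula above can be sketched as below. Dividing the weighted sum by the sum of contributing weights is equivalent to renormalizing the weights to 1.0, which also absorbs the fact that the eight nominal weights add up to 1.06. `computeComposite` and the component keys are hypothetical names, not the `applyConstraintMultiplier` API shipped in the npm package, and returning null when nothing contributes is an assumption.

```typescript
type ComponentId =
  | "premise" | "anchors" | "structure" | "style"
  | "forbidden" | "chorusEvolution" | "transcendence" | "register";

// Nominal weights from the §2 table; normalization makes their 1.06 total harmless.
const WEIGHTS: Record<ComponentId, number> = {
  premise: 0.30, anchors: 0.25, structure: 0.15, style: 0.15,
  forbidden: 0.05, chorusEvolution: 0.05, transcendence: 0.05, register: 0.06,
};

const MULTIPLIER = { strict: 0.95, standard: 1.0, loose: 1.15 } as const;
type ConstraintMode = keyof typeof MULTIPLIER;

function computeComposite(
  scores: Partial<Record<ComponentId, number | null>>,
  mode: ConstraintMode = "standard",
): number | null {
  let weightSum = 0;
  let weighted = 0;
  for (const id of Object.keys(WEIGHTS) as ComponentId[]) {
    const s = scores[id];
    if (s === null || s === undefined) continue; // null: excluded, weight redistributes
    weightSum += WEIGHTS[id];
    weighted += s * WEIGHTS[id];
  }
  if (weightSum === 0) return null; // no contributing components
  const raw = MULTIPLIER[mode] * (weighted / weightSum); // multiplier applied last
  return Math.min(100, Math.max(0, raw));                // clamp to [0, 100]
}
```

For example, a brief that only exercises premise (80) and anchors (60) in loose mode scores 1.15 × (24 + 15) / 0.55 ≈ 81.5.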

4. Grade calibration

| Grade | Range | Meaning |
|---|---|---|
| A+ | 95–100 | Every constraint met |
| A | 90–94 | Every major constraint met; maybe one weak anchor |
| B+ | 85–89 | All anchors landed; one structural or style miss |
| B | 80–84 | Most anchors landed; recognizable as requested |
| C+ | 70–79 | Premise served; multiple anchors weak or missed |
| C | 60–69 | Premise served but reads as a cousin of the request |
| D | 50–59 | Half the constraints landed |
| F | 0–49 | Wrong song |

Checklist-completion semantics, not anti-inflation. 100 means every constraint met; less than 100 means specific misses. The bar IS the brief.
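Mapping a composite to a letter is a walk down the band floors. This sketch treats each range's lower bound as a floor, so a fractional score such as 94.5 lands in A; `fidelityGrade` is a hypothetical name, and how the package's `computeFidelityGrade` handles fractional scores is not specified here.

```typescript
// Lower bound of each grade band, highest first.
const GRADE_FLOORS: ReadonlyArray<[number, string]> = [
  [95, "A+"], [90, "A"], [85, "B+"], [80, "B"], [70, "C+"], [60, "C"], [50, "D"],
];

function fidelityGrade(score: number): string {
  for (const [floor, grade] of GRADE_FLOORS) {
    if (score >= floor) return grade;
  }
  return "F"; // 0–49: wrong song
}
```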

5. Brief complexity gating

The fidelity composite carries a complexity bucket that controls UX prominence on the dashboard. Briefs with weak signal don't benefit from a loud fidelity chip; briefs with heavy constraints benefit from fidelity being the headline answer.

| Bucket | Complexity | UX behavior |
|---|---|---|
| hide | 0–2 (light brief) | Suppress chip unless score < 80 (rescue path) |
| secondary | 3–6 (standard brief) | Equal-weight with quality chip |
| primary | 7+ (heavy brief) | Fidelity becomes the headline; quality secondary |

Complexity = anchors (max 5) + style constraints (max 3) + structure (1 if set) + forbidden language (1 if set) + premise (1 if ≥40 characters). The maximum is therefore 11.
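A sketch of the complexity count, bucket, and chip rule. The `Brief` shape is hypothetical, each "if set" term is assumed to add 1 point, and these functions are illustrative stand-ins rather than the package's `computeBriefComplexity` and `computeFidelityBucket`.

```typescript
// Hypothetical brief shape for illustration.
interface Brief {
  premise: string;
  anchors: string[];
  styleConstraints: string[];
  structure: string[] | null;
  forbidden: string[];
}

type Bucket = "hide" | "secondary" | "primary";

function briefComplexity(b: Brief): number {
  return (
    Math.min(b.anchors.length, 5) +
    Math.min(b.styleConstraints.length, 3) +
    (b.structure !== null && b.structure.length > 0 ? 1 : 0) +
    (b.forbidden.length > 0 ? 1 : 0) +
    (b.premise.length >= 40 ? 1 : 0)
  );
}

function complexityBucket(complexity: number): Bucket {
  if (complexity >= 7) return "primary";
  if (complexity >= 3) return "secondary";
  return "hide";
}

// Light briefs show the chip only on the rescue path (score below 80).
function showFidelityChip(bucket: Bucket, score: number): boolean {
  return bucket !== "hide" || score < 80;
}
```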

6. Version history

  • v0.2.0 (2026-05-20) — initial public draft. Opens 7-day RFC comment window via RFC-0010.

Full version history at /scoring/standard/fidelity/changelog.

The audit implementation version (in code) tracks separately at FIDELITY_AUDIT_VERSION in src/lib/fidelity-audit.ts. The current implementation version is 1.4.0 (post-B2788).

For implementers — install on npm

The Fidelity Standard ships as an installable package with the seven-component JSON + helper functions (computeFidelityGrade, computeFidelityBucket, computeBriefComplexity, applyConstraintMultiplier). MIT for the package code; CC BY 4.0 for the standard JSON. Companion to @songforgeai/scoring-rubric.

$ npm install @songforgeai/fidelity-standard

Source on GitHub: packages/fidelity-standard

7. How to cite

The Fidelity Standard is released under CC BY 4.0. Use it, fork it, build on it. The only requirement is attribution. For all six common citation formats (BibTeX, APA 7, MLA 9, Chicago, RIS, plain text), see the dedicated cite page.

Plain-text citation:

Nigro, T. (2026). The Fidelity Standard v0.2.0.
SongForgeAI. https://songforgeai.com/scoring/standard/fidelity
CC BY 4.0.

BibTeX:

@techreport{nigro2026fidelity,
  author = {Nigro, Todd},
  title = {The Fidelity Standard v0.2.0},
  institution = {SongForgeAI},
  year = {2026},
  url = {https://songforgeai.com/scoring/standard/fidelity},
  note = {CC BY 4.0}
}

8. Related standards


The Fidelity Standard is a living document. Version 0.2.0 opens the public comment window via RFC-0010. Contributions, critiques, and forks all welcome under CC BY 4.0.

An annual Fidelity Index is targeted for 2027-01-15. The first issue will publish full-corpus aggregate scores, an orthogonality study, and its methodology, under CC BY 4.0.