The Fidelity Standard v0.4.0 · open standard · CC BY 4.0 · RFC-0010 · comment closed 2026-05-26

The Fidelity Standard

A public open standard for measuring AI lyric fidelity.

Version 0.4.0 · Published 2026-06-12 · CC BY 4.0

1. Fidelity is the orthogonal question

The Lyric Scoring Standard measures lyric quality — is the song good? Twelve metrics across craft, expression, and impact, weighted into one composite.

This standard measures the orthogonal axis: fidelity — did the lyric serve the brief the user gave it? A song can score 90 on quality and 35 on fidelity (a great-sounding song about something other than what was asked for). A song can score 70 on quality and 92 on fidelity (a faithful execution of a hard brief that still has craft work left to do). Both numbers matter, and they answer different questions.

The SongForgeAI Constraint-Aware Forge measures fidelity through eight components, weighted into one composite score (0–100) with a letter grade (A+ through F). The composite is complexity-gated at the UX layer: a light brief gets the chip hidden by default (low signal-to-noise); a heavy brief gets the chip displayed as the headline (the user asked for a hard target — the first answer they want is whether the system hit it).

2. The eight components

Each component scores 0–100. Null components (the brief didn't ask for that constraint OR the lyric lacks the structural prerequisite) are excluded from the composite and their weight redistributes proportionally across remaining components. The weighted sum is the composite, with constraint-mode multipliers applied last (see §3).

Component	Weight	Question it answers
Premise match	30%	Did the lyric serve the named premise?
Anchor coverage	25%	Did each required detail land in a vivid line? (Judged by semantic presence since v0.4.0 — echo of the brief's words is not compliance.)
Structure compliance	15%	Does the section map match what was asked?
Style constraints	15%	Were the style rules honored (sensory-only, etc.)?
Forbidden language	5%	Did the lyric avoid the banned words — including synonym/paraphrase smuggles? (v0.4.0: violations hard-cap the composite.)
Chorus evolution	12%	Did the chorus actually shift across the song? (Promoted from 5% in v0.3.0, with a verbatim hard gate.)
Earned transcendence	5%	Did a V1 image return with transformed meaning?
Content rating adherence	6%	Did the lyric match the requested content rating (clean / mature / explicit)?

Weights are raw and normalize over the components that actually contribute (the raw column sums to 113%, not 100%). Premise (30%) and anchor coverage (25%) remain the two largest: the load-bearing pair per Sacred Accident #17. The rest distributes across structure, style, chorus evolution (promoted to 12% in v0.3.0), content rating, and the smaller craft-discipline checks.

2.1 Component-by-component

Premise match (30%) — Haiku judgment

A Haiku-tier judge reads the brief premise and the lyric, then returns a verdict (faithfully-served / mostly-served / drifts / wrong-song) with reasoning, a one-sentence echo of what the lyric IS about, and 2–4 cited evidence lines. The verdict is paired with a 0–100 score in four bands:

81–100: faithfully-served — premise is unmistakably the through-line
61–80: mostly-served — premise is the through-line with weaknesses
31–60: drifts — premise served partially with significant deviation
0–30: wrong-song — lyric is about something other than the brief

The premise-echo field is load-bearing: if the model's 1-sentence read of the lyric's topic doesn't match the brief premise (in subject, not phrasing), the verdict cannot be faithfully-served.
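The band table and the premise-echo rule can be read together as a consistency check on the judge's output. The sketch below is an illustration, not the reference implementation: the type names and the `echoMatchesPremise` flag are hypothetical, and it assumes scores are integers (the bands leave gaps between, say, 80 and 81).

```typescript
type PremiseVerdict = "faithfully-served" | "mostly-served" | "drifts" | "wrong-song";

// Hypothetical shape for a judge result; the standard does not publish one.
interface PremiseJudgment {
  verdict: PremiseVerdict;
  score: number;               // 0-100, assumed integer
  echoMatchesPremise: boolean; // does the one-sentence echo match the brief's subject?
}

// Map a 0-100 score onto the four published bands.
function premiseVerdictForScore(score: number): PremiseVerdict {
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError("premise score out of range: " + score);
  }
  if (score >= 81) return "faithfully-served";
  if (score >= 61) return "mostly-served";
  if (score >= 31) return "drifts";
  return "wrong-song";
}

// A judgment is consistent when the verdict sits in its score band and a
// faithfully-served verdict is backed by a matching premise echo.
function isConsistentPremiseJudgment(j: PremiseJudgment): boolean {
  if (premiseVerdictForScore(j.score) !== j.verdict) return false;
  if (j.verdict === "faithfully-served" && !j.echoMatchesPremise) return false;
  return true;
}
```

The check only rejects inconsistent judgments; how a rejected judgment should be rescored is not specified by the text above, so the sketch does not guess.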

Anchor coverage (25%) — heuristic (constraint-checker since v0.4.0)

Each anchor phrase from the brief (e.g. “17 years maintenance,” “the kitchen at midnight”) is scored 0–100 in four bands by finding the best matching sung line and counting specificity signals (concrete nouns, physical verbs, sensory words) in it. Since v0.4.0 the match is SEMANTIC, not lexical: morphological variants, curated synonym groups (“mama” satisfies a “mother” anchor), and number-word equivalence (“seventeen” satisfies “17”) all count as presence — an anchor can be satisfied without echoing the brief's exact words. Conversely, echoing the brief's words in an abstract line is NOT compliance; it stays in the 33 band.

100: delivered — the anchor's concept is present in a line with 3+ specificity signals
66: partial — the concept is present in a line with 1–2 signals
33: mentioned-bare — the concept appears only in an abstract line (echo without delivery)
0: missing — no key concept present in any sung line (exact, morphological, or synonym)

The 4-band scale matches the verdict shape used by chorus evolution + transcendence so the dashboard's per-component panels read consistently.
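As a rough illustration of the anchor bands, the sketch below maps one anchor's best matching line to a band. It assumes the semantic match and the signal count have already been computed elsewhere, and it treats an "abstract line" as one with zero specificity signals; the `BestLineMatch` shape and field names are hypothetical.

```typescript
type AnchorBand = 0 | 33 | 66 | 100;

// Hypothetical summary of the best matching sung line for one anchor.
interface BestLineMatch {
  conceptPresent: boolean;    // exact, morphological, or synonym match found
  specificitySignals: number; // concrete nouns + physical verbs + sensory words
}

function anchorBand(match: BestLineMatch): AnchorBand {
  if (!match.conceptPresent) return 0;           // missing
  if (match.specificitySignals >= 3) return 100; // delivered
  if (match.specificitySignals >= 1) return 66;  // partial
  return 33;                                     // mentioned-bare: echo without delivery
}
```

Note that presence alone never reaches the upper bands here: a line that echoes the brief but carries no signals stays at 33, which is the "echo is not compliance" rule in code form.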

Structure compliance (15%) — heuristic

The brief's requested section sequence (e.g. verse-chorus-verse-chorus-bridge-chorus) is compared against the actual section markers in the lyric. Exact-position matches earn full credit; adjacent-equivalent substitutions (chorus ↔ hook ↔ refrain) earn 75%; additive sections (pre-chorus inserted between requested verse + chorus) don't penalize. Per-section credit is averaged into the component score; extras subtract up to 30%.

Style constraints (15%) — heuristic

When the brief carries a sensory-only / show-don't-tell / no-thesis constraint, every thesis-shaped line (declarative analysis, moral summary, therapy register, essay thesis) is a wound. Score = 100 minus density penalty. When the constraint isn't set, thesis flags are informational only; the style component scores from per-section image-density alone.

Forbidden language (5%) — deterministic ban detector, with a hard-ban composite cap (v0.4.0)

The user can name phrases the lyric must avoid (on top of the universal banned-terms list). Since v0.4.0 each banned phrase compiles into deterministic rules: the literal token, morphological variants (“Saturday” catches “Saturdays”), and a curated synonym/paraphrase expansion — a “gravity” ban catches “weightless,” a mortality ban catches the “remember dissolving” smuggle, and weekday-category bans catch all named weekdays. Every violation reports its line number, matched rule, and matched text. Score = 100 − 25 × distinct violated phrases.
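The scoring rule at the end of that paragraph can be sketched as follows. The `BanViolation` shape mirrors the three reported fields (line number, matched rule, matched text) plus the originating banned phrase, but the field names are hypothetical, and flooring the score at 0 is an assumption the text does not state.

```typescript
// Hypothetical violation record; field names are illustrative only.
interface BanViolation {
  bannedPhrase: string; // the phrase from the brief that this violation traces to
  line: number;
  matchedRule: string;  // e.g. "literal", "morphological", "synonym"
  matchedText: string;
}

// Count distinct banned phrases violated, not total hits.
function distinctViolatedPhrases(violations: BanViolation[]): number {
  const phrases = violations.map((v) => v.bannedPhrase.toLowerCase());
  return phrases.filter((p, i) => phrases.indexOf(p) === i).length;
}

// Score = 100 - 25 x distinct violated phrases, floored at 0 (assumed floor).
function forbiddenLanguageScore(violations: BanViolation[]): number {
  return Math.max(0, 100 - 25 * distinctViolatedPhrases(violations));
}
```

Two hits on the same banned phrase cost 25 points once, since the penalty counts distinct phrases rather than occurrences.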

Hard-ban cap (v0.4.0): unresolved violations cap the fidelity composite at 60 − 10 × (distinct − 1), floored at 20, with a machine-readable banCap reason on the composite — the grade derives from the capped value, so no A can render on an unresolved-ban song. The reference implementation also caps the orthogonal QUALITY composite at 60 and structurally bars violating lines from transcendent-line promotion, strength citations, and One Line candidacy. The most beautiful line that breaks an explicit constraint is the highest-priority defect in the system, not its best output. The cap releases only when violations clear.
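The cap arithmetic can be sketched as below. The `banCap` object shape is hypothetical (the standard says only that the reason is machine-readable), and attaching it whenever violations exist, even if the composite was already under the cap, is an assumption.

```typescript
// Hypothetical result shape; only the existence of a banCap reason is documented.
interface CappedComposite {
  value: number;
  banCap: { cap: number; distinctViolations: number } | null;
}

// Cap = 60 - 10 x (distinct - 1), floored at 20. No violations means no cap.
function applyBanCap(composite: number, distinctViolations: number): CappedComposite {
  if (distinctViolations <= 0) {
    return { value: composite, banCap: null };
  }
  const cap = Math.max(20, 60 - 10 * (distinctViolations - 1));
  return {
    value: Math.min(composite, cap), // the cap only ever lowers the composite
    banCap: { cap, distinctViolations },
  };
}
```

For example, a composite of 92 with one unresolved ban drops to 60, with three it drops to 40, and from five violations onward the floor of 20 applies.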

Chorus evolution (12%) — heuristic, with a verbatim hard gate

Hit songs don't repeat the chorus verbatim — even when the words stay identical, the context shifts so the listener hears a different chorus by the final position. The audit detects (a) whether the chorus body actually byte-shifted across positions, and (b) whether verses reference back to chorus content (a signal that context-shift execution is happening even on verbatim repeats). Same 4-band scale as anchor coverage.

Verbatim hard gate (v0.3.0): a separate deterministic detector computes the pairwise line-identity rate across all chorus blocks. When three or more blocks are near-verbatim (detector score ≤ 25), the chorus-evolution component is capped at the detector score: an LLM judgment that the chorus "evolved" cannot override byte-level evidence that it didn't. Two verbatim passes are ordinary songcraft and remain the evolution audit's call. The gate only ever lowers the component; varied and single-chorus songs are untouched.
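A minimal sketch of the gate itself, assuming the detector has already produced its score and a count of near-verbatim chorus blocks. How the detector turns the pairwise line-identity rate into a 0–100 score is not described here, so both inputs and the function name are hypothetical.

```typescript
// Apply the verbatim hard gate to the chorus-evolution component.
// evolutionScore:     the evolution audit's own 0-100 result
// detectorScore:      the deterministic detector's 0-100 result (lower = more verbatim)
// nearVerbatimBlocks: how many chorus blocks the detector flagged as near-verbatim
function applyVerbatimGate(
  evolutionScore: number,
  detectorScore: number,
  nearVerbatimBlocks: number
): number {
  const gateTrips = nearVerbatimBlocks >= 3 && detectorScore <= 25;
  if (!gateTrips) return evolutionScore; // two verbatim passes stay the audit's call
  return Math.min(evolutionScore, detectorScore); // the gate only ever lowers
}
```

Using `Math.min` rather than plain assignment keeps the "only ever lowers" property even when the audit's own score is already below the detector score.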

Earned transcendence (5%) — heuristic

A song earns its transcendence when a concrete image planted in verse 1 returns in the final chorus with transformed surrounding words — same physical detail, new weight. The audit identifies shared concrete-noun image tokens between V1 and the final chorus, then scores the surrounding-word similarity (Jaccard, non-image content words):

100: transformed — image returns with significant word-shift
66: partial — image returns with some shift
33: verbatim — image returns but lines are near-identical
0: missing — no V1 concrete image returns
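The Jaccard step can be illustrated as follows. The similarity function is the standard set-overlap measure; the band cut-offs are NOT published in this document, so the sketch takes them as explicit parameters, and the values used in the usage note are placeholders only.

```typescript
type TranscendenceBand = 0 | 33 | 66 | 100;

// De-duplicate a word list without relying on newer runtime features.
function uniqueWords(words: string[]): string[] {
  return words.filter((w, i) => words.indexOf(w) === i);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over two bags of content words.
// Two empty sets are treated as identical (similarity 1).
function jaccard(a: string[], b: string[]): number {
  const setA = uniqueWords(a);
  const setB = uniqueWords(b);
  if (setA.length === 0 && setB.length === 0) return 1;
  const shared = setA.filter((w) => setB.indexOf(w) !== -1).length;
  return shared / (setA.length + setB.length - shared);
}

// Hypothetical thresholds: similarity at or above verbatimAt reads as verbatim,
// at or above partialAt reads as partial, below that as transformed.
function transcendenceBand(
  imageReturns: boolean,
  surroundingSimilarity: number,
  thresholds: { verbatimAt: number; partialAt: number }
): TranscendenceBand {
  if (!imageReturns) return 0;
  if (surroundingSimilarity >= thresholds.verbatimAt) return 33;
  if (surroundingSimilarity >= thresholds.partialAt) return 66;
  return 100;
}
```

With placeholder thresholds such as `{ verbatimAt: 0.8, partialAt: 0.4 }`, a returning image whose surrounding words share a third of their vocabulary with verse 1 would land in the transformed band; real cut-offs would need to come from the reference implementation.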

Content rating adherence (6%) — heuristic

A brief can specify one of three content ratings: clean (radio-safe, no profanity), mature (adult themes, profanity as story serves), or explicit (full creative latitude). The audit detects strong-language markers in the lyric body, infers the de-facto rating from marker density, and scores the gap against what the brief requested. Today’s forge surfaces do not set a content-rating constraint — the component is null-by-default until the Explicit-Mode toggle (a separate, deferred work item) lands. Not to be confused with the SA#32 tone register (joy / swagger / rage / playfulness / etc) — content rating is the MPAA-style profanity perimeter; tone register is the emotional posture. The framework is documented in advance so implementers can prepare:

95: register-matched — requested and inferred align
70: mild-drift — one band off (clean↔mature, mature↔explicit)
35: wrong-register — two-band drift (clean↔explicit)
null: brief did not specify a content rating

The four hard nevers (sexual content involving minors, hate or slurs targeting protected groups, credible threats / doxxing, instructions facilitating illegal harm) remain blocked regardless of content rating: this component controls the RATING, not the prohibition perimeter.
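The content-rating bands above reduce to a distance on a three-step scale. The sketch below assumes the de-facto rating has already been inferred from marker density; the function and type names are hypothetical.

```typescript
type ContentRating = "clean" | "mature" | "explicit";

// Ordered from most to least restrictive, so index distance = band drift.
const RATING_ORDER: ContentRating[] = ["clean", "mature", "explicit"];

// Returns null when the brief set no content rating (component is excluded).
function contentRatingScore(
  requested: ContentRating | null,
  inferred: ContentRating
): number | null {
  if (requested === null) return null;
  const drift = Math.abs(RATING_ORDER.indexOf(requested) - RATING_ORDER.indexOf(inferred));
  if (drift === 0) return 95; // register-matched
  if (drift === 1) return 70; // mild-drift
  return 35;                  // wrong-register
}
```

Returning `null` rather than a neutral number matters: a null component drops out of the composite and its weight redistributes, whereas a numeric placeholder would dilute the other components.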

3. The composite formula

composite = constraintMultiplier × Σ (componentScore_i × weight_i)
            for i ∈ contributing components

where:
  contributing = components with non-null scores
  weight_i normalized so Σ weights = 1.0 across contributing
  constraintMultiplier ∈ { strict: 0.95, standard: 1.0, loose: 1.15 }
  result clamped to [0, 100]

Null components (the brief didn't ask for that constraint OR the lyric lacks the prerequisite) are excluded and their weight redistributes proportionally — a sparse brief is never penalized for things the user never asked for. Three constraint-mode multipliers apply the user's stance on how strictly fidelity is graded: strict (0.95) caps a perfect score below 100 to register the user demanded zero deviation; standard (1.0) is neutral; loose (1.15) lets the user opt into deviation and reward the result.
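As a worked illustration of the formula, the sketch below normalizes the raw weights from §2 over the non-null components, applies the constraint-mode multiplier, and clamps. The identifiers are hypothetical, the result is left unrounded because no rounding rule is stated, and returning `null` when every component is null is an assumption.

```typescript
type ComponentId =
  | "premiseMatch" | "anchorCoverage" | "structureCompliance" | "styleConstraints"
  | "forbiddenLanguage" | "chorusEvolution" | "earnedTranscendence" | "contentRating";

// Raw weights from the component table (they sum to 113 before normalization).
const RAW_WEIGHTS: Record<ComponentId, number> = {
  premiseMatch: 30, anchorCoverage: 25, structureCompliance: 15, styleConstraints: 15,
  forbiddenLanguage: 5, chorusEvolution: 12, earnedTranscendence: 5, contentRating: 6,
};

type ConstraintMode = "strict" | "standard" | "loose";
const MODE_MULTIPLIER: Record<ConstraintMode, number> = { strict: 0.95, standard: 1.0, loose: 1.15 };

function fidelityComposite(
  scores: Partial<Record<ComponentId, number | null>>,
  mode: ConstraintMode
): number | null {
  let weightedSum = 0;
  let contributingWeight = 0;
  for (const id of Object.keys(RAW_WEIGHTS) as ComponentId[]) {
    const score = scores[id];
    if (score === null || score === undefined) continue; // null components drop out
    weightedSum += score * RAW_WEIGHTS[id];
    contributingWeight += RAW_WEIGHTS[id];
  }
  if (contributingWeight === 0) return null; // nothing to grade (assumed behavior)
  // Dividing by the contributing weight is the proportional redistribution.
  const raw = MODE_MULTIPLIER[mode] * (weightedSum / contributingWeight);
  return Math.min(100, Math.max(0, raw));
}
```

For a brief that only exercises premise (80) and anchors (60) in standard mode, the composite is (80 × 30 + 60 × 25) / 55 ≈ 70.9. In loose mode a perfect lyric still reads 100, because the clamp applies after the multiplier.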

4. Grade calibration

Grade	Range	Meaning
A+	95–100	Every constraint met
A	90–94	Every major constraint met; maybe one weak anchor
B+	85–89	All anchors landed; one structural or style miss
B	80–84	Most anchors landed; recognizable as requested
C+	70–79	Premise served; multiple anchors weak or missed
C	60–69	Premise served but reads as a cousin of the request
D	50–59	Half the constraints landed
F	0–49	Wrong song

Checklist-completion semantics, not anti-inflation. 100 means every constraint met; less than 100 means specific misses. The bar IS the brief.
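The grade table can be read as a lookup on lower bounds. The sketch below is not the package's `computeFidelityGrade` (its signature is not documented here); it is a hypothetical stand-in that uses each band's lower bound as a threshold, so a fractional composite such as 94.5 falls into the band below the next integer cut-off. That treatment of fractions is an assumption.

```typescript
type FidelityGrade = "A+" | "A" | "B+" | "B" | "C+" | "C" | "D" | "F";

// Lower bound of each grade band, highest first.
const GRADE_FLOORS: Array<[number, FidelityGrade]> = [
  [95, "A+"], [90, "A"], [85, "B+"], [80, "B"], [70, "C+"], [60, "C"], [50, "D"],
];

function gradeForComposite(composite: number): FidelityGrade {
  if (!(composite >= 0 && composite <= 100)) {
    throw new RangeError("composite out of range: " + composite);
  }
  for (const [floor, grade] of GRADE_FLOORS) {
    if (composite >= floor) return grade;
  }
  return "F";
}
```

Because the grade is derived from whatever composite it is handed, feeding it a ban-capped composite (at most 60) can never yield better than a C.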

5. Brief complexity gating

The fidelity composite carries a complexity bucket that controls UX prominence on the dashboard. Briefs with weak signal don't benefit from a loud fidelity chip; briefs with heavy constraints benefit from fidelity being the headline answer.

Bucket	Complexity	UX behavior
hide	0–2 (light brief)	Suppress chip unless score < 80 (rescue path)
secondary	3–6 (standard brief)	Equal-weight with quality chip
primary	7+ (heavy brief)	Fidelity becomes the headline; quality secondary

Complexity = anchors (max 5) + style constraints (max 3) + structure (if set) + forbidden language (if set) + premise (if ≥40 chars).
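The complexity sum and bucket thresholds can be sketched together. This is a hypothetical stand-in, not the package's `computeBriefComplexity` or `computeFidelityBucket`: the `BriefShape` fields are invented for illustration, "if set" is read as "non-empty", and premise length is measured with JavaScript's `string.length`.

```typescript
// Hypothetical brief shape; the real brief schema is not given in this document.
interface BriefShape {
  anchors: string[];
  styleConstraints: string[];
  structure?: string[];
  forbiddenLanguage?: string[];
  premise?: string;
}

type ComplexityBucket = "hide" | "secondary" | "primary";

function briefComplexity(brief: BriefShape): number {
  const structureSet = brief.structure !== undefined && brief.structure.length > 0;
  const bansSet = brief.forbiddenLanguage !== undefined && brief.forbiddenLanguage.length > 0;
  const premiseLong = brief.premise !== undefined && brief.premise.length >= 40;
  return (
    Math.min(brief.anchors.length, 5) +
    Math.min(brief.styleConstraints.length, 3) +
    (structureSet ? 1 : 0) +
    (bansSet ? 1 : 0) +
    (premiseLong ? 1 : 0)
  );
}

function complexityBucket(complexity: number): ComplexityBucket {
  if (complexity <= 2) return "hide";
  if (complexity <= 6) return "secondary";
  return "primary";
}

// Light briefs suppress the chip unless the score is low enough to need rescue.
function shouldShowFidelityChip(bucket: ComplexityBucket, score: number): boolean {
  return bucket === "hide" ? score < 80 : true;
}
```

Under these assumptions the maximum complexity is 11 (5 + 3 + 1 + 1 + 1), so the primary bucket covers 7 through 11.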

6. Version history

v0.4.0 (2026-06-12) — the inversion fix: semantic anchor checking (echo ≠ compliance), deterministic ban detection with synonym expansion + per-line violations, and the hard-ban composite cap with machine-readable banCap reason.
v0.3.0 (2026-06-10) — chorus evolution promoted 5% → 12% raw weight, paired with the deterministic verbatim hard gate.
v0.2.0 (2026-05-20) — eighth component added: content rating adherence (6%), null-by-default.
v0.1.0 (2026-05-19) — initial public draft. Opened the 7-day RFC comment window via RFC-0010 (closed 2026-05-26).

Full version history at /scoring/standard/fidelity/changelog.

The audit implementation version (in code) tracks separately at FIDELITY_AUDIT_VERSION in src/lib/fidelity-audit.ts. The current implementation version is 2.0.0 (post-B3793 — the inversion fix; major bump because composite math + component semantics changed).

For implementers — install on npm

The Fidelity Standard ships as an installable package with the eight-component JSON + helper functions (computeFidelityGrade, computeFidelityBucket, computeBriefComplexity, applyConstraintMultiplier). MIT for the package code; CC BY 4.0 for the standard JSON. Companion to @songforgeai/scoring-rubric.

$ npm install @songforgeai/fidelity-standard

Source on GitHub: packages/fidelity-standard · View on npm

7. How to cite

The Fidelity Standard is released under CC BY 4.0. Use it, fork it, build on it. The only requirement is attribution. For all six common citation formats (BibTeX, APA 7, MLA 9, Chicago, RIS, plain text), see the dedicated cite page.

Plain-text citation:

Nigro, T. (2026). The Fidelity Standard v0.4.0.
SongForgeAI. https://songforgeai.com/scoring/standard/fidelity
CC BY 4.0.

BibTeX:

@techreport{nigro2026fidelity,
  author = {Nigro, Todd},
  title = {The Fidelity Standard v0.4.0},
  institution = {SongForgeAI},
  year = {2026},
  url = {https://songforgeai.com/scoring/standard/fidelity},
  note = {CC BY 4.0}
}

8. Related standards

The Lyric Scoring Standard — 12-metric quality rubric. Fidelity's orthogonal sibling.
The Lyric Scoring Standard whitepaper v1.0 — the founder-voice publishing edition.

The Fidelity Standard is a living document. The public comment window opened with version 0.1.0 via RFC-0010 and closed on 2026-05-26. Contributions, critiques, and forks are all welcome under CC BY 4.0.

Annual Fidelity Index targeted for 2027-01-15 — the first issue will publish full-corpus aggregate scores + orthogonality study + methodology, under CC BY 4.0.