Motivation
Sacred Accident #17 ("the system can write a good song without writing the same song") established that fidelity is a question orthogonal to quality. The 12-metric Lyric Scoring Standard (v1.0, RFC-0001) measures quality. CAF Phase 2 (builds B2763–B2788) shipped the seven-component fidelity audit, plus the `computeFidelityComposite` math that combines the component audits into one 0-100 composite and a letter grade.
That work is now PRODUCED and SHIPPED — every newly-forged song gets a fidelity audit persisted on the row. The dashboard shows a two-grade chip (quality + fidelity) and the deep panel (FidelityPanel) shows the per-component breakdown.
What is NOT yet pinned: the numbers themselves. Weights, mode multipliers, grade band edges, complexity formula, UX buckets — all are operator-locked at B2763 + subsequent calibration, but the PUBLIC commitment is still v0.1.0. This RFC opens the 7-day public comment window that lands v1.0.0.
What this RFC pins
Component weights (per SA#17)
```
30%  premise match
25%  anchor coverage
15%  structure compliance
15%  style constraints
 5%  forbidden language
 5%  chorus evolution
 5%  earned transcendence
```
Premise + anchors = 55% (load-bearing). The remaining 45% distributes across structure (15%), style (15%), and three smaller craft-discipline checks (5% × 3).
Composite formula
```
composite = constraintMultiplier × Σ (componentScore_i × weight_i)
            for i ∈ contributing components

where:
  contributing          = components with non-null scores
  weight_i              normalized so Σ weights = 1.0 across contributing
  constraintMultiplier  ∈ { strict: 0.95, standard: 1.0, loose: 1.15 }
  result                clamped to [0, 100]
```
Null-component redistribution is load-bearing: a sparse brief that didn't ask for, say, a structure requirement, must NOT be penalized for the missing component. The weight redistributes proportionally across the components the brief DID ask for.
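The composite and its null-component redistribution can be sketched as follows. This is a minimal illustration, not the shipped `computeFidelityComposite`: the type names, the key names, the function name `sketchComposite`, the 0-100 component scale, and the `null` return for a brief with no contributing components are all assumptions. Only the weights and multipliers come from this RFC.

```typescript
// Hypothetical key names for the seven components; the real audit may differ.
type ComponentKey =
  | "premiseMatch"
  | "anchorCoverage"
  | "structureCompliance"
  | "styleConstraints"
  | "forbiddenLanguage"
  | "chorusEvolution"
  | "earnedTranscendence";

type ConstraintMode = "strict" | "standard" | "loose";

// Assumption: each component score is 0-100, and null means the component abstains.
type ComponentScores = Record<ComponentKey, number | null>;

// Weights as pinned in this RFC.
const WEIGHTS: Record<ComponentKey, number> = {
  premiseMatch: 0.3,
  anchorCoverage: 0.25,
  structureCompliance: 0.15,
  styleConstraints: 0.15,
  forbiddenLanguage: 0.05,
  chorusEvolution: 0.05,
  earnedTranscendence: 0.05,
};

// Constraint-mode multipliers as pinned in this RFC.
const MODE_MULTIPLIER: Record<ConstraintMode, number> = {
  strict: 0.95,
  standard: 1.0,
  loose: 1.15,
};

function sketchComposite(
  scores: ComponentScores,
  mode: ConstraintMode,
): number | null {
  let weightedSum = 0;
  let contributingWeight = 0;
  for (const key of Object.keys(WEIGHTS) as ComponentKey[]) {
    const score = scores[key];
    if (score === null) continue; // abstaining components carry no weight
    weightedSum += score * WEIGHTS[key];
    contributingWeight += WEIGHTS[key];
  }
  // Assumption: with nothing to score, report "no composite" rather than 0.
  if (contributingWeight === 0) return null;
  // Dividing by the contributing weight renormalizes the weights to sum to 1.0.
  const raw = MODE_MULTIPLIER[mode] * (weightedSum / contributingWeight);
  return Math.min(100, Math.max(0, raw));
}
```

For a brief that only specified a premise and anchors, scores of 100 and 50 in standard mode give (30 + 12.5) / 0.55 ≈ 77.3. Without redistribution the same inputs would give 42.5, which is the penalty the redistribution rule exists to avoid.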
Constraint-mode multipliers
```
strict    × 0.95   caps a perfect score below 100 to register the user demanded zero deviation
standard  × 1.00   neutral; the default
loose     × 1.15   the user opted into deviation; reward the result (capped at 100)
```
Grade calibration (checklist-completion semantics)
```
A+  95-100  Every constraint met
A   90-94   Every major constraint met; maybe one weak anchor
B+  85-89   All anchors landed; one structural or style miss
B   80-84   Most anchors landed; recognizable as requested
C+  70-79   Premise served; multiple anchors weak/missed
C   60-69   Premise served but reads as a cousin
D   50-59   Half the constraints landed
F   0-49    Wrong song
```
These are NOT anti-inflation bands; this is checklist-completion math. 100 means every constraint met; less than 100 means specific misses. The bar IS the brief.
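One way to read the band table as code is a floor lookup, sketched below. The name `gradeFor` and the rounding step are assumptions: the table lists integer edges and does not say how a fractional composite such as 94.5 is banded, so this sketch rounds to the nearest integer first.

```typescript
// Band floors copied from the grade table in this RFC, highest first.
const GRADE_BANDS: ReadonlyArray<readonly [number, string]> = [
  [95, "A+"],
  [90, "A"],
  [85, "B+"],
  [80, "B"],
  [70, "C+"],
  [60, "C"],
  [50, "D"],
  [0, "F"],
];

// Hypothetical helper: maps a 0-100 composite to its letter grade.
function gradeFor(composite: number): string {
  // Assumption: clamp to [0, 100], then round to the nearest integer before banding.
  const rounded = Math.round(Math.min(100, Math.max(0, composite)));
  for (const [floor, grade] of GRADE_BANDS) {
    if (rounded >= floor) return grade;
  }
  return "F"; // unreachable after clamping; kept so every path returns
}
```

If the implementation truncates instead of rounding, only composites within half a point below a band edge would grade differently.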
Brief complexity (0–10)
```
points = min(anchorCount, 5)
       + min(styleConstraintCount, 3)
       + (structure non-empty ? 1 : 0)
       + (forbiddenLanguage non-empty ? 1 : 0)
       + (premise.length >= 40 ? 1 : 0)

complexity = min(points, 10)
```
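The complexity formula translates almost directly into code. In the sketch below the `BriefSketch` shape and the name `briefComplexity` are hypothetical. In particular, representing `structure` as a string and `forbiddenLanguage` as a list is an assumption about how a brief is stored.

```typescript
// Hypothetical brief shape, used only for this illustration.
interface BriefSketch {
  premise: string;
  anchors: string[];
  styleConstraints: string[];
  structure: string; // assumption: empty string means "not requested"
  forbiddenLanguage: string[]; // assumption: empty list means "not requested"
}

function briefComplexity(brief: BriefSketch): number {
  const points =
    Math.min(brief.anchors.length, 5) +
    Math.min(brief.styleConstraints.length, 3) +
    (brief.structure.length > 0 ? 1 : 0) +
    (brief.forbiddenLanguage.length > 0 ? 1 : 0) +
    (brief.premise.length >= 40 ? 1 : 0);
  // The raw maximum is 5 + 3 + 1 + 1 + 1 = 11, so the clamp to 10 does real work.
  return Math.min(points, 10);
}
```

A brief with five or more anchors, three or more style constraints, and all three binary dimensions populated reaches 11 raw points and reports a complexity of 10.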
UX prominence buckets
```
complexity ≤ 2  → 'hide'       (light brief; suppress unless score < 80)
complexity 3-6  → 'secondary'  (standard brief; equal-weight with quality)
complexity ≥ 7  → 'primary'    (heavy brief; fidelity is the headline; quality secondary)
```
The 'hide' bucket carries a rescue path: when score drops below 80, the chip promotes to 'secondary' with a "low fidelity" warning so the user sees the miss even on a light brief.
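The bucket rules and the rescue path can be sketched together. The names `chipDisplay`, `ChipDisplay`, and `lowFidelityWarning` are hypothetical. Whether the "low fidelity" warning also appears on 'secondary' or 'primary' briefs is not stated here, so this sketch raises it only on the rescue path.

```typescript
type Prominence = "hide" | "secondary" | "primary";

// Hypothetical display descriptor for the fidelity chip.
interface ChipDisplay {
  prominence: Prominence;
  lowFidelityWarning: boolean;
}

function chipDisplay(complexity: number, score: number): ChipDisplay {
  if (complexity >= 7) {
    return { prominence: "primary", lowFidelityWarning: false };
  }
  if (complexity >= 3) {
    return { prominence: "secondary", lowFidelityWarning: false };
  }
  // Light brief (complexity <= 2): hidden unless the score drops below 80,
  // in which case the chip is promoted to 'secondary' with a warning.
  if (score < 80) {
    return { prominence: "secondary", lowFidelityWarning: true };
  }
  return { prominence: "hide", lowFidelityWarning: false };
}
```

For example, `chipDisplay(2, 79)` promotes the chip with the warning, while `chipDisplay(2, 80)` keeps it hidden, because the rescue threshold is a strict "below 80".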
Open questions for the comment window
1. **Weight allocation**: is the 30/25 split between premise and anchors the right calibration? Some early feedback suggested anchors should weigh more because they're verifiable mechanically (regex), while premise is Haiku-judged (noisier). Counter-argument: the premise IS the question; anchors are the SUPPORTING DETAILS. The SA#17 weights put premise first for that reason.
2. **Constraint-mode caps**: should strict mode cap at 0.95 or at 0.90? At 0.95, a "perfect" strict-mode score tops out at 95, which sits at the bottom edge of the A+ band (95-100) rather than at 100; at 0.90 it would top out at 90, an A. Some feedback that the 0.95 behavior is right; some that it's confusing. Open for discussion.
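3. **Complexity threshold for 'primary' bucket**: currently ≥ 7. Should this be ≥ 6 or ≥ 8? The intuition: 'primary' = "the brief is so heavy that fidelity is the headline answer." Under the complexity formula, 7+ is unreachable without anchors (style, structure, forbidden language, and premise together max out at 6), so it filters to anchor-bearing briefs with most of the remaining points filled: for example, five anchors plus two style constraints, or four anchors plus structure, forbidden language, and a 40+ character premise. That feels about right. Open for adjustment if early signal disagrees.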
4. **Chorus evolution + transcendence weights**: both at 5%. Some argument that one or both should weigh MORE because the literary moves they measure are load-bearing for re-listen quality. Counter-argument: 5% × 2 = 10% is enough signal; raising it crowds out the other components without changing rank order materially.
5. **'na' verdict semantics**: when a component is N/A (the brief didn't ask, or the lyric lacks the prerequisite), should it count toward complexity or just abstain? Today it abstains. Some argument for counting it negatively (system "missed" the chance to ship that constraint shape), but that breaks the "the bar IS the brief" principle.
What lands at the end of the window (2026-05-26)
If feedback is structural, v1.0.0 incorporates the changes. If feedback is cosmetic, v1.0.0 ships the v0.1.0 numbers verbatim with the additional documentation. Either way, a versioned changelog at /scoring/standard/fidelity/changelog lands with v1.0.0 documenting the diff.
The audit IMPLEMENTATION version (`FIDELITY_AUDIT_VERSION` in src/lib/fidelity-audit.ts, currently 1.4.0) is independent of the public standard version. Audit implementation can move faster than the public standard; the standard is the commitment, the audit is the execution.
How to comment
The same channel as RFC-0009: email todd@songforgeai.com with the subject "RFC-0010 comment." A comment system upgrade lands when the first signal of friction surfaces (per the RFC process governance at top of `src/lib/rfcs.ts`).