Versions + RFCs, in one timeline
Every published version of the rubric, every RFC that ratified or proposed a change. Cite by URL. The current rubric version is v1.2.0, published 2026-05-02.
- RFC-0010 · 2026-05-19 · in-comment
Fidelity Score v0.1.0 — calibration + composite formula
Pins the seven-component fidelity composite (premise 30% / anchors 25% / structure 15% / style 15% / forbidden 5% / chorus 5% / transcendence 5%), the constraint-mode multipliers (strict 0.95, standard 1.0, loose 1.15), the 8-tier grade calibration A+ through F, the brief-complexity 0–10 formula, and the three UX prominence buckets (hide / secondary / primary). Opens public comment on the entire Phase-2 CAF stack documented at /scoring/standard/fidelity v0.1.0.
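The seven weights quoted above sum to 100%, so the composite can be read as a weighted mean of the component scores. The following is a minimal sketch of that reading, not the RFC's reference implementation. The 0-100 component scale, applying the constraint-mode multiplier to the weighted sum, the clamp at 100, and the names `WEIGHTS`, `MODE_MULTIPLIERS` and `fidelity_composite` are all assumptions made for illustration.

```python
# Hypothetical sketch of the seven-component fidelity composite.
# Weights and multipliers are the values quoted in the entry above.
# Assumptions (not stated in the entry): components are scored 0-100,
# the mode multiplier scales the weighted sum, and the result is capped at 100.

WEIGHTS = {
    "premise": 0.30,
    "anchors": 0.25,
    "structure": 0.15,
    "style": 0.15,
    "forbidden": 0.05,
    "chorus": 0.05,
    "transcendence": 0.05,
}

MODE_MULTIPLIERS = {"strict": 0.95, "standard": 1.0, "loose": 1.15}


def fidelity_composite(components: dict, mode: str = "standard") -> float:
    """Return the weighted composite for one brief, scaled by constraint mode."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    if mode not in MODE_MULTIPLIERS:
        raise ValueError(f"unknown constraint mode: {mode}")
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return min(100.0, weighted * MODE_MULTIPLIERS[mode])
```

Under these assumptions, a brief scoring 80 on every component lands at about 80 in standard mode, 76 in strict mode and 92 in loose mode.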
Read RFC-0010
- v1.2.0 · 2026-05-02 · version
MINOR: M8 (Voice & POV Integrity) refactored from "one coherent narrator" to "INTENTIONAL POV": deliberate switches (K-pop multi-voice, hip-hop features, gospel call-and-response) no longer score as drift failures. M11 (Memorability) refactored from the single 60-minute test to a 4-signal cumulative read (hook integration + phonemic distinctiveness + chorus repetition strategy + one-listen recall) so cumulative / oral-tradition / ritual-repetition forms aren't falsely scored low. Per Super Deep Audit §5 cuts #1 + #2. Score deltas on the golden-eval set: <3 points on average. The refactor is descriptive (it changes when the metric applies), not prescriptive (it does not change what the metric values). Within MINOR threshold per RFC-0001.
- RFC-0009 · 2026-04-26 · in-comment
Multi-language scoring methodology (Latin / Italian / Spanish / Japanese / French)
Pins the methodology for scoring non-English lyrics. Punch List #42. Operator-driven request: "could a user ask for a Gregorian Chant in Latin, an opera in Italian?" Today the rubric runs in English-only mode; the banned-terms dictionary is English; the platitude detector (RFC-0002) is English. Every non-English score therefore depends on the model's implicit translation. This RFC pins what changes in the rubric vs what stays language-agnostic.
Read RFC-0009
- RFC-0008 · 2026-04-26 · accepted
Open Scoring Corpus contribution policy
Pins the policy for how third parties contribute scored lyrics to the open Lyric Scoring corpus. Punch List #33 entry point. Eventual goal: 1,000+ human-scored lyrics, public, used to version + calibrate the rubric across the industry.
Read RFC-0008
- RFC-0007 · 2026-04-26 · in-comment
Reproducibility audit methodology
Pins the methodology for the quarterly reproducibility audit (Section 4 of /reports/calibration-2026-q2). For a sampled batch of 25 published scores, replay the same lyrics + genre through the same model + temp + rubric version and report per-row score delta. Zero or near-zero deltas confirm the seal is honest.
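The replay step can be sketched as a small loop over a sampled batch. This is an illustrative outline only: the row shape (`id`, `lyrics`, `genre`, `score`), the `rescore` callable and the function name `reproducibility_audit` are hypothetical, and `rescore` is assumed to already pin the same model, temperature and rubric version as the published score.

```python
import random

# Hypothetical sketch of the quarterly replay audit described above.
# The row keys and the `rescore` callable are assumptions for illustration;
# `rescore` is expected to pin model, temperature and rubric version itself.


def reproducibility_audit(published, rescore, sample_size=25, seed=0):
    """Replay a sampled batch and report the per-row score delta."""
    rng = random.Random(seed)  # fixed seed so the sampled batch is repeatable
    batch = rng.sample(published, min(sample_size, len(published)))
    report = []
    for row in batch:
        replayed = rescore(row["lyrics"], row["genre"])
        report.append(
            {
                "id": row["id"],
                "published": row["score"],
                "replayed": replayed,
                "delta": replayed - row["score"],
            }
        )
    return report
```

A report whose `delta` column is all zeros (or near zero) is the outcome the audit is looking for.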
Read RFC-0007
- RFC-0006 · 2026-04-26 · in-comment
Per-cohort engagement methodology + divergence-as-calibration
Pins the methodology behind the five-persona listener panel engagement scores (B1281+) and defines when per-cohort divergence from the composite triggers a rubric calibration response. Empirical input cites the B1340 /admin/cohort-divergence dashboard.
Read RFC-0006
- RFC-0004 · 2026-04-26 · accepted
Voltage Coach behavior change policy
Defines what counts as a "behavior change" to the voltage coach (B1291), covering kind classification, hint copy, and accept-rate triggers, and pins the publication discipline for each. Empirical input cites the B1346 kind-breakdown surface that landed earlier this session.
Read RFC-0004
- RFC-0005 · 2026-04-26 · accepted
GPT-4o pre-gauntlet critique as Sonnet gauntlet input
Adds a second GPT-4o call to the pipeline — this time as a literary critic rather than a re-scorer. Output feeds the Sonnet gauntlet as additional evidence the gauntlet decision-rule incorporates or rejects. Cost: ~$0.04/forge. Off-by-default behind SF_GPT4O_PREGAUNTLET. Operator-driven request: "should ChatGPT be used more than we are using it?"
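As a rough illustration of an off-by-default gate, the flag could be read from the environment as below. Only the flag name `SF_GPT4O_PREGAUNTLET` and the ~$0.04/forge figure come from the entry above. Reading the flag from an environment variable, the accepted truthy strings, and the function names are assumptions.

```python
import os

# Hypothetical sketch of an off-by-default feature gate.
# Only the flag name and the ~$0.04/forge cost come from the entry above;
# the env-var mechanism and the truthy values are assumptions.

FLAG_NAME = "SF_GPT4O_PREGAUNTLET"
COST_PER_FORGE_USD = 0.04  # approximate figure quoted in the RFC summary


def pregauntlet_enabled(env=None) -> bool:
    """Return True only when the flag is explicitly switched on."""
    env = os.environ if env is None else env
    return env.get(FLAG_NAME, "").strip().lower() in {"1", "true", "on", "yes"}


def extra_cost_usd(forges: int, env=None) -> float:
    """Estimate the added spend for a number of forges; zero when the flag is off."""
    return forges * COST_PER_FORGE_USD if pregauntlet_enabled(env) else 0.0
```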
Read RFC-0005
- RFC-0003 · 2026-04-26 · accepted
Hum Score as the M11 (Memorability) calibration signal
The 24-hour-delayed Hum Test (B1303) is the only longitudinal-recall signal we have. This RFC proposes formally treating the systematic delta between fresh-M11 and Hum-M11 as the calibration ground truth for the Memorability metric, with quarterly rubric adjustments triggered when the corpus-wide median delta exceeds ±5pts.
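The trigger described above reduces to a median over paired scores. A minimal sketch follows, assuming the delta is taken as Hum-M11 minus fresh-M11 (the entry does not fix the sign convention) and that the inputs are already paired per song. The function name and input shape are hypothetical.

```python
from statistics import median

# Hypothetical sketch of the quarterly Hum Score trigger.
# The ±5pt threshold comes from the entry above; the sign convention
# (hum minus fresh) and the paired-tuple input shape are assumptions.

THRESHOLD_PTS = 5.0


def hum_calibration_check(pairs):
    """pairs: iterable of (fresh_m11, hum_m11) scores for the same song.

    Returns (median_delta, adjustment_triggered).
    """
    deltas = [hum - fresh for fresh, hum in pairs]
    if not deltas:
        raise ValueError("need at least one scored pair")
    med = median(deltas)
    return med, abs(med) > THRESHOLD_PTS
```

For example, pairs of (80, 70), (75, 69) and (90, 84) give deltas of -10, -6 and -6, a median of -6, and therefore a triggered adjustment under this sketch.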
Read RFC-0003
- v1.0.1 · 2026-04-25 · version
PATCH (B1211): docs only. Reproducibility seal landed in /api/v1/score (B1199); model card published at /scoring/standard/model-card (B1197). No score deltas. Versioning policy formalized in RFC-0001 (in-comment until 2026-05-02): MAJOR for >5pt golden-eval delta, MINOR for clarifications, PATCH for docs/typos.
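The three-way rule summarized above can be expressed as a small classifier. This is a sketch of that one-line summary, not of RFC-0001 itself: the inputs (a mean golden-eval delta in points and a docs-only flag) and the function name are assumptions.

```python
# Hypothetical sketch of the bump rule summarized above:
# MAJOR for a golden-eval delta above 5 points, PATCH for docs/typo-only
# changes, MINOR otherwise. The inputs and the name are assumptions.

MAJOR_DELTA_PTS = 5.0


def classify_bump(mean_delta_pts: float, docs_only: bool) -> str:
    """Map a proposed rubric change to MAJOR, MINOR or PATCH."""
    if abs(mean_delta_pts) > MAJOR_DELTA_PTS:
        return "MAJOR"
    if docs_only:
        return "PATCH"
    return "MINOR"
```

Read this way, a docs-only release with no score deltas classifies as PATCH, and a clarification that moves golden-eval scores by under 2 points classifies as MINOR.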
- v1.1.0 · 2026-04-25 · version
MINOR (B1240): first MINOR bump shipped through the published cadence. Anti-Inflation rules expanded from 4 to 5 with the addition of the Anti-Platitude rule (lines that resolve with generic emotional summaries hit the lowest Specificity + Voice band regardless of surface polish). Antagonist Ceiling clarified to require evidence; Historical Context anchored to the published corpus. Score deltas on the golden-eval set: <2 points on average (within MINOR threshold). Migration: existing scored content is auto-rescored on next eval; the seal field's rubricVersion now reads '1.1.0'. RFC-0002 (anti-platitude formalization) drafted as the in-comment artifact for this bump.
- RFC-0002 · 2026-04-25 · accepted
Anti-Platitude rule (5th anti-inflation rule, v1.1.0)
Lines that resolve with generic emotional summaries ("all I need is love", "this is my truth", "love wins") hit the lowest Specificity + Voice band regardless of surface polish. Documented inline so implementers cite a published rule rather than discover it empirically.
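To make "resolves with" concrete, here is a toy check that flags a line only when it ends on one of the three phrases quoted above. It is deliberately naive and is not the RFC-0002 detector, whose mechanics are not described here. The phrase list is limited to the quoted examples, and the normalization and function name are assumptions.

```python
import re

# Toy illustration only: flags lines that END on one of the three example
# phrases quoted in the rule. The real platitude detector is not described
# in this timeline; the normalization below is an assumption.

KNOWN_PLATITUDES = ("all i need is love", "this is my truth", "love wins")


def resolves_with_platitude(line: str) -> bool:
    """Return True when the line's final words match a known platitude."""
    # Lowercase, drop punctuation, and collapse whitespace before matching.
    normalized = re.sub(r"[^a-z' ]+", "", line.lower())
    normalized = " ".join(normalized.split())
    return any(
        normalized == phrase or normalized.endswith(" " + phrase)
        for phrase in KNOWN_PLATITUDES
    )
```

Matching only at the end of the line keeps a phrase that is set up and then subverted ("Love wins the coin toss at the laundromat") from being flagged.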
Read RFC-0002
- RFC-0001 · 2026-04-25 · accepted
Rubric versioning policy (v1.x cadence + diff format)
How the Lyric Scoring Standard versions: when a bump happens, what gets published with it, and how third parties verify which rubric scored their lyrics.
Read RFC-0001
- v1.0.0 · 2026-04-20 · version
Initial public release. 12 metrics finalized, anti-inflation rules documented, grade scale locked.
The standard
12-metric rubric · CC BY 4.0 · machine-readable JSON
Whitepaper
Formal methodology, anti-inflation rules, calibration corpus
Inter-rater agreement
Pre-registered methodology · 30-rater human cohort · ICC
Reproducibility seal
ed25519 signature spec · pubkey JSON · verifier
Version diff
Compare any two rubric versions side by side
Model card
Reference-implementation model disclosure
Prior art
Conservatory rubrics + MIR research + industry frameworks the open standard extends
Sleeper ledger
Day-1 vs Day-2 cold-temperament drift aggregates. Auto-publishes at 200 sleeper-tested songs.
Hit Calibration
Curated corpus of historically-significant songs scored by the rubric — proves it grades craft, not chart success.
Register-aware craft
SA#32 — 10 emotional registers, per-register Gravity Rule + Burden of Proof modifiers. Quality is not universal; register defines craft.