Standard version diff

What changed in every version of the standard.

The Lyric Scoring Standard is currently at v1.2.0. This page gives the structural diff for each transition between published versions: which metrics changed, which anti-inflation rules were added, which RFCs ratified the bumps, and the score-delta band of each transition. The companion timeline view at /scoring/standard/changelog interleaves these with the RFC submission history.

Bump-type contract

Per RFC-0001 (Rubric Versioning Policy):

  • MAJOR — golden-eval score delta exceeds 5 points. Breaks wire compatibility for tooling pinned to the prior version. Requires public RFC + 30-day comment window.
  • MINOR — clarifications, new anti-inflation rules, descriptive refactors. Score deltas under 5 points. isCompatibleRubricVersion stays true; pinned tooling keeps working.
  • PATCH — docs, typos, surface improvements. Zero score delta. No RFC required.
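As an illustration of the contract above, the bump type of a transition can be read straight off the two version strings. This is a minimal sketch, not the standard's own tooling: the function names are hypothetical, and the compatibility rule (same MAJOR component) is an assumption about what isCompatibleRubricVersion checks, inferred from the statement that MINOR bumps keep pinned tooling working.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_type(old: str, new: str) -> str:
    """Name the bump implied by moving from `old` to `new`."""
    o, n = parse_version(old), parse_version(new)
    if n <= o:
        raise ValueError(f"{new} is not newer than {old}")
    if n[0] != o[0]:
        return "MAJOR"
    if n[1] != o[1]:
        return "MINOR"
    return "PATCH"


def is_compatible_rubric_version(pinned: str, current: str) -> bool:
    """Assumption: tooling stays compatible while the MAJOR component matches."""
    return parse_version(pinned)[0] == parse_version(current)[0]
```

For example, bump_type("1.1.0", "1.2.0") returns "MINOR", and tooling pinned to 1.0.0 is still reported as compatible with 1.2.0 under this assumed rule.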
1.1.0 → 1.2.0 · MINOR
B1939 · 2026-05-02

Score delta: <3 points on the golden-eval set.

  • M8 — Voice & POV Integrity

    Changed

    Refactored from "one coherent narrator" to "INTENTIONAL POV." Deliberate narrator switches that align with structural beats (K-pop multi-voice, hip-hop featured-verse contrast, gospel call-and-response) no longer score as drift failures. Random narrator switches without semantic function still do.

  • M11 — Memorability

    Changed

    Refactored from a single 60-minute recall test to a cumulative read of four signals: hook integration, phonemic distinctiveness, chorus repetition strategy, and one-listen recall. Cumulative, oral-tradition, and ritual-repetition forms are no longer scored low as false positives. Mantric chants that earn their repetition through structural intention now register at the canonical band.

  • Refactor descriptive, not prescriptive

    Note

    Both M8 and M11 changes describe WHEN the metric applies; they don’t change WHAT the metric values. Score deltas on the golden-eval set average under 3 points — within the MINOR threshold per RFC-0001.

1.0.1 → 1.1.0 · MINOR
B1240 · 2026-04-25

Score delta: <2 points on the golden-eval set.

  • Anti-Platitude rule

    Added

    Anti-inflation rules expanded from 4 to 5. New rule: lines that resolve with generic emotional summaries ("all I need is love," "this is my truth") hit the rubric’s lowest Specificity + Voice band regardless of surface polish.

  • Antagonist Ceiling rule

    Changed

    Clarified to require evidence. The dedicated critical voice that challenges every score now must produce specific lines / specific failure modes — vague disagreement does not lower the score.

  • Historical Context Anchor rule

    Changed

    Anchored explicitly to the published corpus, especially the Hank Williams S-band entry at composite 95. "Professional craft" now points at a citeable artifact, not an abstract claim.

  • Migration

    Note

    Existing scored content is rescored automatically on the next eval. The reproducibility seal’s rubricVersion field now reads "1.1.0".

Ratifying RFC: RFC-0002-anti-platitude-rule

1.0.0 → 1.0.1 · PATCH
B1211 · 2026-04-25
  • Reproducibility seal in API

    Added

    The seal field landed in /api/v1/score (B1199). Every score response now carries the ed25519-signed envelope binding rubricVersion + model + temperature + buildSha + composite + grade.

  • Public model card

    Added

    Published at /scoring/standard/model-card (B1197). Pins the runtime contract: which Anthropic model, what temperature, where the prompt lives, what limitations apply.

  • Versioning policy formalized

    Added

    RFC-0001 documented the version-bump rules: MAJOR for >5-point golden-eval delta, MINOR for clarifications, PATCH for docs/typos. Every later bump follows this contract.

  • No rubric content changes

    Note

    PATCH bumps don’t change the rubric itself — same metrics, same weights, same anti-inflation rules. Score deltas: zero.

Ratifying RFC: RFC-0001-rubric-versioning-policy
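A client consuming /api/v1/score may want to confirm that a seal carries all six bound fields before trusting it. The sketch below checks only field presence. The field names come from the entry above, but the flat dictionary layout and the helper name are assumptions, and signature verification is left out because the key distribution and signing format are not described here.

```python
from typing import Any, Dict, List

# Fields the 1.0.1 entry says the signed envelope binds together.
SEAL_FIELDS = (
    "rubricVersion",
    "model",
    "temperature",
    "buildSha",
    "composite",
    "grade",
)


def missing_seal_fields(seal: Dict[str, Any]) -> List[str]:
    """Return the bound fields absent from `seal` (hypothetical flat layout)."""
    return [name for name in SEAL_FIELDS if name not in seal]
```

An empty list means every bound field is present. It says nothing about whether the ed25519 signature itself is valid.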

(initial) → 1.0.0 · MAJOR
2026-04-20
  • 12 metrics across 3 tiers

    Added

    Initial public release. Craft (25%): M1 Prosody, M2 Structure, M3 Rhyme, M4 Economy. Expression (40%): M5 Specificity, M6 Imagery, M7 Emotional Truth, M8 Voice. Impact (35%): M9 Transcendence, M10 Arc, M11 Memorability, M12 Genre.

  • 4 anti-inflation rules

    Added

    Gravity (default 50), Burden of Proof (claims earned by lines that show), Antagonist Ceiling, Historical Context. Anti-Platitude rule (5th) was added in the 1.1.0 bump.

  • Grade scale

    Added

    Locked: S+ (96+), S (91-95), A+ (86-90), A (80-85), B+ (73-79), B (65-72), C+ (55-64), C (45-54), D+ (35-44), D (25-34), F (<25). Same scale used by every later version.
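The locked grade scale reduces to a lookup on the lower bound of each band. The following is a minimal sketch with hypothetical names. The published bands are integer ranges, so treating a fractional composite as belonging to the band whose floor it has reached (95.5 maps to S) is an assumption, not something the scale states.

```python
from typing import List, Tuple

# Lower bound of each band, highest first, taken from the 1.0.0 grade scale.
GRADE_FLOORS: List[Tuple[float, str]] = [
    (96, "S+"), (91, "S"), (86, "A+"), (80, "A"), (73, "B+"), (65, "B"),
    (55, "C+"), (45, "C"), (35, "D+"), (25, "D"),
]


def grade_for(composite: float) -> str:
    """Map a composite score to its grade band; anything under 25 is F."""
    for floor, grade in GRADE_FLOORS:
        if composite >= floor:
            return grade
    return "F"
```

Under this mapping a composite of 95, the figure cited for the Hank Williams corpus entry, lands in the S band, and the Gravity default of 50 lands in C.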

How to verify a transition yourself

  1. Fetch the current rubric JSON at https://songforgeai.com/scoring-standard.json. Compare against the prior version (archived in git at the commit that bumped the version).
  2. Verify the version field matches the bump type rule (MAJOR / MINOR / PATCH).
  3. Cross-reference the golden-eval delta against the published threshold (<5 for MINOR; PATCH should be 0).
  4. Check the ratifying RFC at /rfc. MINOR + MAJOR bumps require an RFC; PATCH bumps ship without one.
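Steps 1, 3 and 4 above can be sketched as a small script. This is an illustration under stated assumptions, not an official verifier: the helper names are hypothetical, the top-level "version" key is assumed from the wording of step 2, and the golden-eval delta and RFC identifier are passed in by hand because their location in the JSON is not described here.

```python
import json
import urllib.request
from typing import List, Optional

RUBRIC_URL = "https://songforgeai.com/scoring-standard.json"


def fetch_rubric_version(url: str = RUBRIC_URL) -> str:
    """Step 1: download the rubric JSON and return its version field."""
    with urllib.request.urlopen(url, timeout=10) as response:
        rubric = json.load(response)
    return rubric["version"]  # assumption: top-level key named "version"


def check_transition(
    declared_bump: str, golden_delta: float, rfc_id: Optional[str]
) -> List[str]:
    """Steps 3 and 4: return a list of problems; empty means none found."""
    if declared_bump not in ("MAJOR", "MINOR", "PATCH"):
        raise ValueError(f"unknown bump type: {declared_bump}")
    problems: List[str] = []
    if declared_bump == "PATCH" and golden_delta != 0:
        problems.append("PATCH bump should have zero score delta")
    if declared_bump == "MINOR" and golden_delta >= 5:
        problems.append("MINOR bump should have a score delta under 5")
    if declared_bump in ("MAJOR", "MINOR") and not rfc_id:
        problems.append(f"{declared_bump} bump requires a ratifying RFC")
    return problems
```

For instance, check_transition("MINOR", 2.0, "RFC-0002-anti-platitude-rule") returns an empty list, while a MINOR bump with a 6-point delta and no RFC returns two problems.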
Diff data sourced from the changelog field of the published rubric JSON plus the RFC registry. Updated on every published bump. Cite as “Lyric Scoring Standard v1.2.0 — Version Diff (B1963).”