Hit Calibration Artifact · Public corpus

How the rubric scores songs you already know.

A curated corpus of historically significant songs run through the same Lyric Scoring Standard that the rest of SongForgeAI uses. It is meant to show that the rubric grades craft, not chart success. Every entry is frozen, citable, and answers the same question: does this rubric actually work?

Corpus progress

0 / 10+ songs calibrated

Pre-publication phase. The corpus is being curated. The methodology and table schema are locked. First entries land soon. Check back, or read the methodology below to see how each entry is produced.

What a published entry will look like (sample, not real data)

Song	Year	Score	Verdict
Yesterday (The Beatles)	1965	94 / A	masterpiece

The real corpus will replace this preview as the operator seeds it. Every row is a song's lyric run through the same 8-voice Crucible at /crucible — same rubric, same anti-inflation rules, frozen at first publication.

Methodology

Curation: Each entry is a historically significant song. Selection draws from canonical lists (RIAA Diamond, Grammy SOTY, Billboard #1s across decades) and from the songwriting-craft canon (Cohen, Dylan, McCartney, Joni Mitchell, Springsteen, Sondheim, etc.). Curation is operator-led; the goal is craft diversity, not chart ranking.

Scoring: Each song's lyric is run through the same 8-voice Crucible the public uses at /crucible. The result is frozen here at first publication, with no re-scoring based on later rubric edits. If the rubric drifts, the drift becomes visible against the frozen corpus (which is the point).
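The freeze-at-first-publication rule can be pictured as a write-once store. The sketch below is only an illustration under stated assumptions: the class name, the method names, and the song identifier are hypothetical, and the only name taken from this page is frozen_at.

```python
from datetime import datetime, timezone


class AlreadyFrozenError(Exception):
    """Raised when something tries to re-score a published entry."""


class FrozenCorpus:
    """Hypothetical write-once store: the first published score wins."""

    def __init__(self) -> None:
        self._rows = {}

    def publish(self, song_id: str, score: int) -> dict:
        # Refuse any second write, so later rubric edits cannot change a row.
        if song_id in self._rows:
            frozen_at = self._rows[song_id]["frozen_at"]
            raise AlreadyFrozenError(f"{song_id} was frozen at {frozen_at}")
        row = {"score": score, "frozen_at": datetime.now(timezone.utc).isoformat()}
        self._rows[song_id] = row
        return dict(row)

    def get(self, song_id: str) -> dict:
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows[song_id])


corpus = FrozenCorpus()
corpus.publish("sample-song", 94)  # placeholder id and score, not real data
```

A second call to publish for the same identifier raises AlreadyFrozenError, which is the behavior the paragraph above describes: the original score stays in place and any later disagreement shows up as drift instead of as an overwrite.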

What we publish: Title, artist, year, chart context, our score, our verdict, voice counts, and a 1-2 sentence critique summary that reflects the panel's actual reading. We never publish the lyric itself, in line with standard music-criticism practice.
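As a rough picture of the published row, the fields listed above could map onto an immutable record like the one below. This is a sketch, not the actual table schema: every field name except frozen_at is an assumption, the shape of voice_counts is a guess, and the values reuse the sample row from this page with placeholders elsewhere.

```python
from dataclasses import dataclass, fields
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class CorpusEntry:
    """One published row. Field names are illustrative, except frozen_at."""

    title: str
    artist: str
    year: int
    chart_context: str
    score: int
    verdict: str
    voice_counts: Mapping[str, int]  # assumed shape: verdict label -> number of voices
    critique_summary: str
    frozen_at: str  # assumed to hold an ISO 8601 timestamp


# Sample values only, mirroring the preview row; not real corpus data.
entry = CorpusEntry(
    title="Yesterday",
    artist="The Beatles",
    year=1965,
    chart_context="(placeholder)",
    score=94,
    verdict="masterpiece",
    voice_counts=MappingProxyType({"masterpiece": 8}),  # placeholder split
    critique_summary="(placeholder)",
    frozen_at="1970-01-01T00:00:00+00:00",  # placeholder timestamp
)

# The record deliberately has no lyric field.
published_columns = [f.name for f in fields(CorpusEntry)]
```

Marking the dataclass as frozen means an attempt to assign to entry.score raises an error, which matches the intent that a published row does not change.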

Why no popularity bias: The rubric grades craft, not chart success. If a song that charted #1 scores 52 here because the lyric is craft-thin, we publish 52. If a song that never charted scores 92 because the craft is masterful, we publish 92. Auto-inflating known hits would break the entire claim the rubric stands on.

When entries change: Never. Once published, the row is frozen as of its frozen_at value. Re-running the rubric on the same lyric should reproduce the frozen score to within ±2-3 points (natural Sonnet variance, per the consensus eval discipline). Significant drift is itself a published finding, because the corpus's most useful property is its stability.
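The stability check described above comes down to comparing a re-run score with the frozen one. A minimal sketch, assuming the tolerance is the upper end of the stated ±2-3 point band; the function names and the example scores are hypothetical.

```python
TOLERANCE_POINTS = 3  # assumption: upper end of the ±2-3 point band quoted above


def is_significant_drift(frozen_score: int, rerun_score: int,
                         tolerance: int = TOLERANCE_POINTS) -> bool:
    """True when a re-run lands outside the expected variance band."""
    return abs(rerun_score - frozen_score) > tolerance


def drift_report(frozen_score: int, rerun_scores: list) -> list:
    """Return (rerun score, signed delta, significant?) for each re-run."""
    return [
        (score, score - frozen_score, is_significant_drift(frozen_score, score))
        for score in rerun_scores
    ]


# Illustrative numbers only: a frozen score of 94 and three re-runs.
report = drift_report(94, [92, 95, 88])
```

With these inputs, the re-runs at 92 and 95 fall inside the band, and the re-run at 88 is flagged because it sits 6 points below the frozen score.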

Why this corpus exists

Every AI scoring system answers the same skeptical question eventually: "how do I know your rubric isn't just grading its own output well?" The Hit Calibration Artifact is the most direct answer available — it runs the rubric against songs the skeptic ALREADY KNOWS and lets them check.

If "Hallelujah" scores low here, our rubric is broken. If "Achy Breaky Heart" scores 92, our rubric is sycophantic. The corpus is the public proof that neither failure mode applies.


Page revalidates hourly. Entries frozen at first publication. Build 2180.
