Lyric Scoring Standard

Sleeper Test ledger · Published artifact

Day-1 vs Day-2 score drift, measured across the catalog.

Every song forged on SongForgeAI is re-scored 24 hours later by a COLD-temperament panel reading lyrics only — no title, no genre, no prior score. The delta between Day-1 and Day-2 is the song's holding power. This page aggregates the drift across the public catalog as a Lyric Scoring Standard companion artifact.

Catalog progress
27 / 200 sleeper-tested songs

Pre-publication phase. The ledger publishes once the catalog reaches 200 sleeper-tested songs; at current forge volume, that is roughly 22 days from now. The methodology and cron are live: the cron writes its results daily at 04:00 UTC to a JSONB column on the songs table.
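The 22-day estimate is plain arithmetic: songs still needed divided by the average daily intake, where the intake can never exceed the cron's limit of 10 songs per run. A minimal sketch, assuming a steady average intake; `songs_per_day` is an assumed input, not a figure published on this page, and `days_to_target` is a hypothetical helper.

```python
import math

PUBLICATION_TARGET = 200  # threshold stated on this page
MAX_PER_RUN = 10          # the daily cron picks up at most 10 songs per run


def days_to_target(tested: int, songs_per_day: float,
                   target: int = PUBLICATION_TARGET) -> int:
    """Estimate whole days until `tested` reaches `target`.

    `songs_per_day` is an assumed average of newly sleeper-tested songs per day.
    It is capped at MAX_PER_RUN because the cron runs once a day.
    """
    if tested >= target:
        return 0
    rate = min(songs_per_day, MAX_PER_RUN)
    if rate <= 0:
        raise ValueError("songs_per_day must be positive")
    return math.ceil((target - tested) / rate)
```

With 27 songs tested, an average of about 8 per day gives ceil(173 / 8) = 22 days, in line with the estimate above. Because of the per-run cap, no intake rate can bring the estimate below ceil(173 / 10) = 18 days.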

Methodology (pre-registered)

Schedule: Daily cron at 04:00 UTC (public infra). Each run picks up to 10 songs forged 23–25 hours earlier that haven't been sleeper-tested yet.
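The selection rule above can be sketched as a filter over a forge-time window. This is illustrative only: the `Song` record, its `sleeper_tested_at` marker and the oldest-first ordering are assumptions, since the page states only the 23 to 25 hour window and the 10-song limit.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Song:
    id: str
    forged_at: datetime                           # UTC timestamp of the Day-1 forge
    sleeper_tested_at: Optional[datetime] = None  # hypothetical "already tested" marker


def pick_batch(songs: Iterable[Song], now: datetime, limit: int = 10) -> list[Song]:
    """Select up to `limit` untested songs forged 23 to 25 hours before `now`."""
    earliest = now - timedelta(hours=25)
    latest = now - timedelta(hours=23)
    eligible = [
        s for s in songs
        if s.sleeper_tested_at is None and earliest <= s.forged_at <= latest
    ]
    # Oldest first is an assumed tie-break; the page does not specify an order.
    eligible.sort(key=lambda s: s.forged_at)
    return eligible[:limit]


# Example: the 04:00 UTC run on 2 Jan considers songs forged 03:00 to 05:00 UTC on 1 Jan.
run_at = datetime(2025, 1, 2, 4, 0, tzinfo=timezone.utc)
batch = pick_batch([Song("a", run_at - timedelta(hours=24))], run_at)
```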

Re-score conditions: COLD temperament (B2127 panel discipline) reading lyrics only — no title, no genre, no prior score, no room notes. The cold pass is deliberately context-stripped (B2128) so the panel reads as a stranger encountering the song for the first time.
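One way to enforce the context-stripped cold pass is to build the re-score request from an allowlist, so that only the lyrics ever reach the panel. A minimal sketch; the field names, the request shape and `build_cold_request` are all hypothetical, because the songs table schema and the panel API are not shown on this page.

```python
from typing import Any, Mapping

# Hypothetical column names; the real songs table schema is not published here.
COLD_PASS_FIELDS = ("lyrics",)


def build_cold_request(song: Mapping[str, Any]) -> dict:
    """Build a context-stripped COLD re-score request.

    Copying an allowlist of fields (instead of deleting title, genre, prior
    score and room notes one by one) means any metadata column added later
    is withheld from the cold reader by default.
    """
    return {
        "temperament": "COLD",
        "input": {field: song[field] for field in COLD_PASS_FIELDS},
    }
```

The allowlist is a design choice for this sketch: a denylist would silently leak any new context field that nobody remembered to strip.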

Trust labels: |delta| <= 3 is stable (scored on signal); 3 < |delta| <= 8 is mild_drift; |delta| > 8 is momentum_score (the room got hot, and the cold reader caught it).
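The label thresholds translate directly into a small classifier. This is a sketch: the sign convention (Day-2 minus Day-1) is an assumption, and it does not affect the label because only |delta| is used.

```python
def trust_label(day1: float, day2: float) -> str:
    """Map a Day-1 / Day-2 score pair to the published trust label."""
    delta = day2 - day1      # assumed sign convention; only the magnitude matters below
    magnitude = abs(delta)
    if magnitude <= 3:
        return "stable"          # scored on signal
    if magnitude <= 8:
        return "mild_drift"
    return "momentum_score"      # the room got hot; the cold reader caught it
```

The boundaries are inclusive on the lower label: a drift of exactly 3 is still stable, and a drift of exactly 8 is still mild_drift.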

Publication threshold: 200 sleeper-tested songs. Below 50: methodology only. Between 50 and 200: early-signal aggregates with a calibration warning. At 200+: full public ledger, citable as a SongForgeAI standard artifact.
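The three publication tiers can be expressed as a function of the public sleeper-tested count. A sketch only; the phase names returned here are hypothetical identifiers, not values the page defines.

```python
def publication_phase(tested_public: int) -> str:
    """Return the publication tier for a given public sleeper-tested count."""
    if tested_public < 50:
        return "methodology_only"
    if tested_public < 200:
        return "early_signal"    # aggregates shown with a calibration warning
    return "full_ledger"         # citable public ledger
```

At the current count of 27, the page sits in the methodology-only tier.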

Public catalog only: The aggregate counts only public songs (is_public = true). Private songs are sleeper-tested by the same cron but excluded from public statistics.
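The public-only filter amounts to discarding private and not-yet-tested rows before counting labels. A minimal sketch; `SleeperResult` and its fields are hypothetical stand-ins for whatever the JSONB column and the is_public flag actually hold.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SleeperResult:
    is_public: bool
    label: Optional[str]  # None until the song has been sleeper-tested


def public_label_counts(rows: Iterable[SleeperResult]) -> Counter:
    """Count trust labels across public, sleeper-tested songs only."""
    return Counter(r.label for r in rows if r.is_public and r.label is not None)
```

Summing the counts gives the public sleeper-tested total that the publication threshold is measured against.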

Why this is the strongest external-validity claim a rubric can publish

Scoring its own work is the credibility gap in every AI scoring system: the model that wrote the song is also the one reading it. The Sleeper Test breaks that loop by making the second reading happen 24 hours later, under a different temperament, with the original context withheld. The forging system cannot fake the delta between Day-1 and Day-2; that delta IS the song's holding power.

No other AI lyric tool publishes Day-2 drift statistics, because no other AI lyric tool runs a Day-2 cold re-read against its own published rubric. The cron has been running since the migration was applied; this page is the artifact that turns the cron's output from a private telemetry blob into a citable public standard.

Page revalidates hourly. Cron writes daily at 04:00 UTC. Build 2158.