Tools · 2026-04-28 · 5 min read

What a 60 Means in the Lyric Scoring Standard

Almost every AI tool that scores creative output reports inflated numbers. A "score" of 80 means almost nothing when the floor is 70 and nobody is allowed to fail. The Lyric Scoring Standard inverts this: the default for every metric is 50. Here is what each band actually means about your song, why we built the scale this way, and what to do once you know which band you landed in.

The Gravity Rule

The first anti-inflation rule in the published Lyric Scoring Standard is the Gravity Rule: every metric defaults to 50. Not 70 with bonus points for trying. Not 80 unless something is "very wrong." 50 is the median; you have to earn every point above it.
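One way to picture the Gravity Rule is a metric score that starts at 50 and moves only when points are earned or lost. The sketch below is illustrative only: the function name `score_metric`, the idea of passing a list of signed point adjustments, and the clamping to 0-100 are assumptions of this example, not the published Standard's scoring procedure.

```python
GRAVITY_DEFAULT = 50  # every metric starts here, per the Gravity Rule


def score_metric(earned_points=()):
    """Return a metric score: the 50 default plus signed adjustments.

    `earned_points` is a hypothetical list of point changes (positive for
    demonstrated craft, negative for weaknesses). The result is clamped to
    the 0-100 range.
    """
    total = GRAVITY_DEFAULT + sum(earned_points)
    return max(0, min(100, total))


print(score_metric())         # nothing earned, nothing lost: 50
print(score_metric([8, -3]))  # 50 + 8 - 3 = 55
```

The point of the sketch is the starting value: with no evidence either way, the metric reads 50, not 70 or 80.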

Why 50 and not, say, 75? Because 75-as-baseline is what makes other AI tools’ scores meaningless. If the easy region of the score is 70-95 and the hard region is 95-100, the score communicates almost nothing: every output looks "pretty good." On that scale, a 78 is practically indistinguishable from a 72.

50-as-baseline reverses this. Most AI lyric output lands between 50 and 65 because that’s actually where output sits when measured against professional craft. A 78 then means real differentiation: it sits in B+, three bands above the C band where the 50 default lives. A 90 is genuinely rare. Anyone calibrating to the published 21-song corpus sees the math: hand-scored S-band entries (Hank Williams, Joni Mitchell, Marvin Gaye) sit at 92-95. AI output that hits 87 is within five points of that canonical work.

What each band means

The published grade scale + the meaning of each band:

S+ (96+). Vanishingly rare. The published corpus has no entries this high; the rubric’s S-band anchor is 95. A 96+ has not been observed in routine evaluation. Treat 96+ output with skepticism — verify the seal, re-run, cross-reference.
S (91-95). Canonical-level work. The published S-band anchors live here: Hank Williams, Joni Mitchell, Marvin Gaye, Caetano Veloso. Genuine S-band lyrics survive on artistic merit alone — they don’t need production to land.
A+ (86-90). Excellent professional output. A working songwriter on a good day. AI output rarely reaches this band without targeted iteration; when it does, the lyric typically has at least one transcendent line + named-anchor specificity throughout.
A (80-85). Strong professional work. Most "great" output falls here. Differs from A+ by missing one of the rubric’s S-band requirements (a transcendent line, a memorable hook return, etc.) but landing solidly elsewhere.
B+ (73-79). Good professional output. Releasable, won’t embarrass anyone. Most well-iterated AI output lands here. Lacks a "moment" but has no major weaknesses.
B (65-72). Competent but generic. Recognizable as a song; doesn’t differentiate itself. Most first-draft AI output sits in this band.
C+ (55-64). Above amateur, below professional. Has clear weaknesses + clear strengths. Often the band of partial drafts that need a verse rewrite.
C (45-54). Median amateur. The rubric’s gravity floor — 50 is the metric default, so a composite of 50 lands here. Many "fine" lyrics live in this band; nothing wrong, nothing memorable.
D+ / D / F (below 45). Real problems. F is reserved for output that fails on multiple metrics simultaneously. Most output never reaches this band; when it does, the gauntlet typically catches it before the user sees it.
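
The band list above can be read as a simple lookup by lower bound. The following is a minimal sketch, assuming a composite score between 0 and 100; the function name `band_for` is hypothetical. The article gives no internal boundaries for D+, D, and F, so the sketch returns them as one combined label instead of guessing.

```python
# Lower bound of each band, highest first, copied from the list above.
BANDS = [
    (96, "S+"),
    (91, "S"),
    (86, "A+"),
    (80, "A"),
    (73, "B+"),
    (65, "B"),
    (55, "C+"),
    (45, "C"),
]


def band_for(score):
    """Map a 0-100 composite score to its band label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    # Sub-boundaries below 45 are not given in the article.
    return "D+/D/F"


print(band_for(50))  # C  (the gravity default)
print(band_for(71))  # B
print(band_for(87))  # A+
```

Checking lower bounds from the top down means a fractional score such as 72.5 falls into the band below the next threshold (B here); how the Standard itself treats fractional scores is not stated, so that behavior is an assumption.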

Why your AI tool says 87 and ours says 71

If you’ve scored the same lyric on multiple AI tools, you’ve seen the spread: tool A says 87, tool B says 92, the SongForgeAI rubric says 71. Same lyric. Three numbers.

Three reasons:

Different scales. Most AI tools scale 70-100; SongForgeAI scales 0-100 with 50-as-default. Their 87 is roughly equivalent to our 67 in real-band terms.
Different rubric explicitness. Most tools score against an unpublished rubric; we score against the published one (CC BY 4.0). When you can’t see what they’re measuring, the score isn’t arguable. Our 71 is arguable: you can read which metric we marked you down on and disagree with that specific deduction.
Different anti-inflation discipline. The Anti-Platitude rule, the Burden of Proof rule, the Antagonist Ceiling — these are unique to the published Standard. They specifically prevent inflation that other rubrics tolerate.

This isn’t saying we’re right and they’re wrong; it’s saying our number means something specific that you can verify. Cross-checking against the published corpus shows the math directly: hand-scored canonical work sits where it sits, and your output relates to it numerically.

What to do at each band

Practical: when the rubric returns a band, here’s what to do next.

S-band (91+): Stop iterating. The rubric reads this as professional-craft-level work. Render it; release it; move on.
A-band (80-90): One targeted pass. Identify the lowest single metric, rewrite the lines that earned that low band, re-score. A-band lyrics typically have one weak metric pulling the composite down; fixing it lifts the composite 3-5 points.
B-band (65-79): Iterate or refine. The rubric’s automated gauntlet (every paid forge runs this) typically lifts B-band by 5-8 points on the rewrite pass. Manual Refine Mode with line-locking is faster than re-forging when you want to preserve specific lines.
C-band (45-64): Restart or restructure. The rubric is telling you the lyric isn’t working at the line-by-line level. Either restart with a sharper concept or rewrite the structure (verse/chorus arrangement, narrator, time setting) before iterating on individual lines.
D-band or F (below 45): Discard. The lyric has compound failures across metrics; targeted rewrites won’t lift it. Save the concept, restart the draft.
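
The same guidance can be sketched as a small decision table, which is handy if you are triaging many drafts at once. This is an illustration only: `next_step` is a hypothetical helper, and the action strings are paraphrases of the list above.

```python
# (lower bound, band group, recommended next step), highest first.
NEXT_STEPS = [
    (91, "S-band", "Stop iterating. Render it, release it, move on."),
    (80, "A-band", "One targeted pass on the lowest metric, then re-score."),
    (65, "B-band", "Iterate or refine."),
    (45, "C-band", "Restart or restructure before editing individual lines."),
]


def next_step(score):
    """Return (band group, recommended action) for a 0-100 composite score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, group, action in NEXT_STEPS:
        if score >= lower_bound:
            return group, action
    return "D-band or F", "Discard the draft. Save the concept and restart."


group, action = next_step(71)
print(group + ": " + action)  # B-band: Iterate or refine.
```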

The number isn’t the point

The rubric isn’t about hitting a high number for its own sake; it’s about which lines need work and why. A 71 with the per-metric breakdown ("M5 Specificity 58, M11 Memorability 62, M9 Transcendence 55") tells you exactly what to fix. An 87 with no breakdown tells you nothing.
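
Reading a breakdown for the weakest metric is a one-line sort. The sketch below uses the three metric values quoted above. The dictionary shape and the `weakest_first` helper are assumptions for illustration, not the format the rubric actually returns.

```python
# The three metrics quoted in the example breakdown above.
breakdown = {
    "M5 Specificity": 58,
    "M11 Memorability": 62,
    "M9 Transcendence": 55,
}


def weakest_first(metrics):
    """Return (metric, score) pairs ordered from lowest score to highest."""
    return sorted(metrics.items(), key=lambda item: item[1])


ranked = weakest_first(breakdown)
metric, score = ranked[0]
print("Rewrite first for: " + metric + " (" + str(score) + ")")
# Rewrite first for: M9 Transcendence (55)
```

The first entry is the metric to target on the next rewrite pass; the rest of the list gives the order to work through if one pass is not enough.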

Use the band as a quick read on whether the lyric is iterable from where you are. Use the per-metric breakdown to know what to iterate.