Lyric Scoring Standard

The first published rubric for AI-graded lyrics

A 12-metric public standard. Versioned. Signed. Open-licensed (CC BY 4.0). SongForgeAI uses it to grade lyrics across Craft, Expression, and Impact — but the rubric belongs to anyone who wants to copy, cite, or argue with it. A 50 is average. 80+ is strong. 90+ is rare.

12 metrics · Evidence-based scoring · Anti-inflation built in

12 metrics across Craft, Expression, and Impact

Weighted composite: Expression counts most (40%)

Anti-inflation rules prevent meaningless high scores

Every score includes per-metric reasoning and evidence

Deliberately hard — a 50 is average, not a failing grade
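The page names only three reference points: 50 is average, 80+ is strong, and 90+ is rare. A minimal sketch of reading a score against those points follows. The "above average" and "below average" labels for the ranges in between are assumptions, not published bands, and `describe_score` is a hypothetical helper.

```python
def describe_score(score: float) -> str:
    """Map a 0-100 score onto the page's stated reference points.

    Only 50 (average), 80+ (strong) and 90+ (rare) come from the rubric;
    the "above/below average" labels are assumed for illustration.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "rare"
    if score >= 80:
        return "strong"
    if score > 50:
        return "above average"  # assumed label
    if score == 50:
        return "average"
    return "below average"  # assumed label


if __name__ == "__main__":
    for s in (42, 50, 67, 81, 93):
        print(s, describe_score(s))
```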

No signup

Score a lyric publicly

5 free scores per IP per day. Same 12 metrics, no account needed. Best for a quick look at how the rubric works.

Try the rubric on your lyrics

Score a draft right now.

Paste lyrics (minimum 50 characters; title and genre optional) and get a 12-metric breakdown: composite score, transcendent lines, wounds, and per-metric reasoning. It is the same rubric the documentation below describes, and the same one the forge runs internally.

Sign-in required (the free tier includes scoring).

Craft (25%)

Can this person write? Mechanics, structure, rhyme, and word choice.

Expression (40%)

Does it say something worth hearing? Specificity, originality, truth, and voice.

Impact (35%)

Will anyone remember it tomorrow? Transcendence, arc, stickiness, and genre fit.
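The three category weights sum to 100%, so the composite stays on the same 0-100 scale as its inputs. Written out, and assuming each category already carries a single 0-100 score (the page does not say how the four metrics inside a category are combined), the weighting is:

```latex
S_{\text{composite}} = 0.25\,S_{\text{Craft}} + 0.40\,S_{\text{Expression}} + 0.35\,S_{\text{Impact}},
\qquad 0.25 + 0.40 + 0.35 = 1
```

Because Expression carries the largest weight, a 10-point change in Expression moves the composite by 4 points, against 2.5 for Craft and 3.5 for Impact.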

“The scoring side was one of the areas where I saw the most dramatic improvement. At this point, I'd probably use the scoring and evaluation tools on 90% of the songs I write.”

Brett The Writer · Professional Songwriter & Producer · Nashville

Compensated endorsement · words are the creator’s own

Sample scorecard

What an actual evaluation looks like — annotated.

Composite: Grade B+ · Top 18%

Genre: Country · Top 22% in genre

Prosody · 74 · Strong natural rhythm, one forced rhyme in V2
Structure · 80 · Clean arc, bridge earns its place
Rhyme · 72 · Good slant rhyme use, one predictable end-rhyme
Economy · 76 · Tight overall, two filler words in chorus
Specificity · 85 · "Tangerines and someone else's smile" is earned
Imagery · 82 · Original governing image, one stock metaphor
Emotion · 79 · Rings true. Bridge vulnerability is genuine.
Voice · 77 · Consistent narrator, one POV slip in V3
Transcendence · 81 · Line 14 is the one: "Drove home with the windows down to forget it."
Arc · 75 · Moves from avoidance to acceptance. Could push further.
Memorable · 73 · Chorus hook is sticky, verses less so
Genre · 80 · Authentic country with modern specificity
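The sample scorecard shows a grade and percentile but no composite number. As a rough illustration of how the 25/40/35 weights could combine the twelve metric scores above, here is a minimal sketch. Two things are assumptions, not published rules: the grouping of metrics into categories (inferred from the category descriptions) and equal weighting of metrics within each category.

```python
# Hypothetical sketch: the page publishes category weights, but not how
# metrics are grouped or weighted inside a category. Both are assumed here.
CATEGORY_WEIGHTS = {"Craft": 0.25, "Expression": 0.40, "Impact": 0.35}

# Assumed grouping of the sample scorecard's 12 metrics.
SAMPLE_SCORES = {
    "Craft": {"Prosody": 74, "Structure": 80, "Rhyme": 72, "Economy": 76},
    "Expression": {"Specificity": 85, "Imagery": 82, "Emotion": 79, "Voice": 77},
    "Impact": {"Transcendence": 81, "Arc": 75, "Memorable": 73, "Genre": 80},
}


def category_means(scores: dict[str, dict[str, int]]) -> dict[str, float]:
    """Average the metrics in each category (equal weights assumed)."""
    return {cat: sum(m.values()) / len(m) for cat, m in scores.items()}


def composite(scores: dict[str, dict[str, int]]) -> float:
    """Weighted sum of category means using the published 25/40/35 split."""
    means = category_means(scores)
    return sum(CATEGORY_WEIGHTS[cat] * means[cat] for cat in CATEGORY_WEIGHTS)


if __name__ == "__main__":
    for cat, mean in category_means(SAMPLE_SCORES).items():
        print(f"{cat}: {mean:.2f}")  # Craft 75.50, Expression 80.75, Impact 77.25
    print(f"Composite: {composite(SAMPLE_SCORES):.1f}")  # 78.2
```

Under these assumptions the sample lands at about 78.2. Since the real intra-category weighting is not published, treat this as an illustration of the arithmetic, not as the figure the scorer actually reported.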

We built anti-inflation into the scoring system so that high scores actually mean something.

Gravity Rule

The default is 50, not 80. Every point above average must be earned with specific evidence from the lyrics.

Burden of Proof

Scores above 80 require the scorer to cite specific lines and explain why they justify the number.

Antagonist Ceiling

A dedicated critical voice challenges every score. If it finds a real weakness, the score drops; if an objection stays unresolved, it sets a ceiling on the composite.

Historical Context

Scores are anchored to professional craft standards. A 90+ requires near-flawless execution across all 12 metrics, which is why such scores are rare.
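The Gravity Rule and Burden of Proof can be read as two checks on each proposed metric score. The sketch below is one possible interpretation, not SongForgeAI's implementation: `MetricJudgment` and `apply_anti_inflation` are hypothetical names, and pulling an unsupported 80+ score back to exactly 80 (rather than rejecting it) is an assumed policy.

```python
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_SCORE = 50       # Gravity Rule: every metric starts from average
EVIDENCE_THRESHOLD = 80  # Burden of Proof: above this, cited lines are required


@dataclass
class MetricJudgment:
    """Hypothetical record of one metric's proposed score and its support."""

    metric: str
    proposed: Optional[int] = None  # None: the scorer made no case at all
    cited_lines: list[str] = field(default_factory=list)
    reasoning: str = ""


def apply_anti_inflation(j: MetricJudgment) -> int:
    """Apply sketched versions of the Gravity Rule and Burden of Proof."""
    if j.proposed is None:
        return DEFAULT_SCORE
    score = max(0, min(100, j.proposed))  # clamp to the 0-100 scale
    # Gravity Rule: points above 50 need stated reasoning; without it, fall back to 50.
    if score > DEFAULT_SCORE and not j.reasoning.strip():
        return DEFAULT_SCORE
    # Burden of Proof: above 80 requires cited lines; without them, cap at 80 (assumed policy).
    if score > EVIDENCE_THRESHOLD and not j.cited_lines:
        return EVIDENCE_THRESHOLD
    return score


if __name__ == "__main__":
    unsupported = MetricJudgment("Specificity", 85, reasoning="concrete detail")
    supported = MetricJudgment(
        "Specificity", 85, ["Tangerines and someone else's smile"], "concrete detail"
    )
    print(apply_anti_inflation(unsupported))  # 80
    print(apply_anti_inflation(supported))    # 85
```

Scores below 50 pass through unchanged in this sketch, since the rules as stated only constrain points above average.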

Methodology: how scoring works

Every song is scored in a separate AI evaluation pass, by a different model from the one that wrote the lyrics. Multiple evaluators with different perspectives must reach consensus on each of the 12 metrics.
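The page does not say how consensus is measured. One simple way to model it, offered only as a sketch: treat a metric as settled when all evaluators' scores fall within a tolerance of each other, take the median as the agreed score, and otherwise send the metric back for another round. The 10-point `CONSENSUS_TOLERANCE`, the median, and the `consensus_score` helper are all hypothetical choices.

```python
from statistics import median
from typing import Optional

CONSENSUS_TOLERANCE = 10  # hypothetical: max allowed spread between evaluators


def consensus_score(
    scores: list[int], tolerance: int = CONSENSUS_TOLERANCE
) -> Optional[float]:
    """Return the median if evaluators agree within `tolerance`, else None.

    None signals that the evaluators have not converged, so the metric
    would need another round of discussion (an assumed protocol).
    """
    if not scores:
        raise ValueError("need at least one evaluator score")
    if max(scores) - min(scores) > tolerance:
        return None
    return median(scores)


if __name__ == "__main__":
    # Made-up evaluator scores, for illustration only.
    panel = {"Prosody": [72, 76, 74], "Imagery": [60, 85, 80]}
    for metric, scores in panel.items():
        print(metric, consensus_score(scores))  # Prosody 74, Imagery None
```

The median is used here because it is less sensitive than the mean to one unusually generous or harsh evaluator.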

A dedicated critical voice challenges every score. If it identifies a real weakness (a cliché, a broken meter, a forced rhyme), the score drops. Objections that remain unresolved cap the composite, and how low the cap sits depends on their severity.
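A severity-dependent cap can be sketched as: each unresolved objection carries a ceiling, and the composite cannot exceed the strictest one. The severity levels and cap values below (88, 78, 65) are invented for illustration; the page does not publish them, and `apply_antagonist_ceiling` is a hypothetical name.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical severity levels and caps; the page does not publish these.
    MINOR = 88
    MAJOR = 78
    CRITICAL = 65


def apply_antagonist_ceiling(composite: float, unresolved: list[Severity]) -> float:
    """Cap the composite at the strictest ceiling among unresolved objections."""
    if not unresolved:
        return composite
    ceiling = min(sev.value for sev in unresolved)
    return min(composite, float(ceiling))


if __name__ == "__main__":
    print(apply_antagonist_ceiling(82.0, []))                               # 82.0
    print(apply_antagonist_ceiling(82.0, [Severity.MINOR]))                 # 82.0
    print(apply_antagonist_ceiling(82.0, [Severity.MINOR, Severity.MAJOR])) # 78.0
```

A cap only ever lowers a score: a composite already below the ceiling is left untouched, which matches the idea that objections limit how high a song can go rather than adding a fixed penalty.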

This multi-voice process is designed to prevent the inflated scores that single-pass AI evaluation tends to produce. Scores are calibrated against professional songwriting craft, not against other AI output.

What “deliberately hard” means: a single-pass AI scorer tends to give most output 80+. Our multi-voice process produces a distribution centered on 50, because the default assumption is “average until proven otherwise.” Scores above 80 require the scorer to cite specific lines. Scores above 90 require near-flawless execution across all 12 metrics, so they are rare in practice as a consequence of that standard, not by arbitrary design.