A lyric score that actually means something
Every lyric is measured across 12 metrics in three tiers: Craft, Expression, and Impact. Scores are deliberately hard. A 50 is average. An 80+ is strong. A 90+ is rare.
Craft (25%)
Can this person write? Mechanics, structure, rhyme, and word choice.
Expression (40%)
Does it say something worth hearing? Specificity, originality, truth, and voice.
Impact (35%)
Will anyone remember it tomorrow? Transcendence, arc, stickiness, and genre fit.
Sample scorecard
What an actual evaluation looks like — annotated.
Mechanics: Strong natural rhythm, one forced rhyme in V2
Structure: Clean arc, bridge earns its place
Rhyme: Good slant rhyme use, one predictable end-rhyme
Word choice: Tight overall, two filler words in chorus
Specificity: "Tangerines and someone else's smile" is earned
Originality: Original governing image, one stock metaphor
Truth: Rings true. Bridge vulnerability is genuine.
Voice: Consistent narrator, one POV slip in V3
Transcendence: Line 14 is the one. "Drove home with the windows down to forget it."
Arc: Moves from avoidance to acceptance. Could push further.
Stickiness: Chorus hook is sticky, verses less so
Genre fit: Authentic country with modern specificity
“And drove home with the windows down to forget it”
Marked by 3 of 8 panel voices. Physical action carrying unspoken grief.
The 12 Metrics
Lyrical Specificity
Concrete imagery, sensory detail, proper nouns, time anchors. The opposite of abstract generalities.
The song lives in a real place with real objects. "Tangerines and someone else's smile" instead of "memories of you."
Imagery Originality
Fresh metaphors, defamiliarized objects, governing images that haven't been written to death.
Images that surprise on first read and deepen on second. No shattered hearts, no oceans of tears, no wings of freedom.
Emotional Truth
The ring-test: does it feel true? Earned emotion, unforced vulnerability, no borrowed sentiment.
The emotion arrives through specificity and honesty, not through telling the listener what to feel.
Voice & POV Integrity
Narrator consistency, perspective clarity, and a credible speaker. Does this sound like one person talking?
A distinct human presence. Word choices, diction, and references that belong to one coherent narrator.
Why scores are hard to game
We built anti-inflation into the scoring system so that high scores actually mean something.
Gravity Rule
The default is 50, not 80. Every point above average must be earned with specific evidence from the lyrics.
Burden of Proof
Scores above 80 require the scorer to cite specific lines and explain why they justify the number.
Antagonist Ceiling
A dedicated critical voice challenges every score. If it finds a real weakness, the score drops.
Historical Context
Scores are anchored to professional craft standards. A 90+ means near-flawless execution across all 12 metrics — intentionally rare.
Methodology: how scoring works
Every song is scored by a separate AI evaluation pass — not the same model that wrote the lyrics. Multiple evaluators with different perspectives must reach consensus on each of the 12 metrics.
A dedicated critical voice challenges every score. If it identifies a real weakness — a cliché, a broken meter, a forced rhyme — the score drops. Unresolved objections cap the composite depending on severity.
This rigorous multi-voice process prevents the inflated scores that single-pass AI evaluation produces. Scores are calibrated relative to professional songwriting craft, not to other AI output.
What “deliberately hard” means: a single-pass AI scorer will give most output 80+. Our multi-voice process produces a distribution centered around 50, because the default assumption is “average until proven otherwise.” Scores above 80 require the scorer to cite specific lines. Scores above 90 require near-flawless execution across all 12 metrics — which is why they are rare in practice, not by arbitrary design.
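The rules described above (a default of 50, citations required above 80, and caps from unresolved objections) could be sketched roughly as follows. This is only an illustration: the page does not publish cap values or data shapes, so `SEVERITY_CAPS`, `MetricScore`, and `Objection` are hypothetical placeholders rather than the product's actual implementation.

```python
from dataclasses import dataclass, field

DEFAULT_SCORE = 50        # Gravity Rule: every metric starts at average.
CITATION_THRESHOLD = 80   # Burden of Proof: above this, cited lines are required.

# Hypothetical caps. The page says unresolved objections cap the composite
# "depending on severity" but gives no numbers; these are made up.
SEVERITY_CAPS = {"minor": 89, "major": 79, "critical": 69}


@dataclass
class MetricScore:
    name: str
    value: int = DEFAULT_SCORE
    cited_lines: list[str] = field(default_factory=list)


@dataclass
class Objection:
    severity: str
    resolved: bool = False


def check_burden_of_proof(score: MetricScore) -> bool:
    """A score above the threshold is valid only if it cites specific lines."""
    return score.value <= CITATION_THRESHOLD or bool(score.cited_lines)


def apply_ceiling(composite: float, objections: list[Objection]) -> float:
    """Cap the composite at the lowest ceiling among unresolved objections."""
    caps = [SEVERITY_CAPS[o.severity] for o in objections if not o.resolved]
    return min([composite, *caps])
```

With these placeholder caps, a composite of 92 with one unresolved "major" objection would come out at 79, while the same composite with that objection resolved would stay at 92.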
Grade Scale
Near-flawless across all 12 metrics. Exceptionally rare in practice.
Exceptional. Every line earns its place with cited evidence.
Outstanding. Minor imperfections only.
Strong. Craft is evident throughout.
Good. Solid work with room to grow.
Competent. Foundation is there.
Developing. Moments of promise.
Average. Functional but unremarkable.
Below average. Significant gaps.
Needs fundamental rework.
How the composite score works
Each metric scores 0-100. The composite is a weighted average across the three tiers: Craft 25%, Expression 40%, Impact 35%.
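As a rough sketch of the arithmetic, here is one way the composite could be computed from the tier weights stated on this page (Craft 25%, Expression 40%, Impact 35%). It assumes each tier score is the plain mean of its four metrics, which the page does not confirm. The metric keys and the `composite_score` function are illustrative names, not the product's API.

```python
from statistics import mean

# Tier weights as stated on this page.
TIER_WEIGHTS = {"craft": 0.25, "expression": 0.40, "impact": 0.35}

# Assumption: shorthand metric keys taken from the tier descriptions.
TIER_METRICS = {
    "craft": ["mechanics", "structure", "rhyme", "word_choice"],
    "expression": ["specificity", "originality", "truth", "voice"],
    "impact": ["transcendence", "arc", "stickiness", "genre_fit"],
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average of tier means; every metric must be in 0-100."""
    for name, value in metrics.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    total = 0.0
    for tier, weight in TIER_WEIGHTS.items():
        # Assumption: metrics are weighted equally within a tier.
        tier_mean = mean(metrics[m] for m in TIER_METRICS[tier])
        total += weight * tier_mean
    return round(total, 1)
```

For example, a lyric averaging 60 in Craft, 70 in Expression, and 80 in Impact works out to 0.25 × 60 + 0.40 × 70 + 0.35 × 80 = 71. Because Expression carries the largest weight, a gain there moves the composite more than the same gain in Craft.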
What a score should help you do
See it in action
Every song you forge or evaluate gets a full 12-metric breakdown with reasoning per metric.