Behind the Scenes | 2026-02-18 | 5 min read | By the SongForgeAI team

I Scored My Own Lyrics and Got a 54. Here Is What I Learned.

A songwriter puts their best work through the 12-metric scoring system. The results are humbling, specific, and immediately useful.

We built the scoring system. We know the rubric. We know what every metric measures, what the anti-inflation rules do, how the multi-voice evaluation works. So when we put one of our own lyrics through it and got a 54, we paid attention.

What we submitted

A song we genuinely liked. Solid chorus. Clean rhyme scheme. A bridge that felt earned. The kind of lyric that, in a normal writing session, would have felt finished. We submitted it to Score My Lyrics expecting a comfortable 75.

What the scorer said

Composite: 54. Grade: C+.

The Craft tier scored fine — Prosody 68, Structure 65, Rhyme 70, Economy 62. Mechanically competent. The writing knew what it was doing.

Expression destroyed us. Specificity: 36. The scorer cited three lines that "could appear in any song about this topic without modification." Imagery Originality: 45 — "relies on stock metaphors that have appeared in thousands of songs." Emotional Truth: 51 — "the narrator describes feeling something but never demonstrates it through action or observed detail."

Impact was mixed. Memorability: 58 — "the chorus is singable but not quotable." Transcendence: 38 — "no single line that would stop a listener."

What we learned

Competence is not quality. The song did everything right mechanically. Structure, rhyme, meter — all functional. But functional is average, and average is 50. The scoring system does not reward doing things correctly. It rewards doing things memorably.

We were writing to sound like a song instead of writing to say something specific. Every line could have been in a different song. That is not a craft problem — it is a specificity problem. The images were not ours. They were the images any songwriter would reach for given the same topic.

The per-metric breakdown was more useful than the composite. The composite said "barely above average." The metrics said exactly why and exactly where. Specificity 36 is an instruction: replace the generic images with ones that belong to this narrator in this moment. That is actionable. A composite score alone is not.
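To make the point concrete, here is a minimal Python sketch of reading the breakdown instead of the composite. It is not SongForgeAI's scoring code: the function names are hypothetical, and it uses only the nine scores quoted in this post (the other three of the 12 metrics are not listed here, and how the composite is weighted is not covered). All it does is sort the reported metrics and average them per tier.

```python
# Scores exactly as quoted in this post. The remaining three of the
# 12 metrics are not reported here, so they are left out.
reported = {
    "Craft": {"Prosody": 68, "Structure": 65, "Rhyme": 70, "Economy": 62},
    "Expression": {
        "Specificity": 36,
        "Imagery Originality": 45,
        "Emotional Truth": 51,
    },
    "Impact": {"Memorability": 58, "Transcendence": 38},
}


def weakest_metrics(scores, n=3):
    """Return the n lowest (tier, metric, score) triples, lowest first.

    Hypothetical helper for illustration only.
    """
    flat = [
        (tier, name, value)
        for tier, metrics in scores.items()
        for name, value in metrics.items()
    ]
    return sorted(flat, key=lambda item: item[2])[:n]


def tier_averages(scores):
    """Plain unweighted mean per tier (an assumption, not the real rubric)."""
    return {tier: sum(m.values()) / len(m) for tier, m in scores.items()}


if __name__ == "__main__":
    for tier, name, value in weakest_metrics(reported):
        print(f"{name} ({tier}): {value}")
    print(tier_averages(reported))
```

Sorted this way, the list starts with Specificity, then Transcendence, then Imagery Originality, which is the same order of priorities the written feedback gave us. The unweighted tier means also show the pattern at a glance: Craft sits well above Expression and Impact.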

What we did about it

We put the lyric through Refine Mode with preservation at 40% — enough to rebuild the weak sections while keeping the core. The chorus stayed. The bridge stayed. Two verse lines got replaced with physical details from the actual moment that inspired the song. The revised version scored 78.

The 24-point jump was not magic. It was specificity. The same emotion, the same narrator, the same story — but with details that belonged to one person in one place instead of anyone anywhere.

Why we are telling you this

If the team that built the scorer cannot casually hit 75, you probably will not either. And that is the point. A score that is easy to earn is a score that means nothing. The value of SongForgeAI's scoring is that when you do hit 80, you know it is real. When you hit 85, you know you wrote something genuinely strong. And when the Specificity metric says 36, you know exactly what to fix.

Try scoring your own lyrics. The number might be humbling. The per-metric breakdown will be useful.
