Metric 8 of 12

Voice & POV Integrity

Expression tier · Weight: 40% (tier) · Short name: Voice

Narrator consistency, perspective clarity, and a credible speaker. Does this sound like one person talking?

What good looks like

A distinct human presence. Word choices, diction, and references that belong to one coherent narrator.

How SongForgeAI scores it

narrator-profile.ts (Build 951) extracts a gender / age / relational tuple from the lyrics and flags cross-section contradictions ("my wife" + "my husband", "I'm 16" + "I'm 40").
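The contradiction check can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the narrator-profile.ts source: the `Section` and `Signal` types, the two regex patterns, and `extractSignals` / `flagContradictions` are all hypothetical, and a real extractor would need to cover far more phrasings than "my wife" / "my husband" and "I'm N".

```typescript
// Hypothetical sketch of cross-section contradiction flagging.
// Not the actual narrator-profile.ts implementation.
type Section = { label: string; text: string };
type Signal = { kind: "spouse" | "age"; value: string; section: string };

// Pull a few narrator signals out of one section's text.
function extractSignals(section: Section): Signal[] {
  // Normalize curly apostrophes so "I’m" and "I'm" match the same pattern.
  const text = section.text.toLowerCase().replace(/\u2019/g, "'");
  // Regex literals are re-created on every call, so lastIndex starts at 0.
  const patterns: Array<[Signal["kind"], RegExp]> = [
    ["spouse", /\bmy (wife|husband)\b/g],
    ["age", /\bi'm (\d{1,3})\b/g],
  ];
  const signals: Signal[] = [];
  for (const [kind, re] of patterns) {
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      signals.push({ kind, value: m[1], section: section.label });
    }
  }
  return signals;
}

// Compare each signal with the first value seen for its kind;
// a different value appearing in a different section is flagged.
function flagContradictions(sections: Section[]): string[] {
  const firstSeen: Record<string, Signal> = {};
  const flags: string[] = [];
  for (const section of sections) {
    for (const sig of extractSignals(section)) {
      const prior = firstSeen[sig.kind];
      if (prior === undefined) {
        firstSeen[sig.kind] = sig;
      } else if (prior.value !== sig.value && prior.section !== sig.section) {
        flags.push(`${sig.kind}: "${prior.value}" (${prior.section}) vs "${sig.value}" (${sig.section})`);
      }
    }
  }
  return flags;
}

// Usage: the "I'm 16" / "I'm 40" case from the description above.
const song: Section[] = [
  { label: "V1", text: "I'm 16 and I drive too fast" },
  { label: "Bridge", text: "Now I'm 40 and the road is long" },
];
console.log(flagContradictions(song)); // one age flag: V1 vs Bridge
```

Keeping only the first value per kind keeps the sketch short; a fuller version might report every distinct value instead.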

Sub-criteria

Inside Voice

Named sub-concepts the eval engine considers when computing this metric’s score. Each one is a discrete signal — ignoring it pulls the metric out of band even when the other dimensions look intact.

POV Consistency

A narrator stays internally coherent across sections. The speaker who said "I take the long way home" in V1 cannot be the speaker who says "we built this city together" in V3 unless the song earned the shift. POV consistency tracks the gender / age / relational tuple and the diction register; intentional POV switches (gospel call-and-response, K-pop multi-voice, hip-hop features) count as consistent when each voice is internally stable and the switches mark structural moments. Rubric v1.2.0 formalized "intentional POV" as the metric's lens; this sub-criterion names the underlying signal.
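One way to read "each voice is internally stable" is to group per-section profiles by speaker and check each group only against itself, so that a switch between voices is never a violation in its own right. The sketch assumes the profiles have already been extracted and labelled with a voice; the `NarratorProfile` shape, the voice labels, and `checkVoiceStability` are hypothetical, not the engine's actual schema.

```typescript
// Hypothetical per-section profile; field names are illustrative only.
interface NarratorProfile {
  voice: string;       // speaker label, e.g. "Lead" or "Choir"
  section: string;     // e.g. "V1", "Response", "Bridge"
  gender?: string;
  age?: number;
  relational?: string;
  register?: string;
}

// Returns one message per field whose value changes within a single voice.
function checkVoiceStability(profiles: NarratorProfile[]): string[] {
  const fields: Array<"gender" | "age" | "relational" | "register"> = ["gender", "age", "relational", "register"];
  // Group sections by speaker so each voice is judged only against itself.
  const byVoice: Record<string, NarratorProfile[]> = {};
  for (const p of profiles) {
    (byVoice[p.voice] = byVoice[p.voice] || []).push(p);
  }
  const issues: string[] = [];
  for (const voice of Object.keys(byVoice)) {
    for (const field of fields) {
      // Skip sections that carry no signal for this field.
      const signalled = byVoice[voice].filter((p) => p[field] !== undefined);
      if (signalled.length === 0) continue;
      const baseline = signalled[0]; // first section that signals the field
      for (const p of signalled.slice(1)) {
        if (p[field] !== baseline[field]) {
          issues.push(`${voice}.${field}: ${baseline[field]} (${baseline.section}) vs ${p[field]} (${p.section})`);
        }
      }
    }
  }
  return issues;
}

// A call-and-response song: Lead and Choir differ, but each is stable.
const gospel: NarratorProfile[] = [
  { voice: "Lead", section: "V1", age: 24, register: "plain" },
  { voice: "Choir", section: "Response", register: "liturgical" },
  { voice: "Lead", section: "V2", age: 24, register: "plain" },
];
console.log(checkVoiceStability(gospel)); // []
```

Deciding whether a switch lands on a structural moment is a separate check; this sketch only covers per-voice stability.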

Signals: narrator-profile.ts (B951) extracts gender / age / relational signals per section; cross-section contradictions are flagged ("my wife" + "my husband", "I'm 16" + "I'm 40"). Diction-shift detection catches register drift (a mechanic doesn't suddenly quote Rilke). Intentional POV switches are recognized via section-marker patterns plus collaborative-form signals.
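Register drift can be approximated by tagging each section with the vocabulary register it leans on and flagging departures. This sketch assumes two tiny hand-written lexicons (`workshop` and `literary`, both invented for illustration) and flags any section whose register differs from the first non-neutral one. It is not a description of the engine's actual diction-shift detector, which the page does not specify.

```typescript
// Hypothetical register lexicons, for illustration only; a real detector
// would need far broader vocabularies or a learned model.
const REGISTERS: Record<string, string[]> = {
  workshop: ["wrench", "torque", "gasket", "carburetor", "piston", "overtime"],
  literary: ["ephemeral", "sublime", "virtue", "logos", "transcendent", "elegy"],
};

type Section = { label: string; text: string };

// Label a section by whichever lexicon it hits most; "neutral" if none or tied.
function registerOf(text: string): string {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let best = "neutral";
  let bestHits = 0;
  let tied = false;
  for (const name of Object.keys(REGISTERS)) {
    const hits = words.filter((w) => REGISTERS[name].indexOf(w) !== -1).length;
    if (hits > bestHits) {
      best = name;
      bestHits = hits;
      tied = false;
    } else if (hits === bestHits && hits > 0) {
      tied = true;
    }
  }
  return tied ? "neutral" : best;
}

// Flag every section whose register differs from the first non-neutral one.
function detectDictionDrift(sections: Section[]): string[] {
  let baseline: string | undefined;
  const flags: string[] = [];
  for (const s of sections) {
    const reg = registerOf(s.text);
    if (reg === "neutral") continue;
    if (baseline === undefined) {
      baseline = reg;
      continue;
    }
    if (reg !== baseline) flags.push(`${s.label}: ${baseline} -> ${reg}`);
  }
  return flags;
}

// Usage: two workshop verses, then a chorus that turns to Stoic vocabulary.
const mechanicSong: Section[] = [
  { label: "V1", text: "Torque on the wrench, a gasket gone bad" },
  { label: "V2", text: "Overtime again, the carburetor coughs" },
  { label: "Chorus", text: "Virtue is the only good, the logos, the sublime" },
];
console.log(detectDictionDrift(mechanicSong)); // flags the Chorus
```

Counting lexicon hits cannot tell an earned shift from an unearned one; it only surfaces candidates for the rubric's judgment.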

Failure looks like: a speaker who's 24 in V1 ("I'm too young to know what I want") and 50 in the bridge ("I've lived enough lives to know"); or a narrator who speaks in workplace argot for two verses and suddenly invokes Stoic philosophy in the chorus without earning the shift; or a song that switches from first-person singular to first-person plural in the bridge for no structural reason.
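The singular-to-plural case is the simplest of these failures to sketch: count first-person singular and plural pronouns per section and flag a change in the dominant one. The pronoun lists, the `structuralLabels` parameter (sections where a shift is marked as intentional), and `flagPersonShifts` are assumptions for illustration; judging whether a shift is earned takes more than counting.

```typescript
type Section = { label: string; text: string };

// Illustrative pronoun lists (hypothetical, not exhaustive).
const SINGULAR = ["i", "me", "my", "mine", "myself", "i'm", "i've", "i'll", "i'd"];
const PLURAL = ["we", "us", "our", "ours", "ourselves", "we're", "we've", "we'll", "we'd"];

// Which first-person number dominates a section; "none" on a tie or no hits.
function dominantPerson(text: string): "singular" | "plural" | "none" {
  const words = text.toLowerCase().replace(/\u2019/g, "'").match(/[a-z']+/g) || [];
  const sing = words.filter((w) => SINGULAR.indexOf(w) !== -1).length;
  const plur = words.filter((w) => PLURAL.indexOf(w) !== -1).length;
  if (sing === plur) return "none";
  return sing > plur ? "singular" : "plural";
}

// Flag sections whose dominant person differs from the first one found,
// unless the section is listed as a structural (intentional) switch point.
function flagPersonShifts(sections: Section[], structuralLabels: string[] = []): string[] {
  let baseline: "singular" | "plural" | undefined;
  const flags: string[] = [];
  for (const s of sections) {
    const person = dominantPerson(s.text);
    if (person === "none") continue;
    if (baseline === undefined) {
      baseline = person;
      continue;
    }
    if (person !== baseline && structuralLabels.indexOf(s.label) === -1) {
      flags.push(`${s.label}: first-person ${baseline} -> ${person}`);
    }
  }
  return flags;
}

const lyric: Section[] = [
  { label: "V1", text: "I take the long way home" },
  { label: "V3", text: "I'm fine. I'm fine." },
  { label: "Bridge", text: "We built this city together, we never looked back" },
];
console.log(flagPersonShifts(lyric));             // flags the Bridge
console.log(flagPersonShifts(lyric, ["Bridge"])); // [] once marked structural
```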

Example score

81/100 on this metric

Narrator is consistent across V1 ("I take the long way home") and V3 ("I'm fine. I'm fine.")

Why: Same speaker, same emotional register, same diction throughout. Would score higher if the bridge introduced a credible POV expansion (e.g., addressing a specific person by name) without breaking the established voice.

One representative example. Real scores carry a reproducibility seal — verify at /scoring/standard.

Common failure mode

POV drift — the speaker sounds like three different people across three verses.

What it looks like in the wild

  • Verse 1 and verse 3 clearly spoken by the same person in the same emotional weather.
  • Diction that matches the character — a mechanic doesn't quote Rilke.
  • The narrator credibly knows what they claim to know.

Why it’s in the Expression tier

Does it say something worth hearing? Specificity, originality, truth, and voice.

The Expression tier contributes 40% to the final composite score. The tier weight is distributed across its member metrics — no single metric dominates the composite.
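As a worked example of the tier arithmetic, assume the four Expression concerns named above (specificity, originality, truth, voice) are four equally weighted metrics. The page does not state the split, so the equal shares here are an assumption, and `metricContribution` is a hypothetical helper. Under that assumption each metric carries 10% of the composite, so Voice at 81/100 adds about 8.1 points, and no single Expression metric can add more than 10.

```typescript
// Assumption: the 40% tier weight is split equally across member metrics.
const EXPRESSION_TIER_WEIGHT = 0.4;
const EXPRESSION_METRICS = ["Specificity", "Originality", "Truth", "Voice"];

// Composite points one metric contributes, given its 0-100 score.
function metricContribution(score: number, tierWeight: number, memberCount: number): number {
  return score * (tierWeight / memberCount);
}

// Upper bound for any single metric: a perfect 100.
const maxPerMetric = metricContribution(100, EXPRESSION_TIER_WEIGHT, EXPRESSION_METRICS.length);
// The example Voice score from this page.
const voicePoints = metricContribution(81, EXPRESSION_TIER_WEIGHT, EXPRESSION_METRICS.length);
console.log(maxPerMetric.toFixed(1), voicePoints.toFixed(1)); // 10.0 8.1
```

With unequal shares the same function applies with a per-metric weight in place of `tierWeight / memberCount`.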


See this metric scored against real songs

Every song forged through SongForgeAI is scored on Voice. Browse the leaderboard or forge your own to see how lines land on this axis.