RFC-0006 · in-comment · opened 2026-04-26 · comment until 2026-05-03

Per-cohort engagement methodology + divergence-as-calibration

Pins the methodology behind the five-persona listener panel engagement scores (B1281+) and defines when per-cohort divergence from the composite triggers a rubric calibration response. Empirical input cites the B1340 /admin/cohort-divergence dashboard.

Motivation

The five-persona listener panel produces an engagement score (0-100) per cohort per song. Five personas: Spotify Skimmer, Lyrics Reader, Songwriter Peer, Emotional Listener, Genre Purist. The B1340 /admin/cohort-divergence dashboard surfaces the corpus-wide gap between the composite score and the mean per-cohort engagement.

Today the engagement values are produced by the eval pipeline at the same time the rubric metrics are scored, but the methodology — what each persona is reading the lyric AS, what "engagement" means quantitatively, and how the divergence should drive rubric updates — has never been pinned in writing.

This RFC pins it. Without governance, the persona prompts can silently drift between builds and break every prior measurement.

Proposal

Persona definitions (locked)

Each persona is a weighted reader prior. The eval prompt asks the model to score engagement AS that persona — not to predict how that persona-archetype would react in aggregate. Locked shorthand definitions:

  • Spotify Skimmer — listens half-attentive at a noisy gym; skips at the first weak section. Engagement ≈ "did I keep listening past the chorus?"
  • Lyrics Reader — reads the lyrics on Genius before/while listening; values literary craft over hook. Engagement ≈ "would I quote this line in a tweet?"
  • Songwriter Peer — a fellow craftsperson; reads for technique and originality. Engagement ≈ "would I be jealous of this line?"
  • Emotional Listener — listens during a real personal moment (drive home, breakup, late night). Engagement ≈ "did this move me?"
  • Genre Purist — knows the genre conventions cold. Engagement ≈ "does this earn its place in the canon, or does it sound like a knockoff?"

Engagement scale (locked at 0-100)

  • 0-39 — skips / disengages
  • 40-69 — finishes the song, doesn't replay
  • 70-89 — replays, may share with one person
  • 90-100 — actively recommends to others

These bands match the rubric's letter-grade buckets, so cohort engagement reads naturally alongside the composite score on SongDetail cards.
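A minimal sketch of the band lookup above; `engagement_band` is an illustrative helper, not part of the eval pipeline:

```python
def engagement_band(score: int) -> str:
    """Map a 0-100 engagement score to its locked band label."""
    if not 0 <= score <= 100:
        raise ValueError("engagement is locked at 0-100")
    if score <= 39:
        return "skip / disengage"
    if score <= 69:
        return "finishes the song, doesn't replay"
    if score <= 89:
        return "replays, may share with one person"
    return "actively recommends to others"
```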

Divergence-as-calibration trigger

Define the per-song divergence:

    divergence_song = composite − mean(cohort_engagement)

and the rolling 30-day median drift:

    drift = median(divergence_song over the rolling 30-day window)
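The two definitions can be sketched directly; the function names here are illustrative, not pipeline identifiers:

```python
from statistics import median


def divergence(composite: float, cohort_engagement: dict[str, float]) -> float:
    """Per-song divergence: composite minus the mean per-cohort engagement."""
    return composite - sum(cohort_engagement.values()) / len(cohort_engagement)


def drift(window_divergences: list[float]) -> float:
    """Drift: median of per-song divergences in the rolling 30-day window."""
    return median(window_divergences)
```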

Calibration response per drift band:

  • |drift| ≤ 10pts — no action; the rubric is calibrated to listener expectation.
  • 10pts < |drift| ≤ 20pts — record as drift at the next Quality Council. No rubric change yet.
  • |drift| > 20pts — triggers an obligation: the next quarterly bump MUST address the gap, either by rebalancing metric weights toward what cohorts care about, or by adjusting persona prompts (a major operation; treat it as a MAJOR rubric change).
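The band-to-response mapping can be sketched as below; the return strings are shorthand for the actions above, not actual pipeline output:

```python
def calibration_response(drift: float) -> str:
    """Map rolling-30-day drift to the RFC's calibration response band."""
    magnitude = abs(drift)  # bands are symmetric around zero
    if magnitude <= 10:
        return "no action"
    if magnitude <= 20:
        return "record as drift at next Quality Council"
    return "quarterly bump must address the gap"
```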

Per-persona pessimist tracking

The persona with the consistently lowest mean engagement is the "pessimist." If the same persona holds the pessimist title for 3 consecutive 30-day windows, that persona's prompt gets audit-reviewed in the next Quality Council. The outcome is one of three findings: the prompt is too harsh (recalibrate), the persona is reading something real the rubric isn't seeing (add to the anti-inflation rules), or the persona genuinely matches an underserved listener segment (no change; document the finding).
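The pessimist rule can be sketched as follows, assuming each 30-day window is summarized as a persona-to-mean-engagement mapping; function names are illustrative:

```python
def pessimist(window_means: dict[str, float]) -> str:
    """Persona with the lowest mean engagement in one 30-day window."""
    return min(window_means, key=window_means.get)


def needs_audit(windows: list[dict[str, float]], streak: int = 3) -> bool:
    """True if one persona holds the pessimist title for `streak` consecutive windows."""
    if len(windows) < streak:
        return False
    recent = [pessimist(w) for w in windows[-streak:]]
    return len(set(recent)) == 1
```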

Reproducibility consequence

Every score response's seal gains a new field, `pipeline.cohortEngagementVersion: <int>` (currently implicit at version 1). MAJOR persona-prompt changes bump it; MINOR copy tweaks don't. This lets longitudinal analysis distinguish v1 cohort scores from v2.
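Assuming the seal is serialized as JSON, the new field might sit in the response as sketched below; the surrounding structure is illustrative, and only `pipeline.cohortEngagementVersion` is specified by this RFC:

```json
{
  "seal": {
    "pipeline": {
      "cohortEngagementVersion": 1
    }
  }
}
```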

Empirical baseline (today's snapshot)

The B1340 dashboard captures today's per-persona means + drift. That snapshot becomes the v1 baseline. Future MAJOR changes must cite the trajectory from this baseline.

Out of scope

  • Adding new personas. Future RFC; very high bar.
  • Per-genre persona behavior (a country Songwriter Peer reads differently than a hip-hop one). Future RFC; needs a separate dashboard.
  • Cross-persona disagreement metrics (Lyrics Reader 90 vs Spotify Skimmer 30 on the same song). Future RFC.

Comment window

This RFC is open for comment until 2026-05-03. Email support@songforgeai.com with the subject `RFC-0006` to leave a comment.

Resolution

(Pending — will be filled in after 2026-05-03 with a summary of comments + the accepted text. The thresholds above are the working defaults until then.)