Public craft-metrics report
The Real Data Behind SongForgeAI
Every song forged by the pipeline logs its composite score, prosody warnings, detail balance, emotional arc shape, and fire-line peaks into forge_metrics. This page is the aggregate view: it is updated hourly, any cell with fewer than 30 samples is suppressed, and it contains no user-identifying data.
Report generated Sat, 06 Jun 2026 07:02:15 GMT. Window: 4/28/2026 to 6/5/2026.
Corpus summary
| Metric | Value |
|---|---|
| Songs forged | 1,000 |
| Avg composite | 81.5 |
| Median composite | 84.0 |
| Arc variety | 95% non-flat |
Composite score distribution
| Composite score | Share of songs |
|---|---|
| 0-49 | 0.1% |
| 50-59 | 0.1% |
| 60-69 | 0.3% |
| 70-79 | 35.1% |
| 80-89 | 63.6% |
| 90+ | 0.8% |
Emotional arc shape distribution
| Arc shape | Share of songs |
|---|---|
| flat | 5.1% |
| single-axis | 59.5% |
| shaped | 34.7% |
| volatile | 0.6% |
By genre
Only genres with at least 30 songs are shown. Genres below that floor are suppressed so that small cells cannot be traced back to individual songs.
| Genre | Songs | Avg composite | Prosody/song | Fire-line peak |
|---|---|---|---|---|
| contemporary | 150 | 81.8 | 6.31 | 64.1 |
| folk | 139 | 83.2 | 5.75 | 63.8 |
| alternative | 127 | 79.8 | 6.64 | 62.5 |
| 1990s | 53 | 82.1 | 7.36 | 61.6 |
| indie | 45 | 83.2 | 6.89 | 61.9 |
| christian | 42 | 79.1 | 7.26 | 63.5 |
| arena | 35 | 78.5 | 7.80 | 61.9 |
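The 30-song floor described above can be sketched as a group-by followed by a filter. This is a minimal illustration under stated assumptions, not the report's actual aggregator: the row keys (`genre`, `composite`), the function name `aggregate_by_genre`, and the constant `MIN_CELL` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from statistics import mean

MIN_CELL = 30  # suppression floor stated in the report


def aggregate_by_genre(rows, min_cell=MIN_CELL):
    """Group rows by genre and drop any group smaller than min_cell.

    `rows` is an iterable of dicts with the hypothetical keys
    'genre' and 'composite'.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["genre"]].append(row["composite"])

    report = []
    for genre, scores in groups.items():
        if len(scores) < min_cell:
            continue  # suppressed: too few songs to publish
        report.append({
            "genre": genre,
            "songs": len(scores),
            "avg_composite": round(mean(scores), 1),
        })

    # Largest slices first, matching the ordering of the table above.
    report.sort(key=lambda r: r["songs"], reverse=True)
    return report
```

Filtering after grouping, rather than before, keeps the suppression decision tied to the published cell size: a genre with 29 songs never reaches the output, whatever its average.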
By quarter
| Period | Songs | Avg composite | Prosody/song | % gauntlet improved | % enriched |
|---|---|---|---|---|---|
| 2026-Q2 | 1,000 | 81.5 | 6.68 | — | 38.9% |
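The "% gauntlet improved" column depends on which rows go in the denominator: the glossary states that only gauntlet rows count, not initial forges. A minimal sketch of that calculation follows. The row keys (`mode`, `composite_before`, `composite_after`) and the function name are assumptions for illustration; the real forge_metrics schema is not shown in this report.

```python
def pct_gauntlet_improved(rows):
    """Share of gauntlet rows whose composite rose after the rewrite.

    Hypothetical row keys: 'mode', 'composite_before', 'composite_after'.
    Returns None when there are no gauntlet rows, so a caller can show a
    placeholder instead of a misleading 0%.
    """
    # Denominator: gauntlet attempts only, never initial forges.
    gauntlet = [r for r in rows if r["mode"] == "gauntlet"]
    if not gauntlet:
        return None
    improved = sum(
        1 for r in gauntlet
        if r["composite_after"] > r["composite_before"]
    )
    return 100.0 * improved / len(gauntlet)
```

With one initial forge and two gauntlet rows, one of which improved, this returns 50.0. Dividing by all three rows instead would give roughly 33%, which is the direction of the bias the glossary attributes to the earlier aggregator.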
Glossary
- composite score
- 0-100 output of the 12-metric rubric at /scoring/standard. Eight-voice panel with anti-inflation rules (gravity, burden of proof, antagonist ceiling). 50 = the gravity default; 80+ is genuinely strong; 90+ is rare territory.
- prosody warnings
- Count of Pattison-rule violations per song from the deterministic singability scanner. Fewer is better. Catches stress mismatches, weak hook-word placement, vowel-on-low-note-with-cluster, etc.
- arc shape
- Classification of the song’s emotional contour, computed per-section from valence + arousal:
  - flat: valence and arousal both sit in a narrow band the whole song. Reads as monotone; usually a wound, not a stylistic choice.
  - single-axis: one of valence or arousal moves meaningfully but the other is flat. The song goes dark or goes loud, but not both.
  - shaped: both valence and arousal change section to section in a coherent direction (rising, falling, or rising-then-falling). The intended healthy shape.
  - volatile: both axes move but the direction is incoherent: a section jumps brighter then darker then brighter again with no accumulating arc. Reads as confused emotional intent.
- fire-line peak
- Highest line-level Ritter memetic score across the lyric. Tries to surface the most quotable / shareable single line.
- % gauntlet improved
- Of the songs the gauntlet ran on (mode='gauntlet' rows), the fraction whose composite score went up after the gauntlet rewrite. Build 1183 fix: the prior aggregator counted initial forges in the denominator, which biased this figure toward 0%. The current number answers “of gauntlet attempts, what fraction lifted the score?” Reverted gauntlets aren’t logged to this table at all (they take a separate code path), so they don’t pull the figure down.
- % enriched
- Fraction of forges in which the SuperPrompt enrichment step ran. Caveat: as of Build 1183 only the V2 forge UI sets this flag; the dominant V1 path doesn’t log it even when SuperPrompt fires. So a low number here (e.g. 2%) reflects V2 routing share, not how often enrichment actually ran. This is a known tracking under-coverage; it will be fixed when V1 logging is wired up or V1 is retired.
- Berklee batch lift
- Composite + prosody comparison before vs after Build 935, when Pattison prosody rules and Stolpe destination-writing priors were pushed directly into the forge + gauntlet system prompts. A positive composite delta + a negative prosody delta is the target.
- arc variety
- The percentage of songs whose arc shape is NOT ‘flat’ (i.e. everything that’s single-axis, shaped, or volatile combined). High = the pipeline is producing songs with motion; low = monotone output.
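The arc shape and arc variety definitions above can be sketched as a small classifier. This is an illustration under loud assumptions, not the pipeline's implementation: the `BAND` threshold, the 0-1 scale, the function names, and the rule that both axes must be individually coherent for a song to count as "shaped" are all guesses made for the example. Each input is a per-section list with at least one value.

```python
from collections import Counter
from itertools import groupby

BAND = 0.2  # assumed "narrow band" width on a 0-1 scale; not from the report


def _moves(series, band=BAND):
    """True if the series leaves the narrow band at some point."""
    return max(series) - min(series) > band


def _coherent(series):
    """True for rising, falling, or rising-then-falling contours."""
    # Direction of each section-to-section change, ignoring ties.
    signs = [1 if b > a else -1 for a, b in zip(series, series[1:]) if b != a]
    # Collapse consecutive equal directions into runs.
    runs = [key for key, _ in groupby(signs)]
    return runs in ([1], [-1], [1, -1])


def classify_arc(valence, arousal):
    v_moves, a_moves = _moves(valence), _moves(arousal)
    if not v_moves and not a_moves:
        return "flat"
    if v_moves != a_moves:
        return "single-axis"
    if _coherent(valence) and _coherent(arousal):
        return "shaped"
    return "volatile"


def arc_variety(shapes):
    """Percentage of songs whose arc shape is not 'flat'."""
    counts = Counter(shapes)
    return 100.0 * (len(shapes) - counts["flat"]) / len(shapes)
```

One simplification worth noting: `_coherent` treats every non-zero change as a direction, so small wobbles inside an otherwise rising contour would register as reversals. A production classifier would likely apply a minimum step size before counting a change.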
Methodology + disclosure
- All data sourced from the forge_metrics table, populated fire-and-forget after every completed forge.
- Any slice below 30 samples is suppressed. No user, song, or lyric identifiers are included in this report.
- This report is updated hourly via ISR. Refresh to pull the latest snapshot. The open rubric is at /scoring/standard.