Motivation
The scoring rubric was calibrated against an English-language corpus. The anti-inflation rules are calibrated to English idioms. The 87-cliché banned-terms dictionary is English (B884). The Anti-Platitude rule (RFC-0002) catalogues English platitude patterns ("all I need is love", "love wins"). Every non-English lyric scored today therefore depends on the model implicitly translating it back to English in its head.
That is operationally fine for occasional non-English forges (the operator pipes through "Latin chant" today and gets something coherent). It is NOT fine if multi-language becomes a first-class product surface, because:
1. The reproducibility seal claims "this rubric scored these lyrics", but the actual scoring is "this English rubric scored its translation of these lyrics," which is a different operation.
2. The cliché dictionary has zero coverage in Italian, Spanish, French, Japanese, and Latin. A truly platitudinous Italian line scores HIGHER than its English equivalent because no rule catches it.
3. Rubric-trained band labels ("85+ = standout") were calibrated against English songs across genres. We have no evidence that the same numerical score in Italian correlates with the same listener-engagement reality.
This RFC pins the methodology BEFORE the implementation lands, so the v1 multi-language scoring is honest about its limits.
Operator context
An operator-initiated audit (B1398 era) raised the question: "Could a user ask for a Gregorian Chant in Latin, an opera in Italian?" The audit found: feasible, low marginal cost, real strategic value. The phased rollout was planned but not yet implemented because the SCORING piece needed RFC discussion first. This RFC is that discussion.
Proposal
Phase 1 — language parameter only (forge generation)
Ships INDEPENDENTLY of this RFC's resolution. Adds:
- `language` field to forge request schema (default 'en')
- Language dropdown on /forge UI
- Prompt instruction: "generate lyrics in <language>"
- Stored on songs row in a new `language` column
Phase 1 does NOT change scoring. Scores produced for non-English lyrics carry a SEAL ANNOTATION: `seal.languageScored: 'en' | 'native'` which reads 'en' for everything in v1 (the rubric ran on the English-translated form in the model's head, even though the lyrics are Italian). This is the honest disclosure.
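A minimal sketch of the Phase 1 contract, assuming a TypeScript request handler. Only the `language` field, its `'en'` default, and `seal.languageScored` come from this RFC; the type and function names (`ForgeRequest`, `normalizeLanguage`, `phase1Seal`) are hypothetical.

```typescript
// Hypothetical types; only `language`, the 'en' default, and
// `languageScored` are taken from the RFC text.
type LanguageScored = 'en' | 'native';

interface ForgeRequest {
  prompt: string;
  language?: string; // optional on the wire, defaults to 'en'
}

interface Phase1Seal {
  languageScored: LanguageScored;
}

// Apply the schema default so downstream code always sees a language.
function normalizeLanguage(req: ForgeRequest): string {
  const lang = (req.language ?? 'en').trim().toLowerCase();
  return lang === '' ? 'en' : lang;
}

// In v1 the English rubric scores everything, so the annotation is 'en'
// regardless of the language the lyrics were generated in.
function phase1Seal(_lyricsLanguage: string): Phase1Seal {
  return { languageScored: 'en' };
}
```

Under these assumptions, an Italian forge is stored with `language = 'it'` while its seal still reads `languageScored: 'en'`.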
Phase 2 — language-aware rubric loader
For each supported non-English language, ship:
- Language-specific banned-terms list (curated by a native speaker, NOT machine-translated)
- Language-specific platitude patterns extending RFC-0002
- Language-specific genre-fit conventions (e.g., Italian opera has different recitative-vs-aria expectations than English musical theatre)
Stored as `scoring-rubric-<lang>.json` alongside the canonical `scoring-rubric.json`. The eval pipeline picks the right one based on the song's `language` field.
When Phase 2 ships for a language, that language's seal flips to `languageScored: 'native'`. Old scores keep their 'en' seal — they were scored in English-mode and the seal must reflect that.
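The rubric selection described above could be sketched as follows. The file-name pattern and the `'en'` / `'native'` values are from this RFC; `resolveRubric`, `RubricChoice`, and the `shippedLanguages` set are hypothetical, and the fallback to the canonical rubric for languages without a native file is an assumption that mirrors Phase 1 behavior.

```typescript
const CANONICAL_RUBRIC = 'scoring-rubric.json';

interface RubricChoice {
  file: string;
  languageScored: 'en' | 'native';
}

// `shippedLanguages` is a hypothetical registry of languages whose
// Phase 2 rubric file exists, e.g. new Set(['it']).
function resolveRubric(
  language: string,
  shippedLanguages: ReadonlySet<string>,
): RubricChoice {
  if (language !== 'en' && shippedLanguages.has(language)) {
    return {
      file: `scoring-rubric-${language}.json`,
      languageScored: 'native',
    };
  }
  // Assumed fallback: no native rubric yet, so the English rubric
  // scores the lyric and the seal says so.
  return { file: CANONICAL_RUBRIC, languageScored: 'en' };
}
```

Because the seal value is written at scoring time, a score produced before a language's rubric shipped keeps `languageScored: 'en'`; nothing in this sketch rewrites stored seals.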
Phase 3 — calibration corpus per language
Per-language extension of RFC-0008 (open scoring corpus). For each Phase-2 language:
- At least 50 human-scored native-language lyrics in the corpus before that language's rubric is considered "calibrated"
- At least 1 native-speaker songwriter-rater confirmed in the verified contributors pool
- Quarterly cross-language drift report (RFC-0007 extension)
Until Phase 3 is met for a language, the language's rubric is flagged `calibration: 'thin'` in the response. Third parties querying `/api/v1/score` with non-English lyrics in a thin-calibration language get an explicit advisory in the seal:
`seal.calibrationAdvisory: 'thin' | 'full'`
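As an illustration, the Phase 3 gate could be computed like this. The thresholds (50 human-scored lyrics, 1 verified native-speaker rater) are from this RFC; the `CalibrationStatus` shape, and in particular tracking the quarterly drift report as a boolean, are assumptions made for the sketch.

```typescript
type CalibrationAdvisory = 'thin' | 'full';

// Hypothetical per-language status record.
interface CalibrationStatus {
  humanScoredLyrics: number;    // native-language corpus entries
  verifiedNativeRaters: number; // native-speaker songwriter-raters
  driftReportCurrent: boolean;  // assumption: latest quarterly report exists
}

const MIN_HUMAN_SCORED_LYRICS = 50;
const MIN_NATIVE_RATERS = 1;

// 'full' only when every Phase 3 requirement holds; otherwise 'thin'.
function calibrationAdvisory(status: CalibrationStatus): CalibrationAdvisory {
  const met =
    status.humanScoredLyrics >= MIN_HUMAN_SCORED_LYRICS &&
    status.verifiedNativeRaters >= MIN_NATIVE_RATERS &&
    status.driftReportCurrent;
  return met ? 'full' : 'thin';
}
```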
Languages in scope for Phases 1-2
- Latin (chant + classical)
- Italian (opera + cantautori)
- Spanish (boleros + reggaeton + flamenco)
- French (chanson + hip-hop)
- Japanese (J-pop + enka)
Picked because each has a clear native musical canon the operator's audit identified. Excludes:
- Chinese (different rhyme/meter system; future RFC)
- Arabic (right-to-left; renders + scoring need separate work)
- Hindi/Tamil/etc. (need contributor-network we don't have yet)
What stays language-agnostic
The rubric DIMENSIONS don't change. Specificity is still specificity in Italian. Voice is still voice in Latin. Arc is still arc. What changes per-language is:
- The cliché dictionary (different set of banned platitudes)
- The platitude pattern catalog (different idiomatic constructions)
- The genre-fit definitions (different canonical genres)
- The example anchors in the rubric prose (an Italian "85" example, not an English one translated)
The 12 metric definitions, the weights (Craft 25 / Expression 40 / Impact 35), and the 0-100 scale stay identical.
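To make the invariant concrete, here is a sketch of a language-independent composite. The weights are from this RFC. The RFC does not say how the 12 metrics roll up into categories, so the sketch assumes each category already has a 0-100 score and that the composite is their weighted mean; `CategoryScores` and `compositeScore` are hypothetical names.

```typescript
// Hypothetical shape: one 0-100 score per category.
interface CategoryScores {
  craft: number;
  expression: number;
  impact: number;
}

// Weights from the RFC; identical for every language.
const WEIGHTS = { craft: 25, expression: 40, impact: 35 } as const;

// Assumed aggregation: weighted mean, which stays on the 0-100 scale.
function compositeScore(scores: CategoryScores): number {
  const totalWeight = WEIGHTS.craft + WEIGHTS.expression + WEIGHTS.impact;
  const weighted =
    scores.craft * WEIGHTS.craft +
    scores.expression * WEIGHTS.expression +
    scores.impact * WEIGHTS.impact;
  return weighted / totalWeight;
}

// Example: craft 80, expression 90, impact 70
// (80*25 + 90*40 + 70*35) / 100 = 80.5
```

The function takes no language argument: a per-language rubric changes how the category scores are judged, not how they are combined.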
Reproducibility consequence
The seal gains THREE new fields:
- `seal.lyricsLanguage: 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja'`
- `seal.languageScored: 'en' | 'native'`
- `seal.calibrationAdvisory: 'thin' | 'full'`
A score from Phase 1 (English rubric on Italian lyrics): `{ lyricsLanguage: 'it', languageScored: 'en', calibrationAdvisory: 'thin' }`
A score from Phase 2 once Italian rubric ships: `{ lyricsLanguage: 'it', languageScored: 'native', calibrationAdvisory: 'thin' }`
A score after 50+ Italian corpus entries (Phase 3): `{ lyricsLanguage: 'it', languageScored: 'native', calibrationAdvisory: 'full' }`
This makes it impossible to silently cite an under-calibrated non-English score as if it had the same authority as an English score.
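The three seal states above can be expressed as one typed builder. The field names and value sets are from this RFC; `buildSeal` and its boolean inputs are hypothetical, and the English branch (`'en'` / `'full'`) is an assumption, since the RFC only spells out the non-English cases.

```typescript
type LyricsLanguage = 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja';

interface Seal {
  lyricsLanguage: LyricsLanguage;
  languageScored: 'en' | 'native';
  calibrationAdvisory: 'thin' | 'full';
}

function buildSeal(
  lyricsLanguage: LyricsLanguage,
  nativeRubricShipped: boolean, // Phase 2 done for this language
  phase3Met: boolean,           // Phase 3 calibration criteria met
): Seal {
  if (lyricsLanguage === 'en') {
    // Assumption: English is the calibrated baseline.
    return { lyricsLanguage, languageScored: 'en', calibrationAdvisory: 'full' };
  }
  const languageScored = nativeRubricShipped ? 'native' : 'en';
  // 'full' requires a native rubric AND a met calibration bar, so an
  // English-rubric score on non-English lyrics can never claim 'full'.
  const calibrationAdvisory = nativeRubricShipped && phase3Met ? 'full' : 'thin';
  return { lyricsLanguage, languageScored, calibrationAdvisory };
}
```

Calling `buildSeal('it', false, false)`, `buildSeal('it', true, false)`, and `buildSeal('it', true, true)` reproduces the Phase 1, Phase 2, and Phase 3 records listed above, in that order.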
Acceptance criteria
This RFC is accepted when ALL of the following hold:
1. The 7-day public comment window closes
2. The Phase 1 implementation has shipped (language param + UI dropdown + 'en' seal annotation)
3. At least one native speaker per Phase-2 language has committed to draft the language-specific rubric extensions
4. The seal annotation contract is locked in code (so future non-English scores carry the honest disclosure even before Phase 2 ships)
If criterion 3 doesn't land within 90 days for any language, that language is dropped from the Phases 2-3 list and a follow-on RFC pins what to do (open contributor call, deprioritize, etc.).
Why ship this RFC NOW (before implementation)
Three reasons:
1. **Reproducibility seal honesty.** If Phase 1 ships without the seal annotation contract, every non-English score after Phase 1 is silently mis-attributed (the seal claims it was scored under v1.x of the rubric, but the rubric was calibrated against English). Pinning the seal contract NOW means Phase 1's first non-English forge produces an honest record from day one.
2. **Calibration discipline.** Without a published policy on what "calibrated for Italian" means, the first language extension will silently set a precedent. This RFC pins the precedent BEFORE someone has to defend it after-the-fact.
3. **Contributor recruitment.** Knowing what we need (50 human-scored Italian lyrics + 1 verified contributor) gives us something concrete to ask for. Without the spec we can't open the recruitment channel.
Out of scope
- Real-time translation (user types English, model outputs Italian) — different feature, future RFC
- Multi-language Crucible (the 8-voice critique panel speaks English; would need 8 native-language voice prompts per language)
- Multi-language /admin surfaces (operator surfaces stay in English)
- Per-language pricing / rate limits
Comment window
This RFC is open for comment until 2026-05-03. Email support@songforgeai.com with the subject `RFC-0009` to leave a comment.
Resolution
(Pending — will be filled in after 2026-05-03 with a summary of comments received and the accepted text. Phase 1 implementation is in-flight under Punch List #42; the seal annotation contract from this RFC will be implemented even if Phase 1 lands first, via a backfill build.)