Motivation

The scoring rubric was calibrated against an English-language corpus. Its anti-inflation rules are calibrated to English idioms. The 87-cliché banned-terms dictionary is English (B884). The Anti-Platitude rule (RFC-0002) catalogues English platitude patterns ("all I need is love", "love wins"). Every non-English lyric scored today therefore depends on the model implicitly translating the lyric back to English before it applies the rubric.

That is operationally fine for occasional non-English forges (the operator pipes through "Latin chant" today and gets something coherent). It is NOT fine if multi-language becomes a first-class product surface, because:

1. The reproducibility seal claims "this rubric scored these lyrics", but the actual scoring is "this English rubric scored its translation of these lyrics," which is a different operation.

2. The cliché dictionary has zero coverage in Italian, Spanish, French, Japanese, or Latin. A truly platitudinous Italian line scores HIGHER than its English equivalent because no rule catches it.

3. Rubric-trained band labels ("85+ = standout") were calibrated against English songs across genres. We have no evidence that the same numerical score in Italian corresponds to the same level of listener engagement.

This RFC pins the methodology BEFORE the implementation lands, so the v1 multi-language scoring is honest about its limits.

Operator context

An operator-initiated audit (B1398 era) raised the question: "Could a user ask for a Gregorian Chant in Latin, an opera in Italian?" The audit found: feasible, low marginal cost, real strategic value. The phased rollout was planned but not yet implemented because the SCORING piece needed RFC discussion first. This RFC is that discussion.

Proposal

Phase 1 — language parameter only (forge generation)

Ships INDEPENDENTLY of this RFC's resolution. Adds:

`language` field to forge request schema (default 'en')
Language dropdown on /forge UI
Prompt instruction: "generate lyrics in <language>"
Stored on songs row in a new `language` column

Phase 1 does NOT change scoring. Scores produced for non-English lyrics carry a SEAL ANNOTATION: `seal.languageScored: 'en' | 'native'` which reads 'en' for everything in v1 (the rubric ran on the English-translated form in the model's head, even though the lyrics are Italian). This is the honest disclosure.
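The Phase 1 behaviour described above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `ForgeRequest` shape, the helper names, and the display-name table are assumptions. Only the `language` field, its 'en' default, the prompt wording, and the always-'en' annotation come from this RFC.

```typescript
// Language codes as listed in this RFC's seal contract.
type LanguageCode = 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja';

// Hypothetical request shape; only `language` is specified by the RFC.
interface ForgeRequest {
  prompt: string;
  language?: LanguageCode;
}

// Display names used in the prompt instruction (illustrative table).
const LANGUAGE_NAMES: Record<LanguageCode, string> = {
  en: 'English',
  la: 'Latin',
  it: 'Italian',
  es: 'Spanish',
  fr: 'French',
  ja: 'Japanese',
};

// Apply the schema default ('en') and reject codes outside Phase 1 scope.
function resolveLanguage(req: ForgeRequest): LanguageCode {
  const lang = req.language ?? 'en';
  if (LANGUAGE_NAMES[lang] === undefined) {
    throw new Error(`unsupported language: ${lang}`);
  }
  return lang;
}

// Build the instruction appended to the generation prompt.
function promptInstruction(lang: LanguageCode): string {
  return `generate lyrics in ${LANGUAGE_NAMES[lang]}`;
}

// Phase 1 does not change scoring, so the annotation is 'en' for every language.
function phase1LanguageScored(_lang: LanguageCode): 'en' | 'native' {
  return 'en';
}
```

The runtime check in `resolveLanguage` matters only for input that bypasses the type system, such as raw JSON from the forge endpoint.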

Phase 2 — language-aware rubric loader

For each supported non-English language, ship:

Language-specific banned-terms list (curated by a native speaker, NOT machine-translated)
Language-specific platitude patterns extending RFC-0002
Language-specific genre-fit conventions (e.g., Italian opera has different recitative-vs-aria expectations than English musical theatre)

Stored as `scoring-rubric-<lang>.json` alongside the canonical `scoring-rubric.json`. The eval pipeline picks the right one based on the song's `language` field.

When Phase 2 ships for a language, that language's seal flips to `languageScored: 'native'`. Old scores keep their 'en' seal — they were scored in English-mode and the seal must reflect that.
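A sketch of how the loader might choose a rubric file and the matching seal value. The `selectRubric` function and the `shippedNative` list are hypothetical names; the file-naming scheme and the 'en' / 'native' values are the ones described above.

```typescript
type LanguageCode = 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja';

interface RubricSelection {
  file: string;
  languageScored: 'en' | 'native';
}

// `shippedNative` lists the languages whose Phase 2 rubric has shipped
// (a hypothetical registry; the RFC does not say where this is stored).
function selectRubric(
  language: LanguageCode,
  shippedNative: readonly LanguageCode[],
): RubricSelection {
  if (language !== 'en' && shippedNative.indexOf(language) !== -1) {
    return { file: `scoring-rubric-${language}.json`, languageScored: 'native' };
  }
  // English lyrics, and any language without a native rubric yet,
  // fall back to the canonical English rubric.
  return { file: 'scoring-rubric.json', languageScored: 'en' };
}
```

Under this sketch the selection is made once, at scoring time, and written into the seal. Scores recorded before a language's rubric shipped are never recomputed, so they keep their 'en' value.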

Phase 3 — calibration corpus per language

Per-language extension of RFC-0008 (open scoring corpus). For each Phase-2 language:

At least 50 human-scored native-language lyrics in the corpus before that language's rubric is considered "calibrated"
At least 1 native-speaker songwriter-rater confirmed in the verified contributors pool
Quarterly cross-language drift report (RFC-0007 extension)

Until Phase 3 is met for a language, the language's rubric is flagged `calibration: 'thin'` in the response. Third parties querying `/api/v1/score` with non-English lyrics in a thin-calibration language get an explicit advisory in the seal:

`seal.calibrationAdvisory: 'thin' | 'full'`
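The two numeric Phase 3 thresholds can be expressed as a small check. This is a sketch under assumptions: `CalibrationStatus` and its field names are hypothetical, and the quarterly drift report is left out because this RFC does not say whether it gates the flag.

```typescript
// Hypothetical per-language counters; the RFC does not define where they live.
interface CalibrationStatus {
  humanScoredLyrics: number;    // native-language lyrics in the open corpus
  verifiedNativeRaters: number; // native-speaker songwriter-raters in the pool
}

// Thresholds taken from the Phase 3 requirements.
const MIN_CORPUS_ENTRIES = 50;
const MIN_NATIVE_RATERS = 1;

// Returns 'full' only when both numeric thresholds are met.
function calibrationAdvisory(status: CalibrationStatus): 'thin' | 'full' {
  const corpusReady = status.humanScoredLyrics >= MIN_CORPUS_ENTRIES;
  const ratersReady = status.verifiedNativeRaters >= MIN_NATIVE_RATERS;
  return corpusReady && ratersReady ? 'full' : 'thin';
}
```

For example, a language with 49 scored lyrics and one verified rater would still be reported as 'thin'.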

Languages in scope for Phases 1-2

Latin (chant + classical)
Italian (opera + cantautori)
Spanish (boleros + reggaeton + flamenco)
French (chanson + hip-hop)
Japanese (J-pop + enka)

Picked because each has a clear native musical canon the operator's audit identified. Excludes:

Chinese (different rhyme/meter system; future RFC)
Arabic (right-to-left; rendering and scoring need separate work)
Hindi/Tamil/etc. (need contributor-network we don't have yet)

What stays language-agnostic

The rubric DIMENSIONS don't change. Specificity is still specificity in Italian. Voice is still voice in Latin. Arc is still arc. What changes per-language is:

The cliché dictionary (different set of banned platitudes)
The platitude pattern catalog (different idiomatic constructions)
The genre-fit definitions (different canonical genres)
The example anchors in the rubric prose (an Italian "85" example, not an English one translated)

The 12 metric definitions, the weights (Craft 25 / Expression 40 / Impact 35), and the 0-100 scale stay identical.
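One way to picture the split between the shared core and the per-language parts is shown below. The weights, the metric count, and the 0-100 scale come from this RFC; the field names and the `composeRubric` helper are hypothetical. The weighted-mean formula in `compositeScore` is also an assumption, since this RFC does not state how category scores combine.

```typescript
// Language-agnostic core; the values are the ones stated in this RFC.
const CORE = {
  weights: { craft: 25, expression: 40, impact: 35 },
  scale: { min: 0, max: 100 },
  metricCount: 12,
} as const;

// Per-language parts; field names are hypothetical.
interface LanguageOverlay {
  bannedTerms: string[];
  platitudePatterns: string[];
  genreFit: string[];
  exampleAnchors: string[];
}

// A language's rubric is the shared core plus its own overlay.
function composeRubric(overlay: LanguageOverlay) {
  return { ...CORE, ...overlay };
}

interface CategoryScores {
  craft: number;
  expression: number;
  impact: number;
}

// ASSUMPTION: the composite is the weighted mean of the three category scores.
// Because the weights sum to 100, the result stays on the 0-100 scale.
function compositeScore(s: CategoryScores): number {
  const w = CORE.weights;
  const total = w.craft + w.expression + w.impact;
  return (w.craft * s.craft + w.expression * s.expression + w.impact * s.impact) / total;
}
```

Because `compositeScore` reads only from `CORE`, swapping the overlay for another language cannot change how a given set of category scores is combined.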

Reproducibility consequence

The seal gains THREE new fields:

`seal.lyricsLanguage: 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja'`
`seal.languageScored: 'en' | 'native'`
`seal.calibrationAdvisory: 'thin' | 'full'`

A score from Phase 1 (English rubric on Italian lyrics): `{ lyricsLanguage: 'it', languageScored: 'en', calibrationAdvisory: 'thin' }`

A score from Phase 2 once Italian rubric ships: `{ lyricsLanguage: 'it', languageScored: 'native', calibrationAdvisory: 'thin' }`

A score after 50+ Italian corpus entries (Phase 3): `{ lyricsLanguage: 'it', languageScored: 'native', calibrationAdvisory: 'full' }`
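The three records above can be produced by one function of a language's rollout state. This is a sketch: `LanguageRollout` and `buildSeal` are hypothetical names, and treating a language without a native rubric as always 'thin' is an assumption drawn from Phase 3 applying only to Phase-2 languages. The values for English lyrics are not specified here, so the sketch does not special-case them.

```typescript
type LanguageCode = 'en' | 'la' | 'it' | 'es' | 'fr' | 'ja';

// The three seal fields defined in this RFC.
interface Seal {
  lyricsLanguage: LanguageCode;
  languageScored: 'en' | 'native';
  calibrationAdvisory: 'thin' | 'full';
}

// Hypothetical per-language rollout state.
interface LanguageRollout {
  nativeRubricShipped: boolean; // Phase 2 done for this language
  calibrated: boolean;          // Phase 3 thresholds met for this language
}

function buildSeal(lyricsLanguage: LanguageCode, rollout: LanguageRollout): Seal {
  const languageScored = rollout.nativeRubricShipped ? 'native' : 'en';
  // ASSUMPTION: calibration can only be 'full' once the native rubric exists.
  const calibrationAdvisory =
    rollout.nativeRubricShipped && rollout.calibrated ? 'full' : 'thin';
  return { lyricsLanguage, languageScored, calibrationAdvisory };
}

// The three Italian stages from the text above.
const phase1 = buildSeal('it', { nativeRubricShipped: false, calibrated: false });
const phase2 = buildSeal('it', { nativeRubricShipped: true, calibrated: false });
const phase3 = buildSeal('it', { nativeRubricShipped: true, calibrated: true });
```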

This makes it impossible to silently cite an under-calibrated non-English score as if it had the same authority as an English score.

Acceptance criteria

This RFC is accepted when ALL of the following hold:

1. The 7-day public comment window closes

2. The Phase 1 implementation has shipped (language param + UI dropdown + 'en' seal annotation)

3. At least one native speaker per Phase-2 language has committed to draft the language-specific rubric extensions

4. The seal annotation contract is locked in code (so future non-English scores carry the honest disclosure even before Phase 2 ships)

If criterion 3 doesn't land within 90 days for any language, that language is dropped from the Phases 2-3 list and a follow-on RFC pins what to do (open contributor call, deprioritize, etc.).

Why ship this RFC NOW (before implementation)

Three reasons:

1. **Reproducibility seal honesty.** If Phase 1 ships without the seal annotation contract, every non-English score after Phase 1 is silently mis-attributed (the seal claims it was scored under v1.x of the rubric, but the rubric was calibrated against English). Pinning the seal contract NOW means Phase 1's first non-English forge produces an honest record from day one.

2. **Calibration discipline.** Without a published policy on what "calibrated for Italian" means, the first language extension will silently set a precedent. This RFC pins the precedent BEFORE someone has to defend it after the fact.

3. **Contributor recruitment.** Knowing what we need (50 human-scored Italian lyrics + 1 verified contributor) gives us something concrete to ask for. Without the spec we can't open the recruitment channel.

Out of scope

Real-time translation (user types English, model outputs Italian) — different feature, future RFC
Multi-language Crucible (the 8-voice critique panel speaks English; would need 8 native-language voice prompts per language)
Multi-language /admin surfaces (operator surfaces stay in English)
Per-language pricing / rate limits

Comment window

This RFC is open for comment until 2026-05-03. Email support@songforgeai.com with the subject `RFC-0009` to leave a comment.

Resolution

(Pending — will be filled in after 2026-05-03 with a summary of comments received and the accepted text. Phase 1 implementation is in-flight under Punch List #42; the seal annotation contract from this RFC will be implemented even if Phase 1 lands first, via a backfill build.)