How decisions get made.
The 10 principles that govern what SongForgeAI ships, cuts, publishes, and refuses. Each principle is paired with the inversion we reject — saying what we value is empty without naming what we will not do for it.
The 10 principles
- 1
Lyrics that feel lived, not lyrics that sound like songs.
The product exists to help writers reach the first category. Every feature is judged against whether it moves output toward "someone was in the room." If a feature makes lyrics more fluent without making them more true, it loses.
We reject: Polishing techniques that produce smoother surfaces without earning more depth.
- 2
Publish the rubric. Publish the numbers. Publish the limitations.
The Lyric Scoring Standard is open (CC BY 4.0). The model card documents which Anthropic model + temperature produces every score. The quarterly metrics report shows the rubric applied to the real corpus. If we can't explain it publicly, we won't ship it as authoritative.
We reject: Closed proprietary scoring that buyers cannot independently verify or critique.
- 3
Anti-inflation is load-bearing.
A 50 is average. A 90+ is rare. The Gravity Rule, Burden of Proof, Antagonist Ceiling, and Historical Context anchors exist because a rubric that flatters every output loses meaning. We will defend the calibration even when it lowers customer scores.
We reject: Score inflation in response to user feedback. If users want higher scores, the answer is better lyrics, not a softer rubric.
- 4
Every CI gate ratchets in one direction: stricter.
The explicit-any allowlist went from 194 to zero. The console-log ceiling only moves down. Bundle-size, a11y, visual-regression — every gate accumulates discipline. Loosening a ratchet requires a written reason, not a hotfix toggle.
We reject: Skipping CI ("we'll fix it next sprint"). The next sprint never fixes it.
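A minimal sketch of what a one-way ratchet gate can look like. The file path, the JSON shape, and the idea of counting violations externally are illustrative assumptions, not the actual CI implementation.

```typescript
// Hedged sketch of a one-way ratchet check; the ceiling file path and
// format are assumptions, not SongForgeAI's real CI gate.
import { readFileSync, writeFileSync } from "node:fs";

const CEILING_FILE = "ci/console-log-ceiling.json"; // hypothetical checked-in ceiling

// currentCount is however many violations the real gate measures today
// (via grep, an AST pass, or a lint rule).
export function enforceRatchet(currentCount: number): void {
  const ceiling: number = JSON.parse(readFileSync(CEILING_FILE, "utf8")).ceiling;

  if (currentCount > ceiling) {
    // Loosening requires a written reason, not a hotfix toggle (principle 4).
    throw new Error(
      `console.log count ${currentCount} exceeds ceiling ${ceiling}`,
    );
  }

  if (currentCount < ceiling) {
    // The ratchet moves in one direction only: tighten automatically.
    writeFileSync(
      CEILING_FILE,
      JSON.stringify({ ceiling: currentCount }, null, 2) + "\n",
    );
  }
}
```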
- 5
Reproducibility before optimization.
Every score response carries a seal: rubric version + model id + temperature + git SHA. Every deploy carries a fingerprint visible in the footer. Performance work is welcome; performance work that breaks reproducibility is not.
We reject: Black-box optimizations that change scores without bumping the rubric version.
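A minimal sketch of the kind of seal this describes. The field and function names are illustrative assumptions, not the actual response schema or footer format.

```typescript
// Hedged sketch of a reproducibility seal; names are assumptions,
// not SongForgeAI's actual API.
interface ScoreSeal {
  rubricVersion: string; // version of the open Lyric Scoring Standard
  modelId: string;       // Anthropic model that produced the score
  temperature: number;   // sampling temperature used for the scoring call
  gitSha: string;        // commit the scoring code was deployed from
}

interface ScoreResponse {
  score: number;   // calibrated so 50 is average and 90+ is rare
  seal: ScoreSeal; // everything needed to reproduce this exact score
}

// A deploy fingerprint like the one shown in the footer could be derived
// from the same fields, so any published score traces to a specific deploy.
function deployFingerprint(seal: ScoreSeal): string {
  return `${seal.rubricVersion}+${seal.modelId}@${seal.temperature}#${seal.gitSha.slice(0, 7)}`;
}
```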
- 6
The user owns their work, including the right to leave.
Forged songs export as Suno-ready text. Scores carry a verifiable seal. We do not lock outputs behind format moats. Account deletion is permanent and complete. Refunds are honored without an interrogation.
We reject: Lock-in features that punish users for evaluating alternatives.
- 7
Ship the smallest possible verifiable thing.
A bug fix does not need surrounding cleanup. A one-shot operation does not need a helper. Three similar lines beats a premature abstraction. Builds are numbered + dated + linked to a git SHA so every change can be reverted as a single unit.
We reject: Refactor-and-feature commits that mix scope and resist rollback.
- 8
Honest mechanics over soft marketing.
The model card lists known limitations. The quarterly metrics report includes the failure modes alongside the wins. The /about page names the founder. We would rather lose a deal to a competitor than win one with a claim we cannot defend.
We reject: Capability claims that survive only because no one stress-tested them.
- 9
Cost the user, then cost ourselves.
Pricing is published. Tier limits are explicit. The cost-per-song instrumentation is internal but the unit economics inform the public roadmap. We will absorb cost increases before passing them through; we will not engineer dark-pattern upsells when usage spikes.
We reject: Surprise charges. Surprise rate-limit downgrades. Surprise feature paywalling.
- 10
Engineering excellence becomes credible only when it is published.
Our edge is legibility — the published rubric, the open metrics, the model card, the operating principles, the changelog, the postmortems, the RFCs. Anyone can claim discipline; we want every claim to be cite-able from a public artifact, dated and signed by a deploy SHA.
We reject: Engineering polish that exists only inside the codebase, invisible to the people we serve.
Decision gates
The principles above translate into operational rules. When a decision arrives, the question routes through one of these gates.
Should this feature ship?
Does it advance lyrics-that-feel-lived? Is the smallest verifiable version possible? Does it respect every active CI ratchet? If yes to all three: ship.
Should this score-system change ship?
Does it bump the rubric version? Is the bump documented in the changelog with a migration note? Are golden evals updated? If no to any: do not ship; defer until the version bump is real.
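A minimal sketch of how a golden-eval check can tie score drift to a rubric version bump. The type, function, and parameter names here are hypothetical stand-ins, not the real eval harness.

```typescript
// Hedged sketch: GoldenEval, runGoldenEvals, and scoreLyric are
// illustrative names, not SongForgeAI's actual eval code.
import assert from "node:assert/strict";

// A golden eval pins a fixed lyric to the score recorded under a specific
// rubric version.
interface GoldenEval {
  lyricId: string;        // fixed lyric in the checked-in eval corpus
  rubricVersion: string;  // rubric version the expected score was recorded under
  expectedScore: number;  // score produced at that rubric version
}

// Fails if any golden score was recorded under a stale rubric version, or if
// the live scoring path drifts from the recorded score without a version bump.
export async function runGoldenEvals(
  goldens: GoldenEval[],
  currentRubricVersion: string,
  scoreLyric: (lyricId: string) => Promise<number>,
): Promise<void> {
  for (const golden of goldens) {
    assert.equal(
      golden.rubricVersion,
      currentRubricVersion,
      `golden eval ${golden.lyricId} was recorded under rubric ` +
        `${golden.rubricVersion}; re-record it for ${currentRubricVersion}`,
    );
    const actual = await scoreLyric(golden.lyricId);
    assert.equal(
      actual,
      golden.expectedScore,
      `score drift on ${golden.lyricId}: expected ${golden.expectedScore}, got ${actual}`,
    );
  }
}
```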
Should this CI gate be loosened?
Only with a written reason in the commit message naming the underlying problem. "Tests are flaky" is not a reason; "this test depends on Anthropic API determinism we no longer guarantee, replaced by snapshot test in the same commit" is.
Should this be marketing copy or honest mechanics?
If the claim cannot be cited from public docs (rubric, model card, metrics, principles), rewrite the claim until it can.
When we contradict ourselves, call it.
These principles are public so they can be enforced from outside. If a SongForgeAI feature ships in a way that contradicts a principle listed here, the regression is on us. Email support@songforgeai.com with the contradiction and we will either fix the product or update this page in plain language.