Operating principles
Maintainers’ posture

Audit yourself in public.

A pristine, never-corrected published standard is statistically improbable. The real question is whether the maintainers surface their own drift or wait to be caught.

This page is the canonical home of the posture: the policy, the log, and the deadline contract on every finding.

The policy

1. Disclosure happens before the patch.

When operational data surfaces a bias or methodology drift in the published rubric, the public disclosure ships first. The fix follows on a dated timeline, committed in the disclosure post itself. The order matters: a maintainer who silently patches and then announces is not auditing themselves; they are announcing.

2. Methodology lives next to the finding.

Every disclosure ships with the methodology document explaining how the bias was measured, the ranked hypotheses for root cause, the fix plan with build numbers, and the validation criteria. The disclosure is not a press release; it is a research artifact.

3. The fix carries a public deadline.

Every disclosure names a specific build (BNNNN) and date by which the fix must ship. Missing the deadline is itself a disclosure event — a follow-up post explains the slip and the new deadline. Slippage without disclosure is a worse trust violation than the original drift.
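As a rough illustration of the deadline contract, the sketch below decides whether a follow-up "slip" post is owed. The names `FixDeadline` and `needs_slip_disclosure` are hypothetical, and treating a fix that ships on the deadline date itself as on time is an assumption, not something this page specifies.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FixDeadline:
    """Hypothetical record of the commitment made in a disclosure post."""
    build: str  # target build label, e.g. "B2000"
    due: date   # date by which the fix must ship


def needs_slip_disclosure(
    deadline: FixDeadline, today: date, shipped_on: Optional[date]
) -> bool:
    """Return True when the deadline was missed and a follow-up post is owed.

    Assumption: shipping on the due date counts as meeting the deadline.
    """
    if shipped_on is not None:
        # The fix has shipped: it slipped only if it shipped after the due date.
        return shipped_on > deadline.due
    # The fix has not shipped: it has slipped once the due date has passed.
    return today > deadline.due


deadline = FixDeadline(build="B2000", due=date(2026, 5, 17))
print(needs_slip_disclosure(deadline, today=date(2026, 5, 18), shipped_on=None))
```

Keeping the check a pure function of dates means the same rule can be run by anyone reading the disclosure, not only by the maintainers.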

4. Validation is published.

When the fix ships, the validation result lands on this page as a follow-up entry on the same disclosure. The public sees the gap close in numbers. If the gap doesn’t close, the next phase of work begins, also documented publicly.

5. Old scores stay valid against their rubric version.

Retroactively changing production scores after a rubric correction damages trust more than the original drift. Every score response carries a reproducibility seal listing the rubric version, model, temperature, and build SHA. Old scores remain verifiable against their original rubric version; new forges run against the corrected version. Both seals are valid.
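One way to picture the seal is as an immutable record attached to each score. The sketch below is a minimal model under stated assumptions: the four fields come from the list above, but the class names, field names, and the helper `rubric_for_verification` are hypothetical and do not describe the actual response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seal:
    """Hypothetical shape of a reproducibility seal (field names assumed)."""
    rubric_version: str
    model: str
    temperature: float
    build_sha: str


@dataclass(frozen=True)
class ScoreResponse:
    score: float
    seal: Seal


def rubric_for_verification(response: ScoreResponse, available: set[str]) -> str:
    """Pick the rubric version a score must be verified against.

    The answer is always the version recorded in the seal, never the latest
    one, so a later correction does not silently re-score old results.
    """
    version = response.seal.rubric_version
    if version not in available:
        raise LookupError(f"rubric version {version!r} is not archived")
    return version


# Placeholder values for illustration only.
old = ScoreResponse(88.0, Seal("v1", "example-model", 0.0, "abc1234"))
print(rubric_for_verification(old, {"v1", "v2"}))
```

Freezing both dataclasses mirrors the policy: a seal, once issued, is not edited in place.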

Disclosure log

Every public bias / drift finding the maintainers surfaced before any external party did. Append-only; entries are never deleted. Latest first.

  • Cross-language scoring drift (+6.2 points)

    2026-05-03 · B1980 · in-progress

    A 200-song operational corpus showed the rubric scoring non-English texts (Latin, Italian, Spanish, French) approximately 6.2 points higher than English texts of comparable craft. 16 of 16 songs at 91+ were Latin or Italian.

    Fix deadline: 2026-05-17 (Phase 5 validation, B2000+)

    Read the full disclosure
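The append-only, latest-first behaviour of the log can be sketched as a small data structure. This is an illustration only: `LogEntry`, `DisclosureLog`, and their fields are hypothetical names, and the real log may be stored quite differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical log entry; fields mirror what each entry above displays."""
    title: str
    disclosed_on: date
    build: str
    status: str


class DisclosureLog:
    """Append-only log: entries can be added and read, never removed."""

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def append(self, entry: LogEntry) -> None:
        self._entries.append(entry)

    def latest_first(self) -> tuple[LogEntry, ...]:
        # Return an immutable, date-sorted view so callers cannot edit history.
        return tuple(
            sorted(self._entries, key=lambda e: e.disclosed_on, reverse=True)
        )


log = DisclosureLog()
log.append(
    LogEntry(
        title="Cross-language scoring drift (+6.2 points)",
        disclosed_on=date(2026, 5, 3),
        build="B1980",
        status="in-progress",
    )
)
print(log.latest_first()[0].title)
```

The class deliberately exposes no delete or update method, which is the simplest way to make "entries are never deleted" a property of the code instead of a convention.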

What triggers a disclosure

  • Operational corpus audit surfaces a measurable bias. The May 3 finding (cross-language drift) was triggered by an operator-supplied 200-song export. The cadence ritual now schedules audits every 14 days.
  • External auditor reports a finding that we missed. If a researcher, third-party implementer, or user identifies drift the cadence didn’t catch, the disclosure ships within 7 days of verification — same template, same posture.
  • Trust Decay Audit catches a claim-vs-reality drift. Every 14 days the audit walks public claims against verified implementation facts. Drifts past 7 days are a ratchet failure — disclosed as a follow-on.
  • Phase milestone validation lands. Every disclosure has phases; each phase’s validation result lands here as a follow-up entry on the original disclosure. The reader sees the full arc, not a press release.
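The time windows in the triggers above reduce to simple date arithmetic. The sketch below assumes calendar days and hypothetical function names; it also reads "drifts past 7 days" as "more than 7 days since the drift was detected", which is one possible interpretation. The 2026-05-03 date is used as a starting point purely for illustration.

```python
from datetime import date, timedelta

AUDIT_INTERVAL = timedelta(days=14)     # cadence between scheduled audits
DISCLOSURE_WINDOW = timedelta(days=7)   # time allowed after verification
DRIFT_TOLERANCE = timedelta(days=7)     # age at which a drift becomes a failure


def next_audit(last_audit: date) -> date:
    """Date of the next scheduled audit."""
    return last_audit + AUDIT_INTERVAL


def disclosure_due(verified_on: date) -> date:
    """Latest date to publish after an external finding is verified."""
    return verified_on + DISCLOSURE_WINDOW


def is_ratchet_failure(detected_on: date, today: date) -> bool:
    """True once a claim-vs-reality drift is more than 7 days old (assumed reading)."""
    return today - detected_on > DRIFT_TOLERANCE


print(next_audit(date(2026, 5, 3)))      # 2026-05-17
print(disclosure_due(date(2026, 6, 1)))  # 2026-06-08
```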

Adjacent transparency surfaces