low · 2026-04-25 · duration n/a

This page exists. There is no first incident yet.

Inaugural entry. The infrastructure is live, so when something breaks, the writeup ships within the same week.

Impact

None.

Severity

low

Prevention

The /incidents page exists. The publication discipline does not require an incident to exist — it requires the channel to exist before one does.

Why publish this empty?

The Linear List item #50 ("Postmortems published") is a discipline commitment, not a feature toggle. Shipping the page only after the first incident would mean that on the morning of the first real outage, the team is also negotiating the publication template, the URL structure, and whether to publish at all.

Building the channel before it's needed pre-commits us to the discipline. The first real incident gets a writeup at `/incidents/<slug>` with no debate.

What gets published here

  • User-visible degradation lasting more than 5 minutes
  • Data loss or exposure of any duration
  • Scoring or pipeline regressions that altered published results
  • Security advisories and fixes (after coordinated disclosure)

What does not get published here

  • Internal-only outages with no user impact
  • Bugs caught and fixed in the same hour
  • Routine deploys
  • Any incident that's still actively being remediated (we wait for the dust to settle so the writeup tells the truth, not the panic)
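The two lists above amount to a small decision rule, which can be sketched in code. This is only an illustration: the `Incident` record, its field names, and `should_publish` are hypothetical and not part of any existing tooling. It also assumes that "still being remediated" overrides every inclusion criterion, and it does not model the "fixed in the same hour" exclusion separately, since the page does not say how that interacts with the 5-minute rule.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    # All field names are hypothetical; the page does not define a schema.
    user_visible_duration: timedelta = timedelta(0)  # time users saw degradation
    data_loss_or_exposure: bool = False              # any duration counts
    altered_published_results: bool = False          # scoring or pipeline regression
    security_fix_disclosed: bool = False             # coordinated disclosure is complete
    still_remediating: bool = False                  # remediation is still in progress


def should_publish(incident: Incident) -> bool:
    """Return True if the incident meets at least one publication criterion."""
    if incident.still_remediating:
        # Assumption: an open incident is never published, whatever else is true.
        return False
    return (
        incident.user_visible_duration > timedelta(minutes=5)
        or incident.data_loss_or_exposure
        or incident.altered_published_results
        or incident.security_fix_disclosed
    )
```

Under these assumptions, an internal-only outage or a routine deploy leaves every field at its default and is not published, while any data exposure is published as soon as remediation ends.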

The template

Every postmortem includes:

1. **Timeline**: what happened and when, in absolute timestamps
2. **Root cause**: the underlying problem, not the proximate trigger
3. **Detection**: how we found out, how long that took
4. **Mitigation**: what restored service
5. **Prevention**: what changed so it can't recur

The post is plainly worded. No corporate fog. Anyone reading should walk away knowing exactly what broke and what we did about it.
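One way to hold a draft to the five-part template is a small completeness check before publishing. The sketch below is an assumption about how that could look: `REQUIRED_SECTIONS` mirrors the template headings, while `missing_sections` and the dictionary shape of a draft are hypothetical.

```python
# Section names taken from the template, in template order.
REQUIRED_SECTIONS = ("Timeline", "Root cause", "Detection", "Mitigation", "Prevention")


def missing_sections(postmortem: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or blank, in template order."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not postmortem.get(name, "").strip()  # whitespace-only counts as missing
    ]
```

A draft would be ready to publish only when `missing_sections(draft)` returns an empty list. The check covers presence only; whether the wording is plain still needs a human reader.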