Behind the Scenes · 2026-04-26 · 9 min read · By Todd Nigro

How a single missing config flag silently disabled our error tracking for 24 builds

A debug post-mortem. Sentry was installed, configured, and visibly invoked from server code — but no events ever reached the dashboard. The bug was three layered failures, each one masked by the next. Here is the trail and the durable lessons.

The symptom

We installed @sentry/nextjs at Build 1301. The DSN was set in Vercel. Server code called captureException on every error path. The dashboard showed zero events for 24 builds.

This post is the trail. It is published verbatim because the only thing more useful than a fix is a record of why the fix was hard to find — that record is what prevents the next person from spending three days on the same wall.

The bug, in three layers

Layer 1: the instrumentation hook flag

Next.js 14 does NOT load an instrumentation.ts file automatically. It needs an explicit opt-in in next.config.js:

experimental: { instrumentationHook: true }

Without that line, the file sits in the repo but is never imported at runtime. The Sentry SDK ships in the function bundle but Sentry.init never runs. Every captureException call no-ops. There is no warning: the framework treats "you have an instrumentation file but the flag is off" as "you must not want it."
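The opt-in looks like this in next.config.js (a minimal sketch showing only the relevant key; a real config will carry other settings):

```javascript
// next.config.js — minimal sketch; only the instrumentation opt-in shown.
module.exports = {
  experimental: {
    // Without this flag, Next.js 14 never imports instrumentation.ts,
    // so Sentry.init never runs and every capture call no-ops.
    instrumentationHook: true,
  },
};
```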

Fixed at Build 1322.

Layer 2: the file at the wrong path

With the flag on, Sentry still showed no events. The next discovery: a Next.js 14 project that uses the src/ directory layout (we do) expects src/instrumentation.ts, NOT a project-root instrumentation.ts. Both paths are documented in different parts of the Next.js docs. Ours was at the root because that is where the Sentry CLI scaffold put it.

Moved the file to src/instrumentation.ts at Build 1325. Confirmed via /api/admin/sentry-status that the SDK now reported initialized: true.
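For reference, the shape of the file after the move. This is a sketch: the sentry.server.config import follows the Sentry scaffold's naming convention, and the relative path assumes that file stayed at the project root; both may differ in another setup.

```typescript
// src/instrumentation.ts — sketch; assumes the Sentry scaffold's
// sentry.server.config.ts file is still at the project root.
export async function register() {
  // Next.js calls register() once per server runtime at boot,
  // but only when experimental.instrumentationHook is enabled
  // AND this file lives where the src/ layout expects it.
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("../sentry.server.config");
  }
}
```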

Still no events in the dashboard.

Layer 3: the Lambda terminating before the queue drained

Sentry's transport is async — captureException enqueues, then a background flush sends the payload over HTTP. On Vercel's serverless functions the Lambda terminates as soon as the response is sent. Our exceptions were being captured and queued, then the function instance vanished before the queue flushed.

The fix is await Sentry.flush(2000) immediately before returning from any path that captures. Wrapped this into the existing captureException shim at Build 1327 so future call sites get the flush automatically.
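The shim boils down to "never return from a capturing path without an awaited drain." Sketched here against a generic buffering-client interface so the example stands alone; the real shim calls @sentry/nextjs directly, and captureAndFlush is our own name, not an SDK export:

```typescript
// Minimal interface for any SDK that buffers events before sending
// (Sentry-like: capture enqueues, flush drains over HTTP).
interface BufferingClient {
  captureException(err: unknown): void;
  flush(timeoutMs: number): Promise<boolean>;
}

// Shim: every capture is followed by an awaited flush, so the Lambda
// cannot terminate with events still sitting in the transport queue.
async function captureAndFlush(
  client: BufferingClient,
  err: unknown,
  timeoutMs = 2000,
): Promise<boolean> {
  client.captureException(err); // enqueue only; nothing sent yet
  return client.flush(timeoutMs); // drain the queue before returning
}
```

Call sites await captureAndFlush(...) on every error path before sending the response, so the drain happens inside the function's lifetime.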

First event arrived in the dashboard within seconds of the next deploy.

What made this hard

  1. Each layer masked the next. Fixing layer 1 made the SDK appear initialized. Fixing layer 2 made the call sites appear to succeed. Only layer 3, the silent termination, was left, and nothing about it was visible from outside.
  2. The "fail loudly" path was a no-op path. A missing flag, a misplaced file, and an early termination all produced the same observable outcome: zero events. There was no failure mode that emitted a warning.
  3. The diagnostic surfaces did not exist. We built /api/admin/sentry-status and /api/admin/sentry-test during the debug, not before. The first hour was spent re-deploying and praying. The next two hours were faster because we could ask the system "are you initialized? is the DSN parsed? where would events route?"
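The useful part of a status endpoint is a pure report over whatever the SDK exposes. A sketch under assumptions: the field names are our own, /api/admin/sentry-status is specific to our app, and the route handler would wrap this around the SDK's client accessor (Sentry.getClient() in recent SDK versions):

```typescript
// Shape of the diagnostic report the status endpoint returns.
interface SentryStatus {
  initialized: boolean;
  dsnConfigured: boolean;
  environment: string;
}

// Pure report builder: takes whatever the SDK client exposes, so the
// logic is testable without a live Sentry connection.
function buildSentryStatus(
  client: { getDsn?: () => unknown } | undefined,
  env: string | undefined,
): SentryStatus {
  return {
    initialized: client !== undefined, // undefined means init never ran
    dsnConfigured: Boolean(client?.getDsn?.()),
    environment: env ?? "unknown",
  };
}
```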

The durable lessons

  1. Build the diagnostic surface BEFORE the bug. Any third-party SDK that ships with documented "this might silently no-op" failure modes deserves a status endpoint and a test endpoint on day one. We did this for triangulation (B1337); it paid for itself within hours of shipping.
  2. Honor framework idiosyncrasies. Both the experimental.instrumentationHook flag and the src/-aware path resolution are documented, just not in the same place. When integrating with a framework, read the framework docs for the specific feature, not the SDK docs for the integration. The SDK author cannot anticipate every framework permutation.
  3. Async + serverless = always flush. Anything that buffers (Sentry, OpenTelemetry, PostHog, custom log aggregators) needs an explicit drain before the response returns on Vercel/AWS Lambda/Cloudflare Workers. There is no "Lambda waits for outstanding promises by default" mode that survives termination.
  4. Ratchet the discipline. We added a trust-claims predicate that asserts the instrumentation flag stays in next.config.js, the file stays at src/instrumentation.ts, and the capture shim still calls Sentry.flush. Three CI assertions that mean this exact regression is impossible without a deliberate code change explaining why.
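One way to sketch the ratchet: a predicate over file contents that CI runs on every build. The file paths and claim list here mirror this post but are specific to our repo (src/lib/capture.ts is a hypothetical name for the shim's location), and a real version would parse the config rather than substring-match:

```typescript
// Each trust claim pairs a file with a string that must be present
// and the reason the regression matters.
interface TrustClaim {
  file: string;
  mustContain: string;
  why: string;
}

const claims: TrustClaim[] = [
  { file: "next.config.js", mustContain: "instrumentationHook: true",
    why: "instrumentation file is never imported without the flag" },
  { file: "src/instrumentation.ts", mustContain: "register",
    why: "the src/ layout resolves only this path" },
  { file: "src/lib/capture.ts", mustContain: "Sentry.flush",
    why: "Lambda terminates before the transport queue drains" },
];

// Returns the claims that fail, given a file-contents lookup;
// CI fails the build when this list is non-empty.
function failedClaims(
  read: (file: string) => string,
  list: TrustClaim[] = claims,
): TrustClaim[] {
  return list.filter((c) => !read(c.file).includes(c.mustContain));
}
```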

Cost

~3 hours of debugging across 5 builds (1322, 1323, 1325, 1326, 1327). Without the fix, the cost would have been hours of confusion at every recurrence forever; the trust-claims ratchet closes that off.

What this proves about the operating model

The Quality Council cadence (every 3 days, per docs/QUALITY-COUNCIL.md) flagged "Sentry installed but inactive for 24+ builds" as a top internal regression for the cycle. That flag is what triggered the debug session. Without the cadence, the SDK could have stayed quietly broken for months — captureException calls multiplying in the codebase, every one a no-op, and the team accumulating false confidence that errors were being tracked.

The cadence cost us 30 minutes of Council time and saved us from the version of this story where we discover the gap during a real incident, not during a routine check.