The best thing I've read all week, aside from the fiction I now read to give my brain a creative break, is an engineering postmortem.

Anthropic published a detailed breakdown of seven weeks of degraded Claude performance. It's the kind of transparency I appreciate seeing from a company. I couldn't stop thinking about what it reveals, not just about Claude, but about this new world we're all operating in now.

For weeks, clients were asking me: "Is Claude slower? Is it burning through tokens faster?" I did not have a good answer. I was paying more for a product that felt degraded. In this new world, that is just a thing that can happen.

The postmortem came out on April 23, and it is worth reading in full. What I want to unpack is what it reveals about the world we are actually operating in.

Non-deterministic systems fail non-deterministically.

Classical software breaks in predictable ways: a function returns the wrong value, you find the bug, you fix it. What happened here was three unrelated changes, each affecting a different slice of traffic on a different schedule. None looked like an incident on its own. Together they looked like random, diffuse degradation. "It feels worse" is valid signal in this world, but it is not a stack trace, and it does not trace back to one commit.

Feedback is load-bearing infrastructure.

Anthropic credited the users who filed feedback or posted reproducible examples. That's not a thank-you note. That's a description of how the bug was found. In a probabilistic system, your users are part of the test suite. Feedback is not a support channel. It is observability.

Quality has no SLA. Response time does. Intelligence does not.

In the old world, failures are binary and auditable. S3 is up or it is down. An incident is declared, a status page goes up, a timeline gets published. You can write a test that catches a regression. In this new world, the vendor's internal decisions, like a system prompt change, a caching optimization, or a reasoning effort downgrade, affect your output quality invisibly, without your knowledge, without an error code. An incident can run for seven weeks before it is declared as one.
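If quality has no SLA, one option is to measure it yourself. Here is a minimal sketch, assuming you keep a fixed set of prompts with pass/fail checks and rerun it on a schedule. The function names, the data shape (one list of booleans per run), and the 5-point tolerance are all my own hypothetical choices, not anything from the postmortem.

```python
from statistics import mean


def pass_rate(results):
    """Fraction of eval cases that passed in one run (results: list of bools)."""
    return mean(1.0 if passed else 0.0 for passed in results)


def quality_drift(baseline_runs, recent_runs, tolerance=0.05):
    """Compare the average pass rate of recent runs against a baseline.

    Returns (drifted, drop), where drop is baseline minus recent.
    The tolerance is an illustrative threshold, not a recommended value.
    """
    baseline = mean(pass_rate(run) for run in baseline_runs)
    recent = mean(pass_rate(run) for run in recent_runs)
    drop = baseline - recent
    return drop > tolerance, drop


# Hypothetical data: three runs from last month, three from this week.
baseline_runs = [[True] * 9 + [False]] * 3
recent_runs = [[True] * 7 + [False] * 3] * 3

drifted, drop = quality_drift(baseline_runs, recent_runs)
print(f"drifted={drifted}, drop={drop:.2f}")
```

Averaging over several runs matters because any single run of a probabilistic system is noisy. This will not tell you why quality moved, but it turns "it feels worse" into a number you can put in front of a vendor.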

The accountability bar just moved.

Root causes. Timelines. What internal tests missed and why. Usage limits reset for all subscribers. That's a vendor treating degradation as a debt, not a regrettable inconvenience. That bar should travel.

In this new world, your AI vendor is a runtime dependency. How they fail, and how they respond when they do, is part of the evaluation.

This New World Has New Rules · Big Light Studio