Failure as a Clue: What Postmortems Get Wrong

Most postmortems feel performative.

Not on purpose.
Not because people don’t care.
But because the structure of the meeting forces everyone into a narrow, safe narrative:

“Let’s explain what happened without offending anyone, making too many waves, or triggering a political problem.”

When you do that, you get a timeline, not a diagnosis.
A sequence of events, not a pattern.
A story, not a signal.

And then—surprise—nothing changes.

The same issues return, the same failures repeat, and the organization chalks it up to “bad luck” or “another outage” instead of acknowledging the structural roots that made the failure inevitable.

This is the part most people miss:

Failures aren’t anomalies. They’re clues.

They’re diagnostic data points revealing how the system actually behaves—regardless of how leaders believe it behaves.

And when you treat failure as a clue instead of a crime scene, everything changes.

The Three Conversations Happening in Every Postmortem

There are always three layers swirling under the surface:

1. The technical sequence

What happened in what order.

2. The organizational forces

Incentives, shortcuts, political pressures, time constraints, and hidden rules.

3. The human layer

Fear, expectations, risk appetite, communication gaps, trust levels.

Most companies focus only on #1, politely avoid #3, and never touch #2.

But the truth is simple:

Most failures originate in layer #2 and only express themselves in layer #1.

If you don’t look at incentives, constraints, and misalignment, you’re not doing a postmortem.
You’re just documenting an autopsy.

Why the Most Important Clues Never Make It into the Report

When a system fails, people instinctively try to:

soften blame
avoid sounding critical
protect coworkers
avoid managerial consequences
minimize the perceived scope
restore confidence

So the real signals—the structural ones—get deleted:

“We’ve been avoiding this migration because the team is underwater.”
“We rely on a person, not a process, for this decision.”
“Everyone knows this part of the system is fragile. We just hope it holds.”
“Product promised something engineering couldn’t deliver.”
“We cut corners because leadership needed a demo.”
“We’ve normalized this failure mode for so long it doesn’t even register.”

These are the actual causes.

But they’re almost never written down.

Why?

Because they implicate structures, not individuals.
And structures don’t defend themselves—but the people inside them do.

The Difference Between Postmortems That Change Something and Postmortems That Don’t

The teams that get better follow a simple rule:

Postmortems are not about blame. They are about reality.

And reality doesn’t care about feelings, titles, departments, or the organizational chart.

Reality cares about:

constraints
incentives
throughput
cognitive load
process mismatches
communication structure
system design
error pathways

If you want real improvement, you investigate the system, not the people.

You treat a failure as a symptom, not a sin.

And you ask the most powerful question in any diagnostic investigation:

“What made this the predictable outcome?”

Because it was predictable.

Failures tend to follow the path of least resistance: they show up where the system makes the risky action the easiest one to take.

A Better Way: The Diagnostic Postmortem

The diagnostic postmortem focuses on structures, incentives, and design, not blame.

Here’s the approach I’ve seen transform organizations:

1. Start with truth, not comfort

Reward clarity, not self-protection.

2. Look for incentive fingerprints

Work backward from the outcome: which incentives made that choice the rational one? That is where the misalignment lives.

3. Identify structural weaknesses, not individual mistakes

Most “errors” were set in motion long before a human touched anything.

4. Ask what the system allowed—not what the engineer did

If a person could make a catastrophic mistake, the system was designed to allow it.

5. Extract the reusable pattern

Every major failure has a sibling hiding somewhere else.

6. Make the system safer for the next engineer

If success depends on heroics, you don’t have reliability—you have gambling.
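
One way to keep these six steps from collapsing back into a timeline is to give the report a shape that makes the missing layers visible. The following is only a minimal sketch in Python, not a standard template: the `Postmortem` class, its field names, and the two helper functions are all hypothetical names invented for illustration, and the sample entries are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Postmortem:
    """Hypothetical report shape: one field per layer, plus follow-through."""
    title: str
    timeline: List[str] = field(default_factory=list)               # layer 1: technical sequence
    organizational_forces: List[str] = field(default_factory=list)  # layer 2: incentives, constraints
    human_factors: List[str] = field(default_factory=list)          # layer 3: fear, trust, communication
    reusable_patterns: List[str] = field(default_factory=list)      # step 5: where else could this happen?
    system_changes: List[str] = field(default_factory=list)         # step 6: what gets safer next time?


def missing_sections(pm: Postmortem) -> List[str]:
    """Return the names of the sections that were left empty."""
    sections = {
        "timeline": pm.timeline,
        "organizational_forces": pm.organizational_forces,
        "human_factors": pm.human_factors,
        "reusable_patterns": pm.reusable_patterns,
        "system_changes": pm.system_changes,
    }
    return [name for name, entries in sections.items() if not entries]


def is_timeline_only(pm: Postmortem) -> bool:
    """True when the report has a sequence of events but no layer 2 or layer 3."""
    return bool(pm.timeline) and not (pm.organizational_forces or pm.human_factors)


if __name__ == "__main__":
    # Illustrative data only; not taken from a real incident.
    report = Postmortem(
        title="Example incident",
        timeline=["Deploy started", "Error rate rose", "Rollback completed"],
    )
    print(missing_sections(report))
    print(is_timeline_only(report))

    report.organizational_forces.append(
        "We rely on a person, not a process, for this decision."
    )
    print(is_timeline_only(report))
```

The point of the check is not enforcement. An empty `organizational_forces` list is simply a prompt to ask what the room chose not to say.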

Why This Matters

Any organization can build dashboards.
Any team can build automation.
Any stack can scale with the right budget.

But very few teams build the ability to learn from failure.

The companies that do become resilient.
Everyone else depends on luck.

The difference is whether leaders treat failure as evidence or as embarrassment.

Because here’s the truth:

Systems don’t break because people are bad.
They break because the system was designed in a way that made failure easy.

And that is the most actionable clue you’ll ever get.

If You Want to Go Deeper

This post is the written foundation of one of my most requested talks:

“Postmortems Are Autopsies: How to Actually Learn From Failure.”

If you’re interested in having me speak at your engineering organization, SRE team, or conference, you can find more details at:

👉 kevinmmiller.us/speaking

Failures aren’t the enemy.
They’re messages.
And they’re telling you something important—if you’re willing to listen.