Why do things go wrong? Sometimes it’s a whole combination of factors. Felix Salmon has some good examples, which reminded me of one of my favourite metaphors: the Swiss cheese model of accident causation.
In the Swiss cheese model, an organization’s defenses against failure are modeled as a series of barriers, represented as slices of cheese. The holes in the slices represent weaknesses in individual parts of the system, and they continually vary in size and position from slice to slice. The system fails when holes in every slice momentarily align, permitting (in James Reason’s words) “a trajectory of accident opportunity”, so that a hazard passes straight through all of the slices.
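The model is qualitative, but a toy simulation makes the “holes aligning” idea concrete. What follows is my own illustrative sketch, not Reason’s formulation, and its parameters (`num_slices`, `mean_hole`) are hypothetical. It assumes each slice is a ring of positions with a single hole that is redrawn at a random position and random width on every trial, and that slices are independent of one another. Under those assumptions a hazard gets through with probability `mean_hole ** num_slices`.

```python
import random


def hazard_gets_through(num_slices: int, mean_hole: float, rng: random.Random) -> bool:
    """Simulate one hazard travelling through a stack of cheese slices.

    Assumption (illustrative only): each slice is a ring of positions [0, 1).
    On every trial each slice gets one hole whose start is uniform and whose
    width is uniform on [0, 2 * mean_hole], so holes keep shifting in size and
    position. The hazard travels at a random position x and gets through only
    if x falls inside the hole of every slice.
    """
    x = rng.random()
    for _ in range(num_slices):
        start = rng.random()
        width = rng.uniform(0.0, 2.0 * mean_hole)
        # Distance from the hole's start to x, wrapping around the ring.
        if (x - start) % 1.0 >= width:
            return False  # this slice blocks the hazard
    return True  # the holes lined up: an accident


def failure_rate(num_slices: int, mean_hole: float, trials: int = 200_000, seed: int = 1) -> float:
    """Estimate the fraction of hazards that pass through every slice."""
    if not 0.0 <= mean_hole <= 0.5:
        raise ValueError("mean_hole must be in [0, 0.5] so hole widths fit in a slice")
    rng = random.Random(seed)
    hits = sum(hazard_gets_through(num_slices, mean_hole, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        estimate = failure_rate(n, mean_hole=0.2)
        print(f"{n} slice(s): simulated {estimate:.4f}, independent prediction {0.2 ** n:.4f}")
```

With independent slices, each extra layer multiplies the failure rate by the average hole size, which is why adding layers and shrinking holes helps. The independence assumption is doing a lot of work, though: if the holes tend to line up for a common reason, the real failure rate can be far higher than the simple product suggests.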
It’s a lovely vision: those little accidents waiting to happen, wriggling through the slices of cheese. But as Salmon points out:
… it’s important to try to prevent failures by adding extra layers of Swiss cheese, and by assiduously trying to minimize the size of the holes in any given layer. But as IT systems grow in size and complexity, they will fail in increasingly unpredictable and catastrophic ways. No amount of post-mortem analysis, from Congress or the SEC or anybody else, will have any real ability to stop those catastrophic failures from happening. What’s more, it’s futile to expect that we can somehow design these systems to “fail well” and thereby lessen the chances of even worse failures in the future.
Which reminds me of Tony Hoare’s comment on complexity and reliability:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.