Think about it for a moment. Do 10% of spreadsheets contain errors? Or 20% (for the pessimists among you)? These rates are high, and should be enough to make alarm bells ring, but the actual rates are probably far higher.
A few years ago Professor Ray Panko, at the University of Hawaii, pulled together the available evidence from field audits of spreadsheets. These are the results he shows:
|Coopers & Lybrand, 1997||23||21||91%|
|Butler (HMCE), 2000||7||6||86%|
More recently Lawrence and Lee analysed 30 project financing spreadsheets. All 30 had errors; the error rate was 100%.
It’s difficult to know how to interpret these results. They are certainly very high numbers, and send a chill down my spine. However, in terms of all the spreadsheets out there in the world, these error rates may be:
- because not all the errors were caught in the audit. Spreadsheet reviewers and auditors are subject to human error like the rest of us, and depending on how long they spent on the audit may well have missed some of the errors.
- because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered to be most important, and over which the greatest care had been taken, were selected.
- because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered to be most likely to have errors in were selected.
- Not comparable
- because different definitions of significant errors were used in the different studies.
Cell error rates
Other studies surveyed by Panko show that the error rate per cell is between 0.38% and 21%. These results are difficult to interpret: are they percentages of all cells, cells containing formulae, or unique formulae? (If a formula is copied down a row or column, it may count as many formula cells, but is only one unique formula). If we assume a rate of 1% of unique formulae having errors, and look at spreadsheets containing from 150 to 350 unique formulae, we find that the probability of an individual spreadsheet containing an error is between 78% and 97%. This is (obviously) a high number, but is reasonably consistent with the field audit results discussed above.
Lawrence and Lee found that 30% of the spreadsheets they reviewed had errors in over 10% of unique formulae; one spreadsheet had errors in more than one in five of unique formulae. Interestingly, this was the smallest spreadsheet, showing that error rates don’t necessarily increase with complexity.
To make matters worse, people tend to overestimate their own capabilities. Panko describes an experiment in which people were asked to say whether they thought spreadsheets that they had developed contained errors. On the basis on their responses, about 18% of the spreadsheets would have been wrong; the true figure was 81%. The actuary who told me “As far as I am concerned, none of my spreadsheets has ever had a bug in it” was probably deluding himself.
One source of this over confidence is probably lack of testing and thorough review. If you don’t think that your spreadsheet has errors in, you may not bother testing it, and so never find the errors. Nothing ever happens to make you revise your view.
It’s extremely likely that a large proportion of spreadsheets contain errors. People don’t realise just how large that proportion is, and also have misplaced confidence in their own spreadsheets.
The following external links are relevant: