
What is a bug?

In computing parlance, unlike normal life, bugs and viruses have nothing to do with each other. A bug is simply a fault, or error, while a virus is a malicious program that propagates from computer to computer by hiding itself inside another program or document.

Legend has it that the term bug was invented by Grace Murray Hopper, a Rear Admiral in the US Navy, who was one of the pioneers of computing. Early computers were huge machines made of relays and valves and wires and so on; compared to today’s sleek laptops or PDAs they were veritable Heath Robinson contraptions. Anyway, they were open to the atmosphere. Hopper tells the story:

Things were going badly; there was something wrong in one of the circuits of the long glass-enclosed computer. Finally, someone located the trouble spot and, using ordinary tweezers, removed the problem, a two-inch moth. From then on, when anything went wrong with a computer, we said it had bugs in it.

Hopper’s team introduced a new term into computing jargon when they said that they had debugged the machine. However, contrary to popular legend, the term bug had been in use since 1878, or even earlier, when Edison used it to refer to a flaw in a system.



Spreadsheet error rates

Think about it for a moment. Do 10% of spreadsheets contain errors? Or 20% (for the pessimists among you)? These rates are high, and should be enough to make alarm bells ring, but the actual rates are probably far higher.

A few years ago Professor Ray Panko, at the University of Hawaii, pulled together the available evidence from field audits of spreadsheets. These are the results he shows:

Study                      Spreadsheets audited   Number with errors   Percentage with errors
Coopers & Lybrand, 1997    23                     21                   91%
KPMG, 1997                 22                     20                   91%
Lukasic, 1998              2                      2                    100%
Butler (HMCE), 2000        7                      6                    86%
Total                      54                     49                   91%
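As a quick sanity check, the percentages in the table follow directly from the counts. A minimal sketch in Python (study names and figures taken from the table above):

```python
# Recompute the "percentage with errors" column from the audit counts.
audits = {
    "Coopers & Lybrand, 1997": (23, 21),
    "KPMG, 1997": (22, 20),
    "Lukasic, 1998": (2, 2),
    "Butler (HMCE), 2000": (7, 6),
}

total = sum(n for n, _ in audits.values())        # 54 spreadsheets audited
with_errors = sum(e for _, e in audits.values())  # 49 contained errors

for study, (n, e) in audits.items():
    print(f"{study}: {e}/{n} = {e / n:.0%}")
print(f"Total: {with_errors}/{total} = {with_errors / total:.0%}")
```

Running this reproduces the table's figures, including the overall 91% rate (49 of 54).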

More recently Lawrence and Lee analysed 30 project financing spreadsheets. All 30 had errors; the error rate was 100%.

It’s difficult to know how to interpret these results. They are certainly very high numbers, and send a chill down my spine. However, in terms of all the spreadsheets out there in the world, these error rates may be:

Understated:
- because not all the errors were caught in the audit. Spreadsheet reviewers and auditors are subject to human error like the rest of us and, depending on how long they spent on the audit, may well have missed some of the errors.
- because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered most important, and over which the greatest care had been taken, were selected.

Overstated:
- because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered most likely to contain errors were selected.

Not comparable:
- because different definitions of significant errors were used in the different studies.

Cell error rates

Other studies surveyed by Panko show that the error rate per cell is between 0.38% and 21%. These results are difficult to interpret: are they percentages of all cells, cells containing formulae, or unique formulae? (If a formula is copied down a row or column, it may count as many formula cells, but is only one unique formula). If we assume a rate of 1% of unique formulae having errors, and look at spreadsheets containing from 150 to 350 unique formulae, we find that the probability of an individual spreadsheet containing an error is between 78% and 97%. This is (obviously) a high number, but is reasonably consistent with the field audit results discussed above.
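The probability figures quoted here follow from treating each unique formula as having an independent chance of error. A minimal sketch of the calculation, using the assumptions stated above (a 1% error rate per unique formula, spreadsheets of 150 to 350 unique formulae):

```python
def p_spreadsheet_error(n_formulae: int, p_formula_error: float = 0.01) -> float:
    """Probability that a spreadsheet contains at least one error,
    assuming each unique formula independently has probability
    p_formula_error of being wrong."""
    return 1 - (1 - p_formula_error) ** n_formulae

for n in (150, 350):
    print(f"{n} unique formulae: {p_spreadsheet_error(n):.0%} chance of an error")
```

This gives 78% for 150 formulae and 97% for 350, matching the range in the text; the independence assumption is a simplification, since a copied formula propagates any error it contains.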

Lawrence and Lee found that 30% of the spreadsheets they reviewed had errors in over 10% of unique formulae; one spreadsheet had errors in more than one in five of unique formulae. Interestingly, this was the smallest spreadsheet, showing that error rates don’t necessarily increase with complexity.

Self-confidence

To make matters worse, people tend to overestimate their own capabilities. Panko describes an experiment in which people were asked to say whether they thought spreadsheets that they had developed contained errors. On the basis of their responses, about 18% of the spreadsheets would have been wrong; the true figure was 81%. The actuary who told me “As far as I am concerned, none of my spreadsheets has ever had a bug in it” was probably deluding himself.

One source of this overconfidence is probably a lack of testing and thorough review. If you don’t think that your spreadsheet has errors in it, you may not bother testing it, and so never find the errors. Nothing ever happens to make you revise your view.

Summary

It’s extremely likely that a large proportion of spreadsheets contain errors. People don’t realise just how large that proportion is, and also have misplaced confidence in their own spreadsheets.


Provident Financial modelling problem

On 6th March 2003 Provident Financial Group of Cincinnati announced a restatement of its results for the five financial years from 1997 to 2002. Between 1997 and 1999 Provident created nine pools of car leases. Part of the financial restatement was because the leases were treated off balance sheet, rather than on balance sheet as was later thought to be appropriate. But there was also a significant restatement of earnings, because there was a mistake in the model that calculated the debt amortisation for the leases. It appears that the analysts who built the model used for the first pool “put in the wrong value, and they didn’t accrue enough interest expense over the deal term. The first model that was put together had the problem, and that got carried through the other eight,” according to the Chief Financial Officer, who also went on to say that he did not think other banks had made similar errors. “We made such a unique mistake here that I think it’s unlikely.”

It appears that the error was found when Provident introduced a new financial model that was tested against the original, and that the two models produced different results. They then went back and looked at the original model to see which one was correct. We don’t know that these were spreadsheet models, but it’s entirely possible. And the lack of testing may have led to earned income being overstated by $70 million over five years. Provident also faces a class action suit from investors.

If I am right, and the erroneous model was a spreadsheet (and from the fact that those who built it were referred to as “analysts” rather than “programmers” or “developers”, some sort of user-developed software seems likely), this is a classic example of a spreadsheet being built as a one-off and then reused without adequate controls. Later pools must have used a different spreadsheet, as they were not subject to the same restatement.

The CFO has more confidence than I do in the ability of other banks to avoid similar errors.

See the press release from Provident, and press coverage from the Cincinnati Post and New York Times.