Categories
Software

The right tool?

The next time you notice something being done in Excel where you work, take a moment to question whether it’s the right tool for the job, or whether you or someone in your organisation is a tool for allowing its use.

No, not my words, but from the FT’s consistently excellent Alphaville blog. The point is, it’s easy to use Excel. But it’s very hard to use Excel well.

There are many people out there who can use Excel to solve a problem. They knock up a spreadsheet with a few clicks of the mouse, some dragging and dropping, a few whizzo functions, some spiffy charts, and it all looks really slick. But what if anything needs to be changed? Sensitivity testing? And how do you know you got it right in the first place? Building spreadsheets is an area in which people are overconfident of their abilities, and tend to think that nothing can go wrong.

Instead of automatically reaching for the mouse, why not Stop Clicking, Start Typing?

But we won’t. There’s a huge tendency to avoid learning new things, and everyone thinks they know how to use Excel. The trouble is, they know how to use it for displaying data (mostly), and don’t realise that what they are really doing is computer programming. A friend of mine works with a bunch of biologists, and says

I spend most of my time doing statistical analyses and developing new statistical methods in R. Then the biologists stomp all over it with Excel, trash their own data, get horribly confused, and complain that “they’re not programmers” so they won’t use R.

But that’s the problem. They are programmers, whether they like it or not.

Personally, I don’t think that things will change. We’ll keep on using Excel, there will keep on being major errors in what we do, and we’ll continue to throw up our hands in horror. But it’s rather depressing — it’s so damned inefficient, if nothing else.

 

Categories
Notes Old site

What is a bug?

In computing parlance, unlike normal life, bugs and viruses have nothing to do with each other. A bug is simply a fault, or error, while a virus is a malicious program that propagates from computer to computer by hiding itself inside another program or document.

Legend has it that the term bug was invented by Grace Murray Hopper, a Rear Admiral in the US Navy, who was one of the pioneers of computing. Early computers were huge machines made of relays and valves and wires and so on; compared to today’s sleek laptops or PDAs they were veritable Heath Robinson contraptions. Anyway, they were open to the atmosphere. Hopper tells the story:

Things were going badly; there was something wrong in one of the circuits of the long glass-enclosed computer. Finally, someone located the trouble spot and, using ordinary tweezers, removed the problem, a two-inch moth. From then on, when anything went wrong with a computer, we said it had bugs in it.

Hopper’s team introduced a new term into computing jargon when they said that they had debugged the machine. However, contrary to popular legend, the term bug had been in use since 1878, or even earlier, when Edison used it to refer to a flaw in a system.

Resources

The following external links are relevant:

Categories
Notes Old site

Spreadsheet error rates

Think about it for a moment. Do 10% of spreadsheets contain errors? Or 20% (for the pessimists among you)? These rates are high, and should be enough to make alarm bells ring, but the actual rates are probably far higher.

A few years ago Professor Ray Panko, at the University of Hawaii, pulled together the available evidence from field audits of spreadsheets. These are the results he shows:

Study Number of
spreadsheets
Number with
errors
Percentage
with errors
Coopers & Lybrand, 1997 23 21 91%
KPMG, 1997 22 20 91%
Lukasic, 1998 2 2 100%
Butler (HMCE), 2000 7 6 86%
Total 54 49 91%

More recently Lawrence and Lee analysed 30 project financing spreadsheets. All 30 had errors; the error rate was 100%.

It’s difficult to know how to interpret these results. They are certainly very high numbers, and send a chill down my spine. However, in terms of all the spreadsheets out there in the world, these error rates may be:

Understated
because not all the errors were caught in the audit. Spreadsheet reviewers and auditors are subject to human error like the rest of us, and depending on how long they spent on the audit may well have missed some of the errors.
because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered to be most important, and over which the greatest care had been taken, were selected.
Overstated
because the sample of spreadsheets chosen for audit was biased. Possibly only those that were considered to be most likely to have errors in were selected.
Not comparable
because different definitions of significant errors were used in the different studies.

Cell error rates

Other studies surveyed by Panko show that the error rate per cell is between 0.38% and 21%. These results are difficult to interpret: are they percentages of all cells, cells containing formulae, or unique formulae? (If a formula is copied down a row or column, it may count as many formula cells, but is only one unique formula). If we assume a rate of 1% of unique formulae having errors, and look at spreadsheets containing from 150 to 350 unique formulae, we find that the probability of an individual spreadsheet containing an error is between 78% and 97%. This is (obviously) a high number, but is reasonably consistent with the field audit results discussed above.

Lawrence and Lee found that 30% of the spreadsheets they reviewed had errors in over 10% of unique formulae; one spreadsheet had errors in more than one in five of unique formulae. Interestingly, this was the smallest spreadsheet, showing that error rates don’t necessarily increase with complexity.

Self confidence

To make matters worse, people tend to overestimate their own capabilities. Panko describes an experiment in which people were asked to say whether they thought spreadsheets that they had developed contained errors. On the basis on their responses, about 18% of the spreadsheets would have been wrong; the true figure was 81%. The actuary who told me “As far as I am concerned, none of my spreadsheets has ever had a bug in it” was probably deluding himself.

One source of this over confidence is probably lack of testing and thorough review. If you don’t think that your spreadsheet has errors in, you may not bother testing it, and so never find the errors. Nothing ever happens to make you revise your view.

Summary

It’s extremely likely that a large proportion of spreadsheets contain errors. People don’t realise just how large that proportion is, and also have misplaced confidence in their own spreadsheets.

Categories
Notes Old site

The true significance of bugs in spreadsheets

There are many reports of extremely high occurrence rates for bugs in spreadsheets. From reading them, you might think that very few spreadsheets are error-free.

However, many people who are aware of the likelihood of errors in spreadsheets go to great lengths to find and remove them. I have found few significant errors in the often large and complex spreadsheets I have reviewed (mainly in the insurance industry).

In my view the true significance of bugs lies not in their existence, which can lead to spreadsheets producing erroneous results, but in the enormous amount of time and effort that goes into preventing them. Spreadsheets are usually built and maintained by people who have little or no software engineering expertise. These people often:

  • Do not have good software development processes
  • Are not aware of the characteristics of good software and how they apply to spreadsheets
  • Do not know good methods of testing and reviewing software
  • Do not know how to design software (especially spreadsheets) so as to reduce the likelihood of bugs

I believe that the use of simple software engineering techniques can help. Some of these techniques are described, somewhat briefly, in other notes on this site.

Categories
Notes Old site

Availability

There is no point in building a software system unless it is available for use. Availability can be compromised in a number of ways:

  • The software doesn’t run on the equipment available to the users
  • It often crashes
  • The network over which it runs is unreliable

Availability is often an issue with real time systems, but can be significant in other situations too. For example, consider a software system that is used in preparing quarterly financial results, but takes longer than three months to update. If changes are needed, there is a conflict between providing the functionality that is required, and providing any functionality at all. Obviously this example has been exaggerated for effect, but the point is that availability can be affected by maintenance requirements as well as by physical problems.

Categories
Notes Old site

Usefulness

There is no point in building a piece of software if it is not going to be used, and it won’t be used unless it is useful to someone. There is often a temptation to build something just because you can: you can see a really nifty way of doing it, and fancy the challenge or think it would be cool. It is a temptation that should usually be resisted.

This applies at all levels of software development, to individual features as well as to entire software systems.

For example, if there is a simple task that is currently performed manually, and you are thinking of automating it, you should look very carefully at what would be gained. Would the new system be easier or quicker to use, would it provide more information, or would it be less likely to result in errors? Unless the answers to these and similar questions are extremely positive, you should be very wary of going ahead.

You should ask similar questions when considering introducing a system or feature that will provide completely new functionality. Is anyone actually interested in this functionality? Will the benefits of using it outweigh the inconvenience? Remember that you, as the developer or designer, may not fully understand the priorities of the users.

Categories
Notes Old site

Maintainability

Good software must be maintainable. It is very rarely the case that a piece of software is developed once and then never nedds to be changed again during its lifetime. This is especially true of user-developed software such as spreadsheets, whose big advantage is that they are easily modified.

Even if the requirements do not change, any reasonably sized piece of software is most unlikely to work properly at first; changes will have to be made in order to remove bugs. In fact, the whole process of development can be seen as one of maintenance, starting from a situation in which nothing works at all.

In the worst case, a simple change requires the whole piece of software to be rewritten from scratch. In the best case, really quite significant changes to the functionality require only small changes in the code. The secret is to write good code and to document it effectively. In this context, good code is code that uses techniques of abstraction and modularity.

Resources

The following external links are relevant:

Categories
Notes Old site

Usability

One of the criteria for good software is that the software is usable. If it is difficult to use, it will meet one of several fates: either it won’t be used, or it will take much longer to use than it should, or it will be used in the wrong way and will produce invalid results. None of these outcomes is desirable.

When looking at usability, it is important to consider all the possible users, their goals in using the software, and the context in which they use it. For instance, technical terms that are second nature to one group of users might be totally meaningless to others. This is particularly important for user-developed software, such as spreadsheets, where the developers often have a very different outlook to other users.

Usability means making it easy for the user to do the right thing, and difficult for them to do the wrong thing. It can be affected by the physical layout of the interface (whether graphical or text based), the wording of any text, the order in which operations must be performed, and many other factors. One of the trickiest aspects of achieving good usability is that it is often difficult to predict how users will react, especially if you are not a typical user yourself. The only way to get it right is to test the software with real users, and take note of what they say.

Categories
Notes Old site

Criteria for good software

What is the difference between good software and bad software? It is reasonably easy to recognise the extremes when we encounter them, but there is a set of criteria that can be used to explain the differences. These criteria are also useful when building or modifying software, as they can be used to help make design decisions.

Good software has the following characteristics:

  • Correctness: it produces the right results and exhibits the right behaviour.
  • Usability: the user finds it easy to make it perform the desired functions.
  • Maintainability: it can be easily maintained and updated.
  • Availability: it is available (ie, will run) when it is needed.
  • Useful: it performs a task that someone wants performed.

It is often thought that performance should be another criterion, but in my view the need for good performance results from the usability and availability criteria.

These criteria apply to every kind of software, including spreadsheets. Spreadsheets are an interesting case because it is difficult to tell the difference between usability and maintainability, as one of their main uses is to be updated and enhanced.

Resources

The following external links are relevant:

Categories
Notes Old site

Correctness

When we say that a piece of software is correct, we mean that it does the right thing: in other words, it produces the right numbers, and exhibits the right behaviour. This is clearly a Good Thing, and is why correctness is one of the criteria for good software.

In order to tell whether a piece of software is correct, we must know what the right thing is; in other words, its behaviour must be specified. For calculations, this means stating the formulae and algorithms to be used; other behaviours must be described in similar rigorous terms. Behaviours that should be specified include:

  • Changes made to files
  • Changes made to databases
  • User dialogs that should appear
  • What constitutes a valid input
  • What should happen when an invalid input is encountered
  • What should happen when an error is encountered (eg, a file can’t be found)

The only way of telling whether software is in fact correct is by testing. And you can only test against the specification. So it is absolutely vital to specify your software as thoroughly as you can, and then to test it equally thoroughly.