I just can’t resist this one. Vista Equity Partners is paying around $100 million less than expected for Tibco Software Inc because Goldman Sachs got the number of shares wrong in the spreadsheet that did all the calculations. OK, $100 million isn’t much in the context of a $4 billion deal, but it’s an awful lot of money in any other context. And it’s only just over twice Goldman’s fees to Tibco for the transaction. It’s not clear how the mistake arose.
The right tool?
The next time you notice something being done in Excel where you work, take a moment to question whether it’s the right tool for the job, or whether you or someone in your organisation is a tool for allowing its use.
No, not my words, but from the FT’s consistently excellent Alphaville blog. The point is, it’s easy to use Excel. But it’s very hard to use Excel well.
There are many people out there who can use Excel to solve a problem. They knock up a spreadsheet with a few clicks of the mouse, some dragging and dropping, a few whizzo functions, some spiffy charts, and it all looks really slick. But what happens when anything needs to be changed? What about sensitivity testing? And how do you know you got it right in the first place? Building spreadsheets is an area in which people are overconfident of their abilities, and tend to think that nothing can go wrong.
Instead of automatically reaching for the mouse, why not Stop Clicking, Start Typing?
But we won’t. There’s a huge tendency to avoid learning new things, and everyone thinks they know how to use Excel. The trouble is, they know how to use it for displaying data (mostly), and don’t realise that what they are really doing is computer programming. A friend of mine works with a bunch of biologists, and says
I spend most of my time doing statistical analyses and developing new statistical methods in R. Then the biologists stomp all over it with Excel, trash their own data, get horribly confused, and complain that “they’re not programmers” so they won’t use R.
But that’s the problem. They are programmers, whether they like it or not.
Personally, I don’t think that things will change. We’ll keep on using Excel, there will keep on being major errors in what we do, and we’ll continue to throw up our hands in horror. But it’s rather depressing — it’s so damned inefficient, if nothing else.
There’s a bit of a furore going on at the moment: it turns out that a controversial paper in the debate about the after-effects of the financial crisis had some peculiarities in its data analysis.
Rortybomb has a great description, and the FT’s Alphaville and Tyler Cowen have interesting comments.
In summary, back in 2010 Carmen Reinhart and Kenneth Rogoff published a paper, Growth in a Time of Debt, in which they claim that “median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.” Reinhart and Rogoff didn’t release the data they used for their analysis, and since then, apparently, people have tried and failed to reproduce the analysis that gave this result.
Now, a paper has been released that does reproduce the result: Herndon, Ash and Pollin’s Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff.
Except that it doesn’t, really. Herndon, Ash and Pollin identify three issues with Reinhart and Rogoff’s analysis, which mean that the result is not quite what it seems at first glance. It’s all to do with the weighted average that R&R use for the growth rates.
First, there are data sets for 20 countries covering the period 1946-2009. R&R exclude data for three countries for the first few years. It turns out that those three countries had high debt levels and solid growth in the omitted periods. R&R didn’t explain these exclusions.
Second, the weights for the averaging aren’t straightforward (or, possibly, they are too straightforward). Rortybomb has a good explanation:
Reinhart-Rogoff divides country years into debt-to-GDP buckets. They then take the average real growth for each country within the buckets. So the growth rate of the 19 years that the U.K. is above 90 percent debt-to-GDP are averaged into one number. These country numbers are then averaged, equally by country, to calculate the average real GDP growth rate.
In case that didn’t make sense, let’s look at an example. The U.K. has 19 years (1946-1964) above 90 percent debt-to-GDP with an average 2.4 percent growth rate. New Zealand has one year in their sample above 90 percent debt-to-GDP with a growth rate of -7.6. These two numbers, 2.4 and -7.6 percent, are given equal weight in the final calculation, as they average the countries equally. Even though there are 19 times as many data points for the U.K.
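To see how much the weighting choice matters, here is a small illustration of my own (only the U.K. and New Zealand figures quoted above are real, and the actual calculation covers all 20 countries; the point is the weighting scheme, not the data). It compares an equal-weight-per-country average with one weighted by the number of country-years:

```python
# Comparing the two weighting schemes on just the two countries quoted above.
country_stats = {
    "UK": {"years": 19, "avg_growth": 2.4},           # 1946-1964, above 90% debt/GDP
    "New Zealand": {"years": 1, "avg_growth": -7.6},  # a single year above 90% debt/GDP
}

# Equal weight per country, regardless of how many years each contributes:
equal_weighted = sum(c["avg_growth"] for c in country_stats.values()) / len(country_stats)

# Weight each country's average by the number of country-years behind it:
total_years = sum(c["years"] for c in country_stats.values())
year_weighted = sum(c["avg_growth"] * c["years"] for c in country_stats.values()) / total_years

print(f"Equal weight per country:   {equal_weighted:.1f}%")   # -2.6%
print(f"Weighted by country-years:  {year_weighted:.1f}%")    #  1.9%
```

On these two countries alone, the equal-weight average is firmly negative while the year-weighted average is comfortably positive. The answer depends heavily on which weighting scheme you think is right.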
Third, there was an Excel error in the averaging: a formula simply omits five rows. Again, Rortybomb has a good picture of the offending formula. Oops!
So, in summary, the weighted average omits some years, some countries, and isn’t weighted in the expected way. It doesn’t seem to me that any one of these is the odd man out, and I don’t think it really matters why either of the omissions occurred: in other words, I don’t think this is a major story about an Excel error.
I do think, though, that it’s an excellent example of something I’ve been worried about for some time: should you believe claims in published papers, when the claims are based on data analysis or modelling?
Let’s consider another, hypothetical, example. Someone’s modelled, say, the effects of differing capital levels on bank solvency in a financial crisis. There’s a beautifully argued paper, full of elaborate equations specifying interactions between this, that and the other. Everyone agrees that the equations are the bee’s knees, and appear to make sense. The paper presents results from running a model based on the equations. How do you know whether the model does actually implement all the spiffy equations correctly? By the way, I don’t think it makes any difference whether or not the papers are peer reviewed. It’s not my experience that peer reviewers check the code.
In most cases, you just can’t tell, and have to take the results on trust. This worries me. Excel errors are notorious. And there’s no reason to think that other models are error-free, either. I’m always finding bugs in people’s programs.
Transparency is really the only solution. Data should be made available, as should the source code of any models used. It’s not the full answer, of course, as there’s then the question of whether anyone has bothered to check the transparently provided information. And, if they have, what they can do to disseminate the results. Obviously for an influential paper like the R&R paper, any confirmation that the results are reproducible or otherwise is likely to be published itself, and enough people will be interested that the outcome will become widely known. But there’s no generally applicable way of doing it.
In January 2006 it was discovered that there was a deficit of $5 million in the budget for Nevada City. The budget spreadsheet was the same as the one for 2005, updated for 2006. Apparently it was working correctly until sometime in late December, when it developed a problem. The difference was a $5 million deficit in the water and sewers fund, which wasn’t present on 14 December but was there in the version discussed at a meeting on 3 January. Evidently some councillors noticed the change. The finance director hadn’t noticed, as he had a printed version that didn’t show the error. Once the error was noticed, it took the finance director about a day to fix it. As he was doing so, he found other errors.
Obviously the spreadsheet had changed after it had been printed out. We have no information about how the error was introduced. We are not told whether the other errors that were found and corrected were actually in the 2005 version of the spreadsheet, but it wouldn’t surprise me if they were.
It appears that the actual Excel file was sent round to the councillors, and indeed posted on the city’s web site. This raises other concerns, unless it was protected so that it couldn’t be changed.
In November 2005 Kodak restated its Q3 results by $9 million. The restatement was attributed to restructuring and severance costs, plus a real estate gain. The restructuring costs apparently arose because they got the accounting treatment wrong; we were told that “the magnitude of the worldwide restructuring program the company is undertaking imposes significant challenges to ensure the appropriate accounting”.
There was also an error of $11 million in the severance calculation for just one employee. The error was traced to a faulty spreadsheet: apparently “too many zeros were added to accrued severance”. No payment was actually made to the employee in question (which was lucky for Kodak, but maybe unlucky for the employee). It sounds as if the error was either a simple data entry problem, or, possibly more likely, that the spreadsheet was expecting an entry in $’000 but got one in $.
This could be yet another example of a spreadsheet that is theoretically correct, but is not easy to use. If the wrong information is used for the calculations, then the answers will be wrong: it’s our old friend, Garbage In, Garbage Out.
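One cheap defence against this sort of input mix-up is a sanity check on the raw figures before they feed the calculation. Here is a minimal sketch of the idea in Python (the field names and the threshold are my own inventions, not Kodak’s): a per-employee severance accrual that is wildly out of range gets flagged for a human to look at.

```python
# Hypothetical check: accruals are expected in $'000 per employee, so anything
# above an arbitrary, illustrative limit of 5,000 (i.e. $5 million) is flagged.
SUSPICIOUS_LIMIT = 5_000  # $'000 per employee

def suspicious_accruals(accruals_in_thousands):
    """Return (employee_id, amount) pairs that look implausibly large."""
    return [(emp, amt) for emp, amt in accruals_in_thousands.items()
            if amt > SUSPICIOUS_LIMIT]

# One figure entered in dollars rather than thousands of dollars:
accruals = {"E001": 85, "E002": 120, "E003": 11_000_000}
for emp, amt in suspicious_accruals(accruals):
    print(f"Check {emp}: accrual of {amt} $'000 looks implausible")
```

A check like this doesn’t make the spreadsheet correct, but it gives an obviously wrong input a chance of being noticed before it turns into an $11 million error.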
In 2004 a paper on “Financial Modelling of Project Financing Transactions” was presented to the Institute of Actuaries of Australia Financial Services Forum. It’s well worth a read if you are involved in any sort of financial modelling, whether or not you are an actuary and whether or not you are modelling project financing.
The paper includes an analysis of the risks of models and some ideas for managing the risks, gives a clear introduction to Monte Carlo simulation and why you might use it, and has a section on why actuaries might be good people for modelling project financing.
It also includes some statistics on the error rates the authors have found in spreadsheet models of project financing. The authors say “Research has shown that error rates in project financing models can be as high as 10%. Section 5 of this paper provides some statistics on error rates collected by Mercer Finance & Risk Consulting. Out of the thirty highest value projects reviewed during the 2004 financial year, nine (that is, 30%) exceeded the 10% threshold; four exceeded the 15% threshold; and one exceeded the 20% threshold.”
The wording may be misleading here. They are not saying that, for example, 10% of the models have errors. In fact, all the models (100%) that they reviewed contained errors. They are saying that in four of the models they reviewed over 15% of the unique spreadsheet formulae contained errors, and that one model had errors in over one in five of the formulae. This model was one of the smaller ones, too, so it’s no use saying “it’s only a small model, so it’ll be OK”.
Although the spreadsheets they reviewed were all modelling project financing, there is absolutely no reason to suppose that the high error rates are peculiar to the project finance field. Financial models of any sort are complex, and it’s hard (but not impossible) to write a spreadsheet that doesn’t contain errors.
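A rough back-of-the-envelope calculation (my own illustration, not from the paper, and assuming independent errors, which real spreadsheets certainly don’t have) shows why even a modest per-formula error rate makes an error-free model unlikely:

```python
# If each unique formula independently has probability p of containing an error,
# the chance that a model with n formulae contains at least one error is 1 - (1 - p)**n.
def prob_at_least_one_error(p, n):
    return 1 - (1 - p) ** n

for n in (50, 200, 1000):
    print(f"{n:5d} formulae at a 1% per-formula error rate: "
          f"{prob_at_least_one_error(0.01, n):.0%} chance of at least one error")
# Roughly 39%, 87% and essentially 100% respectively.
```

Even at error rates far below the 10% to 20% figures reported above, a model of any realistic size almost certainly contains at least one mistake.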
So let me say, once again, that it’s important to get the process right when developing financial models (whether using a spreadsheet or specialist modelling software). Be clear what it is that you want the model to do: write a specification that is detailed enough to test against. Use appropriate techniques when building the model: something that looks like a really cool way of doing things may be difficult for other people to understand. Document the design decisions you make. Use a good change control process to keep track of what’s going on. Test the implementation against your specification. Record the tests, so that other people have some reason to believe you when you say the system has been tested. And, above all, don’t trust yourself. You are bound to make mistakes in the coding, and if you don’t look for them you won’t find them.
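As a concrete (and entirely hypothetical) illustration of testing against a specification, suppose the spec says that the model discounts each year’s cash flow at a flat annual rate and sums the results. A recorded test might look something like this, with the expected value calculated by hand and noted in the test itself:

```python
# Hypothetical model function and a recorded test against the specification.
def npv(rate, cashflows):
    """Net present value of cashflows[t] received at the end of year t + 1."""
    return sum(cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cashflows))

def test_npv_against_spec():
    # Hand calculation: 100/1.08 + 100/1.08**2 = 92.5926 + 85.7339 = 178.3265
    assert abs(npv(0.08, [100, 100]) - 178.3265) < 0.001
    assert npv(0.08, []) == 0  # no cash flows, no value

test_npv_against_spec()
print("NPV matches the specified hand calculation")
```

The point is not the arithmetic but the record: anyone can rerun the test and see for themselves that the implementation still matches the specification.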
It appears that a bug crept into the final release of Excel 2003 that wasn’t present in the betas. The RAND() function in Excel 2003 was apparently upgraded so that it produced a better distribution of pseudo-random numbers between 0 and 1. The trouble is that the numbers it produces may be more random, but aren’t always in the correct range; sometimes they are negative. The problem affects both the RAND() and RANDBETWEEN() functions.
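A cheap defence, whatever tool you use, is an explicit range check on the generator’s output before it feeds anything downstream. The sketch below is plain Python rather than Excel, and purely illustrative, but the idea carries across:

```python
import random

# Draw a large sample and confirm that every value lies in the documented
# range [0, 1). A generator with the kind of bug described above would fail
# this check sooner or later, rather than silently feeding bad numbers onward.
sample = (random.random() for _ in range(1_000_000))
assert all(0 <= x < 1 for x in sample), "random() produced a value outside [0, 1)"
print("All sampled values were in [0, 1)")
```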
Microsoft have now (January 2004) released a hotfix (their term, not mine) that they claim fixes the problem. They also claim that it fixes several other problems, a number of which they had not previously mentioned. Some of these problems cause Excel to quit unexpectedly; at least it’s obvious to the user when this happens (although you may lose your work). Others are more subtle:
- Sometimes the cells in a range are not actually updated when the range is recalculated.
- When you use a VBA macro to calculate your worksheet, a custom function from a different worksheet may appear to run.
- When you create multilevel subtotals for your data in an Excel 2003 worksheet, the totals may appear staggered incorrectly, and may exclude grand totals for some functions.
The hotfix is not downloadable. You have to contact Microsoft and convince them that you need it. Also, the installation process includes editing the registry by hand.
In October 2003, about two weeks after releasing their third quarter earnings figures, Fannie Mae had to restate their unrealised gains by $1.2 billion. This was apparently the result of “honest mistakes made in a spreadsheet used in the implementation of a new accounting standard.” Honest mistake or not, $1.2 billion is a lot of money: more than Provident Financial’s $70 million in March, or the $24 million lost by TransAlta in June. It’s reasonably common to see errors of half a million or so, but they don’t usually make the headlines.
Apparently Fannie Mae picked up the error as part of the normal processes of preparing their financial statements for filing. Presumably they failed to pick it up as part of their review process before issuing the earnings statement. They claim that the event demonstrates that their accounting processes and controls work as they should.
Better late than never, I suppose, but I can’t help thinking that their processes and controls should have picked up the problem at an earlier stage. We don’t know whether the mistake was in the model or the implementation (ie, whether they had understood the accounting standard correctly but had made a mistake in the implementation of that understanding, or whether they had misunderstood the new accounting standard). It’s entirely possible that their reviewing processes don’t separate the two issues, thus making it harder to find either kind of mistake.
Let me know if you’d like any of your spreadsheets reviewed, or if you are not sure that your processes and controls are as effective as Fannie Mae’s. Fannie Mae apparently continue to be proud of theirs, so self-confidence isn’t necessarily a foolproof guide.
Next time you paste some figures into a spreadsheet, be afraid. Be very afraid. Someone at TransAlta Corporation managed to lose $24m (Canadian) through a “clerical error” in pasting into an Excel spreadsheet. The spreadsheet in question was used to submit bids to the New York Independent System Operator (New York ISO) for May 2003 transmission congestion contracts (TCCs). It’s not really important to understand what TCCs are: the crux of the problem is that you put bids in at the end of one month that affect the prices you pay during the next. Once submitted, the bids can’t be retracted. Because of the spreadsheet error, TransAlta ended up with more TCCs than they wanted, and at higher prices than they intended. They didn’t spot the error until they were notified that their bids had been successful.
There are at least two operational failures evident in this story.
First, the whole thing was an accident waiting to happen if pasting numbers into the spreadsheet was part of the normal process of preparing these bids. Manual pasting is notoriously prone to errors. If data has to be transferred between spreadsheets, automatic links (used with appropriate precautions) or automated copying and pasting via macros (ditto) are much safer. If the pasting was unusual, then extra care should have been taken and more thorough testing should have been performed.
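To make “automated copying with appropriate precautions” concrete, here is a sketch of what a scripted transfer might look like in Python with pandas (the file and column names are invented; TransAlta would have used something else entirely). The bids are read, checked and written in one step, with the bid and price columns kept together so they cannot drift out of alignment:

```python
import pandas as pd

# Hypothetical source and destination workbooks and column names.
bids = pd.read_excel("monthly_bids.xlsx", sheet_name="bids")[
    ["contract_id", "quantity", "price"]
]

# Precautions before anything is submitted: no gaps, no duplicates, sane prices.
assert bids.notna().all().all(), "missing values in the bid data"
assert bids["contract_id"].is_unique, "duplicate contract IDs"
assert (bids["price"] > 0).all(), "non-positive prices"

bids.to_excel("bid_submission.xlsx", sheet_name="bids", index=False)
```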
Which brings us to the second failure. How come the error wasn’t found before the bids were submitted? The answer has got to be that there was inadequate review and testing. I suspect that there weren’t formal tests, and that the reviewing was not systematic. It isn’t difficult to devise effective reviewing procedures that should pick up this sort of problem (we are told that the error was caused by mismatching bids and prices, which implies that the error was due to misaligning the pasted region).
See the transcript of the investor conference at which the error was explained for more details.
In June 2003 a pasting error was discovered before it was too late. The error came to light in a university examiners’ meeting: the spreadsheet containing the marks for a small joint degree (3 or 4 students) was found to be wrong. The numbers just didn’t make sense, as the average marks were inconsistent with the range of individual marks. The numbers were recalculated by hand before the end of the meeting, and no harm was done, let alone $24m worth (see the note on TransAlta Corporation).
The problem was caused (like so many others) by copying and pasting: the formulae were correct in the first row, but not in the later rows. History doesn’t relate, but presumably it was a question of relative versus absolute references.
The spreadsheet had actually been checked before the meeting by the Chair of one of the departments involved. Unfortunately, the check had been confined to the first row, which was indeed correct.
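Checks like that can be automated, and don’t have to stop at the first row. Here is a minimal sketch using openpyxl (the file name and column are invented) that compares the structure of every formula in a column against the formula in the first data row, ignoring row numbers so that correctly filled-down relative references still match:

```python
import re
from openpyxl import load_workbook

# Hypothetical marks spreadsheet, with formulae in column D from row 2 down.
wb = load_workbook("marks.xlsx")
ws = wb.active

def formula_pattern(cell):
    """Replace row numbers in cell references so that =D2+E2 and =D3+E3 match."""
    return re.sub(r"(\$?[A-Z]{1,3})\$?\d+", r"\1#", str(cell.value))

reference = formula_pattern(ws["D2"])
for row in range(3, ws.max_row + 1):
    cell = ws[f"D{row}"]
    if formula_pattern(cell) != reference:
        print(f"D{row} does not match the first row: {cell.value}")
```

It is only a sketch, but it would have flagged the inconsistent later rows that the manual first-row check missed.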
There are several morals to this story.
- Be very afraid when copying and pasting
- Don’t rely on checking the formulae; look at overall reasonableness too
- If you’re checking a representative sample of formulae, remember that the first row (or first column) probably isn’t representative.