Interesting Software

Reinhart and Rogoff: was Excel the problem?

There’s a bit of a furore going on at the moment: it turns out that a controversial paper in the debate about the after-effects of the financial crisis had some peculiarities in its data analysis.

Rortybomb has a great description, and the FT’s Alphaville and Tyler Cowen have interesting comments.

In summary, back in 2010 Carmen Reinhart and Kenneth Rogoff published a paper, Growth in a time of debt, in which they claim that “median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.” Reinhart and Rogoff didn’t release the data they used for their analysis. Since then, apparently, people have tried and failed to reproduce the analysis that gave this result.

Now, a paper has been released that does reproduce the result: Herndon, Ash and Pollin’s Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff.

Except that it doesn’t, really. Herndon, Ash and Pollin identify three issues with Reinhart and Rogoff’s analysis, which mean that the result is not quite what it seems at first glance. It’s all to do with the weighted average that R&R use for the growth rates.

First, there are data sets for 20 countries covering the period 1946-2009. R&R exclude data for three countries for the first few years. It turns out that those three countries had high debt levels and solid growth in the omitted periods. R&R didn’t explain these exclusions.

Second, the weights for the averaging aren’t straightforward (or, possibly, they are too straightforward). Rortybomb has a good explanation:

Reinhart-Rogoff divides country years into debt-to-GDP buckets. They then take the average real growth for each country within the buckets. So the growth rate of the 19 years that the U.K. is above 90 percent debt-to-GDP are averaged into one number. These country numbers are then averaged, equally by country, to calculate the average real GDP growth weight.

In case that didn’t make sense, let’s look at an example. The U.K. has 19 years (1946-1964) above 90 percent debt-to-GDP with an average 2.4 percent growth rate. New Zealand has one year in their sample above 90 percent debt-to-GDP with a growth rate of -7.6. These two numbers, 2.4 and -7.6 percent, are given equal weight in the final calculation, as they average the countries equally. Even though there are 19 times as many data points for the U.K.
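To make the difference concrete, here is a minimal sketch using only the two countries from Rortybomb’s example. The variable names are mine, and the country-year weighting is shown purely for contrast: it is one alternative, not necessarily the one anybody thinks R&R should have used.

```python
# Two ways of averaging growth for the over-90% debt bucket, using only the
# two countries quoted above.

# (country, number of years in the bucket, average growth in those years, %)
bucket = [
    ("UK", 19, 2.4),
    ("New Zealand", 1, -7.6),
]

# Equal weight per country: average the per-country averages.
by_country = sum(growth for _, _, growth in bucket) / len(bucket)

# Equal weight per country-year: weight each country's average by its years.
total_years = sum(years for _, years, _ in bucket)
by_country_year = sum(years * growth for _, years, growth in bucket) / total_years

print(f"equal weight per country:      {by_country:.1f}%")
print(f"equal weight per country-year: {by_country_year:.1f}%")
```

With just these two countries the first method gives -2.6% and the second 1.9%, so the choice of weights alone is enough to flip the sign.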

Third, there was an Excel error in the averaging. A formula omits five rows. Again, Rortybomb has a good picture of it.



So, in summary, the weighted average omits some years, some countries, and isn’t weighted in the expected way. It doesn’t seem to me that any one of these is the odd man out, and I don’t think it really matters why either of the omissions occurred: in other words, I don’t think this is a major story about an Excel error.

I do think, though, that it’s an excellent example of something I’ve been worried about for some time: should you believe claims in published papers, when the claims are based on data analysis or modelling?

Let’s consider another, hypothetical, example. Someone’s modelled, say, the effects of differing capital levels on bank solvency in a financial crisis. There’s a beautifully argued paper, full of elaborate equations specifying interactions between this, that and the other. Everyone agrees that the equations are the bee’s knees, and appear to make sense. The paper presents results from running a model based on the equations. How do you know whether the model does actually implement all the spiffy equations correctly? By the way, I don’t think it makes any difference whether or not the papers are peer reviewed. It’s not my experience that peer reviewers check the code.

In most cases, you just can’t tell, and have to take the results on trust. This worries me. Excel errors are notorious. And there’s no reason to think that other models are error-free, either. I’m always finding bugs in people’s programs.

Transparency is really the only solution. Data should be made available, as should the source code of any models used. It’s not the full answer, of course, as there’s then the question of whether anyone has bothered to check the transparently provided information. And, if they have, what they can do to disseminate the results. Obviously for an influential paper like the R&R paper, any confirmation that the results are reproducible or otherwise is likely to be published itself, and enough people will be interested that the outcome will become widely known. But there’s no generally applicable way of doing it.

risk management

Are VaRs normal?

An article in the FT’s recent special report on Finance and Insurance started from the premise that VaR models were a significant factor in landing banks with huge losses in the wake of the collapse of the US housing market, and went on to discuss how new models are being developed to overcome some of their limitations. Part of the point is valid — there were many models that didn’t predict the collapse. But the article is positively misleading in places. For instance, it implies that VaR models are based on the normal distribution:

VaR models forecast profit and loss, at a certain confidence level, based on a bell-shaped, or “normal”, distribution of probabilities.

In fact Value at Risk, or VaR, is a statistical estimate that can be based on any distribution. And it’s pretty obvious that for many financial applications a normal distribution would be inappropriate. The people who develop these risk models are pretty bright, and that won’t have escaped them. The real problem is that it’s difficult to work out what would be a good distribution to use — or, more accurately, it’s difficult to parameterise the distribution. To get an accurate figure for a VaR you need to know the shape of the distribution out in the tails. And for that, you need data. But by definition, the situations out in the tails aren’t encountered very often, so there’s not much data. And that applies whatever the distribution you’re using. So simply moving away from the normal distribution to something a bit more sexy isn’t necessarily going to make a huge difference to the accuracy of the models.

The article goes on to discuss the use of Monte Carlo models to calculate VaR. Monte Carlo models are useful if the mathematics of the distribution you are using don’t lend themselves to simple analytic solutions. But they don’t stop you having to know the shape of the distribution out in the tails. So they do help extend the range of distributions that can usefully be used, but it’s still a VaR model.
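As a rough illustration, here is a minimal Monte Carlo sketch. Both loss models are hypothetical ones I have made up for the purpose, and the function names are mine. The simulation machinery is identical in each case; the answer is driven entirely by what has been assumed about the tail.

```python
import random

def monte_carlo_var(simulate_loss, confidence, n_trials, seed=0):
    """Estimate VaR as an empirical quantile of simulated losses."""
    rng = random.Random(seed)
    losses = sorted(simulate_loss(rng) for _ in range(n_trials))
    # Number of simulated outcomes that sit in the tail beyond the confidence level.
    tail_count = max(1, round((1 - confidence) * n_trials))
    # The smallest of those tail outcomes is the VaR threshold.
    return losses[-tail_count]

# Hypothetical loss models (arbitrary units).
def thin_tailed(rng):
    return rng.gauss(0.0, 1.0)

def fat_tailed(rng):
    # Usually calm, but 5% of the time drawn from a much wider distribution.
    sigma = 5.0 if rng.random() < 0.05 else 1.0
    return rng.gauss(0.0, sigma)

var_thin = monte_carlo_var(thin_tailed, 0.995, 100_000)
var_fat = monte_carlo_var(fat_tailed, 0.995, 100_000)
print(f"99.5% VaR, thin tails: {var_thin:.2f}")
print(f"99.5% VaR, fat tails:  {var_fat:.2f}")
```

The fat-tailed figure comes out well over twice the thin-tailed one, but only because I told the model that 5% of outcomes are wild and that they are five times as volatile. Those two numbers are exactly the ones there is never enough data to pin down.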

And that’s another problem entirely. VaR, like any other statistical estimate (such as mean, median or variance), is just a single number that summarises one aspect of a complex situation. Given a probability and a time period, it gives you a threshold value for your loss (or profit, but in risk management applications it’s usually the loss that’s of interest). So you can say, for instance, that there’s a 0.5% chance that, over a period of one year, your loss will be £100m or more. But it doesn’t tell you how much more than the threshold value your loss could be: £200m? £2bn?
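A small sketch shows how two very different situations can share the same VaR. The two books below are invented, each consisting of 1,000 equally likely annual outcomes in £m, and the function names are my own.

```python
def value_at_risk(losses, confidence):
    """Smallest loss among the worst (1 - confidence) share of outcomes."""
    ordered = sorted(losses)
    tail_count = max(1, round((1 - confidence) * len(ordered)))
    return ordered[-tail_count]

def mean_loss_beyond(losses, threshold):
    """Average of the losses at or above the threshold."""
    tail = [loss for loss in losses if loss >= threshold]
    return sum(tail) / len(tail)

# Hypothetical books: 995 ordinary years plus 5 bad ones; only the bad years differ.
book_a = [10] * 995 + [100, 110, 120, 130, 140]
book_b = [10] * 995 + [100, 200, 500, 1000, 2000]

for name, book in (("A", book_a), ("B", book_b)):
    var = value_at_risk(book, 0.995)
    tail_mean = mean_loss_beyond(book, var)
    print(f"book {name}: 99.5% VaR = {var}, mean loss beyond it = {tail_mean:.0f}")
```

Both books report a 99.5% VaR of £100m, but the average loss once you are past that threshold is £120m for book A and £760m for book B.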

And it’s a statistical estimate, too. 0.5% may seem very unlikely, but it can happen.

I wouldn’t disagree that a reliance on VaR models contributed to banks’ losses, but I would express it more as an over-reliance on models, full stop. It’s really difficult to model things out in the tails, whatever type of model you are using.

risk management

Unintended consequences

Facebook bans at work are apparently linked to increased security breaches. It seems that strict policies on social networking sites are “actually forcing users to access non-trusted sites and use tech devices that are not monitored or controlled by the company security program.” People are very adaptable, and often very determined. If you stop them doing something one way, they’ll find another. Computer security is really difficult, as it’s by no means a matter only of technology: human nature is a major factor, and often more easily predicted with the benefit of hindsight.

For instance, Bruce Schneier points out that if something’s protected with heavy security, it’s obviously worth stealing. It’s the converse of Poe’s The Purloined Letter, in which the best hiding place is in full view. Does this apply to computer systems?


Software risks: testing might help (or not)

It’s good to test your software. That’s pretty much a given, as far as I’m concerned. If you don’t test it, you can’t tell whether it will work. It seems pretty obvious.

It also seems pretty obvious that a) you shouldn’t use test data in a live system, b) in order to test whether it’s doing the right thing, you have to know what the right thing is and c) your system should cope in reasonable ways with reasonably common situations.

If you use test data in a live system there’s a big risk that the test data will be mistaken for real data and give the wrong results to users. If you label all the test data as being different, or if it’s unlike real data in some other way, so that it can’t be confused with the real stuff, there’s a risk that the labelling will change the behaviour of the system, so the test becomes invalid. Because of this, most testing takes place before a system actually goes live. That’s all very well, unless a system’s outputs depend on the data that it’s used in the past. In that case you need to make sure that the actual system that goes live isn’t contaminated in any way by test data, otherwise you could, to take an example at random, accidentally downgrade France’s credit rating.

There’s a possibility that if you don’t have a full specification of a system, your testing will be incomplete. Well, it’s more of a certainty, really. This becomes an especial problem if you are buying a system (or component) in. If you don’t know exactly how it’s meant to behave in all circumstances, you can’t tell whether it’s working or not. It’s not really an answer just to try it out in a wide variety of situations, and assume it will behave the same way in similar situations in the future, because you don’t know precisely what differences might be significant and result in unexpected behaviours. The trouble is, the supplier may be concerned that a fully detailed specification might enable you to reverse engineer the bought-in system, and would thus endanger their intellectual property rights. There’s a theory that this might actually have happened with the Chinese high-speed rail network, which has had some serious accidents in the last year or so.

It can’t be that uncommon that when people go online to enter an actual meter reading, because the estimated reading is wrong, the actual reading is less than the estimated one. In fact, that’s probably why most people bother to enter their readings. So assuming that the meter has gone round the clock, through 9999 units, to get from the old reading to the new one, doesn’t seem like a good idea. The article explains the full story: you can only enter a reduced reading on the Southern Electric site within 2 weeks of the date of the original one. But the time limit isn’t made clear to users, and not getting round to something within 2 weeks is, in my experience, far from unusual. Some testing from the user point of view would surely have been useful.
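Here is a sketch of the sort of check I have in mind. It is purely illustrative: I have no idea how the Southern Electric system is actually written, and the four-digit meter, the function names and the 3,000 unit plausibility limit are all assumptions of mine.

```python
METER_MODULUS = 10_000  # assumed four-digit meter: wraps from 9999 back to 0000

def naive_usage(previous, current):
    """Assume any drop in the reading means the meter went round the clock."""
    return (current - previous) % METER_MODULUS

def checked_usage(previous, current, previous_was_estimate, max_plausible=3_000):
    """Return units used, or raise if a person ought to look at the reading."""
    if current < previous and previous_was_estimate:
        # Much more likely that the estimate was too high than that the
        # customer has used nearly 10,000 units.
        raise ValueError("actual reading is below the estimate; re-bill from the actual reading")
    usage = (current - previous) % METER_MODULUS
    if usage > max_plausible:
        raise ValueError(f"implausible usage of {usage} units; refer for checking")
    return usage

# Estimated reading 5200, actual reading 5100.
print(naive_usage(5200, 5100))           # 9900 units billed
print(checked_usage(9950, 50, False))    # a genuine roll-over: 100 units
```

The naive version is only a line long, which is presumably its attraction. The checked version needs somebody to decide what counts as plausible, and that is exactly the kind of question that testing from the user’s point of view tends to raise.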


risk management Uncertainty

Confidence and causality

Ok, it’s a bit trite, but human behaviour is really important, and a good understanding of human behaviour is a goal for people in many different fields. Marketing, education and social policy all seek to influence our behaviour in different ways and for different purposes — that’s surely what the whole Nudge thing is all about, for a start. Economists have traditionally taken a pretty simplistic view: homo economicus seems to have a very narrow view of the utility function he (and it is often he) is trying to maximise.

Psychologists have known for some time that real life just isn’t that simple. Daniel Kahneman and Amos Tversky first published some of their work on how people make “irrational” economic choices in the early 1970s, and since then the idea of irrationality has been widely accepted. It’s now well known that we have many behavioural biases: the trouble is, what do we do with the knowledge? It’s difficult to incorporate it into economic or financial models (or indeed other behavioural models): it’s often possible to model one or two biases, but not the whole raft. Which means that models that rely, directly or indirectly, on assumptions about people’s behaviour can be spectacularly unreliable.

Kahneman, who won the 2002 Nobel Memorial Prize in Economics (Tversky died in 1996), has written in a recent article about the dangers of overconfidence (it’s well worth a read). One thing that comes out of it for me is how much people want to be able to ascribe causality: saying that variations are just random variations, rather than being because of people’s skill at picking investments, or some environmental or social effect on bowel cancer, is not a common reaction, and indeed is often resisted.

It’s something we should think about when judging how much reliance to place on the results of our models. When I build a model, I naturally think I’ve done a good job, and I’m confident that it’s useful. And if, in due course, it turns out to make reasonable predictions, I’m positive that it’s because of my skill in building it. But, just by chance, my model is likely to be right some of the time anyway. It may never be right again.

Data risk management

Fiddling the figures: Benford reveals all

Well, some of it, anyway. There’s been quite a lot of coverage on the web recently about Benford’s law and the Greek debt crisis.

As I’m sure you remember, Benford’s law says that in lists of numbers from many real life sources of data, the leading digit isn’t uniformly distributed. In fact, around 30% of leading digits are 1, while fewer than 5% are 9. The phenomenon has been known for some time, and is often used to detect possible fraud – if people are cooking the books, they don’t usually get the distributions right.
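For reference, the usual statement of the law is that the probability of a leading digit d is log10(1 + 1/d). A few lines of Python (the function name is mine) reproduce the figures quoted above:

```python
import math

def benford_probability(digit):
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)

for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

This gives 30.1% for a leading 1, falling steadily to 4.6% for a leading 9.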

It’s been in the news because it turns out that the macroeconomic data reported by Greece shows the greatest deviation from Benford’s law among all euro states (hat tip Marginal Revolution).

There was also a possible result that the numbers in published accounts in the financial industry deviated more from Benford’s law now than they used to. But it now appears that the analysis may be faulty.

How else can Benford’s law be used? What about testing the results of stochastic modelling, for example? If the phenomena we are trying to model are ones for which Benford’s law works, then the results of the model should comply too.
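One way of doing it, sketched below, is a chi-square comparison of the leading digits of the model’s output against the Benford proportions. This is my choice of test rather than an established recipe for the purpose, and the function names are mine. I haven’t got any real model output to hand, so the sketch uses two stand-in data sets: the powers of 2, which are known to follow the law, and the integers 1 to 1000, which don’t.

```python
import math

def leading_digit(value):
    """First significant digit of a non-zero int or float."""
    return int(str(abs(value)).lstrip("0.")[0])

def benford_chi_square(values):
    """Chi-square statistic of observed leading digits against Benford's law."""
    counts = [0] * 10
    for value in values:
        if value != 0:
            counts[leading_digit(value)] += 1
    n = sum(counts)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic

# Stand-ins for model output.
powers_of_two = [2 ** n for n in range(1, 1001)]
plain_integers = list(range(1, 1001))

print(f"powers of 2:    {benford_chi_square(powers_of_two):.2f}")
print(f"integers 1-1000: {benford_chi_square(plain_integers):.2f}")
```

With nine digits there are 8 degrees of freedom, for which the 5% critical value is about 15.5. The powers of 2 come in comfortably below that and the plain integers far above it. Of course, a pass only tells you that the output isn’t obviously wrong in this one respect.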

risk management

Arbitrage or speculation?

A good piece in The Register about the difference between arbitrage and speculation, and how it can all go horribly wrong.

risk management

Going rogue

An interesting (if slightly tongue in cheek) piece in The Register about rogue traders. IT competence is an important transferable skill!

risk management

It’s not enough!

A very interesting column from the FT’s Andrew Hill today, arguing that risk controls and processes on their own aren’t enough.

“The $2.3bn trading scandal at UBS … makes me wonder whether corporate boards have fully appreciated the risks of relying on risk managers.”

“Companies that put all their trust in risk controls – human or technological – foster dangerous complacency at the top.”

He goes on to point out that there are equally serious, but different, risks involved if all responsibility for risk management is in the hands of the boss.

The real point is that a modern corporate enterprise is a complex system, and relying on a single mechanism to manage your risks is just plain stupid. You need redundancy, so that if one control fails, there’s something else that might pick it up. And you need redundancy at all levels, from the detailed systems and controls through to the whole-enterprise level. So as well as good risk managers, and a good risk management system, and a board who are on top of the whole thing, the corporate culture (at all levels) has an important role to play.