Numbers are often perceived as a sign of respectability, which is why press releases so often include them: it seems much more believable to say that 75.4% of people do such-and-such than to say that many, or even most, people do. Quote a specific percentage and people tend to believe it.
The trouble is, the numbers we see in the press are often misleading or just plain wrong. Some recent sources of error include:
- The journalists writing the story haven’t fully understood the press release, or the writers of the press release didn’t understand the original results. A common area of confusion is the statistical significance of quoted results, and what that really means. There’s a really good Understanding Uncertainty blog on this. In summary:
Take Paul the Octopus, who correctly predicted 8 football results in a row: an unlikely outcome (probability 1/256) if he was simply guessing. Is it reasonable to say that these results are unlikely to be due to chance (in other words, that Paul is psychic)? Of course not, and nobody said so at the time, even after what was roughly a 2.5 sigma event. So why do they say it about the Higgs Boson? (There’s a small worked example of the calculation after this list.)
- The numbers being compared aren’t like for like. There’s a good Understanding Uncertainty blog on this one, too (it’s an excellent website!). The recent news that Brits are more obese than other Europeans is a case in point: first, the figures for most countries are for people aged 18 and over, but the figures for the Brits (who are, in this case, actually just the English) are for people aged 16 and over; and second, the data for most countries is based on asking people what they weigh and how tall they are, whereas the English data is based on actual measurements. And guess what? People don’t always tell the absolute truth when asked how heavy they are.
- People, and possibly especially journalists, are really unwilling to believe that phenomena are due to chance rather than to causality. I’ve written about this before. For instance, all those stories in the press about such-and-such a local authority being a black spot for whatever health risk is top of the list on that day: often they are due simply to random variation. In brief, the smaller the population, the more its observed rate will bounce around the true underlying rate, so small areas are far more likely to produce results a long way from the average (there’s a quick simulation of this after the list, too). It’s very easy to over-interpret results.
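To make the significance point concrete, here’s a minimal sketch (my own illustration, not from the Understanding Uncertainty post, and the one-in-a-million prior is entirely made up): a probability of 1/256 tells you how surprising Paul’s record is if he is guessing, not how likely it is that he is guessing.

```python
# Paul the Octopus as a toy calculation (illustrative numbers only).
p_value = 0.5 ** 8            # probability of 8 correct fifty-fifty guesses in a row
print(p_value)                # 0.00390625, i.e. 1/256

# That is P(8 correct | Paul is guessing), NOT P(Paul is guessing | 8 correct).
# To get the latter you need a prior. Suppose, purely for illustration, that one
# octopus in a million is genuinely psychic and that a psychic octopus never errs:
prior_psychic = 1e-6
posterior_psychic = prior_psychic / (prior_psychic + (1 - prior_psychic) * p_value)
print(posterior_psychic)      # about 0.00026 -- still almost certainly not psychic
```

Even a perfect 8-out-of-8 record only lifts the chance that Paul is psychic from one in a million to roughly one in four thousand.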
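And here’s a quick simulation of the black spot point (again my own sketch, with made-up numbers): every area gets exactly the same underlying risk, and we just look at how far the observed rates stray from it.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.02                          # the same 2% underlying risk everywhere
for population in (500, 5_000, 50_000):   # small, medium and large areas
    # simulate 1,000 areas of this size and record the observed rate in each
    cases = rng.binomial(population, true_rate, size=1_000)
    observed = cases / population
    print(f"population {population:>6}: observed rates from "
          f"{observed.min():.2%} to {observed.max():.2%}")
```

The smallest areas produce both the highest and the lowest observed rates, so the top and bottom of any league table will tend to be filled with small populations rather than with places where anything unusual is going on.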
People aren’t always very good at understanding percentages, either, and in particular the difference between percentages and percentage points. And people are really bad at understanding probabilities and risks:
The trouble is, many of us struggle with understanding risk. I realised how tenuous my grasp of risk was when I noticed that 1 in 20 sounded a bigger risk to me than 5 percent (yes, they’re exactly the same). Representing risk so that people can get a true understanding of it is an art as well as a science.
Which is why giving children lessons in gambling may not be a stupid idea.
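Back to percentages for a moment: the percentage-point distinction, and the 1-in-20-versus-5-percent point, both fit in a few lines (my own toy numbers):

```python
old_share = 10    # 10% of people do such-and-such
new_share = 15    # a year later, 15% do

print(new_share - old_share)                      # 5    -> a rise of 5 percentage points
print((new_share - old_share) / old_share * 100)  # 50.0 -> a rise of 50 percent
print(1 / 20)                                     # 0.05 -> "1 in 20" is exactly 5 percent
```

A headline writer can truthfully describe the same change as “up 5 points” or “up 50%”, and the second sounds far more dramatic.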
There are many people out there doing their best to introduce some sanity into the world. The Understanding Uncertainty website is consistently interesting and well written (have I mentioned that before?), Ben Goldacre has lots of useful stuff, the Guardian’s datablog is just starting a series on statistics (the first article explains samples and how bias can skew results), and Straight Statistics is also well worth a look.
One reply on “Statistically speaking…”
Regarding Paul the Octopus, what is less well known is the 255 (hypothetical) octopuses that were rejected for not having the necessary psychic skills.
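That selection effect is easy to check with a quick simulation (a sketch of my own, not part of the reply): give 256 octopuses 8 fifty-fifty guesses each and count how often at least one of them ends up looking psychic.

```python
import numpy as np

rng = np.random.default_rng(0)
tournaments = 10_000
# each row is one tournament: 256 octopuses, each making 8 fifty-fifty guesses
correct = rng.binomial(8, 0.5, size=(tournaments, 256))
perfect = (correct == 8).sum(axis=1)   # "psychic" octopuses per tournament
print(perfect.mean())                  # close to 1 (= 256 * 1/256)
print((perfect >= 1).mean())           # about 0.63: most tournaments produce a "psychic"
```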