Categories
Uncertainty

Statistically speaking…

Numbers are often perceived as a sign of respectability. Press releases often include them — it seems so much more believable to say that 75.4% of people do such-and-such than to say that “many” or even “most” people do. Quote a specific percentage and people tend to believe it.

The trouble is, the numbers we see in the press are often misleading or just plain wrong. Some recent sources of error include:

  • Journalists writing the story haven’t fully understood the press release, or the writers of the press release didn’t understand the original results. A common area of confusion is the statistical significance of quoted results, and what that really means. There’s a really good Understanding Uncertainty blog post on this. In summary:

Take Paul the Octopus, who correctly predicted 8 football results in a row. If he was simply guessing, that’s an unlikely outcome (probability 1/256). Is it reasonable to conclude that these results are unlikely to be due to chance (in other words, that Paul is psychic)? Of course not, and nobody said so at the time, even though this was a 2.5 sigma event. So why say it about the Higgs boson?

  • The numbers being compared aren’t like for like. There’s a good Understanding Uncertainty blog post on this one, too (it’s an excellent website!). The recent news that Brits are more obese than other Europeans is a case in point: first, the figures for most countries are for people aged 18 and over, but those for the Brits (who are, in this case, actually just the English) are for people aged 16 and over; and second, the data for most countries is based on asking people what they weigh and how tall they are, but the English data is based on actual measurements. And guess what? People don’t always tell the absolute truth when asked how heavy they are.
  • People, and perhaps especially journalists, are really unwilling to believe that phenomena are due to chance rather than to causality. I’ve written about this before. For instance, all those stories in the press about such-and-such a local authority being a black spot for whatever health risk is top of the list that day are often due simply to random variation. In brief, a smaller population is quite likely to produce results relatively far from the mean, simply because there are fewer observations to average over. It’s very easy to over-interpret results.
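To see how much random variation alone can do, here is a minimal simulation sketch. The population sizes and the 10% underlying rate are illustrative assumptions, not real figures:

```python
import random
import statistics

random.seed(0)

def simulated_rates(population, trials=2000, true_rate=0.1):
    # Observed incidence rates across many simulated areas of equal size,
    # all sharing exactly the same true underlying rate of 10%.
    return [
        sum(random.random() < true_rate for _ in range(population)) / population
        for _ in range(trials)
    ]

small = simulated_rates(population=100)      # small local authorities
large = simulated_rates(population=10_000)   # large local authorities

# The small areas are far more spread out around the same true rate,
# so the "best" and "worst" of them are mostly artefacts of chance.
print(max(small), min(small))
print(max(large), min(large))
```

The small areas produce both the apparent black spots and the apparent havens, even though nothing real distinguishes them.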

People aren’t always very good at understanding percentages, either, and in particular the difference between percentages and percentage points (a rise from 5% to 10% is a rise of 5 percentage points, but of 100%). And people are really bad at understanding probabilities and risks:

The trouble is, many of us struggle with understanding risk. I realised how tenuous my grasp of risk was when I noticed that 1 in 20 sounded like a bigger risk to me than 5 percent (yes, they’re exactly the same). Representing risk so that people can get a true understanding of it is an art as well as a science.

Which is why giving children lessons in gambling may not be a stupid idea.

There are many people out there doing their best to introduce some sanity into the world. The Understanding Uncertainty website is consistently interesting and well written (have I mentioned that before?), Ben Goldacre has lots of useful stuff, the Guardian’s datablog is just starting a series on statistics (the first article explains samples and how bias can skew results), and Straight Statistics is also well worth a look.

Categories
Interesting

Interesting links

There’s some cool stuff here:

  1. Are shredders still useful? From the results of a recent DARPA unshredding contest, I’d say they mostly are. Hat tip Bruce Schneier.
  2. Good arguments for transparency in the corporate world.
  3. An economist would say “what would I do if I were a horse?”
  4. One of the best advent calendars on the web.
  5. The PC is dead.
Categories
Uncertainty

How can you tell?

It’s well known that people are very keen to find causality in the world, and reluctant to accept that a lot of what goes on is just random. Those of us who’ve been educated properly know that correlation is not causation, but it’s sometimes difficult to put that into practice.

There are some common examples: people who have lucky mascots, or rituals they go through before special events such as exams, or sports matches they’re taking part in (or watching, or that teams they support are playing in). To my mind, many business books that use the argument “XYZ Corp did really well under Pat as CEO; Pat has these character traits and behaviour patterns; so if you develop the same traits and patterns your business will do really well too” are along the same lines. You need a very strong argument that causality is actually present before believing it.

It’s the argument against active fund management, too: if you take a large number of managers, one of them will come out top over a year, or indeed over any period you like to mention, even if skill plays no part in their performance. So saying that so-and-so has had consistently good results isn’t a very strong argument that they are actually better at the job than anyone else, as opposed to luckier than most other people.

However, even if some fund managers are genuinely skilled, it’s possible for unskilled ones to mimic their performance. Tim Harford explains it well:

… it is possible for an unskilled fund manager to mimic a genuinely skilled one, in the same way that an insect might mimic a leaf, or a harmless creature mimic a poisonous one.

This mimicry, too, involves three steps: first, invest all your funds in whatever benchmark you need to beat, whether it’s treasury bills or a stock market index; second, make a bet that some unlikely event will not come to pass using the invested funds as security; finally, boast of benchmark beating returns, because you’ve delivered the benchmark plus the additional money from winning the bet. Collect your performance fee. (In the unlikely event that you lost the bet and with it all your investors’ cash, simply cough awkwardly and look at your shoes.)

It turns out that it’s impossible to tell the difference between this and a more conventional strategy just by looking at the investment returns.

These are the “black swans” made famous by Nassim Taleb: low probability, high-impact events, except that these particular swans are genetically engineered – deliberately manufactured and then hidden away, to escape at unwelcome moments.

Harford goes on to explain how these mimicking strategies can be used to game nearly all bonus schemes based solely on performance.
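A rough sketch of the mimicry strategy’s payoff profile, with purely illustrative numbers (these aren’t Harford’s figures):

```python
import random

random.seed(2)

def mimic_year(benchmark=0.05, disaster_prob=0.05, premium=0.02):
    # One year of the mimicry strategy: hold the benchmark, then sell
    # insurance against a rare event using the fund as collateral.
    # All the parameter values here are illustrative assumptions.
    if random.random() < disaster_prob:
        return -1.0                 # the rare event: investors lose everything
    return benchmark + premium      # "benchmark-beating" return, most years

returns = [mimic_year() for _ in range(20)]
print(returns)
```

Most years look like steady outperformance; the track record gives no hint of the wipe-out lurking in the tail, which is exactly why returns alone can’t distinguish mimicry from skill.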

This suggests an obvious question: is this in fact a surer way of getting good investment performance than relying on skill anyway?


Categories
Interesting

Interesting links

I’ve found these interesting, in one way or another:

  1. Is the eurozone a casino? The current betting strategy is madness.
  2. Do what I say, not what I do.
  3. xkcd’s really on a roll at the moment — one for mathematicians.
  4. One for your Christmas list. It would be so cool!
  5. You should choose your Christmas cards carefully if you’ve got any astronomer friends.
Categories
Data

There’s a yotta data out there…

One result of the unrelenting increase in computing power is that the amount of data is now huge. Earlier this year, it was estimated that 295 exabytes of data were being stored around the world in 2007. An exabyte is a billion gigabytes. That’s a lot of data, though admittedly it’s not a yottabyte yet (a yottabyte is a million exabytes).

Not only is data storage technology improving — for example, you can buy a 1 TB (terabyte: 1000 gigabytes) hard drive for well under £100 — but more computing power means that it’s now possible to analyse these huge amounts of data.
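The arithmetic of those units is worth spelling out (decimal units assumed, as in the text):

```python
# Decimal storage units, as used above: 1 EB = 10**18 bytes, and so on.
GB, TB, EB, YB = 10**9, 10**12, 10**18, 10**24

stored_2007 = 295 * EB            # the estimate quoted above

print(EB // GB)                   # a billion: an exabyte in gigabytes
print(stored_2007 // TB)          # how many of those 1 TB drives it would fill
print(stored_2007 / YB)           # still a tiny fraction of a yottabyte
```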

It’s changing the way companies do business, too:

As Ron Kohavi at Microsoft memorably put it, objective, fine-grained data are replacing HiPPOs (Highest Paid Person’s Opinions) as the basis for decision-making at more and more companies.

In the past, it may simply have been too hard to collect masses of data and then analyse it. That’s not true now. It’ll be interesting to see whether these data-driven decisions turn out to be better than the traditional seat-of-the-pants ones. My bet is that, on the whole, they will.

Categories
Uncategorized

Microlives

Following the micromort, a 1-in-a-million chance of sudden death, we now have the microlife, which is 30 minutes off your life expectancy.

Both micromorts and microlives are good units for comparing risks:

Here are some things that would, on average, cost a 30-year-old man 1 microlife:

  • Smoking 2 cigarettes
  • Drinking 7 units of alcohol (eg 2 pints of strong beer)
  • Each day of being 5 kg overweight
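Turning those rates into arithmetic is straightforward; the daily consumption figures below are purely hypothetical:

```python
MICROLIFE_MINUTES = 30   # one microlife = half an hour of life expectancy

# Rates quoted above for a 30-year-old man, on average.
microlives_per_cigarette = 1 / 2      # 2 cigarettes cost 1 microlife
microlives_per_unit_alcohol = 1 / 7   # 7 units of alcohol cost 1 microlife

# A purely hypothetical day: 10 cigarettes and 3 units of alcohol.
cost = 10 * microlives_per_cigarette + 3 * microlives_per_unit_alcohol
print(cost * MICROLIFE_MINUTES)       # minutes of life expectancy lost
```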

The full article, which is well worth reading, explores the relationship between microlives and micromorts, and points out that different branches of the UK government appear to place similar values on them.

Categories
Interesting

Interesting links

Some things that have recently struck me in one way or another:

  1. Literary references to actuaries aren’t that common.
  2. Some interesting graphical representations of relative sizes from xkcd: money (recent) and radiation (older). And from elsewhere: how big is a PhD?
  3. Old news is the latest thing.
  4. US/UK culture gap: “Like most US universities, [UC Davis] maintains its own police force, employing (as of 2009) 101 people (including administrators), far more than the largest academic departments. The officer wielding the spray is on record as earning $110,000 in 2010, more than all but the better paid full professors.” More
  5. Social differences on public transport: the tube’s posh.
Categories
Data Society

Where the money is

I love the Guardian’s datablog. It consistently presents large quantities of data in interesting interactive ways. Yesterday it took data from the Annual Survey of Hours and Earnings, and presented it in three different ways:

  • Choose a salary, and see how earnings for different jobs compare
  • Choose a job, and see what the earnings are
  • Choose a job group

To my mind, one of the most interesting aspects, and one which the presentation highlights, is the gender gap and how it varies between jobs. It seems to be greatest for the highest paid jobs, on the whole.

Actuaries are lumped in with management consultants, economists and statisticians, not necessarily a totally homogeneous grouping, and have a gender difference of 18%. The difference for corporate managers and senior officials is 39%.

Today, it’s got a geographic analysis based on the same data. Guess what? London and the south east come out top.

One of the best things about the datablog is that, as well as coming up with good ways of presenting the data, it also provides access to the raw data so you can do your own thing, or check that the conclusions are in fact warranted. Great stuff.

Categories
Actuarial

Cause of death

Cause of death statistics are notoriously unreliable, for several reasons. Most notably, most of the information comes from death certificates, which only have space for a single cause. Often, there are a number of factors which together resulted in the death, and it’s rather random which cause is chosen, and which manifestation of it: proximate, ultimate, or something in between.

You might think that autopsies would help, but comparatively few of them are performed, and in any case they might not produce accurate results: around one in four are of miserable quality, apparently. Autopsies are done the old-fashioned way, with scalpels, but it appears that using scanning technology might be quicker, cheaper, and possibly just as accurate.

Categories
Software

Software risks: testing might help (or not)

It’s good to test your software. That’s pretty much a given, as far as I’m concerned. If you don’t test it, you can’t tell whether it will work. It seems pretty obvious.

It also seems pretty obvious that a) you shouldn’t use test data in a live system, b) in order to test whether it’s doing the right thing, you have to know what the right thing is and c) your system should cope in reasonable ways with reasonably common situations.

If you use test data in a live system there’s a big risk that the test data will be mistaken for real data and give the wrong results to users. If you label all the test data as being different, or make it unlike real data in some other way so that it can’t be confused with the real stuff, there’s a risk that the labelling will change the behaviour of the system, so the test becomes invalid. Because of this, most testing takes place before a system actually goes live. That’s all very well, unless a system’s outputs depend on the data it has processed in the past. In that case you need to make sure that the actual system that goes live isn’t contaminated in any way by test data, otherwise you could, to take an example at random, accidentally downgrade France’s credit rating.

There’s a possibility that if you don’t have a full specification of a system, your testing will be incomplete. Well, it’s more of a certainty, really. This is a particular problem if you are buying in a system (or component). If you don’t know exactly how it’s meant to behave in all circumstances, you can’t tell whether it’s working or not. It’s not really an answer just to try it out in a wide variety of situations and assume it will behave the same way in similar situations in the future, because you don’t know precisely which differences might be significant and result in unexpected behaviours. The trouble is, the supplier may be concerned that a fully detailed specification would enable you to reverse engineer the bought-in system, and would thus endanger their intellectual property rights. There’s a theory that this might actually have happened with the Chinese high speed rail network, which has had some serious accidents in the last year or so.

It can’t be that uncommon that when people go online to enter an actual meter reading, because the estimated reading is wrong, the actual reading is less than the estimated one. In fact, that’s probably why most people bother to enter their readings. So assuming that the meter has gone round the clock, through 9999 units, to get from the old reading to the new one, doesn’t seem like a good idea. The article explains the full story — you can only enter a reduced reading on the Southern Electric site within 2 weeks of the date of the original one. But the time limit isn’t made clear to users, and not getting round to something within 2 weeks is, in my experience, far from unusual. Some testing from the user’s point of view would surely have been useful.
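The faulty assumption can be sketched like this; it’s a hypothetical reconstruction of the logic, not Southern Electric’s actual code:

```python
METER_DIGITS = 4   # a 4-digit meter display wraps round after 9999

def units_used_naive(previous_reading, new_reading):
    # Hypothetical sketch of the flawed logic: a lower reading is taken
    # to mean the meter has wrapped past 9999, rather than that the
    # previous (estimated) reading was simply too high.
    return (new_reading - previous_reading) % 10 ** METER_DIGITS

# Estimated reading 5200, actual reading 5100: the customer gets billed
# for almost a full trip round the clock instead of a small correction.
print(units_used_naive(5200, 5100))   # 9900
print(units_used_naive(5100, 5150))   # 50, the normal case
```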