I’m no longer a software risk consultant; I start at the Board for Actuarial Standards at the beginning of May. Consulting has been fun, but it’s time to move on.
Newsletter Apr 2007
News update 2007-04: April 2007
===================
Contents:
1. Last issue
2. Newsletter information
===============
1. Last issue
Some of you may have noticed that this newsletter hasn’t appeared
for a while, and somebody has even asked why (hi Steve!).
Early this year I decided that I was going to stop being an
independent consultant, and started to look for another role. The
search was successful, and at the beginning of May 2007 I start
work at the Board for Actuarial Standards, which is part of the
Financial Reporting Council.
http://www.frc.org.uk/bas/
Under the circumstances the newsletter rather faded out of the
picture. It’s not as if there’s been a dearth of things to write
about, though; most obviously, the recent RIM outage, apparently
caused by a software upgrade that went wrong.
Looking back over the incidents and issues I’ve written about over
the past few years, I think there’s a simple theme that emerges:
you can’t count on things not going wrong. The fact that something
hasn’t happened in the past doesn’t mean that it’s not going to
happen in the future. The trouble is that we learn from experience.
In practice, this often means that we don’t learn from what we
don’t experience. There’s a new book just out, which I haven’t read
yet but which looks very interesting, that’s relevant here: “The
Black Swan: The Impact of the Highly Improbable” by Nassim
Nicholas Taleb. I’m looking forward to reading it.
Even when you are aware of a risk, because the bad event has
happened before, it’s often quite difficult to be as realistic
about it as you should be. Software development is an excellent
example here; all software developers have written software that
they thought would work, only to be surprised when it turns out to
contain bugs. The first time this happens, the initial belief may
be reasonable. When it has happened over and over again, the
developer’s confidence may seem less reasonable. However, each time
the developer (and I speak personally here) can produce plausible
justifications: they have learned from their previous experiences,
and taken steps to avoid all the problems that have arisen in the
past. The problem is that there are always new problems that can
arise. Never believe anyone who says “it could never happen”.
An excellent newsletter about risks is the Risks Digest. Although a
number of its items are fairly technical, there are many that raise
more general issues that are widely applicable. I’ve been reading
Risks on and off for nearly twenty years, and shall certainly
continue doing so.
http://catless.ncl.ac.uk/risks
===============
2. Newsletter information
This is the last issue of a monthly newsletter on risk management
in financial services, operational risk and user-developed software
from Louise Pryor (http://www.louisepryor.com). Copyright (c)
Louise Pryor 2007. All rights reserved. You may distribute it in
whole or in part as long as this notice is included.
To subscribe, email news-subscribe AT louisepryor.com. To
unsubscribe, email news-unsubscribe AT louisepryor.com. Send all
comments, feedback and other queries to news-admin AT
louisepryor.com. (Change “ AT ” to “@”). All comments will be
considered as publishable unless you state otherwise. My blog is at
http://www.louisepryor.com/blog. The newsletter is archived at
http://www.louisepryor.com/newsArchive.do.
Small error, big result
In my November newsletter I discussed how an error that appears small at the time it occurs can have a big result down the line. Gordon Bagot replied as follows:
I too have come across this problem with a former client. A currency exchange rate was wrongly transferred from one electronic data file to another, manually, and with numbers transposed. The whole firm used the transposed number, in all their systems, until I was asked to do some consultancy work where I used my own source of exchange rates. The firm argued with me, and delayed payment too, until I spent the time trying to resolve the problem with one of the client’s staff. Result: I was right, client wrong, no apology, but I did then get payment more promptly.
There is so much statistical analysis done, on which quite major investment decisions are made, that I can’t understand why time, money, and resources are not allocated to ensure data is as close to 100% correct as possible.
I do hope items such as this are noted by your clients.
Me too!
Newsletter Nov 2006
News update 2006-11: November 2006
===================
Contents:
1. Small error, big result
2. Nobody will ever…
3. Just pick it up and carry it
4. Newsletter information
===============
1. Small error, big result
Guess what happens when you put the wrong data in? Surprise,
surprise, you get the wrong answer out at the other end. This has
just happened to HBOS, where the accumulated wrong answers are
worth £17m.
http://business.guardian.co.uk/story/0,,1958434,00.html
http://www.investmentweek.co.uk/public/showPage.html?page=355063
Apparently there’s been an error in the unit pricing data for four
years: it all started when a decimal point was put in the wrong
place, and the errors have mounted up over time.
The problem was noticed around a year ago, and since then Clerical
Medical (the arm of HBOS where this happened) have gone back
through all their policy records to work out exactly which policies
have been affected and by how much.
There are a couple of interesting issues here.
The first one is, of course, the way the mistake happened to start
with. The press stories imply that it was a simple data input
error: somebody typed the wrong number, 23.4 instead of 2.34 or
something like that. It’s easy to do. I know I’ve mistyped things
in the past, and I’m sure you have too. The question is, how can
data input errors be prevented?
Well, first off, you try to avoid manual data input. Most data
comes from a computer anyway, and you try to set up a direct feed
rather than rely on somebody reading numbers from a screen or
printout and typing them in. If it turns out that a direct feed is
impossible, maybe you try copying and pasting: there are plenty of
risks there, but they are different to those of manual input.
You also want to have some sort of data validation on the inputs:
maybe a range outside which entries are queried, or a maximum
difference between the new value and the old value, or between it
and other, related values. It’s often difficult to set this sort of
thing up so that there are no false alarms and all errors are
caught, but these data validation checks are nevertheless useful.
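As a rough sketch, here’s what the first two of those checks might
look like in Python; the bounds and tolerance are invented for
illustration, not taken from any real pricing system:

    # Two simple validation checks on a manually input value: an
    # absolute range check, and a maximum-change check against the
    # previous value. All thresholds here are illustrative assumptions.
    def validate_input(new_value, old_value=None, low=0.0, high=10.0,
                       max_relative_change=0.10):
        """Return a list of warnings; an empty list means the value passes."""
        warnings = []
        if not low <= new_value <= high:
            warnings.append(
                f"{new_value} is outside the expected range [{low}, {high}]")
        if old_value is not None:
            change = abs(new_value - old_value) / abs(old_value)
            if change > max_relative_change:
                warnings.append(
                    f"{new_value} differs from the previous value "
                    f"{old_value} by {change:.0%} (limit "
                    f"{max_relative_change:.0%})")
        return warnings

    # A transposition such as 23.4 instead of 2.34 trips both checks:
    print(validate_input(23.4, old_value=2.33))

Neither check proves the value is right, of course; they just force
somebody to look twice at anything surprising before it goes any
further.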
Another option is to have some higher level checks built in. Maybe
all manual data inputs have to be made twice, by different people.
This is time-consuming and a bore, so it may be overkill in many
cases. Or you can set up some check totals: make sure that the
entries that have just been input add up to the same as the total
on the sheet from which you are copying them. This can mean
inputting data you don’t actually need, because it’s included in
the total, but it’s sometimes worth it.
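A check-total comparison is easy to sketch too; the tolerance here
is an assumption, there to allow for rounding in the printed total:

    # Compare the sum of freshly input entries against the total shown
    # on the source sheet; a mismatch suggests a mistyped entry.
    def check_total(entries, stated_total, tolerance=0.005):
        actual = sum(entries)
        if abs(actual - stated_total) > tolerance:
            raise ValueError(
                f"entries sum to {actual:.2f} but the source sheet "
                f"says {stated_total:.2f}; check for a mistyped entry")

    check_total([2.34, 5.10, 1.27], stated_total=8.71)    # passes silently
    # check_total([23.4, 5.10, 1.27], stated_total=8.71)  # raises ValueError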
The second interesting issue is connected with overall
reasonableness checks. It sounds as if the error was small at the
beginning, so wasn’t noticed. Then, as it grew larger, it still
wasn’t noticed. This is pure speculation on my part, but people
often do reasonableness checks in terms of the difference from one
time period to the next. And in this case, as no new error was
introduced, everything would be consistent so nothing wrong would
be noticed.
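A toy calculation shows how this works; the prices and growth rates
below are invented, not Clerical Medical’s. Both series move by
exactly the same percentage each period, so every period-on-period
check passes, while the absolute error grows along with the price:

    # A unit price entered as 23.4 instead of the correct 2.34.
    correct, wrong = 2.34, 23.4
    for year, growth in [(2003, 1.05), (2004, 1.07),
                         (2005, 1.04), (2006, 1.06)]:
        correct *= growth
        wrong *= growth
        print(f"{year}: both series moved {growth - 1:+.0%}; "
              f"absolute error now {wrong - correct:.2f} per unit")

A check based on differences from the previous period compares each
series only with its own past, so the tenfold error is invisible to
it.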
A really important lesson to learn is that consistency with the
prior period, or previous version, doesn’t necessarily mean that
there’s no error. It just means that there is no new error, or that
any new error is insignificant. And errors don’t always stay
insignificant.
More generally, relying on the results being consistent with your
expectations, or being reasonable in the light of your experience,
can be risky if your expectations or experience are based on the
use of the system that you are assessing. If you’ve been using a
program for some time, you have probably built up some expectations
about how the results are likely to vary with the inputs, and
you’ll probably have some good explanations of why things work in
the way they do. However, this can be risky; we tend to be very
good at post hoc rationalisation. If you rely on overall
reasonableness checks, or on consistency with experience, you need
to be sure that your checks are genuinely independent, rather than
being based on your experience of the system you are trying to
check.
===============
2. Nobody will ever…
The space shuttle can’t cope if it’s off the ground over 31st
December/1st January. Or, at least, its software can’t cope. The
software is about 30 years old, and doesn’t have any way of moving
from day 365 of the year back to day 1. So on 1 January 2007, for
example, it will think it is day 366 of 2006. Although they’ve done
simulations of flights lasting over the year end, and have
encountered no problems, NASA isn’t keen to try it in earnest.
There are distinct echoes of Y2k here. Nobody really knows if
anything will go wrong, but they don’t want to risk it.
http://corviles.notlong.com
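The failure pattern itself is easy to reproduce. Here’s a toy
sketch of the general class of bug in Python; it is emphatically
not the shuttle’s actual code:

    # A day-of-year counter that is only ever incremented never rolls
    # over, so 31 December (day 365) is followed by "day 366".
    def next_day_naive(year, day):
        return year, day + 1                 # no year-end rollover

    def next_day_correct(year, day):
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        if day >= (366 if leap else 365):
            return year + 1, 1               # roll over to 1 January
        return year, day + 1

    print(next_day_naive(2006, 365))    # (2006, 366): the shuttle's problem
    print(next_day_correct(2006, 365))  # (2007, 1)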
It just shows that the “nobody will ever want to do X” design
principle is one to be avoided. Also, of course, that software can
stay in service much longer than you expect.
And, of course, nobody would ever hack into an ATM using an MP3
player. Except that Maxwell Parsons did. He targeted free-standing
ATMs (ie, not built in to walls) in bars and bingo halls. They were
connected to the bank using an ordinary telephone line, and as they
were free-standing he could physically get at them. He used the MP3
player to record the data going down the line, and then decoded it
on a laptop and used it to counterfeit credit cards. Apparently the
banking industry have now plugged the loophole.
http://www.timesonline.co.uk/article/0,,29389-2453590,00.html
===============
3. Just pick it up and carry it
We tend to think of data theft as being primarily a network
problem: nasty people hack into a computer somewhere, and syphon
off the data; or they persuade nice people to visit a nasty
website, and enter confidential information. Physical theft is
pretty common, though. Laptops can and do hold vast amounts of
data, and are very easy to pick up and carry. Just this month, in
the UK, three laptops containing payroll details of 15,000
Metropolitan Police officers were stolen from LogicaCMG, and a
laptop containing customer information was stolen from an employee
of the Nationwide building society.
http://www.theregister.co.uk/2006/11/22/met_police_laptop_theft/
http://news.bbc.co.uk/1/hi/uk/6160800.stm
Often the data theft is probably incidental (it’s the laptop
that’s the real target, rather than the data on it), but that
doesn’t really make the data any more secure. In some cases it’s
against
company policy to hold confidential data on a laptop. You’d think
that this should be the general rule, with very few exceptions.
Yes, it may be convenient to be able to work at home, or while
you’re travelling, but do the benefits outweigh the risk?
Physical security isn’t an issue only where laptops are concerned.
Earlier this month thieves stole a number of router cards from a
London data centre, resulting in serious service disruptions. A
couple of weeks earlier thieves got into a different data centre,
and drove off with a van full of equipment.
http://news.zdnet.co.uk/security/0,1000000189,39284520,00.htm
If they could get in and walk (or drive) off with equipment, what’s
to stop them walking off with whole disk drives or servers?
===============
4. Newsletter information
This is a monthly newsletter on risk management in financial services,
operational risk and user-developed software from Louise Pryor
(http://www.louisepryor.com). Copyright (c) Louise Pryor 2006. All
rights reserved. You may distribute it in whole or in part as long as
this notice is included.
To subscribe, email news-subscribe AT louisepryor.com. To
unsubscribe, email news-unsubscribe AT louisepryor.com. Send all
comments, feedback and other queries to news-admin AT
louisepryor.com. (Change “ AT ” to “@”). All comments will be
considered as publishable unless you state otherwise. My blog is at
http://www.louisepryor.com/blog. The newsletter is archived at
http://www.louisepryor.com/newsArchive.do.
Women in IT
It’s official! I’m an unusual person. Just 16% of IT workers are women.
Seriously, though, it appears that the proportion of women in IT is actually falling, as more leave the field than join it.
Software causes tube problems
Widespread delays to the London Underground this week were caused by one of the Tube’s infrastructure operators installing new software.
The new software was loaded over the weekend, presumably to minimise any disruption. There’s no indication of what actually went wrong, or whether it could have been prevented by better (or more, or any) testing.
Yes, the decimal point does matter
A misplaced decimal point has cost Clerical Medical £17m. Apparently a decimal point was put in the wrong place in some unit pricing data in 2002.
Spreadsheet use in investment banks
A white paper from Lepus Consulting on The Management of Spreadsheet Use in Financial Services. Despite the title, it considers only investment banks. It’s mainly anecdotal evidence from a survey (no numbers), with a short guide to best practice.
Web-based spreadsheet
Another web-based spreadsheet. I don’t know how it compares to Google’s.
MP3 players: more than just a nuisance
Apparently it’s possible to use them to hack into ATMs, as well as to annoy your fellow passengers.