News update 2006-07: July 2006
===================
Contents:
1. Software in ICAs
2. Is it safe?
3. Don’t try this at home
4. Measuring risk
5. Newsletter information
===============
1. Software in ICAs
Many thanks to everyone who has completed the online survey about
software use in ICAs, at http://www.louisepryor.com/survey.html
If you haven’t completed the survey this is your last chance, as the
survey closes on 1st August. It should only take about a quarter of an
hour, and the more responses there are the more useful the results will
be. All participants will get a full analysis of the results, and
responses are confidential.
===============
2. Is it safe?
There’s an interesting piece in the New Scientist, by Henry Petroski,
about how apparent success can mask flaws that will lead to disaster. He
is writing about engineering flaws, but there are lessons for the
software world as well.
Imagine that the Titanic had not struck an iceberg on its maiden voyage:
it would have had a triumphal arrival in New York, and other ships would
have been built to similar designs. The designs would have been adapted
to use smaller engines and lighter hulls. They would have continued to
be thought of as “unsinkable” and at the leading edge of technology as
long as none of them did actually sink; and that state would probably
have lasted for some time. It was only luck that the Titanic happened to
strike an iceberg when it did.
Similarly, foam fell off the space shuttle on nearly every launch
before 2003; because no damage was done, it was accepted as a routine
event. When foam fell off Columbia, it wasn’t immediately perceived as a
potential threat.
As Petroski says, “latent systematic weaknesses in a design are most
likely to be revealed through failure.” The fact that something has
worked well for a long time is not in itself a guarantee that it will
continue to do so: it may be that the circumstances in which it won’t
work just haven’t happened to arise.
It’s obviously a lot easier to test things in the software world: it’s
possible to run a program with many different sets of inputs, whereas
launching the space shuttle many times is not feasible. However, just
because testing is easier doesn't mean that it's actually done.
In addition, there’s the “it won’t happen” syndrome. The roof of a road
tunnel in Boston collapsed recently; apparently an engineer warned of
the potential problem back in 1999. He was told that the method used to
support the ceiling panels was tried and tested, that the work was
being done to design specifications, and that the ceiling would hold.
Petroski says that it’s important to have “a healthy scepticism about
built things, and an awareness that apparent success can mask imminent
failure”.
The point about unlikely events is that they don’t happen very often. A
piece of software that has worked many times before isn’t necessarily
going to work in the future. It may get a combination of inputs that has
never occurred before. The best defence is to perform thorough testing
and code review.
http://www.newscientist.com/article/mg19125625.600.html
(requires subscription)
http://athricea.notlong.com
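In software, "many different sets of inputs" can be made systematic rather
than left to chance. A minimal sketch in Python of combination testing: the
rating function and its input ranges are invented for illustration, not
taken from any real application.

```python
from itertools import product

def premium(age, sum_assured, smoker):
    """Hypothetical toy rating function under test."""
    rate = 0.001 + 0.00005 * age + (0.002 if smoker else 0.0)
    return sum_assured * rate

# Exercise every combination of representative inputs, including
# boundary values that may never have occurred in production.
ages = [18, 40, 65, 120]
sums = [0, 1, 100_000, 10_000_000]
smoker_flags = [True, False]

for age, sa, smoker in product(ages, sums, smoker_flags):
    p = premium(age, sa, smoker)
    # Invariants that must hold for *every* combination:
    assert p >= 0, (age, sa, smoker)
    assert sa == 0 or p <= sa * 0.02, (age, sa, smoker)
```

The point of the invariants is that they are checked across all
combinations, not just the ones that have historically arisen.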
If you want to know how to go about testing and reviewing your
spreadsheets and other user developed applications, please get in touch.
Having good processes in place not only makes the applications more
reliable, it also reduces the time required to develop them.
===============
3. Don’t try this at home
Or, at least, don’t do things on live data unless you’ve tested them out
first. It’s a sensible and common safety precaution, though probably not
common enough. Always make changes on a copy of your spreadsheet, so
that you can go back to the last known working version. If you’re
operating on a database, don’t try things out on the production system:
you might corrupt or destroy data as well as making the system
unavailable.
Even if you’re aware of the problems, things can still go wrong.
PlusNet, a UK ISP, recently deleted more than 700GB of its customers’
e-mail and prevented about half its 140,000 users from sending and
receiving new e-mail. Things went horribly wrong during the process of
bringing a new back-up storage server online. Instead of deleting the
contents of the old back-up disks, the engineer deleted the live
storage. PlusNet explained what happened: “At the time of making this
change the engineer had two management console sessions open — one to
the backup storage system and one to live storage. These both have the
same interface, and until Friday it was impossible to open more than one
connection to any part of the storage system at once. The patches we
installed on Friday evening removed this limitation, but unaware of
this, the engineer made an incorrect presumption that the window he was
working in was the back-up rather than the live server. Subsequently the
command to reconfigure the disk pack and remove all data therein was
made to the wrong server.”
PlusNet have called in data recovery engineers, but nearly three weeks
after the disaster the missing e-mails haven’t yet been restored. It took
nearly two weeks for e-mail service to be restored for all customers.
With hindsight, of course, it would have been better had the management
console sessions indicated clearly which storage system they were
operating on. Before the patches, it appears that this wasn’t strictly
speaking necessary, but it’s always dangerous to rely on assumptions of
the form “this couldn’t possibly happen” (or, even worse, “no reasonable
user would…”). Of course it’s often difficult to recognise such
assumptions, as they often aren’t explicit.
http://www.theregister.co.uk/2006/07/11/plusnet_email_fiasco/
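One way to make such assumptions explicit is to force a destructive
command to confirm its target. A minimal sketch of the idea in Python: the
hostnames, the live-server list, and the confirmation step are all
invented for illustration, not PlusNet's actual tooling.

```python
def wipe_storage(host, live_hosts, confirm):
    """Reconfigure a disk pack, removing all data on it, but only
    after two explicit checks of which server we are pointed at."""
    if host in live_hosts:
        # The "this couldn't possibly happen" case, made into a hard error.
        raise RuntimeError(f"{host} is LIVE storage; wipe refused")
    if confirm != host:
        # The operator must re-type the hostname they believe they are on.
        raise RuntimeError("confirmation does not match target host")
    return f"wiping {host}"  # destructive work would happen here

# The engineer's wrong presumption becomes an error instead of data loss:
live = {"storage-live-01"}
print(wipe_storage("storage-backup-01", live, confirm="storage-backup-01"))
```

The design choice is that the safeguard lives in the command itself, so it
still works when two identical-looking console sessions are open.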
===============
4. Measuring risk
I usually enjoy the “Gavyn Davies does the maths” column in the Guardian
on Thursdays, and this week was no exception. Apparently punters do
better, long term, by betting on favourites rather than long shots. They
still lose money, but they lose less money. As Davies points out, this
is unexpected: in a rational world, punters would refuse to back the
long shots until the odds were more realistic, removing the anomaly. He
suggests three possible explanations, favouring the one that says that
the whole market really does have a misperception about risk, failing to
appreciate that there is a major bias in the odds. In other words, the
risk measure (the odds) isn’t an accurate reflection of the actual risk.
http://www.guardian.co.uk/g2/story/0,,1830876,00.html
The same sort of thing may be happening in the financial world.
According to the Bank of England’s Financial Stability Report, trading
profits have recently risen as a proportion of income at UK commercial
and global investment banks. However, the risk taken on by the banks, as
measured by value at risk (VaR), has risen by much less. This is
surprising because it’s commonly accepted that in order to make more
money, you have to take on more risk. The Bank presents two possible
explanations, but favours the second, that VaR just isn’t a good measure
of risk in the trading book. Unfortunately, it’s very widely used, and
is the only risk metric that banks must disclose, even though they may
well use others internally. This may mean that more weight is put on it
than is justified.
http://www.bankofengland.co.uk/publications/fsr/2006/index.htm
The trouble is that measuring risk is difficult, and, as with many
phenomena, no single metric is adequate. You can’t measure
buildings just by their height: other dimensions provide other
information, and you can only get the full picture by looking at them
all. Two towns with the same population may be very different in other
respects. VaR measures only the maximum loss likely to occur over a
given period at a certain level of probability. It doesn’t measure the
expected loss, or even the maximum loss at a different level of
probability or over a different period.
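The gap between VaR and the tail behind it can be shown numerically. A
minimal sketch, using made-up loss distributions: portfolio B has roughly
the same 95% VaR as portfolio A, but far worse losses beyond it.

```python
import random

random.seed(1)

def var_and_tail_mean(losses, level=0.95):
    """95% VaR (the loss not exceeded with 95% probability) and the
    mean loss beyond it (often called expected shortfall)."""
    ordered = sorted(losses)
    cutoff = int(level * len(ordered))
    var = ordered[cutoff]
    tail = ordered[cutoff:]
    return var, sum(tail) / len(tail)

# Portfolio A: modest, well-behaved losses.
a = [random.gauss(0, 10) for _ in range(10_000)]
# Portfolio B: the same typical losses, plus rare disasters.
# (All numbers are invented for illustration.)
b = [random.gauss(0, 10) if random.random() > 0.01 else random.gauss(200, 20)
     for _ in range(10_000)]

var_a, es_a = var_and_tail_mean(a)
var_b, es_b = var_and_tail_mean(b)
# The two VaR figures are similar; the tail means are not.
```

Disclosing only `var_a` and `var_b` would make the two portfolios look
alike, which is exactly the weakness of relying on a single metric.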
===============
5. Newsletter information
This is a monthly newsletter on risk management in financial services,
operational risk and user-developed software from Louise Pryor
(http://www.louisepryor.com). Copyright (c) Louise Pryor 2006. All
rights reserved. You may distribute it in whole or in part as long as
this notice is included.
To subscribe, email news-subscribe AT louisepryor.com. To unsubscribe,
email news-unsubscribe AT louisepryor.com. Send all comments, feedback
and other queries to news-admin AT louisepryor.com. (Change “ AT ” to
“@”). All comments will be considered as publishable unless you state
otherwise. The newsletter is archived at
http://www.louisepryor.com/newsArchive.do.