News update 2005-01: January 2005
===================
A monthly newsletter on risk management in financial services,
operational risk and user-developed software from Louise Pryor
(http://www.louisepryor.com).
Comments and feedback to news-admin@louisepryor.com. Please tell me if
you don’t want to be quoted.
Subscribe by sending an email to news-subscribe@louisepryor.com.
Unsubscribe by sending an email to news-unsubscribe@louisepryor.com.
Newsletter archived at http://www.louisepryor.com/newsArchive.do.
In this issue:
1. Safety nets can be dangerous
2. Rocket science
3. FSA update
4. Disastrous IT projects
5. Newsletter information
===============
1. Safety nets can be dangerous
A good way to prevent a bad thing happening is to add more ways of
stopping it, right? Wrong. Adding more redundancy often increases
the risk. The following four factors may come into play:
– Redundancy only works if the different methods (systems,
procedures, or whatever) are truly independent. If they are not,
and sometimes the failure of one can make the others fail too,
the redundancy is more apparent than real. For example, you
might think that adding more engines to an aeroplane would make
crashes from engine failures less likely. In fact, this is only
true up to a certain limit, depending on the plane and the
engines concerned. If you have more engines it is more likely
that one of them will fail; and if the failure is likely to be
catastrophic, such as by starting a fire that destroys all the
other engines, the risks of having more engines may outweigh the
possible benefits.
– There is a well-known psychological phenomenon called bystander
apathy. The more witnesses there are to an accident, the less
likely each individual witness is to call an ambulance.
Similarly, the more people responsible for performing a check,
the less thorough each will be. Each thinks “there are n other
people doing this – if there is anything wrong, one of them is
bound to spot it”. If everybody is responsible then nobody takes
the responsibility.
– If a system is believed to be safer, people are likely to take
more risks with it. They drive faster if they are wearing seat
belts. Add two levels of warning to a system and people will
ignore the first level.
– Dedicated workers will bypass safety systems if they interfere
with what they see as their primary responsibilities. If it’s
important to get a report out on time, people will not bother to
password-protect it or update the security systems on their
computer. Software developers will code all night, but not bother
with backups, source control or systematic testing. Fire doors
are propped open.
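The aeroplane example in the first point can be put into numbers.
What follows is only a toy model of my own, not something taken from
the sources cited in this section: each engine is assumed to fail
independently with probability p_fail on a flight, each failure is
assumed to be catastrophic (taking the whole plane with it) with
probability p_catastrophic, and the plane is assumed to need at least
one working engine. The function name and all the figures are made up
for illustration.

```python
from math import comb


def crash_probability(n_engines, p_fail, p_catastrophic, needed=1):
    """Chance of losing the plane on one flight under the toy model."""
    survive = 0.0
    # The plane survives if at most (n_engines - needed) engines fail
    # and none of those failures is catastrophic.
    for k in range(n_engines - needed + 1):
        p_k_failures = (comb(n_engines, k)
                        * p_fail ** k
                        * (1 - p_fail) ** (n_engines - k))
        p_all_contained = (1 - p_catastrophic) ** k
        survive += p_k_failures * p_all_contained
    return 1 - survive


# Illustrative figures only: 1% failure rate, 5% of failures catastrophic.
for n in range(1, 7):
    print(n, f"{crash_probability(n, 0.01, 0.05):.6f}")
```

With these made-up figures the second engine cuts the risk sharply,
but every engine after that makes things slightly worse. If
p_catastrophic is set to zero the engines are truly independent, and
adding more always helps.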
The basic arguments are set out in an excellent paper by Scott
Sagan. He is discussing the problems of making nuclear
installations more secure against terrorism, but of course the
principles apply to all types of risk management systems and
processes. Further discussion and some great examples were provided
by Don Norman and Geoffrey Newbury in the Risks forum.
As Norman points out, three of these four factors are psychological
rather than technical. They should be taken seriously when devising
any method of risk control.
http://cisac.stanford.edu/publications/20274/
http://catless.ncl.ac.uk/Risks/23.63.html#subj11.1
http://catless.ncl.ac.uk/Risks/23.64.html#subj10.1
===============
2. Rocket science
Wasn’t the Huygens landing a great way to start the year! The
pictures were stunning, even though there were fewer of them than
expected.
As you probably recall, Huygens was dropped onto Titan from the
Cassini interplanetary probe. Signals from Huygens were received by
Cassini and then transmitted back to Earth; direct transmission was
not possible because of the limited power available to Huygens.
There was redundancy built into the design, with two radio channels
being used for the communications, so that if one failed the data
would still get through on the other. In the event, one of the
channels did fail (a software error: Cassini’s receiver was never
told to turn on). But guess what? The two channels weren’t fully
redundant. Channel A, the one that failed, was the only one
carrying data that would help measure wind speeds. And half the
pictures taken during the descent were sent only over channel A,
and the other half only over channel B.
Redundancy only works if it’s genuine (see above), and it can
sometimes be subverted by people trying to do their job (in this
case, trying to get as much information back from Titan as
possible).
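The difference between genuine and apparent redundancy can be shown
in a few lines. This is a sketch with invented item names and counts,
not the actual Huygens data plan: a plan is assumed to be a mapping
from each item to the set of channels that carry it, and an item is
lost only if every channel carrying it fails.

```python
def lost_items(plan, failed_channels):
    """Return the items that never arrive.

    plan maps each item name to the set of channels that carry it.
    An item is lost only if every channel carrying it has failed.
    """
    return sorted(item for item, channels in plan.items()
                  if channels <= failed_channels)


# Hypothetical payload: six images plus one wind measurement.
images = [f"image_{i}" for i in range(1, 7)]

# Fully redundant plan: everything goes over both channels.
mirrored = {item: {"A", "B"} for item in images + ["wind_data"]}

# Split plan: images alternate between channels, wind data on A only.
split = {img: {"A"} if i % 2 == 0 else {"B"}
         for i, img in enumerate(images)}
split["wind_data"] = {"A"}

print(lost_items(mirrored, {"A"}))  # nothing is lost
print(lost_items(split, {"A"}))     # half the images and the wind data
```

The split plan carries twice as many different images when both
channels work, which is presumably why it was attractive. The price is
that anything sent on only one channel has no protection at all.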
However, it turns out we were lucky to get any pictures at all.
When Cassini was launched in 1997, the team were pretty confident
that things would go well. Cassini and Huygens had been thoroughly
tested on the ground, both separately and together. But there was
one test that had been omitted: they decided not to subject every
system to a simulation of the exact signals and conditions it would
experience during flight, because this would have meant
disassembling some of the communications components. It would have
been time-consuming and expensive to do this, then reassemble,
retest and recertify them. It was a simple cost-benefit analysis,
which in hindsight was completely wrong.
A couple of the team in Darmstadt were worried that this test had
not been performed, and eventually persuaded mission control that
it would be possible to perform a similar test during Cassini’s
long trip out to Saturn. They devised a test to send a signal from
Earth to Cassini to simulate the signals that Cassini would receive
from Huygens during the landing. Cassini could then echo the
information back to Earth, where the team could tell whether it had
been received and deciphered correctly.
The test failed. It turned out that the reception mechanism on
Cassini had not been designed to account correctly for the Doppler
shift in the signal caused by the high relative velocity
between Cassini and Huygens. Rather annoyingly, the problem could
have been fixed by some trivial parameter changes in the firmware,
but once Cassini had left Earth these changes could no longer be
made.
Eventually they worked out a way of changing the trajectory of
Huygens so that the Doppler shift would be reduced. This is why the
landing wasn’t until this January, instead of late 2004 as
originally planned.
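To see why a change of trajectory helps: to a first approximation,
the Doppler shift is proportional to the speed at which transmitter
and receiver approach or recede from each other. The sketch below
uses the standard first-order formula. The carrier frequency and the
two speeds are round illustrative numbers that I have assumed, not
mission figures from the sources listed below.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def doppler_shift_hz(carrier_hz, relative_speed_m_s):
    """First-order Doppler shift, valid when the speed is far below c.

    relative_speed_m_s is the closing speed along the line of sight;
    a positive value means the two craft are approaching each other.
    """
    return carrier_hz * relative_speed_m_s / SPEED_OF_LIGHT_M_S


# Assumed values for illustration only.
carrier_hz = 2.0e9        # a 2 GHz radio link
original_speed = 5500.0   # metres per second
reduced_speed = 3800.0    # metres per second

for speed in (original_speed, reduced_speed):
    shift_khz = doppler_shift_hz(carrier_hz, speed) / 1000
    print(f"{speed:.0f} m/s gives a shift of about {shift_khz:.1f} kHz")
```

Because the relationship is linear, cutting the relative speed by a
third cuts the shift by a third. A receiver that cannot be
reprogrammed can therefore be brought back within its limits by
changing the geometry of the encounter instead.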
There are some obvious morals to this story. However much testing
you do, it may always be the next test you do that uncovers the
problem. And the problem may be a big one. Just because you’ve done
99% of the testing it doesn’t mean that the system is 99% likely to
work.
Moreover, thorough review is no substitute for testing. The problem
was spotted by none of the design reviews of the communications
link. An issue that was overlooked by the design team was also
overlooked by the reviewing team. There are some hints in some of
the published accounts that the reviewing team didn’t imagine that
the design team could possibly overlook the effects of the Doppler
shift: so don’t make unwarranted assumptions, and do check up on
any assumptions you make.
The full IEEE Spectrum article on this episode contains all the
details.
http://www.spectrum.ieee.org/WEBONLY/publicfeature/oct04/1004titan.html
Wikipedia has a useful summary, while the ESA site is the horse’s
mouth.
http://en.wikipedia.org/wiki/Huygens_probe
http://www.esa.int/SPECIALS/Cassini-Huygens/index.html
===============
3. FSA update
The FSA have just issued their annual Financial Risk Outlook and
Annual Plan. If you want to know what is on their mind, these two
documents are vital reading.
http://www.fsa.gov.uk/pubs/plan/financial_risk_outlook_2005.pdf
http://www.fsa.gov.uk/pubs/plan/pb2005_06.pdf
New consultation and discussion papers out this month:
------------------------------------------------------
CP05/01 Quarterly Consultation (No 3)
CP05/02 Regulatory fees and levies 2005/06
CP05/03 Strengthening capital standards. Including feedback on
CP189
Feedback published this month:
------------------------------
PS04/28 Lloyd’s: Integrated prudential requirements and changes to
actuarial and auditing requirements – Including feedback
on CP04/7, CP04/13 (part) and CP04/15 (part) and ‘made
text’
PS05/01 Treating with-profits policyholders fairly – Feedback on
CP04/14 and made text
Current consultations, with dates by which responses should be
received by the FSA, are listed at
http://www.fsa.gov.uk/pubs/2_consultations.html
===============
4. Disastrous IT projects
We often hear about disastrous IT projects. Stories recently have
included:
– A new system for keeping track of the MoT status of cars is
running two and a half years late. The budget (in 2003) was
£230m. I haven’t been able to find out what the budget overrun
is.
– The computer systems at the Child Support Agency don’t work and
are about £29m over a £456m budget.
– An FBI system intended to be an important tool in fighting
terrorism may be scrapped because it doesn’t work and may never
work. So far it has cost $170m.
http://www.theregister.co.uk/2004/12/30/pcs_slams_mot_system_delays/
http://www.pcw.co.uk/news/1160762
http://news.bbc.co.uk/1/hi/uk_politics/4205221.stm
http://www.cnn.com/2005/US/01/13/fbi.software/
The amounts of money involved in spreadsheet errors are sometimes
comparable. For example, the state of New Hampshire has recently
had to find an extra $70m in its budget.
http://www.theunionleader.com/articles_showfast.html?article=50185
It seems to me that the press coverage of both types of problem
gives a misleading impression. Although a number of IT projects are
spectacular disasters, there are many others that are extremely
successful. And the disasters are rarely caused entirely by IT
factors; they often go hand in hand with more general management
problems. On the other hand, there are probably far more problems
arising from spreadsheets than we ever hear about. But spreadsheets
don’t count as IT. They probably should.
===============
5. Newsletter information
This newsletter is issued approximately monthly by Louise Pryor
(http://www.louisepryor.com). Copyright (c) Louise Pryor 2005. All
rights reserved. You may distribute it in whole or in part as long
as this notice is included. To subscribe, email
news-subscribe@louisepryor.com. To unsubscribe, email
news-unsubscribe@louisepryor.com. All comments, feedback and other
queries to news-admin@louisepryor.com. Archives at
http://www.louisepryor.com/newsArchive.do.
--------------------------------------------------------------------
The Edinburgh Bach Choir will be performing Bach’s St Matthew Passion
at St Cuthbert’s Church, Lothian Road on Saturday March 12th. See
http://www.edinburghbachchoir.org.uk for more details. Tickets from
the Usher Hall or members of the Choir.