Uncertainty in Critical Systems

 

Inaugural Lecture

Queen Mary (University of London)

1 November, 2000

Norman Fenton

RADAR (Risk Assessment & Decision Analysis Research)

Dept Computer Science

Queen Mary (University of London)

and

Agena Ltd

 

It’s wonderful to see so many family, friends, and colleagues here today. Thanks especially to those who I know had difficult journeys. Obviously you're not aware of W. H. Auden's observation. He said that "A professor is someone who talks in someone else's sleep".

 

My challenge today is to explain to this very broad audience some of the basic ideas that underpin our recent research and to give a feel for how these ideas are being exploited.

I will start by saying something about how I got here and what I did along the way and then will try to provide some motivation for our current work on how to handle uncertainty and risk for making decisions about critical systems. I mean here systems (normally computer controlled) whose failures have critical consequences - so this could be anything from a flight control system to a software system in a major bank.

I will try to explain in layman’s terms the theoretical basis and technology which underlies our approach to reasoning about uncertainty (Bayes Theorem and Bayesian Belief Nets).

To give a feel for the broad applicability of our work I will also present an example applied to legal reasoning, which might well change your view of jury verdicts.

There aren’t many people who can say with absolute justification that they have come full circle. But I certainly have - I was born right here in Mile End Hospital in Bancroft Road. That's less than 30 yards from my office now in Computer Science. Moreover, I was brought up in the area, living in Stepney until our family moved to Ilford. Unlike many mathematicians (which I guess I still consider myself to be because of my training) my main early influences were not such luminaries as Gauss, Cauchy and Descartes. No, I was much more serious than that.

It was Jimmy Greaves, Dave Mackay and the other Spurs legends of the 1960s. In fact, while it is a fantastic honour to become Professor at an institution like this, it is also tinged with some disappointment. It means, I guess, that I have to finally accept that I will not now achieve my REAL ambition of playing for Spurs. Joking aside, being as I actually had some skill, that ambition was really put to rest when George Graham became manager two years ago.

Anyway, as I give you a brief synopsis of my academic career it also gives me an opportunity to acknowledge some of the people who have had a major impact on it.

For almost all the mathematics that I know I am indebted to Haya Freedman, who was my undergraduate tutor at LSE, and Peter Vamos, who supervised my PhD at Sheffield University.

My first academic post was in the Maths Dept at University College Dublin but after a year I left to take up a postdoctoral research fellowship at Oxford University in the Mathematical Institute.

Some of the friends I made in Oxford (and who I'm delighted to see here today) are convinced that I never did much there other than attempt to have a good time. I’d like to put on the record that this is complete nonsense. I actually learnt some really important things.

  • How to punt
  • How to tie a bow tie

In actual fact, career-wise, it was during my time in Oxford that I started to do research in software engineering, and indeed my office was eventually moved to the world-renowned Programming Research Group. But it was actually a contact outside of Oxford who triggered my interest: Agnes Kaposi, who was head of the electrical engineering department at South Bank. She asked if I would supervise with her a PhD student called Robin Whitty. My subsequent career in software engineering is due entirely to the inspiration of Agnes and Robin. In fact, because of the close research links I had developed with them, I eventually took up a post at South Bank in 1984. In the 5 years I was there we did a lot of work addressing the problems of software complexity and, in particular, how to measure different properties of software systems.

The idea of measuring software quality, and in particular reliability, brought me in 1989 to Bev Littlewood and CSR at City University, where I had eleven great years until leaving earlier this year. It was Bev who really got me interested in critical systems assessment.

When Martin Neil joined our group in 1995 he brought with him some ideas which were to dramatically change the direction of my research. I am therefore especially indebted to Martin who has been a constant and influential factor in the work I am about to describe, and who in addition to being one of my fellow directors at Agena, also joined me here at Queen Mary when I moved from CSR.

To explain the ideas behind our work, I’d like to start with an analogy.

Let’s suppose that, instead of assessing the risks of deploying a critical software system, your task is to assess and manage the risks of driving a car.

The data for road fatalities in both the US and Europe is very curious. Do you know in which months there are fewest fatalities? It is February followed by January. In other words there are fewest fatalities when the weather is at its worst and when presumably the roads are at their most dangerous. If you apply traditional statistical regression techniques to this curious data you will end up with a simple model like this:

Colder months yield fewer fatalities. Now as a purely predictive model you could argue that this is not too bad. But for risk management it is useless. It would provide irrational information since it would suggest that if you want to minimise your probability of dying in a car crash you should do your driving when the roads are at their most dangerous.

What we know is that there are a number of causal factors which do much to explain the apparently strange statistical observations.

Clearly the season influences the weather, and both the weather and the season influence the road conditions. When the road conditions are bad people tend to drive slower. The danger level is at its highest when people are driving fast and the road conditions are bad. Both the season and the weather influence the number of journeys made - people generally make more journeys in summer and will generally drive less when weather conditions are bad. The actual number of fatalities is influenced not just by the danger level but by the number of journeys. If relatively few people are driving, albeit dangerously, there will be relatively few fatalities.

Using this kind of model, which happens to be an example of a Bayesian network, we can fully explain those strange statistical observations and also use it to make sensible decisions about risk. I will return to this example and show you how the model works in practice once I’ve told you a bit more about Bayesian networks.
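For those who like to see the structure written down, here is a minimal sketch of that causal graph, recorded simply as a mapping from each variable to its parents. This is the graph only, not the lecture's actual model file; the probability tables that complete a Bayesian network come shortly.

```python
# A minimal sketch of the road fatalities causal graph described above
# (structure only; illustrative, not the real model).
road_model_parents = {
    "season":             [],
    "weather":            ["season"],
    "road_conditions":    ["season", "weather"],
    "speed":              ["road_conditions"],
    "number_of_journeys": ["season", "weather"],
    "danger_level":       ["speed", "road_conditions"],
    "fatalities":         ["danger_level", "number_of_journeys"],
}

# Print each node with the nodes that directly influence it.
for node, parents in road_model_parents.items():
    print(f"{node} <- {', '.join(parents) if parents else '(no parents)'}")
```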

In the case of software and systems assessment the classic problem we were trying to solve was to determine if and when a system under development was of sufficiently high quality to release, that is, to put into operation. But it turns out that we have to overcome very similar data problems to the one we’ve just seen.

For example, much of software metrics is concerned with using measures that you can collect during the development and testing phases of software projects to predict properties of the software after it has been released to customers. In particular there is a very strong belief that you can use fault data collected during pre-release testing to make predictions about the number and location of faults that you might expect to get in operation (that is, post-release).

And THAT is what you are really MUCH more interested in, because these are the faults that users actually experience. The popular assumption is that there is a clear positive correlation: the more faults there are in a module pre-release, the more there are likely to be post-release.

In other words you might expect to see something like this

where each dot represents a module.

In fact, if we now look at data from a real system it turns out that what really happens is quite different from popular expectations.

This is a plot of actual data taken from a major telephone switching system. This was a system of several million lines of code (LOC). Each dot represents a module sampled randomly. A module typically has around 2000 LOC.

What you can see is that the real problem modules pre-release tend to turn out to be OK – they reveal mostly zero faults post-release.

Conversely, the real problem modules post-release (and these are the real concern) did not really show up at all as problem modules pre-release.

Just as in the case of the road accidents, traditional statistical models and metrics find it hard to deal with this kind of phenomenon, but it turns out that there are simple causal explanations. In this case most of the modules that had a high number of faults pre-release and a low number of faults post-release just happened to be very well tested, to the point that all the faults had been tested out of them. The amount of testing is therefore a very simple explanatory factor that must be incorporated into any predictive risk model of defects.

So clearly metrics data in isolation doesn't really help with assessing the risks of deploying a system. What you really need is a mechanism for combining different information like metrics data and reliability data with subjective judgements about things like quality of testing and the experience of design personnel.

So in summary, what we really need for assessment is to be able to incorporate:

  • uncertainty;
  • genuine cause and effect relationships;
  • multiple types of evidence;
  • expert judgement;
  • incomplete information.

We also want all our assumptions to be

  • visible and auditable.

We spent a lot of time investigating different formalisms that could satisfy these requirements and came to a firm conclusion that Bayesian belief nets (BBNs) were by far the best solution for our problem.

Here is an example of a BBN which is a very simplified version of one we have used in software assessment to tackle that problem of predicting defects.

BBNs are graphs consisting of nodes and arrows. The nodes of the graph, such as testing effort and operational defects, represent the uncertain variables.

The arcs represent causal or influential relationships – so there's an arc from operational usage to operational defects because the extent to which you use a system influences the number of defects you will find.

In addition to the nodes and arrows, there is a probability table associated with each node that expresses the probability of the node conditional on its parent nodes.

For example, if we assumed for simplicity that each of these variables had just two states, low and high, then the probability table for the operational defects node might look like this:

It expresses the probability of each possible value of the node (high and low here) conditional on each combination of values of the parent nodes.
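Purely by way of illustration (the numbers below are invented for this write-up, and the second parent, "residual defects", is a hypothetical stand-in for whatever other parent the node has), such a table can be written down as:

```python
# Illustrative conditional probability table:
# P(operational_defects | operational_usage, residual_defects).
# All numbers are made up; "residual_defects" is a hypothetical second parent.
cpt_operational_defects = {
    # (operational_usage, residual_defects): {"low": p, "high": p}
    ("low",  "low"):  {"low": 0.95, "high": 0.05},
    ("low",  "high"): {"low": 0.70, "high": 0.30},
    ("high", "low"):  {"low": 0.80, "high": 0.20},
    ("high", "high"): {"low": 0.20, "high": 0.80},
}

# Each column (i.e. each combination of parent values) must sum to 1.
for parent_values, distribution in cpt_operational_defects.items():
    assert abs(sum(distribution.values()) - 1.0) < 1e-9, parent_values
```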

At any point in time you should be able to see the marginal probability values (presented here as percentages) associated with each node, like this:

 

The real power of BBNs is in the fact that, once we have built the models, we can execute them and use them for predictions and decision making using as little or as much evidence as happens to be available.

Let’s see how by running a simple version of the road fatalities model.

HUGIN

Notice that with each variable we can see the associated probabilities. At this point we haven't entered any observations into the model, so what we have here are called the prior probabilities. For example, the prior probability that the weather is good is 63% and all the seasons have equal probabilities. The prior probability of a high number of fatal accidents is 46%.

Now let’s enter some observations. Let’s first see what happens in winter.

Notice how all the probabilities change. In winter road conditions are more likely to be bad, but this means that people tend to drive slower. Also fewer journeys are made in winter. The impact of this is that the probability of a high number of fatal accidents has dropped to 43%.

Now compare what happens in summer. Road conditions are better but this means people drive faster. There are also more journeys. These factors explain why we now see an increase in the probability of high fatalities to 50%.

This explains the strange statistical results but doesn't help us with risk reduction.

The only things we directly control ourselves are the speed we drive and the number of journeys we make. Let’s suppose that, irrespective of the time of year, we all drive fast and make a lot of journeys.

Notice how the probability of fatalities increases again to 61%. However, now compare the situation between summer and winter. In summer the road conditions are less likely to be bad, so the probability of high fatalities drops to 59%. In winter road conditions are worse and the probability increases to 65%. This tells us that if we do not alter our driving habits then fatalities are more likely in winter than summer - exactly the opposite of what the naïve model was telling us.

Now, while BBNs are relatively new, the underlying theory is based on a very simple theorem which is over 200 years old. It is due to an 18th-century mathematician, the Reverend Thomas Bayes.

Bayes Theorem underpins the probabilistic reasoning I just showed you. It is all about how you adjust your beliefs in some uncertain event when you observe evidence which may or may not support the event.

I will explain it by example. In this example we are interested in predicting if a person entering a chest clinic has lung cancer.

Suppose that A represents the uncertain event that "person has lung cancer". Our prior p(A)=0.1 simply reflects empirical data - 1 in every 10 people who come to the clinic have lung cancer.

Suppose that B represents "person is a smoker" - again, the probability p(B)=0.5 is based on an empirical observation that 50% of people coming to the clinic are smokers.

The question is: to what extent do we revise our prior judgement p(A) about the probability that the person has lung cancer in the light of observing the evidence that the person is a smoker? In other words we want to calculate p(A|B).

In this case (like many others) we might have information about p(B|A) but not about p(A|B).

Here we can find out p(B|A) simply by checking the proportion of people with lung cancer who are smokers. Suppose, for example, that we know that p(B|A)=0.8.

Now Bayes theorem tells us how to compute p(A|B) in terms of p(B|A):

You start with p(A), the prior. You multiply this by p(B|A), the likelihood. You divide by the constant term p(B) and you end up with your posterior belief p(A|B).
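Written out with the clinic numbers substituted, the calculation is:

```latex
p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)} = \frac{0.8 \times 0.1}{0.5} = 0.16
```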

It follows that p(A|B)=0.16. Thus if we get new evidence that the person is a smoker we revise our probability of the person having lung cancer (it increases from 0.1 to 0.16). Note this is not a dramatic increase. If you had to put a bet on it, you still wouldn’t bet that the person has lung cancer.

 

 

Now Bayes Theorem represents the probabilistic reasoning needed when there are just two related variables.

In a BBN you have many variables and links, and when you enter pieces of evidence you update all of the probabilities in the BBN by applying Bayes Theorem recursively. This is called PROPAGATION. This is what you saw happen when I entered observations into the example BBN.
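To give a flavour of what is being computed (a toy sketch only - real tools use far more efficient algorithms than brute-force enumeration), here is the smoker/lung-cancer example coded up so that entering the evidence "smoker" updates the probability of lung cancer:

```python
# Toy propagation by brute-force enumeration for the two-node network A -> B,
# using the smoker / lung-cancer numbers from above.
p_cancer = 0.10                    # prior p(A)
p_smoker_given_cancer = 0.80       # likelihood p(B|A)
# p(B | not A) is forced by the overall smoker rate p(B) = 0.5:
p_smoker_given_no_cancer = (0.50 - p_smoker_given_cancer * p_cancer) / (1 - p_cancer)

def joint(cancer, smoker):
    """P(A = cancer, B = smoker)."""
    pa = p_cancer if cancer else 1 - p_cancer
    pb = p_smoker_given_cancer if cancer else p_smoker_given_no_cancer
    return pa * (pb if smoker else 1 - pb)

# "Enter the observation" that the person is a smoker and renormalise.
p_smoker = joint(True, True) + joint(False, True)        # p(B) = 0.5
posterior_cancer = joint(True, True) / p_smoker          # p(A|B)
print(f"p(cancer | smoker) = {posterior_cancer:.2f}")    # prints 0.16
```

In a real BBN the same kind of summing-out has to be done over every combination of states of every unobserved node, which is why naïve enumeration quickly becomes hopeless.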

Although the underlying theory of Bayes has been around for a long time it turns out that propagation is computationally intractable even for small BBNs, so until very recently nobody could run realistic BBN models even though everybody knew Bayes was a great formalism for handling uncertainty.

Fortunately in the late 1980s researchers developed algorithms which meant that many classes of very large BBNs could be propagated efficiently.

Software tools that implement these algorithms such as the tool I just used have since become available commercially and this has led to an explosion of interest.

To date BBNs have proven useful in practical applications such as medical diagnosis and diagnosis of mechanical failures. Anybody who has used a PC in the last 3 years will have been interacting with a BBN - it is BBNs that underlie the help wizards in Microsoft Office - but perhaps this is not a good selling point.

 

 

Our own research on BBNs was, and continues to be, application driven.

While the mathematical and statistical community focused on the problem of refining the propagation algorithms, nobody was doing any work on methods to help you BUILD big BBNs. But for the systems assessment type of problems that we were working on we had to crack these problems of scale.

That meant being able to build not just large graph models but also very large probability tables. For example, if you have a node which has, say, 5 possible state values, and it has 2 parents that each have 5 possible state values, then the number of cells in the probability table for that node is 125. A node with 10 values and 3 parents with 10 values needs 10,000 probability values. In many of our models the nodes are discretised continuous variables having 100 state values. It is therefore impossible to generate the probability tables by hand. Our work in research projects such as SERENE and IMPRESS has resulted in methods and tools that have enabled us to build BBNs to solve complex real-world problems.
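The arithmetic behind those figures is simple but unforgiving. A quick check (the 100-state case with two 100-state parents is just an illustration of why hand-crafting is hopeless):

```python
# Number of cells in a node's probability table: the node's own number of
# states multiplied by the number of states of each of its parents.
def table_size(node_states, parent_state_counts):
    size = node_states
    for count in parent_state_counts:
        size *= count
    return size

print(table_size(5, [5, 5]))          # 125
print(table_size(10, [10, 10, 10]))   # 10,000
print(table_size(100, [100, 100]))    # 1,000,000 (illustrative 100-state node)
```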

To give you a feel for the scale of these things, here, for example, is a part of a BBN that we developed with NATS to help predict the changes in risk of mid-air collisions given changes to the architecture of the air traffic management system.

In fact even this net is relatively small fry.

The largest BBN we have built - and we believe it is the largest built anywhere in the world - is what underlies a decision support system that we built for DERA (the Defence Evaluation and Research Agency).

I'd like to acknowledge Simon Forey as a key member of the team here. This system is used to predict the reliability of military vehicles at various phases during development and it helps DERA to assess different designs from different organisations.

The BBN underlying this system is built dynamically and typically consists of several hundred nodes and of the order of 200 million probability values.

 

Another BBN-based decision support system, BPAM, which we built for Railtrack, helps them assess the safety of individual electronic components that are bought from external contractors.

We built the AID system for Philips to help them track and predict software defects in consumer electronic products.

The BBNs behind those kinds of applications are rather too complex to present as examples today. So what I would like to finish with is a rather simple example which demonstrates both the power and very broad applicability of BBNs.

People who know me really well will know that I have had my share of legal battles, including two against THFC, which somewhat dimmed my enthusiasm for Spurs. During these battles I guess it’s fair to say that I have been less than enamoured by some of the lawyers I have come across. Despite this I want to put it on the record that I REALLY have nothing against lawyers. It’s just that 99 percent of them give the rest a bad name.

 

 

The background to this piece of work was the so-called classical fallacies of legal reasoning. Consider this scenario.

"Suppose a crime has been committed. Blood is found at the scene for which there is no innocent explanation. It is of a type which is present in 1% of the population."

One of the fallacies is the so-called prosecutor’s fallacy. This is the assertion that "There is a 1% chance that the defendant would have the crime blood type if he were innocent. Thus, there is a 99% chance that he is guilty."

The prosecutor’s fallacy is to assume that P(A|B) is the same as P(B|A)

where A represents the event "Defendant innocent" and

B represents the event "Defendant has the matching blood type".

The prosecutor’s argument is only valid if the prosecutor assumes a prior probability of guilt equal to 0.5 before any evidence is known. This is hardly commensurate with the innocent until proven guilty principle.
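To make that concrete, write G for "defendant guilty" and M for "defendant has the matching blood type", and assume (as the prosecutor's argument implicitly does) that a guilty defendant matches with certainty. The 1-in-1,000 prior below is purely illustrative:

```latex
\begin{align*}
P(G \mid M) &= \frac{P(M \mid G)\,P(G)}{P(M \mid G)\,P(G) + P(M \mid \overline{G})\,P(\overline{G})} \\[4pt]
\text{with } P(G)=0.5:\quad
P(G \mid M) &= \frac{1 \times 0.5}{1 \times 0.5 + 0.01 \times 0.5} \approx 0.99 \\[4pt]
\text{with } P(G)=0.001 \text{ (illustrative)}:\quad
P(G \mid M) &= \frac{1 \times 0.001}{1 \times 0.001 + 0.01 \times 0.999} \approx 0.09
\end{align*}
```

The "99% chance of guilt" figure is therefore an artefact of the hidden 50:50 prior, not of the blood-type evidence itself.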

These kinds of mistakes about probability continue to be made. For example, in a recent trial involving DNA evidence, the evidence was such that if the defendant were not the source of the crime sample, then the probability of a match was 1 in 3 million. However, the forensic scientist for the prosecution stated that

"The likelihood of this [the source of the sample] being any other man but [the defendant] is 1 in 3 million".

This statement was essentially an instance of the prosecutor’s fallacy.

In two successive appeals against the original conviction the defence used an expert witness (a Bayesian academic) to explain to the jurors how to combine evidence using Bayes’ theorem. He explained how to combine the DNA evidence with other evidence (such as the failure of the victim to pick out the accused on an ID parade). Applying Bayes Theorem in this way reduces the probability of guilt significantly.

Although all three judgements of the case assume that Bayes Theorem is appropriate for expert evidence, the Court of Appeal essentially rejected its admissibility and dismissed the appeal, with the ruling:

"To introduce Bayes Theorem, or any similar method, into a criminal trial plunges the jury into inappropriate and unnecessary realms of theory and complexity deflecting them from their proper task".

The reaction of the Court of Appeal is alarming, yet also understandable. It is alarming because it rejects the best logical approach to formalising and reasoning about uncertainty simply on the basis that lay people could not understand the science. Yet it is understandable because the defence’s Bayesian arguments were presented from "first principles" – essentially trying to unravel all of the propagation calculations that normally go on behind the scenes. There is a rich irony here. It was a very complicated scientific theory (DNA) which led to the original conviction; the prosecution would not have dreamed of trying to explain the underlying theories of DNA from first principles in the way the defence tried to do with Bayes.

To ensure that Bayesian arguments become as widely accepted (without having to be understood) within the legal profession as, say, DNA evidence, we feel that BBNs provide a way forward - you can simply show the impact of different assumptions without trying to explain all the Bayesian calculations.

In looking at these kinds of examples we discovered a genuinely new fallacy which is easily explained by BBNs.

I am sure you will all recall examples of serious criminal cases where a jury has found the defendant not guilty and it is subsequently revealed that the defendant had a previous conviction for a similar crime. Media reports of such cases usually say things like members of the jury wept when they heard this and were furious that they had not been told beforehand.

In general, therefore, we can talk about the "questionable verdict" scenario.

It therefore seems reasonable to pose the observer’s question:

Does the subsequent evidence of a previous similar conviction make you less confident that the jury was correct in its verdict?

How many of you here would answer yes?

The fallacy, which we shall refer to as the jury observation fallacy, is that most people answer yes to the observer’s question irrespective of a range of underlying assumptions, when in fact the rational answer should be NO.

I will now explain why using a BBN.

 

Here is a BBN model of the jury fallacy, with conditional probabilities derived from Home Office data and experts. For simplification we assume that the crime has occurred on an island with 10,000 people - hence, before we enter any observations into this BBN, the probability that a randomly selected person is guilty of this crime is one in 10,000, or 0.01 percent (the figures shown here are given in percentages).

Suppose now that you know the person has been charged and tried. The probability that the person is truly guilty shoots up to 87% and the probability that the person will be found guilty is similar. This latter figure reflects the empirical data that 88% of defendants who are tried for serious crimes are actually found guilty.

Now look what happens to the probability of guilt when we enter the observation that the defendant is found innocent. It drops to 7%.

Now if we subsequently find that the defendant has a previous conviction look what happens. Rather than increase the probability of guilt it actually decreases slightly. Hence the jury observation fallacy. In fact, if we find out the defendant had NO previous conviction then the probability of guilt increases slightly.

So what is the explanation for this strange phenomenon?

It’s rather simple. Look at what happened when we entered the innocent verdict. The probability of hard evidence dropped dramatically - a jury can only convict on hard evidence. On the other hand the probability of a previous conviction increased. This means that the person was probably charged on the basis of a previous similar conviction rather than on the existence of hard evidence – this is called explaining away.
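Explaining away is easy to see in a toy numerical example. The priors and conditional probabilities below are invented purely for illustration (they are not the figures from the real jury model): there are two possible causes of a person being charged - hard evidence (H) or a previous similar conviction (C) - and once we know the person was charged, learning that one cause is absent pushes up the probability of the other.

```python
# Toy illustration of "explaining away"; all numbers invented for illustration.
# Two possible causes of being charged: hard evidence (H) or previous conviction (C).
from itertools import product

P_H, P_C = 0.3, 0.2                     # hypothetical priors for the two causes

def p_charged(h, c):
    """Hypothetical P(charged | H, C): either cause makes a charge likely."""
    if h and c:
        return 0.99
    if h:
        return 0.90
    if c:
        return 0.60
    return 0.01

def posterior_C(observe_h=None):
    """P(C | charged), optionally also conditioning on whether H is present."""
    numerator = denominator = 0.0
    for h, c in product([True, False], repeat=2):
        if observe_h is not None and h != observe_h:
            continue
        weight = (P_H if h else 1 - P_H) * (P_C if c else 1 - P_C) * p_charged(h, c)
        denominator += weight
        if c:
            numerator += weight
    return numerator / denominator

print(f"P(C | charged)            = {posterior_C():.2f}")       # about 0.39
print(f"P(C | charged, H present) = {posterior_C(True):.2f}")   # about 0.22 - H explains the charge
print(f"P(C | charged, H absent)  = {posterior_C(False):.2f}")  # about 0.94 - C must explain it
```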

So in summary, the knowledge that the person had a previous similar conviction should actually make you MORE not less convinced in the correctness of a jury’s "not guilty" decision; the rational answer to the observer’s question is NO. The public should feel more uncomfortable (and the jury more justified in weeping) if they subsequently discovered that the defendant had no previous similar conviction.

I began by identifying the inadequacies of traditional statistical approaches when it came to risk assessment - this was a problem that we confronted in our work on software assessment. I showed that causal models using BBNs have many advantages over the classical approaches and provide a realistic alternative. BBNs provide explicit modelling of cause-effect relationships and uncertainty. They enable us to incorporate expert judgement as well as empirical data, and to combine many diverse types of information. BBNs also make explicit those assumptions that were previously hidden in the decision-making process - hence they provide visibility and auditability, which are crucial when it comes to decision making for critical systems of all types.

Our current and planned research on BBNs will focus on the problems of scalability and on new applications domains.

On scalability, our aim in the SCULLY project is to move to the next stage of evolution, where we are trying to provide ways of further automating our methods for building large nets.

We also plan to extend the work we have started on combining BBNs with other multi-criteria decision aids.

In the SIMP project, along with BAE Systems, we are going to use BBNs to support risk analysis and decision making in major systems integration projects, such as new aircraft carriers and submarines.

In terms of applications, we are developing models that can learn users' preferences in order to make intelligent recommendations in areas such as digital TV and internet shopping.

We also believe that BBNs have an important role to play in modelling operational risk in diverse organisations, especially financial institutions.

During this talk I have mentioned a number of people who have had a major influence on my career. I would also now like to acknowledge:

  • Professor Smith, the Principal, and Professor MacCullum for making my decision to come to Queen Mary such a simple one.
  • Susan Hemp for organising the formalities for the inaugural.
  • Joan Hunter, Sue White and the rest of the Computer Science Office staff for ensuring it has been a success.
  • Ed Tranham, Bob Malcolm, Roger Harris, and Tom Anderson for their support over the years.
  • But most of all I would like to thank my family: Naomi, for her constant love and support and for agreeing to smoke only in the garage, and my Mum and Dad, Mildred and Ben Fenton, who brought me and my brother and sister up in such difficult circumstances.

I would finally like to dedicate this lecture to the memory of Larry Lewis, Naomi's uncle, a wonderful man who tragically died on Sunday and who should have been with us here tonight.