Sunday 18 September 2011

Bayes' theorem - examples

The Cancer Test Problem

Remember Bayes' theorem:


P(A|B) = P(B|A) * P(A)/P(B)


The following problem is a very famous case of how to use the theorem.

We have a cancer-detecting test which gives a positive result for 90% of people who do have the cancer, but also gives a positive result for 10% of people who don’t actually have the cancer. A patient comes in and gets a positive result. How worried should they be? (In other words, what is the chance that they do actually have cancer?)

Well in fact, Bayes' theorem tells us that we don’t actually have all the information necessary to answer this question.

Indeed, let’s set:
Event A = person has cancer
Event B = test is positive

We are looking for P(A|B). We know P(B|A) (it’s 90%) and P(B|nonA) (it’s 10%). Knowing P(B|nonA) is actually as much information as knowing P(B), because we can calculate P(B) from this and P(A) using the total probability law – we’ll see this later. So we still need to know P(A).

Suppose the question now is: 2% of people have this cancer. We have a cancer-detecting test which gives a positive result for 90% of people who do have the cancer, but also gives a positive result for 10% of people who don’t actually have the cancer. A patient comes in and gets a positive result. How worried should they be? (In other words, what is the chance that they do actually have cancer?)

A lot of doctors were asked this question. Only 15% of them got it right (this article cites a few studies for this result – it’s also a very interesting and entertaining read). They generally estimated the chance that the person did indeed have the cancer to be very high, close to 90%.

But what is the correct number?

A more intuitive way of thinking about this problem is the following:

Take a pool of 1,000 people.
  
20 of them have cancer
    18 of these will have a positive reading on the test
980 of them do not have cancer.
    98 of these will have a positive reading on the test

So in total, 116 people will have a positive reading on the test, and only 18 of these will actually have cancer. The probability that a person with a positive reading does actually have cancer is therefore 18/116, or about 15.5%, which is still relatively low. So it would be a far better approach to understand the maths involved in this, and not freak your patient out without good reason.
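The head count above can be reproduced in a few lines of Python. This is just a sketch of the same arithmetic; the variable names are mine, and the pool size of 1,000 is the one chosen above.

```python
# Natural-frequency version of the cancer test problem.
pool = 1000                 # people in the pool
prevalence = 0.02           # 2% of people have the cancer
sensitivity = 0.90          # positive result for 90% of people with cancer
false_positive_rate = 0.10  # positive result for 10% of people without it

with_cancer = round(pool * prevalence)
without_cancer = pool - with_cancer
true_positives = round(with_cancer * sensitivity)
false_positives = round(without_cancer * false_positive_rate)
all_positives = true_positives + false_positives

share = true_positives / all_positives
print(f"{true_positives}/{all_positives} = {share:.1%}")  # 18/116 = 15.5%
```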

Let’s now calculate the probability using Bayes’ theorem – hopefully, we will get the same result.

Our first step is to calculate P(B). We do this using the law of total probability:

P(B) = P(B|A)P(A) + P(B|nonA)P(nonA)
        = 0.9*0.02 + 0.1*0.98
        = 0.116

Now Bayes’ theorem: P(A|B) = P(B|A)P(A)/P(B) = 0.9*0.02/0.116 = 15.5% (yes!!)
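The two steps (total probability, then Bayes’ theorem) fit in one small Python function. This is a minimal sketch; the function name posterior and its parameter names are made up for illustration.

```python
def posterior(prior, p_pos_given_a, p_pos_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|nonA)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|nonA)P(nonA)
    p_b = p_pos_given_a * prior + p_pos_given_not_a * (1 - prior)
    # Bayes' theorem: P(A|B) = P(B|A)P(A)/P(B)
    return p_pos_given_a * prior / p_b


p_cancer_given_positive = posterior(prior=0.02, p_pos_given_a=0.9, p_pos_given_not_a=0.1)
print(f"P(A|B) = {p_cancer_given_positive:.1%}")  # P(A|B) = 15.5%
```

Calling posterior with a different prior shows how strongly the answer depends on P(A), which is exactly the piece of information missing from the first version of the question.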

This example question is detailed in this video, which also presents some other very interesting counterintuitive statistical issues. The part we’re interested in is at 11 minutes, but the entire thing is worth a watch.


The Prosecutor's Fallacy

Bayes’ theorem often applies when considering the probability that a person is guilty of a crime. Indeed, misunderstanding it leads to what is called the prosecutor’s fallacy – when you interpret the small probability of someone fitting the evidence as a small probability that an accused who does fit the evidence is in fact innocent.

Let’s consider the following case:

In a murder case you have found a sample of the murderer’s DNA, and there is a 0.1% chance of a random person’s DNA matching this sample. You have found a man whose DNA does match.

The correct interpretation is then NOT that there is a 0.1% chance that this man is not the murderer, i.e. that there is a 99.9% chance that he is the murderer.

Bayes’ theorem tells us that in order to calculate this last probability (the probability that the man is guilty, given that he matches the DNA), one also needs to take into account the probability of a random person being the murderer, which is extremely low. Say it is 0.01%.

Let’s use the following notation:

Event A = The man is guilty
Event B = The man’s DNA matches the one found

Then we have
P(B) = 0.1%
P(A) = 0.01%

The probability we are interested in is P(A|B): the probability that the man is guilty, given that his DNA matches the killer’s.

Bayes’ theorem gives us that P(A|B) = P(B|A)*P(A)/P(B)

Now P(B|A) is the probability that the man’s DNA would match the killer’s, if he is indeed the killer. Which should be pretty close to 1, if your DNA testing is any good! P(A) and P(B) are given above, so in the end:

P(A|B) = 1 * 0.01% / 0.1% = 10%

Most definitely not a cause for putting someone in prison!
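Here is the same calculation as a Python sketch. The first result takes the 0.1% directly as P(B), as done above. The second is an alternative reading that is my own assumption, not part of the problem as stated: it treats the 0.1% as P(B|nonA) and gets P(B) from the law of total probability, as in the cancer example. Both assume P(B|A) is exactly 1.

```python
p_a = 0.0001       # P(A): prior probability of guilt, 0.01%
p_b_given_a = 1.0  # P(B|A): assumed to be exactly 1
p_match = 0.001    # 0.1% chance of a random person's DNA matching

# Reading used in the text: the 0.1% is P(B) itself.
p_guilty_text = p_b_given_a * p_a / p_match

# Alternative reading (an assumption): the 0.1% is P(B|nonA),
# so P(B) = P(B|A)P(A) + P(B|nonA)P(nonA).
p_b = p_b_given_a * p_a + p_match * (1 - p_a)
p_guilty_alt = p_b_given_a * p_a / p_b

print(f"{p_guilty_text:.1%}")  # 10.0%
print(f"{p_guilty_alt:.1%}")   # 9.1%
```

Either way the answer is nowhere near 99.9%.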

Obviously, in actual cases, this is not the only thing to take into account. If the man’s DNA matches the killer’s, and he also matches a description of the killer, and has no alibi, and some shoes covered in blood were found in his house, the odds would change somewhat.
