I use this blog as a soap box to preach (ahem... to talk :-) about subjects that interest me.

Monday, November 12, 2012

Misunderstood Science: A question of probability

Probability and statistics are very confusing. Most people think they are self-evident and consider them easy to handle, at least in everyday life. But they are wrong. For starters, how many of you could state the difference between probability and statistics?

No?

OK. Here it is: Probabilities are decided in advance and have to do with predicting outcomes; statistics are concerned with inferring probabilities based on observed outcomes.

For example, if you have a six-faced die and state that each face will come up on average once every six throws, you are talking about probabilities: you estimate probabilities in advance with a mathematical formula and use them to predict what you will get in practice.

If, on the other hand, you throw a six-faced die 600 times and count how many times each face comes up in an attempt to determine how probable each face is, you are doing statistics. Obviously, statistics cannot ever be an exact science.
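
To see the statistical side in action, here is a minimal Python sketch. It assumes that a pseudo-random generator is a fair stand-in for a real die, and the function name `estimate_face_probabilities` and the seed are just my illustrative choices.

```python
import random
from collections import Counter

def estimate_face_probabilities(throws=600, seed=None):
    """Simulate `throws` throws of a six-faced die and return
    (counts, estimates), where estimates[face] = counts[face] / throws."""
    rng = random.Random(seed)  # the seed only makes a run repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(throws))
    estimates = {face: counts[face] / throws for face in range(1, 7)}
    return counts, estimates

if __name__ == "__main__":
    counts, estimates = estimate_face_probabilities(600, seed=2012)
    for face in range(1, 7):
        print(face, counts[face], round(estimates[face], 3))
```

Run it with different seeds and the estimates wobble around 1/6 without settling on it, which is all that 600 observed throws can tell you.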

For one thing, no die can be perfectly balanced. Even if you started with a perfectly balanced die (and you tell me how you would determine that!), you couldn’t keep it that way, because with each throw imperceptible abrasions would remove tiny particles (perhaps just atoms) from one or more faces. All in all, if you throw any die enough times, you will discover that some faces come up, on average, more often than others.

But even with an ideal, perfectly balanced die (which, I repeat, is a physical impossibility), you cannot expect to get all faces exactly the same number of times. It is theoretically possible, but the higher the number of throws, the less likely it is. If you throw a die, say, 600 times, I bet you a thousand dollars against ten that you will spend the rest of your life trying to get 100 1s, 100 2s, etc. (I’ll settle the matter with your heirs.)
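
If you wonder how safe that bet is, the chance of a perfectly even split can be computed exactly. The sketch below assumes an ideal die and counts the throw sequences that give every face the same number of hits (a multinomial coefficient), then divides by the number of all possible sequences. The function name `prob_perfectly_even` is mine.

```python
from fractions import Fraction
from math import factorial

def prob_perfectly_even(throws_per_face=100, faces=6):
    """Exact probability that every face of an ideal die comes up exactly
    `throws_per_face` times in faces * throws_per_face throws."""
    total = faces * throws_per_face
    # Multinomial coefficient: number of throw sequences with an even split.
    even_sequences = factorial(total) // factorial(throws_per_face) ** faces
    # With an ideal die, all faces ** total sequences are equally likely.
    return Fraction(even_sequences, faces ** total)

if __name__ == "__main__":
    print(f"{float(prob_perfectly_even()):.3e}")
```

For 600 throws the result is on the order of 10^-7, so my thousand dollars should be reasonably safe.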

How do you calculate a probability? Conceptually, it is simple: The probability of an outcome is given by the number of ways in which you can obtain that outcome divided by the total number of ways in which you can obtain all possible outcomes, provided that all those ways are equally likely. That’s why it is easy to estimate that the probability of, say, a 5 when throwing a die is 1/6 (~16.7%), or the probability of heads when throwing a coin is 1/2 (50.0%).

FYI, statisticians call the set of all possible outcomes the sample space. This is a bit twisted, because sample is a statistical term, while sample space refers to the calculation of probabilities, but who says that scientists are always consistent?

Anyhow, the concept of sample space and the above definition of probability let you answer questions like: what is the probability of getting a 10 if I throw two dice?

The size of the sample space is 36, because you can get 6 possible values with each die, and they are independent of each other. The possible ways in which you can obtain a 10 are: (4,6), (5,5), and (6,4). As a result, the probability of obtaining a 10 is 3/36 (~8.3%). As a comparison, you can obtain a 7 with (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1), which results in a probability of 6/36 (~16.7%).
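
The counting above is easy to hand over to a computer. This short Python sketch lists the whole sample space for two ideal dice and counts the pairs that add up to a given total. The function name `prob_of_sum` is my own choice for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_of_sum(target, dice=2, faces=6):
    """Probability that `dice` ideal dice add up to `target`:
    favourable outcomes divided by the size of the sample space."""
    sample_space = list(product(range(1, faces + 1), repeat=dice))
    favourable = [outcome for outcome in sample_space if sum(outcome) == target]
    return Fraction(len(favourable), len(sample_space))

if __name__ == "__main__":
    for target in (10, 7):
        p = prob_of_sum(target)
        print(target, p, f"{float(p):.1%}")
```

Note that `Fraction` reduces its results automatically, so 3/36 and 6/36 are shown in lowest terms.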

Everything clear? Let’s check it out with a fun problem.

I place a ten-dollar bill in one of three identical boxes. Then, while you keep your eyes closed, I move them around so that I still know where the money is but you lose track of it. You have to choose one of the boxes; if it is the box with the money, the ten dollars are yours. Clearly, you have a probability of 1/3 (or ~33.3%) of winning. You make your choice by placing your hand on one of the boxes. But, before you can open your box, I open one of the other two boxes and show you that it is empty. I then ask you whether you want to stick to your original choice or switch to the other box that is still unopened. What do you do and why?

Obviously, you want to maximise the probability of winning. The questions you need to answer are: does it matter whether you keep the box you initially chose or you switch to the other box that is still unopened? And if it does matter, are you more likely to win if you keep the original box or if you swap it for the other one?

The answer seems obvious: there are two boxes and only one contains a reward. As there are no reasons for preferring either box, it is irrelevant which one you choose. They both have a 50/50 chance of being the winning one.

Or not?

Well, ... no. You are better off switching boxes, because the other unopened box is more likely to contain the ten-dollar bill than the one you initially chose.

Surprised? :-) Let’s see...

What is the probability that you chose the winning box? As I already said: 1/3. If you keep the box, you also keep the 33.3% chance of winning.

And what is the probability that the money is not in the box you chose? Obviously, 2/3. But if it isn’t, as I have already opened one of the two other boxes and shown you that it was empty, you must conclude that the money is in the remaining box. No doubt about that.

In conclusion, if you stick with your original choice, you have 1/3 probability of winning, but if you switch boxes, you have a 2/3 probability of winning. Twice as high!
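
If the argument still feels slippery, you can play the game many times and count. The sketch below is a simulation under the rules described above: I know where the money is and always open an empty box that is not yours. The names `play` and `win_rate` are illustrative, and the pseudo-random generator stands in for my shuffling of the boxes.

```python
import random

def play(switch, rng):
    """Play one game; return True if the player ends up with the money."""
    boxes = [0, 1, 2]
    money = rng.choice(boxes)
    choice = rng.choice(boxes)
    # The host knows where the money is and opens an empty box
    # that is not the player's.
    opened = rng.choice([b for b in boxes if b != choice and b != money])
    if switch:
        # Move to the only box that is neither chosen nor opened.
        choice = next(b for b in boxes if b != choice and b != opened)
    return choice == money

def win_rate(switch, games=100_000, seed=None):
    """Fraction of `games` won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(games)) / games

if __name__ == "__main__":
    print("stick :", win_rate(switch=False, seed=1))
    print("switch:", win_rate(switch=True, seed=2))
```

With enough games, sticking should win about a third of the time and switching about two thirds.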

Where is the trick?

There is no trick. The whole story appears illogical only because of a widespread fallacy that many people fall into when thinking about probabilities: the assumption that, when two possibilities remain, they must be equally likely. For probabilities to be equally spread among the outcomes, what you have learned along the way must be independent of where the money is. In our game, that stopped being true when I opened one of the boxes, because I knew that the box was empty: which box I opened depended on where the money was, so my action told you something about the third box. If I had opened one of the boxes without knowing whether it was empty or not, and it had turned out to be empty, the probability of finding the money in either your box or the third box would have been equally spread at 50/50, as you probably thought.

If you are not convinced, consider that if I had opened one of the two boxes without knowing whether it was empty, I would have had a 1/3 probability of opening the winning box, exactly the same probability you had when choosing your box. But if that had not happened, and I had opened an empty box without knowing in advance that it was empty, I would not have introduced any dependency, because opening that box would not have told you anything that favours the third box over yours.
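
The same kind of simulation can check this variant. In the sketch below I open one of the two other boxes at random, without knowing what is inside, and games in which I accidentally reveal the money are thrown away. The function name `ignorant_host_rates` is mine, and discarding the spoiled games is my reading of "if that had not happened".

```python
import random

def ignorant_host_rates(games=100_000, seed=None):
    """Return (stick_rate, switch_rate, valid_fraction) for a host who
    opens one of the other two boxes at random. Only games in which the
    opened box happens to be empty count towards the win rates."""
    rng = random.Random(seed)
    valid = 0
    stick_wins = 0
    for _ in range(games):
        money = rng.randrange(3)
        choice = rng.randrange(3)
        opened = rng.choice([b for b in range(3) if b != choice])
        if opened == money:
            continue  # the host revealed the money by accident; game void
        valid += 1
        stick_wins += (choice == money)
    stick_rate = stick_wins / valid
    # In a valid game the money is either in the chosen box or in the
    # remaining one, so switching wins exactly when sticking loses.
    return stick_rate, 1 - stick_rate, valid / games

if __name__ == "__main__":
    print(ignorant_host_rates(seed=3))
```

Here both strategies should come out at about one half, and about a third of the games should be spoiled by the host opening the winning box.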

Amazing, isn’t it? Martin Gardner once said: in no other branch of mathematics is it so easy for experts to blunder as in probability theory. Imagine what it is like for non-experts...
