Hi, and welcome to module 2. In this module we're going to continue to extend and expand our foundations in probability. We'll begin with the definition and concept of conditional probability, and then use it to derive Bayes' theorem. In future videos, we'll look at the concepts of independence and mutually exclusive events, and finally we'll see the relationship between conditional and independent events in a statistical experiment.

So let's begin with conditional probability. Suppose we have two events A and B from the same sample space S. We want to calculate the probability of event A, knowing that event B has occurred, so B is called the conditioning event. And we'll have a special notation: the probability of A given B. So given knowledge that B has occurred, what's the probability of A?

Let's start with an example to illustrate this. Roll a six-sided die twice. We've looked at this sample space before; we know it has 36 outcomes, and each of those 36 outcomes is equally likely. Let A be the event that at least one of the two dice shows a 3, so we have (3,1), (3,2), and so on. That's 11 outcomes from S, so the probability of A is 11 over 36. Now let B be the event that the sum of the two dice is 9. B consists of four outcomes, so its probability is 4 over 36.

So here's the question: suppose you know that B has occurred, you know you got a 9. What's the chance that event A happened? In other words, we know the probability of A all by itself, with no other information, is 11 over 36. But now I'm giving you additional information: I'm telling you that B, the event that the sum of the two dice is 9, has occurred. How does that change the probability of event A?

Before we get into the calculations, let's see if we can figure it out intuitively from a Venn diagram. Event B is that the sum of the two dice is 9. The outcomes (3,6) and (6,3) are in B and also in A. The outcomes (4,5) and (5,4) are in B but not in A. And then there are all the other outcomes, (3,1), (1,3), and so on, that are in A but not in B.

All right, so now I tell you that B has occurred. What's the chance that A will occur? Well, the only way for A to occur is if the outcome lands in A intersect B. And we know the probability of A intersect B: it's the two outcomes (3,6) and (6,3), so it's 2 over 36. Now, with the additional information that B has occurred, B is the new relevant sample space. So A will occur if A intersect B occurs, and I normalize by the probability of B. This gives us 2 over 36 divided by 4 over 36, so we end up with one half. And that should make sense given the Venn diagram: we know B has occurred, there are four equally likely outcomes in B, and two of those outcomes are in set A. So dividing the 2 over 36 by the 4 over 36 gives one half as our probability. That's how the new knowledge that B occurred changes our probability for event A.

Now we can use this idea in our formal definition. The probability of A given B is the probability of A intersect B divided by the probability of B. And I put in the condition that the probability of B is greater than 0, because the definition doesn't make sense if the probability of B equals 0.
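If you want to check these numbers yourself, here's a minimal sketch in Python (the event definitions and variable names are just one way to set it up) that brute-forces the dice example by enumerating all 36 outcomes:

```python
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling a die twice.
sample_space = [(i, j) for i in range(1, 7) for j in range(1, 7)]

A = [o for o in sample_space if 3 in o]       # at least one die shows a 3
B = [o for o in sample_space if sum(o) == 9]  # the two dice sum to 9
A_and_B = [o for o in A if o in B]            # outcomes in both A and B

n = len(sample_space)
print(Fraction(len(A), n))             # P(A)   = 11/36
print(Fraction(len(B), n))             # P(B)   = 4/36, reduces to 1/9
print(Fraction(len(A_and_B), n))       # P(A∩B) = 2/36, reduces to 1/18
print(Fraction(len(A_and_B), len(B)))  # P(A|B) = (2/36)/(4/36) = 1/2
```

Because every outcome is equally likely, counting directly within B gives the same one half as the formula.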
Now this leads to what's called the multiplication rule. We can just multiply both sides by the probability of B, and we get that the probability of B times the probability of A given B is the probability of A intersect B. And this is completely symmetric: we could have written the probability of B given A as the probability of A intersect B over the probability of A, so the probability of A intersect B is also the probability of A times the probability of B given A. Now I'm going to combine these two statements: the probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B (just take the probability of B over to the other side). And that's called Bayes' theorem. Sometimes it's easier to calculate the probability of B given A, and we can use that to calculate the probability of A given B.

There are a few more ideas that I want to cover, and the first one is called the law of total probability. These are all related, with the underlying theme of conditional probability. Suppose I have two events A and B from the same sample space, and I've drawn them in here. B can be written as the part of B that intersects with A, which is this region right here, unioned with the part of B that intersects A complement, because A complement is everything outside of A. And because these two regions are disjoint from each other, I can write the probability of B as the probability of B intersect A plus the probability of B intersect A complement. Those two probabilities added together give you the entire probability of B. And using the definition of conditioning, I can write the probability of B intersect A as the probability of B given A times the probability of A, and similarly for the probability of B intersect A complement. This is called the law of total probability. It looks like a very roundabout way to calculate the probability of B. However, there are times, and examples we'll see, where it's easier to calculate the conditional probability of B given A, or B given A complement, and use those to get the probability of B. We'll see that in a moment.

I can extend this law of total probability to n sets. I want those n sets, A1, A2, up to An, to be mutually exclusive. What that means is that Ai intersect Aj is the empty set for all i and j with i not equal to j. We saw that on the previous slide, where A and A complement were mutually exclusive, and now I'm extending it to A1 through An. But mutual exclusivity isn't enough; the other thing we need is that the union of the A's is equal to S.

All right, let me illustrate this with four sets. Say that's A1, this is A2, this is A3, and this is A4. In this picture we can write the probability of B as the probability of B intersect A1, plus the probability of B intersect A2, plus the probability of B intersect A3, plus the probability of B intersect A4. And that last term is equal to zero, because in this picture there is no part of B intersecting with A4. But all the others have nonzero intersections between the set B and the sets Ai. So we can rewrite this as the probability of B given A1 times the probability of A1, plus the probability of B given A2 times the probability of A2, plus the probability of B given A3 times the probability of A3. And this is exactly what we saw on the previous slide, just with A and A complement; here I've written it with n sets. That is called the law of total probability.
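To make the formula concrete, here's a minimal sketch in Python with a hypothetical three-set partition and made-up probabilities (these numbers are just for illustration, not from the lecture):

```python
# Hypothetical partition A1, A2, A3 of the sample space (made-up numbers).
P_A = [0.5, 0.3, 0.2]             # P(A1), P(A2), P(A3); they must sum to 1
P_B_given_A = [0.10, 0.40, 0.25]  # P(B|A1), P(B|A2), P(B|A3)

# Law of total probability: P(B) = sum over i of P(B|Ai) * P(Ai)
P_B = sum(pb * pa for pb, pa in zip(P_B_given_A, P_A))
print(round(P_B, 2))  # 0.05 + 0.12 + 0.05 = 0.22
```

Each term is one slice of B, the part of B that lands inside the corresponding Ai.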
Now let's illustrate these ideas with an example. Suppose your company has developed a new test for a disease. Let A be the event that a randomly selected individual has the disease, with no other symptoms: you just pick them out and you test them. If you know that one in 1000 people has the disease, then the probability of your randomly selected person having the disease is .001. Now let B be the event that a positive test result is received for that randomly selected individual.

Your company collects data on its new test and finds the following information. Let's write it out. The probability of B given A is the probability of a positive test result given that the person has the disease, and that is .99. So if the person has the disease, you will get a positive test result with probability .99; that's what the words mean. Similarly, the probability of B complement given A is the probability of a negative test result given the person has the disease, which is .01, and the probability of B given A complement is the probability of a positive test result given the person doesn't have the disease, which is .02.

Your company calculates all this information, and now what you want to do is calculate the probability that the person has the disease given a positive test result. And that's different, of course, from the probability of B given A. In that one, knowing the person has the disease is the conditioning part, and you calculate the probability that you'll get a positive test result back. This one is: I got a positive test result back, now what's the probability that the person really does have the disease? So you can see how conditioning on one thing or another changes your probabilities.

So let's calculate. The probability of A given B is the probability of A intersect B divided by the probability of B. Using our definition of conditioning, this is the probability of B given A times the probability of A, over the probability of B. This is Bayes' theorem. Now we're going to use the law of total probability in the denominator, and I'll write that out: the denominator becomes the probability of B given A times the probability of A, plus the probability of B given A complement times the probability of A complement. We know most of these probabilities. The probability of B given A is .99, and the probability of A, without any other information, is .001. So in the denominator we've got .99 times .001 plus .02 times .999, where .999 is the probability of A complement. Do all that, and you get .0472.

Now, I want to talk about this a little bit. The probability of A is .001; this is called the prior probability of A. The prior probability is our probability without any information at all. The probability of A given B we calculated as .0472; this is called the posterior probability of A. It's the probability of A after we have learned that event B has occurred.
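Here's the same calculation as a minimal Python sketch, using the numbers from the example (the variable names are my own):

```python
# Disease-test example via Bayes' theorem.
P_A = 0.001            # prior: P(disease) for a randomly selected person
P_B_given_A = 0.99     # P(positive test | disease)
P_B_given_notA = 0.02  # P(positive test | no disease)

# Law of total probability gives the denominator P(B).
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
P_A_given_B = P_B_given_A * P_A / P_B
print(round(P_A_given_B, 4))  # 0.0472, the posterior probability
```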
So that's the prior probability and the posterior probability, and you'll notice, if you think about it: the probability that a randomly selected person, somebody just pulled off the street, has the disease is .001. Now you take that randomly selected person, you administer the test, and the test comes back positive, so the probability of having the disease given the positive test result is .0472. It has increased a tremendous amount, but it's still less than 5%. Now, in general, if someone goes to the doctor, they'll have other symptoms, which will raise this probability even higher. So if I had additional symptoms and I put those in here as part of the conditioning, then the probability of having the disease would go up.

All right, I want to illustrate this one more time, with what's called a tree diagram. I think this can also help us understand the information. Here's our prior, the probability of A: I start at this node, we pick our random person, and they have the disease with probability .001 or they don't have the disease with probability .999. Those are the only two possibilities. Now, what happens next? If I'm on the upper branch, then conditioned on being on that branch, the probability of B given A is .99 and the probability of B complement given A is .01. And there are only two possibilities on the lower branch as well: the probability of B given A complement is .02, and the probability of B complement given A complement is .98.

Okay, a few things to notice when you're working with a tree diagram. At any node, the branches leaving it add to one: these two branches add to one, and the same thing here. It's also possible, with more complicated experiments, that you'll have multiple branches, so you could have two, three, or four branches. Another thing to notice is that if you multiply along a branch, say the probability of A times the probability of B given A, you get the probability of A intersect B, which we already calculated on the previous slide using conditioning. Multiplying along this branch, the probability of A times the probability of B complement given A, gives the probability of A intersect B complement. Multiplying here, you get the probability of A complement intersect B, and here you get the probability of A complement intersect B complement. Those are the only four paths that are possible in this particular experiment. Just to remind ourselves: this turned out to be .00099, this turned out to be .00001, this one turns out to be .01998, and this one is .97902. And if you add all of these, they sum to one.
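Here's a minimal sketch in Python of the four branch products (again with my own variable names), which you can use to check that they sum to one:

```python
# Multiply along each branch of the tree diagram.
P_A = 0.001  # P(disease)
branches = {
    ("D", "+"): P_A * 0.99,        # P(A ∩ B)            = .00099
    ("D", "-"): P_A * 0.01,        # P(A ∩ B complement) = .00001
    ("N", "+"): (1 - P_A) * 0.02,  # P(A complement ∩ B) = .01998
    ("N", "-"): (1 - P_A) * 0.98,  # P(A complement ∩ B complement) = .97902
}
for outcome, p in branches.items():
    print(outcome, p)
# The four branch products cover all outcomes, so they sum to one
# (up to floating-point rounding).
print(sum(branches.values()))
```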
All right, another way we can think about this. I'm just trying to give you lots of different ways to think about it, so you see how everything is connected together. What is our sample space in this example? In our sample space S, let D mean the person has the disease, N mean no disease, plus mean a positive test result, and minus mean a negative test result. There are only four possibilities: the person has the disease and gets a positive test result, (D, +); the person has the disease and gets a negative test result, (D, -); the person has no disease and gets a positive test result, (N, +); and the person has no disease and gets a negative test result, (N, -). Let's match these up to what we already calculated. A intersect B is the event (D, +) from the sample space, and that's an intersection, not a conditioning. A intersect B complement is (D, -): they have the disease and they come back with a negative test result. A complement intersect B is (N, +): no disease, but a positive test result. And A complement intersect B complement is (N, -): no disease and a negative test result.

And again, keep in mind: this is for a randomly selected person just pulled off the street. When you add in additional knowledge, perhaps other symptoms, these probabilities all change. All right, hopefully this gives you some comfort with calculating conditional probabilities, matching events up to sample spaces, and using the law of total probability. In the next video, we'll work on independent events. We'll see you then.