Hi, and welcome back. In this video, we're going to continue studying the relationship between two random variables, X and Y. In particular, we're going to look at the covariance between X and Y, and also the correlation.

Before we get to that, I want to remind you of the example that we studied in the last video. This was the insurance agency that had an automobile policy and a homeowner's policy with various deductibles. You'll recall we had the joint probability table given for homeowner's policy deductibles 0, 100, and 200, and automobile policy deductibles 100 and 250. The six probabilities inside that table were the joint probability mass function. It was given by p(x, y), and that's the probability that X equals x and Y equals y. In this case, you'll recall, X and Y are both discrete.

Now, how do we define the covariance? It's really nice to know about the probabilities, but sometimes we want just a single number to tell us about the relationship between X and Y. When X and Y are not independent, we want to assess how strongly they're related to each other. What we're going to do is define what's called the covariance. The covariance of X and Y is the expected value of the quantity X minus the expected value of X, times the quantity Y minus the expected value of Y. You'll recall the expected value of X and the expected value of Y are just numbers. We're taking the expected value of how much X deviates from its mean times how much Y deviates from its mean. This is the definition, but there are multiple ways to write it. I could write it as the expected value of X minus mu sub x times Y minus mu sub y, where mu sub x and mu sub y are the means of X and Y.

Now, if X and Y are both discrete, how can we write this? Well, let's think about this. We have an expected value out here on the outside. When we have an expected value, we have to sum over all the possible values of the X and the Y on the inside of that expected value. We can write it as the sum over x and the sum over y, where we take all the possible values that X can take on, that would be x minus mu sub x, and all the possible values Y can take on, that would be y minus mu sub y. Then we want the probability that capital X equals x and capital Y equals y, and we just happen to have that: that's the joint probability mass function. As I said, this is for when X and Y are discrete.

Now, what if X and Y are continuous? Well, instead of summations, we're going to do integrals. We integrate over all possible values of x and y. We have the same thing, x minus mu sub x and y minus mu sub y, and then we need the joint density function for x and y, and then we have dx dy. This, you recall, is the joint density function. This is for continuous random variables.

Let's talk a little bit now about what this really means. I've written down the definition again here. We have the discrete case first and the continuous case second. What I want you to observe, and I think it's particularly clear with discrete random variables, is that the covariance depends on both the set of possible pairs (x, y) and the probabilities of those pairs. Pairs with high probability weight the product more heavily.
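Here is a minimal computational sketch of that discrete double sum, applied to the deductible example. The joint table below is reconstructed from the values quoted in this lesson (a couple of cells and the marginal probabilities are stated explicitly; the remaining cells are filled in to match), so treat the individual entries as illustrative.

```python
# A minimal sketch of the discrete covariance formula:
#   Cov(X, Y) = sum over x, y of (x - mu_x)(y - mu_y) p(x, y)
# X is the auto deductible, Y is the homeowner deductible.
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

mu_x = sum(x * p for (x, _), p in pmf.items())  # E[X] = 175.0
mu_y = sum(y * p for (_, y), p in pmf.items())  # E[Y] = 125.0

cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
print(mu_x, mu_y, cov)  # 175.0 125.0 1875.0
```

The printed covariance, 1,875, is the same number we'll work out by hand in a moment.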
Let's look at how the sign of the covariance works. Suppose we're collecting data, and suppose mu_x happens to be here, and maybe mu_y happens to be there. We'll put this as our y-axis and this as our x-axis. Now, if x is bigger than mu_x, it's over on this side, to the right of mu_x, and if y is also bigger than mu_y, then I'm going to get values up here somewhere. For those values, x is going to be greater than mu_x and y is going to be greater than mu_y, so the product will be positive, and the probability that X equals x and Y equals y is also positive. Similarly, if I have values down here, where x is less than its mean and y is less than its mean, then x minus mu_x will be negative and y minus mu_y will also be negative. We've got a negative times a negative, which is positive, just like the positive times a positive up in the first region, and maybe we have a few values over in here as well. What we've got here is a covariance of X and Y that is positive.

Now, what about the other way? Same idea: let's put mu_x here and mu_y there. What if, when our x values are greater than mu_x, our y values are less than mu_y? We're going to get points over in here. And for points up in this quadrant, where x is less than its mean and the y values are greater than their mean, we have a negative times a positive, so that's going to be negative. A system that looks like this is going to have a negative covariance of X and Y, so there's a negative relationship between x and y, and a few values over here won't matter at all.

Then down here, a third option. Here's our mu_x, here's our mu_y. What if the points are just randomly scattered around, so there's no relationship between x and y? In this situation, the covariance of X and Y will be about zero.

Here this is written out. If both variables tend to deviate in the same direction, either both above their means or both below their means at the same time, then the covariance is positive. If the opposite is true, the covariance will be negative. And if X and Y are not strongly linearly related, the covariance will be near zero.

Here's an aside: it is possible to have a strong relationship between x and y and still have a covariance close to zero. I want to give you a visual example of how that might look. Here's our axis again. Let's put our mu_x here, it doesn't matter exactly where, and we can put our mu_y here. What if some of the values were here, some down here, some down here, and some up there? There's a very strong relationship between x and y, except that it's a quadratic relationship, not a linear relationship. The covariance is still going to be approximately zero, because you can see values in all four of the quadrants. When you put all that together in the definition of covariance, you're going to get something close to zero, but there is a strong relationship between x and y; it's just not a linear relationship.
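Here is a quick simulation sketch of that aside, under the assumption that X is uniform on [-1, 1] and Y = X squared; numpy is an assumed dependency, not something from the lesson itself.

```python
# A minimal simulation of the aside above: Y is a deterministic
# (quadratic) function of X, yet the sample covariance is near zero.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)  # symmetric about its mean
y = x ** 2                                # perfectly dependent on x

print(np.cov(x, y)[0, 1])       # near 0: here E[X^3] - E[X] E[X^2] = 0
print(np.corrcoef(x, y)[0, 1])  # also near 0
```

Y is completely determined by X, yet the positive and negative quadrant contributions cancel, which is exactly why covariance only detects linear relationships.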
Let's go back to our example for the homeowner's and automobile policies, and I want to actually calculate the covariance. Just to remind ourselves, this is the definition: we sum over all x and all y of x minus the mean of X, times y minus the mean of Y, times the joint probability mass function.

What do we have to calculate? First we have to calculate the mean of X, and that, you'll recall, is the sum over all possible values x of x times the probability of X equaling x. In this case there are only two possibilities: 100 times 0.5 plus 250 times 0.5. We calculated those probabilities before, and that turns out to be 175 for the mean of X. For Y, we do the same thing, summing y times the probability that Y equals y over all the possibilities: 0 times 0.25 plus 100 times 0.25 plus 200 times 0.5, and that turns out to be 125.

Now, what we really have to do is make a table of values. We can have columns for x and y, then a column for x minus mu sub x, then y minus mu sub y, and then the probability that X equals x and Y equals y. You would, of course, do this on the computer. You can fill in those values. For the pair 100 and 0: x minus its mean gives us minus 75, y minus its mean gives us minus 125, and you'll notice from the table up here that the probability that X equals 100 and Y equals 0 is 0.2. When we multiply these together, you can see we've got a negative number times a negative number, which gives a positive contribution to the covariance calculation, and then we're multiplying that by 0.2. I'll just write another one down: for 250 and 0, this will be 75, this will be minus 125, and the probability here is 0.05. This time one factor is positive and one is negative, so this will be a negative contribution to the covariance, but the probability, 0.05, is much smaller than the 0.2. You just keep going down; you'd have six entries. Try it, and you should get that the covariance between X and Y is 1,875.

Now here's the question: is this a strong relationship between X and Y? It seems like a really big number, but it's hard to say, because all the numbers involved, the deductibles and their means, are big numbers. Is 1,875 big? The answer lies in what's called the correlation coefficient, and we'll get to that calculation in just a moment. But before we get to correlation, there are some other ideas I want to focus on first for covariance.

There's a computational formula for the covariance. What I want you to do is recall the computational formula for the variance. The variance of a single random variable is the expected value of X minus the mean, quantity squared. But we derived a computational formula, and that was the expected value of X squared minus the expected value of X, quantity squared. The computational formula for covariance has very much the same look and feel, and I just want to see how we would get it. The covariance of X and Y is defined to be the expected value of X minus the expected value of X, times Y minus the expected value of Y. If we multiply everything out, we get XY minus Y times the expected value of X, minus X times the expected value of Y, plus the expected value of X times the expected value of Y. Now, the expected value is linear, so I can take the expected value of each one of these four terms. We get the expected value of XY, minus the expected value of Y times the expected value of X, minus the expected value of X times the expected value of Y, plus the expected value of X times the expected value of Y. Here's the thing to remember: the expected values of X and Y appearing inside are just constants, so they factor out of the outer expected value. Once we factor them out, the last three terms each become the expected value of X times the expected value of Y; all three are exactly the same. So one of the negative terms cancels with the positive term, and we are left with the expected value of XY minus the expected value of X times the expected value of Y. That can be an easier computational formula than using the definition all the time.
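Here is a minimal sketch of both the six-entry table and the shortcut formula, using the same reconstructed joint table as before (again, the exact cell values are an illustrative reconstruction).

```python
# The six-entry covariance table, done "on the computer" as suggested above,
# followed by the computational formula Cov(X, Y) = E[XY] - E[X] E[Y].
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
mu_x = sum(x * p for (x, _), p in pmf.items())  # 175.0
mu_y = sum(y * p for (_, y), p in pmf.items())  # 125.0

# One row per (x, y) pair: the contribution (x - mu_x)(y - mu_y) p(x, y).
for (x, y), p in sorted(pmf.items()):
    print(x, y, (x - mu_x) * (y - mu_y) * p)

e_xy = sum(x * y * p for (x, y), p in pmf.items())  # E[XY] = 23750.0
print(e_xy - mu_x * mu_y)                           # 1875.0, as before
```

Note how the positive rows outweigh the negative ones in the printout; that's what makes the total come out positive.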
Another question we really should be thinking about is: what if X and Y are independent? I'm going to illustrate this just for the discrete case; the continuous case is similar. If X and Y are independent, we know that the joint probability mass function is equal to the product of the individual probabilities, so the probability of X equaling x times the probability of Y equaling y, for all possible values x and y. Now, when we go to the covariance calculation, we had the sum over all x and the sum over all y of x minus mu sub x, times y minus mu sub y, times the joint probability mass function. If X and Y are independent, this becomes x minus mu sub x, times y minus mu sub y, times the probability of X equaling x, times the probability of Y equaling y. We can now separate this into two separate summations: the sum over x of x minus mu sub x times the probability of X equaling x, that whole thing times the sum over y of y minus mu sub y times the probability of Y equaling y. Now, what do we get here? Well, we can split the first summation into two pieces: the sum of x times the probability of X equaling x, which is the expected value of X, minus mu sub x times the sum of the probabilities, which is one. So that first factor is the expected value of X minus the expected value of X, which is zero. The second factor, which I won't write out, is similar. If X and Y are independent, the covariance of X and Y is zero. But as we saw with that quadratic relationship between X and Y, this statement does not go the other way: if the covariance of X and Y equals 0, we cannot conclude that X and Y are independent. We just don't know.

We have a few more useful formulas. The first one: if a and b are constants, so we're taking scalar multiples of the random variables, then the expected value of aX plus bY is a times the expected value of X plus b times the expected value of Y. I'm going to let you verify that relationship. The second one, the variance of aX plus bY, we'll do right now. The variance of aX plus bY is, by definition, the expected value of the quantity aX plus bY minus the expected value of aX plus bY, and it's a variance, so we're squaring that. Now we get to use the first formula, so we get aX plus bY minus a times the expected value of X minus b times the expected value of Y, and this whole thing is squared. Let's rewrite this one more time, just to see what's going on. I'm going to group the X things together: we get a times the quantity X minus the expected value of X, plus b times the quantity Y minus the expected value of Y, and now that whole thing is squared. I haven't changed or canceled anything; I've just grouped terms in a little bit different way. Now what I want to do is square this whole thing out. We've got two terms: a times X minus the expected value of X, and b times Y minus the expected value of Y. What we'll get is a squared times the expected value of X minus the expected value of X quantity squared, plus b squared times the expected value of Y minus the expected value of Y quantity squared. The first term corresponds to the X piece being squared, and a squared is a constant, so it comes out in front of the expected value; similarly with the Y piece. Then we have the cross terms, and that's going to be 2ab times the expected value of X minus the expected value of X, times Y minus the expected value of Y. Hopefully you recognize these: a squared times the variance of X for the first piece, plus b squared times the variance of Y for the second, plus 2ab times that last expected value, which is exactly the definition of the covariance of X and Y. We've got a lot of useful relationships involving variance, expected value, and covariance, and it's going to take just a little bit of practice to get comfortable with them.
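Here is a minimal exact check of that identity, Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y), on the reconstructed deductible table, along with the independence fact from a moment ago. The constants a = 2 and b = 3 are arbitrary choices for illustration.

```python
# Exact check of Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y),
# plus the fact that a factoring (independent) pmf gives covariance zero.
pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
mu_x = sum(x * p for (x, _), p in pmf.items())                       # 175.0
mu_y = sum(y * p for (_, y), p in pmf.items())                       # 125.0
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in pmf.items())        # 5625.0
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in pmf.items())        # 6875.0
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())  # 1875.0

a, b = 2.0, 3.0
mu_s = a * mu_x + b * mu_y
var_s = sum((a * x + b * y - mu_s) ** 2 * p for (x, y), p in pmf.items())
print(var_s, a * a * var_x + b * b * var_y + 2 * a * b * cov)  # both 106875.0

# Independence: replace the joint pmf with the product of its marginals,
# and the covariance comes out zero (up to floating-point noise).
px, py = {100: 0.5, 250: 0.5}, {0: 0.25, 100: 0.25, 200: 0.5}
indep = {(x, y): px[x] * py[y] for x in px for y in py}
print(sum((x - mu_x) * (y - mu_y) * p for (x, y), p in indep.items()))
```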
I do have one more definition for you now, and that is the correlation coefficient of X and Y. We usually denote it by Corr(X, Y), or it's denoted by rho with a subscript X, Y. How is it defined? Rho of X, Y is defined as the covariance of X and Y divided by the standard deviation of X times the standard deviation of Y. This actually represents a scaled covariance; that's what it is: the correlation is a scaled covariance, because it is the covariance and we're scaling it by sigma X and sigma Y. Now, it's not so easy to show, and we won't do it here, but it can be shown that the correlation is always a number between minus 1 and 1. As you work with the correlation coefficient here and in future courses, you'll see the covariance can be a very, very big number, but when we scale it by the standard deviations, we always get a number between minus 1 and 1.

We have two special cases that we should look at. Let me write down the definition again: rho X, Y is the covariance of X and Y divided by the standard deviation of X times the standard deviation of Y. First, if X and Y are independent, we saw a few slides ago that the covariance of X and Y is 0, so the correlation coefficient is also 0. If two random variables are independent, both the covariance and the correlation are 0.

What if Y equals aX plus b? This is saying Y is a linear function of the random variable X; they're perfectly linearly related. What we need to do first is calculate the covariance of X and Y, so that's going to be the covariance of X with aX plus b. By the definition, we have X minus the expected value of X, and then aX plus b minus the expected value of aX plus b; that's our Y. You'll notice that when we expand, the b here and this minus b will cancel, and the a will factor out in front, so we end up with a times the expected value of X minus the expected value of X quantity squared, which is a times the variance of X. That's the same as a sigma x squared. The second thing we need is the variance of Y, because we need the variance of Y in order to calculate the standard deviation of Y, so we can put it into the correlation formula. Let's figure out what that is. It's the expected value of Y minus the expected value of Y, quantity squared. We put in aX plus b minus the expected value of aX plus b, quantity squared. Expand all that out and you'll see you end up with a squared times the variance of X, which is a squared sigma x squared. That's the variance of Y. Now, what's the standard deviation of Y? We have to be a little bit careful here, because I want to take the square root of a squared: the square root of a squared sigma x squared is the absolute value of a times sigma x, since sigma x is always a positive number. With that absolute value in hand, when we go to compute the correlation rho X, Y, we put the covariance of X and Y, which we calculated to be a sigma x squared, in the numerator. In the denominator, we have the standard deviation of X times the absolute value of a times the standard deviation of X. The sigma x squared cancels, and a over the absolute value of a gives 1 if a is greater than zero and minus 1 if a is less than zero. In other words, if Y is a linear function of X and a is a positive number, then X and Y are positively related, the slope is positive, and we get a correlation coefficient of 1: they're perfectly linearly related. If a is negative, then the covariance is negative, X and Y are negatively related, and we get a correlation coefficient of minus 1.
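Here is a minimal numerical sketch of both facts: the correlation of the reconstructed deductible table, which previews the calculation we finish below, and the plus-or-minus 1 correlation of an exact linear relationship. numpy and the constants 4.0 and 2.0 are illustrative assumptions, not part of the lesson.

```python
# Correlation as a scaled covariance, and the Y = aX + b special case.
import numpy as np

pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}
mu_x = sum(x * p for (x, _), p in pmf.items())
mu_y = sum(y * p for (_, y), p in pmf.items())
sd_x = sum((x - mu_x) ** 2 * p for (x, _), p in pmf.items()) ** 0.5  # 75.0
sd_y = sum((y - mu_y) ** 2 * p for (_, y), p in pmf.items()) ** 0.5  # sqrt(6875)
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())  # 1875.0
print(cov / (sd_x * sd_y))  # about 0.30

# Y a perfect linear function of X: correlation is +1 or -1 with the sign of a.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
print(np.corrcoef(x, 4.0 * x + 2.0)[0, 1])   # +1 (up to floating point)
print(np.corrcoef(x, -4.0 * x + 2.0)[0, 1])  # -1 (up to floating point)
```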
For our last look at this example, I want to compute the correlation coefficient of X and Y for the automobile policy and homeowner's policy example. Earlier, we computed that the covariance of X and Y is 1,875. The question is: is that big or not? You should verify the following: the expected value of X, which we already calculated, is 175; the expected value of Y is 125; the variance of X turns out to be 75 squared; and the variance of Y turns out to be 6,875. If we put all those numbers together, we get that rho, the correlation coefficient, is 1,875 divided by the square root of 75 squared times the square root of 6,875. This turns out to be about 0.3. So even though our covariance is large, 1,875, the correlation coefficient is only about 0.3. There is some linear relationship between X and Y, but it's not very strong.

In conclusion, in this video we saw the covariance, and then we used the covariance to define the correlation, and the correlation measures the strength of the linear relationship between X and Y. If X and Y are independent, then both the covariance and the correlation are zero. But if you compute a correlation or covariance of zero, you cannot conclude that the variables are independent. In the next module, we'll transition to multiple random variables. We'll see you then.