In the last video, we took a look at the least squares method. Mainly what we did there was describe conceptually, and then visualize, least squares as a method for estimating the parameters of a linear regression model, and in the next video we'll actually derive the least squares solution. In this video, our goal is to look at a few results from linear algebra that will let us efficiently derive the least squares solution in the next video.

First, let's start with our setup. We have a matrix X, and X is m by n, so that's m rows and n columns. We also have a vector v, which is n by 1, and another vector y, which is m by 1. These quantities will be used throughout the lemmas.

The first lemma involves only the matrix X: if X is m by n, then X^T X is symmetric. What that means is that if we take the transpose of X^T X, we get X^T X itself back. Remember, symmetry just means that taking the transpose gives you the same thing. And as a reminder, when you take the transpose of a matrix, all you're doing is swapping rows and columns: in X^T, the first row of X becomes the first column, the second row becomes the second column, and so on, so X^T is an n by m matrix. Now, in this lemma we're working with the transpose of X^T X, so to prove it, we should start with (X^T X)^T and actually take the transpose. When we take the transpose of a product of two matrices, we take the transpose of the matrix on the right of the product and put it on the left. So the first thing we'll do is take the transpose of X.
That's the matrix on the right, transposed and put first. Then we work from right to left (there are only two factors here), so the next step is to take the transpose of X^T, which we write by putting X^T in parentheses and transposing it: (X^T)^T. To simplify this further, notice that for any matrix, taking the transpose of the transpose gives the matrix itself back: if we swap rows and columns and then swap them again, we get the original matrix. So we're left with X^T X, which is exactly what symmetry means: (X^T X)^T = X^T X. We've proven the first lemma.

Now, the next result starts with an equation, y = Xv: y is equal to the matrix X times the vector v. Let's just make sure the dimensions work out. y, we said, is m by 1; X is a matrix that's m by n; and v is n by 1. Remember, when we multiply two matrices, or a matrix and a vector, we need the inner dimensions to match, and the result has the size of the outer dimensions. So we do have the right dimensions here. This result says that if y = Xv, then the derivative of y with respect to the vector v is just the matrix X. And similarly, if we take the derivative of y^T with respect to v, we get X^T. Now, I won't do a full rigorous proof here, but the idea for the proof is as follows: write out the i-th equation in this system of linear equations. The i-th equation says y_i equals the first coefficient in the i-th row of X, which we'll call x_i1, times the first component of v, plus x_i2.
That's the second coefficient in the i-th equation, times v_2, and we continue all the way through x_in times v_n, since there are n variables, one per column of X. So the i-th equation is y_i = x_i1 v_1 + x_i2 v_2 + ... + x_in v_n. Now, this is just one equation in the system of linear equations. Think about what you would get if you took all of the partial derivatives here: if you take the derivative with respect to v_1, all of the other terms are zero, and from the first term you're just left with the constant x_i1. If you move on to the derivative with respect to v_2, you're just left with that constant, x_i2, and all the others are zero, and so on. So you just get the constants out, and if you iterate through i, where i ranges from 1 through m, you get exactly the matrix X back. So again, this is the idea for the proof. You could make the argument more formal, but for our purposes, hopefully this gives you a sense of why the result is true. And remember, our overarching goal is to understand these linear algebra results well enough to derive the least squares solution. At some point in that derivation we'll have an equation of this form, and we'll need to take the derivative with respect to a vector; this is how you do it: the derivative with respect to the vector v is just the matrix X. A very similar proof idea gives the result for the transpose, that the derivative of y^T with respect to v is X^T.

Now, our third lemma is about a quadratic form. It says that if we have c = v^T X^T X v, the vector v transposed, times the matrix X^T X, times the vector v, then the derivative of c with respect to v is 2 X^T X v, two times the matrix times v. Again, let's just do a quick dimension analysis. If X is m by n, then X^T here is n by m, and X is m by n, so X^T X is a square matrix of size n by n.
Then v is a vector of size n by 1, so those inner dimensions match; we need that in order for this to be a well-formed expression. And v^T is 1 by n, so again the inner dimensions match. So this is all well formed, and notice what we get out: a 1 by 1. That means c is 1 by 1, which is just a scalar, a constant. So this lemma says that if we take the derivative of c with respect to the vector v, we should get 2 X^T X v, two times the matrix X^T X times v. Now, this third lemma we won't prove, but I'll leave it to you as a problem that you can think about and try to prove on your own. And now that we have some important linear algebra results, let's move on to the next video, where we'll derive the least squares solution.
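All three lemmas above can be sanity-checked numerically. Here's a minimal sketch using NumPy; the sizes m = 4, n = 3 and the random X and v are purely illustrative, and the finite-difference loops approximate the derivatives the lemmas describe:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3                        # illustrative sizes: X is m by n
X = rng.standard_normal((m, n))
v = rng.standard_normal(n)

# Lemma 1: X^T X is symmetric, i.e. (X^T X)^T = X^T X.
XtX = X.T @ X
assert np.allclose(XtX.T, XtX)

# Lemma 2: if y = X v, the derivative (Jacobian) of y with respect
# to v is X. Approximate column j by nudging v_j a little.
eps = 1e-6
jac = np.zeros((m, n))
for j in range(n):
    v_plus = v.copy()
    v_plus[j] += eps
    jac[:, j] = (X @ v_plus - X @ v) / eps
assert np.allclose(jac, X, atol=1e-4)

# Lemma 3: if c = v^T X^T X v, the derivative of c with respect
# to v is 2 X^T X v.
def c(w):
    return w @ XtX @ w

grad = np.zeros(n)
for j in range(n):
    w = v.copy()
    w[j] += eps
    grad[j] = (c(w) - c(v)) / eps
assert np.allclose(grad, 2 * XtX @ v, atol=1e-3)

print("all three lemmas check out numerically")
```

The finite differences only approximate the derivatives, which is why the comparisons use loose absolute tolerances rather than exact equality.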