This is a video to accompany the matrix factorization assignment. Now, I know many of you are excited to get right into matrix factorization techniques, but for some of you that may be a big jump. So we'll start with an easier step. Let's start with a riddle. What do you get when you cross a mosquito with a mountain climber? You got me. You can't cross a vector with a scalar. If you don't like vector and scalar humor, you may not like matrix factorization, but we are going to make it easy for you. We have already taken a ratings matrix and factored it, and we're going to give you a spreadsheet that has everything pre-computed except the final answers. That way you can play around with it and actually experience what goes on in that spreadsheet when you look at the individual factors, the weights associated with those factors across the model, and the weights that each movie and each user has for each factor.

So, let's dive in. I have a spreadsheet in front of me, and it actually has two sheets. The first sheet has two sets of data in it. If you look across the top, you'll see numbers. These are the 15 latent factors that come from factoring a movie ratings matrix. If you look down the side, you're going to see movie IDs, starting in row 4 and going way down to row 103. These are a hundred movies that we've put in here. Mostly popular movies you may have heard of, like Star Wars or American Beauty, and some you may not have heard of. What matters is that there's a wide variety of movies in here with widely different expressions of taste. This second row represents the weights of these individual latent factors.
These are the singular values from the singular value decomposition that we computed. I'm going to take you back to the document that you're going to see, which points out that our way of scoring is going to be the user weight for the feature (the degree to which the user's taste is matched by that feature) times the overall weight times the item weight for the feature. Just to make sure this is really clear: any time I want to know how much a user likes a particular item, I'm going to need this weight, times how much this feature is expressed in, let's say, Forrest Gump (in this case, minus 0.15), times how much this feature is expressed in the user's preferences. And that can range from very negative to very positive. So that jumps ahead to the other items in this spreadsheet.

The 1,500 numbers in the middle are the coefficients, the weights that describe each movie in terms of the 15 dimensions of taste. So I'm just going to go across the top and show that, whatever dimension one is, Star Wars is somewhat negative in that dimension. If I keep going to dimension six, whatever that is, it's not as important overall, but still important, and Star Wars is very heavily in this dimension, with 0.27. Now, it would be a little misleading to compare these across columns, because these weights carry within them the fact that the singular value already has some scaling in it. We're not going to ask you to compare them across columns. Similarly, if I go to my users, again, for all 15 dimensions, I can find a coefficient that describes each and every user.

So, here's what we're going to ask you to do, and we'll have you submit these in the same quiz forms that we've been using for these assignments in the past. We're going to have you go through certain items, or I should say certain taste dimensions, and assess which items are most representative of that taste. Some basic sorting in the spreadsheet is the easiest way to do that.
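For anyone who would rather see the scoring rule and the sorting step outside the spreadsheet, here is a minimal sketch in Python. It assumes a toy model with three dimensions instead of 15, and every number in it is invented for illustration; the function names `score` and `most_representative` are hypothetical too. It also assumes that "most representative" means the largest coefficient in that column, which you should check against the assignment.

```python
# Toy stand-ins for the spreadsheet: 3 latent dimensions instead of 15.
# All numbers below are invented for illustration, not taken from the real sheet.
factor_weights = [203.0, 95.0, 60.0]  # hypothetical singular values (row 2)

item_weights = {  # hypothetical per-movie coefficients, one per dimension
    "Star Wars": [-0.05, 0.27, 0.10],
    "Forrest Gump": [-0.15, 0.02, -0.08],
    "American Beauty": [0.12, -0.20, 0.05],
}

user_weights = [0.03, -0.01, 0.02]  # hypothetical coefficients for one user


def score(user, item, weights):
    """Add up user weight * overall weight * item weight over all dimensions."""
    return sum(u * w * i for u, w, i in zip(user, weights, item))


def most_representative(items, dim, top_n=2):
    """Return movie titles sorted by their coefficient in one dimension, largest first."""
    ranked = sorted(items, key=lambda title: items[title][dim], reverse=True)
    return ranked[:top_n]


forrest_gump_score = score(user_weights, item_weights["Forrest Gump"], factor_weights)
top_in_second_dimension = most_representative(item_weights, dim=1)
```

Sorting on a single column here plays the same role as sorting the hundred movie rows by one factor column in the spreadsheet.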
When you sort, I will remind you to be careful to sort the whole set of these hundred rows, but not the rows that are here. If you do that, you can sort them according to a particular column and get the values back in the order you want.

The second thing we're going to ask you to do is to compute a set of predictions for users that are listed here. We're not telling you which users, because we're going to change that over time, but you'll see it in the assignment. So let's just say, at random, I said user 4768. We're going to ask you to predict how much this user likes each of the hundred movies, and then come back and tell us what the top five movies or the top three movies are for this user. Again, when you're doing that, you're multiplying this value times the weight times the corresponding value for the movie, and then adding up all 15 of those. And so you'll probably find the SUMPRODUCT function to be quite useful.

One last note, and you don't need this to do the assignment, but it's something you may want to think about if you want to play around in the spreadsheet. In this representation, we're pulling out the singular value, or the weight, and making that explicit. There is another way that this is commonly done, and that involves taking this number, say 203, computing the square root, and multiplying all of the weights in both the movies and the users by that square root. You can see that if I do that, the products are going to be the same, but then I don't need that third set of numbers hanging around, and it sometimes makes the numbers just a little bit more comparable across dimensions. If you want to play around with that, this matrix is here for you. It's based on real data, and you're more than welcome to explore it. Good luck with the assignment. If you have questions, you know you'll find help in the forums.
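If it helps to see the prediction step and the square-root variant side by side, here is a minimal sketch in Python. It assumes a toy model with three dimensions and three movies, and all of the coefficients are invented for illustration. Only the value 203 echoes the example above. The names `predict`, `top_n`, and `fold_sqrt` are hypothetical helpers, not anything from the assignment.

```python
import math

# Toy stand-ins for the spreadsheet; all coefficients are invented for illustration.
factor_weights = [203.0, 95.0, 60.0]  # hypothetical singular values

item_weights = {
    "Star Wars": [-0.05, 0.27, 0.10],
    "Forrest Gump": [-0.15, 0.02, -0.08],
    "American Beauty": [0.12, -0.20, 0.05],
}

user_weights = [0.03, -0.01, 0.02]  # hypothetical coefficients for one user


def predict(user, item, weights):
    """Three-way sum of products: user weight * overall weight * item weight."""
    return sum(u * w * i for u, w, i in zip(user, weights, item))


def top_n(user, items, weights, n):
    """Titles of the n movies with the highest predicted score for this user."""
    ranked = sorted(
        items, key=lambda title: predict(user, items[title], weights), reverse=True
    )
    return ranked[:n]


def fold_sqrt(vector, weights):
    """Multiply each coefficient by the square root of its dimension's weight."""
    return [v * math.sqrt(w) for v, w in zip(vector, weights)]


# Square-root variant: the weights are absorbed into both sides,
# so a plain two-way sum of products gives the same score.
scaled_user = fold_sqrt(user_weights, factor_weights)
scaled_items = {title: fold_sqrt(v, factor_weights) for title, v in item_weights.items()}


def predict_scaled(user, item):
    """Two-way sum of products on the rescaled coefficients."""
    return sum(u * i for u, i in zip(user, item))


best_two = top_n(user_weights, item_weights, factor_weights, n=2)
```

The two versions agree because the square root of the weight appears once on the user side and once on the movie side, and the two square roots multiply back to the original weight in every dimension.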