This is an introduction to your User-User Collaborative Filtering assignment. In this video, I'm going to introduce you to your assignment, give you a few tips, and answer a few questions that we have gotten from people who have done the assignment already. You will see that there is a document that outlines the assignment in detail, and it will walk you through the steps. At a high level, you're going to compute User-User Collaborative Filtering in a spreadsheet: first without normalization, and then with normalization, to see how things differ. The document will take you through the step of finding our rating spreadsheet. This is a spreadsheet that has the data in movie-row form, with movies as rows and user IDs as columns. You'll notice it's an incomplete matrix, where some users have rated lots of movies and some users have rated fewer. We also have this in user-row form, where the users are rows. This would be the transpose. And we've set up for you a place to start computing correlations. The point here is not for you to learn all of the spreadsheet trickery. If you look at the formula for the correlations, you'll be able to work with it and carry it forward. But what I want you to understand here are the steps you're going to be going through and why. So, you're going to take this matrix and open it up in your favorite spreadsheet program. We've tested it in Google Sheets and in Excel, but it should work in any standard spreadsheet. Finish the user correlation matrix. You can do this by observing the formula we've given you for a sample correlation. Then identify the top five neighbors for a couple of target users. For this assignment, we've asked you to use user 3867 and user 89. There's nothing magic about those users, other than the fact that we've pre-programmed the answers. As you go through this, you're looking for the top five neighbors by positive correlation. We're not counting negative correlations here.
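For those who want to check the neighbor-selection step outside the spreadsheet, here is a minimal Python sketch of the idea. It is not the assignment's formula: the ratings are a made-up toy data set, the names `pearson` and `top_neighbors` are hypothetical, and it assumes the correlation is a Pearson correlation computed only over the movies both users have rated. It keeps only positive correlations and leaves the target user out of their own neighbor list.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie_id: rating}); NOT the assignment data.
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 4.0, "m4": 4.0},
    "u2": {"m1": 3.0, "m2": 1.0, "m3": 2.0, "m4": 3.0, "m5": 3.0},
    "u3": {"m1": 4.0, "m2": 3.0, "m3": 4.0, "m4": 3.0, "m5": 5.0},
    "u4": {"m1": 1.0, "m2": 5.0, "m3": 5.0, "m4": 2.0, "m5": 1.0},
}


def pearson(a, b):
    """Pearson correlation over the movies both users rated (an assumption)."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    xs = [a[m] for m in common]
    ys = [b[m] for m in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / sqrt(var_x * var_y)


def top_neighbors(target, all_ratings, k=5):
    """Top-k OTHER users by positive correlation with the target user."""
    sims = [
        (other, pearson(all_ratings[target], all_ratings[other]))
        for other in all_ratings
        if other != target  # a user is never their own neighbor
    ]
    positive = [(user, w) for user, w in sims if w > 0]  # drop negatives
    return sorted(positive, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for user, weight in top_neighbors("u1", ratings):
        print(user, round(weight, 3))
```

With only four toy users, fewer than five neighbors come back; on the real matrix the same logic would return five.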
One key point to be careful about: a user will correlate with him or herself perfectly, and that doesn't count, so user 3867's best neighbor is not 3867. He doesn't have anything to offer himself, so you're looking for the other people who are close. And we've given you some samples for other data, so you can check your calculations. Then create a new page in your spreadsheet and use it to compute the predictions for each movie for each of these two users. You're going to need a sheet for each user. The way you're going to do that is with a weighted combination of the top five neighbors, using the formula we've given you in this course, where we take the sum of each rating times its weight, divided by the sum of the five weights. Some of these may be zeros if the person didn't assign a rating. We'll assign that as the prediction for each of these. Then you're going to submit the top three movies and predictions for each of the users, very much as you would have done if you've taken the first course in this specialization. You'll be asked to type in the movie ID number and then the prediction score to three decimal places. You can go further, we'll compute for you. The second part of the assignment is very similar, except you're going to add in a level of normalization. Now, a couple of things to watch out for, and then responses to a couple of common questions. If you're not familiar with spreadsheets, there are a few nice, handy formulas that you'll find useful. First, there is a built-in function, SUMPRODUCT, that adds up the products of two vectors, those vectors being rows or columns in your spreadsheet. It can therefore compute a dot product, which is an essential part of what you're computing for this example formula. Second, there's a function, SUMIF, that takes a predicate and a range, and adds up all of the values in that range where the predicate is true.
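To make the weighted combination concrete, here is a small Python sketch of the same arithmetic that SUMPRODUCT and SUMIF would do in the spreadsheet. The weights, ratings, and means are made-up illustration values, and the function names are hypothetical. The second function assumes that "normalization" means mean-centering each neighbor's rating and adding the target user's mean back; the assignment document is the authority on the exact formula.

```python
# Hypothetical values for five neighbors; None marks a missing rating.
weights = [0.9, 0.7, 0.5, 0.4, 0.2]           # neighbor correlations
movie_ratings = [4.0, None, 5.0, 3.0, None]   # each neighbor's rating of one movie
neighbor_means = [3.0, 4.0, 4.0, 3.5, 2.0]    # each neighbor's mean rating
target_mean = 3.5                             # the target user's mean rating


def predict(weights, neighbor_ratings):
    """Weighted average over the neighbors who actually rated the movie."""
    rated = [(w, r) for w, r in zip(weights, neighbor_ratings) if r is not None]
    # Numerator: dot product of ratings and weights (the SUMPRODUCT role).
    numerator = sum(w * r for w, r in rated)
    # Denominator: sum of weights where a rating is present (the SUMIF role).
    denominator = sum(w for w, _ in rated)
    if denominator == 0:
        return None  # none of the neighbors rated this movie
    return numerator / denominator


def predict_normalized(target_mean, weights, neighbor_ratings, neighbor_means):
    """Assumed mean-centered variant: average the offsets, then add the target mean."""
    rated = [
        (w, r - m)
        for w, r, m in zip(weights, neighbor_ratings, neighbor_means)
        if r is not None
    ]
    denominator = sum(w for w, _ in rated)
    if denominator == 0:
        return None
    return target_mean + sum(w * offset for w, offset in rated) / denominator


if __name__ == "__main__":
    print(round(predict(weights, movie_ratings), 3))
    print(round(predict_normalized(target_mean, weights, movie_ratings, neighbor_means), 3))
```

Filtering out the missing ratings before summing is what keeps a blank cell from dragging the prediction toward zero.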
Now think about the fact that you want a denominator that is the sum of all the weights where the rating is present, and a numerator that is the dot product of the ratings and the weights. With those two functions, you have the components to do this computation. The last thing that I want to emphasize is a trick with spreadsheets that's really important, which is to understand the different types of pasting in a spreadsheet. You can paste formulas, and you can paste only the values that are the results of those formulas. When you want to take something and have it not change, even if you go back and recompute on another page, it's important to be able to paste values. This is often done in cases where you've already selected a set of data, and now you're going to go use it. Paste values is an operation you can get to in both of the spreadsheets we've talked about by right-clicking or by selecting a paste special operation. Okay, one last item. There have been a couple of questions from people asking, why are you having us compute collaborative filtering, a fairly sophisticated algorithm, in a tool as primitive as a spreadsheet? I want to make it clear why we're doing this, and there are really three reasons. Reason number one is that for those of you who are comfortable with programming, we're encouraging you to do the honors track, where you're also going to do this in Java using the LensKit toolkit. Reason number two is that computing this in a spreadsheet is about as close as you need to get to working it out by hand to really appreciate what the calculations are doing. Unlike going into a statistical programming tool like R, or a mathematical tool like MATLAB or Maple, by doing this in a spreadsheet you force yourself to see the intermediate representations, so you understand where the results are coming from. A big part of what we want you to develop is the intuition for how this algorithm works.
The third part is your way out. If you've done this, you've got a feel for how the algorithms work, and you say, yeah, that's really nice, but I really want to try this in R, or in MATLAB or Mathematica, or some other tool, you're welcome to do that. Make sure that your results on the sample data match the results that we have, and your results will almost certainly match for the test data as well. When you submit, we're not asking you for the spreadsheets. If you want to compute it your own way, you can submit the answers your own way. But we do encourage you to try at least one of them with the spreadsheet, to make sure you understand step by step where the correlations, the normalization, and the selection of neighborhoods come together to form this type of prediction and, later, when we talk about recommendation, where the recommendations come from. So with that, we'll send you off. You have instructions. The answer sheet is formatted as a quiz where you can paste all of your answers when you're done. Good luck, and we look forward to hearing from you.