So in today's lecture, we're going to introduce the NumPy library, which is a very useful Python library for manipulating array data. And we're going to see how we can use this library to perform simple matrix operations on some of our datasets.
So the first thing we'll do is load our data. In this case, we'll look at the Yelp data. And what we'll do in this code is just load the first 50,000 lines. You might have noticed this dataset is a fairly large one. If you try loading the whole thing, it's going to be fairly slow. So to play with it today, we'll just run the json.loads function on each of the first 50,000 lines and append them all to a dataset. I'll just look at the first entry in the dataset to remind you what kind of fields it contains. It contains things like the number of cool, funny, and useful votes this review has received, as well as the star rating of the review itself.
Okay, so we've seen previously how we can extract some basic statistics from that dataset. So let's first extract a few numerical features. In this case, we'll take the list of ratings all the reviews have received, the list of cool votes they've received, the list of funny votes, and the list of useful votes. So all we're doing here is building four lists. Each is going to be of length 50,000, containing the star ratings, the cool votes, the funny votes, and the useful votes. So for the moment, these are just lists, and we could already compute simple statistics on those lists. But it might be easier if we first convert them to NumPy arrays and use some of the routines that NumPy provides. So to convert them to arrays, all we use is the numpy.array constructor on these four lists. And that's going to give us four NumPy arrays containing the same data.
Okay, so for the most part, these NumPy arrays are fairly similar to regular Python lists, but they're going to support a variety of additional operations, like statistical operations.
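The loading and conversion steps described so far might look roughly like the following. This is only a sketch: the real Yelp file isn't available here, so a tiny in-memory stand-in is used, and the field names ("stars", "votes", "cool", "funny", "useful") are assumptions for illustration rather than something the lecture confirms.

```python
import io
import itertools
import json

import numpy

# Hypothetical stand-in for the Yelp review file: one JSON object per line.
# The field names used here are assumed, not taken from the lecture.
sample_file = io.StringIO(
    '{"stars": 5, "votes": {"cool": 2, "funny": 0, "useful": 5}}\n'
    '{"stars": 3, "votes": {"cool": 0, "funny": 1, "useful": 2}}\n'
    '{"stars": 4, "votes": {"cool": 1, "funny": 3, "useful": 0}}\n'
)

MAX_LINES = 50000  # the lecture reads only the first 50,000 lines

# Run json.loads on each of the first MAX_LINES lines and collect the results.
dataset = []
for line in itertools.islice(sample_file, MAX_LINES):
    dataset.append(json.loads(line))

# Build four plain Python lists of numerical features.
ratings = [d["stars"] for d in dataset]
cool = [d["votes"]["cool"] for d in dataset]
funny = [d["votes"]["funny"] for d in dataset]
useful = [d["votes"]["useful"] for d in dataset]

# Convert each list to a NumPy array containing the same data.
ratings = numpy.array(ratings)
cool = numpy.array(cool)
funny = numpy.array(funny)
useful = numpy.array(useful)
```

With a real file, you would replace sample_file with an opened file handle; itertools.islice stops after MAX_LINES lines without reading the rest.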
So for example, to compute the mean, that's now very straightforward. We can use the mean operation that NumPy provides on this ratings array, and it will give us the mean. Var will give us the variance.
We also have more complex operators on arrays, like composition operators. If we wanted to build an N-dimensional array, we could use this numpy.stack operator. It takes a list of arrays as input. In this case, we'll give it the list of the cool, funny, and useful arrays. And it will then build a single 2D array, which is going to have 3 rows and 50,000 columns, containing all of the cool, funny, and useful votes. Okay, so once we have this array, we'll be able to use it to perform other matrix operations, like computing the transpose, etc. So imagine we wanted to build a feature matrix for later use. We would do something like the following. We would run the stack operation, and then we can compute its transpose by using the .T attribute on the array. That will take us from having a 3 by 50,000 array to having a 50,000 by 3 array.
Okay, so let's next look at the matrix type. Note that so far, with the array type, most of the operators are overloaded as elementwise operations. But for many linear algebra routines, it will be more convenient to use the matrix type, where the operators are overloaded as the regular matrix operations. So if we convert our numpy.array of features to a numpy.matrix of features, we can then run operations like standard matrix multiplication. So features.T, the transpose of features, multiplied by features would be a matrix-matrix operation. That takes a 3 by 50,000 matrix and multiplies it by a 50,000 by 3 matrix to give us a 3 by 3 output. We can also do things like the matrix inverse.
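The statistics, stacking, transpose, and matrix product described above can be sketched as follows. The four small arrays are made-up stand-ins for the 50,000-entry arrays in the lecture. Note that current NumPy documentation no longer recommends numpy.matrix for new code, so the sketch also shows the equivalent @ operator on plain arrays.

```python
import numpy

# Made-up stand-ins for the lecture's 50,000-entry arrays.
ratings = numpy.array([5, 3, 4, 4])
cool = numpy.array([2, 0, 1, 1])
funny = numpy.array([0, 1, 3, 0])
useful = numpy.array([5, 2, 0, 1])

# Basic statistics on a single array.
mean_rating = ratings.mean()  # 4.0
var_rating = ratings.var()    # 0.5 (population variance)

# Stack three 1D arrays into one 2D array: 3 rows, one column per review.
stacked = numpy.stack([cool, funny, useful])  # shape (3, 4)

# Transpose to get one row per review and one column per feature.
features = stacked.T  # shape (4, 3)

# With the matrix type, * means matrix multiplication.
features_m = numpy.matrix(features)
gram = features_m.T * features_m  # (3 x 4) times (4 x 3) gives 3 x 3

# The same product on plain arrays uses the @ operator.
gram_array = features.T @ features
```

On plain arrays, features.T * features would instead attempt an elementwise product, which is why the lecture switches to the matrix type for this step.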
So we'd use numpy.linalg.inv to compute the inverse of the 3 by 3 matrix above.
Finally, NumPy overloads primitive operations on matrices, allowing them to be used within complex mathematical expressions, so we can perform simple transformations of our data, like the following. Here, we see in line 19, we can run the numpy.sin operation on our features. That's going to apply the sine function elementwise to every entry in our features. We can add 3, which will elementwise add 3 to everything. We can multiply by 2, which will elementwise multiply everything by 2. We can also perform comparisons, so we can see which of our elements are greater than 4. And that's going to give us a matrix of true or false values saying which of the elements in that comparison are greater than 4.
Okay, so that's just a very brief taste of the kind of operations the NumPy library offers. There are a huge number of others. For example, .shape, to get the shape of an array. Reshape, to change the dimensions of an array or matrix. Arange, to create arrays of ranges of numbers, much like Python's regular range function. Numpy.random, which generates arrays of random numbers. Various reduction operations like sum, min, and max, which will get the sum, the minimum, or the maximum of all the entries in an array. Eye, which produces identity matrices. Linear algebra operations, like getting the trace of a matrix, or the eigenvalues and eigenvectors of a matrix. And a whole lot more. So you should look at the NumPy documentation provided in this URL to get an idea of the kind of functions that NumPy provides.
Okay, so to summarize today's lecture, we've just briefly introduced the NumPy library and demonstrated a few of the basic operations for data manipulation. Okay, so on your own, I would suggest trying to read some of the numerical features from the Amazon data into a NumPy array, and then computing basic statistics about them, such as the max, min, or average values in those arrays.
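As a starting point for that exercise, here is a small recap sketch of the inverse, the elementwise operations, and a few of the other routines mentioned in this lecture. The feature values are invented for illustration; with the Amazon data you would fill the array from the fields you choose to read.

```python
import numpy

# Invented 4-by-3 feature matrix (rows are reviews, columns are features).
features = numpy.array([[2, 0, 5],
                        [0, 1, 2],
                        [1, 3, 0],
                        [1, 0, 1]])

# 3-by-3 product and its inverse.
gram = features.T @ features
gram_inv = numpy.linalg.inv(gram)

# Elementwise transformations and comparisons.
sines = numpy.sin(features)  # sine of every entry
shifted = features + 3       # adds 3 to every entry
doubled = features * 2       # multiplies every entry by 2
mask = features > 4          # boolean array, True where an entry exceeds 4

# Shape helpers and array constructors.
n_rows, n_cols = features.shape  # (4, 3)
flat = features.reshape(12)      # same 12 entries as a 1D array
counts = numpy.arange(5)         # array holding 0, 1, 2, 3, 4
identity = numpy.eye(3)          # 3-by-3 identity matrix
noise = numpy.random.rand(3)     # 3 random numbers in [0, 1)

# Reductions: the basic statistics suggested for the exercise.
total = features.sum()     # 16
smallest = features.min()  # 0
largest = features.max()   # 5
average = features.mean()

# Linear algebra routines.
trace = numpy.trace(gram)  # 46
eigenvalues, eigenvectors = numpy.linalg.eig(gram)
```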