Hi everyone. In the next several video clips, we'll study the Pandas library. Pandas is a key library in Python. It is important because with Pandas you can pre-process a data set for AI analysis and statistical analysis. The name Pandas comes from "panel data". Then you may ask, what is panel data? Panel data is a data set collected from the same subjects over time. One example is a case in which there are 100 countries and each country's GDP is collected over time. Then you may assign the 100 countries' names to the columns, and each row represents the GDP of those countries in a specific year. So this is a typical example of a panel data set, and Pandas is good for that kind of data handling.

Then what is the benefit of using Pandas? As written here, Pandas can handle mixed data types. All kinds of data can be handled in Pandas, and sometimes there could be missing data. Even when data is missing, Pandas can still handle the data set. Another flexibility provided by Pandas is that it allows label-based indexing. So far, we have indexed elements by integer position, but in a Pandas dataframe you can use a label, such as a string, as an index.

Roughly speaking, there are three types of data structures. The first is the one-dimensional structure, which is called a series. The second is the two-dimensional structure, which is the dataframe proper; we use the term dataframe for two-dimensional arrays. There can also be multi-dimensional data, which is handled with a MultiIndex on the rows. A two-dimensional dataframe can handle all kinds of raw data sets, so this whole course is focused on two-dimensional dataframes.

Now let's study the one-dimensional structure, the series. Before introducing series, let me import a few libraries.
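The panel layout, label-based indexing, and missing-data handling described so far can be sketched as follows. This is only an illustration: the country names, years, and values are made-up placeholders, not real GDP figures, and the variable name gdp is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder values, not real GDP figures.
# Columns are countries, rows are years, as in the panel example.
gdp = pd.DataFrame(
    {"CountryA": [1.0, 1.1, 1.3], "CountryB": [2.0, np.nan, 2.4]},
    index=["2001", "2002", "2003"],
)

# Label-based indexing: pick a row label and a column label.
value = gdp.loc["2002", "CountryA"]
print(value)

# Missing values are skipped by default when computing the mean.
mean_b = gdp["CountryB"].mean()
print(mean_b)
```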
We need the numpy library, and the Pandas library should be imported too, because without importing Pandas we cannot use it; it is not a built-in library in Python. So in this case, import pandas as pd. We are assigning the alias pd to the Pandas library when importing it. Also, for the dataframe and data handling examples, I'm going to use the iris data set. We already installed sklearn, so we import from its built-in datasets module. And how can we get the iris data? Simply, we use the load command, load_iris. Once we import this, we can easily load the iris data set without downloading it from somewhere, because it is already included in the sklearn library. So let's execute this one. The first cell is executed, but it takes a little time because three libraries are imported.

So here's the one-dimensional structure, the series. I used the range function here: the starting number is 80, the ending number is 100, and the step is 2, so the even integers from 80 up to, but not including, 100 are generated. We can also use the shape and ndim attributes, which are used in numpy as well. So we can check the shape and dimension of the grades object and then print its content.

So let me execute the second cell. Here it is. A series is one-dimensional, so shape returns a tuple with only one number: the number of elements, which is 10. This means that the grades object is a one-dimensional data set. And ndim returns 1, because it also reports one dimension. The ten even numbers from 80 to 98 are created when grades is created. As you can see here, a row index from 0 to 9 is assigned automatically. So if you do not specify a row index, integer numbers are automatically assigned to each row. So, how can we get other statistics regarding that object? We can check how many elements are contained in that series.
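The second cell described above might look like the following minimal sketch. The exact way the lecture builds grades is not shown in the transcript, so passing range directly to pd.Series is an assumption. The sklearn import is left as a comment because it is only needed for the later iris examples.

```python
import pandas as pd
# from sklearn.datasets import load_iris  # imported in the lecture for later iris examples

# Even integers from 80 up to, but not including, 100.
grades = pd.Series(range(80, 100, 2))

print(grades.shape)  # (10,)
print(grades.ndim)   # 1
print(grades)        # row index 0..9 is assigned automatically
```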
We use the len function, and 10 is returned. And we apply the describe function to the grades data set; then describe returns descriptive statistics: for example, the count, that is, how many elements are contained in grades, which is 10. The mean value is 89, and the standard deviation, minimum value, and quartile values are here, along with the maximum value. The data type is float64. So if you look here, grades is the data set, describe is applied to it, and the outcome is returned. This is the method-call style of applying a function to a specific object.

In this case, as you see here, there are six decimal places, which is too long. So how can you control the decimal places? Here, you can use pd.set_option with the precision option set to 3. It means that the number of decimal places displayed is limited to 3, so the output changes. Now only three decimal places appear.

Rather than printing the whole set of descriptive statistics, you can selectively choose a few of them. For example, if you apply the mean function as a method, the mean value is returned, and you can obtain the standard deviation and the minimum value by applying those functions in the same way. As you can see here, the same numbers are presented: the mean value is 89, the standard deviation is the same as before, and the minimum grade is 80.

So far, we didn't assign a row index, but of course you can assign one. In this case we create a series using a list. This list contains three numbers, the measured heights of three persons. This time, labels are assigned as the row index. In Pandas, index means the row index. Then what do you call the column index? It is called columns; that is the term for the column index. So index equals a list that contains the persons' last names, and these three last names will be used as the row index. So let's print it. What does it look like? Executing this one returns Kim 175, Kwon 184, Lee 170. So, there are several ways of creating a Pandas dataframe.
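The statistics calls and the labelled series from this part can be sketched as below. The sketch spells the option out as display.precision, which is the full option name; the variable name heights is an assumption, since the transcript does not name the object.

```python
import pandas as pd

grades = pd.Series(range(80, 100, 2))

print(len(grades))  # 10

# Limit displayed decimal places to 3 (full option name used here).
pd.set_option("display.precision", 3)
summary = grades.describe()
print(summary)

# Individual statistics instead of the whole summary.
print(grades.mean(), grades.std(), grades.min())

# A series built from a list, with last names as the row index.
heights = pd.Series([175, 184, 170], index=["Kim", "Kwon", "Lee"])
print(heights)
```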
You can create one from a list, sometimes you can create a dataframe from numpy arrays, and you can also use a dictionary. Also, if there is an index, you can retrieve element values using the index. So here are basically three ways. One is to apply the row index label to the object with dot notation, as you did in method calls. So the Kwon label is used to access the data. The second is the typical way of getting data, the bracket. Square brackets are used to indicate indexes. So in this case the row index is Kwon, and the matching value will be returned. Another way of getting an element value is the integer index. In this case, because the first person is Kim, his index is 0. The second person is Kwon, so his index is 1, and Kwon's element value will be returned. So all three ways return the same number.

Also, you can create a series, the one-dimensional structure, using a dictionary. Here, in pd.Series, the argument to the Series function is a dictionary. The keys are Korea, Japan, and China, and the values are 82, 81, and cn; an integer or a string is used as the value matching each key. And if we print it, what you see is this: Korea 82, Japan 81, China cn. So as you can see here, a dataframe can handle different data types: integers, floating-point numbers, sometimes strings. You can use any kind of data type.

Before closing this video clip, let me give you a review question, true or false. The dimension of a series is 1, true or false? Yes, it is true, because a series contains information on only one variable, so it is a one-dimensional array.
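The three access styles and the dictionary-based series described in this clip can be sketched as follows. The variable names heights and codes are assumptions. For the integer position, the sketch uses .iloc, which is the explicit positional accessor; the lecture may have used plain square brackets with an integer instead.

```python
import pandas as pd

heights = pd.Series([175, 184, 170], index=["Kim", "Kwon", "Lee"])

by_attribute = heights.Kwon    # dot notation with the row label
by_label = heights["Kwon"]     # square brackets with the row label
by_position = heights.iloc[1]  # integer position (Kim is 0, Kwon is 1)
print(by_attribute, by_label, by_position)

# A series built from a dictionary: keys become the row index.
codes = pd.Series({"Korea": 82, "Japan": 81, "China": "cn"})
print(codes)
print(codes.dtype)  # object, because integers and a string are mixed
print(codes.ndim)   # 1
```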