Hi, everyone. Welcome, and thanks for joining us in this course. We're going to talk about data integration on the Alibaba Cloud Big Data platform. The subtitle is "From OSS and RDS to MaxCompute," which means we have some data in OSS and RDS, and we're going to do data integration: collecting all of that data into MaxCompute for further analysis. Deep knowledge of OSS and RDS is beyond the scope of this course, but I can give you a simple, general explanation. You can consider OSS a distributed file system: you can store all your data on OSS. RDS is a relational database service: you can store structured data in RDS. The main purpose of this course is to collect the data in OSS and RDS into the Alibaba Cloud data processing engine, which is MaxCompute. We will also introduce how to use MaxCompute, along with some basic concepts and its architecture. Okay, let's get started.

These are the objectives of this course. The first one is data integration. In this part, we're going to talk about the different data types, like structured, unstructured, and semi-structured data, and we have summarized those different scenarios into a general process of data integration. The second part is MaxCompute, where we introduce some MaxCompute basic concepts and architecture. The third part is about DataWorks. You can consider DataWorks the IDE for MaxCompute; IDE stands for integrated development environment. Basically, we're using the Data Integration component of DataWorks to do today's data integration task. The last part will be the demonstration. First we're going to log on to the OSS and RDS products to see the data we store in those different products, and then we're going to do the data integration with DataWorks and MaxCompute to finish our task for this course.

Okay, let's see. The first part, data integration, mainly involves two things, as we just introduced: data sources and data types. Generally, we can separate all the data into three kinds: structured, unstructured, and semi-structured data. Structured data is basically stored in a traditional database, like an RDBMS: Oracle, MySQL, SQL Server, PostgreSQL. Unstructured data stands for things like text documents, pictures, video, and audio. Semi-structured data has some metadata in it to describe its format; it is something like logs, XML, and JSON files. We're going to collect all those kinds of data using different methods, and this process is called data integration.

Okay, let's take a look at the dataset we're going to use in this course. On the left-hand side is a table schema, which is in RDS, the relational database. You can see the table name is ip_address_to_location_db5. It means that when we give it an IP address, it can return which country this IP address comes from, even the region and city, and even a more precise location, like latitude and longitude information. On the right-hand side is the unstructured data we're going to use in this course: web click-stream data. It is about 60,000 lines in total in our test dataset. As for what each item in one line of the log means, we have another course, named "Analyze the Log Data on the Alibaba Big Data Platform," that you can take for reference, or you can search on the website with keywords like "NGINX web log" or "HTTP web log."
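To make the log format concrete before we go on, here is a minimal sketch in Python that parses one line of a combined-format HTTP access log. The sample line and the field names are invented for illustration; they are not taken from the actual course dataset.

```python
import re

# Typical combined-format access-log fields: client IP, timestamp,
# request (method, path, protocol), HTTP status code, response size.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# Illustrative sample line only -- not from the course dataset.
sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] "GET /index.html HTTP/1.1" 301 512'

match = LOG_PATTERN.match(sample)
if match:
    fields = match.groupdict()
    print(fields['ip'])         # client IP address, e.g. 203.0.113.7
    print(fields['timestamp'])  # request timestamp
    print(fields['method'])     # request type: GET, POST, ...
    print(fields['status'])     # HTTP status code, e.g. 301
```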
The references you find there have detailed explanations of what each field means. The first item is the IP address, then the timestamp information, then the request type (GET or POST) and the protocol used. The next item, as in the first line, is a number such as 301, which stands for the HTTP status code; each value stands for something different, and you can look them up on the website, so we're not going to go into the details of the web log data here. So: we're going to load the structured data into RDS, and those log files are stored in OSS. Later on, when we have finished the data integration, we will use this dataset to do some data analysis.

The next part is the general process of data integration. According to the different business scenarios, we can divide it into three workflows. The first one is how to deal with offline data, the second one is streaming data, and the third one is how to do real-time data analysis. We are only focusing on the first one: how to deal with offline data. Remember, we have the structured data stored in RDS, the database, and the unstructured data stored on OSS, the Object Storage Service provided by Alibaba. We are only focusing on data acquisition from the sources, but after that, we have some other courses that describe how to do data scrubbing and EDA, that is, the statistics and modeling, and how to load the results back into a database and use some BI tools for data visualization to gain better insight.

For streaming data, we have a different architecture and use different data processing tools: we're no longer using MaxCompute, because MaxCompute mainly deals with offline data analysis; instead we use StreamCompute, a stream processing engine provided on the Alibaba big data platform. So that is basically how those three types of data have different architectures to handle them. For further information, we have corresponding courses to explain each one of those.

This chart shows how we deploy offline data analysis on the Alibaba Big Data Cloud platform. We have some applications installed on ECS; you can simply consider them Linux systems, or perhaps a database server or web server, something like that. From those servers we extract data, which is divided into unstructured and structured data. You can see there are two arrows pointing down: one going to OSS and the other going to RDS. You may assume that that's where we are now: we have already got the data stored on OSS, that is, the web log data, and in RDS, that is, the IP address database. What we are going to do is extract those data into MaxCompute, as the blue arrows indicate.
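In the demonstration we configure this through the DataWorks GUI, but conceptually the synchronization job that Data Integration runs is a reader/writer pair in the DataX style. Here is a rough sketch, written as a Python dict; every concrete value (user, host, database, column names, project, and table names) is a placeholder assumption, not taken from the course environment.

```python
import json

# Conceptual sketch of a DataX-style sync job: read the IP-location
# table from RDS (MySQL) and write it into a MaxCompute (ODPS) table.
# All names below are illustrative placeholders.
sync_job = {
    "job": {
        "content": [{
            "reader": {                       # source: the RDS (MySQL) table
                "name": "mysqlreader",
                "parameter": {
                    "username": "<rds-user>",
                    "password": "<rds-password>",
                    "column": ["ip_from", "ip_to", "country", "region",
                               "city", "latitude", "longitude"],
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://<rds-host>:3306/<database>"],
                        "table": ["ip_address_to_location_db5"],
                    }],
                },
            },
            "writer": {                       # sink: the MaxCompute table
                "name": "odpswriter",
                "parameter": {
                    "project": "<maxcompute-project>",
                    "table": "ods_ip_address_to_location",
                    "column": ["ip_from", "ip_to", "country", "region",
                               "city", "latitude", "longitude"],
                },
            },
        }],
        "setting": {"speed": {"channel": 1}},
    }
}

print(json.dumps(sync_job, indent=2))
```

A second, analogous job with an OSS reader would bring the web log files across in the same way; in the demonstration, both are set up visually in the Data Integration component rather than written by hand.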