Hello everyone. Welcome to the Alibaba Cloud ACT Certification Exam Preparation online course: application monitoring. In this chapter, we will focus on the following products: CloudMonitor, Log Service, and Auto Scaling. At the very beginning, we will introduce the relationship between metrics, tracing, and logging. Then we introduce these products one by one, starting from CloudMonitor, to Log Service, to Tracing Analysis, and then we talk about how you can handle data shared between different applications while you are using Auto Scaling. When you are running an application to serve your business, you may want to collect application data very thoroughly, which covers three major directions. One is metrics. Metrics are like when you are monitoring your ECS: you want to know the CPU usage right now, you want to know the I/O. Those are the metrics you want to collect. Another thing you want to look into is the tracing capability. If you have a complex application, or an application built on a microservice framework where one service is calling another service, you really want to look into the call chain to find out which part is actually causing a problem, or to look into some performance issue. Tracing tools or services can help you solve that. The third angle you want to look into is logging. Logging means you want to look into the history of the application's behavior, whatever it has done in the past, so that for root-cause purposes you know who, where, and what really caused a failure. Metrics, tracing, and logging are the three major aspects if you really want to collect application data thoroughly. Alibaba CloudMonitor is mainly used to collect metrics, Tracing Analysis is used for application runtime trace-back, and Log Service, which is also called Simple Log Service, is used to collect historical logs.
Let's look into the first product, CloudMonitor. As we mentioned before, CloudMonitor is a very classic service in Alibaba Cloud. It can be used to monitor a lot of things, not only hosts but also the other cloud services. There is also a dedicated menu for applications: you can monitor application availability and some other metrics by putting resources into different groups. CloudMonitor also provides event monitoring and custom monitoring. For whatever you think is critical, you can define an alarm and send a notification to different channels, so that the operator can receive the message as soon as possible. Now let's look at one example question for CloudMonitor. Which of the following options correctly describe CloudMonitor custom monitoring and custom events? The keywords here are custom monitoring and custom events. We can tell from the options that two of them are talking about custom events and the other two about custom monitoring, so we probably have to pick one correct answer from each pair. Option A: custom events are used for the collection of non-continuous, event-type data, queries, and alarms. Option B: custom events are for the periodic and continuous collection of time-series monitoring data, queries, and alarms. With custom events, you define some event and report it whenever it occurs, and you can generate alarms based on the event type; so compared with option B, option A is the correct one. For C and D: custom monitoring is used for the periodic and continuous collection of time-series monitoring data, queries, and alarms, versus custom monitoring is used for the collection of non-continuous events. Since we are talking about monitoring, we want to monitor in a continuous way, so I would pick option C. The answer is A and C. Now, let's look into application logging using Log Service.
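To make that distinction concrete, here is a minimal, illustrative Python sketch, not the real CloudMonitor SDK; all names and data shapes here are hypothetical. It contrasts custom monitoring (periodic, continuous time-series samples) with custom events (non-continuous, discrete records reported only when something happens):

```python
import time

# Custom monitoring: periodic, continuous time-series samples,
# each one a (metric name, timestamp, value) record taken on a schedule.
def collect_metric_samples(read_value, metric_name, interval_s, count):
    samples = []
    for _ in range(count):
        samples.append({"metric": metric_name,
                        "timestamp": time.time(),
                        "value": read_value()})
        time.sleep(interval_s)
    return samples

# Custom events: non-continuous, discrete records pushed ad hoc
# when something happens (a deploy, an error, a restart).
def report_event(event_log, name, content):
    event_log.append({"event": name,
                      "timestamp": time.time(),
                      "content": content})

# Metrics are sampled on a fixed interval; events arrive irregularly.
metrics = collect_metric_samples(lambda: 0.42, "cpu_usage", 0.01, 3)
events = []
report_event(events, "app_restart", {"host": "ecs-01"})
```

The point of the sketch is only the shape of the data: monitoring produces a steady stream you can alarm on thresholds, while events are individual occurrences you alarm on presence.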
Here I've just listed some of the big scenarios Log Service can serve. It is actually a one-stop service for log data. Log Service brings the four requirements of log management together: log collection, log streaming, log search, and log shipping. Looking at the structure of Log Service, it is mainly composed of three components: LogHub, LogSearch, and LogShipper. First, we collect log data using LogHub from ECS, containers, mobile terminals, open-source software, and a lot of other sources. You can also use LogHub to consume the log data through its real-time interfaces. Then, we can analyze the log data using LogSearch, which provides terabyte-scale log analysis capability: we can index, query, and analyze log data in real time. In addition to that analysis, you can use LogShipper, a stable and reliable log-shipping function, to ship the log data to a storage service like OSS, or even transfer the data to MaxCompute or ADB for further analysis. A sample question for Log Service is the following. A developer accesses logs in Log Service via the API, and the error code returned by the server is 404; which could be the possible reason? The first option is that the log project does not exist; the second is that the request's digital signature does not match; then server internal error; and server is busy. Even if you haven't hit this problem before, from the error code 404 you can tell that something does not exist on the server side. The correct answer is: the project does not exist. Now, let's look into application runtime tracking, which is mainly addressed by a product called Tracing Analysis. Tracing Analysis provides a set of tools for you to develop distributed applications. These tools include trace mapping, request statistics, trace topology, and application dependency analysis.
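To make the 404 reasoning concrete, here is a small illustrative sketch of mapping API status codes to likely causes. Only the 404 entry is the point of the exam question; the other entries follow common HTTP/API conventions and are assumptions, not quotes from the Log Service reference:

```python
# Illustrative mapping from HTTP status codes returned by a log API
# to likely causes. 404 -> "resource does not exist" is the key fact
# from the question; the other codes are typical HTTP conventions
# (assumed here, not taken from the Log Service documentation).
LIKELY_CAUSE = {
    404: "requested resource (e.g. the log project) does not exist",
    401: "authentication failed (e.g. signature mismatch)",  # assumed
    500: "server internal error",
    503: "server is busy",
}

def diagnose(status_code):
    """Return a human-readable guess at what a status code means."""
    return LIKELY_CAUSE.get(status_code, "unknown error")
```

This is exactly the mental table the speaker uses: a 4xx "not found" code points at a missing resource, so only the "project does not exist" option fits a 404.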
You can use these tools to analyze and diagnose performance bottlenecks in a distributed application architecture, making microservice development and diagnostics more efficient. As the picture shows, I have listed some major scenarios that TA (Tracing Analysis) can serve. The first one is query and diagnostics of distributed traces. TA can collect all user requests across the microservices in a distributed architecture and summarize these requests into a distributed trace for query and diagnostics. Another scenario is real-time collection of application performance data: TA can collect all user requests of an application and analyze them in real time. TA can also do dynamic discovery of distributed topologies: it can collect distributed call information from all your distributed microservices and the relevant platforms. TA supports different languages, including all of those listed here. What you need to do is just install the client, and it begins to collect information in real time. Another scenario I want to list is the various downstream integration scenarios, quite like what LogShipper can do: Tracing Analysis can send the traces to downstream analysis platforms, including Log Service and MaxCompute. In this chart, I show you the major idea of how to use Tracing Analysis: you install the different clients, and they begin to do the major collection scenarios, like back-tracing, metrics collection, and topology description. I also have some demo console screenshots to show what it looks like. The sample question we want to show for Tracing Analysis concerns a key concept defined in the TA product called spans, S-P-A-N-S. A developer has written a web application using a microservice architecture. In such an architecture, the client first initiates a request.
The request first reaches a load balancer, then goes through an authentication service and a billing service, and then requests a resource. Finally, a result is returned. The question asks how many spans such a call chain consists of. To answer this, you simply count the request itself plus all the services it went through: the load balancer, the authentication service, the billing service, and the resource request, that is 1, 2, 3, 4, so the answer should be five. Now, let's look into the last section of this chapter: handling shared data between applications. Talking about shared data, we have to mention the product called Auto Scaling. Auto Scaling is a management service that allows users to automatically adjust elastic computing resources according to their business needs and policies. Together with CloudMonitor, based on the different metrics we define, we can dynamically add ECS instances to, or remove them from, an existing scaling group. But the problem is: for applications already running on different ECS instances, how can they share data smoothly, especially when something joins or is removed from the scaling group? How can these newly joined nodes be fully aware of what is going on right now, like the session information, or the shared user information? Here I just list some rules of thumb for you to consider if you want to handle shared data between applications. These are the key items you need to think about. First, try not to store too much shared data locally; locally means individually on each ECS. You should consider putting it in a central place, such as a unified RDS; SLB and RDS are the entry point and the database. Make sure your Auto Scaling configuration is aware of them, so that when a new ECS instance is added, Auto Scaling will automatically attach it to the SLB and the RDS. Another thing you need to pay attention to is session information: especially for web servers, the session information is stored on the server side.
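The "don't store shared state locally" rule can be sketched like this. A plain in-memory dictionary stands in for a centralized store such as Redis, so the example is self-contained; the class and function names are illustrative, not any real SDK:

```python
# A centralized session store shared by all ECS instances in the
# scaling group. In production this would be Redis or a similar
# service; here a dict stands in so the sketch is runnable as-is.
class CentralSessionStore:
    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = session

    def get(self, session_id):
        return self._data.get(session_id)

# Two "web servers" (e.g. two ECS instances behind SLB) share the
# same store, so a session written by one is visible to the other.
store = CentralSessionStore()

def handle_login(server_name, session_id, user):
    store.put(session_id, {"user": user, "server": server_name})

def handle_request(server_name, session_id):
    session = store.get(session_id)
    return session["user"] if session else None

handle_login("ecs-01", "sess-abc", "alice")
# A later request may land on a different (even newly scaled-in)
# instance, but it still sees the same session.
user = handle_request("ecs-02", "sess-abc")
```

If each instance kept its own session dict instead, the second request would miss the session the moment SLB routed it to a different ECS, which is exactly the failure mode the rule of thumb is about.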
We had better not store it locally on the ECS either; we should use some centralized, fast storage, like a Redis database, to share the session information. The last rule of thumb is: if an ECS instance is moving out of the scaling group, you may have some important information you want to collect from it first. For example, if you want to collect some of the log information to see what was going on on that ECS before it is released, you may want to enable something called the lifecycle hook in the Auto Scaling service. The hook will notify some other services that this ECS is about to be moved out or torn down, and ask whether they would like to do something; the notified services can then go back to the ECS and do the necessary work within the timeout period. Considering all these items while you design for shared application data can be a very good guideline for how you put your whole architecture together with Auto Scaling. Regarding the product usage itself, let's look at a sample question for Auto Scaling. In order to deal with sudden spikes in traffic, company A uses Alibaba Cloud Auto Scaling to set up an alarm-triggered task. The task grows the scaling group when the average memory utilization exceeds 80 percent. During testing, it was found that the alarm task was not executed successfully; what could be the possible reason? Option 1: the ECS instances in the scaling group have not yet installed the CloudMonitor agent. Option 2: before the alarm was triggered, the number of instances in the group had already reached the maximum number of instances. Option 3: the instance types chosen for the scaling group are out of stock in the region. Option 4: the number of instances in the current group exceeds the expected number of instances for the scaling group. If you have been using this product, you know the last one is not a real option, because there is no such "expected number of instances" definition in a scaling group.
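The lifecycle-hook idea can be sketched as follows. This is a simplified simulation, not the real Auto Scaling API: before an instance is released, the hook gives registered callbacks a bounded time window to do their cleanup, such as shipping the last logs off the machine:

```python
import time

# Simplified simulation of a scale-in lifecycle hook: before an ECS
# instance is removed, notify interested parties and give them a
# bounded timeout window to finish their work.
class ScaleInHook:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.callbacks = []

    def register(self, callback):
        self.callbacks.append(callback)

    def fire(self, instance_id):
        deadline = time.monotonic() + self.timeout_s
        results = []
        for cb in self.callbacks:
            if time.monotonic() >= deadline:
                break  # window elapsed; remaining work is skipped
            results.append(cb(instance_id))
        return results

collected_logs = []

def ship_logs(instance_id):
    # Pretend to pull the remaining logs off the instance before teardown.
    collected_logs.append(f"logs from {instance_id}")
    return instance_id

hook = ScaleInHook(timeout_s=5.0)
hook.register(ship_logs)
hook.fire("ecs-03")
```

The key property being modeled is the timeout: the hook does not hold the instance forever, so whatever you need off the machine must be collected within that period, exactly as the lecture describes.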
The other options are correct: without the agent, you cannot detect the situation of your ECS instances; and if the existing instances have already reached the defined maximum number, the group cannot grow any more. As for the remaining option, it could be that when you want to create an instance, that specific instance type happens to be out of stock in the region, which would also block the execution of the expansion task. The correct answer is A, B, C. In this chapter on application monitoring, debugging, and optimization, we introduced CloudMonitor for metrics collection, and Tracing Analysis for real-time data tracing and root-cause analysis. Log Service is a one-stop log collection and management service: through LogHub, LogSearch, and LogShipper you can collect logs from different sources into a central place. The last topic we talked about is Auto Scaling: when your application's ECS instances are joining and leaving, you should consider the shared data, including the user information and the session information, and find a proper place to store it instead of storing it locally. This ends the introduction of this chapter, and I look forward to seeing you in our final chapter. Thank you.