[MUSIC] Now I'm going to introduce to you another interesting K-partitioning clustering method called the K-Medoids clustering method. Why do we need to study the K-Medoids clustering method? Because the K-Means algorithm is sensitive to outliers, since a mean is sensitive to outliers. To give you a simple example: if you look at a company's salaries and you add one very high salary, the average salary of the whole company shifts quite a lot. So let's look at K-Medoids. What is K-Medoids? It means that instead of taking the mean value of the objects in a cluster as our centroid, we use the most centrally located object in the cluster, which we call the medoid.

The K-Medoids clustering algorithm proceeds in a similar way: we first select K points as the initial representative objects, that is, the initial K medoids. The difference from K-Means is that K-Means can select K virtual centroids, whereas here each of the K representatives must be a real object. Then we enter a repeat loop. Similarly to K-Means, we assign each point to the cluster with the closest medoid. Then we randomly select a non-representative object, say o_i, and check whether using o_i to replace one medoid, m, would improve the quality of the clustering, that is, whether the total cost of swapping is negative. Simply put, if the swap reduces the sum of squared errors, then we swap m with the object o_i to form the new set of medoids. Then we redo the assignment, and this process repeats until the convergence criterion is satisfied.

Now let's see a small example of how a typical K-Medoids algorithm is executed. We look at PAM as an example. Suppose we are given a small number of points, say ten, in this small graph. In this 2D space, we want to find two clusters. At the very beginning, we arbitrarily choose k objects; here, we choose two objects as the initial medoids. Then we form the clusters around these medoids as follows. Next, we randomly choose another object, call it O_random, a non-medoid object, and we want to see whether it could become a medoid, that is, whether it would reduce the total cost, or in other words, give us a better SSE. In this case, suppose we choose one point here but find that it does not really reduce the total SSE. Then we can try another one, like this orange one, look at the clusters we would form, and see that this one does reduce the total SSE; that simply means the quality of the clustering is improved. Then we do the swapping. So the procedure, essentially, is as listed here: we select the initial K medoids randomly; then we do object reassignment; then we try to swap a medoid, m, with a random non-medoid object, o_i, if it improves the clustering quality; and we do this again and again until the convergence criterion is satisfied. This is just a simple execution to illustrate the idea of K-Medoids and how it executes.

Now we see that K-Medoids clustering essentially tries to find k representative objects, the medoids, in the clusters. The typical algorithm is PAM, Partitioning Around Medoids, developed in 1987 by Kaufman & Rousseeuw. Starting from an initial set of medoids, we iteratively replace one of the medoids by one of the non-medoids if such a swap improves the total sum of squared errors, that is, the total quality of the clustering. This method works effectively for small data sets.
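To make the swap-and-reassign loop concrete, here is a minimal, illustrative PAM-style sketch in Python with NumPy. The function name pam, the use of Euclidean distance, and the exhaustive scan over every (medoid, non-medoid) pair are my own assumptions for clarity; the procedure described above picks swap candidates at random rather than trying them all.

```python
import numpy as np

def pam(points, k, max_iter=100, rng=None):
    """PAM-style K-Medoids sketch: medoids are real objects, and a swap is
    accepted only when it lowers the total cost (sum of distances of all
    points to their closest medoid)."""
    rng = np.random.default_rng(rng)
    n = len(points)
    medoids = list(rng.choice(n, size=k, replace=False))  # initial K medoids: real objects

    def total_cost(meds):
        # distance from every point to every medoid, then sum of row minima
        d = np.linalg.norm(points[:, None, :] - points[meds][None, :, :], axis=2)
        return d.min(axis=1).sum()

    best_cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):                    # try to replace each medoid m ...
            for o in range(n):                # ... with a non-medoid object o_i
                if o in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = o
                cost = total_cost(candidate)
                if cost < best_cost:          # total cost of swapping is negative
                    medoids, best_cost = candidate, cost
                    improved = True
        if not improved:                      # convergence: no swap improves quality
            break

    # final assignment: each point joins the cluster of its closest medoid
    d = np.linalg.norm(points[:, None, :] - points[medoids][None, :, :], axis=2)
    return points[medoids], d.argmin(axis=1)
```

For the ten-point example above, a call such as pam(points, k=2) would return the two representative objects and the cluster label of each point.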
The reason is that we can keep trying different swaps. But PAM cannot scale well, because its computational complexity is quite high. If we look at it in detail, the cost of each iteration of swapping is O(k(n-k)^2), that is, roughly quadratic in the number of points, which is quite expensive. So how do we improve its efficiency? There is one proposal by the same authors, from 1990, called CLARA. Essentially, it is PAM on samples: instead of using the whole set of points, we choose a sample of size s, and the quadratic term in the complexity then comes down to the size of the sample. However, if the initial sample selection is not good, the final clustering quality could be poor. Then in 1994 another algorithm, called CLARANS, was proposed: at every iteration we do randomized re-sampling. That means we do not keep exactly the same sample; we do randomized re-sampling to ensure both the efficiency and the quality of the clustering. [MUSIC]
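As a rough sketch of the CLARA idea, running PAM on samples so the quadratic work is paid only on the sample, here is an illustrative function that reuses the pam sketch above. The sample size, the number of samples drawn, and the strategy of keeping whichever sample's medoids score best on the full data set are assumptions for illustration, not the exact 1990 algorithm.

```python
def clara(points, k, n_samples=5, sample_size=40, rng=None):
    """CLARA-style sketch: run the PAM sketch above on several random samples
    and keep the medoids that give the lowest total cost on the full data set."""
    rng = np.random.default_rng(rng)          # numpy imported with the pam sketch above
    n = len(points)
    best_medoids, best_cost = None, np.inf
    for _ in range(n_samples):
        sample = rng.choice(n, size=min(sample_size, n), replace=False)
        medoids, _ = pam(points[sample], k, rng=rng)   # quadratic work only on the sample
        # evaluate the sample's medoids against ALL points, not just the sample
        d = np.linalg.norm(points[:, None, :] - medoids[None, :, :], axis=2)
        cost = d.min(axis=1).sum()
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    labels = np.linalg.norm(points[:, None, :] - best_medoids[None, :, :], axis=2).argmin(axis=1)
    return best_medoids, labels
```

CLARANS, in contrast, does not fix one sample up front: as described above, it re-samples at each iteration of the search, which is why it tends to keep both the efficiency and the clustering quality.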