[MUSIC] Okay, in the previous lesson we saw what optimization problems are. Problems where we want to minimize the value of a solution, in case of a minimization problem or maximize the value of the solution in case of a maximization problem. These are often NP-hard and so we need approximation algorithms. And we defined the concept of approximation ratio, this raw value, to quantify how good an approximation algorithm actually is. In this lesson we're going to look at an example of such an approximation algorithm. It will be a very simple algorithm for the load balancing problem that we also saw in the first lesson. So let's see what the load balancing problem was. So we have a set of machines, servers, and if we get a set of jobs, computational tasks, that we have to schedule on those machines. And now the goal is to do this in such a way that the total time it takes to process all the jobs is minimized, okay? So let's try to formalize this. But first let's look at an example. So here you see, well, a collection of tasks or jobs. J1 up to J7 and the height of these bars indicate how much time they would take. And here, I have three machines that I can use to execute those jobs, okay? And the goal is to assign each of the jobs to one of the machines in such a way that, well, what I said before was in such a way that the total time it takes to process all the jobs is minimized. So the total time until all the jobs are finish to minimized, and the technical term for this time it takes is the makespan, okay? So, we want to minimized the makespan. So, let's look at one particular feasible solution. So for instance, I could assign job number one to the first machine, job number two to the second machine, job number three, maybe to the first machine again, and so on, okay? Now I've assigned all the jobs to one of the machines, so this is a feasible solution, a valid solution, all the jobs are assigned. And now the makespan is, well, the time at which all the jobs are finished. So in this example the third machine determines the makespan because that's the machine that has finished last. And you see that here the makespan is, well, 18. Okay, and what you can also see, if you look at this a little bit, is that this is not going to be the optimal solution because for instance, if I switch those two jobs, then I get a solution with a smaller makespan, okay? And well, If I wanted to define it in more abstract terms then what happens is my input, so the input to the algorithm is first for each of the jobs I have to know the processing time. How long does it take to process that job on one of my machines? And I have to know how many machines I have. So my input is the numbers T1 up to Tn, right, all the processing times, plus a number m which is the total number of machines. And like we said before, the goal is to find a feasible solution to assign all the jobs, such that the makespan is minimized. Again, this problem is NP-hard but it looks fairly easy, but it is actually pretty hard to find the optimal solution even if you would only have two machines. So for every job, you just have to decide do I put it on the machine one or on machine two. Even doing that in such a way that a solution is as balance as possible is already a very hard problem. Okay, so let's have a look at a very simple algorithm for this problem which hopefully will give us a good approximation. So, the algorithm is going to be a greedy algorithm. So what this is going to do, it's going to go over all the jobs, look at the jobs one by one, and assign each job to the machine that at that point in time has the smallest total load. So for which the processing time of all the jobs that are at that point already assigned to that machine, it's as small as possible. Okay, so let's look at a simple example. Again, the example that we did before and now let's apply the strategy, this greedy algorithm. So we're going to look at job number one, and assign it to the machine that so far has the smallest load. Well, initially everybody has load zero, so it doesn't really matter, so let's put it on machine number one. Then we look at job number two, assign it to the machine with smallest load, well here there are two such machines, two and three that currently have the smallest load. So, let's just assign it to machine number two. Look at the third job, well the machine with smallest load is now machine number three. So, we assign it there and this way we continue. Job number four goes here, job number five goes here, job number six goes here, and job number seven goes here. Okay, and in this case this would be the makespan. So let's try to write that a little bit more formally using pseudocode. So I have my algorithm, which I call greedy scheduling. As input it has, well, what we said before, the processing times of all the jobs T1 up to Tn plus the number m which is the number of machines that we have, and the algorithm simply works as follows. First we initialize, well, jobs of Mi so the set of jobs assigned to machine Mi. Initially nothing is assigned, so this is the empty set. And the total load of the machine so far, well, that's going to be zero. Okay, now after this initialization, we're going to go over all the jobs one by one and always assign each job to the machine that currently has the smallest load. Okay, so we go over the jobs from one to n and now we want to assign job number j to the machine of the smallest load. So we look at all the machines, we find the machine with the smallest load, let's say machine number k. If there are multiple choices we just take an obvious one with the smallest load and then we assign job number j to this machine, and we have to increase the load of this machine, now with tj, the processing time of job number j. Okay, so this is a very simple algorithm. So traditionally, maybe in your previous algorithms courses says you would focus mainly on analyzing the running time of this algorithm. Well, here it depends a little bit on how you implement the algorithm exactly. And in particular, how quickly can you find this machine with the smallest load. Well, if you use a priority queue for that, you can do that in logarithmic time, which would mean that the total running time of this algorithm would be n log m. For us in this course, we're mainly interested in, well, also in the running time, it should be polynomial. But our main focus is on the quality of the solution. Okay, so how close to the optimal value to the minimum possible makespan is the makespan of the solution that we compute? Okay, so again, let's do a little quiz. Look at this particular input instance. And the question is, if we run our greedy-scheduling on this particular input instance, what happens? A, the makespan is 8, and this is optimal. B, the makespan is 7, and this is optimal. C, the makespan is not optimal. Okay, so try to think a little about what the algorithm would do. Okay, this is not very hard to see. What will happen is that the makespan is 8, and this is not optimal. Okay, so what would the algorithm do? Well, first assign the first job to the machine with the smallest load, say M1. Then the second one goes here, third one goes here, because machine one at this point has the smallest load. And then, the last one also goes there or to the other machine, it doesn't really matter in this case, right? So this gives a makespan of eight, and it's not hard to see that actually by switching one of the jobs to machine number two, we can actually get a smaller makespan, okay? And in this case, the approximation ratio of this algorithm for this particular instance is 8 divided by 7, right? 8 is the value of the solution that our algorithm computes, 7 is the optimal, the minimum possible makespan that we could have for this particular example. So now the question is, what can we prove about this algorithm for any possible input instance? Right, because you remember that the approximation ratio of an algorithm, so something is a row approximation, if for any possible input instance the algorithm or the value of the solution that it computes is at most rho times the optimal value. Well, here we've seen that for one particular example, it's at most eight over seven times the optimal. But what we have to show if we want to prove that this is a rhow approximation algorithm for a certain rho, what we have to show it that for any possible input instance, the value that we compute is at most, rho times the optimal value. So in the next lesson we're going to see how to do that. [MUSIC]