Welcome back. Let's start with a quick summary of what we've seen so far. We started in the very first lesson by talking about the basic concepts behind approximation algorithms, and in particular we looked at the important concept of the approximation ratio. After that, we spent several lessons talking about the load-balancing problem. We gave a simple greedy algorithm and analyzed it using a lower bound: instead of comparing directly to OPT, we compared the solution to a lower bound. We proved it was a 2-approximation, and then we gave an even better approximation by first sorting the jobs. But now we're going to look at a different problem, which is the vertex cover problem. So, what is the vertex cover problem? Well, one way to think about it is that you have a computer network with lots of computers, and what you want to do is monitor the links in this network, to see if all of the links are functioning correctly. So, let's look at one link. To monitor this link, one of the two computers connected by this link has to monitor whether or not the link functions correctly. In order to do that, we need to install a little piece of monitoring software on that computer. So, let's say we do that on this particular computer. Well, that actually means that this computer, which now has the monitoring software installed, can monitor all of these links. The problem that we want to solve now is: given a network in which we want to be able to monitor all the links, what is the smallest set of nodes in this network, the smallest set of computers on which we can install this little piece of software, so that we can monitor all the links? This is called the vertex cover problem, and let's try to define it a little bit more in mathematical terms. So, the network is now a graph.
Here you see a picture of a graph; that's the input. What we want to compute is a subset of the vertices, a subset C like you see here, such that for every edge, every link in this graph, at least one of its two endpoints is in the subset. The subset C is called a cover. So, for each edge we want at least one of its two endpoints to be in the cover. The cover would be the set of computers on which we install this little piece of software. So for instance, if you look at this edge in the network, you indeed see that one of its two endpoints is a node in the cover. The same holds for this one; well, here actually both endpoints are in the cover. What we want to do is find such a cover of minimum size. Let's see what we can do about this problem. Well, the first thing to note, again, is that this problem is NP-hard, so we cannot expect a fast algorithm that computes an optimal solution; instead, we want to come up with a good approximation algorithm. So, what would be a simple algorithm for this problem? Well, a very simple algorithm would simply go over the edges one by one, find an edge which is currently not covered, as we say, meaning that neither of its endpoints is in our cover C, add one of its endpoints to the cover, and then continue until all the edges are covered. So, let's do an example. First, we look at this edge, we see it's not covered, so we put one of its endpoints in the cover, let's say this one. Now, using this node, we actually cover three edges. We take another edge which is not yet covered and put one of its endpoints in the cover, and this one actually covers a lot of new edges. Again, we look at one edge, put one vertex in the cover, and this covers this one. We simply continue until we're done: look at an edge, put an endpoint in the cover, and so on. So, once we're done, by construction, by the way the algorithm works, we are sure that all the edges are covered.
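As a quick sanity check on this definition, here is a small Python sketch (the function name and example graph are my own, not from the lecture) that tests whether a given vertex set is a cover:

```python
def is_vertex_cover(edges, C):
    """Return True if every edge has at least one endpoint in the set C."""
    return all(u in C or v in C for (u, v) in edges)

# Hypothetical example: a path on vertices 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
print(is_vertex_cover(edges, {1, 2}))  # True: every edge touches 1 or 2
print(is_vertex_cover(edges, {0, 3}))  # False: edge (1, 2) is uncovered
```

Note that finding *some* cover is trivial (take all vertices); the hard part is finding one of minimum size.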
So, now the big question is: what is the quality of the solution? But before we get to that, let me write down the algorithm a little bit more formally using pseudocode. The input is a graph. We start by initializing the cover C to the empty set; initially nothing is in the cover. The set E_uncov is the set of uncovered edges, so initially it is simply the whole set of edges. Now what we do is go over the edges, or more precisely, we check whether the set of uncovered edges is still not empty. Then we take out an edge which is not covered, say the edge (u, v), and we put one of its endpoints, say u, into the cover. This new vertex that we just put into the cover is going to cover, that is, we can use it to monitor, a number of edges, in particular the edge (u, v) that we just looked at, but hopefully it covers more edges. All of them we take out of E_uncov, and then we check whether there is another edge that is still uncovered, put an endpoint in the cover, and so on. So, this is our algorithm. If you implement it in the right way, you can get a running time which is linear in the size of the graph. But for us, what is important is the quality of the solution. This algorithm computes a cover with a certain number of vertices; is it going to give a minimum number of vertices? If not, how close to the minimum possible will it be? Let's first do a very simple quiz, which hopefully is very easy for you. So, if you look at this algorithm, then what is the quality of the solution? A: is it always going to be optimal? B: is it a 2-approximation algorithm? C: are there maybe input graphs such that the size of the cover we compute, so the number of vertices we put into the cover, is more than 100 times the optimal, the minimum possible number of vertices in a cover? So, hopefully you immediately realize that A cannot be true.
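The pseudocode above could be sketched in Python as follows. This is a direct, unoptimized transcription (a linear-time implementation would maintain incidence lists instead of rebuilding the uncovered set on each step):

```python
def naive_vertex_cover(edges):
    """Repeatedly pick an uncovered edge and add one of its endpoints."""
    C = set()             # the cover, initially empty
    E_uncov = set(edges)  # initially every edge is uncovered
    while E_uncov:
        u, v = next(iter(E_uncov))  # take out some uncovered edge (u, v)
        C.add(u)                    # put one endpoint, say u, into the cover
        # remove from E_uncov every edge that u now covers
        E_uncov = {(a, b) for (a, b) in E_uncov if a != u and b != u}
    return C
```

By construction, when the loop ends, every edge has an endpoint in C, though as the quiz below suggests, the cover may be far from minimum.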
This is an NP-hard problem, so we cannot expect to get an algorithm that is always optimal. What actually turns out to be the case is that this is really not a good approximation algorithm: the size of the cover it computes on certain graphs can be more than 100 times the optimum. Let's look at an example, an explanation of why this is the case. So, look at a graph like this, which is called a star graph: one node in the center, and all other nodes connected to this one central node. What is our algorithm going to do? Well, it's going to look at a particular edge and then put one of the two endpoints into the cover. Let's say the algorithm is a little bit unlucky, and it puts not the central vertex but this other vertex into the cover. That covered only this one edge. So now we look at the next edge. Again, maybe the algorithm is unlucky, or stupid you could say, and it puts this outer vertex into the cover and not the central one. This way it continues, and at the end what you see is that where the optimal solution would have been to put just the single central vertex into the cover, the algorithm might end up putting in all the other ones. Now, if I have a star graph with more than 100 vertices, then the size of this solution will be more than 100 times worse than the optimal solution. So it's a very simple algorithm, which is nice, but unfortunately it's not a good approximation algorithm. The approximation ratio can be as much as |V| - 1, where |V| is the number of vertices in the graph, so it can be arbitrarily bad as the star graph gets larger and larger. So unfortunately this algorithm doesn't work, but what we will see in the next lecture is a very simple adaptation of this simple algorithm that does give a good approximation, namely a 2-approximation.
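To make the star-graph failure concrete, here is a small sketch (my own illustration, not from the lecture) that builds a star with center 0 and leaves 1..n, and forces the unlucky choice by always adding the leaf endpoint:

```python
def unlucky_star_cover(n):
    """Run the naive cover algorithm on a star graph with center 0 and
    leaves 1..n, always making the unlucky choice (leaf, never center)."""
    edges = [(0, leaf) for leaf in range(1, n + 1)]
    C = set()
    for (center, leaf) in edges:
        if center not in C and leaf not in C:  # edge still uncovered
            C.add(leaf)                        # unlucky: pick the leaf
    return C

cover = unlucky_star_cover(101)
print(len(cover))  # 101: one leaf per edge, versus the optimum of 1 (the center)
```

Since the center never enters the cover, every edge forces a new leaf in: the computed cover has n vertices while the optimum has just 1, giving the ratio of |V| - 1 mentioned above.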