In this module, we're going to look at some hard problems that come up in the context of modular arithmetic. These problems are going to be the basis of the cryptosystems that we build next week. First, I'd like to mention that there are many easy problems in modular arithmetic. For example, if you give me an integer N and an element x in ZN, finding the inverse of x (when it exists) is very easy using the extended Euclidean algorithm. Similarly, if you give me a prime p and some polynomial f, then finding an element of Zp that's a root of this polynomial is also relatively easy; in fact, there's an efficient algorithm that runs in roughly linear time in the degree of the polynomial. So at least for low-degree polynomials, finding roots modulo primes is quite easy. However, many problems in modular arithmetic are quite difficult, and as I said, these difficult problems form the basis of many public-key cryptosystems.

So let's look at some classic hard problems modulo primes. Fix some large prime p (think of p as a 600-digit prime), and fix some element g in (Zp)*. Let's assume that the order of this element g happens to be a number q. Now, consider the exponentiation function that simply maps a number x to the element g^x. We showed in the last segment that this function is easy to compute using the repeated squaring algorithm, so computing g^x can be done quite efficiently. But now let's look at the inverse question: given g^x, I want you to find its logarithm, namely the value x. Over the real numbers, "given g^x, find x" is exactly the definition of the logarithm function, but here I'm asking you to find the logarithm modulo a prime p. This problem is called the discrete logarithm problem, or Dlog, and I'll say that the discrete logarithm of g^x base g is the exponent x. So our goal is to output some exponent x in 0 to q-1 that happens to be the discrete logarithm of the given element.

Let's look at an example. Suppose we work with the integers modulo 11, and here I wrote down the discrete log function in Z₁₁, base 2. Let's see how this function behaves. First of all, the discrete logarithm of 1 is 0, because 2 to the 0 is equal to 1. Similarly, the discrete logarithm of 2 is 1, because 2 to the 1 is equal to 2. The discrete logarithm of 4 is 2, because 2 squared is equal to 4. The discrete logarithm of 8 is 3, because 2 cubed is equal to 8. And so on and so forth. So here I wrote down the discrete logarithm values for you, and let me ask you a puzzle: what's the discrete logarithm of 5 modulo 11? See if you can calculate it yourself. The answer is 4, because 2 to the 4 is equal to 16, and 16 is equal to 5 modulo 11. So the discrete logarithm base 2 of 5 is 4.

Now, I can tell you that the discrete logarithm function in general is quite difficult to compute. Of course, for small primes it's easy: you can just make a table and read off the discrete log values. But if the prime p happens to be a large number, say a 2000-bit number, then computing the discrete log is quite difficult, and we don't have good algorithms for it.
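To make the Z₁₁ example concrete, here is a minimal Python sketch of both directions. This is my own illustration, not part of the lecture: the forward direction uses Python's built-in pow, which does repeated squaring, while the only generic way to invert it here is brute-force search over exponents.

```python
p = 11   # the small prime from the example
g = 2    # the base

# Forward direction: computing g^x mod p is fast even for huge exponents,
# because pow(g, x, p) uses repeated squaring internally.
assert pow(g, 4, p) == 5   # 2^4 = 16 = 5 (mod 11)

def dlog_brute_force(h, g, p):
    """Find x with g^x = h (mod p) by trying every exponent.

    Fine for an 11-element group; hopeless for a 2000-bit prime,
    which is exactly the point of the discrete log assumption.
    """
    val = 1
    for x in range(p - 1):          # the order of g divides p - 1
        if val == h:
            return x
        val = (val * g) % p
    return None                     # h is not a power of g

print(dlog_brute_force(5, 2, 11))   # prints 4, matching the puzzle
```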
So let's define the discrete log problem more generically. Instead of just focusing on the group (Zp)*, let's abstract and look at a generic group G. So we have a finite cyclic group with a generator g. All that means is that the group consists of all the powers of g: if the order of G happens to be q, then the group is the set {g^0, g^1, ..., g^(q-1)}, and those powers make up the group capital-G. Now, we're going to say that the discrete log problem is hard in the group G if no efficient algorithm can compute the discrete log function. What do we mean by that? We mean that if you choose a random element g in the group capital-G and a random exponent x, and I give the algorithm g and g^x (of course, I also have to give it a description of the group and the order of the group, but the primary inputs are g and g^x), then the probability that the algorithm actually computes the discrete log is negligible. If this is true for all efficient algorithms, then we say that the discrete log problem is hard in this group capital-G. And again, the reason we say that is that no efficient algorithm is able to compute discrete log with non-negligible probability.

As I mentioned, we have a couple of candidate examples of groups where discrete log is hard. The canonical example is (Zp)*; this is the example that Diffie and Hellman came up with in 1976. But it turns out there are other candidate groups where the discrete log problem happens to be hard, and I think I already mentioned that one candidate is what's called an elliptic curve group, or rather, the group of points on an elliptic curve. I'm not going to define that here, but as I said, if you'd like me to talk about elliptic curve cryptography, I can do that in the very last week of the class.

So these are two groups where the discrete log problem is believed to be hard, but it so happens that the discrete log problem is harder, as far as we know, on elliptic curves than it is in (Zp)*. In other words, if you give me two problems of equal size, one set in the group (Zp)* and one set in an elliptic curve group, the problem set in the elliptic curve group is going to be much harder than the problem in (Zp)*. And because the elliptic curve discrete log problem is harder than the one in (Zp)*, we can use smaller parameters with elliptic curves than we can with (Zp)*. As a result, the systems built on elliptic curves are going to be more efficient, because the parameters are smaller, and yet the attacker's job is as hard as it is for a much larger prime in (Zp)*.

To make that concrete, I'll tell you that in (Zp)* there's what's called a sub-exponential algorithm for discrete log. I think I already mentioned this: for an n-bit prime, this algorithm runs in time roughly e to the cube root of n. In fact, there are other terms in the exact running time, but the dominant term in the exponent is the cube root of the number of bits in the prime. And because of this algorithm, you see that if we want the discrete log problem to be as hard as breaking a corresponding symmetric key, we have to use a relatively large modulus p.
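To make the hardness definition above concrete, here is a toy version of the Dlog experiment in Python. This is my own illustration, not part of the lecture, and the parameters are deliberately tiny made-up values; a real instantiation would use a prime of around 2000 bits.

```python
import secrets

# Toy version of the discrete-log experiment: the challenger samples a
# random exponent x, publishes (g, g^x), and an adversary wins if it
# recovers x.

p = 23    # toy prime; in practice ~2000 bits
g = 2     # 2 has order q = 11 modulo 23, since 2^11 = 2048 = 1 (mod 23)
q = 11    # prime order of the subgroup generated by g

def challenge():
    """Challenger's side of the Dlog experiment."""
    x = secrets.randbelow(q)        # random exponent in {0, ..., q-1}
    return x, pow(g, x, p)          # keep x secret, publish g^x

def adversary(h):
    """Exhaustive-search adversary: feasible only because q is tiny."""
    for guess in range(q):
        if pow(g, guess, p) == h:
            return guess

x, h = challenge()
assert adversary(h) == x   # at cryptographic sizes this loop never finishes
```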
Now, in contrast, if you look at an elliptic curve group, the numbers look much better: on an elliptic curve group, the best algorithm we have for discrete log runs in time roughly 2 to the n/2. (The exact base of the exponent doesn't really matter.) This is what we would call a true exponential-time algorithm, because for a problem of size n, the running time is exponential in n itself, not exponential in the cube root of n. And because the problem is so much harder (again, the best algorithm we have takes fully exponential time), you notice that with elliptic curves we can use much smaller parameters and still remain secure. By the way, the reason the elliptic curve size is exactly twice the symmetric key size is precisely this factor of 2 in the exponent: we have to double the size of the elliptic curve group for the problem to take time 2 to the n. So, as I mentioned before, because the parameters are smaller with elliptic curves than they are with (Zp)*, there's a slow transition from doing crypto modulo p to doing crypto over elliptic curves. And again, I'll mention that if you want me to describe elliptic curves in more detail, I can do that in the last week of the class.

So now that we understand what the discrete log problem is, let me give you a direct application of its hardness. In particular, let's build a collision-resistant hash function from the discrete log problem. Choose some group capital-G where the discrete log problem is hard; if you like, you can think of capital G as the group (Zp)*. And let's assume that the group capital G has prime order q, so q is some prime number that happens to be the number of elements in the group capital G. Now, to define our hash function, we'll choose two elements in the group capital G, call them g and h, and we'll define the hash function as follows: on input a pair (x, y) of exponents in Zq, the hash function outputs the element of G defined as g to the x times h to the y. That's it. A very, very simple hash function, and if you recall, we even talked about this hash function when we discussed compression functions before.

I want to show you that this function capital-H is in fact collision-resistant, in the sense that finding a collision for capital H is as hard as computing discrete log in the group capital G. In particular, if you can find a collision for capital H, you can compute the discrete log of h base g. And since we said the discrete log problem in the group capital G is hard, computing that discrete log should be difficult, and therefore finding collisions for capital H must be difficult. So let's see how we do this; it's actually a very cute proof. Suppose we are given a collision on the function capital H: two distinct pairs, (x0, y0) and (x1, y1), that happen to collide on the function capital H. What does it mean that they collide? It means that if I evaluate the function capital H at (x0, y0) and at (x1, y1), I get the same value. Namely, g to the x0 times h to the y0 is equal to g to the x1 times h to the y1.
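Before we do the algebra, here is what this hash function looks like in code, reusing the toy subgroup from the earlier sketch. Again, this is my own illustration with made-up tiny parameters, not anything from the lecture.

```python
# Toy instantiation of H(x, y) = g^x * h^y in the order-11 subgroup of
# (Z_23)*. Real parameters would be enormously larger.

p, q = 23, 11   # tiny prime p, and the prime order q of the subgroup
g = 2           # generator of the subgroup
h = 13          # another subgroup element; secretly 13 = 2^7 mod 23,
                # but an attacker is not supposed to know that exponent

def H(x, y):
    """Hash a pair (x, y) in Z_q x Z_q down to a single group element."""
    return (pow(g, x, p) * pow(h, y, p)) % p

print(H(3, 5))  # one group element, compressing two inputs
```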
Well, now I can do a little bit of manipulation: I move all the g's to one side and all the h's to the other side, and this tells me that g to the (x0 - x1) is equal to h to the (y1 - y0). This is just simple manipulation. Now I raise both sides to the power 1/(y1 - y0); in other words, I take the (y1 - y0)-th root of both sides. One side becomes simply h, and the other side becomes g to the power of the fraction (x0 - x1) divided by (y1 - y0). But now look at what we just got: we've expressed h as g to some known power. All we did was a division, and boom, we're done. We've computed the discrete log of h base g.

So you might be wondering, what does this division in the exponent mean? What does it mean to divide by (y1 - y0) in the exponent? What it means is that we compute the inverse of y1 - y0 modulo q, and then we multiply the result by x0 - x1. That gives us the exponent in the clear, and so we've just learned the discrete log of h base g. This also shows you why we wanted q to be prime: we need to make sure that y1 - y0 is always invertible, and when we work modulo a prime, everything is invertible except for zero.

That raises a subtle point: what happens if y1 - y0 happens to be equal to zero? In that case, we won't be able to divide by it, so we won't get the discrete log. But if you think about this for a minute: if y1 - y0 equals 0, then y1 is equal to y0. And if y1 is equal to y0, look at the collision equation again: the h terms simply cancel out, so g to the x0 equals g to the x1, which forces x0 to equal x1. But if x0 equals x1 and y0 equals y1, what you gave me is not a collision; you simply gave me the same pair twice. That's cheating, so it's not considered a collision, and in that case I don't need to find a discrete log. If you give me a genuine collision, then necessarily y0 is not equal to y1, and as a result, I get the discrete log of h base g. And as we said, since the discrete log problem is believed to be hard in the group capital G, this means that this very, very simple function capital H must be collision resistant. So this is a very elegant example of a reduction from finding collisions to computing discrete log.

I should tell you, by the way, that this function is not really used in practice. Even though it has a nice proof of collision resistance, it's relatively slow: on every hash, you have to compute two exponentiations, and that's much, much slower than functions like SHA-256 and other ad hoc collision-resistant hash functions.
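And here is the extraction step from the reduction above in code, continuing the same toy example (my own sketch; it uses Python 3.8+ for pow with exponent -1, which computes a modular inverse).

```python
# Continuing the toy example: recover Dlog_g(h) from a collision on H.

p, q, g, h = 23, 11, 2, 13   # same toy parameters as before

def dlog_from_collision(x0, y0, x1, y1):
    """Return the discrete log of h base g, given H(x0,y0) == H(x1,y1)
    with (x0,y0) != (x1,y1). Requires y0 != y1 mod q, which the argument
    above shows always holds for a genuine collision."""
    return ((x0 - x1) * pow(y1 - y0, -1, q)) % q   # inverse exists: q is prime

# A concrete collision: since h = g^7, H(x, y) depends only on x + 7y mod q,
# so (10, 1) and (3, 2) collide: 10 + 7*1 = 3 + 7*2 (mod 11).
x0, y0, x1, y1 = 10, 1, 3, 2
assert (pow(g, x0, p) * pow(h, y0, p)) % p == (pow(g, x1, p) * pow(h, y1, p)) % p
print(dlog_from_collision(x0, y0, x1, y1))   # prints 7, the dlog of h base g
```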
OK, so that's what I wanted to say about intractable problems modulo primes. Now let's look at some intractable problems modulo composites. Here, again, think of, say, 1024-bit numbers, and let's define the set Z(2)(n): the set of all integers that happen to be a product of two n-bit primes. The 2 here corresponds to the fact that the numbers in this set have exactly two prime factors, and those two prime factors are roughly the same size: they're both n-bit primes.

There are two classic intractable problems in this set. The first problem is: if I choose a random integer in the set Z(2)(n), factor it. Already this is quite a difficult problem for 1024-bit numbers, and although it hasn't been done yet, it's quite likely that numbers of this magnitude will be factored soon, so the recommended value these days is to use 2048-bit numbers. Those are still beyond our reach; those are numbers that we still cannot factor.

Another example of an intractable problem modulo composites is the following: I give you some nonlinear polynomial, that is, a polynomial whose degree is bigger than 1, along with a random composite N in the set Z(2)(n), and your goal is to find an x in ZN that happens to be a root of this polynomial modulo N. Again, we don't know how to do that. Of course, if the degree is equal to 1, this just amounts to solving a linear equation, and we already know that's easy. But the minute the degree becomes bigger than 1, we don't know how to solve this modulo N without first factoring the modulus and then computing roots modulo the prime factors. There are systems, for example RSA, that depend on the hardness of this particular problem for specific polynomials, which we're going to see next week. Just as an example, I should mention that taking square roots or cube roots modulo a random composite in Z(2)(n), that is, finding a root of the polynomial x² - c or x³ - c, is believed to be difficult.

So there's actually quite a bit known about the factoring problem. It's a very old problem; already the Greeks were interested in factoring. Gauss has a wonderful quote about the problem of factoring and the problem of primality testing. In his famous Disquisitiones Arithmeticae from 1801, he writes: "The problem of distinguishing prime numbers from composite numbers" (this is what's called primality testing) "and the problem of resolving the latter" (namely composites) "into their prime factors is known to be one of the most important and useful in arithmetic." So he had the foresight to realize that these are extremely interesting problems; essentially, they are computer science problems. How do we test whether a number is prime? How do we factor a number that's composite? Already Gauss realized that these are extremely important and interesting questions, and these days they're used all over the Web.

So let's see. Testing whether a number is prime has been completely solved: we know how to do it efficiently using a randomized algorithm, and we even know how to do it using a deterministic algorithm. Factoring composites into their prime factors, on the other hand, is still believed to be a difficult problem. I would encourage you to think about it; it's a wonderful problem. If any of you can come up with an algorithm to factor composites into their prime factors, then as I said, it's instant fame in the crypto world, and it would have tremendous impact on the security of the Web in general. So it's a fun problem to think about.
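As an aside, the lecture doesn't name the randomized primality test; the standard choice is the Miller-Rabin test. Here is a compact sketch of it, my own illustration rather than anything from the course.

```python
import secrets

def is_probable_prime(n, rounds=40):
    """Miller-Rabin test: False means definitely composite; True means
    prime with error probability at most 4^(-rounds)."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):   # handle tiny factors directly
        if n % small == 0:
            return n == small
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2   # random base in [2, n-2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False                   # a witnesses that n is composite
    return True

print(is_probable_prime(2**127 - 1))   # True: a Mersenne prime
print(is_probable_prime(2**128 + 1))   # False: 2^128 + 1 is composite
```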
Let me tell you what's known about the problem of factoring. The best algorithm we have is called the number field sieve. Its running time is again one of these sub-exponential functions, exponential in the cube root of the number of bits, which is why the composite has to be quite large for the problem to be difficult. The current world record is factoring a 768-bit number. This is the so-called RSA-768 number, a challenge number that was recently factored. The number is 232 decimal digits long, and factoring it took an enormous amount of work: about two years on hundreds of machines. The estimate is that factoring a 1024-bit number is about a thousand times harder than factoring RSA-768, so instead of two years, it would take two thousand years. But of course, computers are getting faster; we have more cores and more machines at our disposal, and assuming Moore's Law, this factor of a thousand really just means a decade, since computers get faster by about a factor of a thousand every decade. So it's very likely that within the next decade we'll see a factorization of a 1024-bit number, which would be the end of 1024-bit moduli being used for public-key cryptography.

So that's the state of the art in the factoring world, and this concludes this module. I'll mention that if you want to read more about any of the topics we discussed, there's a good free book you can download from the Internet, written by Victor Shoup (A Computational Introduction to Number Theory and Algebra). The topics we discussed are covered in chapters 1 to 4, 11, and 12. I would encourage you to take a look at it, and hopefully it will help with understanding the material. Next week, we'll start building cryptosystems using the topics we just learned about.