Well, we've already covered a great many techniques in programming, but today we want to cover something that's very important when you're going to be using computers to solve large problems, and that's performance. So let's begin with the basic challenge, which has been with us since the very early days of computing. In fact, it goes back to Babbage, who built a machine called the Difference Engine, a mechanical machine that was going to perform arithmetic operations like multiplication. The one pictured is actually a copy that was built in the 1990s at the London Science Museum. Here's what Babbage said: "As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise by what course of calculation can these results be arrived at by the machine in the shortest time?" If you look at the machine, it actually has a crank, so really Babbage's question was: how many times do you have to turn the crank to get your answer? That's what we're talking about today: how can we be sure that we're getting results in, if maybe not the shortest time, at least a reasonable amount of time? It's a very important thing to have in mind when we're crafting programs to solve scientific and commercial problems. The more modern version of this is our picture of where we are in program development: we edit, compile, run, and test our program, and then after a while, we start using it to solve a practical problem. But it might fail to solve the practical problem simply because it's too slow, too inefficient. So we want to talk today about how you might understand its performance characteristics, so that you can improve it and actually address the practical problem. The key insight behind this was developed by Knuth in the 1970s, who pointed out that it really is feasible, and actually productive, to use the scientific method to understand performance. That's what we're going to talk about today. So there are a lot of reasons to study program performance. Here are three in particular. One is that we want to predict the behavior of our program. We want to know whether it will finish at all. Is it in an infinite loop, or is it just taking a long time? Usually, we want to know: when will my program finish? But we also might want to compare algorithms and implementations. We want to know, "I've got a program that works, but if I change it in this way, will it make my program faster?" or "How can I make my program faster?" That's another reason we might want to study performance. To make a distinction between the word algorithm and the word program: an algorithm is a general method for solving a problem that's suitable for implementation as a program. We're going to study a lot of algorithms in this course, and if you take more CS courses, you'll see plenty of them. This is our book on algorithms, and we have two courses on algorithms later on, online. So, really, what we want to do in studying the performance of algorithms and programs is to develop a basis for understanding the problem and for designing new algorithms, or maybe setting parameters that make the algorithms work best.
This is really where we get to new technologies: problems that we could solve with a clever algorithm but couldn't otherwise address at all, or research problems that we can feasibly address with a good algorithm but cannot feasibly address without it. I'll give some examples of this later on. We've already written some fairly complicated programs. Our gambler simulation from early in the course is an example of nested loops (there's a sketch of it right after this paragraph). If you run it for large values of the parameters, it might take quite a while, and the question is: when is it going to finish? How long is it going to take? That's just one example. For many, many of the programs and algorithms we consider, we want to have some idea of when they're going to finish. To motivate this further, I'll give some specific examples. The first one is the N-body simulation problem. We looked at a program for doing that on a display, but, more generally, the value of N could be really large; astrophysicists are interested in studying these interactions for huge, huge values of N. The algorithm that we looked at for solving this problem takes time proportional to N squared. We call that quadratic time, and we'll talk about it: as the problem size increases, the time increases as a function of N squared. Well, in the '70s, this became a problem because that running time far exceeded the amount of computer time available. You could run your fastest computer for a year and still not solve the problem for the values of N that you'd like. But the success story is an algorithm known as the Barnes-Hut algorithm, which solves the problem in N log N steps and has therefore enabled scientists ever since, and they still use this algorithm, to push their understanding of this problem to huge values of N, really enabling new research on the properties of the universe. This algorithm was actually invented in research started by Andrew Appel, who is now a professor at Princeton but did this for his senior thesis as an undergraduate. That's a perfect example of the importance of finding an efficient algorithm to solve a problem. Here's another one: the discrete Fourier transform. This is a very fundamental calculation in signal processing, where we break a waveform down into its periodic components. Again, the straightforward algorithm for solving this problem is quadratic in the problem size, and again, that algorithm quickly hit the limit of available time on computers even as early as the '50s and '60s, so you couldn't really use it to address the commercial problems that people cared about. But it turns out there's an algorithm known as the FFT, the fast Fourier transform. It actually has origins in classical mathematics dating back to Gauss but was popularized by Cooley and Tukey in the 1960s. That algorithm uses N log N steps, so ever since that time, people have been able to push the size of the problems they can solve with FFTs to huge values while staying well within the limit on available time. A lot of the technology that surrounds us, from MRI machines to music players and games on cell phones, is based on our ability to solve this problem quickly. That's another example of the importance of having a fast algorithm.
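As a concrete reminder of the gambler simulation mentioned above, here is a minimal sketch of that kind of nested-loop program. The class name, argument order, and output format are assumptions for illustration, not necessarily the course's exact code; the point is that the amount of work done by the inner loop depends on the parameter values, so the total running time is far from obvious at a glance.

```java
public class Gambler {
    public static void main(String[] args) {
        // Assumed command-line arguments: initial stake, goal, number of trials.
        int stake  = Integer.parseInt(args[0]);
        int goal   = Integer.parseInt(args[1]);
        int trials = Integer.parseInt(args[2]);

        int wins = 0;
        // Outer loop: repeat the experiment for the given number of trials.
        for (int t = 0; t < trials; t++) {
            int cash = stake;
            // Inner loop: flip fair coins until the gambler goes broke or reaches the goal.
            // The number of flips per trial grows with stake and goal, so the total
            // running time depends strongly on the parameter values.
            while (cash > 0 && cash < goal) {
                if (Math.random() < 0.5) cash++;
                else                     cash--;
            }
            if (cash == goal) wins++;
        }
        System.out.println(wins + " wins of " + trials);
    }
}
```

Running something like java Gambler 500 2500 1000 already takes noticeably longer than small test cases, which is exactly the kind of behavior we want to be able to predict.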
So, really, the design of fast algorithms is the subject of a later course, but for now, what we want to do is at least provide a basis for understanding the performance characteristics of your programs. First, I need one technical thing: we use binary logarithms in a lot of our analysis, and I just want to get this notation out of the way. The binary logarithm of a number N is the number x such that 2 to the x equals N, and it grows very slowly. It is, roughly, the number of bits needed to express N in binary. Any computer scientist knows that 2 to the 10th is about a thousand, and its binary logarithm is exactly 10; 2 to the 20th is about a million, and its binary log is 20; and a billion corresponds to about 30. So that's just a really quick computer science calculation: when we say log, think a billion, think 30. The main point is that the log is a much smaller number than N itself. For example, in the recursive convert program that prints the binary representation of a number, the number of bits is related to the largest integer less than or equal to log N, which we write as the floor of log N. We can prove that by induction, and we do actually talk about that later on. So, the number of bits in the binary representation of N is about the binary log of N, and logarithms arise often in the study of recursive programs, including the FFT and Barnes-Hut. This divide-in-half idea is what brings out a logarithm, and we'll see examples of that later on in the course (there's a small code sketch of these calculations in a moment, just before we look at the implementation). What we're going to use as an example to study performance is called the three-sum problem. It's a very simple problem: given N integers, which could be positive or negative, we want to know how many triples of those integers sum to zero. Now, there are other computations we might want to perform on those triples, but for now, for simplicity, we're just going to count them. Let's say our problem is to find out how many there are. This problem actually turns out to be a fundamental problem in computational geometry, which is the study of computer algorithms that process geometric objects. If you have a bunch of points and you want to know whether any of them are collinear, you have to be able to solve this problem. Or, does one polygon fit inside another? You can imagine that's important in robot planning and other types of applications. In the example shown, there's no way to fit the rectangle inside the polygon, but the star actually turns out to fit in there. In order to solve that problem, you need to be able to solve this three-sum problem. We won't talk about the details of that; it's just motivating the idea that a simple problem like this can be important in lots of practical applications. There's a very long list of applications where people want to be able to solve this problem efficiently. So, this is just the setup. We want to have a method count that takes an array as its argument and returns the number of triples of elements in that array that sum to zero. Our test client will read the array and then call the count function. So, for example, with the six integers 30, -30, -20, -10, 40, and 0, there are three triples that sum to zero: 30, -30, and 0; 30, -20, and -10; and -30, -10, and 40. Out of all the triples, those are the ones that sum to zero. That's the behavior that we want from our implementation.
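Before we look at the implementation, here is a small sketch that makes the binary-logarithm facts above concrete. The class and method names are made up for illustration, and the recursive conversion is only in the spirit of the convert program referred to earlier.

```java
public class LogDemo {

    // Binary representation of n (requires n >= 1), built by recursively
    // peeling off the low-order bit. The string has floor(lg n) + 1 characters.
    public static String toBinary(long n) {
        if (n == 1) return "1";
        return toBinary(n / 2) + (n % 2);
    }

    // Floor of the binary logarithm: the largest x such that 2^x <= n (requires n >= 1).
    public static int lg(long n) {
        int x = 0;
        while (n > 1) {
            n = n / 2;
            x++;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(lg(1024L));        // 10: 2^10 is about a thousand
        System.out.println(lg(1048576L));     // 20: 2^20 is about a million
        System.out.println(lg(1073741824L));  // 30: 2^30 is about a billion
        System.out.println(toBinary(6L));     // "110": floor(lg 6) + 1 = 3 bits
    }
}
```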
So, now we'll look at an implementation. The question is: can we actually solve this problem if we have a million points? In plenty of these applications, you might well have a million points. Now, we're not going to study whether a fast algorithm exists for that; that question is actually open. Instead, we'll look at a simple brute-force algorithm, just as an example for understanding the performance of programs. What this algorithm does is process all possible triples, keeping a counter, and whenever a sum is zero, it increments the counter. We get our number of elements, initialize our counter to zero, and then we have triply nested for loops: i from 0 to N, j from i plus 1 to N, and k from j plus 1 to N. We keep i less than j less than k to avoid processing the same triple multiple times; that's already a simple performance improvement. Then we test whether a[i], a[j], and a[k] sum to zero, and if they do, we increment our count; that count is what we return. So that's a brute-force algorithm that solves the three-sum problem (a sketch of the full program appears below). Here's a trace of this program running on our example. The first triple it tries is i equals 0, j equals 1, k equals 2. That doesn't sum to zero, so we increment k. Shown in black are the triples that do sum to zero. So i = 0, j = 1, k = 5 sums to zero; then we're done with k, we increment j, and so forth. The trace shows that we enumerate all triples and find the ones that actually sum to zero. Our goal now is to try to understand something about the performance of this program. Could we run it for N equal to a million and solve the problem for a million numbers? That's what we want to look at next: how much time is it going to take for a million? One thing to notice right away is that there are N choose 3 triples with i less than j less than k, and that's definitely going to come into the analysis.
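Here is a sketch of a brute-force implementation along the lines just described. The count method matches the triple loop above; the test client's input format (reading integers from standard input until end of input) is an assumption for illustration and may differ in detail from the course's own client.

```java
import java.util.ArrayList;
import java.util.Scanner;

public class ThreeSum {

    // Count the triples i < j < k whose entries sum to zero,
    // exactly as in the brute-force algorithm described above.
    public static int count(long[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    // Test client: read integers from standard input until end of input
    // (a simplifying assumption about the input format), then print the count.
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        ArrayList<Long> list = new ArrayList<>();
        while (in.hasNextLong())
            list.add(in.nextLong());
        long[] a = new long[list.size()];
        for (int i = 0; i < a.length; i++)
            a[i] = list.get(i);
        System.out.println(count(a));
    }
}
```

On the six-integer example above (30 -30 -20 -10 40 0 on standard input), this prints 3. Note that the if statement executes exactly N choose 3 = N(N-1)(N-2)/6 times, since the loops enumerate every triple with i < j < k exactly once; that count is the quantity that will drive the analysis.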