Hi. Welcome to today's video. Today, we're going to be talking about how to parallelize your code. This isn't a programming class, if you haven't picked up on that already. We've discussed a lot about why you should turn a serial program into a parallel program, but what we haven't talked about is how to actually do that. So let's discuss that a little bit today. It's going to be really hard for me to cover in a short video the exact details of how any particular code gets parallelized, but I can give you a general idea and some concepts you might want to look at.

You can parallelize code across cores on one node using shared memory, or across nodes using distributed memory. Generally speaking, it's easier to parallelize using shared memory than distributed memory. So if you're looking at starting with some basic parallelism, a great place to start is by learning how to parallelize across cores rather than across nodes. A lot of parallel programming is supported either directly by the compiler or by libraries that are going to help you, so figuring out parallelism might not be as complicated as you think.

So why should you parallelize? A lot of people are going to say, "I don't have time to learn how to parallelize my code. It's going to take me forever to even figure out the concept, and it's just not worth it. I don't have the time." But because a key part of this process is having code run simultaneously on multiple processors, parallelization allows you to run your code faster or to solve a larger problem. If you take the time to learn how to parallelize your code in the beginning, it may be a time sink right away, but it's going to save you time in the end. Not only do you have the potential to make your code run faster, but you'll also have picked up knowledge so that in the future you can write your programs in a parallel manner from the beginning rather than in a serial manner.

Remember, your program is just a series of executed instructions, and it's only going to run as fast as its slowest piece of code. If any of you have ever run a relay race, for example, your team is only as fast as the slowest person on the team, and the same thing is true of your code. So if you have some set of instructions that's a bottleneck and running very slowly, your entire code is not going to be able to run very fast, even if the rest of the code is fast. If you take the time to learn how to make that slow, bottlenecked code run faster, your entire problem may be solved faster. One thing to keep in mind, however, and something we'll see as we move through this course, is that sometimes there are situations that keep your parallelized code from running as quickly as we might expect. We're going to explore this in future videos. Keep in mind, too, that your goal may simply be to solve a larger problem that cannot be solved using only one core. Learning parallelization will be key for those types of problems.

So what can be parallelized? Well, parallelizable code is code whose instructions can be executed independently of one another. I've put two bits of pseudocode here as an example, and these are both examples of code that cannot be parallelized. The reason they cannot be parallelized is that they have dependencies. One way to think about whether a loop can be parallelized is to ask whether you could execute its iterations in random order rather than sequentially and still get the same answer. So let's take a look at the box on the left. I've written a loop here that iterates from one to ten.
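As a minimal sketch, here's that left-hand box written out in C. The array names a and b and the one-to-ten bounds come from the slide's pseudocode; the initial values are placeholders I've added so the example runs:

```c
#include <stdio.h>

int main(void) {
    double a[11] = {1.0};   /* a[0] seeds the recurrence; the rest start at 0 */
    double b[11] = {0.0};

    /* Loop-carried dependency: each iteration reads a[i-1], which was
       written by the previous iteration, so the iterations cannot be
       run in random order and still give the same answer. */
    for (int i = 1; i <= 10; i++) {
        a[i] = b[i] + a[i - 1];
    }

    printf("a[10] = %f\n", a[10]);
    return 0;
}
```

If the body were instead something like a[i] = b[i] + c[i], every iteration would stand on its own and the loop could be parallelized.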
Within the loop is an equation that solves for the current value of a by adding the current value of b to a previous value of a. The dependency in this loop comes from the a(i-1) element. That value has to come from a previous iteration, and therefore the loop cannot be executed in random order and give the same answer. For example, if i is two, then a(2) is equal to b(2) plus a(1), and a(1) needs to have been calculated already, which it might not have been if you're executing the iterations of the loop in random order.

On the right-hand side is the same concept written in a different way. In our loop, we're saying that a is equal to b plus c, and then d is equal to e times a. The reason this is not parallelizable is that the answer for d depends on the answer you get for a. One thing you might want to look at when deciding whether you can parallelize your code is whether you can use different algorithms or functions that are more suitable for parallelization; sometimes your programming language will offer those.

Here I wanted to provide a couple of examples of code parallelization that may help you visualize this process a little better. The first one is pipelining. In this particular example, we have a series of images that first need to be generated, then colored, and then resized. You can parallelize this in a couple of ways. It's important to note that this, much like many of the problems you'll try to solve, will have some instructions in the overall code that can be parallelized and some that still need to be serial. Here, you cannot color the images until they are generated, and you also cannot resize the images until after they have been colored. But you can still parallelize parts of this code. You can either have multiple processors generate the images simultaneously, or have a subset of cores start coloring and resizing images after they have been generated. Either way, the entire problem should be solved faster.

I also provide a second, similar example. Here we have an atmospheric model that needs modifications made to the radiation code. Once the code is changed, we need to run it on all of the data we have. We can parallelize how the changed model is run across the data. These concepts are very similar to what we talked about in the data versus task parallelism video, if you'd like to review that.

Lastly, how? Again, this is a little hard for me to talk about without looking at a specific language, but I can give you some general tips. In compiled languages like C++, Fortran, and so on, you can use either OpenMP or MPI. OpenMP provides shared-memory parallelization; again, this is typically easier to handle than MPI is. OpenMP parallelizes across cores on one node, again using shared memory. MPI, the Message Passing Interface, parallelizes across nodes, so it provides message passing and communication. It's a little more complicated to learn, but it's very useful if you want or need large-scale parallelization. When you're compiling in compiled languages, you need to make sure that you specify certain flags during that process, or else even if you use OpenMP or MPI, your code won't be parallelized. I'll put a minimal sketch of each below. Scripting languages like Python and R are a little bit different; a lot of times they have packages you can use that might make parallelizing your code a little easier.
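To make the OpenMP side concrete, here's a minimal shared-memory sketch in C. The loop body is my own illustration of an independent, parallelizable loop, not something from the video:

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    /* Fill the inputs with placeholder values. */
    for (int i = 0; i < N; i++) {
        b[i] = i * 0.5;
        c[i] = i * 2.0;
    }

    /* Each iteration touches only its own index i, so OpenMP can hand
       chunks of the loop to different threads on the same node. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i];
    }

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}
```

This is also where the compiler-flag warning comes in: with GCC, for example, you'd compile with gcc -fopenmp, and without that flag the pragma is silently ignored and the loop runs serially.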
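And here's a distributed-memory version of the same idea as an MPI sketch, again in C. The round-robin work-splitting is a common pattern I'm using for illustration, with each rank computing a partial sum over its own slice of the iterations:

```c
#include <stdio.h>
#include <mpi.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank handles every size-th iteration. Ranks may live on
       different nodes, so nothing is shared automatically. */
    double local = 0.0;
    for (int i = rank; i < N; i += size) {
        local += i * 0.5;
    }

    /* Explicit communication: combine the partial sums onto rank 0. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("total = %f\n", total);
    }

    MPI_Finalize();
    return 0;
}
```

You'd build this with an MPI wrapper compiler such as mpicc and launch it with mpirun, e.g. mpirun -np 4 ./a.out; the wrapper supplies the MPI-specific compile and link flags.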
As we move through this course, we'll learn many more details about how to parallelize in both compiled and scripting languages.