In this lecture, we study the parallel fold operation more closely, to understand its advantages and some of its limitations. We will also take a look at some alternative forms of this operation.

We already saw one of the use cases for fold: computing the sum of the integers in an array in parallel. The max method is another example: given an array of integers, find the integer with the largest value. To implement this method, we need to select the correct pair of a neutral element and a folding function. Take a few moments and try to implement the max method yourself.

Let's see the solution. If you used the minimum integer value as the neutral element and the math.max function for folding, your solution is correct. Alternatively, you could have used the following function, which does the same thing as math.max.

Let's consider these two fold calls and what is common to their neutral elements and functions. First, applying the function to the neutral element and some other element x always returns that same element x. For example, 0 + 117 = 117. Second, both of these functions are associative. For example, (1 + 2) + 7 is the same as 1 + (2 + 7).

Let's consider another example. In the children's game of rock, paper, scissors, each participant simultaneously chooses one of three options. Rock always beats scissors, scissors always beat paper, but paper beats rock. We are given an array of strings, each being either "paper", "rock", or "scissors", and we are supposed to determine which one of them won. We might be tempted to use the parallel fold operation to determine the winner. We could choose the empty string as the neutral element, and the function play as the folding operator. We implement play as follows: we sort the two input strings and then pattern match on the different cases. Between paper and scissors, scissors wins. Between paper and rock, paper wins. And between rock and scissors, rock wins. If the two choices are equal, we return either of the two.
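The solutions described above can be sketched in Scala as follows. This is a reconstruction from the prose, not the lecture's exact code. The sequential fold on a plain array has the same signature as the parallel one; with the scala-parallel-collections module, the same calls would run in parallel via `xs.par.fold(...)`.

```scala
// Reconstructed from the lecture's description (not the original slides).
// With the scala-parallel-collections module, `xs.par.fold(...)` would
// run these same calls in parallel.

def max(xs: Array[Int]): Int =
  xs.fold(Int.MinValue)(math.max)   // neutral element: the minimum integer

// The alternative folding function mentioned in the lecture, which does
// the same thing as math.max:
def max2(xs: Array[Int]): Int =
  xs.fold(Int.MinValue)((x, y) => if (x > y) x else y)

// The play operator: sort the two inputs, then pattern match.
def play(a: String, b: String): String = List(a, b).sorted match {
  case List("paper", "scissors") => "scissors"  // scissors beat paper
  case List("paper", "rock")     => "paper"     // paper beats rock
  case List("rock", "scissors")  => "rock"      // rock beats scissors
  case List(x, y) if x == y      => x           // equal choices: return either
  case List("", y)               => y           // neutral element: no choice made
}
```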
The neutral element then means no choice whatsoever, in which case the other choice is always the winner.

Let's execute this fold invocation and see what we get. Since fold runs in parallel, the calls to the play function could execute such that one processor runs play on paper and rock, in which case paper wins, and the other processor calls play on paper and scissors, in which case scissors wins. The overall winner is then scissors.

But the data parallel scheduler is allowed to organize this reduction tree differently. It could first call play on paper and scissors, then on rock and scissors, and then on paper and rock. In this case, scissors wins first. Then, between rock and scissors, rock wins. And finally, between paper and rock, paper wins, so the overall winner is paper. It looks like the result of this fold invocation depends on the execution schedule.

Why does this happen? The problem is that the play operator is not associative. Calling play on rock and scissors, and then on paper, is not the same as invoking play on rock with the result of play on scissors and paper. To ensure that the fold invocation works correctly, its operator f must be associative, and when applied to the neutral element z, it must act as an identity function. We say that the operator f and the neutral element z must form an algebraic structure called a monoid. Commutativity is not important for fold when it is invoked on sequences: the operator may or may not be commutative.

Let's consider another problem. We want to use fold to count the number of vowels in an array of characters. We could do it like this. Since we are supposed to return the number of vowels, we set the neutral element to 0. We then set the folding operator to increase the count by one if the character is a vowel, and leave it unchanged otherwise.

So here is a quiz for you: what does this snippet do? The first option is that the program runs correctly and returns the correct vowel count.
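The associativity failure described above can be checked directly. Below is a minimal, self-contained sketch, with play reconstructed from the lecture's description; the two groupings are exactly the lecture's counterexample.

```scala
// play, reconstructed from the lecture's description: sort both inputs,
// then pattern match on the winning combinations.
def play(a: String, b: String): String = List(a, b).sorted match {
  case List("paper", "scissors") => "scissors"
  case List("paper", "rock")     => "paper"
  case List("rock", "scissors")  => "rock"
  case List(x, y) if x == y      => x
  case List("", y)               => y   // "" acts as the neutral element
}

// The two groupings disagree, so play is not associative:
val grouped1 = play(play("rock", "scissors"), "paper")   // play(rock, paper)    -> "paper"
val grouped2 = play("rock", play("scissors", "paper"))   // play(rock, scissors) -> "rock"
```

Because the parallel scheduler is free to choose either grouping, a fold over the same array can legitimately produce either answer, which is exactly the non-determinism observed in the lecture.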
The second option is that the program is non-deterministic, in the sense that it returns different values on different runs. The third option is that the program returns an incorrect vowel count. And the final option is that this program does not even compile. Take a couple of seconds and think about it.

The correct answer is that this program does not even compile. How do we know this? The signature of the fold operation says that the accumulation type must be the same as the type of the elements in the collection. That means that the neutral element has to have the same type as the elements in the collection. The elements in the collection have type Char, whereas the neutral element that we chose has type Int, so the program will not compile. If you chose this option, you were correct.

Unfortunately, this means our approach does not work. The fold operation can only produce values of the same type as the collection that it is called on, so the type of the accumulation can only be Char, and 0 is not a Char. This was not the case with the foldLeft operation, which allows arbitrary accumulation types. Although foldLeft cannot run in parallel, it is more expressive than fold. Notice that it is easy to implement fold in terms of foldLeft. This trade-off between expressiveness and parallelism is unfortunate, so we need a more powerful data parallel operation.

One such operation is called aggregate. Its accumulation type can be any type B, just as was the case with foldLeft. It takes a neutral element of type B, a sequential folding operator f, and a parallel folding operator g. How aggregate works is best illustrated by the following figure. Given a collection of elements of type A, aggregate divides them into subsets on different processors. The processors then concurrently use the folding operator f to fold their subsets, as if they were executing a foldLeft. Each processor then returns an accumulation of type B.
These values are then combined in a parallel reduction tree using the second folding operator g.

Let's use aggregate to count the number of vowels in the character array. We can now provide the neutral element 0, the sequential operator that we saw earlier, and the parallel reduction operator, which in this case is just addition. Again, the parallel reduction operator and the neutral element should form a monoid.

The aggregate operation is very general: many other parallel collection operations can be expressed in terms of aggregate. So far, we saw only accessor operations; examples were sum, max, fold, count, and aggregate. However, from sequential programming in Scala, we also know another class of collection operations, called transformer operations. Methods such as map, filter, flatMap, and groupBy can also execute in parallel. They have similar semantics to their sequential counterparts, so we won't study them further in this lecture.
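To tie the aggregate discussion together, here is a sketch in plain Scala. The names isVowel, f, and g are illustrative, not from the lecture. Since aggregate lives on the parallel collections (the scala-parallel-collections module in recent Scala versions), the parallel call is shown in a comment, and its two-phase scheme (subsets folded sequentially with f, partial results combined with g) is simulated sequentially below. The sketch also shows how fold is easily expressed in terms of foldLeft.

```scala
// fold is easy to express in terms of foldLeft: it is foldLeft with the
// accumulation type fixed to the element type.
def fold[A](xs: Seq[A])(z: A)(op: (A, A) => A): A =
  xs.foldLeft(z)(op)

// Counting vowels with aggregate. On a parallel array this would read:
//   chars.par.aggregate(0)((count, c) => if (isVowel(c)) count + 1 else count, _ + _)
// Below we simulate the same two-phase scheme sequentially: chunks stand in
// for per-processor subsets, each folded with f, then combined with g.
val vowels = Set('a', 'e', 'i', 'o', 'u')
def isVowel(c: Char): Boolean = vowels(c.toLower)

val f: (Int, Char) => Int = (count, c) => if (isVowel(c)) count + 1 else count // sequential operator
val g: (Int, Int) => Int  = _ + _                                              // parallel combiner

val chars    = "parallel programming".toArray
val partials = chars.grouped(5).map(chunk => chunk.foldLeft(0)(f)) // per-subset foldLeft
val total    = partials.reduce(g)                                  // reduction tree
// total == 6: the vowels are a, a, e, o, a, i
```

Note that (0, g) forms a monoid on Int, as the lecture requires, so the combination step is independent of how the scheduler shapes the reduction tree.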