In this lecture, we take a closer look at the different parallel collection types. We will see the parallel collection hierarchy, and learn about the specifics of different parallel collections.

We start by recalling the main hierarchy of sequential collections. Traversable[T] is the root type of all collections and defines many collection operations, each implemented in terms of the foreach method. Each collection that implements this type must also implement foreach. Iterable[T] is a subtype of Traversable[T]. Iterable collections must implement the iterator method, and the Iterable[T] type defines a richer set of collection methods than Traversable[T]. Seq[T] is a subtype of Iterable[T] that describes sequences of elements of type T. A sequence is an ordered collection in which every element is assigned an index. Set[T] is a subtype of Iterable[T] that describes set collections, which cannot contain duplicate elements. Finally, Map[K, V] describes collections that map keys of type K to values of type V; maps cannot contain duplicate keys. These collection traits are then extended by concrete collections.

Let's next take a look at the parallel collection hierarchy. The traits ParIterable[T], ParSeq[T], ParSet[T], and ParMap[K, V] are the parallel counterparts of the different sequential collection traits. Collection operations on objects that implement these traits generally execute in parallel. For code that is agnostic about parallelism, there exists a separate hierarchy of generic collection traits: GenIterable[T], GenSeq[T], GenSet[T], and GenMap[K, V]. Methods on these collection types may or may not execute in parallel.

The following figure shows this collection hierarchy. On one axis, different collection types are subtypes of each other. For example, the Seq type is a subtype of the Iterable type, and ParSeq is a subtype of the ParIterable type. On another axis, sequential and parallel collection types are subtypes of the corresponding generic collection types.
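As a minimal sketch, the sequential side of this hierarchy can be observed directly through subtyping (the variable names here are illustrative):

```scala
// Every Seq is an Iterable; sets collapse duplicates; maps have unique keys.
val s: Seq[Int] = Vector(1, 2, 3)         // an ordered, indexed sequence
val i: Iterable[Int] = s                  // Seq[T] is a subtype of Iterable[T]
val set: Set[Int] = Set(1, 2, 2, 3)       // the duplicate 2 is discarded
val m: Map[String, Int] = Map("one" -> 1) // keys of type K map to values of type V
```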
For example, both Seq and ParSeq are subtypes of the GenSeq type. Generic collection traits allow expressing code that may or may not execute in parallel.

As an example, consider the method largestPalindrome, which, given a generic sequence of integers, finds the largest palindrome number. One example of a palindrome is the number 15,251: read backwards or forwards, it is the same number. Within the largestPalindrome method, we use the aggregate operation. Since we are looking for the largest palindrome, we choose the smallest integer as our neutral element, and the math.max function is the parallel operator. The sequential operator checks if the number is a palindrome and if it is bigger than the previously observed palindrome. We can invoke this method by passing a sequential collection, in this case an array. We can also invoke this method using a parallel collection; in that case, we invoke the .par operator on the array.

Next, let's take a closer look at the .par operator. As we already know, a sequential collection can be converted to a parallel one by calling .par. Let's measure the running time required to convert two different sequential collections to parallel ones. Let's take a look at the source file for the benchmark. At the beginning of the conversion program, we instantiate the standard ScalaMeter configuration. We set the minimum and maximum number of warmup runs, and the total number of benchmark iterations. In the main program, we then measure the running time required to convert a list to a parallel collection and a vector to a parallel collection, and we print the difference.

Let's run the conversion program and see what we get. We can notice that converting a list takes considerable time: the average conversion time for the one-million-element list was 128 milliseconds. The difference between the two conversions is several orders of magnitude. Arrays and array buffers can easily be parallelized, and their corresponding parallel collection is called ParArray.
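The largestPalindrome method discussed earlier in this section can be sketched roughly as follows. The lecture writes it against the generic GenSeq[T] trait so that both sequential and parallel collections can be passed; a plain Seq is used in this sketch so that it also compiles in newer Scala versions, where parallel collections live in a separate module:

```scala
// Find the largest palindrome in the sequence. Int.MinValue is the
// neutral element, and math.max combines partial results in parallel.
def largestPalindrome(xs: Seq[Int]): Int =
  xs.aggregate(Int.MinValue)(
    // sequential operator: accept x only if it is a palindrome and
    // larger than the largest palindrome seen so far
    (largest, x) =>
      if (x > largest && x.toString == x.toString.reverse) x else largest,
    // parallel operator: combine the results of different chunks
    math.max
  )

largestPalindrome(0 until 200000)            // a sequential range
// largestPalindrome((0 until 200000).par)   // parallel, given the .par module
```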
Ranges can be converted to parallel ranges, which are used in parallel for loops. As we saw, the parallel vector is the counterpart of the immutable Vector collection. Immutable HashMaps and HashSets can be converted to parallel HashMaps and HashSets. Similarly, mutable HashMaps and HashSets also have parallel counterparts. Finally, the concurrent TrieMap collection can also be efficiently converted to a parallel TrieMap. For other collections, .par creates the closest parallel collection; as we saw, a list, being an immutable sequence, is converted to an immutable parallel vector. The conclusion is this: unless the conversion to a parallel collection takes a negligible amount of time compared to the subsequent parallel operations, the conversion itself can dominate the running time, so pick your data structures carefully and make sure that they are parallelizable.

Now, consider the following snippet. It defines the intersection method, which takes two generic sets a and b and returns another set as a result. The method starts by creating a mutable set and then traverses the set a. For each element x in a, the method checks if b also contains x, in which case it adds x to the resulting set. The program then calls intersection on two sequential integer sets, each obtained by constructing a range collection and converting it into a set. Finally, the program calls the intersection method again using two identical parallel sets. The reason why this works is that both the sequential set collection and the parallel set collection are subtypes of the generic set collection, so we can use either as an argument to the intersection method.

So here's a quiz for you: is this program correct? Let's see. The program seems to work correctly for sequential sets. However, if the set a is a parallel set, then the for loop executes in parallel and concurrently accesses the result set. Since scala.collection.mutable.Set is not a thread-safe class, concurrent modifications can corrupt its internal state.
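The intersection method from the snippet can be sketched as follows. The signature is simplified here to collection.Set[Int] rather than the generic GenSet type used in the lecture, and the sketch only calls it with sequential sets, for which it is correct:

```scala
import scala.collection.mutable

// Correct for sequential arguments only: if `a` is a parallel set, the
// loop body runs concurrently, and mutable.Set is not thread-safe.
def intersection(a: collection.Set[Int], b: collection.Set[Int]): mutable.Set[Int] = {
  val result = mutable.Set[Int]()
  for (x <- a) if (b contains x) result += x
  result
}

// every fourth element of 0 until 1000, so the intersection has 250 elements
val seqResult = intersection((0 until 1000).toSet, (0 until 1000 by 4).toSet)
```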
The program is therefore not correct. Let's run it and see what happens. We start by taking a look at the source code. The main method corresponds exactly to what we saw in the previous snippet: we first define the intersection method and then invoke intersection twice, once with sequential sets and once with parallel sets. At the end, we take the results of the intersection and print their sizes.

After running the program once, we get the same result in both cases: 250. This is consistent with what we expect. Since the first set was constructed from a range of elements from 0 until 1,000, and the second set was constructed from the same range but taking only every fourth element, the result we expect is 250. We might conclude that the parallel version works correctly, but this was only by accident. It turns out that if we run the program again, we get an incorrect result in the parallel case: 246. Running the program two more times shows that the result is entirely non-deterministic. This is because concurrent modifications to the mutable set can arbitrarily corrupt its state. In fact, we are lucky that they did not crash the program.

This snippet demonstrates an important thing to remember when using parallel collections: avoid modifying the same memory locations without proper synchronization. The mutable set in this example represents the same memory locations. Since we do not know anything about the set's internal representation, we have to assume that two invocations of the += method can modify the same memory location or otherwise leave the set in an inconsistent state. A solution to this problem is to use a concurrent collection, which can be mutated by multiple threads simultaneously. In the following snippet, instead of creating a scala.collection.mutable.Set, we create a Java ConcurrentSkipListSet collection. The program now runs correctly. Another solution for the intersection problem is to avoid side effects altogether.
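The thread-safe variant can be sketched with a concurrent collection from java.util.concurrent (again with the signature simplified to collection.Set[Int]):

```scala
import java.util.concurrent.ConcurrentSkipListSet

// ConcurrentSkipListSet tolerates concurrent insertions, so this version
// stays correct even when the traversal of `a` runs in parallel.
def intersection(a: collection.Set[Int], b: collection.Set[Int]): ConcurrentSkipListSet[Int] = {
  val result = new ConcurrentSkipListSet[Int]()
  for (x <- a) if (b contains x) result.add(x)
  result
}
```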
We can often achieve this by using the right combinators; in this case, the filter combinator. We can simply filter the elements of a that are also contained in b. Since Scala sets also have the function type Int => Boolean, we can use the set b directly as the argument to the filter operation. If we want to be additionally smart about this, we can check which of the two sets is smaller, and then traverse the smaller set.

Another rule is to never modify a parallel collection on which a data-parallel operation is in progress. In the following example, we create a cyclic graph in which every node has exactly one successor. The graph is represented with a mutable map, where the keys are the nodes and the values are their successors. The graph is then contracted in parallel: each node's successor is replaced by the successor's successor. This is done in the following parallel foreach loop, in which, for every pair of a node and its successor, the successor is replaced with the successor's successor. Finally, to check if the program is correct, we try to find all the nodes whose successor node is not two steps away. As you can see in the figure, after the contraction, every node should have a successor that is exactly two larger than itself. The only exceptions are the nodes at the very end; for those, we have to take the successor modulo graph.size. If the program is correct, the printed violation should be None.

We take a short look at the source code, which corresponds exactly to what we saw before. Now let's run the program. The program has found a violation: instead of pointing to the node 41,625, this node points to the node 41,627. If you run the program again, you'll notice similar violations. The program has found violations of our constraint, meaning that it does not run correctly. In fact, there are two errors in this program. First, we are modifying the same collection that we are traversing in parallel.
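The side-effect-free, filter-based intersection described at the start of this passage can be sketched as:

```scala
// A Scala Set[Int] extends Int => Boolean, so one set can serve directly
// as the predicate for filter; traversing the smaller set does less work.
def intersection(a: Set[Int], b: Set[Int]): Set[Int] =
  if (a.size < b.size) a.filter(b) else b.filter(a)
```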
In some cases, this can even crash the program, since the data structure can be observed in a corrupted state. Second, we are reading from a collection that is concurrently being modified by some other iteration of the loop. Again, since the mutable map is not a thread-safe collection, we risk crashing the program or getting incorrect results. The constraint violations that we observed are a result of these errors. So the lessons learned are the following: never write to a collection that is concurrently traversed, and never read from a collection that is concurrently modified. In either case, the program can non-deterministically print different results or crash.

The TrieMap collection is an exception to the previous rule. This concurrent collection automatically creates atomic snapshots whenever a parallel operation starts, so concurrent updates are not observed during parallel traversals. TrieMap also provides the snapshot method, which efficiently creates a clone of the map. In the following snippet, we create a concurrent TrieMap instead of a regular mutable map. This ensures that when the foreach loop starts, the graph state is atomically captured: modifications to the graph are not observed during the traversal, and foreach traverses only those node-successor pairs that were in the graph at the beginning. In addition, we use the snapshot method to grab the previous state of the graph; this previous state is then used to find the successor's successor. In this example, the red and the purple snapshots shown on the left and the right are never modified. The only collection that does get modified is graph. Taking the snapshot is very efficient: it happens in constant time.

So let's see the demo. This time we run the ParallelTrieMapGraphContraction program. Running the program this time shows that there are no violations. Let's try it a few more times just to be sure.
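A sketch of the corrected contraction, assuming a graph of the shape described above (the foreach runs sequentially in this sketch; the lecture runs it in parallel via .par, which is safe here precisely because of TrieMap's snapshot semantics):

```scala
import scala.collection.concurrent.TrieMap

val size = 10000
// a cyclic graph: every node points to the next node, wrapping around
val graph = TrieMap[Int, Int]() ++= (0 until size).map(i => (i, (i + 1) % size))
// grab the previous state of the graph; snapshot() is a constant-time operation
val previous = graph.snapshot()
// contraction: replace each node's successor by the successor's successor
for ((node, successor) <- graph) graph(node) = previous(successor)
// after contraction, every node should point exactly two steps ahead
val violation = graph.find { case (node, s) => s != (node + 2) % size }
// violation should be None
```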