Scott Selikoff wrote: I think this sentence summarizes the discussion nicely. This seems to be a semantic argument to me. Generally speaking, concurrent and parallel are often used interchangeably. Yes, you can have a parallel stream with only one thread, but then it's not really behaving like a parallel stream; it's behaving like a serial stream. Likewise, you can have a stream that you declared as parallel and expect to be performed concurrently, but some stream operations can force the stream to be processed in a single-threaded manner.
Thanks for the reply, Scott. I think you misunderstood my point, though. While I agree that a lot of developers use the words "parallel" and "concurrent" interchangeably, they actually mean different things. And in the case of collecting Java streams, the two words have a subtle but distinct difference in meaning:
Parallel and NOT concurrent collecting: Is (or at least can be) processed by many threads in parallel, DOES preserve order, DOES NOT modify the collections used for collecting in several threads concurrently.
Parallel and concurrent collecting: Is (or at least can be) processed by many threads in parallel, DOES NOT preserve order, DOES modify the collections used for collecting in several threads concurrently.
Both use multiple threads! But while concurrent collecting is always parallel, parallel collecting is not always concurrent.
The JavaDocs are actually very precise about this - try and search
http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html and
http://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html for the
word "concurrent" - this word is only used in very few places, and only in relation to concurrent Collectors. I doubt that is a coincidence.
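One way to see which Collectors the JavaDocs mean is to inspect their characteristics. The following is only a minimal sketch: the class name CharacteristicsDemo and the helper isConcurrent are made up for illustration, while the Collectors factory methods and Collector.characteristics() are standard JDK API.

```java
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {

    // Hypothetical helper: true if the collector declares the CONCURRENT characteristic.
    static boolean isConcurrent(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        System.out.println("toList:               " + isConcurrent(Collectors.toList()));
        System.out.println("toSet:                " + isConcurrent(Collectors.toSet()));
        System.out.println("toConcurrentMap:      " + isConcurrent(
                Collectors.toConcurrentMap(Function.identity(), Function.identity())));
        System.out.println("groupingByConcurrent: " + isConcurrent(
                Collectors.groupingByConcurrent(Function.identity())));
    }
}
```

Of these four, only toConcurrentMap and groupingByConcurrent declare CONCURRENT; toList and toSet can still be used on a parallel stream, but they are never accumulated into concurrently.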
For a practical example, try and run these classes:
These classes are identical except that one uses the three-argument collect method and the other uses the one-argument method with a Collector that has the CONCURRENT characteristic. My results from running these are:
Parallel non-concurrent collecting test
Common ForkJoinPool size pre-collect: 0
Common ForkJoinPool size post-collect: 3
Resulting set size: 100000
Parallel and concurrent collecting test
Common ForkJoinPool size pre-collect: 0
Common ForkJoinPool size post-collect: 3
Resulting set size: 94157
As the size of the common ForkJoinPool (which is used by parallel streams) clearly shows, several threads are used in both tests. But the size of the resulting set also shows that the concurrency issues caused by non-concurrent HashSets only appear when using a Collector with the CONCURRENT characteristic. That is because the three-argument collect method never modifies a HashSet from several threads at once, even though the collecting is processed by several threads in parallel: each thread accumulates into its own HashSet, and the partial results are merged afterwards by the combiner.
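The comparison described above can be sketched roughly as follows. This is not the original test code: the class name CollectDemo, the method names, and the element count of 100000 (taken from the reported set size) are assumptions made for illustration. The collector is given both CONCURRENT and UNORDERED, because the Stream JavaDoc only describes a concurrent reduction for a parallel stream when the collector is concurrent and either the stream or the collector is unordered.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CollectDemo {

    // Parallel but NOT concurrent: every worker fills its own HashSet,
    // and the partial sets are merged afterwards with addAll.
    static Set<Integer> threeArgCollect(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .collect(HashSet::new, HashSet::add, HashSet::addAll);
    }

    // Illustrative collector: CONCURRENT + UNORDERED means all workers may
    // call add on the SAME container created by the factory.
    static Collector<Integer, Set<Integer>, Set<Integer>> concurrentInto(
            Supplier<Set<Integer>> factory) {
        return Collector.of(
                factory,
                Set::add,
                (left, right) -> { left.addAll(right); return left; },
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);
    }

    // Parallel AND concurrent: only safe if the factory returns a thread-safe set.
    static Set<Integer> concurrentCollect(int n, Supplier<Set<Integer>> factory) {
        return IntStream.range(0, n).parallel().boxed().collect(concurrentInto(factory));
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed element count
        System.out.println("Three-argument collect, HashSet:   "
                + threeArgCollect(n).size());
        System.out.println("CONCURRENT collector, key set view: "
                + concurrentCollect(n, ConcurrentHashMap::newKeySet).size());
        // Deliberately unsafe: a plain HashSet shared between threads.
        System.out.println("CONCURRENT collector, HashSet:      "
                + concurrentCollect(n, HashSet::new).size());
        System.out.println("Common ForkJoinPool size: "
                + ForkJoinPool.commonPool().getPoolSize());
    }
}
```

Passing HashSet::new to concurrentCollect is deliberately unsafe and is only there to show how elements can get lost; the size it reports can vary from run to run and from machine to machine.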
As stated in my original post, your book fails to explain this subtle difference between the concepts, which is a shame (but not critical, I suppose, since the OCP exam probably doesn't go into this level of detail). Also, the statement "You should use a concurrent collection to combine the results, ensuring that the results of concurrent threads do not cause a ConcurrentModificationException" in the paragraph "Combining results with collect()", regarding collecting parallel streams with the three-argument collect method, is flat-out incorrect, because the three-argument collect method is NOT concurrent, even when the stream is parallel and processed by multiple threads.
The JavaDoc also clearly states this under the three-argument collect method: "Like reduce(Object, BinaryOperator), collect operations can be parallelized without requiring additional synchronization."
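As a small illustration of that JavaDoc statement, the sketch below collects a parallel stream into a plain ArrayList with the three-argument collect method and compares the result with a sequential run. The class name OrderedCollectDemo, the method names, and the element count are assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderedCollectDemo {

    // Parallel three-argument collect into a plain, non-thread-safe ArrayList.
    static List<Integer> parallelCollect(int n) {
        return IntStream.range(0, n).parallel().boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Sequential reference result for comparison.
    static List<Integer> sequentialCollect(int n) {
        return IntStream.range(0, n).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000; // assumed element count
        System.out.println("Same elements, same order: "
                + parallelCollect(n).equals(sequentialCollect(n)));
    }
}
```

No concurrent collection is involved here, yet no elements are lost and the encounter order is kept, because each ArrayList is only ever touched by one thread at a time before the partial lists are combined.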