Campbell Ritchie wrote: Remember that you can parallelise a Stream object and that will sort out all its own multithreading for you.
Sure you can. Streams are an awesome feature that works very well for some uses. They might, however, have some problems. Java 8 streams mix three different concepts: laziness, monadic structures and automatic parallelization.
Automatic parallelization is related to streams in the sense that it consists of breaking a collection into chunks that can be processed in parallel. This could just as well be applied to Java lists. The stream implementation of parallelization has some problems. It is simple if you use it with the default configuration: it breaks a collection of tasks into smaller tasks and feeds them to worker threads, using the fork/join framework to join the results, with work stealing to balance the load among the workers. How many workers? As many as your computer has virtual cores, less one (for the main thread). And there is a single pool for the whole application. Does this make sense? Most often not. If you are writing a single-threaded application, this solution is very powerful. But in a multithreaded application, you may have trouble. If one task is very long-running, it could block not only the other tasks that were part of the same parallelization, but all the tasks belonging to any other automatically parallelized process. To avoid this, you would have to provide your own pool of threads for each parallelization, which is much less simple because it has not been designed for this.
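As a rough sketch of what providing your own pool can look like: a commonly used workaround is to start the parallel stream from inside a task submitted to a dedicated ForkJoinPool, so that the work is forked in that pool rather than in the common one. This relies on observed behaviour of the implementation rather than on anything the Stream API documents, and the class name, method name and pool size below are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;

class DedicatedPoolSketch {

    // Hypothetical helper: runs one parallel stream in its own pool.
    static int sumOfSquares(List<Integer> numbers, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The terminal operation is invoked from a worker of 'pool'.
            // In practice the stream's subtasks are then forked in that pool
            // (observed behaviour, not a documented guarantee).
            Callable<Integer> task =
                    () -> numbers.parallelStream().mapToInt(n -> n * n).sum();
            return pool.submit(task).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Size of the shared pool used by default for parallel streams.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

Each call creates and shuts down its own pool, which is the kind of extra ceremony that the default configuration hides.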
Stream is one of the most important monadic structures in Java 8. Java lists are not monadic, so if you want to use a List as a monad, you simply call the stream() method on it to get a Stream. Generally, you don't care about the fact that it could be parallelized, since you won't get any performance improvement for small collections. But some very useful methods are missing for this kind of use, for example takeWhile. From the answers that I could obtain about the reason for omitting this method (and many others), it seems that it would have made automatic parallelization much more problematic. It certainly would have, but this is a real problem since we can't add any method to Stream.
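To give an idea of what is involved, here is a minimal sketch of a takeWhile helper for Java 8, written as a static method that wraps the stream's Spliterator. The helper and its class are hypothetical, the result is a sequential stream only, and it makes no attempt to cooperate with parallelization. (Java 9 later added a takeWhile method to Stream itself.)

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class TakeWhileSketch {

    // Hypothetical helper: yields elements until the predicate first fails.
    static <T> Stream<T> takeWhile(Stream<T> source, Predicate<? super T> predicate) {
        Spliterator<T> split = source.spliterator();
        Spliterator<T> limited =
                new Spliterators.AbstractSpliterator<T>(split.estimateSize(), 0) {
            private boolean stillGoing = true;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (!stillGoing) {
                    return false;
                }
                boolean hadNext = split.tryAdvance(elem -> {
                    if (predicate.test(elem)) {
                        action.accept(elem);
                    } else {
                        stillGoing = false; // stop at the first failing element
                    }
                });
                return hadNext && stillGoing;
            }
        };
        // 'false' = sequential; closing the result also closes the source.
        return StreamSupport.stream(limited, false).onClose(source::close);
    }

    public static void main(String[] args) {
        Stream<Integer> naturals = Stream.iterate(1, n -> n + 1); // infinite
        List<Integer> firstFew =
                takeWhile(naturals, n -> n < 5).collect(Collectors.toList());
        System.out.println(firstFew);
    }
}
```

Because it has to be a static method, it cannot be chained like the built-in operations, which is the limitation of not being able to add methods to Stream.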
In my opinion, the main interest of streams is laziness and not parallelization. Everybody knows that laziness allows handling infinite lists, but this might not be the main interest. Streams allow chaining functions such as:
stream.map(f).filter(p).flatMap(g).map(h).forEach(c)
where f, p, g, h are respectively functions from A to B, B to Boolean, B to Stream<C> and C to D, and c is a Consumer<D>.
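To make this concrete, here is a small sketch of such a chain. The types (A = String, B = Integer, C = Integer, D = String) and the function bodies are arbitrary choices for illustration; f and p also record each call in a trace list so that the order of evaluation can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class LazyChainSketch {

    // Runs the chain on 'input' and records calls to f and p in 'trace'.
    static List<String> run(List<String> input, List<String> trace) {
        Function<String, Integer> f = s -> {          // A -> B
            trace.add("f(" + s + ")");
            return s.length();
        };
        Predicate<Integer> p = n -> {                 // B -> Boolean
            trace.add("p(" + n + ")");
            return n > 1;
        };
        Function<Integer, Stream<Integer>> g =        // B -> Stream<C>
                n -> IntStream.rangeClosed(1, n).boxed();
        Function<Integer, String> h = n -> "#" + n;   // C -> D

        List<String> out = new ArrayList<>();
        Consumer<String> c = out::add;                // Consumer<D>

        input.stream().map(f).filter(p).flatMap(g).map(h).forEach(c);
        return out;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(run(Arrays.asList("a", "bb", "ccc"), trace));
        System.out.println(trace);
    }
}
```

With the input a, bb, ccc the trace reads f(a), p(1), f(bb), p(2), f(ccc), p(3): each element goes through the pipeline before the next one is touched, so the source is traversed only once.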
If this were applied to a (monadic) list, it would involve traversing the list five times. Due to laziness, the stream will be traversed only once. We could compare this to iterating with a loop in imperative programming. Given the following functions:
this would be the imperative equivalent for a monadic list:
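A minimal sketch of what this looks like, using hypothetical sample functions (String for A, Integer for B and C, String for D, with g returning a List rather than a Stream). Every step walks a complete list and builds a new one before the next step starts:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

class EagerListSketch {

    // Hypothetical sample functions, chosen only for illustration.
    static final Function<String, Integer> f = String::length;
    static final Predicate<Integer> p = n -> n > 1;
    static final Function<Integer, List<Integer>> g = n -> Collections.nCopies(n, n);
    static final Function<Integer, String> h = n -> "#" + n;

    static void process(List<String> as, Consumer<String> c) {
        List<Integer> bs = new ArrayList<>();
        for (String a : as) bs.add(f.apply(a));             // pass 1: map f

        List<Integer> kept = new ArrayList<>();
        for (Integer b : bs) if (p.test(b)) kept.add(b);    // pass 2: filter p

        List<Integer> cs = new ArrayList<>();
        for (Integer b : kept) cs.addAll(g.apply(b));       // pass 3: flatMap g

        List<String> ds = new ArrayList<>();
        for (Integer x : cs) ds.add(h.apply(x));            // pass 4: map h

        for (String d : ds) c.accept(d);                    // pass 5: forEach c
    }

    public static void main(String[] args) {
        process(Arrays.asList("a", "bb", "ccc"), System.out::println);
    }
}
```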
And here is the equivalent for a stream:
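A minimal sketch, using hypothetical sample functions (String for A, Integer for B and C, String for D, with g returning a List). Each source element is pushed through all the steps before the next one is read, so no intermediate list is built:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

class SinglePassSketch {

    // Hypothetical sample functions, chosen only for illustration.
    static final Function<String, Integer> f = String::length;
    static final Predicate<Integer> p = n -> n > 1;
    static final Function<Integer, List<Integer>> g = n -> Collections.nCopies(n, n);
    static final Function<Integer, String> h = n -> "#" + n;

    static void process(List<String> as, Consumer<String> c) {
        for (String a : as) {                    // the only traversal of the source
            Integer b = f.apply(a);              // map f
            if (p.test(b)) {                     // filter p
                for (Integer x : g.apply(b)) {   // flatMap g
                    c.accept(h.apply(x));        // map h, then consume
                }
            }
        }
    }

    public static void main(String[] args) {
        process(Arrays.asList("a", "bb", "ccc"), System.out::println);
    }
}
```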
This is of course much more efficient. The functional version using a stream would imply using a slightly different g function:
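As a sketch of that change, with hypothetical sample functions: the only difference from a List-returning g is that g now returns a Stream<C>, which is what flatMap expects. An existing List-returning function (called gList here) can be adapted by calling stream() on its result:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

class StreamVersionSketch {

    // Hypothetical sample functions, chosen only for illustration.
    static final Function<String, Integer> f = String::length;
    static final Predicate<Integer> p = n -> n > 1;
    static final Function<Integer, List<Integer>> gList = n -> Collections.nCopies(n, n);
    // The "slightly different" g: same elements, exposed as a Stream.
    static final Function<Integer, Stream<Integer>> g = n -> gList.apply(n).stream();
    static final Function<Integer, String> h = n -> "#" + n;

    static void process(List<String> as, Consumer<String> c) {
        as.stream().map(f).filter(p).flatMap(g).map(h).forEach(c);
    }

    public static void main(String[] args) {
        process(Arrays.asList("a", "bb", "ccc"), System.out::println);
    }
}
```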
This is of course a huge improvement, but I feel that putting the three concepts (monadic structure, laziness and automatic parallelization) in the same class is not a good choice, because it limits the power of Stream as a lazy monadic collection. This is why I present these concepts separately in my book.