I have a question regarding the timing difference between serial streams and parallel streams. In the following example, the serial stream processes the results faster than my parallel stream. Can anyone explain this behaviour, or point out what I am doing wrong?
The IO cost of printing the values takes far longer than the CPU cost of computing them. The parallel version adds the overhead of splitting the work and coordinating threads, but since CPU isn't the scarce resource here, parallelism doesn't provide a benefit.
Note that this code doesn't do what you think it does. You are creating a stream of one element (the number 13999999), and you are timing the loop that creates all of those tiny streams. That isn't right. You really want to create one larger stream and start measuring time only after it has been created. Also, testing with a lambda that doesn't print will show you a very different picture.
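We can't see the original loop in this thread, but the mistake described above usually looks something like the first method below (the method names and sizes are just for illustration, not Helene's actual code): each iteration builds a brand-new one-element stream, so the timing mostly measures stream construction rather than stream processing. The second method shows the intended shape: build one large stream up front, then time only the terminal operation.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class OneElementStreams {

    // The flawed shape: the loop builds a fresh one-element stream on
    // every iteration, so any timing around this loop mostly measures
    // the cost of constructing n tiny streams.
    static long flawedCount(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            count += Stream.of(i).count(); // a new stream of exactly one element
        }
        return count;
    }

    // The intended shape: build ONE stream of n elements before the
    // clock starts, then time only the terminal operation.
    static long betterCount(int n) {
        IntStream stream = IntStream.range(0, n); // created before timing begins
        long start = System.nanoTime();
        long count = stream.count();
        System.out.println("count() took " + (System.nanoTime() - start) + " ns");
        return count;
    }

    public static void main(String[] args) {
        System.out.println(flawedCount(1_000_000));
        System.out.println(betterCount(1_000_000));
    }
}
```

Both methods produce the same count, of course; the point is what the stopwatch is actually wrapped around.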
Jeanne Boyarsky wrote: Helene,
You really want to create one larger stream and start measuring time only after it has been created. Also, testing with a lambda that doesn't print will show you a very different picture.
I think this might be more along the lines of the proof you're looking for, Helene. Also note that I removed the IO operation, as Jeanne suggested; even with a single System.out.print("") statement in each of the lambdas, the run times increased dramatically (see my comments for the actual numbers).
I think it's fun to look at the performance of these different streams given their operations (with and without IO) and their sizes. For one million records without IO, the serial stream was processed much faster. However, one hundred million records without IO were processed much faster in parallel than serially. Your mileage will vary, of course, but I found this to be an interesting baseline for serial/parallel streaming. I hope this helps!
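For anyone who wants to reproduce this, here is a minimal sketch along those lines (this is not the code posted in this thread, and a quick `nanoTime` loop is no substitute for a proper harness like JMH, since JIT warm-up and GC can swing the numbers): a CPU-only pipeline with no IO in the lambda, timed serially and in parallel at two of the sizes mentioned above. The exact arithmetic in the lambda is arbitrary, chosen only so each element costs a little CPU without overflowing a long.

```java
import java.util.stream.LongStream;

public class SerialVsParallelTiming {

    // A CPU-only pipeline: no printing, just arithmetic per element.
    static long cpuSum(long n, boolean parallel) {
        LongStream s = LongStream.range(0, n);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> (x * 31) % 1017).sum(); // stays well inside a long
    }

    // Crude wall-clock timing; fine for a rough comparison, not a benchmark.
    static long timeMillis(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (long n : new long[] {1_000_000L, 100_000_000L}) {
            long serialMs = timeMillis(() -> cpuSum(n, false));
            long parallelMs = timeMillis(() -> cpuSum(n, true));
            System.out.println(n + " elements: serial " + serialMs
                    + " ms, parallel " + parallelMs + " ms");
        }
    }
}
```

On a typical multi-core machine the pattern tends to match the observation above: at small sizes the fork/join splitting overhead eats any gain, while at large sizes the parallel run pulls ahead, but the crossover point depends entirely on your hardware and the per-element work.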