Modern Java Recipes: Performance in Stream

 
Greenhorn
Posts: 14
Hi Ken Kousen. First of all, congratulations on the book; it seems very interesting.

Nowadays people are working with streams more often, but what about the performance implications? I have sometimes heard that using a for-loop performs better (in both memory and speed) than using streams. Are there any useful guidelines you could share on when to use streams and when to avoid them?
 
gunslinger & author
Posts: 169
I have an example in the book that deals with that, taken from a similar example in "Java 8 in Action" (which is being revised to become "Java 8 and 9 in Action"). In my example, I use the JMH (Java Microbenchmark Harness) benchmarking tool to evaluate the performance of methods that sum the first 10 million long values.

In one method, I simply add the longs in a loop. In another, I use a LongStream with a range and a sum method. Then I make that (sequential by default) stream parallel. Then I do the sum in the most inefficient way possible, by using a Stream<Long> with an iterate method and a reduce, and finally I do the same thing in parallel.
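The five variants described above can be sketched in plain Java roughly as follows. This is a minimal illustration, assuming the sum runs over 1 to n inclusive; it is not the book's actual benchmark code (that lives in the repository linked later in this thread). The class and method names are made up, and the sketch only checks that the variants agree rather than timing them, since meaningful timings need a harness such as JMH.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical class: five ways to add up 1..n, mirroring the variants described above.
// A plain correctness sketch, not a JMH benchmark.
public class SumStrategies {

    // 1. Plain for-loop over primitive longs.
    public static long loopSum(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // 2. Primitive LongStream over a range; sequential by default.
    public static long sequentialLongStreamSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // 3. Same pipeline made parallel; a range splits cleanly into chunks.
    public static long parallelLongStreamSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // 4. Boxed Stream<Long> built with iterate + limit: every element is a Long object.
    public static long iterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .reduce(0L, Long::sum);
    }

    // 5. Same boxed pipeline in parallel; iterate produces each element from the
    //    previous one, so the source cannot be split up front.
    public static long parallelIterateSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                .limit(n)
                .parallel()
                .reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        long n = 1_000_000L;              // kept small so the boxed parallel version finishes quickly
        long expected = n * (n + 1) / 2;  // closed form for 1 + 2 + ... + n
        System.out.println(loopSum(n) == expected);
        System.out.println(sequentialLongStreamSum(n) == expected);
        System.out.println(parallelLongStreamSum(n) == expected);
        System.out.println(iterateSum(n) == expected);
        System.out.println(parallelIterateSum(n) == expected);
    }
}
```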

The results are that the sequential LongStream sum is even faster than the simple loop, though the difference in speed is probably not significant. Making it parallel doesn't help, mostly because summing primitives is, as the kids used to say, wicked fast already. By contrast, summing the Stream<Long> is much, much slower, roughly 10 to 15 times slower. Making that one parallel actually makes the performance worse, mostly because iterate with a limit is not an easy structure for the system to partition. The bottom line is that as long as you don't do something silly, like using a stream of wrapped values where a primitive stream is available, the performance is about the same as a regular loop. So go ahead and use streams to write your code, and then you can experiment with parallelization afterwards.
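To make the "wrapped values" point concrete, here is a small sketch (class and method names are hypothetical) contrasting a reduction over boxed Long objects with one that first switches to a primitive LongStream via mapToLong. It does not measure anything; it only shows where the boxing happens.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical class illustrating boxed vs. primitive reductions.
public class PrimitiveVsBoxed {

    // Boxed: Long::sum unboxes both arguments, and each partial result is reboxed into a Long.
    public static long boxedSum(List<Long> values) {
        return values.stream().reduce(0L, Long::sum);
    }

    // Primitive: mapToLong unboxes each element once, then sum() works on raw longs.
    public static long primitiveSum(List<Long> values) {
        return values.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<Long> values = LongStream.rangeClosed(1, 1_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(boxedSum(values));     // 500500
        System.out.println(primitiveSum(values)); // 500500
    }
}
```

Both return the same answer; the difference is only in how many Long objects the pipeline creates along the way.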

When is parallelization worth it? The general rules are: you need a stateless, associative operation (like addition), you need either a lot of data or a process that takes a lot of time on each element, and you need a source of data that is easy to partition. If those conditions apply, you're likely to see a benefit that exceeds the cost of splitting the work and joining all the individual results together again.
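A minimal sketch of the first and third rules, with hypothetical class and method names: IntStream.rangeClosed is a source that splits evenly, and addition is stateless and associative, so the parallel sum always matches the sequential one. Subtraction is not associative, so the same parallel pipeline is not guaranteed to reproduce the sequential answer.

```java
import java.util.stream.IntStream;

// Hypothetical class showing why the reduction operator must be associative.
public class ParallelSafety {

    // Addition is associative: any grouping of the chunks gives the same total.
    public static int sumSequential(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, Integer::sum);
    }

    public static int sumParallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative: (a - b) - c != a - (b - c),
    // and 0 is not a true identity for it either (0 - x != x).
    // Sequentially this computes 0 - 1 - 2 - ... - n.
    public static int differenceSequential(int n) {
        return IntStream.rangeClosed(1, n).reduce(0, (a, b) -> a - b);
    }

    // In parallel, chunk results are combined with the same operator,
    // so the answer depends on how the range was split.
    public static int differenceParallel(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.println(sumSequential(n) + " vs " + sumParallel(n));
        System.out.println(differenceSequential(n) + " vs " + differenceParallel(n));
    }
}
```

The second line of output will typically show two different numbers; the exact parallel value depends on how the work is split, so it should not be relied on.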

Trisha Gee (and if you don't know that name, look her up -- she's awesome) has published some studies that show similar results. As Brian Goetz likes to say, parallelization is an optimization. Get your code working sequentially first, and then see what you can do in parallel. The streams in Java 8 have been optimized enough to make it worthwhile to write your code that way and then try to optimize.
 
Kenneth A. Kousen
gunslinger & author
Posts: 169
In case you want to see the tests I'm referring to in the previous message, they're in the GitHub repo for the book: https://github.com/kousen/java_8_recipes . Specifically, I'm talking about https://github.com/kousen/java_8_recipes/blob/master/src/jmh/java/manning/ParallelStreamBenchmark.java .

I used three separate GitHub repos for the book: the "java_8_recipes" one just mentioned, a similar one called "java_9_recipes" for the Java 9 material, and one called "cfboxscores" for the larger CompletableFuture example. That last one concurrently downloads MLB box score data for all the games played over a range of dates and then post-processes it in various ways.
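For readers curious about the CompletableFuture pattern mentioned above, here is a minimal sketch of the general fan-out-and-join idea. It is not the cfboxscores code: the class name and fetchBoxScores are hypothetical, and fetchBoxScores returns a placeholder string instead of making a network request. The one design point worth noting is that the futures are collected into a list before any join() call; because streams are lazy, joining inside the same pipeline would wait on each fetch before starting the next.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of concurrent per-date downloads with CompletableFuture.
public class ConcurrentFetchSketch {

    // Hypothetical stand-in for downloading one day's box scores.
    // A real version would make an HTTP request; this one just returns a label.
    public static String fetchBoxScores(LocalDate date) {
        return "boxscores-" + date;
    }

    // Start one asynchronous fetch per date in [start, endExclusive),
    // then wait for all of them and return the results in date order.
    public static List<String> fetchAll(LocalDate start, LocalDate endExclusive) {
        long days = ChronoUnit.DAYS.between(start, endExclusive);

        // Collecting first forces every supplyAsync call to run before we block on any result.
        List<CompletableFuture<String>> futures = Stream.iterate(start, d -> d.plusDays(1))
                .limit(days)
                .map(date -> CompletableFuture.supplyAsync(() -> fetchBoxScores(date)))
                .collect(Collectors.toList());

        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> results = fetchAll(LocalDate.of(2017, 6, 1), LocalDate.of(2017, 6, 4));
        // Post-processing placeholder: here we just print each result.
        results.forEach(System.out::println);
    }
}
```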

Anyone is of course welcome to anything in the repositories, whether you actually buy the book or not.
 
Saloon Keeper
Posts: 15485
Great answer, Kenneth. I'm really interested in getting your book.
 