
Stream, Collector and execution speed

 
Edoardo Pasca
Greenhorn
Posts: 8
Hello

I was fiddling around with streams and thought I'd build a histogram from a DoubleStream.
I programmed 3 different (though equivalent) ways to collect the data. The performance I measured was unexpected, and I'd like to ask your opinion.

The code is very simple: the main method of the class creates a list of numbers that follow a Gaussian distribution. A stream is created from this list, filtered, and collected into a histogram with 3 methods:
  • 1. I pass a supplier, an accumulator and a combiner to collect()
  • 2. I create a Collector and pass it to collect()
  • 3. The collector is created by a static method in the Histogram class (otherwise exactly as in method 2)
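The three approaches listed above could look roughly like the following. This is only a sketch: the `Histogram` class, its fixed range of [-3, 3) with 30 bins, the filter predicate and the helper names (`histo1`, `histo2`, `histo3`, `gaussian`) are assumptions made for illustration, not the code from the original post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collector;

// Hypothetical fixed-range histogram; the original Histogram class is not shown in the post.
class Histogram {
    final double min, max;
    final long[] bins;

    Histogram(double min, double max, int nBins) {
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    // Accumulator: count one value; values outside [min, max) are ignored.
    void add(double x) {
        if (x < min || x >= max) return;
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++;
    }

    // Combiner: add the counts of another histogram that uses the same binning.
    void merge(Histogram other) {
        for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
    }

    // Used by method 3: the collector is built by a static factory.
    static Collector<Double, Histogram, Histogram> collector(double min, double max, int nBins) {
        return Collector.of(
                () -> new Histogram(min, max, nBins),
                (h, x) -> h.add(x),
                (a, b) -> { a.merge(b); return a; });
    }
}

class HistogramDemo {
    // Method 1: supplier, accumulator and combiner are passed directly to collect().
    static Histogram histo1(List<Double> data) {
        return data.stream()
                .filter(x -> Math.abs(x) < 3.0)   // assumed filter
                .collect(() -> new Histogram(-3.0, 3.0, 30),
                         (h, x) -> h.add(x),
                         Histogram::merge);
    }

    // Method 2: a Collector is created locally and passed to collect().
    static Histogram histo2(List<Double> data) {
        Collector<Double, Histogram, Histogram> collector = Collector.of(
                () -> new Histogram(-3.0, 3.0, 30),
                (h, x) -> h.add(x),
                (a, b) -> { a.merge(b); return a; });
        return data.stream().filter(x -> Math.abs(x) < 3.0).collect(collector);
    }

    // Method 3: same as method 2, but the Collector comes from Histogram.collector().
    static Histogram histo3(List<Double> data) {
        return data.stream()
                .filter(x -> Math.abs(x) < 3.0)
                .collect(Histogram.collector(-3.0, 3.0, 30));
    }

    // Test data: n values drawn from a standard Gaussian distribution.
    static List<Double> gaussian(int n, long seed) {
        Random random = new Random(seed);
        List<Double> data = new ArrayList<>(n);
        for (int i = 0; i < n; i++) data.add(random.nextGaussian());
        return data;
    }

    public static void main(String[] args) {
        List<Double> data = gaussian(1_000_000, 42L);
        long[] reference = histo1(data).bins;
        System.out.println("histo2 matches histo1: " + Arrays.equals(reference, histo2(data).bins));
        System.out.println("histo3 matches histo1: " + Arrays.equals(reference, histo3(data).bins));
    }
}
```

All three variants do the same work per element, so any large timing difference between them is more likely to come from how the measurement is taken than from the collection style itself.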


    Now the problem is that it takes 1.665 s for method 1, 1.857 s for method 2 and 0.298 s for method 3. I also found that swapping methods 2 and 3 changes the execution times: method 2 becomes faster (0.3 s) and method 3 slower (1.8 s). So there is something I'm missing here.

    Moreover, if I create a parallelStream from the list of doubles, the whole procedure takes about 15 s! Possibly the combine method is not very efficient, but still.

    I include the whole code here so that you can try it out yourselves. I have commented it as much as possible.

    Thanks

    Edoardo

     
    Pierre-Yves Saumont
    Author
    Ranch Hand
    Posts: 64
    This is due to optimization performed by the JIT compiler. Compiled code gets optimized after it has been run once or more, which is why swapping the order changes the result: the first call is slower. You should allow the compiler to "warm up", for example:
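A minimal sketch of such a warm-up, assuming a hypothetical helper class `WarmUpTiming` and a stand-in task rather than the histogram code from the post: the task is run several times without timing, and only the following run is measured.

```java
import java.util.function.Supplier;

// Hypothetical timing helper; names and the number of warm-up runs are illustrative.
class WarmUpTiming {
    // Times a single call of the task in nanoseconds.
    static <T> long timeNanos(Supplier<T> task) {
        long start = System.nanoTime();
        task.get();
        return System.nanoTime() - start;
    }

    // Runs the task warmUpRuns times untimed, then times one more call.
    static <T> long timeAfterWarmUp(Supplier<T> task, int warmUpRuns) {
        for (int i = 0; i < warmUpRuns; i++) task.get();
        return timeNanos(task);
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a call such as histo1(data).
        Supplier<Double> task = () -> {
            double sum = 0;
            for (int i = 1; i <= 1_000_000; i++) sum += Math.sqrt(i);
            return sum;
        };
        System.out.println("first call:    " + timeNanos(task) + " ns");
        System.out.println("after warm-up: " + timeAfterWarmUp(task, 20) + " ns");
    }
}
```

The actual numbers depend on the JVM and the machine, so the sketch prints them rather than assuming any particular ratio.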



    Here are the results I get:



    You can see that there is still a small difference when calling histo3 before histo2. This shows that all warming up should be done before any test, such as in:



    With this configuration, the results are the same whether you call histo2 or histo3 first.

    Also note that, in theory, this code should not warm up the compiler: it should detect that the calls to histo1, histo2 and histo3 have no effect besides returning their results, and that these results are not used. So the method calls inside the for loops should simply compile to nothing. This does not happen here for some reason, but it sometimes happens, in which case you have to do something with the results of the warm-up calls.
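One way to "do something with the results" is to fold every warm-up result into a checksum and print it, so the calls have an observable effect. The sketch below also does all the warm-up before any timed run. The class `WarmUpAllFirst`, the stand-in tasks and the run count are assumptions for illustration, not the code from this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.LongSupplier;

// Hypothetical harness: warm up every task first, and keep the results alive.
class WarmUpAllFirst {
    // Runs each task `runs` times and adds every result to a checksum,
    // so the warm-up calls cannot be treated as having no effect.
    static long warmUp(List<LongSupplier> tasks, int runs) {
        long sink = 0;
        for (int i = 0; i < runs; i++) {
            for (LongSupplier task : tasks) sink += task.getAsLong();
        }
        return sink;
    }

    public static void main(String[] args) {
        double[] data = new Random(1).doubles(1_000_000).toArray();
        // Stand-ins for histo2 and histo3: each returns a number derived from its result.
        LongSupplier taskA = () -> Arrays.stream(data).filter(x -> x < 0.5).count();
        LongSupplier taskB = () -> Arrays.stream(data).filter(x -> x >= 0.5).count();

        // All warm-up happens before any measurement; printing the checksum uses the results.
        long sink = warmUp(Arrays.asList(taskA, taskB), 50);
        System.out.println("warm-up checksum: " + sink);

        // Timed runs, in either order.
        for (LongSupplier task : Arrays.asList(taskB, taskA)) {
            long start = System.nanoTime();
            long result = task.getAsLong();
            long elapsed = System.nanoTime() - start;
            System.out.println(result + " values counted in " + elapsed + " ns");
        }
    }
}
```

For measurements that need to be trusted, a dedicated harness such as JMH takes care of warm-up iterations and of keeping results alive.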
     
    Edoardo Pasca
    Greenhorn
    Posts: 8
    Thanks to Pierre, I have ended up with this version. I have written a Collector that extends DoubleSummaryStatistics, because one needs to know the min and max of the distribution before creating the histogram. The HistogramCollector stores the data in a LinkedList so that it can be accessed after the summary statistics are known.
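The final listing is not reproduced here, so the following is only a sketch of what such a collector might look like. The method names `merge`, `bins` and `collector`, and the binning rule, are assumptions. Only the ideas of extending `DoubleSummaryStatistics` and buffering values in a `LinkedList` come from the description above.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical reconstruction: tracks min/max via DoubleSummaryStatistics
// and buffers the raw values so they can be binned afterwards.
class HistogramCollector extends DoubleSummaryStatistics {
    private final List<Double> values = new LinkedList<>();

    @Override
    public void accept(double value) {
        super.accept(value);   // updates count, sum, min and max
        values.add(value);     // keep the value for binning later
    }

    // Merges another partial result (needed as the combiner for parallel streams).
    void merge(HistogramCollector other) {
        super.combine(other);
        values.addAll(other.values);
    }

    // Bins the buffered values into nBins equal-width bins spanning [min, max].
    long[] bins(int nBins) {
        long[] bins = new long[nBins];
        double min = getMin();
        double width = (getMax() - min) / nBins;
        for (double v : values) {
            int i = width == 0 ? 0 : (int) ((v - min) / width);
            bins[Math.min(i, nBins - 1)]++;   // the maximum falls into the last bin
        }
        return bins;
    }

    static Collector<Double, HistogramCollector, HistogramCollector> collector() {
        return Collector.of(
                HistogramCollector::new,
                (h, x) -> h.accept(x),
                (a, b) -> { a.merge(b); return a; });
    }
}
```

A usage example under the same assumptions: `long[] counts = data.stream().collect(HistogramCollector.collector()).bins(30);`. Buffering every value costs memory proportional to the input, which is the price of not knowing the range in advance.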

     