Stream, Collector and execution speed

Posts: 11

I was fiddling around with streams and thought I'd make a histogram out of a stream of doubles.
I wrote 3 different (but equivalent) ways to collect the data. The timings were not what I expected, and I'd like to ask your opinion.

The code is very simple: the main method of the class creates a list of numbers following a Gaussian distribution. A stream is created from this list, filtered, and collected into a histogram in 3 ways:
  • 1. I pass supplier, accumulator and combiner
  • 2. I create a Collector and pass it to collect()
  • 3. The collector is created by a static method in the Histogram class (exactly as method 2)
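
The original listing is not reproduced in this thread, so the following is only a minimal sketch of what the three variants might look like. The `Histogram` class, its fixed range of -3 to 3 with 30 bins, the filter, and the names `histo1` to `histo3` (taken from the reply below) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collector;

// Hypothetical fixed-range histogram; the name and layout are assumptions.
class Histogram {
    final double min, max;
    final long[] bins;

    Histogram(double min, double max, int nBins) {
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    // Accumulator: count one value, ignoring anything outside [min, max).
    void add(double x) {
        if (x < min || x >= max) return;
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++;
    }

    // Combiner: fold another histogram with the same binning into this one.
    Histogram merge(Histogram other) {
        for (int i = 0; i < bins.length; i++) bins[i] += other.bins[i];
        return this;
    }

    long total() {
        long n = 0;
        for (long b : bins) n += b;
        return n;
    }

    // Used by method 3: the Collector comes from a static factory.
    static Collector<Double, Histogram, Histogram> collector(double min, double max, int nBins) {
        return Collector.of(() -> new Histogram(min, max, nBins), Histogram::add, Histogram::merge);
    }
}

class ThreeWays {
    // Gaussian test data; size and seed are arbitrary choices.
    static List<Double> sample(int n, long seed) {
        Random rnd = new Random(seed);
        List<Double> data = new ArrayList<>(n);
        for (int i = 0; i < n; i++) data.add(rnd.nextGaussian());
        return data;
    }

    // Method 1: supplier, accumulator and combiner passed directly to collect().
    static Histogram histo1(List<Double> data) {
        return data.stream().filter(x -> Math.abs(x) < 3)
                .collect(() -> new Histogram(-3, 3, 30), Histogram::add, Histogram::merge);
    }

    // Method 2: a Collector built locally and passed to collect().
    static Histogram histo2(List<Double> data) {
        Collector<Double, Histogram, Histogram> c = Collector.of(
                () -> new Histogram(-3, 3, 30), Histogram::add, Histogram::merge);
        return data.stream().filter(x -> Math.abs(x) < 3).collect(c);
    }

    // Method 3: the same Collector, obtained from the static factory.
    static Histogram histo3(List<Double> data) {
        return data.stream().filter(x -> Math.abs(x) < 3)
                .collect(Histogram.collector(-3, 3, 30));
    }

    public static void main(String[] args) {
        List<Double> data = sample(1_000_000, 42L);
        System.out.println(histo1(data).total() + " " + histo2(data).total()
                + " " + histo3(data).total());
    }
}
```

All three variants do the same work, so under these assumptions any large timing gap between them is more likely to come from how they are measured than from the code itself.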

Now the problem is that method 1 takes 1.665 s, method 2 takes 1.857 s and method 3 takes 0.298 s. I also found that swapping methods 2 and 3 changes the execution times: method 2 gets faster (0.3 s) and method 3 gets slower (1.8 s). So there is something I'm missing here.

Moreover, if I create a parallelStream from the list of doubles, the whole procedure takes about 15 s! Possibly the combine method is not very efficient, but still.

Here is the whole code, so that you can try it out yourselves. I tried to comment it as much as possible.



    Posts: 160
    This is due to optimization by the JIT compiler. Compiled code only gets optimized after it has run one or more times, which is why swapping the order changes the result: whichever call comes first is slower. You should allow the compiler to "warm up", for example:

    Here are the results I get:

    You can see that there is still a small difference when calling histo3 before histo2. This shows that all warming up should be done before any test, such as in:

    With this configuration, the results are the same whether you call histo2 or histo3 first.
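
The listings from this reply are not reproduced here, so what follows is only a rough sketch of the "warm everything up first, then time" pattern it describes. `variantA` and `variantB` are hypothetical stand-ins for histo2 and histo3, and the round count and data size are arbitrary assumptions.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class WarmUpDemo {
    // Hypothetical stand-ins for histo2/histo3: two equivalent stream pipelines.
    static double variantA(List<Double> data) {
        return data.stream().filter(x -> x > 0).mapToDouble(Double::doubleValue).sum();
    }

    static double variantB(List<Double> data) {
        return data.stream().filter(x -> x > 0)
                .collect(Collectors.summingDouble(Double::doubleValue));
    }

    // Time a single call, in nanoseconds.
    static long time(Supplier<Double> task) {
        long start = System.nanoTime();
        task.get();
        return System.nanoTime() - start;
    }

    // Warm up every variant before timing any of them, then time each once.
    static long[] benchmark(List<Double> data, int warmUpRounds) {
        for (int i = 0; i < warmUpRounds; i++) {
            variantA(data);
            variantB(data);
        }
        return new long[] { time(() -> variantA(data)), time(() -> variantB(data)) };
    }

    public static void main(String[] args) {
        List<Double> data = IntStream.range(0, 1_000_000)
                .mapToObj(i -> Double.valueOf(Math.sin(i)))
                .collect(Collectors.toList());
        long[] t = benchmark(data, 20);
        System.out.printf("A: %.3f ms, B: %.3f ms%n", t[0] / 1e6, t[1] / 1e6);
    }
}
```

Because all warm-up calls happen before the first measurement, neither variant should pay the one-time start-up cost inside its own timing, whichever is called first.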

    Also note that in theory, this code should not warm up the compiler at all: it could detect that the calls to histo1, histo2 and histo3 have no effect besides returning their results, and that these results are never used. The method calls inside the for loops could then simply compile to nothing. This does not happen here for some reason, but it sometimes happens, in which case you have to do something with the results of the warm-up calls.
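
One way to "do something with the results" is to fold them into a value that is returned and eventually printed. The sketch below assumes a hypothetical workload, `countPositive`, in place of the histogram methods, and the name `warmUp` is likewise made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConsumeResults {
    // Hypothetical workload standing in for histo1/histo2/histo3.
    static long countPositive(List<Double> data) {
        return data.stream().filter(x -> x > 0).count();
    }

    // Fold every warm-up result into a returned value, so the calls
    // have an observable effect and are harder to discard as dead code.
    static long warmUp(List<Double> data, int rounds) {
        long sink = 0;
        for (int i = 0; i < rounds; i++) {
            sink += countPositive(data);
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Double> data = IntStream.range(0, 100_000)
                .mapToObj(i -> Double.valueOf(Math.sin(i)))
                .collect(Collectors.toList());
        long sink = warmUp(data, 50);
        // Printing the sink makes the warm-up results observable.
        System.out.println("warm-up checksum: " + sink);
    }
}
```

Benchmark harnesses such as JMH address the same concern with a `Blackhole` that consumes results; the hand-rolled sink here is just the simplest version of that idea.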
    Edoardo Pasca
    Posts: 11
    Thanks to Pierre, I've ended up with this version. I have written a Collector that extends DoubleSummaryStatistics, because one needs to know the min and max of the distribution before creating the histogram. The HistogramCollector stores the data in a LinkedList so that it can access the values once the summary statistics are known.
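
The final version is not reproduced in this thread. As a rough reconstruction under stated assumptions, a class along these lines would match the description: the name `HistogramCollector` comes from the post, while the methods `merge`, `histogram` and `collector` and the binning rule are guesses made for illustration.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collector;

// Hypothetical reconstruction: the class name follows the post, the body is an assumption.
class HistogramCollector extends DoubleSummaryStatistics {
    private final List<Double> values = new LinkedList<>();

    @Override
    public void accept(double value) {
        super.accept(value);   // updates count, min, max and sum
        values.add(value);     // keep the raw value for binning later
    }

    // Merge another partial result (needed when the stream runs in parallel).
    public HistogramCollector merge(HistogramCollector other) {
        super.combine(other);
        values.addAll(other.values);
        return this;
    }

    // Bin the stored values once min and max are known.
    public long[] histogram(int nBins) {
        long[] bins = new long[nBins];
        double min = getMin();
        double width = getMax() - getMin();
        for (double v : values) {
            int i = width == 0 ? 0 : (int) ((v - min) / width * nBins);
            bins[Math.min(i, nBins - 1)]++;   // the maximum falls into the last bin
        }
        return bins;
    }

    static Collector<Double, HistogramCollector, HistogramCollector> collector() {
        return Collector.of(HistogramCollector::new,
                HistogramCollector::accept, HistogramCollector::merge);
    }
}
```

With this sketch, something like `data.stream().collect(HistogramCollector.collector()).histogram(30)` would produce the bin counts in a single pass over the stream plus one pass over the stored values.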
