Reduce Question

 
Gerard Gauthier
Ranch Hand
Posts: 99


I'm just wondering which Java containers work best with the above reduce in a parallel stream. Reading over the docs, I would conclude that this reduce could be very inefficient with the wrong containers.
 
Campbell Ritchie
Marshal
Posts: 63822
What do you mean by “work best” or “inefficient”? Time complexity or space complexity? And what do you mean by “containers”? Do you mean collections?

Don't know. Try books like Modern Java in Action (Urma, Fusco and Mycroft: Manning), Mastering Lambdas (Maurice Naftalin: Oracle Press), or Core Java 11/e vol II (Horstmann, ??Pearson??). If you are reducing to something small, e.g. a total, you needn't worry. If you want collections, it is likely that an arrayed list will perform faster than a linked list; you can fill elements 1024‑2047 before elements 0‑1023 very quickly with something similar to System#arraycopy.
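To make that "reduce to something small vs. collect to a list" distinction concrete, here is a minimal sketch (class and variable names are mine, not from the thread; assuming Java 8+). The parallel sum needs no container at all, while the collecting versions differ only in how the combiner merges partial lists.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CollectTargetDemo {
    public static void main(String[] args) {
        // Reducing to something small: container choice doesn't matter.
        int total = IntStream.rangeClosed(1, 100).parallel().sum();
        System.out.println(total); // 5050

        // Collecting to a list: the combiner merges partial lists with
        // addAll, which an array-backed list can do as a bulk copy,
        // while a linked list must relink node by node.
        List<Integer> arrayBacked = IntStream.rangeClosed(1, 100).parallel()
                .boxed()
                .collect(Collectors.toList());

        LinkedList<Integer> linked = IntStream.rangeClosed(1, 100).parallel()
                .boxed()
                .collect(Collectors.toCollection(LinkedList::new));

        System.out.println(arrayBacked.size()); // 100
        System.out.println(linked.size());      // 100
    }
}
```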
 
Stephan van Hulst
Saloon Keeper
Posts: 9996
I don't know what led you to believe that. Nothing in the documentation states anything like that, and when you reason about how spliterators work, it wouldn't make sense for this function to be less efficient than other, similar stream operations depending on the backing collection type.

The only thing the documentation says about efficiency regarding this method is that using this overload may be more efficient than performing the operations separately.

This is the case when you have two different accumulators available; you'd then prefer the fused three-argument call over the separate operations.
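The inline code from this post didn't survive the page extraction, but the overload being described is presumably the three-argument Stream.reduce(identity, accumulator, combiner). Here is a hedged sketch of the comparison (names are mine): the accumulator maps and accumulates in one step, instead of a map() pass followed by a reduce() pass.

```java
import java.util.List;

public class ReduceOverloadDemo {
    public static void main(String[] args) {
        List<String> words = List.of("alpha", "beta", "gamma");

        // Two conceptual passes: map each element, then reduce the results.
        int mapThenReduce = words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);

        // The three-argument overload fuses mapping and accumulation:
        // the accumulator maps and sums in one step, and the combiner
        // merges partial sums from parallel sub-streams.
        int fusedReduce = words.parallelStream()
                .reduce(0,
                        (sum, word) -> sum + word.length(), // accumulator
                        Integer::sum);                      // combiner

        System.out.println(mapThenReduce); // 14
        System.out.println(fusedReduce);   // 14
    }
}
```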
 
Gerard Gauthier
Ranch Hand
Posts: 99
I was concerned about collections that sort their input, and the divide-and-conquer ideas behind parallel streams. How do collections work with a parallel stream when the input is sorted? My concern is the possibility of sorting data over and over again.
 
Campbell Ritchie
Marshal
Posts: 63822

Gerard Gauthier wrote:. . . . How do collections work with a parallel stream when the input is sorted? . . . .

That sounds like an implementation detail. You may find something in the book by Urma, Fusco and Mycroft I mentioned earlier.
 
Stephan van Hulst
Saloon Keeper
Posts: 9996
Collections provide streams through spliterators that are aware of the implementation details of the collection they're streaming. A TreeSet for instance would be very easy to process in parallel because the spliterator can just use subtrees as split points. A LinkedList's spliterator would iterate over half of the list nodes it's responsible for to create a sub-spliterator.

I don't know why sorting in particular worries you. Why do you think that data would be sorted over and over again?
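That splitting behaviour can be seen directly with an ArrayList's spliterator (a minimal sketch, names mine; assuming Java 8+): trySplit() hands the first half of the range to a new spliterator and keeps the second half for itself, by simple index arithmetic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    public static void main(String[] args) {
        List<Integer> arrayed = new ArrayList<>();
        for (int i = 0; i < 8; i++) arrayed.add(i);

        // trySplit() returns a spliterator over roughly the first half
        // and keeps the second half for itself.
        Spliterator<Integer> right = arrayed.spliterator();
        Spliterator<Integer> left = right.trySplit();

        System.out.println(left.estimateSize());  // 4
        System.out.println(right.estimateSize()); // 4

        left.forEachRemaining(i -> System.out.print(i + " "));  // 0 1 2 3
        System.out.println();
        right.forEachRemaining(i -> System.out.print(i + " ")); // 4 5 6 7
        System.out.println();
    }
}
```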
 
Campbell Ritchie
Marshal
Posts: 63822
If it is necessary to sort separate streams in parallel, they would surely be merged the way you merge subarrays in a merge sort?
 
Stephan van Hulst
Saloon Keeper
Posts: 9996
Do you mean separate accumulations, rather than separate streams?

When a stream is split by a spliterator, it yields a second spliterator, not a second stream. When you perform a reduction or collection operation on a stream, each spliterator will yield an accumulation, and all separate accumulations must be combined with each other. If you have a sorted stream and you're performing a parallel reduction on it, it will be guaranteed that all reduced elements from a prior accumulation will be less than all reduced elements from a succeeding accumulation.
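That encounter-order guarantee shows up clearly with a reduction that is associative but not commutative, such as string concatenation (a sketch, names mine): the result is only correct because partial accumulations are combined in encounter order.

```java
import java.util.List;

public class OrderedReduceDemo {
    public static void main(String[] args) {
        List<String> letters = List.of("a", "b", "c", "d", "e", "f", "g", "h");

        // Concatenation is associative but not commutative, so this only
        // yields "abcdefgh" because the parallel reduction combines the
        // partial results from each spliterator in encounter order.
        String joined = letters.parallelStream()
                .reduce("", String::concat);

        System.out.println(joined); // abcdefgh
    }
}
```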
 
Gerard Gauthier
Ranch Hand
Posts: 99

Stephan van Hulst wrote:Do you mean separate accumulations, rather than separate streams?

When a stream is split by a spliterator, it yields a second spliterator, not a second stream. When you perform a reduction or collection operation on a stream, each spliterator will yield an accumulation, and all separate accumulations must be combined with each other. If you have a sorted stream and you're performing a parallel reduction on it, it will be guaranteed that all reduced elements from a prior accumulation will be less than all reduced elements from a succeeding accumulation.



I have to admit that a spliterator is a new term for me.

Does the data structure have to implement a spliterator to use the reduce functionality I posted?

 
Stephan van Hulst
Saloon Keeper
Posts: 9996
Yes. But that's a given since you can't get a stream from a data structure in the first place if it can't create spliterators. And you can't call reduce() unless you have a stream to call it on.

All iterables implement a spliterator() method.
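Iterable declares spliterator() as a default method, so even a hand-rolled iterable that never mentions spliterators can feed a stream via StreamSupport. A sketch (the Counter class is hypothetical, mine; assuming Java 16+ for records):

```java
import java.util.Iterator;
import java.util.stream.StreamSupport;

public class IterableStreamDemo {
    // A hand-rolled Iterable that is not a Collection: counts 1..n.
    record Counter(int n) implements Iterable<Integer> {
        public Iterator<Integer> iterator() {
            return new Iterator<>() {
                int next = 1;
                public boolean hasNext() { return next <= n; }
                public Integer next() { return next++; }
            };
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter(5);

        // Iterable's default spliterator() wraps the iterator, so a class
        // that never mentions spliterators can still be streamed.
        int sum = StreamSupport.stream(counter.spliterator(), false)
                .reduce(0, Integer::sum);

        System.out.println(sum); // 15
    }
}
```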
 
Gerard Gauthier
Ranch Hand
Posts: 99

Stephan van Hulst wrote:Yes. But that's a given since you can't get a stream from a data structure in the first place if it can't create spliterators. And you can't call reduce() unless you have a stream to call it on.

All iterables implement a spliterator() method.



I retreated back to the basics of streams. I think I went too far too fast; I should concentrate on the Java collections (containers) that fully support streams and stream features, and appreciate streams from a higher level.

Spliterator... Sounds like a comic book villain. I am the Spliterator! I will divide you and conquer your smaller bits.

 
Campbell Ritchie
Marshal
Posts: 63822

Gerard Gauthier wrote:. . . . I am the Spliterator! I will divide you and conquer your smaller bits.

Well, that is how a Spliterator works. To reduce it to its very simplest terms, it can split a source into smaller bits and iterate each of those bits. The parallel() method uses a Spliterator to divide the source into bits, maybe 1024 elements each, but probably a different size, and successive bits are sent to different threads in parallel. As we said earlier, it is very easy for a Spliterator to divide an array, an arrayed list, or a balanced tree into smaller parts. It is very awkward to divide a linked list into such smaller parts, because you have to iterate the list 1023× to find the 1024th element.
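The "divide into bits of around 1024 elements" picture can be sketched with repeated trySplit() calls on an ArrayList's spliterator, which halves the remaining index range each time (names mine; the 1024 threshold here is illustrative, not what the framework actually uses):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class RepeatedSplitDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 4096; i++) list.add(i);

        // Each trySplit() halves the remaining range by index arithmetic:
        // 4096 -> 2048 -> 1024, discarding the split-off left halves.
        Spliterator<Integer> split = list.spliterator();
        while (split.estimateSize() > 1024) {
            split.trySplit();
        }
        System.out.println(split.estimateSize()); // 1024
    }
}
```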
 
Stephan van Hulst
Saloon Keeper
Posts: 9996
While it's definitely more awkward to find a split-point in a linked list than it is in a balanced binary tree, keep in mind that with each sub-division only half of the current division has to be iterated, because the spliterator can keep a reference to the list's internal node that it is iterating from.

The end result is that splitting the list in an arbitrary number of sub-divisions is not any slower than simply iterating the entire list once, which has to happen anyway when you want to reduce all elements. This means that reducing the list in parallel would not incur any extra costs, other than the small overhead of creating extra Spliterator instances. Differences between reducing a linked list in parallel and reducing a balanced binary tree in parallel are the same as the differences between reducing a linked list sequentially and reducing a balanced binary tree sequentially. In the end, a simple reduction is always an O(n) operation.
 
Campbell Ritchie
Marshal
Posts: 63822
But if you have to iterate the linked list in linear time (I forgot you can iterate it both ways), the time to split the list may outweigh any performance benefit you get from parallelisation.
 
Stephan van Hulst
Saloon Keeper
Posts: 9996
Hardly. The overhead is completely negligible compared to the overhead of thread scheduling.

It basically comes down to: Can your reduction be performed in parallel, and is the problem set large enough, or the problems hard enough, that the benefit of solving two problems at the same time outweighs the cost of scheduling the threads to do it?

How much time it takes to iterate the backing data source once is really of no consequence. Maybe if you're reducing a special linked list that acts as a proxy to a remote server, and you need a round-trip to access the next node. But whoever designed or uses such a class deserves everything bad that happens to them.
 