
What is this combiner supposed to do?

 
Ran Cohen
Greenhorn
Posts: 2
I don't understand the last example here. The previous example,

basically says: transform the String List into an Integer List of String lengths, and sum it.
The last example,

looks like: sum the list based on the String lengths, and then sum it again?
What is stranger is that the example returns 24. These examples,

All print 24 as well! Changing the identity or the accumulator changes the return value, but changing the combiner does nothing. How?
 
Pierre-Yves Saumont
Author
Ranch Hand
Posts: 64
The combiner is used to combine the partial results obtained when the reduction is parallelized. As long as you don't make the reduction parallel (by calling the parallel() method on the stream), the combiner will have no effect. So no, the combiner is not a function for adding an additional element into a result, as stated on the site where you found this example. It's always better to go to the source for such information. The javadoc for Stream.reduce says:

combiner - an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function
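To see this on a sequential stream, here is a minimal sketch. The tutorial's word list wasn't preserved, so the list below is hypothetical, chosen only so that the word lengths (4, 3, 6, 4, 1, 4, 2) add up to 24 as in the thread; the method name sumWithCombiner is made up for the example.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class CombinerSequential {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    // Three-argument reduce on a sequential stream:
    // identity 0, the accumulator adds each word's length,
    // and the combiner is whatever the caller passes in.
    static int sumWithCombiner(BinaryOperator<Integer> combiner) {
        return WORDS.stream()
                .reduce(0, (acc, s) -> acc + s.length(), combiner);
    }

    public static void main(String[] args) {
        // In OpenJDK a sequential reduce never calls the combiner,
        // so all three lines print 24.
        System.out.println(sumWithCombiner((x, y) -> x + y));
        System.out.println(sumWithCombiner((x, y) -> x * y));
        System.out.println(sumWithCombiner((x, y) -> x - y));
    }
}
```

The javadoc does not promise that a sequential reduce skips the combiner, so getting 24 three times is an implementation detail: the * and - combiners are still wrong, and adding .parallel() is enough to expose them.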
 
Campbell Ritchie
Sheriff
Posts: 51341
Welcome to the Ranch

Well done telling us where that code comes from; don't take the blame for it yourself.
I think you should probably not declare Stream variables; Streams are not usually used twice (indeed, a Stream cannot be reused once a terminal operation has run on it), so it is usually a waste of time to declare them. Assuming that wordStream is created from a List<String>, your code would go something like this:-

Line 1: You start off with the List.stream() method (or more precisely, Collection#stream(), because that method is defined in Collection). Since you have a List<String>, the stream method returns a Stream<String>, so it will go through all the names of the languages in turn. Or more precisely, it will go through the languages when the last method call runs, because Streams are evaluated lazily. If you add line 1½
           .peek(System.out::println)
and line 2½, which will look exactly the same, then (if I haven't misled you with incorrect code) you will probably see java 4 php 3, not java php 4 3.
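Here is a sketch of that pipeline with both peek calls added. The three original lines weren't preserved, so this is a reconstruction from the description (List.stream(), mapToInt, reduce(0, (i, j) -> i + j)); the word list is hypothetical, and instead of System.out::println the peek calls record into a list so the interleaving is easy to check.

```java
import java.util.ArrayList;
import java.util.List;

public class PeekOrder {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    static int total(List<String> log) {
        return WORDS.stream()                            // line 1: Stream<String>
                .peek(log::add)                          // line 1½: record each word
                .mapToInt(s -> s.length())               // line 2: IntStream of lengths
                .peek(i -> log.add(String.valueOf(i)))   // line 2½: record each length
                .reduce(0, (i, j) -> i + j);             // line 3: terminal operation
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(total(log)); // 24
        // Each word passes through the whole pipeline before the next one starts:
        // [java, 4, php, 3, python, 6, perl, 4, c, 1, ruby, 4, go, 2]
        System.out.println(log);
    }
}
```

Because the pipeline is lazy, nothing is added to the log until reduce runs.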
Line 2 creates an IntStream. As you know, it is an intermediate operation because it takes a Stream and returns another Stream (here, a different sort of Stream). Line 3 returns an int; since that is not a Stream, it is called a terminal operation. Anyway, back to line 2. The mapToInt method creates an IntStream. You doubtless already know there are four kinds of Stream: IntStream, LongStream, DoubleStream and Stream<T>. The mapToInt method needs a ToIntFunction, in this case going String→int. Now, as you can see, ToIntFunction is a functional interface, so you can use a λ expression instead of an anonymous class, and all you need is something that goes from the object the Stream is currently handling (which I called s) to an int; they are using the String's length() method. So you write s on the left, s.length() on the right and the arrow token in the middle:-
s -> s.length()
Since you are calling a single method on the object itself, you can write the class name and the method name and hardly anything else:-
String::length
instead. Note the :: and that the () are omitted. Now that returns an int (4 from java, 3 from php, etc.), and those values, by the way, seem to add up to 24.
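The three spellings of the String→int function described above can be compared side by side. This is a sketch; the constant names ANON, LAMBDA and REF are made up for the example.

```java
import java.util.function.ToIntFunction;

public class LengthFunctions {
    // Anonymous class: the long-hand form of a ToIntFunction<String>.
    static final ToIntFunction<String> ANON = new ToIntFunction<String>() {
        @Override
        public int applyAsInt(String s) {
            return s.length();
        }
    };

    // Lambda: parameter on the left, body on the right of the arrow.
    static final ToIntFunction<String> LAMBDA = s -> s.length();

    // Method reference: class name, ::, method name, no ().
    static final ToIntFunction<String> REF = String::length;

    public static void main(String[] args) {
        System.out.println(ANON.applyAsInt("java"));   // 4
        System.out.println(LAMBDA.applyAsInt("php"));  // 3
        System.out.println(REF.applyAsInt("python"));  // 6
    }
}
```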
Your reduce call in line 3 is a terminal operation, taking an int called identity as its first parameter. Look on that as a starting value; you are passing 0, but if you passed −24, you would have got 0 as your final result. The second parameter is an IntBinaryOperator, which takes two arguments: the first is the running total so far (which starts out as the value of identity) and the second is whichever int comes next in your Stream. The tutorial calls them x and y (I changed them to i and j), and the operator adds them. Go through the links I provided; I think the one about reduce is particularly helpful. It says that the operation must be associative, which means it must not matter how the operations are grouped: (i + j) + k must equal i + (j + k). So + is permissible, but − and ÷ won't work; remember that (i − j) − k ≠ i − (j − k).
So that method returns the total, and you assign it to some int value and print it out. If you count the letters in those language names, they add up to 24.
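A sketch of both points: the identity acting as the starting value (passing −24 gives 0), and a quick associativity check. The word list is hypothetical, as before. Note that −24 is not a real identity for +, so it only behaves like a starting value on a sequential stream; on a parallel stream each chunk would start from −24.

```java
import java.util.List;

public class ReduceIdentity {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    // The identity is the starting value; (i, j) -> i + j adds the next
    // length (j) to the running total (i).
    static int total(int identity) {
        return WORDS.stream()
                .mapToInt(String::length)
                .reduce(identity, (i, j) -> i + j);
    }

    public static void main(String[] args) {
        System.out.println(total(0));   // 24
        System.out.println(total(-24)); // 0 (sequential only; -24 is not a true identity)
        // Associativity is about grouping, not about order:
        System.out.println((1 + 2) + 3 == 1 + (2 + 3)); // true: 6 == 6
        System.out.println((1 - 2) - 3 == 1 - (2 - 3)); // false: -4 != 2
    }
}
```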

I think there is a much simpler way to write that expression. I presume the awkward-looking last line is intended to show you how to use the IntBinaryOperator.
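The simpler version from this post wasn't preserved. One plausible candidate, assuming IntStream.sum() is what was meant, is sketched below; the javadoc describes sum() as equivalent to reduce(0, Integer::sum). The word list is hypothetical.

```java
import java.util.List;

public class SimplerSum {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    // IntStream.sum() is a ready-made reduction, so no identity or operator is needed.
    static int total() {
        return WORDS.stream().mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        System.out.println(total()); // 24
    }
}
```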
 
Campbell Ritchie
Sheriff
Posts: 51341
I think part of the intent of the link is to show that the combiner doesn't change the value of the result; it might change its type between Integer and Optional<Integer> in the example on that link. In the case of an IntStream, the reduce method is similarly overloaded; the other overload returns an OptionalInt.
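For IntStream, the two reduce overloads differ only in whether an identity is supplied. A short sketch of both (the method names are made up), using an empty stream to show why one returns int and the other OptionalInt:

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

public class IntReduceOverloads {
    // reduce(int identity, IntBinaryOperator op) returns a plain int;
    // for an empty stream the result is the identity.
    static int sumOrIdentity(IntStream s) {
        return s.reduce(0, (i, j) -> i + j);
    }

    // reduce(IntBinaryOperator op) returns OptionalInt;
    // for an empty stream there is no value to return.
    static OptionalInt sumIfAny(IntStream s) {
        return s.reduce((i, j) -> i + j);
    }

    public static void main(String[] args) {
        System.out.println(sumOrIdentity(IntStream.empty()));  // 0
        System.out.println(sumIfAny(IntStream.empty()));       // OptionalInt.empty
        System.out.println(sumIfAny(IntStream.of(4, 3, 6)));   // OptionalInt[13]
    }
}
```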
 
Ran Cohen
Greenhorn
Posts: 2
Thank you all!
So now I understand that because it's not parallel, the combiner does nothing. But if it is parallel, what does x - y do?
This code multiplies the elements together: every element is '*' with the next one (i.e. x*y*z*w*u*...).


But this code:

returns 6, which is not the result of applying '-' to each one in turn (i.e. x-y-z-w-u-...).
So what is it supposed to do?
 
Pierre-Yves Saumont
Author
Ranch Hand
Posts: 64
It is supposed to reassemble partial results. When the stream is parallelized, it is broken into several pieces and each piece is reduced, giving as many partial results, which must then be combined to give the final result. The key is in "combiner - an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function". In this example, using anything other than addition makes absolutely no sense, since the result will not be predictable. The fact that with * it results in 2304 = 4*3*6*4*1*4*2 is irrelevant. It could just as well be (4 + 3 + 6) * (4 + 1 + 4 + 2) if the parallelization had split the stream into two substreams. And if Java 8 automatic parallelization were efficient, it would have resulted in no split at all, since the stream is too short for parallelization to pay off.
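What the combiner receives can be illustrated without depending on how the runtime happens to split the stream. The sketch below simulates a single split by hand: each half is reduced with the accumulator (starting from the identity 0), then the two partial results are merged with the combiner. It is only a model of one possible split, not the actual ForkJoin implementation; the word list is hypothetical, with lengths 4, 3, 6, 4, 1, 4, 2.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class SimulatedSplit {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    // Reduce one chunk the way a single parallel task would:
    // start from the identity and apply the accumulator to each element.
    static int reduceChunk(List<String> chunk) {
        int acc = 0;                    // identity
        for (String s : chunk) {
            acc = acc + s.length();     // accumulator
        }
        return acc;
    }

    // Split at index 'mid', reduce both halves, then merge them with the combiner.
    static int splitAndCombine(int mid, BinaryOperator<Integer> combiner) {
        int left = reduceChunk(WORDS.subList(0, mid));
        int right = reduceChunk(WORDS.subList(mid, WORDS.size()));
        return combiner.apply(left, right);
    }

    public static void main(String[] args) {
        System.out.println(splitAndCombine(3, Integer::sum));     // 13 + 11 = 24
        System.out.println(splitAndCombine(3, (x, y) -> x * y));  // 13 * 11 = 143
        System.out.println(splitAndCombine(2, (x, y) -> x * y));  // 7 * 17 = 119
        // With a combiner compatible with the accumulator, a real parallel run agrees:
        System.out.println(WORDS.parallelStream()
                .reduce(0, (acc, s) -> acc + s.length(), Integer::sum)); // 24
    }
}
```

With addition every split gives 24; with multiplication the answer depends on where the split falls, which is exactly why the result is unpredictable.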

"which must be compatible with the accumulator function" means that the result type, together with the combine operation and the identity, must form a monoid. This implies that the operation must be associative, which subtraction is not. So it makes no sense to use it for the combine operation, even if the accumulator were subtraction. You can see this by using subtraction for both the accumulator and the combiner in parallel and non-parallel streams, which will give different results (-6 and -24).
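The Stream.reduce javadoc spells out what "compatible" means: for all u and t, combiner.apply(u, accumulator.apply(identity, t)) must equal accumulator.apply(u, t), and the identity must be an identity for the combiner. Below is a small sketch that checks those two conditions on a handful of sample values. The helper names are invented, and passing the check is only evidence, not a proof; associativity is not tested here.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class CompatibilityCheck {
    // Checks, on sample values only:
    //   combiner.apply(identity, u) equals u
    //   combiner.apply(u, accumulator.apply(identity, t)) equals accumulator.apply(u, t)
    static <T, U> boolean compatible(U identity,
                                     BiFunction<U, ? super T, U> accumulator,
                                     BinaryOperator<U> combiner,
                                     List<U> sampleUs,
                                     List<T> sampleTs) {
        for (U u : sampleUs) {
            if (!combiner.apply(identity, u).equals(u)) {
                return false;
            }
            for (T t : sampleTs) {
                U viaCombiner = combiner.apply(u, accumulator.apply(identity, t));
                if (!viaCombiner.equals(accumulator.apply(u, t))) {
                    return false;
                }
            }
        }
        return true;
    }

    // The thread's reduction: identity 0, accumulator adds a String's length.
    static boolean lengthSumCompatibleWith(BinaryOperator<Integer> combiner) {
        BiFunction<Integer, String, Integer> addLength = (acc, s) -> acc + s.length();
        return compatible(0, addLength, combiner,
                List.of(0, 5, 13), List.of("java", "php", "c"));
    }

    public static void main(String[] args) {
        System.out.println(lengthSumCompatibleWith(Integer::sum));     // true
        System.out.println(lengthSumCompatibleWith((x, y) -> x * y));  // false
        System.out.println(lengthSumCompatibleWith((x, y) -> x - y));  // false
    }
}
```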
 
Pierre-Yves Saumont
Author
Ranch Hand
Posts: 64
Campbell Ritchie wrote: I think part of the intent of the link is to show that the combiner doesn't change the value of the result; it might change its type between Integer and Optional<Integer> in the example on that link. In the case of an IntStream, the reduce method is similarly overloaded; the other overload returns an OptionalInt.


The signature of the method being:

<U> U reduce(U identity, BiFunction<U, ? super T, U> accumulator, BinaryOperator<U> combiner)

it can't change the type of the result. The result is a U, and the combiner is a BinaryOperator<U>, i.e. a function from (U, U) to U. So there is no way for the combiner to change the type between Integer and Optional<Integer>. Changing the type of the result is the purpose of the finisher when using the collect method to reduce a stream using a Collector. The reduce method, on the other hand, does not use a finisher.

So the combiner can't change the type, but it can definitely change the value, and when it does, that is a bug. When using parallel streams, one should test that parallelizing does not change the value.
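A minimal version of that test, as a sketch: run the same reduction sequentially and in parallel and compare. The names are made up for the example, and the word list is hypothetical. Repeating the check several times, and on larger inputs, makes it more likely to catch a bad combiner.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

public class ParallelConsistency {
    // Hypothetical word list: lengths 4, 3, 6, 4, 1, 4, 2 add up to 24.
    static final List<String> WORDS =
            List.of("java", "php", "python", "perl", "c", "ruby", "go");

    static int lengthSum(boolean parallel, BinaryOperator<Integer> combiner) {
        Stream<String> stream = parallel ? WORDS.parallelStream() : WORDS.stream();
        return stream.reduce(0, (acc, s) -> acc + s.length(), combiner);
    }

    // One run of a sanity check: does the parallel result match the sequential one?
    // A single matching run proves nothing; a mismatch proves a bug.
    static boolean sameInParallel(BinaryOperator<Integer> combiner) {
        return lengthSum(false, combiner) == lengthSum(true, combiner);
    }

    public static void main(String[] args) {
        System.out.println(sameInParallel(Integer::sum)); // true
        // Usually false, but the value depends on how the runtime splits the stream.
        System.out.println(sameInParallel((x, y) -> x * y));
    }
}
```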

 
Campbell Ritchie
Sheriff
Posts: 51341
A few cattle for an interesting question and helpful replies
 
Pierre-Yves Saumont
Author
Ranch Hand
Posts: 64
By the way, there are a few additional things that are worth noting:

Optional<T> reduce(BinaryOperator<T> accumulator)

and

T reduce(T identity, BinaryOperator<T> accumulator)

are two versions of a particular case where the collection (the stream) is reduced to a value of the same type as its elements. Neither of these methods takes a combiner, because the combiner to use for parallelization is identical to the accumulator. For example, if a stream of integers is reduced to an integer through addition, it is obvious that if we split the stream in two and reduce the two parts by addition, the two partial results will have to be combined through addition.

So these methods are not versions without a combiner, but versions with an implicit combiner.
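In code, the "implicit combiner" means the two-argument form behaves like the three-argument form with the same operation passed again as the combiner. A sketch using the word lengths from this thread (the method names are illustrative):

```java
import java.util.List;

public class ImplicitCombiner {
    // The word lengths from the thread's example.
    static final List<Integer> LENGTHS = List.of(4, 3, 6, 4, 1, 4, 2);

    // T reduce(T identity, BinaryOperator<T> accumulator):
    // the accumulator also serves as the combiner when the stream is parallel.
    static int twoArg(boolean parallel) {
        return (parallel ? LENGTHS.parallelStream() : LENGTHS.stream())
                .reduce(0, Integer::sum);
    }

    // The same reduction with the combiner passed explicitly.
    static int threeArg(boolean parallel) {
        return (parallel ? LENGTHS.parallelStream() : LENGTHS.stream())
                .reduce(0, (acc, x) -> acc + x, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(twoArg(false) + " " + twoArg(true));     // 24 24
        System.out.println(threeArg(false) + " " + threeArg(true)); // 24 24
    }
}
```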

The difference between the two methods is that in the first case, we have no value to provide as the result if the stream is empty, whereas in the second, we would simply return the identity parameter. The consequence is that

stream.reduce(identity, accumulator)

is equivalent to

stream.reduce(accumulator).orElse(identity)

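A quick check of the equivalence between reduce(identity, op) and reduce(op).orElse(identity), sketched with addition and 0 as the identity (the names are illustrative). It holds only because 0 really is the identity for +: with a fake identity such as 10, reduce(10, op) on the one-element list [7] would give 17, while reduce(op).orElse(10) would give 7.

```java
import java.util.List;
import java.util.function.BinaryOperator;

public class OrElseEquivalence {
    static final BinaryOperator<Integer> ADD = Integer::sum;

    // T reduce(T identity, BinaryOperator<T> accumulator)
    static int withIdentity(List<Integer> values) {
        return values.stream().reduce(0, ADD);
    }

    // Optional<T> reduce(BinaryOperator<T> accumulator), then fall back to the identity
    static int viaOptional(List<Integer> values) {
        return values.stream().reduce(ADD).orElse(0);
    }

    public static void main(String[] args) {
        List<List<Integer>> cases =
                List.of(List.<Integer>of(), List.of(7), List.of(4, 3, 6, 4, 1, 4, 2));
        for (List<Integer> c : cases) {
            // prints "0 0", then "7 7", then "24 24"
            System.out.println(withIdentity(c) + " " + viaOptional(c));
        }
    }
}
```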
If you think about Optional as a kind of Stream with 0 or 1 element, we could change the name of the method:

stream.reduce(accumulator).reduce(identity)

where the orElse method would be called reduce and would not need an accumulator. The accumulator would not be needed because it would never be used: either the stream would contain 0 elements, and we would return the identity (ignoring the accumulator), or it would contain 1 element, and accumulator(identity, element) would be equivalent to element (this is the definition of an identity element for a given operation), so we would once again be able to ignore the accumulator.
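Since Java 9, Optional.stream() gives exactly that 0-or-1 element Stream view, so the argument can be checked directly. A sketch (the method names are invented; this illustrates the idea and is not how Optional.orElse is implemented):

```java
import java.util.Optional;
import java.util.function.BinaryOperator;

public class OptionalAsStream {
    // View the Optional as a Stream of 0 or 1 element and reduce it
    // (Optional.stream() requires Java 9 or later).
    static int foldAsStream(Optional<Integer> o, int identity, BinaryOperator<Integer> op) {
        return o.stream().reduce(identity, op);
    }

    // The same result with orElse: the operator is never needed.
    static int foldWithOrElse(Optional<Integer> o, int identity) {
        return o.orElse(identity);
    }

    public static void main(String[] args) {
        // Addition, identity 0
        System.out.println(foldAsStream(Optional.empty(), 0, Integer::sum)
                + " " + foldWithOrElse(Optional.empty(), 0));          // 0 0
        System.out.println(foldAsStream(Optional.of(5), 0, Integer::sum)
                + " " + foldWithOrElse(Optional.of(5), 0));            // 5 5
        // Multiplication, identity 1
        System.out.println(foldAsStream(Optional.of(5), 1, (x, y) -> x * y)
                + " " + foldWithOrElse(Optional.of(5), 1));            // 5 5
    }
}
```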

Conclusion: orElse is the reduce method of Optional. Optional and Stream have something in common, which is that they are “reduce-able” through the same process (a more adequate term would be “foldable”). By the way, they share much more than this: they are both monads.

The fact that they are two different classes makes some things obviously simpler (from the implementation point of view), but it makes other things more complicated. In particular, combining Stream and Optional is more difficult.

Imagine a Stream of (unevaluated) elements that, when evaluated, may evaluate to Optional<T>. This would result in a Stream<Optional<T>>. Depending upon the use case, we might want to keep only the values that are present. For this, we would just have to filter the stream:



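The filtering code wasn't preserved. Two ways to keep only the present values, as a sketch (the helper names and sample data are made up): the Java 8 filter-then-get idiom, and the Java 9 flatMap(Optional::stream) form, which uses the "Optional as a 0-or-1 element Stream" view directly.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PresentValues {
    // Sample input: a Stream<Optional<String>> with one empty element.
    static Stream<Optional<String>> sample() {
        return Stream.<Optional<String>>of(Optional.of("a"), Optional.empty(), Optional.of("b"));
    }

    // Java 8 style: drop the empty Optionals, then unwrap the rest.
    static List<String> viaFilter(Stream<Optional<String>> s) {
        return s.filter(Optional::isPresent).map(Optional::get).collect(Collectors.toList());
    }

    // Java 9+: each Optional becomes a 0-or-1 element stream, then they are flattened.
    static List<String> viaFlatMap(Stream<Optional<String>> s) {
        return s.flatMap(Optional::stream).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaFilter(sample()));  // [a, b]
        System.out.println(viaFlatMap(sample())); // [a, b]
    }
}
```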
Alternatively, we might want all values to be present, resulting in an Optional<Stream<T>> that would be empty if at least one of the original elements was empty.

If optional values were represented by a Stream of 0 or 1 element, transforming a Stream<Stream<T>> into a Stream<T> would be as simple as:



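With that representation, the flattening is a single flatMap with the identity function. A sketch (the sample data and names are invented):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Flatten {
    // Sample input: inner streams of 1, 0 and 2 elements.
    static Stream<Stream<Integer>> sample() {
        return Stream.<Stream<Integer>>of(Stream.of(1), Stream.empty(), Stream.of(2, 3));
    }

    // flatMap with the identity function turns Stream<Stream<T>> into Stream<T>;
    // an empty inner stream contributes nothing, just as an empty Optional would.
    static List<Integer> flatten(Stream<Stream<Integer>> nested) {
        return nested.flatMap(s -> s).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flatten(sample())); // [1, 2, 3]
    }
}
```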
Transforming a Stream<Optional<T>> into an Optional<Stream<T>> is more complicated. Interestingly, this operation is a reduction, since it will transform a Stream into a single Optional value. So there should only be one function to create.
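Here is one way that reduction could look, as a sketch. sequence is a hypothetical helper (there is no such method in the JDK); to keep it simple it accumulates into a List, never mutating an existing one, and only turns the result into a Stream at the end. The accumulator and combiner express the same rule: if both sides are present, concatenate; otherwise the result is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Sequence {
    // Returns a new list holding a's elements followed by b's; inputs are not modified.
    private static <T> List<T> concat(List<T> a, List<T> b) {
        List<T> result = new ArrayList<>(a);
        result.addAll(b);
        return result;
    }

    // Hypothetical helper: Stream<Optional<T>> -> Optional<Stream<T>>,
    // empty if at least one element is empty. Written as a single reduce.
    static <T> Optional<Stream<T>> sequence(Stream<Optional<T>> stream) {
        List<T> empty = List.of();
        Optional<List<T>> identity = Optional.of(empty);
        return stream
                .reduce(identity,
                        // accumulator: append the next value if both sides are present
                        (acc, opt) -> acc.flatMap(list -> opt.map(t -> concat(list, List.of(t)))),
                        // combiner: concatenate two partial results if both are present
                        (left, right) -> left.flatMap(l -> right.map(r -> concat(l, r))))
                .map(List::stream);
    }

    public static void main(String[] args) {
        Stream<Optional<Integer>> allPresent = Stream.of(Optional.of(1), Optional.of(2), Optional.of(3));
        System.out.println(sequence(allPresent).map(s -> s.collect(Collectors.toList()))); // Optional[[1, 2, 3]]

        Stream<Optional<Integer>> oneMissing = Stream.of(Optional.of(1), Optional.empty(), Optional.of(3));
        System.out.println(sequence(oneMissing).isPresent()); // false
    }
}
```

Because the combiner is compatible with the accumulator and associative, the same helper also works on a parallel stream and keeps the encounter order.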
 