
Should Streams be Cloneable?

 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
Stephan van Hulst wrote:That won't work Winston. max() is a terminal operation. After calling max(), filter() will throw an exception. Either your input needs to be a collection, or you need to collect the stream before you process it again.

Ah. I learn something new every time with this.

So each Stream has state, and is usable as a "stream" only once - which suggests to me that they ought to be Cloneable.

However, that simply changes the input of my method from a Stream to a Collection (unnecessarily, IMO).

Or is this where I could use spliterator()?

But thanks for the information. Have a cow.

Winston
 
Stephan van Hulst
Bartender
Posts: 6583
84
Streams can't be Cloneable, because their data source might not be usable more than once, e.g. a Stream based on an Iterator.

Spliterator is used to implement parallelism in streams. You can split a stream of elements in two and process the partitions with two spliterators, but each part can still only be traversed once.

You can still keep your method argument a Stream, but that just means you'll have to collect it before you perform the max() operation:

Seeing as that is a bit of a roundabout way of doing things, you might as well just accept a Collection.
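
A minimal sketch of that collect-first version, assuming the method is called `longest` and should return every string that shares the maximum length (the class name and details are illustrative, not taken from the post above):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LongestViaCollect {
    // Hypothetical helper: keeps a Stream parameter, but buffers it first.
    static Stream<String> longest(Stream<String> words) {
        // Terminal operation: the incoming stream is consumed here.
        List<String> buffer = words.collect(Collectors.toList());
        // The buffered list can be streamed as often as needed.
        int max = buffer.stream().mapToInt(String::length).max().orElse(0);
        return buffer.stream().filter(s -> s.length() == max);
    }
}
```

Because the whole input ends up in a List anyway, accepting a Collection in the first place would do the same work with less ceremony.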
 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
Stephan van Hulst wrote:Streams can't be Cloneable, because their data source might not be usable more than once, e.g. a Stream based on an Iterator.

I don't quite understand that. In fact, it suggests to me that it's eminently Cloneable - ie, I can go halfway through a Stream and then clone it based on its current position.

The only reason why I might not be able to do that would have to do with implementation - which I thought was something we OO bods don't have to worry about.

Furthermore, cloning a Stream before it's been used makes perfect sense, because then I don't need to worry about its source.

I worry a little that "functionality" - for all its upside - is forcing all sorts of new paradigms on me that I didn't ask for ... and also run contrary to what I understand as "Object Orientation".

Winston
 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
Stephan van Hulst wrote:You can still keep your method argument a Stream, but that just means you'll have to collect it before you perform the max() operation:
...
Seeing as that is a bit of a roundabout way of doing things, you might as well just accept a Collection.

Yeah, I think so.

And I still say it's unnecessary.

You know what I'd like to see? A complete description of a Stream in Java terms, not just an API.
I feel that Streams are a case where the Javadoc system as it stands simply breaks down, and until you understand WHAT a Stream is - ALL of what it is (including what it is NOT) - you will never learn to use them properly.

I have no problem with Unix pipelines, and I had no problem understanding "series" constructs in Smalltalk, which can be made up of hidden "term" methods or constructors, akin to IntStream.range().

Indeed, one of the first things I want to write is a space-compact Fibonacci stream (although it'll probably already have been published by the time I have).
And mine WILL be Cloneable.

Winston
 
Stephan van Hulst
Bartender
Posts: 6583
84
Winston Gutkowski wrote:I don't quite understand that. In fact, it suggests to me that it's eminently Cloneable - ie, I can go halfway through a Stream and then clone it based on its current position.

What's the position of the following Iterator?

How do you propose to clone the following Stream?
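
Both questions can be made concrete with a sketch along these lines, assuming `InfiniteRandoms` is an iterator over an unseeded `java.util.Random` (the body and the `stream()` helper are guesses for illustration only):

```java
import java.util.Iterator;
import java.util.Random;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Assumed shape: an endless iterator with no notion of "position".
class InfiniteRandoms implements Iterator<Integer> {
    private final Random rng = new Random(); // unseeded, so not reproducible

    @Override
    public boolean hasNext() {
        return true; // never runs out
    }

    @Override
    public Integer next() {
        return rng.nextInt();
    }

    // Hypothetical helper: a sequential Stream that pulls from a fresh iterator.
    static Stream<Integer> stream() {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(new InfiniteRandoms(), Spliterator.ORDERED),
                false);
    }
}
```

The Stream only holds a reference to the iterator; nothing in it records a seed or an index that a clone() could copy.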

The only reason why I might not be able to do that would have to do with implementation - which I thought was something we OO bods don't have to worry about.

It's a conceptual problem. A Stream represents a 'future form' of your data, if you will. If you could clone a Stream, that means you could create two different futures and live them both. That doesn't make conceptual sense.

Furthermore, cloning a Stream before it's been used makes perfect sense, because then I don't need to worry about its source.

But you do. Streams are intrinsically tied to their source. Case in point: If I make a Stream from an Iterator that reads words from a piece of paper and burns the piece of paper as it goes, the piece of paper will be gone after a terminal operation, and a clone of the stream will have nothing to work on.
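
A small sketch of the burned-paper effect, using an ordinary List iterator as a stand-in for the read-and-burn iterator (`streamOf` is a made-up helper name):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class BurnAfterReading {
    // Wraps an Iterator in a sequential Stream; the Stream merely pulls from it.
    static <T> Stream<T> streamOf(Iterator<T> source) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        Iterator<String> paper = Arrays.asList("read", "and", "burn").iterator();
        // The first stream drains the iterator completely.
        List<String> first = streamOf(paper).collect(Collectors.toList());
        // A second stream over the same iterator finds nothing left.
        List<String> second = streamOf(paper).collect(Collectors.toList());
        System.out.println(first);  // [read, and, burn]
        System.out.println(second); // []
    }
}
```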
 
Stephan van Hulst
Bartender
Posts: 6583
84
Winston Gutkowski wrote:You know what I'd like to see? A complete description of a Stream in Java terms, not just an API.

I'm not sure what you mean, but have you already read the package summary? It's chock full of information: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
Stephan van Hulst wrote:What's the position of the following Iterator?

Don't know because, if you want a "position", then you need to know the seed.

And for a true RNG, its "clone" is either itself, or - from your example - a new InfiniteRandoms object with a new RNG, created with whatever seed (or none) that was supplied to the original object.

Let me put the question back to you: I understand that one of the bases of Streams is "lazy" instantiation; but that aside, why shouldn't ANY Stream be able to supply a clone() method, with the caveat that, once used, it may take longer to clone?

Streams are dynamic Lists with Iterator-like APIs, that the "functionally inclined" designers of Java have given new powers; not some esoteric new form of life.

Winston
 
Stephan van Hulst
Bartender
Posts: 6583
84
Winston Gutkowski wrote:And for a true RNG, its "clone" is either itself, or - from your example - a new InfiniteRandoms object with a new RNG, created with whatever seed (or none) that was supplied to the original object.

Right, but how would the Stream know how to do that, if you called clone() on it?

Let me put the question back to you: I understand that one of the bases of Streams is "lazy" instantiation; but that aside, why shouldn't ANY Stream be able to supply a clone() method, with the caveat that, once used, it may take longer to clone?

What are you going to clone? A Stream can clone the operations that will occur before it, but it can't clone the source of data, because the source of data isn't cloneable in general.

Streams are dynamic Lists with Iterator-like APIs, that the "functionally inclined" designers of Java have given new powers; not some esoteric new form of life.

No. Streams are not collections of data. They are nothing like Lists.

Given the following code snippet:

Which is the procedural equivalent of:

Asking me to call clone() on the result of map(i -> i+2) is like me asking you to somehow 'clone' the following piece of code from the first snippet:

It doesn't make sense.
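
The comparison can be sketched like this. Only `map(i -> i + 2)` comes from the discussion; the filter step, the class and the method names are assumptions added to make a complete example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PipelineVersusLoop {
    // Stream version: a description of steps that runs only when collect() is called.
    static List<Integer> withStream(List<Integer> numbers) {
        return numbers.stream()
                .filter(i -> i % 2 == 0)
                .map(i -> i + 2)
                .collect(Collectors.toList());
    }

    // Procedural version of the same steps.
    static List<Integer> withLoop(List<Integer> numbers) {
        List<Integer> result = new ArrayList<>();
        for (Integer i : numbers) {
            if (i % 2 == 0) {
                int mapped = i + 2; // counterpart of map(i -> i + 2)
                result.add(mapped);
            }
        }
        return result;
    }
}
```

In the loop there is no object standing for "the state just after `i + 2`" that could be copied; the intermediate stream returned by `map` plays the same role in the pipeline.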
 
Stephan van Hulst
Bartender
Posts: 6583
84
I think here's where the crux of the problem is. You're trying to think of streams as objects. The fact that a Stream is an Object is a side effect of that being the only way to make lazy evaluation work in Java.

Calling collect() is a little bit like performing a 'strict evaluation' in a functional language, in that it takes a pipeline of operations back to the 'tangible world' in the form of a value. It's not good to think of the pipeline of operations itself as a value or an object.

I think your criticisms can essentially be reduced to the designers bringing functional programming to an OO language. If you don't want to use functional programming, you mustn't use streams, because they were specifically meant to enable functional-style programming in Java. Trying to impose OO on the Stream class is counter-productive.
 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
Stephan van Hulst wrote:
Winston Gutkowski wrote:You know what I'd like to see? A complete description of a Stream in Java terms, not just an API.

I'm not sure what you mean, but have you already read the package summary? It's chock full of information: https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html

I've already looked at it, and I have found it useful on occasions. But what I want to know is WHAT a Stream IS (in detail) and WHY I need it; not a diatribe on HOW it works. If I wanted that, I'd go to the API (or I should be able to).

This is a fundamental change to Java, and I hate to say it (you, Stephan, are excepted), but Oracle have done a pitiful job of explaining it to me.

Remember: Streams are an extra-lingual extension to Java; NOT part of it, so in addition to understanding all these new "dynamic" concepts, we also need to understand:
  • What can go wrong with them.
  • How to debug them when they go wrong (and they WILL).
  • Paradigms for testing them.


You know me. I like version 8 - or rather the "goodies" - but I still don't feel qualified to use them ... particularly if I get it wrong (as above).

    Winston
     
    Piet Souris
    Rancher
    Posts: 1625
    35
    @Winston

    my advice: just practise these streams as much as you can, then see if you think your objections are still relevant.

    If you want two exercises, these caused me gray hairs and took tons of time:

    * given a Map<K, V>, create a reversed Map<V, Set<K>>
    * given a List<T>, make a Map<Integer, List<T>> where the integer is the frequency of all T's that have that frequency

    (java 8 profs are excluded)
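
For anyone who wants to compare notes afterwards, one possible approach to both exercises with `Collectors.groupingBy` is sketched below. The class and method names are made up, and the sketch assumes the map has no null values (`groupingBy` rejects null keys):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapExercises {
    // Map<K, V> -> Map<V, Set<K>>: group the entries by value, keep the keys.
    static <K, V> Map<V, Set<K>> invert(Map<K, V> map) {
        return map.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toSet())));
    }

    // List<T> -> Map<frequency, elements occurring that often>.
    static <T> Map<Integer, List<T>> byFrequency(List<T> items) {
        // Step 1: count how often each distinct element occurs.
        Map<T, Long> counts = items.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Step 2: regroup the distinct elements by their count.
        return counts.entrySet().stream()
                .collect(Collectors.groupingBy(e -> e.getValue().intValue(),
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }
}
```

The order of elements inside each frequency group follows the iteration order of the intermediate map, so it should not be relied on.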
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Stephan van Hulst wrote:Right, but how would the Stream know how to do that, if you called clone() on it?

    It wouldn't, which is my whole point.

    What are you going to clone? A Stream can clone the operations that will occur before it, but it can't clone the source of data, because the source of data isn't cloneable in general.

    Then make it clear that a Stream is only Cloneable in its initial state. I have no particular problem with that.

    Asking me to call clone() on the result of map(i -> i+2) is like me asking you to somehow 'clone' the following piece of code from the first snippet:

    No it isn't. It's telling an object to clone itself, with whatever member objects it has - hidden or otherwise.

    I didn't ask for these things, but they've been given to me; and if Java wants to continue to be able to copy objects then it should know how to copy a lambda or a method handle - even if it doesn't tell me how.

    Winston
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Piet Souris wrote:my advice: just practise these streams as much as you can, then see if you think your objections are still relevant.

    You're absolutely right, but I do feel that my (incorrect) example above highlights a weakness of Streams, because I'm unable to disconnect one from its source even when I haven't done anything with it.

    Wouldn't you like to be able to write a method that takes a Stream and returns one, "massaged" in whatever way you like, for some other method to use? Single use (and, it would seem, only single use) puts a major roadblock on that idea.

    Winston
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    I think I'm completely misunderstanding you. What does it mean to you to clone a stream?

    Say I have the following:

    How could you possibly clone words if you didn't know how to clone readAndBurnIterator?
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Stephan van Hulst wrote:I think I'm completely misunderstanding you. What does it mean to you to clone a stream?
    How could you possibly clone words if you didn't know how to clone readAndBurnIterator?

    OK, well first: I'm assuming that clone() is going to be part of an assignment, since it would be stupid (although legal) as part of a pipeline.
    And its "intent" (IMO) would be to either:
    1. Preserve the Stream in its "current" state.
    2. Preserve the attributes of a newly created Stream.

    Having thought about it (and read your posts), and not knowing what happens to Streams when methods are applied, it seems to me that #2 is infinitely easier - and in the above case, all I want; and I'd have no problem with a clone() (or copy()) method called at an inappropriate time throwing an Exception.

    What I DO want is a way of disconnecting a Stream from its source - ie, I want to be able to write methods that take and return Streams and/or use them more than once without having to worry about any creation overhead.

    What I wrote - incorrect though it was - seemed perfectly reasonable to me. Why not make it perfectly reasonable?

    Winston
     
    Paul Clapham
    Sheriff
    Posts: 21867
    36
    Eclipse IDE Firefox Browser MySQL Database
    • Likes 1
    Winston, it seems to me that everything you have said so far about the utility of cloning a Stream applies equally well to cloning an InputStream. To me those two use cases look pretty similar. And yet I believe you would reject the idea of cloning an InputStream and be able to provide cogent reasons why it doesn't make sense.

    Am I right?
     
    Piet Souris
    Rancher
    Posts: 1625
    35
    Well, you can clone a Stream (ehh, sort of...), but it is actually too ugly to look at. For instance:



    edit: but it has a fatal side effect of closing the input stream. Sighhhhh...
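
A sketch of this kind of "cloning", with `copies` as a made-up helper name. It buffers every element in a List, so it consumes the stream passed in and cannot cope with an infinite one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamCopies {
    // Hypothetical helper: drains the source once, then hands out n fresh streams.
    static <T> List<Stream<T>> copies(Stream<T> source, int n) {
        // Terminal operation: after this line the source stream is used up.
        List<T> buffer = source.collect(Collectors.toList());
        List<Stream<T>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(buffer.stream()); // each copy replays the buffered elements
        }
        return result;
    }
}
```

Any later terminal operation on the original stream throws an IllegalStateException, which is the side effect mentioned in the edit above.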
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Paul Clapham wrote:Winston, it seems to me that everything you have said so far about the utility of cloning a Stream applies equally well to cloning an InputStream. To me those two use cases look pretty similar. Am I right?

    You are indeed. Damn fine point.

    It still seems to me that one ought to be able to clone a Stream (or indeed an IS) in its initial state, but I'll have to slink off into a corner to think about it some more...

    Winston
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Piet Souris wrote:Well, you can clone a Stream (ehh, sort of...), but it is actually too ugly to look at:

    Yeah, I thought of something like that; except mine was even worse:

    'Orrible innit?

    Winston
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    Winston and Piet, both your examples ignore infinite streams.

    Both examples also appear to show that you believe a Stream is equal to its contents. That's not correct. A Stream doesn't have contents. A Stream represents a "future form" of data from a data source. In this regard Stream<Foo> is actually very similar to Future<Foo>, except that Stream can provide more than one, or indeed, a possibly infinite amount of Foo results, and the transformation is pull-based, not push-based. Now I ask you, what does it mean to clone a Future?
     
    Piet Souris
    Rancher
    Posts: 1625
    35
    Well, I had given up on the idea already, but okay, let's go on a little.

    A Stream may be abstract and/or a thing of the future, but terminating it makes it something real and now. But it does feel like
    using something for which it is not meant.

    No, I am not disregarding a possible infinite stream. Indeed, the method would give some problems I guess (never tried it)
    but that is not the fault of this method. It is the fault of the user trying to clone something infinite.
    Nothing prevents one from issuing: IntStream.iterate(0, e -> e + 1).toArray();

    But the thought of cloning a Stream never occurred to me, so for me the whole topic is a bit academic, and I will leave it to the experts.
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Stephan van Hulst wrote:Winston and Piet, both your examples ignore infinite streams.

    Well, in that case, so does toArray() - or indeed any 'final' method. I realise that Streams can be almost anything and come from almost any source - including a function like a prime-number generator. All I'm saying is that they presumably all have an initial state, and I'd simply like to be able to copy it while it's in that state. I'd liken it a bit to the tee command in Unix, except that I don't even need it to be that flexible (tee can be applied at any point in a pipeline).

    Winston
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    I'll concede the point about infinite streams.

    You talked about the state of a stream for a while now, but what is the 'initial state' of a stream?
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Stephan van Hulst wrote:You talked about the state of a stream for a while now, but what is the 'initial state' of a stream?

    Dunno. It'll depend on where it came from, but it has to come from somewhere - a List or array or function of some kind - and at that point, it's in its 'initial state'.

    I understand that it might be difficult to clone a Stream once functions have been applied, but it seems to me that I ought to be able to copy it "out of the box", ie:
      Stream<Integer> s1 = Stream.of(1, 2, 3);
      Stream<Integer> s2 = s1.clone();
    Perhaps "clone" is the wrong term, but it seems to me it would be nice to be able to replicate an "initial" Stream. Streams can tell when they're exhausted; why shouldn't they be able to tell when they're "unused"?

    Winston
     
    Mike Simmons
    Ranch Hand
    Posts: 3090
    14
    • Likes 1
    I think the idea of initial state of a stream is very dependent on what a given stream represents, and how it is generated. Only the underlying implementation would know how to clone that state - if it's even possible. Some examples:

    Cloning a stream of seeded pseudorandom numbers: easy enough, if the implementation provides access to the seed.

    Cloning a stream of truly random numbers: well, you can create a second stream with identical properties, but with different values. Not really a clone at all, is it?

    Cloning a stream of results read from a database: I suppose you could have the underlying implementation re-run the underlying query. Assuming the data hasn't changed. Or you could save a copy of all the data read from the stream, in case it needs to be re-played later. That's a potentially huge overhead for a rarely-needed operation.

    Cloning a stream of data representing a user's mouse position in (close to) real time: I think this would require saving the full stream to file in case you need replay. Again, big overhead if you didn't actually need that.

    In general, one of the big advantages of streams, for me, is that you don't need to have the whole thing in memory at once, like a collection or Map. I'd be very reluctant to introduce anything that changes that.

    With the current implementation, I can define an infinite stream by defining just one method, a Supplier<Foo>, and then use Stream.generate() on that supplier. With JDK 9, I can furthermore make the stream finite by defining a predicate and using takeWhile(). However, in order to make streams cloneable, I would also need to define a method for recreating the initial state of the stream. OK, this may well be easy or at least possible for some implementations. But not for all implementations, certainly - as in the above examples. And even if it's possible, is it worthwhile to force this requirement on all implementations? I don't think so.

    If we need to re-run a stream from its initial state, we should usually look to re-running the methods that created the stream in the first place. Often, that's easy. If it isn't easy, that may be an indicator that there are hidden issues that haven't been considered.
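
One way to express "re-run the method that created the stream" is to pass a `Supplier<Stream<T>>` rather than a Stream. A minimal sketch, with made-up class and method names:

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Rerun {
    // Each call to words.get() builds a fresh stream from the underlying source.
    static List<String> longest(Supplier<Stream<String>> words) {
        // First pass: find the maximum length.
        int max = words.get().mapToInt(String::length).max().orElse(0);
        // Second pass, on a new stream: keep the strings of that length.
        return words.get()
                .filter(s -> s.length() == max)
                .collect(Collectors.toList());
    }
}
```

A caller holding a List would write `Rerun.longest(list::stream)`. This only works when the source really can be read twice; a supplier backed by a one-shot iterator would give an empty second pass.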
     
    Mike Simmons
    Ranch Hand
    Posts: 3090
    14
    • Likes 1
    Winston Gutkowski wrote:Streams can tell when they're exhausted; why shouldn't they be able to tell when they're "unused"?

    Telling when they're unused would be easy enough to implement. But figuring out how to recreate an initial state in a way that can be replayed identically on all clones is still highly problematic. Unless you've got infinite memory or disk space, and no concerns about the additional time to access them.
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Mike Simmons wrote:I think the idea of initial state of a stream is very dependent on what a given stream represents, and how it is generated. Only the underlying implementation would know how to clone that state

    It does need to be created though. And my thought was that, at that point, whatever created it has the information needed to re-create it, so why not just provide that information to the Stream?

    However, I concede that perhaps it doesn't have wide enough use to be practical; and given my definition of "initial", I guess you won't be too far removed from the source anyway, so just create two Streams (or as many as you need) directly from it.

    Thanks a lot guys - really appreciate it.

    Winston
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Winston Gutkowski wrote:However, I concede that perhaps it doesn't have wide enough use to be practical...

    As a final parting shot, it occurred to me that the original requirement can be done without "cloning", viz:

    but it seems a bit "clunky".

    It would also be inappropriate for an infinite Stream, so is there any way of "knowing" if a Stream is infinite? Nothing leaps out at me from the v8 API...

    Winston
     
    Campbell Ritchie
    Marshal
    Posts: 52516
    118
    Winston Gutkowski wrote:. . . the original requirement can be done without "cloning", . .
    Afraid that would not work. As Stephan has said (and this is a very interesting discussion) a Stream does not “have” any contents. It passes the contents on and then finishes with them. Once you have traversed the Stream, it has no means to reset itself at the beginning, so I think you will suffer some sort of Exception in line 5.
     
    Rob Spoor
    Sheriff
    Posts: 20817
    68
    Chrome Eclipse IDE Java Windows
    That lambda in your peek method won't be called until a terminal operation is called. In other words, longest will be 0, no matter what contents the stream has.

    No exception will be thrown, since no terminal operations are called yet, but the result will yield not the longest strings, but empty strings instead.
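
The laziness of peek() is easy to observe with a counter. A small sketch, where the class name and the counter are illustrative:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPeek {
    // Returns {peek calls before the terminal operation, peek calls after it}.
    static int[] peekCalls() {
        AtomicInteger calls = new AtomicInteger();
        Stream<String> words = Stream.of("a", "bb", "ccc")
                .peek(w -> calls.incrementAndGet()); // only registered, not run
        int before = calls.get();
        // The terminal operation is what pulls elements through the pipeline.
        List<String> collected = words.collect(Collectors.toList());
        int after = calls.get();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] calls = peekCalls();
        System.out.println(calls[0] + " then " + calls[1]); // 0 then 3
    }
}
```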
     
    Campbell Ritchie
    Marshal
    Posts: 52516
    118
    We were both wrong. It won't compile because you are using a non-final local variable in a λ. Moving the length variable to a field, you get this:-

    Whether I put short Strings or long in and print the List, I get “[]”. As you said, Rob, and no exceptions.

    If I include an empty String "" in args, I get a 1-element List containing the empty String; otherwise I get a 0-element List. Yes, the peek() call is never used because there is nothing to pull data through the Stream. You can try combining the two calls into one and you get:-

    Putting an empty String and the names of our family as args, I get this:-
    The List [, Campbell] has length 2
    which tells me that you get the Strings in ascending order of maximum length to date, and also that I have the longest name in my family.
     
    Rob Spoor
    Sheriff
    Posts: 20817
    68
    Chrome Eclipse IDE Java Windows
    Campbell Ritchie wrote:Moving the length variable to a field

    That should be a purely hypothetical situation, because otherwise each time the method gets called and a stream is consumed, the length changes. You can use an AtomicInteger or even an int[1] instead to overcome the non-final value. But of course, the fact that you'd have to use such a workaround should be enough of a warning to tell you that this is a bad idea.

    which tells me that you get the Strings in ascending order of maximum length to date

    That makes sense. That stream code is effectively the same as the following:
    So each String that has the maximum length seen up to that point will be added.

    In short: the source of the stream (collection) must be iterated over before any filtering can be applied.

    With grouping and Optional you can achieve this in one statement, but you lose a bit of readability:
    Optional is required here in case there are no input Strings.
    (And yes, I know that the lambda in the map call should be a method reference, but I got a compiler error at first and then gave up.)
     
    Mike Simmons
    Ranch Hand
    Posts: 3090
    14
    As I suggested the other day in the original thread, there's an easy way to solve the original problem with reduce():
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Rob Spoor wrote:No exception will be thrown, since no terminal operations are called yet, but the result will yield not the longest strings, but empty strings instead.

    Hmm. Interesting. So you mean that if I call my method as follows:
      String[] longest = longest(listOfStrings.stream()).toArray(String[]::new);
    it will return an array of empty Strings - or simply an empty array?

    That seems counter-intuitive. The reason I wrote the method the way I did is that I don't want to have to know where the input comes from, nor where the output is going (or indeed, when it gets produced); simply that if provided with a source of Strings it will output a Stream containing the longest of those Strings.

    @Mike: It seems that what I want is indeed a form of "reduction" operation; but I think that your suggestion will only return one "longest" String (correct me if I'm wrong), where the original requirement was to return all Strings that are the same length as the longest.

    Winston
     
    Winston Gutkowski
    Bartender
    Posts: 10571
    64
    Eclipse IDE Hibernate Ubuntu
    Campbell Ritchie wrote:Yes, the peek() call is never used because there is nothing to pull data through the Stream...

    Ah, right. And I guess the stream = Stream.peek(...) assignment might break the "production→consumption" chain. However, with Rob's suggestion, it could be a single pipeline, viz:

    and then my call above should work, no?

    I agree it's ugly, but to me it suggests a weakness in Stream's "reduction" paradigm, not to anything I'm trying to do.
    I want a reduction operation that acts as a final operation on the Stream being processed, but spits out another Stream for further processing. It doesn't seem so outlandish to me.

    Unfortunately, not being on v8 yet, I can't test my theory (my version of Mint still doesn't include v8 in its standard repository, and I haven't got around to installing it the "hard" way yet).

    Winston
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    • Likes 1
    This won't work, because it will return a stream of strings that were longest at the time. That means that if I performed your operation on ["a", "ab", "abc", "xyz", "yz", "z"], it would return ["a", "ab", "abc", "xyz"] which clearly isn't what you want. Besides that, it's not a functional approach at all, because functions should not use mutable state. There's a reason that variables that are closed over in a lambda expression need to be final. Using a type like AtomicInteger to circumvent this is a hack that will lead to extremely brittle programs. Java provides a mechanism to work around this in a safe way: the collect() method, which appears clunky but really is the most elegant way to do stuff like this.
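
The "longest at the time" behaviour can be reproduced with a sketch like the one below. It assumes the single-pipeline idea is peek() plus filter() over an AtomicInteger; the class and method names are made up, and this is the brittle hack being warned against, shown only to make the result visible:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LongestSoFar {
    // Anti-pattern: mutable state shared between pipeline stages.
    static Stream<String> longestSoFar(Stream<String> words) {
        AtomicInteger longest = new AtomicInteger();
        return words
                // Raise the running maximum as each element passes through.
                .peek(w -> longest.accumulateAndGet(w.length(), Math::max))
                // Compare against the maximum seen so far, not the final one.
                .filter(w -> w.length() >= longest.get());
    }

    public static void main(String[] args) {
        List<String> result = longestSoFar(Stream.of("a", "ab", "abc", "xyz", "yz", "z"))
                .collect(Collectors.toList());
        System.out.println(result); // [a, ab, abc, xyz]
    }
}
```

Each element is tested against the running maximum at the moment it passes through, so early short strings slip past. On a parallel stream the outcome would not even be deterministic.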

    The fact of the matter is that you first need to consider *all* strings in the stream to know what the length of the longest is. That means you need to evaluate the stream, which is a terminal operation, which means there is no way to continue processing without closing the initial stream. This is not a result of how streams were designed in Java. This is an information-theoretic problem, true for *all* programming languages that support lazy evaluation.

    Consider this piece of Haskell, which is a purely functional language:

    When we evaluate result, will we get ["abc", "abc", "abc"]?

    No! The maximum function needs to consider *all* strings in the infinite list of strings to determine what the maximum really is. This program will run indefinitely.
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    • Likes 1
    To summarize, any non-short-circuiting function that is a form of reduction, (this includes getting a maximum) by definition of reduction needs to evaluate the entire input sequence until it's empty.

    In Java, non-short-circuiting methods that are a form of reduction include:
  • collect()
  • count()
  • forEach()
  • forEachOrdered()
  • iterator()
  • max()
  • min()
  • reduce()
  • spliterator()
  • toArray()

    All of these are terminal operations by virtue of being reductions. The only way to continue processing after any of these is to have access to the original data source, or to create a new data source from the input stream (using collect()).
     
    Stephan van Hulst
    Bartender
    Posts: 6583
    84
    Here's a way to solve it using a Collector.

    I really think this is the best way to solve the problem, even though it appears to be a bit verbose, as it only requires one pass over the stream. It returns a Collection and not a Stream, because getting the longest strings from an input sequence is necessarily a reduction operation, and therefore has to strictly evaluate the complete input sequence.
     
    Rob Spoor
    Sheriff
    Posts: 20817
    68
    Chrome Eclipse IDE Java Windows
    • Likes 2
    Perhaps the best solution is to use a custom Collector. We would need an extra utility class to hold not just the list of longest strings, but also the length of these:
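
A minimal sketch of such a collector, assuming a hypothetical holder class named `Longest` (all names here are illustrative, and this is one possible shape rather than the code originally posted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

class LongestStrings {
    // Hypothetical holder: the longest strings seen so far plus their length.
    private static final class Longest {
        private final List<String> strings = new ArrayList<>();
        private int length = -1; // below any real length, so "" is accepted too

        // Accumulator: a longer string resets the list; shorter ones are ignored.
        void accept(String s) {
            if (s.length() > length) {
                strings.clear();
                length = s.length();
            }
            if (s.length() == length) {
                strings.add(s);
            }
        }

        // Combiner: must return the merged holder (needed for parallel streams).
        Longest merge(Longest other) {
            if (other.length > length) {
                return other;
            }
            if (other.length == length) {
                strings.addAll(other.strings);
            }
            return this;
        }
    }

    // One pass over the stream; the result is a List, not another Stream.
    static Collector<String, ?, List<String>> longest() {
        return Collector.<String, Longest, List<String>>of(
                Longest::new, Longest::accept, Longest::merge, l -> l.strings);
    }
}
```

Usage would look like `List<String> result = words.collect(LongestStrings.longest());`, where `words` is any finite `Stream<String>`.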
     
    Rob Spoor
    Sheriff
    Posts: 20817
    68
    Chrome Eclipse IDE Java Windows
    • Likes 1
    Guess Stephan and I had the same idea, although I think Stephan's implementation has a few small bugs in both the accumulate and combine methods:
  • accumulate also adds shorter strings
  • the combiner should return a combined object
