What is inside the stream produced by an intermediate operation?

 
Biniman Idugboe
Ranch Hand
Posts: 241
3

The above pipeline does not have a terminal operation. The map() method has produced a stream named resultOfMapOperation. Please bear with me; I would like to know which elements are in resultOfMapOperation at this stage of the pipeline.
 
Campbell Ritchie
Marshal
Posts: 64997
246
That's an easy one to answer: what is in that Stream? Nothing.

Streams do not “have” elements; they process them. Also, Streams implement lazy execution; an element is only taken by the first Stream when the terminal operation requires it. You can find that out by putting a peek() call somewhere in your statement, and get it to print something. You will see nothing printed until you have a terminal operation.
Try Arrays.stream() or LongStream.of() instead of Arrays.asList().
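
The laziness described above can be checked with a small experiment. The following is only a sketch, not the original poster's code: the class name LazyPeekDemo, its method names, and the three sample values are assumptions made for illustration. It records what peek() sees instead of printing it, once without and once with a terminal operation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.LongStream;

public class LazyPeekDemo {

    /** Builds a pipeline with peek() and map() but no terminal operation. */
    static List<Long> seenWithoutTerminalOp() {
        List<Long> seen = new ArrayList<>();
        // Only intermediate operations: nothing is pulled from the source.
        LongStream.of(1L, 2L, 3L)      // hypothetical sample data
                  .peek(n -> seen.add(n))
                  .map(n -> n * 2);
        return seen;
    }

    /** Same pipeline, completed by the terminal operation sum(). */
    static List<Long> seenWithTerminalOp() {
        List<Long> seen = new ArrayList<>();
        LongStream.of(1L, 2L, 3L)
                  .peek(n -> seen.add(n))
                  .map(n -> n * 2)
                  .sum();              // terminal operation: elements now pass through peek()
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("Without terminal op: " + seenWithoutTerminalOp()); // []
        System.out.println("With terminal op:    " + seenWithTerminalOp());    // [1, 2, 3]
    }
}
```

The first list stays empty because the Stream never asked the source for an element; the second is filled only because sum() pulled the elements through.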
 
Biniman Idugboe
Ranch Hand
Posts: 241
3

Streams do not “have” elements; they process them.


Arrays.stream() produces a stream. What process is the stream  performing?

Also, Streams implement lazy execution; an element is only taken by the first Stream when the terminal operation requires it.


In the following, how come the sourceStream has already been operated upon even when there is no terminal operation?
 
Campbell Ritchie
Marshal
Posts: 64997
246

Biniman Idugboe wrote:. . . . What process is the stream  performing? . . .

In that case, creating a second Stream in line 5.

In the following, how come the sourceStream has already been operated upon even when there is no terminal operation? . . .

It has produced a second Stream; in line 11 you are asking it to create a third Stream. Even though the first Stream<Long> has not processed any Longs, it is still marked as having been operated on, and your second request causes it to throw an exception.
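
The behaviour described above can be reproduced in a few lines. This is a hedged sketch rather than the code from the question: the class name StreamReuseDemo, the variable names other than sourceStream, and the sample data are assumptions. It attaches two intermediate operations to the same Stream object and reports whether the second attempt fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class StreamReuseDemo {

    /** Returns true if attaching a second operation to the same Stream fails. */
    static boolean secondOperationThrows() {
        List<Long> source = Arrays.asList(1L, 2L, 3L); // hypothetical data
        Stream<Long> sourceStream = source.stream();

        // First intermediate operation: links a second Stream to sourceStream.
        // No Long has been processed yet.
        Stream<Long> mapped = sourceStream.map(n -> n * 2);

        try {
            // Second operation on the SAME object, which is already linked.
            Stream<Long> filtered = sourceStream.filter(n -> n > 1);
            return false;
        } catch (IllegalStateException e) {
            System.out.println("Caught: " + e.getMessage());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Second operation throws: " + secondOperationThrows());
    }
}
```

The usual way to avoid this is to chain further operations onto the Stream returned by the previous call (mapped here) rather than going back to sourceStream.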
 
Mike Simmons
Ranch Foreman
Posts: 3297
22
Think of the stream as a thing that saves up all the instructions you give it, waiting for a terminal command that says "now go do it".  Each time you add a non-terminal command, it adds something internally to its list of things to do.  But it doesn't even try to look at any elements in the stream until it receives a terminal command.  A stream is designed so that each element can only be viewed once, so it needs to know everything that will be done with each element before it actually tries to process any element.
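
One way to see the "saved-up instructions" idea in action is to log each stage as it touches an element. The sketch below is illustrative only: the class name PipelineOrderDemo and the sample values are assumptions. For a sequential stream with simple operations like these, each element travels through the whole pipeline before the next one is taken from the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PipelineOrderDemo {

    /** Records the order in which each stage touches each element. */
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List<Integer> result = Stream.of(1, 2, 3)   // hypothetical sample data
                .filter(n -> { log.add("filter " + n); return n % 2 == 1; })
                .map(n -> { log.add("map " + n); return n * 10; })
                .collect(Collectors.toList());       // terminal operation starts the flow
        log.add("result " + result);
        return log;
    }

    public static void main(String[] args) {
        // Prints: filter 1, map 1, filter 2, filter 3, map 3, result [10, 30]
        trace().forEach(System.out::println);
    }
}
```

Note that 2 is filtered out and never reaches map(); the stages are not run as separate passes over all the data.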
 
Biniman Idugboe
Ranch Hand
Posts: 241
3
Is the term stream a mask of something far more complicated going on behind the scene?  For example, where did ReferencePipeline.filter (ReferencePipeline.java:164) come from?
 
Mike Simmons
Ranch Foreman
Posts: 3297
22

Biniman Idugboe wrote:Is the term stream a mask of something far more complicated going on behind the scene?


Generally, yes.  We don't necessarily have to study and understand all those details, but yes, there are many things going on behind the scene in a stream.

Biniman Idugboe wrote:For example, where did ReferencePipeline.filter (ReferencePipeline.java:164) come from?


Well, a ReferencePipeline is kind of the standard implementation of a Stream, as provided in the standard libraries.  You can look at the source if you like by studying it in an IDE or by expanding the src.zip file in your Java installation.  However, note that it's a package-private class ("default" access), meaning no one is expected to care about it in order to use streams.  Stuff like that can be fairly complicated - it may be fun or even useful to know about, but it takes time and study.  And it isn't actually necessary.  
 
Campbell Ritchie
Marshal
Posts: 64997
246

Biniman Idugboe wrote:Is the term stream a mask of something far more complicated going on behind the scene? . . .

Yes. There are several package‑private classes in the java.util.stream package which implement the various interfaces, and that is one of them. I don't know any more details; I don't think that I need to know anything.
 
Saloon Keeper
Posts: 10396
221
Remember in your other threads about this topic that I gave an example implementation called StreamImpl? ReferencePipeline is the name that Oracle gave to the Stream implementation.

It handles a lot of complex situations, but as others have pointed out, you don't really need to know about it.
 
Biniman Idugboe
Ranch Hand
Posts: 241
3
Well, it appears the understanding of the concept of stream is for people with special capability. I certainly do not belong in that group.  Nevertheless, I still have questions to ask about stream.
1.  I create a data source:

2.  I create a process line that hooks up to the data source and would allow data to enter the process line sequentially when the process starts.  In Java speak, I think I create a stream from the data source:

3.  I add a data processing station to the line.  That is, I add an intermediate operation to the stream:

What is different between IntPipeline$Head@2a2d45ba and  IntPipeline$9@3c0f93f1? Is the pipeline not one continuous pipeline?
4. So far, I just have a process line; nothing happening yet. I decide to add a viewing station where I can view the data as it passes by.

All I still have is just a process line.  There is nothing to view because the process has not started yet.
5.  I terminate the process line on a processing station that starts the process (that tells IntPipeline$Head@2a2d45ba to start allowing data into the stream).

While the above interpretation may very well be a punishable offence in Java land, please, forgive me. With it, I am relieved of a lot of things complicated.
I reckon I may have millions of data to process.  I want to have parallel streams (with my process line analogy, I am a little bit more comfortable talking about stream).
6.  I create a parallel stream from the original data source:

My computer has eight processors and the number of data to process is five, that is three processors more than the number of data to process. I am assuming that five sequential streams have been created.  Is that correct?
7.  For the sake of it, I decided to add a sorted() operation to the stream:

I see sortedStream ==> java.util.stream.SortedOps$OfInt@3632be31.  Is this also a pipeline?
8. I terminate the stream with a forEach() operation:

Again, I am assuming that I ended up with five parallel pipelines as follows:
Processor1:  parallelStream.sorted().forEach(m -> System.out.println(m));
Processor2:  parallelStream.sorted().forEach(m -> System.out.println(m));
Processor3:  parallelStream.sorted().forEach(m -> System.out.println(m));
Processor4:  parallelStream.sorted().forEach(m -> System.out.println(m));
Processor5:  parallelStream.sorted().forEach(m -> System.out.println(m));
Each pipeline has a single data to process and the result of sorting a single data is the single data itself. Thus, the final data printed to the console may not be sorted because the order will depend on the sequence in which the processors finished processing their pipelines.
Is that correct?
 
Campbell Ritchie
Marshal
Posts: 64997
246

Biniman Idugboe wrote:. . . understanding of the concept of stream is for people with special capability.

No, it isn't, but you should start with the simplest cases you can find and work through them until you understand them.

. . .

So far, so good.

2.  I create a process line that hooks up to the data source and would allow data to enter the process line sequentially when the process starts.  In Java speak, I think I create a stream from the data source:

Don't say things like, “Java speak”. What you are doing is creating an object that will pass the elements of the array to something else, doing some manipulation of those elements as it does so. The word “process” has a specific meaning and you are not creating a process with that specific meaning.

. . , I add an intermediate operation to the stream: . . . What is different between IntPipeline$Head@2a2d45ba and  IntPipeline$9@3c0f93f1? Is the pipeline not one continuous pipeline?

They are different objects, the one created to manipulate a subset of the data taken by the first Stream. They both operate on different parts of the same data pipeline, yes.

. . . nothing happening yet. I decide to add a viewing station where I can view the data as it passes by. . . . There is nothing to view because the process has not started yet.

And as you said, nothing is going to happen until a terminal operation starts to execute.

. . . I terminate the process line . . .

It might be better to say you are completing the pipeline. Just as whenever I am trying to lead water around the house in copper pipelines, I am not going to turn the water on at the source until I am confident I have connected the other end. When you build a Sahara Pipeline to carry oil across the desert, you don't turn the oil supply on until you are sure the other end is connected correctly.

(that tells IntPipeline$Head@2a2d45ba to start allowing data into the stream).

That sort of thing is an implementation detail. Maybe something else sends the elements into the pipeline. You have no way of knowing whether it is accurate or inaccurate, except possibly if you read the source code, which somebody else has told you about already.

. . . While the above interpretation may very well be a punishable offence in Java land

Hahahahahahahahaha!

. . . I am relieved of a lot of things complicated.

No, you are adding lots of things you don't really need to know; JShell doesn't help by calling toString() on your Stream objects. I don't think you ever need to call toString() on a Stream. Everything you don't need to know is making things too complicated for you and causing you confusion.

I reckon I may have millions of data . . . parallel streams . . . a parallel stream from the original data source:

Don't try creating a parallel Stream from something with five elements, not unless the operations you are doing take a very long time (e.g. factorisation). If you want a parallel Stream, create one with millions of elements to handle:-
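
The code that originally followed this advice is not preserved here. As a stand-in, the following sketch (an assumption, not the poster's example) shows a workload where parallelism has a chance to pay off: a prime count over a large range. The class name ParallelPrimeDemo, the method names, and the limit of 2,000,000 are all hypothetical choices.

```java
import java.util.stream.IntStream;

public class ParallelPrimeDemo {

    /** Simple trial division; deliberately not optimised, so each element costs some work. */
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    /** Counts primes in [2, limit), either sequentially or in parallel. */
    static long countPrimes(int limit, boolean parallel) {
        IntStream candidates = IntStream.range(2, limit);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(ParallelPrimeDemo::isPrime).count();
    }

    public static void main(String[] args) {
        int limit = 2_000_000; // hypothetical size, large enough to be worth splitting

        long start = System.nanoTime();
        long sequential = countPrimes(limit, false);
        long middle = System.nanoTime();
        long parallel = countPrimes(limit, true);
        long end = System.nanoTime();

        System.out.println("sequential: " + sequential + " primes, " + (middle - start) / 1_000_000 + " ms");
        System.out.println("parallel:   " + parallel + " primes, " + (end - middle) / 1_000_000 + " ms");
    }
}
```

Both calls return the same count; only the elapsed time differs, and by how much depends on the machine. With five elements the splitting overhead would swamp any gain.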

. . . My computer has eight processors and the number of data to process is five . . . I am assuming that five sequential streams have been created.  Is that correct?

Don't know, but I think probably not. It is much more likely that 5 data will be sent to one Stream and a second Stream will be created handling nothing. Or two to one Stream and three to the other. There is no way to find out, but it is only worth creating parallel Streams when there is a lot of work to be done. And those Streams are not called sequential.

. . . java.util.stream.SortedOps$OfInt@3632be31.  Is this also a pipeline?

No, it is a bit of confusing information JShell has given you. You can find out more about that class by exploring the source, or reflection, but both will overload you with useless information.

8. I terminate the stream with a forEach() operation: . . . Again, I am assuming that I ended up with five parallel pipelines as follows: . . . the final data printed to the console may not be sorted because the order will depend on the sequence in which the processors finished processing their pipelines. . . .

It is much more complicated than that. Not only do you have an unknown number of parallel tasks (what runs on each core here is a thread, not a separate process), but each one is racing the others to print its result to System.out. forEach() on a parallel Stream does not guarantee encounter order, so you might get the ints displayed out of order. You won't reliably see the sorted order for a parallel Stream unless you use forEachOrdered() or collect the results into a final data structure.
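
To make the ordering point concrete, here is a minimal sketch under stated assumptions: the class name ParallelSortedDemo and the five sample values are invented for illustration. It contrasts forEach(), which may print a parallel sorted stream in any order, with forEachOrdered() and collect(), which both respect the sorted encounter order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelSortedDemo {

    static final int[] DATA = {5, 3, 1, 4, 2}; // hypothetical five-element source

    /** forEachOrdered() hands elements to the action in encounter (sorted) order. */
    static List<Integer> sortedViaForEachOrdered() {
        List<Integer> out = new ArrayList<>();
        Arrays.stream(DATA).parallel().sorted().forEachOrdered(n -> out.add(n));
        return out;
    }

    /** Collecting into a List also preserves the encounter order. */
    static List<Integer> sortedViaCollect() {
        return Arrays.stream(DATA).parallel().sorted().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Order of these five lines is NOT guaranteed and may change between runs.
        Arrays.stream(DATA).parallel().sorted().forEach(System.out::println);

        System.out.println("forEachOrdered: " + sortedViaForEachOrdered()); // [1, 2, 3, 4, 5]
        System.out.println("collect:        " + sortedViaCollect());        // [1, 2, 3, 4, 5]
    }
}
```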

I suggest,
  • 1: As I said before, there are things you don't need to know. Forget about them. Otherwise you are like somebody driving to London who is worrying about how spark plugs work, rather than remembering whether you have to come off the A1(M) onto the M18 or the A1.
  • 2: Start small and simple.
    Biniman Idugboe
    Ranch Hand
    Posts: 241
    3
    Apology! When I say process line, I am referring to a factory process line, not a processor process.

    It might be better to say you are completing the pipeline. Just as whenever I am trying to lead water around the house in copper pipelines, I am not going to turn the water on at the source until I am confident I have connected the other end. When you build a Sahara Pipeline to carry oil across the desert, you don't turn the oil supply on until you are sure the other end is connected correctly.


    That makes it a whole lot easier to understand why a terminal operation is required before the pipeline can start to work.

    JShell doesn't help by calling toString() on your Stream objects.


    Not sure I understand this.  What is wrong with sortedStream.forEach(m -> System.out.println(m));?

    ...but it is only worth creating parallel Streams when there is a lot of work to be done. And those Streams are not called sequential.


    I suppose parallelizing the stream logically partitions the stream into individual portions. Are you saying the individual portions are not ultimately processed sequentially?

    ...it is a bit of confusing information JShell has given you.


    I have been going around with the notion that using Notepad or something similar is the simplest way to start learning to write Java code. But I need jshell to run code snippets. To hear now that jshell could give confusing information, I am lost.

    ...Start small and simple.


    Surely, I would love to, but I have not found a stream that is simple. All the streams I have encountered so far are derivatives of the same complicated Stream.
    I appreciate your comments. They are very helpful to me.
     
    Campbell Ritchie
    Marshal
    Posts: 64997
    246

    Biniman Idugboe wrote:. . . Not sure I understand this.  What is wrong with sortedStream.forEach(m -> System.out.println(m));? . . . . To hear now that jshell could give confusing information, I am lost.

    You have seen that your sorted parallel Stream doesn't complete the sorting until all the elements are collected together. You are calling the forEach() method at a stage too early.

    Whenever you declare and create an object in JShell, it prints that object. In the case of a Stream, which doesn't override toString(), you get details which don't help you. In this case, the printout is actually confusing you. Remember it is very unusual to declare a Stream at all. You usually only declare a Stream if you are reading a file and need to initialise it for try‑with‑resources. I would write your code with all the Streams as throwaway objects:-

    ...Start small and simple.

    Surely, I would love to, but I have not found a stream that is simple. All the streams I have encountered so far are derivatives of the same complicated Stream.

    Find the thread I split off about why people think Streams are so difficult. It was in the JavaRanch journal recently, maybe April.
    Find out how to create the following:-
  • 1: An int[] containing a range of increasing numbers. Have a look at two methods of IntStream starting with r and t.
  • 2: A List<Integer> containing the same numbers as No. 1.
  • 3: An int[] containing 1000000 “random” numbers. You will find a hint in this very thread.

    I appreciate your comments. They are very helpful to me.

    That's a pleasure
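
For readers who want to check their own attempt at the three exercises above, one possible approach is sketched below. It is not taken from the thread: the class name StreamExercises and the method names are hypothetical, and the choice of rangeClosed() over range() and of Random.ints() for the random numbers is an assumption about what the hints point to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamExercises {

    /** Exercise 1: an int[] of increasing numbers, both bounds inclusive. */
    static int[] increasing(int from, int toInclusive) {
        return IntStream.rangeClosed(from, toInclusive).toArray();
    }

    /** Exercise 2: the same numbers as a List<Integer>; boxed() turns ints into Integers. */
    static List<Integer> increasingList(int from, int toInclusive) {
        return IntStream.rangeClosed(from, toInclusive)
                        .boxed()
                        .collect(Collectors.toList());
    }

    /** Exercise 3: an int[] of pseudo-random numbers of the requested length. */
    static int[] randoms(int count) {
        return new Random().ints(count).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(increasing(1, 5))); // [1, 2, 3, 4, 5]
        System.out.println(increasingList(1, 5));               // [1, 2, 3, 4, 5]
        System.out.println(randoms(1_000_000).length);          // 1000000
    }
}
```

In each case the Stream is a throwaway object: it is created, used in one chained expression, and never assigned to a variable.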
     
    Biniman Idugboe
    Ranch Hand
    Posts: 241
    3
    Thank you Sir!
    [Attachment: Javaranch-Journal.png]
     
    Campbell Ritchie
    Marshal
    Posts: 64997
    246
    Try here. Try the January edition.
     