
Is Java becoming data-directed?

 
Winston Gutkowski
Bartender
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
While I applaud much of the new stuff in version 8 - even though I'm still coming to terms with much of it - I worry a bit that a lot of this new "stream stuff" urges us to treat data like SQL, a language I absolutely detest - even though, as a DBA for nearly twenty years, I can see some of its upsides.

Just one issue I see (as with SQL) is the absence of a first(int) or top(int) method (or do we call them functions now?) in the Stream class. And yes, I realise that it can probably be done in other ways that make far less sense to me.

In SQL you need a "rosetta stone" to understand how to do it, because each vendor has its own idea of how to implement it (I don't even know if it's part of the latest SQL standard yet; but wouldn't be at all surprised if it isn't); but the requirement for things like "top n" lists is so pervasive in computer applications that I wonder why the writers of data-directed languages have resisted it for so long.

To me it's an absolute requirement; and if Java doesn't get its act together, the same thing will happen with its shiny new Streams. Either that, or we'll simply work with logic and structures we're more familiar with, and bang off "the first ten".

Comments please.

Winston
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 35524
402
Eclipse IDE Java VI Editor
Winston Gutkowski wrote:Just one issue I see (as with SQL) is the absence of a first(int) or top(int) method (or do we call them functions now?) in the Stream class. And yes, I realise that it can probably be done in other ways that make far less sense to me.

I think I might be missing the point here. In SQL, getting the first X rows isn't hard. Granted, it varies by database, but it isn't hard. For example, in Postgres, we have:


Is there a database that doesn't have some keyword for this? Or is it just that it isn't standard?

In Java, we also have a limit function:
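
The snippet that followed here is missing from this copy of the thread. As a minimal sketch, assuming nothing about the original code (the class and method names are made up for illustration), `Stream.limit(n)` truncates a stream to at most its first n elements:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LimitSketch {

    // Returns at most the first n elements of the list, in encounter order.
    static <T> List<T> firstN(List<T> items, int n) {
        return items.stream()
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "cat", "dan");
        System.out.println(firstN(names, 2)); // prints [ann, bob]
    }
}
```

If the list holds fewer than n elements, the result simply contains all of them.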


In SQL, it is a pain to do paging though. Trying to get the third through 6th rows is a nested query. In Java, it is easier:


At least Java put limit() in the API!
 
Winston Gutkowski
Jeanne Boyarsky wrote:I think I might be missing the point here. In SQL, getting the first X rows isn't hard. Granted, it varies by database, but it isn't hard.

But the simple fact that it varies by database makes it an implementation, not part of SQL.

However, thanks for pointing me to limit(); I thought I was going mad there. Why couldn't they just have called it first()?

I still have some concerns about v8, but that's not one of them any more. Have a cow.

Winston
 
Rob Spoor
Sheriff
Posts: 20800
68
Chrome Eclipse IDE Java Windows
Jeanne Boyarsky wrote:
I think I might be missing the point here. In SQL, getting the first X rows isn't hard. Granted, it varies by database, but it isn't hard. For example, in Postgres, we have:


Is there a database that doesn't have some keyword for this? Or is it just that it isn't standard?

It's not. Postgres and MySQL have limit, whereas MSSQL has TOP:

A small overview: http://www.w3schools.com/sql/sql_top.asp

In SQL, it is a pain to do paging though. Trying to get the third through 6th rows is a nested query.

Depends on the RDBMS. MySQL can have the limit statement extended with another value which is then the offset (e.g. limit 3, 3). Postgres has LIMIT 3 OFFSET 3 (which MySQL has also adopted it seems). Oracle probably has support with the row number and BETWEEN. For MSSQL it's a real pain though.

Some links:
http://www.postgresql.org/docs/8.1/static/queries-limit.html
http://dev.mysql.com/doc/refman/5.7/en/select.html

In Java, it is easier:

I think you forgot a skip in there:
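
The corrected snippet is missing from this copy of the thread. A sketch of what paging with `skip()` and `limit()` can look like, written with one operation per line (the class and method names are illustrative, not taken from the original post): to get the third through sixth elements, skip two and keep four.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PagingSketch {

    // Returns the elements at 1-based positions `from` through `to`, inclusive.
    static <T> List<T> slice(List<T> items, int from, int to) {
        return items.stream()
                .skip(from - 1)          // drop everything before position `from`
                .limit(to - from + 1)    // keep the requested number of elements
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(slice(rows, 3, 6)); // prints [3, 4, 5, 6]
    }
}
```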

That's one of the reasons I prefer to put each operation on its own line - it's a bit too easy to miss a statement in a long chain. Plus, exceptions can be easily traced this way (the line the exception occurred on has only one statement).
 
Jeanne Boyarsky
Rob Spoor wrote:

Oh yuck. I've used different databases but none where you put that up front in the select clause.

Rob Spoor wrote:
I think you forgot a skip in there:

That's one of the reasons I prefer to put each operation on its own line - it's a bit too easy to miss a statement in a long chain. Plus, exceptions can be easily traced this way (the line the exception occurred on has only one statement).

That wouldn't have helped me. I had it right in my IDE and then copy/pasted the wrong line of code in.
 
Stephan van Hulst
Bartender
Posts: 6530
83
Winston Gutkowski wrote:In SQL you need a "rosetta stone" to understand how to do it, because each vendor has its own idea of how to implement it (I don't even know if it's part of the latest SQL standard yet; but wouldn't be at all surprised if it isn't); but the requirement for things like "top n" lists is so pervasive in computer applications that I wonder why the writers of data-directed languages have resisted it for so long.

So how does this relate to Java?

To me it's an absolute requirement; and if Java doesn't get its act together, the same thing will happen with its shiny new Streams. Either that, or we'll simply work with logic and structures we're more familiar with, and bang off "the first ten".

What act? If you mean that it doesn't have a first() method, then my retort is that it does, by way of the limit() method, but Jeanne already told you that.

Why couldn't they just have called it first()?

Just in case you were serious, first() would not have been correct, because first() implies an order on the stream. And even if the stream is ordered, are they still the first elements after you perform a skip()?

Personally, I would have preferred if they called it take(), because that's more in line with older functional languages. I'm also *really* peeved that they don't have a takeWhile(Predicate<T>) method, so much so that I wrote my own Stream extension class.
 
Rob Spoor
Stephan van Hulst wrote:I'm also *really* peeved that they don't have a takeWhile(Predicate<T>) method, so much so that I wrote my own Stream extension class.

At first I thought that would be the same as filter(Predicate), but then I got your plan - you don't want to filter out all elements that match the filter, only until you've encountered an element that doesn't match the filter. It would be like a combination of filter and skip.

Can you show that piece of code? I'm curious on how you've done it.
 
Stephan van Hulst


 
Stephan van Hulst
Keep in mind that the main() method I gave is a silly example. In that case filter() and then sorted() is more efficient. You would use takeWhile() when you know your elements are already ordered some way.
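
The extension class itself did not survive in this copy of the thread. The following is not that code; it is a minimal sketch of one way a `takeWhile` could be written for Java 8, by wrapping the source stream's `Spliterator`. The names `TakeWhileSketch` and `stillTaking` are invented for illustration, and the sketch is sequential only. (Java 9 later added `Stream.takeWhile` to the standard API.)

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class TakeWhileSketch {

    // Lazily passes elements through until the first one that fails the predicate.
    static <T> Stream<T> takeWhile(Stream<T> source, Predicate<? super T> predicate) {
        Spliterator<T> input = source.spliterator();
        // Size is unknown after truncation; only the ORDERED flag is carried over.
        Spliterator<T> output = new Spliterators.AbstractSpliterator<T>(
                Long.MAX_VALUE, input.characteristics() & Spliterator.ORDERED) {

            private boolean stillTaking = true;

            @Override
            public boolean tryAdvance(Consumer<? super T> action) {
                if (!stillTaking) {
                    return false;
                }
                boolean advanced = input.tryAdvance(element -> {
                    if (predicate.test(element)) {
                        action.accept(element);
                    } else {
                        stillTaking = false; // stop at the first non-matching element
                    }
                });
                return advanced && stillTaking;
            }
        };
        return StreamSupport.stream(output, false).onClose(source::close);
    }

    public static void main(String[] args) {
        // Works on an unbounded source because evaluation is lazy.
        List<Integer> powers = takeWhile(Stream.iterate(1, n -> n * 2), n -> n < 100)
                .collect(Collectors.toList());
        System.out.println(powers); // prints [1, 2, 4, 8, 16, 32, 64]
    }
}
```

Unlike `filter()`, this stops at the first element that fails the test, so later elements that would match are never seen.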
 
Winston Gutkowski
Stephan van Hulst wrote:Just in case you were serious, first() would not have been correct, because first() implies an order on the stream.

No it doesn't. It means exactly what it says: "Give me the first n elements". The fact that it may be used with an ordering (or after one has been applied) is irrelevant.

And even if the stream is ordered, are they still the first elements after you perform a skip()?

Sure they are. Like most other functions, it simply applies to whatever elements are passed to it.

I'm also *really* peeved that they don't have a takeWhile(Predicate<T>) method, so much so that I wrote my own Stream extension class.

Me too, although it probably wouldn't have occurred to me for a while. And thanks for the code, very informative. Have a cow.

It does, however, suggest to me a fragility of things like Streams - and possibly of the functional paradigm in general:
There is a much bigger onus on the designer to "get it right".

In OO, we're used to there being "gaps" in the functionality of existing classes - indeed the whole ethos of OO is to build on existing code to "do your own thing", so we're taught very quickly how to extend and wrap and decorate.

This doesn't seem to be anywhere near as easy with Streams though, and without your code, it wouldn't have occurred to me to even try to extend something that to me is basically a "black box".

What did occur to me though, was that if I needed to do something like your logic above, I could write:

which suggests to me that
(a) A more generic solution is probably available on the same basis.
(b) I think Oracle were stupid not to have BaseStream extend Iterable, because it's the only reason the cast is required.
It must surely have occurred to someone that it IS an Iterable, so why make us jump through hoops to use it as one?
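
The snippet referred to above is missing from this copy of the thread. The cast mentioned in (b) is presumably the well-known idiom of casting the method reference `stream::iterator` to `Iterable`, so that a stream can be walked with an enhanced for-loop. A sketch under that assumption (the class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class ForEachOverStreamSketch {

    // Collects the leading elements that are below `bound`, stopping at the first that is not.
    static List<Integer> leadingBelow(Stream<Integer> numbers, int bound) {
        List<Integer> taken = new ArrayList<>();
        // The cast gives the method reference a target type the for-loop accepts.
        for (Integer n : (Iterable<Integer>) numbers::iterator) {
            if (n >= bound) {
                break;
            }
            taken.add(n);
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(leadingBelow(Stream.of(1, 2, 3, 10, 4), 5)); // prints [1, 2, 3]
    }
}
```

The `Iterable` produced this way can only be looped over once, because it hands out the stream's single iterator.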

Winston
 
Stephan van Hulst
Winston Gutkowski wrote:Me too, although it probably wouldn't have occurred to me for a while. And thanks for the code, very informative. Have a cow.

Thanks!

It does, however, suggest to me a fragility of things like Streams - and possibly of the functional paradigm in general:
There is a much bigger onus on the designer to "get it right".

I would limit this observation to just Java Streams. Their implementation is incredibly complex, because the designers tried to build in parallelization wherever possible. There are a lot of edge cases that the designer needs to think of. This is not the case in a proper functional language, because everything is immutable.

In OO, we're used to there being "gaps" in the functionality of existing classes - indeed the whole ethos of OO is to build on existing code to "do your own thing", so we're taught very quickly how to extend and wrap and decorate.

And in functional languages, you use function composition. There's no difference in responsibility of the designer or the client.

This doesn't seem to be anywhere near as easy with Streams though, and without your code, it wouldn't have occurred to me to even try to extend something that to me is basically a "black box".

Yes, because OO is built into the language, and lazy evaluation isn't. You'd be amazed at how easily you can do certain things in a functional language that natively supports lazy evaluation and function pattern matching.

Check out this incredibly arcane piece of code I wrote a year ago: http://www.coderanch.com/t/634917/java/java/Recursion-Lambda-Java#2911360

That piece of code attempts to define the Y combinator. The Y combinator takes a function, and returns its fixed point. For instance, if we define g(f) as f' (the derivative of f), then a fixed point of g, Y(g), is e^x, because g(e^x) = e^x.

One of the cool things you can do with the Y combinator is that you can define recursive functions without using the function name in its definition. Its use is questionable, but it's a great example of the type of code I would have to write in a language like Java to be able to do something like this. In Haskell, I can define the Y combinator like this, all because lazy evaluation is built into the language:

y g = g (y g)

What did occur to me though, was that if I needed to do something like your logic above, I could write something like:

which suggests to me that
(a) A more generic solution is probably available on the same basis.
(b) I think Oracle were stupid not to have BaseStream extend Iterable, because it's the only reason the cast is required.
It must surely have occurred to someone that it IS an Iterable, so why make us jump through hoops to use it as one?

There is a very very good reason they didn't have BaseStream implement Iterable. Streams are not iterable.

If something is iterable, that means you can iterate over it at any given moment. Streams can only be iterated over once.
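
A small sketch of the difference (the class and method names are illustrative): a second terminal operation on the same `Stream` throws `IllegalStateException`, whereas an `Iterable` hands out a fresh `Iterator` every time it is asked.

```java
import java.util.Arrays;
import java.util.stream.Stream;

final class SingleUseSketch {

    // Returns true if traversing the same Stream a second time is rejected.
    static boolean secondTraversalFails() {
        Stream<String> letters = Stream.of("a", "b", "c");
        letters.forEach(s -> { });      // first traversal consumes the stream
        try {
            letters.forEach(s -> { });  // second traversal of the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Walks the same Iterable twice and returns the total number of elements seen.
    static int countTwice(Iterable<String> items) {
        int seen = 0;
        for (String ignored : items) {
            seen++;
        }
        for (String ignored : items) {
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(secondTraversalFails());                    // prints true
        System.out.println(countTwice(Arrays.asList("a", "b", "c")));  // prints 6
    }
}
```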
 
Stephan van Hulst
I can not overstate the last bit. It's actually caused me some headache with more recent APIs I've tried to design. Iterables and Streams are two very distinct and incompatible concepts. Streams are more like Iterators than Iterables, except that you tell them about what operations to use ahead of time, instead of during the iteration.

That means I'm fighting a lot with myself over whether my methods should return Stream<Whatever> or Collection<Whatever>, because they both have their advantages and disadvantages, and one makes more sense than another in a given setting.
 
Winston Gutkowski
Stephan van Hulst wrote:I can not overstate the last bit. It's actually caused me some headache with more recent APIs I've tried to design. Iterables and Streams are two very distinct and incompatible concepts. Streams are more like Iterators than Iterables, except that you tell them about what operations to use ahead of time, instead of during the iteration.

OK, but I don't think my code violates that principle. And it could certainly be rewritten to use an Iterator instead, which suggests to me that the pattern could be "generified" to implement a Streams (or perhaps Collectors?) takeWhile() method without having to extend (or wrap) Stream.

But perhaps I'm not seeing as far as you on this. I like Streams and pipelines because they remind me of script programming, where you combine small additive filters and processes on lines of data to do really cool things; but I'm still a beginner when it comes to applying them to objects.

Fun stuff though ... and I'll get there eventually.

Winston
 
Stephan van Hulst
Winston Gutkowski wrote:OK, but I don't think my code violates that principle. And it could certainly be rewritten to use an Iterator instead, which suggests to me that the pattern could be "generified" to implement a Streams (or perhaps Collectors?) takeWhile() method without having to extend (or wrap) Stream.

I agree that the enhanced for-loop could have been used for streams (and iterators!), but only as syntactic sugar for Stream.forEach(), which is a terminal operation. The enhanced for-loop is procedural, and its use implies eager evaluation. It should not be used to add intermediary operations to a stream pipeline, because those operations evaluate lazily.

I like Streams and pipelines because they remind me of script programming, where you combine small additive filters and processes on lines of data to do really cool things; but I'm still a beginner when it comes to applying them to objects.

I don't think objects are all that different from just plain old records, except that they bundle their behavior along with the data.

If you have some spare time, I *really* recommend checking out Haskell. It really changed the way I look at programming. The main problem I have with functional programming is that its communities have a background in mathematics, and have a habit of using very short and cryptic identifier names. If you can get past that, Haskell is really a lot of fun to work with. Here's a really nicely written tutorial: http://learnyouahaskell.com/

In most languages I learn, I write a Sudoku solver to test my progress. I had the most fun writing it in Haskell.
 
Winston Gutkowski
Winston Gutkowski wrote:OK, but I don't think my code violates that principle...

Which begs another question: Is there a way of "cloning" a Stream? Because it seems to me that the only reason that a Stream is NOT an Iterable is because the act of iteration changes its state, rendering it useless for further traversal.

If, on the other hand it was possible to say "give me an Iterator for the whole of this Stream" - as you can with a script that includes the tee command in Unix - you could perform all sorts of processing without affecting the Stream object itself. Or is that what Spliterators are for?

Winston
 
Stephan van Hulst
Winston Gutkowski wrote:Which begs another question: Is there a way of "cloning" a Stream? Because it seems to me that the only reason that a Stream is NOT an Iterable is because the act of iteration changes its state, rendering it useless for further traversal.

If you really want a clone in order to process the data in two different ways, the best solution is to just collect the current stream of elements, and then make two new streams:
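
The snippet that followed here is missing from this copy of the thread. A minimal sketch of the idea, with invented names: collect the elements into a list once, then open as many independent streams over that list as needed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReuseSketch {

    // Processes the same elements in two different ways by collecting them first.
    static String summarize(Stream<String> words) {
        List<String> snapshot = words.collect(Collectors.toList());

        // First pass over the collected elements.
        long count = snapshot.stream().count();

        // Second, independent pass over the same elements.
        String joined = snapshot.stream()
                .map(String::toUpperCase)
                .collect(Collectors.joining(","));

        return count + ":" + joined;
    }

    public static void main(String[] args) {
        System.out.println(summarize(Stream.of("a", "b"))); // prints 2:A,B
    }
}
```

The cost is that the whole stream is held in memory, so this does not work for unbounded sources.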

I imagine it would be possible, and convenient, to have a Stream be Cloneable. I'm just not sure about the technical difficulties of implementing this.

If, on the other hand it was possible to say "give me an Iterator for the whole of this Stream" - as you can with a script that includes the tee command in Unix - you could perform all sorts of processing without affecting the Stream object itself. Or is that what Spliterators are for?

After looking up the tee command, I think the closest I can find to it is the Stream.peek() method. Let's say you want to continue processing a stream, but also store intermediate elements into a file, you could do it like this:
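
The snippet that followed here is missing from this copy of the thread. A sketch of the idea using `Stream.peek()`, with invented names and a `StringWriter` standing in for the file in the demo; note that the `Stream` documentation describes `peek()` as existing mainly to support debugging.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class PeekSketch {

    // Upper-cases each line, copying the untouched line to `log` as it passes by.
    static List<String> process(Stream<String> lines, PrintWriter log) {
        return lines
                .peek(log::println)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter(); // stand-in for a file
        PrintWriter log = new PrintWriter(sink);
        List<String> result = process(Stream.of("alpha", "beta"), log);
        log.flush();
        System.out.println(result); // prints [ALPHA, BETA]
        System.out.print(sink);     // the original lines, one per line
    }
}
```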

Spliterators solve a completely different problem. They are the primary means of implementing parallelism in streams. A Spliterator is like an Iterator, except instead of moving on to the next element, it can break off a whole chunk of elements that are then processed using a second Spliterator. That way, you end up with a tree of spliterators iterating over the elements in the stream in parallel.
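
To make the "break off a chunk" idea concrete, here is a small sketch (names invented for illustration) that calls `trySplit()` once and drains the two resulting spliterators separately. In a parallel stream the two parts would be handed to different threads; here they are simply processed one after the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

final class SplitSketch {

    // Splits the list's spliterator once and drains both parts into separate lists.
    static <T> List<List<T>> splitOnce(List<T> items) {
        Spliterator<T> rest = items.spliterator();
        Spliterator<T> chunk = rest.trySplit(); // null if the source cannot be split

        List<T> first = new ArrayList<>();
        List<T> second = new ArrayList<>();
        if (chunk != null) {
            chunk.forEachRemaining(first::add);
        }
        rest.forEachRemaining(second::add);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(splitOnce(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8)));
    }
}
```

For an ordered source, the chunk that is split off covers a prefix of the elements, so concatenating the two parts gives back the original order.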
 
Stephan van Hulst
Another remark, I believe your initial doubts about Streams being "black boxes" to which it's difficult to add new operations are a little bit unfounded, because you could say the same about extending Map, Set or List. There is rarely any cause to add new operations, because we can do almost anything we want with them using the elementary operations already present in those interfaces. If I wanted to use a Map to implement a telephone directory, it would be unreasonable for me to expect or even extend Map with a getPhoneNumber() method.

My annoyance about the lack of a takeWhile() method didn't have to do with the fact that it's hard to add this to the Stream interface, but that it's such an elementary operation in functional programming, that it should have been part of it to begin with. It's like a Map interface without an entrySet() method.

There's actually a collection of other operations instead of takeWhile that triggered me to write the ExtendedStream class:
 
Winston Gutkowski
Stephan van Hulst wrote:If you have some spare time, I *really* recommend checking out Haskell. It really changed the way I look at programming. The main problem I have with functional programming is that its communities have a background in mathematics, and have a habit of using very short and cryptic identifier names. If you can get past that, Haskell is really a lot of fun to work with. Here's a really nicely written tutorial: http://learnyouahaskell.com/

Thanks Stephan. I really appreciate your patience in the face of my "scepticism", some of which, I'm sure, comes from lack of familiarity.

I do worry a bit though that we're being "guided" to do things functionally, when it may not necessarily be the best, or "simplest" (ie, most readable) - or indeed fastest - way of doing things.
IMO, Java's great strength is that it allows us many ways to "do stuff" - procedural, objective, and (now) "functional" - and the best practitioners are going to continue to be those who can pick and choose the best elements of the language to do what they want.

And your "advocacy" of v8 is exactly what we need - even if I don't always agree.

Winston
 
Winston Gutkowski
Stephan van Hulst wrote:Another remark, I believe your initial doubts about Streams being "black boxes" to which it's difficult to add new operations are a little bit unfounded, because you could say the same about extending Map, Set or List.

Hmmm. Not sure about that, because there's no AbstractStream class to help you to implement your own. I'm also aware that there are many different types of "stream", including open-ended series like Fibonacci, so I'd like to see a few variants of "skeleton implementation" for duffers like me.

Winston
 
Stephan van Hulst
Winston Gutkowski wrote:I do worry a bit though that we're being "guided" to do things functionally, when it may not necessarily be the best, or "simplest" (ie, most readable) - or indeed fastest - way of doing things.
IMO, Java's great strength is that it allows us many ways to "do stuff" - procedural, objective, and (now) "functional" - and the best practitioners are going to continue to be those who can pick and choose the best elements of the language to do what they want.

I heartily agree. As a matter of fact, I had to learn this the hard way, as I was trying to fit everything I did in Java into a functional paradigm when v8 came out. Most things we do in Java really are clearer when using good old imperative programming.

Functional code is great when transforming data. I tend to build my application using lots of immutable types, and transforming models consisting of these gives me plenty of opportunity to write nice, clear, declarative pieces of functional code. However, just like forcing everything to be immutable can turn your API into a draconian mess that's not very fun to work with, so does using functional operations everywhere.

You will find that if you check out Haskell, there will be situations where it would be so much easier and clearer if you could just reassign a variable. Java gives us the opportunity to use the best of both worlds.

I'm becoming a fan of C#, which does this whole functional programming thing much better than Java does. C#'s equivalent of Iterable<T> is IEnumerable<T>, and it has a library that extends IEnumerable<T> with higher order functions. That would be as if you could call map() on an Iterable, and get a new Iterable. The nice part is that after adding an operation to the pipeline, you can reuse the same IEnumerable with a different string of operations. And it's all (or can be) lazily evaluated.
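
To illustrate what "calling map() on an Iterable and getting a new Iterable" could look like in Java, here is a rough sketch. It is not part of any standard library; `LazyIterables.map` is a hypothetical helper invented for this example. The mapping function is applied lazily, only as elements are pulled, and the result can be iterated more than once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

final class LazyIterables {

    // Returns a view of `source` whose elements are transformed on demand.
    static <T, R> Iterable<R> map(Iterable<T> source, Function<? super T, ? extends R> f) {
        // Each call to iterator() starts a fresh pass over the source.
        return () -> {
            Iterator<T> it = source.iterator();
            return new Iterator<R>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public R next() {
                    return f.apply(it.next());
                }
            };
        };
    }

    public static void main(String[] args) {
        Iterable<Integer> doubled = map(Arrays.asList(1, 2, 3), x -> x * 2);
        for (int pass = 0; pass < 2; pass++) {
            List<Integer> seen = new ArrayList<>();
            for (Integer n : doubled) {
                seen.add(n);
            }
            System.out.println(seen); // prints [2, 4, 6] on both passes
        }
    }
}
```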
 
Stephan van Hulst
Winston Gutkowski wrote:
Stephan van Hulst wrote:Another remark, I believe your initial doubts about Streams being "black boxes" to which it's difficult to add new operations are a little bit unfounded, because you could say the same about extending Map, Set or List.

Hmmm. Not sure about that, because there's no AbstractStream class to help you to implement your own. I'm also aware that there are many different types of "stream", including open-ended series like Fibonacci, so I'd like to see a few variants of "skeleton implementation" for duffers like me.

Sorry, I think I didn't phrase that correctly. Yes, it's much more difficult to implement a Stream, but I also think there should rarely if ever be a cause to do so.
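
As an illustration of that point, an open-ended series such as the Fibonacci numbers mentioned above can be produced without implementing Stream at all. A minimal sketch using `Stream.iterate` (the class and method names are invented for this example):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FibonacciSketch {

    // An unbounded, lazily evaluated stream of Fibonacci numbers starting at 0.
    static Stream<Long> fibonacci() {
        // Each state is a pair {current, next}; only the first component is exposed.
        return Stream.iterate(new long[] {0, 1}, p -> new long[] {p[1], p[0] + p[1]})
                .map(p -> p[0]);
    }

    public static void main(String[] args) {
        List<Long> firstTen = fibonacci()
                .limit(10)
                .collect(Collectors.toList());
        System.out.println(firstTen); // prints [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    }
}
```

Because the stream is unbounded, some short-circuiting operation such as `limit()` is needed before collecting.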

A useful skeletal implementation would be difficult to realize, because as opposed to Map, List and Set, Stream doesn't have a clear set of responsibilities. Maps look up values through keys. Sets contain distinct elements. Lists impose an order on their elements. Streams... do stuff.

This is a cool challenge though, and I think in the coming days I will see how I would realize such a skeletal API. Anyway, have some cows for an excellent discussion that's distracted me from all my chores today.
 
Winston Gutkowski
Stephan van Hulst wrote:Functional code is great when transforming data...

Yes, that's definitely the impression I get; and it IS a big part of our work. I'm also a big fan of immutable types; though probably more for their "shareability" than for any consideration of streams.

I'm becoming a fan of C#, which does this whole functional programming thing much better than Java does.

Well, glad to hear it's good for something.
I'm afraid I have a built-in suspicion of anything Microsoft, but I have to admit there are things about C# I like too - not least its inclusion of 'struct' from C.
What I don't like is its "hybrid" notion of final - IMO, the best (and most underrated) keyword in the Java language.

But that's for a different thread...

Winston
 
Stephan van Hulst
Winston Gutkowski wrote:not least its inclusion of 'struct' from C.

Ewww. I think structs are a step back. They are great for low level languages, but keep them out of my abstractions :P

What I don't like is its "hybrid" notion of final - IMO, the best (and most underrated) keyword in the Java language.

I don't think I follow. C# has readonly that prevents variables from being reassigned and sealed that prevents classes from being subclassed.

But that's for a different thread...

I agree. If you want to respond, let's split the thread.
 
Winston Gutkowski
Winston Gutkowski wrote:
Stephan van Hulst wrote:Functional code is great when transforming data...

Yes, that's definitely the impression I get; and it IS a big part of our work...

I suspect it could also be a huge "bridge" for database code.

Imagine, instead of having to deal with that monstrosity called ResultSet, we can return the results of SQL queries to Streams of POJOs that WE define. I wouldn't be at all surprised to see that in v9 or v10 - especially given our new masters.

Winston
 
Stephan van Hulst
Aye. C# actually built this into the language by adding select, from, where etc. keywords. These work on the aforementioned IEnumerables.

I don't like to use them on account of me being allergic to anything that looks like SQL, but YMMV.
 
Dave Tolls
Ranch Hand
Posts: 2207
20
Winston Gutkowski wrote:Imagine, instead of having to deal with that monstrosity called ResultSet, we can return the results of SQL queries to Streams of POJOs that WE define. I wouldn't be at all surprised to see that in v9 or v10 - especially given our new masters.

Winston


How long before we can stop calling them "new"?
;)
 
Winston Gutkowski
Stephan van Hulst wrote:Ewww. I think structs are a step back. They are great for low level languages, but keep them out of my abstractions :P

What, you mean like 2.print()?

Winston
 
Winston Gutkowski
Dave Tolls wrote:How long before we can stop calling them "new"?

Dunno, but I still think of them as usurpers.

A couple more releases like #8 though, and I may change my mind.

Winston
 