
A kinder, gentler API for reading lines

 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Michael Matola recently made an interesting post in the Cattle Drive forum, here. I want to follow up on it more, but the part I'm interested in isn't really appropriate for an extended discussion in Cattle Drive. So I'm making a copy of the discussion here (with Cattle-Drive-specific sections deleted). The following was posted by Michael Matola originally:

On the debate of which of the following:
while( ( line = myFile.readLine() ) != null )
boolean done = false ;
while ( ! done ) ... if <something> { done = true } else ....
I tend to fall in the camp of preferring the first. But of course those of us doing the Cattle Drive assignments have to comply with the style guide.
Aside from the Cattle Drive assignments, however, this entire debate suggests to me that the public methods of BufferedReader (or whatever object you're calling readLine() on ) aren't as useful as they could be. As a client of some class, I think having to check for != null on something its method returns is so loseresque. I mean, really, everyone knows we want to cycle through the lines of the file and treat each line as a String. So why wasn't BufferedReader (or suchlike) written like that to begin with? Hey, it's starting to sound like instead of a loser method like readLine, I'd prefer working with an iterator or something iteratoresque.
A sample wrapper class that returns an iterator follows:

Would allow clients to write tidy code like this:
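Neither listing is reproduced in this copy of the thread, so what follows is only a minimal sketch of the idea and not Michael's original code. The class names LineIteratorSketch and LineIteratorDemo are hypothetical. It is written with generics, which postdate this discussion, so no downcast is needed. It reads one line ahead so that hasNext() can be called repeatedly, throws NoSuchElementException at the end, and rethrows an IOException as an unchecked exception instead of swallowing it (a trade-off Jim argues against later in the thread).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical name; a sketch, not the class from the original post.
class LineIteratorSketch implements Iterator<String> {
    private final BufferedReader reader;
    private String lookahead;   // line read ahead by hasNext(), or null at end of input
    private boolean primed;     // true while lookahead holds the answer for the next call

    LineIteratorSketch(Reader source) {
        this.reader = new BufferedReader(source);
    }

    // Safe to call repeatedly: reads at most one line ahead and caches it.
    @Override
    public boolean hasNext() {
        if (!primed) {
            try {
                lookahead = reader.readLine();
            } catch (IOException e) {
                // Assumption: surface the failure rather than ignore it.
                throw new UncheckedIOException(e);
            }
            primed = true;
        }
        return lookahead != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException("no more lines");
        }
        primed = false;         // force the next hasNext() to read a fresh line
        return lookahead;
    }
}

class LineIteratorDemo {
    public static void main(String[] args) {
        Iterator<String> lines = new LineIteratorSketch(new StringReader("alpha\nbeta\n"));
        while (lines.hasNext()) {   // no null check in client code
            System.out.println(lines.next());
        }
    }
}
```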

A sample wrapper class that provides iteratoresque methods follows. Note that in this case we get to deal with Strings as Strings with no pesky downcasting.

Clients could write:
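Again, the original listings are not shown here. As a rough sketch of the "iteratoresque" variant, assuming a hypothetical class LineCursorSketch with methods hasNext() and nextLine(): the methods return String directly and declare IOException, so the class deliberately does not implement java.util.Iterator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.NoSuchElementException;

// Hypothetical name; a sketch of a String-returning, Iterator-like reader.
class LineCursorSketch {
    private final BufferedReader reader;
    private String pending;   // the line fetched by the last lookahead
    private boolean loaded;   // true while pending is still unconsumed

    LineCursorSketch(Reader source) {
        this.reader = new BufferedReader(source);
    }

    // Reports whether another line exists; repeated calls do not skip lines.
    boolean hasNext() throws IOException {
        if (!loaded) {
            pending = reader.readLine();
            loaded = true;
        }
        return pending != null;
    }

    // Returns the next line as a String; the checked IOException stays visible.
    String nextLine() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("no more lines");
        }
        loaded = false;
        return pending;
    }
}

class LineCursorDemo {
    public static void main(String[] args) throws IOException {
        LineCursorSketch lines = new LineCursorSketch(new StringReader("x\ny\n"));
        while (lines.hasNext()) {
            String line = lines.nextLine();   // already a String
            System.out.println(line);
        }
    }
}
```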

Note that in neither case does client code have to do any pesky checking for not null.
Discuss!

Michael Matola followed with another post:

Just realized that an Iterator's next method is supposed to throw NoSuchElementException if there's no next (instead of returning null like mine does). Anyhow, you get the picture.

And I (Jim Yingst) then posted the following:

I do really like Michael's idea about using an iterator of some sort - this would be nicer for everyone I think. Too bad they didn't put this in the language to begin with. But Michael's code is a step in the right direction. In its current form it is utterly evil of course, due to the catch block which ignores an IOException. But this can be easily exorcised. I'd probably decouple the new class from reading a file - make it a more generic FilterReader. In fact it could extend BufferedReader. Give it a method lineIterator() which returns a LineIterator - which might as well be an actual Iterator with one additional method, nextLine(), which returns a String rather than Object to avoid a needless cast. Usage would be like

Or, in more Coop-friendly style:
The LineReader class could also include the functionality of the seldom-used LineNumberReader class, just cause it's so easy.
It would be really nice if Java had a nice foreach keyword that would allow something like this:
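The syntax Jim had in mind is not shown here. For comparison only, later Java releases can express this kind of loop: the enhanced for statement arrived in Java 5, and BufferedReader.lines() in Java 8. A minimal sketch under those assumptions, with the hypothetical class name ForEachLinesSketch:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical name; shows a for-each loop over the lines of a reader.
class ForEachLinesSketch {
    static List<String> upperCased(BufferedReader reader) {
        List<String> out = new ArrayList<>();
        // lines() yields a Stream<String>; the method reference adapts it to a
        // one-shot Iterable so the enhanced for loop can drive it.
        Iterable<String> lines = reader.lines()::iterator;
        for (String line : lines) {   // no sentinel check in the client loop
            out.add(line.toUpperCase());
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("one\ntwo\n"));
        System.out.println(upperCased(reader));
    }
}
```

Note that lines() reports read failures as UncheckedIOException, so the checked IOException disappears from the loop.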

But I guess I'm dreaming now. I might as well switch to Perl. Or for that matter, C#. :roll:

[ January 26, 2003: Message edited by: Jim Yingst ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
A few more thoughts...
I originally thought that it would be good to follow the actual java.util.Iterator interface for greater interoperability with other Java classes. But then I realized that the next() method (or nextLine() or nextString() or whatever) should really be declared as throwing an IOException in case anything goes wrong. Which means we can't really use a java.util.Iterator here. (Unless we bend the API by converting an IOException into a RuntimeException - but that would be Wrong.) Note also that my idea that it should be an Iterator is also why I showed a usage example implying that LineReader.Iterator was a separate (static member) class from LineReader, since I was following the usual collections pattern of using a separate object for the Iterator. But since it can't be a java.util.Iterator, there seems little point in making a separate object or class for it. So I prefer the LineOfFileAsStringReader2 version of Michael's code. But with shorter names for everything.
I also retract my suggestion that the new class should actually extend FilterReader. I see no real need to do so - there's no use in wrapping another high-level Reader around the LineReader. Once you're invoking the hasNext() and next() methods, you can't really call other Reader methods like read() or read(char[]) - the LineReader had to do lookahead already, so you can't really be sure what position the next read() is reading from. So LineReader really only makes sense as the highest-level class in any set of chained streams. In this case it doesn't really need to be a Reader itself; it just needs to accept a lower-level Reader as input.
Also, the whole "while ((x = method()) != TERMINAL_VAL)" paradigm is already endemic to all the stream classes. People using streams expect to do that sort of thing. If the whole purpose of the new class is to provide a different API, it might as well be a different class.
[ January 26, 2003: Message edited by: Jim Yingst ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
My desire for "greater interoperability with other Java classes" can probably be best addressed with a simple static util method to return a List - which can then be plugged into any number of other useful classes in the collections framework. These methods/classes generally expect a Collection rather than an Iterator anyway, after all.

Similar useful utils could simply read the whole Reader into a single String or char[] array. These all have the down side of putting everything in memory simultaneously, which is of course a Bad Idea for really big files. But for many applications such utils could be pretty convenient.
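The utility listing itself is not shown here. A minimal sketch of what such static helpers might look like, assuming a hypothetical holder class ReaderUtilSketch; the name readLines comes from the thread, while readAll is an invented name for the whole-Reader-to-String variant. Both keep everything in memory at once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class for the utility methods discussed above.
class ReaderUtilSketch {
    // Reads every remaining line into a List; unsuitable for very large inputs.
    static List<String> readLines(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        List<String> lines = new ArrayList<>();
        String line;
        // The sentinel idiom lives here, hidden from the caller.
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Reads the whole Reader into one String, keeping line terminators as-is.
    static String readAll(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = source.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        for (String line : readLines(new StringReader("a\nb\n"))) {
            System.out.println(line);
        }
    }
}
```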
 
Barry Gaunt
Ranch Hand
Posts: 7729
A leetle nitpick to Mike's original code, if I may kick the next down the line.
You don't need to test a boolean variable against true/false, as in if ( hasNext == true ); you can just write if ( hasNext ).
Wow, that feels GOOD!
I'll read about the real issues as soon as I can.
-Barry
 
Barry Gaunt
Ranch Hand
Posts: 7729
My first thoughts on this topic lead me to suggest that an approach to this would be to derive a class from AbstractSequentialList, backing it up with a text file. It would be nice to handle reasonably finite sized files in this way.
Sort of...could be rubbish of course.
-Barry
BTW Python has some nice gadgets for this. file.readlines() and xreadlines.
[ January 27, 2003: Message edited by: Barry Gaunt ]
 
Maulin Vasavada
Ranch Hand
Posts: 1873
hi,
i am sure that everybody knows this about this code, and since it is not a critical point as far as the concept is concerned, it's okay if we ignore it.
the issue is that in this code hasNext() is not an atomic operation. each time you call hasNext() a new line is read, while the original API doesn't work this way: no matter how many times we call hasNext(), it doesn't advance the pointer to the next line.
but this is an implementation issue. we can implement it whatever way we want (to match the API) as long as we find a better approach, as is being discussed here...
my 2 cents.
regards
maulin.
 
Barry Gaunt
Ranch Hand
Posts: 7729
That's a very good nitpick, Maulin. hasNext() should only detect the state, not change it; that's next()'s thing.
 
Anonymous
Ranch Hand
Posts: 18944
Jim: thank you for sharing your grand tour of thoughts with us and thanks for being a good sport. Your tour started off with quite some 'grandeur' so to speak; somewhere in the middle you realized that these darn IOExceptions ruined your idea; you even tried to come up with a language 'change of idiom' by suggesting a 'foreach' construct, which you retracted ever so soon after. Finally you came back at the Good Old (tm) idiom:
while ((result= method()) != sentinel) ...
wrapped up in a simple utility class. And that's exactly why the line above is called an 'idiom'; these little rascals can't be removed from everyday 'speak' of a language that easily, otherwise it wouldn't be an idiom and could be replaced by a 'better' (mind the quotes) idiom. Sometimes I feel that every language has all these idioms already in them, just to be discovered as a 'best practice' way of doing things by the speakers of those languages.
Idioms are as important to a programming language as patterns, they're just a bit smaller and a bit more modest (they don't even have a name).
kind regards
 
Michael Matola
whippersnapper
Ranch Hand
Posts: 1826
Wanted to respond to some of what Jos Horsmeier wrote in the thread that started this conversation:
No, seriously though, the ancestors of Java, CPL, BCPL, C, C++ as well as Java itself carry an enormous load of functions (methods) around that return values from a domain including a 'sentinel' value, indicating some sort of failure. In C 'open()' returned -1 on failure, malloc() returned NULL on failure, read() returned -1 on EOF or failure, strtok() returned NULL when it couldn't find something etc. etc.
readLine() returns null on EOF or failure. All languages mentioned above mapped the Boolean notion of 'truthness' or 'falsehood' to scalar values, whether they be '\0', 0, 0.0, NULL or whatever. Java went the other way and explicitly defined a 'boolean' domain but it still carries around those 'value plus sentinel value' return value, should I say 'idiom'?
All the ancestor languages took care of this 'sentinel' stuff as follows
while ((return_value= some_func()) != sentinel) // do something with return_value
The code snippet above represents a decades old idiom, used by millions of old sods (like me).

I am in no way doubting that it's a decades old idiom or that every Java programmer should know it. (My IT experience is still being counted in years, not decades, by the way.) I'm not arguing against any of that. I don't have the experience or knowledge to argue against any of that.
Given the choice of while ( ! done )... versus the sentinel check, my preference tends to be with the sentinel, as I mentioned before. But given the question "how would you like to interact with the contents of a file?" my vote would be something sketchy along the lines of "um, neither of those, really. how about more like I interact with a list/collection/iterator?" which is what I was trying to sketch out in my code.
Just because Java has a certain ancestry doesn't mean it can't go off and do other things.
Jos Horsmeier writes:
Finally you [Jim] came back at the Good Old (tm) idiom:
while ((result= method()) != sentinel) ...
wrapped up in a simple utility class. And that's exactly why the line above is called an 'idiom'; these little rascals can't be removed from everyday 'speak' of a language that easily, otherwise it wouldn't be an idiom and could be replaced by a 'better' (mind the quotes) idiom.

I think you're giving too much importance to the fact that Jim implemented his alternate interface with the idiom under discussion. It's just an implementation detail of the new readLines method. If clients start using his readLines method instead and start interacting with files as lists, collections, or iterators instead, then his code *has* effectively replaced the classic idiom.
I don't think anyone's retracted a proposed foreach construct in Java.
How about a comparison to a language *not* from the tradition you mention (at least I don't think it is) -- Ruby.
The following Ruby code
(Line 1) Creates a new file object given the filename and opens the file for reading.
(Line 2) Takes *each line* of the file as a string and sends it to the following block of code, which prints the string to the console.
(Line 3) Closes the file.
aFile = File.new ( "names.txt" )
aFile.each { | line | puts line }
aFile.close
Even more concisely, the following Ruby code does the exact same thing as above. (Only by using the static method foreach on the IO class rather than the instance method each on File.)
IO.foreach( "names.txt" ) { | line | puts line }
"puts" -- which is the method that prints to the console -- can of course be replaced by whatever else you'd want to do with your line from the file.
Not a single sentinel check or while ( ! done ) in this code. Decades aside, isn't this a better way to interact with the contents of a file?
Does the Ruby code deep under the hood have a sentinel check? I have no idea. But it *does* give the kind of interface I'm looking for. Does Ruby have lower-level methods such as BufferedReader.readLine for interacting with a file in the classical way? I don't know -- I don't really know the language. (Might be fun to find out these things.)
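For comparison, the block-passing style of the Ruby snippets can be approximated in Java 8 and later by handing a lambda to a helper. This is only a sketch under that assumption: the class EachLineSketch and the method eachLine are hypothetical names, and unlike IO.foreach the helper takes a Reader rather than a file name. The sentinel check still exists, but only inside the helper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical helper in the spirit of Ruby's IO.foreach.
class EachLineSketch {
    // Passes each line to the given action, then closes the reader.
    static void eachLine(Reader source, Consumer<String> action) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                action.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The lambda plays the role of the Ruby block { | line | puts line }.
        eachLine(new StringReader("Ann\nBob\n"), line -> System.out.println(line));
    }
}
```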
 
Michael Matola
whippersnapper
Ranch Hand
Posts: 1826
Maulin and Barry have very validly pointed out what's wrong with the hasNext method in my code.
Interestingly, I think the easiest way to make hasNext atomic (detect the state but not change it) is to load the whole file into some data structure (array, list, etc.); all at once would probably be easier than incrementally. Which pretty much gets us to Jim's revised readLines.
And as far as Barry's nitpick about if ( hasNext ) instead of if ( hasNext == true ), all I can say is that Barry knows I knows this because he's seen code of mine in which I *do* get it right.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I think you're giving too much importance to the fact that Jim implemented his alternate interface with the idiom under discussion. It's just an implementation detail of the new readLines method.
Also, it's a reflection of the fact that I'm reading from a Reader. I think most of us are in agreement that the Reader and InputStream APIs encourage
while ((var = obj.read()) != sentinel)
as the natural idiom for working with these classes. (Well, there are a few whackos who think it should be done with while (!done), but let's ignore them for now.) I have no problem with using this sort of loop when working with the existing Stream classes. Michael and I are just exploring alternative APIs that needn't work this way. One of the questions in my mind is "what might the I/O classes have looked like if they had been designed after the collections framework had been developed?" Also, knowing that there's a decent possibility that foreach might be coming to Java in 1.5 (see below), how might a new class be designed to better interact with it? (E.g. by looking more like a Collection.) So while it might not be particularly necessary to replace the trusty sentinel check with a new idiom, it's at least a fun exercise in design. And if 1.5 ends up with some of the features I'm hoping for, the exercise might be more widely useful.
Re: foreach. I think people have been asking for this to be added to Java since the dawn of time. It's common in Perl, which was the hot new language on the block just before Java, so a lot of folks were disappointed Java didn't have it. Sure, it's "syntactic sugar", but that's not always a bad thing. Python, Ruby, C#, and who-knows-how-many other languages have something like it too. Currently JSR 201 talks of adding it to the language in release 1.5, though I don't know how likely this is. As far as I know there's been little officially announced about what 1.5 will contain. But the emphasis seems to be on making APIs easier to use; I think that foreach would fit nicely here. The current proposed syntax is described here. Note that they use "for" rather than "foreach", which makes sense - on the off chance someone's already using "foreach" as an identifier in their program, why break it? Note that Perl considers "for" and "foreach" to be synonyms anyway; I'll probably continue to say "foreach" for now, just to distinguish it from a conventional for loop. Anyway, I hope this feature does make it into Java 1.5; it would be cool. Along with generics of course - I'm guessing that if generics finally make it into the language, foreach will come too. Crossing fingers...
[ January 29, 2003: Message edited by: Jim Yingst ]
 
Anonymous
Ranch Hand
Posts: 18944
Originally posted by Jim Yingst:
So while it might not be particularly necessary to replace the trusty sentinel check with a new idiom, it's at least a fun exercise in design.

I fully agree with that; I'm on your side and on Michael Matola's side too; I really enjoy this little discussion. Don't misunderstand me, all I was saying was that, given the paradigm of files being read from start to end, etc. the (t)rusty 'sentinel-idiom' was/is quite sufficient.
IMHO it must take a major paradigm shift to get rid of this idiom. Maybe memory mapped files (see mmap and its compadres in the low level C libs for an example) using 'write through' when these memory maps could be treated as array lists or similar would be a solution; don't know yet ... OTOH, not all streams have such a 'backup store', e.g. sockets, stdin/stdout streams etc.
I'll get back to this later, because it's fairly late now at this side of the puddle, so I'll follow my tribe and retract back into the woods for the night.
kind regards
 
gautham kasinath
Ranch Hand
Posts: 583
Originally posted by Michael Matola:
Maulin and Barry have very validly pointed out what's wrong with the hasNext method in my code.
Interestingly, I think the easiest way to make hasNext atomic -- detect the state but not change it --, is to load the whole file into some data structure (array, list, etc.) (all at once would probably be easier than incrementally). Which pretty much gets us to Jim's revised readLines
And as far as Barry's nitpick about if ( hasNext ) instead of if ( hasNext == true ), all I can say is that Barry knows I knows this because he's seen code of mine in which I *do* get it right.
-
Well, I think that would be a bad strategy to use, because the file you are reading may be too large, which can easily happen when an unsuspecting API user uses your library.
So "Pay as you use" would be, IMHO, the best way to work it out.
Lupo
 
Frank Carver
Sheriff
Posts: 6920
I've come up against this sort of issue (splitting streams into lines, for example) many times, and I'd like to add a few comments.
Personally, I find many of the standard API classes somewhat too heavyweight. Try implementing your own Map, for example, when all you need is a class with an "Object get(Object)" method. As a result of this I often find myself creating tiny interfaces with only one or two methods, and extending these when I need more complicated behaviour. I also find that IOExceptions, in particular, need to be propagated everywhere, even if you are using classes such as StringReader which never throws any!
For this particular problem I have a small collection of interfaces such as "StringSource" which has one method "nextString()" defined to return either a String or null if there are no more. It throws no checked exceptions. Implementing this I have such classes as "WholeFileStringSource" which opens a file, reads it and returns the contents as a String (or null if not found), "StringStringSource" which takes a String and returns it once, and SplitLinesFilter, which gets each String from a StringSource and splits it into lines.
Each one of these tiny classes doesn't do much, but because they are simple, they can be used in a lot of places where heavier classes would be inappropriate. eg.
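Frank's example is not reproduced here. As a rough sketch of how such tiny classes might fit together: the names StringSource, nextString(), StringStringSource, SplitLinesFilter and loadConfigs come from his post, but every method body below is an assumption, and ConfigLoaderSketch is a hypothetical holder class.

```java
import java.util.ArrayList;
import java.util.List;

// One method, no checked exceptions: the next String, or null when exhausted.
interface StringSource {
    String nextString();
}

// Returns the String it was given exactly once, then null.
class StringStringSource implements StringSource {
    private String value;

    StringStringSource(String value) {
        this.value = value;
    }

    public String nextString() {
        String result = value;
        value = null;
        return result;
    }
}

// Pulls each String from an upstream source and hands it out line by line.
class SplitLinesFilter implements StringSource {
    private final StringSource upstream;
    private String[] lines = new String[0];
    private int index = 0;

    SplitLinesFilter(StringSource upstream) {
        this.upstream = upstream;
    }

    public String nextString() {
        while (index >= lines.length) {
            String chunk = upstream.nextString();
            if (chunk == null) {
                return null;
            }
            // Assumption: trailing empty lines are dropped, as String.split does.
            lines = chunk.isEmpty() ? new String[0] : chunk.split("\r\n|\r|\n");
            index = 0;
        }
        return lines[index++];
    }
}

// Hypothetical stand-in for the loadConfigs method mentioned in the post.
class ConfigLoaderSketch {
    static List<String> loadConfigs(StringSource source) {
        List<String> configs = new ArrayList<>();
        StringSource lines = new SplitLinesFilter(source);
        String line;
        while ((line = lines.nextString()) != null) {
            configs.add(line);
        }
        return configs;
    }
}
```

A test can then call ConfigLoaderSketch.loadConfigs(new StringStringSource("a=1\nb=2\n")) without touching the file system.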

These classes are so simple that they make unit testing a snap. All I need to do to test my "loadConfigs" method is to pass it a StringStringSource instead of a WholeFileStringSource in my test method and I can make sure it works without ever messing with files.
In the case where the file might be too large to load all in one go, I do also have a FileLinesStringSource which reads lines as needed, but it is more complicated, and needs an extra method call to close the file if you want to stop reading lines before the file has all been read. Where possible, I use the smaller and simpler version.
[ January 30, 2003: Message edited by: Frank Carver ]
 
Michael Matola
whippersnapper
Ranch Hand
Posts: 1826
So, now, do I save this thread in the "Frank Carver" folder on my hard drive or the "Jim Yingst" folder? :roll:
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I was going to put it in the "Michael Matola" folder. But that may not work as well for you.
 