
Readability: Read, Process, Write File

 
Jay Rex
Ranch Foreman
Posts: 35
7
Here is what I would consider ideal readable pseudo code to read, process and write a file:



After a few iterations, this is the closest I can come to the pseudo code above:



Is there a better way to think about the design of the functions and program flow to closer approximate the pseudo code?
 
Carey Brown
Saloon Keeper
Posts: 5144
54

Jay Rex wrote:

This is actually pretty good. You need to decide on how your readFile() method should return the file data. For a .txt file a common practice is to return a List<String> with one String for each line of the file. The data that you return from processFile() is unclear without further details of the requirements. I presume it will be a Collection of class instances, most likely of classes you'll need to create. Your writeFile() method might possibly use a toString() method or methods from the class(es) you create.
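A minimal sketch of those three methods for a small .txt file, assuming List<String> is used throughout. The class name ReadProcessWrite and the upper-casing step are placeholders chosen for illustration; the real processing depends on requirements not given in the thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadProcessWrite {
    // Read every line of a (small) text file into memory, one String per line.
    static List<String> readFile(Path input) throws IOException {
        return Files.readAllLines(input);
    }

    // Placeholder processing step (an assumption): upper-case each line.
    static List<String> processFile(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            result.add(line.toUpperCase());
        }
        return result;
    }

    // Write the processed lines, one per line of the output file.
    static void writeFile(Path output, List<String> lines) throws IOException {
        Files.write(output, lines);
    }
}
```

With these in place the calling code reads close to the three-step outline: writeFile(output, processFile(readFile(input))).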
 
Campbell Ritchie
Marshal
Posts: 61741
193

Carey Brown wrote:. . . . For a .txt file a . . .  a List<String> with one String for each line of the file. . . . .

You could also use a Stream<String>, but it is less obvious how you are going to do it.
 
Jay Rex
Ranch Foreman
Posts: 35
7
My ideal method chaining solution would somehow be:



Is this somehow possible?

 
Jay Rex
Ranch Foreman
Posts: 35
7
As a side note, the implementation is entirely up to me. If I can create the method chain as shown, that would be welcome.
 
Knute Snortum
Sheriff
Posts: 5446
147
First off, let's call the class ReadProcessWriteFile and create an instance of it called rpwf. To do the chain, the methods read() and process() would need to return this. read() would need to become read(input.txt). Intermediate variables would become instance variables. The execution code would look like
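The original snippet is not reproduced here. A minimal sketch of such a chainable class, assuming the file names are passed as strings, that process() simply trims each line (a placeholder), and that I/O failures are rethrown unchecked so the chain stays free of try-catch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ReadProcessWriteFile {
    // The intermediate variable becomes an instance variable.
    private List<String> lines = new ArrayList<>();

    ReadProcessWriteFile read(String fileName) {
        try {
            lines = Files.readAllLines(Paths.get(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this; // returning this is what makes chaining possible
    }

    ReadProcessWriteFile process() {
        List<String> processed = new ArrayList<>();
        for (String line : lines) {
            processed.add(line.trim()); // placeholder processing (assumption)
        }
        lines = processed;
        return this;
    }

    void write(String fileName) {
        try {
            Files.write(Paths.get(fileName), lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an instance rpwf, the chain would then be rpwf.read("input.txt").process().write("output.txt"); where "output.txt" is an assumed name.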
But as cool as that sounds, it would be more usual to see something like this:
 
Campbell Ritchie
Marshal
Posts: 61741
193
A Stream may be unfamiliar to you, but you can do this sort of thing. You would have to wrap the whole thing in a try‑catch because you might get exceptions from line 1 or 2. I can't find a Formatter constructor taking a Path as a parameter, so I had to use the old version Formatter(File) instead.
Line 2 creates a buffered reader, and line 3 creates a Stream reading individual lines in order.
Line 4 removes any empty lines.
Line 5 passes them to a Foo(String) constructor which creates a Foo object from the contents of that line.
Lines 6, 7 and 8 take each Foo object, call a method on it, and write the result to an output text file. The %s tag calls toString() on its argument f.
There are all sorts of variations to that sort of code.
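A sketch along the lines described above. It is a reconstruction, so its line numbers will not match the ones cited in the post. The Foo class body, its shout() method, and the StreamCopy wrapper are assumptions made for illustration; only Formatter(File), the buffered reader, the filter on empty lines, and the Foo(String) constructor come from the description.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Formatter;

// Hypothetical value class built from one line of text.
class Foo {
    private final String text;

    Foo(String line) {
        this.text = line.trim();
    }

    // Hypothetical per-object operation.
    Foo shout() {
        return new Foo(text.toUpperCase());
    }

    @Override
    public String toString() {
        return text;
    }
}

class StreamCopy {
    static void run(Path in, File out) {
        // Both resources can throw on opening, hence the try-catch.
        try (Formatter output = new Formatter(out);
             BufferedReader reader = Files.newBufferedReader(in)) {
            reader.lines()                           // Stream of lines, in order
                  .filter(line -> !line.isEmpty())   // drop empty lines
                  .map(Foo::new)                     // one Foo per line
                  .forEach(f -> output.format("%s%n", f.shout()));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```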
 
Jay Rex
Ranch Foreman
Posts: 35
7

Knute Snortum wrote:But as cool as that sounds, it would be more usual to see something like this:






This would be ideal, as the method chaining solutions shown above are a bit too far from my current level.

How would I implement this?

read(input.txt) could return a List<String>.
process could accept the List<String> as a parameter.
write would need to accept the processed List<String> and the file name.

The only parameter is the file name, which follows the pseudo code, but how would you implement that?
 
Jay Rex
Ranch Foreman
Posts: 35
7

Carey Brown wrote:This is actually pretty good.



Thank you, but there is a barrier that I need to overcome here.

read can take something as a parameter. The filename would be ideal. It can return the List<String>.
process can take the returned List<String> as a parameter, and return a processed List<String>.
write would need to take the processed List<String> as a parameter and return the written file?

Without parameters, this becomes a simple method chain, but how do I specify the input and output file names?

 
Dave Tolls
Rancher
Posts: 3753
40
Method chaining, though it can be done, is not really applicable in this case.
The main reason being that the methods can't be called in any order.  You need to read the file before processing.
So I would argue that allowing such chaining would not reflect how the code has to work.
 
Campbell Ritchie
Marshal
Posts: 61741
193
The one way I can think you can chain methods is via a Stream. Even Cay Horstmann's books have Streams as an “advanced” feature, but I think they should be taught early in courses. Another advantage about the Stream is that even a large file with millions of lines doesn't risk overwhelming your memory because only one line is in memory at a time.
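A small sketch of that lazy behaviour, assuming Files.lines as the source and a PrintWriter as the sink. The class name LazyLineCopy and the upper-casing step are placeholders. Each line flows through the chain and is written out before the next is read, so the whole file is never held in a List.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class LazyLineCopy {
    static void copyUpperCased(Path input, Path output) throws IOException {
        // Files.lines reads lazily; the stream must be closed, hence try-with-resources.
        try (Stream<String> lines = Files.lines(input);
             PrintWriter out = new PrintWriter(Files.newBufferedWriter(output))) {
            lines.map(String::toUpperCase)   // placeholder processing (assumption)
                 .forEach(out::println);     // written one line at a time
        }
    }
}
```

Note that PrintWriter does not throw on write errors; checkError() would have to be called if that matters.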
 
Carey Brown
Saloon Keeper
Posts: 5144
54

Jay Rex wrote:read can take something as a parameter. The filename would be ideal. it can return the List<String>.
process can take the returned List<String> as a parameter, and return a processed List<String>.
write would need to take the processed List<String> as a parameter and return the written file?

Without parameters, this becomes a simple method chain, but how do I specify the input and output file names?


Input and output file names are the easy part. You could get them from command line args, environment variables, prompt for user input, or just plain hard code them.

The overall structure that you outlined is good, but without knowing more about the details of the requirements it is impossible to fill in the blanks accurately. I mentioned readFile() as returning a List<String>; this has been used many times and is simple, but other approaches are even more common, such as returning a List<SomeObject> where the SomeObjects have been created by parsing the text file. A good example of this would be text files that are CSV (comma-separated values) formatted, where each line represents the fields of one SomeObject. It doesn't even have to be a List; it could be any Collection, including one of your own design. There doesn't even have to be a one-to-one correspondence between lines of the file and elements of the resulting collection, e.g. with JSON formatting.
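To make the List<SomeObject> idea concrete, here is a naive sketch that turns CSV lines into objects. Person stands in for SomeObject and the name,age layout is invented for illustration. It splits on plain commas and does not handle quoted fields, so it is not a general CSV parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SomeObject: one instance per CSV line.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

class CsvParser {
    // Assumes every non-blank line has the form "name,age".
    static List<Person> parse(List<String> csvLines) {
        List<Person> people = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            people.add(new Person(fields[0].trim(), Integer.parseInt(fields[1].trim())));
        }
        return people;
    }
}
```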

Along the same lines, your processData() method most likely will not have a one-to-one correspondence with input structure and output structure. If you are lucky and the detailed requirements point in that direction then life will be simpler for you, but that is usually not the case.

Same for writeFile().

So, without the detailed requirements all you've got is an outline of the big picture steps; a good beginning but a long way off from a true design.
 
Campbell Ritchie
Marshal
Posts: 61741
193
There are all sorts of ways to specify file names. You can hard‑code them. You can pass them as arguments (even via the command line). You can get them from a file chooser. I like file choosers because they don't introduce spelling errors or non‑existent files into your app.
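As a sketch of two of those options combined, the helper below takes the names from the command line and falls back to hard-coded defaults. The class name FileNames and the default names are assumptions; a file chooser is not shown.

```java
class FileNames {
    // Hard-coded fallbacks (assumed names).
    static final String DEFAULT_INPUT = "input.txt";
    static final String DEFAULT_OUTPUT = "output.txt";

    // Returns {inputName, outputName}, preferring command-line arguments.
    static String[] resolve(String[] args) {
        String input = args.length > 0 ? args[0] : DEFAULT_INPUT;
        String output = args.length > 1 ? args[1] : DEFAULT_OUTPUT;
        return new String[] { input, output };
    }
}
```

From main(String[] args) this would be called as FileNames.resolve(args).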
 
Dave Tolls
Rancher
Posts: 3753
40

Campbell Ritchie wrote:The one way I can think you can chain methods is via a Stream. Even Cay Horstmann's books have Streams as an “advanced” feature, but I think they should be taught early in courses. Another advantage about the Stream is that even a large file with millions of lines doesn't risk overwhelming your memory because only one line is in memory at a time.



In the case of streaming, though, that is not a read-file/process-file/write-file, is it?  That would (I assume) be a read-line/process-line/write-line (or something along those lines anyway).  Which is why you can handle massive files.

My comment on chaining was directed mainly at the:

code mentioned earlier.
You can't swap the process and the read around, so chaining in that way makes little sense.

Now, if the read returned a FileData object that had a process method, and the process method returned a ProcessedData object that had a write method, then yes...that would remove the disconnect.
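A sketch of that idea, using the FileData and ProcessedData names from the paragraph above. The method signatures and the per-line UnaryOperator parameter are assumptions. Because write() exists only on ProcessedData, and a ProcessedData can only be obtained by calling process(), the compiler enforces the read, process, write order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class FileData {
    private final List<String> lines;

    private FileData(List<String> lines) {
        this.lines = lines;
    }

    // The only way to get a FileData is to read a file.
    static FileData read(Path input) throws IOException {
        return new FileData(Files.readAllLines(input));
    }

    // The only way to get a ProcessedData is to process a FileData.
    ProcessedData process(UnaryOperator<String> lineStep) {
        return new ProcessedData(
                lines.stream().map(lineStep).collect(Collectors.toList()));
    }
}

final class ProcessedData {
    private final List<String> lines;

    ProcessedData(List<String> lines) {
        this.lines = lines;
    }

    void write(Path output) throws IOException {
        Files.write(output, lines);
    }
}
```

Usage would then read FileData.read(in).process(String::trim).write(out); and swapping the calls around no longer compiles.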

Having said all that, as you say, the normal way to read and process a file would be line by line (or suitable-chunk-of-data by suitable-chunk-of-data).
 
Knute Snortum
Sheriff
Posts: 5446
147

Jay Rex wrote:How would I implement this?

read(input.txt) could return a List<String>.
process could accept the List<String> as a parameter.
write would need to accept the processed List<String> and the file name.

The only parameter is the file name, which follows the pseudo code, but how would you implement that?


I like the way you've laid out the run() method.  As Carey said, the devil is in the details, but let's look at a few things first.

As Campbell said, you could get your file names several ways.  Hardcoding is the easiest, but a file chooser is better for the user.  Why not hide the implementation in a method, as you've done with the other methods in run()?  So...
So now you can start your implementation easily like
If you need both the input and output files, you could build and return a Map, or...

Maybe getFileNames() doesn't have to return anything.  Once you get the names, you could hold them in instance variables.  These are variables that are declared outside of any method and are not static.  These variables are available to any non-static method in the class.
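A sketch of the instance-variable approach, assuming hard-coded names for now. The class name FileTask, the field names, and the trimming step are placeholders. getFileNames() returns nothing; it just fills in the fields that read() and write() use later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

class FileTask {
    // Instance variables: declared outside any method, not static,
    // and visible to every non-static method of the class.
    private String inputFileName;
    private String outputFileName;

    // Hides how the names are obtained; hard-coded here (assumed names).
    void getFileNames() {
        inputFileName = "input.txt";
        outputFileName = "output.txt";
    }

    void run() throws IOException {
        getFileNames();
        List<String> lines = read();
        List<String> processed = process(lines);
        write(processed);
    }

    List<String> read() throws IOException {
        return Files.readAllLines(Paths.get(inputFileName));
    }

    // Placeholder processing (assumption): trim each line.
    List<String> process(List<String> lines) {
        return lines.stream().map(String::trim).collect(Collectors.toList());
    }

    void write(List<String> lines) throws IOException {
        Files.write(Paths.get(outputFileName), lines);
    }
}
```

Swapping the hard-coded names for a file chooser later only changes getFileNames(); run() stays the same.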

Much more to do, but that can get you started.
 
Campbell Ritchie
Marshal
Posts: 61741
193

Dave Tolls wrote:. . . That would (I assume) be a read-line/process-line/write-line (or something along those lines anyway).  Which is why you can handle massive files. . . .

Yes.
 
Jay Rex
Ranch Foreman
Posts: 35
7

Dave Tolls wrote:Method chaining, though it can be done, is not really applicable in this case. The main reason being that the methods can't be called in any order.  You need to read the file before processing. So I would argue that allowing such chaining would not reflect how the code has to work.


Thank you, this is what I needed to hear. I have shelved the method chaining idea for this task, as it doesn't make sense here.

Carey Brown wrote:So, without the detailed requirements all you've got is an outline of the big picture steps; a good beginning but a long way off from a true design.


You are correct. So let me narrow this down to the following: The design I am considering is for small files, where the input and output is one to one. Each line in the input file matches each line in the output file.

Please provide critique and improvements for the following implementation:

 
Dave Tolls
Rancher
Posts: 3753
40
Now that goes to what Campbell has been talking about.
Since the processing of each line is independent of all other lines, it is (in general) a good idea not to read the whole thing in at once, as that doesn't scale.
If you suddenly found yourself having to read in a big file you might have memory issues.

A non-stream method would open input file to get a reader, open output file to get a writer, then process line by line, which would involve reading a line, processing the line, then writing the line.
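A minimal sketch of that non-stream approach, assuming a one-line-in, one-line-out transformation. The class name LineByLine and the trimming in processLine() are placeholders.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LineByLine {
    static void copy(Path input, Path output) throws IOException {
        // try-with-resources closes both files, even if an exception is thrown.
        try (BufferedReader reader = Files.newBufferedReader(input);
             BufferedWriter writer = Files.newBufferedWriter(output)) {
            String line;
            while ((line = reader.readLine()) != null) { // read a line
                writer.write(processLine(line));          // process it, write it
                writer.newLine();
            }
        }
    }

    // Placeholder per-line processing (assumption).
    static String processLine(String line) {
        return line.trim();
    }
}
```

Only one line is held at a time, so the size of the input file does not affect memory use.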
 
Campbell Ritchie
Marshal
Posts: 61741
193

Dave Tolls wrote:. . . A non-stream method would open input file to get a reader, open output file to get a writer, then process line by line, which would involve reading a line, processing the line, then writing the line.

Something like this?
You are either going to end up with a long method if you have the reading and processing and writing in the same place, or you get awkward code like the above. It would have been even worse before try-with-resources was introduced.
 
Campbell Ritchie
Marshal
Posts: 61741
193
JR: I like your use of readAllLines; it shows you are using up-to-date code. Let's see if we can shorten that method. If it is a private method, why are you returning the List? Why not make the List a field? We started with this:

So we can lose the local variable and the return statement.

What is going to happen if you suffer an IOException? Your code will continue to run, and you will end up with the List not correctly assigned to. So maybe this method isn't the right place to handle that exception.

That gets you out of returning null, which would simply cause problems elsewhere. Yes, I know you never actually get that exception the compiler complains about, but you do have to consider it.

Now, if you run a loop or a Stream processing the List after an exception, you will run 0× or get a 0‑element Stream, and you will end up with a 0‑line output file. The Unicode escapes 201c/201d are posh quote marks.
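The snippets from this post are not reproduced here. One possible reading of the advice, as a sketch: the List becomes a field that starts out empty (never null), the private read method declares IOException instead of catching it, and the caller reports the failure. The class name SmallFileJob, the method names, and the message text are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SmallFileJob {
    // A field rather than a return value; starts empty so it is never null.
    private List<String> lines = new ArrayList<>();

    // No local variable, no return statement, no catch block.
    private void readFile(Path input) throws IOException {
        lines = Files.readAllLines(input);
    }

    void run(Path input, Path output) throws IOException {
        try {
            readFile(input);
        } catch (IOException e) {
            // \u201c and \u201d are the curly quote marks.
            System.err.println("Could not read \u201c" + input + "\u201d: " + e.getMessage());
        }
        // After a failed read the list is still empty, so this writes a 0-line file.
        Files.write(output, lines);
    }
}
```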
 
Dave Tolls
Rancher
Posts: 3753
40

Campbell Ritchie wrote:Something like this



Pretty much.
It's the only way (pre-streams) to ensure you don't get hit by a large file.

Most of it is the boilerplate for opening/closing the files in any case.

Reading everything in, if the lines are pretty much independent chunks of data, is rarely a good idea, and a pattern that will eventually come back and bite you.
 
Campbell Ritchie
Marshal
Posts: 61741
193

Dave Tolls wrote:. . . Pretty much. . . .
Most of it is the boiler plate for opening/closing the files in any case. . . .

Thank you. How much of Stream code would be such boilerplate? Let's go back to my earlier example.
Two lines of sort of boilerplate at the beginning and four lines of catch, and lines 3 and 12 contain only braces. Not too bad, particularly if you think you need lines 1‑2 to create the two objects anyway.
 
Dave Tolls
Rancher
Posts: 3753
40
Pretty much the same.
I'm not too sure what the discussion is here, though.
 
Campbell Ritchie
Marshal
Posts: 61741
193
Yes, we have digressed just a bit, haven't we?
 
It is sorta covered in the JavaRanch Style Guide.