
Best way to manipulate files

 
Vicken Karaoghlanian
Ranch Hand
Posts: 522
Hi,
What is the best and most efficient way to manipulate text files in Java? I have a situation in which I want to modify large text files (around 250 MB). As you can see, the files are quite large, so efficiency will definitely play a role in this scenario.
Almost every Java book I have read suggests that the best way to manipulate files is to use buffers, so I did... and wrote the following code.
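Something along these lines (the file names are just placeholders):

import java.io.*;

public class LineCopy {
    public static void main(String[] args) {
        try {
            // buffered, line-by-line copy from source to destination
            BufferedReader in = new BufferedReader(new FileReader("source.txt"));
            BufferedWriter out = new BufferedWriter(new FileWriter("dest.txt"));
            String line;
            while ((line = in.readLine()) != null) {
                // ... modify the line here ...
                out.write(line);
                out.newLine();
            }
            in.close();
            out.close();
        } catch (FileNotFoundException e) {
            System.out.println("Error opening source file");
        } catch (IOException e) {
            System.out.println("Error processing file");
        }
    }
}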

What do you think of this approach? Is it efficient enough? Does it require further modifications?
Any suggestion and tips are welcomed.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Using buffered IO is definitely good. You have to inspect each line, so reading and writing a line at a time as shown is probably your best bet. Surely the easiest.
Much more work and much less readable and maybe no faster would be to read big chunks into your own byte buffer, parse for lines, do your transform thing, build up a new buffer, write that. I hate it already, don't you?
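Just to show how much uglier it gets, a rough sketch (placeholder file names, raw bytes only, no character encoding handled):

import java.io.*;

public class ChunkCopy {
    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("source.txt");
        FileOutputStream out = new FileOutputStream("dest.txt");
        byte[] chunk = new byte[64 * 1024];
        // holds the start of a line that was split across two chunks
        ByteArrayOutputStream carry = new ByteArrayOutputStream();
        int read;
        while ((read = in.read(chunk)) != -1) {
            int lineStart = 0;
            for (int i = 0; i < read; i++) {
                if (chunk[i] == '\n') {
                    carry.write(chunk, lineStart, i - lineStart + 1);
                    byte[] line = carry.toByteArray();
                    // ... transform the line bytes here ...
                    out.write(line);
                    carry.reset();
                    lineStart = i + 1;
                }
            }
            carry.write(chunk, lineStart, read - lineStart);
        }
        out.write(carry.toByteArray()); // last line with no trailing newline
        in.close();
        out.close();
    }
}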
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
quote: Does it require further modifications?
I would note that your error messages are potentially misleading. It's possible there could be a FileNotFoundException from the destination file (e.g. if it's supposed to be in a directory that doesn't exist) and it's possible the input file could have an IOException (e.g. if there's a bad sector on the disk). At this level, there's a lot to be said for a simple e.printStackTrace() - it will provide a lot more useful information than "Error opening source file" which might not even be correct.
I also think it may be a good idea to specify an encoding using InputStreamReader and OutputStreamReader - at least, if the files will be shared to any other machines besides the one that generated them, or if they might contain any international characters which are not defined in your platform's default encoding.
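For example, something like this in place of the FileReader/FileWriter construction (UTF-8 here is just an assumption; use whatever encoding the files actually share):

import java.io.*;

// same line-by-line copy, but with an explicit character encoding
BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream("source.txt"), "UTF-8"));
BufferedWriter out = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream("dest.txt"), "UTF-8"));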
Neither of these comments has anything to do with efficiency. That's because I agree with Stan - the simplicity of the code you've shown is probably more important than any possible performance improvements you might make. Unless you've already tried this code and found that you really need to improve performance further, and you've profiled the code to verify that this file processing is really where your problem is, and you've got the time and inclination to make something more complex, and make sure it works correctly.
IF that's the case, I'd investigate the java.nio classes. The simplest approach, if you have enough available RAM, is to use FileChannel's map() method to create a MappedByteBuffer of the whole file. Then either work with the raw bytes (OK if the encoding is known to be a simple 1-byte encoding like US-ASCII or ISO-8859-1) or use even more memory to do a Charset.decode() into a CharBuffer. If you don't have enough memory for that, well, then it gets more complex, as Stan has outlined. Have fun.
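Roughly like this (placeholder file name, and assuming the whole file fits in a single mapping):

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("source.txt");
        FileChannel channel = fis.getChannel();
        // map the whole file into memory as a MappedByteBuffer
        MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        // decode into characters if you need text rather than raw bytes
        CharBuffer chars = Charset.forName("ISO-8859-1").decode(buffer);
        // ... work with chars (or with buffer directly for 1-byte encodings) ...
        channel.close();
        fis.close();
    }
}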
 
Vicken Karaoghlanian
Ranch Hand
Posts: 522

quote: I would note that your error messages are potentially misleading. It's possible there could be a FileNotFoundException from the destination file (e.g. if it's supposed to be in a directory that doesn't exist) and it's possible the input file could have an IOException (e.g. if there's a bad sector on the disk). At this level, there's a lot to be said for a simple e.printStackTrace() - it will provide a lot more useful information than "Error opening source file" which might not even be correct.

Yes, you are correct, Jim. I wrote the try/catch block just for the sake of compiling and didn't give much attention to the messages themselves. As you suggested, using e.printStackTrace() will be the most appropriate way to do it.

quote: I also think it may be a good idea to specify an encoding using InputStreamReader and OutputStreamReader - at least, if the files will be shared to any other machines besides the one that generated them, or if they might contain any international characters which are not defined in your platform's default encoding.

The files I am using may indeed contain international characters, but what difference would it make whether I read them using InputStreamReader or FileReader? Isn't each character supposed to be read according to its Unicode equivalent?

quote: Neither of these comments has anything to do with efficiency. That's because I agree with Stan - the simplicity of the code you've shown is probably more important than any possible performance improvements you might make. Unless you've already tried this code and found that you really need to improve performance further, and you've profiled the code to verify that this file processing is really where your problem is, and you've got the time and inclination to make something more complex, and make sure it works correctly.

The code I wrote is simple for a reason: I am trying to find the best and fastest way Java provides to read a single file and write it out to another (the modifications I make to each line are irrelevant at this point). As I pointed out in my first post, I am reading 250-300 MB files, and loading those files into memory is not an option, not to mention impossible (Smith> Not impossible... it is inevitable.) Sorry about that, I was tempted. That is why I am being so picky about performance: first I must secure this first stage of my code (in terms of performance), then worry about the rest later.
[ January 23, 2004: Message edited by: Vicken Karaoghlanian ]
 
It is sorta covered in the JavaRanch Style Guide.