Optimizing time to read files from local system

 
Pradeep Kadambar
Ranch Hand
Posts: 148
Can anyone suggest an approach to speed up reading files from the local file system?

I am passing the file names and directories as an ArrayList,
e.g. [c:\Documents, D:\FILES\1.txt, c:\], which is then fetched by the file-operation method to extract the data.
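
A minimal sketch of the kind of sequential loop described above, for reference only; the class name and the skip-directories check are assumptions, not the actual code.

import java.io.*;
import java.util.*;

public class SequentialFileReader {
    // Walk the list of paths and read each regular file in full.
    static long readAll(List<String> paths) throws IOException {
        long totalBytes = 0;
        for (String path : paths) {
            File f = new File(path);
            if (!f.isFile()) continue;            // skip directories such as c:\Documents
            InputStream in = new FileInputStream(f);
            try {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    totalBytes += n;              // the real code would extract data here
                }
            } finally {
                in.close();
            }
        }
        return totalBytes;
    }
}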
 
Ilja Preuss
author
Sheriff
Posts: 14112
How do you read the files?
 
Pradeep Kadambar
Ranch Hand
Posts: 148
I am reading the files using FileInputStream. Each read is an independent, atomic operation, so any approach is viable.

But doing this sequentially is slow, so I was looking for ways to optimize it.
 
Ilja Preuss
author
Sheriff
Posts: 14112
The first thing you should try is wrapping the FileInputStream in a BufferedInputStream. If that's not enough, show us some code, please...
[ October 25, 2004: Message edited by: Ilja Preuss ]
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
What has to happen to a file after it is read? Can you pass the data to a processing Thread that could work while file IO is going on? Are the files treated as text characters, or can the data be processed as bytes? Conversion to Java Unicode Strings can be very time-consuming.
Bill
[ October 25, 2004: Message edited by: William Brogden ]
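
As an illustration of the point about overlapping IO with processing, here is a rough producer/consumer sketch. It assumes JDK 5+ for java.util.concurrent, and process() is only a stand-in for whatever has to happen to a file after it is read.

import java.io.*;
import java.util.concurrent.*;

public class PipelinedReader {
    private static final byte[] POISON = new byte[0]; // sentinel that stops the consumer

    public static void main(String[] args) throws Exception {
        final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>(16);

        // Consumer: processes file contents while the producer is still doing IO.
        Thread processor = new Thread(new Runnable() {
            public void run() {
                try {
                    byte[] data;
                    while ((data = queue.take()) != POISON) {
                        process(data);            // hypothetical parsing/processing step
                    }
                } catch (InterruptedException ignored) { }
            }
        });
        processor.start();

        // Producer: reads each file and hands the bytes to the processor.
        for (String path : args) {
            queue.put(readFile(path));
        }
        queue.put(POISON);
        processor.join();
    }

    static byte[] readFile(String path) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(path));
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toByteArray();
        } finally {
            in.close();
        }
    }

    static void process(byte[] data) {
        // placeholder for the per-file work
    }
}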
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Now, with this code, can you tell me whether using threads to read the files will save time?

If yes, can you give me the code for it?
 
Ilja Preuss
author
Sheriff
Posts: 14112
Two suggestions:

First, as I wrote earlier, use a BufferedInputStream, that is, replace all

in = new FileInputStream(...)

with

in = new BufferedInputStream(new FileInputStream(...))

That is likely to give you a *big* performance boost.

Second, notice that you read each file *two* times. If you could reduce that to one pass, that would likely improve performance, too.

If you can't do that, try at least reusing the first stream for the second pass, by using the mark and reset methods.

If that's still too slow, you could take a look at the new NIO package in JDK 1.4. I can't offer any experience with that, though...
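
For what it's worth, a minimal sketch of the mark/reset idea - it assumes the whole file fits comfortably in memory, since the BufferedInputStream has to hold everything read before the reset. The method names are placeholders.

import java.io.*;

public class TwoPassRead {
    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        BufferedInputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
            // The read limit must cover everything the first pass consumes,
            // otherwise the mark becomes invalid and reset() fails.
            in.mark((int) file.length() + 1);
            firstPass(in);
            in.reset();        // rewind to the mark instead of reopening the file
            secondPass(in);
        } finally {
            in.close();
        }
    }

    static void firstPass(InputStream in) throws IOException  { /* e.g. scan for a header   */ }
    static void secondPass(InputStream in) throws IOException { /* e.g. extract the content */ }
}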
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
Now, with this code, can you tell me whether using threads to read the files will save time?

If yes, can you give me the code for it?

0. Like Ilja said - if those WordDocument, etc. objects don't provide for buffered input, that's where you should look for improvement first.
1. If you hand off each of those tasks to a separate Thread, you would probably see some improvement in overall performance, since the parsing Thread(s) could work while other Thread(s) were waiting for IO. Exactly how much improvement is impossible to say - you should create some sort of simple test case before investing a whole lot of effort. For example, you could create one WordDocument reader and one PDFDocument reader, each with a separate Thread, and compare the total time with running them in sequence in a single Thread (see the sketch after this post).
2. "Give you the code"? Optimizing your problem in detail would take a lot of time, and time == $$.
Bill
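
A rough sketch of the simple test case described in point 1. ReadTask is only a stand-in for the actual WordDocument/PDFDocument readers, and note that the operating system's file cache will favour whichever run goes second, so use different files or run the comparison both ways.

import java.io.*;

public class SequentialVsThreaded {
    public static void main(String[] args) throws Exception {
        Runnable taskA = new ReadTask("a.doc");   // stand-in for a WordDocument reader
        Runnable taskB = new ReadTask("b.pdf");   // stand-in for a PDFDocument reader

        long t0 = System.currentTimeMillis();
        taskA.run();
        taskB.run();
        System.out.println("sequential:  " + (System.currentTimeMillis() - t0) + " ms");

        Thread ta = new Thread(taskA);
        Thread tb = new Thread(taskB);
        long t1 = System.currentTimeMillis();
        ta.start();
        tb.start();
        ta.join();
        tb.join();
        System.out.println("two threads: " + (System.currentTimeMillis() - t1) + " ms");
    }

    // Hypothetical task that just reads a file through a buffered stream.
    static class ReadTask implements Runnable {
        private final String path;
        ReadTask(String path) { this.path = path; }
        public void run() {
            try {
                InputStream in = new BufferedInputStream(new FileInputStream(path));
                try {
                    byte[] buf = new byte[8192];
                    while (in.read(buf) != -1) { /* parsing would happen here */ }
                } finally {
                    in.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}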
 
Pradeep Kadambar
Ranch Hand
Posts: 148
Your suggestions were of great help.

Well, William, I meant the logic of it rather than the code... and of course time is money.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
I think we would all be interested in your results in any case.

Incidentally, I suspect that the increasing availability of "dual core" CPUs is going to make for some interesting performance-improvement possibilities.
Related topic:
Recently I came across SEDA, an architecture for highly concurrent server applications.
SEDA stands for Staged Event-Driven Architecture - see
http://www.eecs.harvard.edu/~mdw/proj/seda/
and
http://sourceforge.net/projects/seda/
Bill
[ October 28, 2004: Message edited by: William Brogden ]
 
Guy Allard
Ranch Hand
Posts: 776
When you create the BufferedInputStream, specify a buffer size.

If you look at the source for BIS you will see that the default buffer size is fairly small.

If the size of the files exceeds the default, you will see a remarkable improvement in read times.

More memory use, of course.

Guy
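
For example (the 64 KB figure is only an illustration, not a recommendation - measure with your own files):

// The single-argument constructor uses a small default buffer (only a few KB in the JDKs of this era);
// passing an explicit size trades memory for fewer underlying disk reads.
InputStream in = new BufferedInputStream(new FileInputStream("big-file.dat"), 64 * 1024);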
 