
Efficient File Reading

 
Pradeep Kalyan
Greenhorn
Posts: 6
Hello

We have an application that runs on a PC. It reads an input file with half a million records and loads them into a cache (EHCACHE) before the user can work with it. Currently this takes 5 minutes. I am wondering whether there is a more efficient way to read the file and load the cache (maybe using streams rather than readers, etc.).

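BufferedReader reader = new BufferedReader(new FileReader(file));
try {
    String line;
    while ((line = reader.readLine()) != null) {
        FlightScheduleData fds = new FlightScheduleData(line);
        lastIndex++;                            // advance the record counter
        primaryKeyList.add(key);                // key is derived elsewhere (not shown in this fragment)
        CacheMgr.getCacheMain().put(key, fds);
    }
} finally {
    reader.close();                             // release the file handle even if parsing fails
}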

Also, can anyone tell me whether using multiple threads might improve performance (even though the PC has only one CPU)? I appreciate any input.

Thanks
Pradeep
 
steve souza
Ranch Hand
Posts: 862
This is probably better posted on the EHCACHE forum:

http://sourceforge.net/forum/forum.php?forum_id=322278

A couple of ideas: 1) Can EHCACHE do a bulk load of the cache, so you don't call put for every record? 2) Why are you putting the data in a list too? If you do need the list, create it with an initial capacity large enough to hold all elements, so it doesn't need to grow.
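A minimal sketch of the presizing idea, assuming roughly 500,000 records. `ArrayList(int)` and `HashMap(int)` are standard constructors; the `HashMap` here only stands in for a map-like structure and says nothing about how EHCACHE itself is sized. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Presize {
    // A HashMap resizes once its size exceeds capacity * loadFactor (0.75 by default),
    // so request enough capacity that the expected count never triggers a resize.
    static int mapCapacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    public static void main(String[] args) {
        int expected = 500_000;   // assumed record count ("half a million")
        // Presized: neither structure has to grow and copy while loading.
        List<Integer> primaryKeys = new ArrayList<>(expected);
        Map<Integer, String> cache = new HashMap<>(mapCapacityFor(expected));
        for (int i = 0; i < expected; i++) {
            primaryKeys.add(i);
            cache.put(i, "record " + i);
        }
        System.out.println(primaryKeys.size() + " " + cache.size());
    }
}
```

Growing an `ArrayList` is amortized cheap, so presizing mostly saves copying and garbage; whether that matters for a 5 minute load is something only measurement can tell.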
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
I wonder if you could do "lazy" initialization by caching just the raw line instead of the FlightScheduleData, and turning a line into a FlightScheduleData only when you need to work on it?
It is not clear from the code fragment where key comes from, so this may not be possible.
Bill
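A minimal sketch of caching the raw line and parsing on first use. `ParsedRecord` is a hypothetical stand-in for FlightScheduleData and assumes comma-separated fields named `flight` and `origin`; the real record format is not shown in the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyRecordDemo {
    // Hypothetical stand-in for FlightScheduleData; assumes "flight,origin" lines.
    static final class ParsedRecord {
        final String flight;
        final String origin;

        ParsedRecord(String line) {
            String[] fields = line.split(",");
            flight = fields[0];
            origin = fields[1];
        }
    }

    // Holds the raw line; the expensive parse happens only on the first get().
    static final class LazyRecord {
        private final String line;
        private ParsedRecord parsed;   // null until first access

        LazyRecord(String line) {
            this.line = line;
        }

        synchronized ParsedRecord get() {
            if (parsed == null) {
                parsed = new ParsedRecord(line);
            }
            return parsed;
        }

        synchronized boolean isParsed() {
            return parsed != null;
        }
    }

    public static void main(String[] args) {
        Map<Integer, LazyRecord> cache = new HashMap<>();
        cache.put(1, new LazyRecord("AA100,JFK"));   // loading only stores the line
        cache.put(2, new LazyRecord("UA200,SFO"));
        System.out.println(cache.get(2).get().origin);
        System.out.println(cache.get(1).isParsed());
    }
}
```

This moves the parsing cost from startup to first access, so it only pays off if the user touches a fraction of the records.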
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Can you measure where your time goes? You do four things in the loop: Read a line, create an object, update a key list and update a cache. Get a profiler and see if one of those is eating up all the time before you try to solve a problem you might not have.
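If a profiler isn't at hand, a crude first cut is to accumulate `System.nanoTime()` deltas around each of the four steps. This sketch assumes split-on-comma parsing as a stand-in for FlightScheduleData and a `HashMap` as a stand-in for the cache; it reads from an in-memory `StringReader`, so to measure real disk time you would pass a `BufferedReader` over the actual file instead. Timer overhead and JIT warm-up distort the numbers, so treat them as rough proportions, not a replacement for a profiler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhaseTimer {
    // Accumulated wall-clock nanoseconds for each of the four steps in the loop.
    long readNanos, parseNanos, listNanos, cacheNanos;

    // Runs the load loop and returns the number of records processed.
    int load(BufferedReader reader) throws IOException {
        List<Integer> primaryKeys = new ArrayList<>();
        Map<Integer, String[]> cache = new HashMap<>();   // stand-in for the EHCACHE cache
        int key = 0;
        while (true) {
            long t0 = System.nanoTime();
            String line = reader.readLine();
            long t1 = System.nanoTime();
            readNanos += t1 - t0;
            if (line == null) {
                break;
            }
            String[] record = line.split(",");            // stand-in for new FlightScheduleData(line)
            long t2 = System.nanoTime();
            parseNanos += t2 - t1;
            primaryKeys.add(key);
            long t3 = System.nanoTime();
            listNanos += t3 - t2;
            cache.put(key, record);
            cacheNanos += System.nanoTime() - t3;
            key++;
        }
        return key;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("FL").append(i).append(",JFK,LAX\n");
        }
        PhaseTimer timer = new PhaseTimer();
        int n = timer.load(new BufferedReader(new StringReader(sb.toString())));
        System.out.printf("%d records: read %d ms, parse %d ms, list %d ms, cache %d ms%n",
                n, timer.readNanos / 1_000_000, timer.parseNanos / 1_000_000,
                timer.listNanos / 1_000_000, timer.cacheNanos / 1_000_000);
    }
}
```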

If it turns out you're spending your time waiting on the read (seems likely), try increasing (or decreasing) the buffer size and see if that makes a difference. After you've proven there is no other solution, look into reading on one thread and doing the other bits on another thread or two.
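A minimal sketch of trying several buffer sizes, using the standard `BufferedReader(Reader, int)` constructor (the size is in chars; the default is 8192). The temp file and its "record,N" lines are made up for the demo. The first run also warms the operating system's file cache, so later runs look faster regardless of buffer size; repeat the trials or run each size in a fresh process before drawing conclusions.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class BufferSizeTrial {
    // Reads every line of the file with the given buffer size (in chars) and counts them.
    static int countLines(File file, int bufferSize) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(file), bufferSize)) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("records", ".txt");
        tmp.deleteOnExit();
        try (PrintWriter out = new PrintWriter(tmp)) {
            for (int i = 0; i < 100_000; i++) {
                out.println("record," + i);
            }
        }
        for (int size : new int[] {1024, 8192, 65536, 1 << 20}) {
            long start = System.nanoTime();
            int lines = countLines(tmp, size);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(size + " chars: " + lines + " lines in " + ms + " ms");
        }
    }
}
```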

The basic idea is to read a line, wrap it in a Runnable task that does the other steps, and put the task into a blocking queue. One or more other threads take the tasks off the queue and run them. Java 5 has all of this built into the java.util.concurrent package.
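A minimal sketch of that pipeline with `ThreadPoolExecutor`, which is exactly "Runnable tasks in a blocking queue, pulled by worker threads". Trimming the line and putting it into a `ConcurrentHashMap` is a hypothetical stand-in for building a FlightScheduleData and calling the EHCACHE put; the queue bound of 1000 and the use of the line index as key are arbitrary assumptions. On a single-CPU machine this mainly helps if the reader thread spends time blocked on I/O.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PipelinedLoader {

    // Reads lines on the calling thread; parsing and caching run on worker threads.
    static Map<Integer, String> load(BufferedReader reader, int workers)
            throws IOException, InterruptedException {
        final Map<Integer, String> cache = new ConcurrentHashMap<>();
        // Tasks wait in a bounded blocking queue; when it is full, CallerRunsPolicy
        // makes the reading thread run the task itself, which throttles the reader.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            String line;
            int index = 0;
            while ((line = reader.readLine()) != null) {
                final int key = index++;
                final String raw = line;
                // Stand-in for "new FlightScheduleData(line)" plus the cache put.
                pool.execute(() -> cache.put(key, raw.trim()));
            }
        } finally {
            pool.shutdown();                               // accept no new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES);    // wait for queued tasks to finish
        }
        return cache;
    }

    public static void main(String[] args) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append(" record ").append(i).append('\n');
        }
        Map<Integer, String> cache = load(new BufferedReader(new StringReader(sb.toString())), 2);
        System.out.println(cache.size());
        System.out.println(cache.get(42));
    }
}
```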

Let us know what you find!
[ April 05, 2006: Message edited by: Stan James ]
 
Edwin Dalorzo
Ranch Hand
Posts: 961
I am going to suggest something that may sound a little unorthodox. You could use RandomAccessFile to read chunks of data using multiple threads.

One thread could read from 0 up to 1 MB, the next from 1 MB up to 2 MB, and so on, with as many threads as the file has megabytes.

Using the seek method, you can start reading from any position. Assign each thread a starting position and have it read until it reaches the end of its chunk.

I expect that using multiple threads would load the file faster.

I have never tested this idea. I am assuming that if you open the file just for reading, it is possible to create multiple RandomAccessFile objects on the same file.

What do you think?

Regards,
Edwin Dalorzo.
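A minimal sketch of the chunked idea, assuming each thread opens its own read-only `RandomAccessFile`, seeks to its offset and calls `readFully` on its byte range. Chunks are reassembled in order before any line splitting, because a fixed byte boundary will usually fall in the middle of a line. The chunk size and thread count are arbitrary parameters, and the whole file is held in one `byte[]`, so this only works for files under 2 GB. Whether it is faster at all depends on the disk (see the reply below about seeking).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedReader {

    // Reads bytes [offset, offset + length) using this thread's own RandomAccessFile.
    static byte[] readChunk(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    // Splits the file into fixed-size chunks, reads them on a thread pool,
    // and reassembles them in order so lines spanning two chunks stay intact.
    static byte[] readAll(File file, int chunkSize, int threads) throws Exception {
        long size = file.length();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            for (long off = 0; off < size; off += chunkSize) {
                final long start = off;
                final int len = (int) Math.min(chunkSize, size - off);
                parts.add(pool.submit(() -> readChunk(file, start, len)));
            }
            byte[] all = new byte[(int) size];
            int pos = 0;
            for (Future<byte[]> part : parts) {
                byte[] chunk = part.get();   // waits for that chunk's thread
                System.arraycopy(chunk, 0, all, pos, chunk.length);
                pos += chunk.length;
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("records", ".txt");
        tmp.deleteOnExit();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("record ").append(i).append('\n');
        }
        Files.write(tmp.toPath(), sb.toString().getBytes(StandardCharsets.UTF_8));

        String text = new String(readAll(tmp, 4096, 4), StandardCharsets.UTF_8);
        System.out.println(text.equals(sb.toString()));
        System.out.println(text.split("\n").length);
    }
}
```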
 
Ilja Preuss
author
Sheriff
Posts: 14112
Originally posted by Edwin Dalorzo:
What do you think?


If IO is the bottleneck, this might lead the hard disk head to have to hop between different positions instead of just reading the file in one swoop, probably leading to a significant *decrease* in performance.
 
Ilja Preuss
author
Sheriff
Posts: 14112
Originally posted by Stan James:
If it turns out you're spending your time waiting on the read (seems likely) try increasing (or decreasing) the buffer size and see if that makes a difference.


Or use the NIO package.
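One way to use NIO here, sketched under assumptions: map the file with `FileChannel.map` in read-only mode, decode it (assumed UTF-8) into a `CharBuffer`, and split on `'\n'`. This decodes the whole file into memory at once and a single mapping is limited to 2 GB; NIO is not automatically faster than a well-buffered `BufferedReader`, so it is worth timing both.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class NioLineReader {

    // Memory-maps the file, decodes it as UTF-8 (an assumption) and splits it into lines.
    static List<String> readLines(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            CharBuffer chars = StandardCharsets.UTF_8.decode(mapped);
            List<String> lines = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < chars.length(); i++) {
                if (chars.charAt(i) == '\n') {
                    lines.add(chars.subSequence(start, i).toString());
                    start = i + 1;
                }
            }
            if (start < chars.length()) {   // last line without a trailing newline
                lines.add(chars.subSequence(start, chars.length()).toString());
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "AA100,JFK\nUA200,SFO\nDL300,ATL".getBytes(StandardCharsets.UTF_8));
        List<String> lines = readLines(tmp);
        System.out.println(lines.size() + " " + lines.get(2));
    }
}
```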

But I agree: the very first thing to do is to run the code through a profiler.
 