
How to speed up in reading big big ascii files

 
Jimmy Chen
Ranch Hand
Posts: 54
I used a memory-mapped file to read a very big ASCII file (600 MB),
and I find it runs very slowly. MappedByteBuffer can only read byte by byte! (Can it read more?)
Can anybody help me make it faster?
Here is my code. Is there any problem? Thanks in advance!

 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Yu Liang]: MappedByteBuffer can only read byte by byte! (Can it read more?)

Yes, it can. For example, see the method get(byte[] dst, int offset, int length). However in your code, you're comparing each character to '\n', so you can only read one character at a time. It's not a limitation in MappedByteBuffer - your algorithm allows only one character at a time.
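As a rough sketch of the bulk get(byte[] dst, int offset, int length) call mentioned above: the class name BulkLineCounter, the method countNewlines, and the 64 KB chunk size are all made up for illustration, and the code assumes the whole file fits in a single mapping (under 2 GB). It still looks at every byte to find '\n', but it does so in a plain array rather than through one get() call per byte.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not the original poster's code.
class BulkLineCounter {
    // Counts '\n' bytes by copying the mapped buffer into a byte[] in chunks.
    static long countNewlines(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] chunk = new byte[64 * 1024]; // chunk size is an arbitrary assumption
            long count = 0;
            while (buf.hasRemaining()) {
                int n = Math.min(chunk.length, buf.remaining());
                buf.get(chunk, 0, n); // bulk read: n bytes in one call
                for (int i = 0; i < n; i++) {
                    if (chunk[i] == '\n') {
                        count++;
                    }
                }
            }
            return count;
        }
    }
}
```

Whether this is actually faster for a given file is something only a profiler or a timing run can tell.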

To speed it up, I would first suggest getting a profiler to measure where the bottleneck really is. Here is a recent discussion of profilers.

It's not clear what is done inside the parseString() method - it's quite possible that the slowest part of your code is there, in which case it probably doesn't matter what you do outside parseString().

Are you using JDK 1.5? It might be worthwhile to try a Scanner here, using the ReadableByteChannel constructor. Otherwise, I'd try to avoid using a char[] array or Strings as much as possible. Use a CharsetDecoder to convert the ByteBuffer to a CharBuffer, then use the methods of the CharBuffer (which implements CharSequence, just like String) to find where the '\n' characters are. If you don't have enough memory to convert the entire ByteBuffer to a CharBuffer, then convert it in several smaller pieces. This is more complicated, so before you try it I will re-emphasize my earlier points: use a profiler to find where the bottleneck really is, and consider using a Scanner to simplify your code (while still taking advantage of many of the optimizations which NIO gives you).
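A minimal sketch of the CharsetDecoder idea, assuming the data is pure US-ASCII and that the buffer passed in is small enough to decode in one piece. The class CharBufferLines and the method splitLines are hypothetical names. The lines come back as CharBuffer views, so no String is created until you ask for one.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the original poster's code.
class CharBufferLines {
    // Decodes the bytes and returns one CharSequence per line (without the '\n').
    static List<CharSequence> splitLines(ByteBuffer bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();
        CharBuffer chars = decoder.decode(bytes); // throws on non-ASCII input
        List<CharSequence> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < chars.length(); i++) {
            if (chars.charAt(i) == '\n') {
                lines.add(chars.subSequence(start, i)); // a view, not a copy
                start = i + 1;
            }
        }
        if (start < chars.length()) {
            lines.add(chars.subSequence(start, chars.length())); // last line without '\n'
        }
        return lines;
    }
}
```

For a file too large to decode at once, the same method could be applied to successive slices of the mapped buffer, with extra care for a line that straddles two slices.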

Good luck...
 
Jimmy Chen
Ranch Hand
Posts: 54
Thank you, Jim!

I checked the method get(byte[] dst, int offset, int length). Its documentation says:
In other words, an invocation of this method of the form src.get(dst, off, len) has exactly the same effect as the loop

for (int i = off; i < off + len; i++)
    dst[i] = src.get();


So I think this method will not make my code faster, will it?

You are right, the bottleneck is inside parseString(). I want to get the values from the string, and I use String.split() to get them, like this:



I think it costs too many resources.
I haven't tried JDK 1.5; I use 1.4.2. I will try the Scanner.

Thanks !!!
 
Jimmy Chen
Ranch Hand
Posts: 54
Also, I want to convert these values to float. How can I convert them using the CharBuffer instead of a String? Can you give me some suggestions?

Here are the values in the ASCII file:

time = 0, 0.01041667, 0.02083333, 0.03125, 0.04166667, 0.05208333, 0.0625,


thanks!!!
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Yu Liang]: So I think this method will not make my code faster, will it?

Well, if the bottleneck is in parseString() as you noted, then changing this method probably won't matter much. But when the API says "has exactly the same effect", that does not include how long it takes to execute the method. If you compare the explicit loop from the API documentation to a single call to src.get(dst, off, len), the results (what data gets put in dst) are the same in each case, but the time to execute will be different.
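The two forms being compared were lost from this post. A plausible reconstruction, based only on the loop quoted from the API documentation earlier in the thread, might look like this (the class and method names are invented for illustration):

```java
import java.nio.ByteBuffer;

// Hypothetical reconstruction of the comparison, not the original snippet.
class GetEquivalence {
    // One relative get() call per byte, as in the loop from the API docs.
    static byte[] byLoop(ByteBuffer src, int off, int len) {
        byte[] dst = new byte[off + len];
        for (int i = off; i < off + len; i++) {
            dst[i] = src.get();
        }
        return dst;
    }

    // A single bulk get; fills the same positions of dst with the same bytes.
    static byte[] byBulk(ByteBuffer src, int off, int len) {
        byte[] dst = new byte[off + len];
        src.get(dst, off, len);
        return dst;
    }
}
```

Both methods leave dst with identical contents. Any speed difference has to be measured, since it depends on the buffer implementation and the JVM.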

[Yu Liang]: I think it costs too many resources.

Well, maybe, maybe not. It's going to be hard to successfully optimize the code without some profiling data.

[Yu Liang]: Also, I want to convert these values to float. How can I convert them using the CharBuffer instead of a String? Can you give me some suggestions?

Again, using Scanner would be my first choice.
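A sketch of what the Scanner approach could look like on JDK 1.5 or later, using the Scanner(ReadableByteChannel, String) constructor. The class ScannerFloats, the method readFloats, and the delimiter pattern are assumptions chosen to fit the sample line "time = 0, 0.01041667, ...". Labels such as "time" are simply skipped.

```java
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper, not the original poster's code.
class ScannerFloats {
    // Reads every float-looking token from the channel, skipping other tokens.
    static List<Float> readFloats(ReadableByteChannel source) {
        List<Float> values = new ArrayList<>();
        try (Scanner sc = new Scanner(source, "US-ASCII")) {
            sc.useLocale(Locale.US);       // '.' as the decimal separator
            sc.useDelimiter("[\\s,=]+");   // split on whitespace, ',' and '='
            while (sc.hasNext()) {
                if (sc.hasNextFloat()) {
                    values.add(sc.nextFloat());
                } else {
                    sc.next();             // skip a label such as "time"
                }
            }
        }
        return values;
    }
}
```

A FileChannel is a ReadableByteChannel, so the channel of the big file can be passed in directly. Scanner is convenient rather than guaranteed fast, so it is worth timing against the alternatives.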

Otherwise - it looks like you're going to need most of the data on the line, and in order to convert to float in 1.4.2 you need something like Float.parseFloat(), which requires a String as input (not a CharBuffer). So I guess you might as well convert the whole line to a String rather than getting a CharBuffer. I'd just use a BufferedReader and a FileReader - or maybe a RandomAccessFile will be faster, since its readLine() is a native method, it could be well-optimized for your system. If String's split() method is too slow (an unproven assumption) then I would try something like:

If that's still not fast enough, you can always use String's indexOf() to locate the '=' and ',' symbols, and use substring() and trim() to extract the numbers.
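The indexOf() / substring() / trim() approach might be sketched as follows, assuming every line has the shape "name = v1, v2, ...," as in the sample data. LineParser and parseValues are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the original poster's parseString().
class LineParser {
    // Extracts the comma-separated floats that follow the '=' sign.
    static float[] parseValues(String line) {
        List<Float> out = new ArrayList<>();
        int start = line.indexOf('=') + 1; // 0 if there is no '='
        while (start < line.length()) {
            int comma = line.indexOf(',', start);
            int end = (comma < 0) ? line.length() : comma;
            String token = line.substring(start, end).trim();
            if (!token.isEmpty()) {        // ignore the gap after a trailing comma
                out.add(Float.parseFloat(token));
            }
            start = end + 1;
        }
        float[] result = new float[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }
}
```

This avoids the regular-expression machinery behind split(), but whether that matters in practice is again a question for the profiler.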
 
Jimmy Chen
Ranch Hand
Posts: 54
Thank you very much, Jim!
I learned a lot from your reply. As you said:
I'd just use a BufferedReader and a FileReader - or maybe a RandomAccessFile will be faster, since its readLine() is a native method, it could be well-optimized for your system.


But now I have to use a memory-mapped file to read this big ASCII file, so I can't use these Readers, can I?
[ July 02, 2005: Message edited by: Yu Liang ]
 