I have a Java program doing some I/O that is about 2x slower than the corresponding C code, and I'd like help optimizing the Java version. The C version runs in an average of 64 seconds and the Java version in about 127, not counting the time for the JVM to load and start. The task is to read many (~6000) small text files from a given directory. The salient part of the Java code is:
The text files look like this:

    AREA_NAME_GOES_HERE
    40.67281150817,22.93920917511
    40.23754310607,22.93920917511
    40.23754310607,22.50393657684
    40.67281150817,22.50393657684
    END

In case it matters, I'm using Java version "Classic VM (build JDK-1.2-V, native threads)" and Borland's freebie C++ compiler version 5.5 on Windows 2000 with plenty of memory (224 MB). To get consistent runtimes, I test each one just after a reboot to ensure neither is penalized by other running programs or helped by files already cached in memory. Suggestions?
I would start by making it multithreaded, maybe with ten threads running at once, because a lot of time is spent opening each file. Next, rather than using StringTokenizer, I would use the much faster String.indexOf(','). String comparisons are also fairly slow, in Java as well as in C. I hate while loops that contain a lot of stuff, and I hate seeing code duplicated outside the loop for initialization. Also, closing your BufferedReader closes all the underlying file streams it wraps. Try this:
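One way to apply the indexOf and single-close suggestions, sketched under the assumption that each file is one name line, then one `lat,lon` pair per line, then `END`, as in the sample above. The class and method names (`AreaParser`, `parse`, `Area`) are hypothetical, and the code uses Java 7+ syntax (try-with-resources, diamond), which the JDK 1.2 in this thread does not support:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class AreaParser {
    /** Hypothetical holder for one parsed file: the area name and its vertices. */
    public static final class Area {
        public final String name;
        public final List<double[]> points = new ArrayList<>();
        Area(String name) { this.name = name; }
    }

    /** Parses a name line, then "lat,lon" lines until END, using indexOf instead of StringTokenizer. */
    public static Area parse(Reader source) throws IOException {
        // Closing the BufferedReader (here via try-with-resources) also closes the wrapped Reader.
        try (BufferedReader in = new BufferedReader(source)) {
            String line = in.readLine();
            if (line == null) {
                throw new IOException("empty file");
            }
            Area area = new Area(line.trim());
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // tolerate blank lines
                }
                if (line.equals("END")) {
                    break; // end-of-area marker
                }
                int comma = line.indexOf(',');
                if (comma < 0) {
                    throw new IOException("expected 'lat,lon' but got: " + line);
                }
                double lat = Double.parseDouble(line.substring(0, comma));
                double lon = Double.parseDouble(line.substring(comma + 1));
                area.points.add(new double[] {lat, lon});
            }
            return area;
        }
    }

    public static void main(String[] args) throws IOException {
        String sample = "AREA_NAME_GOES_HERE\n"
                + "40.67281150817,22.93920917511\n"
                + "40.23754310607,22.93920917511\n"
                + "40.23754310607,22.50393657684\n"
                + "40.67281150817,22.50393657684\n"
                + "END\n";
        Area a = parse(new StringReader(sample));
        System.out.println(a.name + ": " + a.points.size() + " points");
    }
}
```

For real files you would pass `new FileReader(file)` instead of the `StringReader`; only the outer `BufferedReader` needs closing.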
Thanks for the suggestions and code, Sheriff. The leaner string and parsing code whittled a little off the time, down to an average of 124 seconds from 127. I'll have a go at multithreading. Should be interesting; I haven't done threads in Java yet, so it's a good chance to explore that area.
My first (crude) attempt at a multithreaded loader has shaved some more off the time: it's down to ~93 seconds from ~124 using 10 threads. Now that I understand the basics, I'm going to refactor my initial implementation to clean up the architecture and, hopefully, reduce the time further. I'll report back on that in a day or so.
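A minimal sketch of a loader that spreads the per-file work over a fixed pool of 10 threads. It assumes `java.util.concurrent` (Java 5+) and lambdas (Java 8+), neither of which exists on the JDK 1.2/1.3 VMs discussed here; on those you would start plain `Thread` objects that pull file names from a shared, synchronized list. `ParallelLoader`, `loadAll`, and `countLines` are hypothetical names, and `countLines` is only a stand-in for the real per-file parsing:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLoader {
    /** Placeholder per-file work: counts lines. Replace with the real parser. */
    static int countLines(File f) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(f))) {
            int n = 0;
            while (in.readLine() != null) {
                n++;
            }
            return n;
        }
    }

    /** Processes every regular file in dir using a fixed pool of nThreads workers. */
    public static int loadAll(File dir, int nThreads) throws Exception {
        File[] files = dir.listFiles();
        if (files == null) {
            throw new IOException("not a readable directory: " + dir);
        }
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (File f : files) {
                if (f.isFile()) {
                    Callable<Integer> task = () -> countLines(f);
                    results.add(pool.submit(task));
                }
            }
            int total = 0;
            for (Future<Integer> r : results) {
                total += r.get(); // a per-file IOException surfaces here as an ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        int lines = loadAll(dir, 10);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + ms + " ms");
    }
}
```

The idea is that while one thread waits for the OS to open a file, others can make progress; the best thread count for a given disk is something to measure rather than assume.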
So is your VM the Sun VM? I suspect a different VM could optimize the I/O a bit better. I suppose increasing the buffer size won't make much difference, since the files are already so small?
Uh, dunno. How can I tell? java -version just says:

    java version "1.2"
    Classic VM (build JDK-1.2-V, native threads)

Can you recommend where I can get a better VM? Yeah, a bigger buffer definitely seems unlikely to help; most files are just 6 lines long.
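Besides `java -version`, a program can report which VM it is running on through standard system properties; the `java.vm.*` keys were added in Java 1.2. A minimal sketch (the class name `WhichVm` is arbitrary):

```java
public class WhichVm {
    public static void main(String[] args) {
        // Standard system property keys describing the runtime and the VM implementation.
        String[] keys = {"java.version", "java.vendor", "java.vm.name", "java.vm.vendor", "java.vm.version"};
        for (String key : keys) {
            System.out.println(key + " = " + System.getProperty(key));
        }
    }
}
```

The `java.vm.vendor` line is the one that answers "is this the Sun VM?".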
I installed JDK 1.3 and that made a big difference. Between this change and the others (multiple threads, simpler string handling), the Java code now runs in the same time as the (unoptimized) C version. This is probably good enough for our purposes. Thanks for all the helpful suggestions!
I would guess that the C code uses neither two-byte characters nor String objects. For this kind of task, converting bytes into chars and creating multiple String objects both impose significant overheads. Once again (see the 'speed of Integer' thread), Java gives you the ability to get maximum speed, but to do so your code ends up looking very similar to the C code. It depends on whether speed is more important than good object-oriented design.
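To illustrate that C-like style, here is a sketch that parses coordinates straight from a `byte[]` without decoding to chars or creating any String objects. It assumes the ASCII format from the question (a name line, `lat,lon` lines of plain decimals, then `END`) and a caller-supplied `double[]` large enough for all points. `ByteAreaParser`, `parsePoints`, and `parseFile` are hypothetical names. The hand-rolled number parser handles only an optional `-`, digits, and one decimal point: no exponents, and it overflows past about 18 digits. `Files.readAllBytes` requires Java 7+.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class ByteAreaParser {
    private final byte[] buf;
    private int pos;

    private ByteAreaParser(byte[] buf) { this.buf = buf; }

    /** Skips spaces, tabs, CR and LF. */
    private void skipWhitespace() {
        while (pos < buf.length) {
            byte b = buf[pos];
            if (b != ' ' && b != '\t' && b != '\r' && b != '\n') break;
            pos++;
        }
    }

    /** Parses an optional '-', digits, and an optional '.' followed by more digits. */
    private double number() {
        boolean negative = false;
        if (pos < buf.length && buf[pos] == '-') { negative = true; pos++; }
        long mantissa = 0; // all digits, ignoring the decimal point
        long scale = 1;    // 10^(digits after the point)
        boolean afterPoint = false;
        while (pos < buf.length) {
            byte b = buf[pos];
            if (b >= '0' && b <= '9') {
                mantissa = mantissa * 10 + (b - '0');
                if (afterPoint) scale *= 10;
            } else if (b == '.' && !afterPoint) {
                afterPoint = true;
            } else {
                break;
            }
            pos++;
        }
        double value = (double) mantissa / scale;
        return negative ? -value : value;
    }

    /** Stores lat0, lon0, lat1, lon1, ... into out and returns the number of points. */
    public static int parsePoints(byte[] buf, double[] out) {
        ByteAreaParser p = new ByteAreaParser(buf);
        // Skip the name line without decoding it into a String.
        while (p.pos < buf.length && buf[p.pos] != '\n') p.pos++;
        int points = 0;
        while (true) {
            p.skipWhitespace();
            // Stop at end of data or at the END marker (checking only its first byte).
            if (p.pos >= buf.length || buf[p.pos] == 'E') break;
            out[2 * points] = p.number();
            if (p.pos >= buf.length || buf[p.pos] != ',') {
                throw new IllegalArgumentException("expected ',' at byte " + p.pos);
            }
            p.pos++; // step over the comma
            out[2 * points + 1] = p.number();
            points++;
        }
        return points;
    }

    /** Reads a whole file as raw bytes and parses it. */
    public static int parseFile(String path, double[] out) throws IOException {
        return parsePoints(Files.readAllBytes(Paths.get(path)), out);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = ("AREA_NAME_GOES_HERE\n"
                + "40.67281150817,22.93920917511\n"
                + "40.23754310607,22.50393657684\n"
                + "END\n").getBytes(StandardCharsets.US_ASCII);
        double[] coords = new double[64]; // can be reused across files
        int n = args.length > 0 ? parseFile(args[0], coords) : parsePoints(sample, coords);
        System.out.println(n + " points; first = " + coords[0] + "," + coords[1]);
    }
}
```

The trade-off is exactly the one described above: this avoids per-line String and char conversion, but it is more code, less general, and harder to maintain than the `BufferedReader` version.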