
Looking for an order-of-magnitude speed-up

 
Charles Knell
Greenhorn
Posts: 25
I am very new to Java, so if you are kind enough to respond, please don't assume that I know very much about this or that class.

I am working on a Pentium IV 3.2GHz computer with 1 GB of RAM, Win2K, JDK 1.5.

I have a task to read a file from disk and make a number of regular expression swaps. At first I read the file from the disk for each substitution, executed the regex swap, wrote the file to disk, and repeated for each subsequent regex.

The first pass reduces the file from 1.8MB to about 700K, then I put it through an XSLT transform that requires some post-processing. The post-processing involves three additional passes through the file.

The processing took an hour. I re-engineered the task by overloading the loadFile() method on the classes that did the regex swapping so that it would also take a string instead of its no-arg method that loaded the file from disk. Now, instead of saving to disk after each swap, I pass the string along by calling the mutator and accessor methods on the classes involved, storing the file contents to a String. After this bit of jiggery-pokery, I got a 50% improvement in processing time.

What I was really after was an order-of-magnitude improvement in processing time. Here is the relevant (so far as I can tell) part of my code. I crafted it in the time-honored fashion of plagiarizing some example code I found somewhere.

How can I re-engineer this to cut processing time to under five minutes, maximum? Thanks.


[ May 15, 2006: Message edited by: Jim Yingst ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I recommend you use code tags in the future for readability. I added them to your post above.

The biggest improvement I see, based on the short sample you showed, would be to get rid of the string concatenation (+=) in a loop, and append to a StringBuffer (or StringBuilder) instead:

The problem with string concatenation as it was used there is that each += creates a new String, which requires copying the entire content of the old string into the new one. This is typically fine for short strings, but with a 1.8 MB file, that's a lot of copying to be doing every time you add a line.

Note that readLine() does not include the line separator as part of the return value, so to reconstruct the original file content in a String, you'd need to put a newline back in. (There are additional complications I'm glossing over here.)
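A minimal sketch of the StringBuilder approach, assuming the goal is simply to get the whole file into one String. This is not necessarily the exact code from this post; the class name LineJoiner and the method readAll are hypothetical, and the choice of '\n' as the separator is an assumption (the original file may have used something else).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper; the names are illustrative, not from the thread.
class LineJoiner {
    // Reads every line from the source and joins them with '\n'.
    // Appending to a single StringBuilder avoids re-copying the
    // accumulated text on each line, which is what += on a String does.
    static String readAll(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder sb = new StringBuilder();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line separator, so put one back.
                sb.append(line).append('\n');
            }
        } finally {
            in.close();
        }
        return sb.toString();
    }
}
```

One of the complications: this version normalizes every line ending to '\n' and always ends the result with one, even if the input did not.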

Here's a further improved version (I hope):

Using a char[] buffer allows us to eliminate creating a bunch of intermediate String objects, which are unnecessary if we're going to put everything into one big String in the end. And a BufferedReader is unlikely to offer any further advantage, since we're doing our own buffering - probably.
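A minimal sketch of the char[] buffer idea, under the same assumption that everything ends up in one String. Again, this is an illustration rather than the exact code from this post: BulkReader and readAll are hypothetical names, and the 8192-char buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper; the names are illustrative, not from the thread.
class BulkReader {
    // Copies everything from the reader into one String, reading in
    // chunks through a reusable char[] instead of one String per line.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192]; // arbitrary size, an assumption
        try {
            int n;
            // read(char[]) returns the number of chars read, or -1 at end.
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } finally {
            in.close();
        }
        return sb.toString();
    }
}
```

It could be called as BulkReader.readAll(new FileReader("input.txt")), where input.txt is a placeholder file name. Unlike the readLine() approach, this keeps the original line separators exactly as they appear in the file.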

This assumes that putting everything in a single huge String is the best way for you to process this data. I don't know if that's the case; it depends what sort of processing you're doing. But this ought to be substantially faster than your first effort at least.
 
Charles Knell
Greenhorn
Posts: 25
Yes, thank you. That helped a great deal. That moves the bottleneck down the line to the regex swapping. That is taking far longer than I think it should, but I'm not sure how to approach it. I have this inchoate notion that a streaming solution with call-backs like a SAX parse would be the correct approach, but that's way beyond my Java skills. Which forum do you suggest I pursue this matter in?
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Hm, that's a bit of a toss-up, but I'll vote for Performance. Though Java in General - Intermediate would also work. We can move it later if it turns out that a different forum would be more appropriate.
[ May 16, 2006: Message edited by: Jim Yingst ]
 