Thank you for using code tags. However, the indentation in your code jumps around rather randomly, which makes it hard to read in places. I think this may be because you're using a mixture of tabs and spaces, and your editor's tab width is set differently from the one the web browser uses to display them. I recommend that you use either no tabs for indentation or only tabs.
I prefer no tabs, since by default a tab usually displays as 8 spaces, which is much more than you really need. Not everyone has a really big display; some people use laptops, so very wide lines aren't a good idea.
I see several empty catch blocks in your code. These are almost always a bad idea, because they silently swallow exceptions and make errors needlessly difficult to find when they occur. A simple alternative is to at least call printStackTrace() on the caught exception.
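As a rough illustration (the method name firstLine and the file name are hypothetical, not taken from your code), here is a catch block that reports the exception instead of swallowing it. If the file is missing, you get a stack trace on the console telling you exactly what failed and where, rather than silence:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class CatchExample {
    // Hypothetical helper: returns the first line of a file, or null on failure.
    static String firstLine(String path) {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            return in.readLine();
        } catch (IOException e) {
            // Instead of an empty block like "catch (IOException e) { }",
            // at least print what went wrong and where it happened.
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("no-such-file.txt"));
    }
}
```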
[Megha]: 1) The size of the output file I get is bigger than the input file. I don't know what the error is.
Is this true for both of your programs, or just the threaded version? Have you tried looking at the contents of the files to see how they compare? I realize you don't want to look through all of a 1.77 GB file, but try looking at least at the beginning of each one and see how similar they look. You may get important clues about what's going on.
It looks to me like your threaded code reads the first line of the file over and over. Presumably you want to read each line exactly once instead.
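I can't say for sure what causes it without stepping through your code, but one common way to end up re-reading the first line is to create a new BufferedReader (or reopen the file) inside the loop, so every new reader starts at the beginning again. Here is a minimal sketch of the usual pattern, using a hypothetical forEachLine helper: the reader is created once, and readLine() is called once per iteration with its result stored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ReadEachLineOnce {
    // Hypothetical helper: the caller creates the reader once, and readLine()
    // is called exactly once per loop iteration, so each call returns the
    // next line rather than the first one again.
    static int forEachLine(BufferedReader in, Consumer<String> action) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            action.accept(line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<String> seen = new ArrayList<>();
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\n"));
        int n = forEachLine(in, seen::add);
        System.out.println(n + " " + seen); // prints 3 [a, b, c]
    }
}
```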
As Steve says, you should really get the non-threaded version working correctly first, then see if you can improve its speed, before trying the threaded version. The non-threaded version should be a lot simpler.
To improve speed, a profiler would be a very useful tool here. But even if you don't have one, you can run a few simple tests just by commenting out a few lines of code:
1. How long does it take to read every line in the input file, and do nothing with them?
2. How long does it take to read every line in the input file and write it to the output file, with no character replacement?
3. How long does it take to do the same thing with character replacement?
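Those three tests can also be sketched as a single hypothetical CopyTimer class with a mode switch, instead of commenting lines in and out. The replace() method is only a placeholder for your real character replacement, and UTF-8 is assumed for both files:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

class CopyTimer {
    // The three experiments from the list above.
    enum Mode { READ_ONLY, COPY, COPY_AND_REPLACE }

    // Placeholder for the real character replacement (assumption: swap in your own).
    static String replace(String line) {
        return line.replace('a', 'b');
    }

    // Runs one experiment and returns the elapsed time in milliseconds.
    // Note: the output file is still created (and left empty) in READ_ONLY mode.
    static long run(File input, File output, Mode mode) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream(input), StandardCharsets.UTF_8));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream(output), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (mode == Mode.READ_ONLY) {
                    continue;             // test 1: read and discard
                }
                if (mode == Mode.COPY_AND_REPLACE) {
                    line = replace(line); // test 3: add the replacement
                }
                out.write(line);          // tests 2 and 3: write the line out
                out.newLine();            // platform line separator; may differ from the input's
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        File input = new File(args[0]);
        File output = new File(args[1]);
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + run(input, output, mode) + " ms");
        }
    }
}
```

Run it as `java CopyTimer input.txt output.txt`. The operating system's file cache can make a second pass over the same file much faster than the first, so it's worth repeating each test a few times before drawing conclusions.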
In this way, you can discover which parts of the process are worth speeding up and which are not. I can imagine some ways to speed up the character replacement, but in all likelihood they are unimportant, because you're probably spending almost all of the time reading and writing.
For the threaded version: if you want to limit how much data can be read in before it's written out, try using a LinkedBlockingQueue with a fixed capacity instead. You can experiment with the queue's capacity to see which size gives you the best performance, or whether it matters at all.
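Here is a minimal sketch of that idea, assuming one reader thread and with the calling thread doing the writing. The class name QueuedCopy and the EOF sentinel are my own inventions; a sentinel ("poison pill") is just one common way to tell the writer that the input has ended. Because put() blocks when the queue is full, the queue never holds more than capacity lines at once:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueuedCopy {
    // End-of-input marker. Compared with != (identity), so a real line that
    // happens to contain "<EOF>" is never mistaken for it.
    private static final String EOF = new String("<EOF>");

    static void copy(BufferedReader in, Writer out, int capacity)
            throws IOException, InterruptedException {
        // Bounded queue: put() blocks once 'capacity' lines are waiting.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(capacity);

        Thread reader = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) { // each line is read once
                    queue.put(line);
                }
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            } finally {
                try {
                    queue.put(EOF); // always tell the writer to stop
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        reader.start();

        // The calling thread acts as the writer.
        BufferedWriter writer = new BufferedWriter(out);
        String next;
        while ((next = queue.take()) != EOF) {
            writer.write(next);
            writer.newLine();
        }
        writer.flush();
        reader.join();
    }
}
```

The capacity argument is the knob to experiment with; try a few values on your real file and compare the timings.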
If you print "reading from queue" and "writing from queue" every time you read or write a line, that's going to slow things down quite a bit. That's fine for debugging while you're trying to get this to work, but I hope you comment those lines out later when you want it to be fast.
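If commenting those lines in and out gets tedious, one alternative is to guard the debug output with a compile-time constant. DEBUG and drain() below are hypothetical names used only for illustration; when DEBUG is false, the compiler may leave the guarded println calls out of the class file entirely, so there is no per-line cost:

```java
import java.util.Queue;

class DebugLogging {
    // Compile-time constant: flip to true while debugging. With false, the
    // guarded println below is never executed and may be omitted by javac.
    static final boolean DEBUG = false;

    // Hypothetical consumer loop: takes lines off a queue and appends them.
    static int drain(Queue<String> queue, StringBuilder out) {
        int count = 0;
        String line;
        while ((line = queue.poll()) != null) {
            if (DEBUG) {
                System.out.println("reading from queue: " + line);
            }
            out.append(line).append('\n');
            count++;
        }
        return count;
    }
}
```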
[ June 06, 2007: Message edited by: Jim Yingst ]