
Performance issue on parsing large text file

 
Serkan Demir
Ranch Hand
Posts: 61
Hi everybody,
I need to parse large text files periodically and perform some operations on them. The algorithm is simple: it reads one file line by line and compares each line against the lines of another text file (somewhat similar to the Unix diff operation). I have implemented it in Java, but the performance is not acceptable.
I have heard that some scripting languages (such as Jython and Perl) are well suited to text processing. Based on your experience, can anyone estimate how much performance I would gain by switching my implementation from Java to such a scripting language?

Thanks a lot,
Serkan Demir
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
I'd be more inclined to dig into your algorithm for speed opportunities, and then double check the actual code for something like forgetting to buffer the input. Can you share those bits with us?
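To illustrate the kind of algorithmic change meant here, below is a minimal sketch. It assumes the comparison is "which lines of the first file never appear in the second", which the original post does not actually state. The class name `LineSetDiff` and the method `linesOnlyInFirst` are made up for this example. It loads one file into a hash set so each lookup is constant time on average, instead of rescanning the second file for every line. It also uses buffered readers, try-with-resources, generics and `java.nio.file`, all of which are newer than Java 1.4, so on 1.4 you would need `FileReader` wrapped in `BufferedReader` with try/finally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LineSetDiff {

    // Hypothetical helper: returns the lines of 'first' that do not occur
    // anywhere in 'second', in the order they appear in 'first'.
    static List<String> linesOnlyInFirst(Path first, Path second) throws IOException {
        // Load the second file once into a set for fast membership tests.
        Set<String> seen = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(second, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                seen.add(line);
            }
        }

        // Stream the first file and keep only the lines not found in the set.
        List<String> missing = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(first, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!seen.contains(line)) {
                    missing.add(line);
                }
            }
        }
        return missing;
    }
}
```

The trade-off is memory: the whole second file has to fit in the heap. If it does not, sorting both files first and merging them is the usual alternative.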
 
Joe Ess
Bartender
Posts: 9361
Just for the record, Jython is an implementation of Python, a scripting language, that runs on the Java platform. It is not a separate language.
If you are using Readers, you can gain considerable performance by using Streams instead. Readers decode all the bytes they read into characters. If you are using BufferedReader.readLine(), you are also creating a String instance for each line. The bigger the file, the more short-lived objects are created and the worse the performance. See if comparing bytes or groups of bytes (i.e. a line) makes sense for your requirements.
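As a rough sketch of what comparing bytes could look like, the example below walks two files through buffered input streams and reports the offset of the first differing byte. The class name `ByteCompare` and the method `firstMismatch` are invented for this example, and it assumes you only need to know where the files diverge, not a full diff. No character decoding happens and no String is created per line. It is written with try-with-resources and `java.nio.file`, so on Java 1.4 you would open `FileInputStream` objects and close them in a finally block.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteCompare {

    // Hypothetical helper: returns the zero-based offset of the first byte
    // that differs between the two files, or -1 if they are identical.
    // If one file is a prefix of the other, the length of the shorter
    // file is returned.
    static long firstMismatch(Path a, Path b) throws IOException {
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            long position = 0;
            while (true) {
                int byteA = inA.read(); // -1 signals end of stream
                int byteB = inB.read();
                if (byteA != byteB) {
                    return position;    // also covers one file ending early
                }
                if (byteA == -1) {
                    return -1;          // both streams ended together
                }
                position++;
            }
        }
    }
}
```

Whether this is actually faster than a Reader-based version depends on the data and the JVM, so it is worth measuring both on your real files.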
 
ak pillai
author
Ranch Hand
Posts: 288
Use the New I/O (NIO) API, which adds non-blocking I/O, for better scalability and performance.

Java was long poorly suited to programs that perform a lot of I/O. Furthermore, commonly needed features such as file locking, non-blocking I/O and the ability to map a file into memory were not available. Non-blocking I/O had to be achieved through workarounds such as multithreading or JNI. The New I/O API (NIO) in J2SE 1.4 changed this situation.

NIO Buffers hold data. NIO Channels fill and drain Buffers. Buffers remove the need to do your own buffer management with byte arrays. There are different Buffer types, such as ByteBuffer, CharBuffer and DoubleBuffer.
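A small sketch of the fill-and-drain cycle described above: a FileChannel fills a ByteBuffer, the buffer is flipped and drained, then cleared for the next read. The class name `ChannelLineCount`, the method `countNewlines` and the 64 KB buffer size are arbitrary choices for this example. Counting newline bytes is just a stand-in for whatever per-byte work you need. `FileChannel.open` is a Java 7 API; on 1.4 the channel would come from `new FileInputStream(file).getChannel()`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelLineCount {

    // Hypothetical helper: counts '\n' bytes in a file using a channel
    // and a reusable buffer.
    static long countNewlines(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // arbitrary size
        long count = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The channel fills the buffer; read() returns -1 at end of file.
            while (channel.read(buffer) != -1) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        count++;
                    }
                }
                buffer.clear();                // make the buffer ready to fill again
            }
        }
        return count;
    }
}
```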
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24213
Originally posted by ak pillai:
Use the New I/O (NIO) ... for ... better performance.


Do be aware that all the built-in Stream implementations have been rewritten on top of NIO. It's quite rare that you can rewrite using NIO and see performance gains unless you also significantly change your algorithm in the process.
 
Serkan Demir
Ranch Hand
Posts: 61
I used BufferedReaders and BufferedWriters for the I/O and kept the buffer size reasonably large. As an optimization, I also accumulated all the output strings in a StringBuffer and wrote them to the file in a single call. My loops do not create many garbage objects.
I think I have optimized as far as the Java 1.4 SDK permits (unfortunately we are not able to use 1.5).
Since my operation is an offline process, instead of using Java I am now using the Unix diff command and working from its report.
Thanks a lot, everybody.
 