
String - memory prob on Linux

 
Ben Wood
Ranch Hand
Posts: 342
Hi,

I have a batch-job style application that converts very large text files containing grids of data into a basic binary version. The program is very simple: in a loop it reads a line from an ASCII file, parses it (just space-delimited data), and writes the numbers out to a binary file. This works fine on my Win 2K PC, but we have recently been trying to run these jobs on RH Linux (with a view to running the larger jobs on a Linux cluster). The problem is that it's hammering the memory: as the loop continues, the memory gets used up, which doesn't happen on the Windows PC. The String variable I read each line of the text file into is declared outside the body of the loop, so I think that at the end of each iteration the line that was just read should become eligible for garbage collection; this seems to work everywhere except on Linux.

Has anyone come across memory leak problems with the Java VM on Linux before?
[ August 02, 2004: Message edited by: Ben Wood ]
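
For readers following along, here is a minimal sketch of the kind of loop described above. It is not the poster's code: the class name GridConverter, the choice of double as the number type, and the use of StringTokenizer with a DataOutputStream are all assumptions made for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.util.StringTokenizer;

// Hypothetical class; a sketch of a line-by-line ASCII-to-binary converter.
class GridConverter {

    // Reads whitespace-delimited numbers one line at a time and writes each
    // one as an 8-byte binary double. Returns the number of values written.
    static long convert(Reader asciiIn, OutputStream rawOut) throws IOException {
        long count = 0;
        BufferedReader in = new BufferedReader(asciiIn);
        // The buffer here has a fixed size, so it does not grow with the input.
        DataOutputStream binaryOut =
                new DataOutputStream(new BufferedOutputStream(rawOut));
        String line;
        while ((line = in.readLine()) != null) {
            // Each line and each token is a short-lived object; once the next
            // line is read, nothing refers to the previous ones any more.
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                binaryOut.writeDouble(Double.parseDouble(st.nextToken()));
                count++;
            }
        }
        binaryOut.flush();
        return count;
    }

    // Usage: java GridConverter input.txt output.bin
    public static void main(String[] args) throws IOException {
        try (Reader in = new FileReader(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            System.out.println(convert(in, out) + " values written");
        }
    }
}
```

Written this way, the loop only ever holds one line and its tokens at a time, so heap use should stay roughly flat regardless of file size.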
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
First, make sure you know what version of Java you're running. Although Sun's JDK is most commonly used on Linux, it's not free software, so many Linux systems are set up to supply the "Kaffe" JVM out of the box. Check for this: try "java -version" and see what you get. If it's Kaffe, do yourself a favor: uninstall it, download and install Sun's JVM, and try again. If you're still having the problem, come back and we'll work on this some more.
 
Ben Wood
Ranch Hand
Posts: 342
We're running...

java version "1.5.0-beta2"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta2-b51)
Java HotSpot(TM) Client VM (build 1.5.0-beta2-b51, mixed mode, sharing)

I assumed we were running 1.4. I wonder if the fact that it's a 1.5 beta is causing problems? I've included some basic code below to demonstrate what the loop does, in case anything jumps out of the page at you guys...

 
Fletcher Estes
Ranch Hand
Posts: 108
I'd be interested to see what the solution to this problem is... You're creating 2 objects per line read from your input file, both of which become eligible for garbage collection in the next iteration of the loop. I don't know if st.nextToken() creates a temporary String object each time or returns a reference to an existing object. How big are your input files (in lines and disk size)?

The JVM should run garbage collection when the amount of memory used by the gc heap reaches a threshold. This threshold is configurable through command line parameters (type 'java -X' for more info). Try having a play around with this, or try using System.gc() at regular intervals to see what the effect is.

Also, are there significant hardware differences between your Windows and Linux boxes?
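
One way to see whether the Java heap itself is growing, as opposed to the process size reported by the operating system, is to log heap figures from inside the loop. The following is only a sketch: the class name HeapProbe, the reporting interval, and the 20,000-character dummy line are assumptions for illustration. The Runtime methods are standard, and System.gc() is only a request that the JVM may ignore.

```java
// Hypothetical helper; logs heap usage while churning through short-lived Strings.
class HeapProbe {

    // Bytes of the Java heap currently occupied by objects (live or not yet collected).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One-line summary: used heap, heap currently reserved, and the configured maximum.
    static String report(long linesRead) {
        Runtime rt = Runtime.getRuntime();
        return "lines=" + linesRead
                + " usedKB=" + (usedHeapBytes() / 1024)
                + " totalKB=" + (rt.totalMemory() / 1024)
                + " maxKB=" + (rt.maxMemory() / 1024);
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (long i = 1; i <= 100_000; i++) {
            // Stand-in for one long input line; garbage after this iteration.
            String line = new String(new char[20_000]);
            checksum += line.length();
            if (i % 10_000 == 0) {
                System.gc(); // a hint to the collector, not a guarantee
                System.out.println(report(i));
            }
        }
        System.out.println("chars processed: " + checksum);
    }
}
```

Running it with a small limit, for example `java -Xmx64m HeapProbe`, and adding `-verbose:gc` shows whether collections are keeping the used figure flat. If usedKB stays flat while the process size in top keeps climbing, the growth is probably outside the garbage-collected heap.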
 
Ben Wood
Ranch Hand
Posts: 342
Yes, everything should be on the GC heap as far as I can tell at the end of each loop iteration. It seems strange that this is only a Linux problem. We have now tested on JVM 1.4 and the same thing happens. Have tried setting things to null within the loop and also putting System.gc() in there. Memory options have been configured. Still no cigar...

The Linux box is a moderately spec'd PC (I think 500MB RAM) and should be capable of doing the work. Similarly spec'd Win PCs run the job fine. The hit on Windows is on the CPU (as would be expected), but memory usage is very small, so even low-spec boxes should have no trouble plodding through this loop.

The ASCII input files are big. About 2GB, probably something like 100,000 lines.
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24217
What kind of an object is "binaryOut"? Is it possible that there's some out-of-control buffering going on?
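
To illustrate why the type of the output stream matters, here is a sketch contrasting two ways a DataOutputStream might be set up. Nothing here is taken from the poster's code: the class BufferingDemo, the CountingSink stand-in for a file, and the 8192-byte buffer size are assumptions. In the first method the buffer has a fixed size and bytes are passed on as it fills; in the second, every byte written stays in memory until the stream is discarded.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical demo of bounded versus unbounded output buffering.
class BufferingDemo {

    // Discards everything written to it but counts the bytes; stands in for a file.
    static class CountingSink extends OutputStream {
        long bytes = 0;
        @Override public void write(int b) { bytes++; }
        @Override public void write(byte[] b, int off, int len) { bytes += len; }
    }

    // Fixed 8 KB buffer: memory use is constant however many values are written.
    static long writeThroughFixedBuffer(int values, CountingSink sink) throws IOException {
        DataOutputStream binaryOut =
                new DataOutputStream(new BufferedOutputStream(sink, 8192));
        for (int i = 0; i < values; i++) {
            binaryOut.writeDouble(i);
        }
        binaryOut.flush();
        return sink.bytes; // bytes actually forwarded to the sink
    }

    // In-memory stream: the backing array grows with every value written.
    static int writeIntoMemory(int values) throws IOException {
        ByteArrayOutputStream held = new ByteArrayOutputStream();
        DataOutputStream binaryOut = new DataOutputStream(held);
        for (int i = 0; i < values; i++) {
            binaryOut.writeDouble(i);
        }
        binaryOut.flush();
        return held.size(); // bytes still held on the heap
    }

    public static void main(String[] args) throws IOException {
        System.out.println("forwarded: " + writeThroughFixedBuffer(1000, new CountingSink()));
        System.out.println("held in memory: " + writeIntoMemory(1000));
    }
}
```

If the real binaryOut wrapped something like the second case, memory would grow with the size of the output regardless of how the Strings in the loop are handled.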
 
Ben Wood
Ranch Hand
Posts: 342
Apologies, I should have clarified that. Here are some more bits of code from outside the loop that should clear it up...

 