
Reading Large File Heap

 
Markus Schmider
Ranch Hand
Posts: 132
1
Hello,

I have to read large log files


The code works, but when I profile it I see large numbers of String and char[] instances accumulating in the heap. They do not disappear after a forced GC.
How can I avoid this?
 
fred rosenberger
lowercase baba
Bartender
Posts: 12266
36
Chrome Java Linux
There is no such thing as a "forced GC". All you can do is suggest to the JVM that "now might be a good time to do this", but it can ignore your suggestion.

Is there an actual problem you are having here? Are you running out of memory? Is it running slower? Unless you have some documented issue and you know memory is the cause, I wouldn't worry about it.
 
Markus Schmider
Ranch Hand
Posts: 132
1
Well I can force a GC from my monitoring tool. Ok request might be the better word.
And you can call System.gc(), which of course I never do :-)

The heap can become a problem in this use case, with large log files (debug mode) and insufficient RAM on the client.
 
fred rosenberger
lowercase baba
Bartender
Posts: 12266
36
Chrome Java Linux
Markus Schmider wrote: Well I can force a GC from my monitoring tool. Ok request might be the better word.
And you can call System.gc(), which of course I never do :-)

Didn't know that about the monitoring tool. Interesting. And System.gc() is the suggestion to the JVM that you can put in your code, but it does not force a collection.
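For anyone who wants to see the "request, not command" behaviour for themselves, here is a minimal sketch. The class name GcHintDemo and the helper usedHeapBytes() are made up for illustration. The numbers it prints depend on the JVM, the collector, and the flags in use, so treat them as a rough indication only.

```java
// Hypothetical demo class: compares used heap before and after a GC request.
class GcHintDemo {

    // Approximate used heap: memory currently reserved by the JVM minus the free part of it.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        System.gc(); // only a request; the JVM is allowed to ignore it
        long after = usedHeapBytes();
        System.out.println("Used heap before request: " + before + " bytes");
        System.out.println("Used heap after request:  " + after + " bytes");
    }
}
```

Objects that are still reachable (for example, everything referenced by a live StringBuffer) stay in the heap no matter how often a collection is requested.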
 
Tim Cooke
Sheriff
Pie
Posts: 3210
142
Clojure IntelliJ IDE Java
Here's my analogy for System.gc();

Code: "Hey GC, I got some stuff over here that I'd like cleaned up please"
GC: "I hear ya. I'm kinda busy right now but I'll add it to the ToDo list and I might get to it later."

Doing a 'forced GC' from your profiler tool is hardly a sustainable strategy either, so I'd forget about both of those right away.

I assume that there is more to your code than that which you have presented, as what you have presented will not even compile.

From what you have shown, I can see that you are adding a String object to the StringBuffer for each line of the file being read in. If that file has many many lines, then you are creating many many String objects. Can you see from your profiler where your many String objects are residing? I suspect they belong to the StringBuffer.
 
Terry McKee
Ranch Hand
Posts: 175
Since you are using the default StringBuffer constructor, it starts with space for 16 characters. As you add lines from the log file to the buffer, it has to create new, larger backing character arrays and copy the old contents over. This causes the JVM to waste a lot of memory, since the old backing arrays may not be gc'd right away. One way to optimize your code is to preallocate a larger capacity for the StringBuffer. For instance, new StringBuffer(100000) creates an initial backing character array that can hold 100,000 characters. You may also need to increase the max heap size that you start your process with so you don't run into an OutOfMemoryError.

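As a side note, you may want to switch to a StringBuilder if the buffer is only ever used by one thread. StringBuilder has the same API but skips the synchronization that StringBuffer does on every call.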
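To make the resizing visible, here is a small sketch that counts how often the capacity of a StringBuffer changes while lines are appended. The class name BufferCapacityDemo, the sample log line, and the line count are invented for the example. The original poster's code is not shown in the thread, so this is not a copy of it.

```java
// Hypothetical demo class: compares a default-sized and a preallocated StringBuffer.
class BufferCapacityDemo {

    // Capacity of a StringBuffer created with the no-argument constructor.
    static int defaultCapacity() {
        return new StringBuffer().capacity();
    }

    // Appends `text` to `sb` `times` times and counts how often the capacity changed.
    // Each change means a new, larger backing array was allocated and the old one copied.
    static int countReallocations(StringBuffer sb, String text, int times) {
        int reallocations = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < times; i++) {
            sb.append(text);
            if (sb.capacity() != lastCapacity) {
                reallocations++;
                lastCapacity = sb.capacity();
            }
        }
        return reallocations;
    }

    public static void main(String[] args) {
        String line = "12:00:00 DEBUG sample log line\n"; // made-up log line
        System.out.println("Default capacity: " + defaultCapacity());
        System.out.println("Reallocations, default buffer: "
                + countReallocations(new StringBuffer(), line, 1000));
        System.out.println("Reallocations, preallocated buffer: "
                + countReallocations(new StringBuffer(100000), line, 1000));
    }
}
```

The preallocated buffer never has to grow here because 1,000 short lines fit comfortably into 100,000 characters. For a real log file the right capacity depends on the file size.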
 
Paul Clapham
Sheriff
Posts: 21581
33
Eclipse IDE Firefox Browser MySQL Database
  • Likes 2
You could try creating the StringBuilder with the correct size to start with -- you know the size of the file in bytes, so that's a good approximation. Then you could write code which just reads characters from the file in groups and appends them to the StringBuilder. Unless you really need to standardize the line-ending characters to be whatever your NL variable contains, that is. You didn't state that as one of your requirements, so I can't tell if it really is a requirement or just an artifact caused by your choice of using the readLine() method.
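A rough sketch of that idea, assuming Java 7 or later and that the caller knows the file's character encoding. The class name PresizedFileReader, the method readAll, and the 8192-character chunk size are my own choices for illustration. The file size in bytes is only an upper estimate of the character count, and line endings are kept exactly as they appear in the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads a whole file into a StringBuilder sized from the file length.
class PresizedFileReader {

    static StringBuilder readAll(Path file, Charset charset) throws IOException {
        // File size in bytes as the initial capacity, capped to a value an array can hold.
        long sizeInBytes = Files.size(file);
        int capacity = (int) Math.min(sizeInBytes, Integer.MAX_VALUE - 8);
        StringBuilder sb = new StringBuilder(capacity);

        char[] chunk = new char[8192]; // reused for every read, so no per-line Strings
        try (Reader reader = Files.newBufferedReader(file, charset)) {
            int charsRead;
            while ((charsRead = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, charsRead);
            }
        }
        return sb;
    }
}
```

Returning the StringBuilder rather than calling toString() avoids making a second full copy of the contents. Whether that is acceptable depends on what the rest of the program needs.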

Or alternatively you could search for a solution which doesn't require storing the contents of the whole file in memory all at once. You didn't say why you needed to do that, either; perhaps you don't really need to.
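As an illustration of the streaming alternative, the sketch below handles one line at a time and keeps nothing but a counter. The task (counting lines that contain a given text) and the names LogLineCounter and countLinesContaining are invented, since the thread does not say what the log contents are actually used for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: processes a log file line by line without storing it.
class LogLineCounter {

    static long countLinesContaining(Path file, Charset charset, String needle) throws IOException {
        long matches = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Each line becomes unreachable after this iteration, so the GC can reclaim it.
                if (line.contains(needle)) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```

With this shape, heap usage is bounded by the longest line rather than by the size of the file.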
 