String Heap is killing me

 
Ricky Gentry
Greenhorn
Posts: 12
I can already guess the answer because of my bad luck, but is there a way to force the VM to purge the string heap?

I'm working with a large RTF file (~300kb; about 23,300,000 strings of 100 characters each). I'm reading it into a StyledDocument through an RTFEditorKit, parsing chunks of it, and passing it back out through the kit into another RTF file. I'm using char[]'s where possible, but I'm thinking that it's a failed attempt. And due to the nature of the program, each string exists twice: once on its own and once appended to another string that comes after it, bringing the count to ~47 million 100-character strings.

I've allocated 512 MB of memory, but the program starts crawling well before it reaches that limit, which makes it useless. I need a way to purge the heap, or to get around it by making a new one. Is it possible to launch a new instance of the virtual machine so I can process this in chunks and recombine it later?

Any help at all will be greatly appreciated. And Happy Holidays.
 
Scott Selikoff
author
Bartender
Posts: 4033
Two questions. First, have you tried intermittently requesting garbage collection? That isn't guaranteed to help, but it might. Second, are the strings still accessible in memory? The system won't garbage collect strings that are still accessible (if it will gc them at all).

I'd recommend rethinking the design, or splitting up the work so that you can distribute it between two JVMs.
 
Paul Clapham
Sheriff
Posts: 21892
(~300kb; about 23,300,000 strings of 100 characters each)
That looks like somewhat over 4 gigabytes to me. (Remember Java characters are 16-bit things.) So how does that number relate to 300kb?

And why 100-character strings? That makes it sound like you have database records which allow strings up to a maximum of 100 characters and you're caching them all in memory.
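
The arithmetic behind that estimate can be checked with a short sketch. It assumes 2 bytes per character and counts only the raw character data, ignoring per-object overhead, so it is a lower bound at best. The class and method names are made up for illustration.

```java
// Hypothetical helper: estimates raw character storage only (no object headers).
final class StringMemoryEstimate {
    // A Java char is a 16-bit value, i.e. 2 bytes.
    static final int BYTES_PER_CHAR = 2;

    static long rawCharBytes(long stringCount, int charsPerString) {
        return stringCount * charsPerString * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // Figures taken from the original post.
        long bytes = rawCharBytes(23_300_000L, 100);
        System.out.println(bytes + " bytes"); // prints "4660000000 bytes"
    }
}
```

Doubling the count to the ~47 million strings mentioned in the original post doubles the figure, which is far beyond a 512 MB heap either way.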
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Do you know for sure that GC is the issue here? Try running java with the -verbose:gc option to get some output telling you about GC operations - in particular you can see how often GC is running, and how long it's taking. This can help you confirm or eliminate GC as the source of slowness. The other thing to do, of course, is run your program with a profiler.
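
Alongside -verbose:gc, a program can also report cumulative GC activity from inside the JVM through the standard java.lang.management API. The following is only a minimal sketch and not a substitute for a profiler; the class name GcReport and its method are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcReport {
    // Sums the time (in ms) the JVM reports having spent in all collectors so far.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Total GC time: " + totalGcMillis() + " ms");
    }
}
```

Printing these numbers before and after the slow part of the program gives a rough idea of how much of the elapsed time went to garbage collection.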
 
Bridget Kennedy
Ranch Hand
Posts: 86
Regarding:

I'm using char[]'s where possible, but I'm thinking that it's a failed attempt.


Keep in mind that char data does not benefit from built in Java String cache memory management. That is to say, Java will only create one copy of new String( "a" ), but it will allocate an infinite number of char 'a' instances.
 
Ken Blair
Ranch Hand
Posts: 1078
Originally posted by Bridget Kennedy:
Keep in mind that char data does not benefit from built in Java String cache memory management. That is to say, Java will only create one copy of new String( "a" ), but it will allocate an infinite number of char 'a' instances.


Java will allocate a new String every time you invoke new String("a") too. I believe you are thinking of the constant pool, which doesn't apply when you construct a new String explicitly. One instance will be used everywhere "a" is used as a constant, but if "a" is passed as a parameter to a String constructor, a new String will still be created. For example, "a" == new String("a") evaluates to false, whereas "a" == "a" evaluates to true.
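
The difference between a pooled literal and an explicitly constructed String can be seen in a few lines. This is a small illustrative sketch; the class and method names are made up.

```java
final class StringIdentityDemo {
    // Two identical literals refer to the same pooled instance.
    static boolean literalEqualsLiteral() {
        String first = "a";
        String second = "a";
        return first == second;
    }

    // The constructor always allocates a fresh object, even for a pooled literal.
    static boolean literalEqualsNew() {
        String literal = "a";
        String constructed = new String("a");
        return literal == constructed;
    }

    public static void main(String[] args) {
        System.out.println(literalEqualsLiteral());          // prints "true"
        System.out.println(literalEqualsNew());              // prints "false"
        System.out.println("a".equals(new String("a")));     // prints "true"
    }
}
```

Note that == compares object identity here; equals() is still the right way to compare String contents.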
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13078
But note that your program can force Java to use the String pool with the intern() method.
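
As a small sketch of what intern() does (class and method names are made up for illustration): a String built at run time is a separate object, but its interned form is the pooled instance.

```java
final class InternDemo {
    // A String assembled at run time is not the pooled "abc" instance.
    static boolean builtIsPooled() {
        String built = new StringBuilder("ab").append("c").toString();
        return built == "abc";
    }

    // intern() returns the canonical pooled instance with the same contents.
    static boolean internedIsPooled() {
        String built = new StringBuilder("ab").append("c").toString();
        return built.intern() == "abc";
    }

    public static void main(String[] args) {
        System.out.println(builtIsPooled());     // prints "false"
        System.out.println(internedIsPooled());  // prints "true"
    }
}
```

Interning only saves memory when the same contents occur many times; for millions of distinct strings it adds pool entries without removing any duplicates.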

To return to the original problem, how random are the contents of these 100 character strings? If they are names and addresses drawn from a limited dictionary, you might be able to tokenize the data to a more compact form.
bill
 
u johansson
Ranch Hand
Posts: 47
There's no special heap for Strings. The heap is for all objects. You should consider how much you have to keep in memory at all times.
 
Rick O'Shay
Ranch Hand
Posts: 531
Do not call the GC directly: that is guaranteed to be of no help whatsoever. The problem is that you are running out of memory, so you will have to read and process your data in chunks. Memory-mapped files help in this regard. This is a fundamental programming problem, unrelated to Java per se.
[ December 27, 2005: Message edited by: Rick O'Shay ]
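
A rough sketch of the chunked approach, using plain Reader/Writer streaming rather than memory-mapped files: one fixed-size buffer is reused, so memory use stays constant regardless of input size. The class and method names are made up, the per-chunk work is a placeholder, and a real RTF document would need chunk boundaries that respect its structure, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class ChunkedProcessor {
    // Reads at most chunkSize chars at a time and reuses one buffer throughout.
    static long processInChunks(Reader in, Writer out, int chunkSize) throws IOException {
        char[] buffer = new char[chunkSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer, 0, buffer.length)) != -1) {
            // Placeholder for the real per-chunk work (parsing, rewriting, ...).
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    // Convenience wrapper for in-memory input, used here only for demonstration.
    static String processString(String input, int chunkSize) {
        StringWriter out = new StringWriter();
        try {
            processInChunks(new StringReader(input), out, chunkSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(processString("some long document text", 8));
    }
}
```

With files, the same method would be called with a FileReader and FileWriter (or their buffered equivalents) in place of the in-memory streams.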
 
uj johansson
Greenhorn
Posts: 23
Originally posted by Rick O'Shay:
Do not call the GC directly: that is guaranteed to be of no help whatsoever.


Exactly. It's especially important that newcomers to Java understand this. The Java garbage collector is so advanced that you're better off taking no special notice of it at all.

Just let go of objects when you're finished with them. The one thing you should watch for is the special termination methods of some classes, namely those that allocate operating system resources. You need to call the termination method before you let go of that kind of object. Otherwise you have a true leak.
[ December 31, 2005: Message edited by: uj johansson ]
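
As an illustration of calling the termination method reliably, here is a sketch using try-with-resources (Java 7 and later; at the time of this thread the equivalent was a try/finally block that calls close()). TrackedResource is a made-up stand-in for a class that holds an operating system resource such as a file or socket.

```java
final class CloseDemo {
    // Hypothetical stand-in for a class that holds an OS resource.
    static final class TrackedResource implements AutoCloseable {
        boolean closed = false;

        String read() {
            if (closed) {
                throw new IllegalStateException("already closed");
            }
            return "data";
        }

        @Override
        public void close() {
            closed = true; // a real class would release its file handle, socket, etc. here
        }
    }

    // Uses the resource and reports whether close() had been called afterwards.
    static boolean closedAfterUse() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.read();
        } // close() runs here, even if read() had thrown
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse()); // prints "true"
    }
}
```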
 