HashMap value stores complete string

 
Ranch Hand
Posts: 68
Netbeans IDE Firefox Browser Java
During analysis of a heap dump I found that the value in my main HashMap is not a substring as put there, but the complete string and an index pointing at the substring. Is it smart enough to reference the same string or does it clone the string, which would result in excess copies of the same string in memory?

I know from my analysis that the HashMap in question takes up over 15 MB of the heap, but a similar thing happens with the key and it comes from a different string for every different value (approx. 85 different values). By my calculations it should contain less than 5 MB of data in keys and values so where does the remaining 10 MB come from?
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu

Philip Grove wrote:During analysis of a heap dump I found that the value in my main HashMap is not a substring as put there, but the complete string and an index pointing at the substring. Is it smart enough to reference the same string or does it clone the string, which would result in excess copies of the same string in memory?


Actually, your question has little to do with HashMap and more to do with: Is the result of a substring() a reference to the same String; and the answer is: not quite.

A substring is a separate String object, but (and I'm almost certain of this, but I'm happy to be corrected if anyone knows better) it shares the character array of the original String. Thus, it will take whatever space overhead is associated with an object (≈16 bytes I think), plus internal indexes (2 or 3 ints; I forget), plus the reference of the array itself (4/8 bytes).

HIH

Winston

PS: It's also worth noting that a Java char takes two bytes, not one. Not sure if you took that into account in your calculations.
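
To put rough numbers on the estimate above, here is a minimal sketch. The class name and every constant are assumptions taken from the guesses in this post (a 16-byte object overhead, three int fields, an 8-byte array reference, two bytes per char); they are not measured values and vary by JVM and settings. The array's own header is ignored.

```java
/** Back-of-envelope sketch only; all sizes are assumptions, not measurements. */
class StringFootprintSketch {
    static final int OBJECT_OVERHEAD = 16; // assumed per-object overhead
    static final int INT_FIELDS = 3 * 4;   // assumed: offset, count, cached hash
    static final int ARRAY_REF = 8;        // assumed 8-byte reference (could be 4)
    static final int BYTES_PER_CHAR = 2;   // a Java char is 16 bits

    /** Estimated size of the String object itself, without its char array. */
    static long stringObjectBytes() {
        return OBJECT_OVERHEAD + INT_FIELDS + ARRAY_REF;
    }

    /** Estimated bytes kept alive by one String whose backing array has the given length. */
    static long retainedBytes(int backingArrayLength) {
        return stringObjectBytes() + (long) backingArrayLength * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        // A 10-character substring that shares a 1000-character line's array
        // versus the same substring with its own 10-character array.
        System.out.println("shared array: " + retainedBytes(1000) + " bytes");
        System.out.println("own array:    " + retainedBytes(10) + " bytes");
    }
}
```

If each map value pins a different large line in memory, a gap between the expected and observed heap usage is plausible, though only the heap dump can confirm which arrays are actually retained.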
 
Author and all-around good cowpoke
Posts: 13078
6
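I think Winston got it - now I recall hitting the same problem. In the Java 6 source, substring() copies no characters: after its bounds checks it returns (for anything shorter than the whole string) a new String built by a package-private constructor that is handed the new offset, the new count and the existing "value" array.

Note that the "value" here is the existing array. The constructor simply stores the three arguments,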
so the new String object keeps a reference to the big array it was derived from!
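
The sharing described here can be sketched with a small stand-in class. This is not the JDK source: `SharedString` and its members are hypothetical names that only mimic the Java 6 layout of a char array plus an offset and a count. From Java 7 update 6 onward, String.substring() copies the characters instead, so this models the older behaviour only.

```java
/** Hypothetical stand-in for the Java 6 String layout; not the JDK source. */
final class SharedString {
    final char[] value; // backing array, possibly shared with a parent string
    final int offset;   // index of this string's first character in value
    final int count;    // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;   // no copy: the array reference is stored as-is
        this.offset = offset;
        this.count = count;
    }

    /** Java 6 style substring: reuse the parent's array, adjust offset and count. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString("a fairly long line of text");
        SharedString sub = big.substring(2, 8);
        System.out.println(sub + " shares the parent's array: " + (sub.value == big.value));
        System.out.println("characters kept alive: " + sub.value.length);
    }
}
```

As long as `sub` is reachable, the whole parent array stays reachable too, even if `big` itself is dropped.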
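However, note that the public String(String original) constructor checks for this situation: when the original's array is longer than the characters it actually uses, it makes a new copy of just the substring characters.
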
SO - to get rid of the reference to the big String it looks like

String s = new String( bigstring.substring(......) ) ;

should drop the old big array reference.
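
Applied to the map from the original question, the idiom might look like the sketch below. The class, the `detach` helper, and the "key=value" line format are all hypothetical, chosen only for illustration. On Java 7 update 6 and later, substring() already copies, so the extra constructor call is harmless but no longer needed.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch; the "key=value" input format is an assumption. */
class SubstringCopyDemo {
    /** Returns the substring wrapped in new String(...), as suggested above. */
    static String detach(String big, int begin, int end) {
        return new String(big.substring(begin, end));
    }

    /** Stores keys and values without holding on to the full input lines. */
    static Map<String, String> parse(String[] lines) {
        Map<String, String> map = new HashMap<String, String>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip lines that do not match the assumed format
            }
            map.put(detach(line, 0, eq), detach(line, eq + 1, line.length()));
        }
        return map;
    }

    public static void main(String[] args) {
        String[] lines = { "colour=red", "size=large", "no separator here" };
        System.out.println(parse(lines));
    }
}
```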

Bill

 
Ranch Hand
Posts: 443
3
Eclipse IDE C++ Java

It's also worth noting that a Java char takes two bytes, not one.



Unless you use -XX:+UseCompressedStrings.
 
Rancher
Posts: 4804
7
Mac OS X VI Editor Linux
Unless the characters are from a language that uses 3 or 4 byte code points.
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu

Pat Farrell wrote:Unless the characters are from a language that uses 3 or 4 byte code points.


Which doesn't alter the fact that a Java character is a 16-bit unsigned number. I notice that UseCompressedStrings (which I've never tried) is defined as a 'performance' option, but I wonder if it actually saves anything except space (one article I read suggested that it's 5-10% slower). It's also likely to make space estimation more complex for anything but pure ASCII text.
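
Both points can be checked with a short sketch. A char is always 16 bits, and a code point outside the Basic Multilingual Plane is stored as two chars (a surrogate pair), so in memory it takes four bytes, never three. The class and method names here are only illustrative.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch comparing char count, code point count and UTF-16 size. */
class CharWidthDemo {
    static int charCount(String s) {
        return s.length(); // counts 16-bit chars, not code points
    }

    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    static int utf16Bytes(String s) {
        // UTF-16BE without a byte order mark: two bytes per char
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String ascii = "A";
        String supplementary = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        for (String s : new String[] { ascii, supplementary }) {
            System.out.println("chars=" + charCount(s)
                    + " codePoints=" + codePointCount(s)
                    + " utf16Bytes=" + utf16Bytes(s));
        }
    }
}
```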

Winston
 
Chris Hurst
Ranch Hand
Posts: 443
3
Eclipse IDE C++ Java
Well, I've been doing quite a lot of profiling with compressed strings and haven't noticed any difference in latency from the "compression", though I could well believe there is some. My main reason for using it is that we are very sensitive to GC and memory usage, and that's the key performance issue according to the stats. So I think it is a performance option, but you need to know your application's string usage profile and the configuration of your garbage collector.

I think with all performance work it's profile first, then optimize.
 