Speed of string allocation

 
paul wheaton
(an interesting tidbit I found on a mailing list)
To: "'advanced-java'" <advanced-java@xcf.berkeley.edu>
cc: (bcc: Paul Wheaton/Jeppesen/TMC)
Subject: More on String allocation

As an interesting little follow-up to the previous email about intern() and
obtaining String addresses, I submit 6 little tests (at end of email). The
first three tests just create the same String over and over using three
different methods: s = new String ("blah"), s = "blah", and s =
"blah".intern(). The second three create a different String each time. I get
the following results (elapsed times in milliseconds):
Elapsed 1: s = new String(blah) 2904
Elapsed 2: s = blah 100
Elapsed 3: s = blah.intern() 581
Elapsed 4: s = new String(blah) 15652
Elapsed 5: s = blah 12618
java.lang.OutOfMemoryError
The difference between the first three indicates that each one is doing
something a little different. So what is it? In both Elapsed 2 and Elapsed 3,
every String created should point to the same address.
In the second set of examples, all creation methods are about equal, except
that Elapsed 6 consistently runs my JVM out of memory. So what is
intern() doing that is causing such a problem? It can't have to do with the
number of objects being created, since I'm creating the same number in all
of the cases.
Interesting stuff (maybe it's just late... :-) Can anyone explain this?
TIA,
Chrisitian

----------------- examples --------------
public class StringAllocationTest {
    public static void main(String[] args) {
        System.out.println("");

        // Tests 1-3: the same String value on every iteration.
        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = new String("Test1"); }
        System.out.println("Elapsed 1: s = new String(blah) " + (System.currentTimeMillis() - t1));

        t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = "Test2"; }
        System.out.println("Elapsed 2: s = blah " + (System.currentTimeMillis() - t1));

        t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = ("Test3").intern(); }
        System.out.println("Elapsed 3: s = blah.intern() " + (System.currentTimeMillis() - t1));
        System.out.println("");

        // Tests 4-6: a different String value on every iteration.
        t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = new String("" + i); }
        System.out.println("Elapsed 4: s = new String(blah) " + (System.currentTimeMillis() - t1));
        System.gc();

        t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = "" + i; }
        System.out.println("Elapsed 5: s = blah " + (System.currentTimeMillis() - t1));
        System.gc();

        t1 = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) { String s = ("" + i).intern(); }
        System.out.println("Elapsed 6: s = blah.intern() " + (System.currentTimeMillis() - t1));
        System.gc();
    }
}
 
paul wheaton
(another tidbit in response to the first)
Elapsed 1 explicitly creates a new String which is a copy of a literal value
(also a String, incidentally, but you're explicitly creating a *copy* by
calling a constructor). Elapsed 2 just assigns a reference to the literal
value (which is actually a String, as mentioned previously). Since Strings
are immutable, there's no problem with sharing references to a single String
object. Elapsed 3 takes a String, searches to see if the value of the String
has already been intern'ed (which it will be, after the first iteration),
adds it to the intern table (probably a big hashtable, though I haven't
checked on this) if not already present, and returns a reference to the one
in the intern table. Your test on these three cases shows that the object
creation case is the most expensive, the simple reference assignment the
least expensive, and the intern lookup is in between.
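
The difference between the three cases can be seen by comparing references
rather than timings. The following is only a minimal sketch; the class and
method names (StringIdentityDemo and friends) are made up for illustration
and are not part of the original test.

```java
// Illustrative sketch only; class and method names are hypothetical.
class StringIdentityDemo {

    // Case 1: the constructor allocates a fresh object that copies the literal.
    static boolean constructorMakesCopy() {
        String literal = "Test";
        String copy = new String("Test");
        return copy != literal && copy.equals(literal);
    }

    // Case 2: every use of the same literal refers to one shared object.
    static boolean literalsAreShared() {
        String a = "Test";
        String b = "Test";
        return a == b;
    }

    // Case 3: intern() looks the value up and returns the pooled instance.
    static boolean internReturnsPooled() {
        String copy = new String("Test");
        return copy.intern() == "Test";
    }

    public static void main(String[] args) {
        System.out.println("new String is a distinct copy: " + constructorMakesCopy());
        System.out.println("literals share one object:     " + literalsAreShared());
        System.out.println("intern() returns pooled one:   " + internReturnsPooled());
    }
}
```

All three lines print true: only the constructor case produces a new object
on each call.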
Elapsed 4 is creating a String representing the current value of an int,
concatenating the String to another String (which is empty, but probably
*not* optimized out by the compiler - you can check the code on this) which
is actually implemented through a StringBuffer (create StringBuffer, copy
first String to StringBuffer, concatenate second String to StringBuffer,
turn StringBuffer value into yet another String), with the final
concatenated String passed to the constructor for still another String. I
count 3 String instances plus a StringBuffer instance per iteration, and
your time is actually more than 5 times the base case - higher than I'd
expect, but not dramatically out of line (especially since there may be
additional object creations hidden in the int to String conversion).
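
Written out by hand, the steps described above look roughly like this. It is
a sketch of the description in this reply, not a dump of what any particular
compiler emits (current javac versions use StringBuilder or other mechanisms
instead of StringBuffer). The class and method names are hypothetical.

```java
// Illustrative sketch only; class and method names are hypothetical.
class ConcatExpansion {

    // The source line as written in test 4.
    static String compact(int i) {
        return new String("" + i);
    }

    // The same work spelled out step by step, following the reply's description.
    static String expanded(int i) {
        String digits = String.valueOf(i);     // String #1: the int converted to text
        StringBuffer buf = new StringBuffer(); // the temporary StringBuffer
        buf.append("");                        // copy the first (empty) String
        buf.append(digits);                    // concatenate the second String
        String joined = buf.toString();        // String #2: the concatenated value
        return new String(joined);             // String #3: the explicit copy
    }

    public static void main(String[] args) {
        System.out.println(compact(42).equals(expanded(42)));
    }
}
```

Test 5 corresponds to returning joined directly, which saves the final copy.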
Elapsed 5 is the same as Elapsed 4, except that after the concatenation is
done the created String is assigned directly, rather than passed to a String
constructor; there's one less object creation in this case, and the time is
as expected.
Elapsed 6 creates all these same String instances but also interns the final
ones, meaning there is an entry added to the intern table for each of the
100000 iterations. In previous cases garbage collection could run in the
middle of your test and reclaim essentially all of the memory used (the
references to the created Strings were thrown away each pass). In this case,
up to 100000 Strings are kept, along with the associated objects used to
organize them in the intern table. I'm not sure of the actual minimum size
of a String object, but I'd suspect something in the neighborhood of 50
bytes. The objects used to organize them in the table are probably about the
same, so you're talking about roughly 10MB permanently taken out of
circulation by the time this loop completes. I don't know what memory size
you were using, but that's a pretty big chunk by most standards.
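
The 10MB figure is just the arithmetic on the guesses above. As a small
sketch (the 50-byte sizes are the estimates from this reply, not measured
values, and the class name is hypothetical):

```java
// Illustrative sketch only; the per-object sizes are the reply's rough guesses.
class InternMemoryEstimate {

    // Total bytes retained if every interned String also needs one table entry.
    static long estimateBytes(int strings, int bytesPerString, int bytesPerEntry) {
        return (long) strings * (bytesPerString + bytesPerEntry);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(100000, 50, 50);
        System.out.println(bytes + " bytes, or about " + (bytes / 1000000) + " MB");
    }
}
```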
Note that in Java 2 they could make the intern table use weak references so
that Strings could be freed if there were no references to them outside of
the table. This was not possible in older versions, and may not be desirable
even now that it can be done - it could change the behavior of intern in
ways that could make old apps break.
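
To make the weak-reference idea concrete, here is a minimal sketch of a
user-level table built on java.util.WeakHashMap. It says nothing about how
the JVM's own intern table is implemented; WeakInterner is a hypothetical
class written only to illustrate the idea.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative sketch only; this is NOT how String.intern() is implemented.
class WeakInterner {

    // Keys are held weakly by WeakHashMap. The value must be weak as well,
    // because a strong reference to the same String would keep the key alive.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance for the given value, adding it if absent.
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            canonical = s;
            pool.put(s, new WeakReference<>(s));
        }
        return canonical;
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        String a = new String("key");
        String b = new String("key");
        // Both calls return the first instance while it is still referenced.
        System.out.println(interner.intern(a) == interner.intern(b));
    }
}
```

Once nothing outside the table refers to an entry's String, the garbage
collector is free to reclaim it, which is the behavior the note above
describes.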
 
paul wheaton
(another excellent tidbit)
To: "advanced-java@xcf.berkeley.edu" <advanced-java@xcf.berkeley.edu>
cc: (bcc: Paul Wheaton/Jeppesen/TMC)
Subject: Re: String interns

Scott -
The way I read it, your message was misleading. Allow me to clarify if I may.
The JVM does maintain a hashtable that contains SOME Strings. Specifically,
the hashtable should contain:
1) Any String that exists as a constant in your code.
2) Any String that is returned as the result of a String.intern() call.
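
Both rules can be checked with reference comparisons. A minimal sketch (the
class and method names are hypothetical): a constant expression in the code
is already in the table, while a String built at run time only maps to the
table entry after intern() is called.

```java
// Illustrative sketch only; class and method names are hypothetical.
class PoolMembershipDemo {

    // Rule 1: "Te" + "st" is a compile-time constant, so it is the pooled "Test".
    static boolean constantExpressionIsPooled() {
        String folded = "Te" + "st";
        return folded == "Test";
    }

    // A String concatenated at run time is a new object outside the table.
    static boolean runtimeStringIsNotPooled() {
        String half = "Te"; // non-final local, so the sum below is not a constant
        String built = half + "st";
        return built != "Test" && built.equals("Test");
    }

    // Rule 2: the result of intern() is the instance held in the table.
    static boolean internResultIsPooled() {
        String half = "Te";
        String built = half + "st";
        return built.intern() == "Test";
    }

    public static void main(String[] args) {
        System.out.println(constantExpressionIsPooled());
        System.out.println(runtimeStringIsNotPooled());
        System.out.println(internResultIsPooled());
    }
}
```
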
The call to intern() is an expensive call, I would expect
string0.inter
 