Willem Kokke

Greenhorn
since Sep 25, 2009
Recent posts by Willem Kokke

Henry Wong wrote:

J Westland wrote:You then make the substring of long, by saying String sub = long.Substring(start, end); sub is then made by just referring to this really big String. So, we still just have one big String and one substring pointing to a part of it. This is the way Strings can save memory no need to keep making Strings for words you already have.



This is not true. Unless, the start and end are zero, and the actual end of the very large string respectively, you will get a new string back. There is no reference to the same space within the very large string.
Henry



Actually, while the terminology Jawine used might not be 100% exact, the gist of what she says is true. As a speed optimisation, the java.lang.String.substring implementation shares the underlying char buffer with the original string, and the substring simply stores an offset and length into that buffer.

substring() returns the string itself (this) if it is passed 0 for beginIndex and length() for endIndex.
For any other valid values it will return a new String object, which shares the underlying char[] (which is where most of the memory is).

You can verify this yourself by reading the code at http://www.docjar.com/html/api/java/lang/String.java.html
specifically look at substring() on line 1941, which calls the private constructor on line 644

So yes, if you create a substring from a very long string, the substring will be a new String instance, referring to the same immutable char[] as the long string, but with an offset and length specified that is taken into account in all other operations on that substring.
For all intents and purposes the memory is shared between them, and whilst the long String object might be garbage collected, the char[] (which has the bulk of the memory associated with it) will not as long as the substring refers to it!
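
A small sketch of the part of this that can be observed from outside the String class. The class and method names are made up for illustration. Whether the char[] is actually shared is an implementation detail that cannot be seen without reflection: it holds for the source linked above, but as far as I know JDK releases from 7u6 onwards copy the characters in substring() instead, so it is not something to rely on.

```java
// Hypothetical demo class; the names are illustrative and not part of the JDK.
class SubstringIdentityDemo {

    // In the implementations I have looked at, substring(0, length())
    // hands back the very same String object.
    static boolean fullRangeReturnsSameInstance(String s) {
        return s.substring(0, s.length()) == s;
    }

    // Any other valid range yields a different String object.
    // (Assumes s is not empty.)
    static boolean partialRangeReturnsNewInstance(String s) {
        return s.substring(1, s.length()) != s;
    }

    public static void main(String[] args) {
        String big = "a really long string, or at least pretend it is";
        System.out.println(fullRangeReturnsSameInstance(big));
        System.out.println(partialRangeReturnsNewInstance(big));
        System.out.println(big.substring(2, 8)); // prints "really"
    }
}
```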

Henry Wong wrote:

J Westland wrote:However...if you says String sub = new String(long.Substring(start, end)); you will get a totally new string that is independent of the String called long that is in the String pool. So now there will be two objects in the String pool, the really long String long, the String sub and on the heap we also have sub as an Object.



This is not true. As before, unless start and end represent the original string back, you will get a new string. This new string is passed to the constructor of the string, and yet another string is created. This second new string is then assigned to the sub reference. Furthermore, since no other reference now points to the response from the substring call, it is eligible for garbage collection. And finally, this operation has no effect on the string pool.
Henry



Once again, looking at the source code for the String constructor taking another string (line 163), we see that it makes an actual copy of only that part of the original string that is needed, so Jawine was right about that. This then does allow all memory associated with the long string to be garbage collected (as the substring will only contain a char[] large enough for that substring, and no longer a reference to the original long string's char[]).

You were right about the fact that it does not get added to the string literal pool.
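
To make the new String(long.substring(start, end)) idiom concrete, here is a minimal sketch. The class name and the detach() helper are hypothetical, invented for this example. It only shows what is visible from the outside: the copy is a separate object with the same content. The trimming of the char[] happens inside the constructor in the source linked above and is not observable here.

```java
// Hypothetical demo class; detach() is an illustrative helper, not a JDK method.
class CopyConstructorDemo {

    // new String(other) always allocates a fresh String object
    // whose content equals that of the argument.
    static String detach(String sub) {
        return new String(sub);
    }

    public static void main(String[] args) {
        String big = "0123456789";
        String sub = big.substring(2, 5);   // "234"
        String copy = detach(sub);

        System.out.println(copy.equals(sub)); // same content
        System.out.println(copy != sub);      // but a different object
    }
}
```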

Siva:

In Java every object instance is on the heap, so every String instance is as well. However, if you have a string literal or string constant in your source code, the compiler gathers all of them into the class file, and the JVM creates them (on the heap) when the class is loaded or the literal is first used, and stores references to them in the string literal pool.

The advantages of this are

  • Duplicates are removed, because strings are immutable they can safely be shared. This means less overall memory usage.
  • You can safely compare literals and constants with each other using == instead of the more expensive equals(), since equal ones are guaranteed to be the same instance


The disadvantage of this is:

  • References to the strings in the string literal pool are kept for as long as the class that uses them stays loaded, which in practice usually means the duration of the program. This means they will normally never be garbage collected. If you have large string literals that you only use temporarily, this might actually increase the memory usage of your program, but this is hardly ever anything to worry about


Now if you create a string with new String("bla") it will not be added to the string literal pool, but it will be created as a new object on the heap. This means you cannot compare it with == any longer; you have to use equals() (if content equality is what you are looking for).
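
A minimal sketch of the difference between pooled literals and new String("bla"). The class and method names are invented for this example.

```java
// Hypothetical demo class; names are illustrative only.
class LiteralPoolDemo {

    static final String CONSTANT = "bla";

    // Two identical literals refer to one pooled instance.
    static boolean literalsShareInstance() {
        String a = "bla";
        String b = "bla";
        return a == b;
    }

    // A compile-time constant expression is folded by the compiler
    // and ends up as the same pooled instance too.
    static boolean constantExpressionIsPooled() {
        return ("b" + "la") == CONSTANT;
    }

    // new String(...) is a separate object: == fails, equals() succeeds.
    static boolean newStringIsDistinctButEqual() {
        String c = new String("bla");
        return c != CONSTANT && c.equals(CONSTANT);
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());       // true
        System.out.println(constantExpressionIsPooled());  // true
        System.out.println(newStringIsDistinctButEqual()); // true
    }
}
```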

Short recap:

String literals and string constants will always be referenced in the string literal pool, won't be duplicated, and in practice won't be garbage collected.
Using new String("bla") will always create a new object on the heap (so it can be duplicated), but it can then be garbage collected once it is no longer used.

One small addition (no idea if it is in the SCJP, I have never done that) is the String.intern() function.

If you do want to add a string to the string literal pool at runtime (so after creating it with the new String() constructor) you can do that with String.intern().

String.intern() will return a reference to an equal string in the string pool if one already exists. Otherwise it will add the current string to the string pool and return a reference to itself.


    String s1 = "bla";
    String s2 = new String("bla");
    s2 = s2.intern();

s1 will be in the string literal pool.
s2 will initially create a new String instance on the heap. There are now 2 string instances containing the value "bla" in memory.
After the intern call, s1 and s2 will point to exactly the same object instance (and the instance created by the new String("bla") will be garbage collected eventually).
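
The same three lines wrapped into something you can run, as a sketch. The class and method names are made up for the example.

```java
// Hypothetical demo class; names are illustrative only.
class InternDemo {

    // Before intern(): the literal and the new String are different objects.
    static boolean sameInstanceBeforeIntern() {
        String s1 = "bla";
        String s2 = new String("bla");
        return s1 == s2;
    }

    // After intern(): s2 refers to the pooled instance that s1 refers to.
    static boolean sameInstanceAfterIntern() {
        String s1 = "bla";
        String s2 = new String("bla");
        s2 = s2.intern();
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceBeforeIntern()); // false
        System.out.println(sameInstanceAfterIntern());  // true
    }
}
```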

Hope that helps!

Willem