String Constant Pool Behavior -- In Or Out of Scope for 819?

 
Ranch Foreman
First I want to just mention defensively that I have debugged hundreds of programs I didn't even have source for by reverse-engineering native code, both at work and for "fun" and occasionally even fixed bugs and re-assembled them (not at work!).

So I think the focus here needs to remain on whether or not this mess is in scope for the 819.

I also want to comment that this is exacerbated by the fact that a lot of people are looking at old preparation materials for earlier SCJP exams, and there have been changes in Java itself in this area.

I believe the standard references on this site don't reflect modern SCP details in this matter; if they do, I am legitimately confused.

The code with puzzling behavior which I wish to know if it is in scope or not for the 819 is:
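
The snippet itself did not survive in this copy of the post. Going by the JShell transcript quoted further down the thread, it was presumably equivalent to the following; the class and method names are placeholders added so that it can be compiled and run on its own.

```java
// Reconstructed from the JShell transcript quoted later in this thread.
// ScpPuzzle and compare() are placeholder names, not part of the original post.
class ScpPuzzle {

    // Returns { s2 == s3, s3 == s4 } for the snippet under discussion.
    static boolean[] compare() {
        String s1 = new String("excessive");
        String s2 = s1.concat("detail");    // computed at run time

        String s3 = s2.intern();
        boolean first = (s2 == s3);

        String s4 = "excessivedetail";      // the only place this literal appears
        boolean second = (s3 == s4);

        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] result = compare();
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

Only the second comparison is pinned down by the intern() documentation; the first one is the subject of this thread.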


The output is "true", "true".

I saw other certification-seekers commenting on "Why am I getting different behavior???" in an older video that I am sure was true at some point but I think is now out-of-date.
Trying to explain the difference of behavior led me to:
https://stackoverflow.com/questions/65099188/string-intern-string-concatenation-and-string-constant-pool-example-in-java

After reading that, I came to the conclusion "These sad Anoraks should really get a life!!  Nobody needs to know this except the team working on the String code!!"
Then I realized that maybe I need to memorize something about how the Oracle 11 OpenJDK implementation works in this regard in order to consider myself fully prepared for the Spanish Inquisition --- errr, 819 exam.  (I've been confusing the two a lot lately)...

Is the reason this outputs true/true in scope for the 819?
Whether or not it is, why does it do that?  It seems to go against a whole bunch of stuff I'd read about SCP behavior, only some of which I had believed to be outdated.

Cheers,
Jesse
 
Jesse Silverman
Ranch Foreman
I can't resist commenting on my own post.  By 2020 standards this isn't too bad.

I believe that what is going on is that the entire notion that the Internet collectively has echoing due to tons of outdated material being more readily accessible than current stuff, which is to say that SCP and heap entries are disjoint and distinct, is just no longer operative.  Since the time that the SCP moved out of PermGen into the heap (just in time for the PermGen to be wiped out in the next major release)...the notion that you can figure out whether a particular String value came from the SCP or not by just calling intern() on it and seeing if you get a new value back or not doesn't work anymore.

String s3 = s2.intern(); // just adds a reference to the already existing s2 on the heap to some SCP data structure, and returns the same value, so that s3 == s2 ??
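
A minimal sketch of what intern() does and does not promise, assuming a string whose contents cannot already be in the pool (here built from a random UUID). The class name `InternSemantics` and the helper `fresh()` are made up for illustration. The first three results follow from the intern() documentation; the last one is the behavior guessed at above and is what the Java 7+ JVMs tried in this thread appear to do.

```java
import java.util.UUID;

class InternSemantics {

    // Builds a string at run time whose contents cannot already be in the pool.
    static String fresh() {
        return new StringBuilder("scp-").append(UUID.randomUUID()).toString();
    }

    public static void main(String[] args) {
        String s2 = fresh();
        String copy = new String(s2);              // equal contents, separate object

        String s3 = s2.intern();                   // first equal string to be interned
        System.out.println(s3.equals(s2));         // true: intern() never changes the contents
        System.out.println(copy.intern() == s3);   // true: one canonical object per value
        System.out.println(copy == s3);            // false: copy is not the pooled object
        System.out.println(s2 == s3);              // true if intern() pools the receiver itself
    }
}
```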

For reference, the people that I saw standing around scratching their heads and fretting were thinking that .concat() behavior changed, because they had previously been so conditioned to think that SCP references were inherently disjoint from new String() references.

I saw a really, really old comment from Bert that addressed what was in scope very long ago, which was more authoritative than if President Clinton himself had said so.  But a bit stale.
 
Marshal
Well, I got a bit confused too about the constant pool when I saw your first post last night. Have you read anything about interning Strings? This is the method's documentation. Look up the Java® Language Specification (=JLS) link there to find what the JLS says about interning Strings.
I don't know whether you will find anything in the Java® Virtual Machine Specification. I think not. What does the documentation for concat() say? It would appear that your String is in fact interned.
 
Campbell Ritchie
Marshal
I tried this on JShell:-

jshell
|  Welcome to JShell -- Version 15
|  For an introduction type: /help intro

jshell>  String s1 = new String("excessive");
  ...>   String s2 = s1.concat("detail");
  ...>    
  ...>   String s3 = s2.intern();
  ...>   System.out.println(s2==s3);
  ...>    
  ...>   String s4 = "excessivedetail";
  ...>   System.out.println(s3==s4);
s1 ==> "excessive"
s2 ==> "excessivedetail"
s3 ==> "excessivedetail"
true
s4 ==> "excessivedetail"
true

jshell> s1 == "excessive"
$7 ==> false

jshell> s2 == "excessivedetail"
$8 ==> true

jshell>

I found a very old version of the API; please look whether there has been any change to String#concat() since then.
By the way: The result of calling concat() is not a compile time constant.
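
To illustrate that last point, here is a small sketch contrasting a constant expression with a concat() call. The class and method names are invented for the example. The first comparison is guaranteed by the JLS rules on constant expressions; for the last one the concat() documentation makes no identity promise, so no expected output is given.

```java
class ConstantVsConcat {

    // "excessive" + "detail" is a constant expression, so javac folds it into the
    // single literal "excessivedetail"; both operands of == are the same pooled object.
    static boolean foldedAtCompileTime() {
        return ("excessive" + "detail") == "excessivedetail";
    }

    // A method call is never a constant expression; this is evaluated at run time.
    static String builtAtRunTime() {
        return "excessive".concat("detail");
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());          // true
        String s = builtAtRunTime();
        System.out.println(s.equals("excessivedetail"));    // true
        System.out.println(s == "excessivedetail");         // identity is not promised by concat()
    }
}
```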
 
Jesse Silverman
Ranch Foreman
I feel like I know a lot of true statements that have remained true about the "String Literal Pool" or the "String Constant Pool", as I see it variously called in different posts and articles, even within this forum.  If I understand correctly, the behavior has zero dependence on what you compile with and depends solely on which JVM is running the code.  What's more, I suspect that this all changed ten years ago and is only confusing people because there is so much stale information about the SCP out there in old tutorials, archived discussions and old mock exam questions.

My current belief is that the following statements from the certainly great-at-the-time but possibly no longer quite true tutorial on this site may have become inoperative:

When a .java file is compiled into a .class file, any String literals are noted in a special way, just as all constants are. When a class is loaded (note that loading happens prior to initialization), the JVM goes through the code for the class and looks for String literals. When it finds one, it checks to see if an equivalent String is already referenced from the heap. If not, it creates a String instance on the heap and stores a reference to that object in the constant table. Once a reference is made to that String object, any references to that String literal throughout your program are simply replaced with the reference to the object referenced from the String Literal Pool.

A downstream conclusion from this fundamental explication that I believe is no longer applicable now is:
Strings created at run-time will always be distinct from those created from String Literals.

It is my current belief that the String Constant Pool seems to now be built up during code execution, so that if a String Literal that the JVM is considering adding or looking up in the pool has already been added to the heap and then the SCP by executing user code, it will choose that very one as the reference to add to the String Literal Pool.  In the old days, as this was done before a single line of user code executed, at class loading time, that would never ever happen.

The reason this matters so much (or at all) is that the simple downstream conclusion mentioned both in that article and in many other places that exam-preparers are reading is no longer operative.  Instead, it now depends on whether the code creating the string that is only created at runtime executes before the line that references the string constant!
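
A sketch of that order dependence, using invented names (`InternOrder`, `literalFirst`, `internFirst`) and strings chosen only because nothing else is likely to have interned them. The first result follows from the documented behavior of intern(); the second is not specified anywhere I can point to, and is simply what the JVMs discussed in this thread seem to do on the first call.

```java
class InternOrder {

    // Assembles the text without +, concat() or a literal of the full value.
    static String build(String head, String tail) {
        return new StringBuilder().append(head).append(tail).toString();
    }

    // The literal is evaluated first, so the pool already holds it when intern() runs.
    static boolean literalFirst() {
        String literal = "moosesaloon";
        String built = build("moose", "saloon");
        return built.intern() == built;    // false: intern() hands back the literal's object
    }

    // intern() runs before the line holding the literal has ever executed.
    static boolean internFirst() {
        String built = build("ranch", "foreman");
        built.intern();
        String literal = "ranchforeman";
        return built == literal;           // depends on when the JVM resolves the literal
    }

    public static void main(String[] args) {
        System.out.println(literalFirst());
        System.out.println(internFirst());
    }
}
```

Note that calling `internFirst()` a second time in the same JVM is a different experiment, because by then an equal string is already in the pool.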

I never loved these questions about "how many String objects get created?  Where?" which I see from some sources are indicated to be much more common in preparatory materials and mock exams than on real exams.



Referenced "Was-Great-At-The-Time-But-Confusing-To-New-Exam-Takers-Now" article from this site:
https://javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html
 
Jesse Silverman
Ranch Foreman
That this also yields the same runtime behavior with modern JVM instances may further underscore my point, as the String in question is marked final and therefore eligible to be treated as a Compile Time Constant.  I'm not sure anymore.  But it behaves the same;

 
Jesse Silverman
Ranch Foreman
Wait.  Now I am questioning the following line from the JLS itself!

Strings computed by concatenation at run time are newly created and therefore distinct.
https://docs.oracle.com/javase/specs/jls/se15/html/jls-3.html#jls-3.10.5

"My name is Elmer J. Greenhorn.  I own a mansion and a yacht!"

I guess this is still true, with the edge case exception that if the value of one of these concatenations has intern() called on it before the first line referencing the compile time constant, the SCP will use that intern()'d value as the reference value in the SCP for the rest of the program.  Further calls to .concat() will all produce distinct new values taking up additional heap space for each one.
 
Jesse Silverman
Ranch Foreman
I felt bad that poor concat() was being blamed for this by hapless certification-seekers and befuddled video watchers.
This completely and totally exonerates concat(), proving it just got blamed because it is so frequently used in mock exam questions:

 
Jesse Silverman
Ranch Foreman
I am going to quote Scott and Jeanne from the section of their book named "The String Pool", Chapter 5, Page 180:

When you write programs, you wouldn't want to create a String of a String or use the intern() method.  For the exam, you need to know that both are allowed and how they behave.

They then provide a neat NOTE:
Remember never to use intern() or == to compare String objects in your code.  The only time you should have to deal with these is on the exam.

I'd be terribly embarrassed that I even started the thread except it is all about intern() semantics and is on the Programmer Certification (OCPJP) thread.
Certification seekers think this issue is in scope for the exam, which it may or may not be, and are very confused about how the code in these examples works due to having absorbed outdated material that abounds all over the internet in every form (if my analysis of what is going on is actually correct).
 
Master Rancher
I really can't imagine this issue being in scope for the exam - it would make a horrible question since it's unspecified and JVM-dependent.  And it has remarkably little to do with anything you'd actually need to know as a working Java programmer.  

That is, the basic functionality of intern() may be useful to know about.  The fact that interned strings will point to the same instances as constant strings, even constants loaded from different classes - that's worth knowing, and could be tested on.  But testing on which particular String was created first, and thus which was the one to become the interned one?  That's a very subtle issue, not actually specified as far as I know, JVM dependent, and frankly not important to know anyway.
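
A minimal sketch of the part that is specified, assuming two nested classes (hypothetical names `Stable` and `Barn`) that each carry their own copy of the same literal. Each nested class compiles to its own class file with its own constant pool, yet the JLS requires equal literals to refer to the same String object, and an interned run-time string joins them.

```java
class SharedLiterals {

    static class Stable {
        static String name() { return "bunkhouse"; }
    }

    static class Barn {
        static String name() { return "bunkhouse"; }
    }

    // Built at run time, then interned.
    static String fromRuntime() {
        return new StringBuilder("bunk").append("house").toString().intern();
    }

    public static void main(String[] args) {
        System.out.println(Stable.name() == Barn.name());     // true
        System.out.println(fromRuntime() == Stable.name());   // true
    }
}
```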

The thing is, Strings are often useful example objects to talk about, since they are easy to create and easy to print.   But they make horrible examples to talk about garbage collection, because of the intern pool. Similar issues exist for Integer and other wrappers.  Unfortunately, it's easy for people to create programs using Strings and wonder about their behavior, without realizing that they've made something too complicated to be a useful exam question.  It's not worth worrying about this sort of issue on the exam - there are many other topics more useful to study.
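
For the wrapper side of that remark, a short sketch (class and method names invented here). The JLS requires boxed int values from -128 to 127 to be shared; outside that range the outcome of == is up to the implementation, so no result is claimed for 128.

```java
class WrapperCache {

    static boolean sameBoxed(int value) {
        Integer a = value;    // autoboxing goes through Integer.valueOf(value)
        Integer b = value;
        return a == b;        // reference comparison, just as with Strings
    }

    public static void main(String[] args) {
        System.out.println(sameBoxed(127));                     // true: inside the mandatory cache range
        System.out.println(sameBoxed(128));                     // implementation-dependent
        System.out.println(Integer.valueOf(128).equals(128));   // true: equals() compares values
    }
}
```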
 
Jesse Silverman
Ranch Foreman
Thanks for your reply, Mike!!

I hope that everyone writing mock exam questions, and even more so, real ones, going forward reads your reply.

I would still maintain that some very commonly referenced written and video tutorial sources relate no-longer-accurate details on the behavior of the SCP, especially on when exactly it is built, and I encountered a good number of individuals trying to prepare for the exam scratching their heads over this, none of whom ultimately figured out what was going on.  I've seen some newer information sources that say "Well, it moved from method area to heap but everything else is still the same."

I posted the link to this thread in places where others confused by stale exam prep material voiced their confusion and uncertainty.

I certainly agree there are a million other things more useful to learn for the 819, I'll get back to those now!

Thanks again!
 
Campbell Ritchie
Marshal

Jesse Silverman wrote:Wait.  Now I am questioning the following line from the JLS itself!

The JLS is correct. It isn't some sort of magic formula whereby an effect takes place. It is an instruction book; all Java® implementations (or at least all those Oracle find out about) must comply strictly with the JLS and Oracle may test them for compliance.

Strings computed by concatenation at run time are newly created and therefore distinct.
https://docs.oracle.com/javase/specs/jls/se15/html/jls-3.html#jls-3.10.5

Yes, that means created with the concatenation operator +. I don't think it means a method call, not even to concat().

"My name is Elmer J. Greenhorn.  I own a mansion and a yacht!"

My name is Campbell Ritchie. I burnt down Greenhorn's mansion and sank his yacht.

. . . edge case exception that if the value of one of these concatenations has intern() called on it before the first line referencing the compile time constant , . . . .

There is nothing “edge case”‑y about that. It is a standard part of the specification for Strings and intern().
You are changing between String literals and String compile‑time constants. This automatic interning applies to all String compile‑time constants, not only to literals.
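
A sketch of the difference between a constant expression and a run-time concatenation, using made-up names. A local variable counts as a constant variable only if it is declared final and initialised with a constant expression, so the two methods differ only in the keyword.

```java
class ConstantExpressions {

    static boolean fromFinalLocals() {
        final String a = "excessive";    // constant variable
        final String b = "detail";       // constant variable
        return (a + b) == "excessivedetail";    // a + b is itself a constant expression
    }

    static boolean fromPlainLocals() {
        String a = "excessive";          // not declared final
        String b = "detail";
        return (a + b) == "excessivedetail";    // computed at run time: a newly created String
    }

    public static void main(String[] args) {
        System.out.println(fromFinalLocals());   // true
        System.out.println(fromPlainLocals());   // false
    }
}
```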

Further calls to .concat() will all produce distinct new values taking up additional heap space for each one.

How do you know? I couldn't find anything in the documentation for concat() to suggest that. Nor to deny that. It simply didn't say anything.
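
The one identity statement the concat() documentation does make is for an empty argument, in which case the receiver itself is returned. A small sketch with invented names; the last comparison is printed without an expected result because the documentation is silent on it.

```java
class ConcatContract {

    // Documented: if the argument has length 0, concat() returns this String object.
    static boolean emptyArgumentReturnsReceiver(String s) {
        return s.concat("") == s;
    }

    public static void main(String[] args) {
        String base = new String("detail");
        System.out.println(emptyArgumentReturnsReceiver(base));   // true
        String first = base.concat("s");
        String second = base.concat("s");
        System.out.println(first.equals(second));                 // true: same contents
        System.out.println(first == second);                      // not addressed by the documentation
    }
}
```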
Don't speculate too much about when this interning takes place. Did you find anything about its timing in the JLS? If not, you will have to regard its timing as undetermined. If it says, “before XYZ,” that doesn't add anything we didn't know already. I don't know where you will find anything but if you read JLS §12.5 you might find something to your advantage. As MS said, classes can be loaded in different orders in different runs of apparently the same program (particularly if you are multi‑threading), so the order in which String objects are created is unpredictable and undefined. I agree that it is difficult to set questions about things platform‑dependent or undefined. Just be careful: sometimes the correct choice is the answer saying, “undefined.”
Line 13, “unlucky for some”: you sometimes can't tell the difference between something happening at class loading time and at method execution time. MS alludes to that, too. You can make such a discrimination in this case because you have a method call. But surely execution and class loading both occur at runtime.

You are a bit disparaging about online tutorials. I think many online tutorials are poor quality. I ought to rope you in to assess some of those tutorials and record which are any good. I think there will only be a few.
 
Campbell Ritchie
Marshal
You have all those String objects declared as local variables. That probably alters their semantics.
 
Jesse Silverman
Ranch Foreman
All of the places, tutorial or mock question, that I've seen talking about intern() and asking you to count objects by looking at the result of == comparisons seem to exclusively use local variables.

I didn't try any experiments with instance variables, class variables, static initializers, etc. because that definitely seemed out of certification scope.

I think the point I was making was that tutorials that are not of generally poor quality still give misleading or wrong advice in this area.  Before gravitating to JavaRanch/CodeRanch, I got burned a number of times by looking at poorly vetted/curated content.  Even some of the better stuff has been kept online for ages with no updates.

I think that the consensus so far (of two) is that this is way out of scope for the 819, tho?
I already know the authors consider any use of .intern() in code to be bad style as quoted above.

When first learning about .intern() it sounded great to me, because I have seen so many terabytes of dumps of huge native code images filled with an insane number of repetitions of the same relatively small number of String objects, like 25% of the whole memory or maybe more...but the S&B book (J&S?) has warned me away from that attractive nuisance.  I guess the hazard of winding up with a whole bunch of String objects you thought you were going to need many of for a long time bloating the SCP means it just isn't worth the risk.
 
Marshal
On the other hand: a long time ago I remember reading about a certain Java product, which I think had something to do with XML parsing. One release of the product apparently used the intern() method extensively, specifically to avoid the issue of having many many duplicate String objects in the heap. But this had the possibility of turning into a sort of DOS attack when you parsed a very large XML document and basically all of the bits and pieces of the document became interned String objects.

After searching the web a bit: this SO question from about a decade ago seems to be related.
 
Campbell Ritchie
Marshal

Jesse Silverman wrote:. . . asking you to count objects . . .

Counting objects, except in very small programs, is a very error‑prone activity. It is also not totally consistent with the philosophy of OO programming. Objects should take care of themselves rather than having something else counting them.
 
Rancher
But he's not talking about real life...he's talking about the exams, at least the mock ones.
 
Campbell Ritchie
Marshal
Yes, you do get exam questions about how many objects exist or how many are eligible for GC by line 14, but those are always very small programs because they have to fit on the exam screen.
To come back to what I wrote last night:-
  • 1: Does JLS §12.5 as mentioned yesterday say anything about the order of loading of String compile‑time constants with only local scope?
  • 2: Is there another part of the JLS specifying that order?
  • 3: Does the JLS define the timing or order of interning those Strings at all?
If the situation is so vague, it is impossible to write a question about it and give that question correct “right” and “wrong” answers, as MS said.
Remember the official name for a compile‑time constant is constant expression and the JLS always uses that term.
Have we worked out why JS got true in his first post when he used concat()?
     
    Enthuware Software Support


    I believe that what is going on is that the entire notion that the Internet collectively has echoing due to tons of outdated material being more readily accessible than current stuff, which is to say that SCP and heap entries are disjoint and distinct, is just no longer operative.



    +1 to this.

    The issue is not with the concat method. The issue is with the intern method (which is native). The JLS 11 says in Section 3.10.5, "Strings computed by concatenation at run time are newly created and therefore distinct." As demonstrated by the code shown above, this is not happening. I tried the code using older java versions going as far back as Java 7, and it still produced true true.

    IMHO, this topic should be excluded from certification. Questions that hammer down the point that Strings should not be compared with == should be good enough.
     
    Campbell Ritchie
    Marshal
    I think that when that JLS section says, “concatenation,” I think it only means use of the + operator. I think the concat() method doesn't count, as I said yesterday.
    Agree: a question on such code would be unfair, even if one of the options is “undefined”.
     
    Paul Anilprem
    Enthuware Software Support

    Campbell Ritchie wrote:I think that when that JLS section says, “concatenation,” I think it only means use of the + operator. I think the concat() method doesn't count, as I said yesterday.
    Agree: a question on such code would be unfair, even if one of the options is “undefined”.


    Right. Methods don't count anyway because they are not "constant expressions". It should, therefore, produce a non interned new String object. That's why I said that the issue is not with concat. It is the intern method that is not sticking to the spec.
     
    Jesse Silverman
    Ranch Foreman
    Campbell:

    I don't disagree about counting.  It is in the class of "things just for the cert exam" along with i = i++; and predicting the behavior of complex nested loops with labelled continue and breaks that violate every rule of clear safe coding.  I would also throw in the complex expressions that rely on having the whole table of precedence for all operators memorized to predict, like we are war-rationing parentheses.  Probably also many of the almost-pointless uses of multiply nested inner classes, having local, this and superclass data members shadowing and hiding each other, variables named var, using reference variables to access static classes etc. which would hopefully all fail code review and never get checked into code...

    Of course, I am Desperately Seeking Certification, so all those things matter at least until then.

    Maybe the irony is that the core thing they want you to know is "don't use == for values equality of wrapper classes and string" which I have seen in Production Commercial code too many times to count.
     
    Jesse Silverman
    Ranch Foreman
    Hi Paul A.:

    You get what I was driving at, and why I provided an example after that was free of both + and .concat(), despite all the original confused certification seekers being sure that .concat() was to blame (which at first had me thinking that too).

    I believe that the +/.concat() free examples I showed demonstrated that unlike in the old days, where the SCP was built up at load/init time in PermGen, 100% of all String objects (at least local ones, tho I suspect the others too) are just in the heap, with whatever data structure holding the references to the constants and other .intern()'d strings on the heap being built up only as the first line of code referencing each of them executes.

    So it is now possible (just recently, since Java 7!!) to .intern() something you input from the external world or build yourself (with or without .concat() or +) and to have THAT specific object be the one pulled from the SCP when the very first line containing that value as a constant expression is executed.

    You realize how picky I am about details, because I've spent decades fixing code of other people who decided "my code is just fine, but it isn't working, so go fix it".  In this case, while the bigger problem is that people are still innocently reading user-level tutorial materials causing them to think the SCP is separate from the heap in PermGen/method area, even those who do not labor against this misconception "know" that all compile time constant string expressions are placed into the SCP first, before any of the code in the methods of their class is executing.  I believe that the demo programs conclusively demonstrate that is no longer the case.

    I was calling it an edge case because "normally" you are very likely to first see a "Compile Time Expression" either by itself or as part of the first reference to any String object, e.g. new String("Compile Time Constant");
    There are tutorial examples that demonstrate contrary behavior, and mock exam questions with contradictory "Correct" answers that are still floating around out there and being accessed that demonstrate the Java 6 behavior of what I am still calling this rare edge case, and confused students who are mis-concluding that the behavior of .concat() has changed.

    There are certainly other changes that invalidate old tutorial and old mock test materials, such as effectively final, auto-boxing/unboxing, covariant return types, etc. -- but these all seem to have relatively very high awareness that they are no longer applicable to current Java.

    You operate very close to what I consider to be the most important cache of material after the actual pool of real Oracle exam questions, so I am super-happy to see you on this thread.
     
    Paul Anilprem
    Enthuware Software Support

    Jesse Silverman wrote: In this case, while the bigger problem is that people are still innocently reading user-level tutorial materials causing them to think the SCP is separate from the heap in PermGen/method area, even those who do not labor against this misconception "know" that all compile time constant string expressions are placed into the SCP first, before any of the code in the methods of their class is executing.  I believe that the demo programs conclusively demonstrate that is no longer the case.


    But a plain reading of the spec does seem to suggest that SCP is separate from the regular heap. Maybe someone who has more knowledge about the JVM implementation can tell why the implementation deviates from the spec.
     
    Jesse Silverman
    Ranch Foreman
    Paul C:

    Thanks for your post, too!  I followed your link and I now see that smart people might think that .intern() can help them, but that it is very unlikely to be the right way to get what they want, even if there is a legitimate reason to not want ten million copies of large strings to be bloating their heaps.  It cures my "but, but...." reaction to the blanket advice to pretty much forget that .intern() exists except when taking or writing real or mock exams.
     
    Campbell Ritchie
    Marshal

    Jesse Silverman wrote:. . . don't use == for values equality of wrapper classes and string . . . .

    I think it's better to follow Winston Gutkowski's suggestions and have a list of situations where you should use ==:-
  • 1: If either operand is a primitive type.
  • 2: If the operands are elements of an enum.
  • 3: If either operand is null.
  • 4: To test whether you are comparing this to itself when overriding equals().
    I can't think of any more just at the moment; look on that list as excluding other circumstances.
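
The four cases in the list above can be sketched as follows. The class `EqualityChoices`, the enum `Rank` and the class `Brand` are hypothetical names invented for the example.

```java
import java.util.Objects;

class EqualityChoices {

    enum Rank { GREENHORN, RANCH_HAND, MARSHAL }

    static final class Brand {
        private final String name;

        Brand(String name) {
            this.name = Objects.requireNonNull(name);
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {                 // case 4: comparing this to itself
                return true;
            }
            if (!(other instanceof Brand)) {     // also rejects null
                return false;
            }
            return name.equals(((Brand) other).name);   // contents: equals(), not ==
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    public static void main(String[] args) {
        int posts = 175;
        System.out.println(posts == 175);                 // case 1: primitive operand
        Rank rank = Rank.valueOf("MARSHAL");
        System.out.println(rank == Rank.MARSHAL);         // case 2: enum constants
        Brand missing = null;
        System.out.println(missing == null);              // case 3: null check
        System.out.println(new Brand("Rocking J").equals(new Brand("Rocking J")));   // objects: equals()
    }
}
```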
     
    Campbell Ritchie
    Marshal

    Jesse Silverman wrote:. . . ten million copies of large strings to be bloating their heaps. . . . .

    If those Strings aren't interned, why aren't they eligible for GC?
    Don't copy Strings. It is usually unnecessary.
    It used to be advisable to copy Strings created with the substring() method for that selfsame reason but the Java7+ version of substring() uses a different method of creating the returned String which dissociates the source and result so that problem no longer applies. Look, no commas
     
    Jesse Silverman
    Ranch Foreman
    I came from an environment where I was in a mixed, garbage-collected and native world where the whole mess was my problem.

    There were enormous numbers of strings (mostly native and therefore not immutable) which I guess were copied, or more likely read in verbatim from database or online source and filled a huge part of the heap, and were a large part of the memory pressure on the application.  When we were supporting 32-bit there were business cases that literally were in great danger of crashing due to hitting 32-bit process size limits.  In 64-bit builds, we still had bloated heaps that undermined all the provided memory caching facilities.  I hadn't designed the architecture that kept that amount of data in memory simultaneously, but once present it would have been a nightmare to change.

    I admit I am hopefully converting to a more pure-Java world where everything (except maybe String constants and byte/short/int caches) is garbage-collected, but an application that reads in 100 million large long-lived strings from a database, of which 80 million are just tons of pointless equivalent independent instances holding the same contents seems to still be a plausible reality, even if more thought is given at an architectural level of "do I really need all this in memory at once?"

    I care about this *much* less than learning the rest of the stuff that used to be on the 816, is now on the 819, and is more important by far than how many copies of each identical String value are populating the heap, but it still seems to "be a thing" for large applications with large numbers of similar large strings being read in from a database, web API or wherever string objects are mass-created from.  So, just saying "don't do lots of unnecessary new String() calls" wouldn't really address that, though it is obviously good advice.  Probably managing one's own pooling for such specific categories of String objects would make sense there, rather than trying to lean on .intern() maybe?
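    To make "managing one's own pooling" concrete, here is a rough sketch, assuming a simple map-backed pool is acceptable. StringDeduplicator and canonical() are hypothetical names invented for this example, not an existing library API. The pool holds strong references, so its entries stay alive until the pool itself is discarded.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-level pool for one category of Strings.
class StringDeduplicator {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the first instance seen with these contents.
    String canonical(String s) {
        if (s == null) {
            return null;
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringDeduplicator dedup = new StringDeduplicator();
        // new String(...) stands in for values materialized from a database.
        String a = dedup.canonical(new String("Data"));
        String b = dedup.canonical(new String("Data"));
        System.out.println(a == b);        // true: both names refer to one object
        System.out.println(dedup.size());  // 1
    }
}
```

    Unlike .intern(), a pool like this can be scoped to one data set and dropped as a whole when that data set is no longer needed.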

    That only explains why back when I mistakenly thought the SCP applied to all String objects (and not just literals) I thought it was a good, rather than disastrous thing.  In my mind, since Strings were immutable, deprecating new String() in favor of some efficiently managed, garbage-collected String Pool would have seemed to be a great thing.  It makes no difference to small programs, or ones that do not contain a large number of large, long-lived, duplicate value String objects being materialized from some outside source.  I see that it could easily do more harm than good.  [The misapprehension about it applying to "all String objects" was conveyed by well-intentioned but poorly presented tutorials on SCP, often simply referred to as "String Pooling".]

    On the other hand, I opened this thread because I was seeing a lot of confusing and apparently wrong advice meant to be useful preparatory material for the OCJP 819 exam, whose authors expected the exam to grill exam takers on the behavior of the SCP and .intern() in terms of reference equality of objects -- one more instance of the OCJP perhaps lending focus to something contrary to the goal of being a good Java developer.  Even now I think it is reasonable to want to know whether the SCP is just a data structure of some sort sitting on top of and referencing a common pool of all String objects in the heap, or whether the SCP and "normal heap" String objects are disjoint and separate, as many believe.  Not super-important compared to so many other things, like the fate of Serializable and what best replaces it, etc. etc. etc., but still a thing that it might be good to know.  It seems that there might be an answer, based on what 90%+ of the Java 8 through 15 JVM instances in the world are running, that cannot be determined from even the most careful reading of the JLS.  I am way more interested in learning the Java 8+ JODA-style date/time APIs, and everything that is in scope for the 819 (which somehow dropped THAT but still has us thinking about String Constant Pooling??).

    Going back to THAT now.

    Thanks for your tireless work in representing the "We want to be creating great Java programmers, not reinforcing bad behavior in pursuit of passing certification exam scores" point of view.

    Jesse
     
    Campbell Ritchie
    Marshal
    Posts: 71682
    312
    But in Java® everything is subject to GC except the pools/caches you mentioned. And there is only native code if you specify native code. That takes away a lot of the things you used to have to worry about.
    This constructor isn't deprecated, but it does say it is hardly ever necessary to use it.

    Thanks for your tireless work in representing the "We want to be creating great Java programmers . . .

    What a nice thing to say. Thank you.
     
    Paul Clapham
    Marshal
    Posts: 26290
    80

    Campbell Ritchie wrote:If those Strings aren't interned, why aren't they eligible for GC?



    Could be because there's a big data structure which contains them all. For example a giant XML document containing millions of nodes named "Data"; the name of the node will then be in memory millions of times. None of them eligible for GC because the document is stored in a tree structure.
     
    Ranch Hand
    Posts: 91

    Jesse Silverman wrote:I feel like I know a lot of true statements that have remained true about the "String Literal Pool" or the "String Constant Pool", as I see it variously called in different posts and articles, even within this forum.  If I understand correctly, the behavior has zero dependence on what you compile with, and solely on which JVM is running the code, what's more, I suspect that this all changed ten years ago and is only confusing people because there is so much stale information about SCP out there in old tutorials and archived discussions and old mock exam questions.

    My current belief is that the following statements from the certainly great-at-the-time but possibly no longer quite true tutorial on this site may have become inoperative:

    When a .java file is compiled into a .class file, any String literals are noted in a special way, just as all constants are. When a class is loaded (note that loading happens prior to initialization), the JVM goes through the code for the class and looks for String literals. When it finds one, it checks to see if an equivalent String is already referenced from the heap. If not, it creates a String instance on the heap and stores a reference to that object in the constant table. Once a reference is made to that String object, any references to that String literal throughout your program are simply replaced with the reference to the object referenced from the String Literal Pool.

    A downstream conclusion from this fundamental explication that I believe is no longer applicable now is:
    Strings created at run-time will always be distinct from those created from String Literals.

    It is my current belief that the String Constant Pool seems to now be built up during code execution, so that if a String Literal that the JVM is considering adding or looking up in the pool has already been added to the heap and then the SCP by executing user code, it will choose that very one as the reference to add to the String Literal Pool.  In the old days, as this was done before a single line of user code executed, at class loading time, that would never ever happen.

    The reason this matters so much (or at all) is that the simple downstream conclusion mentioned both in that article and many other places that exam-preparers are reading is no longer operative.  Instead, it now depends on whether the code creating the string that is only created at runtime executes before the line that references the string constant!
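    A hedged sketch of that order dependence. The class and method names are hypothetical. The first two checks follow from the documented behavior of intern() and new String(); the third depends on when the JVM resolves the literal, which is an implementation detail, so the code prints whatever the running JVM reports rather than promising a value.

```java
class LiteralResolutionOrder {
    // intern() returns the pooled instance, which is the one a literal refers to.
    static boolean internMatchesLiteral() {
        String runtime = new String(new char[] {'p', 'o', 'o', 'l'});
        return runtime.intern() == "pool";
    }

    // new String(...) always creates a fresh object, distinct from the literal.
    static boolean freshCopyIsDistinct() {
        String literal = "copy";
        return new String(literal) != literal;
    }

    // Order-dependent case: the runtime String is interned BEFORE the line
    // that mentions the equivalent literal has executed. If the JVM resolves
    // literals lazily, the literal can end up referring to this very object.
    static boolean runtimeStringInternedFirst() {
        String runtime = new StringBuilder("ranch-").append("819-demo").toString();
        runtime.intern();
        String literal = "ranch-819-demo";
        return runtime == literal;
    }

    public static void main(String[] args) {
        System.out.println(internMatchesLiteral());
        System.out.println(freshCopyIsDistinct());
        // Assumption: result varies with the JVM's literal resolution timing.
        System.out.println(runtimeStringInternedFirst());
    }
}
```

    Either way, equals() gives the same answer in all three cases, which is the practical takeaway.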

    I never loved these questions about "how many String objects get created?  Where?" which I see from some sources are indicated to be much more common in preparatory materials and mock exams than on real exams.



    Referenced "Was-Great-At-The-Time-But-Confusing-To-New-Exam-Takers-Now" article from this site:
    https://javaranch.com/journal/200409/ScjpTipLine-StringsLiterally.html



    I second Jesse Silverman. What he said makes much more sense to me. Thank you.
     
    Jesse Silverman
    Ranch Foreman
    Posts: 175
    8
    In terms of inherent interest, rather than OCJP 819, I find the following much more interesting than the relatively limited topic of String Pooling.
    I made no mention of it before and won't talk about it again because it seems to me to be clearly and firmly outside the scope of the exam, but it is impossible to look at the String code, as I was doing at first to work out what was going on with this posted question, without realizing something huge had happened post Java 8 in terms of internal representations of String objects.
    The most important aspect of this to me is that it is abundantly clear that the team steering Java is still constantly striving to improve both performance and functionality in ways that do not disturb existing code at all:
    https://www.baeldung.com/java-9-compact-string
    Fascinating but off-topic for the exam, and unnecessary for those who are just trying to write legal, correct Java programs that work correctly.  But boy does the code for String look different these days...
     
    Campbell Ritchie
    Marshal
    Posts: 71682
    312

    Jesse Silverman wrote:. . . off-topic for the exam, and unnecessary for those who are just trying to write . . . programs . . .

    . . . because the String compression feature is like clean glass. It is transparent, and the user doesn't notice any difference. That is how a program enhancement should look 
     
    Jesse Silverman
    Ranch Foreman
    Posts: 175
    8
    All of this has fallen out of scope for me, as I have accepted a C++/C#/Python job.

    I just wanted to add this tidbit that helps answer the musical question:
    "Why do generally fairly competent presenters leave a misunderstanding about the extent of String Pooling in Java?  They don't seem to be generally ignorant."

    The ones I saw doing this are bilingual in Java and Python, and I think their Python knowledge cross-contaminated their understanding/recall of Java behavior.

    Evidence submitted -- note that, completely unlike Java, Python will happily show you the address of objects on the heap.  I am unaware of the full implications of this (probably it is a historical accident that should be avoided), but see how, if you were doing Python all day, you might, like the people who prepared the material I had read or watched, imagine String pooling that will never happen in Java without an explicit .intern() call:


    You would never see the equivalent of the last two occur in Java; it is probably down to the difference of interpreted versus compiled-interpreted.  Anyway, I like to believe I'll never make this mistake in Java again, and now I see why I did: people presented String Constant Pooling in Java as doing more than it does, and I believe it was due to them using a lot of Python at the same time.
     
    Campbell Ritchie
    Marshal
    Posts: 71682
    312

    Jesse Silverman wrote:. . . the difference of interpreted versus compiled-interpreted, . . .

    No, it is the existence of a mechanism for displaying addresses. Please confirm whether id() shows you the memory address.

    This link suggests that is in fact the case. The runtime would seem to intern all Strings automatically.
     
    Mike Simmons
    Master Rancher
    Posts: 3754
    48
    Python isn't interning all strings - but in some situations, it can recognize that the new string is the same as an existing string.  Either because an operation is recognized as not changing the original string, or for other reasons.  Java does this too, sometimes.


    While v1 + "" and "" + v1 are both recognized as not changing the original string, the more complex expression v1[0:3]+ v2[3:] returns a new string - even though its contents are the same as v1 or v2.  Similarly, while 3*"Woof" evaluates to the same string as v1, 3*v does not, even though I've set v to "Woof" so 3*v is also "WoofWoofWoof".  To me this feels like Java's rules for compile-time constant expressions - if an expression is all constants, the result is also a constant, and interned if it's a String.  But if you use variables or other complex expressions, no.
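    For the Java side of that comparison, a small sketch. The class and method names are hypothetical; the behavior follows from the JLS rules that constant expressions of type String are interned, while concatenation involving a non-constant operand creates a new String at run time.

```java
class ConstantFolding {
    // A static final String initialized with a literal is a constant variable.
    static final String WOOF = "Woof";

    // All-literal expression: folded by the compiler into "WoofWoof".
    static boolean literalConcat() {
        return ("Woof" + "Woof") == "WoofWoof";
    }

    // Constant variables also count as constant expressions.
    static boolean constantVariableConcat() {
        return (WOOF + WOOF) == "WoofWoof";
    }

    // A plain local variable is not a constant, so this is computed at run
    // time and yields a newly created String.
    static boolean localVariableConcat() {
        String v = "Woof";
        return (v + v) == "WoofWoof";
    }

    // Explicitly interning the run-time result gets back the pooled instance.
    static boolean localVariableConcatInterned() {
        String v = "Woof";
        return (v + v).intern() == "WoofWoof";
    }

    public static void main(String[] args) {
        System.out.println(literalConcat());               // true
        System.out.println(constantVariableConcat());      // true
        System.out.println(localVariableConcat());         // false
        System.out.println(localVariableConcatInterned()); // true
    }
}
```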
     
    Jesse Silverman
    Ranch Foreman
    Posts: 175
    8
    It is mildly annoying that Python uses the "is" operator to tell if two references point to the same identical instance, and == tests for content equality.

    Well, by itself it is fine, but it gets annoying when you jump back and forth between Python and Java.

    I think the real point that Sun and Oracle really meant to drive home the whole time was just "Don't use == to compare strings!" but you would never know that from some of the mock exam questions.

    The value returned for id() is way longer on 64-bit Python than on 32-bit.
    I don't know whether that means it is an actual address or just that the number of possible objects is much larger.  I understand that there is a major benefit in never knowing the exact value, so that the garbage collector can go to town cleaning up the heap without disturbing the application.

    They changed a lot in Python 3 since they were willing to break compatibility, but id(my_reference) is still there in 3.x

    It is interesting to see the many similarities and handful of differences in how Java and Python treat constant pooling, but less interesting than learning how to use each effectively and safely to create readable, correct programs that run pretty well.

    That lesson carries over 100%

    I am unimpressed that Python happily allows us to mix boolean, float, integer, str and other objects in a list, but that is less surprising in a dynamic language than it is for someone who arrived late to the Java party to see a language with Strong Static Typing letting you do the same thing, Lector Emptor (retrieve elements very carefully).

    I still support treating the fact that you can throw all sorts of random junk into a List in either one of them as closer to a bug than a feature.

    Everyone should know you can do it, but that is hardly a reason to celebrate it [I have seen tutorial material celebrating this as some kind of cool neat great thing in both languages -- ugh!]

    Thanks for the lessons that carry over 100% to the other languages I'll be focusing on for a while.
    I'd say the two things that make me look forward to my focus returning to Java at some undetermined point in the future are, secondly, streams and try-with-resources, and firstly, the presence here of you guys.

    When I get to the point in Python where I believe common reference material about the stuff I am doing is confused or deficient in its explanations, I will be sure to land in the appropriate forum here on Java--err--CodeRanch.
     
    Mike Simmons
    Master Rancher
    Posts: 3754
    48

    Jesse Silverman wrote:It is mildly annoying that Python uses the "is" operator to tell if two references point to the same identical instance, and == tests for content equality.

    Well, by itself it is fine, but it gets annoying when you jump back and forth between Python and Java.



    I would call that a problem with Java, rather than Python.  I mean, ultimately you need to have the ability to check both reference equality (object identity) and data equality, in both languages.  But the need for checking data equality is far more common.  Python made the sensible choice to use a nice simple symbol for that.  And another simple keyword for the other, less-often-needed operation.  Java, on the other hand, used a simple symbol for the one you rarely need, and the notably longer and error-prone equals() method for data equality.  Bad choice.  I think Python did the right thing here.

    For comparison, Kotlin uses == for data equality and === for reference equality; Scala likewise uses == for data equality, with the eq method for reference equality.  Also good choices.  Java is the one that got this wrong.

    Note: By "error-prone" I mean the fact that equals() can easily throw a NullPointerException.  It takes more work to guard against this.  The introduction of the Objects.equals() method helps... but it's still notably less readable than the simple == used in Python, Scala, and Kotlin.  (And others, I'm sure.)
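    A short sketch of that point, using hypothetical helper names. Objects.equals() is the real java.util.Objects method; everything else is made up for the illustration.

```java
import java.util.Objects;

class NullSafeEquality {
    // Throws NullPointerException when a is null.
    static boolean naiveEquals(String a, String b) {
        return a.equals(b);
    }

    // Objects.equals() handles null on either side.
    static boolean safeEquals(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(safeEquals(null, null));            // true
        System.out.println(safeEquals(null, "x"));             // false
        System.out.println(safeEquals(new String("x"), "x"));  // true: contents compared
        try {
            naiveEquals(null, "x");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from a.equals(b)");
        }
    }
}
```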

    Jesse, congratulations on the Python gig.  Don't be a stranger!
     