String + char concat, what is really going on!?

 
jason adam
Chicken Farmer ()
Posts: 1932
Take the following two examples of code:

And slightly different:

Both compile fine, and produce "hello".
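The two listings did not survive in this copy of the post. Judging from the replies, they were probably close to the sketch below. Only the class name TestConcat comes from the post (via the javap command); the method names and layout are assumptions made so both variants fit in one file.

```java
// Hypothetical reconstruction of the two examples discussed in the thread.
class TestConcat {
    // First variant: both operands are literals.
    static String literalConcat() {
        String s = "hell" + 'o';
        return s;
    }

    // Second variant: compound assignment on a String variable.
    static String compoundConcat() {
        String s = "hell";
        s += 'o';
        return s;
    }

    public static void main(String[] args) {
        System.out.println(literalConcat());   // prints "hello"
        System.out.println(compoundConcat());  // prints "hello"
    }
}
```

Compiling this and running javap -c TestConcat should show the difference between the two methods, although the exact instructions depend on the compiler version.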
Doing a javap -c TestConcat on both of these, I get the following results:
  • First example seems to be just storing a "hello" string and passing it to println()
  • Second example is putting "hell" into a StringBuffer and calling append( char ) on the buffer. Then it calls toString and sends that string to println()

Now, that's all fine and dandy, but look at the JLS:

    15.18.1 String Concatenation Operator +
    If only one operand expression is of type String, then string conversion is performed on the other operand to produce a string at run time. The result is a reference to a newly created String object that is the concatenation of the two operand strings. The characters of the left-hand operand precede the characters of the right-hand operand in the newly created string.
    15.18.1.1 String Conversion
    Any type may be converted to type String by string conversion.
    A value x of primitive type T is first converted to a reference value as if by giving it as an argument to an appropriate class instance creation expression:
    If T is boolean, then use new Boolean(x).
    If T is char, then use new Character(x).
    If T is byte, short, or int, then use new Integer(x).
    If T is long, then use new Long(x).
    If T is float, then use new Float(x).
    If T is double, then use new Double(x).
    This reference value is then converted to type String by string conversion.
    Now only reference values need to be considered. If the reference is null, it is converted to the string "null" (four ASCII characters n, u, l, l). Otherwise, the conversion is performed as if by an invocation of the toString method of the referenced object with no arguments; but if the result of invoking the toString method is null, then the string "null" is used instead.


    According to this, the examples should instead be creating a new Character( 'o' ) object and then calling toString() on it, passing that string to be concatenated. However, bytecode shows none of this, especially in the second example where a StringBuffer is used. I would expect at least the first example to follow the JLS rules, and show a Character object being used.
    Anyone care to shed some light here?
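To make the JLS wording concrete, here is a sketch of what a literal, step-by-step reading of 15.18.1.1 would look like next to the plain + operator. The class and method names are made up for illustration, and Character.valueOf stands in for new Character(x), which is deprecated in newer JDKs.

```java
class StringConversionDemo {
    // Literal reading of the JLS text: box the char, call toString(), then concatenate.
    static String byTheBook(String left, char right) {
        Character boxed = Character.valueOf(right); // stand-in for new Character(right)
        String converted = boxed.toString();
        return left.concat(converted);
    }

    // What you actually write; the compiler may pick any strategy with the same result.
    static String withPlus(String left, char right) {
        return left + right;
    }

    // A null reference is converted to the four characters "null".
    static String nullReference() {
        Object nothing = null;
        return "value: " + nothing;
    }

    // A toString() that returns null is also converted to "null".
    static String nullToString() {
        Object odd = new Object() {
            @Override
            public String toString() {
                return null;
            }
        };
        return "value: " + odd;
    }

    public static void main(String[] args) {
        System.out.println(byTheBook("hell", 'o')); // prints "hello"
        System.out.println(withPlus("hell", 'o'));  // prints "hello"
        System.out.println(nullReference());        // prints "value: null"
        System.out.println(nullToString());         // prints "value: null"
    }
}
```

The JLS phrase "as if" is the key: the observable result has to match byTheBook, but the bytecode does not have to contain a Character object.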
     
    Rob Ross
    Bartender
    Posts: 2205
    In your first example, both literals are compile-time constants; that is, their values are known at compile time, so the compiler simply concatenates them while compiling your code. No further run-time processing is required.
    Your second example uses a compound assignment operator.
    String s1 = "hell" ;
    s1 += 'o';
    At run time, the assignment is performed. The code that is executed is equivalent to writing:
    s1 = s1 + 'o';
    Since one of the operands is a String, the plus (+) operator is a concatenation operator, so the rules of String concatenation apply. A StringBuffer is created using the contents of the first string, then the character 'o' is appended. The final result is used to create a new String object, and the reference to the new object is stored in s1, which now points to a String object containing the character string "hello".
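As a rough sketch of that expansion, the second method below spells out by hand what the javap output in this thread suggests the compiler generated for s1 += 'o'. The class and method names are assumptions. Later compilers are known to differ: javac switched to StringBuilder in Java 5 and to an invokedynamic-based strategy in Java 9, so treat this as an illustration of the idea and not of what your compiler emits.

```java
class CompoundAssignDemo {
    // The source form from the example above.
    static String viaOperator() {
        String s1 = "hell";
        s1 += 'o';
        return s1;
    }

    // A hand-written equivalent using StringBuffer, as described in the post.
    static String viaBuffer() {
        String s1 = "hell";
        // append(char) is called directly; no Character wrapper is created.
        s1 = new StringBuffer().append(s1).append('o').toString();
        return s1;
    }

    public static void main(String[] args) {
        System.out.println(viaOperator()); // prints "hello"
        System.out.println(viaBuffer());   // prints "hello"
    }
}
```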
    [ April 18, 2002: Message edited by: Rob Ross ]
     
    jason adam
    Chicken Farmer ()
    Posts: 1932

    15.18.1.2 Optimization of String Concatenation
    An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.
    For primitive types, an implementation may also optimize away the creation of a wrapper object by converting directly from a primitive type to a string.


    How is it that at compile time, since they are constants, it can just concat them together into one string? From the quote above, some string conversion has to go on, doesn't it? It's storing it as a single string, not as a string with a trailing char. Whether it uses a wrapper or not, it can't just stick a char, int, or whatever into the string and consider the new result a string...?
     
    Marilyn de Queiroz
    Sheriff
    Posts: 9103

    javap -c shows
    21 invokevirtual #6 <Method java.lang.StringBuffer append(char)>
    and
    56 invokevirtual #5 <Method java.lang.StringBuffer append(java.lang.String)>

    I guess the big question is, since the JLS says that the primitive will be converted to an Object which toString() will then be called on, why is this not happening?
    [ April 18, 2002: Message edited by: Marilyn deQueiroz ]
     
    Rob Ross
    Bartender
    Posts: 2205
    The compiler does the work. It probably uses a StringBuffer to do the concatenation. But the end result is that the bytecode it creates has no mention of any partial string "hell" or a character 'o'; it just creates a new String literal with the character sequence "hello" and assigns a reference to that String literal to s1.
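One way to observe this without reading bytecode is to compare references. Compile-time constant String expressions are interned, so a folded "hell" + 'o' should be the very same object as the literal "hello", while a run-time concatenation produces a new object. The sketch below uses == on purpose to test identity; the class and method names are hypothetical.

```java
class ConstantFoldingDemo {
    // Both operands are constants, so the compiler folds the expression to "hello".
    static boolean foldedIsSameObject() {
        String s = "hell" + 'o';
        return s == "hello"; // identity comparison, intentional
    }

    // A char literal and a String literal fold to the same constant.
    static boolean charAndStringFoldAlike() {
        return ("hell" + 'o') == ("hell" + "o");
    }

    // s is a variable, so the concatenation happens at run time and creates a new String.
    static boolean runtimeIsSameObject() {
        String s = "hell";
        s += 'o';
        return s == "hello";
    }

    // Interning the run-time result yields the shared constant again.
    static boolean runtimeInternedIsSameObject() {
        String s = "hell";
        s += 'o';
        return s.intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(foldedIsSameObject());          // prints "true"
        System.out.println(charAndStringFoldAlike());      // prints "true"
        System.out.println(runtimeIsSameObject());         // prints "false"
        System.out.println(runtimeInternedIsSameObject()); // prints "true"
    }
}
```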
     
    Rob Ross
    Bartender
    Posts: 2205

    Originally posted by Marilyn deQueiroz:

    I guess the big question is, since the JLS says that the primitive will be converted to an Object which toString() will then be called on, why is this not happening?


    For primitive types, an implementation may also optimize away the creation of a wrapper object by converting directly from a primitive type to a string.
    Because your compiler chose to optimize the way it implements character concatenation.
    StringBuffer.append() is overloaded to take any primitive or an Object as a parameter, so it's the "kitchen sink" approach to string concatenation.
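A small sketch of how overload resolution picks the append variant from the static type of the argument. The class and method names are assumptions; the append overloads themselves are part of java.lang.StringBuffer.

```java
class AppendOverloadDemo {
    // A char argument selects append(char) and appends the character itself.
    static String appendChar() {
        return new StringBuffer("hell").append('o').toString();
    }

    // The same value cast to int selects append(int) and appends its decimal digits.
    static String appendInt() {
        return new StringBuffer("hell").append((int) 'o').toString();
    }

    // An Object argument selects append(Object), which goes through toString().
    static String appendObject() {
        Object boxed = Character.valueOf('o');
        return new StringBuffer("hell").append(boxed).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendChar());   // prints "hello"
        System.out.println(appendInt());    // prints "hell111"
        System.out.println(appendObject()); // prints "hello"
    }
}
```

The append(char) path gives the same text as the wrapper route in appendObject, which is why skipping the Character object is a legal optimization.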
     
    jason adam
    Chicken Farmer ()
    Posts: 1932
    Oook, so the compiler does whatever it needs to get the two concatted together, leaving the bytecode with just a String (in the first case).
    Since we're dealing with an identifier with the second case ( s += 'o' ), the compiler puts the use of the StringBuffer in the bytecode (since s might change at runtime, it doesn't know for sure) for the most optimized output.
    So doing String s = "hell" + 'o' isn't any worse than String s = "hell" + "o" because it's all handled at compile time (instead of worrying about a runtime creation of a new Character('o') object, toString() being called, and then all the concatting).
    Pretty much the gist?
     
    Rob Ross
    Bartender
    Posts: 2205
    Yup, that's it!
     
    Marilyn de Queiroz
    Sheriff
    Posts: 9103
    Thanks for pointing that out, Rob.

    Actually, Jason, not only is
    String s = "hell" + 'o' ; not any worse than
    String s = "hell" + "o" ;
    but actually,
    String s = "hell" + 'o' ;
    is much faster than
    String s = "hell" + "o" ;
     
    Jim Yingst
    Wanderer
    Posts: 18671
    Why would there be any difference in speed, if it's all resolved at compile time?
     
    Marilyn de Queiroz
    Sheriff
    Posts: 9103
    Because
    String += char
    doesn't create all the intermediate objects that
    String += String
    creates. See discussion here

    Try it.

    I got:
    append char: 8713
    append string: 54748
    Your system will probably give you different numbers.
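The test program itself is missing from this copy of the thread. A minimal sketch of that kind of comparison might look like the following. The class name, method names, and loop count are all assumptions, and this is not a rigorous benchmark: there is no warm-up, and newer compilers and JVMs may shrink or erase the gap.

```java
class ConcatTiming {
    // Repeated String += char on a variable.
    static String appendChars(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += 'x';
        }
        return s;
    }

    // Repeated String += String on a variable.
    static String appendStrings(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x";
        }
        return s;
    }

    // Runs the task once and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary loop count chosen for this sketch
        long charTime = timeNanos(() -> appendChars(n));
        long stringTime = timeNanos(() -> appendStrings(n));
        System.out.println("append char: " + charTime / 1_000_000 + " ms");
        System.out.println("append string: " + stringTime / 1_000_000 + " ms");
    }
}
```

Both loops build the same string, so any difference in the printed times comes from how each concatenation is carried out.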
     
    Jim Yingst
    Wanderer
    Posts: 18671
    I agree with the discussion cited, and the results of your code - but they're not quite relevant to the example you gave, which uses compile-time constant expressions:

    And here's the output of javap:

    If you replace "o" with 'o' you will see the exact same result. Jason's original question contained one example with only compile-time constant expressions, and another with variables. Rob's answer explained why the expression with constants would be faster than the one with a variable. You gave an example where both expressions were constant, and in that case I'm saying there's really no difference. I agree that there can be (and is) a difference between
    String += char
    and
    String += String
    - though if the compiler were being as efficient as it's allowed to be, this difference too would vanish.
    [ April 20, 2002: Message edited by: Jim Yingst ]
     
    Marilyn de Queiroz
    Sheriff
    Posts: 9103
    Ahhhh ... Yes, Jim.
    That's what I get for reading in a hurry. Instead of reading what's there, I read what I think is there.
    [ April 20, 2002: Message edited by: Marilyn deQueiroz ]
     