String vs StringBuffer vs StringBuilder and String.replaceAll vs String.replace

 
Ranch Hand
Posts: 213
jQuery MySQL Database PHP
As far as I know:
String is immutable.
StringBuffer is synchronous (slow).
StringBuilder is asynchronous (fast).
As far as I understand, string concatenation with the "plus" operator is turned into StringBuilder calls at compile time. For example:

Will become:

Also, according to this question, Difference between StringBuilder and StringBuffer, the performance order is String < StringBuffer < StringBuilder (slowest to fastest).
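The code for the "plus" example in this post did not survive. As a minimal sketch of the rewriting described (not the poster's own code; the class and method names are hypothetical): up to Java 8, javac turned a non-constant `+` expression into roughly the StringBuilder chain in `viaBuilder`. From Java 9 on, javac emits an `invokedynamic` call to `StringConcatFactory` instead, so the StringBuilder form is only an approximation there.

```java
// Hypothetical illustration; names are not from the original post.
class PlusDemo {
    // What the programmer writes.
    static String viaPlus(String name, int count) {
        return "Hello, " + name + "! You have " + count + " messages.";
    }

    // Roughly what javac generated for the method above up to Java 8.
    static String viaBuilder(String name, int count) {
        return new StringBuilder()
                .append("Hello, ")
                .append(name)
                .append("! You have ")
                .append(count)
                .append(" messages.")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("Tan", 3));
        System.out.println(viaBuilder("Tan", 3));
    }
}
```

Both methods return the same text; only the way the result is assembled differs.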
But in my test, StringBuffer is faster than StringBuilder. This is the code I use:

Result:

String replace is: 2ms
String is: 3300ms
StringBuffer is: 4ms
StringBuilder is: 6ms


Why is StringBuffer faster than StringBuilder in this case?
_______________________________________________________________________________
The next question, according to these two questions: Replace all occurrences of substring in a string - which is more efficient in Java? and String.replaceAll is considerably slower than doing the job yourself.
As can be seen, String.replaceAll() and String.replace() both use internal regex and String.replace() will always generate a new string every time it is called. And on top of that, both deliver poor performance, although String.replace() is still rated as delivering better performance than String.replaceAll() - not to mention third-party libraries like StringUtils.replace() or StringBuilder.replace() (which takes start and end indexes rather than replacing one string with another).
So are those all the differences between the two methods? In simple replacement cases (as in the example above), which method performs better?
According to this answer: Apache StringUtils vs Java implementation of replace()
It seems there is no performance difference between String.replace() and StringUtils.replace()?
 
Rancher
Posts: 4252
57

Tan Quang wrote:Why is StringBuffer faster than StringBuilder in this case?


I don't believe it is.  First, your test case is far too short to measure a meaningful value.  You should run the code many more times to get meaningful results.  Second, when I run the code myself, I get faster results for StringBuilder, not StringBuffer.  Maybe it depends on which JDK version you're using.  If you're still using  JDK 8 or whatever, it may be that there was some bug in the  code that was fixed a long time ago - I don't know.

Tan Quang wrote:The next question, according to these two questions: Replace all occurrences of substring in a string - which is more efficient in Java? and String.replaceAll is considerably slower than doing the job yourself.
As can be seen, String.replaceAll() and String.replace() both use internal regex


No.  String.replace() doesn't use any regular expression.  It treats the first argument as a literal.  No regex involved.  You may be confusing this method with other methods like replaceAll() and replaceFirst().  They did a bad job naming these methods, because they're not consistent.  I always have to double-check the documentation to see which ones use regex and which don't.
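To make the literal-versus-regex distinction concrete, here is a small sketch (the class and method names are made up for illustration). It applies the same arguments to `replace()` and `replaceAll()`, and shows how `Pattern.quote()` and `Matcher.quoteReplacement()` make `replaceAll()` behave literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of literal vs. regex replacement.
class ReplaceDemo {
    // replace(CharSequence, CharSequence): the target is taken literally.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll(String, String): the first argument is a regex,
    // and "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Quoting the pattern and replacement makes replaceAll literal again.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), Matcher.quoteReplacement("-"));
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));     // a-b-c
        System.out.println(regex("a.b.c"));       // -----
        System.out.println(quotedRegex("a.b.c")); // a-b-c
    }
}
```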

Tan Quang wrote:and String.replace() will always generate a new string every time it is called.


So will replaceAll().  So will any String method, pretty much.

Tan Quang wrote:And on top of that, both deliver poor performance


Compared to what?  They don't do the same thing. Getting the optimal performance out of code with multiple replacements can be complicated - not every method is optimized for all situations.

Tan Quang wrote:although String.replace() is still rated as delivering better performance than String.replaceAll()


Probably because it doesn't use any regex.  If you don't need regex, don't use it; it slows you down for simple cases.  But if you need it (and it can be very useful), then use it.

You also have a test case at the beginning that is labeled as testing String replace(), but the code is using replaceAll().  It's also doing completely different things than your other test cases, so it doesn't meaningfully compare to anything else.
 
Saloon Keeper
Posts: 25843
184
Android Eclipse IDE Tomcat Server Redhat Java Linux

Tan Quang wrote:
StringBuffer is synchronous (slow).
StringBuilder is asynchronous (fast).



No. StringBuffer is synchronized, not synchronous. That is, its services are managed by a lock so that it is thread-safe. StringBuilder is the same as StringBuffer except that its services do not implement the Java synchronized locking. Synchronous and asynchronous are better off used to describe threads than resources.

StringBuilder is faster than StringBuffer because the act of testing, acquiring and releasing a lock is extra logic that StringBuilder does not have. Less logic, faster time for the same overall algorithm, as a rule. As Mike said, your sampling size isn't large enough to show the difference accurately, though.

Where the StringBuffer gets a lot slower is when multiple threads are all trying to use the same StringBuffer at the same time and perforce some threads must sleep while the current thread uses the resource.
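As a rough sketch of what "synchronized" buys you (class name and sizes are arbitrary assumptions), the following appends single characters to one shared StringBuffer from several threads. Because every `append()` call takes the buffer's lock, no append is lost and the final length is exact.

```java
// Hypothetical illustration: several threads share one StringBuffer.
class SharedBufferDemo {
    static int appendFromThreads(int threads, int appendsPerThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < appendsPerThread; i++) {
                    shared.append('x'); // each call acquires the buffer's lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait until every worker has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // 40000
    }
}
```

Swapping in a StringBuilder here would remove that guarantee: appends may be lost or an exception may be thrown, since StringBuilder does no locking of its own.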

As to why StringBuffer and Vector were implemented with blocking abilities and StringBuilder and java List came later, that's a mystery to me. Perhaps they were used in that capacity somewhere in the core JVM.
 
Mike Simmons
Rancher
Posts: 4252
57

Tim Holloway wrote:Where the StringBuffer gets a lot slower is when multiple threads are all trying to use the same StringBuffer at the same time and perforce some threads must sleep while the current thread uses the resource.


That's pretty rare, though - why would anyone need to do that?  Though if they do, then (a) they do need some sort of locking, but (b) StringBuffer probably isn't doing the locking at the right level of granularity.

Tim Holloway wrote:As to why StringBuffer and Vector were implemented with blocking abilities and StringBuilder and java List came later, that's a mystery to me. Perhaps they were used in that capacity somewhere in the core JVM.


I think they were excited about the idea of having "synchronized" as a catch-all solution to threading problems, and thought it was a good idea.  Over time they realized that the implementations in StringBuffer and Vector were badly thought out, as far as thread safety, allowing people to think they were writing "thread-safe" code when in fact they were not.  Incidentally they also made the code slower.  So they promoted new non-thread-safe versions instead.  When people need thread safety, they're better off putting it in themselves, usually at a higher level.
 
Tim Holloway
Saloon Keeper
Posts: 25843
184
Android Eclipse IDE Tomcat Server Redhat Java Linux

Mike Simmons wrote:
That's pretty rare, though - why would anyone need to do that?



Beats the heck out of me. I suppose there would be a benefit if you had one big character buffer that you built all your strings in and didn't want the overhead that making a new buffer or builder for each String creation would require in a multi-threaded environment. Other than that… ???
 
Mike Simmons
Rancher
Posts: 4252
57
Yeah, I have a hard time imagining a situation where that would ever work, really.  Aside from the fact that you probably wouldn't see any performance benefit at all even in a single-threaded case, the synchronization of StringBuffer is at the wrong level.  One thread could be calling

while another is calling

and the result could be "Hello Goodbye, Blue SkyWorld!"   Which goes back to why I hate StringBuffer and Vector, they synchronize at too low a level to be useful in most cases.  You can occasionally share a mutable Hashtable in a thread-safe manner without additional synchronization.  Maybe.  But even that is rare.
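The two calls from this post are missing, so the following is only a guess at their shape, reconstructed from the interleaved result quoted above; the class and method names are hypothetical. Each `append()` is atomic, but a chain of two appends is not, so the other thread can slip in between them. Locking around the whole chain keeps each message in one piece.

```java
// Hypothetical reconstruction; the original snippets were lost.
class GranularityDemo {
    // Each append is atomic, but the pair is not.
    static String unsafePairs() {
        StringBuffer sb = new StringBuffer();
        Thread a = new Thread(() -> sb.append("Hello ").append("World!"));
        Thread b = new Thread(() -> sb.append("Goodbye, ").append("Blue Sky"));
        runBoth(a, b);
        return sb.toString();
    }

    // Locking at the level of the whole message keeps it contiguous.
    static String lockedPairs() {
        StringBuffer sb = new StringBuffer();
        Thread a = new Thread(() -> {
            synchronized (sb) { sb.append("Hello ").append("World!"); }
        });
        Thread b = new Thread(() -> {
            synchronized (sb) { sb.append("Goodbye, ").append("Blue Sky"); }
        });
        runBoth(a, b);
        return sb.toString();
    }

    private static void runBoth(Thread a, Thread b) {
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unsafePairs()); // the four pieces may interleave
        System.out.println(lockedPairs()); // each message stays intact
    }
}
```

Once the caller holds its own lock around the whole chain, the per-call locking inside StringBuffer adds nothing, and a plain StringBuilder guarded by that same lock would do.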
 
Tan Quang
Ranch Hand
Posts: 213
jQuery MySQL Database PHP

Mike Simmons wrote:I don't believe it is.  First, your test case is far too short to measure a meaningful value.  You should run the code many more times to get meaningful results.  Second, when I run the code myself, I get faster results for StringBuilder, not StringBuffer.  Maybe it depends on which JDK version you're using.  If you're still using  JDK 8 or whatever, it may be that there was some bug in the  code that was fixed a long time ago - I don't know.


You are right. I didn't test them on my own computer; I tested them with online compiler tools: jdoodle (JDK 17.0.1), tutorialspoint (unknown JDK version), programiz (unknown JDK version), and onlinegdb (unknown JDK version).
Results: StringBuilder was faster than StringBuffer on jdoodle (JDK 17.0.1) and tutorialspoint (unknown JDK version), and the opposite was true on programiz (unknown JDK version) and onlinegdb (unknown JDK version).

Mike Simmons wrote:No.  String.replace() doesn't use any regular expression.  It treats the first argument as a literal.  No regex involved.  You may be confusing this method with other methods like replaceAll() and replaceFirst().  They did a bad job naming these methods, because they're not consistent.  I always have to double-check the documentation to see which ones use regex and which don't.

Mike Simmons wrote:So will replaceAll().  So will any String method, pretty much.

Mike Simmons wrote:Compared to what?  They don't do the same thing. Getting the optimal performance out of code with multiple replacements can be complicated - not every method is optimized for all situations.

Mike Simmons wrote:Probably because it doesn't use any regex.  If you don't need regex, don't use it; it slows you down for simple cases.  But if you need it (and it can be very useful), then use it.

Mike Simmons wrote:You also have a test case at the beginning that is labeled as testing String replace(), but the code is using replaceAll().  It's also doing completely different things than your other test cases, so it doesn't meaningfully compare to anything else.


According to the String.java source code (lines 2209 and 2227), String.replaceAll() and String.replace() both call replaceAll() internally; the difference is that String.replace() compiles the first argument with Pattern.LITERAL.
So, when replacing every occurrence of a string that contains no regex constructs (\d, \s,...), as in my example code, String.replace() would give better performance than String.replaceAll(), right? And if so, which is better: String.replace() or StringUtils.replace()?
When multiple strings need to be replaced, as in the example code, String.replaceAll() and String.replace() will do better than StringUtils.replace(). I haven't thought of a way to use StringUtils.replace() in this case! Perhaps create an array of search strings and an array of replacement strings, then use StringUtils.replaceEach() (but that is not StringUtils.replace())?
 
Tan Quang
Ranch Hand
Posts: 213
jQuery MySQL Database PHP

Tim Holloway wrote:No. StringBuffer is synchronized, not synchronous. That is, its services are managed by a lock so that it is thread-safe. StringBuilder is the same as StringBuffer except that its services do not implement the Java synchronized locking. Synchronous and asynchronous are better off used to describe threads than resources.

StringBuilder is faster than StringBuffer because the act of testing, acquiring and releasing a lock is extra logic that StringBuilder does not have. Less logic, faster time for the same overall algorithm, as a rule. As Mike said, your sampling size isn't large enough to show the difference accurately, though.

Where the StringBuffer gets a lot slower is when multiple threads are all trying to use the same StringBuffer at the same time and perforce some threads must sleep while the current thread uses the resource.

As to why StringBuffer and Vector were implemented with blocking abilities and StringBuilder and java List came later, that's a mystery to me. Perhaps they were used in that capacity somewhere in the core JVM.


Yes, you're right, I confused synchronized with synchronous! As for whether it is used in the core of the JVM, I'm not sure...
 
Saloon Keeper
Posts: 14096
319
As I already mentioned in one of your other threads, you may not draw conclusions from any benchmark written using currentTimeMillis(), nanoTime() or any timer classes.

If you didn't use a microbenchmark framework to get your timings, they are wrong.

Since you seem to be very preoccupied with micro-optimizations, at least use the proper tools to test them. I strongly recommend jmh.
 
Marshal
Posts: 5388
326
IntelliJ IDE Python Java Linux
JMH seconded for microbenchmarking. Do not attempt to roll your own, ever.
 
Marshal
Posts: 76110
362
But why would anybody want to do micro‑benchmarking on such code in the first place?
 
Tim Holloway
Saloon Keeper
Posts: 25843
184
Android Eclipse IDE Tomcat Server Redhat Java Linux

Campbell Ritchie wrote:But why would anybody want to do micro‑benchmarking on such code in the first place?

Because they work on the JVM development teams tuning the String classes at Oracle, IBM, OpenJPA, et al.? But as for the rest of us, probably not.
 
Mike Simmons
Rancher
Posts: 4252
57
Re: replace() using replaceAll(), interesting.  That code seems to be from Java 1.6 or 1.7, since it has methods added in 1.6 but not 1.8.  I was looking at code from JDK 18 and JDK 11, and it does not use replaceAll() internally.  I would guess that they found they could optimize it better that way.  If you're still using Java 8 you should look at the source for Java 8.  But this also points to another issue when measuring performance - it can vary from Java version to Java version, and also from OS to OS and computer to computer.  
 
Tan Quang
Ranch Hand
Posts: 213
jQuery MySQL Database PHP

Stephan van Hulst wrote:As I already mentioned in one of your other threads, you may not draw conclusions from any benchmark written using currentTimeMillis(), nanoTime() or any timer classes.

If you didn't use a microbenchmark framework to get your timings, they are wrong.

Since you seem to be very preoccupied with micro-optimizations, at least use the proper tools to test them. I strongly recommend jmh.


Tim Cooke wrote:JMH seconded for microbenchmarking. Do not attempt to roll your own, ever.


Following everyone's advice, I no longer trust my timing figures; they are unreliable anyway.
Is StringBuffer being faster than StringBuilder in my test code (perhaps or surely) due to the difference between Java versions?! I'm not sure about this, because I checked the Java version used on onlinegdb with this code:

Although the Java version is 11.0.4, StringBuffer is still faster than StringBuilder for the example code I gave.
So, my conclusion on the first question: concatenating strings using the "plus" operator, although easier to read, yields the worst performance. Therefore it is necessary to avoid string concatenation with the "plus" operator, and it is advisable to use StringBuffer or StringBuilder (on a case-by-case basis) for better performance?!
_____________________________________________________________________________________________
Anyway, using String.replaceAll() is not advisable, too expensive, and offers poor performance with simple string replacement cases like the current one.
I have read this question and answer: Commons Lang StringUtils.replace performance vs String.replace
That answer does not rely on micro-benchmark figures alone; it also analyzes the internal source code and revision histories of both methods. Based on it, String.replace() was greatly improved in Java 9 and has outperformed StringUtils.replace() in later versions of Java. What the answer leaves out is that those changes are from OpenJDK, not Oracle JDK. Oracle JDK has long been closed source, and although its changes are partially publicized they are still quite "ambiguous", so it is impossible to be sure whether String.replace() was really improved in Oracle JDK as it was in OpenJDK!?
I'm wondering whether it's necessary to write a separate method to replace the string. The improved String.replace() is only available from Java 9; Java 8 and below still perform worse than StringUtils.replace(). Going by loukili's answer, and the JMH-based micro-benchmark of that code in the next answer by qxo, it is probably pretty good for simple string replacements on Java 8 and below (but not "friendly" when many replacements are needed, as in my example code (3 positions)).
 
Tan Quang
Ranch Hand
Posts: 213
jQuery MySQL Database PHP

Mike Simmons wrote:Re: replace() using replaceAll(), interesting.  That code seems to be from Java 1.6 or 1.7, since it has methods added in 1.6 but not 1.8.  I was looking at code from JDK 18 and JDK 11, and it does not use replaceAll() internally.  I would guess that they found they could optimize it better that way.  If you're still using Java 8 you should look at the source for Java 8.  But this also points to another issue when measuring performance - it can vary from Java version to Java version, and also from OS to OS and computer to computer.  


Yes, String.replace() has improved quite a bit since Java 9, but that is in OpenJDK; whether the same holds for Oracle JDK is pretty "ambiguous"!
 
Mike Simmons
Rancher
Posts: 4252
57

Tan Quang wrote:Although the Java version is 11.0.4, StringBuffer is still faster than StringBuilder for the example code I gave.


The example you gave is still far too short to give meaningful results.  I just tested it with JDK 8 and 11 using 100000000 repetitions, and StringBuilder was the clear winner on my machine - though as I repeated it more and more, the results got closer and closer.  I agree with everyone encouraging you to use jmh - your tests are close to meaningless, as it is.

Tan Quang wrote:So, my conclusion on the first question: concatenating strings using the "plus" operator, although easier to read, yields the worst performance. Therefore it is necessary to avoid string concatenation with the "plus" operator, and it is advisable to use StringBuffer or StringBuilder (on a case-by-case basis) for better performance?!


Well, that's stated far too generally to be true in all cases.  There are many cases where concatenating with + is equally fast as the others.  Maybe sometimes faster.  But it depends how it's used in code.  The one big thing to avoid is using + inside a loop in cases where you're appending to a string that keeps growing in size.  The fundamental problem is that the intermediate result is forced to be a String each time - which means that you're going to be spending a lot of time copying characters into new char[] arrays each time you append.  You get basically the same performance from each of these:

The problem is not concatenation with +, but the fact that you're creating a new immutable object (String) on each iteration of the loop, and (importantly) it's getting bigger each time.  That's what makes this bad performance.

Conversely, if you are not doing this in a loop, and if the result is not getting bigger each time, then it probably doesn't matter whether you use + or StringBuilder.  There's no reason to scare people off from using + in general - but be very careful when it's in a loop.
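The three snippets referred to in this post were lost. A plausible sketch of three loop forms that behave alike (an assumption, not the poster's code; the 5-character "Hello" chunk is chosen to match the character counts mentioned later in the thread) might look like this. In all three, the intermediate result is forced back into a String on every iteration.

```java
// Hypothetical reconstruction; the original snippets were lost.
class LoopConcatDemo {
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s = s + "Hello"; // new String each time, old contents copied
        }
        return s;
    }

    static String withPlusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "Hello"; // same thing, shorter syntax
        }
        return s;
    }

    static String withThrowawayBuilder(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            // A builder per iteration does not help: the old contents are
            // copied in, and toString() copies everything out again.
            s = new StringBuilder(s).append("Hello").toString();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(withPlus(3));
        System.out.println(withPlusEquals(3));
        System.out.println(withThrowawayBuilder(3));
    }
}
```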

As for your concerns about OpenJDK and what's in Oracle's code... sigh.  You can check the actual source code by looking at src.zip, or using a decent IDE like IntelliJ which will show you a decompiled version.  You can also worry about many, many other valid Java releases out there, if you like.  There are a lot.  I think you'll find that in many cases, improvements from OpenJDK make it into other versions as well.
 
Tim Holloway
Saloon Keeper
Posts: 25843
184
Android Eclipse IDE Tomcat Server Redhat Java Linux

Mike Simmons wrote:
The problem is not concatenation with +, but the fact that you're creating a new immutable object (String) on each iteration of the loop, and (importantly) it's getting bigger each time.  That's what makes this bad performance.



I don't know if there's any particular badness about making larger and larger String instances as such. When you discard a String in favor of a new String, whether larger or smaller, it's not like there's going to be immediate storage reclamation. The garbage collection process has its own schedule.

The concatenation operator ("+") and concatenation method calls are basically the same thing except that one is defined in the language and the other is an explicit method call. If the compiler can do concatenations at compile-time, it will do so in either event. This is an optimization technique known as constant folding.

As I recall, yes, there was some penalty for using "+" in early Java releases, but it was repaired long ago.

You probably won't suffer too much building strings from explicitly concatenating 2 or 3 other strings. But if it gets more complex than that - and especially if type conversion (say Integer-to-String) is involved, use a StringBuilder.

And always remember. Optimization isn't doing what you "know" is efficient, it's doing what you've measured to be efficient. And if it isn't enough difference to measure, don't bother. Your efficiency is more important than the machine's.
 
Mike Simmons
Rancher
Posts: 4252
57

Tim Holloway wrote:I don't know if there's any particular badness about making larger and larger String instances as such. When you discard a String in favor of a new String, whether larger or smaller, it's not like there's going to be immediate storage reclamation. The garbage collection process has its own schedule.


Ignoring garbage collection for the moment, the key issue is that in the three code examples I showed, each String is being built by adding something to the old String.  Each time that happens, a new String gets a char[] array whose contents are (mostly) copied from the previous String. In the code I showed, that's 5 characters copied the first time... and 10 the second... and 15 the third... and 20 the fourth... and 25 the fifth... which adds up quite a bit.  It may be double that each time, if you are actually creating a new StringBuilder and then a new String with each step.  Regardless, it's O(N^2) performance overall, for a case where it was clearly possible to get O(N) performance by using a single StringBuilder and not re-copying a new one each time:

Sure, some copying of arrays occurs internally as the StringBuilder gets resized now and then.  But it's done smartly, such that overall performance is still O(N).  It doesn't have to re-copy all the content on every iteration.  Because that would be silly.
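The single-builder snippet from this post is missing as well. A minimal sketch of the idea (hypothetical names, and the same assumed 5-character chunk) is below, together with a helper that adds up the 5 + 10 + 15 + ... series from the previous paragraph so the quadratic growth is visible as a number.

```java
// Hypothetical sketch; the original snippet was lost.
class SingleBuilderDemo {
    // One builder for the whole loop; toString() is called once at the end.
    static String build(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("Hello");
        }
        return sb.toString();
    }

    // Characters written when every iteration produces a fresh String of
    // the new length: 5 + 10 + 15 + ... + 5n = 5n(n+1)/2.
    static long charsCopiedByRepeatedConcat(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += 5L * i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(build(3));                          // HelloHelloHello
        System.out.println(charsCopiedByRepeatedConcat(5));    // 75
        System.out.println(charsCopiedByRepeatedConcat(1000)); // 2502500
    }
}
```

For 1,000 iterations the final string is only 5,000 characters long, yet the repeated-concatenation approach writes about 2.5 million characters along the way.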
 
Campbell Ritchie
Marshal
Posts: 76110
362

Tim Holloway wrote:. . . I don't know if there's any particular badness about making larger and larger String instances as such. When you discard a String in favor of a new String, whether larger or smaller, it's not like there's going to be immediate storage reclamation. The garbage collection process has its own schedule. . . .

But once the String gets to sizes like 10⁸ characters, GC will be necessary every few runs of the loop.

Tim Holloway wrote:Optimization isn't doing what you "know" is efficient, it's doing what you've measured to be efficient. . . . .

And isn't one of the best ways to mess up your performance to try to be too clever about optimisations?
 
Tim Holloway
Saloon Keeper
Posts: 25843
184
Android Eclipse IDE Tomcat Server Redhat Java Linux

Mike Simmons wrote:the key issue is that in the three code examples I showed, each String is being built by adding something to the old String.


Well, yes. I was talking about string concatenation in general, not making a rule and computing the performance of a specific case. Also note that I'm not assuming that the text copying is necessarily the major consumer of resources. Just constructing a String is no minor feat, regardless of the length of its contents.

Nor am I assuming that GC requires a certain memory threshold to kick off. Last time I heard, GC was running as a gradual and ongoing process, not as a stop-everything-and-recover atomic operation, as it infamously did in the Microsoft Basic multi-tasking demo app for the Commodore Amiga. GC has progressed a long way since 1985.

Actually the case of constructing a mega-string by repeatedly concatenating the same smaller string over and over isn't exactly the most common case. The main place you'd see it might be something like creating a long separator line for printed reports, except that Java isn't employed all that often for printed reports. Further, consider what the following code might produce:

Now consider this:

The second case would almost certainly be optimised at compile time into a single assignment of a constructed String of 100 stars --- constant folding. But the first example could result in the exact same thing depending on how determined the compiler was, thanks to another operation known as loop unrolling. Loop unrolling takes advantage that for a small fixed number of iterations, the iterate-test-and-branch is overhead that can be eliminated in favor of simply replicating the loop body multiple times with no testing or branching.
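The two snippets discussed here did not survive, so this is a scaled-down sketch under assumptions (10 stars instead of 100, hypothetical names). It shows what can be checked directly: a `+` expression made only of literals is a compile-time constant that javac folds into one interned literal, while a string built in a loop is a new object created at run time. Whether a JIT compiler later unrolls such a loop is a separate matter that this example does not try to show.

```java
// Hypothetical sketch; the original snippets were lost.
class FoldingDemo {
    static final String TEN_STARS = "**********";

    // Constant expression: folded by javac into the single literal "**********".
    static String folded() {
        return "*****" + "*****";
    }

    // Built at run time: equal contents, but a newly created String object.
    static String looped() {
        String s = "";
        for (int i = 0; i < 10; i++) {
            s += "*";
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(folded() == TEN_STARS);      // true: same interned literal
        System.out.println(looped().equals(TEN_STARS)); // true: same characters
        System.out.println(looped() == TEN_STARS);      // false: distinct object
    }
}
```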
 
Campbell Ritchie
Marshal
Posts: 76110
362

Tim Holloway wrote: . . . The second case would almost certainly be optimised at compile time into a single assignment of a constructed String of 100 stars --- constant folding. . . . .

Agree; it would actually be executed at compile time and a String of 100 stars created.
 
Stephan van Hulst
Saloon Keeper
Posts: 14096
319

Tan Quang wrote:Is StringBuffer being faster than StringBuilder in my test code (perhaps or surely) due to the difference between Java versions?!


No. It's probable that by the time execution reaches the code that tests StringBuffer, the JVM has warmed up enough that it appears this code runs faster and it has nothing to do with using StringBuffer over StringBuilder. Swap the two around and see if it changes anything. It's exactly for this reason I told you to use jmh, because it deals with situations like this.
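A deliberately naive sketch of the "swap the two around" experiment follows; the class name and the workload size are arbitrary assumptions. It times the same two workloads in both orders within one JVM run. On many machines the later measurements come out faster regardless of which class is used, which is the warm-up effect described above. The numbers it prints are not a benchmark and should not be used to rank the two classes.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration of ordering effects in hand-rolled timing.
class OrderingDemo {
    static final int APPENDS = 200_000;

    static int fillBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < APPENDS; i++) {
            sb.append('x');
        }
        return sb.length();
    }

    static int fillBuffer() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < APPENDS; i++) {
            sb.append('x');
        }
        return sb.length();
    }

    // Crude wall-clock timing of one run; no warm-up, no repetition.
    static long nanos(IntSupplier work) {
        long start = System.nanoTime();
        int length = work.getAsInt();
        long elapsed = System.nanoTime() - start;
        if (length != APPENDS) {
            throw new IllegalStateException("unexpected length " + length);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("builder (1st): " + nanos(OrderingDemo::fillBuilder) + " ns");
        System.out.println("buffer  (2nd): " + nanos(OrderingDemo::fillBuffer) + " ns");
        System.out.println("buffer  (3rd): " + nanos(OrderingDemo::fillBuffer) + " ns");
        System.out.println("builder (4th): " + nanos(OrderingDemo::fillBuilder) + " ns");
    }
}
```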

Tan Quang wrote:So, my conclusion on the first question: concatenating strings using the "plus" operator, although easier to read, yields the worst performance. Therefore it is necessary to avoid string concatenation with the "plus" operator, and it is advisable to use StringBuffer or StringBuilder (on a case-by-case basis) for better performance?!


Again, you're drawing this conclusion prematurely. It's likely that using the + operator will give you worse performance than using a StringBuilder, when used in a loop with many iterations. Does that mean you need to avoid it at all costs? No.

1) If you're not building a string inside a loop, you might as well use +, because the performance benefits of using StringBuilder only really become apparent when you are performing many edits to get the final result.

2) Even if you are building a string in a loop using the + operator, the performance penalty may not be apparent because that portion of your code is not called often. If performance is a worry, use a profiler to identify hot paths in your code before you start optimizing.

Tan Quang wrote:Anyway, using String.replaceAll() is not advisable, too expensive, and offers poor performance with simple string replacement cases like the current one.


Who cares? Stop worrying. Start profiling. You're wasting so much time worrying about stuff that might be inconsequential. Also, as you've already seen, improvements may be made to methods that you previously considered "too expensive", and rules you've made for yourself may no longer hold.

Seriously, the next time you write an application and worry about the performance of a small part of your code, run a profiler and see how much time is spent there in total.

________________________________________________________________________________________

Now, for those interested, I wrote a couple of benchmarks in jmh to test the performance characteristics of +, StringBuilder.append() and StringBuffer.append().


Benchmark                       Mode  Cnt     Score     Error  Units
establishBaseline               avgt    5   102,926 ±  10,111  ns/op
useConcatenationOnce            avgt    5    98,797 ±   9,018  ns/op
useStringBuilderOnce            avgt    5   100,541 ±   5,908  ns/op
useStringBufferOnce             avgt    5   105,161 ±   3,027  ns/op


As you can see, for a single invocation, the execution time for both string concatenation and calling the append() methods is almost identical to the baseline. That means that most execution time is used for allocation of the builder/buffer that is used for the concatenation.

"But Stephan, isn't it unfair to create a builder/buffer object in all benchmarks, except in useConcatenationOnce()?"

No. String concatenation requires execution time to create the resulting String object. The comparison is fair because I didn't call toString() on the builder/buffer used in the other benchmarks, something that you likely would have to do in an actual application.

Conclusion: For concatenating two strings once, performance-wise IT REALLY DOESN'T MATTER. So just use + for clarity.

Let's take a look at repeated concatenation:


Benchmark                  Mode  Cnt     Score     Error  Units
estabishBaseline           avgt    5     0,193 ±   0,069  ns/op
useConcatenationManyTimes  avgt    5  2947,531 ± 111,243  ns/op
useStringBuilderManyTimes  avgt    5    12,964 ±   1,157  ns/op
useStringBufferManyTimes   avgt    5    27,201 ±   5,807  ns/op


Here we can see that using string concatenation inside a loop is significantly less efficient than appending to a builder/buffer. We also see that using StringBuffer is roughly twice as expensive as using StringBuilder, which is expected because access to StringBuffer is synchronized.
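
One way to see why the loop hurts so much is to count how many characters get copied. The sketch below is a simplified model and not a measurement: it assumes that each concatenation copies the whole new string into a fresh String, and that a builder copies only the appended characters (ignoring the occasional growth of its internal array). The class and method names are hypothetical.

```java
class CopyCostSketch {
    // Appending one character at a time with +=: step k builds a string of
    // length k, so under this model k characters are copied at step k.
    static long copiedByConcatenation(int appends) {
        long copied = 0;
        for (int length = 1; length <= appends; length++) {
            copied += length;
        }
        return copied;
    }

    // Appending to a builder: only the new character is copied each time
    // (array growth is ignored in this model).
    static long copiedByBuilder(int appends) {
        return appends;
    }

    public static void main(String[] args) {
        System.out.println(copiedByConcatenation(10_000)); // 50005000
        System.out.println(copiedByBuilder(10_000));       // 10000
    }
}
```

Under this model the work for + grows with the square of the number of iterations, while the builder grows linearly, which matches the direction of the gap in the numbers above.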

HOWEVER, we also see that repeated string concatenation costs roughly 30 milliseconds for 10,000 concatenations. So unless your loop is executed in a hot path of your application, you probably won't even notice it if you use + instead of append().

Conclusion: USE A PROFILER.


Finally, here is the full output from jmh:
 
Stephan van Hulst
Saloon Keeper
Ah yes, for completeness, here is the Constants class I used:
 
Tan Quang
Ranch Hand

Stephan van Hulst wrote:Who cares? Stop worrying. Start profiling. You're wasting so much time worrying about stuff that might be inconsequential. Also, as you've already seen, improvements may be made to methods that you previously considered "too expensive", and rules you've made for yourself may no longer hold.


In theory, when replacing a simple string that contains no regex syntax, like the one in my example code (line 3), the options are:
  • StringUtils.replace(): Will probably provide the best performance. But it's not "friendly" when there are multiple locations to replace, as in this example code.
  • String.replace(): This is the best option, because whether it is used on Java 8 and below (where the internal source code is not yet improved and still uses replaceAll() + Pattern.LITERAL) or on newer versions (where the internal source code has been improved), it is still fast and performs better than String.replaceAll().
  • String.replaceAll(): This is the option to avoid. It is quite similar to String.replace() on Java 8 and below, but since it does not use Pattern.LITERAL, the performance will not be good (at least when compared to String.replace()).

Is this true?
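
Performance aside, the two methods also differ in meaning: String.replace() treats its first argument as literal text, while String.replaceAll() treats it as a regular expression. A small sketch of that difference follows; the class and method names are hypothetical, and Pattern.quote() is the standard way to make replaceAll() behave literally.

```java
import java.util.regex.Pattern;

class ReplaceSketch {
    // Literal replacement: "." means an actual dot.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with the pattern quoted, so it is literal again.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));     // a-b-c
        System.out.println(regex("a.b.c"));       // -----
        System.out.println(quotedRegex("a.b.c")); // a-b-c
    }
}
```

So for plain text replacement, replace() is the safer choice regardless of which one happens to be faster on a given Java version.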
     
    Tan Quang
    Ranch Hand

    Mike Simmons wrote:The example you gave is still far too short to give meaningful results.  I just tested it with JDK 8 and 11 using 100000000 repetitions, and StringBuilder was the clear winner on my machine - though as I repeated it more and more, the results got closer and closer.  I agree with everyone encouraging you to use jmh - your tests are close to meaningless, as it is.


    In the case of this code:

    If I remember correctly, a String is immutable: once created, it is stored in memory, and when the same string is needed again, the one already in memory is reused, or a new one is created if there is none.
    But in the case above, the strings are almost the same, differing only in where the replacement happens (and in the text at the end). Should I call content.toString() directly as in line 21, or assign it to a variable and then use that as in lines 19 and 20?
    Should I use StringBuilder (line 8) with StringBuilder.append() in this case, or use String directly and concatenate with the "plus" operator?
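
A small sketch of the reuse described above may help. It relies on two facts about Java: identical string literals share one pooled instance, and strings created at runtime (for example with new String(...) or by concatenating variables) are separate objects unless intern() is called. The class and method names are hypothetical.

```java
class InternSketch {
    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // A String created with new is a different object with equal content.
    static boolean newStringIsSeparateObject() {
        String a = "hello";
        String c = new String("hello");
        return a != c && a.equals(c);
    }

    // intern() returns the pooled instance for that content.
    static boolean internReturnsPooledInstance() {
        String a = "hello";
        String c = new String("hello");
        return a == c.intern();
    }

    // Immutability: "modifying" a String returns a new one, the original stays.
    static String originalAfterReplace() {
        String a = "hello";
        String d = a.replace('h', 'j'); // d is "jello"
        return a;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());       // true
        System.out.println(newStringIsSeparateObject());   // true
        System.out.println(internReturnsPooledInstance()); // true
        System.out.println(originalAfterReplace());        // hello
    }
}
```

In other words, the automatic reuse applies to literals and compile-time constants, not to every string that happens to have the same content.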
    Paul Clapham
    Marshal
    Part of the performance equation is to not write complicated code when simple code would be equivalent. Doing that makes it hard to see what needs to be optimized and what doesn't.

    The purpose of the code you posted is simply to concatenate two Strings, although it's hard to realize that. So there is no reason to consider anything except simple String concatenation. Here's my simplified version of what you posted:

     
    Mike Simmons
    Rancher
    One nice thing about using replaceAll() is that it can give you the opportunity to replace everything you want in a single pass:

    (Here I pretended there was some difference between the different TIP_ITEM_NOTICE_BY_[ x ]_FLASH_END constants.  As was probably intended - though Paul correctly observes that they're all the same in the code shown.)

    The point is, there's just one replaceAll(), not three consecutive ones.  Is that faster?  Maybe, maybe not - the overhead of using a Pattern may still be greater.  But the more variables you want to replace, the more it makes sense to do them all in one pass.  Is it easier to read?  That may depend on taste.  Again, I think it scales nicely as you add more variables to replace (if there are any).

    Admittedly, this is the replaceAll() in Matcher, not one in String.  It works similarly, but it's designed to work with a MatchResult on each replacement.
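
The general shape of such a single-pass replacement could look like the sketch below. It assumes Java 9 or newer, where Matcher.replaceAll() accepts a function of the MatchResult. The {NAME} placeholder syntax, the map of values, and the class name are all made up for illustration and are not the constants from the code discussed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassReplaceSketch {
    // Hypothetical placeholder syntax: {NAME}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = values.get(match.group(1));
            // Unknown placeholders are left untouched. quoteReplacement keeps
            // any '$' or '\' in the result from being treated specially.
            return Matcher.quoteReplacement(value != null ? value : match.group());
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("PLAYER", "Tan");
        values.put("ITEM", "Sword");
        System.out.println(fill("{PLAYER} found a {ITEM}!", values)); // Tan found a Sword!
    }
}
```

Adding another variable then means adding a map entry rather than another replace call.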
     
    Tim Holloway
    Saloon Keeper

    Paul Clapham wrote:Part of the performance equation is to not write complicated code when simple code would be equivalent. Doing that makes it hard to see what needs to be optimized and what doesn't.



    Also, the compiler understands simple common cases. If you code gnarly performance hacks, it may backfire because the compiler's optimizers may not be able to tune them. Recall what I said is possible with the "100 star" example.

    But in the end, don't optimize unless you have to!!! Hardware time is cheap these days. Developer time is not. We're no longer having to fit apps into 16K on a mainframe whose processor speed is slower than an Apple Watch. Management is not amused when you're tinkering for no obvious profit. Or, as I once told one clever fellow, "It's not like you can collect all those nanoseconds you saved and put them in a jar for a rainy day".

    If you DO get yelled at for poorly performing software, THEN profile it in as close to the errant environment as you can. DON'T rely on the wisdom and benchmarks we're reporting here, because a different environment may be involved and the previously-listed optimizations could actually make things worse. Optimize ONLY the parts that are actually hurting. And again, don't get too clever, or you'll end up in a fight with the compiler.

    And above all, keep in mind that optimization is best applied from the top. As someone who has spent years in high-performance, high-reliability environments I can say from experience that a wise algorithm selection can blow away clever statement coding tricks by orders of magnitude.

     
    Tan Quang
    Ranch Hand

    Paul Clapham wrote:Part of the performance equation is to not write complicated code when simple code would be equivalent. Doing that makes it hard to see what needs to be optimized and what doesn't.

    The purpose of the code you posted is simply to concatenate two Strings, although it's hard to realize that. So there is no reason to consider anything except simple String concatenation. Here's my simplified version of what you posted:


    Oh that's right, I hadn't thought of the string array before, because it was a bit difficult to read and a bit difficult to determine the index (in the case of my old code).  
     
    Tan Quang
    Ranch Hand

    Mike Simmons wrote:One nice thing about using replaceAll() is that it can give you the opportunity to replace everything you want in a single pass:

    (Here I pretended there was some difference between the different TIP_ITEM_NOTICE_BY_[ x ]_FLASH_END constants.  As was probably intended - though Paul correctly observes that they're all the same in the code shown.)

    The point is, there's just one replaceAll(), not three consecutive ones.  Is that faster?  Maybe, maybe not - the overhead of using a Pattern may still be greater.  But the more variables you want to replace, the more it makes sense to do them all in one pass.  Is it easier to read?  That may depend on taste.  Again, I think it scales nicely as you add more variables to replace (if there are any).

    Admittedly, this is the replaceAll() in Matcher, not one in String.  It works similarly, but it's designed to work with a MatchResult on each replacement.


    I feel Paul's way looks simpler and more concise. But anyway, thank you for your proposal.
    P.S.: The strings are not exactly the same; they differ slightly in the last sentence of the string.
     
    Tim Holloway
    Saloon Keeper
    Pattern matching is a fascinating thing.

    Typically, you have a source pattern that is compiled before use. You can do a one-shot compile-and-go and that's less coding, but since compiling is overhead, if you intend to do repeated matches, a one-time compile is better.
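
As a rough sketch of the two styles, using only the standard java.util.regex API: the one-shot form hands the regex string to String.replaceAll() on every call, while the other form compiles the Pattern once and reuses it. The class name, method names, and whitespace pattern are chosen for illustration only.

```java
import java.util.regex.Pattern;

class CompiledPatternSketch {
    // Compiled once when the class is loaded, then reused for every call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // One-shot compile-and-go: the regex string is compiled inside this call.
    static String collapseOneShot(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // Reuses the precompiled pattern; only a new Matcher is created per call.
    static String collapsePrecompiled(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "a  b \t c";
        System.out.println(collapseOneShot(input));     // a b c
        System.out.println(collapsePrecompiled(input)); // a b c
    }
}
```

Both give the same result; the precompiled version only starts to matter when the method is called many times.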

    The actual match operation is done by running the match pattern as instructions to the matcher which is a finite-state machine. In other words, a specialized bytecode interpreter dedicated to running the matches. It's quite efficient.

    I did a quick peek at some of the class sources. In OpenJDK7, the String replaceAll() method actually sets up a Matcher and returns its replaceAll() results. In OpenJDK8, the Matcher's replaceAll() allocates a StringBuffer and replaces into it. Other JREs may be doing things differently, so Your Mileage May Vary.

    Yes, that's what I said. A StringBuffer, not a StringBuilder. Presumably to ensure that the match is atomic.
     
    Paul Clapham
    Marshal

    Tan Quang wrote:Oh that's right, I hadn't thought of the string array before, because it was a bit difficult to read and a bit difficult to determine the index (in the case of my old code).  



    If you're going to have large amounts of text like that then it may be better to use a Properties file. This is essentially a Map<String, String> which is backed by a text file. When you need to change the text or add new text entries then you can just modify the text file, a much simpler process than adding more code to your program.
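
A minimal sketch of that idea with java.util.Properties follows. To keep it self-contained, the "file" is an in-memory string read through a StringReader; in a real program you would open the actual file instead. The keys and texts are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class MessagesSketch {
    // Parses text in .properties format (key=value lines) into a Properties object.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a hypothetical messages.properties file.
        String fileContents = "tip.flash.start=Watch out!\n"
                            + "tip.flash.end=All clear.\n";
        Properties messages = load(fileContents);
        System.out.println(messages.getProperty("tip.flash.start"));          // Watch out!
        System.out.println(messages.getProperty("missing.key", "(no text)")); // (no text)
    }
}
```

Changing or adding a message then only touches the text file, not the Java source.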
     
    Mike Simmons
    Rancher

    Tim Holloway wrote:I did a quick peek at some of the class sources. In OpenJDK7, the String replaceAll() method actually sets up a Matcher and returns its replaceAll() results. In OpenJDK8, the Matcher's replaceAll() allocates a StringBuffer and replaces into it. Other JREs may be doing things differently, so Your Mileage May Vary.

    Yes, that's what I said. A StringBuffer, not a StringBuilder. Presumably to ensure that the match is atomic.


    I don't think so - like most StringBuffers and StringBuilders, it's used as a local variable only, and so there's no way to access it from any other thread.  Synchronization is wasted there.  Only a minor waste, since the lock will never be contended, but still a waste.  I suspect it's just a case where the person who wrote the method was used to using StringBuffer and just did what they were used to.  It looks like by OpenJDK 11 someone corrected it to use StringBuilder.
     
    Tan Quang
    Ranch Hand

    Paul Clapham wrote:If you're going to have large amounts of text like that then it may be better to use a Properties file. This is essentially a Map<String, String> which is backed by a text file. When you need to change the text or add new text entries then you can just modify the text file, a much simpler process than adding more code to your program.


    Oh... I will think about switching to properties files once this gets too bulky, since there is a lot of code like this.
     
    Tan Quang
    Ranch Hand

    Tim Holloway wrote:Pattern matching is a fascinating thing.

    Typically, you have a source pattern that is compiled before use. You can do a one-shot compile-and-go and that's less coding, but since compiling is overhead, if you intend to do repeated matches, a one-time compile is better.

    The actual match operation is done by running the match pattern as instructions to the matcher which is a finite-state machine. In other words, a specialized bytecode interpreter dedicated to running the matches. It's quite efficient.

    I did a quick peek at some of the class sources. In OpenJDK7, the String replaceAll() method actually sets up a Matcher and returns its replaceAll() results. In OpenJDK8, the Matcher's replaceAll() allocates a StringBuffer and replaces into it. Other JREs may be doing things differently, so Your Mileage May Vary.

    Yes, that's what I said. A StringBuffer, not a StringBuilder. Presumably to ensure that the match is atomic.


    That's right: on Java 8 and below, both String.replace() and String.replaceAll() use replaceAll() internally. This was only improved in Java 9 and above.
     
    Tan Quang
    Ranch Hand

    Mike Simmons wrote:I don't think so - like most StringBuffers and StringBuilders, it's used as a local variable only, and so there's no way to access it from any other thread.  Synchronization is wasted there.  Only a minor waste, since the lock will never be contended, but still a waste.  I suspect it's just a case where the person who wrote the method was used to using StringBuffer and just did what they were used to.  It looks like by OpenJDK 11 someone corrected it to use StringBuilder.


    True, their use of StringBuffer there is unnecessary.
    As mentioned in the StackOverflow question I linked above, a report was submitted asking to change the source code of String.replace() because using both regex and replaceAll() was "unnecessary" and "expensive". The source code of String.replace() was later revised to be quite similar to that of StringUtils.replace(), but with StringBuffer instead of StringBuilder.
    Both the String.replace() and String.replaceAll() methods have changed a lot in Java 9 and above, which is unfortunate for those on Java 8.
     