
String Concatenation & Performance

 
victor kamat
Ranch Hand
Posts: 247
Say we want to concatenate all the strings in strArray, which is a String[200]; each string in strArray is at least 20 characters long. Until recently, I would have done it like this:

StringBuilder temp = new StringBuilder();  // starts with the default 16-char buffer

for (int i = 0; i < strArray.length; i++) {
    temp.append(strArray[i]);
}

I recently learned that a faster way is this:

int capacity = 20 * strArray.length;  // lower bound: every string has at least 20 chars
StringBuilder temp = new StringBuilder(capacity);
for (int i = 0; i < strArray.length; i++) {
    temp.append(strArray[i]);
}

Just thought I'd share that with you.
I'd appreciate comments.
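One caveat: because each string is only at least 20 characters long, 20 * strArray.length is a lower bound, and the builder will still grow if some strings are longer. A minimal sketch that sizes the builder exactly by summing the real lengths in a first pass (the names ExactPresize and concatAll, and the sample data, are made up for illustration):

```java
import java.util.Arrays;

class ExactPresize {

    /** Concatenates all strings, sizing the builder from their actual total length. */
    static String concatAll(String[] strArray) {
        int capacity = 0;
        for (String s : strArray) {
            capacity += s.length();   // first pass: exact final length
        }
        StringBuilder temp = new StringBuilder(capacity);
        for (String s : strArray) {
            temp.append(s);           // second pass: the buffer never has to grow
        }
        return temp.toString();
    }

    public static void main(String[] args) {
        String[] strArray = new String[200];
        Arrays.fill(strArray, "abcdefghijklmnopqrstuvwxyz"); // hypothetical data, 26 chars each
        System.out.println(concatAll(strArray).length());    // prints 5200
    }
}
```

The extra pass over the array is cheap compared with copying the character data, so it pays off whenever the strings vary in length.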
 
Peter Johnson
author
Bartender
Posts: 5856
Yes, if you know the final size of your string, it is always good to preallocate the buffer. If you do "new StringBuilder()", you start with a 16-char buffer. Once you overflow it, a new buffer roughly twice the size (or large enough to hold the current data, if that is bigger) is allocated and the contents are copied from the old buffer to the new one. Based on that algorithm, the first code would reallocate and copy the buffer about 8 times for the 4000-plus characters in strArray. That is definitely much slower than allocating the buffer only once.
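To check the "about 8 times" figure, here is a small sketch that simulates the growth rather than timing it. It assumes that the rule documented for StringBuilder.ensureCapacity() (new capacity = twice the old capacity plus 2, or the required length if that is larger) also applies when append() grows the buffer, and that every string is exactly 20 characters; GrowthSimulation and countReallocations are hypothetical names.

```java
// Sketch: simulates StringBuilder buffer growth instead of measuring it.
// Assumed growth rule: newCapacity = max(2 * oldCapacity + 2, requiredLength).
class GrowthSimulation {

    /** Counts how often the buffer would be reallocated while appending
     *  `pieces` strings of exactly `pieceLength` chars each. */
    static int countReallocations(int initialCapacity, int pieces, int pieceLength) {
        int capacity = initialCapacity;
        int length = 0;
        int reallocations = 0;
        for (int i = 0; i < pieces; i++) {
            int required = length + pieceLength;
            if (required > capacity) {
                // A bigger buffer is allocated and the old contents copied into it.
                capacity = Math.max(2 * capacity + 2, required);
                reallocations++;
            }
            length = required;
        }
        return reallocations;
    }

    public static void main(String[] args) {
        int pieces = 200, pieceLength = 20;  // the strArray from the first post
        System.out.println("default (16): "
                + countReallocations(16, pieces, pieceLength));
        System.out.println("presized (4000): "
                + countReallocations(pieces * pieceLength, pieces, pieceLength));
    }
}
```

Under those assumptions the default builder grows 16 → 34 → 70 → 142 → 286 → 574 → 1150 → 2302 → 4606, which is 8 reallocations, while the presized builder needs none.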
 
Pat Farrell
Rancher
Posts: 4678
This is all true, but I can't believe it's important. Is the performance of this code really a bottleneck?

Or is this just about good practice: when you know the length, specify it?

Actually, if you know it might grow to 200 characters, simply specifying 100 characters to start will give you nearly all the performance improvement.
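Pat's point can be checked on a real StringBuilder by watching capacity() while appending one character at a time up to 200 characters. This is a rough sketch (CapacityProbe and countCapacityChanges are hypothetical names); because the growth policy is a JDK implementation detail, the exact counts may differ between Java versions. What matters is how many reallocations a 100-char start saves compared with the default 16.

```java
class CapacityProbe {

    /** Appends one char at a time until the builder holds targetLength chars
     *  and counts how often capacity() changed, i.e. how often the buffer grew. */
    static int countCapacityChanges(int initialCapacity, int targetLength) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int changes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < targetLength; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {  // buffer was reallocated
                changes++;
                lastCapacity = sb.capacity();
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        for (int initial : new int[] {16, 100, 200}) {
            System.out.println("initial capacity " + initial + ": "
                    + countCapacityChanges(initial, 200) + " reallocation(s)");
        }
    }
}
```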
 
Peter Johnson
author
Bartender
Posts: 5856
Yes, this is mostly about best practice. With 4000 characters (8000 bytes, since a Java char is 2 bytes), the performance impact is probably small. I would not even worry about it if it happened only once. If it happens hundreds of times per minute (for example, when building HTTP responses), then I would be worried.

But there is a more significant issue than the amount of garbage being collected and the amount of memory-to-memory copying being done: CPU cache invalidation. If the string is large enough (say, one million characters), then once the buffer is reallocated and the data copied, the CPU cache could be occupied only by the two copies of the string, so you have to spend time reloading your working data into the cache. This kind of cache thrashing can have a significant performance impact on applications.
[ June 28, 2008: Message edited by: Peter Johnson ]
 