Win a copy of Practical SVG this week in the HTML/CSS/JavaScript forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Best way to compare strings

 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi, here is one interesting question...


Which one is better/faster and why?
 
Rob Spoor
Sheriff
Posts: 20822
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
matches will probably be slower since it uses a java.util.regex.Pattern and java.util.regex.Matcher in the background. Both equals and compareTo use a simple loop, and should therefore be faster.

That said, don't optimize prematurely, and go for what's natural. For checking if two strings are equal I would always use equals, never compareTo. Even if that one is faster (which may or may not be the case), the difference will be milliseconds at worst. I then prefer readability over micro-optimization.
 
Henry Wong
author
Sheriff
Posts: 22542
109
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Taking a pure guess, I would say that the regex is the slowest, because it has to compile the regex -- which is probably slower, by itself, than the comparison. As for the other two, I recommend writing testing code that measures it.


Wow... Rob beat me by six seconds !!

Henry
 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Is anybody interested to write a test code to measure comparison speed?
 
Jan Cumps
Bartender
Posts: 2615
14
C++ Linux Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Andraz Poje wrote:Is anybody interested to write a test code to measure comparison speed?
You?
 
Rob Spoor
Sheriff
Posts: 20822
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I second that motion. For timing comparisons you should run each test for at least several thousand iterations. If possible you should get rid of JIT influences; I thinkHenry posted how to do that a few weeks ago.
 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Jan Cumps wrote:
Andraz Poje wrote:Is anybody interested to write a test code to measure comparison speed?
You?

Why not ;)

Result is:
1
0
0
How do I get more precise result?
 
Henry Wong
author
Sheriff
Posts: 22542
109
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rob Prime wrote: I think Henry posted how to do that a few weeks ago.


I probably did, but I don't mind doing it again...

1. Get rid of everything that is not relevant. Don't do the "if" or the "System.out", as it is the same in all three cases, so why bother measuring it?

2. Be careful with short strings, as it is likely that you are measuring the setup more than the actual comparison. Also, don't make it obviously not equal, as there are probably short circuit code with a few of the options. Maybe measure two strings, where they start off the same, but are different later.

3. Do it in a loop. Maybe do the operation a million times. And measure the million times as a single number.

4. Make sure you have tons of memory, and do a gc() prior to taking the start time, just in case (to avoid taking a gc hit during the measurement).

5. Run it many many times -- meaning do the three tests in a loop over and over. This way, you can ignore the first few runs, in order to discount the JIT. This will also allow you to discount outliers caused by the GC too.

Henry

 
Jan Cumps
Bartender
Posts: 2615
14
C++ Linux Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Measurement Is Everything if you want to dig further.

 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Henry Wong wrote:
Rob Prime wrote: I think Henry posted how to do that a few weeks ago.


I probably did, but I don't mind doing it again...

1. Get rid of everything that is not relevant. Don't do the "if" or the "System.out", as it is the same in all three cases, so why bother measuring it?

2. Be careful with short strings, as it is likely that you are measuring the setup more than the actual comparison. Also, don't make it obviously not equal, as there are probably short circuit code with a few of the options. Maybe measure two strings, where they start off the same, but are different later.

3. Do it in a loop. Maybe do the operation a million times. And measure the million times as a single number.

4. Make sure you have tons of memory, and do a gc() prior to taking the start time, just in case (to avoid taking a gc hit during the measurement).

5. Run it many many times -- meaning do the three tests in a loop over and over. This way, you can ignore the first few runs, in order to discount the JIT. This will also allow you to discount outliers caused by the GC too.

Henry



Thanks....



Result:
279 9 13
No doubt, equals is the fastest.....Right?
 
Paul Clapham
Sheriff
Posts: 21892
36
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Right? Nope. Now you should do the tests in a different order, in case the JVM decided to do some optimizations partway through your test.
 
Henry Wong
author
Sheriff
Posts: 22542
109
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Also, I recommend taking the start and end time out of the loop. There are always some inaccuracies when taking a clock sample, and doing it in the loop means that the inaccuracy is compounded a million times.

This, of course, means that you will now need three different loops -- instead of one big one.

Henry
 
Henry Wong
author
Sheriff
Posts: 22542
109
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

And ... you didn't do my recommendations number 4 and 5... which tries to discount the JIT and GC.

Henry
 
Rob Spoor
Sheriff
Posts: 20822
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Timing measurements in Java is usually done like this (in pseudo-code):
Add to that Henry's additional points and you should get some good results.
 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Henry Wong wrote:
And ... you didn't do my recommendations number 4 and 5... which tries to discount the JIT and GC.

Henry

What are JIT and GC?
 
Rob Spoor
Sheriff
Posts: 20822
68
Chrome Eclipse IDE Java Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
JIT
GC
 
Campbell Ritchie
Marshal
Posts: 52663
121
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
JIT means "just-in-time" and GC means "garbage collector".

At which point I think I ought to move this thread to our performance forum.
 
Andraz Poje
Ranch Hand
Posts: 32
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Henry Wong wrote:
And ... you didn't do my recommendations number 4 and 5... which tries to discount the JIT and GC.
Henry


Is this now proper way to test code?
Thanks for help, I learned quite a bit from this (I replaced all .matches with .equals)


Result: 163 7 10
I run this code few times. "Equls" is always the fastest comparison.
 
Mike Thon
Greenhorn
Posts: 6
Linux Mac
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Why not use == for comparing strings? I used the following code to test == versus .equals()



for .equals()
$ time java Cas
1726

real 0m2.005s
user 0m1.981s
sys 0m0.053s

for ==

$ time java Cas
3

real 0m0.413s
user 0m0.288s
sys 0m0.060s

I ran each test a couple of times. From this it would seem that == is several fold faster than .equals(). I learned somewhere that strings should always be compared with .equals() in java. I suspect the difference here is in the way that the compiler can optimize the loop, so this might not represent the performance you would find in real world code.
 
Jan Cumps
Bartender
Posts: 2615
14
C++ Linux Netbeans IDE
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to JavaRanch, Mike.

Comparing String objects with == might not do what you expect here. It checks if the left side of the == and the right side of the == are the same object. Not that they hold the same string value.

Here is an overview of the valid comparision features of the Java language.

Regards, Jan
 
Istvan Kovacs
Ranch Hand
Posts: 100
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mike Thon wrote:Why not use == for comparing strings?


As Jan has pointed out, you're comparing object references, not string values that way.
It may work in some cases, but is not reliable.

Why it may work:
- the compiler/VM reuse the same String object to represent the same value when it's used several times.
- the compiler actually performs String concatenation for expressions where it can figure out the value
- Strings may be 'interned' (there is a 'canonical' representation for each String value that you can use to save memory - or leak memory, if you are not careful, see http://www.javamex.com/tutorials/memory/string_saving_memory.shtml)

Check out the following, and play around until you understand what's going on (or until you get utterly confused, in which case ask ).
For fun, try to guess the output.


 
Istvan Kovacs
Ranch Hand
Posts: 100
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Andraz Poje wrote:

How do I get more precise result?


Besides running many times and making sure you get the JIT compiler to actually compile your code before benchmarking, you may want to check out System.nanoTime().
 
R van Vliet
Ranch Hand
Posts: 144
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Istvan Kovacs wrote:
Andraz Poje wrote:

How do I get more precise result?


Besides running many times and making sure you get the JIT compiler to actually compile your code before benchmarking, you may want to check out System.nanoTime().


Are there any VM implementations where System.nanoTime() isn't System.currentMillis() * 1000? I have yet to run into any. Also, the granularity of System.currentMillis() is so ridiculously unpredictable and high that you need a LOT of iterations of naturally quick operations to get accurate timing information. You can avoid JIT based inaccuracies by running your complete test a couple of times within a single execution cycle of your program so you're basically doing :

equals test
regex test
compare test
equals test
regex test
compare test
equals test
regex test
compare test
etc.

Dismiss the first X as potential runs that were influenced by a lack of JIT compilation and off you go. Now...

All that said, this seems a rather theoretical excersize because I sincerely doubt there are real world programs where switching from compareTo() to equals() provides a significant performance improvement. Add to that that equals() is the most obvious option anyway it should rarely be a problem.

By the way, the performance differences can be explained quite easily :

1) match uses regex which need to be compiled and pattern matched, as explained above
2) equals generally performs better than compare because it has an early-out before it enters its inner loop that's almost always used, namely s1.length() != s2.length(). A compare needs to do the s1.charAt(i) - s2.charAt(i) for the first different character regardless of string length. You will find different results if you always compare equal length strings. Or more specifically, you will see that compareTo and equals performance will be similar.
 
Istvan Kovacs
Ranch Hand
Posts: 100
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
R van Vliet wrote:Are there any VM implementations where System.nanoTime() isn't System.currentMillis() * 1000? I have yet to run into any.



nanos: 28018282127472, millis1000: 1273673404820000

> java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

Running on 32-bit Windows XP.

Also tried on Linux (32-bit, uname -a: 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:56 UTC 2010 i686 GNU/Linux), nanos was not a multiple of some power of 10.

BTW, the multiplier between milli and nano is 1,000,000, not 1,000
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!