• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

hashCode of class String

 
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello!

I hava a question about the implementation of the hashcode algorithm in class String.

As we know, the implementation is: hashCode = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Just to play around a little bit, I tried to implement this algorithm by myself:



But: This implementation provides correct values just for short Strings (same results as s.hashCode() ), whereas the following provides always correct results:



Where is the difference between these implementations? I think in theory both must provide same results. I´m sure I overlook something...

Wolfgang
 
Java Cowboy
Posts: 16084
88
Android Scala IntelliJ IDE Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It's just a guess, but it could be this: Math.pow() works on floating-point numbers of type double. The data type double has limited precision. You might get rounding errors which will give you a different result than when you'd use integer computations.
 
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You don’t calculate powers of 31 like that at all. You do something like this:-Of course in an immutable class you would probably cache a hash (what a dreadful rhyme) rather than calculating it every time it is used.
 
Campbell Ritchie
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Your use of left shift and - may be what the * 31 bit is optimised to.
 
Wolfgang Troescher
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jesper de Jong wrote:It's just a guess, but it could be this: Math.pow() works on floating-point numbers of type double. The data type double has limited precision. You might get rounding errors which will give you a different result than when you'd use integer computations.



I tried it with BigDecimal instead of int/floats, but the same results...

For example: with test-String "abcde" my method provides the same results like hashCode()-Method (92599395), but if I add a character more, let´s say an 'f' (abcdef), I get different results (my method: 2147483647, hashCode(): -1424385949).
 
Campbell Ritchie
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
What happens if you keep going with Math.pow is that you will eventually exceed the maximum value of an int. You have probably missed out a cast to (int) somewhere. Casting such a double will produce 2147483647, so you will be repeatedly multiplying by that number and adding. That is almost certainly not how the hashCode method is implemented. It is probably implemented as I showed it, and you showed a similar algorithm with a premature optimisation combining << and -.
 
Master Rancher
Posts: 4806
72
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Your use of left shift and - may be what the * 31 bit is optimised to.


Yes, definitely. (x << 5) is the same as x * 2^5 which is x * 32. Therefore ((x << 5) - x) is the same as (32 * x - x), which is 31 * x.

Campbell Ritchie wrote:and you showed a similar algorithm with a premature optimisation combining << and -.


Why assume it's premature? Could be it was well tested back when the optimization was made, and made sense for the hardware and software it was run on. At least, Josh Bloch thought so. This is the original algorithm used in String.java. It's not necessary in the modern JDK code, because as Bloch notes in Effective Java, "modern VMs do this sort of optimization automatically." But calling it premature seems uncharitable, considering Wolfgang is probably just reading his copy of Effective Java where this very code is discussed.
 
Campbell Ritchie
Marshal
Posts: 79177
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I thought Bloch says it is probably optimised to h << 5 - h. But you cannot be sure that is how it is optimised. Anyway, as Brian Goetz reminds us, “write dumb code”. If there is such an optimisation for * 31, it will be used anyway. If, however, somebody ever invents a better optimisation, then using h << 5 - h will miss that optimisation. So trying to micro‑optimise in code like that is unnecessary in my opinion.

Sorry if I was uncharitable or rude or anything.
 
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If I recall correctly, and if the state of the art hasn't changed in this respect since I learned it, the shift and add can be significantly more performant than the equivalent multiplication. I think this is due to pipeline stall introduced by multiplication. A smart compiler (as in hotspot, not javac) would probably recognize multiplications by 2^N +/- 1 and convert to the equivalent shift operation if it's appropriate to do so on the underlying hardware. So you're probably safe with X * 31, but in a case like hashCode(), where it could be called often, can be doing lots of multiplications, and specifically exists to improve performance, I don't think it's necessarily premature to explicitly write out X << 5 - X.
 
Mike Simmons
Master Rancher
Posts: 4806
72
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:I thought Bloch says it is probably optimised to h << 5 - h. But you cannot be sure that is how it is optimised. Anyway, as Brian Goetz reminds us, “write dumb code”. If there is such an optimisation for * 31, it will be used anyway. If, however, somebody ever invents a better optimisation, then using h << 5 - h will miss that optimisation. So trying to micro‑optimise in code like that is unnecessary in my opinion.


I don't see a "probably" in Bloch's entry on that topic, but I agree we don't know for sure. I think Bloch was quite confident it would be done using the then-current JDK from Sun, but he couldn't know what changes might be made in the future. Then too, the JIT may not optimize it at all until the code in question has been run a few times - that's the nature of just-in-time compilation. So yes, I agree with you on all points. Except to point out that the poster you're responding to probably wasn't optimizing the code at all - he was analyzing code that someone else had written.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic