• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • paul wheaton
  • Ron McLeod
  • Devaka Cooray
Sheriffs:
  • Jeanne Boyarsky
  • Liutauras Vilda
  • Paul Clapham
Saloon Keepers:
  • Tim Holloway
  • Carey Brown
  • Piet Souris
Bartenders:

Why does Java’s hashCode() in String use 31 as a multiplier?

 
Ranch Hand
Posts: 296
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

I had gone through the API of String and i found that in hashcode method they are multiplying each string value by 31 please tell me why they are preferring that number
 
Ranch Hand
Posts: 1183
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have no clue why exactly this number. But i found the algorithm here:
http://www.neowin.net/forum/lofiversion/index.php/t352012.html
 
santhosh.R gowda
Ranch Hand
Posts: 296
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks
 
Java Cowboy
Posts: 16084
88
Android Scala IntelliJ IDE Spring Java
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The number 31 was chosen because it is a prime number. (They could have chosen any other prime number).

To understand why a prime number was chosen, you have to know how hash-based collections such as HashSet and HashMap work.

Hash-based collections store elements in so-called buckets. A hash-based collection has a fixed number of buckets. When you put an object in the hash-based collection, then it uses the hash code of the object to determine in which bucket to put the object. In fact, I guess it uses the formula hashCode() % numberOfBuckets to compute which bucket to put the object in. (% is the modulo-operator).

When you want to find an object in a hash-based collection, this is what happens:

  • Compute the hash code of the object to find.
  • This tells the collection in which bucket to search.
  • Look at all objects in the bucket and compare each of them with the object we're looking for, by calling equals() on the method in the bucket and the object we're looking for.

  • This algorithm is very efficient for finding an object in a collection - using the hash code, it knows immediately in which bucket to look, and it only needs to compare the object you're looking for with a few other objects, those in that particular bucket. Even if the collection contains thousands of objects, in general only a few comparisons are necessary to find a particular object.

    There is one weak point: it only works well if elements in the collection are well-distributed among the buckets. Suppose you made a class with a hashCode() method that returns a fixed value. That's a correct implementation of hashCode(), but it defeats the efficiency of hash-based collections - all objects would be stored in the same bucket (because their hash codes are all the same), and finding an object in the collection would then mean that it would have to be compared to all objects in that single bucket.

    So, to make hash-based collections work optimally, the objects you put in the collection should have a hashCode() method that causes the objects to be distributed among the buckets as evenly as possible.

    HashSet and HashMap in Java have a number of buckets that is normally a power of 2 (by default, a HashMap has 16 buckets, for example).

    By making the hash code of objects a prime number (or a multiple of a prime number) the objects will be distributed better over the buckets, because a prime number is not a multiple of the number of buckets (which is a power of 2), so hashCode() % numberOfBuckets won't quickly cycle back to the same number (which means the same bucket).
     
    Marshal
    Posts: 80645
    473
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    Jesper Young wrote: . . . I guess it uses the formula hashCode() % numberOfBuckets to compute which bucket to put the object in. (% is the modulo-operator).

    . . .

    HashSet and HashMap in Java have a number of buckets that is normally a power of 2 (by default, a HashMap has 16 buckets, for example). . . .

    Sort of, yes. If you have a number of buckets which is an exact power of two (1, 2, 4, 8, 16, 32, 64, 128, etc) there is a cheat's method of calculating the remainder/modulo value.

    buckets[hash & capacity - 1] = . . .; // [- operator has higher precedence than &.]

    It uses the bitwise and operator as a shortcut for using the % operator because the bitwise operations have faster performance than division. That will give the same result as % for positive numbers, but the results run backwards and are always positive for negative numbers (-0x1 & 0xf is 0xf; -0x1 % 0x10 is -1; -0xf & 0xf = 1; -0xf % 0x10 is -0xf). That trick only works when you are using & 1, & 3, & 7, & 0xf, & 0x1f, & 0x3f etc, but it always returns a positive answer in exactly the right range for the array access indices. It depends on the number of buckets being always an exact power of two, which HashMap always does.
     
    Campbell Ritchie
    Marshal
    Posts: 80645
    473
    • Mark post as helpful
    • send pies
      Number of slices to send:
      Optional 'thank-you' note:
    • Quote
    • Report post to moderator

    Sebastian Janisch wrote:I have no clue why exactly this number. But i found the algorithm here:
    http://www.neowin.net/forum/lofiversion/index.php/t352012.html

    A poor post in that link. Search for "Bruce Eckel Thinking in Java" and you will find a free copy of the 3rd edition available to download. Look for the chapter called "containers in depth" or similar or look for "hash" in the index. Finding Joshua Bloch's book will explain why 31 is used; Eckel uses a different prime number.
     
    reply
      Bookmark Topic Watch Topic
    • New Topic