Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

hashing and prime numbers ?  RSS feed

 
Kalpesh Soni
Ranch Hand
Posts: 312
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
why do people use prime numbers while trying to hash ?
e.g. java.util.HashMap is an array of linked list
how would it help if the array size is a prime number ?
 
David Weitzman
Ranch Hand
Posts: 1365
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Supposed you wanted to create a hashtable and start out by filling it with all the even numbers from 0 to 600. (Perhaps an example that seems more immediately realistic would be creating a HashSet containing all numbers from 0-600 with the intention of using the Sieve of Eratosthenes to find primes or something silly like that)
So you somehow create a HashMap that uses an array of size 100 (since you're entering 300 items you expect to average 3 collisions per slot in the hashtable, which doesn't seem too bad).
Then you go about adding the integers...

Note that the JavaDoc for Integer.hashCode() specifies that the value of the hash code will be equal to the value of the integer. This was not a great thing to specify in the public API, but it's unfortunately too late to go back and change it now.
Since each integer N is inserted at location N % 100 and all values of N will be even, it turns out that none of the odd slots in the hashtable will ever be used! Half the space we've allocated is wasted, and the 300 elements are squeezed into 50 slots with 6 elements per slot. That's not cool.
The problem is that there's a pattern in the items we're inserting (they're all even) and that manifests itself as a pattern in the hash codes (they're all even too). It would be possible to write a hashCode algorithm for Integer that did a better job of removing patterns, but the fact of life is that a lot of hash algorithms aren't very good and there isn't much you can do about it.
Choosing an arbitrary prime as the size of the hashtable array is a good way to eliminate most simple patterns that are likely to show up in the hashCode() values. You can still generate collisions easily enough, which makes traditional hashing methods prone to DoS attacks, but collisions are less likely to occur in normal usage.
[ June 11, 2003: Message edited by: David Weitzman ]
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!