# Java String hashCode() base 31 computation

Ranch Hand
Posts: 74
Hi,

For the Java String's hashCode() implementation:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
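As a quick worked check of the formula, take "abc" (n = 3, with char values 'a' = 97, 'b' = 98, 'c' = 99). For longer strings the same sum overflows and is kept as a wrapped-around 32-bit int.

$$97 \cdot 31^2 + 98 \cdot 31 + 99 = 93217 + 3038 + 99 = 96354$$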

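(JDK source code):
```java
int h = 0;
int off = offset;    // index of this string's first char in the backing array
char val[] = value;  // backing char array
int len = count;     // number of chars in this string
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];  // multiply the running hash by 31, then add the next char
}
```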
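The loop is Horner's rule applied to the formula: for three chars, 31*(31*s[0] + s[1]) + s[2] expands to s[0]*31^2 + s[1]*31 + s[2]. Below is a minimal, self-contained sketch (the class and method names are hypothetical) that evaluates the formula both directly and with the loop, and compares both with String.hashCode(). It indexes the String directly instead of using the offset/count fields, which are internal details of older JDKs.

```java
public class StringHashDemo {

    // Horner's-rule form, mirroring the JDK loop quoted above.
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Direct form: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    // int arithmetic wraps modulo 2^32, so the result equals hornerHash.
    static int polynomialHash(String s) {
        int n = s.length();
        int h = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31;  // compute 31^(n-1-i)
            }
            h += s.charAt(i) * power;
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "abc", "hello, world"}) {
            System.out.println("\"" + s + "\": " + hornerHash(s) + " "
                    + polynomialHash(s) + " " + s.hashCode());
        }
    }
}
```

For "abc" all three values should be 96354; for the empty string they are all 0.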

I am curious to find out the reason for using base 31.

From a performance perspective, choosing a power of two such as 32 would make the O(n) multiplications faster (each one becomes a left bit shift).
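For comparison, multiplying by 31 is only one step more than a shift: 31*h = 32*h - h = (h << 5) - h, and the identity holds exactly in wrapping 32-bit int arithmetic. Whether a particular compiler or JIT actually makes that substitution is an assumption, not something established here. A minimal check (class and method names are hypothetical):

```java
public class ShiftDemo {
    // 31*h expressed as a shift and a subtraction.
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            // Both columns are identical, including when the product overflows.
            System.out.println(h + ": " + (31 * h) + " " + times31(h));
        }
    }
}
```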

I am guessing 31 has been proven to give the best hashcode distribution on random string values?

Thanks
Victor

Rancher
Posts: 13459
There's a great link I saw recently, but I'm still looking for it.

Essentially, the prime value is used to reduce collisions. The Java code uses 31, so for your own calculations you should use a different prime, such as 37.
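A minimal sketch of that advice applied to a user-defined class. The Point class, its fields, and the starting value 17 are hypothetical; only the multiplier 37 comes from the post above. Each field is folded in with the same h = 37*h + field step that the String loop uses with 31.

```java
import java.util.Objects;

public final class Point {
    private final int x;
    private final int y;
    private final String label;  // may be null

    public Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y && Objects.equals(label, p.label);
    }

    @Override
    public int hashCode() {
        int h = 17;  // arbitrary non-zero starting value (an assumption)
        h = 37 * h + x;
        h = 37 * h + y;
        h = 37 * h + (label == null ? 0 : label.hashCode());
        return h;
    }

    public static void main(String[] args) {
        System.out.println(new Point(1, 2, "a").hashCode());  // prints 862641
    }
}
```

Equal points always produce equal hash codes, as the equals/hashCode contract requires.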

Wanderer
Posts: 18671
Multiplying by a power of two would have the effect of left-shifting the hash value, losing bits of information off the left end and replacing them with zeros on the right. If you multiplied by 32, each multiplication would shift 5 bits of information out of the 32-bit int, so any letter 7 or more positions from the end would be shifted left by at least 35 bits and drop out entirely; for a long string, only the last 7 letters would end up affecting the hash at all. That could be very bad if the strings share a common ending. Multiplying by an odd number instead lets the final hash reflect contributions from all parts of the string.
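A small experiment in the spirit of that argument (the class name and the two sample strings are hypothetical): hash two strings that differ only in their prefix but share the 7-character ending "_suffix", once with multiplier 31 and once with 32.

```java
public class Base32Collision {
    // Polynomial string hash with a configurable multiplier,
    // using the same loop shape as String.hashCode().
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "apple_suffix";
        String b = "grape_suffix";
        System.out.println("x31: " + hash(a, 31) + " vs " + hash(b, 31));
        System.out.println("x32: " + hash(a, 32) + " vs " + hash(b, 32));
    }
}
```

With 32 the two hashes come out identical, because every character 7 or more positions from the end is multiplied by 32^7 = 2^35, which is 0 modulo 2^32. With 31 they differ, and the 31 version matches String.hashCode().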

I'm not sure what's special about 31, exactly, other than it being prime. Whatever the rationale, it's locked in now, since they (foolishly?) specified the exact hash formula for String as part of the API. There was no need to do this, and now it would prevent them from using a better formula if someone came up with one. Of course, they could just change the API too. But Sun has always been very reluctant to do this sort of thing, since in theory someone might have written code which depends on String behaving exactly as its API indicates. So they won't change the API without a very compelling reason.

Jim Yingst
Wanderer
Posts: 18671
gives output

But if we replace 31 with 32, we get:

