# Java String hashCode() base 31 computation

Ranch Hand
Posts: 74
Hi,

The Java String hashCode() implementation computes:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

(JDK source code):
int h = 0;
int off = offset;
char val[] = value;
int len = count;
for (int i = 0; i < len; i++) {
    h = 31*h + val[off++];
}
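To make the link between the loop above and the polynomial formula concrete, here is a minimal sketch. The class and method names (`HornerHash`, `hornerHash`, `polynomialHash`) are made up for illustration and are not part of the JDK. It evaluates the hash both ways and compares each result with `String.hashCode()`; all arithmetic wraps around in a 32-bit `int`, which is why the two forms agree even when the powers of 31 overflow.

```java
public class HornerHash {
    // Same recurrence as the JDK loop, written against a plain String (illustrative helper).
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Direct evaluation of s[0]*31^(n-1) + ... + s[n-1]; int overflow wraps silently.
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31;
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        String s = "hello";
        // All three lines print the same value.
        System.out.println(hornerHash(s));
        System.out.println(polynomialHash(s));
        System.out.println(s.hashCode());
    }
}
```

The loop form is Horner's rule: it needs one multiplication per character instead of recomputing each power of 31.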

I am curious to find out the reason for using base 31.

From a performance perspective, choosing a power of two such as 32 would make the O(n) multiplications faster, since each one becomes a left bit shift.
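As a side note on the shift idea, the sketch below checks two arithmetic identities on `int` values: multiplying by 32 is the same as shifting left by 5, and multiplying by 31 is the same as a shift followed by a subtraction. The class and method names are hypothetical, and whether a given compiler or JIT actually applies this rewrite is an assumption that this code does not test.

```java
public class ShiftMultiply {
    // 32 * h expressed as a shift (illustrative helper).
    static int times32(int h) {
        return h << 5;
    }

    // 31 * h expressed as a shift and a subtraction: 32*h - h.
    static int times31(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 97, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            // Both comparisons hold for every int, including values that overflow.
            System.out.println(h + ": " + (times32(h) == 32 * h) + " " + (times31(h) == 31 * h));
        }
    }
}
```

So the multiplier 31 does not necessarily rule out a shift-based computation; it costs one extra subtraction.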

I am guessing 31 has been shown to give the best hash code distribution on random string values?

Thanks
Victor

Rancher
Posts: 13459
There's a great link I saw recently, but I'm still looking for it.

Essentially, a prime multiplier is used to reduce collisions. The Java code uses 31, so for your own calculations you should use a different prime, such as 37.

Wanderer
Posts: 18671
Multiplying by a power of two would have the effect of left-shifting the hash value, losing bits of info to the left and replacing them with zeros to the right. If you multiplied by 32, you'd lose 5 bits of info with each multiplication, and since the hash is a 32-bit int, for a long string only the last 7 characters would end up affecting the hash at all. That could be very bad if the strings have a common ending. By using an odd number, the final hash can still reflect contributions from all parts of a string.
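The effect described above can be checked with a small sketch. The class `MultiplierDemo` and its `hash` method are invented for this illustration; they reuse the JDK recurrence but take the multiplier as a parameter. The two sample strings differ only in their first character and share the same last 7 characters.

```java
public class MultiplierDemo {
    // Same recurrence as String.hashCode(), with a configurable multiplier (illustrative).
    static int hash(String s, int multiplier) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = multiplier * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "a_report";
        String b = "b_report"; // differs from a only in the first of 8 characters

        // With 32, the first character is shifted left by 35 bits and drops out of the int.
        System.out.println("32 collides: " + (hash(a, 32) == hash(b, 32)));

        // With 31, the first character still influences the result.
        System.out.println("31 collides: " + (hash(a, 31) == hash(b, 31)));

        // Sanity check: multiplier 31 reproduces the built-in hash.
        System.out.println("matches JDK: " + (hash(a, 31) == a.hashCode()));
    }
}
```

With multiplier 32 the two strings collide, while with 31 they do not: the difference in the first character is multiplied by 31^7, an odd number, so it cannot vanish modulo 2^32.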

I'm not sure what's special about 31, exactly, other than it being prime. Whatever the rationale, it's locked in now, since they (foolishly?) specified the exact hash formula for String as part of the API. There was no need to do this, and now it would prevent them from using a better formula if someone came up with one. Of course, they could just change the API too. But Sun has always been very reluctant to do this sort of thing, since in theory someone might have written code which depends on String behaving exactly as its API indicates. So they won't change the API without a very compelling reason.

Jim Yingst
Wanderer
Posts: 18671

gives output

But if we replace 31 with 32, we get:
