This week's book giveaway is in the HTML Pages with CSS and JavaScript forum.
We're giving away four copies of Testing JavaScript Applications and have Lucas da Costa on-line!
See this thread for details.
Win a copy of Testing JavaScript Applications this week in the HTML Pages with CSS and JavaScript forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Bear Bibeault
  • Ron McLeod
  • Jeanne Boyarsky
  • Paul Clapham
Sheriffs:
  • Tim Cooke
  • Liutauras Vilda
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • fred rosenberger
  • salvin francis
Bartenders:
  • Piet Souris
  • Frits Walraven
  • Carey Brown

why does hashMap method uses bitwise and operator

 
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi All

I checked the internal code of hashmap and found that it uses & operator on table length-1 and hashCode and generates one index.

All the values that are common are stored in same index in linked list.

My question to advanced users is why all this is done?

Why not to use simply based on hashCode and having auto incrementing index, for example if hashcode is 100 and current index is 5, then store hashcode 100 to next available index by ordering to hashcode.

this way avoiding bucket storage and parsing.
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Go read the rules for the "contract" of the java.lang.Object hashCode() method to see why this would not work.

Bill
 
author
Posts: 23879
142
jQuery Eclipse IDE Firefox Browser VI Editor C++ Chrome Java Linux Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

mahajan amit wrote:
I checked the internal code of hashmap and found that it uses & operator on table length-1 and hashCode and generates one index.

All the values that are common are stored in same index in linked list.

My question to advanced users is why all this is done?



It is using a bit mask to locate the bucket. This works because there is code in the hashmap that guarantees that the size of the bucket table is a size where this trick will work (meaning the mask can be calculated from the length). In other words, the size can only be 2, 4, 8, 16, 32, 64, 128, 256, etc.


mahajan amit wrote:
Why not to use simply based on hashCode and having auto incrementing index, for example if hashcode is 100 and current index is 5, then store hashcode 100 to next available index by ordering to hashcode.

this way avoiding bucket storage and parsing.



Then it wouldn't be hashing, would it? Hashing algorithm have a constant average access time (assuming even distribution) for read and writes. An array list, which is what you are describing doesn't have that property.

Henry
 
Bartender
Posts: 10777
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

mahajan amit wrote:Why not to use simply based on hashCode and having auto incrementing index, for example if hashcode is 100 and current index is 5, then store hashcode 100 to next available index by ordering to hashcode.

this way avoiding bucket storage and parsing.


Well, an alternative would be to use
hash % capacity
but that's actually quite a bit slower than
hash & (capacity - 1)
and when capacity is a power of two, both calculations actually do the same thing.

Remember: the whole point of HashMap is to be fast, so all its optimizations are geared for speed, not space.

That said, the space overhead for an empty bucket array is only 4/8 bytes for each entry, which is actually fairly small when compared to the space of the objects it may hold.

Winston
 
Bartender
Posts: 1166
17
Netbeans IDE Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:
Well, an alternative would be to use
hash % capacity
but that's actually quite a bit slower than
hash & (capacity - 1)



Really ? And even if true then in the time one takes to computed the hash code, indexed the hash table and search for the item in the bin comparing each element using the equals() method what is any difference in the time taken to compute the bin going to make?
 
Marshal
Posts: 69859
278
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Winston Gutkowski wrote: . . . hash % capacity . . .

Not quite. h & c - 1 has another advantage over %. It never returns a negative result. As long as c for capacity is an exact power of two (1, 2, 4, 8, 16 etc), the result of h & c - 1 is exactly what you want for a zero‑based array index, without using an “abs” function. It would not work for a 1‑based index because you would get 0 sometimes and would have to use (h & c - 1) + 1. The straight version with the and operator is (as Winston has already said) the fastest method.
 
Winston Gutkowski
Bartender
Posts: 10777
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Not quite. h & c - 1 has another advantage over %. It never returns a negative result...


<nitpick>
Unless c <= 0.
I think you also meant h & (c-1).
</nitpick>

Winston
 
Campbell Ritchie
Marshal
Posts: 69859
278
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes, a capacity of 0 might return a negative result.
The - operator has a higher precedence that & so the () are redundant.
 
Winston Gutkowski
Bartender
Posts: 10777
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:The - operator has a higher precedence that & so the () are redundant.


Doh-h. Quite right. I've just got so used to doing it whenever I use bitwise ops...

Winston
 
Campbell Ritchie
Marshal
Posts: 69859
278
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
When I said redundant, it means that you can use the () if you want to. In the other formula I posted, however (h & c - 1) + 1, the () are essential.
 
Rancher
Posts: 3571
39
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Richard Tookey wrote:

Winston Gutkowski wrote:
Well, an alternative would be to use
hash % capacity
but that's actually quite a bit slower than
hash & (capacity - 1)



Really ? And even if true then in the time one takes to computed the hash code, indexed the hash table and search for the item in the bin comparing each element using the equals() method what is any difference in the time taken to compute the bin going to make?


Hard to say without careful testing. I would guess that the vast majority of the time this optimization will make no significant difference to people. However, one thing about well-known library code like HashMap is that it's used in a wide array of contexts and use cases; if there is a use case where % vs & makes a difference in performance, there's a much-better-than-usual chance that people will find it - quite possibly in actual production code. So the usual admonishments about premature optimization don't apply to a widely-used library in quite the same way that they would apply to typical production code.

In the case of HashMap, I would guess that a bottleneck around using % may occur if the keys have very simple hash and equals() functions - e.g. Integer's hashCode(), which just returns the wrapped int. And if collisions are relatively infrequent (as is often the case) the performance of equals() won't matter much. If your use case has frequent lookups for keys that [i]aren't/i] there, equals() will matter much less, as well. I don't know how significant the effect of % is in such a use case, but I can imagine that it might be noticeable.

I would also note that the idea of optimizing by using & or >> rather than / or % is fairly pervasive in the HashMap code. I doubt that every single use of these tricks is fully justified as a necessary optimization. But I suspect that at least some of them are justified, and perhaps many of them. And once they got in the habit of using such tricks internally, they might have just stuck with that habit - at least within HashMap and some similar classes.
 
Campbell Ritchie
Marshal
Posts: 69859
278
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
There are several other uses of the bitwise operators in HashMap; they do some twiddling to try to move the variations towards the right‑hand end of the int, because differences there are more likely to put values into different buckets.
If two objects return hash codes like 0xabcd1234 and 0xabcd1235 any h % c or h & c - 1 operation will put them into different buckets, assuming the capacity is > 1. Two hash codes like 0xabcd1234 and 0x2bcd1234, even though their actual difference seems so very large, will find the same bucket as long as the difference remains in that leftmost bit.
 
Mike Simmons
Rancher
Posts: 3571
39
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Interestingly, looking again, bit twiddling seems to be less pervasive in the current code than I remember it being. Guess there's been a bit of rewriting. (It couldn't be my memory that's at fault.) Anyway I retract the "fairly pervasive" comment.
 
mahajan amit
Greenhorn
Posts: 14
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
got it thanks
 
Campbell Ritchie
Marshal
Posts: 69859
278
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You’re welcome
 
Slime does not pay. Always keep your tiny ad dry.
Thread Boost feature
https://coderanch.com/t/674455/Thread-Boost-feature
    Bookmark Topic Watch Topic
  • New Topic