drac yang wrote:
What I don't quite understand is why it emphasizes, as the reason, that HashMap uses power-of-two-length hash tables. I don't think that is the reason, because even if the length were not a power of two, it would still encounter collisions for hashCodes that do not differ in their lower bits.
drac yang wrote:
Well, how could this function
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
ensure a bounded number of collisions? By discarding the lower bits first?
Henry Wong wrote:
Well... it can't. The goal of the method isn't to ensure that the hash code won't collide. The goal isn't even to ensure that the number of collisions is bounded. If you write a poor hashCode method, such as one that returns a constant, it will collide regardless of this rehashing.
The goal here is that if a user does a good job of providing a hashCode with a good distribution, the collection won't throw it all away by simply using the lowest bits.
Henry
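Henry's point is easiest to see by running the two steps side by side. Below is a self-contained sketch: the hash() and indexFor() bodies follow the JDK 6 HashMap source quoted above, while the demo class name and the sample values are my own illustration.

```java
// Sketch of JDK 6 HashMap's supplemental hash and index computation.
// The hash() and indexFor() bodies match the JDK 6 source; the demo
// values are illustrative only.
public class HashDemo {

    // Spreads the influence of the higher bits downward, so hashCodes
    // that differ only in their upper bits can still land in different buckets.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Works only when length is a power of two: length - 1 is then an
    // all-ones bit mask, and (h & mask) equals (h mod length).
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // differs from b only in the high bits
        int b = 0x20000;
        int table = 16;
        // Without the supplemental hash, both collapse to bucket 0.
        System.out.println(indexFor(a, table) + " " + indexFor(b, table));
        // With it, the high-bit difference reaches the low bits.
        System.out.println(indexFor(hash(a), table) + " " + indexFor(hash(b), table));
    }
}
```

With a 16-bucket table, 0x10000 and 0x20000 agree in their low four bits, so the raw mask sends both to bucket 0; after the supplemental hash they land in different buckets.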
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
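The resize rule quoted above can be checked with a small simulation (my own sketch; the capacity and load-factor values are the JDK defaults): with a capacity of 16 and a load factor of 0.75, the threshold is 12, so the table doubles to 32 buckets around the 13th entry.

```java
// Sketch (not JDK code) of the resize rule: when the entry count
// exceeds capacity * loadFactor, the table doubles in size.
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // JDK default initial capacity
        float loadFactor = 0.75f; // JDK default load factor
        int threshold = (int) (capacity * loadFactor); // 12
        for (int size = 1; size <= 13; size++) {
            if (size > threshold) {
                capacity *= 2;    // rehash: roughly twice the buckets
                threshold = (int) (capacity * loadFactor);
            }
        }
        System.out.println(capacity + " " + threshold); // 32 24
    }
}
```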
drac yang wrote:
But why does it want to discard the lower bits?
Volodymyr Levytskyi wrote:
This topic is about this:
What I don't quite understand is why it emphasizes, as the reason, that HashMap uses power-of-two-length hash tables.
My reply is:
The power-of-two length is used so that the indexFor method works correctly, and it is just indexFor that can generate the same backing-table index for different mappings added via the put method. This is why we can have multiple Entries in the same bucket.
The implementation of HashMap simply uses powers of two as its possible capacities.
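The "same index for different mappings" behaviour is easy to demonstrate. This is my own minimal sketch (only the indexFor body follows JDK 6): any two hash values that agree in their low bits collide once masked.

```java
// Sketch of how indexFor maps distinct hash values to the same bucket,
// which is why a bucket can hold a chain of Entry objects.
public class BucketCollision {
    static int indexFor(int h, int length) { // as in JDK 6 HashMap
        return h & (length - 1);
    }
    public static void main(String[] args) {
        int length = 16;     // power-of-two table length
        int h1 = 5;          // binary 00101
        int h2 = 5 + 16;     // binary 10101: same low 4 bits
        System.out.println(indexFor(h1, length)); // 5
        System.out.println(indexFor(h2, length)); // 5 -> same bucket, chained
    }
}
```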
Winston Gutkowski wrote:
Volodymyr Levytskyi wrote:
Maybe I am wrong, sorry, but I believe that the power-of-two length is deliberately used so that the indexFor method works correctly.
Actually, I think it's more likely to make it work fast, although there may be some aspect of "retention of randomness" involved.
Assuming that all the mangling ops that happen to the hash before it gets to this point have produced a reasonably random number, 'hash % length' should work just as well; but it's much slower than a simple masking op. And speed is an essential part of why we use hashed collections.
However, it is also possible that an unlucky choice of length might highlight deficiencies in the "bit-mangling" process that a simple mask doesn't, leading to skewed bucket distributions.
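Winston's equivalence claim can be verified directly. This is my own sketch: for a power-of-two length, h & (length - 1) equals h % length for non-negative h, so the cheap mask can replace the modulo; for other lengths the identity breaks.

```java
// Verifies that masking and modulo agree exactly when the table length
// is a power of two, and disagree otherwise.
public class MaskVsMod {
    public static void main(String[] args) {
        int length = 16; // power of two
        for (int h = 0; h < 1000; h++) {
            if ((h & (length - 1)) != (h % length)) {
                throw new AssertionError("mismatch at " + h);
            }
        }
        // For a non-power-of-two length the identity does not hold:
        int odd = 10;
        System.out.println((7 & (odd - 1)) + " vs " + (7 % odd)); // 1 vs 7
    }
}
```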
Volodymyr Levytskyi wrote:
Please read this small part in Wikipedia.
It explains that the programmatic & and the mathematical % return the same result only if a power-of-two-length backing table is used.
Volodymyr Levytskyi wrote:
The implementation of HashMap simply uses powers of two as its possible capacities.
Maybe I am wrong, sorry, but I believe that the power-of-two length is deliberately used so that the indexFor method works correctly.
What does indexFor do?
It determines the index into the backing table for every new mapping. And it is just this method that produces identical indices for different mappings.
In HashMap everything revolves around the indexFor method. That is, the hash() method and the power-of-two length exist so that indexFor produces different indices for different mappings.
Of course I may be wrong, but where...
Volodymyr Levytskyi wrote:
@Henry Wong, please tell JavaRanch that I do not receive email notifications when a new answer is posted.
Henry Wong wrote:
As a side note, why do I dislike the design? Basically, I got burned by it. The problem with the power-of-two algorithm is that it becomes more of a memory hog as it gets bigger. The number of buckets doubles each time it needs more buckets. On the other hand, as you need more buckets, you also need to be more efficient with memory (as you are using more). So this inefficiency happens at a bad time.
Volodymyr Levytskyi wrote:
Maybe I am wrong, sorry, but I believe that the power-of-two length is deliberately used so that the indexFor method works correctly.
What does indexFor do?
It determines the index into the backing table for every new mapping. And it is just this method that produces identical indices for different mappings.
Every doubling of the capacity adds one more bit to the effective length of the key.
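The "one more bit" remark can be made concrete. This is my own sketch: each doubling of the capacity extends the mask (capacity - 1) by one low-order bit, so one more bit of the hash participates in choosing the bucket.

```java
// Shows the bucket-index mask gaining one bit per doubling of capacity.
public class MaskGrowth {
    public static void main(String[] args) {
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            int mask = capacity - 1;
            System.out.println(capacity + " -> mask " + Integer.toBinaryString(mask));
        }
        // 16 -> 1111, 32 -> 11111, 64 -> 111111, 128 -> 1111111
    }
}
```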
Volodymyr Levytskyi wrote:
every doubling of capacity adds one more bit to the effective length of the key
What do you mean by this? Can you explain it?
I also guess that using powers of two ensures that the number of collisions stays more or less consistent as the size and capacity grow.
What is the binary value 1000 in decimal? And what is the binary value 10000?
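The question above answers itself once the values are written out; a one-liner check (my own sketch) confirms that binary 1000 is 8 and binary 10000 is 16, i.e. doubling the capacity appends exactly one bit.

```java
// Converts the two binary strings from the question to decimal.
public class BinaryValues {
    public static void main(String[] args) {
        System.out.println(Integer.parseInt("1000", 2));  // 8
        System.out.println(Integer.parseInt("10000", 2)); // 16
    }
}
```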
Volodymyr Levytskyi wrote:
But the AND operation (&) used by the indexFor method requires a power-of-two length in order to compute the modulo of the division, as described here.