• Post Reply Bookmark Topic Watch Topic
  • New Topic

Best case scenario for key distribution in HashMap?  RSS feed

 
manish ghildiyal
Ranch Hand
Posts: 136
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,

My understanding is that in HashSet hash value of object is used to assign location to the object in array internally used and a well designed hash method is supposed to distribute the objects uniformly ie neither objects must be clustered at very few places nor they must be uniquely positioned. I hope my understaning is right(if not then this quesion stands void).
But in case of HashMap, should keys also be distributed in similar way for better performance. I mean what is the best scenario for key distribution. Should it be like what we have in HashSet or unique position per key?

Manish
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
In both HashSet and HashMap--in fact in any hash-based data structure--the ideal distribution in terms of speed is that each hash value is unique, and each bucket has at most one item. In the vast majority of cases, that is not practical, or even possible, due to both space constraints and the difficulty of providing a unique hash function for anything other than very small, simple objects.

So we trade off some time benefit and accept less than ideal (but still fast) lookups in order to keep the number of buckets manageable and the hash function reasonably easy to generate.

As for "best case," there is no one, correct "best case" answer. The ideal balance of space vs. time depends on your particular requirements, constraints, and use cases for this particular project. In general though, you can assume that the built-in algorithms with their default parameters for load factor and whatnot will be sufficient, and only look to tweak them if you actually measure a significant problem stemming from there using a profiler.

(Also note that a HashSet is just a HashMap that uses dummy values (null, or a simple Object, or the same value as the key maybe), and doesn't give us access to those values, only to the keys.)
 
manish ghildiyal
Ranch Hand
Posts: 136
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Jeff.

You said that ideal scenario is to have a unique position for every object used for hashing. Something like ArrayList offers same.So one may ask why to go for HashSet over ArrayList.
Then only one thing comes to mind that HashSet avoid duplicates? I can't think of another reason as ArrayList offers best positioning scenario.

Manish
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
manish ghildiyal wrote:You said that ideal scenario is to have a unique position for every object used for hashing. Something like ArrayList offers same.


If you want to keep O(1) lookup, then ArrayList only has that if you have enough slots for every possible value. If you want an ArrayList<Integer>, you'd need to have a 4,000,000,000+ element array, even if you were only actually storing 50 Integers. Without that, you don't get O(1) hash-like lookup. It turns into an O(N) linear search, or, best case, an O(log N) binary search, if the list is sorted. If it's an ArrayList<String>, then it becomes something like a 26^45 element array, give or take a few orders of magnitude.

So one may ask why to go for HashSet over ArrayList.


So that we can have O(1) lookup without needing impossible amounts of storage. This exact point is the subject of this other thread.

 
manish ghildiyal
Ranch Hand
Posts: 136
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Got it.
Thanks a ton.

Manish
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You're welcome.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!