
Java HashMap and buckets

 
Ranch Hand
Posts: 83
I am looking for a way to do the following scenario:
1. Make 2 or more values land in the same bucket.
2. See how many items (2 or more) are in that bucket.

Is there a way to do 1 + 2? Until now I have tried putting different values in, and every time it seems like only 1 element is in a bucket.
 
Saloon Keeper
Posts: 15705
367
What bucket an element goes in depends on the hash code of the element, and the current capacity of the map. You can force two elements to go in the same bucket by having them return the same hash code, but you can't force two elements to go in separate buckets by having them return different hash codes.

There is no way to find out what bucket an element is in, except for stepping through the code with a debugger and inspecting the map's internals.
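(Strictly for experimentation, you can re-derive the index OpenJDK's HashMap would use, since the calculation is simple. This mirrors an internal implementation detail, the hash-spreading step plus a power-of-two mask, so it may change between JDK versions and is only a sketch, not a supported API.)

```java
// Sketch: replicate how OpenJDK's HashMap maps a key to a bucket index.
// This is an implementation detail, so treat it as illustrative only.
public class BucketIndexDemo {

    // HashMap XORs the high bits of the hash code into the low bits
    // before masking, because table sizes are powers of two.
    static int spread(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    // With a power-of-two capacity, the bucket index is just a bit mask.
    static int bucketIndex(Object key, int capacity) {
        return spread(key) & (capacity - 1);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are a classic String hash collision: both hash
        // to 2112, so they land in the same bucket at *any* capacity.
        System.out.println("Aa".hashCode() == "BB".hashCode());                       // true
        System.out.println(bucketIndex("Aa", 16) == bucketIndex("BB", 16));           // true
        System.out.println(bucketIndex("Aa", 1 << 20) == bucketIndex("BB", 1 << 20)); // true
    }
}
```

This also answers the original question: giving two keys the same hash code (like "Aa" and "BB") is enough to force them into one bucket.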

Why do you want to do this in the first place?
 
Peter Benda
Ranch Hand
Posts: 83
I want to see if I can predict what size of map I need to initialize.
Let's say I have 100,000 strings.
I want to see if some of them fall into the same bucket, and what HashMap size I should initialize so there won't be any resize later on (assuming no new data is added).
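(The resize part of this is actually easy to guarantee: size the map so that the expected entry count stays under the default 0.75 load factor. A minimal sketch of that calculation; the `capacityFor` helper name is made up here, though Guava's `Maps.newHashMapWithExpectedSize` does essentially the same thing:)

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    // Smallest initial capacity such that 'expected' entries stay under
    // the default 0.75 load factor, so the map never has to resize.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 100_000;
        System.out.println(capacityFor(expected)); // 133334
        // HashMap rounds the requested capacity up to the next power of
        // two internally (here 262144), so no resize will ever happen.
        Map<String, Integer> map = new HashMap<>(capacityFor(expected));
    }
}
```

Note this only prevents resizing; it does nothing about collisions.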
 
Stephan van Hulst
Saloon Keeper
Posts: 15705
367
You can't do this programmatically.

The only way to be 99% sure is to initialize a map with the highest capacity possible, but even then there is no way to guarantee that elements won't fall into the same bucket. As a matter of fact, if two elements have the same hash code, they are guaranteed to go in the same bucket, regardless of how big the map is.

What you want is not only impossible, it's also pointless. Having only a few collisions in an otherwise well distributed set of elements will lead to absolutely no noticeable performance degradation.
 
Peter Benda
Ranch Hand
Posts: 83
Each bucket is implemented behind the scenes as a linked list or a tree.
What was the point of implementing it that way if the data that is added will be the size of the HashMap? This loses the power of the buckets.
 
Stephan van Hulst
Saloon Keeper
Posts: 15705
367
Because even if there are fewer items in the map than there are buckets (meaning there are empty buckets available), some buckets may still contain more than one item. The reason for this is that HashMap doesn't look for an empty bucket to put items in. Instead, it calculates a bucket index from the hash code of the item and the current capacity of the map, and puts the item in that bucket, regardless of whether there's already an item in the bucket. This is the reason that HashMap is so fast: it doesn't spend time looking for an empty bucket.

It's true that HashMap will slow down if it gets fuller, because there are more and more collisions and performance will degrade from constant lookup times to logarithmic lookup times (because it now has to look for items in a tree, rather than a hash table). This is why the map doesn't wait for itself to get full before it resizes. Instead, it resizes when its size has reached a certain percentage of its capacity. This percentage is known as the load factor. By default, the load factor is 75%. This means that by default, 25% of all buckets in a HashMap will be unused. This may seem wasteful, but it ensures that there are almost no collisions and that lookups will remain super fast.
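(The resize trigger described above is simple arithmetic: the map doubles its table once its size exceeds capacity × load factor. A small sketch with made-up method names:)

```java
public class LoadFactorDemo {
    // The map resizes (doubles its table) once its size exceeds
    // this threshold of capacity * loadFactor.
    static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // A default HashMap starts with 16 buckets and a 0.75 load
        // factor, so it doubles to 32 buckets at the 13th entry.
        System.out.println(resizeThreshold(16, 0.75f));   // 12
        System.out.println(resizeThreshold(1024, 0.75f)); // 768
    }
}
```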
 
Greenhorn
Posts: 16
1

Peter Benda wrote: what was the point of implementing it if the data that is added will be the size of the HashMap

I don't understand what your question is. Do you mean the buckets themselves are of a similar size to a HashMap? In the case of small maps, this is a trivial amount of memory. In the case of large maps, trees are used and there are more elements per bucket, so this is also trivial when compared to the memory consumed by the many elements in the map.


 
Peter Benda
Ranch Hand
Posts: 83

Stephan van Hulst wrote: It's true that HashMap will slow down if it gets fuller . . . This may seem wasteful, but it ensures that there are almost no collisions and that lookups will remain super fast.


totally agree
 
Marshal
Posts: 79642
380

Stephan van Hulst wrote:. . . the load factor is 75%. . . . 25% of all buckets in a HashMap will be unused. . . . .

Since there are likely to be collisions, at least 25% of buckets are empty. Maybe more than 25%.
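(A quick simulation backs this up: with uniformly random hash codes at a 0.75 load factor, the Poisson approximation predicts about e^(-0.75) ≈ 47% of buckets stay empty, well over 25%. A seeded, hypothetical sketch:)

```java
import java.util.Random;

public class EmptyBucketsDemo {
    // Throw n random keys at 'capacity' power-of-two buckets and
    // count how many buckets stay empty.
    static int emptyBuckets(int capacity, int n, long seed) {
        boolean[] occupied = new boolean[capacity];
        Random rnd = new Random(seed);
        for (int i = 0; i < n; i++) {
            int h = rnd.nextInt();
            // Same spread-and-mask indexing HashMap uses internally.
            occupied[(h ^ (h >>> 16)) & (capacity - 1)] = true;
        }
        int empty = 0;
        for (boolean b : occupied) if (!b) empty++;
        return empty;
    }

    public static void main(String[] args) {
        int capacity = 1 << 16;    // 65536 buckets
        int n = capacity * 3 / 4;  // fill right up to the 0.75 load factor
        double frac = emptyBuckets(capacity, n, 42L) / (double) capacity;
        // Expect roughly 0.47 of all buckets to be empty.
        System.out.printf("empty fraction: %.3f%n", frac);
    }
}
```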
 