# HashMap, Load Facor, Rehash Operation

Varuna Seneviratna

Ranch Hand

Posts: 170

posted 7 years ago

The 4th paragraph of the HashMap in Java documentation is as follows

What is meant by "decrease the space overhead but increase the lookup time"

Then

What does the above sentence mean

Initial capacity is the number of empty elements at time of creation, If that is so maximum number of entries is the initial capacity.Is that correct?How is it assured that a rehash will not ever occur?

Regards Varuna

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.--

What is meant by "decrease the space overhead but increase the lookup time"

Then

If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur

What does the above sentence mean

Initial capacity is the number of empty elements at time of creation, If that is so maximum number of entries is the initial capacity.Is that correct?How is it assured that a rehash will not ever occur?

Regards Varuna

Varuna Seneviratna

Campbell Ritchie

Marshal

Posts: 52581

119

posted 7 years ago

- 1

What it means is that you start with an initial capacity of 16, and a load factor of 0.75f, so rehashing occurs when you reach 12. If you only intend to store 10 key-value pairs in this Map, then it will never re-hash. If you store 13 pairs, then it will re-hash once and if you store 25 pairs it will re-hash twice. So to store 25 pairs without re-hashing you would need an initial capacity of 64.

I don't know whether re-hashing occurs after 11, or before 12, or after 12. You could work that out by going through the code: find a file called src.zip in your Java installation folder and unzip it.

If you set your load factor lower, eg 0.5f you will get rehashing at 8, so there will be more empty space. If you set your load factor at 1.0f you won't get rehashing until 16, but there will be more chance of two pairs with the same hashCode % 16, so they will land in the same "bucket" and look-up will be slower.

I don't know whether re-hashing occurs after 11, or before 12, or after 12. You could work that out by going through the code: find a file called src.zip in your Java installation folder and unzip it.

If you set your load factor lower, eg 0.5f you will get rehashing at 8, so there will be more empty space. If you set your load factor at 1.0f you won't get rehashing until 16, but there will be more chance of two pairs with the same hashCode % 16, so they will land in the same "bucket" and look-up will be slower.

Varuna Seneviratna

Ranch Hand

Posts: 170

posted 7 years ago

So what I gather is "a rehash will not ever occur" is not correct is it?What ever the initial capacity is when ever the load factor is reached it will rehash.I am saying this with the assumption that if the number of entries that will have to be made can not be predetermined

I am not able to understand this.What is HashCode?Is it the Key?When entries are made up to 16(16th one also added) with the load factor 1.0f.Another 16 of empty elements will be added and the total number of elements will be 32, is it?

In the HashMap method list I couldn't find a method to get the number of total number entries possible. The size() returns only the number of entries already made.Is there a way to get the total number of possible entries?

By saying that if empty space increases lookup time will be slower, does this mean that when an search has to be carried out using a given Key all the elements even the empty ones will be looked up?

Regards Varuna

but there will be more chance of two pairs with the same hashCode % 16, so they will land in the same "bucket" and look-up will be slower

I am not able to understand this.What is HashCode?Is it the Key?When entries are made up to 16(16th one also added) with the load factor 1.0f.Another 16 of empty elements will be added and the total number of elements will be 32, is it?

In the HashMap method list I couldn't find a method to get the number of total number entries possible. The size() returns only the number of entries already made.Is there a way to get the total number of possible entries?

By saying that if empty space increases lookup time will be slower, does this mean that when an search has to be carried out using a given Key all the elements even the empty ones will be looked up?

Regards Varuna

posted 7 years ago

Hashing theory is actually something that you should have learned (or will learn) in your data structure class (Or algorithms depending on the school). So, you should review the course work (or pickup the book in advanced) from that class.

Trying to learn based on an implementation (even a Java implementation, that is highly used) is not as good as learning the theory directly.

Henry

Trying to learn based on an implementation (even a Java implementation, that is highly used) is not as good as learning the theory directly.

Henry

Campbell Ritchie

Marshal

Posts: 52581

119

santhosh.R gowda

Ranch Hand

Posts: 296

posted 7 years ago

can you expalin this quate more clearly

If you set your load factor at 1.0f you won't get rehashing until 16, but there will be more chance of two pairs with the same hashCode % 16, so they will land in the same "bucket" and look-up will be slower.

can you expalin this quate more clearly

Creativity is nothing but Breaking Rules

Raju Goke

Greenhorn

Posts: 7

posted 7 years ago

HashCode is a random number generated by the JVM,its unique to every object.It's not the address of the object.The JVM returns same HashCode for the following case:

String s1= new String("hello");

String s2= new String("hello");

In this case s1.hashcode() and s2.hashcode() returns same value.

Regards,

Raju

Varuna Seneviratna wrote:So what I gather is "a rehash will not ever occur" is not correct is it?What ever the initial capacity is when ever the load factor is reached it will rehash.I am saying this with the assumption that if the number of entries that will have to be made can not be predetermined

but there will be more chance of two pairs with the same hashCode % 16, so they will land in the same "bucket" and look-up will be slower

I am not able to understand this.What is HashCode?Is it the Key?When entries are made up to 16(16th one also added) with the load factor 1.0f.Another 16 of empty elements will be added and the total number of elements will be 32, is it?

In the HashMap method list I couldn't find a method to get the number of total number entries possible. The size() returns only the number of entries already made.Is there a way to get the total number of possible entries?

By saying that if empty space increases lookup time will be slower, does this mean that when an search has to be carried out using a given Key all the elements even the empty ones will be looked up?

Regards Varuna

HashCode is a random number generated by the JVM,its unique to every object.It's not the address of the object.The JVM returns same HashCode for the following case:

String s1= new String("hello");

String s2= new String("hello");

In this case s1.hashcode() and s2.hashcode() returns same value.

Regards,

Raju

Campbell Ritchie

Marshal

Posts: 52581

119

posted 7 years ago

- 1

Not quite. A hash code is by no means random; it is calculated from features in the object. The Java™ representation requires that two objects returning

You have a Map with 16 "buckets" an a load factor of 75% (0.75f). When you have 12 entries, then the Map will be enlarged (not sure whether that happens when you reach 12 or when you try to reach 13). So you have at all times at least 4 empty buckets. That reduces the risk of putting two objects into the same bucket. When the Map is enlarged, it has capacity 32 and contains 12 pairs, so at least 20 buckets are empty. When you add more pairs, then they have 20 spaces to go into

If you increase the load factor (150%) then you won't enlarge the Map until you have 24 pairs. That guarantees not less than 8 pairs will have to share buckets. When you increase it to 32 buckets, it will have to accommodate up to 48 objects.

Not sure, but I think rehashing occurs whenever a pair is added (put method). Also when the Map is enlarged, every pair is re-hashed. And the Map interface does not have a method to return the capacity

`true`from their equals() method must also return the same value from their hashCode() method. I haven't ot time to go through how to write a hash code method, but you want to get the greatest variability in the "low order" bits. So a hash code of 1234abcd differs only slightly from 1234abce, but that tiny difference will almost guarantee those two objects go into different buckets.You have a Map with 16 "buckets" an a load factor of 75% (0.75f). When you have 12 entries, then the Map will be enlarged (not sure whether that happens when you reach 12 or when you try to reach 13). So you have at all times at least 4 empty buckets. That reduces the risk of putting two objects into the same bucket. When the Map is enlarged, it has capacity 32 and contains 12 pairs, so at least 20 buckets are empty. When you add more pairs, then they have 20 spaces to go into

If you increase the load factor (150%) then you won't enlarge the Map until you have 24 pairs. That guarantees not less than 8 pairs will have to share buckets. When you increase it to 32 buckets, it will have to accommodate up to 48 objects.

Not sure, but I think rehashing occurs whenever a pair is added (put method). Also when the Map is enlarged, every pair is re-hashed. And the Map interface does not have a method to return the capacity

santhosh.R gowda

Ranch Hand

Posts: 296

posted 7 years ago

First of all, you kinda took the original statement out of context (meaning this is just a snippet of the explanation), so it doesn't make much sense by itself. Second, the original statement was an example, and it was pretty clear. What is it that you don't understand?

Basically, from the original example... if the load factor is 150%, it can accommodate 150% the number of elements that it has buckets. So, with 32 buckets, it can accommodate 48 elements before it has to create more buckets and rehash.

Henry

santhosh.R gowda wrote:

That guarantees not less than 8 pairs will have to share buckets. When you increase it to 32 buckets, it will have to accommodate up to 48 objects.

please tell me this statement some more clearly with example

First of all, you kinda took the original statement out of context (meaning this is just a snippet of the explanation), so it doesn't make much sense by itself. Second, the original statement was an example, and it was pretty clear. What is it that you don't understand?

Basically, from the original example... if the load factor is 150%, it can accommodate 150% the number of elements that it has buckets. So, with 32 buckets, it can accommodate 48 elements before it has to create more buckets and rehash.

Henry