1. In case of ArrayList, the load factor is 1? Why is that so?

2. In case of HashMap etc., the load factor is 0.75? What is the reason? Why is it not 1?

3. ArrayList capacity is increased 1.5 times, while that of HashMap is doubled? What is the rationale behind this?


Vaibhav Gargs wrote:1. In case of ArrayList, the load factor is 1? Why is that so?

Load factors are not applicable to lists. Lists simply get filled until they are full and then they are resized. Otherwise it would just be a waste of space.
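As a rough illustration of that "fill until full, then resize" behaviour, here is a minimal sketch of a grow-on-full dynamic array. It is not the real `ArrayList` source; the 1.5x growth step is an assumption modelled on what OpenJDK's implementation happens to do:

```java
import java.util.Arrays;

// Minimal sketch of a grow-on-full dynamic array. The 1.5x growth policy
// is an assumption for illustration, not java.util.ArrayList itself.
public class SimpleList {
    private Object[] data = new Object[10];
    private int size = 0;

    public void add(Object e) {
        if (size == data.length) {
            // Only resize when completely full: effectively a "load factor" of 1.
            data = Arrays.copyOf(data, data.length + (data.length >> 1));
        }
        data[size++] = e;
    }

    public int size() { return size; }
    public int capacity() { return data.length; }

    public static void main(String[] args) {
        SimpleList list = new SimpleList();
        for (int i = 0; i < 11; i++) {
            list.add(i);
        }
        // Capacity stays 10 until the 11th add, then grows to 15.
        System.out.println(list.size() + " " + list.capacity());
    }
}
```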

2. In case of HashMap etc., the load factor is 0.75? What is the reason? Why is it not 1?

Hash tables become very slow when they are almost full. Hash tables more or less "guess" a location to insert an item, and they have to keep looking if that location is already occupied*. The fuller the table is, the more time it takes to find a spot for the item. 0.75 is a trade-off between speed and efficient use of memory.
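The "keep looking if that location is already occupied" idea can be seen in a toy open-addressing table. Note this is only an illustration of probing; `java.util.HashMap` actually uses separate chaining, but the effect of a nearly full table is similar:

```java
import java.util.Random;

// Toy linear-probing hash table; counts how many slots an insert inspects.
// Illustration of open addressing only -- java.util.HashMap uses separate
// chaining (buckets of nodes), but fullness hurts it in a similar way.
public class ProbeDemo {
    static int insert(Integer[] table, int key) {
        int probes = 1;
        int i = Math.floorMod(Integer.hashCode(key), table.length);
        while (table[i] != null) {       // keep looking while the slot is taken
            i = (i + 1) % table.length;
            probes++;
        }
        table[i] = key;
        return probes;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[16];
        Random r = new Random(1);        // fixed seed so runs are repeatable
        for (int k = 1; k <= 15; k++) {
            int probes = insert(table, r.nextInt(1000));
            System.out.println("insert " + k + " took " + probes + " probe(s)");
        }
        // On average, the later inserts (table nearly full) need more probes.
    }
}
```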

3. ArrayList capacity is increased 1.5 times, while that of HashMap is doubled? What is the rationale behind this?

Where did you get this? Even if it were true, it would be an implementation detail and not part of the official specification.

*Reality is more complex. Hash tables are very sophisticated and use a lot of mathematical tricks to do what they do. I can't explain it in this post.

*The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.*

Stephan van Hulst wrote:

Hash tables become very slow when they are almost full. Hash tables more or less "guess" a location to insert an item, and they have to keep looking if that location is already occupied*. The fuller the table is, the more time it takes to find a spot for the item. 0.75 is a trade-off between speed and efficient use of memory.

Thank you Stephan!

Can you please explain more about this? Wouldn't it be better if the load factor were 1 or 0.9? Only when the hashmap gets full, i.e. at a load factor of 1, should it do the rehashing, because rehashing is also a time-consuming process. What do we actually achieve by having a load factor of 0.75?

Campbell Ritchie wrote:Have you read the documentation for HashMap? I would have thought that explains the load factor. If that isn't clear to you, please ask again.

Thank you Campbell!

As per the java doc, "Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). "

I am just wondering: if we have a default hashmap of capacity 16, then a load factor of 0.75 means that on the 13th entry it will do rehashing, even though we still had 4 empty buckets. So, if we performed rehashing only after filling all 16 buckets, would it impact performance that much, i.e. do those 4 buckets have such a huge impact on performance?

Yes. If the hash codes are spread uniformly and the map has a size of `12` and a capacity of `16`, the chance of 4 consecutive collisions is `(12/16)^4`, or about `32%`. If we first double the capacity of the map, the chance of 4 consecutive collisions becomes `(12/32)^4`, or about `2%`.

As you can see, before doubling there was a one-in-three chance of the map completely falling back to linear performance. Doubling the capacity of the map improved this situation drastically, allowing the map to keep performing in constant time.
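The arithmetic behind those two percentages can be checked with a few lines, under the same uniform-spread assumption:

```java
// Check the collision-chance figures from the post: with 12 occupied buckets,
// the chance that 4 independent, uniformly spread keys all land in an
// occupied bucket, before and after doubling the capacity.
public class CollisionChance {
    public static void main(String[] args) {
        double before = Math.pow(12.0 / 16.0, 4); // capacity 16
        double after  = Math.pow(12.0 / 32.0, 4); // capacity doubled to 32
        System.out.printf("capacity 16: %.0f%%, capacity 32: %.0f%%%n",
                before * 100, after * 100);
    }
}
```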

Now, you can say that the linear performance doesn't matter to you and that it's fast enough and that you don't want to waste space, but then why use a `HashMap` at all? In that case you need a `TreeMap` or a `List`.


Start with the nanoseconds method, `System.nanoTime()`. Create your Maps.

The reason for using nanoTime three times (not twice) is to allow for the duration of the nanoTime call itself. The reason for populating the Maps twice is to allow any just-in-time optimisations to kick in. You will have to run the program several times because the timings will be different each time. Try populating the Maps with 10, 11, 12 and 13 pairs, and see whether there is a large difference.

I do not know whether the Maps will be resized and rehashed before or after the 11th or 12th “putting”, but they will double their capacities; if you have overfilled them earlier they will have capacities of 32 and you won't see any effects of resizing.

Tell us what the figures look like.

I suggest you use the same pairs for every Map, so you won't run out of heap space.

Create one Map with `new HashMap<>(32)` and one with `new HashMap<>(16, 1)`. Fill each map with 12 elements, and then benchmark how much time it takes to add 4 elements to both sets of maps. No resizing or rehashing will be done at all.


Stephan van Hulst wrote:Yes. If the hash codes are spread uniformly and the map has a size of 12 and a capacity of 16, the chance of 4 consecutive collisions is (12/16)^4, or about 32%. If we first double the capacity of the map, the chance of 4 consecutive collisions becomes (12/32)^4, or about 2%.

As you can see, before doubling there was a one-in-three chance of the map completely falling back to linear performance. Doubling the capacity of the map improved this situation drastically, allowing the map to keep performing in constant time. (...)

Is that correct? I'm certainly no hashmap expert, but for the map to show linear behaviour, the four items should end up in the same bucket. If I'm right, the chance is more like 12/16 * (1/16)^3, which is much less than 1/3.

Stephan van Hulst wrote:Yes. If the hash codes are spread uniformly and the map has a size of 12 and a capacity of 16, the chance of 4 consecutive collisions is (12/16)^4, or about 32%. If we first double the capacity of the map, the chance of 4 consecutive collisions becomes (12/32)^4, or about 2%.

Stephan, can you please elaborate on this calculation? I didn't get how (12/16)^4 comes into the picture.

Vaibhav Gargs wrote:. . . 12/16^4 . . .

(12/16)^4, please, or better (12 ÷ 16)⁴; use the superscript ⁴.

When you put the first pair, there is no chance of a collision.

The second pair has a 1 in 16 chance of a collision.

The third pair has a 1 in 16 chance of being a second collision (three pairs in the same bucket) or a 1 in 8 chance of being a first collision.

The fourth pair has a 1 in 16² chance of being a third collision, a 1 in 16 chance of being a second collision, and a 1 in 16 chance of being a first collision.

etc. There must be a simpler way to calculate that statistically.
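One statistical shortcut is simply to simulate it. Assuming uniformly spread hash codes, a quick Monte Carlo sketch estimates, for example, how often some bucket receives 3 or more of 12 keys (the class and method names here are made up for illustration):

```java
import java.util.Random;

// Monte Carlo estimate: throw 'keys' uniformly hashed keys into
// 'bucketCount' buckets and count how often some bucket receives at least
// 'limit' of them. A simulation like this sidesteps the case-by-case
// probability bookkeeping in the post above.
public class BucketSim {
    static double crowdedFraction(int keys, int bucketCount, int limit,
                                  int trials, long seed) {
        Random r = new Random(seed);   // fixed seed so runs are repeatable
        int hits = 0;
        for (int t = 0; t < trials; t++) {
            int[] buckets = new int[bucketCount];
            boolean crowded = false;
            for (int k = 0; k < keys; k++) {
                if (++buckets[r.nextInt(bucketCount)] >= limit) {
                    crowded = true;
                }
            }
            if (crowded) hits++;
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        // 12 keys, 16 buckets: how often does some bucket get 3 or more?
        double f = crowdedFraction(12, 16, 3, 100_000, 42);
        System.out.printf("some bucket got 3+ keys in %.1f%% of trials%n",
                f * 100);
    }
}
```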

I just thought of a different consideration: if you put 16 elements in a map with a capacity of 16 and a load factor of 1, the chance of exactly filling up each bucket of the map with one element is `16! / 16^16`, or about 0.0001%. So even with a load factor of 1, you will almost certainly have wasted space, because at least one bucket will contain multiple elements, and at least one bucket will be empty. I suppose there's a certain sweet spot where the expected number of buckets wasted due to collisions is equal to the number of buckets unused due to the load factor.
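That figure is easy to check numerically; the product below is just 16!/16^16 written as (16/16) × (15/16) × ... × (1/16), the chance that each successive key lands in a still-empty bucket:

```java
// Probability that 16 uniformly hashed keys land in 16 distinct buckets:
// the first key has 16/16 free buckets, the second 15/16, and so on,
// which multiplies out to 16! / 16^16.
public class PerfectFill {
    public static void main(String[] args) {
        double p = 1.0;
        for (int k = 16; k >= 1; k--) {
            p *= (double) k / 16;
        }
        System.out.printf("%.6f%%%n", p * 100);   // roughly 0.0001%
    }
}
```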

Stephan van Hulst wrote:My point is that it's almost unavoidable even with a load factor of 100%.

That's called a "perfect hash function" if I'm not mistaken. Wikipedia has a page about that -- apparently there must be applications where having a perfect hash function is important. I guess those applications must be using the hash function to such an extent that it's important for it to execute in constant time.