posted 8 years ago

Can someone please explain these two concepts :

initial capacity and load factor

Also please point me to a good example which is using these concetps.

I got this statement from the API doc :

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

What does a bucket mean here ??

initial capacity and load factor

Also please point me to a good example which is using these concetps.

I got this statement from the API doc :

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created.

What does a bucket mean here ??

posted 8 years ago

Ok, let me try ...

A HashTable is made of a 2-dimensional array that has an initial capacity (e.g. 20).

So, when you add an object to the HashTable, the index in the array is calculated (we'll see how later) and the object is stored at this index. If two different objects get assigned the same value, they will end up in the same array index.

How does all that work. Objects have a hashCode method, which returns an int. At it's best, the hashCode is different for all objects that are meaningfully different from each other.

Whe calculating the array index from the hashCode, something like this happens.

So, if the capacity of the HashTable is 20, the objects hashCode will be calculated modulo 20, which always returns a value between 0 and 19 --> The array index. You see that the greater the capacity is, the better distributed the values will get. At it's best every object has it's own index (or bucket). This is important because upon object retrieval the HashTable calculates the index, looks it up and finds 0 to n objects placed in that bucket. Now it has to iterate over all objects found and use it's equals method to determine the object we look for.

That is why it's important to have a good distribution among the indexes, because in the worst case (e.g. index is always the same) all objects will land in the same bucket and whenever you want to retrieve an object the HashTable has to iterate over all objects (which is pretty much what LinkedList does).

The load factor is the amount of load that has to be reached until the capacity is increased to ensure a better distribution.

Let's say the load factor is 0.75 and we have an initial capacity of 20. As soon as the 15th element is added, the capacity is increased (in general doubled) and the elements get redistributed. Now that we have 40 instead of 20 buckets available, the elements get distributed better and the object retrieval is fast as a beam of light ;-) ..

Hope that makes it clear .

A HashTable is made of a 2-dimensional array that has an initial capacity (e.g. 20).

So, when you add an object to the HashTable, the index in the array is calculated (we'll see how later) and the object is stored at this index. If two different objects get assigned the same value, they will end up in the same array index.

How does all that work. Objects have a hashCode method, which returns an int. At it's best, the hashCode is different for all objects that are meaningfully different from each other.

Whe calculating the array index from the hashCode, something like this happens.

So, if the capacity of the HashTable is 20, the objects hashCode will be calculated modulo 20, which always returns a value between 0 and 19 --> The array index. You see that the greater the capacity is, the better distributed the values will get. At it's best every object has it's own index (or bucket). This is important because upon object retrieval the HashTable calculates the index, looks it up and finds 0 to n objects placed in that bucket. Now it has to iterate over all objects found and use it's equals method to determine the object we look for.

That is why it's important to have a good distribution among the indexes, because in the worst case (e.g. index is always the same) all objects will land in the same bucket and whenever you want to retrieve an object the HashTable has to iterate over all objects (which is pretty much what LinkedList does).

The load factor is the amount of load that has to be reached until the capacity is increased to ensure a better distribution.

Let's say the load factor is 0.75 and we have an initial capacity of 20. As soon as the 15th element is added, the capacity is increased (in general doubled) and the elements get redistributed. Now that we have 40 instead of 20 buckets available, the elements get distributed better and the object retrieval is fast as a beam of light ;-) ..

Hope that makes it clear .

JDBCSupport - *An easy to use, light-weight JDBC framework* -

posted 8 years ago

I have related question about hashCode() % capacity concept.

In many scjp mocks and in Kathy Sierra & Bert Bates book there are questions like this:

class Test {

int i;

Test(int i) {this.i = i;}

public boolean equals(Object o) {return ((Test)o).i == this.i}

}

HashSet s = new HashSet();

s.add(new Test(1));

s.add(new Test(1));

System.out.println(s.size()); // what will be the output?

Now the problem: authors say that 2 is correct answer -- "because hashCode() is not overriden, so two object will go to different buckets."

BUT! I think that even when each hashCode is unique, there is posiibility that hash1%capacity == hash2%capacity AND then the output is 1.

What do You think about it?

In many scjp mocks and in Kathy Sierra & Bert Bates book there are questions like this:

class Test {

int i;

Test(int i) {this.i = i;}

public boolean equals(Object o) {return ((Test)o).i == this.i}

}

HashSet s = new HashSet();

s.add(new Test(1));

s.add(new Test(1));

System.out.println(s.size()); // what will be the output?

Now the problem: authors say that 2 is correct answer -- "because hashCode() is not overriden, so two object will go to different buckets."

BUT! I think that even when each hashCode is unique, there is posiibility that hash1%capacity == hash2%capacity AND then the output is 1.

What do You think about it?