• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • Liutauras Vilda
  • Jeanne Boyarsky
  • paul wheaton
Sheriffs:
  • Ron McLeod
  • Devaka Cooray
  • Henry Wong
Saloon Keepers:
  • Tim Holloway
  • Stephan van Hulst
  • Carey Brown
  • Tim Moores
  • Mikalai Zaikin
Bartenders:
  • Frits Walraven

Duplicates in HashSet

 
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,
I am using jdk1.4.2.
I am using HashSet, and seem, it allows me to add duplicate values.

HashSet implements Set interface,
and in Set interface javadoc, it is clearly written that
"A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2)"
Similiar thing is written for add() javadoc.

My code snippet is as follows:-
--------------------------------------------------------------
import java.util.HashSet;

public class TestHashSet
{
public static void main(String arg[])
{
HashSet set = new HashSet();
set.add(new Inner(1));
set.add(new Inner(1));
set.add(new Inner(1));
System.out.println(set.size());
}
}
class Inner
{
private int i=0;
public Inner(int _i)
{
i=_i;
}
public boolean equals(Object o)
{
Inner inner = (Inner)o;
if(inner.i==this.i)
{
return true;
}
else
{
return false;
}
}

/* public int hashCode()
{
return i*1000;
} */
}
-------------------------------------------------------------

This code returns me size as 3.
If i uncomment hasCode() method,
it correctly prints me size as 1.
But, i believe, it should be using only equals() method to check duplication, and not hashCode()

After looking into implementation of HashSet class, found that it uses HashMap to store values.

I am wondering, where i am going wrong?
Please suggest.
thanks,
Anuj
 
Ranch Hand
Posts: 1683
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You are adding three new objects to the HashSet, so the size is three. The fact that the objects are of the same type is what is confusing you.

After looking into implementation of HashSet class, found that it uses HashMap to store values.


No, the HashSet's objects are internally stored as keys in a Hashmap. If memory serves me right, all the Hashmap's values are the same (an instance of an Object).
 
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
When you add an element to HashSet/HashMap,
1) first the hashcode of object are matched against that of all other elements
2) then the object are matched with either == or .equals

And its added only when the hashcode is not same "and" == or.equals return false

So when you uncomment the hashCode method, only 1 element is added to the HashSet.

Moreover, as per definition of 'hashCode' method

If two objects are equal according to the equals(Object)
method, then calling the hashCode method on each of
the two objects must produce the same integer result.

So, for correct operations, its your responsibility to override the 'hashCode' method if you are overriding the 'equals' method.
 
Greenhorn
Posts: 23
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
yes.. provide your own unique implementation of
hashCode and equals(..) for Inner class.
 
Singhal Anuj
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi ,
thanks for the input.
yes, you are right, here, reason that i have not overridden hashCode() method appropriately.
But, i would like to address another aspect of it:-
From API of hashCode() in Object class,
-------------------------------------------------------------
3) It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
--------------------------------------------------------------
Two unequal objects, say o1 and o2, are allowed to return same hashCode,
but going by the Theory of "added in HashSet only when the hashcode is not same "and" == or.equals return false"
suggests that o1 and o2 will not be added in HashSet inspite being unEqual()

I am trying to convey that hashCode should not be the criteria to detect Duplication.
Let me know, if i am wrong.
thanks,
Anuj
 
Harish Kashyap
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
That is actually true....
If 2 different objects return same hashcode, only first object will be added to the HashMap.

If you see the src of HashMap

 
Harish Kashyap
Ranch Hand
Posts: 118
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I would like to correct myself here...

An object is added in HashMap if

1. hashCode doesn't match
2. hashCode matches, but .equals() or || operator return false
 
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Harish Kashyap:
That is actually true....
If 2 different objects return same hashcode, only first object will be added to the HashMap.



No, this is most certainly not true, and I don't see how you think the source code shows this.

What the Javadoc says is that two equal objects must have the same hashCode(), but two unequal objects should have different hashCodes(). There's generally no way to ensure that two objects are guaranteed to have different hashCode values, and that's OK -- nothing will break, although having different values for unequal objects gives the best performance. BUt if a class had a hashCode() method that returned "1" for every instance of a class, HashSet would not malfunction as long as equals() properly compared instances of the class.

Imagine that you have a large stack of index cards with people's names on them. You have to sort the cards into alphabetical order. How do you do it? Here's one way: first, make 26 piles, one for each letter of the alphabet; then sort each pile. By first making smaller piles, you reduce the amount of work you have to do comparing and ordering cards.

This is exactly how HashMap/HashSet/Hashtable work. The hashCode() of the keys is used to sort the objects into piles. Then the objects in each pile are compared for equality using equals(). If two objects have the same hashCode(), then they end up in the same pile, and if two of them are equal(), then one will be discarded. If, on the other hand, two objects have different hashCodes, they will be in different piles and they will never be directly compared.

So you must override hashCode() so the objects end up in the right piles, and equals() so they are compared correctly. OK?
 
when your children are suffering from your punishment, tell your them it will help them write good poetry when they are older. Like this tiny ad:
Gift giving made easy with the permaculture playing cards
https://coderanch.com/t/777758/Gift-giving-easy-permaculture-playing
reply
    Bookmark Topic Watch Topic
  • New Topic