• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • paul wheaton
  • Jeanne Boyarsky
  • Ron McLeod
Sheriffs:
  • Paul Clapham
  • Devaka Cooray
Saloon Keepers:
  • Tim Holloway
  • Carey Brown
  • Piet Souris
Bartenders:

What have I done that causes HashSet to permit the addition of a duplicate to itself?

 
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I created the following static inner class:



When I add a coordinate to a HashSet already containing the same coordinate, the HashSet adds it anyway. I debugged and found that the equals method of Coordinate is never actually used. In other words, equality of two coordinates is being assessed by some other means. Is this assessment fair? I have reason to say "no", because the @Overrides annotation isn't flagged by my IDE (implying that Coordinate's equals method does actually override the one in Object).

More generally, can the community help me identify why HashSet adds an existing coordinate?

EDIT: I read a post, which oferred that


The equals() documentations states (emphasis added):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.



I factored in the comment given here by deleting my overriden equals method and falling back onto the equals() method supplied by Object. By using the original method, I reasoned, I could avoid having to redefine hashCode().

Even with this change, my HashSet still adds a duplicate to itself.
 
Sheriff
Posts: 28382
99
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Show us the code which demonstrates what you mean by "a duplicate", then.

However you'd be better off just taking the advice from that other thread, and implementing a hashCode() method which returns equal values for objects which are equal according to your extremely sensible equals() method.
 
Ranch Hand
Posts: 417
Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If you try to add an element a second time, the HashSet.add() method does nothing and returns false instead of true.


public boolean add(E e)

Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.

Specified by:
add in interface Collection<E>
Specified by:
add in interface Set<E>
Overrides:
add in class AbstractCollection<E>
Parameters:
e - element to be added to this set
Returns:
true if this set did not already contain the specified element



As the previous poster as mentioned, you need to define the hashcode method as well.

You may want to have a look at this thread where the matter was discussed:
https://coderanch.com/t/654653/java/java/remove-specific-object-ArrayList

 
Aahan Agrawal
Greenhorn
Posts: 20
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
This is the code that results in the addition of a duplicate:





Here, again, is my class for reference:



______________

Also, how should I construct a hashCode() method such that every unique coordinate returns a unique integer? Is there some mathematical function that, using both x and y, could return a distinct output for every distinct combination of x and y?

Some function of the form x^n + y^m comes to mind, but I'm not sure how to determine if this would map a unique int to every x and y?
 
Bartender
Posts: 689
17
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Aahan Agrawal wrote:

Also, how should I construct a hashCode() method such that every unique coordinate returns a unique integer? Is there some mathematical function that, using both x and y, could return a distinct output for every distinct combination of x and y?

Some function of the form x^n + y^m comes to mind, but I'm not sure how to determine if this would map a unique int to every x and y?



The goal of a hashcode is not to assign a unique value to every possible object. A perfect hashcode function would certainly do that, but the hashcode value is finite in size so can only have a finite number of values. In the general case an object can often have an infinite number of values. Take for example a String. In concept at least a string could be infinite in length. Implementation details will impose an arbitrary size (the maximum amount of memory available to store the string in, or restrictions on the maximum size of array available if it is backed by a character array), but in theory at least there is no such thing as the longest possible string. So there are inevitably going to be cases where two none-equal objects have to have the same hash code.

And if we look at your example specifically, you need to create a hash code based on two ints. In Java an int is 32-bits long, so it has 2^32 possible values. Therefore your coordinate system has 2^64 possible values. Now a hashcode is also an int, so it can only have 2^32 values. Each coordinate is going to have to share its hashcode with on average 2^32 other coordinates.

And this is backed up by the contract of equals() and hashcode() in object. Two equal objects must have the same hashcode, but there is no stipulation that two non-equal objects must have different hashcodes.

As to how to write a good hashcode function, a search in a search engine would explain it better than I ever could. However from my limited knowledge, something equivalent to x*31 + y should work. The key concept is introducing the prime number, specifically Mersene primes I believe.
 
Bartender
Posts: 2237
63
IntelliJ IDE Firefox Browser Spring Java Linux
  • Likes 2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Also note that your equals() method violates the contract of Object.equals().Line 3 throws java.lang.NullPointerException while it should print false.
Line 4 throws java.lang.ClassCastException while it should print false.
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paweł Baczyński wrote:Also note that your equals() method violates the contract of Object.equals()...


Which is very important to read and understand - as is the one for Object.hashCode().

And I should warn you that neither of them are simple. Indeed, this excellent article warns us that:
    "The authors of a 2007 paper concluded that almost all implementations of equals() are faulty."
And that includes ones written by us supposed "experts". :-)

My suggestion: Before you go any further, read those contracts thoroughly, and come back if you have any questions about them.

Also: Look up the instanceof operator.

It's also worth noting that if you override equals(), you should also override hashCode(). It's not an absolute rule, but it's true so often that it's a good habit to get into.

HIH

Winston
 
Marshal
Posts: 80447
451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Actually the general contract for Object#hashCode says it must return the same result from two objects returning true from equals, so maybe it is an absolute rule after all.
The hash code method is necessary for the correct behaviour of hash tables as used by HashSet, so if there is any incorrectness in equals or hash code methods the hash set, hash map or other type of collection class will not work correctly.

As well as the Odersky Spoon and Venners article which Winston has given you a link to, please read the appropriate part of Effective Java™ by Joshua Bloch (referenecs 1 and 5 of Odersky Spoon and Venners) and Angelika Langer's work: 1 2 (I think I have managed to find her work in English ‍). Although the three authors cover similar territory, there are features unique to each so it is worth reading all three.
 
Campbell Ritchie
Marshal
Posts: 80447
451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote: . . . this excellent article[/url] warns us that:
    "The authors of a 2007 paper concluded that almost all implementations of equals() are faulty."
. . .

Unfortunately they do not give enough information to find that paper, so one cannot verify that assertion at the source. Angelika Langer is a bit more optimistic about equals() methods and lists some correct and incorrect implementations; you may need to follow that link to next section several times to find that piece.
 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

I had at first been storing instances of a custom defined "Point" class in my HashSet, and having the issue that if I stored (1,2) for example, and then tried to use "myHashSet.contains(myPoint)" where myPoint represents (1,2), I was getting false when I expected true.

I was getting ready to fix this by attempting to override the equals and the hashCode methods but all that talk about contracts made me finally realize that the best way was to avoid using contains entirely.  Instead, I defined my own method that I call containsPoint.  Here's the definition:


What is your reaction to this solution compared to one where hashCode and equals are overridden?

 
Saloon Keeper
Posts: 11025
88
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Overriding hashCode() and equals() is not difficult but what you have should work, the main consideration is how big the set is and what your performance expectations are. If your set is quite large the effort into hashCode() and equals() might be worth  it in the long run but if your set  is small I wouldn't worry about it.

Please share your Point source as it may generate further suggestions.
 
Carey Brown
Saloon Keeper
Posts: 11025
88
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
 
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Carey, when overriding equals() you should use instanceof and also make the method final.

This ensures that the method works as expected across all subclasses.

Here is my preferred solution:

Or if you don't mind the downsides of using a record:
 
Stephan van Hulst
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Welcome to CodeRanch Eric.

Eric Osman wrote:What is your reaction to this solution compared to one where hashCode and equals are overridden?


I don't like it. Overriding equals() and hashCode() is a really tiny effort, and it makes your class consistent in its use for all clients, even if they don't care about hash sets.
 
Campbell Ritchie
Marshal
Posts: 80447
451
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
EO: (again) Welcome to the Ranch

Stephan van Hulst wrote:Carey . . . make the method final.

Since Carey hasn't suppied any means to alter x and y once the constructor has completed, why not make the class immutable by marking those two fields and the whole class final? That gets you out of having to mark any methods final, since there can't be any subclasses. It also obviates a potential problem with HashSets, HashMaps, etc.; if you enter mutable objects as Es or Ks and those objects change state so as to change their hashcodes, it will become impossible to locate those objects again. I believe that problem will NOT occur if you use mutable objects as Vs in a hash table or map.
Agree about using instanceof in a non‑final class; getClass() would mean that subtypes would fail to show up as equal. In the case of a final class, the behaviour of instanceof and getClass() will not differ, but instanceof has the advantage that it makes a cast redundant, as Stephan showed. I have forgotten what that feature is called, and also when it was introduced; I am 99% sure, however, it wan't available when this thread started.

. . . Or if you don't mind the downsides of using a record . . .

Please explain more; that situation is exactly the sort of thing records are good for. Again, records weren't available eight years ago.
I think you missed out a part of the equals() method: a test for whether you are comparing an object with itself. The general contract for equals() includes “reflexitivity,” which means every object is equal to itself. That is the only thing Object.equals() tests, so an un‑overridden equals() method will fail to find two objects with the same content, which confused the author of the very first post eight years ago. Many people recommend an equality test in equals(); it is one of the few places where it is a good idea the == operator. You can avoid multiple return, though most people don't use “structured” programming nowadays:-Remember, instanceof also makes a test for nulls unnecessary.
 
Campbell Ritchie
Marshal
Posts: 80447
451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Stephan van Hulst wrote:. . . Overriding equals() and hashCode() is a really tiny effort . . .

Somebody's on their way to help already, the cheque's in the post, and overriding equals() is easy. This thread lists several places where you can read how difficult it can be to override equals(). IDEs can help, but the boilerplate code they produce will be no good for people who need to learn how objects work. The example you showed is a correct implementation of equals(), which can be reused mutatandis mutatis for any class, using Objects#equals() to prevent any problems where a field points to null.
I think I may have found the reference Odersky Spoon and Venners used, their No 2, Vaziri, Tip, Fink, and Dolby, who actually say,

We found buggy or fragile equals() methods in nearly every Java application that we examined.

Not quite what Odersky Spoon and Venners said:-

. . . after studying a large body of Java code, the authors of a 2007 paper concluded that almost all implementations of equals methods are faulty.

In one case, Vaziri, Tip, Fink, and Dolby mention a hashcode() method using the wrong alphabet of variables. In order to write hashcode() correctly, the alphabet of variables it uses must be a subset of the alphabet of variables used by equals(). It is obviously easiest to use exactly the same variables in both places, remembering that does count as a subset. Objects#hash() wasn't available in 2007, but you showed how much easier that makes it to write a hashcode() method.
 
Stephan van Hulst
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I always found the obj == this comparison a pointless optimization. Why make your code more verbose for a miniscule performance boost that doesn't matter in 98% of cases?

The reflexivity clause is fulfilled quite neatly by comparing each field to its counterpart in the other reference.

Campbell Ritchie wrote:Please explain more; that situation is exactly the sort of thing records are good for.


Yes, records are perfect for simple data containers or key types. The type discussed in this topic is a good example.

What I really dislike about record in Java is that for some reason the canonical constructor is forced to be at least as visible as the type itself, which means I can't use record in situations where I want to prevent the client from calling the constructor directly (such as when I have provided a factory method). This severely hampers an otherwise very welcome feature.
 
Campbell Ritchie
Marshal
Posts: 80447
451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Good point about optimisations.
Agree about canonical constructors; that does make records hard to use in factory methods and you can't write a constructor with restrictive access. You have to write your own constructor and getter methods if you want to include a mutable type as a field of a record, too.
 
Stephan van Hulst
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Somebody's on their way to help already, the cheque's in the post, and overriding equals() is easy.


Overriding equals() is not without its pitfalls, and its not helpful that the internet is rife with outdated and confusing examples. Even experts like Josh Bloch have given out (good) examples that (nevertheless) focus on inconsequential little warts such as the obj == this optimization, which really shouldn't be a priority when teaching how to write a good implementation.

If you are aware of the pitfalls, overriding equals() and hashCode() really IS a tiny effort, and even if you aren't, it's not a good excuse for writing a method that performs a linear search through a hash table.
 
Stephan van Hulst
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:You have to write your own constructor and getter methods if you want to include a mutable type as a field of a record, too.


It depends on whether immutable wrapper types exist, such as when using a Collection as a record field:

So yes, you'd have to override the canonical constructor, but you don't necessarily need to supply custom getters.

I posit that if you need to return mutable types from your getters (other than collection types), your design is off in the first place, and the point of supplying custom getters is moot anyway.
 
Campbell Ritchie
Marshal
Posts: 80447
451
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Stephan van Hulst wrote:. . . if you need to return mutable types from your getters (other than collection types), your design is off in the first place . . .

 
Carey Brown
Saloon Keeper
Posts: 11025
88
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Great feedback. I think this sums up the comments unless I missed something.
reply
    Bookmark Topic Watch Topic
  • New Topic