
Overridden equals, compareTo, and hashCode not working as expected

 
Joey Dale
Greenhorn
Posts: 4
I have a rather simple program that takes the path to a text file, parses the file into words, and inserts each word into a TreeSet. The parsing is going great. It's creating the Word objects and inserting them that I seem to be having an issue with.

This is my first attempt to override equals, compareTo, and hashCode, as well as my first time using a TreeSet. When I run this through a debugger, it gets hung up on the if/then on lines 30-35.

This code will compile and run, but some methods will blow up because this is a rewrite in progress. The original version used H2 and JDBC, and made me feel dirty inside.

Please point out my major malfunction here.

Parse.java


Word.java


Thank you for your help
-Joey
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Can you describe more about how it isn't working? What does not happen correctly? What does it do, what does it not do, how does it differ from what you expect?
 
Steve Luke
Bartender
Posts: 4181
I see that your compareTo method uses the word count to maintain order. If you read the API for TreeSet, you should understand that this is not a good idea:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

So when you put any Word into the Set for the first time, its count will be either zero or one (I haven't looked into which). Either way, compareTo will report it as equal to any Word already in the Set that has the same count. A Set does not allow duplicates, and TreeSet.add leaves the Set unchanged when an equal element is already present, so the new Word would be silently rejected instead of stored. This is not what you want.
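To make that concrete, here is a minimal sketch. The original Word.java is not shown in this thread, so CountOrderedWord below is a hypothetical stand-in that assumes each word starts with a count of 1 and that compareTo looks only at the count.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the poster's Word class (the real one is not shown).
class CountOrderedWord implements Comparable<CountOrderedWord> {
    final String text;
    int count = 1; // assumption: a freshly created word starts at 1

    CountOrderedWord(String text) {
        this.text = text;
    }

    // Orders by count only, which is the problem being illustrated.
    @Override
    public int compareTo(CountOrderedWord other) {
        return Integer.compare(count, other.count);
    }
}

class CountOrderingDemo {
    // Adds every word to a TreeSet and reports how many were actually stored.
    static int distinctWordsStored(String... words) {
        Set<CountOrderedWord> set = new TreeSet<>();
        for (String w : words) {
            // add() returns false when compareTo reports 0 against a stored element
            set.add(new CountOrderedWord(w));
        }
        return set.size();
    }

    public static void main(String[] args) {
        // All three words have count 1, so the TreeSet treats them as one element.
        System.out.println(distinctWordsStored("apple", "banana", "cherry")); // prints 1
    }
}
```

Three different words go in, but only the first is kept, because the TreeSet never consults equals or hashCode.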

Also, since order is determined when the Word goes into the Set, but the count can change over the course of the experiment, the order will not be maintained as the count changes. It may become very hard to find your Words later and iterating over them would not necessarily iterate in the correct way.
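A small sketch of how a changing count can hide an element. MutableCount and the specific counts are made up for illustration; the point is only that the TreeSet does not re-sort when a field used by the comparator changes.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical word type whose count can change after insertion.
class MutableCount {
    final String text;
    int count;

    MutableCount(String text, int count) {
        this.text = text;
        this.count = count;
    }
}

class StaleOrderDemo {
    // Inserts three words, mutates one, then asks the set whether it still holds it.
    static boolean foundAfterMutation() {
        Comparator<MutableCount> byCount = Comparator.comparingInt(w -> w.count);
        TreeSet<MutableCount> set = new TreeSet<>(byCount);

        MutableCount apple = new MutableCount("apple", 1);
        set.add(apple);
        set.add(new MutableCount("banana", 2));
        set.add(new MutableCount("cherry", 3));

        // The element stays where it was placed; the tree is not reorganized.
        apple.count = 10;

        // The lookup now searches the "large count" side of the tree and misses it.
        return set.contains(apple);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // prints false
    }
}
```

The object is still physically in the Set, but the lookup follows the new count down the wrong branch and never reaches it.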

Finally, you do not re-use the Words that are already in the Set. You create a new Word each time, and if it already exists you increment the counter on that new instance, so you never increment the Word that is actually stored. If the Word is already in the Set, you should get it out of the Set and increment the already-stored value, not the counter on the new instance (which is really a different Object).
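One way to sketch "increment the stored instance" is shown below. Note that this swaps in a HashMap keyed by the word text, because a plain Set has no method for fetching the stored element; that choice, and the CountedWord and WordCounter names, are assumptions for illustration and not the code from the original post.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical word type; the count starts at 0 and is raised once per occurrence.
class CountedWord {
    final String text;
    int count;

    CountedWord(String text) {
        this.text = text;
    }

    void increment() {
        count++;
    }
}

class WordCounter {
    // Counts occurrences by always incrementing the instance held in the map.
    static Map<String, CountedWord> count(String... tokens) {
        Map<String, CountedWord> words = new HashMap<>();
        for (String token : tokens) {
            // Creates the CountedWord only on first sight, otherwise returns the stored one.
            words.computeIfAbsent(token, CountedWord::new).increment();
        }
        return words;
    }

    public static void main(String[] args) {
        Map<String, CountedWord> words = count("the", "cat", "the");
        System.out.println(words.get("the").count); // prints 2
        System.out.println(words.get("cat").count); // prints 1
    }
}
```

Because the same object is looked up and incremented every time, no throwaway instance ends up holding the count.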

My suggestion would be to store all the words in a HashSet<Word>. Then, after you have all the words and their counts, copy them into a List (new ArrayList<Word>(words)) and pass that List to Collections.sort(...) to get the words in sorted order according to count. Or you could use the other version of Collections.sort(...), the one that takes a Comparator, and make different Comparator<Word> instances to sort by count or alphabetical order.
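A rough sketch of that suggestion, under a few assumptions: TextWord is a hypothetical replacement for Word whose equals and hashCode depend only on the text, the counts are hard-coded instead of parsed from a file, and ties in count are broken alphabetically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical word type: identity is the text, the count is just data.
class TextWord {
    final String text;
    final int count;

    TextWord(String text, int count) {
        this.text = text;
        this.count = count;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof TextWord && text.equals(((TextWord) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }
}

class SortDemo {
    // Builds a small set, then returns the word texts ordered by descending count.
    static List<String> textsByCountDescending() {
        Set<TextWord> words = new HashSet<>();
        words.add(new TextWord("cat", 1));
        words.add(new TextWord("the", 3));
        words.add(new TextWord("dog", 3));

        Comparator<TextWord> byCount = Comparator.comparingInt(w -> w.count);
        Comparator<TextWord> byText = Comparator.comparing(w -> w.text);

        // Sort a copy; the HashSet itself has no order.
        List<TextWord> sorted = new ArrayList<>(words);
        Collections.sort(sorted, byCount.reversed().thenComparing(byText));

        List<String> texts = new ArrayList<>();
        for (TextWord w : sorted) {
            texts.add(w.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(textsByCountDescending()); // prints [dog, the, cat]
    }
}
```

Passing only byText to Collections.sort would give the alphabetical ordering instead, without touching the Set or the TextWord class.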
 