Removal of duplicates programmatically.

 
Pankaj Shet
Ranch Hand
Posts: 320
Java Scala Spring
Hi Ranchers,

I have a ResultSet containing 100 records from a database, of which 80 are unique and 20 are duplicates.
I want to display only the 80 unique records. Which collection is suitable?

Regards,
-Pankaj

 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16060
88
Android IntelliJ IDE Java Scala Spring
Set

But really, it would be much better to change your SQL statement and let the database find out the exact records that you need.
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
A lot more needs to be known about the application and the data to choose the correct collection. So rather than just guess at which one is most appropriate, I will point you to the documentation so that you, knowing the parameters of your application and data, can make an educated decision.

Check out the API for Collection: http://docs.oracle.com/javase/7/docs/api/java/util/Collection.html. It has a list of known implementors; you can follow the links and read the class documentation to figure out which one best suits your needs. Alternatively, you could read this: http://docs.oracle.com/javase/tutorial/collections/ to get a high-level description of each type of collection, which may help you find the best one.
 
Pankaj Shet
Ranch Hand
Posts: 320
Java Scala Spring
Thanks for the reply.
How is a Set going to identify duplicates from the ResultSet?
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
It depends on the Set implementation. In general, Sets do not allow two equal Objects in their collection, so if you build each row of your ResultSet into an Object and put them in the Set, the Set will keep only one copy of each equal row. But each Set implementation has its own definition of what defines equality, and you will have to build a Class that properly defines equality in that context (for example, if you use a HashSet, you need to properly define hashCode() and equals() to compare using the meaningful data.)
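As a minimal sketch of the approach described above: each row is turned into an object whose equals() and hashCode() compare the meaningful data, and the objects are put into a Set. The class name CustomerRow and its fields id and name are hypothetical, invented purely for illustration; the real class would carry whichever columns make two rows "the same" in your application. LinkedHashSet is used here only so that the surviving rows keep their original order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class standing in for one row of the ResultSet.
// The fields (id, name) are assumptions made for this example.
final class CustomerRow {
    private final int id;
    private final String name;

    CustomerRow(int id, String name) {
        this.id = id;
        this.name = name;
    }

    // Two rows are equal when all of their meaningful fields are equal.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CustomerRow)) return false;
        CustomerRow that = (CustomerRow) other;
        return id == that.id && Objects.equals(name, that.name);
    }

    // Must be consistent with equals(): equal rows produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(id, name);
    }

    @Override
    public String toString() {
        return id + ":" + name;
    }
}

class DedupRowsDemo {
    // Copies the rows into a LinkedHashSet, which drops equal rows
    // and keeps the remaining ones in first-seen order.
    static Set<CustomerRow> unique(List<CustomerRow> rows) {
        return new LinkedHashSet<CustomerRow>(rows);
    }

    public static void main(String[] args) {
        List<CustomerRow> rows = Arrays.asList(
                new CustomerRow(1, "Ann"),
                new CustomerRow(2, "Bob"),
                new CustomerRow(1, "Ann")); // duplicate of the first row
        System.out.println(unique(rows)); // prints [1:Ann, 2:Bob]
    }
}
```

In a real application the list would be filled while iterating over the ResultSet rather than hard-coded.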
 
Pankaj Shet
Ranch Hand
Posts: 320
Java Scala Spring
Thanks Steve.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
Steve Luke wrote:It depends on the Set implementation.

Actually, it doesn't - or it shouldn't - because Sets are not supposed to allow duplicates at all.

Specifically, the java.util.Set docs clearly define it as:

A Collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.

@Pankaj: which means that "duplicate" depends solely on how your objects implement equals() (and possibly hashCode() - also detailed in the docs), not on how the Set itself is implemented.

HIH

Winston
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Winston Gutkowski wrote:
Steve Luke wrote:It depends on the Set implementation.

Actually, it doesn't - or it shouldn't - because Sets are not supposed to allow duplicates at all.

Specifically, the java.util.Set docs clearly define it as:

A Collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element.


And yet TreeSet uses compareTo() or the Comparator's compare() methods, not its equals method. It is up to the user to write a compareTo() that is consistent with equals().
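A small sketch of that difference, using a Comparator that is deliberately inconsistent with equals(). String.CASE_INSENSITIVE_ORDER is a standard JDK comparator; the class name ComparatorSetDemo and the sample strings are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparatorSetDemo {
    // HashSet decides "duplicate" with hashCode() and equals().
    static int hashSetSize(List<String> values) {
        return new HashSet<String>(values).size();
    }

    // This TreeSet decides "duplicate" with the supplied Comparator:
    // two strings that compare as 0 count as the same element.
    static int treeSetSize(List<String> values) {
        Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(values);
        return set.size();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("java", "JAVA", "scala");
        System.out.println(hashSetSize(values)); // prints 3
        System.out.println(treeSetSize(values)); // prints 2
    }
}
```

"java" and "JAVA" are not equal according to equals(), so the HashSet keeps both, while the case-insensitive TreeSet treats them as one element.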
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
Steve Luke wrote:And yet TreeSet uses compareTo() or the Comparator's compare() methods, not its equals method...

Because it is a SortedSet, whose docs clearly state:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. [...] This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
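To illustrate the quoted passage with a JDK class whose natural ordering is inconsistent with equals(): BigDecimal.equals() takes the scale into account, while BigDecimal.compareTo() does not. The class name BigDecimalSetDemo and the helper countIn are only scaffolding for this sketch.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BigDecimalSetDemo {
    // Adds 1.0 and 1.00 to the given (empty) set and reports how many elements it kept.
    static int countIn(Set<BigDecimal> set) {
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    public static void main(String[] args) {
        // equals() is false for 1.0 vs 1.00 (different scale), so HashSet keeps both.
        System.out.println(countIn(new HashSet<BigDecimal>())); // prints 2
        // compareTo() returns 0 for 1.0 vs 1.00, so TreeSet keeps only one.
        System.out.println(countIn(new TreeSet<BigDecimal>())); // prints 1
    }
}
```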

It occurs to me that the designers could also have added a Hashed marker interface, in the style of RandomAccess, to indicate how the Java "Hash..." classes differ from their bases in general terms; but some might see that as overkill.

Winston
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
Winston Gutkowski wrote:
Steve Luke wrote:And yet TreeSet uses compareTo() or the Comparator's compare() methods, not its equals method...

Because it is a SortedSet, whose docs clearly state:

Correct, but a SortedSet Is A Set (inherits from). So saying a Set depends solely on equals() ignores those Sets which don't.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
Steve Luke wrote:Correct, but a SortedSet Is A Set (inherits from). So saying a Set depends solely on equals() ignores those Sets which don't.

Erm, that doesn't sound right to me. A SortedSet is a specialization of a Set (as indeed is a HashSet), so it stands to reason that it might impose additional rules. Indeed, the docs snippet I supplied clearly states that you can create a SortedSet that violates the general contract for a Set.

I suppose the designers could have created a completely different hierarchy for SortedSets, but perhaps the lesson is that you should read all the documentation involved in an implementation.

Winston
 
Steve Luke
Bartender
Posts: 4181
22
IntelliJ IDE Java Python
I guess we are drifting a bit off topic. What I disagree with is this statement:
Winston Gutkowski wrote:@Pankaj: which means that "duplicate" depends solely on how your objects implement equals() (and possibly hashCode() - also detailed in the docs), not on how the Set itself is implemented.

As described in the JavaDoc you showed, SortedSet does not use the equals() method; it uses compareTo(). So it is an instance of a (group of) Set(s) which does not depend solely on how equals() is implemented. It is dependent on how compareTo() is implemented. The contract says that compareTo() should be consistent with equals(), but this is a matter of the compareTo() implementation, not the equals() implementation (or perhaps the combination of both).

So the way I see it, what 'duplicate' means depends on the type of the Set. For direct implementations of Set, then yes, equals() is the only thing that defines a duplicate. In a SortedSet, on the other hand, a duplicate is defined by the compareTo() method. The compareTo() should be consistent with equals(), but if equals() was not implemented correctly, a SortedSet could still find and eliminate duplicates. Therefore SortedSets do not depend on equals() for their definition of duplicates. If equals() was implemented correctly but compareTo() was not (and no Comparator was provided), then duplicates would not be eliminated. Therefore SortedSets depend on compareTo() (or a Comparator) instead of equals() for their definition of duplicates. In either of these cases you could say that by not making compareTo() and equals() consistent you are making the SortedSet not a proper implementation of Set, and we can accept that to be true; but even when they are consistent, the SortedSet depends on compareTo() and not equals().
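The first of those cases can be sketched as follows, assuming a hypothetical class Version that implements compareTo() but leaves equals() and hashCode() at their Object defaults. All names here are invented for the illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical element type: compareTo() compares the number,
// but equals() and hashCode() are deliberately NOT overridden,
// so two separately created instances are never equal to each other.
final class Version implements Comparable<Version> {
    private final int number;

    Version(int number) {
        this.number = number;
    }

    @Override
    public int compareTo(Version other) {
        return Integer.compare(number, other.number);
    }
}

class CompareToOnlyDemo {
    // Adds two distinct Version objects with the same number to the given (empty) set.
    static int sizeAfterAddingSameNumberTwice(Set<Version> set) {
        set.add(new Version(7));
        set.add(new Version(7));
        return set.size();
    }

    public static void main(String[] args) {
        // TreeSet consults compareTo(), which returns 0, so the second add is rejected.
        System.out.println(sizeAfterAddingSameNumberTwice(new TreeSet<Version>())); // prints 1
        // HashSet consults equals(), which is identity-based here, so both are kept.
        System.out.println(sizeAfterAddingSameNumberTwice(new HashSet<Version>())); // prints 2
    }
}
```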

So even if we accept that the SortedSet imposes an additional rule as a specialization (rather than a substitutional rule), it still indicates that the original statement is not complete - some implementations of Set depend on more than solely the equals() method. So I think my original statement "It depends on the Set implementation" is correct.
 
Winston Gutkowski
Bartender
Posts: 10575
66
Eclipse IDE Hibernate Ubuntu
Steve Luke wrote:So I think my original statement "It depends on the Set implementation" is correct.

OK, well I guess we'll have to agree to disagree on this one, because my assertion is still that it has nothing to do with the Set itself and everything to do with how its elements implement equals() (and/or compareTo(), and/or hashCode()) - although I suppose you could say that a TreeSet that takes a Comparator is an exception even to that. The fact is that the Set (or subtype) defines the requirements; it doesn't implement the action.

Fun discussion though.

Winston
 