
# Check List for duplicate entries

Qualtar Demix
Greenhorn
Posts: 15
To check if a List has duplicate entries, I convert it to a HashSet and compare the sizes for any mismatch. Do you guys have a better approach?

Winston Gutkowski
Bartender
Posts: 10527
64
Qualtar Demix wrote:To check if a List has duplicate entries, I convert it to a HashSet and compare the sizes for any mismatch. Do you guys have a better approach?

Well, another one would be to sort it and then go through it to see if there are any duplicates. If there are, they will now be adjacent to each other.

For large lists it will probably take longer, but it won't use any extra space (unless you have to keep the original List).
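A minimal sketch of that sort-and-scan idea (the method name is illustrative; this version sorts a copy to keep the original List intact, whereas sorting in place is what avoids the extra space Winston mentions):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedDuplicateCheck {
    // Sort a copy of the list, then check whether any two
    // adjacent elements are equal.
    public static <T extends Comparable<T>> boolean hasDuplicates(List<T> list) {
        List<T> sorted = new ArrayList<>(list);
        Collections.sort(sorted);
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(List.of(3, 1, 2)));    // false
        System.out.println(hasDuplicates(List.of(2, 1, 2)));    // true
    }
}
```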

Winston

Stephan van Hulst
Bartender
Posts: 6469
83
I actually always use Qualtar's approach, because it has a very good time-complexity, and because I often work with Collections rather than Lists as input.

A solution that is slightly more efficient would be the following:
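Something along these lines, for example (the `hasDuplicates` name is illustrative): instead of building the whole Set and comparing sizes afterwards, add elements one at a time and return as soon as `add()` reports a duplicate. Taking a Collection rather than a List also matches the inputs described above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {
    // Returns true as soon as the first duplicate is seen,
    // rather than building the whole Set and comparing sizes.
    public static <T> boolean hasDuplicates(Collection<T> c) {
        Set<T> seen = new HashSet<>();
        for (T element : c) {
            if (!seen.add(element)) {   // add() returns false for duplicates
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicates(List.of(1, 2, 3)));    // false
        System.out.println(hasDuplicates(List.of(1, 2, 2)));    // true
    }
}
```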

Winston Gutkowski
Bartender
Posts: 10527
64
Stephan van Hulst wrote:I actually always use Qualtar's approach, because it has a very good time-complexity...

Quite agree. I just thought I'd offer an alternative.

Your way is pretty much the way I'd do it, except that if you're really interested in speed, it would probably be better to do:
Set<Object> set = new HashSet<>(c.size());

Winston

Qualtar Demix
Greenhorn
Posts: 15
Hmm. So, looking at all the approaches, it would be safe to say that no matter what (or how, for that matter), one has to convert it to a Set one way or the other. Hence, the worst-case time complexity will always be O(n). Guess I have to settle for O(n).

Winston Gutkowski
Bartender
Posts: 10527
64
Qualtar Demix wrote:Hmm. So, looking at all the approaches, it would be safe to say that no matter what (or how, for that matter), one has to convert it to a Set one way or the other.

Not if you use the method I suggested above.

Hence, the worst-case time complexity will always be O(n). Guess I have to settle for O(n).

True. Mind you, that only gives you an indication of scalability. The sorting approach probably works out at O(n log(n)), but for small lists (< 20 items?) it may actually be quicker than forcing elements into a Set. However, the only way to work that out would be to benchmark it.

It should also be added that it will only be O(n) if
(a) the Set used is a hashed Set, and
(b) your elements have good hashes.
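Point (b) can be illustrated with a deliberately bad hash (a contrived class, for illustration only): when every element lands in the same bucket, each `add()` has to compare against the bucket's existing entries instead of being a near-constant-time operation, and the overall check is no longer O(n).

```java
import java.util.HashSet;
import java.util.Set;

public class BadHashDemo {
    // Every instance hashes to the same value, so all elements
    // pile into one overflowing HashSet bucket.
    static final class BadKey {
        final int value;
        BadKey(int value) { this.value = value; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).value == value;
        }
        @Override public int hashCode() { return 42; } // constant hash: worst case
    }

    public static void main(String[] args) {
        Set<BadKey> set = new HashSet<>();
        for (int i = 0; i < 1_000; i++) {
            // Each add must compare within the single crowded bucket,
            // instead of the near-O(1) behaviour a good hash gives.
            set.add(new BadKey(i));
        }
        System.out.println(set.size());    // 1000 (all distinct by equals)
    }
}
```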

Winston

Stephan van Hulst
Bartender
Posts: 6469
83
Winston Gutkowski wrote:Quite agree. I just thought I'd offer an alternative.

Your way is pretty much the way I'd do it, except that if you're really interested in speed, it would probably be better to do:
Set<Object> set = new HashSet<>(c.size());

Winston

Yeah, I like the sorted-list approach if memory is constrained. And I always forget to pass the size to the set's constructor. For large collections it should make a big difference.