• Post Reply Bookmark Topic Watch Topic
  • New Topic

Different Possible Ways to Remove Duplicates From ArrayList  RSS feed

 
Skott Elliott
Greenhorn
Posts: 13
1
Chrome IntelliJ IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi all,

I'm at an internship and I've been here for 3 months.  I started with zero Java skills.  Things are going along and I'm able to find most things via Google.  However, this particular task I was given is starting to get the best of me.  The task I have:

1) You have an ArrayList that contains duplicate string values.  Think of possible ways to remove the duplicates and mark the advantages and disadvantages of each approach.

Obviously, we can use the HashSet class, which I already have written a bit of code to demonstrate.  (As well as the advantages/disadvantages.)  Now, what the heck else could I use to accomplish this task?

I'd be grateful for anybody who could point me to the right direction.  I know it's the JavaRanch way to not just give the answer and I'm OK with that.  But a bit of a point in the right direction would be great, be that a point to websites, chapters in a book, etc. 

Thank you!
Scott
 
S Ry
Greenhorn
Posts: 7
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Apart from using HashSet you can use nested loops to check if there is any duplicate element in the arraylist.

 
fred rosenberger
lowercase baba
Bartender
Posts: 12562
49
Chrome Java Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
at the moment, stop thinking about concrete java classes.  Just think about the problem in English (or whatever).  The problem at hand requires you do two things:

1)  FIND the elements you want to remove
2)  REMOVE items you have found (which don't necessarily have to be duplicates)

The reason you want to think about it this way is so you can re-use the pieces of code.  If you have a method that removes a collection of elements, you can hand it a list of duplicate elements, a list of elements that are too old, a list of elements that are "blue"...or whatever you want.  The point is, it doesn't matter HOW you find them, and you can change your mind about it later, and not have to touch your "remove items" code.
 
Carey Brown
Saloon Keeper
Posts: 3309
46
Eclipse IDE Firefox Browser Java MySQL Database VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sort the strings. Then any duplicate strings will be in adjacent locations.
 
Skott Elliott
Greenhorn
Posts: 13
1
Chrome IntelliJ IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Just wanted to update.  So far, I've got three total ways I've coded.

1) Using a LinkedHashMap;
2) Using a nested loop that was written for the specific ArrayList;
3) Using a method removeDuplicates() that was made more generic so it could accept any ArrayList of any type;

I've been thinking about Fred's post where I could have a method of any Collection type, but haven't figured that out yet.  So I've been able to come up with three ways thus far. 

For anybody who wants to see, the code is in the following:



Predictably, the outcome of the code:

 
Piet Souris
Master Rancher
Posts: 2041
75
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Another easy one (if you have Java 8) is to use list.stream().distinct.....

Do you need to modify your original list?

Keep in mind that for a generic solution your objects need to be comparable (or you must supply a suitable comparator).

With regard to Freds remarks: okay, but if you have a list 'toBeRemoved', then you must also think of how to
apply this: is a removeAll suitable? For instance: if you have made a separate list of non unique strings, do you want to
remove all the strings in your original list, or do you want to keep at least one, or ?
 
Ganish Patil
Ranch Hand
Posts: 529
19
Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Netbeans IDE Spring Tomcat Server
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I think modifying original list is too time consuming as It is ArrayList which takes much time for delete and insertion operation. It would be better to create new list which will have only unique values in it.
Here I tried adding only unique values from original ArrayList to new created ArrayList.

Output:
Time Taken by removeDuplicatesOne(): 187 ms
After duplicates removed: A.....Z  // prints A to Z

Time Taken by removeDuplicatesTwo(): 81242 ms
After duplicates removed:  A.....Z  // prints A to Z
 
Campbell Ritchie
Marshal
Posts: 56518
172
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
S Ry, welcome to the Ranch

Are you supposed to produce lots of different ways to remove duplicates, or find the most elegant way. If the latter, I would avoid sorting the List because that removes the information about its original order. It also is slower than some other methods. I would agree with Piet Souris about using a Stream.The stream() method occurs in all Lists; it is actually in the Collection interface, and (would you believe) it creates a Stream. Nothing happens to this Stream until you use the final Stream operation to create the List. The distinct() method depends on correct implementation of the equals() method, which Strings have already. It may require lots of memory to retain references to objects already encountered, and the documentation doesn't say whether the algorithm runs in linear complexity or not. The manual methods where one searches through the List run in quadratic complexity because each search is linear complexity. Once you have the second distinct Stream, it is easy enough to create a new List. You use the collect() method, which takes a Collector as its argument. The easiest way to get a Collector is often to go through the Collectors class and see what you can find; there is a method called toList() which returns a Collector that does exactly what you want.

Sorry, I had to edit this post, because I clicked the submit button too soon by mistake.

If you use the technique with a Set, as you have already suggested, both those stages will run in linear time. You can reduce the two lines to one if you don't need the Set again:-
 
Skott Elliott
Greenhorn
Posts: 13
1
Chrome IntelliJ IDE Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell Ritchie wrote:Are you supposed to produce lots of different ways to remove duplicates, or find the most elegant way.


Coming into the office today and showing what I had, now I'm tasked with getting the most effective/efficient way of doing this in an ArrayList; a way that would use the least amount of operations. What seemed kind of open-and-close yesterday has turned into a bit of a never ending black hole.   They mentioned to pay attention to:

  • compareTo()
  • .remove
  • for each
  • for


  • I'm not sure if this is a clue of things to experiment with, or if it's how I should track which is most efficient.  Anyway, off to get my litre of water and cup of tea and have a think.  Thanks for all the help and replies from everybody else, it's been great food for thought.  I've taught myself music, guitar, drums, orchestral scoring and computers, but damn if programming hasn't given me the biggest fit.
     
    Ganish Patil
    Ranch Hand
    Posts: 529
    19
    Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Netbeans IDE Spring Tomcat Server
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Skott Elliott wrote:now I'm tasked with getting the most effective/efficient way of doing this in an ArrayList;
    In this case, I think Campbell Ritchie gave you a good solution.
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Skott Elliott wrote:. . . a never ending black hole. . . . I'm not sure if this is a clue of things to experiment with, or if it's how I should track which is most efficient.
    It isn't a black hole at all, but an opportunity to rise to the challenge. You must acknowledge all suggestions given here. How you do that depends on how formal they want you to be about the problem. You can now create yourself a List with millions of words in. Download a text‑only version of the Bible, War and Peace, Complete Works of Shakespeare or similar from Project Gutenberg and read every word into your List. Go through all the algorithms and work out which run in linear time, which quadratic and which nlogn complexity. Write down your reasoning for all those. Find out how to time the execution of your algorithms, and also how to trick the JIT compiler into getting to full speed before you start your timings. Find out how many of those techniques can be parallelised to take advantage of multi‑core processors. Find out whether the fastest technique is different for a 1000000‑word collection and a 20‑word collection.

    Now you will be busy for the next four hours doing all that and writing it down. If what you write is any good, post it here for us to admire.
    . . . litre of water and cup of tea and have a think.
    You will need more than tea and water
    Thanks for all the help and replies . . .
    That's a pleasure
    Learning the language is easy, but programming is renowned for being a difficult activity.
     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Campbell Ritchie wrote:
    Skott Elliott wrote:. . . a never ending black hole. . . . I'm not sure if this is a clue of things to experiment with, or if it's how I should track which is most efficient.
    It isn't a black hole at all, but an opportunity to rise to the challenge. You must acknowledge all suggestions given here. How you do that depends on how formal they want you to be about the problem. You can now create yourself a List with millions of words in. Download a text‑only version of the Bible, War and Peace, Complete Works of Shakespeare or similar from Project Gutenberg and read every word into your List. Go through all the algorithms and work out which run in linear time, which quadratic and which nlogn complexity. Write down your reasoning for all those. Find out how to time the execution of your algorithms, and also how to trick the JIT compiler into getting to full speed before you start your timings. Find out how many of those techniques can be parallelised to take advantage of multi‑core processors. Find out whether the fastest technique is different for a 1000000‑word collection and a 20‑word collection.

    Now you will be busy for the next four hours doing all that and writing it down. If what you write is any good, post it here for us to admire.
    . . . litre of water and cup of tea and have a think.
    You will need more than tea and water
    Thanks for all the help and replies . . .
    That's a pleasure
    Learning the language is easy, but programming is renowned for being a difficult activity.


    I did mean the mention of 'black hole' in the nicest way possible.  In fact, it's what keeps me continuously interested.  It's similar to audio recording and production.  You can learn the theory of EQ, but every waveform you record is different and requires different treatment.  No wave form is ever the same - kind of like no boiler-plate code will cover all bases for a solution.  Thank you for the idea of reading in some text, because I already was using System.nanoTime() for these extremely small ArrayLists.  I ended up writing another utilising Iterator with .remove().  Consequently, the LinkedHashSet is the slowest, with the nested loop method being the fastest.  I'm going to implement the huge text file and run these tests again.

    Thanks again everybody in the thread that has helped out and has exercised patience with a n00b!
     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Just wanted to give a bit of an update.  I ended up using a LinkedHashSet, a method with a nested loop, and an iterator.  All went fine.  I've been asked to do the same procedure but instead of Set, to use Map.  (HashMap)  So I'm thinking these are just thinking exercises to get used to working with Collections.

    The LinkedHashMap:


    The Nested Loop Method:


    I ended up just populating an ArrayList with a for loop, using 1E5 to make 100,000 random numbers 0 to 100.  (Used new Random();)  I used new Date().getTime() to track the performance, and the Iterator beat out the LinkedHashSet by a few milliseconds in my example every time.  The nested loop wasn't even a close 3rd.

    So I'll continue to see how I can accomplish this with a HashMap.  The mentor who gave me this task, it turns out that this was coding example he was asked to complete during the interview he had.  (Aside from a billion questions asked.)  It's certainly a case of 'the more I learn, the more I realise I don't know.'  But it's a good and interesting experience.

    *EDIT*
    So I realized that the the Iterator didn't work, because when it iterated, it actually didn't find any duplicate values based on how I wrote it.  Seems I have to look into recursion.  For now, working with the HashMap, I'm about 50% of the way I think.



    So now I need to see about iterating with the map and ArrayList?  I think I understand a way to do this, just have to hammer it out.  Maybe something like a for each testing for equality of each object's value?  Just thinking out loud.
     
    Knute Snortum
    Sheriff
    Posts: 4270
    127
    Chrome Eclipse IDE Java Postgres Database VI Editor
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Big Hint: what happens if the key value is already present?
     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Knute Snortum wrote:Big Hint: what happens if the key value is already present?


    I understand that the keys cannot be duplicated - so in that essence the HashMap removes duplicates.  I'm currently trying to work out how to use the HashMap in a way to remove duplicates from the ArrayList.  I'm almost positive I need to use the Iterator .remove method in a loop.  I just haven't worked it out yet.

    *EDIT*

    Got it!

     
    Knute Snortum
    Sheriff
    Posts: 4270
    127
    Chrome Eclipse IDE Java Postgres Database VI Editor
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Nice job!
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Why do you need the remove method?
    What you can do is put all the elements of your List as Keys into the Map, and use a dummy object as a Value. You can actually use the same instance for the Value for every pair. If you go through the code for HashSet, you will find that is how it is implemented. Then you can iterate the Key set in the Map and put everything into another List. If you use a LinkedHashMap rather than an ordinary HashMap, you should get the order of insertion maintained in your new List. You may need some condition like:-
    if (!myMap.containsKey(myList.get(123))) ...
    Don't copy and paste that code.
    Go through the documentation for those different kinds of Map and see what it says, and the Java™ Tutorials.
     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Campbell Ritchie wrote:Why do you need the remove method?
    What you can do is put all the elements of your List as Keys into the Map, and use a dummy object as a Value. You can actually use the same instance for the Value for every pair. If you go through the code for HashSet, you will find that is how it is implemented. Then you can iterate the Key set in the Map and put everything into another List. If you use a LinkedHashMap rather than an ordinary HashMap, you should get the order of insertion maintained in your new List. You may need some condition like:-
    if (!myMap.containsKey(myList.get(123))) ...
    Don't copy and paste that code.
    Go through the documentation for those different kinds of Map and see what it says, and the Java™ Tutorials.


    Yessir, however what I was asked to do was to specifically use the map as a way to remove elements from the ArrayList.  That's what the mentor asked me to do.  On Friday I had simply created a new map which contained unique values by default of how it works, and that was not acceptable.   (Remember, I'm in an internship.) HOWEVER - I do appreciate all the help and tips. 
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    I would not use the Map to remove elements from the List, but not to record them for the new List. Subtle difference, but not recording might be easier.
     
    Stefan Evans
    Bartender
    Posts: 1837
    10
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Sounds to me like the orders have come from on high that you CAN'T do it by creating a new List.  You must instead cull records from the existing list.
    Yay for arbitrary decrees from on high.  Get used to them :-)

    Either approach is valid.  There are of course advantages/disadvantages.  I presume analysing those is part of the learning experience here.

    Memory usage vs speed vs amount of code required vs  ease of understanding the code - there is always a trade-off

    You will normally have to compromise on some the above somewhere along the way.
    Making something faster, might mean making it harder to understand.
    You can use a helper method to cut down on the amount of code written, at the cost of some extra memory.

    Personally I'm a believer in the KISS principle.  (Keep it simple stupid!) so normally prefer easy to read code over the most efficient code.
    As long as it is "fast enough"

     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    I hear you guys.  I see you get a lot of inane posts like this from others.  It's clear that for the internship they want me to do these kinds of exercises.  Today everything was good, but still was asked to use the Map with a for loop without using an Iterator.  I kept getting the IndexOutOfBounds exception, turns out that I just had to change my looping arguments.  Here is what I came up with:



    I've been struggling how to explain the why behind the way it works here and how it wouldn't work if we used the "normal" arguments in an for loop.  I'm going to type how I'm able to explain it, maybe somebody else can help clean up my verbose attempt.  Basically, if we do something like:



    We are going to carry on to the original end of the ArrayList, which is impossible since elements are being removed, so the list is getting shorter.  Hence IndexOutOfBounds Exception.  But by doing:


    We set int i to the last index of the Array List on each iteration, and then we decrease the iteration cycle with i--. 

    Do I properly understand this?
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Skott Elliott wrote:. . .   I see you get a lot of inane posts like this from others.  . . .
    There is nothing inane about your post; I was considering reporting it to the other mods as an interesting thread. What appears to have happened is that you have been given a rather awkward additional requirement, to remove elements from the current List without creating a new List. That is probably done to make the task more challenging and see how you handle it. They are probably only being awkward like that so as to see how good you are

    What is Timer? Is it an app to time your execution time? If so, why do those methods appear to be static?
    You appear to have used caps lock in line 1
    Why have you changed from Strings to Integers?
    How are you populating your List? That would be interesting to see. I have my own ideas about how to populate a List<Integer>.
    What do you know about for and while loops for Lists? What do you know about their performance? There are all sorts of interesting things you can find out there, and you can do that all in about 20 minutes of you look in the right place.
    Why are you iterating your loop backwards? Why have you written > 1 in the loop? Why are you catching the out of bounds Exception? Do you know how to write a for loop so you can never get an out of bounds Exception?
     
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Campbell Ritchie wrote:
    Skott Elliott wrote:. . .   I see you get a lot of inane posts like this from others.  . . .
    There is nothing inane about your post; I was considering reporting it to the other mods as an interesting thread. What appears to have happened is that you have been given a rather awkward additional requirement, to remove elements from the current List without creating a new List. That is probably done to make the task more challenging and see how you handle it. They are probably only being awkward like that so as to see how good you are

    What is Timer? Is it an app to time your execution time? If so, why do those methods appear to be static?
    You appear to have used caps lock in line 1
    Why have you changed from Strings to Integers?
    How are you populating your List? That would be interesting to see. I have my own ideas about how to populate a List<Integer>.
    What do you know about for and while loops for Lists? What do you know about their performance? There are all sorts of interesting things you can find out there, and you can do that all in about 20 minutes of you look in the right place.
    Why are you iterating your loop backwards? Why have you written > 1 in the loop? Why are you catching the out of bounds Exception? Do you know how to write a for loop so you can never get an out of bounds Exception?


    It's a learning experience.  I walked into this internship with 0 programming or computer science experience.  Obviously I've been on a PC and using Internet since... I think it was 1996 or so.  Learning to program has been a longtime ambition of mine. 

    Timer, this is just new Date(), I put it into a separate method because I was tired of copy-pasting it.  I have the method to start, end, and show the results of the new Date().
    I think you're right about the caps lock.   
    I changed to Integers for no real reason.  The truth is that I forgot that I was working with Strings, had accidental made an iteration with Integer, so I ran with it.

    Population of the List is this method here:



    The other questions you asked are precisely what I'm hunting for now, as I think part of this exercise was to learn these important bits of the Collections classes. 

    Iterating the loop backwards so that there is always an index and won't actually have an IndexOutOfBounds exception.  Catching it because before that I kept running into the exception and it was annoying me, so I put a try-catch phrase to at least have a successful bit of code to look at instead of red text.  It doesn't need to be there now that it works.  i > 1... you got me. Probably should have wrote that as i >= 0.

    For the last question, I don't know how to do that.  I'd be all ears.  Could I find that specific information in the Oracle docs? 
     
    Ganish Patil
    Ranch Hand
    Posts: 529
    19
    Chrome Eclipse IDE Hibernate Java jQuery MySQL Database Netbeans IDE Spring Tomcat Server
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Skott Elliott wrote:
    did this give correct output? I don't think so.
    IMO you better use below loop as ArrayList also uses dynamic resizable array of object internally so array always starts from 0.
    We are going to carry on to the original end of the ArrayList, which is impossible since elements are being removed, so the list is getting shorter. Hence IndexOutOfBounds Exception. But by doing:
    Then just decrement i by 1 i.e. i--; when condition returns true.
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Skott Elliott wrote:. . . Timer, this is just new Date() . . .
    Avoid Date like the plague. Look in the Java™ Tutorials for what to use instead. For timing execution times you are probably better off with System.nanoTime(). Search my posts for nanoTime and you can probably find examples of it in use.
    I changed to Integers for no real reason.  The truth is that I forgot that I was working with Strings, had accidental made an iteration with Integer, so I ran with it.
    Integers are easier to get out of a Random object, but I would use a Stream to create the random List. Again, search my posts and you should find an example within the last six months. Don't use 1E5 which is not an int but a double. Don't use doubles with < in loops or if statements.
    . . . Iterating the loop backwards so that there is always an index and won't . . . .  i > 1... you got me. Probably should have wrote that as i >= 0.
    If you write a for loop correctly you will never suffer an out of bounds exception. Use myList.size() not a constant, otherwise the size and that constant might be different
    For the last question, I don't know how to do that.  I'd be all ears.  Could I find that specific information in the Oracle docs? 
    I didn't find it in the Java™ Tutorials, but I would expect to find it in any decent book.
    The standard idiom to traverse a List backwards is:-Remember my challenge to find out why you sometimes use a for loop for Lists and sometimes you don't.
    Also, you need some work with pencil and paper. If you traverse the List backwards, and remove all elements which exist as “K”s in the Map:-
  • 1: What changes will occur to the last element in the List, apart from its changing its index, whenever you remove elements.
  • 2: Under which circumstances can you run out of indices and suffer an out of bounds exception?
  • 3: If a duplicate is found, which element is removed? Go through the List interface before you answer that.
  •  
    Skott Elliott
    Greenhorn
    Posts: 13
    1
    Chrome IntelliJ IDE Windows
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Sorry for not getting back in a timely fashion.  It was an action packed week.  I had a review of my internship and was told that I was going to be released due to slow progress.  (Actually, there is a lot of progress, but coming from no programming at all, they need folks who can step up their game immediately.)  Instead, I've been put on a more individual year-long program.  So, ain't throwing in the towel by any means!

    I will definitely get to your post soon Mr. Campbell.  I appreciate a lot the posts you're making for me. 
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Also work out whether the behaviour of the remove method differs if you run the loop forwards and backwards.
     
    Campbell Ritchie
    Marshal
    Posts: 56518
    172
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Welcome to the Ranch

    I added code tags to your post, which you should use yourself. Doesn't the post look better Nice solution.
     
    • Post Reply Bookmark Topic Watch Topic
    • New Topic
    Boost this thread!