• Post Reply Bookmark Topic Watch Topic
  • New Topic

Cost of creating a list then added to a hashset which is added back to a new list  RSS feed

 
Ariana Hobson
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I want to use a list to store and use indexOf() method to retrieve the element at a specific index while at the same time keeping the list unique. So I added the list created to a hashset to remove any duplicates, then add the hashset to a new list. I do this step every time I add a new element to the original list. When I want to retrieve an element, I retrieve it from the new list. I wonder if this is costly?
 
Campbell Ritchie
Marshal
Posts: 56546
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
No. A million‑element array list will occupy something in the order of 8MB on a 64‑bit box (if you specify its capacity), a million‑element hash set about 16MB and your new list anything up to about 12MB, probably less. If you specify the size of your list and set in advance you can probably add each element in a few clock cycles so you will probably only take a few milliseconds for the whole procedure.

But before you do that, consider the structure of your data structures. You have a List and you add things in in order. Now you put it into a hash set which supports neither duplicates nor order. Then you put it back into a List and you will have everything in a different order from what you started with. Try going through the Java™ Tutorials and look for implementations of set; there is one implementation which retains insertion order. I don't think it is a pure set, but I think it is a hybrid data structure. It will probably occupy at least twice as much space as a plain simple set.

If you go through the whole of that tutorials “trail”, you will find about Streams. Try this
Please check the Stream interface because there is a 75% chance that I have spelt one of those method names wrongly! You can only get that to work in Java8

You will probably need space for the old and new Lists and a spare memory space similar to that for the set mentioned earlier, but that will be released when that call completes. You must override equals() and hashCode() correctly for any of these techniques to work at all.
 
Campbell Ritchie
Marshal
Posts: 56546
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Adding the whole set every time you add something is going to be costly, yes. Sorry I missed that bit. I shall see if I can find an old post where I have looked at a similar problem.

You can use the “hybrid” data structure on its own to achieve everything you want.
 
Campbell Ritchie
Marshal
Posts: 56546
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You will find related questions, yes. These might help. 1 2
Be sure to read the whole of the discussion.
 
Ariana Hobson
Greenhorn
Posts: 10
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm not particularly concerned about the order of the insertion. Thanks for the suggestion of using Stream. It's kinda cool to have a bit of functional style in java.

So anyway instead of List --> Set --> List, I adopt your suggestion of using Java 8 Stream:


This just uses two lists without having to use the hashset. But..
  • Is it better to remove duplicate every time a new person is added or..
  • Is it better to wait until everything is added and then remove duplicates?


  • It seems like what I'm doing is the same as what I was doing with adding the whole list to a hashset each time I add an item and according to you this is costly. So I guess the last bullet point is better performance wise?

    And correct me if I'm wrong.. If I use distinct() after I finish adding all the items to the list, this it is better than I check using equals() each time I want to add an item in the list isn't it?
     
    Mike. J. Thompson
    Bartender
    Posts: 689
    17
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    I'm not totally certain why you want indexes here if you don't care about ordering. If all you want to do is find an element and replace it with a different one then that is achieved quite easily with a HashSet by removing the old element and adding the new one. You don't need to maintain indexes to do that unless order is important, but transferring to a set then back to a list was changing your order anyway.
     
    Campbell Ritchie
    Marshal
    Posts: 56546
    172
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    And why have you marked everything static?
     
    Ariana Hobson
    Greenhorn
    Posts: 10
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    You're right. Thanks for pointing that out. Something is wrong with my head... If I hadn't posted here, I would be so such a fool. What was I thinking..
     
    Mike. J. Thompson
    Bartender
    Posts: 689
    17
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    It's very easy to miss the obvious and end up with something more elaborate than it needs to be. I've done it many times. That's one of the reasons why code reviews are important, to get a different perspective.
     
    Campbell Ritchie
    Marshal
    Posts: 56546
    172
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Running a Stream every time you add an element is going to be costly, too. Remember it will be a different Stream object each time; once you have used a Stream, it may be “exhausted” and unavailable for reuse. I still think my “hybrid” data structure will do what you want more simply. Tell us what you found in the Java Tutorials section.
     
    It is sorta covered in the JavaRanch Style Guide.
    • Post Reply Bookmark Topic Watch Topic
    • New Topic
    Boost this thread!