Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

choosing a data structure  RSS feed

 
joew weakers
Ranch Hand
Posts: 48
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi there. I need to insert roughly about 200 objects into a data structure over the course of several calls to an insert method in my program. Each object is to be inserted only once although before an object is inserted I must check to see whether the object has previously been added to the data structure. My question is this: what is the most suitable data structure to use in my code? I would like to use either a Vector or List data structure as both have the functionality to check whether the data structure already contains an object, i.e. Vector.contains(object) and List.contains(object).
I know Vectors are not really efficient when dealing with large numbers of objects. However, if I am dealing with solely 200 objects is it ok to use either a vector or list? Joe
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I made a Collections Crib Sheet because I couldn't remember all this stuff.

See that the SET interface already assures you can only put something in once which takes care of part of your problem. That saves you a few lines of code and more importantly communicates to the reader your intent to only allow each object once.

It's not clear which of the sets would be best for putting a lot of objects in quickly. You could compare them on your own, but I bet at 200 objects you won't notice any difference. Maybe with 2 million you'll see some performance difference.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Stan James:

It's not clear which of the sets would be best for putting a lot of objects in quickly.


Theoretically it should be HashSet (if its initial size is choosen appropriately, so that it doesn't have to resize). HashSet has O(1) for insertion, TreeSet O(n log n) (if I remember correctly). That is, insertion time for a HashSet shouldn't increase with the number of objects at all.

I agree that with just 200 objects, it probably doesn't matter, anyway.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
[Ilja]: HashSet has O(1) for insertion, TreeSet O(n log n) (if I remember correctly).

That's a bit off. Inserting one element into a HashSet is O(1) (on average), while inserting N of them is O(N). For a TreeSet, inserting one element is O(log N), and inserting N of them is O(N log N).

A TreeSet is generally slower than a HashSet, but the difference frequently is not a big deal In general, use a TreeSet if you need the elements to be sorted; use a HashSet if you don't. Or since the original poster was originally interested in a List, perhaps a LinkedHashSet would be the preferred choice if the insertion order is important. Slightly slower than a HashSet, but the performance is comparable.

And yeah, the performance probably doesn't really matter here. The important criteria should be: does Joe want something sorted (TreeSet), in insertion order (LinkedHashSet), or does it not matter (HashSet)?
[ March 11, 2006: Message edited by: Jim Yingst ]
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Jim Yingst:

That's a bit off. Inserting one element into a HashSet is O(1) (on average), while inserting N of them is O(N). For a TreeSet, inserting one element is O(log N), and inserting N of them is O(N log N).


Ah, right - thank you very much for the correction!

Another thing to keep in mind is that the big O notation only becomes relevant for big n - for a small number of objects, it could actually be that an O(log n) implementation is faster than an O(1) implementation. In this case, it would - besides other things - depend on the implementation details of hashcode versus compareTo.

But again, this is rather academic - for just 200 objects you are unlikely to notice the difference, anyway.
 
steve souza
Ranch Hand
Posts: 862
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I beefed up rule 11 of the Java P&T faq page in response to this thread. See the link on my signature below...
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!