
# Iterator better for that?

nimo frey
Ranch Hand
Posts: 580
I have two lists, listA and listB, which contain the same kind of objects.

The size of listA is smaller than the size of listB.

Now I want to know which elements of listA also exist in listB. (If so, first store these elements in listC. After the loop, delete these elements from listA):

What are the differences between these two methods:

I guess there is no difference whether I loop over listA or listB first:

for example:

listA size=6000
listB size=15000

so the contains() approach must perform up to 6000 * 15000 (= 15000 * 6000) comparisons.

Is this best practice, or should I use an Iterator?
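The code from the original post was lost; a minimal sketch of the nested-loop approach described above (the variable names match the post, the example data is mine):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListCompare {
    public static void main(String[] args) {
        List<String> listA = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        List<String> listB = Arrays.asList("b", "c", "d");

        // Collect the elements of listA that also occur in listB.
        List<String> listC = new ArrayList<String>();
        for (String a : listA) {
            if (listB.contains(a)) {   // contains() is a linear scan of listB
                listC.add(a);
            }
        }
        // After the loop, delete the common elements from listA.
        listA.removeAll(listC);

        System.out.println(listC); // [b, c]
        System.out.println(listA); // [a]
    }
}
```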
[ October 27, 2008: Message edited by: nimo frey ]

Tim Holloway
Bartender
Posts: 18662
71
The actual math is a little more complicated than that. First, the number of passes to check whether the item from one list is in the other list is probably going to be an average of half the size of the list being searched. Secondly, if you delete items from that list, you reduce the size of the search.

So it's a little statistical, a little arithmetical.

If you can get the search target ordered, however, you can speed things up immensely, since an ordered list can be searched with a binary algorithm, reducing the average search time from linear to logarithmic.
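A sketch of that idea with the stock library (example data mine): pay for one sort up front, then every lookup via Collections.binarySearch is O(log n) instead of the O(n) scan that List.contains() does:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class BinarySearchLookup {
    public static void main(String[] args) {
        List<Integer> listB = new ArrayList<Integer>(Arrays.asList(15, 3, 9, 27, 1));
        Collections.sort(listB);  // one O(n log n) sort up front

        // Each lookup is now O(log n); a non-negative return means "found".
        boolean found = Collections.binarySearch(listB, 9) >= 0;
        boolean missing = Collections.binarySearch(listB, 4) < 0;
        System.out.println(found + " " + missing); // true true
    }
}
```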

When both lists are unordered, I can envision modifying one of the stock sort algorithms so that when you hit the case of two equal entries, the end result deletes the existing one and refuses to add the duplicate. However, if you're using a heap-based sort, that will require some extra tree-balancing, so it's a little tricky.
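For what it's worth, the standard library already offers something close to that "sort that refuses duplicates": a TreeSet keeps its elements ordered and silently rejects duplicates (add() returns false). A small sketch with my own data:

```java
import java.util.Arrays;
import java.util.TreeSet;

public class TreeSetDedupe {
    public static void main(String[] args) {
        // Duplicates in the input are dropped; iteration order is sorted.
        TreeSet<Integer> set = new TreeSet<Integer>(Arrays.asList(5, 1, 5, 3, 1));
        System.out.println(set);        // [1, 3, 5]
        System.out.println(set.add(3)); // false - duplicate refused
    }
}
```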

nimo frey
Ranch Hand
Posts: 580
if you delete items from that list, you reduce the size of the search

So I should use the Iterator, so that I can delete while I am in the loop (to reduce the size of the next pass). Am I right?
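A sketch of what that looks like (example data mine): Iterator.remove() is the only safe way to delete from the list you are currently looping over, and each removal does shrink the remaining work:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorRemove {
    public static void main(String[] args) {
        List<String> listA = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        List<String> listB = Arrays.asList("b", "c", "d");
        List<String> listC = new ArrayList<String>();

        for (Iterator<String> it = listA.iterator(); it.hasNext(); ) {
            String a = it.next();
            if (listB.contains(a)) {
                listC.add(a);
                it.remove();  // safe mid-loop removal; listA.remove(a) here would
                              // throw ConcurrentModificationException
            }
        }
        System.out.println(listC); // [b, c]
        System.out.println(listA); // [a]
    }
}
```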

If you can get the search target ordered however, you can speed things up immensely

I have an ArrayList... does sorting it first cost more than leaving it unsorted?

Paul Beckett
Ranch Hand
Posts: 96
nimo, careful with the 'remove' methods in the Collections framework. You used
listA.remove(listC);

The remove method removes listC itself (as a single element) from your collection. I think you want to remove all of the elements contained in listC from your collection. In this case use
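The snippet was lost from the post, but it presumably reads listA.removeAll(listC) (nimo confirms removeAll() in his reply). A sketch of the difference, with my own example data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveVsRemoveAll {
    public static void main(String[] args) {
        List<String> listA = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        List<String> listC = Arrays.asList("b", "c");

        listA.remove(listC);       // looks for the list *object* listC as a single
                                   // element - not present, so this is a no-op
        System.out.println(listA); // [a, b, c]

        listA.removeAll(listC);    // removes every element that occurs in listC
        System.out.println(listA); // [a]
    }
}
```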

[ October 28, 2008: Message edited by: Paul Beckett ]

Paul Beckett
Ranch Hand
Posts: 96
I think either of the two methods you've described can be replaced with:

This is much easier to understand than trying to write your own version of the method.
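The replacement code was lost from the post; one likely Collections-framework version (a sketch, with my own data) uses retainAll() to build the intersection and removeAll() to do the deletion, with no hand-written loop at all:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionsVersion {
    public static void main(String[] args) {
        List<String> listA = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        List<String> listB = Arrays.asList("b", "c", "d");

        // listC = intersection of listA and listB
        List<String> listC = new ArrayList<String>(listA);
        listC.retainAll(listB);

        // drop the common elements from listA
        listA.removeAll(listC);

        System.out.println(listC); // [b, c]
        System.out.println(listA); // [a]
    }
}
```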

nimo frey
Ranch Hand
Posts: 580
Oh yes, thanks. I did use removeAll(); I only typed it wrong here.

Mike Simmons
Ranch Hand
Posts: 3090
14
Since you're posting this in Performance, I expect you can get much better performance if you use a HashSet instead of (or in addition to) the Lists. Fundamentally, the algorithms described so far are all O(Na * Nb), while with a HashSet you can get O(Na + Nb).
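A sketch of that suggestion (example data mine): building a HashSet from listB costs O(Nb) once, after which each contains() is O(1) on average, so the whole pass is O(Na + Nb) instead of O(Na * Nb):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class HashSetIntersect {
    public static void main(String[] args) {
        List<String> listA = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        List<String> listB = Arrays.asList("b", "c", "d");

        // Build the set once: O(Nb).
        Set<String> setB = new HashSet<String>(listB);

        List<String> listC = new ArrayList<String>();
        for (Iterator<String> it = listA.iterator(); it.hasNext(); ) {
            String a = it.next();
            if (setB.contains(a)) {  // O(1) average, vs O(Nb) for listB.contains(a)
                listC.add(a);
                it.remove();
            }
        }
        System.out.println(listC); // [b, c]
        System.out.println(listA); // [a]
    }
}
```

This assumes the objects in the lists implement hashCode() and equals() consistently, which they must for any HashSet-based approach to work.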
[ October 28, 2008: Message edited by: Mike Simmons ]

Pat Farrell
Rancher
Posts: 4686
7
Originally posted by Mike Simmons:
while with a HashSet you can get O(Na + Nb).

Which by definition is the same as O(Na) for suitable constants.

But in general, a HashSet/HashMap should be more of O(1) rather than O(n)

Mike Simmons
Ranch Hand
Posts: 3090
14
[Pat]: Which by definition is the same as O(Na) for suitable constants.

Well, we don't know whether Na or Nb is larger - they may both be significant. It's premature to write off either one as a constant, I think - see below.

[Pat]: But in general, a HashSet/HashMap should be more of O(1) rather than O(n)

Basic operations like add(), remove(), contains() are O(1). But operations like addAll() and removeAll() are obviously O(N). And of course, calling add(), contains() or remove() N times would be O(N).

The point is, if Na and Nb are of comparable size (call them both N), then the solutions given in this thread are O(N^2), while a HashSet will be O(N). Big win.
[ October 28, 2008: Message edited by: Mike Simmons ]

Billy Tsai
Ranch Hand
Posts: 1307
So which method do you guys think has better performance when processing large numbers of complex objects (e.g. Value Objects / Data Transfer Objects)?

Mike Simmons
Ranch Hand
Posts: 3090
14
Using a HashSet, definitely.

Pat Farrell
Rancher
Posts: 4686
7
Originally posted by Mike Simmons:
[Pat]: Which by definition is the same as O(Na) for suitable constants.

Well, we don't know whether Na or Nb is larger - they may both be significant. It's premature to write off either one as a constant,

No, O(50) == O(500)

Look at the definition of big Oh.

Mike Simmons
Ranch Hand
Posts: 3090
14
But neither Na nor Nb is necessarily a constant. Naive definitions that assume there's only one variable will not help much here.

For non-HashSet solutions the performance is O(Na * Nb), and it makes a huge difference whether we vary the two quantities independently or not. So it would be a disservice to simplify this to either O(N) or O(N^2) - both are possible, as is anything in between.

For the HashSet solution the performance is O(Na + Nb). Yes, we could call this O(N) I suppose, but then the question arises how does that N compare to the Na and Nb used in the previous statement about non-HashSet solutions. For me, it seemed much clearer to use comparable notation for both: one is O(Na + Nb), and the other is O(Na * Nb).