programming forums Java Java JSRs Mobile Certification Databases Caching Books Engineering OS Languages Paradigms IDEs Build Tools Frameworks Products This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
Sheriffs:
Saloon Keepers:
Bartenders:

Comparing two collections...

Greenhorn
Posts: 16
Hey
I have a small problem that I'm trying to solve.
I have 2 HashSet's (called set1 and set2) containing 5 values each.
SET112345
SET2246810
I'm trying to compare the sets in 3 different ways.
1). Create a new HashSet that contains the elements from both SET1 and SET2
example1: Set contains: 2,4
2). Create a new HashSet that contains the elements from SET1 or SET2 or both.
example2: Set contains: 1,2,3,4,5,6,8,10
3). Create a new HashSet that contains the elements from SET1 and NOT SET2
example3: Set contains: 1,3,5

I think Question 2). is not a problem for me. As HashSets don't contain duplicate values I can add the contents of SET1 and SET2 to a new HashSet.
However, I can't yet think of a solution for solving 1). and 3).

Thanks

Ranch Hand
Posts: 305
Assume the two sets are S1 and S2.
For 1.) and 3.) you can use a brute-force algorithm which basically iterates through the smaller set S1, checking if the current element is in S2, and if so, either add it to the new HashSet, or ignore it. Because it's a HashSet, the brute force method is not that bad in terms of efficiency.

author and iconoclast
Sheriff
Posts: 24217
38
Look at the removeAll() and retainAll() methods of the Set interface.
If you make a copy of one set, and then call retainAll() on the copy passing the other set as an argument, you have the intersection. If you make a copy of one, then call removeAll() on the copy passing the other set as an argument, you have the answer to your number 3.

Andrew Hartman
Greenhorn
Posts: 16
Thanks, you have been most helpful.
Here is my code:
set1 = [URL1,URL2,URL3,URL4,URL5]
set2 = [URL2,URL4,URL6,URL8,URL10]

This is only part of a program I'm making, where I have a 'word' matched to a list or url names:
e.g.
the word 'hello' is in urls: [URL1,URL4]
'goodbye' : [URL1,URL4,URL5]
'another_word' : [URL1]
etc.
My program has many of these (sets of URLs that correspond with a word)
I'm performing queries on the lists/sets such that with the input :
and(hello,goodbye) would return : [URL1,URL4]
I've done this by parsing the input query string 'and(word,word,...,)'
Then tokenizing the words within the brackets. As each word is related to a set of URLs, I can then use their sets to perform union, intersection etc. on them.
I now want my program to be able to handle any number of words in the input string.
e.g. and(word,word,...,N words)
Will I be able to modify my above code to handle this? My intial ideas were, when tokenzing the input string do a token count. From here I'll then know how many 'sets of URLs' i'll be using. Then I can perform my union, intersection, set difference actions on them like in my above code.
Any help with this I will greatly appreciate, thanks

Ranch Hand
Posts: 539
You'll find the Collections package in the Jakarta Commons project makes this trivial. Check out the CollectionUtils class (org.apache.commons.collections.CollectionUtils)
Both Set 1 and Set 2 - use CollectionUtils.intersection()
Either Set 1 or Set 2 - use CollectionUtils.union()
Set 1 and not set 2 - use CollectionUtils.subtract()
One advantage here is that the algorithms used are *probably* efficient ones, given the usual quality of Jakarta libraries. That said, I haven't checked the source to see.
Cheers,

--Tim
[ May 03, 2004: Message edited by: Tim West ]

Ranch Hand
Posts: 1873
hi,
just my 2 cents.
I would first,
1. sort SET1 in some algo with log(n) order
2. sort SET2 in some algo with log(n) order
3. consider bigger set SET1
4. loop thru SET1 indexes and same index in SET2
- if the element is equal --> put in "AND SET" set
- if the elements differ put both elements in "OR SET" set
- if the elements differ put the SET1's or SET2's element (depending upon where i want to put NOT) in "NOT SET" set
This would give me total order,
log(n)+log(m)+l where,
n = num of elements in first set
m = num of elements in second set
l = n or m; whatever is greater
i guess that would be good.
regards
maulin

Tim West
Ranch Hand
Posts: 539

1. sort SET1 in some algo with log(n) order

I assume you mean n log(n) right?
Your approach sounds efficient to me though. Your final step I think would be order l = (n + m), since at each iteration you may only increment your index through set 1 or set 2 respectively. (I'm imagining that you have an index into each "set", and that as you step through you increment the counter that points to the smaller item).
Alternatively, you could also use a priority queue for each set, but given the availability of efficient, well-tested sorting algorithms, your approach is better.
--Tim

 Consider Paul's rocket mass heater.