• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Comparing two collections...

 
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hey
I have a small problem that I'm trying to solve.
I have 2 HashSet's (called set1 and set2) containing 5 values each.
SET112345
SET2246810
I'm trying to compare the sets in 3 different ways.
1). Create a new HashSet that contains the elements from both SET1 and SET2
example1: Set contains: 2,4
2). Create a new HashSet that contains the elements from SET1 or SET2 or both.
example2: Set contains: 1,2,3,4,5,6,8,10
3). Create a new HashSet that contains the elements from SET1 and NOT SET2
example3: Set contains: 1,3,5

I think Question 2). is not a problem for me. As HashSets don't contain duplicate values I can add the contents of SET1 and SET2 to a new HashSet.
However, I can't yet think of a solution for solving 1). and 3).

Thanks
 
Ranch Hand
Posts: 305
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Assume the two sets are S1 and S2.
For 1.) and 3.) you can use a brute-force algorithm which basically iterates through the smaller set S1, checking if the current element is in S2, and if so, either add it to the new HashSet, or ignore it. Because it's a HashSet, the brute force method is not that bad in terms of efficiency.
You can find more info about iterating through a Collection at Sun's website.
 
author and iconoclast
Posts: 24207
46
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Look at the removeAll() and retainAll() methods of the Set interface.
If you make a copy of one set, and then call retainAll() on the copy passing the other set as an argument, you have the intersection. If you make a copy of one, then call removeAll() on the copy passing the other set as an argument, you have the answer to your number 3.
 
Andrew Hartman
Greenhorn
Posts: 16
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks, you have been most helpful.
Here is my code:
set1 = [URL1,URL2,URL3,URL4,URL5]
set2 = [URL2,URL4,URL6,URL8,URL10]

This is only part of a program I'm making, where I have a 'word' matched to a list or url names:
e.g.
the word 'hello' is in urls: [URL1,URL4]
'goodbye' : [URL1,URL4,URL5]
'another_word' : [URL1]
etc.
My program has many of these (sets of URLs that correspond with a word)
I'm performing queries on the lists/sets such that with the input :
and(hello,goodbye) would return : [URL1,URL4]
I've done this by parsing the input query string 'and(word,word,...,)'
Then tokenizing the words within the brackets. As each word is related to a set of URLs, I can then use their sets to perform union, intersection etc. on them.
I now want my program to be able to handle any number of words in the input string.
e.g. and(word,word,...,N words)
Will I be able to modify my above code to handle this? My intial ideas were, when tokenzing the input string do a token count. From here I'll then know how many 'sets of URLs' i'll be using. Then I can perform my union, intersection, set difference actions on them like in my above code.
Any help with this I will greatly appreciate, thanks
 
Ranch Hand
Posts: 539
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You'll find the Collections package in the Jakarta Commons project makes this trivial. Check out the CollectionUtils class (org.apache.commons.collections.CollectionUtils)
Both Set 1 and Set 2 - use CollectionUtils.intersection()
Either Set 1 or Set 2 - use CollectionUtils.union()
Set 1 and not set 2 - use CollectionUtils.subtract()
One advantage here is that the algorithms used are *probably* efficient ones, given the usual quality of Jakarta libraries. That said, I haven't checked the source to see.
Cheers,

--Tim
[ May 03, 2004: Message edited by: Tim West ]
 
Ranch Hand
Posts: 1873
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
hi,
just my 2 cents.
I would first,
1. sort SET1 in some algo with log(n) order
2. sort SET2 in some algo with log(n) order
3. consider bigger set SET1
4. loop thru SET1 indexes and same index in SET2
- if the element is equal --> put in "AND SET" set
- if the elements differ put both elements in "OR SET" set
- if the elements differ put the SET1's or SET2's element (depending upon where i want to put NOT) in "NOT SET" set
This would give me total order,
log(n)+log(m)+l where,
n = num of elements in first set
m = num of elements in second set
l = n or m; whatever is greater
i guess that would be good.
regards
maulin
 
Tim West
Ranch Hand
Posts: 539
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator


1. sort SET1 in some algo with log(n) order


I assume you mean n log(n) right?
Your approach sounds efficient to me though. Your final step I think would be order l = (n + m), since at each iteration you may only increment your index through set 1 or set 2 respectively. (I'm imagining that you have an index into each "set", and that as you step through you increment the counter that points to the smaller item).
Alternatively, you could also use a priority queue for each set, but given the availability of efficient, well-tested sorting algorithms, your approach is better.
--Tim
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic