Win a copy of Head First Android this week in the Android forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Tim Cooke
  • Campbell Ritchie
  • Paul Clapham
  • Ron McLeod
  • Liutauras Vilda
Sheriffs:
  • Jeanne Boyarsky
  • Rob Spoor
  • Bear Bibeault
Saloon Keepers:
  • Jesse Silverman
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
Bartenders:
  • Piet Souris
  • Al Hobbs
  • salvin francis

Accessing value from a seperate hash table

 
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I am implementing the hash join algorithm for a project with a hard coded hash function. I've hashed the first relation and I've hashed the second relation. The problem is when hashing the second relation I only know how to add the tuple from the second relation into a third relation and not also access the first relation tuple at that time

 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Matt Wright wrote:I am implementing the hash join algorithm for a project with a hard coded hash function.


Not quite sure what you mean here. Do you have a hashCode() method?

Also: You talk about "tuples", which is a database concept. In Java, you're likely to be dealing with collections of some kind.

What you're attempting is something that is usually done with Sets (java.util.Set) in Java (indeed, the method retainAll() specifically mimics it), so if I was trying to do this I would probably create a Relation class that encapsulates the "where" clause of a SELECT statement.

Once you have that, and can calculate its hash, populate two HashSet<Relation>s with the relations for each of your datasets (don't worry about duplicates for now) and run retailAll(). That will give you the Relations common to both, and then with another pass through your datasets you can add something that equates to row ids for them (eg, List indexes). It's possible you could do this during your initial build, but the two-pass approach is likely to be a lot simpler to code and doesn't change the O(n) characteristic of the operation.

It seems to me that you may be concentrating too much on the mechanics of the operation rather than looking at the design.

HIH

Winston
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:It's possible you could do this during your initial build...


Actually, thinking about it, you could probably do that quite easily if you initially load into HashMap<Relation, List<Integer>>s (where the Integer is the index of your "tuple"); and you could then do the retainAll() operation on their keySet()s.

I would expect this to have similar speed to a HashSet (but I don't know), and it would save you having to scan one of the datasets a second time (in fact you could just scan one of the Maps).

In order to combine them though, you'll need some way of distinguishing indexes of "table" A from those of table B, and one easy method would be to make the latter negative (perhaps something like: negativeIndex = -(originalIndex + 1), as Arrays.binarySearch() does for its "insertion" indexes). Obviously, this won't work for any joins involving more than two tables though (if you need it).

HIH

Winston
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:if I was trying to do this I would probably create a Relation class that encapsulates the "where" clause of a SELECT statement.


Actually, since this is for joins, it would only have to include those elements that are actually part of the join. What you do about any other "filtering" logic would be up to you. One simple way to combine hashes for multiple join fields would be to XOR them.

Winston
 
Matt Wright
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I took your advice and got an efficient solution. Thank you.
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Matt Wright wrote:I took your advice and got an efficient solution. Thank you.


Great! Just as a matter of interest, what did you come up with? My "advice" was only one of many ways to do it, and there's no "right" or "wrong" about these things...

Winston
 
Matt Wright
Greenhorn
Posts: 9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
http://pastebin.com/7Jc92QLB
 
Don't get me started about those stupid light bulbs.
reply
    Bookmark Topic Watch Topic
  • New Topic