Matt Wright wrote: I am implementing the hash join algorithm for a project with a hard coded hash function.
Not quite sure what you mean here. Do you have a hashCode() method?
Also: You talk about "tuples", which is a database concept. In Java, you're likely to be dealing with collections of some kind.
What you're attempting is something that is usually done with Sets (java.util.Set) in Java (indeed, the method retainAll() specifically mimics it), so if I were trying to do this I would probably create a Relation class that encapsulates the "where" clause of a SELECT statement.
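For what it's worth, here is a minimal sketch of what such a Relation class might look like. The field name and the varargs constructor are my own assumptions for illustration; the essential part is that equals() and hashCode() are both overridden and agree with each other, because HashSet relies on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical Relation: holds the column values that the "where" clause
// of the join would compare (the join key for one row).
final class Relation {
    private final List<Object> key;

    Relation(Object... keyValues) {
        // Copy the values so later changes to the caller's array cannot
        // alter the key (and therefore the hash) after construction.
        this.key = new ArrayList<>(Arrays.asList(keyValues));
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Relation && key.equals(((Relation) other).key);
    }

    @Override
    public int hashCode() {
        // Derived from the same values equals() compares, so equal
        // Relations always produce the same hash.
        return key.hashCode();
    }

    @Override
    public String toString() {
        return "Relation" + key;
    }
}
```

If you have to use your own hard-coded hash function, hashCode() is where it would go, as long as it only looks at the values that equals() compares.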
Once you have that, and can calculate its hash, populate two HashSet<Relation>s with the relations for each of your datasets (don't worry about duplicates for now) and run retainAll(). That will give you the Relations common to both, and then with another pass through your datasets you can add something that equates to row ids for them (e.g., List indexes). It's possible you could do this during your initial build, but the two-pass approach is likely to be a lot simpler to code and doesn't change the O(n) characteristic of the operation.
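The two passes could be sketched roughly as below. This is only an illustration under my own assumptions: the class and method names are made up, and each dataset is assumed to have already been reduced to a List of join keys, one per row, so that the List index can stand in for a row id. K stands for whatever key class you use, such as a Relation with consistent equals() and hashCode().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating the two-pass approach.
final class TwoPassJoin {

    // Pass 1: build a set of keys per dataset and keep only those in both.
    // Duplicate keys collapse here, which is fine for now.
    static <K> Set<K> commonKeys(List<K> leftKeys, List<K> rightKeys) {
        Set<K> common = new HashSet<>(leftKeys);
        common.retainAll(new HashSet<>(rightKeys));
        return common;
    }

    // Pass 2: walk one dataset again and record, for every surviving key,
    // the List indexes (the stand-in row ids) of the rows that carry it.
    static <K> Map<K, List<Integer>> rowIndexes(List<K> keys, Set<K> common) {
        Map<K, List<Integer>> result = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            K key = keys.get(i);
            if (common.contains(key)) {
                result.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }
        return result;
    }
}
```

You would call commonKeys() once, then rowIndexes() once per dataset; pairing up the index lists that share a key gives you the joined rows. Each step is a single pass over hashed collections, so the expected cost stays linear in the size of the inputs.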
It seems to me that you may be concentrating too much on the mechanics of the operation rather than looking at the design.