
Strategies for calculating hash codes from composite keys?

 
Karsten Wutzke
Ranch Hand
Posts: 106
Hibernate MySQL Database Python
Are there any general best practices for generating collision-free hash codes for composite primary keys whose columns can be of any (atomic) type?

I thought about it for a few hours and came to the conclusion that a string concatenated from *all* primary key columns would be the only reliable way to do so. Calling Java's hashCode method on that concatenated string should yield a unique integer. (In fact, it would somehow mimic what a database index does *eyerolling*.)



However, I don't believe my solution is optimal; there must be better algorithms out there. The above probably runs rather slowly, but it's still faster than accessing a DB generator. What about standard hashing functions like MD5, SHA1, CRC, and Adler?
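A minimal sketch of the concatenation idea, assuming a hypothetical key made of two String columns (ConcatKeyHash and its methods are illustrative names only). It shows two things worth knowing before relying on the approach: plain concatenation maps different keys such as ("ab", "c") and ("a", "bc") to the same string, and even with a separator, String.hashCode() is only a 32-bit int, so distinct strings like "Aa" and "BB" share a hash code.

```java
// Hypothetical helper illustrating the "concatenate all key columns" approach.
final class ConcatKeyHash {

    // Naive version: glue the columns together and hash the result.
    // ("ab", "c") and ("a", "bc") both build "abc", so they always collide.
    static int naiveConcatHash(String col1, String col2) {
        return (col1 + col2).hashCode();
    }

    // Separator version: uses the NUL character '\0', assumed never to occur
    // in the data, to keep the column boundaries distinct. This removes the
    // ambiguity above, but the result is still a 32-bit int, so different
    // keys can still collide.
    static int separatedConcatHash(String col1, String col2) {
        return (col1 + '\0' + col2).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(naiveConcatHash("ab", "c") == naiveConcatHash("a", "bc"));         // true
        System.out.println(separatedConcatHash("ab", "c") == separatedConcatHash("a", "bc")); // false
        // "Aa" and "BB" have the same String.hashCode(), so the same suffix
        // after either one still produces the same hash.
        System.out.println(separatedConcatHash("Aa", "x") == separatedConcatHash("BB", "x")); // true
    }
}
```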

Karsten

PS: This is for overriding hashCode() in composite key ID classes...
 
Christophe Verré
Sheriff
Posts: 14691
16
Eclipse IDE Ubuntu VI Editor
Karsten Wutzke wrote: What about standard hashing functions like MD5, SHA1, CRC, and Adler?

Encoding? You don't use encoding for a hashCode. If you don't want to write the hashCode yourself, there's a nifty class in Apache Commons Lang that helps you do that: org.apache.commons.lang.builder.HashCodeBuilder
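For reference, HashCodeBuilder's usual idiom is new HashCodeBuilder(17, 37).append(a).append(b).toHashCode() (in Commons Lang 3 the class lives in org.apache.commons.lang3.builder). The sketch below writes out the same seed-and-multiply pattern by hand so it compiles without the library. The OrderLineId key class and its columns are hypothetical, and the library's exact per-type mixing (for long values, for example) may differ, so the numbers are not guaranteed to match HashCodeBuilder's output.

```java
import java.util.Objects;

// Hypothetical composite key class for a two-column primary key.
final class OrderLineId {
    private final long orderId;
    private final String productCode;

    OrderLineId(long orderId, String productCode) {
        this.orderId = orderId;
        this.productCode = productCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof OrderLineId)) return false;
        OrderLineId other = (OrderLineId) o;
        return orderId == other.orderId
                && Objects.equals(productCode, other.productCode);
    }

    @Override
    public int hashCode() {
        // Library version (needs Commons Lang on the classpath):
        //   return new HashCodeBuilder(17, 37).append(orderId).append(productCode).toHashCode();
        // Hand-written version of the same pattern: start from an odd seed, then
        // for each field multiply the running total by an odd constant and add
        // that field's hash.
        int total = 17;
        total = total * 37 + Long.hashCode(orderId);
        total = total * 37 + Objects.hashCode(productCode); // 0 for null
        return total;
    }
}
```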
 
Christophe Verré
Sheriff
Posts: 14691
16
Eclipse IDE Ubuntu VI Editor
PS: hashCode is not used to generate unique integers
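A small illustration of that point, using nothing beyond the JDK: "Aa" and "BB" are different strings with the same hashCode() (2112), yet a HashMap keeps both entries apart, because hashCode() only picks the bucket and equals() decides which entry actually matches. Collisions cost some lookup speed; they do not break correctness. CollisionDemo is just an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

final class CollisionDemo {

    // Puts two distinct keys that share a hash code into one map.
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112

        Map<String, Integer> map = buildMap();
        // Both entries survive: equals() tells the colliding keys apart.
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```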
 
Karsten Wutzke
Ranch Hand
Posts: 106
Hibernate MySQL Database Python
Christophe Verré wrote: PS: hashCode is not used to generate unique integers


Umm, yes, I have to redo my homework on hashCode and equals first. Getting out Effective Java... ;-)

Karsten
 
Karsten Wutzke
Ranch Hand
Posts: 106
Hibernate MySQL Database Python
Christophe Verré wrote:
Karsten Wutzke wrote: What about standard hashing functions like MD5, SHA1, CRC, and Adler?

Encoding? You don't use encoding for a hashCode. If you don't want to write the hashCode yourself, there's a nifty class in Apache Commons Lang that helps you do that: org.apache.commons.lang.builder.HashCodeBuilder


Now that's a nice hint. I really have no interest in creating hash codes myself. I just know that XORing or even adding the hash codes of the parts is a bad idea. String.hashCode already produces a collision for "Ca" and "DB" ;-)
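To make the XOR/addition point concrete, here is a minimal sketch comparing three ways of combining the hash codes of a two-part key (CombineHashes and its method names are made up for illustration). XOR and addition are symmetric, so swapping the parts never changes the result, and XOR maps any key whose two parts are equal to 0. The multiply-and-add pattern (java.util.Objects.hash uses it too, with 31 as the multiplier) makes position matter, but it still cannot rule out collisions: since "Ca" and "DB" share a hash code, so do any two keys that differ only in those parts.

```java
// Compares three ways of combining the hash codes of a two-part key.
final class CombineHashes {

    // XOR: symmetric, and (x, x) always hashes to 0.
    static int xorHash(Object a, Object b) {
        return a.hashCode() ^ b.hashCode();
    }

    // Addition: also symmetric, so (a, b) and (b, a) collide.
    static int sumHash(Object a, Object b) {
        return a.hashCode() + b.hashCode();
    }

    // Multiply-and-add: the running result is scaled before each part is added,
    // so the position of each part affects the outcome.
    static int mulAddHash(Object a, Object b) {
        int result = 17;
        result = 31 * result + a.hashCode();
        result = 31 * result + b.hashCode();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(xorHash(1, 2) == xorHash(2, 1));       // true
        System.out.println(xorHash("id", "id"));                  // 0
        System.out.println(sumHash(1, 2) == sumHash(2, 1));       // true
        System.out.println(mulAddHash(1, 2) == mulAddHash(2, 1)); // false
        // Still not unique: equal part hashes give equal key hashes.
        System.out.println(mulAddHash("Ca", 7) == mulAddHash("DB", 7)); // true
    }
}
```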

Thanks for pointing me in the right direction.

Karsten
 
Mike Simmons
Ranch Hand
Posts: 3090
14
Remember that if you're overriding hashCode(), you need to override equals() as well. HashCodeBuilder is accompanied by EqualsBuilder.
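Why the pairing matters, as a minimal sketch with two hypothetical key classes: one overrides only hashCode(), the other overrides both methods over the same fields. With hashCode() alone, a freshly built but logically equal key lands in the right HashMap bucket, yet Object's default equals() compares identity, so the lookup misses. All class names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key that overrides hashCode() but NOT equals():
// equals() is still Object's identity comparison.
final class HashOnlyKey {
    final long orderId;
    final String productCode;

    HashOnlyKey(long orderId, String productCode) {
        this.orderId = orderId;
        this.productCode = productCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, productCode);
    }
}

// Same key with equals() and hashCode() overridden over the same fields.
final class FullKey {
    final long orderId;
    final String productCode;

    FullKey(long orderId, String productCode) {
        this.orderId = orderId;
        this.productCode = productCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FullKey)) return false;
        FullKey other = (FullKey) o;
        return orderId == other.orderId && Objects.equals(productCode, other.productCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, productCode);
    }
}

final class EqualsDemo {
    static String lookupHashOnly() {
        Map<HashOnlyKey, String> map = new HashMap<>();
        map.put(new HashOnlyKey(1L, "A"), "found");
        // New instance with the same field values: same hash, but not "equal".
        return map.get(new HashOnlyKey(1L, "A"));
    }

    static String lookupFull() {
        Map<FullKey, String> map = new HashMap<>();
        map.put(new FullKey(1L, "A"), "found");
        return map.get(new FullKey(1L, "A"));
    }

    public static void main(String[] args) {
        System.out.println(lookupHashOnly()); // null
        System.out.println(lookupFull());     // found
    }
}
```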

Personally, I like Pojomatic, which handles hashCode(), equals(), and also toString() with a minimum of fuss.
 
Karsten Wutzke
Ranch Hand
Posts: 106
Hibernate MySQL Database Python
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Mike Simmons wrote: Remember that if you're overriding hashCode(), you need to override equals() as well. HashCodeBuilder is accompanied by EqualsBuilder.

Personally, I like Pojomatic, which handles hashCode(), equals(), and also toString() with a minimum of fuss.


Well, I RT(F)M and discovered that commons-lang also has an EqualsBuilder... hashCode is for hash-based lookup (picking the bucket), equals is for logical equality... yes.

Pojomatic seems to use reflection to inspect the class, right? Commons Lang can do this, too. However, I have no problem with using (generated) hard-coded implementations.

Karsten
 
Mike Simmons
Ranch Hand
Posts: 3090
14
True, it's easy to auto-generate code for these methods using any of the major IDEs. However, it's less easy to tell whether a previously generated method is still up to date. If you've added a few fields since you first generated the methods, will you remember to add the new fields to the methods? Can you read the method quickly and figure out whether it's complete and correct? That's where the reflection-based techniques are beneficial: little code to write, and little to read.
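Roughly what the reflection-based approach boils down to, as a minimal sketch rather than how HashCodeBuilder's reflection mode or Pojomatic is actually implemented: read every non-static field of the object and feed the values into equals() and hashCode(). ReflectiveIdentity and ShipmentId are hypothetical names. Fields are sorted by name because Class.getDeclaredFields() makes no ordering guarantee, and superclass fields are ignored to keep it short.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical reflection-based helper: uses every non-static field declared on the class.
final class ReflectiveIdentity {

    private static Field[] instanceFields(Class<?> type) {
        Field[] fields = Arrays.stream(type.getDeclaredFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
                .sorted(Comparator.comparing(Field::getName)) // stable order for hashing
                .toArray(Field[]::new);
        for (Field f : fields) {
            f.setAccessible(true); // allow reading private fields
        }
        return fields;
    }

    private static Object[] fieldValues(Object obj) {
        Field[] fields = instanceFields(obj.getClass());
        Object[] values = new Object[fields.length];
        try {
            for (int i = 0; i < fields.length; i++) {
                values[i] = fields[i].get(obj); // primitives come back boxed
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }

    static int hashCodeOf(Object obj) {
        return Arrays.deepHashCode(fieldValues(obj));
    }

    static boolean equalsOf(Object a, Object b) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        return Arrays.deepEquals(fieldValues(a), fieldValues(b));
    }
}

// Hypothetical key class: a field added later is picked up automatically.
final class ShipmentId {
    private final long orderId;
    private final String warehouse;

    ShipmentId(long orderId, String warehouse) {
        this.orderId = orderId;
        this.warehouse = warehouse;
    }

    @Override
    public boolean equals(Object o) {
        return ReflectiveIdentity.equalsOf(this, o);
    }

    @Override
    public int hashCode() {
        return ReflectiveIdentity.hashCodeOf(this);
    }
}
```

The trade-off is the one described above: nothing to update when a field is added, at the cost of a reflective scan on every call.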

Comparing HashCodeBuilder's reflection-based mode and Pojomatic, the main advantage of Pojomatic is that you can easily modify the default behavior with annotations. @AutoProperty causes all properties to be used by default, and @Property(policy=PojomaticPolicy.NONE) can then be used on any fields you don't want included. Or you can include a field in hashCode() but exclude it from toString(). Etc. For me, this is usually the optimal mix: minimal code most of the time, and customizability when you need it.
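The annotation-driven opt-out can be sketched with plain JDK reflection. This is not Pojomatic's API: the @ExcludeFromHash marker, the AnnotatedHash helper, and the CachedRow class are all hypothetical, and only the opt-out idea is borrowed. Fields carrying the marker are skipped, so two rows that differ only in a bookkeeping timestamp hash the same. A matching equals() would have to skip the same fields; it is omitted here for brevity.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

// Hypothetical opt-out marker; RUNTIME retention makes it visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ExcludeFromHash {}

final class AnnotatedHash {
    // Combines all non-static fields except those marked @ExcludeFromHash.
    static int hashOf(Object obj) {
        List<Field> fields = new ArrayList<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) continue;
            if (f.isAnnotationPresent(ExcludeFromHash.class)) continue; // opted out
            fields.add(f);
        }
        fields.sort(Comparator.comparing(Field::getName)); // deterministic order
        int result = 1;
        try {
            for (Field f : fields) {
                f.setAccessible(true);
                result = 31 * result + Objects.hashCode(f.get(obj));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}

// Hypothetical row class: the timestamp is bookkeeping, not part of the key.
final class CachedRow {
    private final long id;
    private final String region;
    @ExcludeFromHash
    private final long loadedAtMillis;

    CachedRow(long id, String region, long loadedAtMillis) {
        this.id = id;
        this.region = region;
        this.loadedAtMillis = loadedAtMillis;
    }
}
```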
 
Karsten Wutzke
Ranch Hand
Posts: 106
Hibernate MySQL Database Python
Thanks for the information. Is performance an issue when using reflection-based builders?

I have no problem re-running the generator when adding columns to the DB. I have a GUI tool that I can sync with the DB and regenerate from that.

Karsten
 
Mike Simmons
Ranch Hand
Posts: 3090
14
Karsten Wutzke wrote: Thanks for the information. Is performance an issue when using reflection-based builders?

It can be, but most of the time, no. Reflection-based techniques will almost certainly be slower than a good hard-coded method - but very very often, the difference isn't big enough to be noticeable, because your performance bottleneck is somewhere else. As is so often the case for performance questions.

Also, I'm pretty sure Pojomatic will be faster than HashCodeBuilder's reflection techniques, because Pojomatic caches much of the information it looks up about the fields in the class. Though it's possible this has been added to HashCodeBuilder as well since I last looked at it; there's no fundamental reason why such optimizations can't be applied in HashCodeBuilder too.
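The caching idea can be sketched with java.lang.ClassValue (available since Java 7), which computes a value once per class and then reuses it. This only illustrates why caching helps; it is not how Pojomatic or HashCodeBuilder is implemented, and the CachedReflectiveHash and PointKey names are hypothetical. The scans counter exists just to show that the expensive field lookup runs once per class rather than on every call.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

final class CachedReflectiveHash {
    // Counts how often the field scan actually runs (demonstration only).
    static final AtomicInteger scans = new AtomicInteger();

    // Computes the sorted, accessible field list once per class and caches it.
    private static final ClassValue<Field[]> FIELDS = new ClassValue<Field[]>() {
        @Override
        protected Field[] computeValue(Class<?> type) {
            scans.incrementAndGet();
            Field[] fields = Arrays.stream(type.getDeclaredFields())
                    .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
                    .sorted(Comparator.comparing(Field::getName))
                    .toArray(Field[]::new);
            for (Field f : fields) {
                f.setAccessible(true);
            }
            return fields;
        }
    };

    // Per call, only the field values are read; the lookup itself is cached.
    static int hashOf(Object obj) {
        int result = 1;
        try {
            for (Field f : FIELDS.get(obj.getClass())) {
                result = 31 * result + Objects.hashCode(f.get(obj));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}

// Hypothetical two-field key used to exercise the cache.
final class PointKey {
    private final int x;
    private final int y;

    PointKey(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```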
 