• Post Reply Bookmark Topic Watch Topic
  • New Topic

code generation of an effective hashCode  RSS feed

 
Ranch Hand
Posts: 18944
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ranchers,

our shop recently had cause to look into a number of solutions for automatically generating hashCode and equals implementations. we went with a home-grown solution, but the effort of evaluation carried some enlightenment with it.

i made a blog entry to capture the experience and share some of what we found (and some of what we'd still like clarification on).

ranchers always have had great feedback on items related to certification and eclipse, and so, i'd really like that same quality feedback for this...

[ July 12, 2005: Message edited by: Erick Reid ]
[ July 12, 2005: Message edited by: Erick Reid ]
 
Ranch Hand
Posts: 2937
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If the intent was to be able to cache the instances of the legacy classes, the most straightforward solution would be to ... do nothing about the equals() and hashCode() implementation in those classes. As long as the reproducable objects are used as the keys (such as Strings or Integers), you can cache your legacy classes in hash-based collections all you want without worrying about their equals() and hashCode() implementations. In the following example, I intentionally broke the hashCode() contract in the legacy class, but caching still works perfectly well:



In no way do I advocate ignoring the careful implementation of equals() and hashCode(), but I am stressing the fact that your caching would work as long as the instances of your legacy classes are used as values in the cache, even with the faulty equals() and hashCode() methods.

Now, if you want to use the instances of your legacy classes as the keys to the hash-based caches, that's a different matter -- both equals() and hashCode() must be overriden correctly (or not overriden at all) in order for caching to work. But is that really the intent?
[ July 12, 2005: Message edited by: John Smith ]
 
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I'm ashamed to admit I haven't studied this in any detail yet, but see if this Java Specialists Newsletter has any helpful ideas.
 
Anonymous
Ranch Hand
Posts: 18944
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
john,

thank you very much for your detailed reply. i'm still mulling your example to make sure my head is completely around it.

In no way do I advocate ignoring the careful implementation of equals() and hashCode()


i admit to uttering "eek!" when starting to read your reply. this latter assertion of yours (and the warning comment in code) calmed me down.

if you want to use the instances of your legacy classes as the keys to the hash-based caches, that's a different matter -- both equals() and hashCode() must be overriden correctly


that turns out to be the case for our needs (though i'll continue to carefully consider your initial example -- there is something to learn there). the following code shows a key part of the cache implementation. the code contains some distracting extras, but the gist is that it does indeed uses instances as keys, relying heavily on hashCode and equals:




All the classes in the blog example all extend DomainObject.



and that is the key interest behind the blog entry are we doing it "right"? or, stated differently, are we doing it right-enough for other to consider it a good solution for their code? our solution works for us, both as it applies to an as-expected caching mechanism, and as a general hashCode/equals implementation... but do you, given your experience, see the final solution (utilize josh bloch's recipe, seed calculation based on class name, include parent values in calculation unless parent is Object) as one you'd use?
[ July 13, 2005: Message edited by: Erick Reid ]
 
Anonymous
Ranch Hand
Posts: 18944
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
stan,

thanks for your reply. i did scan the newsletter and dig the ideas presented there. it seems similar to the HashCodeBuilders used by jakarta commons.

it would be cool if an IDE could somehow be configured with a strategy that it would apply to all invocations of the hashCode code generation functionality (like that in IDEA/IntelliJ)
 
John Smith
Ranch Hand
Posts: 2937
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
_domainObjects.put(new CacheKey(doId), createNewElement(domainObject));

Ok, you've got 5 objects here (domainObjects, CacheKey, doId, domainObject, and whatever is created by createNewElement()). Out of those 5, only one matters in respect to the correct implementation of equals() and hashCode() and the expected caching behavior. This is to clarify my earlier comment that you don't need to fix your legacy classes at all.
 
Anonymous
Ranch Hand
Posts: 18944
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
john,

i understand your exception to the sample given, but trust that, given the problem at hand, everything is appropriate (and well-tested). the hashCode/equals implementations as given in the example are necessary and work for us.

giving the cache snippet was a bad idea on my part. in addition to the distracting extras in it, just by itself, it distracts from the key reason for my inquiry: is the example hashCode/equals implementation given one that could be effectively applied by any developer (caching needs or not) to most classes in their codebase?



this implementation hinges on two considerations:
  • the hash code calculation is seeded by using the hash code of the class name
  • calls to hashCode() and equals() of the parent are included in the corresponding methods of the given instance, unless the parent is Object.


  • your feedback is greatly appreciated, so, crazy domain-specific caching routines aside, what are your thoughts on the hashCode/equals implementations themselves?
     
    John Smith
    Ranch Hand
    Posts: 2937
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator


    I see that it follows the Joshua Block recepie for hashCode() implementation, but I don't understand the reason for ornamentation around it. I mean, why not simply this:

     
    Anonymous
    Ranch Hand
    Posts: 18944
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    john,

    thanks for your ongoing feedback.

    why not simply this:



    josh's formula would still use a seed value. the one given in the examples in his book is 17:


    the primary reason for using a varied seed value (departing from josh's specific example, but keeping with his intent) is to ensure that 2 instances of different classes do not return the same value. this was a concern in our case:



    under certain common conditions, calls to these two methods will return 629; whereas this will not happen with the following versions:



    the calls to super.hashCode() are made given our understanding that in a lot of cases, the values of the parent ought to be rolled into the calculation in the descendent:



    this is not to say we went looking for a perfect hash function, but it did seem that, when all was said and done, the solution we used was somewhat better than those we found in some code generation facilities available in common IDEs. it is that point on which i am hoping to draw feedback: sure it works for us, but is it, in your opinion, an effective and generally applicable solution.
     
    Anonymous
    Ranch Hand
    Posts: 18944
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    ranchers (specifically john and stan),

    thanks for your feedback. learned a few things about this small, but important item of logic. foremost is that a specialized solution is just that -- specific to a particular domain. related to that is the realization that, if the code generation functionality in IDEs/plugins does not try to be too crafty, it is with good cause.

    here is comment to self on blog entry:


    this implementation works for our specific need, but as a generally-applicable solution, there are problems. those problems are resolved by not including calls to super in hashCode/equals, and by using: if (object != null && object.getClass() == getClass) versus: if (object instanceof <ThisClass>

    the former change (not using super) was pointed out by rereading josh bloch's advice -- assuming i read it right. the latter comes from reading http://www.geocities.com/technofundo/tech/java/equalhash.html.

    our code generation utility worked for our specific case, and the code generation functionality in IDEs/plugins works for most general cases.


    a final thought is that our use of calls to super likely breaks transitivity, and thus needs to be revisited. silver lining is that we do have a code generation utility that will make revision a snap.
     
    John Smith
    Ranch Hand
    Posts: 2937
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    ER: the primary reason for using a varied seed value <...> is to ensure that 2 instances of different classes do not return the same value.


    If two instances of two different classes return the same hash code, it would not violate the hashCode() contract -- I just wanted to make it clear. It's possible that it would deteriorate the caching performance (if there are too many items in a particular hash bucket), but I would venture to say that the likelyhood of that is pretty low, and if it actually does happen, the performance penalty is likely to be low. This is one of the cases where I believe the "do not optimize" rule should be applied.

    I would also note that the case of the optimal hashing functions is an ongoing research subject, and it would be best to leave it to the academics and focus on solving business problems instead (unless you are an academic, of course).

    What's actually more likely to cause the performance degradation is the calculation of hash code itself (because it may be invoked by the hash-based collection many many times). For that, an easy boost is to cache the hashcode. For example, in the class constructor, calculate the hash code and assign it to a class level variable. In the hashCode() method, just return that value. That would not work if your hash code uses mutable fields, however.



    Can you explain the role of super.hashCode() in the code above? I don't quite understand why the the hash code of a subclass should depend on its parent.
     
    Anonymous
    Ranch Hand
    Posts: 18944
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    john,

    i agree with you on all points. having spent some time testing the cache performance now that our hachCode implementations have been code-generated and injected into all those "query" classes, the increased speed is negligible.

    there were a number of discussions in our shop about including calls to super in both equals and in hashCode. in the latter case, it seemed (and still seems) necessary given the nature of our classes -- but, right to your point, it really is a "so what" if AccountQuery and BuildingQuery can, under certain circumstances, both return 629 (or whatever other identical hash value).

    the real problem is that, in trying to stay in lockstep with what we did in hashCode, calling super in equals proves to be outright wrong. the implementation given in the example, with its calls to super, in fact breaks the symmetry portion of the equals contract.

    so, while seeding the hash calculation with clasaname is not detrimental, and can be argued as an acceptable thing, both for our code and others, the use of calls to super in both hashCode and equals (especially in equals) is wrong-headed.

    our panic to avoid equal values for non-equivalent instances lead us off the better path. fortunately, we have a code generator to redo the implementations for a song.

    thanks again for your explanations.
     
    • Post Reply Bookmark Topic Watch Topic
    • New Topic
    Boost this thread!