(Level: intermediate)

After reading this article, and Item 8 of Effective Java - and also spending quite a lot of time writing the FirstClasses page - I've latterly come to the conclusion that the designers of Java may have made a mistake when they wrote the equals() and hashCode() methods of Object.

Why? Because they assume that objects are unequal unless they're the same object.

And why is that wrong? Think about what "equals" means.

From the docs for Object.equals() itself:

The equals method implements an equivalence relation on non-null object references...

.

"Equivalence", note; not identity. Indeed, the implementations in Object are simply pieces of logic that can already be done in other ways (the '==' operator and System.identityHashCode()).

And since we have no "contents" by which to verify 'Object equivalence', why should we naturally assume that they're different?
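The identity behaviour of Object's defaults is easy to check for yourself. A minimal sketch (the class and method names are made up for illustration): for a plain Object, equals() gives the same answer as '==', and hashCode() gives the same value as System.identityHashCode().

```java
// Object's defaults are pure identity operations.
class IdentityDefaults {

    // True when equals() agrees with '==' for the given pair.
    static boolean equalsMatchesIdentity(Object a, Object b) {
        return a.equals(b) == (a == b);
    }

    // True when hashCode() agrees with System.identityHashCode().
    static boolean hashMatchesIdentity(Object a) {
        return a.hashCode() == System.identityHashCode(a);
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        System.out.println(equalsMatchesIdentity(a, b)); // true
        System.out.println(equalsMatchesIdentity(a, a)); // true
        System.out.println(hashMatchesIdentity(a));      // true
    }
}
```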

My suspicion: Collections. Specifically: Sets and Maps.

Having Object default to an identity comparison allows us to happily plough an object into any collection and know that it will be there unless an overridden equals() (or hashCode()) method dictates otherwise.

.

However, this creates another, less obvious problem. Namely: you can't use super.equals() consistently. You can use it on subclasses of classes that override equals(), but you can't use it on a subclass of Object itself because you run into that roadblock of identity.

Wouldn't it be nice to have hierarchies of classes that implement equals() the same way they implement constructors? - ie, in chains. That way, you could implement hierarchies that can allow comparison between subtypes (which we will call "mixed-type comparison" from now on), and also - with a few caveats - refine the meaning of "equals" for subgroups. And this is what Martin Odersky's article is all about.

.

This is my solution and (I hope), like most good ones, it's simple; but it's taken me more than a year to work out, so don't assume that the "why" part is simple.

Also: It's by no means perfect. I don't think any "solution" to the equals() problem is. See the caveats section at the end to find out why.

The page is quite long, but most of what you need to use it is contained in the next three sections; the rest is "background stuff".

.

THE SOLUTION  

Simple - Redefine the default behaviour of equals() and hashCode() to make objects equal (in this case, a "qualified equal"), and make users override them consistently.

Now it's far too late to start mucking about with Object, but we can use that old designer jewel, the "layer of indirection" - viz:











and have every hierarchy for which you want to implement "flexible equality" extend Equal instead of Object.

Don't worry too much about the last two methods, canEqual() and compatible(), for now; we'll explain them later. You may have noticed, though, that equals() uses the last one to do its "type" check. For now, Equal objects are "equal" if they both match or subclass the same implementation (more on that later).
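For reference, here is a minimal sketch of what such an Equal class might look like. It is reconstructed from the descriptions on this page (an abstract, protected canEqual(), and a compatible() helper that equals() uses for its type check), so treat it as an assumption rather than the author's actual listing. The constant hash code and the package-private visibility are choices made for this sketch.

```java
// A sketch only: reconstructed from the prose, not copied from the author's implementation.
abstract class Equal {

    /** By default, any two compatible Equal objects are "equal". */
    @Override
    public boolean equals(Object obj) {
        return this == obj || compatible(obj);
    }

    /** Everything is "equal" at this level, so everything shares one hash code (assumed: 0). */
    @Override
    public int hashCode() {
        return 0;
    }

    /** The overridable type check; immediate subclasses must supply it. */
    protected abstract boolean canEqual(Object obj);

    /** Runs the type check from both sides. */
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal
                && this.canEqual(obj)
                && ((Equal) obj).canEqual(this);
    }
}
```

Note that equals(null) comes out false without a separate null test, because `null instanceof Equal` is false.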

.

Note also that the class is abstract. We certainly don't want people instantiating Equal objects without some further qualification.

Also: it relies on users (or "subclassers") following explicit rules. They aren't complicated, but they are required.

This is no different to extending Object, which has several rules you need to follow to make equals() and hashCode() work correctly. Indeed, Equal arguably makes things simpler by not only telling you what to do, but how to do it.

However, you do have to stick to them, so if you can't be bothered to comply, you might as well not bother extending Equal to begin with.

For that reason, I suggest that you always put a reminder note in your class docs to say that "this class extends Equal".

.

I also suggest that you document Equal itself copiously, particularly with regard to overriding. My version (included below) is over 400 lines long, which is possibly overkill, but you do need to ensure that people know exactly how to use it. You should be able to get quite a lot of stuff from this page, but it should be in your own words.

.

THE EXPLANATION  

Sound too good to be true? Well, it isn't; but it does involve a few things that you need to know about.

.

First: The solution is aimed at object hierarchies - specifically: hierarchies of value objects where mixed-type comparisons and refining equals() are desirable - so:

Don't go extending Equal for every class you create.

That said, if you have a value class, and you're not sure if you're going to want to subclass it, you might consider having it extend Equal rather than Object.

You can even change an existing final class that extends Object to extend Equal if you want, providing you also change any existing methods to follow the rules outlined below. This might be particularly worth considering if you plan to remove the final qualifier for some reason.

.

Second: That new canEqual() method: What's that all about?

Well, basically, it's a type check. It ensures that the object (obj) that we're trying to compare can be compared. We don't want to be allowing comparisons of Strings with Dates, for example.

You're probably already used to putting it (or something like it) in your equals() methods anyway; putting it in a separate method simply "decouples" it. And that's very important, because it means we can override it (indeed, immediate subclasses MUST override it, since it's abstract), and that's what provides the power of this solution.

It's probably also worth noting that it's protected. You can make it public if you like, but I'd advise against it.

Good OO programmers hide things that other classes don't need to know about.

So, when you override it - keep it protected.

.

Third:  There are now three methods to consider, and I suggest you treat them like "The Three Musketeers" - All for one, and one for all - with equals() as the "senior partner" of the trio. So:


  • If you decide to override equals(), override all three.
  • If you don't override equals(), leave all three out.   And that implies:
  • If you ever remove equals(), remove the other two as well.


    .

    Fourth:  While Equal does permit mixed-type comparison, and also allows equals() to be refined, it is NOT universal. Specifically, as soon as you override equals() you create a subgroup that is no longer connected to its supertypes for the purposes of equality.

    Thus, if you have a String class that extends Equal, and then add a subclass called ColouredString which overrides equals() to add a colour to the comparison process, ColouredStrings will NOT be comparable to Strings (or vice-versa), much as you might like them to be. The reasons for this are explained later, and they're fairly involved.

    For the moment, more formally: A class in an Equal-based hierarchy is comparable with another class in that hierarchy if, and ONLY if, they both call the same equals() implementation. And this applies not just to simple parent-child relationships, but across sibling branches as well.
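To make the subgroup idea concrete, here is a hedged sketch. Text and ColouredText are hypothetical stand-ins for the String and ColouredString classes mentioned above (renamed to avoid clashing with java.lang.String), Heading and Caption are made-up siblings that override nothing, and the Equal class is an assumed reconstruction from this page.

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Text extends Equal {
    private final String value;
    Text(String value) { this.value = value; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Text; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(value, ((Text) obj).value);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), value); }
}

// Overrides equals(), so it starts a new subgroup.
class ColouredText extends Text {
    private final String colour;
    ColouredText(String value, String colour) { super(value); this.colour = colour; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof ColouredText; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(colour, ((ColouredText) obj).colour);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), colour); }
}

// Siblings that override nothing: both still use Text's equals().
class Heading extends Text { Heading(String value) { super(value); } }
class Caption extends Text { Caption(String value) { super(value); } }

class SubgroupDemo {
    public static void main(String[] args) {
        Text plain = new Text("hi");
        Text red = new ColouredText("hi", "red");
        System.out.println(plain.equals(red));                            // false
        System.out.println(red.equals(plain));                            // false
        System.out.println(plain.equals(new Heading("hi")));              // true
        System.out.println(new Heading("hi").equals(new Caption("hi"))); // true
    }
}
```

Heading and Caption are comparable with each other and with Text because all three end up in Text's equals(); ColouredText is cut off from them in both directions.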

    .

    Fifth: Identity comparisons (ie, ones similar to those provided by Object) are generally inconsistent with Equal. You can apply them to a subclass, but they should then be regarded as final. Indeed, the Identity Comparisons section shows you exactly how to do this.

    .

    THE RULES FOR OVERRIDING  

    I can't stress enough how important these are. Equal provides a framework for "flexible comparison", but it won't make a scrap of difference if you don't follow the rules; so:

    Read them carefully, and don't forget them.


    Luckily, they're pretty simple. You already saw the first one above, but I'm going to repeat it again for emphasis:

    .

    Rule 1  There are now three methods: equals(), canEqual(), and hashCode(), and you should treat them like "The Three Musketeers":


  • If you decide to override equals(), override all three.
  • If you don't override equals(), leave all three out.   And that implies:
  • If you ever remove equals(), remove the other two as well.


    And it's particularly important for those first two. Just think of equals() and canEqual() as "joined at the hip".

    .

    Rule 2  Immediate subclasses of Equal MUST override all three methods.

    If you think about it, it just makes sense. Equal basically makes all Equal objects "equal", so you need to establish some baseline for your hierarchy.

    The fact that canEqual() is abstract hopefully acts as a reminder, because it enforces it for that method, but you should also override the other two.

    .

    Rule 3  Overriding canEqual().

    This one's really simple: For a subclass Sub it MUST look precisely as follows:



    and that instanceof is essential. Don't try anything "clever" like class comparisons or reflective tests like isInstance(), because they won't work.

    Specifically:


  • The name after instanceof MUST be the name of the overriding class.
  • The method should NOT be made public.
  • As with equals(), the type of the 'obj' parameter is Object, NOT Sub. Adding the '@Override' annotation prevents this very common error, since the compiler will complain if you get it wrong.


    Ie, the method should be coded exactly as you see.
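Going by the three points above, a canEqual() override for Sub would look roughly like this. It is a sketch: the Equal class shown is an assumed reconstruction from this page, and Sub's equals() and hashCode() overrides are left out here only to keep the focus on this one method.

```java
// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Sub extends Equal {
    // Still protected, parameter type is Object, and instanceof names this very class.
    @Override
    protected boolean canEqual(Object obj) {
        return obj instanceof Sub;
    }
    // equals() and hashCode() omitted from this sketch; the rules require them as well.
}
```

If the parameter were declared as Sub instead of Object, the method would be an overload rather than an override, and the '@Override' annotation would turn that mistake into a compile error.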

    .

    Rule 4  Overriding equals().

    Basically, all equals() methods should look the same. And that means as follows for a subclass Sub:









    Obviously, that last part will be different, but the stuff before it should be identical.

    The '@SuppressWarnings' annotation is there to prevent a compiler warning, but don't worry: providing you always follow the pattern, the assignment WILL work.

    In some cases it's possible that the only thing that is added is type. One could imagine this happening, for example, with an Employee subclass of a Person. In such a case, there will be no "extra fields" to compare, so the assignment is unnecessary; but the super.equals(obj) test IS still required.
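A sketch of the equals() pattern as described, assuming an Equal base class along the lines of this page (reconstructed, not the author's code). Sub's field and the TypeOnlySub class are hypothetical. The '@SuppressWarnings' annotation mentioned above is left out because the text doesn't say which warning it targets, and the plain cast used here compiles without one.

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Sub extends Equal {
    private final String name;
    Sub(String name) { this.name = name; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Sub; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;             // identical, therefore equal
        if (!super.equals(obj)) return false;     // chain upwards before anything else
        Sub other = (Sub) obj;                    // safe once super.equals() has passed
        return Objects.equals(name, other.name);  // the only part that differs per class
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), name); }
}

// Type-only refinement: nothing new to compare, so no cast, but super.equals() is still called.
class TypeOnlySub extends Sub {
    TypeOnlySub(String name) { super(name); }

    @Override protected boolean canEqual(Object obj) { return obj instanceof TypeOnlySub; }

    @Override public boolean equals(Object obj) { return this == obj || super.equals(obj); }

    @Override public int hashCode() { return Objects.hash(super.hashCode()); }
}
```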

    .

    Rule 5  Overriding hashCode().

    As with the others: All hashCode() methods should look the same. And again - assuming you're on Java 7 or later, which is where Objects.hash() first appeared - that means precisely as follows:



    where



    denotes the fields used for comparison in equals(); and they should always be preceded by super.hashCode(). You are allowed to cache the value if you like, but the calculation itself should look precisely as above.

    As stated above, it's possible in some cases that the only thing that is changed is type. In those cases, you could just have the method return super.hashCode(); but to be honest, I'd stick to the pattern - ie, return Objects.hash(super.hashCode()).

    Whatever you decide, it must be overridden.

    And if you're not on Java 7? If you're on Java 5 or later, you can use:



    And if you're not on Java 5? Write a utility method that does something similar to Objects.hash() and use that.
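A sketch of both hashCode() variants for a hypothetical Sub with two fields. The Equal class is an assumed reconstruction from this page. The pre-Java 7 form shown is an assumption too: Arrays.hashCode(Object[]) has been available since Java 5, and Objects.hash() is documented as computing its result as if by that method, so the two agree.

```java
import java.util.Arrays;
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Sub extends Equal {
    private final String name;
    private final int size;
    Sub(String name, int size) { this.name = name; this.size = size; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Sub; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        Sub other = (Sub) obj;
        return size == other.size && Objects.equals(name, other.name);
    }

    // Java 7+: super.hashCode() first, then exactly the fields compared in equals().
    @Override public int hashCode() {
        return Objects.hash(super.hashCode(), name, size);
    }

    // Assumed Java 5/6 equivalent (illustrative helper, not part of the pattern itself).
    int hashCodeJava5Style() {
        return Arrays.hashCode(new Object[] { super.hashCode(), name, size });
    }
}
```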

    But the main question should probably be: Why aren't you?

    .

    Just one further point: As with subclasses of Object, it's perfectly reasonable to say that you don't want equals() overridden any further. In that case, simply add the final qualifier to all 3 methods.

    .

    WHY DO WE NEED IT?  

    Simply put: To stop the spread of that appalling pattern known as "class-based equals() methods".

    When you have a hierarchy, chances are you will want to refine the meaning of equals() for subclasses. Unfortunately, this is not easy to achieve (more on that later). Indeed, Odersky's article cites a 2007 paper which concluded that:

    "Almost all implementations of equals() methods are faulty".

    In reaction (I suspect) to this, many books have advocated "class-based" equals() methods, and if you've spent any time reading code you will almost certainly have seen one. One of the first checks they usually do is:



    Which basically says: "an object can't be equal to ours if it isn't the same class as ours."

    Ooof.

    It's certainly simple. And it certainly allows equals() methods to be overridden. But it does it at the expense of making each class a stand-alone type for the purposes of equality. If I create an anonymous subclass that doesn't even override equals(), that object will still be different to an instance of its parent, even if everything else about them is the same.
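The anonymous-subclass problem is easy to reproduce. A minimal sketch with a made-up CbemPoint class that uses a class-based equals():

```java
import java.util.Objects;

// Hypothetical value class using a class-based equals() method.
class CbemPoint {
    private final int x;
    private final int y;
    CbemPoint(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false; // the "class test"
        CbemPoint other = (CbemPoint) obj;
        return x == other.x && y == other.y;
    }

    @Override public int hashCode() { return Objects.hash(x, y); }
}

class CbemDemo {
    public static void main(String[] args) {
        CbemPoint plain = new CbemPoint(1, 2);
        CbemPoint anon = new CbemPoint(1, 2) { };   // anonymous subclass, overrides nothing

        System.out.println(plain.equals(new CbemPoint(1, 2))); // true
        System.out.println(plain.equals(anon));                // false
        System.out.println(anon.equals(plain));                // false
    }
}
```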

    Ugh. Did I sign up for that? NO.

    And furthermore, many texts that advocate it don't even warn you about the consequences. I have also yet to see one that suggests you include the class's hashcode in your hashCode() calculation.

    It's a "solution" that blindly trashes "objectivity" for simplicity; and in my opinion it should never have been allowed to see the light of day.

    Extending Equal, on the other hand, doesn't require that draconian "class test". Indeed, subclasses are equal to their parents by default. Isn't that what you'd expect? And yet you can still override equals(), as long as you follow a few basic rules.

    .

    REFINE vs REDEFINE  

    You may have noticed that I've been careful to use the word "refine" when referring to overriding equals(), rather than "change" or "redefine".

    The reason is that hierarchies, in the main, exist to allow specialization:

    String to ColouredString, Person to Employee, Animal to Dog, etc.

    which means that changes to equals() comparisons are usually "additive". A subclass introduces some new value (sometimes just its type) that is needed to refine the notion of equality.

    Equal itself starts out life treating all Equal objects as "equal"; and the rules you've seen so far require that overridden equals() methods call super.equals(). This pretty much forces them to be "additive".

    So remember:

    Equal is generally inconsistent with hierarchies that need to redefine the meaning of equals() for subclasses - with one major exception: identity comparisons.

    .

    However, since life tends to throw up these little things, here's how to code equals() for a subclass called 'Disjoint' that truly needs to redefine the notion of "equal":









    Not too terrible, and still pretty consistent with what you've seen before. And maybe now you understand why that compatible() method is there.


    And since it now defines a new baseline for equality, if you need to refine this method, you can go back to the style you saw earlier.

    I would suggest using it sparingly though.

    It's probably also worth mentioning that, since you're redefining equals(), your hashCode() method should calculate a new code from scratch. canEqual(), however, should look exactly as before.
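A hedged sketch of how such a Disjoint class could be written. It assumes the redefining equals() calls compatible() directly instead of super.equals(), which is a guess based on the remark about compatible() above. The Equal class is likewise reconstructed from this page, and Base, its id field and Disjoint's label field are invented for illustration.

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Base extends Equal {
    private final int id;
    Base(int id) { this.id = id; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Base; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return id == ((Base) obj).id;
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), id); }
}

// Redefines equality: only the label counts, Base's id is ignored.
class Disjoint extends Base {
    private final String label;
    Disjoint(int id, String label) { super(id); this.label = label; }

    // Exactly the same shape as for any other subclass.
    @Override protected boolean canEqual(Object obj) { return obj instanceof Disjoint; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!compatible(obj)) return false;   // type check only; skips Base's field comparison
        Disjoint other = (Disjoint) obj;
        return Objects.equals(label, other.label);
    }

    // Calculated from scratch: no super.hashCode() here.
    @Override public int hashCode() { return Objects.hash(label); }
}
```

Two Disjoint objects with the same label are then equal even when their ids differ, while a Disjoint and a plain Base are never equal in either direction.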

    .

    Incidentally, you may think that making subgroups separate whenever you override equals() is inconsistent with the idea of an "additive" method; and you'd be right:

    In an ideal world, we would be able to compare classes any way we like. Unfortunately, we can't, because it's outlawed by something called the "equals() contract", explained in the next section.

    Sometimes life just isn't simple.


    .

    THE 'equals()' CONTRACT  

    Before we can explain how Equal works, we first need to understand why it's needed, and in order to do that you need to understand the equals() contract, which is fully described in the Object.equals() docs. I don't propose to go into it in great detail, save to say that there are two rules that are especially worthy of note:


  • It must be symmetric: ie, x.equals(y) == y.equals(x).
  • It must be transitive: if x.equals(y) and y.equals(z), then x.equals(z).


    and they are NOT simple to guarantee - indeed, they can't be guaranteed - because they may involve implementations from different classes, at least one of which you may not have written.

    The best you can do (and Equal does) is to follow the contract from your side.

    Specifically, Equal.equals() requires that the object (obj) being compared with this - which extends Equal by definition - also extends Equal. This is a very good basis for guaranteeing the 'symmetric' rule, because we now know that both objects are subtypes of Equal and can therefore use its methods to check the relationship from both sides.

    And this is where the rules you saw earlier are so important. If you follow them, the way Equal is coded actually guarantees both of the above rules at once - as you'll see...

    .

    SO...HOW DOES IT WORK?  

    Actually, you already know how it works: Implement an Equal class, extend it, and follow the rules for overriding.

    What you really want to know is why it works, and that's a lot more complex. So before you continue, put your thinking cap on.


    .

    Our rules require that:


  • All overridden equals() methods call super.equals() before any other comparison (except for the 'this == obj' test, which is simply to eliminate the case where objects are identical - and therefore "equal" - as quickly as possible).
  • The methods work as a "trio", so canEqual() is overridden when, and only when, equals() is.


    This means that, unless both objects are identical, equals() logic must first bubble up to Equal.equals(), and that calls canEqual() in both directions via its call to compatible(). In essence, it tests:



    Therefore, given two objects - o1 and o2 - and the comparison o1.equals(o2): both o1.canEqual(o2) and o2.canEqual(o1) have to be 'true' in order for it to return 'true'.
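Here is a small sketch of why the check has to run in both directions. Parent and Child are hypothetical classes that override only canEqual() (their equals() and hashCode() overrides are left out to keep the example short), and the Equal class is an assumed reconstruction from this page.

```java
// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        // The two-way test: this.canEqual(obj) AND obj.canEqual(this)
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Parent extends Equal {
    @Override protected boolean canEqual(Object obj) { return obj instanceof Parent; }
}

class Child extends Parent {
    @Override protected boolean canEqual(Object obj) { return obj instanceof Child; }
}

class TwoWayCheckDemo {
    public static void main(String[] args) {
        Parent p = new Parent();
        Child c = new Child();

        System.out.println(p.canEqual(c));          // true:  a Child is a Parent
        System.out.println(c.canEqual(p));          // false: a Parent is not a Child
        System.out.println(p.equals(c));            // false: one direction failed
        System.out.println(c.equals(p));            // false: same answer from the other side
        System.out.println(c.equals(new Child()));  // true:  both directions pass
    }
}
```

A one-way check would have let p.equals(c) succeed while c.equals(p) failed, which is exactly the asymmetry the contract forbids.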

    .

    Now that doesn't necessarily mean that o1 and o2 have to be the same type (canEqual() is an instanceof check, remember), but they must both be subclasses of - or the same class as - the class that canEqual() is called on.

    And the only way for that to happen is if they both call the same implementation.

    It takes a bit to get your head around, but trust me, it's true. Basically, you have a hierarchy of methods (our "Three Musketeers") superimposed on top of a hierarchy of classes. Neither o1 nor o2 is obliged to implement canEqual() itself (unless it's a direct subclass of Equal) but, if it doesn't, it will call the implementation of the closest superclass that does.

    And by extension, since equals() and canEqual() work in tandem, that means that they must both be calling the same equals() method.

    And incidentally, that's why the



    assignment is guaranteed to work, providing you follow the rules.

    .

    Phew.

    So where does all this get us? Well, since we now know that both o1 and o2 call the same equals() method, it stands to reason that o1.equals(o2) will return the same result as o2.equals(o1) - unless you're doing something very screwy in your field comparisons.

    And if we add a third object o3, and o2.equals(o3), we know that o3 must be calling that same equals() method as well.

    From there it's easy to extrapolate that if o1.equals(o2) and o2.equals(o3), then o1.equals(o3) must be 'true', since all three objects will be using the same method.

    Therefore: the comparison is both symmetric and transitive.

    .

    IDENTITY COMPARISONS  

    As stated earlier, these are generally incompatible with Equal-based hierarchies, but it's just possible that you might need one for one of your subclasses. If so, this is what you need to do (in this case we will call the subclass Unique):







    Pretty simple, eh?

    The main things to note are those final qualifiers. Identity comparisons are the most discriminating, so there's really no point in allowing any further refinement. There's also no point in having equals() or hashCode() conform to the patterns you saw earlier because it's ... well ... pointless.

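    Unique's own equals() never calls canEqual(), so you might think you don't need to override it; but I strongly suggest you put it in. If Unique has a superclass that overrides equals(), that method still reaches canEqual() through compatible() when a Unique object is passed to it, and without the override parent.equals(unique) could return 'true' while unique.equals(parent) returns 'false'. It also ensures that the class will continue to work if you ever change those other methods later on.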
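A sketch of such a Unique class, assuming an Equal base class reconstructed from this page and a hypothetical superclass Named that follows the usual rules:

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Named extends Equal {
    private final String name;
    Named(String name) { this.name = name; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Named; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(name, ((Named) obj).name);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), name); }
}

// Identity comparison: all three methods are final.
class Unique extends Named {
    Unique(String name) { super(name); }

    // Keeps Named.equals(unique) false, so the relation stays symmetric.
    @Override protected final boolean canEqual(Object obj) { return obj instanceof Unique; }

    @Override public final boolean equals(Object obj) { return this == obj; }

    @Override public final int hashCode() { return System.identityHashCode(this); }
}
```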

    .

    DRAWBACKS  

    Obviously, I wouldn't have written the class (or this page) if I didn't think Equal had value, especially against its main rival pattern in this arena: the "class-based" equals() method (CBEM).

    However, CBEM does have one great merit: Simplicity. Once you add that class equality test, the method is bulletproof. It will work (unless you're a complete moron) and it will conform to the equals() contract, no matter how many times you override it.

    Equal requires that its subclasses follow rules. They're pretty simple, but if you fail to follow them, even by a little bit, you risk your equals() method not behaving as it should. And if you're just writing a subclass to an Equal-based hierarchy, you rely on every superclass also following the rules.

    The trouble is that there's simply no way to enforce them. You can document all you want, but if someone decides to be "clever", or feels that following rules stifles their "creativity", they can introduce subtle errors that may be difficult to detect.

    I may be naive, but I like to think that programmers are reasonably intelligent, and so don't need a solution that caters to the lowest common denominator. Particularly when - as in the case of CBEM - that solution is patently inferior.

    There is, however, one situation where CBEM may be preferable: a hierarchy where type is the principal determining factor of inequality.

    One can imagine, for example, an Animal hierarchy, where no subspecies is ever equal to any other, regardless of similarity. If you were to base that hierarchy on Equal, you'd have to override equals() for every subclass. For a CBEM-style hierarchy, you might only have to override it when something significant changes.

    However, I can also imagine something like this:









    Animal could now extend ClassBased and, providing you follow the rules (apart from canEqual(), which is now final), its subclasses will work just like any normal CBEM hierarchy.

    Indeed, my Equal implementation (included below) contains just such a class; although I must admit, I haven't used it in anger...YET.
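One way a ClassBased layer like this could be written. This is a guess, not the author's class: it assumes canEqual() is made final and compares getClass(), and that the class's own hash is mixed into hashCode(). Equal is reconstructed from this page, and Animal, Dog and Cat are hypothetical.

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

// Assumed shape: the type check becomes an exact class match and cannot be overridden.
abstract class ClassBased extends Equal {
    @Override protected final boolean canEqual(Object obj) {
        return obj != null && obj.getClass() == getClass();
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), getClass()); }
}

class Animal extends ClassBased {
    private final String name;
    Animal(String name) { this.name = name; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(name, ((Animal) obj).name);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), name); }
}

// No overrides needed: the class test alone keeps the species apart.
class Dog extends Animal { Dog(String name) { super(name); } }
class Cat extends Animal { Cat(String name) { super(name); } }
```

With this in place, new Dog("rex") equals another new Dog("rex") but never a new Cat("rex") or a plain new Animal("rex").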


    .

    Another little wrinkle  

    There is an optimization you can add to equals() that may be worth considering:









    It's based on the premise that hashcode calculations are usually faster than equality checks - particularly if their results are cached - and that objects tend to be unequal more often than not. However, it does require that you adhere strictly to the Three Musketeers rule:

    If you override equals() you MUST override hashCode(), AND vice-versa.

    Normally, I wouldn't advise "nano" optimizations like this; but equals() is a bit of a special case, since it tends to be heavily used by Java collections and in common operations like sorts and searches.
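A sketch of how that hash pre-check might slot into the usual pattern. The exact placement is an assumption, the Equal class is reconstructed from this page, and Sub with its cached hash is hypothetical (caching is safe here only because every field is final).

```java
import java.util.Objects;

// Assumed sketch of Equal, reconstructed from this page (not the author's code).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 0; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

class Sub extends Equal {
    private final String name;
    private final int hash;   // cached once, since name never changes

    Sub(String name) {
        this.name = name;
        this.hash = Objects.hash(super.hashCode(), name);
    }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Sub; }

    @Override public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;            // also rules out null
        if (hashCode() != obj.hashCode()) return false;  // cheap rejection before field checks
        return Objects.equals(name, ((Sub) obj).name);
    }

    @Override public int hashCode() { return hash; }
}
```

The shortcut is only sound if hashCode() is derived from exactly the fields equals() compares; otherwise two equal objects could be rejected because their hashes differ.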

    .

    CAVEATS (there's always a 'but')  

    A class that's a direct subclass of Equal - providing it follows the rules - is fine. It's no different to any other class that extends Object. And providing nothing else in its hierarchy overrides equals(), its subclasses can be freely and mutually compared.

    However, changing an equals() method as we go down a hierarchy - even if it's just for the purposes of refining, and no matter how reasonable it may seem - is arguably a violation of something called the Liskov Substitution Principle (LSP).

    .

    What LSP says (essentially) is that if you have a class C, and a subclass S of C, then S must be substitutable for C in all situations. It's basically a formal declaration of the fact that S "is-a" C.

    And that's precisely what Equal breaks. As explained earlier, if we override equals() for some subclass S, we're saying: "we're redefining equals() for S's portion of the hierarchy" - so, for the purposes of equals(), S is no longer a subclass of C; it's now the head of its own isolated subgroup.

    The trouble is: Java doesn't know that, and will happily continue to assume that S is a subclass of C. And moreover, there's no way of telling it otherwise. So, if you have a Set<C> you may be able to mix S and non-S objects in it in ways that you didn't intend.

    .

    Unfortunately, it's basically forced on us by the equals() contract. There is simply no easy way to allow mixed-type comparison between objects that have different ways of determining equality without breaking the symmetric or transitive rules. Equal does the best it can, and also keeps the process very simple.

    It's also far less LSP-hostile than "class-based" equals() methods which require both objects to be the same class in order to be "equal". They violate LSP by definition.

    Equal objects also conform, in the main, to another well-known axiom: the Principle of Least Surprise. They are "equal" by default unless you override equals(), in which case you create a subgroup whose types are all equal, unless/until you override it again...

    .

    It's also not really Equal's fault that they can produce odd results in collections. If the Java Collections Framework had a Discriminator interface for equality, in the same way that we have Comparators for ordering, we could tell a Set or Map how to determine whether or not an object exists.

    However, it doesn't (at least for the moment), so you're stuck with the world as it is.

    .

    If you document well, you should be able to avoid situations like the one above; and if you find what Equal offers useful, you should use it.

    Just be warned that LSP dangers lurk for the unaware...


    .

    That's about it really: A basis for flexible, "objective" equals() methods. Use it, and enjoy.




    My Equal implementation  

    I include it merely to illustrate just how much documentation you need to write for a "library" class like this. Without it, the class would be 51 lines long; with it, it's over 450.

    You will notice that much of it repeats what you've already read; so if you do decide to use the class, you may well want to re-write some of it in your own words, or even prune places where you think it's excessive.

    You may also notice that the code isn't exactly as you saw earlier, but most of the changes are just internal normalizations to cater for the ByClass variant; the effect of each method is precisely as you've seen.

    And, as with any software, it's provided with no guarantees; so if you decide to copy and paste it, make sure you test it WELL.





