(Level: intermediate)
After reading
this article, and Item 8
of
Effective Java - and also spending quite a lot of time writing the
FirstClasses page - I've latterly come to the conclusion that the designers of
Java may have made a mistake when they wrote the
equals() and
hashCode() methods of
Object.
Why? Because they assume that objects are
unequal unless they're
the same object.
And why is that wrong? Think about what "equals"
means.
From the docs for
Object.equals() itself:
"The equals method implements an equivalence
relation on non-null object references..."
.
Note: "equivalence", not
identity. Indeed, the implementations for
Object are simply pieces of logic that can already be done in other ways (the
'==' operator and
System.identityHashCode()).
And since we have no "contents" by which to verify 'Object equivalence', why should we naturally assume that they're
different?
My suspicion: Collections. Specifically:
Sets and
Maps.
Having
Object default to an
identity comparison allows us to happily plough an object into
any collection and know that it will be there unless an
overridden equals() (or
hashCode()) method dictates otherwise.
.
However, this creates another, less obvious problem. Namely: you can't use
super.equals() consistently. You can use it on subclasses of classes that
override equals(), but you
can't use it on a subclass of
Object itself because you run into that roadblock of identity.
Wouldn't it be nice to have hierarchies of classes that implement
equals() the same way they implement constructors? - ie, in
chains. That way, you could implement hierarchies that
can allow comparison between subtypes (which we will call "mixed-type comparison" from now on), and also - with a few caveats - refine the
meaning of "equals" for subgroups. And
this is what Martin Odersky's article is all about.
.
This is my solution and (I hope), like most good ones, it's simple; but it's taken me more than a year to work out, so don't assume that the "why" part is simple.
Also: It's by no means perfect. I don't think
any "solution" to the
equals() problem is. See the
caveats section at the end to find out why.
The page is quite long, but most of what you need to
use it is contained in the next three sections; the rest is "background stuff".
.
THE SOLUTION
Simple - Redefine the default behaviour of
equals() and
hashCode() to make objects
equal (in this case, a "qualified equal"), and make users override them
consistently.
Now it's far too late to start mucking about with
Object, but we
can use that old designer jewel, the "layer of indirection" - viz:
and have every hierarchy for which you want to implement "flexible equality" extend
Equal instead of
Object.
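As a minimal sketch only (the method bodies, the constant hash value, and the package-private visibility are assumptions pieced together from the descriptions on this page, not a definitive listing), such a class might look like:

```java
// Sketch of a base class whose instances are "equal" by default.
abstract class Equal {

    // Equal objects are equal if they are identical, or if each
    // one accepts the other through canEqual().
    @Override
    public boolean equals(Object obj) {
        return this == obj || compatible(obj);
    }

    // Assumption: a constant, so that objects which are equal by default
    // also share a hash code. Subclasses fold their own fields into it.
    @Override
    public int hashCode() {
        return 17;
    }

    // The overridable type check: each subclass says what it can be compared with.
    protected abstract boolean canEqual(Object obj);

    // Runs the type check in both directions; null and non-Equal objects fail.
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal
            && canEqual(obj)
            && ((Equal) obj).canEqual(this);
    }
}
```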
Don't worry too much about the last two methods for now, we'll explain them later; although you may have noticed that
equals() uses the last one to do its "type" check. For now,
Equal objects are "equal" if they
both match or subclass the same
implementation (more on that later).
.
Note also that the class is
abstract. We certainly don't want people instantiating
Equal objects without some further qualification.
Also: it relies on users (or "subclassers")
following explicit rules. They aren't complicated, but they
are required.
This is no different to extending
Object, which has several rules you need to follow to make
equals() and
hashCode() work
correctly. Indeed,
Equal arguably makes things simpler by not only telling you
what to do, but
how to do it.
However, you
do have to stick to them, so if you can't be bothered to comply, you might as well not bother extending
Equal to begin with.
For that reason, I suggest that you always put a reminder note in your class docs to say that "this class extends
Equal".
.
I also suggest that you document
Equal itself
copiously, particularly with regard to
overriding. My version (included
below) is over 400 lines long, which is possibly overkill, but you do need to ensure that people know
exactly how to use it.
You should be able to get quite a lot of stuff from this page, but it should be in your own words.
.
THE EXPLANATION
Sound too good to be true? Well, it isn't; but it does involve a few things that you need to know about.
.
First: The solution is aimed at object
hierarchies - specifically: hierarchies of
value objects where mixed-type comparisons and refining
equals() is desirable - so:
Don't go extending Equal for every class you create.
That said, if you have a value class, and you're not sure if you're going to want to subclass it, you might consider having it extend
Equal rather than
Object.
You can even change an
existing final class that extends
Object to extend
Equal if you want,
providing you also change any existing methods to follow the rules outlined below. This might be particularly worth considering if you plan to
remove the
final qualifier for some reason.
.
Second: That new
canEqual() method: What's that all about?
Well, basically, it's a type check. It ensures that the object (
obj) that we're trying to compare
can be compared. We don't want to be allowing comparisons of Strings with Dates, for example.
You're probably already used to putting it (or something like it) in your
equals() methods anyway; putting it in a separate method simply "decouples" it. And that's
very important, because it means we can override it (indeed, immediate subclasses MUST override it, since it's
abstract), and
that's what provides the power of this solution.
It's probably also worth noting that it's
protected. You can make it
public if you like, but I'd advise against it.
Good OO programmers hide things that other classes don't need to know about.
So, when you override it -
keep it protected.
.
Third:
There are now
three methods to consider, and I suggest you treat them like "The Three Musketeers" -
All for one, and one for all - with
equals() as the "senior partner" of the trio. So:
If you decide to override equals(), override all three.
If you don't override equals(), leave all three out.
And that implies: if you ever remove equals(), remove the other two as well.
.
Fourth:
While
Equal does permit mixed-type comparison, and also allows
equals() to be refined, it is NOT universal. Specifically, as soon as you override
equals() you create a subgroup that is
no longer connected to its supertypes for the purposes of equality.
Thus, if you have a
String class that extends
Equal, and then add a subclass called
ColouredString which overrides
equals() to add a colour to the comparison process,
ColouredStrings will NOT be comparable to
Strings (or vice-versa), much as you might like them to be. The reasons for this are explained later, and they're fairly involved.
For the moment, more formally: A class in an
Equal-based hierarchy is comparable with another class in that hierarchy if, and ONLY if, they both call
the same equals() implementation. And this applies not just to simple parent-child relationships, but across sibling branches as well.
.
Fifth: Identity comparisons (ie, ones similar to those provided by
Object) are generally
inconsistent with
Equal. You
can apply them to a subclass, but they should then be regarded as
final. Indeed, the
Identity Comparisons section shows you exactly how to do this.
.
THE RULES FOR OVERRIDING
I can't stress enough how important these are.
Equal provides a
framework for "flexible comparison", but it won't make a scrap of difference if you don't follow the rules; so:
Read them carefully, and don't forget them.
Luckily, they're pretty simple. You already saw the
first one above, but I'm going to repeat it for emphasis:
.
Rule 1 There are now
three methods:
equals(),
canEqual(), and
hashCode(), and you should treat them like "The Three Musketeers":
If you decide to override equals(), override all three.
If you don't override equals(), leave all three out.
And that implies: if you ever remove equals(), remove the other two as well.
And it's
particularly important for those first two. Just think of
equals() and
canEqual() as "joined at the hip".
.
Rule 2 Immediate subclasses of
Equal MUST override all three methods.
If you think about it, it just makes sense.
Equal basically makes all
Equal objects "equal", so you need to establish some baseline for your hierarchy.
The fact that
canEqual() is
abstract hopefully acts as a reminder, because it enforces it for that method, but you should
also override the other two.
.
Rule 3 Overriding
canEqual().
This one's really simple: For a subclass
Sub it MUST look
precisely as follows:
and that
instanceof is
essential. Don't try anything "clever" like class comparisons or reflective tests like
isInstance(), because
they won't work.
Specifically:
The name after instanceof MUST be the name of the overriding class.
The method should NOT be made public.
As with equals(), the type of the 'obj' parameter is Object, NOT Sub. Adding the '@Override' annotation prevents this very common error, since the compiler will complain if you get it wrong.
Ie, the method should be coded
exactly as you see.
.
Rule 4 Overriding
equals().
Basically, all
equals() methods should look the same. And that means as follows for a subclass
Sub:
Obviously, that last part will be different, but the stuff before it should be
identical.
The '@SuppressWarnings' annotation is there to prevent a compiler warning, but don't worry: providing you always follow the pattern, the assignment WILL work.
In some cases it's possible that the
only thing that is added is type. One could imagine this happening, for example, with an
Employee subclass of a
Person. In such a case, there will be no "extra fields" to compare, so the assignment is unnecessary; but the
super.equals(obj) test IS still required.
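To make Rules 3 and 4 concrete, here is a sketch of the Person and Employee case just mentioned. The Equal class is a compact, hypothetical stand-in, the 'name' field is invented for illustration, and the cast carries no '@SuppressWarnings' because a plain cast to a non-generic class doesn't produce a warning:

```java
import java.util.Objects;

// Compact stand-in for the Equal base class (hypothetical sketch).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 17; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

// Direct subclass of Equal: sets the baseline for the hierarchy.
class Person extends Equal {
    private final String name;

    Person(String name) { this.name = name; }

    @Override
    protected boolean canEqual(Object obj) {
        return obj instanceof Person;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;   // type check, in both directions
        Person other = (Person) obj;            // safe once super.equals() has passed
        return Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), name);
    }
}

// Adds nothing but its type, so there are no extra fields to compare.
class Employee extends Person {
    Employee(String name) { super(name); }

    @Override
    protected boolean canEqual(Object obj) {
        return obj instanceof Employee;
    }

    @Override
    public boolean equals(Object obj) {
        return this == obj || super.equals(obj);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode());
    }
}
```

With these definitions, a Person equals an anonymous subclass of Person with the same name, but a Person and an Employee with the same name are unequal from both sides, because Employee has overridden the trio and so started its own subgroup.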
.
Rule 5 Overriding
hashCode().
As with the others: All
hashCode() methods should look the same. And again - assuming you're on Java 7 or later - that means
precisely as follows:
where the arguments are the fields used for comparison in
equals(); and they should always be
preceded by
super.hashCode(). You are allowed to cache the value if you like, but the calculation itself should look precisely as above.
As stated above, it's possible in some cases that the
only thing that is changed is type. In those cases, you could just have the method return
super.hashCode(); but to be honest, I'd stick to the pattern - ie, return
Objects.hash(super.hashCode()).
Whatever you decide,
it must be overridden.
And if you're not on Java 7? If you're on Java 5 or later, you can use:
And if you're not on Java 5? Write a utility method that does something similar to
Objects.hash() and use that.
But the main question should probably be:
Why aren't you?
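As a rough illustration of the two calculations: the sketch below assumes that the Java 5 alternative is Arrays.hashCode() over an Object array (my assumption, though Objects.hash() is documented to behave as if its arguments were hashed that way). The class, method, and parameter names are hypothetical; 'superHash' stands in for the value of super.hashCode().

```java
import java.util.Arrays;
import java.util.Objects;

class HashDemo {
    // Java 7 and later: superclass hash first, then the compared fields.
    static int withObjectsHash(int superHash, String name, int age) {
        return Objects.hash(superHash, name, age);
    }

    // Java 5 and later: the same values, in the same order, hashed as an array.
    static int withArraysHashCode(int superHash, String name, int age) {
        return Arrays.hashCode(new Object[] { superHash, name, age });
    }
}
```

Both methods return the same value for the same arguments, so switching between them doesn't change any hash codes.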
.
Just one further point: As with subclasses of
Object, it's perfectly reasonable to say that you
don't want
equals() overridden any further. In that case, simply add the
final qualifier to all 3 methods.
.
WHY DO WE NEED IT?
Simply put: To stop the spread of that
appalling pattern known as "class-based
equals() methods".
When you have a hierarchy, chances are you will want to
refine the meaning of
equals() for subclasses. Unfortunately, this is
not easy to achieve (more on that later). Indeed, Odersky's article cites a 2007 paper which concluded that:
"
Almost all implementations of equals() methods are faulty".
In reaction (I suspect) to this, many books have advocated "class-based"
equals() methods, and if you've spent any time reading code you will almost certainly have seen one. One of the first checks they usually do is:
Which basically says: "an object
can't be equal to ours if it isn't
the same class as ours."
Ooof.
It's certainly simple. And it certainly allows
equals() methods to be overridden. But it does it at the expense of making each class a
stand-alone type for the purposes of equality. If I create an
anonymous subclass that doesn't even
override equals(), that object will still be
different to an instance of its parent, even if everything else about them is the same.
Ugh. Did I sign up for that?
NO.
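To see the effect for yourself, here is a small sketch of a typical class-based equals() (the Point class and the demo method are hypothetical examples, not taken from any particular book):

```java
import java.util.Objects;

// A typical "class-based" equals(): equality requires exactly the same class.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Point other = (Point) obj;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

class CbemDemo {
    // Compares a Point with an anonymous subclass that overrides nothing.
    static boolean parentEqualsAnonymousSubclass() {
        Point plain = new Point(1, 2);
        Point anonymous = new Point(1, 2) { };
        return plain.equals(anonymous);
    }
}
```

CbemDemo.parentEqualsAnonymousSubclass() returns false, even though both objects hold the same values and share the same hashCode().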
And furthermore, many texts that advocate it don't even
warn you about the consequences. I have also yet to see one that suggests you include the
class's hashcode in your
hashCode() calculation.
It's a "solution" that blindly trashes "objectivity" for simplicity; and in my opinion it should never have been allowed to see the light of day.
Extending
Equal, on the other hand,
doesn't require that draconian "class test". Indeed, subclasses are equal to their parents
by default. Isn't that what you'd expect? And yet you can still override
equals(), as long as you follow a few basic rules.
.
REFINE vs REDEFINE
You may have noticed that I've been careful to use the word "refine" when referring to overriding
equals(), rather than "change" or "redefine".
The reason is that hierarchies, in the main, exist to allow
specialization:
String to
ColouredString,
Person to
Employee,
Animal to
Dog, etc.
which means that changes to
equals() comparisons are
usually "additive". A subclass introduces some new value (sometimes just its type) that is needed to
refine the notion of equality.
Equal itself starts out life treating all
Equal objects as "equal"; and the
rules you've seen so far require that
overridden equals() methods call
super.equals(). This pretty much
forces them to be "additive".
So remember:
Equal is generally inconsistent with hierarchies that need to redefine the meaning of equals() for subclasses - with one major exception:
identity comparisons.
.
However, since life tends to throw up these little things, here's how to code
equals() for a subclass called
'Disjoint' that
truly needs to
redefine the notion of "equal":
Not too terrible, and still pretty consistent with what you've seen before. And maybe
now you understand why that
compatible() method is there.
And since it now defines a new
baseline for equality, if you need to refine
this method, you can go back to the style you saw
earlier.
I would suggest using it sparingly though.
It's probably also worth mentioning that, since you're
redefining equals(), your
hashCode() method should calculate a new code from scratch.
canEqual(), however, should look exactly as before.
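Putting those three points together, a redefining subclass might be sketched as follows. The Equal and Base classes are hypothetical stand-ins, and the 'label' and 'id' fields are invented; the point to notice is that Disjoint.equals() calls compatible() directly rather than super.equals(), so the parent's field comparison is skipped:

```java
import java.util.Objects;

// Compact stand-in for the Equal base class (hypothetical sketch).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 17; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

// An ordinary member of the hierarchy, compared by label.
class Base extends Equal {
    private final String label;

    Base(String label) { this.label = label; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Base; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(label, ((Base) obj).label);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), label); }
}

// Redefines equality: only 'id' counts, the inherited label is ignored.
class Disjoint extends Base {
    private final int id;

    Disjoint(String label, int id) { super(label); this.id = id; }

    // Exactly as before.
    @Override protected boolean canEqual(Object obj) { return obj instanceof Disjoint; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!compatible(obj)) return false;   // type check only, no super.equals()
        return id == ((Disjoint) obj).id;
    }

    // Calculated from scratch, without super.hashCode().
    @Override public int hashCode() { return Objects.hash(id); }
}
```

Here two Disjoint objects with the same id are equal whatever their labels, while a Base and a Disjoint are never equal in either direction.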
.
Incidentally, you may think that making subgroups
separate whenever you override
equals() is inconsistent with the idea of an "additive" method; and you'd be right:
In an ideal world, we would be able to compare classes any way we like. Unfortunately, we can't, because it's outlawed by something called the "
equals() contract", explained in the
next section.
Sometimes life just isn't simple.
.
THE 'equals()' CONTRACT
Before we can explain how
Equal works, we first need to understand
why it's needed, and in order to do that you need to understand the
equals() contract, which is fully described in the
Object.equals() docs. I don't propose to go into it in great detail, save to say that there are two rules that are especially worthy of note:
It must be symmetric: ie, x.equals(y) == y.equals(x).
It must be transitive: if x.equals(y) and y.equals(z), then x.equals(z).
and they are NOT simple to guarantee - indeed, they
can't be guaranteed - because they may involve implementations from
different classes - at least one of which,
you may not have written.
The best you can do (and
Equal does) is to follow the contract from
your side.
Specifically,
Equal.equals() requires that the object (
obj) being compared with
this - which extends
Equal by definition -
also extends
Equal. This is a very good basis for guaranteeing the 'symmetric' rule, because we now know that
both objects are subtypes of
Equal and can therefore use
its methods to check the relationship from
both sides.
And this is where the
rules you saw earlier are so important. If you follow them, the way
Equal is coded actually guarantees both of the above rules at once - as you'll see...
.
SO...HOW DOES IT WORK?
Actually, you already know
how it works: Implement an
Equal class, extend it, and follow
the rules for overriding.
What you really want to know is
why it works, and that's a lot more complex. So before you continue,
put your thinking cap on.
.
Our
rules require that:
All overridden equals() methods call super.equals() before any other comparison (except for the 'this == obj' test, which is simply to eliminate the case where objects are identical - and therefore "equal" - as quickly as possible).
The methods work as a "trio", so canEqual() is overridden when, and only when, equals() is.
This means that, unless both objects are identical,
equals() logic must
first bubble up to
Equal.equals(), and
that calls
canEqual() in
both directions via its call to
compatible(). In essence, it tests:
Therefore, given two objects - o1 and o2 - and the comparison
o1.equals(o2):
both o1.canEqual(o2) and o2.canEqual(o1) have to be 'true' in order for it to return 'true'.
.
Now that doesn't necessarily mean that o1 and o2 have to be the
same type (
canEqual() is an
instanceof check, remember), but they must
both be subclasses of - or the same class as - the class that
canEqual() is called on.
And the only way for
that to happen is if they both call
the same implementation.
It takes a bit to get your head around, but trust me, it's true. Basically, you have a hierarchy of methods (our "Three Musketeers") superimposed on top of a hierarchy of
classes. Neither o1 nor o2 is obliged to implement
canEqual() itself (unless it's a direct subclass of
Equal) but, if it doesn't, it will call the implementation of the
closest superclass that does.
And by extension, since
equals() and
canEqual() work in tandem, that means that they must both be calling
the same equals() method.
And incidentally,
that's why the
assignment is guaranteed to work,
providing you follow the rules.
.
Phew.
So where does all this get us? Well, since we now know that both o1 and o2 call
the same equals() method, it stands to reason that
o1.equals(o2) will return the same result as
o2.equals(o1) - unless you're doing something
very screwy in your field comparisons.
And if we add a third object o3, and
o2.equals(o3), we know that o3 must be calling that same
equals() method as well.
From there it's easy to extrapolate that if
o1.equals(o2) and
o2.equals(o3), then
o1.equals(o3) must be 'true', since all three objects will be using
the same method.
Therefore: the comparison
is both symmetric
and transitive.
.
IDENTITY COMPARISONS
As stated earlier, these are generally incompatible with
Equal-based hierarchies, but it's just possible that you might need one for one of your subclasses. If so, this is what you need to do (in this case we will call the subclass
Unique):
Pretty simple, eh?
The main things to note are those
final qualifiers. Identity comparisons are the
most discriminating, so there's really no point in allowing any further refinement. There's also no point in having
equals() or
hashCode() conform to the patterns you saw earlier because it's ... well ... pointless.
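You might think you don't actually need to override
canEqual(), since Unique.equals() never calls it; but a superclass's equals() still calls it (via compatible()) whenever a Unique object is passed in as the argument, so it's needed to keep comparisons symmetric. It also ensures that the class will
continue to work if you ever change those other methods later on.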
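A sketch of what such a class might look like, assuming a compact stand-in for Equal and a hypothetical parent called Base with an invented 'label' field:

```java
import java.util.Objects;

// Compact stand-in for the Equal base class (hypothetical sketch).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 17; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

// An ordinary member of the hierarchy, compared by label.
class Base extends Equal {
    private final String label;

    Base(String label) { this.label = label; }

    @Override protected boolean canEqual(Object obj) { return obj instanceof Base; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;
        return Objects.equals(label, ((Base) obj).label);
    }

    @Override public int hashCode() { return Objects.hash(super.hashCode(), label); }
}

// Identity comparison: every instance is equal only to itself.
class Unique extends Base {
    Unique(String label) { super(label); }

    @Override
    protected final boolean canEqual(Object obj) {
        return obj instanceof Unique;
    }

    @Override
    public final boolean equals(Object obj) {
        return this == obj;
    }

    @Override
    public final int hashCode() {
        return System.identityHashCode(this);
    }
}
```

Two Unique objects with the same label are unequal, and a Base is never equal to a Unique from either side, because Unique.canEqual() rejects anything that isn't a Unique.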
.
DRAWBACKS
Obviously, I wouldn't have written the class (or this page) if I didn't think
Equal had value, especially against its main rival pattern in this arena: the "class-based"
equals() method (CBEM).
However, CBEM does have one great merit: Simplicity. Once you add that class equality test, the method is bulletproof. It will work (unless you're a complete moron) and it
will conform to the
equals() contract, no matter how many times you override it.
Equal requires that its subclasses follow
rules. They're pretty simple, but if you fail to follow them,
even by a little bit, you risk your
equals() method not behaving as it should. And if you're just writing a subclass to an
Equal-based hierarchy,
you rely on every superclass also following the rules.
The trouble is that there's simply no way to enforce them. You can document all you want, but if someone decides to be "clever", or feels that following rules stifles their "creativity", they can introduce subtle errors that may be difficult to detect.
I may be naive, but I like to think that programmers are reasonably intelligent, and so don't need a solution that caters to the lowest common denominator. Particularly when - as in the case of CBEM - that solution is patently
inferior.
There is, however, one situation where CBEM may be preferable: a hierarchy where type is the
principal determining factor of
inequality.
One can imagine, for example, an
Animal hierarchy, where no subspecies is
ever equal to any other, regardless of similarity. If you were to base that hierarchy on
Equal, you'd have to override
equals() for
every subclass. For a CBEM-style hierarchy, you might only have to override it when something significant changes.
However, I can also imagine something like this:
Animal could now extend
ClassBased and, providing you follow
the rules (apart from
canEqual(), which is now
final), its subclasses will work just like any normal CBEM hierarchy.
Indeed, my
Equal implementation (included
below) contains just such a class; although I must admit, I haven't used it in anger...YET.
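For illustration, a ClassBased layer along those lines could be sketched like this. Everything in it is an assumption: the compact Equal stand-in, folding the class into hashCode(), and the Animal, Dog and Cat classes with their invented 'name' field:

```java
import java.util.Objects;

// Compact stand-in for the Equal base class (hypothetical sketch).
abstract class Equal {
    @Override public boolean equals(Object obj) { return this == obj || compatible(obj); }
    @Override public int hashCode() { return 17; }
    protected abstract boolean canEqual(Object obj);
    protected final boolean compatible(Object obj) {
        return obj instanceof Equal && canEqual(obj) && ((Equal) obj).canEqual(this);
    }
}

// Fixes the type check for the whole hierarchy: same class or nothing.
abstract class ClassBased extends Equal {
    @Override
    protected final boolean canEqual(Object obj) {
        return obj != null && obj.getClass() == getClass();
    }

    // Assumption: the class itself takes part in the hash.
    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), getClass());
    }
}

class Animal extends ClassBased {
    private final String name;

    Animal(String name) { this.name = name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!super.equals(obj)) return false;   // passes only for the same class
        return Objects.equals(name, ((Animal) obj).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(super.hashCode(), name);
    }
}

// Neither subclass overrides anything, yet each is its own type for equality.
class Dog extends Animal { Dog(String name) { super(name); } }
class Cat extends Animal { Cat(String name) { super(name); } }
```

With this in place, two Dogs with the same name are equal, but a Dog is never equal to a Cat or to a plain Animal, and nothing below Animal has had to override a method.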
.
Another little wrinkle
There is an optimization you can add to
equals() that may be worth considering:
It's based on the premise that hashcode calculations are usually faster than equality checks - particularly if their results are cached - and that objects tend to be
unequal more often than not. However, it
does require that you adhere strictly to the
Three Musketeers rule:
If you override equals() you MUST override hashCode(), AND vice-versa.
Normally, I wouldn't advise "nano" optimizations like this; but
equals() is a bit of a special case, since it tends to be heavily used by Java collections and in common operations like sorts and searches.
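The idea can be sketched on a plain final class, to keep the example short; in an Equal-based class the extra check would presumably sit right after the super.equals(obj) test, which is my assumption. The Name class and its fields are hypothetical:

```java
import java.util.Objects;

final class Name {
    private final String first;
    private final String last;
    private int hash;   // cached hash code; 0 means "not calculated yet"

    Name(String first, String last) {
        this.first = Objects.requireNonNull(first);
        this.last = Objects.requireNonNull(last);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Name)) return false;
        // Cheap early exit: different hash codes mean the objects can't be equal.
        if (hashCode() != obj.hashCode()) return false;
        Name other = (Name) obj;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {               // calculate once, then reuse
            h = Objects.hash(first, last);
            hash = h;
        }
        return h;
    }
}
```

The early exit is only safe because hashCode() is built from exactly the fields that equals() compares; if the two ever drift apart, equal objects could be reported as unequal.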
.
CAVEATS (there's always a 'but')
A class that's a direct subclass of
Equal - providing it follows
the rules - is fine. It's no different to any other class that extends
Object. And providing nothing else in its hierarchy overrides
equals(), its subclasses can be freely and mutually compared.
However,
changing an
equals() method as we go down a hierarchy - even if it's just for the purposes of refining, and no matter how reasonable it may seem - is arguably a violation of something called the
Liskov Substitution Principle (LSP).
.
What LSP says (essentially) is that if you have a class C, and a subclass S of C, then S
must be substitutable for C
in all situations. It's basically a formal declaration of the fact that S "is-a" C.
And that's
precisely what
Equal breaks. As explained earlier, if we override
equals() for some subclass S, we're saying: "we're redefining
equals() for S's portion of the hierarchy" - so, for the purposes of
equals(), S is
no longer a subclass of C; it's now the head of its own isolated subgroup.
The trouble is:
Java doesn't know that, and will happily continue to assume that S
is a subclass of C. And moreover, there's no way of telling it otherwise. So, if you have a
Set<C> you may be able to mix S and non-S objects in it in ways that you didn't intend.
.
Unfortunately, it's basically forced on us by the
equals() contract. There is simply no easy way to allow mixed-type comparison between objects that have
different ways of determining equality without breaking the symmetric or transitive rules.
Equal does the best it can, and also keeps the process very simple.
It's also far
less LSP-hostile than "class-based"
equals() methods which require both objects to be
the same class in order to be "equal". They violate LSP
by definition.
Equal objects also conform, in the main, to another well-known axiom: the
Principle of Least Surprise. They are "equal" by default
unless you override
equals(), in which case you create a subgroup whose types are all equal, unless/until you override it
again...
.
It's also not really
Equal's fault that they can produce odd results in collections. If the Java Collections Framework had a
Discriminator interface for equality, in the same way that we have
Comparators for ordering, we could
tell a
Set or
Map how to determine whether or not an object exists.
However, it doesn't (at least for the moment), so you're stuck with the world as it is.
.
If you document well, you should be able to avoid situations like the one above; and if you find what
Equal offers useful, you should use it.
Just be warned that LSP dangers lurk for the unaware...
.
That's about it really: A basis for flexible, "objective"
equals() methods. Use it, and enjoy.
My Equal implementation
I include it merely to illustrate just how much documentation you need to write for a "library" class like this. Without it, the class would be 51 lines long; with it, it's over 450.
You will notice that much of it repeats what you've already read; so if you do decide to use the class, you may well want to re-write some of it in your own words, or even prune places where you think it's excessive.
You may also notice that the code isn't
exactly as you saw earlier, but most of the changes are just internal normalizations to cater for the
ByClass variant; the effect of each method is precisely as you've seen.
And, as with any software, it's provided with
no guarantees; so if you decide to copy and paste it, make sure you test it
WELL.