# Very large object equals...

pascal gehl
Greenhorn
Posts: 14
Hi,

On my project we have metal-model based objects (basically HashMap with AttributeName - AttributeValue).
Some of our object can be very very large (a big object tree).
I wonder if some of you guys know a OS framework providing a very performant EqualsBuilder for this case.

Reid M. Pinchback
Ranch Hand
Posts: 775
I'd suggest creating your own subclass of HashMap. Assuming your keys distribute evenly, every "put" xors the key hashcode into a group hashcode. Equals tests the group value first; if different, the groups aren't equal. If the same, do the normal comparison. For moderately large hashmaps, it will be statistically rare for you to do the normal comparison.

The algorithm above assumes that you have many keys, but don't tend to use them all. The obvious alteration for situations when you always use all the keys is to do the xors with the hashcodes from the values instead of the keys. Potentially a bit slower - depends on how the value objects implement their hashcode method.
[ January 20, 2006: Message edited by: Reid M. Pinchback ]

Ilja Preuss
author
Sheriff
Posts: 14112
Have you tried how fast a naive equals implementation is in your worst- and average case scenario?

Jim Yingst
Wanderer
Sheriff
Posts: 18671
[Note - this was written without seeing the other replies, so I'm responding directly to the first post.]

Why not just use the equals() method of the HashMap? Should do just what you need, no?

If your objects are immutable (something I generally recommend for any object that doesn't need to be mutable), then there are a few other tricks you might employ to get a really fast equals(). You can (lazily?) cache the hash code and use that as a quick check of equality. If two objects have different hash codes, they must not be equal, and you can return false immediately. (Assuming you've defined hashCode() correctly.) If the hash codes are equal, you still need to check that all the fields are really equal by calling the HashMap.equals().

Or, if the objects are immutable and you can afford to keep all instances of your class in memory at the same time, then you could use a Set or perhaps a WeakHashMap or ConcurrentHashMap to store all the objects in existence. Make the constructor of your class private and create a factory method which does a lookup in the map to see if there's already an equal object in existence. Return the existing object if it does exist, otherwise create a new one and put it in the map. Much like String.intern(), but set up so this is the only way to get an instance. Then your equals() method can just call ==, since you've guaranteed that no two separate instnaces are equal. (There are some potential complications with serialization and multiple classloaders, but skipping those for now...) Using WeakHashMap would allow GC to reclaim memory when a given instance is no longer needed, which may be useful.
[ January 20, 2006: Message edited by: Jim Yingst ]