Problem with hashcode etc

May 31, 2009 23:33:28

I have read the Java tutorial and as much as I can google on the above subject but I am struggling to understand the basic concept. Unfortunately my brain will not absorb any more information until the block is removed. The problem is this:

"you will calculate the same hashCode, and will be able to look up numerically with it. Of course there is always the possibility two different Strings will have the same digest hashCode. However, even then, all is not lost; it greatly narrows down the search, hence speeding it up. "

As I understand it collisions are put into an array list or linked list (not come to that yet, maybe that's my problem). But I can't see that in example code.

Under what circumstances does ' near enough is good enough ' apply. Reminds me of time a couple if years ago I was held at immigration because I had the same surname as someone they were seeking whose other names were totally different from mine. The reason given was, it might be me!
Is there a Ladybird book in simple words and large print available on this subject anyone can recommend?

Jun 01, 2009 00:08:21

hash tables are an integral part of chess programming. If you go to the following site and do some searching for hash, you will find a lot of information.

http://chessprogramming.wikispaces.com/

The following article in particular is quite good.

http://www.cis.uab.edu/hyatt/collisions.html

as long as you know the basics of chess, you should get some mileage out of these sites, and a different insight into hash collisions. These aren't java articles, but general principles still apply.

Jun 01, 2009 00:53:49

Howdy James. Think of hashcode this way. Suppose you want to "hash" something say username and password. Now say the username is "james" and password is "abc". Another user username may be "tom" and his password is "def". To produce the hash may look like this:

Is it possible to produce the same hashcode using this way? If you answer yes... you are correct. At least I think so. Why? Ignoring database concepts of primary keys and such, what happens if another user also called james and have his username name "james" but his password something different? Eventually that new James may have an identical password as "abc" which is same as yours. And when that happens it will be impossible to identify that account as you or the new James.

IP addresses may be a better example. If you use the computer IP address produce the hashcode, you are bound to have trouble because a user can log in on different computers. Hope this makes some sense.

Jun 01, 2009 01:08:55

K. Tsang wrote:Howdy James. Think of hashcode this way. Suppose you want to "hash" something say username and password. Now say the username is "james" and password is "abc". Another user username may be "tom" and his password is "def". To produce the hash may look like this:
int hash = username.hashcode() + password.hashcode();
Is it possible to produce the same hashcode using this way? If you answer yes... you are correct. At least I think so. Why? Ignoring database concepts of primary keys and such, what happens if another user also called james and have his username name "james" but his password something different? Eventually that new James may have an identical password as "abc" which is same as yours. And when that happens it will be impossible to identify that account as you or the new James.

IP addresses may be a better example. If you use the computer IP address produce the hashcode, you are bound to have trouble because a user can log in on different computers. Hope this makes some sense.

which is why it can be so hard to get a user id that you like on sites like yahoo, etc. They don't allow duplicates, for the very reasons you describe. And all the good id's are already taken. Also a reason why so many sites use email address as user id.

Jun 01, 2009 08:10:46

Hi James,

I'm not sure any of the above does much to clarify the situation. Let's look at a concrete example. Imagine you are an instructor who teaches a class of 30 students, and they have taken an exam. You've marked the exam, and now you'd like to return their graded papers to them. Each paper, of course, has a mark on it.

Unfortunately, you must leave town, so you can't hand the papers out in class; you'd like to leave the papers in your office for the students to quickly pick up. You want to organize the papers so that the students can find their own paper quickly. You decide to use a sort of hashing algorithm to do this. The name on each paper is the "key" that we're going to hash, while the paper itself with the mark on it is the data that will come along for the ride.

You choose an extraordinarily simple hashing algorithm: the hashcode for a student's name is the first letter of their last name. There are therefore 26 possible hash values.

You make a row of piles: the first pile contains the papers for the students whose last names hash to 1 -- i.e., the names that start with "A". The second pile is for hash 2, and so on, up to the pile for hash 26.

Some of the piles are empty, and some of them have more than one paper in it. But all the piles are small, and the average pile size is 30/26 == 1.15 papers.

To find their paper, a student instantly identifies the appropriate pile by computing the hash value for their name. Then they do a very short linear search through that pile for their own paper by comparing the name on each paper to their own name. (Note that this is why the Javadocs make such a big deal about overloading both hashCode() and equals() together, so they're consistent. If you use hashCode() you almost certainly will need to use equals() in this way!)

The row of piles is a hash table -- that's all there is to it! This is exactly how java.util.HashMap (and its older, uglier cousin java.util.Hashtable) works.

Note that if the hashCode() function is more complex, so that there are an inconvenient number of possible values (in Java, hashCode() can return any integer, so this is always true at least in theory) then it's common to use the absolute value of the hash code modulo some arbitrary number of piles. In the above example, we could use 10 piles and use "hashCode() % 10" to sort the papers into ten piles.

Jun 01, 2009 08:51:18

Ernest Friedman-Hill wrote:Hi James,

...

To find their paper, a student instantly identifies the appropriate pile by computing the hash value for their name. Then they do a very short linear search through that pile for their own paper by comparing the name on each paper to their own name. (Note that this is why the Javadocs make such a big deal about overloading both hashCode() and equals() together, so they're consistent. If you use hashCode() you almost certainly will need to use equals() in this way!)

Ernest, I think your paragraph cuts to the essense of the original question quicker than my examples, which are rather complex.

In chess it is very simple. There is a hash code that represents a position, and a table of some kind of table that represents all possible positions in analysis to a certain move depth. It is very likely that the same position is possible through different move sequences, and the table is search to see if a position has occurred previously, so we don't have to evaluate the position more than once. The collision problem occurs when the same hash code can happen for two different positions, and considerable time and effort is devoted to dealing with this possibility. Another factor is whether or not it is practical to represent all possible factors which make a position unique. The speed of the algorithm is everything here.

In any event the issue hash collisions must be resolved. Very often it is a tradeoff. Which is more efficient? devoting extra effort to ensuring that hash codes are created uniquely? Or, as in your case I believe, taking extra steps after the fact to deal with the uniqueness thing.

Jun 01, 2009 09:58:13

+1 for a nice explanation.

Jun 01, 2009 20:13:07

Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.

Jun 01, 2009 20:42:23

james dunster wrote:Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.

You know, I'm pretty good with the theory, but not so good with the implementation. I know the reasons for hash tables, and what sorts of things cause problems, but haven't yet mastered the actual details of how to do it. I was just trying to give a high level overview.

That being said, if there is a specific example from the links I provided, then please specify which example, and I will try to be of assistance.

Otherwise, someone else here could probably help you better.

regards.

Jun 01, 2009 20:48:33

james dunster wrote:Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.

After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.

Henry

Jun 04, 2009 02:39:15

After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.

so I have to override the equals function to that it compares everything used to generate the hashcode?

Jun 04, 2009 03:09:15

james dunster wrote:

After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.

so I have to override the equals function to that it compares everything used to generate the hashcode?

In general you should use same set of attributes for both equals & hashcode. Check this for more details.

Jun 04, 2009 03:25:12

Yes, if you want collections framework classes to work correctly, you must override both equals() and hashCode().

Find the chapter in Bloch's book about equals(), or Google for "Angelika Langer Java equals" for details, because the equals method appears easy to implement, but it isn't at all easy.

I hired a bunch of ninjas. The fridge is empty, but I can't find them to tell them the mission.

a bit of art, as a gift, that will fit in a stocking

https://gardener-gift.com