Forums Register Login

Problem with hashcode etc

+Pie Number of slices to send: Send
I have read the Java tutorial and as much as I can google on the above subject but I am struggling to understand the basic concept. Unfortunately my brain will not absorb any more information until the block is removed. The problem is this:

"you will calculate the same hashCode, and will be able to look up numerically with it. Of course there is always the possibility two different Strings will have the same digest hashCode. However, even then, all is not lost; it greatly narrows down the search, hence speeding it up. "

As I understand it collisions are put into an array list or linked list (not come to that yet, maybe that's my problem). But I can't see that in example code.

Under what circumstances does ' near enough is good enough ' apply. Reminds me of time a couple if years ago I was held at immigration because I had the same surname as someone they were seeking whose other names were totally different from mine. The reason given was, it might be me!
Is there a Ladybird book in simple words and large print available on this subject anyone can recommend?
+Pie Number of slices to send: Send
hash tables are an integral part of chess programming. If you go to the following site and do some searching for hash, you will find a lot of information.

http://chessprogramming.wikispaces.com/

The following article in particular is quite good.

http://www.cis.uab.edu/hyatt/collisions.html

as long as you know the basics of chess, you should get some mileage out of these sites, and a different insight into hash collisions. These aren't java articles, but general principles still apply.
+Pie Number of slices to send: Send
Howdy James. Think of hashcode this way. Suppose you want to "hash" something say username and password. Now say the username is "james" and password is "abc". Another user username may be "tom" and his password is "def". To produce the hash may look like this:

Is it possible to produce the same hashcode using this way? If you answer yes... you are correct. At least I think so. Why? Ignoring database concepts of primary keys and such, what happens if another user also called james and have his username name "james" but his password something different? Eventually that new James may have an identical password as "abc" which is same as yours. And when that happens it will be impossible to identify that account as you or the new James.

IP addresses may be a better example. If you use the computer IP address produce the hashcode, you are bound to have trouble because a user can log in on different computers. Hope this makes some sense.
+Pie Number of slices to send: Send
 

K. Tsang wrote:Howdy James. Think of hashcode this way. Suppose you want to "hash" something say username and password. Now say the username is "james" and password is "abc". Another user username may be "tom" and his password is "def". To produce the hash may look like this:

Is it possible to produce the same hashcode using this way? If you answer yes... you are correct. At least I think so. Why? Ignoring database concepts of primary keys and such, what happens if another user also called james and have his username name "james" but his password something different? Eventually that new James may have an identical password as "abc" which is same as yours. And when that happens it will be impossible to identify that account as you or the new James.

IP addresses may be a better example. If you use the computer IP address produce the hashcode, you are bound to have trouble because a user can log in on different computers. Hope this makes some sense.



which is why it can be so hard to get a user id that you like on sites like yahoo, etc. They don't allow duplicates, for the very reasons you describe. And all the good id's are already taken. Also a reason why so many sites use email address as user id.
+Pie Number of slices to send: Send
Hi James,

I'm not sure any of the above does much to clarify the situation. Let's look at a concrete example. Imagine you are an instructor who teaches a class of 30 students, and they have taken an exam. You've marked the exam, and now you'd like to return their graded papers to them. Each paper, of course, has a mark on it.

Unfortunately, you must leave town, so you can't hand the papers out in class; you'd like to leave the papers in your office for the students to quickly pick up. You want to organize the papers so that the students can find their own paper quickly. You decide to use a sort of hashing algorithm to do this. The name on each paper is the "key" that we're going to hash, while the paper itself with the mark on it is the data that will come along for the ride.

You choose an extraordinarily simple hashing algorithm: the hashcode for a student's name is the first letter of their last name. There are therefore 26 possible hash values.

You make a row of piles: the first pile contains the papers for the students whose last names hash to 1 -- i.e., the names that start with "A". The second pile is for hash 2, and so on, up to the pile for hash 26.

Some of the piles are empty, and some of them have more than one paper in it. But all the piles are small, and the average pile size is 30/26 == 1.15 papers.

To find their paper, a student instantly identifies the appropriate pile by computing the hash value for their name. Then they do a very short linear search through that pile for their own paper by comparing the name on each paper to their own name. (Note that this is why the Javadocs make such a big deal about overloading both hashCode() and equals() together, so they're consistent. If you use hashCode() you almost certainly will need to use equals() in this way!)

The row of piles is a hash table -- that's all there is to it! This is exactly how java.util.HashMap (and its older, uglier cousin java.util.Hashtable) works.

Note that if the hashCode() function is more complex, so that there are an inconvenient number of possible values (in Java, hashCode() can return any integer, so this is always true at least in theory) then it's common to use the absolute value of the hash code modulo some arbitrary number of piles. In the above example, we could use 10 piles and use "hashCode() % 10" to sort the papers into ten piles.
+Pie Number of slices to send: Send
 

Ernest Friedman-Hill wrote:Hi James,

...

To find their paper, a student instantly identifies the appropriate pile by computing the hash value for their name. Then they do a very short linear search through that pile for their own paper by comparing the name on each paper to their own name. (Note that this is why the Javadocs make such a big deal about overloading both hashCode() and equals() together, so they're consistent. If you use hashCode() you almost certainly will need to use equals() in this way!)



Ernest, I think your paragraph cuts to the essense of the original question quicker than my examples, which are rather complex.

In chess it is very simple. There is a hash code that represents a position, and a table of some kind of table that represents all possible positions in analysis to a certain move depth. It is very likely that the same position is possible through different move sequences, and the table is search to see if a position has occurred previously, so we don't have to evaluate the position more than once. The collision problem occurs when the same hash code can happen for two different positions, and considerable time and effort is devoted to dealing with this possibility. Another factor is whether or not it is practical to represent all possible factors which make a position unique. The speed of the algorithm is everything here.

In any event the issue hash collisions must be resolved. Very often it is a tradeoff. Which is more efficient? devoting extra effort to ensuring that hash codes are created uniquely? Or, as in your case I believe, taking extra steps after the fact to deal with the uniqueness thing.
+Pie Number of slices to send: Send
+1 for a nice explanation.
+Pie Number of slices to send: Send
Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.
+Pie Number of slices to send: Send
 

james dunster wrote:Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.



You know, I'm pretty good with the theory, but not so good with the implementation. I know the reasons for hash tables, and what sorts of things cause problems, but haven't yet mastered the actual details of how to do it. I was just trying to give a high level overview.

That being said, if there is a specific example from the links I provided, then please specify which example, and I will try to be of assistance.

Otherwise, someone else here could probably help you better.

regards.
+Pie Number of slices to send: Send
 

james dunster wrote:Fred, do I have to write some code to do the final sort if there is more than one item returned for a hash code? I can't see it in the examples I have seen. Thanks.



After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.

Henry
+Pie Number of slices to send: Send
 

After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.



so I have to override the equals function to that it compares everything used to generate the hashcode?
+Pie Number of slices to send: Send
 

james dunster wrote:

After the collection uses the hashcode to locate the "bucket", it uses the equals() method to find the right object in that bucket. Hence, it will never return two objects.



so I have to override the equals function to that it compares everything used to generate the hashcode?



In general you should use same set of attributes for both equals & hashcode. Check this for more details.
+Pie Number of slices to send: Send
Yes, if you want collections framework classes to work correctly, you must override both equals() and hashCode().

Find the chapter in Bloch's book about equals(), or Google for "Angelika Langer Java equals" for details, because the equals method appears easy to implement, but it isn't at all easy.
I hired a bunch of ninjas. The fridge is empty, but I can't find them to tell them the mission.
a bit of art, as a gift, that will fit in a stocking
https://gardener-gift.com


reply
reply
This thread has been viewed 1726 times.
Similar Threads
About Collections
Head First Java HashSet
equals() and hashCode() from Java.Inquisition
Hash Code from Dan's Mock Exam
K&B doubt in hashcode() and equals(): Is my understanding correct ?
More...

All times above are in ranch (not your local) time.
The current ranch time is
Apr 16, 2024 03:48:23.