The art of being crazy is to NEVER commit the crazyness of being normal.
Pat Farrell wrote:What are your definitions of the criteria for "perfect"
If you are just trying to implement good values for hashCode(), using a CRC is serious overkill. An important criteria for most real world applications is that the calculation of the hash has to be fast. If you take too long, you could just sequentially search the list rather than hashing into it. Just because its Big Oh(1) doesn't mean its actually useful.
The art of being crazy is to NEVER commit the crazyness of being normal.
Jeronimo Backes wrote:Perfect hash functions are the ones that won't map two or more inputs into the same value. It means there is no possibility of collisions.
Jeronimo Backes wrote:I've run many tests with random strings of random length and could not find a collision yet.
What design space cares about this?
What is the point of your attempt at a perfect hash function?
The maximum length of a String is 2^31-1. So if you had chosen lengths randomly, then half of the lengths you chose should have been greater than half of that number -- which is 2^30-1. (...) there are 2^16 possible values for a char.
The art of being crazy is to NEVER commit the crazyness of being normal.
Pat Farrell wrote:Have you tried SHA1 as a hash? Its probably as slow as your CRC, but its a cryptographically strong hash, meaning that its proven to have good ransomness in its results. You can simply truncate the long result into whatever length you want to store.
The art of being crazy is to NEVER commit the crazyness of being normal.
Jeronimo Backes wrote:My method is not a "perfect" hash of course (it is impossible to have it as discussed), but it does a much better job in avoiding collisions compared to Java's hashCode and CRC32.
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Articles by Winston can be found here
Consider Paul's rocket mass heater. |