priyanaka jaiswal wrote:
How can I make two different strings have the same hash code?
Junilu Lacar wrote:As it is implemented, it might be impossible for two String objects which are unequal according to equals() to have the same value for hashCode
Matthew Brown wrote:It's definitely possible, because there are more possible Strings than possible hash code values.
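As a small illustration of that point (not part of the original posts), the pair "Aa" and "BB" is a commonly cited example of two unequal strings with the same hash code. The class and method names below are made up for this sketch.

```java
// Illustrative sketch: two unequal Strings that share a hash code.
final class HashCollisionDemo {

    // True when a and b are different strings with the same hashCode().
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode());
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB"));

        // Colliding blocks of equal length can be concatenated to build
        // longer colliding strings.
        System.out.println(collide("AaAa", "BBBB"));
    }
}
```

This relies on the hash formula as currently documented for java.lang.String.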
priyanaka jaiswal wrote:How can I make two different strings have the same hash code?
Matthew Brown wrote:The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version.
Mike Simmons wrote:
Matthew Brown wrote:The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version.
Well, they did actually commit to the formula as part of the API (in JavaDoc); thus they can't really change the calculated hash code result at this point without violating the API. It's pretty reliable at this point. I agree with your other points though.
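For reference, the formula in the String.hashCode() JavaDoc is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int arithmetic. A minimal sketch that recomputes it by hand and compares it with the library result might look like this (the class and method names are invented for the example):

```java
// Illustrative sketch: recompute String.hashCode() from the documented formula.
final class DocumentedStringHash {

    // Horner's-rule evaluation of s[0]*31^(n-1) + ... + s[n-1].
    // int overflow wraps around, which matches the documented int arithmetic.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "Aa", "BB", "hello, world"};
        for (String s : samples) {
            // Prints the string, then whether the two values agree.
            System.out.println("\"" + s + "\" -> "
                    + (documentedHash(s) == s.hashCode()));
        }
    }
}
```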
Matthew Brown wrote:
Good point. I was lazy, and didn't check the API docs to see if the algorithm was specified: I just guessed (wrongly) that it probably wouldn't be.
(Realistically, I suppose it's unlikely anyone would bother changing the implementation even if it wasn't specified, since it seems to be "good enough" and has been around a long time).
John Vorwald wrote:It may be worth pointing out that since the hash code is an int, and there are 4294967296 (2^32) possible int values, all possible hash codes can be generated with a seven-character string.
science belief, great bioscience!
drac yang wrote:
John Vorwald wrote:It may be worth pointing out that since the hash code is an int, and there are 4294967296 (2^32) possible int values, all possible hash codes can be generated with a seven-character string.
If it's only letters of the alphabet, a seven-character string can take 26^7 = 8031810176 different values, which is more than the total number of int values, 2^32 = 4294967296.
So seven letters might be sufficient, though it is still possible for two different strings to have the same hash code.
Campbell Ritchie wrote:Indeed we have seen that some Strings return the same hash code, so you might require more than 7 letters to exhaust all the possible hash codes.
Campbell Ritchie wrote:Sounds like some complicated maths to solve a non-problem.
Campbell Ritchie wrote:How do you know there are 2¹⁶ different Strings with 1 character? There are not 2¹⁶ different Unicode characters between 0 and \uffff. Unicode has gaps in,
Jeff Verdegan wrote:
Campbell Ritchie wrote:How do you know there are 2¹⁶ different Strings with 1 character? There are not 2¹⁶ different Unicode characters between 0 and \uffff. Unicode has gaps in,
Does that stop us from constructing a String out of nonexistent characters? I'm not being snarky. I honestly don't know, and can't be arsed to find out for myself.
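A quick experiment bears on that question. A Java String is a sequence of 16-bit char values, and, as far as this sketch shows, nothing stops you from building one out of code units that are not valid characters on their own. The class and method names here are hypothetical.

```java
// Illustrative sketch: Strings built from code units that are not
// meaningful Unicode characters by themselves.
final class UnassignedCharDemo {

    // Builds a one-char String from an arbitrary 16-bit code unit.
    static String fromCodeUnit(int codeUnit) {
        return String.valueOf((char) codeUnit);
    }

    public static void main(String[] args) {
        String loneSurrogate = fromCodeUnit(0xD800); // unpaired high surrogate
        String nonCharacter = fromCodeUnit(0xFFFF);  // Unicode noncharacter

        System.out.println(loneSurrogate.length());   // 1
        // For a one-char String the hash code is just the code unit value.
        System.out.println(loneSurrogate.hashCode()); // 55296
        System.out.println(nonCharacter.hashCode());  // 65535
    }
}
```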
Mike Simmons wrote:John and I are discussing whether there are any hashcodes that can not be achieved by a String of a given length, with a "printable ascii" restriction.
Mike Simmons wrote:Looks like another way to form a legal Java String out of characters that don't have meaning in Unicode (not when put together that way, at least). . . .
Yes, I tried that in an attempt to confirm Paul C’s point and to see whether I was mistaken there.
Paul Clapham wrote:Or to put it more simply:
There is 1 string with 0 chars.
There are 2^16 strings with 1 char.
There are 2^32 strings with 2 chars.
There are 2^48 strings with 3 chars.
And so on... but since there are only 2^32 possible values of hashCode, that means that each possible hashCode belongs to (on average) 2^16 different strings of length 3.
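The counting argument above can be checked with a few lines of arithmetic. This is only a sketch of the pigeonhole calculation, treating every 16-bit char value as allowed; the class and method names are invented for the example.

```java
import java.math.BigInteger;

// Illustrative sketch: how many strings of a given length exist, and how
// many of them share each int hash code on average.
final class StringsPerHashCode {

    private static final BigInteger HASH_CODES = BigInteger.ONE.shiftLeft(32); // 2^32

    // Number of distinct strings with exactly n chars: (2^16)^n.
    static BigInteger stringsOfLength(int n) {
        return BigInteger.ONE.shiftLeft(16 * n);
    }

    // Average number of length-n strings per hash code (meaningful for n >= 2).
    static BigInteger averagePerHashCode(int n) {
        return stringsOfLength(n).divide(HASH_CODES);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 4; n++) {
            System.out.println(n + " chars: " + stringsOfLength(n)
                    + " strings, about " + averagePerHashCode(n) + " per hash code");
        }
    }
}
```

For length 3 this gives 2^48 / 2^32 = 2^16 = 65536 strings per hash code on average.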
drac yang wrote:
Since the total number of int hash codes doesn't seem to be sufficient even for strings of 3 chars, should they consider holding hash codes in something like the long type instead of int?
"Leadership is nature's way of removing morons from the productive flow" - Dogbert
Winston Gutkowski wrote:
No. You're missing the point. Hashcodes are probabilistic - that is, a good one is supposed to provide a good chance of being different if the underlying object is different.
An int has about 4 billion (2^32) possible values. Given that the chances of you ever dealing with more than a few thousand objects at a time are small, an int is perfectly adequate for most situations.
Remember: the idea of a hashcode is not to never return the same value for different objects; just to make it unlikely. As far as Java hashed collections are concerned, the rest of the algorithm is dealt with by equals() (and hence the tie-in between the two methods).
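To illustrate the tie-in with equals(), here is a minimal sketch that puts two keys with the same hash code into a HashMap. The keys "Aa" and "BB" are chosen because both hash to 2112; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: keys with equal hash codes remain distinct entries
// because the map falls back on equals() to tell them apart.
final class CollidingKeysDemo {

    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1); // hashCode() == 2112
        map.put("BB", 2); // hashCode() == 2112 as well
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 1
        System.out.println(map.get("BB")); // 2
    }
}
```

The collision costs a little lookup time but never correctness.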
Winston
Jeff Verdegan wrote:
A long still wouldn't be even remotely sufficient for relatively short strings, by many orders of magnitude. And if you have an object with, say, two Strings and a Date (e.g. private String firstName; private String lastName; private Date birthDate;), then what?
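For concreteness, a class with those three fields might combine them into a single int hash along these lines. This is only a sketch: the Person class is hypothetical, and Objects.hash is one common way to do it, not the only one.

```java
import java.util.Date;
import java.util.Objects;

// Illustrative sketch: a value class whose hashCode() folds three fields
// into one int, consistent with equals().
final class Person {
    private final String firstName;
    private final String lastName;
    private final Date birthDate;

    Person(String firstName, String lastName, Date birthDate) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.birthDate = birthDate;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person p = (Person) other;
        return Objects.equals(firstName, p.firstName)
                && Objects.equals(lastName, p.lastName)
                && Objects.equals(birthDate, p.birthDate);
    }

    @Override
    public int hashCode() {
        // Far more possible Person states exist than int values,
        // so collisions between unequal objects are unavoidable.
        return Objects.hash(firstName, lastName, birthDate);
    }
}
```

Equal Person objects always get equal hash codes; unequal ones usually, but not always, get different ones.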
And hashCode is so commonly used and so firmly entrenched they're not going to break backward compatibility at this point.
But let's say we start fresh with a new Java. There's still little or no benefit to making hashCode a long. HashCode determines which bucket an object goes into. That means we can have up to 2^32 buckets. Currently, hash-based data structures have to re-hash the 4-byte hashCode to a smaller number, because clearly we're not going to create an array of 4 billion entries for our buckets. When multi-terabyte RAM computers are common, and we're storing hundreds of billions of objects in HashMaps, then maybe an 8-byte hashCode would make sense.
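The reduction from a 32-bit hash code to a bucket index can be sketched as follows. This is a simplified illustration using a plain modulus, not the actual algorithm used by java.util.HashMap; the class and method names are invented.

```java
// Illustrative sketch: mapping a 32-bit hash code onto a small bucket array.
final class BucketIndexSketch {

    // floorMod keeps the result in [0, bucketCount) even for negative hashes.
    static int bucketFor(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        int buckets = 16;
        // 2112 is a multiple of 16, so both colliding keys land in bucket 0.
        System.out.println(bucketFor("Aa", buckets)); // 0
        System.out.println(bucketFor("BB", buckets)); // 0
        // Integer.hashCode() is the value itself; floorMod(-7, 16) is 9.
        System.out.println(bucketFor(-7, buckets));   // 9
    }
}
```

With only 16 buckets, most of the 32 bits are thrown away, which is why a wider hash code would add little for collections of ordinary size.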
I suppose there could even be specialized situations--"big data" type stuff--where it could be useful now. But those situations call for a different tool than Java, or at least writing your own hashing facility, not for breaking backward compatibility in today's Java.
drac yang wrote:Yeah, besides the backward compatibility issue (but why can't legacy code just use an old JDK for maintenance?), another issue would be wasted space, right?