• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Jeanne Boyarsky
  • Ron McLeod
Sheriffs:
  • Paul Clapham
  • Liutauras Vilda
  • Devaka Cooray
Saloon Keepers:
  • Tim Holloway
  • Roland Mueller
Bartenders:

two string with same hashcode

 
Ranch Hand
Posts: 79
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

How to make two different string to have same hashcode?




 
Java Cowboy
Posts: 16084
88
Android Scala IntelliJ IDE Spring Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
That's not so easy. You'd have to know how String.hashCode() calculates its return value exactly. You can look this up, the source code of class String is available in src.zip which is in your JDK installation directory.

Note, however, that it is made in such a way that different strings have different hash codes as much as possible - so it will still not be easy to find two strings that result in the same hash code.

(In Java, the class is named String, not string - the capital S is important).
 
Bartender
Posts: 4568
9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version. I suppose the obvious question is, why do you want to do this? There's probably a better way of doing what you want to do.
 
Sheriff
Posts: 17734
302
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

priyanaka jaiswal wrote:
How to make two different string to have same hashcode?


Can you shed some light on your reason for wanting to do that? I'm always curious to learn why people want to go against the grain and set themselves up for potential problems. Kind of like, "Yes, you can read that book upside down but why would you want to do that?"

As it is implemented, it might be impossible for two String objects which are unequal according to equals() to have the same value for hashCode but the contract for hashCode() states that this is NOT a requirement. That is, the values returned by hashCode() do not have to be different if two objects are unequal according to equals but it helps improve performance of hashtables if they are.
 
Matthew Brown
Bartender
Posts: 4568
9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Junilu Lacar wrote:As it is implemented, it might be impossible for two String objects which are unequal according to equals() to have the same value for hashCode



It's definitely possible, because there are more possible Strings than possible hash code values. But that doesn't mean it's useful .
 
Junilu Lacar
Sheriff
Posts: 17734
302
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Matthew Brown wrote:It's definitely possible, because there are more possible Strings than possible hash code values. .


Ah yes, makes sense. A quick search turns up "FB" and "Ea" as a typical example.
 
Ranch Hand
Posts: 413
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It depends on the way you going to create/use those strings.

Look at the following code, for example:


Here is the output:

true
true
false
true
 
Ranch Hand
Posts: 139
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The char variable has a range of 65535 values

The hash value of a string is given by
h(i+1) = h(i)*31 + (int)str[i];

Since 31 is less than 65535, there are a multitude of strings that have the same hash value.

Example, for a four character string, h = 29791 * (int)str[0] + 961* (int)str[1] + 31* (int)str[2] + (int)str[3]

So, a matching hash is given by making trivial changes. For example, increase str[2] by one, and decrease str[3] by 31

hash("abyz") = 2987778
hash("abz]") = 2987778



If the hash multiplier, 31, was larger than 65535, then there would be no values that generate the same hash.

In general, since the ascii codes range from 33 to 175, or 142 values, you could shift the next to last character by up to 3 places up or down, and adjust the last character to generate the same hash. If you use non ascii codes, or the full 65535 values, you could adjust the last four characters and obtain different strings that generate the same hash code.
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Below is an example code to generate strings with matching hash codes.

 
Junilu Lacar
Sheriff
Posts: 17734
302
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

priyanaka jaiswal wrote:How to make two different string to have same hashcode?


I guess the only thing left to resolve now is whether the OP meant to generate unequal strings that have the same hashCode or change the hashCodes of two unequal strings so that they become equal (without changing the strings).

That is, by "make" did the OP mean "to create" as in "I made a copy of the file" or "to cause" as in "The magician made the car disappear"?
 
Master Rancher
Posts: 5122
82
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Matthew Brown wrote:The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version.


Well, they did actually commit to the formula as part of the API (in JavaDoc); thus they can't really change the calculated hash code result at this point without violating the API. It's pretty reliable at this point. I agree with your other points though.
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Since the hash value of a string is given by
h(i+1) = h(i)*31 + (int)str[i];

This expands to
h = ((int)str[0])*31^(length(str)-1) + ((int)str[1])*31^(length(str)-2) + ((int)str[2])*31^(length(str)-3) + ((int)str[i])*31^(length(str)-(i+1)) ... ((int)str[length(str)-2])*31) + (int)str[length(str)-1]

One approach to generating a second string with the same hash value is to modify the characters so that the same value comes up. If we limit the range of characters to the ASCII set, then this means that the second to last character can be changed by up to three values (i) and the last character changed by -i*31.

A second way that two different strings can generate the same hash is to use the range of int which is from -2,147,483,648 and 2,147,483,647 inclusive.. Since 31^7 = 27,512,614,111, this suggests that the hash value of a string with more than 6 characters would exceed the range of an int, causing folding, and providing a second methodology for matching the hash values of two different strings. If we add one to the maximum integer, we obtain the minimum integer. This suggests that the hash value is given by

ht = h(i)*31 + (int)str[i];
while (ht > 2147483647) ht = ht - 4294967295
h(i+1) = ht

Note that 4294967295 = 4*31^6 + 26*31^5 + 19*31^3 + 29*31^2 + 24*31 + 3

So, that suggests that if we start with a string with more than six characters and shift the seventh character from the end by 4, and the sixth character from the end by 26 and the fourth character from the end by 19 and the third character from the end by 29 and the second character from the end by 24 and the last character by 3, we should obtain a different string with the same hash value. Actually, the last character has to be shifted by 4.

The code below generates both types of string modifications to obtain new strings with the same hash code. The integer folding could be repeated to create additional different strings. The integer folding approach was not checked for generating only ASCII characters, so the printed characters may be misleading.



Output

 
Matthew Brown
Bartender
Posts: 4568
9
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Mike Simmons wrote:

Matthew Brown wrote:The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version.


Well, they did actually commit to the formula as part of the API (in JavaDoc); thus they can't really change the calculated hash code result at this point without violating the API. It's pretty reliable at this point. I agree with your other points though.


Good point. I was lazy, and didn't check the API docs to see if the algorithm was specified: I just guessed (wrongly) that it probably wouldn't be.

(Realistically, I suppose it's unlikely anyone would bother changing the implementation even if it wasn't specified, since it seems to be "good enough" and has been around a long time).
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It may be worth pointing out that since the hash code is an int, and there are 4294967295 values for int, all possible hash codes can be generated with a seven character string.
 
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Matthew Brown wrote:

Mike Simmons wrote:

Matthew Brown wrote:The only reliable way is to make them both the same String. Even if you hacked it based on the hashCode source code, that could change in a future version.


Well, they did actually commit to the formula as part of the API (in JavaDoc); thus they can't really change the calculated hash code result at this point without violating the API. It's pretty reliable at this point. I agree with your other points though.


Good point. I was lazy, and didn't check the API docs to see if the algorithm was specified: I just guessed (wrongly) that it probably wouldn't be.

(Realistically, I suppose it's unlikely anyone would bother changing the implementation even if it wasn't specified, since it seems to be "good enough" and has been around a long time).



I seem to remember, back around Java 1.1 or 1.2, String's hashCode() only looked at the first 8 characters. I assume this was because there was some truth to the "Java is slow" mantra back then. This led to some really awful HashMap performance.
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Right you are Jeff, there was a change between early versions of Java.


Calculating a string hashcode is always a tradeoff between speed and hashcode "quality."


Bill
 
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

John Vorwald wrote:It may be worth pointing out that since the hash code is an int, and there are 4294967295 values for int, all possible hash codes can be generated with a seven character string.



if it's only for alphabets, seven character string could represent totally 26^7 = 8031810176 which is bigger than all the int number 2^32 = 4294967296

it might be sufficient though, it's possible to find two different strings with the same hashcode.
 
Bartender
Posts: 612
7
Mac OS X Python
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
But that doesnt mean that each of those have a unique hash code - only the max amount that can be handled as primary hash values.
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

drac yang wrote:

John Vorwald wrote:It may be worth pointing out that since the hash code is an int, and there are 4294967295 values for int, all possible hash codes can be generated with a seven character string.



if it's only for alphabets, seven character string could represent totally 26^7 = 8031810176 which is bigger than all the int number 2^32 = 4294967296

it might be sufficient though, it's possible to find two different strings with the same hashcode.



The question refers to String hash codes, which are given by sum((int)(char[i]) * 31^(nChar-i)) with mod applied when the value exceeds Integer max value.

If we use the printable ascii char, the values range from 32 to 127

We see that 127 * 31^(6-1) = 3607273026, which is less than 4294967296. There needs to be a 7th char to get the required amount of hash values.

The number of hash codes doesn't depend on 26^nchar, but rather (int(char)*31^(nchar-1)). The 26^nchar is the number of possible arrangements of single case (lower or upper) words, but not the number unique hash values.

 
Marshal
Posts: 80281
432
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Indeed we have seen that some Strings return the same hash code, so you might require more than 7 letters to exhaust all the possible hash codes.
Sounds like some complicated maths to solve a non‑problem.
 
Mike Simmons
Master Rancher
Posts: 5122
82
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Indeed we have seen that some Strings return the same hash code, so you might require more than 7 letters to exhaust all the possible hash codes.


No, you don't. The range covered is continuous across integers, with no gaps. Except as noted below, for a 6-character string.

Using chars in the range 32-127, here are the ranges of possible hash values:


Only the last two lines are affected by overflow. After overflow, the line for length 6 splits into ranges -2147483648 to -27554432 and 1073741824 to 2147483647. The line for length 7 covers all possible ints.

Campbell Ritchie wrote:Sounds like some complicated maths to solve a non-problem.


The math is simple enough - in each step we effectively just multiply the min and max by 32. If the problem isn't interesting to someone, they can always skip it. The original question was answered some time ago.
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Mike,
That's an interesting response. I'm going to see if I can take it a little further, but include some code due to the details... It looks like 7 characters are needed to cover the full range.

John



 
Sheriff
Posts: 28371
99
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Or to put it more simply:

There is 1 string with 0 chars.

There are 2^16 strings with 1 char.

There are 2^32 strings with 2 chars.

There are 2^48 strings with 3 chars.

And so on... but since there are only 2^32 possible values of hashCode, that means that each possible hashCode belongs to (on average) 2^16 different strings of length 3.
 
Campbell Ritchie
Marshal
Posts: 80281
432
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
How do you know there are 2¹⁶ different Strings with 1 character? There are not 2¹⁶ different Unicode characters between 0 and \uffff. Unicode has gaps in, and some of the characters which exist are supplemental characters. So the possible range of hashcodes is all full of gaps. Eventually the ranges will overlap and fill those gaps, however.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:How do you know there are 2¹⁶ different Strings with 1 character? There are not 2¹⁶ different Unicode characters between 0 and \uffff. Unicode has gaps in,



Does that stop us from constructing a String out of nonexistent characters? I'm not being snarky. I honestly don't know, and can't be arsed to find out for myself.
 
Paul Clapham
Sheriff
Posts: 28371
99
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jeff Verdegan wrote:

Campbell Ritchie wrote:How do you know there are 2¹⁶ different Strings with 1 character? There are not 2¹⁶ different Unicode characters between 0 and \uffff. Unicode has gaps in,



Does that stop us from constructing a String out of nonexistent characters? I'm not being snarky. I honestly don't know, and can't be arsed to find out for myself.



A char can contain any of 2^16 possibilities. It's true that some of those don't have meanings in Unicode, but nevertheless they can be created in Java.

And a String can be made up of any list of chars. It's true that some of those may not even be valid in Unicode because they misuse surrogate pairs, but nevertheless they can be created in Java.
 
Campbell Ritchie
Marshal
Posts: 80281
432
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think Paul C is correct. You can create a String out of non‑Unicode characters, in which case there are 2¹⁶ possibilities per char, even if you cannot display such a String. I shall have to try it.
 
Mike Simmons
Master Rancher
Posts: 5122
82
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, it's part of the Java definitions of char and String. There's no requirement that a char have any defined meaning in Unicode. Here's an example:

From the Unicode chart here, you can see that FF00 has no defined value, while FF01 is a full-width exclamation mark (!). When I try to print these, I get a "＀!" indicating the system doesn't know what to put for the first value. But it's still a valid String, with no exceptions thrown.
 
Mike Simmons
Master Rancher
Posts: 5122
82
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Clapham wrote:Or to put it more simply:

[...]

And so on... but since there are only 2^32 possible values of hashCode, that means that each possible hashCode belongs to (on average) 2^16 different strings of length 3.


I think we all agree that there are multiple strings per hashcode - John and I are discussing whether there are any hashcodes that can not be achieved by a String of a given length, with a "printable ascii" restriction. I guess this actually excludes 127, since that's a delete char. Anyway we seem to agree that by the time you get to 7 chars, all possibilities are covered. Obviously, if more chars values are allowed, that goal can be achieved with fewer chars.
 
Campbell Ritchie
Marshal
Posts: 80281
432
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Try it and see
 
Mike Simmons
Master Rancher
Posts: 5122
82
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Looks like another way to form a legal Java String out of characters that don't have meaning in Unicode (not when put together that way, at least). Which continues to support Paul's point about each char being one of 2^16 possibilities, regardless of their Unicode validity, no?
 
Paul Clapham
Sheriff
Posts: 28371
99
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Mike Simmons wrote:John and I are discussing whether there are any hashcodes that can not be achieved by a String of a given length, with a "printable ascii" restriction.



Ah yes, that's a much less trivial problem. And therefore more interesting. I don't see any reason for restricting the contents to printable ASCII characters, though, although I suppose that makes the coding a bit simpler.
 
John Vorwald
Ranch Hand
Posts: 139
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The original question was about different strings that had the same hash code. I sat in a facial recognition meeting last week, and part of the meeting dealt with patterns that were used to create binary scores to quickly locate the face and/or facial features in a digital photo. I wonder if that approach could be applied to the problem of finding different strings that have the same hash code. My simple understanding of the approach was to try a pattern on a number of test cases, and if the pattern had better than 51% chance of success (or failure in which the criteria was reversed), then the pattern could be used. Just a thought, the program isn't obvious to me, so maybe another day....
 
Campbell Ritchie
Marshal
Posts: 80281
432
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Mike Simmons wrote:Looks like another way to form a legal Java String out of characters that don't have meaning in Unicode (not when put together that way, at least). . . .

Yes, I tried that in an attempt to confirm Paul C’s point and to see whether I was mistaken there.
 
drac yang
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Paul Clapham wrote:Or to put it more simply:

There is 1 string with 0 chars.

There are 2^16 strings with 1 char.

There are 2^32 strings with 2 chars.

There are 2^48 strings with 3 chars.

And so on... but since there are only 2^32 possible values of hashCode, that means that each possible hashCode belongs to (on average) 2^16 different strings of length 3.



since the total number of hashcode in int doesn't seem to be sufficient even for strings of 3 chars, should they consider to hold the hashcodes in like the long type instead of int?
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

drac yang wrote:
since the total number of hashcode in int doesn't seem to be sufficient even for strings of 3 chars, should they consider to hold the hashcodes in like the long type instead of int?



It still wouldn't even be remotely sufficient for relatively short strings, by many orders of magnitude. And if you have an object with, say, two Strings and a Date (e.g. private String firstName ; private String lastName; private Date birthDate;), then what?

And hashCode is so commonly used and so firmly entrenched they're not going to break backward compatibility at this point.

But lets say we start fresh with a new Java. There's still little or no benefit to making hashCode a long. HashCode determines which bucket an object goes into. That means we can have up to 2^32 buckets. Currently, hash based data structures have to re-hash the 4-byte hashCode to a smaller number, because clearly we're not going to create an array of 4 billion entries for our buckets. When multi-terabyte RAM computers are common, and we're storing hundreds of billions of objects in HashMaps, then maybe an 8-byte hashCode would make sense.

I suppose there could even be specialized situations--"big data" type stuff--where it could be useful now. But those situations call for a different tool than Java, or at least writing your own hashing facility, not for breaking backward compatibility in today's Java.
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

drac yang wrote:since the total number of hashcode in int doesn't seem to be sufficient even for strings of 3 chars, should they consider to hold the
hashcodes in like the long type instead of int?


No. You're missing the point. Hashcodes are probablistic - that is, a good one is supposed to provide a good chance of being different if the underlying object is different.

An int has 4 billion possible values. Given that the chances of you ever dealing with more than a few thousand objects at a time is small, an int is perfectly adequate for most situations.

Remember: the idea of a hashcode is not to never return the same value for different objects; just to make it unlikely. As far as Java hashed collections are concerned, the rest of the algorithm is dealt with by equals() (and hence the tie-in between the two methods).

Winston
 
drac yang
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:
No. You're missing the point. Hashcodes are probablistic - that is, a good one is supposed to provide a good chance of being different if the underlying object is different.

An int has 4 billion possible values. Given that the chances of you ever dealing with more than a few thousand objects at a time is small, an int is perfectly adequate for most situations.

Remember: the idea of a hashcode is not to never return the same value for different objects; just to make it unlikely. As far as Java hashed collections are concerned, the rest of the algorithm is dealt with by equals() (and hence the tie-in between the two methods).

Winston


yeah, i agree especially on for most situations, problem set scale will be far less than 4 billion.

like the indexFor method in HashMap shows:


only if when the problem set grows, say, close to or more than 4 billion, at that time,
for the sake of optimized loadfactor, people might begin to consider changing to long hashcode(if without the impact of backward compatibility like jeff said).

good reply, thanks.
 
drac yang
Ranch Hand
Posts: 78
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jeff Verdegan wrote:
It still wouldn't even be remotely sufficient for relatively short strings, by many orders of magnitude. And if you have an object with, say, two Strings and a Date (e.g. private String firstName ; private String lastName; private Date birthDate;), then what?

And hashCode is so commonly used and so firmly entrenched they're not going to break backward compatibility at this point.

But lets say we start fresh with a new Java. There's still little or no benefit to making hashCode a long. HashCode determines which bucket an object goes into. That means we can have up to 2^32 buckets. Currently, hash based data structures have to re-hash the 4-byte hashCode to a smaller number, because clearly we're not going to create an array of 4 billion entries for our buckets. When multi-terabyte RAM computers are common, and we're storing hundreds of billions of objects in HashMaps, then maybe an 8-byte hashCode would make sense.

I suppose there could even be specialized situations--"big data" type stuff--where it could be useful now. But those situations call for a different tool than Java, or at least writing your own hashing facility, not for breaking backward compatibility in today's Java.



yeah, besides the backward compatibility issue(but why can't those legacy code use old jdk for maintenance?), another issue would be space waste, right? for most situations problem set is way under 4 billion.
good reply, thanks.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

drac yang wrote:yeah, besides the backward compatibility issue(but why can't those legacy code use old jdk for maintenance?)



You're missing the point. I might get Java 8 as soon as it's available, and build my app with it from day 1. If Java 9 changes hashCode() to a long, then I cannot use it without changing all my code--and all the code in every library I use. It's not just old stuff that breaks. It's anything from before the change.

another issue would be space waste, right?



Not really. It's only 4 more bytes, which is at least an order or magnitude less than a lot of the objects it will be generated from anyway, and memory is cheap, and it's not like we typically keep large numbers of hashcode values around anyway. There are several compelling reasons not to change to an 8-byte hashCode(). Saving 4 bytes is not one of them.
 
It's never done THAT before. Explain it to me tiny ad:
Smokeless wood heat with a rocket mass heater
https://woodheat.net
reply
    Bookmark Topic Watch Topic
  • New Topic