Jon Swanson

Ranch Hand

Posts: 225

posted 5 years ago

I have read that the algorithm for Double.hashmap() can be problematic, here is an example.

the problem is the limited precision of the mantissa (I think). My thought on improving this situation is first to recognize that a double is represented by sign * mantissa * 2^exponent. Am I right in thinking that if I am worried about two numbers being almost the same, I can split the mantissa and the exponent , reduce the precision of the mantissa, and then they will hash to the same value? For example, if I thought that the double was surely accurate to 4 decimal places, does this make sense? Or an I missing some important gotcha.

In practice I am sure I can use a much lower threshold. I was thinking of using this to create a hash for a map. I could then use the same (or lower) threshold when implementing an equals method for a Map.

the problem is the limited precision of the mantissa (I think). My thought on improving this situation is first to recognize that a double is represented by sign * mantissa * 2^exponent. Am I right in thinking that if I am worried about two numbers being almost the same, I can split the mantissa and the exponent , reduce the precision of the mantissa, and then they will hash to the same value? For example, if I thought that the double was surely accurate to 4 decimal places, does this make sense? Or an I missing some important gotcha.

In practice I am sure I can use a much lower threshold. I was thinking of using this to create a hash for a map. I could then use the same (or lower) threshold when implementing an equals method for a Map.

posted 5 years ago

Your error has already been made in line 3 of your posted code: the comment on that line is incorrect.

In other words, the basic problem is that floating-point arithmetic is inexact. (Read JavaBeginnersFaq, in particular bullet point number 20, for more information about that.) You can't expect to put something into a Map with one key and then to get it back with a slightly different key.

And your idea that two numbers which are "very close" should be treated as equal for the purpose of being a key in a Map can't work. Suppose that you say that everything within X of some value should be treated as the same value. Then 0 and X are treated as the same. And then X and 2X are treated as the same. And then 2X and 3X are treated as the same. And before you know it everything in the entire number line is treated as the same.

Of course rounding them all to X decimal places doesn't have that flaw. But then you should just use BigDecimal values in the first place, and you wouldn't have the floating-point arithmetic problem.

In other words, the basic problem is that floating-point arithmetic is inexact. (Read JavaBeginnersFaq, in particular bullet point number 20, for more information about that.) You can't expect to put something into a Map with one key and then to get it back with a slightly different key.

And your idea that two numbers which are "very close" should be treated as equal for the purpose of being a key in a Map can't work. Suppose that you say that everything within X of some value should be treated as the same value. Then 0 and X are treated as the same. And then X and 2X are treated as the same. And then 2X and 3X are treated as the same. And before you know it everything in the entire number line is treated as the same.

Of course rounding them all to X decimal places doesn't have that flaw. But then you should just use BigDecimal values in the first place, and you wouldn't have the floating-point arithmetic problem.

Jon Swanson

Ranch Hand

Posts: 225

posted 5 years ago

Everything I tried with BigDecimal failed to produce values that hashed to the same key. Whereas, what I provided does hash to the same key. In the case I showed (100.0000 and 10.0001) only numbers that differ in the 7th digit will hash to the same value. The problem when rounding to a given number of decimal places is that the precision problem is in the number of digits, not the number of decimal places. So very large numbers could give different hash keys even when rounded to a certain number of decimal places. That is why I extracted the mantissa. By rounding it, I can set the approximate number of digits to which I consider two numbers to be the same. It might actually be worth using that logic in the equals comparison, if I needed to work with really big numbers in scientific notation.

You can put something into a map with one key and get it back with a slightly different key. But you need to override the equals and hashcode methods. Since there is an inherent limit on precision in floating point numbers, typically you use a threshold when doing comparisons.

That is easy to build into the equals method. It seemed a bit trickier to do for the hashcode. With floating point numbers it is always a question of exactly how close two numbers are. They are only equal if the number has an exact binary representation.

Here is what I tried to do with BigDecimal. My numbers may be coming from numerical calculations, which seems to make BigDecimal highly problematic, but even so, I'd be curious how you get it to hash as desired.

You can put something into a map with one key and get it back with a slightly different key. But you need to override the equals and hashcode methods. Since there is an inherent limit on precision in floating point numbers, typically you use a threshold when doing comparisons.

That is easy to build into the equals method. It seemed a bit trickier to do for the hashcode. With floating point numbers it is always a question of exactly how close two numbers are. They are only equal if the number has an exact binary representation.

Here is what I tried to do with BigDecimal. My numbers may be coming from numerical calculations, which seems to make BigDecimal highly problematic, but even so, I'd be curious how you get it to hash as desired.

posted 5 years ago

Read the API documentation for this constructor ("The results of this constructor can be somewhat unpredictable and its use is generally not recommended..."), and then implement the predictability fix which it recommends:

In other words, you implemented BigDecimal in such a way that you incorporated the inexactness of floating-point arithmetic into it.

Read the API documentation for this constructor ("The results of this constructor can be somewhat unpredictable and its use is generally not recommended..."), and then implement the predictability fix which it recommends:

In other words, you implemented BigDecimal in such a way that you incorporated the inexactness of floating-point arithmetic into it.

Jon Swanson

Ranch Hand

Posts: 225

Stephan van Hulst

Saloon Keeper

Posts: 7743

142

posted 5 years ago

I think you are being confused by precision.

The reason for your confusion is that floating point literals are represented in decimal notation. However, values represented in decimal can not always be represented in binary notation using finite precision. For example, 0.1 can not be represented exactly in binary.

The thing is, you can never determine whether two real numbers are "equal", unless you specify exactly "how equal" they have to be. You will run into this problem regardless of whether you use doubles, BigDecimal or your own type. You need to determine for yourself if you consider 0.9999999999999999999999999999999999999999 and 1.00000000000000000000000000000000000000001 to be the same value. If not, how precise does their difference need to be before you *do* consider them equal?

I suggest that if you care about exact calculations, you convert your measurements to BigDecimal using the BigDecimal(double, MathContext) constructor, and then doing calculations. Use the MathContext object to tell the code what precision you care about.

~~You can then safely use the decimals as a key for your HashMap.~~

[edit]

Actually, you may want to use TreeMap instead of HashMap, because 2150*10^-1 and 215 are different according to equals(), but the same according to compareTo(). TreeMap uses compareTo().

[edit2]

Unless you stripTrailingZeros() before you put/get a key from the HashMap. It may still be easier to use a TreeMap though.

The reason for your confusion is that floating point literals are represented in decimal notation. However, values represented in decimal can not always be represented in binary notation using finite precision. For example, 0.1 can not be represented exactly in binary.

The thing is, you can never determine whether two real numbers are "equal", unless you specify exactly "how equal" they have to be. You will run into this problem regardless of whether you use doubles, BigDecimal or your own type. You need to determine for yourself if you consider 0.9999999999999999999999999999999999999999 and 1.00000000000000000000000000000000000000001 to be the same value. If not, how precise does their difference need to be before you *do* consider them equal?

I suggest that if you care about exact calculations, you convert your measurements to BigDecimal using the BigDecimal(double, MathContext) constructor, and then doing calculations. Use the MathContext object to tell the code what precision you care about.

[edit]

Actually, you may want to use TreeMap instead of HashMap, because 2150*10^-1 and 215 are different according to equals(), but the same according to compareTo(). TreeMap uses compareTo().

[edit2]

Unless you stripTrailingZeros() before you put/get a key from the HashMap. It may still be easier to use a TreeMap though.

*The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.*

Stephan van Hulst

Saloon Keeper

Posts: 7743

142

Jon Swanson

Ranch Hand

Posts: 225

posted 5 years ago

I don't think I am being confused by precision. My original question was if I took a double, which is represented as mantissa * 2^exponent, extracted the manitissa and exponent as long ints, then built a hash code using using (long) precision * (mantissa / precision) and exponent so that 'almost equal' values would hash to the same bin, whether there would be cases where 'almost equal' values would not hash correctly. From what I have read, I do need to be careful to round the answer coming back from my calculation on the mantissa. Or maybe I would be better off bit shifting back and forth. But so far it seems like that method would work.

I do need to extend it, since I will actually be hashing (x,y). I was thinking along the lines of

which would require generating a hashCode for DataPoint which will consist of two (maybe three at some point) double precision numbers. So I started working out a hashCode() for one number. Do you think it would be sufficent to AND the two hashcodes together, or AND them after offsetting one of them (say with a bit shift)?

I could also use two maps,

that seemed to me to be harder to read and maintain. And based on earlier suggestions, DataPoint could handle all three cases of (x), (x,y) or (x,y,z) keys if equals checks for the number of values in the key.

The information on BigDecimal was extremely interesting. It makes sense now you mention it that since it takes strings, I might have problems with 215.00 vs 215. If I understand correctly, TreeMap only needs equals() and a Comparator implemented vs HashMap requiring equals() and hashCode()? I'd think I'd still need to worry about the Comparator having a similar problem as hashCode().

I do need to extend it, since I will actually be hashing (x,y). I was thinking along the lines of

which would require generating a hashCode for DataPoint which will consist of two (maybe three at some point) double precision numbers. So I started working out a hashCode() for one number. Do you think it would be sufficent to AND the two hashcodes together, or AND them after offsetting one of them (say with a bit shift)?

I could also use two maps,

that seemed to me to be harder to read and maintain. And based on earlier suggestions, DataPoint could handle all three cases of (x), (x,y) or (x,y,z) keys if equals checks for the number of values in the key.

The information on BigDecimal was extremely interesting. It makes sense now you mention it that since it takes strings, I might have problems with 215.00 vs 215. If I understand correctly, TreeMap only needs equals() and a Comparator implemented vs HashMap requiring equals() and hashCode()? I'd think I'd still need to worry about the Comparator having a similar problem as hashCode().

Stephan van Hulst

Saloon Keeper

Posts: 7743

142

posted 5 years ago

No, TreeMap actually does not satisfy the Map interface. It completely ignores equals(). It only produces the same results if the key's type's compareTo() method is consistent with equals. In the case of BigDecimal, it isn't.