# Floating Point Arithmetic: Help with denormalized numbers

Edwin Dalorzo

Ranch Hand

Posts: 961

posted 12 years ago

I have been working hard to understand IEEE 754, the standard the Java Virtual Machine follows for floating point numbers and operations.

Now I will have to explain a little of what I am trying to do so that you guys understand my question. So just be patient with me... I assure you that you might get as interested in understanding this as I am, if you do not understand it already.

First of all, let's convert a floating point number to its binary representation.

Let's use a simple float number: 84.75 (for all calculations I will use the Java float data type).

1. OK, the number 84 in base 10 is equal to **1010100** in base 2. It's a simple conversion and I know most of you know how to do it.

84 = 42 x 2 + **0**

84 = (21 x 2 + **0**) x 2 + **0**

84 = ((10 x 2 + **1**) x 2 + **0**) x 2 + **0**

84 = (((5 x 2 + **0**) x 2 + **1**) x 2 + **0**) x 2 + **0**

84 = ((((2 x 2 + **1**) x 2 + **0**) x 2 + **1**) x 2 + **0**) x 2 + **0**

84 = (((((1 x 2 + **0**) x 2 + **1**) x 2 + **0**) x 2 + **1**) x 2 + **0**) x 2 + **0**

84 = **1010100**

2. Now the number 0.75 expressed in base 2 is **0.11**. I know you know how to do it, so forgive me if I insist on writing out the procedure. It just helps me to make everything clear.

0.75 x 2 = **1**.5

0.5 x 2 = **1**.0

You can check that this is true by evaluating the expression 1 x 2^-1 + 1 x 2^-2 = 0.75

3. So 84.75 in base 10 is equal to **1010100.11** in base 2.

4. Now, in base 10 we can express a number in scientific notation: **84.75 is equal to 84.75 x 10^0, which is equal to 8.475 x 10^1**

5. Then we can also say that **1010100.11 is equal to 1010100.11 x 2^0, which is equal to 1.01010011 x 2^6**

6. Now the difficult part to explain is the IEEE 754 floating point number anatomy. It is somewhat like this:

[ Sign [31] Exponent [23-30] Mantissa [00-22] ]

7. Which means that the first 23 bits (bits 0 to 22) are the fraction. For example: **in 8.475 x 10^1 the fraction is .475.**

8. The next 8 bits are the exponent. **In the number 8.475 x 10^1 the exponent is 1.** However, in our floating point number the exponent is 6, as you can see above (**1.01010011 x 2^6**).

9. The most significant bit represents the sign (**1 means negative**).

10. So our exponent is 6. However, as the exponent field has to be able to express both negative and positive exponents, the standard says that the stored exponent is biased by 127. That means that you must add 127 to your exponent. This way stored values over 127 mean a positive exponent, and values below 127 mean a negative exponent.

11. That means that our exponent should be **127+6**, that is 133, which in binary format is **10000101**.

12. So our floating point number is: **0 10000101 01010011000000000000000**

13. Which you can see is formed by a positive sign (**bit 31 is clear**), then the next 8 bits are the exponent (**6+127**), that is to say 133 (**10000101**), and the remaining 23 bits are the fraction, padded with 0s on the right (**01010011000000000000000**).

14. This number expressed in hexadecimal is 0100 0010 1010 1001 1000 0000 0000 0000 = **4 2 A 9 8 0 0 0**

15. We can test this in Java by checking that the bits of 84.75f are 0x42A98000.
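A minimal sketch of such a check, assuming nothing beyond the standard `Float` and `Integer` classes. The class name `FloatBits` and the helper `pack` are made up for illustration; they are not from the original post.

```java
public class FloatBits {
    // Hypothetical helper: packs sign, biased exponent and fraction
    // into one 32-bit IEEE 754 single-precision pattern.
    static int pack(int sign, int biasedExponent, int fraction) {
        return (sign << 31) | (biasedExponent << 23) | fraction;
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(84.75f);

        // Hex form of the stored bits.
        System.out.println(Integer.toHexString(bits));
        // Binary form; note that leading zeros (here the sign bit) are not printed.
        System.out.println(Integer.toBinaryString(bits));

        // Rebuild the pattern by hand: sign 0, exponent 6 + 127,
        // fraction 01010011 followed by 15 zero bits.
        int byHand = pack(0, 6 + 127, 0b01010011 << 15);
        System.out.println(bits == byHand);

        // And the other way around: from the hex pattern back to a float.
        System.out.println(Float.intBitsToFloat(0x42A98000));
    }
}
```

`Float.floatToIntBits` and `Float.intBitsToFloat` do the two conversions; the `pack` call just repeats steps 11 to 13 with shifts.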

16. Now, based on this it is very simple to understand the IEEE 754 special values:

17. Positive zero and negative zero are built by turning the sign bit off or on, with an exponent field of zero and a fraction (mantissa) field of zero.

For example:

Positive zero = 0000 0000 0000 0000 0000 0000 0000 0000 = **0x0**

Negative zero = 1000 0000 0000 0000 0000 0000 0000 0000 = **0x80000000**

18. Positive and negative infinity have an exponent of all 1s and a fraction (mantissa) of all 0s; the sign bit selects which one.

For example:

+Infinity = 0111 1111 1000 0000 0000 0000 0000 0000 = **0x7f800000**

-Infinity = 1111 1111 1000 0000 0000 0000 0000 0000 = **0xff800000**

19. NaN (Not-a-Number): an exponent of all 1s and a non-zero fraction (mantissa).

Quiet NaN: the most significant fraction bit is set (propagates quietly through operations)

Signaling NaN: the most significant fraction bit is clear (meant to signal an invalid operation when used)

QNaN = 0111 1111 1100 0000 0000 0000 0000 0000 = **0x7fC00000**

SNaN = 0111 1111 1010 0000 0000 0000 0000 0000 = **0x7fA00000**

20. However, there is another kind of special number, the denormalized numbers. And here is where my question comes in.

21. Denormalized numbers have an exponent field of all 0s, but a non-zero fraction.

22. For example, in Java the float 5.877472E-39f is a denormalized number.
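The rules from steps 17 to 22 can be summed up in a small classifier. This is only a sketch: the class name `FloatKind` and the method `classify` are invented for illustration, and it assumes nothing more than reading the exponent and fraction fields as described above.

```java
public class FloatKind {
    // Hypothetical helper: names the IEEE 754 category of a float
    // by looking only at its exponent and fraction fields.
    static String classify(float value) {
        int bits = Float.floatToRawIntBits(value);
        int exponent = (bits >>> 23) & 0xFF;   // bits 23-30
        int fraction = bits & 0x007FFFFF;      // bits 0-22

        if (exponent == 0xFF) {
            // Exponent all 1s: infinity if the fraction is zero, NaN otherwise.
            return fraction == 0 ? "infinity" : "NaN";
        }
        if (exponent == 0) {
            // Exponent all 0s: zero if the fraction is zero, denormalized otherwise.
            return fraction == 0 ? "zero" : "denormalized";
        }
        return "normalized";
    }

    public static void main(String[] args) {
        float[] samples = {
            84.75f, 0.0f, -0.0f, Float.NEGATIVE_INFINITY, Float.NaN, 5.877472E-39f
        };
        for (float f : samples) {
            String hex = Integer.toHexString(Float.floatToRawIntBits(f));
            System.out.println(f + " (0x" + hex + ") is " + classify(f));
        }
    }
}
```

The sign bit is ignored on purpose, since it does not change the category.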

23. But *what are they for?*, *how do I convert a base 10 floating point number to this format?*, *how do I convert them back?*, and *when does the JVM use this kind of number?*

24. I know that this level of detail will probably not appear in the SCJP 1.4 or 1.5. However, I am not taking the exam just for the certification; I really want to understand everything very well.

Does anyone know the answer, or can anyone help me find it?

Thanks in advance, you all are great!

[ January 18, 2005: Message edited by: Edwin Dalorzo ]

posted 12 years ago

Excellent question!

A value in *binary* scientific notation is **1.**xxx... times 2 raised to some power. That is, the first non-zero digit is always "1". So in normalized mode, the non-fractional "1" before the mantissa is *not stored* as part of the value; it's *implied.* This is sometimes called the "hidden bit." (You can see this in your example above. The significand bits only include the "fractional" portion, the mantissa.)

But a denormalized value has no implicit "1" before the mantissa. Instead, a denormalized (or "subnormal") value is understood to be **0.**xxx... times 2 raised to some power. This provides an extended range of *very small* numbers, and it comes at the expense of gradually losing precision as the first "1" bit moves farther to the right (leaving less room for significant figures).

In general terms we have...

**Normalized** (with a "hidden" bit in the significand):

(-1)^(sign bit) * 2^(exponent - bias) * (**1 +** fractional mantissa)

**Denormalized** (with no "hidden" bit):

(-1)^(sign bit) * 2^(-bias **+ 1**) * (fractional mantissa)

Ref:

http://en.wikipedia.org/wiki/Floating-point (see esp. "Hidden bit")

http://en.wikipedia.org/wiki/Denormal

http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html
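The normalized and denormalized formulas can be tried out against the JVM's own conversion. What follows is only a sketch under a few assumptions: the class name `FloatDecode`, the method `decode` and the constant `BIAS` are made up for illustration, and infinities and NaNs (exponent field of all 1s) are deliberately left out.

```java
public class FloatDecode {
    static final int BIAS = 127;

    // Hypothetical helper: applies the normalized or the denormalized formula
    // to a raw 32-bit pattern. Infinity and NaN patterns are not handled.
    static double decode(int bits) {
        int sign = bits >>> 31;
        int exponent = (bits >>> 23) & 0xFF;
        int fraction = bits & 0x007FFFFF;

        // The 23 fraction bits read as 0.xxx, i.e. fraction / 2^23.
        double mantissa = fraction / (double) (1 << 23);
        double signFactor = (sign == 0) ? 1.0 : -1.0;

        if (exponent == 0) {
            // Denormalized: no hidden bit, fixed exponent of -bias + 1.
            return signFactor * Math.scalb(mantissa, -BIAS + 1);
        }
        // Normalized: hidden "1" in front of the mantissa.
        return signFactor * Math.scalb(1 + mantissa, exponent - BIAS);
    }

    public static void main(String[] args) {
        int[] patterns = { 0x42A98000, 0x00400000, 0x00000001 };
        for (int p : patterns) {
            System.out.println("0x" + Integer.toHexString(p)
                    + " formula: " + decode(p)
                    + " JVM: " + Float.intBitsToFloat(p));
        }
    }
}
```

`Math.scalb` multiplies by a power of two without rounding surprises, which is why it is used here instead of `Math.pow`. The printed decimal strings may differ in length between the two columns because one is a `double` and the other a `float`, but the values they represent should match.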

(These details are definitely *not* on the SCJP exam.)

[ January 19, 2005: Message edited by: marc weber ]

*~Joe Strummer*

sscce.org

posted 12 years ago

A couple more links...

Here's a *friendly* site on IEEE 754:

http://www.public.iastate.edu/~sarita/ieee754/homepage.html

And here's a good technical source:

http://docs.sun.com/source/806-3568/ncg_goldberg.html

[ January 19, 2005: Message edited by: marc weber ]
