
Author Topic: double representation  (Read 14288 times)

oog

  • Guest
double representation
« on: February 24, 2012, 03:56:02 AM »
I don't know if this is a bug or not, but the following program, which prints out the bits of a double that's been set to 2000.0, prints this:
0 10000001001 1111010000000000000000000000000000000000000000000001

gcc and MSVC print this:
0 10000001001 1111010000000000000000000000000000000000000000000000

I ran it on Pelles C version 6.50.8 RC#4

Code: [Select]
#include <stdio.h>

/* Prints the 64 bits of a double as "sign exponent mantissa".
   Assumes a little-endian layout, hence the high-to-low byte walk. */
void printDoubleBits(double d) {
    int i, j, cnt = 0;
    char *c = (char*)&d;
    for (i = 7; i >= 0; i--)
        for (j = 7; j >= 0; j--) {
            if (cnt++ == 1 || cnt == 13) /* space after the sign bit and after the 11 exponent bits */
                printf(" ");
            printf("%d", !!(c[i] & (1 << j)));
        }
    printf("\n");
}

int main() {
    double d = 2000.0;
    printDoubleBits(d);
    return 0;
}
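
The two outputs differ by exactly one unit in the last place (ULP). At 2000.0 the unbiased exponent is 10, so one ULP is 2^(10-52) = 2^-42, about 2.27e-13, and the Pelles pattern decodes to roughly 2000.0000000000002. A minimal sketch (assuming an IEEE-754 double and C99's nextafter) that prints that neighbouring value:

Code: [Select]
/* Sketch: print the double one ULP above 2000.0, i.e. the value the
   stray low mantissa bit produces.  Assumes IEEE-754 binary64. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double exact = 2000.0;
    double oneUlpUp = nextafter(exact, 3000.0);
    printf("%.17g\n", exact);    /* 2000 */
    printf("%.17g\n", oneUlpUp); /* 2000.0000000000002 */
    return 0;
}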

CommonTater

  • Guest
Re: double representation
« Reply #1 on: February 24, 2012, 05:27:52 AM »
What you've discovered is the infamous "last bit ambiguity"... Floats and doubles are not exact numbers; they are approximations made within the number of available bits.  Often a number cannot be exactly represented, and there can be a "last bit" error where, for example, 8 becomes 7.99999999999.

Try several different values in a loop... print the bits if you like but also print the number itself as a float or a double.  You will find that GCC, VC++, Pelles etc. all have this same problem, because it's a limitation of the FPU hardware itself.

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Code: [Select]
// demonstration of floating point inaccuracies
#include <stdio.h>

int main(void)
{
    int i;
    float x = 1.0;

    for (i = 0; i < 100; i++) {
        printf("%f\t", x);
        x += 0.1;   // 0.1 has no exact binary representation, so error accumulates
    }
    return 0;
}

 
And, yes, this is the flaw they exploited in their boss's computers in Office Space.
 
« Last Edit: February 24, 2012, 05:33:27 AM by CommonTater »

oog

  • Guest
Re: double representation
« Reply #2 on: February 24, 2012, 06:04:55 PM »
Thanks for the response, Tater.

But it still doesn't explain why GCC and MSVC give a "perfect" representation (no extra 1 bit on the end) whereas Pelles ends up with an extra 1 bit.  Could it be that the algorithm Pelles uses to read in a literal double (or one read with scanf) is not as good as it could be?

It's just that I expected a number like 2000 (as opposed to, say, 2000.3) to be represented exactly.  I would certainly expect 8.0 to be represented exactly (sign=0, exponent=1026 decimal, mantissa all zeroes, which is what Pelles gives for 8.0 in the program in my first post).  I realize that once you start doing math on it, adding and subtracting .1, for instance, all bets are off.  But here I'm talking about setting a double to the literal value 2000.0.
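
A small sketch of that decoding (assuming IEEE-754 binary64, and using memcpy instead of a pointer cast) that shows the expected fields for 8.0 and 2000.0:

Code: [Select]
/* Sketch: decode sign / biased exponent / mantissa of a double.
   Assumes IEEE-754 binary64. */
#include <stdio.h>
#include <string.h>

static void decode(double d) {
    unsigned long long bits;
    memcpy(&bits, &d, sizeof bits);
    printf("%.1f: sign=%llu exponent=%llu mantissa=%013llX\n",
           d, bits >> 63, (bits >> 52) & 0x7FF, bits & 0xFFFFFFFFFFFFFULL);
}

int main(void) {
    decode(8.0);    /* sign=0 exponent=1026 mantissa=0000000000000 */
    decode(2000.0); /* sign=0 exponent=1033 mantissa=F400000000000 */
    return 0;
}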

Thanks for the link (I'll read it) and movie suggestion!

CommonTater

  • Guest
Re: double representation
« Reply #3 on: February 24, 2012, 06:15:52 PM »
Quote
But it still doesn't explain why GCC and MSVC give a "perfect" representation (no extra 1 bit on the end) whereas Pelles ends up with an extra 1 bit.  Could it be that the algorithm Pelles uses to read in a literal double (or one read with scanf) is not as good as it could be?


As I said before... this is a hardware thing.  Every programming language is going to run into it and, as you point out, the result is somewhat code dependent... that is, they all screw up, each in their own way  :D

Enjoy the movie!
« Last Edit: February 24, 2012, 06:19:53 PM by CommonTater »

oog

  • Guest
Re: double representation
« Reply #4 on: February 24, 2012, 08:58:20 PM »
Quote
they all screw up, each in their own way
I see. I've read some of that link and things are far more complicated than I thought.

Quote
Enjoy the movie!
I will.  I like Mike Judge (King of the Hill, etc) so it should be good.  Thanks again, man.

CommonTater

  • Guest
Re: double representation
« Reply #5 on: February 24, 2012, 11:42:30 PM »
Quote
they all screw up, each in their own way
I see. I've read some of that link and things are far more complicated than I thought.

 ;D   And this surprises you because???   8)
 
This is one of the reasons why I never use floating point math if I can get out of it.  In my inventory packages I do all the pricing in pennies using integers... way fewer headaches that way.
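
For instance, a minimal sketch of the integer-cents idea (purely illustrative, not taken from any real inventory package):

Code: [Select]
/* Illustrative sketch: keep money in whole pennies so additions are
   exact integer math, and only format as dollars for display. */
#include <stdio.h>

int main(void) {
    long itemCents  = 1999;                        /* $19.99 */
    long taxCents   = (itemCents * 7 + 50) / 100;  /* 7% tax, rounded to the nearest cent */
    long totalCents = itemCents + taxCents;
    printf("total: $%ld.%02ld\n", totalCents / 100, totalCents % 100);
    return 0;
}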
 

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 2091
Re: double representation
« Reply #6 on: February 25, 2012, 09:27:46 AM »
Checking out the CRT:
Code: [Select]
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    double d = atof("2000.0");
    unsigned long long *ll = (unsigned long long *)&d;
    printf("atof:   %llX\n", *ll);
    d = strtof("2000.0", NULL); /* strtof returns a float, which holds 2000.0 exactly */
    printf("strtof: %llX\n", *ll);
    d = strtod("2000.0", NULL);
    printf("strtod: %llX\n", *ll);
    return 0;
}
Output:
Code: [Select]
atof:   409F400000000001
strtof: 409F400000000000
strtod: 409F400000000001
May the source be with you

George99

  • Guest
Re: double representation
« Reply #7 on: February 25, 2012, 06:13:12 PM »
Quote
What you've discovered is the infamous "last bit ambiguity"... Floats and doubles are not exact numbers; they are approximations made within the number of available bits.  Often a number cannot be exactly represented, and there can be a "last bit" error where, for example, 8 becomes 7.99999999999.

That's right for the results of operations such as x + y, and also for non-integer values such as 2000.1, but ALL integer values up to 53 bits (the 52 mantissa bits plus the implicit leading 1) ARE GUARANTEED to be represented EXACTLY in a double floating-point value.
This means this one is really a bug in Pelles C.
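
A quick way to test that claim (a minimal sketch, assuming IEEE-754 binary64): every integer of magnitude up to 2^53 converts exactly, and 2^53 + 1 is the first one that does not.

Code: [Select]
/* Sketch: 2^53 is the limit up to which ALL integers fit exactly in
   a double (52 mantissa bits + the implicit leading 1). */
#include <stdio.h>

int main(void) {
    long long limit = 1LL << 53;                  /* 9007199254740992 */
    printf("%lld -> %.1f (exact)\n", limit, (double)limit);
    printf("%lld -> %.1f (rounded)\n", limit + 1, (double)(limit + 1));
    printf("2^53+1 round-trips: %s\n",
           (long long)(double)(limit + 1) == limit + 1 ? "yes" : "no");  /* prints "no" */
    return 0;
}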

CommonTater

  • Guest
Re: double representation
« Reply #8 on: February 25, 2012, 06:57:58 PM »
Quote
ALL integer values up to 53 bits (the 52 mantissa bits plus the implicit leading 1) ARE GUARANTEED to be represented EXACTLY in a double floating-point value.
This means this one is really a bug in Pelles C.

Hi George, thanks for chiming in... 

Is that part of a standard someplace?
I ask because this is the first I've heard of it, and I do like to have copies of these things on hand.

I've used both VC++ and MinGW and have seen integer values miss by one bit in these compilers too.  As I mentioned, the bit about 8 becoming 7.999999 is very common.  Of course non-integer values get the same treatment, as my little demo program shows.  (Note that it fares much better if you change the x variable to a double.)


 
« Last Edit: February 25, 2012, 07:08:24 PM by CommonTater »

oog

  • Guest
Re: double representation
« Reply #9 on: February 25, 2012, 08:46:25 PM »
I believe George99 is right according to the IEEE-754 standard.  The numbers should be represented exactly if they are read in (with scanf, for example) and can be represented exactly in the given number of bits.

I find it particularly strange that only the (integral) numbers in a few ranges between 1000 and 8191 seem to have their LSB set (I tested from zero to one hundred million with the program below, so there could be more above that).

Take note that I'm only talking about floating-point literals in the code or ones read in with scanf (presumably both use the same algorithm).  If, on the other hand, you start a double at 0 (d = 0.0) and increment it (d += 1.0), the LSB does not get set within the range 0 to one hundred million (and presumably much higher).  I'm not talking about the results of operations on floating point numbers.

The following program demonstrates the above. For each number from 0 to 10000000 (or whatever), it prints the number to a string and then reads it back with sscanf.  It outputs the ranges of the numbers that have their LSB set.

My results (positive numbers):
1000 - 1023
1390 - 2047
2780 - 4095
5560 - 8191

My results (negative numbers):
-1000 - -1023
-1390 - -2047
-2780 - -4095
-5560 - -8191

You have to admit it's odd that only those numbers of all numbers between zero and a hundred million have their LSB set.  Is it possible that the scanf algorithm is a little off (and the one that reads literal doubles in the source code)?

Code: [Select]
#include <stdio.h>
#include <stdlib.h>

//#define USEFLOAT  // uncomment to test float instead of double
// if USEFLOAT is defined, UPPERBOUND should be no more than 8388608 (pow(2,23))

//#define TEST_NEGATIVE // uncomment to test negative numbers

#ifdef USEFLOAT
# define FPTYPE             float
# define SAME_SIZE_UNSIGNED unsigned
# define SCANF_FORMAT       "%f"
# define EXPONENT_BITS      8
# define UPPERBOUND         8388608
#else
# define FPTYPE             double
# define SAME_SIZE_UNSIGNED long long unsigned
# define SCANF_FORMAT       "%lf"
# define EXPONENT_BITS      11
# define UPPERBOUND         10000000 // ten million executes fairly quickly
#endif

int testFPBit(FPTYPE d, int n) { // returns 0 or 1
    return (int)(*(SAME_SIZE_UNSIGNED*)&d >> n) & 1;
}
/*
void printFPBits(FPTYPE d) {
    int i;
    for (i = sizeof(FPTYPE)*8-1; i >= 0; i--) {
        if (i == sizeof(FPTYPE)*8-2 || i == sizeof(FPTYPE)*8-2-EXPONENT_BITS)
            printf(" ");
        printf("%d", testFPBit(d, i));
    }
    printf("\n");
}
*/
void scanFPs(void) {
    const int SIZE = 100;
    char line[SIZE];
    FPTYPE d;
    int i, n = 0, bInSpan = 0;
    printf("begin scanFPs...\n");
#ifdef TEST_NEGATIVE
    for (i = 0; i > -UPPERBOUND; i--) {
#else
    for (i = 0; i < UPPERBOUND; i++) {
#endif
        sprintf(line, "%d", i);
        sscanf(line, SCANF_FORMAT, &d);
        if (testFPBit(d, 0)) {
            n = (int)d;
            if (!bInSpan) {
                printf("%d - ", n);    // begin range
                bInSpan = 1;
            }
        }
        else if (bInSpan) {
            printf("%d\n", n); // end range
            bInSpan = 0;
        }
    }
    printf("end scanFPs\n");
}

void testIncrement(void) {
    FPTYPE d;
    printf("testIncrement... ");
#ifdef TEST_NEGATIVE
    for (d = 0.0; d >= -UPPERBOUND; d -= 1.0)
#else
    for (d = 0.0; d <= UPPERBOUND; d += 1.0)
#endif
        if (testFPBit(d, 0))
            printf("%f\n", d);  // Doesn't happen
    printf("end testIncrement\n");
}

int main(void) {
    testIncrement();
    scanFPs();
    return 0;
}

oog

  • Guest
Re: double representation
« Reply #10 on: February 29, 2012, 04:57:02 PM »
So what's the status here?  Is this an admitted bug?

CommonTater

  • Guest
Re: double representation
« Reply #11 on: February 29, 2012, 07:02:14 PM »
Quote
So what's the status here?  Is this an admitted bug?

That's for Pelle to say... 

My goal in answering your initial post was to point you to that article...  This is a widely known problem that appears to be compiler and language independent  (I've seen it in C++ and in Pascal as well)... Floating Point math is never better than an approximation because of this "last bit ambiguity" business in the hardware.

czerny

  • Guest
Re: double representation
« Reply #12 on: February 29, 2012, 11:00:54 PM »
I am a bit confused about what this "last bit ambiguity" is supposed to be.

There are always fractions, in every number system, that cannot be represented exactly.
Take 5/7, which can only be written exactly in base 7 (0.5), but in binary is 0.101101101...
repeating infinitely. So with a fixed number of digits, say five, 0.10110 is as good as 0.10111. Is that what is meant?
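
A tiny sketch confirming that expansion by long division in base 2 (my own illustration):

Code: [Select]
/* Sketch: generate binary digits of 5/7 by repeated doubling;
   the pattern 101 repeats forever, so it never terminates. */
#include <stdio.h>

int main(void) {
    int num = 5, den = 7, i;
    printf("5/7 in binary: 0.");
    for (i = 0; i < 12; i++) {
        num *= 2;
        if (num >= den) {
            putchar('1');
            num -= den;
        } else {
            putchar('0');
        }
    }
    printf("...\n");  /* 0.101101101101... */
    return 0;
}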

But 2000 can be represented exactly in a double with no problems.

Or is that "last bit ambiguity" caused by something else?

czerny

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: double representation
« Reply #13 on: February 29, 2012, 11:24:24 PM »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

czerny

  • Guest
Re: double representation
« Reply #14 on: February 29, 2012, 11:35:52 PM »
OK, if I had to decide which one is the better representation, I would choose a rounding rule. But with 2000 there is nothing to round.

czerny