Author Topic: unusual rounding of least significant digit in double precision binary format  (Read 198 times)

Offline WesL

  • Member
  • *
  • Posts: 3
I noticed slightly different floating point results with Pelles C compared to other C compilers.  I tracked it down to how certain decimal values are rounded when they are converted to the IEEE double precision binary format (sign: 1 bit, exponent: 11 bits, mantissa: 52 bits).

Most values are rounded properly. For example, 0.9 is stored as 0x3feccccccccccccd.  The final hex digit is correctly rounded up to d, since the discarded digits that follow begin with c and c>8.

However, the value 0.3 is stored as 0x3fd3333333333334.  Notice that the final hex digit is rounded up to 4, but it seems like it should be rounded down to 3, since the discarded digits begin with 3 and 3<8.  I thought perhaps a "round to even" rule might be in use, but this doesn't seem to be the case, as it sometimes unexpectedly rounds up to even and other times rounds up to odd.

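Also, values that differ only by a factor of 2 (and so should share the same mantissa bits) are rounded inconsistently.  The lines marked with ? are the ones that look wrong to me.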


 0.6 = 0x3fe3333333333334 ?
 1.2 = 0x3ff3333333333333

 5.6 = 0x4016666666666667 ?
11.2 = 0x4026666666666666

0.72 = 0x3fe70a3d70a3d70b ?
1.44 = 0x3ff70a3d70a3d70a

0.84 = 0x3feae147ae147ae2 ?
1.68 = 0x3ffae147ae147ae1


This happens at compile time (x=0.3) as well as at run time with atof() and scanf() functions.  I have not come across any such unusual rounding for single precision floats, just doubles.

I checked with 6 other compilers (ms, gcc, clang, intel, watcom, digital mars) and they all round to the nearest digit in these cases rather than rounding up.

Is this a bug, or simply a rounding scheme with which I am not familiar?


Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 1582
Look here:
http://forum.pellesc.de/index.php?topic=4322.msg16031#msg16031

Code: [Select]
#include <stdio.h>
#include <string.h>

int main(void)
{
    double d1, dpi = 3.14159265358979323846;
    unsigned long long bits;

    printf("3.14159265358979323846\n");
    printf("%.17f\t", dpi);
    memcpy(&bits, &dpi, sizeof bits);   /* bit pattern the compiler chose */
    printf("%llXh\n", bits);

    bits = 0x400921FB54442D18ULL;       /* pattern other compilers produce */
    memcpy(&d1, &bits, sizeof d1);
    printf("%.17f\t", d1);
    printf("%llXh\n", bits);
    return 0;
}
Code: [Select]
3.14159265358979323846
3.14159265358979355     400921FB54442D19h
3.14159265358979319     400921FB54442D18h

Found on the net; I use it only for my own constants:
Code: [Select]
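/* Naive decimal-to-double conversion. Expects an optional leading '-',
   then only digits and at most one '.'. Each fractional digit can add its
   own rounding error, so the result is not guaranteed to be correctly rounded. */
double str2double(const char* str)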
{
  double value = 0, f = 0;
  int i = 0;

  while (!(('0' <= str[i]) && (str[i] <= '9')) && (str[i] != 0))
    i++;

  while (str[i] != 0) {
    if (str[i] == '.') {
      f = 10;
      i++;
      continue;
    }

    if (f) {
      value += (double)(str[i] - '0') / f;
      f *= 10;
    } else
      value = 10 * value + str[i] - '0';

    i++;
  }

  if (str[0] == '-')
    value = -value;

  return value;
}
« Last Edit: March 29, 2017, 09:22:52 AM by TimoVJL »
May the source be with you

Offline WesL

  • Member
  • *
  • Posts: 3
Thank you for the link.  I had searched for related threads before I posted, but I missed that one.

It looks like the original post in that thread is regarding the same issue.  Many of the responders seem to miss the heart of the problem.  The whole "floats and doubles are not exact" issue is real, but unrelated to what his (and my) post was about.

Your example with pi is a good one.  Whether the compiler converts the decimal literal to 400921FB54442D18h or 400921FB54442D19h, neither one will be exactly correct, but rounding the last digit down to 8 is closer than rounding up to 9.  All other compilers (and online converters) that I've tested do this conversion with "round to nearest" which in this case is down.
Code: [Select]
3.14159265358979323846
3.14159265358979312     400921FB54442D18h
3.14159265358979312     400921FB54442D18h

This is not a hardware FPU issue, but purely a software issue.  There are various rounding schemes (round to nearest, round up, round down, round toward zero, round away from zero, round to even, round to odd), but Pelles C doesn't seem to be consistent with any of those.  It appears to usually use "round to nearest", but the inconsistency gives the appearance of being a bug.

I wanted to make sure the developers were aware.  Based on the old posts, I guess they are.

Offline jj2007

  • Member
  • *
  • Posts: 445
Perhaps part of the problem is that 2000.0 printing correctly obscures the fact that double precision ends at about 16 significant digits. You can see that when playing with REAL10, aka "extended", precision:

Code: [Select]
0001.234567890123456789 digits
2000.000000000000000000 CRT double
0003.141592653589793238 PI in REAL10 precision
2003.141592653589793    2000 added, REAL10
2003.141592653589900    CRT double

The two CRT lines are printed using printf("%4.18f\tCRT double\n", fp8) (%4.15f for the second one).