Author Topic: Strange behaviour from (f)printf()  (Read 5857 times)

Frontier

  • Guest
Strange behaviour from (f)printf()
« on: December 31, 2007, 03:36:23 PM »
Hello,

It's almost New Year's Eve, so first let me wish everyone a Happy New Year full of happiness and health.

Now, to the issue at hand. Consider the following simple source code:

Code: [Select]
#include <stdio.h>

#define MAXCHARS 127

void make_probfile(FILE *infile, FILE *outfile)
{
    unsigned int cfreq[MAXCHARS];
    unsigned int numchrs=0;
    int ch;

    for(int iCount=0; iCount<=MAXCHARS; iCount++)
        cfreq[iCount]=0;

    while((ch=getc(infile)) != EOF)
    {
        if(ch >= 0 && ch <=MAXCHARS)
        {
            cfreq[ch]++;
            numchrs++;
        }
    }
    fclose(infile);

    for(int iCount=0; iCount<=MAXCHARS; iCount++)
        fprintf(outfile,"%.8f\n", (float)((float)cfreq[iCount] / (float)numchrs));
        //fprintf(outfile,"%i: cfreq[%c]=%i, numchrs=%i : %.8f\n", iCount, iCount, cfreq[iCount], numchrs, ((float)cfreq[iCount] / (float)numchrs));

    fclose(outfile);
}

void main()
{
    FILE *fp1, *fp2;

    fp1=fopen("c:\\temp\\sample.txt","r");
    fp2=fopen("c:\\temp\\outfile.txt","w");

    make_probfile(fp1,fp2);
}

We supply one large ASCII text file, "sample.txt" (in c:\temp), to the program. The file contains only characters from the 128-character (0-127) ASCII set. The program counts how often each character appears in the text and then computes each character's probability. The results are stored in the file c:\temp\outfile.txt.

The line
Code: [Select]
fprintf(outfile,"%.8f\n", (float)((float)cfreq[iCount] / (float)numchrs));

produces a different result for the last character (character 127), that is, a different computed probability for this character, than the line

Code: [Select]
               fprintf(outfile,"%i: cfreq[%c]=%i, numchrs=%i : %.8f\n", iCount, iCount, cfreq[iCount], numchrs, ((float)cfreq[iCount] / (float)numchrs));

The code works fine with Visual C++ and MinGW32 under Windows, so I suspect that either I am doing something wrong with PellesC or there is a bug in fprintf().

Any ideas what's going wrong with this code?

Many thanks in advance.

Offline Pelle

  • Administrator
  • Member
  • Posts: 2266
    • http://www.smorgasbordet.com
Re: Strange behaviour from (f)printf()
« Reply #1 on: January 01, 2008, 02:29:18 PM »
Code: [Select]
#define MAXCHARS  127
Code: [Select]
unsigned int cfreq[MAXCHARS];
Code: [Select]
if(ch >= 0 && ch <=MAXCHARS)
    cfreq[ch]++;

You do the math...

(hint: the array is not big enough...)
/Pelle

Frontier

  • Guest
Re: Strange behaviour from (f)printf()
« Reply #2 on: January 01, 2008, 07:42:42 PM »
Quote from: Pelle
Code: [Select]
#define MAXCHARS  127
Code: [Select]
unsigned int cfreq[MAXCHARS];
Code: [Select]
if(ch >= 0 && ch <=MAXCHARS)
    cfreq[ch]++;

You do the math...

(hint: the array is not big enough...)

Hello Pelle and Happy New Year :)

Thanks for the input, but I do not understand it: why does the above code work fine with Visual C++ (VC2003 Toolkit) and gcc 4.2.1 but not with PellesC?
I only take characters 0 - 127 into account (128 in total), so I am creating an array of 128 ints.

By making the following changes
Code: [Select]
#define MAXCHARS 128
Code: [Select]
if(ch >= 0 && ch <=MAXCHARS-1)
and
Code: [Select]
for(iCount=0; iCount<=MAXCHARS-1; iCount++)
the code produced by PellesC gives the same result as the code produced by VC and gcc, but that's not the issue: could it be that the difference lies in
Code: [Select]
unsigned int cfreq[MAXCHARS];
being understood differently by PellesC than the rest of the compilers above?

Maybe I'm wrong, but I thought that arrays are always zero-based in C, and therefore that
Code: [Select]
unsigned int cfreq[MAXCHARS]
with MAXCHARS=127 would create an array of 128 unsigned ints.
In PellesC's case, I had to change MAXCHARS to 128 to make it work (and change my code so that it does not touch a 129th integer).

Again, thanks for your answers.

Offline Robert

  • Member
  • Posts: 245
Re: Strange behaviour from (f)printf()
« Reply #3 on: January 02, 2008, 07:55:32 AM »
Pelle's C
1031452.28025937
127: cfreq[]=0, numchrs=4164 : 0.00000000

Microsoft Visual C/C++ 8
0.00000000
127: cfreq[]=0, numchrs=4164 : 0.00000000

MinGW
0.00000000
127: cfreq[]=0, numchrs=4164 : 0.00000000

Digital Mars 8.50
1.00000000
127: cfreq[]=4164, numchrs=4164 : 1.00000000

Borland CBuilderX
0.00000000
127: cfreq[]=0, numchrs=4164 : 0.00000000

Watcom C/C++ 1.7a
1.00000000
127: cfreq[]=4164, numchrs=4164 : 1.00000000

Robert Wishlaw

Offline Robert

  • Member
  • Posts: 245
Re: Strange behaviour from (f)printf()
« Reply #4 on: January 02, 2008, 08:13:59 AM »
Quote from: Pelle
Code: [Select]
#define MAXCHARS  127
Code: [Select]
unsigned int cfreq[MAXCHARS];
Code: [Select]
if(ch >= 0 && ch <=MAXCHARS)
    cfreq[ch]++;

You do the math...

(hint: the array is not big enough...)

Quote from: Frontier
Maybe I'm wrong, but I thought that arrays are always zero-based in C, and therefore that
Code: [Select]
unsigned int cfreq[MAXCHARS]
with MAXCHARS=127 would create an array of 128 unsigned ints.


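The lower bound of a C array is 0, and the number in the declaration is the element count, not the upper bound. When MAXCHARS=127 the array created has 127 elements, indexed from 0 TO 126.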

IIRC, Microsoft QBASIC will create an array from 0 TO 127 with MAXCHARS=127.

Robert Wishlaw

Offline Robert

  • Member
  • Posts: 245
Re: Strange behaviour from (f)printf()
« Reply #5 on: January 02, 2008, 08:55:51 AM »
Code: [Select]
#include <stdio.h>

#define MAXCHARS 127
#define ubound(T) (sizeof((T))/sizeof((T[0]))-1)

int main(void)
{
    unsigned int cfreq[MAXCHARS];
    printf("% d\n",(int)ubound(cfreq));
}

The ubound macro in the above code, taken from the BCX BASIC to C Translator, will print out 126 as the upper bound of the cfreq array.
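
As a minimal sketch of the same idea (not taken from BCX or from any post above), the loop bound can be derived from the array itself, so that the comparison is always `<` against the element count instead of `<=` against a hand-written constant. The names COUNT_OF and zero_table are hypothetical and introduced here only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCHARS 127

/* Hypothetical helper: element count of a real array (not a pointer). */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Zero every element of a MAXCHARS-element table and return how many
   elements were written. Valid indices are 0 .. MAXCHARS-1. */
size_t zero_table(unsigned int (*table)[MAXCHARS])
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < COUNT_OF(*table); i++)   /* '<', not '<=' */
        (*table)[i] = 0;
    return i;
}
```

Called as zero_table(&cfreq) on an array declared as unsigned int cfreq[MAXCHARS], this touches indices 0 to 126 only and never writes to cfreq[127].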

Robert Wishlaw
« Last Edit: January 02, 2008, 09:29:52 AM by Robert »

severach

  • Guest
Re: Strange behaviour from (f)printf()
« Reply #6 on: January 05, 2008, 08:01:20 AM »
Quote from: Frontier
the code produced by PellesC gives the same result as the code produced by VC and gcc, but that's not the issue: could it be that the difference lies in
Code: [Select]
unsigned int cfreq[MAXCHARS];
being understood differently by PellesC than the rest of the compilers above?

All of these compilers treat your buffer overflow the same way: none of them checks the array index. The difference is in the optimizations and stack layout. Other compilers may lay out the stack variables in a different order, or may have optimized different variables into registers. Because of your buffer overflow, whatever happens to sit in the 128th position (cfreq[127]) changes how the program behaves. Thank Pelles C for helping you find your mistake.
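
For reference, here is one possible way to write the counting without the overflow. It is a sketch under stated assumptions, not the original poster's final code: the constant NCHARS and the functions count_chars and write_probs are hypothetical names, the array is sized by the number of character codes (128) rather than by the highest code, and the division is done in double simply because that is what printf's %f receives anyway.

```c
#include <assert.h>
#include <stdio.h>

#define NCHARS 128  /* number of codes 0..127; illustrative name */

/* Count how often each 7-bit character occurs in 'in'.
   Returns the total number of counted characters. */
unsigned int count_chars(FILE *in, unsigned int cfreq[NCHARS])
{
    unsigned int total = 0;
    int ch;

    assert(in != NULL);
    for (int i = 0; i < NCHARS; i++)
        cfreq[i] = 0;

    while ((ch = getc(in)) != EOF) {
        if (ch >= 0 && ch < NCHARS) {   /* valid indices: 0 .. NCHARS-1 */
            cfreq[ch]++;
            total++;
        }
    }
    return total;
}

/* Write one probability per line; writes nothing if total is 0,
   which avoids a division by zero on an empty input. */
void write_probs(FILE *out, const unsigned int cfreq[NCHARS],
                 unsigned int total)
{
    assert(out != NULL);
    if (total == 0)
        return;
    for (int i = 0; i < NCHARS; i++)
        fprintf(out, "%.8f\n", (double)cfreq[i] / (double)total);
}
```

A caller would declare unsigned int cfreq[NCHARS], pass it to count_chars together with the opened input file, and then hand the returned total to write_probs. Checking the result of fopen for NULL before either call is left to the caller.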