
Author Topic: fseek() - no long-int overflow indication  (Read 14572 times)

migf1

  • Guest
fseek() - no long-int overflow indication
« on: April 06, 2012, 11:54:14 AM »
Hello everybody,

I have noticed that in Pelles C (both 6.00 and 6.50.rc4, both 32 and 64bit) fseek() fails to report a long int overflow when it is told to place the file-pos indicator at the end of a large file...

Code:
        if ( 0 != fseek(fp, 0L, SEEK_END) ) {
                fclose(fp);
                return 0;
        }
The very same piece of code causes fseek() to return a non-zero value, when it is compiled with mingw32-gcc 4.6.2, cygwin-gcc 4.5.3 and lcc-win 3.8.

The following function contains the only truly portable code I know of to get the size of a file, with standard C99...

Code:
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

uintmax_t f_size( const char *fname )
{
        long int size = 0;
        FILE    *fp;

        if ( NULL == (fp = fopen(fname, "rb")) )        /* binary mode */
                return 0;

        errno = 0;      /* clear any stale error before the calls checked below */
        if ( 0 != fseek(fp, 0L, SEEK_END) ) {
                fclose(fp);
                return 0;
        }

        size = ftell(fp);
        fclose(fp);

        return (errno != 0 || size < 0) ? 0 : (uintmax_t)size;
}

Compiled with any of the above-mentioned compilers, on either WinXP 32bit or Win7 64bit, it returns 0 for files larger than LONG_MAX bytes. With Pelles C, however, it wraps around for anything larger than LONG_MAX. For example, given a file of size 5.8Gb it returns 1.7Gb. It behaves as if the return value of ftell() were of type unsigned long instead of the declared & documented long.

I use the above function in a cross-platform attempt to open files of any size, relying on it to return 0 on failure, in which case I read the file gradually using realloc instead of a single malloc.
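For readers following along, here is a minimal sketch of that realloc fallback, assuming only standard C99. The name read_all and the 4096-byte starting capacity are illustrative choices, not taken from the actual program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole stream into a heap buffer without
 * knowing its size in advance. Returns NULL on failure; on success the
 * byte count is stored in *out_len and the caller frees the buffer. */
unsigned char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    unsigned char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        len += fread(buf + len, 1, cap - len, fp);
        if (len < cap)                  /* short read: end of file or error */
            break;

        if (cap > (size_t)-1 / 2) {     /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        unsigned char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the file size, at the cost of up to twice the memory actually needed.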

Is there any special command-line option I should use in order to make it behave as expected? Currently I'm compiling it like this...

Code:
-Tamd64-coff -W2 -Ot -Ox -Ob1 -Ze -fp:precise -Gr
PS. I do use some (irrelevant) Windows-specific stuff, which is wrapped in appropriate pre-processor directives, so it won't compile on non-Windows platforms.

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2113
Re: fseek() - no long-int overflow indication
« Reply #1 on: April 06, 2012, 02:15:42 PM »
You can use _filelength64 to get the length of very large files.
Code:
long long int flen;
FILE *fp;

fp = fopen("myfile.txt", "rb");
flen = _filelength64(_fileno(fp));

Not sure about ftell's behaviour; maybe it has been kept tied to a 32-bit signed value, which is why it wraps around.
If you know that it works in MSVC, you can link the M$ C runtime (MSCrt) instead of the Pelles runtime (POCrt).
« Last Edit: April 06, 2012, 02:18:59 PM by frankie »
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #2 on: April 06, 2012, 02:25:16 PM »
You can use _filelength64 to get the length of very large files.
Code:
long long int flen;
FILE *fp;

fp = fopen("myfile.txt", "rb");
flen = _filelength64(_fileno(fp));
Thanks for answering!

Yep, I could indeed use _filelength64 (or even better the _stat family, which is much more closely related to the rather popular POSIX standard), but unfortunately the whole concept of the project is to be as compiler/platform agnostic as possible.

Nevertheless, I thought I should report this "wrapping-around" behavior of ftell() as well as fseek()'s behavior as a bug, since they both seem to deviate considerably from their behavior on other compilers.

I also had the hope that it could be some kind of misuse on my part (either in the code or in the compiler's command-line options).

EDIT:

Thanks for the suggestion regarding the MS C runtime, but as I noted above, I'm aiming for a compiler-agnostic solution.
« Last Edit: April 06, 2012, 02:29:19 PM by migf1 »

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2113
Re: fseek() - no long-int overflow indication
« Reply #3 on: April 06, 2012, 04:05:34 PM »
As far as I know, the standard POSIX functions ftell and fseek *must* return a 32-bit integer (POSIX standard).
***** So this is not a bug *****
The wrap-around that you see is the expected overflow when using those functions on large files, and it has nothing to do with signed or unsigned.
All the compilers, including PellesC, have 64-bit extensions: _ftell64, fsetpos, fgetpos, _ftello and _fseeko. But they are not C99 standard functions.
The behaviours of MINGW32, which uses GCC, and of M$ are *not* standard, so they are *not platform agnostic*.
I suggest using _ftello and _fseeko, which use 64-bit offsets; they are not C99 standard, but almost all compilers have them.

NB in GCC, with the switch "gcc -D_FILE_OFFSET_BITS=64", fseek and ftell use 64-bit offsets for files larger than 2Gb, *but this is not standard behaviour for C nor for POSIX*.
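A minimal sketch of the large-file route, assuming a POSIX system with a glibc-style C library. To my knowledge the macro widens off_t, so it is the off_t-based fseeko/ftello pair that benefits, while plain fseek/ftell keep working in long. The helper name file_size_posix is made up for illustration.

```c
#define _FILE_OFFSET_BITS 64    /* must come before every #include */
#include <stdio.h>
#include <sys/types.h>          /* off_t (POSIX, not ISO C) */

/* Hypothetical helper: returns the file size in bytes, or -1 on failure. */
long long file_size_posix(const char *fname)
{
    FILE *fp = fopen(fname, "rb");
    off_t size;

    if (fp == NULL)
        return -1;

    if (fseeko(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }

    size = ftello(fp);          /* (off_t)-1 on failure */
    fclose(fp);
    return (long long)size;
}
```

Defining the macro in the source instead of on the command line has the same effect as -D_FILE_OFFSET_BITS=64, as long as it precedes the first include.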
« Last Edit: April 06, 2012, 04:08:01 PM by frankie »

migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #4 on: April 06, 2012, 05:09:27 PM »
As far as I know, the standard POSIX functions ftell and fseek *must* return a 32-bit integer (POSIX standard).
***** So this is not a bug *****
The wrap-around that you see is the expected overflow when using those functions on large files, and it has nothing to do with signed or unsigned.

Dear frankie,

I'm quoting ftell()'s description from the latest documentation of (the original) GCC...

Quote from: gcc libc 2.8
long int ftell (FILE *stream) [Function]

This function returns the current file position of the stream stream. This function can fail if the stream doesn’t support file positioning, or if the file position can’t be represented in a long int, and possibly for other reasons as well. If a failure occurs, a value of -1 is returned.

Moreover, please have a look at The Open Group Base Specifications 2008 documentation of ftell(), at this address: http://pubs.opengroup.org/onlinepubs/9699919799/functions/ftell.html#tag_16_202, where it is clearly stated that...

a) what is described complies with the ISO C standard

b) a return value of -1 cast to long, along with errno set to EOVERFLOW, is produced when ftell() is asked for a file offset that cannot be represented correctly in an object of type long (it is in the ERRORS section).

The same holds for fseek(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html#tag_16_196 (again check the ERRORS section).

Pelles C's implementation of those functions does not comply with those specifications, so I think it makes sense to consider them buggy.

Quote
All the compilers, including PellesC, have 64-bit extensions: _ftell64, fsetpos, fgetpos, _ftello and _fseeko. But they are not C99 standard functions.

I'm aware of that, but imagine the nightmare of redefining in the pre-processor non-standard function names, types, and constants for every compiler/platform implementation. If one takes that route, it will be extremely painful to write/maintain the code, even for only the most popular compilers/platforms.

Quote
The behaviours of MINGW32, which uses GCC, and of M$ are *not* standard, so they are *not platform agnostic*.
But at the very least they do return a negative value on overflows, they don't wrap around.

Quote
I suggest using _ftello and _fseeko, which use 64-bit offsets; they are not C99 standard, but almost all compilers have them.

NB in GCC, with the switch "gcc -D_FILE_OFFSET_BITS=64", fseek and ftell use 64-bit offsets for files larger than 2Gb, *but this is not standard behaviour for C nor for POSIX*.
This falls under my previous response above. To give just an example, check this out:

a) gcc: off64_t ftello64 (FILE *stream)
b) lcc-win: long long ftelli64(FILE *);
c) pelles c: ????? (I don't seem to find the suggested _ftell64() in the documentation)

Even if _ftell64() does exist (perhaps named differently) imagine what it takes to maintain code that tries to use them abstractly.
« Last Edit: April 06, 2012, 05:11:10 PM by migf1 »

Offline Stefan Pendl

  • Global Moderator
  • Member
  • *****
  • Posts: 582
    • Homepage
Re: fseek() - no long-int overflow indication
« Reply #5 on: April 06, 2012, 05:39:22 PM »
Why not define compiler specific macros in such a case?
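A rough sketch of what such macros could look like. The wrapper names file_off_t, file_seek, file_tell and file_size64 are made up for illustration. _fseeki64/_ftelli64 are the Microsoft C runtime names and fseeko/ftello the POSIX ones; the right names for Pelles C would have to be checked in its documentation before adding a branch for it.

```c
#define _FILE_OFFSET_BITS 64    /* glibc: request a 64-bit off_t */
#include <stdio.h>

#if defined(_MSC_VER)
/* Microsoft C runtime names */
typedef __int64 file_off_t;
#define file_seek(fp, off, whence) _fseeki64((fp), (off), (whence))
#define file_tell(fp)              _ftelli64(fp)
#else
/* POSIX names, assumed for every other compiler in this sketch */
#include <sys/types.h>
typedef off_t file_off_t;
#define file_seek(fp, off, whence) fseeko((fp), (off), (whence))
#define file_tell(fp)              ftello(fp)
#endif

/* Hypothetical helper written only against the wrapper names.
 * Returns the stream size in bytes, or -1 on failure. */
long long file_size64(FILE *fp)
{
    file_off_t pos;

    if (file_seek(fp, 0, SEEK_END) != 0)
        return -1;

    pos = file_tell(fp);
    return (long long)pos;
}
```

The rest of the program then only ever sees the three wrapper names, so the per-compiler differences stay confined to one header.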
---
Stefan

Proud member of the UltraDefrag Development Team

CommonTater

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #6 on: April 06, 2012, 05:47:46 PM »
@migf1 ...

The guys are right in that this is not a POCC bug, per se.... 
The reason you are getting "wrap around" errors is not a bug in ftell() ... it is a limitation of the C programming language; which curiously enough is totally compiler agnostic.  You need to appreciate that C itself has absolutely no features for range or bounds checking... none... zero... not there.  So, if you are concerned about overflows you will have to develop the means to check them yourself (a short asm routine to check the Carry flag and throw an exception, for example).
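As a portable alternative to an asm routine, the check can also be done in plain C by testing the operands before the arithmetic happens. A minimal sketch, with checked_add_long being a made-up name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds two long values without ever overflowing.
 * Returns true and stores the sum in *sum, or returns false if the
 * result would not fit in a long. */
bool checked_add_long(long a, long b, long *sum)
{
    if ((b > 0 && a > LONG_MAX - b) ||
        (b < 0 && a < LONG_MIN - b))
        return false;

    *sum = a + b;
    return true;
}
```

The comparisons are arranged so that neither LONG_MAX - b nor LONG_MIN - b can overflow, which keeps the test itself free of undefined behaviour.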

Use the 64 bit filesizes... if you have compilers that can't deal with that, you also have compilers that can't load any file bigger than 4gb ... a common limitation of older compilers that pretty much renders them useless with modern data sets. 
 
(For example: I have one inventory package where the main data file is 15+gB ... yes gigabytes.  At one time Pelles C was the only compiler that could seek the full length of it... stuff happens).
 

CommonTater

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #7 on: April 06, 2012, 05:52:06 PM »
c) pelles c: ????? (I don't seem to find the suggested _ftell64() in the documentation)

Even if _ftell64() does exist (perhaps named differently) imagine what it takes to maintain code that tries to use them abstractly.


Look in the help file ... Private headers ... io.h
-OR-
Simply type _ftell64 into the source editor and press F1 on your keyboard.  (All functions are indexed this way)

 
Also, noting your concern about compiling your code on other platforms or compilers... Take a pragmatic approach to the problem:
1) What are the chances your code will ever be implemented on any platform other than Windows? 
2) Pelles C is free, there is no consequence to a user for installing it to maintain your code.
 
I've been programming for a long time (since 1985, in Pascal, since 2004 in C).  I have large inventory packages in daily use and have distributed both shareware and freeware... and I can count on the fingers of one hand the number of times people have asked me for Linux (etc.) versions of my code.
 
« Last Edit: April 06, 2012, 06:11:27 PM by CommonTater »

migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #8 on: April 06, 2012, 06:33:18 PM »
Guys thanks for your answers!

@CommonTater:

I do appreciate Pelles C, I've been using it since version 3.x (if I recall correctly, it was surely before 4.x). I have advertised it ever since, all over the place. But this is not the point here.

I'm not here to claim that Pelles C is buggy or inferior or any such thing. If  the mods feel that what I've described does not classify as a bug, please by all means feel free to move the thread to a different place.

However, I do stand by my original pov: those 2 functions are not on par with the language's standards, which makes it impossible to rely on them to write truly portable code. Every other compiler I've used to test the presented code (the ones I mentioned, plus GCC on FreeBSD) returns a negative value on long overflow in fseek, as it should. They also correctly set errno to EINVAL.

So I honestly fail to understand how this is not classified as a Pelles C bug. In any case, I felt obliged to report it.

@Stefan Pendl

Because it's not only the symbol names, it's also the types and any related constants. Take off_t for example, which you may find as _off_t, and who knows in what other variations/combinations of accompanying implementation routines & types. The same goes for stat/_stat, along with its whole interface (ranging from different bitflag names to different requirements in the included files).

If I were to go to such trouble, I would much prefer to use each platform's low-level routines, such as GetFileSizeEx() on Windows for example, from which I would at least gain a dramatic drop in loading times.

As I explained earlier, a design choice has already been made to keep the code as compiler/platform agnostic as possible. More than 7000 lines of code are already written, and so far the only problem I'm facing is the one I've mentioned here with Pelles C.



CommonTater

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #9 on: April 06, 2012, 07:11:22 PM »
Guys thanks for your answers!

@CommonTater:
I'm not here to claim that Pelles C is buggy or inferior or any such thing. If  the mods feel that what I've described does not classify as a bug, please by all means feel free to move the thread to a different place.

Please understand I'm not defending Pelles C, in the sense that I think you're attacking it...  If there's a bug, there's a bug and it needs to be reported and (hopefully) addressed by Pelle.  I've reported a couple myself.

But, there is also value in examining these things interactively as we do... not the least of which is trying to let you get on with your project...

Quote
However, I do stand by my original pov: those 2 functions are not on par with the language's standards, which makes it impossible to rely on them to write truly portable code. Every other compiler I've used to test the presented code (the ones I mentioned, plus GCC on FreeBSD) returns a negative value on long overflow in fseek, as it should. They also correctly set errno to EINVAL.

I don't disagree with you... but I will question if there is such a thing as "truly portable code" in C ...  Maybe in an interpreted language like Java or C# ... but C? ... not so much.

Quote
So I honestly fail to understand how this is not classified as a Pelles C bug. In any case, I felt obliged to report it.
It may well be... that's for Pelle to examine and determine...  My goal was to help you get back to work on your project... 

Moreover, it makes little sense to use 32 bit file APIs in a 64 bit world.  NTFS supports files bigger than any available hard disk... so should you.

Perhaps you should consider a string of OS specific libraries wrapping the OS's lower level file calls into a common interface... Just sub in the appropriate library when you change OSs...


migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #10 on: April 06, 2012, 07:36:20 PM »
...
But, there is also value in examining these things interactively as we do... not the least of which is trying to let you get on with your project...
We are in the same boat here, I never thought otherwise (it could be that English is not my native language).

Quote
....
Perhaps you should consider a string of OS specific libraries wrapping the OS's lower level file calls into a common interface... Just sub in the appropriate library when you change OSs...
That's exactly what I'm trying to avoid  :)

You are 100% right about the "truly portable code"... there is no such thing with C. I've already broken that concept myself in the above mentioned program (by implementing a mini pre-processor interface for abstractly colorizing the console, either with ANSI codes on non-windows platforms, or with native Windows API calls on Windows, or no colors at all).

But I really (really) cannot afford to turn this exception into a rule for the whole thing. Otherwise I will easily get lost in my own code, ending up spending more time supporting platform-specific stuff instead of improving/extending the program's features.

PS. Btw, we have pretty much the same programming origin. I've also started with Pascal in 1985 (on an Apple IIc to be more precise). I never looked back once I moved on to C a couple of years later :)

CommonTater

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #11 on: April 06, 2012, 08:29:46 PM »
We are in the same boat here, I never thought otherwise (it could be that English is not my native language).

No worries! :D   Sometimes I add stuff to messages on the "Just in case" premise... never, ever, is it my intention to offend or demean... But text is an imperfect way to communicate so on occasion my attempts to be thorough do go a little wrong...
 
Quote
You are 100% right about the "truly portable code"... there is no such thing with C. I've already broken that concept myself in the above mentioned program (by implementing a mini pre-processor interface for abstractly colorizing the console, either with ANSI codes on non-windows platforms, or with native Windows API calls on Windows, or no colors at all).

Hence my suggestion of replaceable libraries... DLLs perhaps... that simplify low level access.  I know it's a pain but you can also do some updates by simply issuing a new DLL... so it's not entirely bad.

Quote
But I really (really) cannot afford to turn this exception into a rule for the whole thing. Otherwise I will easily get lost in my own code, ending up spending more time supporting platform-specific stuff instead of improving/extending the program's features.

Such are the bright ideas of life! 
 
Create a new mechanism to replace an old one... you end up maintaining both... :(

Quote
PS. Btw, we have pretty much the same programming origin. I've also started with Pascal in 1985 (on an Apple IIc to be more precise). I never looked back once I moved on to C a couple of years later :)

Sounds about right... I stayed with Pascal for quite a while because it does have range and bound checks and, of course, a native string type.  I was there right up until Borland screwed it all up with Delphi...  I started in with Pelles C --something of a rough beginning, I might add-- in 2004 when maintaining Pascal code became a bigger pain in the backside than recoding in C... But, all in all, I'm glad I'm here.  I keep trying different compilers and languages and I always seem to end up back with Pelles C... :D

migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #12 on: April 07, 2012, 11:38:57 AM »
[offtopic]
Guys, there's no _ftell64() in my Pelles C 6.50.rc4 documentation. Could you please remind me how to rebuild the database?

PS. For the moment, I'll just put a note in my program's documentation stating that when compiled with Pelles C, files larger than 2Gb do not load correctly (btw, it's a portable and somewhat fancy console hexviewer, which will be distributed as source-code only).
[/offtopic]
« Last Edit: April 07, 2012, 11:42:01 AM by migf1 »

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2113
Re: fseek() - no long-int overflow indication
« Reply #13 on: April 07, 2012, 01:29:02 PM »
Dear migf1,
I think the desire of everyone in this forum is to make PellesC as close as possible to a perfect compiler tool. Nobody wants to defend the compiler just because it is nice  :)
The pages you pointed to clearly mark that description as an extension of the standard, as per POSIX.1-2008:
Quote
[EOVERFLOW]
[CX]  For ftell(), the current file offset cannot be represented correctly in an object of type long. [CX]  Extension to the ISO C standard

[CX]: The functionality described is an extension to the ISO C standard. Application developers may make use of an extension as it is supported on all POSIX.1-2008-conforming systems.
Anyway the C99 standard ISO/IEC 9899:TC3 says:
Quote
If successful, the ftell function returns the current value of the file position indicator
for the stream. On failure, the ftell function returns −1L and stores an
implementation-defined positive value in errno.
While this leaves room for interpretation, because ftell has not technically failed, returning an error on overflow could be preferred. We have to consider, anyway, that this error is more a system error than a user error, in the sense that its meaning should be "wrong function chosen by the programmer" rather than "runtime error".
So I agree with you that it could be better to modify the function in a future release, but it is also correct to warn users that the software they are using has a limitation on file size and that beyond it the behaviour is undefined.
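Going only by the C99 wording quoted above, a caller can rely on the -1L return value but not on any particular errno code. A minimal sketch of a check written that way, with stream_size being a made-up name. Note that it cannot catch a wrap-around that happens to land on a positive value, which is exactly the case reported in this thread.

```c
#include <stdio.h>

/* Hypothetical helper: stores the stream's size in *out and returns 0,
 * or returns -1 if fseek fails or ftell reports a negative position. */
int stream_size(FILE *fp, long *out)
{
    long pos;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;

    pos = ftell(fp);
    if (pos < 0)        /* the -1L failure value, or a negative wrapped result */
        return -1;

    *out = pos;
    return 0;
}
```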
« Last Edit: April 07, 2012, 01:31:29 PM by frankie »

migf1

  • Guest
Re: fseek() - no long-int overflow indication
« Reply #14 on: April 07, 2012, 01:44:03 PM »
Oops, my bad! You are absolutely right about the errno being a POSIX extension, I completely missed it. My apologies!
It would be nice though to have Pelles C at least return a negative value in such cases.

PS. Any pointers on how to rebuild the database? I'm hoping that _ftell64() will then appear in the Pelles C documentation on my systems. Currently it is not there (neither on my Win7 64bit installation nor on my WinXP 32bit installation).
« Last Edit: April 07, 2012, 01:48:06 PM by migf1 »