Author Topic: fread() - slow on 4GB file  (Read 8864 times)

Offline tony74

  • Member
  • *
  • Posts: 34
fread() - slow on 4GB file
« on: November 10, 2023, 03:24:50 AM »
OK, maybe I'm not configured right, but here's the thing:

In VS 2022 build tools CLI this code fread()s a 3.9GB file into memory in 1.5 secs, so that's 2581.5 MB/sec for almost 4GB.

In Pelles-C using IDE, the same code fread()s the same 3.9GB file into memory in 12.8 secs, so that's 307.1 MB/sec.

I'm gonna drop my code here so you guys can play with it, but that's a pretty big discrepancy...I was tearing my hair out (what little is left) while working with this in Pelles, then decided to try it in VS 2022 to see if it was something I was coding wrong or my machine was just slow, turns out it was the compiler after all...by several orders of magnitude (1.5 vs 12.8 secs).

OS:      Win10 build 19045
Machine: Xeon, 64GB, 2TB SSD system drive
Test File: Avatar.The.Way.Of.Water.mp4 
File Size: 3941952295

Compiler settings:
-Pelles C 12.00.2-
Debug:          None
Warnings:       level 1
Diagnostics:    Classic
Microsoft Extensions Yes
Multithreaded   Yes
Maximise speed  Yes
inlining:       Default
Architecture:   SSE2
Floating Point: Fast
Calling Conv:  __cdecl
Linker: kernel32.lib advapi32.lib delayimp64.lib user32.lib   

Compiler settings:
-VS 2022 Build Tools-
All default, no libs added, no compiler-flags
command line: cl dde.c

Try it with a file of around the same size (I limit the size to 4GB in the code).
See what you come up with, I'd like to know why it's doing that, as I'm sure you would too.

Drop me a line if you think of any other info I can provide.

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define MEG     1000000.0
#define GIG     1000000000.0
#define MAXLEN  4294967295
#define MEMERR  -100
#define LENERR  -200
#define FILERR  -300
#define FRDERR  -400
#define NOERR    0
#define uchar   unsigned char

struct {
    uchar    *fbuf;
    uchar    *ip;
    uchar    *op;
    uint32_t  addr;
    uint32_t  sz;
    double    et;
}M;



//parse errors
///////////////////////////////////////////////////////////////
char *geterror(uint32_t e)
{
    static char ebuf[32];
    switch(e){
            case MEMERR:
              strcpy(ebuf,"\nMEMERR\n");
            break;
            case LENERR:
              strcpy(ebuf,"\nLENERR\n");
            break;
            case FILERR:
              strcpy(ebuf,"\nFILERR\n");
            break;
            case FRDERR:
              strcpy(ebuf,"\nFRDERR\n");
            break;
            default:
              strcpy(ebuf,"\nNOERR\n");
            break;
    }

return ebuf;   
}


//get filesize up to MAXLEN
///////////////////////////////////////////////////////////////
uint32_t getfsize(char *fname)
{
    FILE     *fp;
    __int64   offsetzed=0;
    __int64   offsetend=0;

    fp=fopen(fname,"rb");
    if(fp==NULL) return FILERR;   // don't seek on a NULL stream

    _fseeki64(fp, offsetzed, SEEK_END);
    offsetend = _ftelli64(fp);

    fclose(fp);

    if(offsetend>MAXLEN) return LENERR;

    M.sz=(uint32_t)offsetend;

return M.sz;
}


//load input file into memory
///////////////////////////////////////////////////////////////
uint32_t loadinput(char *fname)
{
    FILE     *fp;
    uint32_t fsize =0;
    uint32_t r     =0;
    clock_t t;

   
    fsize=getfsize(fname);
    if(fsize==LENERR) return LENERR;
    if(fsize==FILERR) return FILERR;

    M.fbuf =(uchar *)malloc((fsize+10) * sizeof(char));
    if(M.fbuf==NULL)  return MEMERR;
    memset(M.fbuf,0,(fsize+10));

    fp=fopen(fname,"rb");
    if(fp==NULL)      return FILERR;

    t = clock();
    r=(uint32_t)fread(M.fbuf,1,fsize,fp); //// slow fread! ////
    t = clock() - t;

    M.et = ((double)t)/CLOCKS_PER_SEC;   //// elapsed time ////

    fclose(fp);

    if(r!=fsize){
       printf ("fread error: r:%u, fsize:%u\n",r,fsize);
    }

return r;
}


// MAIN
///////////////////////////////////////////////////////////////
int main(int argc, char **argv)
{
    uint32_t r=0;

    if(argc<2){
       printf("Syntax: dde filename\n");
       return 0;   
    }

    r=loadinput(argv[1]);
    if(r!=M.sz){
       printf(" %s\n",geterror(r));
       free(M.fbuf); 
       return 0;
    }   

    printf(" fread() took %.1f seconds to read a %.1fGB file.\n", M.et,(float) M.sz/GIG);
    printf(" That's %.1fMB per sec. \n",(M.sz/(float)M.et)/MEG);
 
    free(M.fbuf); 
 
return 0;   
}



Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #1 on: November 10, 2023, 10:40:34 AM »
Hi tony74,

Is it possible for you to post the entire Pelles project version using the zip feature under Project?

I tried to compile your code but there are a lot of warnings (15), even though level 1, so I'm guessing
something was not included in your post maybe?
I copied your post directly and the build was for console 64 bit and tried console 32 bit.

Just a few of them:
Code: [Select]
C:\Program Files\PellesC\Files\big_open\big_open.c(59): warning #2073: Conversion from 'unsigned int' to 'int' changed value from 4294966996 to -300.

C:\Program Files\PellesC\Files\big_open\big_open.c(84): warning #2018: Undeclared function '_fseeki64' (did you mean: _fseek64?); assuming 'extern' returning 'int'.

C:\Program Files\PellesC\Files\big_open\big_open.c(85): warning #2018: Undeclared function '_ftelli64' (did you mean: _ftell64?); assuming 'extern' returning 'int'.

C:\Program Files\PellesC\Files\big_open\big_open.c(89): warning #2073: Conversion from 'int' to 'unsigned int' changed value from -200 to 4294967096.
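
Both #2073 warnings above trace back to the same pattern: the negative error macros (MEMERR, LENERR, FILERR) travel through uint32_t variables that also carry file sizes. A minimal sketch of one way to separate the two, assuming hypothetical names (status_t, get_file_size, the SEEK64/TELL64 macros) that are not part of the posted code:

```c
#include <stdio.h>
#include <stdint.h>

#ifdef _WIN32
/* 64-bit seek/tell as used in the original post (Pelles C needs Microsoft Extensions for these names) */
#define SEEK64(fp, off, whence) _fseeki64((fp), (off), (whence))
#define TELL64(fp)              _ftelli64(fp)
#else
/* assumption: long is 64 bits here, as on typical 64-bit Unix-like systems */
#define SEEK64(fp, off, whence) fseek((fp), (long)(off), (whence))
#define TELL64(fp)              ((int64_t)ftell(fp))
#endif

/* hypothetical status type: error codes stay signed and never share a variable with a size */
typedef enum { ST_OK = 0, ST_MEMERR = -100, ST_LENERR = -200, ST_FILERR = -300 } status_t;

/* Stores the file size in *size_out; the return value only reports success or failure. */
status_t get_file_size(const char *fname, int64_t max_len, int64_t *size_out)
{
    FILE *fp = fopen(fname, "rb");
    if (fp == NULL) return ST_FILERR;

    int64_t end = -1;
    if (SEEK64(fp, 0, SEEK_END) == 0)
        end = TELL64(fp);
    fclose(fp);

    if (end < 0)       return ST_FILERR;   /* seek or tell failed */
    if (end > max_len) return ST_LENERR;   /* larger than the caller allows */

    *size_out = end;
    return ST_OK;
}
```

With the size in a 64-bit out-parameter, a legitimate file length can no longer collide with an error value, and the 4GB cap becomes a caller's choice rather than a limit of the return type.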

John Z

Update: OK, fixed all the warnings and undeclared functions. The largest file I could find was about 2GB.
Yes - results are disappointing at:
Code: [Select]
fread() took 18.8 seconds to read a 2.0GB file.
 That's 108.7MB per sec.
So even worse than your test, but I don't know how fast your VS 2022 code would run on my system to compare.
Perhaps you'll post that executable here.

Not to nit pick  :) but your results are  only one order of magnitude (approximately) different not "several orders" which would mean 100X or more  ;)
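
The arithmetic behind that remark is easy to check: 12.8 / 1.5 is roughly 8.5, which is just under a factor of ten. A small sketch, with hypothetical helper names, that computes throughput the way the posted code does (decimal megabytes) and rounds a speed ratio to the nearest power of ten:

```c
#include <stdio.h>

/* decimal megabytes per second, matching the MEG (1000000.0) constant in the posted code */
double mb_per_sec(double bytes, double seconds)
{
    return bytes / seconds / 1000000.0;
}

/* Nearest power of ten for a ratio >= 1: 0 means "about the same",
   1 means "about 10x", 2 means "about 100x". The cut-over point between
   neighbouring powers is sqrt(10), the midpoint on a logarithmic scale. */
int nearest_order_of_magnitude(double ratio)
{
    int n = 0;
    while (ratio >= 3.1622776602) {
        ratio /= 10.0;
        n++;
    }
    return n;
}

/* prints the comparison for the timings reported in the first post */
void print_comparison(void)
{
    const double bytes = 3941952295.0;      /* file size from the first post */
    const double slow = 12.8, fast = 1.5;   /* seconds: Pelles C vs VS 2022 */

    printf("slow: %.1f MB/sec, fast: %.1f MB/sec\n",
           mb_per_sec(bytes, slow), mb_per_sec(bytes, fast));
    printf("ratio: %.1fx, nearest order of magnitude: %d\n",
           slow / fast, nearest_order_of_magnitude(slow / fast));
}
```

The printed rates can differ slightly from the figures quoted in the thread, because the posted timings are rounded to one decimal place.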

« Last Edit: November 10, 2023, 12:57:12 PM by John Z »

Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #2 on: November 10, 2023, 03:06:44 PM »
Hi tony74,

So if you can switch to using CreateFile, ReadFile, CloseHandle  [will require windows.h]
or
try switching to _open,_read,_close  [will require io.h and fcntl.h for just constant _O_RDONLY or just use 0]
it is much faster, over an order of magnitude faster  :) on my system...
Code: [Select]
C:\Program Files\PellesC\Files\big_open>big_open EIHH8305.MP4

 LoadFile ReadFile took 1.1 seconds
file length 2039467900 read length 2039467900

 _read() took 1.0 seconds to read a 2.0GB file.
 That's 1951.6MB per sec.

The same file using fread() on my system was about 18 seconds.
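
For readers who want to try the second option, here is a minimal sketch of an _open/_read/_close loader. It is not the attached program: the function name, the macros and the 1 GiB chunk size are assumptions for illustration, and the #else branch maps to the POSIX equivalents only so the sketch also builds outside Windows.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
#ifdef _O_BINARY
#define OPEN_FLAGS (_O_RDONLY | _O_BINARY)   /* avoid text-mode translation where the flag exists */
#else
#define OPEN_FLAGS _O_RDONLY
#endif
#define OPEN_FD(path) _open((path), OPEN_FLAGS)
#define READ_FD       _read
#define CLOSE_FD      _close
#else
#include <unistd.h>                          /* POSIX equivalents of the calls above */
#define OPEN_FD(path) open((path), O_RDONLY)
#define READ_FD       read
#define CLOSE_FD      close
#endif

/* _read takes an unsigned int count and returns an int, so large files are read in pieces */
#define CHUNK (1u << 30)

/* Reads up to cap bytes of the file into buf; returns the bytes read, or -1 on error. */
int64_t read_file_lowlevel(const char *path, unsigned char *buf, uint64_t cap)
{
    int fd = OPEN_FD(path);
    if (fd < 0) {
        perror(path);
        return -1;
    }

    uint64_t total = 0;
    while (total < cap) {
        uint64_t want = cap - total;
        if (want > CHUNK) want = CHUNK;

        long long got = READ_FD(fd, buf + total, (unsigned int)want);
        if (got < 0) { CLOSE_FD(fd); return -1; }
        if (got == 0) break;                 /* end of file */
        total += (uint64_t)got;
    }

    CLOSE_FD(fd);
    return (int64_t)total;
}
```

The loop also copes with short reads, so the caller only needs to compare the return value with the expected file size.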

John Z

My guess is VS is doing something sneaky with the fread which is supposed to use fgetc ... but I don't know for sure as I don't have VS 2022, last version I have is VC 5.

I'd still like to try your VS 2022 executable on my system for reference....

Offline tony74

  • Member
  • *
  • Posts: 34
Re: fread() - slow on 4GB file
« Reply #3 on: November 10, 2023, 09:26:03 PM »
Ok John, thanks for having a look.
I don't have a clue why you got those errors.
My Pelles compile doesn't show anything...wait while I try Level 2 (Level 1 and 2 are all I show under Compiler Options for Warnings..)...just recompiled with Level 2, got nothing, no errors.

I'm using Pelles version 12.00.2, make sure you're on the same version.
The Pelles project files and exe...in dde.zip.
The VS executable...in VCdde.zip
CLI Screenshot...fread.jpg

Try an A:B test with both executables, notice the VC exe is about twice the size of the Pelles exe.
Read times will differ on every system, but there's still a big difference with the compilers.
I'd try a compile on gcc just to see what it does, but I don't have it installed on this machine (curious to find out, though).

Tony


Offline Vortex

  • Member
  • *
  • Posts: 864
    • http://www.vortex.masmcode.com
Re: fread() - slow on 4GB file
« Reply #4 on: November 10, 2023, 09:48:47 PM »
Hi Tony,

Can you try the code below based on the simple Win32 file handling functions? ReadFileToMem is a function to load a binary file into memory.

Code: [Select]
#include <windows.h>
#include <stdio.h>

// pFileName       : pointer to file name
// pMem            : pointer to a pointer indicating the reserved memory to read the file
// pNumOfBytesRead : pointer to a variable that receives the number of the bytes read

BOOL ReadFileToMem(LPSTR pFileName,LPBYTE* pMem,int* pNumOfBytesRead)
{
  HANDLE hFile;
  DWORD  FileSize;
  LPVOID hMem;
  DWORD  nBytesRead;
  BOOL   t;

  hFile=CreateFile(pFileName,GENERIC_READ,0,0,OPEN_EXISTING,0,0);

  if(hFile==INVALID_HANDLE_VALUE ){
      return 0;
    }

  FileSize=GetFileSize(hFile,0);
  if(!(FileSize)){
      CloseHandle(hFile);
      return 0;
    }

  hMem=VirtualAlloc(0,FileSize,MEM_COMMIT,PAGE_READWRITE);
  if(!(hMem)){
      CloseHandle(hFile);
      return 0;
    }
  *pMem=hMem;

  t=ReadFile(hFile,hMem,FileSize,&nBytesRead,0);
  CloseHandle(hFile);          // close the handle on both success and failure
  if(!(t)){
       VirtualFree(hMem,0,MEM_RELEASE);
       return 0;
    }

  *pNumOfBytesRead=nBytesRead;
  return 1;
}

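int main(void)
{
LPBYTE pMemory;
int BytesRead;

if(!ReadFileToMem("test.txt",&pMemory,&BytesRead)){
    return 1;   // nothing was loaded, so nothing to print or free
}
printf("%.*s",BytesRead,(char*)pMemory);   // the buffer is not guaranteed to be NUL-terminated

VirtualFree(pMemory,0,MEM_RELEASE);
return 0;
}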
Code it... That's all...

Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #5 on: November 10, 2023, 09:57:41 PM »
Hi tony74,

tony74 wrote: "I'm using Pelles version 12.00.2, make sure you're on the same version."
Yes same Pelles version BUT I'm on WIN 11  I don't have my WIN 10 system anymore.

The code Vortex sent is the same as one of my tests, identified as ReadFile in my previous post.
Nice of him to send it  :) saving you the coding....

Thanks for the DDE source, and the VS 2022 exe it will be interesting, I'll try shortly.

John Z

Update:  OK - I missed checking Micro$oft Extensions; with it enabled there are no warnings.
The benefit of sending the zip  :) guaranteed to be the same...
----------------------------------
Test of VS 2022 executable:
Code: [Select]
C:\Program Files\PellesC\Files\big_open>vcdde  EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.1sec
Read Rate: 1898.9 MB/sec

This 1.1 sec matches the test I did using ReadFile method (Like Vortex sent)
Code: [Select]
C:\Program Files\PellesC\Files\big_open>big_open EIHH8305.MP4

 LoadFile ReadFile took 1.1 seconds
file length 2039467900 read length 2039467900

 _read() took 1.0 seconds to read a 2.0GB file.
 That's 1951.6MB per sec.
and
similar to the 3rd method using _open,_read,_close

Both of my tests used your source but substituted just the file handling method.

So I definitely think VS 2022 is optimizing the code and not really using fread() - still just a guess
but somewhat supported by the output results.
-------------------------
Executing the DDE Pelles C version as sent results in:
Code: [Select]
C:\Program Files\PellesC\Files\big_open>dde.exe EIHH8305.MP4
File Size: 2.0GB
Read Time: 19.4sec
Read Rate: 105.0 MB/sec
-------------------------
To prove it one might try using fgetc until eof and see how long it takes instead of using fread....

John Z
« Last Edit: November 10, 2023, 10:30:48 PM by John Z »

Offline tony74

  • Member
  • *
  • Posts: 34
Re: fread() - slow on 4GB file
« Reply #6 on: November 11, 2023, 01:00:24 AM »
Thanks John and Vortex,

One might think MS is somehow pulling a fast one here, but I went ahead and installed gcc, just to see if a straight ahead compile would differ significantly from VC, results are in the screengrab.

I attached the resulting exe, as well, in case you want to A:B it.
If you guys have more compilers (I only have the three, now) you could see if the trend continues.

The point here is not to have to substitute methods to gain equal performance, but to find out why at least two other compilers are performing up to seven times faster on this issue with plain old fread()...at least with the test file used on my machine.

I haven't done this yet, but gcc is open source, their version of fread() might shine a light on this.
But at this point, it'd be up to the maintainer/s (Orinius et al?) since Pelles-C isn't open source and we can't compare functions.

And John, I'm inclined to agree that the venerable fread() function has been upgraded, I might root through the gcc code to see if it might now look a lot more like yours and Vortex's, might be time to send Orinius a pull request.

Thanks again John for having a look, and to Vortex for your code!

Tony

 
« Last Edit: November 11, 2023, 01:37:40 AM by tony74 »

Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #7 on: November 11, 2023, 12:33:16 PM »
Hi tony74,

I think the gcc test is significant.
I managed to get VC 6 running and compiled your original DDE as 32 bit
Code: [Select]
C:\Program Files (x86)\Microsoft Visual Studio\Files\fread\Release>fread superbig.exe
 fread() took 0.8 seconds to read a 1.6GB file.
 That's 2058.3MB per sec.
Of course it is MS but nevertheless it is fast too.

I changed your original to use fgetc in Pelles without loading the memory, so almost reflects raw fgetc speed.
Code: [Select]
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
 File Size: 2.0GB
 Read Time: 92.9sec
Read Rate: 21.9 MB/sec
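This suggests the fgetc hypothesis does not hold, as far as it goes: raw fgetc (92.9 sec) is much slower than Pelles C's own fread (about 19 sec) on the same file.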
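
The fgetc experiment can be reproduced with a sketch along these lines. It is not the modified DDE program; the function names and the 64 KiB block size are assumptions. Both functions only count bytes, so the timings reflect the read path rather than any memory handling.

```c
#include <stdio.h>
#include <stdint.h>
#include <time.h>

/* Counts the bytes in a file by calling fgetc once per byte.
   Elapsed time, as measured by clock(), is stored in *seconds. */
uint64_t count_with_fgetc(const char *path, double *seconds)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return 0;

    uint64_t n = 0;
    clock_t t = clock();
    while (fgetc(fp) != EOF) n++;
    *seconds = (double)(clock() - t) / CLOCKS_PER_SEC;

    fclose(fp);
    return n;
}

/* Counts the bytes in a file by calling fread with a 64 KiB block. */
uint64_t count_with_fread(const char *path, double *seconds)
{
    static unsigned char block[65536];
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return 0;

    uint64_t n = 0;
    size_t got;
    clock_t t = clock();
    while ((got = fread(block, 1, sizeof block, fp)) > 0) n += got;
    *seconds = (double)(clock() - t) / CLOCKS_PER_SEC;

    fclose(fp);
    return n;
}
```

Timing uses clock() as in the posted code; on some platforms clock() reports processor time rather than elapsed time, so the numbers are only comparable between runs on the same system and compiler.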

Also tried Pelles C version 10 - it performs similarly to version 12

As this is not open source Pelle does not work by pull requests.  He will review the forum bug reports and if he can reproduce the reported issue he usually looks further into it.  I think you've provided enough that is reproducible.  He's a fairly busy guy so patience is warranted.

Can't think of anything more within my capability ....

John Z

Offline MrBcx

  • Global Moderator
  • Member
  • *****
  • Posts: 189
    • Bcx Basic to C/C++ Translator
Re: fread() - slow on 4GB file
« Reply #8 on: November 11, 2023, 02:29:14 PM »
Tony,

I compiled your test using Pelles, MSVC, Mingw/Clang, and Embarcadero.

I used a 2GB mp4 file to test.

Pelles C took 6 seconds and the others took 1 second.

Dell i7, 16gb, ssd, Win10 Pro
Bcx Basic to C/C++ Translator
https://www.BcxBasicCoders.com

Offline tony74

  • Member
  • *
  • Posts: 34
Re: fread() - slow on 4GB file
« Reply #9 on: November 11, 2023, 04:04:58 PM »
Thanks John, Vortex and MrBcx.

After all the various compilers we've tested, I think it's safe to say there is nothing 'fishy' or 'upgraded' going on with the fread implementation on any of them, they all behave similarly.

That suggests the Pelles-C implementation may differ and should be looked at.

Until then, we have a workaround, thanks to John and Vortex.

Tony

Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #10 on: November 11, 2023, 04:45:10 PM »
Hi tony74,

You're very welcome, interesting find.

Just for completeness, and since Guru Vortex did it :), attached is the 3rd method, using your base program
modified to use Pelles C _open, _read, and _close, without needing windows.h or Microsoft Extensions.
However, as noted in the help file, it is not considered portable.

results:
Code: [Select]
C:\Program Files\PellesC\Files\big_open>big_open.exe EIHH8305.MP4
 fread() took 1.0 seconds to read a 2.0GB file.
 That's 1947.9MB per sec.

Cheers,
John Z

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 2115
Re: fread() - slow on 4GB file
« Reply #11 on: November 12, 2023, 11:10:51 AM »
May the source be with you

Offline John Z

  • Member
  • *
  • Posts: 860
Re: fread() - slow on 4GB file
« Reply #12 on: November 12, 2023, 12:18:02 PM »
Great tip TimoVJL!

Using the original DDE program but adding setvbuf (fp, NULL, _IOFBF, 16384); after fopen and before any actual use has a significant impact.

Code: [Select]
    fp=fopen(fname,"rb");
    if(fp==NULL)      return FILERR;
    setvbuf (fp, NULL, _IOFBF, 16384);

Original code output:
Code: [Select]
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 19.3sec
Read Rate: 105.8 MB/sec

Output with added setvbuf:
Code: [Select]
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.7sec
Read Rate: 1226.4 MB/sec

More than an order of magnitude improvement.  :)
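
For anyone who wants the change as a self-contained function rather than a patch to DDE, a minimal sketch follows. The function name and parameters are hypothetical. Note that the C standard only says the size argument may determine the buffer size when the buffer pointer is NULL, so how much this helps depends on the runtime library.

```c
#include <stdio.h>

/* Reads nbytes from path into dst with a single fread, after asking the
   library for a larger, fully buffered stream. Returns the bytes read (0 on failure). */
size_t read_with_stream_buffer(const char *path, unsigned char *dst,
                               size_t nbytes, size_t stream_buf_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return 0;

    /* setvbuf has to come after fopen and before any other operation on the stream */
    if (setvbuf(fp, NULL, _IOFBF, stream_buf_size) != 0) {
        fclose(fp);
        return 0;
    }

    size_t got = fread(dst, 1, nbytes, fp);
    fclose(fp);
    return got;
}
```

A call such as read_with_stream_buffer(fname, M.fbuf, fsize, 16384) would correspond to the 16K case shown above.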

John Z

Update - just some buffer size testing
Code: [Select]
16K
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.7sec
Read Rate: 1226.4 MB/sec

32K
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.4sec
Read Rate: 1448.5 MB/sec

64K
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.2sec
Read Rate: 1669.0 MB/sec

128K
C:\Program Files\PellesC\Files\big_open>dde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.2sec
Read Rate: 1731.3 MB/sec

VS 2022
C:\Program Files\PellesC\Files\big_open>vcdde EIHH8305.MP4
File Size: 2.0GB
Read Time: 1.0sec
Read Rate: 1989.7 MB/sec
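
A buffer-size sweep like the one above can be scripted instead of recompiling for each size. The sketch below is one possible harness, not the program that produced these numbers; the function names are hypothetical and the four sizes simply mirror the 16K to 128K runs. Repeated reads of the same file are likely to be served from the operating system's file cache, so a first run may look slower than later ones.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Times one full read of nbytes from path using the given stream buffer size.
   Returns seconds as measured by clock(), or -1.0 on failure. */
double time_read(const char *path, unsigned char *dst, size_t nbytes, size_t stream_buf_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1.0;

    if (setvbuf(fp, NULL, _IOFBF, stream_buf_size) != 0) {
        fclose(fp);
        return -1.0;
    }

    clock_t t = clock();
    size_t got = fread(dst, 1, nbytes, fp);
    t = clock() - t;
    fclose(fp);

    return got == nbytes ? (double)t / CLOCKS_PER_SEC : -1.0;
}

/* Prints one line per buffer size and returns the number of successful runs. */
int sweep_buffer_sizes(const char *path, size_t nbytes)
{
    static const size_t sizes[] = { 16384, 32768, 65536, 131072 };
    unsigned char *dst = malloc(nbytes ? nbytes : 1);
    if (dst == NULL) return 0;

    int ok = 0;
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double s = time_read(path, dst, nbytes, sizes[i]);
        if (s < 0.0) continue;               /* skip sizes that failed */
        printf("%3uK  Read Time: %.1fsec\n", (unsigned)(sizes[i] / 1024), s);
        ok++;
    }

    free(dst);
    return ok;
}
```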
« Last Edit: November 12, 2023, 01:04:02 PM by John Z »

Offline MrBcx

  • Global Moderator
  • Member
  • *****
  • Posts: 189
    • Bcx Basic to C/C++ Translator
Re: fread() - slow on 4GB file
« Reply #13 on: November 12, 2023, 02:46:46 PM »
John Z wrote: "Great tip TimoVJL! Using the original DDE program but adding setvbuf (fp, NULL, _IOFBF, 16384); after fopen and before any actual use has a significant impact."

Thanks Timo and John. 

Adding setvbuf with a 64k buffer size on my 2GB test file dropped the load time from 6 seconds to 1 second.

Offline tony74

  • Member
  • *
  • Posts: 34
Re: fread() - slow on 4GB file
« Reply #14 on: November 14, 2023, 01:22:28 AM »
That's more like it!

Setting the stream buffer to 64k brings read-times down to the same level as all the other compilers we tested. Maybe this will be handled internally on a future update?

Thanks Timo and John, this did the trick!