
Author Topic: _msize function causes strange behaviour  (Read 13878 times)

Anonymous

  • Guest
_msize function causes strange behaviour
« on: January 12, 2005, 11:57:20 PM »
Hi Pelle,

Ran into a bit of a problem with _msize from your malloc library.

1) It seems not to know when a pointer is not allocated and returns seemingly random numbers.

2) If you make a mistake and hand it a NULL pointer it crashes the application.

Both these situations should return 0.

Code: [Select]

   
    void* j;
    int     a;
    a = _msize(j);   // returns random numbers
    j = NULL;
    a = _msize(j);   // causes weird behaviour
    j = malloc(1000);
    a = _msize(j);  // returns 1000


I patched the mallocx library to fix the null pointer condition but the first one is still a problem.
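The NULL guard described above might look roughly like the sketch below. It is not the actual mallocx patch: `safe_msize` and `RAW_MSIZE` are made-up names, and on non-Windows systems the sketch assumes glibc's `malloc_usable_size` as a stand-in for `_msize`. It only covers case 2 (NULL); a wild or uninitialized pointer still gives undefined results.

```c
#include <stddef.h>
#include <stdlib.h>
#include <malloc.h>

#if defined(_WIN32)
#define RAW_MSIZE(p) _msize(p)               /* CRT heap block size query */
#else
#define RAW_MSIZE(p) malloc_usable_size(p)   /* glibc stand-in (assumption) */
#endif

/* Hypothetical wrapper: report 0 for NULL instead of crashing.
   It cannot tell whether a non-NULL pointer really came from malloc(). */
size_t safe_msize(void *p)
{
    return (p == NULL) ? 0 : RAW_MSIZE(p);
}
```

Callers would use `safe_msize(j)` wherever `_msize(j)` appeared, and must still initialize their pointers.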

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2266
    • http://www.smorgasbordet.com
« Reply #1 on: January 13, 2005, 02:00:42 AM »
It would be more user-friendly if _msize() handled this, but in the "C spirit", many runtime functions perform very little argument validation. For example, with free() and realloc(), it's documented that you can pass NULL, but passing a pointer *not* coming from a previous malloc(), realloc()... call, will give "unpredictable results" - usually a crash.

If this is good or bad is another matter, but C puts a great deal of responsibility in the hands of the programmer...

( BTW, I tested _msize(NULL) with Microsoft Visual C++ 6.0 - and get a crash there too... )

Pelle
/Pelle

Anonymous

  • Guest
« Reply #2 on: January 13, 2005, 03:04:22 AM »
Quote from: "Pelle"

If this is good or bad is another matter, but C puts a great deal of responsibility in the hands of the programmer...

( BTW, I tested _msize(NULL) with Microsoft Visual C++ 6.0 - and get a crash there too... )


Ok, then I need to shift gears just a little.  Time to start adding guard code.

Thanks for the heads up on this Pelle... I really don't need something else coming back to bite me the way the SEH in Pascal did.

What I need, then, is some way to validate a pointer...
Is it allocated to something?
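One portable way to answer "is it allocated to something?" is to keep your own record of live blocks, since the C runtime offers no such query. The sketch below is only an illustration under that assumption: `tracked_malloc`, `tracked_size` and `tracked_free` are hypothetical names, the lookup is a linear list walk, and it only knows about blocks allocated through these wrappers.

```c
#include <stddef.h>
#include <stdlib.h>

/* One record per live allocation (hypothetical bookkeeping). */
typedef struct track_node {
    void *ptr;
    size_t size;
    struct track_node *next;
} track_node;

static track_node *track_head = NULL;

/* Allocate a block and remember its address and size. */
void *tracked_malloc(size_t size)
{
    track_node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->ptr = malloc(size);
    if (n->ptr == NULL) { free(n); return NULL; }
    n->size = size;
    n->next = track_head;
    track_head = n;
    return n->ptr;
}

/* Return the recorded size, or 0 if p is NULL or not one of our blocks. */
size_t tracked_size(const void *p)
{
    for (const track_node *n = track_head; n != NULL; n = n->next)
        if (n->ptr == p) return n->size;
    return 0;
}

/* Free a tracked block; return 1 on success, 0 if p was never tracked. */
int tracked_free(void *p)
{
    for (track_node **link = &track_head; *link != NULL; link = &(*link)->next) {
        if ((*link)->ptr == p) {
            track_node *n = *link;
            *link = n->next;
            free(n->ptr);
            free(n);
            return 1;
        }
    }
    return 0;
}
```

The cost is one extra small allocation and a list search per query, so this suits debugging builds better than hot paths.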

Offline Pelle

« Reply #3 on: January 13, 2005, 03:52:36 AM »
Not sure how much help this is, but here goes...

From version 2.90 of Pelles C there are two heaps, one for "small" blocks, one for "large" blocks. The small heap should be fast on all supported Windows versions; the large heap will support all available memory. If the small heap is full, the large heap will be used.

The threshold for "small" and "large" can be set through the global variable "__bheap_threshold". Setting this variable to 0 will allocate everything from the large heap (might be useful to know).
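The routing described above can be pictured with a small stand-alone sketch. This is not the Pelles C implementation: every name here (`small_threshold`, `two_tier_alloc`, `two_tier_free`, `from_small_pool`) is hypothetical, the "small heap" is a simple bump pool that never reclaims memory, and plain `malloc()` stands in for the large heap. It only shows the idea of a threshold, a lazily created small pool, and the fall-through when the pool is full.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SMALL_POOL_BYTES (64u * 1024u)   /* arbitrary size for the sketch */

size_t small_threshold = 1024;           /* plays the role of __bheap_threshold */
static unsigned char *small_pool = NULL; /* created on the first small request */
static size_t small_used = 0;

/* Does p point into the small pool? */
int from_small_pool(const void *p)
{
    uintptr_t a = (uintptr_t)p, base = (uintptr_t)small_pool;
    return small_pool != NULL && a >= base && a < base + SMALL_POOL_BYTES;
}

void *two_tier_alloc(size_t size)
{
    if (size < small_threshold) {        /* threshold 0: never taken */
        size_t need = (size + 15u) & ~(size_t)15u;   /* round up to 16 bytes */
        if (need == 0) need = 16;
        if (small_pool == NULL)
            small_pool = malloc(SMALL_POOL_BYTES);   /* lazy initialization */
        if (small_pool != NULL && need <= SMALL_POOL_BYTES - small_used) {
            void *p = small_pool + small_used;
            small_used += need;
            return p;
        }
        /* pool full or unavailable: fall through to the large heap */
    }
    return malloc(size);
}

void two_tier_free(void *p)
{
    /* The bump pool never reclaims; only large-heap blocks are released. */
    if (p != NULL && !from_small_pool(p))
        free(p);
}
```

Setting `small_threshold = 0` before the first allocation means the pool is never created, which mirrors the "allocate everything from the large heap" setting.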

The large heap is created through a call to the Win32 API function HeapCreate - and memory allocated through HeapAlloc() etc. I guess it depends on the Windows version, but *maybe* these functions check pointers better?

There are also some useful Win32 API functions for checking pointers: IsBadReadPtr(), IsBadWritePtr() and IsBadStringPtr(). They assume some knowledge about the pointer, so they are not always useful.

Pelle
/Pelle

Anonymous

  • Guest
« Reply #4 on: January 13, 2005, 03:46:25 PM »
Quote from: "Pelle"
Not sure how much help this is, but here goes...

From version 2.90 of Pelles C there are two heaps, one for "small" blocks, one for "large" blocks. The small heap should be fast on all supported Windows versions; the large heap will support all available memory. If the small heap is full, the large heap will be used.

The threshold for "small" and "large" can be set through the global variable "__bheap_threshold". Setting this variable to 0 will allocate everything from the large heap (might be useful to know).

The large heap is created through a call to the Win32 API function HeapCreate - and memory allocated through HeapAlloc() etc. I guess it depends on the Windows version, but *maybe* these functions check pointers better?

There are also some useful Win32 API functions for checking pointers: IsBadReadPtr(), IsBadWritePtr() and IsBadStringPtr(). They assume some knowledge about the pointer, so they are not always useful.

Pelle



Well, that certainly explains some of the pointer addresses I was getting...  

One of the things I did like about Delphi was that the memory manager was user replaceable...  They, of course, did it the bloatware way with over 23K of code dedicated to some entirely inscrutable process that ran at about half speed but we could replace it any time we wanted... and I did  :)

If I can dare a suggestion here :wink: ... why not just give us a very basic single heap based memory system in your RTL and expose the heap handle in a global variable so we can either write our own memory managers or replace the heap entirely if we are so inclined?  

It would give us even more flexibility and would no doubt reduce your CRT code complexity somewhat as well.  At that point functions such as malloc calloc etc would actually be nothing but thin wrappers for the windows heap calls.  With access to the heap's handle we would have even better control over memory management through advanced heap calls such as HeapWalk and HeapValidate and, if we were so inclined, we could simply destroy the heap and build our own system...  

For example... here's a "safe copy" routine that could be implemented...
Code: [Select]

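void *SafeCopy(void *ptr, const void *oldptr)   // hheap: the proposed global heap handle
  { SIZE_T size = HeapSize(hheap,0,oldptr);     // size of the source block
    if ((ptr == NULL) || (!HeapValidate(hheap,0,ptr)))
      ptr = HeapAlloc(hheap,0,size);            // no usable destination: allocate one
    if (ptr != NULL) memcpy(ptr,oldptr,size);
    return ptr;  }                              // caller must use the returned pointer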


Note that with heap calls, there's no need to specify the size of the block, or pre-allocate memory.  The subroutine can test and handle it all for us.

I wouldn't be too concerned about the speed issues with older versions... first of all they are older versions and in comparing heap speeds on win 98 and win2k (in delphi) the difference is too minimal to be concerned about.  If I recall correctly it was only a couple of milliseconds on 60,000 10k allocations and destructions... hardly worth the extra code.

Offline Pelle

« Reply #5 on: January 13, 2005, 04:16:33 PM »
Quote from: "ldblake"
If I can dare a suggestion here :wink: ... why not just give us a very basic single heap based memory system in your RTL and expose the heap handle in a global variable so we can either write our own memory managers or replace the heap entirely if we are so inclined?

I had many complaints about the previous memory manager. The problem is that memory allocation can be done in so many different ways: many small blocks, only a few large blocks, and anything in between. The current memory manager is the best "compromise" I can think of. I don't want to go back to complaints about "lack of speed" or "can't allocate enough memory".

On the other hand, nothing forces you to use the standard C runtime functions. You can write replacement functions called malloc(), realloc() and free() - the only trick is to make sure the linker sees them before the normal C runtime library (but it *is* doable).

The C runtime library is a "convenience" - it contains functions that are likely to be used over and over, but as I said - you are not required to use them (unlike in higher level languages).

I have written several programs that only use the Win32 API - never any C runtime functions. If you are only interested in Windows, this can work very well. If you have ambitions about moving the C program to a different platform (without Windows), it's of course better to use the standard C runtime functions.

Pelle
/Pelle

Anonymous

  • Guest
« Reply #6 on: January 13, 2005, 05:23:37 PM »
Quote from: "Pelle"
I had many complaints about the previous memory manager. The problem is that memory allocation can be done in so many different ways: many small blocks, only a few large blocks, and anything in between. The current memory manager is the best "compromise" I can think of. I don't want to go back to complaints about "lack of speed" or "can't allocate enough memory".


Understood... :)  

This is just a thought... if the memory manager was in a separate library, (say nemman.lib) we could have a #pragma mem_manager "mymm.lib"  that would allow us to specify any memory manager we like.  So long as a certain set of basic calls were present in the library (meminit, malloc, calloc, realloc, free etc.) we're off and running. But, then, by writing custom libs we could also extend that basic functionality by adding new calls, using differing management schemes, etc. It would, of course, have to default to the standard manager when the pragma is not used.  

It's not a big deal, I'm happy in either case... but the added facility would be nice...

Offline Pelle

« Reply #7 on: January 13, 2005, 06:34:38 PM »
Quote from: "ldblake"
This is just a thought... if the memory manager was in a separate library, (say nemman.lib) we could have a #pragma mem_manager "mymm.lib"  that would allow us to specify any memory manager we like.

Yes, this could of course be useful. I don't think the final word is said about the memory manager. After the changes in 2.90, at the moment I'm just happy not receiving too many complaints about it... ;)

I guess the ambitious programmer also can do something like this:
(in a temporary directory)
1. polib -explode crt.lib (or crtmt.lib)
2. replace malloc.obj, realloc.obj, free.obj, calloc.obj ... with own implementations
3. polib *.obj -out:crt.lib
4. replace the existing copy of crt.lib with the one created in step 3 (probably making a copy first)

Another way is to compile the attached C file, and then add the *obj file* to the list of files to link for a project - by not putting the obj file in a library, the linker will see (and use) this file before any functions from the C runtime library (libraries are searched *after* any specified obj files). The use of __bheapinit() is special, and required for Pelles C at the moment (2.90 and 3.00). It's not perfect, but works for now.

Pelle
/Pelle

Anonymous

  • Guest
« Reply #8 on: January 13, 2005, 07:25:12 PM »
Quote from: "Pelle"
I guess the ambitious programmer also can do something like this:
(in a temporary directory)
1. polib -explode crt.lib (or crtmt.lib)
2. replace malloc.obj, realloc.obj, free.obj, calloc.obj ... with own implementations
3. polib *.obj -out:crt.lib
4. replace the existing copy of crt.lib with the one created in step 3 (probably making a copy first)

(absolutely making a copy first!)

So, if I'm understanding this...  It would also be possible to create a crt.lib without the memory functions included, and move them over to a separate library that could be added at the linking step???

__bheapinit and __bheapterm are the initializers for the heap and could also be moved to a separate library?  

Would it then be possible to specify my alternate CRT and MEMMGR libs somehow at compile time?   (#pragma nodefaultlib... #pragma lib...)

Or am I off in la-la land again?


Quote

Another way is to compile the attached C file, and then add the *obj file* to the list of files to link for a project - by not putting the obj file in a library, the linker will see (and use) this file before any functions from the C runtime library (libraries are searched *after* any specified obj files). The use of __bheapinit() is special, and required for Pelles C at the moment (2.90 and 3.00). It's not perfect, but works for now.
Pelle


Ummm... something you'll learn about me... I'm never in a hurry.  I'd rather have a permanent solution than a patchwork one.   For example, I had the errorx and mallocx libs done a month before I uploaded them; I was waiting patiently for the bugfix in the exception handler.

The OBJ file idea is cunning, but is it the best answer?

Offline Pelle

« Reply #9 on: January 13, 2005, 08:54:21 PM »
Quote from: "ldblake"
So, if I'm understanding this...  It would also be possible to create a crt.lib without the memory functions included, and move them over to a separate library that could be added at the linking step???

Yes, this should work.

Quote from: "ldblake"

__bheapinit and __bheapterm are the initializers for the heap and could also be moved to a separate library?  

Yes, this should also work.

Quote from: "ldblake"

Would it then be possible to specify my alternate CRT and MEMMGR libs somehow at compile time?   (#pragma nodefaultlib... #pragma lib...)

Yes, this *should* work. Since I haven't actually tested this, it's stupid saying it *will* work... but it should...

Quote from: "ldblake"

The OBJ file idea is cunning, but is it the best answer?

I took some ideas from the top of my head, just to show you what is possible, while still using function names like malloc().

Another approach, that works for me, is to use my_alloc(), my_free() - which are just wrappers around the memory manager I prefer to use (malloc, HeapAlloc, SuperDuperMalloc). There are a few C runtime functions that will return a buffer allocated through malloc(), but they aren't that many, and I tend not to use them anyway. And it's not that hard writing replacements for functions like _strdup().
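The wrapper approach described above could be sketched as follows. The names `my_alloc()` and `my_free()` come from the post; `my_strdup()` and the choice of `malloc()`/`free()` as the underlying manager are assumptions made for the example. Swapping in HeapAlloc or another manager would only change the two wrapper bodies.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Thin wrappers: the one place that decides which memory manager is used.
   malloc()/free() are just the example backend here. */
void *my_alloc(size_t size)
{
    return malloc(size);
}

void my_free(void *p)
{
    free(p);
}

/* Replacement for _strdup() that allocates through my_alloc(), so the
   result can safely be released with my_free(). */
char *my_strdup(const char *s)
{
    if (s == NULL) return NULL;
    size_t n = strlen(s) + 1;          /* include the terminating '\0' */
    char *copy = my_alloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}
```

The point of the `_strdup()` replacement is consistency: a buffer handed out by the runtime's own `malloc()` must not be released through a different manager.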

Pelle
/Pelle

Anonymous

  • Guest
« Reply #10 on: January 13, 2005, 10:07:08 PM »
Quote from: "Pelle"
Quote from: "ldblake"

__bheapinit and __bheapterm are the initializers for the heap and could also be moved to a separate library?  

Yes, this should also work.


Are they the only heap initializers?  
You mentioned having 2 heaps going on...

Quote
Yes, this *should* work. Since I haven't actually tested this, it's stupid saying it *will* work... but it should...


Ok... so it sounds like it's at least worth testing...

Quote

Another approach, that works for me, is to use my_alloc(), my_free() - which are just wrappers around the memory manager I prefer to use (malloc, HeapAlloc, SuperDuperMalloc). There are a few C runtime functions that will return a buffer allocated through malloc(), but they aren't that many, and I tend not to use them anyway. And it's not that hard writing replacements for functions like _strdup().


Yes, that's what I did with XMalloc etc.  they're just wrappers around your standard functions, that test the result before returning and generate exceptions if an error occurs.
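A checking wrapper of this kind might look like the sketch below. It is not the actual XMalloc source: all names are hypothetical, and because standard C has no exceptions, a caller-supplied failure handler stands in for the exception raised by the real library.

```c
#include <stddef.h>
#include <stdlib.h>

/* Called when an allocation fails; receives the requested size. */
typedef void (*alloc_fail_handler)(size_t requested);

static alloc_fail_handler on_alloc_fail = NULL;

void set_alloc_fail_handler(alloc_fail_handler h)
{
    on_alloc_fail = h;
}

/* Example handler: just counts failures. A real one might log and abort. */
size_t alloc_failures = 0;

void count_alloc_failure(size_t requested)
{
    (void)requested;
    alloc_failures++;
}

/* Wrapper around malloc() that tests the result before returning it. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0 && on_alloc_fail != NULL)
        on_alloc_fail(size);           /* report the failure in one place */
    return p;
}
```

Centralizing the check means callers that install a handler which never returns (for example one that aborts) no longer need to test every result themselves.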

Ok, one more question before I head off to make a mess of things -- err, I mean try this.  If I do move those functions out to a separate lib, can I still use the same names... but replace their content.  i.e. if they externally behave as expected the other functions that use them won't be grumbling at me, will they?

Offline Pelle

« Reply #11 on: January 13, 2005, 10:35:29 PM »
Quote from: "ldblake"
Are they the only heap initializers?  
You mentioned having 2 heaps going on...

Yes, the other heap is initialized when needed (internally). I need  __bheapinit() called very early, because the startup code needs to  allocate some memory. The dependency on __bheapinit() is a bit messy, and should probably be removed in the future.

Quote from: "ldblake"

Ok, one more question before I head off to make a mess of things -- err, I mean try this.  If I do move those functions out to a separate lib, can I still use the same names... but replace their content.  i.e. if they externally behave as expected the other functions that use them won't be grumbling at me, will they?

No problems - malloc() & friends are like any other functions - they just happen to have standardized names. What complicates things is that if you have more than one malloc() function, the linker will pick the first malloc() it finds - most likely from crt.lib. If you can find a way around this, with the proper use of #pragma nodefaultlib, and #pragma lib "my_memory_functions.lib" you might not even have to mess with pulling apart crt.lib...

Bottom line: this requires some time & testing...

Pelle
/Pelle

Anonymous

  • Guest
« Reply #12 on: January 14, 2005, 01:00:23 AM »
Quote from: "Pelle"
Quote from: "ldblake"
Are they the only heap initializers?  
You mentioned having 2 heaps going on...

Yes, the other heap is initialized when needed (internally). I need  __bheapinit() called very early, because the startup code needs to  allocate some memory. The dependency on __bheapinit() is a bit messy, and should probably be removed in the future.


Ok... I have looked at the lib and the startup sequence... it looks like it's setting 2megs for the small heap and about 4k for the big one...  I've found out where the initialization routines and heap calls etc. are (_bigheap.obj, _dflheap.obj and _getmem.obj)  and I've had a look at the standard functions...  I *think* I know what's going on.  

It looks like I will need to replace bigheap.obj as well as the malloc, calloc, realloc and free standard functions.  No problem...  provided the other crt functions that call on the memory allocator call malloc instead of going directly to __bheap_alloc (etc.) I can probably even combine the whole thing into a single external lib.

The only thing that would stop me is if the small heap ends up wasting memory, even if it's never called...  That would rather undo the whole idea, wouldn't it?   I gather that I can prevent it from ever being called by writing it out of the malloc (etc.) procedures and setting __bheap_threshold to 0?

The advantage of this method is that we can still use a dual heap memory manager... it will just be in a different library...  OR if someone wants to write a garbage collector they can... OR... if someone wants to use local alloc and virtual alloc based management, they can... OR...    It all depends what code they hide behind the standard calls...

If I can get this to work it would be possible to eliminate all memory management from the main CRT library and all you would need was a call to an external  __bheap_init function to kick start whatever is in the memory manager library...  

If this can be made to work, I may need to ask you to create a #pragma to let me explicitly call in outside managers.  It could still default to your existing 2  tiered system, except that it would be in an external lib (eg."memmgr.lib") or through the pragma we could tell it to use a custom manager in a different lib (eg. #pragma memmgr "mymem.lib")

But before we do any of that there's the testing and hairpulling phase  :D

Quote

Quote from: "ldblake"

Ok, one more question before I head off to make a mess of things -- err, I mean try this.  If I do move those functions out to a separate lib, can I still use the same names... but replace their content.  i.e. if they externally behave as expected the other functions that use them won't be grumbling at me, will they?

No problems - malloc() & friends are like any other functions - they just happen to have standardized names. What complicates things is that if you have more than one malloc() function, the linker will pick the first malloc() it finds - most likely from crt.lib.

Yeah, that's what I'm worried about... beyond the matter of redundancy it could cause some truly strange behaviour.

Quote

If you can find a way around this, with the proper use of #pragma nodefaultlib, and #pragma lib "my_memory_functions.lib" you might not even have to mess with pulling apart crt.lib...


Hmmm... interesting thought, but I can't see how I'd do that...
Are you thinking that so long as my lib is specified first then crt.lib after?

Code: [Select]

#pragma nodefaultlib
#pragma lib "mymm.lib"
#pragma lib "crt.lib"


Like that?

Quote

Bottom line: this requires some time & testing...


No kidding...  Looks like I've really stepped in it this time...

Since I'm messing with core functionality, you can bet I will be letting you see and test this before I ever put it to use...

Offline Pelle

« Reply #13 on: January 14, 2005, 01:36:35 AM »
Quote from: "ldblake"
Ok... I have looked at the lib and the startup sequence... it looks like it's setting 2megs for the small heap and about 4k for the big one...  I've found out where the initialization routines and heap calls etc. are (_bigheap.obj, _dflheap.obj and _getmem.obj)  and I've had a look at the standard functions...  I *think* I know what's going on.  

The small heap should reserve 0x2000000 bytes - 32 MB if I'm not mistaken. This will happen on the first call to malloc() or realloc() when the size is below the threshold (1024 bytes).

Quote from: "ldblake"

It looks like I will need to replace bigheap.obj as well as the malloc, calloc, realloc and free standard functions.  No problem...  provided the other crt functions that call on the memory allocator call malloc instead of going directly to __bheap_alloc (etc.) I can probably even combine the whole thing into a single external lib.

malloc(), realloc(), free(), _msize() are the only ones calling _bheap_* functions. All other runtime functions should be polite and call malloc().

Quote from: "ldblake"

The only thing that would stop me is if the small heap ends up wasting memory, even if it's never called...  That would rather undo the whole idea, wouldn't it?   I gather that I can prevent it from ever being called by writing it out of the malloc (etc.) procedures and setting __bheap_threshold to 0?

It will only be initialized on the first call, when the block size is less than __bheap_threshold - 1024 by default. If you make sure this never happens, the small heap should not be visible in any way.

Quote from: "ldblake"

If this can be made to work, I may need to ask you to create a #pragma to let me explicitly call in outside managers.

OK.

Quote from: "ldblake"

Hmmm... interesting thought, but I can't see how I'd do that...
Are you thinking that so long as my lib is specified first then crt.lib after?

Code: [Select]

#pragma nodefaultlib
#pragma lib "mymm.lib"
#pragma lib "crt.lib"


Like that?

That was my idea - looking in the compiler sources, I see that (the equivalent of) #pragma lib "crt.lib" is emitted to the obj file at the end of the compilation, so it *should* be possible to squeeze in a #pragma lib "mymm.lib" before this. This should mean that the linker will see/try mymm.lib before crt.lib...

Pelle
/Pelle

Anonymous

  • Guest
« Reply #14 on: January 14, 2005, 02:33:08 AM »
Quote
That was my idea - looking in the compiler sources, I see that (the equivalent of) #pragma lib "crt.lib" is emitted to the obj file at the end of the compilation, so it *should* be possible to squeeze in a #pragma lib "mymm.lib" before this. This should mean that the linker will see/try mymm.lib before crt.lib...


Ok... I think I've got a handle on this.  I'm gonna spend tomorrow messing with it and see if I can't implement an external memory manager.

For now I'm only going to experiment with crt.lib, the multi-tasking version should be easy to extrapolate from the simpler one... (I hope?!?)

I'll send you the documentation, sources and test lib... or admit defeat, depending on the outcome :lol: