
Topic: Problem with inline assembly and SSE  (Read 32776 times)

dancho

  • Guest
Problem with inline assembly and SSE
« on: October 08, 2007, 07:05:20 PM »
Hi all,
these days I'm learning about inline assembly and the MMX/SSE SIMD instructions,
and this little snippet works just fine in PellesC:

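Code:
#include <stdio.h>

struct cVector
{
    float x, y, z;
    float w; /* padding: movups/mulps work on 16 bytes, so keep the struct 16 bytes long */
};

int main()
{
    struct cVector vec1;
    vec1.x = 0.5f;
    vec1.y = 1.5f;
    vec1.z = -3.141f;
    vec1.w = 0.0f;

    /* movups is an unaligned load/store, so vec1 may sit at any address;
       mulps squares all four floats at once */
    __asm {
        movups xmm1, vec1
        mulps xmm1, xmm1
        movups vec1, xmm1
    }

    printf("%f %f %f\n", vec1.x, vec1.y, vec1.z);

    return 0;
}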

The tutorial then encourages you to play with this code, change something,
and look at and analyse the results...

So I decided to use movaps (Move Aligned Packed Single-precision), which requires a 16-byte-aligned address,
and I wrote this:

Code:
#include <stdio.h>

#pragma pack(16)
struct cVector
{
    float x, y, z;
    float w; /* padding to 16 bytes, the size of one XMM register */
};
#pragma pack()


int main()
{
    struct cVector vec1;
    vec1.x = 0.5f;
    vec1.y = 1.5f;
    vec1.z = -3.141f;
    vec1.w = 0.0f;

    /* movaps requires vec1 to start on a 16-byte boundary */
    __asm {
        movaps xmm1, vec1
        mulps xmm1, xmm1
        movaps vec1, xmm1
    }

    printf("%f %f %f\n", vec1.x, vec1.y, vec1.z);

    return 0;
}

It compiles and builds the exe without any problems, but
it crashes at run time...

So please help, what am I doing wrong?

BTW,
yes, my CPU supports SSE...
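Some background on the crash: movaps raises a general-protection fault when its memory operand is not on a 16-byte boundary, while movups accepts any address. A minimal diagnostic sketch, assuming a C99 compiler with <stdint.h>, that tests an address for 16-byte alignment so you can print where vec1 actually landed before running the movaps version. The helper names is_aligned16 and report_alignment are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary,
   which is what movaps requires of its memory operand. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Prints an object's address and whether movaps could use it. */
void report_alignment(const char *name, const void *p)
{
    assert(name != NULL);
    printf("%s at %p: %s\n", name, (void *)p,
           is_aligned16(p) ? "16-byte aligned" : "NOT 16-byte aligned");
}
```

Calling report_alignment("vec1", &vec1); just before the __asm block shows whether a particular build put vec1 on a usable boundary. Because stack placement depends on the surrounding code, a movaps version can appear to work in one build and crash in another.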

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #1 on: October 09, 2007, 05:08:44 AM »
I played around with this a little bit, but was unable to get it to work. Normally you would use something along these lines:

Code:
#include <stdio.h>

/* Multi-Compiler Alignment */
#ifdef __GNUC__
 #define m_ALIGN(n) __attribute__ ((aligned(n)))
#elif defined(_MSC_VER) || defined(__POCC__)
 #define m_ALIGN(n) __declspec(align(n))
#else
 #define m_ALIGN(n)
#endif

/* w pads the struct to 16 bytes so the 16-byte movaps access stays inside it */
typedef m_ALIGN(16) struct { float x, y, z, w; } cVector;

int main( )
{
    cVector vec1;
    vec1.x = 0.5f;
    vec1.y = 1.5f;
    vec1.z = -3.141f;
    vec1.w = 0.0f;

    /* MSVC-style __asm block (PellesC/MSVC, 32-bit x86); GCC uses a different syntax */
    __asm {
        movaps xmm1, vec1
        mulps xmm1, xmm1
        movaps vec1, xmm1
    }

    printf("%f %f %f\n", vec1.x, vec1.y, vec1.z);

    return 0;
}

But after looking over your code, I've noticed that the __declspec(align()) option doesn't do anything anymore (it used to). I would say this is most likely a bug you've stumbled upon; maybe Pelle can give you some more insight. As far as I can tell, that *should* be working...

[EDIT] For Pelle: Debugging shows that __declspec(align(16)) doesn't change cVector's alignment at all (it is the same with the declspec removed). To dig deeper, I removed the DECLSPEC_ALIGN(16) from the _CONTEXT struct in winnt.h and built default source that assigned values to it. Checking it in the PellesC debug view showed that the alignment never changed. I then added DECLSPEC_ALIGN(1): no difference. Finally I put it back to normal so as not to corrupt my install. It seems that the align() option isn't working at all. I hope this feedback helps in some way.[/EDIT]

Regards,
Bryant Keller
« Last Edit: October 09, 2007, 05:10:34 AM by Synfire »

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #2 on: October 09, 2007, 08:02:27 AM »
I tried using malloc, but however I align the struct, movaps causes an exception.

movups works ok.

John

dancho

  • Guest
Re: Problem with inline assembly and SSE
« Reply #3 on: October 09, 2007, 09:49:51 AM »
thx Synfire and JohnF for the effort in dealing with this bug...

frankie

  • Global Moderator
  • Member
  • Posts: 2096
Re: Problem with inline assembly and SSE
« Reply #4 on: October 12, 2007, 06:05:38 PM »
I also made some checks, and the result is that PellesC doesn't have any operator for object alignment.
The operators:
  • #pragma pack(n)
  • __declspec(align(n))
only work for the alignment inside complex variables (like structures).
I also tried creating a new data section with "#pragma data_seg("SSE")", but the alignment remains on a DWORD boundary (4 bytes).
Memory returned from malloc, calloc and the like is always on a DWORD boundary.
So while there is formally no bug in the compiler, the absence of an alignment operator is significant.
Moreover, the claim that all memory returned from the system is "aligned for any kind of data" is not strictly correct with respect to the extended SSE and MMX instructions.
On the other hand, this kind of code could be written in an assembler module, where the ALIGN operator should work correctly.
So do we really need an alignment operator on the 'C' side? If yes, we should ask for a new feature.
« Last Edit: October 12, 2007, 06:07:47 PM by frankie »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide
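frankie's point can be checked with standard C11 (the _Alignof operator postdates this thread and may not exist in the PellesC of 2007). A struct of three floats only needs the alignment of float, so nothing in its type asks for a 16-byte boundary, and it is only 12 bytes long, so any 16-byte movups/movaps access also reaches 4 bytes past its end. A sketch under those assumptions; the sizes in the comments are typical for x86 compilers, not guaranteed by the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct cVector  { float x, y, z; };     /* typically 12 bytes, 4-byte alignment */

/* Padded variant: a fourth float makes the struct one XMM register wide. */
struct cVector4 { float x, y, z, w; };  /* typically 16 bytes, still 4-byte alignment */

/* Prints size and required alignment; padding fixes the size, not the alignment. */
void print_layout(void)
{
    printf("cVector:  size %zu, alignment %zu\n",
           sizeof(struct cVector), (size_t)_Alignof(struct cVector));
    printf("cVector4: size %zu, alignment %zu\n",
           sizeof(struct cVector4), (size_t)_Alignof(struct cVector4));
}
```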

Greg

  • Guest
Re: Problem with inline assembly and SSE
« Reply #5 on: October 12, 2007, 08:02:55 PM »
From the help file:

Quote
__declspec( align(n) ) Used to set alignment for an entire structure or union (not the members). The argument n must be power of 2. [2.80]

Example:
typedef struct __declspec(align(16)) mystruct {
    double mm[2];
} mystruct;

This should do it, but I can't get it to work.

BTW, ALIGN in POASM works fine, but you still need a way to align structures and unions in C.
 
« Last Edit: October 12, 2007, 08:51:37 PM by Greg »
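For reference, C11 later standardized an alignment specifier, _Alignas (spelled alignas via <stdalign.h>). C11 does not allow it on a typedef, so the usual way to give a whole struct 16-byte alignment is to put it on the first member, which has the effect that __declspec(align(16)) is documented to have. A sketch assuming a C11 compiler; whether a given PellesC version honours it is not established in this thread. The function name local_vector_is_aligned is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* alignas on the first member raises the alignment of the whole struct to 16;
   the padding member w keeps sizeof at 16 so a 16-byte access stays in bounds. */
typedef struct {
    alignas(16) float x;
    float y, z, w;
} cVector;

/* Returns nonzero if a local (stack) cVector lands on a 16-byte boundary. */
int local_vector_is_aligned(void)
{
    cVector v = { 0.5f, 1.5f, -3.141f, 0.0f };
    return ((uintptr_t)&v & 15u) == 0;
}
```

With a type like this, the original `movaps xmm1, vec1` form no longer depends on where the compiler happens to place vec1, provided the compiler honours the specifier for stack objects.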

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #6 on: October 12, 2007, 09:50:26 PM »
With regard to using malloc, I shifted the start address of the struct so that every offset within 16 bytes was tried.

So I'm not sure what is going on.

John

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #7 on: October 13, 2007, 03:49:23 AM »
Frankie,

This can't really be considered a feature request, because I know the m_ALIGN() macro has worked in the past. I believe this is a recent occurrence, but since I haven't checked the results of m_ALIGN() in some time, I'm not sure when the align() property of __declspec() stopped working. So this is either a bug, or a no-longer-supported feature that Pelle forgot to remove from the help documentation. Neither affects me much personally, as I don't use SSE that often, and when I use ASM and C I generally avoid inline code.

JohnF,

I haven't messed with this since the other day; I've been a bit swamped with other work. Did you make sure you allocated enough memory for the shift as well as the structure? If not, MOVAPS would most likely fail because it accesses unallocated memory, and in most debuggers that shows up as the same kind of error as an unaligned access. There is a function called _mm_malloc() [or something along those lines] that is used for aligned memory allocation in C and is most commonly used to allocate structures for use with MOVAPS. PellesC doesn't have a version of it, so if I get some time (or if someone else is kind enough to do it), it might be worth porting for the sake of SSE in PellesC.
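A minimal sketch of the kind of _mm_malloc() port described here, in plain C so it depends only on malloc/free. It over-allocates by alignment - 1 + sizeof(void *) bytes (enough for the worst-case shift), rounds the pointer up, and stores the original malloc pointer just in front of the aligned block so the matching free function can find it. The names aligned_alloc_sketch and aligned_free_sketch are hypothetical; this is not the Microsoft or Intel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical _mm_malloc()-style allocator. alignment must be a power of two
   and at least sizeof(void *). Returns NULL on failure. */
void *aligned_alloc_sketch(size_t size, size_t alignment)
{
    void *raw;
    uintptr_t aligned;

    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    /* Room for the payload, the worst-case shift, and the saved pointer. */
    raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up to the boundary. */
    aligned = ((uintptr_t)raw + sizeof(void *) + alignment - 1)
              & ~(uintptr_t)(alignment - 1);

    /* Remember what malloc returned, directly in front of the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Releases a block obtained from aligned_alloc_sketch (NULL is ignored). */
void aligned_free_sketch(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

In the thread's example, `vec1 = aligned_alloc_sketch(sizeof(cVector), 16);` would take the place of the aligned allocation call, with `aligned_free_sketch(vec1);` at the end.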

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #8 on: October 13, 2007, 05:02:15 AM »
dancho,

I found *a* solution. It turns out there are already _mm_malloc() and _mm_free() equivalents for Windows, _aligned_malloc() and _aligned_free(), as part of the Microsoft C runtime. They are in msvcrt.lib, and you can use them like so:

Code:
#include <stdio.h>

/* Prototypes for the aligned-allocation functions exported by msvcrt.lib */
void * __cdecl _aligned_malloc( size_t _Size, size_t _Alignment );
void __cdecl _aligned_free( void * _Memory );

typedef struct { float x, y, z, w; } cVector; /* w: padding to 16 bytes */

int main( )
{
    cVector* vec1;
    vec1 = _aligned_malloc( sizeof(cVector), 16 );
    if ( vec1 == NULL )
        return( 1 );
    vec1->x = 0.5f;
    vec1->y = 1.5f;
    vec1->z = -3.141f;
    vec1->w = 0.0f;

    /* vec1 is a pointer: load it into ecx and access the data through [ecx] */
    __asm {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
    _aligned_free( vec1 );
    return( 0 );
}

The code has been run and tested. It executes without errors, it produces the same output as the MOVUPS version, and I've checked the alignment: everything is right. Now if we can just get the __declspec(align()) variant working, all will be good. :P

Regards,
Bryant Keller

Note: you have to make sure you are linking msvcrt.lib into your project. This will give a warning:
'POLINK: warning: Multiple '.rdata' sections found with different flags (0x40000040 and 0xc0000040)'
Just ignore it; it happens when you link msvcrt.lib into a project that uses pocrt.lib (the default library). There is a #pragma statement that will hide this warning, but I can't remember it offhand.
« Last Edit: October 13, 2007, 05:04:27 AM by Synfire »
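For comparison, C11 later added a standard function for this job, aligned_alloc(alignment, size) in <stdlib.h>, whose memory is released with plain free(). Microsoft's C runtime does not provide aligned_alloc, so on that toolchain _aligned_malloc() as used above remains the option. A sketch assuming a C11 library (for example glibc); the function name new_vector is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { float x, y, z, w; } cVector;   /* padded to 16 bytes */

/* Allocates one cVector on a 16-byte boundary with C11 aligned_alloc.
   C11 wants size to be a multiple of alignment, which the padding ensures.
   Release the result with free(). */
cVector *new_vector(float x, float y, float z)
{
    cVector *v = aligned_alloc(16, sizeof(cVector));
    if (v != NULL) {
        v->x = x;
        v->y = y;
        v->z = z;
        v->w = 0.0f;
    }
    return v;
}
```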

Greg

  • Guest
Re: Problem with inline assembly and SSE
« Reply #9 on: October 13, 2007, 05:30:49 AM »
Bryant,

Good workaround, I had forgotten about those functions. __declspec(align(n)) does seem to be broken. Pelle hasn't posted here since August 21, 2007.
 

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #10 on: October 13, 2007, 06:00:25 AM »
Greg,

The arguments are exactly the same as those of _mm_malloc() and _mm_free(), so it would probably be worth adding an #ifdef block to malloc.h that defines _mm_malloc() and _mm_free() as macros calling _aligned_malloc() and _aligned_free() respectively, for portability. As for Pelle, it's getting pretty late in the year; he might be on vacation or something. Admins/developers do that from time to time. ;) If so, this can wait till he gets back.

Actually, this is probably a better method considering the name of his structure ("vector"). These functions allocate from the heap (via HeapAlloc), which is generally preferred over storing values on the stack, at least in game programming. Of course, I'm not a game programmer; I've just got a few friends who are determined to annoy me with game/code discussions until they think I will convert. (heh) The reasoning behind the heap vs. stack preference is that a thread's stack is small (1MB reserved by default on Windows, adjustable at link time), whereas the heap can grow into most of the process's address space (2GB of user address space for a default 32-bit Windows process). In games and 3D models you normally deal with a large number of vectors, and the more room you have for them the better. So HeapAlloc is generally preferred over the stack for any heavy graphics code dealing with vectors.

Stops his babbling...

Anyways, we'll see when Pelle (gets back||becomes active) if a fix for __declspec(align()) is in our future.

Regards,
Bryant Keller
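The portability idea above could look roughly like this, sketched under assumptions: on Windows it forwards to _aligned_malloc()/_aligned_free() (declared in <malloc.h> in Microsoft's headers; with the PellesC of this thread you may need to declare them by hand, as in the earlier post, and link msvcrt.lib), and elsewhere it uses POSIX posix_memalign(), whose memory is released with plain free(). The wrapper names simd_malloc and simd_free are hypothetical, chosen to avoid clashing with the _mm_malloc() that GCC and Intel compilers already ship.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>               /* _aligned_malloc, _aligned_free (Microsoft CRT) */
#endif

/* Hypothetical portable wrapper: alignment must be a power of two;
   on POSIX it must also be a multiple of sizeof(void *). */
void *simd_malloc(size_t size, size_t alignment)
{
#ifdef _WIN32
    return _aligned_malloc(size, alignment);
#else
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
#endif
}

/* Releases memory obtained from simd_malloc with the matching deallocator. */
void simd_free(void *p)
{
#ifdef _WIN32
    _aligned_free(p);
#else
    free(p);
#endif
}
```

Keeping the allocate/free pair behind one wrapper matters because memory from _aligned_malloc() must not be passed to plain free().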

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #11 on: October 13, 2007, 08:23:39 AM »
Well, the problem seems to be that I was using the wrong asm supplied by the original poster. My bad!

I was using this
Code:
    __asm {
      movups xmm1, vec1
      mulps xmm1, xmm1
      movups vec1, xmm1
}

Bryant has it right with this:
Code:
__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

The PellesC malloc does return addresses aligned to 16 bytes; however, the msvcrt.lib one does not, so when using msvcrt.lib one has to use _aligned_malloc.

Here it is working using PellesC malloc and MS _aligned_malloc.

Code:
#include <stdio.h>
#include <stdlib.h>

struct cVector { float x, y, z, w; }; /* w: padding to 16 bytes */

//#define MS

#ifndef MS

int main(void)
{
    /* PellesC (pocrt.lib) malloc */
    struct cVector *vec1 = malloc(sizeof(struct cVector));
    if (vec1 == NULL)
        return 1;
    vec1->x = 0.5f;
    vec1->y = 1.5f;
    vec1->z = -3.141f;
    vec1->w = 0.0f;

    __asm
    {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf("PO lib %f %f %f\n", vec1->x, vec1->y, vec1->z);
    free(vec1);
    return (0);
}

#else

#pragma lib "msvcrt.lib"
void * __cdecl _aligned_malloc( size_t _Size, size_t _Alignment );
void __cdecl _aligned_free( void * _Memory );

int main(void)
{
    struct cVector* vec1;
    vec1 = _aligned_malloc( sizeof(struct cVector), 16 );
    if ( vec1 == NULL )
        return( 1 );
    vec1->x = 0.5f;
    vec1->y = 1.5f;
    vec1->z = -3.141f;
    vec1->w = 0.0f;

    __asm {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf( "MS lib %f %f %f\n", vec1->x, vec1->y, vec1->z );
    _aligned_free( vec1 );
    return( 0 );
}
#endif

Of course we still want to do it by aligning the struct without using malloc.

John
« Last Edit: October 13, 2007, 10:03:48 AM by JohnF »

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #12 on: October 13, 2007, 09:23:25 AM »
John,

The change in the assembly is because with malloc we are working with a pointer, so we must dereference the variable before accessing its value. I did this through the ecx register. If we didn't access the structure through a pointer, there would be no need to dereference it through a register, and dancho's 'MOVAPS xmm1, vec1' form would be correct. It was my modification of the code, changing it to a pointer, that required the change in the inline assembly.
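Not what the thread uses, but relevant to the pointer/dereference point: the SSE intrinsics in <xmmintrin.h> (_mm_load_ps, _mm_mul_ps, _mm_store_ps) express the same movaps/mulps/movaps sequence in C, and the compiler takes care of dereferencing the pointer and choosing registers. _mm_load_ps and _mm_store_ps still require a 16-byte-aligned address, exactly like movaps. This sketch assumes an x86 compiler that ships <xmmintrin.h> with SSE enabled (GCC, Clang, MSVC; not verified for the 2007 PellesC) and falls back to plain C elsewhere; square_vector is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE_INTRINSICS 1
#endif

/* Four floats, padded to 16 bytes; callers must pass a 16-byte-aligned object. */
typedef struct { float x, y, z, w; } cVector;

/* Squares all four components in place, like movaps/mulps/movaps through [ecx]. */
void square_vector(cVector *v)
{
    assert(((uintptr_t)v & 15u) == 0);   /* the same requirement movaps has */
#ifdef HAVE_SSE_INTRINSICS
    __m128 r = _mm_load_ps(&v->x);       /* aligned load  (movaps xmm, [mem]) */
    r = _mm_mul_ps(r, r);                /* mulps xmm, xmm                    */
    _mm_store_ps(&v->x, r);              /* aligned store (movaps [mem], xmm) */
#else
    v->x *= v->x;
    v->y *= v->y;
    v->z *= v->z;
    v->w *= v->w;
#endif
}
```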

It seems you are right about malloc(). From what I've always understood, you aren't supposed to trust malloc's return value to be aligned to any specific boundary, but the msvcrt version is set to align on a 16 byte boundary by default. Microsoft's page on malloc doesn't specify this, but they have a (not so easy to find) page which comments on it - http://msdn2.microsoft.com/en-us/library/ycsb6wwf(vs.80).aspx

Apparently, _aligned_malloc() is only for cases where you want to specify the alignment of the allocated memory explicitly yourself. That being said, I think you are wrong about the PellesC version of malloc(). When I build the following code, it crashes.

Code:
#include <stdio.h>
#include <stdlib.h>

typedef struct { float x, y, z, w; } cVector; /* w: padding to 16 bytes */

int main( )
{
    cVector* vec1;
    vec1 = malloc( sizeof(cVector) );
    if ( vec1 == NULL )
        return( 1 );
    vec1->x = 0.5f;
    vec1->y = 1.5f;
    vec1->z = -3.141f;
    vec1->w = 0.0f;

    __asm {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
    free( vec1 );
    return( 0 );
}

But if I change the code to the following, it will run without errors.

Code:
#include <stdio.h>
#include <stdlib.h>
#pragma lib "msvcrt.lib"

typedef struct { float x, y, z, w; } cVector; /* w: padding to 16 bytes */

int main( )
{
    cVector* vec1;
    vec1 = malloc( sizeof(cVector) );
    if ( vec1 == NULL )
        return( 1 );
    vec1->x = 0.5f;
    vec1->y = 1.5f;
    vec1->z = -3.141f;
    vec1->w = 0.0f;

    __asm {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
    free( vec1 );
    return( 0 );
}

Changing it to the MS Visual C Runtime version of malloc means that the data is aligned on a 16 byte boundary as specified in the page I linked you to above and as you expressed yourself would happen. But without it, it crashes because malloc does not align on a 16 byte boundary.

I would say about all we can do now is just wait and see if Pelle has a solution regarding the structure alignment. Also, while it's being discussed, I would like to make a feature request for __declspec(align()) to work on variables. Like for example:

Code:
__declspec(align(16)) float myFloat[4];

I think this would be an extremely useful feature. If it's too much trouble, don't worry about it, but it would definitely be nice to have for optimizing C routines, since it would let you ensure stack alignment manually.

Regards,
Bryant Keller
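The per-variable form requested here was later standardized in C11: alignas(16) from <stdalign.h> (or the _Alignas keyword) may be applied to an ordinary object such as an array, both at file scope and on the stack. A sketch assuming a C11 compiler; it says nothing about whether or when PellesC adopted it. The names gVector, on16, stack_array_is_aligned and global_array_is_aligned are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* File-scope object with a requested 16-byte alignment. */
alignas(16) float gVector[4] = { 0.5f, 1.5f, -3.141f, 0.0f };

/* Nonzero if p is on a 16-byte boundary. */
static int on16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Checks that a stack array declared with alignas(16) really is aligned. */
int stack_array_is_aligned(void)
{
    alignas(16) float myFloat[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    return on16(myFloat) && myFloat[3] == 4.0f;
}

/* Checks the file-scope array as well. */
int global_array_is_aligned(void)
{
    return on16(gVector);
}
```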

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #13 on: October 13, 2007, 09:32:35 AM »
Just tried your code; with the PellesC malloc it works OK here.

If I use the PellesC malloc, the returned address is 410550; if I use the MS lib, malloc returns 3224B8.

Maybe these functions behave differently on different machines, but that would be hard to believe.

John
« Last Edit: October 13, 2007, 09:48:25 AM by JohnF »
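Reading JohnF's two addresses as hexadecimal (as a debugger prints them), the low four bits show which allocator returned a movaps-compatible pointer in that run: 0x410550 ends in 0x50, a multiple of 16, while 0x3224B8 ends in 0xB8, which leaves a remainder of 8. A small sketch of that arithmetic; misalignment16 and show are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Distance (0..15) of an address past the previous 16-byte boundary;
   movaps needs this to be 0. */
unsigned misalignment16(uintptr_t addr)
{
    return (unsigned)(addr & 15u);
}

/* Prints an address together with its offset from a 16-byte boundary. */
void show(const char *label, uintptr_t addr)
{
    printf("%-8s 0x%08lX -> offset %u\n",
           label, (unsigned long)addr, misalignment16(addr));
}
```

Calling show("PellesC", 0x410550) and show("msvcrt", 0x3224B8) reports offsets 0 and 8. An offset of 8 is exactly the case where movaps faults, which fits JohnF's observation that the msvcrt.lib malloc needed _aligned_malloc; it is one sample per allocator, not a guarantee about either.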

dancho

  • Guest
Re: Problem with inline assembly and SSE
« Reply #14 on: October 13, 2007, 09:54:12 AM »
one BIG THX to all,
now I have a working solution for inline assembly...

/OT
and I thought I knew C pretty well, but
all this conversation reminds me how much (how little) I actually know
OT