Pelles C forum

Assembly language => Assembly discussions => Topic started by: dancho on October 08, 2007, 07:05:20 PM

Title: Problem with inline assembly and SSE
Post by: dancho on October 08, 2007, 07:05:20 PM
Hi all,
so these days I'm learning about inline assembly and the MMX/SSE SIMD instructions,
and this little snippet works just fine in PellesC...

Code: [Select]
#include <stdio.h>

struct cVector
{
float x,y,z;
};

int main()
{
struct cVector vec1;
vec1.x=0.5;
vec1.y=1.5;
vec1.z=-3.141;

     __asm {
      movups xmm1, vec1
      mulps xmm1, xmm1
      movups vec1, xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}

After this, the tutorial encourages you to play with the code, change something, and analyse the results...

So I decided to use movaps (Move Aligned Packed Single), which requires a 16-byte aligned address,
and I wrote this:

Code: [Select]
#include <stdio.h>

#pragma pack(16)
struct cVector
{
float x,y,z;
};
#pragma pack()


int main()
{
struct cVector vec1;
vec1.x=0.5;
vec1.y=1.5;
vec1.z=-3.141;

     __asm {
      movaps xmm1, vec1
      mulps xmm1, xmm1
      movaps vec1, xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}

I can compile it and build the exe without any problems, but
it crashes at run time...

So help please, what am I doing wrong?

BTW,
yes, my CPU supports SSE...
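
A note on what movaps checks, as a minimal sketch rather than anything from the tutorial: the instruction faults when the address of its memory operand is not a multiple of 16, and it always transfers 16 bytes even though struct cVector holds only 12. The helper name is_aligned and the padded cVector4 type are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The struct from the post: three floats, 12 bytes, natural alignment 4. */
struct cVector { float x, y, z; };

/* Hypothetical padded variant: 16 bytes, so a 16-byte load stays inside it. */
struct cVector4 { float x, y, z, w; };

/* Returns 1 when p is a multiple of `alignment` (a power of two), else 0. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Printing is_aligned(&vec1, 16) just before the __asm block shows whether movaps is safe for that particular run.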
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 09, 2007, 05:08:44 AM
I played around with this a little bit, but was unable to get it to work. Normally you would use something along these lines:

Code: [Select]
#include <stdio.h>

/* Multi-Compiler Alignment */
#ifdef __GNUC__
 #define m_ALIGN(n) __attribute__ ((aligned(n)))
#elif defined(_MSC_VER) || defined(__POCC__)
 #define m_ALIGN(n) __declspec(align(n))
#else
 #define m_ALIGN(n)
#endif

typedef m_ALIGN(16) struct { float x, y, z; } cVector;

int main( )
{
cVector vec1;
vec1.x=0.5;
vec1.y=1.5;
vec1.z=-3.141;

__asm {
      movaps xmm1, vec1
      mulps xmm1, xmm1
      movaps vec1, xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}

But after looking over your code I've noticed that the __declspec(align()) option doesn't do anything anymore (it used to). I would say this is most likely a bug you've stumbled upon, maybe Pelle can give you some more insight. As far as I can tell, that *should* be working...

[EDIT] For Pelle: Debugging shows that the __declspec(align(16)) doesn't change the cVector's alignment at all (even if removed), I decided to dig deeper into this so I removed the DECLSPEC_ALIGN(16) from the _CONTEXT struct in winnt.h and built default source that assigned values to it. Checking it out in the PellesC debug view showed that the alignment never changed. I re-added DECLSPEC_ALIGN(1), no difference, then I put it back to normal as not to corrupt my install. It seems that the align() option isn't working at all. I hope this feedback helps in some way.[/EDIT]

Regards,
Bryant Keller
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 09, 2007, 08:02:27 AM
I tried using malloc, but however one aligns the struct, movaps causes an exception.

movups works ok.

John
Title: Re: Problem with inline assembly and SSE
Post by: dancho on October 09, 2007, 09:49:51 AM
thx Synfire and JohnF for the effort in dealing with this bug...
Title: Re: Problem with inline assembly and SSE
Post by: frankie on October 12, 2007, 06:05:38 PM
I also ran some checks, and the result is that PellesC doesn't have any operator for object alignment.
The operators:
Only work for the alignment inside complex variables (like structures).
I also tried creating a new data section with "#pragma data_seg("SSE")", but the alignment remains on a DWORD boundary (4 bytes).
Memory returned from malloc, calloc and the like is always on a DWORD boundary.
So while there is formally no bug in the compiler, the absence of an alignment operator is relevant.
Moreover, the claim that memory returned from the system is "aligned for any kind of data" is not strictly correct for the extended SSE and MMX instructions.
On the other hand, this kind of code could be written in an assembler module, where the ALIGN operator should work correctly.
So do we really need an alignment operator on the 'C' side? If yes, we should ask for a new feature.
Title: Re: Problem with inline assembly and SSE
Post by: Greg on October 12, 2007, 08:02:55 PM
From the help file:

Quote
__declspec( align(n) ) Used to set alignment for an entire structure or union (not the members). The argument n must be power of 2. [2.80]

Example:
typedef struct __declspec(align(16)) mystruct {
    double mm[2];
} mystruct;

This should do it, but I can't get it to work.

BTW, ALIGN in POASM works fine, but you still need a way to align structures and unions in C.
 
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 12, 2007, 09:50:26 PM
With regard to using malloc, I shifted the start address of the struct so that all alignment within 16 bytes was tried.

So I'm not sure what is going on.

John
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 13, 2007, 03:49:23 AM
Frankie,

This can't really be considered a feature request because I know the m_ALIGN() macro has worked in the past. I believe this to be a recent occurrence, but since I've not tried to verify the results of m_ALIGN() in some time, I'm not sure when the align() property of __declspec() stopped working. So this is either a bug or a no-longer-supported feature that Pelle forgot to remove from the help documentation. Neither affects me much personally, as I don't use SSE that often, and when I use ASM and C I generally avoid inline code.

JohnF,

I haven't messed with this since the other day; I've been a bit swamped with other work. Did you make sure you allocated enough memory for the shift as well as the structure? If not, MOVAPS would most likely fail because it touches unallocated memory, and most debuggers report that the same way as an unaligned access. If I get some time (or maybe someone else will be kind enough to do it), there is a function called _mm_malloc() [or something along those lines] which is used for aligned memory allocation in C. PellesC doesn't have a version of it, but it's most commonly used to allocate structures for use with MOVAPS, and it might be worth porting for the sake of sse/pellesc.
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 13, 2007, 05:02:15 AM
dancho,

I found *a* solution. Turns out there are already _mm_malloc() and _mm_free() equivalents for Windows: _aligned_malloc() and _aligned_free(). They are in msvcrt.lib (the Microsoft C runtime) and you can use them like so:

Code: [Select]
#include <stdio.h>

void * __cdecl _aligned_malloc( size_t _Size, size_t _Alignment );
void __cdecl _aligned_free( void * _Memory );

typedef struct { float x, y, z; } cVector;

int main( )
{
cVector* vec1;
vec1 = _aligned_malloc( sizeof(cVector), 16 );
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
_aligned_free( vec1 );
return( 0 );
}

The code has been run and tested. It executes without errors, it produces the same output as the MOVUPS version, and I've checked the alignment and everything is right. Now if we can just get the __declspec(align()) variant working all will be good. :P

Regards,
Bryant Keller

note: you have to make sure you are importing msvcrt.lib into your project, this will give a warning:
'POLINK: warning: Multiple '.rdata' sections found with different flags (0x40000040 and 0xc0000040)'
Just ignore that, it happens when you import msvcrt.lib into a project that uses pocrt.lib (default library). There is a #pragma statement that will hide this warning but I can't remember it right off hand.
Title: Re: Problem with inline assembly and SSE
Post by: Greg on October 13, 2007, 05:30:49 AM
Bryant,

Good workaround, I had forgotten about those functions. __declspec(align(n)) does  seem to be broken. Pelle hasn't posted here since August 21, 2007.
 
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 13, 2007, 06:00:25 AM
Greg,

The arguments are exactly the same as _mm_malloc() and _mm_free(), it'd probably be worth it to create an #ifdef statement in malloc.h to define _mm_malloc() and _mm_free() as macros which call _aligned_malloc() and _aligned_free() respectively, for portability. As for Pelle, it's getting pretty late in the year, he might be on vacation or something. Admins/Developers do that from time to time. ;) If so, it could wait till he got back.
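
The macro idea above could look roughly like this. It is only a sketch: the names sse_malloc/sse_free are invented here (so they cannot clash with a real _mm_malloc), the Windows branch relies on _aligned_malloc()/_aligned_free() as used earlier in the thread, and the other branch assumes a C11 library that provides aligned_alloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#if defined(_WIN32)
/* Microsoft runtime. With Pelles C + msvcrt.lib, the hand-written
   prototypes shown earlier in the thread would replace this include. */
#include <malloc.h>
#define sse_malloc(size, align) _aligned_malloc((size), (align))
#define sse_free(ptr)           _aligned_free(ptr)
#else
/* C11 aligned_alloc() wants size to be a multiple of the alignment,
   so round the request up first. `align` must be a power of two. */
static void *sse_malloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t rounded = (size + align - 1) & ~(align - 1);
    return aligned_alloc(align, rounded);
}
#define sse_free(ptr) free(ptr)
#endif
```

Memory from sse_malloc() has to go back through sse_free(), never plain free(), because on the Windows branch the two allocators are not interchangeable.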

Actually, this is probably a better method considering the naming of his structure ("vector"). API functions allocate through HeapAlloc, which is generally preferred over storing values on the stack, when you are talking about game programming anyway. Of course I'm not a game programmer, I've just got a few friends who are determined to annoy me with game/code discussions until they think I will convert. (heh) The reasoning behind the heap vs. stack deal is that the stack is limited to 1MB whereas the heap has a theoretical 1GB limit; I say "theoretical" because there are workarounds for the 1GB limit. In games/3D models you normally deal with a large number of vectors, and the more space you allow for them the better. So HeapAlloc is generally preferred over the stack for any heavy graphics code dealing with vectors.

Stops his babbling...

Anyways, we'll see when Pelle (gets back||becomes active) if a fix for __declspec(align()) is in our future.

Regards,
Bryant Keller
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 08:23:39 AM
Well, the problem seems to be that I was using the wrong asm supplied by the original poster. My bad!

I was using this
Code: [Select]
    __asm {
      movups xmm1, vec1
      mulps xmm1, xmm1
      movups vec1, xmm1
}

Dancho has it right with this
edit Bryant has it right with this
Code: [Select]
__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

The PellesC malloc does return addresses aligned to 16 bytes - however, the msvcrt.lib does not, so when using msvcrt.lib one has to use _aligned_malloc.

Here it is working using PellesC malloc and MS _aligned_malloc.

Code: [Select]
#include <stdio.h>
#include <stdlib.h>

struct  cVector{ float x, y, z; };

//#define MS

#ifndef MS

int main(void)
{
struct cVector *vec1 = malloc(sizeof(struct cVector));
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm
{
mov ecx, vec1
movaps xmm1,[ecx]
mulps xmm1, xmm1
movaps[ecx], xmm1
}

printf("PO lib %f %f %f\n", vec1->x, vec1->y, vec1->z);
return (0);
}

#else

#pragma lib "msvcrt.lib"
void * __cdecl _aligned_malloc( size_t _Size, size_t _Alignment );
void __cdecl _aligned_free( void * _Memory );

int main(void)
{
struct cVector* vec1;
vec1 = _aligned_malloc( sizeof(struct cVector), 16 );
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

printf( "MS lib %f %f %f\n", vec1->x, vec1->y, vec1->z );
_aligned_free( vec1 );
return( 0 );
}
#endif

Of course we still want to do it by aligning the struct without using malloc.

John
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 13, 2007, 09:23:25 AM
John,

The change in the assembly is because when using malloc we are working with a pointer, so we must dereference the variable before accessing its value. I did this through the ecx register. If we were not accessing the structure through a pointer, there would be no need to dereference it through a register, and dancho's method of 'MOVAPS xmm1, vec1' would be correct. It was my modification of the code, changing it to a pointer, which required the change to the inline assembly code.

It seems you are right about malloc(). From what I've always been familiar with, you aren't supposed to trust malloc's return to be aligned to any specific location but the msvcrt version is set to align on a 16 byte boundary by default. Microsoft's page on malloc doesn't specify this, but they have a (not so easy to find) page which comments on it - http://msdn2.microsoft.com/en-us/library/ycsb6wwf(vs.80).aspx

Apparently, _aligned_malloc() is only for cases where you wish to explicitly specify the alignment of allocated memory yourself. That being said, I think you are wrong about the PellesC version of malloc(). When I build the following code, it crashes.

Code: [Select]
#include <stdio.h>
#include <stdlib.h>

typedef struct { float x, y, z; } cVector;

int main( )
{
cVector* vec1;
vec1 = malloc( sizeof(cVector) );
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
free( vec1 );
return( 0 );
}

But if I change the code to the following, it will run without errors.

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#pragma lib "msvcrt.lib"

typedef struct { float x, y, z; } cVector;

int main( )
{
cVector* vec1;
vec1 = malloc( sizeof(cVector) );
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
free( vec1 );
return( 0 );
}

Changing it to the MS Visual C Runtime version of malloc means that the data is aligned on a 16 byte boundary as specified in the page I linked you to above and as you expressed yourself would happen. But without it, it crashes because malloc does not align on a 16 byte boundary.

I would say about all we can do now is just wait and see if Pelle has a solution regarding the structure alignment. Also, while it's being discussed, I would like to make a feature request for __declspec(align()) to work on variables. Like for example:

Code: [Select]
__declspec(align(16)) float myFloat[4];
I think this would be an extremely useful feature. If it's too much trouble don't worry about it, but it would definitely be nice to have for optimizing C routines by manually ensuring stack alignment.
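
For comparison, and not something the posts show Pelles C offering at the time: C11 later standardised this kind of request as _Alignas (spelled alignas via <stdalign.h>). A minimal sketch assuming a C11 compiler; the function name stack_array_misalignment is made up.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Returns the address of a 16-byte-aligned local array modulo 16.
   With a conforming C11 compiler the result is 0. */
static unsigned stack_array_misalignment(void)
{
    /* Four floats fill exactly one 16-byte XMM register. */
    static_assert(sizeof(float) * 4 == 16, "expected 4-byte floats");

    alignas(16) float myFloat[4] = {0};
    return (unsigned)((uintptr_t)myFloat % 16u);
}
```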

Regards,
Bryant Keller
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 09:32:35 AM
Just tried your code, that PellesC malloc works ok here.

If I use the PellesC malloc the return address is 410550, if I use the MS lib malloc returns 3224B8.

Maybe these functions behave differently on different machines, but that would be hard to believe.

John
Title: Re: Problem with inline assembly and SSE
Post by: dancho on October 13, 2007, 09:54:12 AM
one BIG THX to all,
now i have a working solution for inline assembly...

/OT
and I thought that I know C pretty good but
all this conversation reminds me how much (little) I actually know
OT
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 10:07:55 AM
I'd like to know whether PellesC malloc works for others.

Thanks.

John
Title: Re: Problem with inline assembly and SSE
Post by: TimoVJL on October 13, 2007, 10:20:36 AM
Little test code:

With PellesC crt:
pointer: 12FF84h
align: 4h

With msvcrt.lib
pointer: 12FF48h
align: 8h

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#pragma lib "msvcrt.lib"

typedef struct cVector{ float x, y, z; };

int main(void)
{
struct cVector *vec1 = malloc(sizeof(struct cVector));

printf("pointer: %0Xh\nalign: %0Xh\n", &vec1, (unsigned long)&vec1 % 0x10);
free(vec1);
return 0;
}
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 10:26:09 AM
That should be

printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);

Which results in

With PellesC crt:
pointer: 410550h
align: 0h

With msvcrt.lib
pointer: 322468h
align: 8h

John
Title: Re: Problem with inline assembly and SSE
Post by: TimoVJL on October 13, 2007, 10:37:46 AM
Thanks for the correction.

After that PellesC crt:
pointer: 410148h
align: 8h
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 10:44:59 AM
Ok thanks.

I guess it was just lucky that mine aligned on 16 bytes.

John
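
The varying results above fit what the C standard promises: malloc() returns memory aligned for any standard type, and C11 exposes that guarantee as alignof(max_align_t). Anything stricter, such as 16 bytes on a 32-bit runtime, is down to the allocator. A small sketch assuming a C11 compiler; misalignment and report_malloc_alignment are hypothetical helpers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Remainder of p divided by `alignment`; 0 means p is aligned. */
static size_t misalignment(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return (size_t)((uintptr_t)p % alignment);
}

/* Prints how a few malloc() results sit relative to a 16-byte boundary. */
static void report_malloc_alignment(void)
{
    printf("guaranteed by the runtime: %zu bytes\n", alignof(max_align_t));
    for (int i = 0; i < 4; i++) {
        void *p = malloc(12);   /* sizeof(struct cVector) in this thread */
        if (p == NULL)
            return;
        printf("%p  mod 16 = %zu\n", p, misalignment(p, 16));
        free(p);
    }
}
```

A remainder of 0 in one run proves nothing about the next machine; only the value printed on the first line is a promise.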
Title: Re: Problem with inline assembly and SSE
Post by: TimoVJL on October 13, 2007, 01:16:02 PM
Is this usable code for testing inline assembly and SSE ?
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
//#pragma lib "msvcrt.lib"

typedef struct cVector { float x, y, z; } cVector;

int main(void)
{
    unsigned char *pTmp  = malloc(sizeof(struct cVector) + 16);
    struct cVector *vec1;

    vec1 = (struct cVector*)pTmp;
    printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);
    if ((unsigned long)vec1 % 0x10) {
        vec1 = (struct cVector*)(pTmp + (16 - ((unsigned long)vec1 % 0x10)));
        printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);
    }
    vec1->x = 0.5;
    vec1->y = 1.5;
    vec1->z = -3.141;

    __asm {
        mov ecx, vec1
        movaps xmm1, [ecx]
        mulps xmm1, xmm1
        movaps [ecx], xmm1
    }

    printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
    free(pTmp);
    return 0;
}
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 03:28:59 PM
Looks ok to me.

John
Title: Re: Problem with inline assembly and SSE
Post by: Greg on October 13, 2007, 06:37:59 PM
The MSDN documentation (http://msdn2.microsoft.com/en-us/library/ycsb6wwf(vs.80).aspx) that said malloc is required to return memory aligned on a 16-byte boundary is for x64
 
Title: Re: Problem with inline assembly and SSE
Post by: dancho on October 13, 2007, 07:25:58 PM
heh,
funny thing is that this code ( compiled and linked with VC8EE ) works without crashing... :-\

Code: [Select]
#include <stdio.h>

struct vector{
float x,y,z;
};

int main()
{
struct vector vec1;
vec1.x=1;
vec1.y=2;
vec1.z=3;

__asm{
movaps xmm1,vec1
mulps  xmm1,xmm1
movaps vec1,xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}

no pragma directive or aligned_malloc ... ???
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 13, 2007, 07:49:26 PM
John,

Okay, so I guess for 16 byte alignment we should probably just use the _aligned_malloc() and _aligned_free() versions just to be safe then. 8 byte alignment will on occasion fall on a 16 byte boundary but it's not guaranteed. I've never really trusted the returned alignment of malloc() because it's always been sorta sketchy between computers, as you can see. It's a real pain in the arse to run an application n number of times and have it run fine, with proper alignment, only to find that it's not aligning on someone else's computer.

Greg,

Thanks for pointing that out, I found that url by a google search and didn't notice it was x64 specific.

dancho,

The reason it compiles fine with VC8EE is the same reason that document says that malloc() aligns to a 16 byte boundary: as Greg pointed out, it's an x64 (64 bit) compiler. Most likely, on 32 bit systems, it calls _aligned_malloc() internally to ensure compatibility with other routines.

Quote from: dancho
/OT
and I thought that I know C pretty good but
all this conversation reminds me how much (little) I actually know
OT

Truthfully I'm not all that great in C myself. There are a lot of areas I could stand to research in more detail. I write about 80% of my code in assembly and the other 20% (GUIs and pseudo-code) in C. No matter how good you get, there are always going to be people around that are going to make you feel like you have a lot to learn. That's because people tend to specialize. For example, when I talk to game programming friends I feel like a total novice. But at the same time, when any of them get around me and my system software or os development friends they tend to get lost in the conversation. Best advice is not to think about how little you know, but about how much you've learned. ;)

Regards,
Bryant Keller
Title: Re: Problem with inline assembly and SSE
Post by: Greg on October 13, 2007, 08:52:50 PM
dancho,

I compiled that same code with VC++ 2005 Express Edition and I get an 'Access violation' at the movaps line. I think you just got lucky on the alignment.
 
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 13, 2007, 10:49:11 PM
Quote from: Synfire
John,

Okay, so I guess for 16 byte alignment we should probably just use the _aligned_malloc() and _aligned_free() versions just to be safe then. 8 byte alignment will on occasion fall on a 16 byte boundary but it's not guaranteed. I've never really trusted the returned alignment of malloc() because it's always been sorta sketchy between computers, as you can see. It's a real pain in the arse to run an application n number of times and have it run fine, with proper alignment, only to find that it's not aligning on someone else's computer.

Bryant,

Yes I agree, hopefully Pelle will see this thread and sort something out with respect to some alignment directive though.

I've not seen Pelle here for a while.

Edit:
It just occurred to me that it would not be difficult to write one's own _aligned_malloc() and _aligned_free(). Pass an aligned address to the user, but in the previous 4 bytes store the original malloc-ed address. The _aligned_free() function looks 4 bytes before the user address to find the original malloc-ed address so it can be freed.

Not explained well but I'm sure you get the drift.

John


Title: Re: Problem with inline assembly and SSE
Post by: JohnF on October 14, 2007, 02:47:17 PM
Here is align16_malloc() and align16_free() if anyone wants. I don't think there are any bugs.

Edit: Changed a long to an unsigned long.

Code: [Select]
#include <stdlib.h>

#define ALIGN16 16

// This code assumes that a 'long' has the same width as 'pointers'
void * align16_malloc(size_t size)
{
    char * start, * orig;
    orig = malloc(size + sizeof(char*) + ALIGN16);
    if(!orig)
        return NULL;

    // Move forward sizeof(char*)
    start = orig + sizeof(char*);

    // Then align start on 16 byte boundary
    start = (char*)(((unsigned long)start + ALIGN16 - 1) & ~(ALIGN16 - 1));

    // Get pointer of position before start, sizeof(char*)
    unsigned long * off = (unsigned long*)(start - sizeof(char*));

    // store original address at that position
    *off = (unsigned long)orig;

    return (void*)start;
}

void align16_free(void * mem)
{
    unsigned long orig;
    if(mem){
        // Retrieve value of original malloc address
        orig = *((unsigned long*)((char*)mem - sizeof(char*)));
        free((void*)orig);
    }else
        return;
}

John
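
The comment about 'long' matters on 64-bit Windows, where long stays 32 bits wide while pointers are 64. Here is a variant of the same idea, sketched with uintptr_t and a stored void * so that assumption goes away. The names aligned_malloc_p/aligned_free_p are invented, and alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

void *aligned_malloc_p(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Room for the payload, the saved pointer, and worst-case padding. */
    unsigned char *orig = malloc(size + sizeof(void *) + alignment - 1);
    if (orig == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up. */
    uintptr_t raw = (uintptr_t)(orig + sizeof(void *));
    uintptr_t aligned = (raw + alignment - 1) & ~(uintptr_t)(alignment - 1);
    unsigned char *start = orig + (aligned - (uintptr_t)orig);

    /* Save the malloc() result just below the aligned block. */
    void *saved = orig;
    memcpy(start - sizeof(void *), &saved, sizeof saved);
    return start;
}

void aligned_free_p(void *mem)
{
    if (mem != NULL) {
        void *saved;
        /* Read back the pointer stored by aligned_malloc_p(). */
        memcpy(&saved, (unsigned char *)mem - sizeof(void *), sizeof saved);
        free(saved);
    }
}
```

memcpy is used for the saved pointer so the code stays valid even when the requested alignment is smaller than a pointer.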
Title: Re: Problem with inline assembly and SSE
Post by: Pelle on October 28, 2007, 10:32:05 PM
I will have to look at this some more, but...

1) malloc will return a pointer suitably aligned for standard C types. Since SSE??? types are non-standard, at least for now "you are on your own".

2) I have dropped support for the inline assembler in the 64-bit version, so maybe a good idea to prepare for the future and move to POASM anyway...

3) IIRC, __declspec(align(16)) should work for structures that are *initialized* ...

this will work:
Code: [Select]
__m128 a={0};
this will not work:
Code: [Select]
__m128 a;
since the latter case will end up in the comdat section, where an alignment can't be specified through COFF (AFAIK).
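
Since inline assembly is going away in the 64-bit compiler, one alternative that stays in C is the SSE intrinsics from <xmmintrin.h> (the header that defines __m128). This swaps the thread's __asm block for _mm_load_ps/_mm_mul_ps/_mm_store_ps. It is a sketch that assumes the compiler ships that header; the function name square4 and the HAVE_SSE macro are made up, and a plain C loop is used on targets without SSE.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define HAVE_SSE 1
#include <xmmintrin.h>
#else
#define HAVE_SSE 0
#endif

/* Squares four floats in place; v must be 16-byte aligned. */
static void square4(float *v)
{
    assert((uintptr_t)v % 16 == 0);   /* same requirement as movaps */
#if HAVE_SSE
    __m128 x = _mm_load_ps(v);        /* aligned load    (movaps) */
    x = _mm_mul_ps(x, x);             /* packed multiply (mulps)  */
    _mm_store_ps(v, x);               /* aligned store   (movaps) */
#else
    for (int i = 0; i < 4; i++)       /* scalar fallback */
        v[i] *= v[i];
#endif
}
```

The array passed in holds four floats rather than three, because the load and store always move 16 bytes.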
Title: Re: Problem with inline assembly and SSE
Post by: TimoVJL on October 29, 2007, 05:50:11 PM
That old example needs static keyword with initialization.
Code: [Select]
#include <stdio.h>

typedef __declspec(align(16)) struct { float x, y, z; } cVector;

int main( )
{
static cVector vec1 = {0};
vec1.x=0.5;
vec1.y=1.5;
vec1.z=-3.141;

__asm {
movaps xmm1, vec1
mulps xmm1, xmm1
movaps vec1, xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on October 30, 2007, 01:34:50 AM
Quote from: Pelle
1) malloc will return a pointer suitably aligned for standard C types. Since SSE??? types are non-standard, at least for now "you are on your own".

Thanks for the clarification. For the most part I think we got any/all malloc-ing issues covered with everything done in this thread. ;)

Quote from: Pelle
2) I have dropped support for the inline assembler in the 64-bit version, so maybe a good idea to prepare for the future and move to POASM anyway...

Truth is, that's really the way it should be. I've never really been a fan of inline assembly. It makes your C code less portable and you are working with a dumbed down version of assembler. All around it's best to just write your assembly routines separately, then call them from your C source code. So I fully agree with your decision there.

Quote from: Pelle
3) IIRC, __declspec(align(16)) should work for structures that are *initialized* ...

this will work:
Code: [Select]
__m128 a={0};
this will not work:
Code: [Select]
__m128 a;
since the latter case will end up in the comdat section, where an alignment can't be specified through COFF (AFAIK).

Awesome, Since there is a decent solution - I'd like to repost a feature request from earlier in this thread (in case you overlooked it)...

Quote from: Synfire (Edited CODE Section)
I would like to make a feature request for __declspec(align()) to work on variables. Like for example:

Code: [Select]
__declspec(align(16)) float myFloat[4]={0,0,0,0};

I think this would be an extremely useful feature. If it's too much trouble don't worry about it, but it would definitely be nice to have for optimizing C routines by manually ensuring stack alignment.

As I've said I'm mostly an assembly programmer (primarily NASM, GoASM, and POASM), I only use C for doing minor things (mostly pseudo-code to test if an idea is going to work before I start actually writing the program) so it could be put on the very bottom of your todo list unless others want it also.

Regards,
Bryant Keller
Title: Re: Problem with inline assembly and SSE
Post by: Greg on October 30, 2007, 03:29:53 AM
timovjl,

Quote
That old example needs static keyword with initialization.

Thanks for the tip. It doesn't work without static.
 
Title: Re: Problem with inline assembly and SSE
Post by: Pelle on November 02, 2007, 02:22:02 AM
Perhaps I can do something about __declspec(align()) for variables later. I suspect it will take much more than "5 minutes", so not top priority at the moment...
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on November 02, 2007, 02:38:50 AM
Oh yea, totally understand mate, no hurry. Just thought I would throw it out there while it was on topic.
Title: Re: Problem with inline assembly and SSE
Post by: severach on November 02, 2007, 06:00:08 AM
Quote from: Synfire
I've never really been a fan of inline assembly. It makes your C code less portable and you are working with a dumbed down version of assembler.

I disagree with both. Inline assembly encourages the use of assembly because it is more portable and smarter than a real assembler.

Inline assembly is more portable because you use #ifdef __POCC__ to select assembly or C code depending on the compiler and platform. Inline assembly is smarter because you get to use some C like constructs rather than writing pure TASM or MASM. Anything that makes assembly look more like C makes it better. Anything that makes assembly look less like C makes me less likely to use it, which may be your intent.
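
To make the #ifdef __POCC__ point concrete, here is a rough sketch: the Pelles C branch reuses the inline assembly posted earlier in the thread (with movups, so alignment does not matter), and every other compiler gets plain C. square4_portable is a made-up name, and the _WIN64 test for skipping the 64-bit compiler, which has no inline assembler, is an assumption about Pelles C's predefined macros.

```c
#include <assert.h>
#include <stddef.h>

/* Squares four floats in place. */
void square4_portable(float *v)
{
    assert(v != NULL);
#if defined(__POCC__) && !defined(_WIN64)   /* _WIN64 check is an assumption */
    /* 32-bit Pelles C: inline assembly in the style used in this thread. */
    __asm {
        mov ecx, v
        movups xmm1, [ecx]
        mulps xmm1, xmm1
        movups [ecx], xmm1
    }
#else
    /* Any other compiler or target: plain C fallback. */
    for (int i = 0; i < 4; i++)
        v[i] = v[i] * v[i];
#endif
}
```

Callers see one function either way, which is the portability argument being made; the cost is that both branches have to be kept in step by hand.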

Perhaps you don't like the limited functionality of Pelles inline assembly when compared with Borland or GCC implementations.
Title: Re: Problem with inline assembly and SSE
Post by: JohnF on November 03, 2007, 08:45:11 AM
I agree with Severach in that I'm for inlined asm. Although I mainly code in C, to have the possibility of using inlined asm when needed is a great plus.

John
 
Title: Re: Problem with inline assembly and SSE
Post by: Synfire on November 28, 2007, 04:33:43 AM
Quote from: severach
Inline assembly is more portable because you use #ifdef __POCC__ to select assembly or C code depending on the compiler and platform.

Most modern assemblers support a similar directive for changing based on symbols (so portability between various platforms can be achieved in much the same way). As for switching between C and Assembly, that's what I don't like. I personally prefer to keep such code separated into different files. By doing that, you get to choose your preferred assembler and syntax. You can also use the various extensions, libraries, toolkits, etc. that have been developed for that assembler. For example, you mentioned MASM: an assembly programmer could make use of OA32, which comes with many modules for graphics, UI development, COM/OOP, etc. which could simplify a lot of work. Or for example Paul's GeneSys Library, which has many procedures to simplify development on Win32 platforms built into linkable libraries, plus include files for use with MASM. Notably I personally don't use any of these, being a NASM/PoASM/GoASM user, but they are examples of what I was talking about as far as why a person should work on their assembly code separately from their C code.

Quote from: severach
Inline assembly is smarter because you get to use some C like constructs rather than writing pure TASM or MASM. Anything that makes assembly look more like C makes it better. Anything that makes assembly look less like C makes me less likely to use it, which may be your intent.

True, my source isn't normally in a very "high level format" compared to many people. But I just see it as, if you are going to use C, use C, if you are going to use assembly, use assembly. If you want something in between, go look at some of the pseudo-assemblers like HLA or EASM.

Quote from: severach
Perhaps you don't like the limited functionality of Pelles inline assembly when compared with Borland or GCC implementations.

Never used anything from Borland other than TASM and never really liked it all that much. As for GCC, I don't particularly care for AT&T syntax assembly, I'll use it if I have to but that's about as far as it goes. Even when I get stuck coding with GAS I normally switch over using .intel_syntax first chance I get.

Quote from: JohnF
I agree with Severach in that I'm for inlined asm. Although I mainly code in C, to have the possibility of using inlined asm when needed is a great plus.

John

Emphasis on "a great plus". I can agree with that. There are a few circumstances where inline assembly can be "a great plus", like when the compiler doesn't want to generate the opcodes you are expecting and you can't seem to figure out a way to make it do it. Just use __asm() and insert the opcodes yourself. Or when you want to use the compiler's optimizer, but you want to make sure that a certain set of opcodes is produced at a certain point (I've had that happen two or three times). And another good one is __asm(int 3) to generate a breakpoint; I actually have a macro which displays a message with fprintf to stderr and is followed by an int3 breakpoint. The macro only inserts that code wherever the DPRINTF(...); lines are if __DEBUG__ is defined. So yeah, there are some circumstances where it can be "a great plus", but for overall code design I don't suggest people become reliant on it and I personally consider it to be a bad programming practice.
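
A debug macro along those lines might look like the sketch below. It is a guess at the shape, not the macro described above: DEBUG_BREAK and checked_div are made-up names, __debugbreak() is the MSVC intrinsic, the GCC branch emits int3 directly on x86, and everything else falls back to abort(). A Pelles C build would put its own __asm breakpoint in DEBUG_BREAK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#if defined(_MSC_VER) && !defined(__POCC__)
#include <intrin.h>
#define DEBUG_BREAK() __debugbreak()
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define DEBUG_BREAK() __asm__ volatile ("int3")
#else
#define DEBUG_BREAK() abort()
#endif

#ifdef __DEBUG__
/* Debug builds: print to stderr, then stop in the debugger. */
#define DPRINTF(...) \
    do { fprintf(stderr, __VA_ARGS__); DEBUG_BREAK(); } while (0)
#else
/* Other builds: the macro expands to an empty statement. */
#define DPRINTF(...) do { } while (0)
#endif

/* Example caller: reports a zero divisor in debug builds and returns 0. */
static int checked_div(int a, int b)
{
    if (b == 0) {
        DPRINTF("checked_div: division by zero (a=%d)\n", a);
        return 0;
    }
    return a / b;
}
```

Wrapping the body in do { } while (0) lets DPRINTF(...); behave like a single statement inside an if/else.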

Regards,
Bryant Keller