
Author Topic: Problem with inline assembly and SSE  (Read 25753 times)

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #15 on: October 13, 2007, 10:07:55 am »
I'd like to know whether PellesC malloc works for others.

Thanks.

John

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 1902
Re: Problem with inline assembly and SSE
« Reply #16 on: October 13, 2007, 10:20:36 am »
Little test code:

With PellesC crt:
pointer: 12FF84h
align: 4h

With msvcrt.lib
pointer: 12FF48h
align: 8h

Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#pragma lib "msvcrt.lib"

struct cVector { float x, y, z; };

int main(void)
{
struct cVector *vec1 = malloc(sizeof(struct cVector));

printf("pointer: %0Xh\nalign: %0Xh\n", &vec1, (unsigned long)&vec1 % 0x10);
free(vec1);
return 0;
}
May the source be with you

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #17 on: October 13, 2007, 10:26:09 am »
That should be

printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);

Which results in

With PellesC crt:
pointer: 410550h
align: 0h

With msvcrt.lib
pointer: 322468h
align: 8h

John
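For anyone repeating this test, here is a minimal sketch of the check being discussed. The helper names `misalignment` and `is_aligned16` are hypothetical and not part of any CRT; the point is only that the value tested must be the pointer returned by malloc, not the address of the pointer variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how far an address sits past the previous
   multiple of `align`. */
static size_t misalignment(const void *p, size_t align)
{
    assert(align != 0);
    return (size_t)((uintptr_t)p % align);
}

/* Nonzero when p is usable with aligned SSE moves such as movaps. */
static int is_aligned16(const void *p)
{
    return misalignment(p, 16) == 0;
}
```

A call would look like `printf("pointer: %p\nalign: %u\n", (void *)vec1, (unsigned)misalignment(vec1, 16));`, passing `vec1` itself rather than `&vec1`.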

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 1902
Re: Problem with inline assembly and SSE
« Reply #18 on: October 13, 2007, 10:37:46 am »
Thanks for the correction.

After that, with the PellesC crt:
pointer: 410148h
align: 8h
May the source be with you

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #19 on: October 13, 2007, 10:44:59 am »
Ok thanks.

I guess it was just lucky that mine aligned on 16 bytes.

John

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 1902
Re: Problem with inline assembly and SSE
« Reply #20 on: October 13, 2007, 01:16:02 pm »
Is this usable code for testing inline assembly and SSE?
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
//#pragma lib "msvcrt.lib"

typedef struct cVector { float x, y, z; } cVector;

int main(void)
{
unsigned char *pTmp  = malloc(sizeof(struct cVector) + 16);
struct cVector *vec1;

vec1 = (struct cVector*)pTmp;
printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);
if ((unsigned long)vec1 % 0x10) {
vec1 = (struct cVector*)(pTmp + (16 - ((unsigned long)vec1 % 0x10)));
printf("pointer: %0Xh\nalign: %0Xh\n", vec1, (unsigned long)vec1 % 0x10);
}
vec1->x = 0.5;
vec1->y = 1.5;
vec1->z = -3.141;

__asm {
      mov ecx, vec1
      movaps xmm1, [ecx]
      mulps xmm1, xmm1
      movaps [ecx], xmm1
}

printf( "%f %f %f\n", vec1->x, vec1->y, vec1->z );
free(pTmp);
return 0;
}
May the source be with you
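The adjustment in the listing above (add 16 minus the remainder when the remainder is nonzero) can also be written as a branch-free round-up. This is only a sketch of the arithmetic; `round_up` and `padding_for` are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to the next multiple of align.
   Assumes align is a power of two. */
static uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Bytes skipped at the front of the block to reach the aligned address. */
static uintptr_t padding_for(uintptr_t addr, uintptr_t align)
{
    return round_up(addr, align) - addr;
}
```

The padding is at most align - 1 bytes. Note that movaps transfers 16 bytes while the struct holds 12, so the 16 extra bytes requested in the listing are enough only as long as malloc returns an address that is at least 4-byte aligned.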

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #21 on: October 13, 2007, 03:28:59 pm »
Looks ok to me.

John

Greg

  • Guest
Re: Problem with inline assembly and SSE
« Reply #22 on: October 13, 2007, 06:37:59 pm »
The MSDN documentation that said malloc is required to return memory aligned on a 16-byte boundary is for x64.
 
« Last Edit: October 13, 2007, 06:45:35 pm by Greg »

dancho

  • Guest
Re: Problem with inline assembly and SSE
« Reply #23 on: October 13, 2007, 07:25:58 pm »
heh,
funny thing is that this code ( compiled and linked with VC8EE ) works without crashing... :-\

Code: [Select]
#include <stdio.h>

struct vector{
float x,y,z;
};

int main()
{
struct vector vec1;
vec1.x=1;
vec1.y=2;
vec1.z=3;

__asm{
movaps xmm1,vec1
mulps  xmm1,xmm1
movaps vec1,xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}

no pragma directive or aligned_malloc ... ???

Synfire

  • Guest
Re: Problem with inline assembly and SSE
« Reply #24 on: October 13, 2007, 07:49:26 pm »
John,

Okay, so I guess for 16-byte alignment we should probably just use the _aligned_malloc() and _aligned_free() versions to be safe then. 8-byte alignment will on occasion fall on a 16-byte boundary, but it's not guaranteed. I've never really trusted the returned alignment of malloc() because it's always been sorta sketchy between computers, as you can see. It's a real pain in the arse to run an application n number of times and have it run fine, with proper alignment, only to find that it's not aligning on someone else's computer.

Greg,

Thanks for pointing that out. I found that URL through a Google search and didn't notice it was x64 specific.

dancho,

The reason it compiles fine with VC8EE is the same reason that document says that malloc() aligns to a 16-byte boundary: as Greg pointed out, it's an x64 (64-bit) compiler. Most likely, on 32-bit systems, it calls _aligned_malloc() internally to ensure compatibility with other routines.

/OT
and I thought that I know C pretty good but
all this conversation reminds me how much (little) I actually know
OT

Truthfully, I'm not all that great in C myself. There are a lot of areas I could stand to research in more detail. I write about 80% of my code in assembly and the other 20% (GUIs and pseudo-code) in C. No matter how good you get, there are always going to be people around who make you feel like you have a lot to learn. That's because people tend to specialize. For example, when I talk to game programming friends I feel like a total novice. But at the same time, when any of them get around me and my system software or OS development friends, they tend to get lost in the conversation. Best advice is not to think about how little you know, but about how much you've learned. ;)

Regards,
Bryant Keller
« Last Edit: October 13, 2007, 07:51:48 pm by Synfire »

Greg

  • Guest
Re: Problem with inline assembly and SSE
« Reply #25 on: October 13, 2007, 08:52:50 pm »
dancho,

I compiled that same code with VC++ 2005 Express Edition and I get an 'Access violation' at the movaps line. I think you just got lucky on the alignment.
 

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #26 on: October 13, 2007, 10:49:11 pm »
Bryant,

Yes, I agree. Hopefully Pelle will see this thread and sort something out with respect to an alignment directive, though.

I've not seen Pelle here for a while.

Edit:
It just occurred to me that it would not be difficult to write one's own _aligned_malloc() and _aligned_free(). Pass an aligned address to the user, but store the original malloc-ed address in the 4 bytes just before it. The _aligned_free() function looks 4 bytes before the user address to find the original malloc-ed address so it can be freed.

Not explained well but I'm sure you get the drift.

John


« Last Edit: October 13, 2007, 11:06:40 pm by JohnF »

JohnF

  • Guest
Re: Problem with inline assembly and SSE
« Reply #27 on: October 14, 2007, 02:47:17 pm »
Here are align16_malloc() and align16_free() if anyone wants them. I don't think there are any bugs.

Edit: Changed a long to an unsigned long.

Code: [Select]
#include <stdlib.h>

#define ALIGN16 16

// This code assumes that a 'long' has the same width as 'pointers'
void * align16_malloc(size_t size)
{
char * start, * orig;
orig = malloc(size + sizeof(char*) + ALIGN16);
if(!orig)
return NULL;

// Move forward sizeof(char*)
start = orig + sizeof(char*);

// Then align start on 16 byte boundary
start = (char*)(((unsigned long)start + ALIGN16 - 1) & ~(ALIGN16 - 1));

// Get pointer of position before start, sizeof(char*)
unsigned long * off = (unsigned long*)(start - sizeof(char*));

// store original address at that position
*off = (unsigned long)orig;

return (void*)start;
}

void align16_free(void * mem)
{
unsigned long orig;
if(mem){
// Retrieve value of original malloc address
orig = *((unsigned long*)((char*)mem - sizeof(char*)));
free((void*)orig);
}else
return;
}

John
« Last Edit: October 14, 2007, 04:07:24 pm by JohnF »
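
The listing above relies on 'long' being as wide as a pointer, which does not hold on every target (64-bit Windows is one example). Below is a sketch of the same idea that drops that assumption by using uintptr_t from <stdint.h> and copying the saved pointer with memcpy. The names `aligned16_alloc` and `aligned16_release` are hypothetical, and this is a variant for illustration, not a replacement endorsed by anyone in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN16 ((uintptr_t)16)

/* Hypothetical name; returns a 16-byte aligned block or NULL. */
static void *aligned16_alloc(size_t size)
{
    /* Room for the saved pointer plus up to 15 bytes of padding. */
    size_t extra = sizeof(void *) + 15;
    if (size > SIZE_MAX - extra)
        return NULL;                      /* request would overflow */

    unsigned char *orig = malloc(size + extra);
    if (!orig)
        return NULL;

    /* Skip the slot for the saved pointer, then round up to 16. */
    uintptr_t addr = ((uintptr_t)orig + sizeof(void *) + ALIGN16 - 1)
                     & ~(ALIGN16 - 1);
    unsigned char *start = orig + (addr - (uintptr_t)orig);
    assert(((uintptr_t)start & (ALIGN16 - 1)) == 0);

    /* Store malloc's pointer just below the aligned address. */
    void *raw = orig;
    memcpy(start - sizeof raw, &raw, sizeof raw);
    return start;
}

static void aligned16_release(void *mem)
{
    if (mem) {
        void *raw;
        /* Read back the pointer that malloc originally returned. */
        memcpy(&raw, (unsigned char *)mem - sizeof raw, sizeof raw);
        free(raw);
    }
}
```

Blocks obtained this way must be released with the matching release function, never with free() directly, because the address handed to the caller is not the one malloc returned.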

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2084
    • http://www.smorgasbordet.com
Re: Problem with inline assembly and SSE
« Reply #28 on: October 28, 2007, 10:32:05 pm »
I will have to look at this some more, but...

1) malloc will return a pointer suitably aligned for standard C types. Since SSE??? types are non-standard, at least for now "you are on your own".

2) I have dropped support for the inline assembler in the 64-bit version, so it may be a good idea to prepare for the future and move to POASM anyway...

3) IIRC, __declspec(align(16)) should work for structures that are *initialized* ...

this will work:
Code: [Select]
__m128 a={0};
this will not work:
Code: [Select]
__m128 a;
since the latter case will end up in the comdat section, where an alignment can't be specified through COFF (AFAIK).
/Pelle
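
On compilers that implement C11, the standard alignment specifier is a possible alternative to __declspec(align(16)). Whether a particular Pelles C release accepts it, and whether it avoids the uninitialized-variable issue described above, is an assumption to check against that release's documentation. The sketch below uses a hypothetical `vec4` type padded to four floats, since movaps moves 16 bytes at a time, and plain C arithmetic instead of inline assembly.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical type: aligning the first member to 16 bytes makes the
   whole struct 16-byte aligned; w pads the size to 16 bytes. */
typedef struct vec4 {
    alignas(16) float x;
    float y, z, w;
} vec4;

/* Squares each component in place, like mulps xmm1, xmm1 would. */
static void vec4_square(vec4 *v)
{
    /* The address must honor the declared alignment. */
    assert(((uintptr_t)v % alignof(vec4)) == 0);
    v->x *= v->x;
    v->y *= v->y;
    v->z *= v->z;
    v->w *= v->w;
}
```

With this declaration, both `static vec4 a;` and a local `vec4 b;` are required by the language to be 16-byte aligned, so the address could be handed to an aligned SSE load.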

Offline TimoVJL

  • Global Moderator
  • Member
  • *****
  • Posts: 1902
Re: Problem with inline assembly and SSE
« Reply #29 on: October 29, 2007, 05:50:11 pm »
That old example needs the static keyword together with initialization.
Code: [Select]
#include <stdio.h>

typedef __declspec(align(16)) struct { float x, y, z; } cVector;

int main( )
{
static cVector vec1 = {0};
vec1.x=0.5;
vec1.y=1.5;
vec1.z=-3.141;

__asm {
movaps xmm1, vec1
mulps xmm1, xmm1
movaps vec1, xmm1
}

printf("%f %f %f\n",vec1.x,vec1.y,vec1.z);

return 0;
}
May the source be with you