Pelles C forum

Pelles C => Bug reports => Topic started by: jonathan on January 31, 2019, 11:50:57 pm

Title: Trouble with _Alignas
Post by: jonathan on January 31, 2019, 11:50:57 pm
I need arrays and variables aligned on a 32-byte boundary for the AVX packed instructions. I use the following when I declare my variables:

float _Alignas(32) ionea[8]={0,0,0,0,0,0,0,0};
float _Alignas(32) qonea[8]={0,0,0,0,0,0,0,0};

It does not work reliably. Sometimes they are aligned, other times they are not. What's going on?
Title: Re: Trouble with _Alignas
Post by: TimoVJL on February 01, 2019, 09:12:44 am
A simple test:
Code: [Select]
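#include <stdio.h>
#include <stdint.h>

int __cdecl main(void)
{
    // float _Alignas(32) ionea[8]={0,0,0,0,0,0,0,0};
    // float _Alignas(32) qonea[8]={0,0,0,0,0,0,0,0};
    float __declspec(align(32)) ionea[8]={0,0,0,0,0,0,0,0};
    float __declspec(align(32)) qonea[8]={0,0,0,0,0,0,0,0};
    printf("%p %p\n", (void *)ionea, (void *)qonea);
    // Low five address bits; both are 0 when the arrays are 32-byte aligned.
    printf("%u %u\n", (unsigned)((uintptr_t)ionea & 31), (unsigned)((uintptr_t)qonea & 31));
    return 0;
}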
Code: [Select]
test_align.c(2): warning #2224: Alignment of 'stack' exceeds 16 bytes; ignored.
pocc
Code: [Select]
00000000  4883EC68                 sub rsp, 68h
00000004  0F57C0                   xorps xmm0, xmm0
00000007  F30F11442440             movss dword ptr [rsp+40h], xmm0
Code: [Select]
00000000  4055                     push rbp
00000002  4881EC80000000           sub rsp, 80h
00000009  488D6C2440               lea rbp, [rsp+40h]
0000000E  4883E5E0                 and rbp, -20h
00000012  0F57C0                   xorps xmm0, xmm0
00000015  F30F114520               movss dword ptr [rbp+20h], xmm0
Code: [Select]
00000000  55                       push rbp
00000001  4883EC70                 sub rsp, 70h
00000005  488D6C2470               lea rbp, [rsp+70h]
0000000A  4883E4E0                 and rsp, -20h
0000000E  0F57C0                   xorps xmm0, xmm0
00000011  0F29442450               movaps xmmword ptr [rsp+50h], xmm0
EDIT: fix missing float
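A note for reading the listings: the `and rbp, -20h` and `and rsp, -20h` instructions in the second and third listings clear the low five bits of the register, which rounds the address down to a multiple of 32. The first listing contains no such mask. The same arithmetic can be sketched in C; the helper name `align_down_32` is made up for illustration and does not come from any of the compilers shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the low five bits of an address,
   which is what `and reg, -20h` does to the register. */
static uintptr_t align_down_32(uintptr_t addr)
{
    return addr & ~(uintptr_t)31;
}

/* Small self-check of the arithmetic. */
static void check_align_down_32(void)
{
    assert(align_down_32(0x7F) == 0x60);        /* rounds down to a multiple of 32 */
    assert(align_down_32(0x40) == 0x40);        /* already aligned: unchanged */
    assert((uintptr_t)-0x20 == ~(uintptr_t)31); /* -20h and ~31 are the same mask */
}
```

Because the mask only rounds down, the prologue has to reserve extra stack space first, which is why those two listings subtract a larger amount from rsp than the first one.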
Title: Re: Trouble with _Alignas
Post by: Pelle on February 03, 2019, 07:50:06 pm
No support for over-aligned stack on X64 with Pelles C (perhaps in the future, but probably not). 

It's often possible to skip local stack variables like these and load a SIMD register directly by using one of the "initialize" intrinsics, like _mm_set_ps(), _mm256_set_ps(), _mm256_set1_ps(), etc.

Code: [Select]
__m256 whatever1 = _mm256_set_ps(0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f);
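If a local array really is needed, one possible workaround (not suggested in the thread, just a sketch) is to over-allocate the array and pick the first 32-byte-aligned element inside it by hand. The names `align_up_32` and `demo_misalignment` are hypothetical, and the code assumes `float` is 4 bytes in size and alignment, so at most 7 spare elements are ever skipped.

Code: [Select]
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AVX_ALIGN 32u  /* alignment wanted for 256-bit aligned loads/stores */

/* Hypothetical helper: return the first float at or after p whose address
   is a multiple of 32. p must point into a float array that has at least
   7 spare elements after the part you intend to use. */
static float *align_up_32(float *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + (AVX_ALIGN - 1u)) & ~(uintptr_t)(AVX_ALIGN - 1u);
    return p + (aligned - addr) / sizeof(float);
}

/* Hypothetical usage: 8 usable floats plus 8 floats of slack on the stack.
   Returns the low five address bits of the aligned pointer (0 when aligned). */
static unsigned demo_misalignment(void)
{
    float raw[8 + 8];
    float *ionea = align_up_32(raw);

    assert(ionea >= raw && ionea + 8 <= raw + 16); /* window stays inside raw */
    for (size_t i = 0; i < 8; i++)
        ionea[i] = 0.0f;

    return (unsigned)((uintptr_t)ionea & (AVX_ALIGN - 1u));
}
```

This costs up to 28 bytes of wasted stack per array and does not depend on the compiler honouring `_Alignas(32)` for locals. Whether it is preferable to loading the register directly with the set intrinsics depends on how the data is used afterwards.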