This is an alignment problem with the setjmp() buffer.
EDIT:
In my opinion this is a compiler bug related to __declspec(align(16)).
After some more checks I concluded that the situation is somewhat more complicated. __declspec(align(16)) seems to work on static memory. I can't say for automatic memory, because the stack is always aligned on a 16-byte boundary anyway. With dynamic allocation, on the other hand, the compiler cannot know the real address the block will have after allocation. The strategy chosen here seems to be: assume that the memory block is aligned on a boundary suitable for any object (as the C standard requires, C17 section 7.22.3, paragraph 1), then place the aligned object, the jmp_buf in our case, at an offset from the start of the block that is a multiple of its alignment, adding padding as needed.
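To make this strategy concrete, here is a minimal C11 sketch. DemoState and member_is_aligned are hypothetical names, not taken from tigr or PellesC. The member's offset is fixed at compile time, so whether the member actually ends up 16-byte aligned depends only on whether the base address of the block is.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical stand-in for State: some leading members followed by a
   16-byte-aligned buffer, like the jmp_buf member. */
typedef struct {
    unsigned char *in;
    unsigned char *inend;
    _Alignas(16) unsigned long long jmp_like[4];
} DemoState;

/* The offset is a compile-time multiple of 16, computed assuming the
   structure itself starts on a 16-byte boundary. */
static_assert(offsetof(DemoState, jmp_like) % 16 == 0,
              "member offset is a multiple of its alignment");

/* Nonzero if the member would be 16-byte aligned when the structure
   starts at address 'base'. */
static int member_is_aligned(uintptr_t base)
{
    uintptr_t member = base + offsetof(DemoState, jmp_like);
    return member % 16 == 0;
}
```

With a base of 0x1000 the member is aligned; with a base of 0x1008 (an 8-byte-aligned block) the same fixed offset leaves it misaligned by 8.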
The point is that PellesC, unlike MSVC, returns malloc()ed blocks aligned only on 8-byte boundaries (to be honest, the address always ends in '8'). See this older topic.
So even if this strategy matches MSVC as far as alignment is concerned, it is wrong for PellesC's allocator.
I can't say exactly where the bug is, but something strange is certainly happening...
In any case, whenever a jmp_buf, or any other special object that requires alignment greater than 8 bytes (e.g. 16 bytes), is placed inside a dynamically allocated structure, it will fail in PellesC but work with other compilers (e.g. MSVC, GCC, LLVM).
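To check what your own compiler's malloc() returns, a small probe like the one below can help. It is a sketch with hypothetical helper names, and it only reports the alignment it happens to observe, which is an empirical hint rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PROBE_COUNT 32

/* Largest power of two (capped at 64) that divides the address of p. */
static unsigned address_alignment(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    unsigned align = 1;
    while (align < 64 && a % (align * 2u) == 0)
        align *= 2;
    return align;
}

/* Allocate 'count' blocks of 'size' bytes, keeping them all alive so the
   allocator hands out distinct addresses, and return the smallest
   alignment seen (0 if any allocation fails). */
static unsigned min_malloc_alignment(size_t size, int count)
{
    void *blocks[PROBE_COUNT];
    unsigned min_align = 64;
    int n = 0;

    assert(count > 0 && count <= PROBE_COUNT);
    for (; n < count; n++) {
        blocks[n] = malloc(size);
        if (blocks[n] == NULL)
            break;
        unsigned a = address_alignment(blocks[n]);
        if (a < min_align)
            min_align = a;
    }
    int ok = (n == count);
    while (n > 0)
        free(blocks[--n]);   /* release everything we allocated */
    return ok ? min_align : 0;
}
```

On a 64-bit build, a result of 8 from min_malloc_alignment() means a structure containing a 16-byte-aligned jmp_buf cannot safely live in a malloc()ed block.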
The following is the definition of jmp_buf in setjmp.h:
typedef struct __declspec(align(16)) {
    unsigned long long data[2];
} jmp_buf[16];
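For comparison, the same layout can be expressed with standard C11 _Alignas instead of the compiler-specific __declspec(align(16)). This is a sketch; my_jmp_chunk and my_jmp_buf are hypothetical names chosen so they don't clash with the real jmp_buf.

```c
#include <assert.h>   /* static_assert */

/* Each element: two 64-bit words, 16 bytes, on a 16-byte boundary. */
typedef struct {
    _Alignas(16) unsigned long long data[2];
} my_jmp_chunk;

/* 16 such elements: 256 bytes, and the whole array is 16-byte aligned. */
typedef my_jmp_chunk my_jmp_buf[16];

static_assert(_Alignof(my_jmp_chunk) == 16, "chunk must be 16-byte aligned");
static_assert(sizeof(my_jmp_buf) == 256, "16 chunks of 16 bytes");
```

Because the array's alignment is 16, any structure that embeds it inherits a 16-byte alignment requirement.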
For some reason calloc() always returns an address aligned only on an 8-byte boundary, and the compiler then pads the structure s so that the jump buffer lies not on a 16-byte boundary but simply 16 bytes further up in memory.
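As a worked example, take a hypothetical block address of 0x1008 (ending in 8, as described above) and a fixed offset of 0x10 for the jump buffer. The offset preserves the misalignment of the base, whereas a 16-byte-aligned base such as 0x1010 would give an aligned buffer:

$$
(\texttt{0x1008} + \texttt{0x10}) \bmod 16 = \texttt{0x1018} \bmod 16 = 8,
\qquad
(\texttt{0x1010} + \texttt{0x10}) \bmod 16 = \texttt{0x1020} \bmod 16 = 0
$$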
Look at the code:
State* s = (State*)calloc(1, sizeof(State));
...
if (setjmp(s->jmp) == 1) { // The program crashes here when compiled for 64 bits.
    free(s);
    return 0;
}
Now look at the attached image: it shows that calloc() returns an address aligned on an 8-byte boundary, and that the offset added to R14, which holds the address of structure s, is simply 16 bytes (0x10). It should have been:
LEA RCX, [R14+0x10-1]
AND RCX, 0xFFFFFFFFFFFFFFF0
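The two instructions above are the usual round-up-to-a-power-of-two idiom. In C it can be written as below (align_up is a hypothetical helper name): adding align-1 and then clearing the low bits gives the next multiple of align, or the address itself if it is already aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'align' (a power of two).
   C form of the LEA/AND pair: add align-1, then mask with ~(align-1). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (addr + align - 1) & ~(align - 1);
}
```

For example, align_up(0x1008, 16) gives 0x1010, while an already aligned 0x1010 is returned unchanged.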
When setjmp() tries to save the XMM registers (XMM6-XMM15) at this misaligned address, the CPU raises an alignment exception.
Try allocating the structure s on the stack instead: there it is aligned on a 16-byte boundary, and the program works (obviously you have to remove the free()).
In 32-bit mode the problem does not show up: either setjmp() saves no registers that require 16-byte alignment, or calloc() returns memory aligned on a 16-byte boundary.
As a workaround, you have to use memory aligned on a 16-byte boundary. You can use _mm_malloc() and _mm_free() to allocate it correctly.
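If _mm_malloc() is not available, a similar effect can be obtained with plain malloc() by over-allocating and rounding the pointer up. (MSVC also offers _aligned_malloc()/_aligned_free(), and C11 defines aligned_alloc() where the C library supports it.) The helper names below are hypothetical; this is a minimal sketch, not a drop-in replacement for any library function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate 'size' bytes aligned on 'align'
   (a power of two >= sizeof(void *)) using only malloc(). The original
   malloc() pointer is stored just before the aligned block. */
static void *aligned_malloc_portable(size_t size, size_t align)
{
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Leave room for the saved pointer, then round up to 'align'. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;   /* remember what to free later */
    return (void *)aligned;
}

/* Hypothetical helper: release a block from aligned_malloc_portable(). */
static void aligned_free_portable(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

Memory obtained this way must be released with aligned_free_portable(), never with free() directly, exactly as _mm_malloc() memory must go to _mm_free().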
Specifically, replace the function tigrInflate() with the following code:
// _mm_malloc()/_mm_free(): the declaring header depends on the compiler
// (e.g. <malloc.h> with MSVC, <xmmintrin.h> with GCC).
int tigrInflate(void* out, unsigned outlen, const void* in, unsigned inlen) {
    int last;
    // State* s = (State*)calloc(1, sizeof(State));
    State* s = (State*)_mm_malloc(sizeof(State), 16); // Allocate memory on a 16-byte boundary
    if (s == NULL)
        return 0;                    // Allocation failed
    memset(s, 0, sizeof(State));     // Zero the memory, as calloc() did
    // We assume we can buffer 2 extra bytes from off the end of 'in'.
    s->in = (unsigned char*)in;
    s->inend = s->in + inlen + 2;
    s->out = (unsigned char*)out;
    s->outend = s->out + outlen;
    s->bits = 0;
    s->count = 0;
    bits(s, 0);
    if (setjmp(s->jmp) == 1) {
        //free(s);
        _mm_free(s);
        return 0;
    }
    do {
        last = bits(s, 1);
        switch (bits(s, 2)) {
        case 0:
            stored(s);
            break;
        case 1:
            fixed(s);
            block(s);
            break;
        case 2:
            dynamic(s);
            block(s);
            break;
        case 3:
            FAIL();
        }
    } while (!last);
    //free(s);
    _mm_free(s);
    return 1;
}