Optimiser code generation bug?

sal55 · September 25, 2014, 01:04:27 PM

This is with Pelles C version 8, release candidate 6, 32-bit compiler, with optimisation on (/Ot), running in Windows 7 64-bits.

Take the following code:

typedef struct _varrec {
int dummy1, dummy2;
struct _varrec* ptr;
int dummy3;
} varrec;

int main (void) {
varrec u={0},v={0};
varrec *p;

u.ptr=&v;
p=&u;
*p = *(p->ptr);

}

This copies the 16-byte struct from v to u. As far as I know, this is legal, fully defined C. But it crashes when running the optimised code (which inlines the block copy as a set of four moves).

I think it is to with repeatedly re-evaluating the p->ptr expression, even when it is overwritten by the contents of v. (So with the last of the four 32-bit copies, it writes to location 0).

This is part of a disassembly of the code:

0040531A 8B4608 mov eax,[esi+8]
0040531D 8B00 mov eax,[eax]
0040531F 8907 mov [edi],eax
00405321 8B4608 mov eax,[esi+8]
00405324 8B4004 mov eax,[eax+4]
00405327 894704 mov [edi+4],eax
0040532A 8B4608 mov eax,[esi+8]
0040532D 8B4008 mov eax,[eax+8]
00405330 894708 mov [edi+8],eax
00405333 8B4608 mov eax,[esi+8]
00405336 8B400C mov eax,[eax+0Ch]
00405339 89470C mov [edi+0Ch],eax

It seem to be the repeated [esi+8] that is the culprit. (Perhaps needs a lea esi,[esi+8] at the top, then it uses [esi].)

czerny · September 25, 2014, 03:48:35 PM

p->ptr points to v
v.ptr is NULL
after copying the first 3 structur elements (including the ptr element) p->ptr points to NULL
and the last element (dummy3) can not be copied.

sal55 · September 25, 2014, 06:25:58 PM

Quote from: czerny on September 25, 2014, 03:48:35 PM
p->ptr points to v
v.ptr is NULL
after copying the first 3 structur elements (including the ptr element) p->ptr points to NULL
and the last element (dummy3) can not be copied.

Yes, that seems to be the problem when you examine what is happening in detail

But I don't know if you're agreeing with me that it is a bug or not. The assignment should be an atomic operation as far as C is concerned with p->ptr being evaluated only once. (And the code works as expected with half-a-dozen other compilers, including Pelles C in non-optimised mode.)

neo313 · September 25, 2014, 07:48:21 PM

That should be valid c in my opinion.

It is a bug.

The strange thing is the code tries to dereference pointer v.ptr, which is NULL of course, and then copy from that pointer on.
If you change the pointer to something else the code copies from that pointer on, like in my example, where the assignment *p = *( p->ptr ) ; starts to copy from v and then switches to t:

Code Select


#include <stdio.h>
#include <stdlib.h>

typedef struct _varrec 
{
	int dummy1 ;
	int dummy2 ;
	struct _varrec* ptr ;
	int dummy3 ;

} varrec;

int main( void ) 
{
	varrec t = { 444 , 555 , NULL , 666 } ;

	varrec u = { 0 } ;
	varrec v = { 123 , 456 , &t , 789 } ;
	
	varrec *p;

	u.ptr = &v ;

	p = &u ;

	*p = *( p->ptr ) ;

	printf("%d" , u.dummy3 ) ;	//What the fuck

return 0 ;
}

If you disable optimization the bug goes away, and printf prints 789.

To solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

( Sorry, I had to change the style. )

sal55 · September 25, 2014, 10:37:34 PM

Quote from: neo313 on September 25, 2014, 07:48:21 PM
That should be valid c in my opinion.

It is a bug.

The strange thing is the code tries to dereference pointer v.ptr, which is NULL of course, and then copy from that pointer on.
If you change the pointer to something else the code copies from that pointer on, like in my example, where the assignment *p = *( p->ptr ) ; starts to copy from v and then switches to t:

Code Select Expand
printf("%d" , u.dummy3 ) ; //What the fuck

Yeah, the bug could have been harder to find than it was..

Quote
If you disable optimization the bug goes away, and printf prints 789.

The trouble is I've been quite keen to find out how Pelles C performs on my interpreter project, compared with other compilers. V7 had an internal error using /Ot, and V8 has this! (The 64-bit version did quite well, suggesting that 32-bit code would be better, if it follows the pattern of the other compilers, ie. 64-bits generally makes things slower.)

Quote
To solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

I think that any use of memcpy would itself be optimised to inline code! (Also my project spends a lot of time pushing 16-byte structs around, so this operation is crucial.)

The code could possibly be changed, but the actual code where it went wrong looked like this:

*dest = *(p->vptr);

It's not obvious that dest and p point to the same struct! There might be quite a few such expressions, but I will have a go at changing them just to see it working.

neo313 · September 26, 2014, 07:17:05 AM

My mistake:

QuoteTo solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

None of these help. You have to use a temporary struct variable. Which adds about 75% more instructions to the entire copy.

Or define pragma function( memcpy ) to disable memcpy optimization which also removes the bug, but is slower.

update:

If you write a separate function, and inline it, you get correct code and the copy is as fast as assignment( or a bit faster ). Looks like the bug is limited to assignment only. I think that may be best for you.
Also enable auto inline in options.

Code Select


static inline void varrec_cpy( varrec* const a , const varrec* const b )
{
	*a = *b ;
}

varrec_cpy( p , p->ptr ) ;

const qualifiers don't matter but I included them for correctness.

frankie · September 26, 2014, 10:59:50 AM

In your case there are at least 2 relevant points, the first falls in the category of 'side-effects' when copying overlapped areas. This is generally a programmer error to copy two memory areas that overlaps, and in our case the pointer to the structure to copy is inside the memory that will be overwrited. These issues are compiler dependent and not covered by the standard.
The other issue is the optimizer behaviour that re-loads the pointer after it has been overwritten, while it seems logical that reloading pointers is not an optimization, instruction timings could make it convenient.
In conclusion it is not so clear that this is a bug.

neo313 · September 26, 2014, 11:29:25 AM

Quote from: frankie on September 26, 2014, 10:59:50 AM
In your case there are at least 2 relevant points, the first falls in the category of 'side-effects' when copying overlapped areas.

It is clearly a bug.

No memory overlaps in the example. p and p->ptr point to non-overlapping memory.

The code loads value from both operands in unspecified order, and then updates the left value.

6.5.16, p3:
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.

Here is an even more simplified version

Code Select


int main( void ) 
{
	varrec t = { 444 , 555 , NULL , 666 } ;

	varrec u = { 0 } ;
	varrec v = { 123 , 456 , &t , 789 } ;
	
	u.ptr = &v ;

	u = *( u.ptr ) ;

	printf("%d" , u.dummy3 ) ;	//What the fuck

return 0 ;
}

I see no reason for u = *( u.ptr ) ; to be undefined behaviour.

frankie · September 26, 2014, 12:18:08 PM

I see.
The problem once again come from the optimizer...

sal55 · September 27, 2014, 12:12:11 AM

Quote from: sal55 on September 25, 2014, 10:37:34 PM
The trouble is I've been quite keen to find out how Pelles C performs on my interpreter project, compared with other compilers.

I worked around this problem using an intermediate variable, as suggested. Only in half a dozen places, but I had to find them by locating where it crashed. (Obviously the bug is still there and ought to be fixed at some point.)

But the compiler now works pretty well on my application, about third behind gcc and Clang (with two or three slower ones). (gcc and Clang, however, also support label pointers, which in my program give additional speed.)

The only remaining mystery is why, when generating 64-bit code, it (the compiler, runtime, OS or whatever it is), is sometimes locating my functions at addresses above 4GB (the program code is 0.0004Gb in size).

frankie · September 27, 2014, 09:18:33 PM

To speed up modifications I would sugget to use:

Code Select

#pragma optimize(none)    //Disable optimization from here on
int myfunc(...)
{
}
#pragma optimize()    //Restore optimization as set

In this way you can disable the optimization for a single function inside a module.

Going back to the bug definition I'm still not sure that this is a bug. It is true that you define a new pointer to an object, but you use its contents to copy inside the object de-facto copying an overlapped memory. If the compiler would or would'nt reload the pointer is a compiler specific behaviour not a rule. The line:

Code Select

 *p = *(p->ptr);

Clearly indicates that you are modifying a whole object using one of its contents. The assumption that the compiler don't have to reload the pointer is just an arbitrary assumption. In fact using an intermediate pointer solves the problem, a brain damaged compiler would optimize also the use of intermediarte pointer and would have drive to the same problem.
Moreover as far as I can see the standard doesn't define structure assignements as atomic operations.

neo313 · September 28, 2014, 01:24:45 AM

Quotede-facto copying an overlapped memory.

v is copied to u, there is no memory overlap.

QuoteThe assumption that the compiler don't have to reload the pointer is just an arbitrary assumption.

There is no assumption. Standard is clear on this:

6.5.16, p3:
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.

QuoteMoreover as far as I can see the standard doesn't define structure assignements as atomic operations.

This is completely irrelevant, since there is no concurrency issues. Operators are computed before the assignment:

is sequenced after the value computations of the left and right operands.

5.1.2.3, p3:
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations
executed by a single thread, which induces a partial order among those evaluations.
Given any two evaluations A and B, if A is sequenced before B, then the execution of A
shall precede the execution of B. (Conversely, if A is sequenced before B, then B is
sequenced after A.) If A is not sequenced before or after B, then A and B are
unsequenced. Evaluations A and B are indeterminately sequenced when A is sequenced
either before or after B, but it is unspecified which.13)The presence of a sequence point
between the evaluation of expressions A and B implies that every value computation and
side effect associated with A is sequenced before every value computation and side effect
associated with B.

13) The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations
cannot interleave, but can be executed in any order.

In our case A is the computation of the operands and B is assignment.
All of this means that our example is correct as far as standard is concerned.

frankie · September 28, 2014, 09:28:38 PM

So when coping a record object the compiler to avoid any conflict must create a temporary object to copy on than move it atomically on the destination?
Looking at the machine, not the parts of the standard that can be honestly interpreted in different ways, the transfer of blocks, not atomic types, requires different techniques normally referred as fencing. Now, on real hardware, if you crreate a fence to make atomic a block operation what should happen when the code try to access memory because you use a property of that object to copy it over?

sal55 · September 29, 2014, 12:41:12 AM

Quote from: frankie on September 28, 2014, 09:28:38 PM
So when coping a record object the compiler to avoid any conflict must create a temporary object to copy on than move it atomically on the destination?

No, that is not necessary when copying blocks that don't overlap. It is however necessary to establish the source and destination addresses before copying; if either is inside the destination block, it must be read out first (obviously, otherwise you don't know where to read to or write from).

The mistake here, is after the source address which is in the destination has been read, to read again after part of the copy had been done. The optimiser made the error of not evaluating the source address once and for all and relying on re-reading (which is hardly efficient anyway if you're trying to optimise).

Here's another example not involving blocks:

int data[]={0,0,0,256,0,0};
int A=3;

A=data[A];

A takes on the value in data[3], ie. 256. But it might be that this assignment is done a byte at a time (I've seen in happen when the compiler thinks objects are misaligned). But this will go wrong if copied a byte at a time and the index A is reused each time.

News:

Optimiser code generation bug?

sal55

czerny

sal55

neo313

sal55

neo313

frankie

neo313

frankie

sal55

frankie

neo313

frankie

sal55