NO

Author Topic: Optimiser code generation bug?  (Read 5507 times)

sal55

  • Guest
Optimiser code generation bug?
« on: September 25, 2014, 01:04:27 PM »
This is with Pelles C version 8, release candidate 6, 32-bit compiler, with optimisation on (/Ot), running in Windows 7 64-bits.

Take the following code:

typedef struct _varrec {
 int dummy1, dummy2;
 struct _varrec* ptr;
 int dummy3;
} varrec;

int main (void) {
varrec u={0},v={0};
varrec *p;

 u.ptr=&v;
 p=&u;
 *p = *(p->ptr);

}

This copies the 16-byte struct from v to u. As far as I know, this is legal, fully defined C. But it crashes when running the optimised code (which inlines the block copy as a set of four moves).

I think it is to with repeatedly re-evaluating the p->ptr expression, even when it is overwritten by the contents of v. (So with the last of the four 32-bit copies, it writes to location 0).

This is part of a disassembly of the code:

0040531A 8B4608                 mov     eax,[esi+8]
0040531D 8B00                   mov     eax,[eax]
0040531F 8907                   mov     [edi],eax
00405321 8B4608                 mov     eax,[esi+8]
00405324 8B4004                 mov     eax,[eax+4]
00405327 894704                 mov     [edi+4],eax
0040532A 8B4608                 mov     eax,[esi+8]
0040532D 8B4008                 mov     eax,[eax+8]
00405330 894708                 mov     [edi+8],eax
00405333 8B4608                 mov     eax,[esi+8]
00405336 8B400C                 mov     eax,[eax+0Ch]
00405339 89470C                 mov     [edi+0Ch],eax

It seem to be the repeated [esi+8] that is the culprit. (Perhaps needs a lea esi,[esi+8] at the top, then it uses [esi].)

czerny

  • Guest
Re: Optimiser code generation bug?
« Reply #1 on: September 25, 2014, 03:48:35 PM »
p->ptr points to v
v.ptr is NULL
after copying the first 3 structur elements (including the ptr element) p->ptr points to NULL
and the last element (dummy3) can not be copied.

sal55

  • Guest
Re: Optimiser code generation bug?
« Reply #2 on: September 25, 2014, 06:25:58 PM »
p->ptr points to v
v.ptr is NULL
after copying the first 3 structur elements (including the ptr element) p->ptr points to NULL
and the last element (dummy3) can not be copied.

Yes, that seems to be the problem when you examine what is happening in detail

But I don't know if you're agreeing with me that it is a bug or not. The assignment should be an atomic operation as far as C is concerned with p->ptr being evaluated only once. (And the code works as expected with half-a-dozen other compilers, including Pelles C in non-optimised mode.)

neo313

  • Guest
Re: Optimiser code generation bug?
« Reply #3 on: September 25, 2014, 07:48:21 PM »
That should be valid c in my opinion.

It is a bug.

The strange thing is the code tries to dereference pointer v.ptr, which is NULL of course, and then copy from that pointer on.
If you change the pointer to something else the code copies from that pointer on, like in my example, where the assignment *p = *( p->ptr ) ; starts  to copy from v and then switches to t:

Code: [Select]
#include <stdio.h>
#include <stdlib.h>

typedef struct _varrec
{
int dummy1 ;
int dummy2 ;
struct _varrec* ptr ;
int dummy3 ;

} varrec;

int main( void )
{
varrec t = { 444 , 555 , NULL , 666 } ;

varrec u = { 0 } ;
varrec v = { 123 , 456 , &t , 789 } ;

varrec *p;

u.ptr = &v ;

p = &u ;

*p = *( p->ptr ) ;

printf("%d" , u.dummy3 ) ; //What the fuck

return 0 ;
}

If you disable optimization the bug goes away, and printf prints 789.

To solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

( Sorry, I had to change the style. )
« Last Edit: September 25, 2014, 07:58:04 PM by neo313 »

sal55

  • Guest
Re: Optimiser code generation bug?
« Reply #4 on: September 25, 2014, 10:37:34 PM »
That should be valid c in my opinion.

It is a bug.

The strange thing is the code tries to dereference pointer v.ptr, which is NULL of course, and then copy from that pointer on.
If you change the pointer to something else the code copies from that pointer on, like in my example, where the assignment *p = *( p->ptr ) ; starts  to copy from v and then switches to t:

Code: [Select]
printf("%d" , u.dummy3 ) ; //What the fuck

Yeah, the bug could have been harder to find than it was..

Quote
If you disable optimization the bug goes away, and printf prints 789.
The trouble is I've been quite keen to find out how Pelles C performs on my interpreter project, compared with other compilers. V7 had an internal error using /Ot, and V8 has this! (The 64-bit version did quite well, suggesting that 32-bit code would be better, if it follows the pattern of the other compilers, ie. 64-bits generally makes things slower.)

Quote
To solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

I think that any use of memcpy would itself be optimised to inline code! (Also my project spends a lot of time pushing 16-byte structs around, so this operation is crucial.)

The code could possibly be changed, but the actual code where it went wrong looked like this:

 *dest = *(p->vptr);

It's not obvious that dest and p point to the same struct! There might be quite a few such expressions, but I will have a go at changing them just to see it working.
« Last Edit: September 25, 2014, 10:41:36 PM by sal55 »

neo313

  • Guest
Re: Optimiser code generation bug?
« Reply #5 on: September 26, 2014, 07:17:05 AM »
My mistake:

Quote
To solve the problem temporarily, use memcpy or an intermediate variable( pointer ).

None of these help. You have to use a temporary struct variable. Which adds about 75% more instructions to the entire copy.

Or define pragma function( memcpy ) to disable memcpy optimization which also removes the bug, but is slower.

update:

If you write a separate function, and inline it, you get correct code and the copy is as fast as assignment( or a bit faster ). Looks like the bug is limited to assignment only. I think that may be best for you.
Also enable auto inline in options.

Code: [Select]
static inline void varrec_cpy( varrec* const a , const varrec* const b )
{
*a = *b ;
}

varrec_cpy( p , p->ptr ) ;

const qualifiers don't matter but I included them for correctness.


« Last Edit: September 26, 2014, 07:24:26 AM by neo313 »

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: Optimiser code generation bug?
« Reply #6 on: September 26, 2014, 10:59:50 AM »
In your case there are at least 2 relevant points, the first falls in the category of 'side-effects' when copying overlapped areas. This is generally a programmer error to copy two memory areas that overlaps, and in our case the pointer to the structure to copy is inside the memory that will be overwrited. These issues are compiler dependent and not covered by the standard.
The other issue is the optimizer behaviour that re-loads the pointer after it has been overwritten, while it seems logical that reloading pointers is not an optimization, instruction timings could make it convenient.
In conclusion it is not so clear that this is a bug.
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

neo313

  • Guest
Re: Optimiser code generation bug?
« Reply #7 on: September 26, 2014, 11:29:25 AM »
In your case there are at least 2 relevant points, the first falls in the category of 'side-effects' when copying overlapped areas.

It is clearly a bug.

No memory overlaps in the example. p and p->ptr point to non-overlapping memory.

The code loads value from both operands in unspecified order, and then updates the left value.

6.5.16, p3:
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands. The evaluations of
the operands are unsequenced.



Here is an even more simplified version

Code: [Select]
int main( void )
{
varrec t = { 444 , 555 , NULL , 666 } ;

varrec u = { 0 } ;
varrec v = { 123 , 456 , &t , 789 } ;

u.ptr = &v ;

u = *( u.ptr ) ;

printf("%d" , u.dummy3 ) ; //What the fuck

return 0 ;
}

I see no reason for u = *( u.ptr ) ; to be undefined behaviour.
« Last Edit: September 26, 2014, 11:56:53 AM by neo313 »

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: Optimiser code generation bug?
« Reply #8 on: September 26, 2014, 12:18:08 PM »
I see.
The problem once again come from the optimizer...  :(
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

sal55

  • Guest
Re: Optimiser code generation bug?
« Reply #9 on: September 27, 2014, 12:12:11 AM »
The trouble is I've been quite keen to find out how Pelles C performs on my interpreter project, compared with other compilers.

I worked around this problem using an intermediate variable, as suggested. Only in half a dozen places, but I had to find them by locating where it crashed. (Obviously the bug is still there and ought to be fixed at some point.)

But the compiler now works pretty well on my application, about third behind gcc and Clang (with two or three slower ones). (gcc and Clang, however, also support label pointers, which in my program give additional speed.)

The only remaining mystery is why, when generating 64-bit code, it (the compiler, runtime, OS or whatever it is), is sometimes locating my functions at addresses above 4GB (the program code is 0.0004Gb in size).


Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: Optimiser code generation bug?
« Reply #10 on: September 27, 2014, 09:18:33 PM »
To speed up modifications I would sugget to use:
Code: [Select]
#pragma optimize(none)    //Disable optimization from here on
int myfunc(...)
{
}
#pragma optimize()    //Restore optimization as set
In this way you can disable the optimization for a single function inside a module.

Going back to the bug definition I'm still not sure that this is a bug. It is true that you define a new pointer to an object, but you use its contents to copy inside the object de-facto copying an overlapped memory. If the compiler would or would'nt reload the pointer is a compiler specific behaviour not a rule. The line:
Code: [Select]
*p = *(p->ptr);
Clearly indicates that you are modifying a whole object using one of its contents. The assumption that the compiler don't have to reload the pointer is just an arbitrary assumption. In fact using an intermediate pointer solves the problem, a brain damaged compiler would optimize also the use of intermediarte pointer and would have drive to the same problem.
Moreover as far as I can see the standard doesn't define structure assignements as atomic operations.
« Last Edit: September 27, 2014, 09:35:09 PM by frankie »
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

neo313

  • Guest
Re: Optimiser code generation bug?
« Reply #11 on: September 28, 2014, 01:24:45 AM »
Quote
de-facto copying an overlapped memory.

v is copied to u, there is no memory overlap.

Quote
The assumption that the compiler don't have to reload the pointer is just an arbitrary assumption.

There is no assumption. Standard is clear on this:

6.5.16, p3:
The side effect of updating the stored value of the left operand is
sequenced after the value computations of the left and right operands.


Quote
Moreover as far as I can see the standard doesn't define structure assignements as atomic operations.

This is completely irrelevant, since there is no concurrency issues. Operators are computed before the assignment:

is sequenced after the value computations of the left and right operands.

5.1.2.3, p3:
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations
executed by a single thread, which induces a partial order among those evaluations.
Given any two evaluations A and B, if A is sequenced before B, then the execution of A
shall precede the execution of B. (Conversely, if A is sequenced before B, then B is
sequenced after A.) If A is not sequenced before or after B, then A and B are
unsequenced. Evaluations A and B are indeterminately sequenced when A is sequenced
either before or after B, but it is unspecified which.13)The presence of a sequence point
between the evaluation of expressions A and B implies that every value computation and
side effect associated with A is sequenced before every value computation and side effect
associated with B.

13) The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations
cannot interleave, but can be executed in any order.


In our case A is the computation of the operands and B is assignment. 
All of this means that our example is correct as far as standard is concerned.
« Last Edit: September 28, 2014, 02:04:47 AM by neo313 »

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2096
Re: Optimiser code generation bug?
« Reply #12 on: September 28, 2014, 09:28:38 PM »
So when coping a record object the compiler to avoid any conflict must create a temporary object to copy on than move it atomically on the destination?
Looking at the machine, not the parts of the standard that can be honestly interpreted in different ways, the transfer of blocks, not atomic types, requires different techniques normally referred as fencing. Now, on real hardware, if you crreate a fence to make atomic a block operation what should happen when the code try to access memory because you use a property of that object to copy it over?
It is better to be hated for what you are than to be loved for what you are not. - Andre Gide

sal55

  • Guest
Re: Optimiser code generation bug?
« Reply #13 on: September 29, 2014, 12:41:12 AM »
So when coping a record object the compiler to avoid any conflict must create a temporary object to copy on than move it atomically on the destination?
No, that is not necessary when copying blocks that don't overlap. It is however necessary to establish the source and destination addresses before copying; if either is inside the destination block, it must be read out first (obviously, otherwise you don't know where to read to or write from).

The mistake here, is after the source address which is in the destination has been read, to read again after part of the copy had been done. The optimiser made the error of not evaluating the source address once and for all and relying on re-reading (which is hardly efficient anyway if you're trying to optimise).

Here's another example not involving blocks:

  int data[]={0,0,0,256,0,0};
  int A=3;

  A=data[A];

A takes on the value in data[3], ie. 256. But it might be that this assignment is done a byte at a time (I've seen in happen when the compiler thinks objects are misaligned). But this will go wrong if copied a byte at a time and the index A is reused each time.