Print Page - code generator optimization

Title: code generator optimization
Post by: dsula on April 16, 2006, 01:07:48 PM

Hi,
I was playing around with the compiler optimization options and I stumbled across this.


int main(void)
{
   unsigned char a;

   a = 1;
   a = a + 3;

   return 1;
}

Using "pocc /Ox /Tx86-asm" the generated assembly output is:

Code Select


mov cl,1
movzx eax,cl
add eax,3
mov ecx,eax
mov eax,1
@1:
ret

Is there a particular reason why this is not simply done as

Code Select


mov cl,1
add cl, 3

Title: code generator optimization
Post by: kobold on April 17, 2006, 11:38:17 AM

Code Select

MOV CL, 4?

Title: code generator optimization
Post by: dsula on April 17, 2006, 01:27:43 PM

Quote from: "kobold"Or
Code Select Expand
MOV CL, 4?

I agree. However for that matter you can as well completely remove the code, as the variable "a" is never used :shock: . But that's advanced optimization and flow analysis, which I don't consider that important.

I found that the compiler does many unnecessary loads when dealing with register variables. Another example:

Code Select


int b;

int main(void)
{
   int *a;

   a = &b;
   b = *a;

   return 1;
}

When optimization is on it generates:

Code Select


mov eax,(_b)
mov ecx,eax
mov eax,dword [ecx]
mov dword [(_b)],eax

Clearly the better pick would be

Code Select


mov ecx,(_b)
mov eax,dword [ecx]
mov dword [(_b)],eax

Title: code generator optimization
Post by: Pelle on April 17, 2006, 11:39:40 PM

First case: The original LCC compiler from Princeton is basically designed to be retargetable, generating "acceptable" code, and having a mangeable source code tree. The front end will describe the (generic) machine code with as few "operations" as possible - to help the retargeting process. As with any C compiler, types smaller than 'int' will be promoted to 'int'. "Sizing down" types in the back end using the few available "operations" is doable up to a point, but not in all cases (and all cases shouldn't be handled). In C, the size of an 'int' is supposed to map to the most 'natural' and 'efficent' type on a certain machine. For this reason, most sensible C programmers only use 'char' and 'short' when they really must (in machine structures, and the like).

Any type of flow control isn't implemented in LCC. The register allocator is simple, maybe best suited for RISC processors. On a crappy architecture like X86, with far too few general registers, it will sometimes get lost. Adding flow control analyzis, a better register allocator, and many other optimizations, would probably require a staff of several full-time developers - which of course is completely unacceptable for a project like this...

Second case: I don't know how you got this result - I can't reproduce it...

Pelle

Title: code generator optimization
Post by: dsula on April 19, 2006, 03:36:34 AM

I agree with all your remarks. Also I want to point out that I'm perfectly happy with the performance of PelleC as is, I simply point out things that I think are of interest. I also don't think flow analysis is of much beneft as clean and good programming practice makes up for most of it.
So back to our first example.

Code Select

int main(void)
{
   unsigned char a;

   a = 1;
   a = a + 3;

   return 1;
}

When compiling without optimization "pocc.exe /Tx86-asm", I get the following code:

Code Select

mov byte [ebp+(-1)],1
add byte [ebp+(-1)],3

As you can see you don't promote the char to int in this situation and the code is pretty much the best possible.

So I'm just wondering why you promote in one case and not the other ? (BTW: I'm just wondering,not critizing as I almost never use optimization. Keeps my debugging easy.)

ds

Title: code generator optimization
Post by: Pelle on April 20, 2006, 07:47:09 PM

This case is trivial because a constant can be "sized down" to a byte easily, but when you are optimizing and assigning values to *registers*, promotions come into play much more.

As I said before, it's possible to fix this up to a point. I have already spent several month on this. In the next version, your example will look something like this:

Code Select


...
; unsigned char a; 
; a = 1; 
B101                 mov       cl,1
; a = a + 3; 
80C103               add       cl,3
...

It's a matter of adding more operation "patterns". The problem is that it's getting hard when you start combining simple operations into longer sequences. A single binary operation, like add (char1 + char2) can be handled, but two or more such operations are at least *very* hard. Adding char1 + char2 + char3, is one addition with an intermediate result, then this is added with char3 etc.

Another problem with adding more "patterns" is that the pattern-matching code in LCC will quickly bloat. With the current changes, the compiler is some 40-50% bigger. The "size of a diskette" is not a factor anymore, but I don't want the compiler to get too big either...

Here are some more examples:

Code Select


; a = b = c = 1;
C645FF01             mov       byte ptr [ebp-1],1
B301                 mov       bl,1
B101                 mov       cl,1
; a = b + c;
88D9                 mov       cl,bl
024DFF               add       cl,byte ptr [ebp-1]

Here it starts to look ugly again...

Code Select


; a = a + b + c;
0FB6C1               movzx     eax,cl
0FB6D3               movzx     edx,bl
01D0                 add       eax,edx
0FB655FF             movzx     edx,byte ptr [ebp-1]
01D0                 add       eax,edx
88C1                 mov       cl,al

Pelle

Pelles C forum

Pelles C => General discussions => Topic started by: dsula on April 16, 2006, 01:07:48 PM