V12 time optimization

Pelle · October 15, 2023, 11:14:11 PM

The problem turned out to be in the spiller/rewriter.

The way a pointed-to value is fetched from a post-incremented pointer, like:

c = *str++

is turned into something like this internally (LCSE=Local Common SubExpression):

LCSE-temp = *str
str = str + 1
c = LCSE-temp

This construct reduces the internal set of operators, which is a good thing, but needs careful handling all over.
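As a rough illustration of the lowering described above, the two C functions below count characters the same way. The first uses `*str++` directly; the second spells out the three-step sequence by hand. The names `count_direct`, `count_lowered` and `lcse_temp` are made up for this sketch and say nothing about the compiler's actual internal representation.

```c
#include <assert.h>
#include <stddef.h>

/* Source form: fetch through a post-incremented pointer. */
size_t count_direct(const char *str)
{
    assert(str != NULL);
    size_t n = 0;
    char c;
    while ((c = *str++) != '\0')
        n++;
    return n;
}

/* Hand-lowered form of the same loop. lcse_temp is a hypothetical
   stand-in for the compiler's LCSE-temp. */
size_t count_lowered(const char *str)
{
    assert(str != NULL);
    size_t n = 0;
    for (;;) {
        char lcse_temp = *str;  /* LCSE-temp = *str */
        str = str + 1;          /* str = str + 1    */
        char c = lcse_temp;     /* c = LCSE-temp    */
        if (c == '\0')
            break;
        n++;
    }
    return n;
}
```

Both functions should return the same count for any NUL-terminated string, because the temporary captures the old `*str` before the pointer moves.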

There was a logical error in that a GCSE (Global Common SubExpression) like:

GCSE-temp = <value>
(other code)
... first use of GCSE-temp
(other code)
... second use of GCSE-temp etc.

can be rewritten as:

(other code)
... first use of <value>
(other code)
... second use of <value>

This will always work, but the same is not always true for an LCSE-temp (due to the special construct above).
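One way to picture the hazard, as a sketch only: if the LCSE-temp from the post-increment sequence is removed and its use is replaced by the defining expression `*str`, that expression is evaluated after `str` has already been advanced. The functions below are hypothetical and hand-written; they are not the compiler's rewriter, just plain C showing the difference in the value fetched.

```c
#include <assert.h>
#include <stddef.h>

/* Correct sequence: the temp captures *str before str is advanced. */
char fetch_with_temp(const char **pstr)
{
    assert(pstr != NULL && *pstr != NULL);
    const char *str = *pstr;
    char lcse_temp = *str;  /* LCSE-temp = *str */
    str = str + 1;          /* str = str + 1    */
    *pstr = str;
    return lcse_temp;       /* c = LCSE-temp    */
}

/* Faulty sequence: the temp is gone and its use was replaced by the
   defining expression, which now reads through the updated pointer.
   Only call this on a non-empty string, so the read stays in bounds. */
char fetch_temp_replaced(const char **pstr)
{
    assert(pstr != NULL && *pstr != NULL && **pstr != '\0');
    const char *str = *pstr;
    str = str + 1;          /* str = str + 1            */
    *pstr = str;
    return *str;            /* c = *str (one too far)   */
}
```

Starting from the same position in "ab", the first function yields 'a' and the second yields 'b', although both leave the pointer one character further on.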

The spiller/rewriter is rarely touched (the last time was apparently in 2010), so this logical error has remained undetected for a long time. I guess it needed a register-starved architecture like X86 to trigger (most people are presumably using X64 these days).

Quote from: frankie on October 14, 2023, 10:02:27 PM
It took a long time to discover the optimizer problem; the main reason is that when compiling with debug the bug disappears.

- Full debugging info can at best be used to find logical errors, but since most optimizer passes that move around code are disabled in this mode, there is little chance of finding code generator problems.
- Line number debugging info will not disable any optimizer passes, but line numbers can still get lost (complicating debugging) because it can be hard to explain sequences like this:
line #1
line #8
line #3
line #9
line #2
The common concept is to relate to the source file, but how to do that without confusing everyone involved?

frankie · October 16, 2023, 10:22:48 AM

Quote from: Pelle on October 15, 2023, 11:14:11 PM
Quote from: frankie on October 14, 2023, 10:02:27 PM
It took a long time to discover the optimizer problem; the main reason is that when compiling with debug the bug disappears.
- Full debugging info can at best be used to find logical errors, but since most optimizer passes that move around code are disabled in this mode, there is little chance of finding code generator problems.
- Line number debugging info will not disable any optimizer passes, but line numbers can still get lost (complicating debugging) because it can be hard to explain sequences like this:
line #1
line #8
line #3
line #9
line #2
The common concept is to relate to the source file, but how to do that without confusing everyone involved?

Thanks for sharing the problem, Pelle.
I understand the points about debug, but I also ran into another problem. After creating an assembler module from the disassembled code, I first set a breakpoint on the entry of the assembler routines in order to debug in assembler from there, but the debugger passed over them, ignoring the breakpoints. The only way I was able to debug was to set a breakpoint before the call in the C source; on the break I had to change the debug mode to assembly, and then by using step-in I was finally able to see the assembler code.

Pelle · October 16, 2023, 01:02:54 PM

Quote from: frankie on October 16, 2023, 10:22:48 AM
I understand the points about debug, but I also ran into another problem.

Ah, OK. I will have a look and see if I can improve things...

John Z · October 18, 2023, 03:00:50 PM

Quote from: Pelle on October 15, 2023, 11:14:11 PM
The problem turned out to be in the spiller/rewriter.

The spiller/rewriter is rarely touched (the last time was apparently in 2010), so this logical error has remained undetected for a long time. I guess it needed a register-starved architecture like X86 to trigger (most people are presumably using X64 these days).

Is it possible that versions prior to version 12 were not utilizing this section/feature of the optimizer, or perhaps had something that mitigated the effect? I ask because when running the exact test code on Version 10 or Version 11 they do not display the error. That might explain why the issue has remained undetected since 2010... I think a lot of code was 32bit in 2010....

John Z

Pelle · October 18, 2023, 05:25:25 PM

Quote from: John Z on October 18, 2023, 03:00:50 PM
Is it possible that versions prior to version 12 were not utilizing this section/feature of the optimizer, or perhaps had something that mitigated the effect?

This piece of code is not part of the optimizer, only affected by it: the register-allocator will (basically) attempt to change the generated machine code into holding as many variables as possible in fast CPU-registers, to avoid having to load the variable again and again from slow RAM-memory. This register-allocator will optimistically assume that all variables can be placed in a CPU-register. When this is not possible, the variable is "spilled" - meaning the machine code is changed into loading this variable from RAM-memory. There is a guessed cost of spilling a variable, and the one with the smallest cost is normally spilled.
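The "spill the cheapest one" rule mentioned above can be sketched in a few lines of C. Everything here is an assumption made for illustration: the `struct candidate` layout, the `spill_cost` field and the function `pick_spill_candidate` are hypothetical, and how the compiler actually estimates the cost is not described in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one variable competing for a CPU register. */
struct candidate {
    const char *name;     /* variable name, for illustration only       */
    unsigned spill_cost;  /* guessed cost of keeping it in RAM instead  */
};

/* Returns the index of the candidate with the smallest guessed spill
   cost. On a tie, the first such candidate is chosen. */
size_t pick_spill_candidate(const struct candidate *vars, size_t count)
{
    assert(vars != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (vars[i].spill_cost < vars[best].spill_cost)
            best = i;
    }
    return best;
}
```

In a real allocator this choice would be repeated whenever the remaining variables still do not fit, and each spill changes the code that the next round has to look at.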

An unoptimized function will keep all variables in RAM memory, leaving no work for the register-allocator.

An optimized function that performs actual work is unlikely to keep all variables in CPU-registers. This means the spiller-function (with this bug) is called again and again and again (especially with 32-bit code that has few CPU-registers to begin with). However, this function can take several different paths depending on the kind/class of the variable to spill. The path that leads to this bug is less likely, but obviously not impossible. A small change to the generated machine code can affect when a spill is needed, and then the actual spilling will affect the generated machine code, which may affect further spilling, etc.

Quote from: John Z on October 18, 2023, 03:00:50 PM
I ask because when running the exact test code on Version 10 or Version 11 they do not display the error. That might explain why the issue has remained undetected since 2010...

Basically, your code manages to combine some unlikely factors into revealing this bug. I'm not going to investigate further than that...

Quote from: John Z on October 18, 2023, 03:00:50 PM
I think a lot of code was 32bit in 2010....

I doubt it, but let's not get into that...

John Z · October 18, 2023, 06:56:53 PM

OK, understandable. Thanks for figuring it out, considering it is only 32 bit...

I'll invest more time in trying to move to 64 bit code.

John Z

Pelle · October 23, 2023, 08:38:36 PM

Quote from: frankie on October 16, 2023, 10:22:48 AM
I understand the points about debug, but I also ran into another problem. After creating an assembler module from the disassembled code, I first set a breakpoint on the entry of the assembler routines in order to debug in assembler from there, but the debugger passed over them, ignoring the breakpoints. The only way I was able to debug was to set a breakpoint before the call in the C source; on the break I had to change the debug mode to assembly, and then by using step-in I was finally able to see the assembler code.

I have now looked at this.

Setting breakpoints in source files (before starting the debugger) requires debug info (with source lines) when starting the debugger. Source lines in the debug info are always relative to the start of a function. For an assembly file you can only get the concept of a function by using PROC .. ENDP; just a code label is not enough.

OTOH, starting the debugger with "always break at entry-point" and then setting breakpoints should work...

frankie · October 23, 2023, 10:10:11 PM

Quote from: Pelle on October 23, 2023, 08:38:36 PM
I have now looked at this.

Setting breakpoints in source files (before starting the debugger) requires debug info (with source lines) when starting the debugger. Source lines in the debug info are always relative to the start of a function. For an assembly file you can only get the concept of a function by using PROC .. ENDP; just a code label is not enough.

OTOH, starting the debugger with "always break at entry-point" and then setting breakpoints should work...

Ok. Got it.
Thanks
