
Author Topic: V12 time optimization  (Read 11032 times)

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2266
    • http://www.smorgasbordet.com
Re: V12 time optimization
« Reply #15 on: October 15, 2023, 11:14:11 PM »
The problem turned out to be in the spiller/rewriter.

The way a pointed-to value is fetched from a post-incremented pointer, like:
Code: [Select]
c = *str++
is turned into something like this internally (LCSE=Local Common SubExpression):
Code: [Select]
LCSE-temp = *str
str = str + 1
c = LCSE-temp
This construct reduces the internal set of operators, which is a good thing, but needs careful handling all over.
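
As a rough illustration in plain C (not the compiler's actual internal form; the names `next_char_postinc`, `next_char_lowered` and `lcse_temp` are made up for this sketch), the two functions below should behave identically:

```c
/* Original form: fetch through a post-incremented pointer. */
static char next_char_postinc(const char **pstr)
{
    const char *str = *pstr;
    char c = *str++;
    *pstr = str;              /* hand the advanced pointer back to the caller */
    return c;
}

/* Lowered form, mirroring the three internal steps described above. */
static char next_char_lowered(const char **pstr)
{
    const char *str = *pstr;
    char lcse_temp = *str;    /* LCSE-temp = *str  */
    str = str + 1;            /* str = str + 1     */
    char c = lcse_temp;       /* c = LCSE-temp     */
    *pstr = str;
    return c;
}
```

The temporary matters because it captures the pointed-to value before the pointer is advanced.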

There was a logical error in that a GCSE (Global Common SubExpression) like:
Code: [Select]
GCSE-temp = <value>
(other code)
... first use of GCSE-temp
(other code)
... second use of GCSE-temp etc.

can be rewritten as:
Code: [Select]
(other code)
... first use of <value>
(other code)
... second use of <value>

This always works for a GCSE-temp, but not always for an LCSE-temp (due to the special construct above).
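
A plausible reading of the failure, sketched in C (an illustration only, not the actual spiller/rewriter code; the function names are hypothetical): if the use of the temporary is replaced by its defining expression `*str` after `str` has already been incremented, the next character is read instead of the current one.

```c
/* Correct lowering: the temp captures *str before the increment. */
static char fetch_with_temp(const char **pstr)
{
    const char *str = *pstr;
    char lcse_temp = *str;    /* value read before the pointer moves */
    str = str + 1;
    char c = lcse_temp;
    *pstr = str;
    return c;
}

/* Faulty rewrite: the temp is dropped and its use replaced by "*str". */
static char fetch_rewritten(const char **pstr)
{
    const char *str = *pstr;
    str = str + 1;
    char c = *str;            /* wrong: reads after the increment */
    *pstr = str;
    return c;
}
```

For a string such as "ab", the first function yields the first character while the rewritten one yields the second, even though both leave the pointer in the same place.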

The spiller/rewriter is rarely touched, last time was apparently in 2010, so this logical error has remained undetected for a long time. I guess it needed a register-starved architecture like X86 to trigger (most people are presumably using X64 these days).


It took a long time to discover the optimizer problem; the main reason is that the bug disappears when compiling with debug info.
- Full debugging info can at best be used to find logical errors, but since most optimizer passes that move code around are disabled in this mode, there is little chance of finding code generator problems.
- Line number debugging info will not disable any optimizer passes, but line numbers can still get lost (complicating debugging) because it can be hard to explain sequences like this:
line #1
line #8
line #3
line #9
line #2
The common concept is to relate to the source file, but how to do that without confusing everyone involved?
/Pelle

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2114
Re: V12 time optimization
« Reply #16 on: October 16, 2023, 10:22:48 AM »
It took a long time to discover the optimizer problem; the main reason is that the bug disappears when compiling with debug info.
- Full debugging info can at best be used to find logical errors, but since most optimizer passes that move code around are disabled in this mode, there is little chance of finding code generator problems.
- Line number debugging info will not disable any optimizer passes, but line numbers can still get lost (complicating debugging) because it can be hard to explain sequences like this:
line #1
line #8
line #3
line #9
line #2
The common concept is to relate to the source file, but how to do that without confusing everyone involved?
Thanks for sharing the problem, Pelle.
I understand the points about debug, but I also ran into another problem. After creating an assembler module from the disassembled code, I first set a breakpoint on the entry of the assembler routines to debug in assembler from there, but the debugger passed over them, ignoring the breakpoints. The only way I was able to debug was to set a breakpoint before the call in the C source, then on break switch the debug mode to assembly, and then use step-in to finally see the assembler code.
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

Offline Pelle

Re: V12 time optimization
« Reply #17 on: October 16, 2023, 01:02:54 PM »
I understand the points about debug, but I also ran into another problem.
Ah, OK. I will have a look and see if I can improve things...
/Pelle

Offline John Z

  • Member
  • *
  • Posts: 887
Re: V12 time optimization
« Reply #18 on: October 18, 2023, 03:00:50 PM »
The problem turned out to be in the spiller/rewriter.

The spiller/rewriter is rarely touched, last time was apparently in 2010, so this logical error has remained undetected for a long time. I guess it needed a register-starved architecture like X86 to trigger (most people are presumably using X64 these days).

Is it possible that versions prior to version 12 were not utilizing this section/feature of the optimizer, or perhaps had something that mitigated the effect?  I ask because when running the exact test code on Version 10 or Version 11 they do not display the error. That might explain why the issue has remained undetected since 2010...  I think a lot of code was 32bit in 2010....

John Z

Offline Pelle

Re: V12 time optimization
« Reply #19 on: October 18, 2023, 05:25:25 PM »
Is it possible that versions prior to version 12 were not utilizing this section/feature of the optimizer, or perhaps had something that mitigated the effect?
This piece of code is not part of the optimizer, only affected by it: the register-allocator will (basically) attempt to change the generated machine code into holding as many variables as possible in fast CPU-registers, to avoid having to load the variable again and again from slow RAM-memory. This register-allocator will optimistically assume that all variables can be placed in a CPU-register. When this is not possible, the variable is "spilled" - meaning the machine code is changed into loading this variable from RAM-memory. There is a guessed cost of spilling a variable, and the one with the smallest cost is normally spilled.
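
The selection step can be sketched in C as below. This is only a toy model under assumptions: the `Candidate` struct, its `spill_cost` field and `pick_spill_candidate` are hypothetical names, and how the real register-allocator estimates the cost is not described here.

```c
#include <stddef.h>

/* Hypothetical description of one variable the allocator is tracking. */
typedef struct {
    const char *name;
    double spill_cost;   /* guessed cost of keeping this variable in RAM */
    int in_register;     /* nonzero while it still occupies a CPU register */
} Candidate;

/* Return the index of the cheapest variable still held in a register,
   or -1 if nothing is left to spill. */
static int pick_spill_candidate(const Candidate *vars, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!vars[i].in_register)
            continue;    /* already spilled, not a candidate */
        if (best < 0 || vars[i].spill_cost < vars[best].spill_cost)
            best = (int)i;
    }
    return best;
}
```

After each spill the machine code changes, so an allocator along these lines would typically re-evaluate the candidates and repeat until everything fits.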

An unoptimized function will keep all variables in RAM memory, leaving no work for the register-allocator.

An optimized function that performs actual work is unlikely to keep all variables in CPU-registers. This means the spiller-function (with this bug) is called again and again and again (especially with 32-bit code that has few CPU-registers to begin with). However, this function can take several different paths depending on the kind/class of the variable to spill. The path that leads to this bug is less likely, but obviously not impossible. A small change to the generated machine code can affect when a spill is needed, and then the actual spilling will affect the generated machine code, which may affect further spilling, etc.

I ask because when running the exact test code on Version 10 or Version 11 they do not display the error. That might explain why the issue has remained undetected since 2010...
Basically, your code manages to combine some unlikely factors into revealing this bug. I'm not going to investigate further than that...

I think a lot of code was 32bit in 2010....
I doubt it, but let's not get into that...
/Pelle

Offline John Z

Re: V12 time optimization
« Reply #20 on: October 18, 2023, 06:56:53 PM »
OK, understandable. Thanks for figuring it out, considering it is only 32-bit...
 
I'll invest more time in trying to move to 64 bit code.

John Z

Offline Pelle

Re: V12 time optimization
« Reply #21 on: October 23, 2023, 08:38:36 PM »
I understand the points about debug, but I also ran into another problem. After creating an assembler module from the disassembled code, I first set a breakpoint on the entry of the assembler routines to debug in assembler from there, but the debugger passed over them, ignoring the breakpoints. The only way I was able to debug was to set a breakpoint before the call in the C source, then on break switch the debug mode to assembly, and then use step-in to finally see the assembler code.
I have now looked at this.

Setting breakpoints in source files (before starting the debugger) requires debug info (with source lines) when starting the debugger. Source lines in the debug info are always relative to the start of a function. For an assembly file you can only get the concept of a function by using PROC .. ENDP; just a code label is not enough.

OTOH, starting the debugger with "always break at entry-point" and then setting breakpoints should work...
/Pelle

Offline frankie

Re: V12 time optimization
« Reply #22 on: October 23, 2023, 10:10:11 PM »
I have now looked at this.

Setting breakpoints in source files (before starting the debugger) requires debug info (with source lines) when starting the debugger. Source lines in the debug info are always relative to the start of a function. For an assembly file you can only get the concept of a function by using PROC .. ENDP; just a code label is not enough.

OTOH, starting the debugger with "always break at entry-point" and then setting breakpoints should work...
OK. Got it.
Thanks
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide