NO

Author Topic: V12 time optimization  (Read 4093 times)

Offline John Z

  • Member
  • *
  • Posts: 840
V12 time optimization
« on: October 11, 2023, 01:54:16 PM »
To start - I've used Maximize Speed plus checked  Extra Optimizations for what follows since about version 9. Unfortunately starting when I finally switched to version 12 a problem has occurred which I failed to catch until this week.  My bad for not testing each and every feature of my program after upgrading to version 12.  I use third party mini zip sources to complete the assembly of an ODS format spreadsheet.  Starting in version 12 it was created but was always corrupt and UN-openable.  Today I isolated the source code where the problem occurs.

This code does not work correctly when optimized for time, but works well when no optimization or only size optimization is used.  Here is the code -
Code: [Select]
#pragma optimize(none)    // needed for Pelles C v12 optimization bug in time
static char *strrpl(const char *str, size_t n, char oldchar, char newchar) {
  char c;
  size_t i;
  char *rpl = (char *)calloc((1 + n), sizeof(char));
  char *begin = rpl;
  if (!rpl) {
    return NULL;
  }

  for (i = 0; (i < n) && (c = *str++); ++i) {
    if (c == oldchar) {
      c = newchar;
    }
    *rpl++ = c;
  }
Write_To_Log("c:/temp/vcardz.log", (char *)begin, 0, 0); // added by John Z for testing
  return begin;
}
#pragma optimize(time)  // back on for remaining source

When optimized for time the begin character pointer somehow ends up pointing to the string start plus 1 char - so it ends up pointing to the 2nd character in the string as the start of the string.  ONLY happens when using the time optimization.
The attached log file  shows the output with various optimizations. 

So bracketing this one procedure with #pragma fixes my issue.
Maybe information can help find a weak optimization point that can be improved in future versions. 

John Z

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2111
Re: V12 time optimization
« Reply #1 on: October 13, 2023, 07:56:49 PM »
John I can't reproduce the problem.
Anyway some points:
Code: [Select]
char *rpl = (char *)calloc((1 + n), sizeof(char));The cast to character pointer returned from  calloc call is not required, in some cases can cause problems, moreover sizeof(char) is always 1, because by language standard it is always the smallest allocation unit for a system.

But what sound strange is the cast to character pointer of the variable begin inside the call to Write_To_Log function:
Code: [Select]
Write_To_Log("c:/temp/vcardz.log", (char *)begin, 0, 0);Is it a typo? What is exactly the definition of the function Write_To_Log?
Which compilation is it? 32 or 64 bits? The bug happens for both bitness?
The character pointer begin gets wrong inside the function strrpl or only inside the function Write_To_Log?
« Last Edit: October 13, 2023, 07:58:29 PM by frankie »
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #2 on: October 13, 2023, 10:24:33 PM »
Thanks frankie,

Wow - it is 100% repeatable in my program/setup.  I'm going to follow Pelle's inputs from before and move this single procedure into its own file.  The code I've shown is preceded with about 950 lines of code, and this is now the last procedure in the file.  I relocated from the middle it to the bottom once I isolated it with #pragma statements that showed where the corruption was occurring.  I found this was the problem proc basically using a search inserting #pragma optimize(none) and  #pragma optimize(time) at locations until I narrowed it down to one procedure.  Yes - it took a while but each test you half the search space :)

I added my Write_to_Log call after I found where the problem was.  The log file was easier to show the output than using the ODS zip file which has the 1st character missing on each of the 4 file names inside of the zip.

Yes the cast on 'begin' is redundant removing it has no effect.  The pointer 'begin' becomes wrong before the Write_to_Log.
I removed the cast and re-ran, everything is the same with or without the cast, as expected since I added the proc to log the issue that was already happening.  I should have clarified that in the original post - sorry that is a red herring ....

When I move it into it's own file, I'll also be able to generate a complete ppj and example.  If that shows the issue I'll post the complete minimized project.  If it all of the sudden works ok when broken out I'll be stumped and may have to re-install Pelles C.....

It is only a 32 bit program BTW - way too much to change to make it into 64 bit ;(
The zip procedure where the issue is found is unchanged from GitHub.

it only happens in version 12 not in version 11.


Thanks very much for looking at this and your analysis.

John Z

Offline Marco

  • Member
  • *
  • Posts: 44
Re: V12 time optimization
« Reply #3 on: October 14, 2023, 11:19:29 AM »
Hello John, I tried the function on my system, and I have the same issue (32-bit compilation - Pelles C v12). The 64-bit executable seems to work as expected (i.e. no errors). To fix the issue you need to add the volatile keyword to the 'i' variable.

Code: [Select]
static char *strrpl(const char *str, size_t n, char oldchar, char newchar) {

  [...]

  volatile size_t i;      // <-- FIX

  [...]

  }
 
Here are the options used during the compilation, if it can help:

Code: [Select]
pocc: -fp:precise -W2 -Gz -Ze -Zx -MT -Tx86-coff
polink: -debug:no -subsystem:console -machine:x86

Marco

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #4 on: October 14, 2023, 11:30:13 AM »
SUCCESS!  Well sort of.....

Attached here is a simple minimum (mostly) program that demonstrates the issue under V12 on my WIN 11 system.
A simple Hello_world program. When executed click the OK button, first an optimized procedure is tested, then the
same thing is sent to an unoptimized procedure.  the exact procedure is duplicated so that one will be optimized and one will not.  This was easier than rebuilding for each test.

Clearly shows that the problem exists on my system under V12.  It does not manifest when run under V11, both outputs are correct and identical.  It also does not manifest when run under V10, all good with or without optimization.  Both V11 and V10 were run on the same system as the V12 test.

Also it can easily demonstrate that it is only the Maximize Speed optimization, with and without Extra Optimization.

So with help this could show it is my installation, or not....BTW tested with all add-ins disabled too.

John Z

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #5 on: October 14, 2023, 11:32:41 AM »
Hello John, I tried the function on my system, and I have the same issue (32-bit compilation - Pelles C v12). The 64-bit executable seems to work as expected (i.e. no errors). To fix the issue you need to add the volatile keyword to the 'i' variable.Marco

Great NEWS!  I'll give that a try.  Easy to test now with the standalone.  Appreciate your help!

John Z

Update:  Tried Marco's volatile fix and can confirm that he's found the workaround fix for V12!  Nice work Marco!
« Last Edit: October 14, 2023, 11:39:25 AM by John Z »

Offline Vortex

  • Member
  • *
  • Posts: 841
    • http://www.vortex.masmcode.com
Re: V12 time optimization
« Reply #6 on: October 14, 2023, 11:40:33 AM »
Quote
If we do not use volatile qualifier, the following problems may arise:
1) Code may not work as expected when optimization is turned on.

https://www.geeksforgeeks.org/understanding-volatile-qualifier-in-c/
Code it... That's all...

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #7 on: October 14, 2023, 11:58:06 AM »
Quote
If we do not use volatile qualifier, the following problems may arise:
1) Code may not work as expected when optimization is turned on.

https://www.geeksforgeeks.org/understanding-volatile-qualifier-in-c/

Yes but but how about -
Quote
"Objects declared as volatile are omitted from optimization because their values can be changed by code outside the scope of current code at any time."

In this case i is local in scope and can not be changed outside of the scope ....
Then there is the issue is everything to be volatile?  How to determine which local scope variable needs volatile?

John Z

Offline Vortex

  • Member
  • *
  • Posts: 841
    • http://www.vortex.masmcode.com
Re: V12 time optimization
« Reply #8 on: October 14, 2023, 12:06:30 PM »
Quote
Some compilers allow you to implicitly declare all variables as volatile. Resist this temptation, since it is essentially a substitute for thought. It also leads to potentially less efficient code.

Also, resist the temptation to blame the optimizer or turn it off when you encounter unexpected program behavior. Modern C/C++ optimizers are so good that I cannot remember the last time I came across an optimization bug. In contrast, I regularly come across failures by programmers to use volatile.

https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword

Quote
In C, the volatile keyword is used to indicate to the compiler that a variable's value may change unexpectedly, so it should not rely on the value being cached in a register or optimized away.

https://www.javatpoint.com/volatile-keyword-in-c#:~:text=In%20C%2C%20the%20volatile%20keyword,a%20register%20or%20optimized%20away.
Code it... That's all...

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #9 on: October 14, 2023, 02:52:59 PM »
Quote
A variable should be declared volatile whenever its value could change unexpectedly. In practice, only three types of variables could change:

1. Memory-mapped peripheral registers

2. Global variables modified by an interrupt service routine

3. Global variables accessed by multiple tasks within a multi-threaded application
same source ->https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword

As is clear from the source code the variable i is neither Global nor a Register variable as written.....

Quote
It is because the value of a volatile variable may be changed by external factors, such as hardware or other threads running concurrently.
Few cases where the usage of 'volatile' is appropriate: Accessing memory-mapped hardware registers, Sharing data between threads, Using non-local jumps
same source ->https://www.javatpoint.com/volatile-keyword-in-c#:~:text=In%20C%2C%20the%20volatile%20keyword,a%20register%20or%20optimized%20away.

None of these appear to apply to variable i ..... and not a threaded application

Anyway I do appreciate you trying to clarify it for me.  Suggest we wait for Pelle, or some others to comment ?

John Z

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2266
    • http://www.smorgasbordet.com
Re: V12 time optimization
« Reply #10 on: October 14, 2023, 06:26:44 PM »
This appears to be a bug in the X86 code generator only, the X64 code generator seems fine.

I'm somewhat busy with other things right now, so I'm not going to analyze this until later, but you seem to have enough info to work around the problem for now...
/Pelle

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #11 on: October 14, 2023, 06:54:36 PM »
you seem to have enough info to work around the problem for now...

Definitely do, just treat as something to put in the hopper for later.

Thanks,

John Z

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2111
Re: V12 time optimization
« Reply #12 on: October 14, 2023, 10:02:27 PM »
It took a long time to discover the optimizer problem, the main reason is that compiling with debug the bug disappear.
Then I disassembled the routine and analyzed the assembler source.
Here the problem is evident, due to an aggressive optimization, the loop erroneously starts including the increment of the string pointer:
Code: [Select]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Bad implementation is the exact copy, obtained disassembling the C
; source
strrpl_asm_bad@16:
push              ebp
mov               ebp,esp
sub               esp,4
push              ebx
push              esi
push              edi
mov               edi,dword ptr [n]
mov               bl,byte ptr [oldchar]
push              1
lea               esi,[edi+1]
push              esi
call              calloc
add               esp,8
mov               esi,eax
mov               dword ptr [loc_var],esi
test              eax,eax
je                lbl_52
test              edi,edi
jbe               lbl_4D
xor               esi,esi
; Here is the optimizer bug.
; The loop is entered on pointer increment
; The result is that the loop starts on 2nd character
lbl_2C:
add               dword ptr [pstr],1
mov               ecx,dword ptr [pstr]
mov               cl,byte ptr [ecx]
mov               edx,dword ptr [pstr]
cmp               byte ptr [edx],0
je                lbl_4D
mov               dl,byte ptr [newchar]
cmp               cl,bl
cmove             ecx,edx
mov               byte ptr [eax],cl
inc               eax
inc               esi
cmp               esi,edi
jb                lbl_2C
lbl_4D:
mov               eax,dword ptr [loc_var]
jmp               lbl_54
lbl_52:
xor               eax,eax
lbl_54:
pop               edi
pop               esi
pop               ebx
mov               esp,ebp
pop               ebp
ret               10h
The correct code should be:
Code: [Select]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; This is the correct version
strrpl_asm_good@16:
push              ebp
mov               ebp,esp
sub               esp,4
push              ebx
push              esi
push              edi
mov               edi,dword ptr [n]
mov               bl,byte ptr [oldchar]
push              1
lea               esi,[edi+1]
push              esi
call              calloc
add               esp,8
mov               esi,eax
mov               dword ptr [loc_var],esi
test              eax,eax
je                lbl_g_52
test              edi,edi
jbe               lbl_g_4D
xor               esi,esi
; The correct version jumps in the loop after the increment
jmp   lbl_should_jump_here
lbl_g_2C:
add               dword ptr [pstr],1
lbl_should_jump_here:
mov               ecx,dword ptr [pstr]
mov               cl,byte ptr [ecx]
mov               edx,dword ptr [pstr]
cmp               byte ptr [edx],0
je                lbl_g_4D
mov               dl,byte ptr [newchar]
cmp               cl,bl
cmove             ecx,edx
mov               byte ptr [eax],cl
inc               eax
inc               esi
cmp               esi,edi
jb                lbl_g_2C
lbl_g_4D:
mov               eax,dword ptr [loc_var]
jmp               lbl_g_54
lbl_g_52:
xor               eax,eax
lbl_g_54:
pop               edi
pop               esi
pop               ebx
mov               esp,ebp
pop               ebp
ret               10h
I attach the project from JohnZ with an assembler module containing the 2 assembly procedures.
I had to write it because there was no other way to debug the problem.
« Last Edit: October 15, 2023, 10:49:31 AM by frankie »
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2111
Re: V12 time optimization
« Reply #13 on: October 14, 2023, 10:27:39 PM »
Quote
If we do not use volatile qualifier, the following problems may arise:
1) Code may not work as expected when optimization is turned on.

https://www.geeksforgeeks.org/understanding-volatile-qualifier-in-c/

Yes but but how about -
Quote
"Objects declared as volatile are omitted from optimization because their values can be changed by code outside the scope of current code at any time."

In this case i is local in scope and can not be changed outside of the scope ....
Then there is the issue is everything to be volatile?  How to determine which local scope variable needs volatile?

John Z
It doesn't matter if the variable is global or not. Consider that you can have passed the address of a local variable to another procedure that modify it externally on a timer (to not mention interrupt) or an asynchronous event.
Even single threaded programs can be influenced by external concurrent processes or system I/O and have then volatile content modifications. For this reason the language doesn't put any limitation about the scope of the volatile element.
Anyway, as Vortex highlighted, its use can can be troublesome, and heavily limits the optimization process forcing the reload of the value each time it is used.
Now the workaround force a small variation in code emission that remove the bug. Marco introduced a variation declaring volatile the variable i, this added code to service the volatile property forcing the optimizer to modify strategy.
But it could have been any other small modification. I.e. modifing the for loop:
Code: [Select]
for (i = 0; (i < n) && (c = *str++); ++i)
With a while loop:
Code: [Select]
while (n-- && (c = *str++))
Remove the bug.
« Last Edit: October 14, 2023, 10:33:57 PM by frankie »
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

Offline John Z

  • Member
  • *
  • Posts: 840
Re: V12 time optimization
« Reply #14 on: October 14, 2023, 11:38:29 PM »
Excellent!  Thanks.  I too noticed that a small change in the code made the issue disappear.
For example when I tried to Write_to_Log the incoming string before any processing was performed the problem never appeared.  That is also why I was concerned that putting it into a separate project might obscure the problem. Fortunately it didn't.

Very elegant while loop !

John Z