Author Topic: I suppose...  (Read 6208 times)

Jokaste

  • Guest
I suppose...
« on: October 28, 2017, 07:30:47 PM »
Tell me if I am right. The ghosts can answer too.
 
Code: [Select]

       mov  [hWndSysInput+rip],rcx
       mov  rdx,[hInstance+rip]
       xor  rax,rax
       mov  [rsp + 64],rcx
       mov  [rsp + 80],rdx
       mov  [rsp + 88],rax
       mov  [rsp + 72],rax
       mov  [rsp + 56],rax
       mov  [rsp + 48],rax
       mov  [rsp + 40],rax
       mov  [rsp + 32],rax
       mov  r9,WS_CHILD or LVS_NOSORTHEADER or LVS_SORTASCENDING or LVS_REPORT
       mov  r8,OFFSET szNullString
       mov  rdx,OFFSET WC_LISTVIEW
       xor  rcx,rcx
       call CreateWindowExA

If I have understood Agner Fog and others correctly, this should execute in pairs like this:
 
Code: [Select]
mov  [hWndSysInput+rip],rcx     mov rdx,[hInstance+rip]                 NO PENALTY
xor  rax,rax                    mov [rsp + 64],rcx                      PENALTY because XOR is quicker than MOV
mov  [rsp + 80],rdx             mov [rsp + 88],rax                      NO PENALTY
mov  [rsp + 72],rax             mov [rsp + 56],rax                      NO PENALTY
mov  [rsp + 48],rax             mov [rsp + 40],rax                      NO PENALTY
mov  [rsp + 32],rax             mov r9,WS_CHILD or LVS_NOSORTHEADER     PENALTY because of the 1st instruction
mov  r8,OFFSET szNullString     mov rdx,OFFSET WC_LISTVIEW              NO PENALTY
xor  rcx,rcx                                                            PENALTY because the CALL cannot be executed in the same cycle as the XOR
call CreateWindowExA                                                    PENALTY

This could be improved:
 
Code: [Select]
mov  [hWndSysInput+rip],rcx     mov  rdx,[hInstance+rip]
mov  [rsp + 64],rcx             mov  [rsp + 80],rdx
xor  rax,rax                    xor  rcx,rcx
mov  r8,OFFSET szNullString     mov  rdx,OFFSET WC_LISTVIEW
mov  [rsp + 88],rax             mov  [rsp + 72],rax
mov  [rsp + 56],rax             mov  [rsp + 48],rax
mov  [rsp + 40],rax             mov  [rsp + 32],rax
mov  r9,WS_CHILD or LVS_NO..    xchg rax,rax (NOP)
call CreateWindowExA

And the final code is:
 
Code: [Select]
mov  [hWndSysInput+rip],rcx             ; save the handle passed in rcx
mov  rdx,[hInstance+rip]
mov  [rsp + 64],rcx                     ; hWndParent
mov  [rsp + 80],rdx                     ; hInstance
xor  rax,rax
xor  rcx,rcx                            ; dwExStyle = 0
mov  r8,OFFSET szNullString             ; lpWindowName
mov  rdx,OFFSET WC_LISTVIEW             ; lpClassName
mov  [rsp + 88],rax                     ; lpParam = NULL
mov  [rsp + 72],rax                     ; hMenu = NULL
mov  [rsp + 56],rax                     ; nHeight = 0
mov  [rsp + 48],rax                     ; nWidth = 0
mov  [rsp + 40],rax                     ; Y = 0
mov  [rsp + 32],rax                     ; X = 0
mov  r9,WS_CHILD or LVS_NOSORTHEADER or LVS_SORTASCENDING or LVS_REPORT   ; dwStyle
xchg rax,rax                            ; NOP
call CreateWindowExA

Give me advice about this kind of parallel code.
Thanks
 
Jokaste / Grincheux
 
« Last Edit: October 28, 2017, 07:40:53 PM by Jokaste »

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: I suppose...
« Reply #1 on: October 29, 2017, 12:45:41 PM »
Quote
xor  rax,rax                    mov [rsp + 64],rcx                      PENALTY because XOR is quicker than MOV

Launch a debugger and try this:
Code: [Select]
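  int 3               ; breakpoint: single-step from here in the debugger
  or rax, -1          ; rax = FFFFFFFFFFFFFFFFh
  xor eax, eax        ; rax = 0: a 32-bit write zero-extends into the upper half
  or rax, -1          ; rax = FFFFFFFFFFFFFFFFh again
  xor rax, rax        ; rax = 0, same result with one extra (REX) prefix byte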

No idea whether xor eax, eax is any faster or slower than xor rax, rax, but it is one byte shorter and does the same.

Jokaste

  • Guest
Re: I suppose...
« Reply #2 on: October 29, 2017, 03:46:51 PM »
Thanks, that's the kind of code I am looking for.
I have found that constants are not ideal in an assembler program: if a constant has the value 0, the assembler encodes it the same way as if it were greater than 0.
MOV RAX,SW_HIDE is assembled as MOV RAX,0 rather than XOR RAX,RAX.
I would like to create a PDF with all the tricks that experienced programmers use.
Yesterday I downloaded MasmBasic and tried to read what the editor displays... I stopped before the end; this program really is rich.

Jokaste

  • Guest
Re: I suppose...
« Reply #3 on: October 29, 2017, 04:48:17 PM »
Code: [Select]
MOV RAX,0BADF00DBADCAFEh                    ;RAX=00BADF00DBADCAFE
OR EAX,-1                                   ;RAX=00000000FFFFFFFF
MOV RAX,0BADF00DBADCAFEh                    ;RAX=00BADF00DBADCAFE
OR  RAX,-1                                  ;RAX=FFFFFFFFFFFFFFFF
« Last Edit: October 29, 2017, 04:50:56 PM by Jokaste »

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: I suppose...
« Reply #4 on: October 29, 2017, 05:28:27 PM »
Quote
I have found that constants are not ideal in an assembler program: if a constant has the value 0, the assembler encodes it the same way as if it were greater than 0.
MOV RAX,SW_HIDE is assembled as MOV RAX,0 rather than XOR RAX,RAX.

Sometimes I use conditional assembly to decide which instruction to take:
Code: [Select]
if (IMAGE_ICON-IMAGE_BITMAP) eq 1
  inc edi
else
  add edi, IMAGE_ICON-IMAGE_BITMAP
endif

In this case, it is not really necessary because IMAGE_ICON and IMAGE_BITMAP are Windows constants that will never change IMHO, so it will be inc edi, always. But in this excerpt from the For_ ... Next macro, it saves a few bytes (MasmBasic.inc, lines 9719ff):
Code: [Select]
ifdifi tmpReg, vtS$
  if atVt eq atImmediate
    ife vtS$
      xor tmpReg, tmpReg        ; value is 0
    elseif vtS$ eq -1
      or tmpReg, -1             ; value is -1
    elseif (vtS$ le 127) and (vtS$ ge -128)
      push vtS$                 ; value fits in a sign-extended byte
      pop tmpReg
    else
      mov tmpReg, vtS$
    endif
  else
    mov tmpReg, vtS$
  endif
endif
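
To put numbers on "saves a few bytes", here is a rough sketch of the encoded sizes in 32-bit code. The register eax and the value 100 are illustrative choices only and are not taken from MasmBasic:
Code: [Select]
xor  eax, eax       ; 2 bytes, eax = 0
or   eax, -1        ; 3 bytes, eax = -1
push 100            ; 2 bytes (sign-extended 8-bit immediate)
pop  eax            ; 1 byte, so 3 bytes in total for eax = 100
mov  eax, 100       ; 5 bytes, the general fallback

The short forms are size optimisations rather than guaranteed speed-ups: xor and or rewrite the flags, and push/pop goes through the stack.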

Quote
Yesterday I downloaded MasmBasic and tried to read what the editor displays... I stopped before the end; this program really is rich.

Thanks ;)

Jokaste

  • Guest
Re: I suppose...
« Reply #5 on: October 29, 2017, 08:07:38 PM »
For the moment it is only in my mind, but I was thinking it would be useful to create a macro like this one (pseudocode):
Quote
MOV_CONST_TO_REG MACRO \1 \2
      IF \2 == 0
            XOR \1,\1
      ELSE
            MOV \1,\2
      ENDIF
ENDM

Jokaste

  • Guest
Re: I suppose...
« Reply #6 on: October 29, 2017, 08:11:15 PM »
INC/DEC instructions must not be used before LOOP or Jcc because they only partially update the flags.
ADD/SUB rewrite all the arithmetic flags, so they are quicker.


Offline jj2007

  • Member
  • *
  • Posts: 536
Re: I suppose...
« Reply #7 on: October 30, 2017, 09:00:40 AM »
Very old stuff, I use it in the source of the RichMasm editor:
Code: [Select]
movi MACRO M1, M2
LOCAL oa, num
  num = M2
  oa = (opattr num) and 127
  if oa ne 36
echo <M2> is not an immediate
.err
  endif
  if type(M1) ne DWORD
% echo M1 is not a DWORD
.err
  endif
  ife num
if (opattr M1) eq atRegister
xor M1, M1
else
and M1, 0
endif
  elseif (num le 127) and (num ge -128)
pushd M2
pop dword ptr M1
  else
mov M1, M2
  endif
ENDM

Quote
INC/DEC instructions must not be used before LOOP or Jcc because they only partially update the flags.
ADD/SUB rewrite all the arithmetic flags, so they are quicker.

If you absolutely need the carry flag, use add/sub (but I never saw a need for that). Otherwise don't, because it needs 3 bytes instead of 1, and we have frequently confirmed with measurements that add/sub is not faster.
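
A small sketch of the flag difference under discussion, assuming 32-bit code. The register eax is an arbitrary choice and the label carry_kept is hypothetical:
Code: [Select]
        stc                 ; CF = 1
        inc  eax            ; writes OF, SF, ZF, AF, PF; CF is left at 1
        jc   carry_kept     ; always taken here, because INC did not touch CF
        int  3              ; never reached
carry_kept:
        stc                 ; CF = 1 again
        add  eax, 1         ; writes all six flags; CF = 1 only if eax wrapped to 0

On size: inc eax is 1 byte and add eax, 1 is 3 bytes in 32-bit code. In 64-bit code the 1-byte INC opcodes are reused as REX prefixes, so inc eax takes 2 bytes and the gap shrinks to 1 byte.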

Jokaste

  • Guest
Re: I suppose...
« Reply #8 on: October 30, 2017, 12:22:22 PM »
Quote

If you absolutely need the carry flag, use add/sub (but I never saw a need for that). Otherwise don't, because it needs 3 bytes instead of 1, and we have frequently confirmed with measurements that add/sub is not faster.

ADD RAX,1
JBE xxxxxx

Not possible using INC.
« Last Edit: October 30, 2017, 12:24:14 PM by Jokaste »

Jokaste

  • Guest
Re: I suppose...
« Reply #9 on: October 30, 2017, 07:49:59 PM »
Quote
3.5.1.1 Use of the INC and DEC Instructions

The INC and DEC instructions modify only a subset of the bits in the flag register. This creates a dependence on all previous writes of the flag register. This is especially problematic when these instructions are on the critical path because they are used to change an address for a load on which many other instructions depend.

Assembly/Compiler Coding Rule 33. (M impact, H generality) INC and DEC instructions should be replaced with ADD or SUB instructions, because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore creating false dependencies on earlier instructions that set the flags.

From Intel: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf (page 117)

I was right too.

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: I suppose...
« Reply #10 on: November 01, 2017, 05:55:19 AM »

Quote
ADD RAX,1
JBE xxxxxx

Not possible using INC.

1. Use JLE instead.
2. Put the inc some lines after instructions that modify flags.
3. MEASURE THE TIMINGS, everything else is theory, esoterics, hearsay.
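
A note on the specific ADD RAX,1 case quoted above: adding 1 carries out only when RAX was FFFFFFFFFFFFFFFFh, which is exactly when the result is 0. CF and ZF therefore always agree, and JBE (taken if CF=1 or ZF=1) reduces to JZ. A sketch, with wrapped as a hypothetical label:
Code: [Select]
        add  rax, 1
        jbe  wrapped        ; taken only when rax wrapped to 0 (CF = ZF = 1)
        ; versus
        inc  rax
        jz   wrapped        ; same condition: INC sets ZF exactly as ADD rax,1 does
wrapped:

This holds only for an increment of 1. With other addends CF and ZF can differ, and then ADD/SUB is the instruction to use.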

Jokaste

  • Guest
Re: I suppose...
« Reply #11 on: November 01, 2017, 05:43:54 PM »
JJ, you are right. In one place we read something, and elsewhere we read the opposite! I say that because I have read that it is no longer possible to measure timings reliably because of multiple cores... Don't use RD..., use Time!
I would like to have something for measuring timings. If it comes from you or Vortex I accept; from anyone else I refuse.

Offline jj2007

  • Member
  • *
  • Posts: 536
Re: I suppose...
« Reply #12 on: November 02, 2017, 12:12:25 PM »
Quote
I would like to have something for measuring timings.

GetTickCount is OK if you choose the number of loops so high that the test takes a second or two (its granularity is about 16 milliseconds). Otherwise use QPC (QueryPerformanceCounter). NanoTimer() is based on QPC, and I use it all the time.
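
A minimal sketch of the QPC approach in 64-bit assembly, assuming ml64-style syntax and linking against kernel32.lib. TimeLoop, LOOPS and the qpc* variables are names made up for this example, and this is not a description of how NanoTimer() is implemented:
Code: [Select]
extern QueryPerformanceCounter:proc
extern QueryPerformanceFrequency:proc

LOOPS equ 100000000             ; raise or lower until the test runs long enough

.data?
qpcFreq  dq ?                   ; counter ticks per second
qpcStart dq ?
qpcEnd   dq ?

.code
TimeLoop proc
        push rbx                ; rbx is non-volatile; the push also realigns rsp
        sub  rsp, 32            ; shadow space for the API calls
        lea  rcx, qpcFreq
        call QueryPerformanceFrequency
        lea  rcx, qpcStart
        call QueryPerformanceCounter
        mov  ebx, LOOPS
again:
        ; code under test goes here
        sub  ebx, 1             ; or dec ebx: measure both
        jnz  again
        lea  rcx, qpcEnd
        call QueryPerformanceCounter
        mov  rax, qpcEnd
        sub  rax, qpcStart      ; elapsed ticks
        mov  ecx, 1000
        mul  rcx                ; rdx:rax = ticks * 1000
        div  qpcFreq            ; rax = elapsed milliseconds
        add  rsp, 32
        pop  rbx
        ret
TimeLoop endp
end

The routine returns the elapsed milliseconds in rax. Timing the empty loop first and subtracting that result gives a rough figure for the code under test alone.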

Jokaste

  • Guest
Re: I suppose...
« Reply #13 on: November 05, 2017, 07:27:08 AM »
Thank you, I will test it now.