
Recent Posts

Pages: 1 ... 8 9 [10]
91
Assembly discussions / Re: Short Jumps for X64
« Last post by Jokaste on November 02, 2017, 11:00:35 am »
I had forgotten "JMP SHORT Labelxxxx"!
92
Assembly discussions / Short Jumps for X64
« Last post by Jokaste on November 02, 2017, 09:59:44 am »
Code:
  [00000001400014E5] E966110000                   jmp               0000000040002650
  [0000000140001517] 4885C0                       test              rax,rax
  [000000014000151A] 7409                         je                0000000140001525
  [000000014000151C] 488D0D9D8F0100               lea               rcx,[000000014001A4C0]
  [0000000140001523] 79B8                         jns               00000001400014DD


I could not find any instruction that encodes a SHORT JUMP. So I had an idea: write a conditional jump whose condition is always true. I chose JNS. Now I have a SHORT JUMP encoded in 2 bytes.
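For comparison: JMP has its own 2-byte short form, opcode EB followed by a signed 8-bit displacement (the "JMP SHORT" mentioned in the reply above), so no conditional-jump trick is needed. JNS is only taken when the sign flag is clear. In the listing the flags come from TEST RAX,RAX (LEA does not modify flags), so that JNS depends on the sign of RAX and is not unconditional. The C sketch below chooses between EB rel8 and E9 rel32; the function name encode_jmp is hypothetical, and it assumes the target lies within the signed 32-bit range of the jump.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes an unconditional JMP placed at address `at`
   that jumps to `target`. The bytes go into `out`; the return value is the
   instruction length (2 or 5), or 0 if the target is out of rel32 range.
   x86 jump displacements are relative to the END of the jump instruction. */
size_t encode_jmp(uint64_t at, uint64_t target, uint8_t out[5])
{
    /* Short form, 2 bytes: EB rel8 */
    int64_t disp8 = (int64_t)target - (int64_t)(at + 2);
    if (disp8 >= INT8_MIN && disp8 <= INT8_MAX) {
        out[0] = 0xEB;
        out[1] = (uint8_t)disp8;          /* two's complement low byte */
        return 2;
    }

    /* Near form, 5 bytes: E9 rel32, little-endian */
    int64_t disp32 = (int64_t)target - (int64_t)(at + 5);
    if (disp32 < INT32_MIN || disp32 > INT32_MAX)
        return 0;
    uint32_t u = (uint32_t)disp32;
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(u >> (8 * i));
    return 5;
}
```

Applied to the listing, a JMP at 0x140001523 back to 0x1400014DD encodes as EB B8 (the same displacement byte as the 79 B8 JNS). The first line's E9 66 11 00 00 at 0x1400014E5 corresponds to a target of 0x140002650 (the listing prints 0000000040002650).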

93
Bug reports / Re: Error installing Pelles C
« Last post by Robert on November 01, 2017, 09:52:22 pm »
On which operating system are you trying to install the compiler?

Is the operating system 32 or 64 bit?

Does the "C:\Users\Lager\Documents" folder exist?

From where did you download the setup.exe?
94
Bug reports / Error installing Pelles C
« Last post by P3DD3 on November 01, 2017, 07:52:13 pm »
Whenever I try to install Pelles C, I get the same error message over and over again.
I saw another post here with the same error message, but it didn't help with my problem.
I'd appreciate some help.

Thanks in advance
95
Assembly discussions / Re: Merci
« Last post by Jokaste on November 01, 2017, 07:47:49 pm »
Programming is my passion and IT is my job.


Thanks Vortex.
96
Assembly discussions / Re: Merci
« Last post by Vortex on November 01, 2017, 07:10:24 pm »
Hi Jokaste,

No worries. Take it easy. Programming is a vast area. We always need to keep studying, as IT is one of the most rapidly advancing fields. I too sometimes feel frustrated when things are not going as expected, but there is always the possibility of revisiting a problem. I see that you have a keen interest in multimedia programming. Keep up your work. The most important thing is to find the right material on the net to study the theory. Health, thinking, studying, researching and perseverance are probably all we need.

From all of us: keep your courage up, Jokaste.
97
Assembly discussions / Tips & Tricks
« Last post by Jokaste on November 01, 2017, 06:06:06 pm »
JJ2007, on 29/10/2017:
  int 3
  or rax, -1        ; quickly sets a register to -1; equivalent to xor rax,rax followed by not rax
  xor eax, eax
  or rax, -1
  xor rax, rax
---------------------------------------------------
MOV RAX,0BADF00DBADCAFEh                    ;RAX=00BADF00DBADCAFE
OR EAX,-1                                   ;RAX=00000000FFFFFFFF
MOV RAX,0BADF00DBADCAFEh                    ;RAX=00BADF00DBADCAFE
OR  RAX,-1                                  ;RAX=FFFFFFFFFFFFFFFF
---------------------------------------------------
A 32-bit operation (OR, AND, ...) on a register is zero-extended to 64 bits: the upper 32 bits of the full register are cleared.
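The two results above can be checked with a small C model of the register semantics. This is only a sketch: the helper names or32 and or64 are hypothetical, and the C code models what the CPU does rather than generating those instructions.

```c
#include <stdint.h>

/* Models OR r32, imm (e.g. OR EAX,-1): the operation is done on the low
   32 bits and the result is zero-extended, so bits 63..32 become 0. */
uint64_t or32(uint64_t reg, int32_t imm)
{
    uint32_t low = (uint32_t)reg | (uint32_t)imm;
    return (uint64_t)low;
}

/* Models OR r64, imm (e.g. OR RAX,-1): the immediate is sign-extended to
   64 bits and the upper half of the register takes part in the OR. */
uint64_t or64(uint64_t reg, int32_t imm)
{
    return reg | (uint64_t)(int64_t)imm;
}
```

With RAX = 00BADF00DBADCAFE, or32(..., -1) gives 00000000FFFFFFFF and or64(..., -1) gives FFFFFFFFFFFFFFFF, matching the comments in the tip.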
---------------------------------------------------
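No INC or DEC before a LOOP or Jcc.
Prefer ADD / SUB: INC and DEC leave CF unchanged, which creates a false dependency on earlier flag-setting instructions.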
---------------------------------------------------
A CALL immediately followed by RET should be replaced by a JMP:
CALL Fonction
ret

becomes

ADD RSP,XXXXX
JMP Fonction
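In C terms, when the tail call is to the same function, replacing CALL/RET by JMP turns the recursion into a loop. The sketch below does this by hand with a hypothetical gcd function: gcd_rec ends with a call, while gcd_jmp reloads its parameters and jumps back to the top with goto. Optimizing compilers often perform this transformation themselves.

```c
#include <stdint.h>

/* Recursive form: the last action is a call (CALL followed by RET). */
uint64_t gcd_rec(uint64_t a, uint64_t b)
{
    if (b == 0)
        return a;
    return gcd_rec(b, a % b);
}

/* Same function with the tail call turned into a jump. */
uint64_t gcd_jmp(uint64_t a, uint64_t b)
{
top:
    if (b == 0)
        return a;
    {
        uint64_t t = a % b;   /* reload the "parameters"... */
        a = b;
        b = t;
    }
    goto top;                 /* ...and JMP instead of CALL/RET */
}
```

gcd_jmp uses no extra stack per iteration, which is the same saving the JMP gives in the assembly version.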
---------------------------------------------------
REPRET TEXTEQU <DB 0F3h, 0C3h>
To make sure the RET instruction is predicted well (a recommendation for some AMD processors), replace it with REP RET, which the REPRET text macro above emits as the bytes F3 C3.
---------------------------------------------------
JZ Target
RET

must be replaced by

JZ Target
NOP
RET
---------------------------------------------------
For repeat counts of less than 4k, expand REP string instructions into equivalent sequences of simple
AMD64 instructions
---------------------------------------------------
Replace:
Again:
.
.
.
LOOP Again

with

Again:
.
.
.
DEC RCX
JNZ Again
---------------------------------------------------
MOV REG,0 becomes XOR REG,REG (shorter encoding; note that XOR modifies the flags and MOV does not)
---------------------------------------------------
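Replace XOR R64,R64 with XOR R32,R32: writing a 32-bit register also clears the upper 32 bits, and for the legacy registers (RAX to RDI) the encoding is one byte shorter because no REX prefix is needed.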
---------------------------------------------------
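XOR R64,R64 followed by MOV R32,value:
remove the XOR, because a MOV to a 32-bit register zero-extends the value into the upper 32 bits (it is not sign-extended).
This works for any 32-bit value. Only MOV R64,imm32 sign-extends its immediate, and that is the form to use when a negative 64-bit value is wanted.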
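A small C model of the two MOV forms, assuming the standard x86-64 rules (MOV r32,imm32 zero-extends, MOV r64,imm32 sign-extends). The helper names are hypothetical. It shows why a preceding XOR R64,R64 is redundant before MOV R32,value, even when bit 31 of the value is set.

```c
#include <stdint.h>

/* MOV r32, imm32 (e.g. MOV EAX,value): bits 63..32 of the 64-bit
   register are cleared, so the previous contents (old) do not matter. */
uint64_t mov_r32(uint64_t old, uint32_t imm)
{
    (void)old;                      /* old contents are discarded */
    return (uint64_t)imm;           /* zero-extension */
}

/* XOR R64,R64 followed by MOV R32,value: the XOR only zeroes a
   register that the MOV overwrites completely anyway. */
uint64_t xor_then_mov_r32(uint64_t old, uint32_t imm)
{
    uint64_t reg = old ^ old;       /* XOR RAX,RAX -> 0 */
    return mov_r32(reg, imm);
}

/* MOV r64, imm32 (REX.W C7 /0, e.g. MOV RAX,-1): the 32-bit immediate
   is sign-extended to 64 bits. */
uint64_t mov_r64_imm32(int32_t imm)
{
    return (uint64_t)(int64_t)imm;  /* sign-extension */
}
```

For any previous register value, xor_then_mov_r32 and mov_r32 give the same result, including for 0xFFFFFFFF.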
---------------------------------------------------
An instruction with RIP-relative addressing is not micro-fused in the following cases:
• An additional immediate is needed, for example:
  • CMP [RIP+400], 27
  • MOV [RIP+3000], 142
• The instruction is a control flow instruction with an indirect target specified using RIP-relative
addressing, for example:
  • JMP [RIP+5000000]
In these cases, an instruction that cannot be micro-fused will require decoder 0 to issue two micro-ops,
resulting in a slight loss of decode bandwidth.

Macro-fusion merges two instructions into a single micro-op. In Intel Core microarchitecture, this hardware
optimization is limited to specific conditions on the first and second instructions of the macro-fusable pair.
• The first instruction of the macro-fused pair modifies the flags. The following instructions can be
macro-fused:
— In Intel microarchitecture code name Nehalem: CMP, TEST.
— In Intel microarchitecture code name Sandy Bridge: CMP, TEST, ADD, SUB, AND, INC, DEC.
— These instructions can fuse if:
  • The first source / destination operand is a register.
  • The second source operand (if it exists) is one of: immediate, register, or non-RIP-relative memory.
• The second instruction of the macro-fusable pair is a conditional branch. Table 3-1 describes, for each
instruction, what branches it can fuse with.

Calls and returns are expensive; use inlining for the following reasons:
• Parameter passing overhead can be eliminated.
• In a compiler, inlining a function exposes more opportunity for optimization.
• If the inlined routine contains branches, the additional context of the caller may improve branch
prediction within the routine.
• A mispredicted branch can lead to performance penalties inside a small function that are larger than
those that would occur if that function is inlined.
Assembly/Compiler Coding Rule 5. (MH impact, MH generality) Selectively inline a function if
doing so decreases code size or if the function is small and the call site is frequently executed.
Assembly/Compiler Coding Rule 6. (H impact, H generality) Do not inline a function if doing so
increases the working set size beyond what will fit in the trace cache.
Assembly/Compiler Coding Rule 7. (ML impact, ML generality) If there are more than 16 nested
calls and returns in rapid succession, consider transforming the program with inlining to reduce the call
depth.
Assembly/Compiler Coding Rule 8. (ML impact, ML generality) Favor inlining small functions that
contain branches with poor prediction rates. If a branch misprediction results in a RETURN being
prematurely predicted as taken, a performance penalty may be incurred.
Assembly/Compiler Coding Rule 9. (L impact, L generality) If the last statement in a function is
a call to another function, consider converting the call to a jump. This will save the call/return overhead
as well as an entry in the return stack buffer.
Assembly/Compiler Coding Rule 10. (M impact, L generality) Do not put more than four
branches in a 16-byte chunk.
Assembly/Compiler Coding Rule 11. (M impact, L generality) Do not put more than two end-of-loop
branches in a 16-byte chunk.

Macro-fusion merges two instructions into a single micro-op. Intel Core microarchitecture performs this
hardware optimization under limited circumstances.
The first instruction of the macro-fused pair must be a CMP or TEST instruction. This instruction can be
REG-REG, REG-IMM, or a micro-fused REG-MEM comparison. The second instruction (adjacent in the
instruction stream) should be a conditional branch.
Since these pairs are a common ingredient in basic iterative programming sequences, macro-fusion
improves performance even on un-recompiled binaries. All of the decoders can decode one macro-fused
pair per cycle, with up to three other instructions, resulting in a peak decode bandwidth of 5 instructions
per cycle.
Each macro-fused instruction executes with a single dispatch. This process reduces latency, which in this
case shows up as a cycle removed from the branch mispredict penalty. Software also gains all the other fusion
benefits: increased rename and retire bandwidth, more storage for instructions in flight, and power
savings from representing more work in fewer bits.
The following list details when you can use macro-fusion:
• CMP or TEST can be fused when comparing:
  REG-REG. For example: CMP EAX,ECX; JZ label
  REG-IMM. For example: CMP EAX,0x80; JZ label
  REG-MEM. For example: CMP EAX,[ECX]; JZ label
  MEM-REG. For example: CMP [EAX],ECX; JZ label
• TEST can be fused with all conditional jumps.
• CMP can be fused with only the following conditional jumps in Intel Core microarchitecture. These
conditional jumps check the carry flag (CF) or the zero flag (ZF). The list of macro-fusion-capable
conditional jumps is:
  JA or JNBE
  JAE or JNB or JNC
  JE or JZ
  JNA or JBE
  JNAE or JC or JB
  JNE or JNZ
CMP and TEST cannot be fused when comparing MEM-IMM (e.g. CMP [EAX],0x80; JZ label). Macro-fusion
is not supported in 64-bit mode for Intel Core microarchitecture.
• Intel microarchitecture code name Nehalem supports the following enhancements in macro-fusion:
— CMP can be fused with the following conditional jumps (which were not supported in Intel Core
microarchitecture):
  • JL or JNGE
  • JGE or JNL
  • JLE or JNG
  • JG or JNLE
— Macro-fusion is supported in 64-bit mode.
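The CMP/TEST rules above for Intel Core and Nehalem can be encoded as a small lookup function. This is a sketch covering only the conditions quoted above (not the Sandy Bridge table); it assumes the MEM-IMM restriction also holds on Nehalem, which the quoted text does not state explicitly. The enum and function names are hypothetical, and each jump is listed under one of its aliases (JE for JE/JZ, and so on).

```c
#include <stdbool.h>

/* Hypothetical names; each jump is listed under one alias only. */
typedef enum { ARCH_CORE, ARCH_NEHALEM } Arch;
typedef enum { OP_CMP, OP_TEST } FirstOp;
typedef enum { FORM_REG_REG, FORM_REG_IMM, FORM_REG_MEM,
               FORM_MEM_REG, FORM_MEM_IMM } Form;
typedef enum {
    JCC_A, JCC_AE, JCC_E, JCC_BE, JCC_B, JCC_NE,  /* test CF and/or ZF  */
    JCC_L, JCC_GE, JCC_LE, JCC_G,                 /* signed: SF, OF, ZF */
    JCC_O, JCC_NO, JCC_S, JCC_NS, JCC_P, JCC_NP   /* everything else    */
} Jcc;

/* Returns true if "op form ; jcc" can be macro-fused according to the
   Intel Core / Nehalem rules quoted above. */
bool can_macro_fuse(Arch arch, bool mode64, FirstOp op, Form form, Jcc jcc)
{
    if (form == FORM_MEM_IMM)              /* e.g. CMP [EAX],0x80 */
        return false;
    if (mode64 && arch == ARCH_CORE)       /* no fusion in 64-bit mode on Core */
        return false;
    if (op == OP_TEST)                     /* TEST fuses with every Jcc */
        return true;

    switch (jcc) {                         /* CMP: depends on the jump */
    case JCC_A: case JCC_AE: case JCC_E:
    case JCC_BE: case JCC_B: case JCC_NE:
        return true;                       /* CF/ZF jumps: Core and Nehalem */
    case JCC_L: case JCC_GE: case JCC_LE: case JCC_G:
        return arch == ARCH_NEHALEM;       /* added in Nehalem */
    default:
        return false;
    }
}
```

For example, CMP EAX,0x80; JL label fuses on Nehalem but not on Intel Core, while TEST EAX,EAX; JS label fuses on both outside 64-bit mode.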
98
Assembly discussions / Captain Hook Very optimized
« Last post by Jokaste on November 01, 2017, 05:49:25 pm »
In this version I reduced the program size from 153 KB to 151 KB: 2 KB in 4 days!
In the future (as soon as possible) I'll post a PDF with the optimizations I found.
99
Assembly discussions / Merci
« Last post by Jokaste on November 01, 2017, 05:45:58 pm »

Thank you to everyone who participated in these discussions. I hope I have not been aggressive. I take what I do very much to heart, so being upset is hard for me to get over, even when I'm wrong.


Thank you JJ, Vortex, Timo and Frankie.
100
Assembly discussions / Re: I suppose...
« Last post by Jokaste on November 01, 2017, 05:43:54 pm »
JJ, you are right: in one place we read something and elsewhere we read the opposite! I say that because I read that it is now impossible to measure timings because of the many cores... Don't use RD... use Time!
I would like to have something for measuring timings. If it comes from you or Vortex, I accept it; from anyone else, I refuse.