Function trimming spaces and tabs inside a string :
.386
.model flat,stdcall
option casemap:none
ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG
includelib \PellesC\lib\Win\kernel32.lib
includelib \PellesC\lib\Win\user32.lib
includelib msvcrt.lib
.data
lookupTbl db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
mystr db ' This Is A Test String.',0
message db 'Trimmed string = %s',13,10
db 'Length of the string = %u',0
.data?
buffer db 64 dup(?)
.code
RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD
mov ebx,OFFSET lookupTbl
mov ecx,str1
mov edi,buff
@@:
movzx eax,BYTE PTR [ecx]
movzx edx,BYTE PTR [ebx+eax]
mov BYTE PTR [edi],al
add ecx,1
add edi,edx
test eax,eax
jnz @b
finish:
mov eax,edi
sub eax,1
sub eax,buff
ret
RemoveSpaces ENDP
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke printf,ADDR message,ADDR buffer,eax
invoke ExitProcess,0
END start
Hi Vortex,
I'm wondering what is the advantage in using the table when there are only two cases that need to be checked.
I would think the table uses more space than coding a check for 09 and a check for 32. Granted that two checks would need to be done every time byte was < 33 whereas with the table only one check always....
John Z
When just a couple of symbols are to be compared the efficiency isn't much between normal comparing vs tables lookup.
When more symbols are to be checked, i.e. word separators, a lookup table rocket speeds-up the execution ;).
Hello,
With today's hardware capacity, a lookup table of 256 bytes should not be an issue. The purpose of using assembly is to optimize code especially where execution speed matters. The lookup table avoids exta jumps and you need only one jmp instruction here. Notice that you can use the same code above with some small modifications in the lookup table to remove more unnecessary characters.
Here is another version without the lookup table :
.686
.model flat,stdcall
option casemap:none
ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG
includelib \PellesC\lib\Win\kernel32.lib
includelib \PellesC\lib\Win\user32.lib
includelib msvcrt.lib
.data
mystr db ' This Is A Test String.',0
message db 'Trimmed string = %s',13,10
db 'Length of the string = %u',0
.data?
buffer db 64 dup(?)
.code
RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD
mov ecx,str1
mov edx,buff
@@:
movzx eax,BYTE PTR [ecx]
mov BYTE PTR [edx],al
add ecx,1
mov ebx,eax
xor ebx,32
xor eax,9
and ebx,eax
add ebx,0FFFFFFFFh
adc edx,0
cmp eax,9
jnz @b
finish:
mov eax,edx
sub eax,buff
ret
RemoveSpaces ENDP
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke printf,ADDR message,ADDR buffer,eax
invoke ExitProcess,0
END start
frankie, Vortex
Thanks for clearing up the table usage. I just thought a bit unusual for just two checks. I used a similar method (table) in C to build my Base 64 encoder/decoder. Wanted to ask to be sure I was understanding the assembly and not missing something. It has been a while (pre 486 ;D )
Vortex, in your versions w/o the table - wouldn't you want to check first if the character was less than 33? This should speed up the code because most of the 255 characters would only need one check while 32 need two checks, if I grok the code that is.
Thanks for the inputs.
John Z
Hi John,
Quotewouldn't you want to check first if the character was less than 33?
Can you try the new version below?
.386
.model flat,stdcall
option casemap:none
ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG
includelib \PellesC\lib\Win\kernel32.lib
includelib \PellesC\lib\Win\user32.lib
includelib msvcrt.lib
.data
mystr db ' This Is A Test String.',0
message db 'Trimmed string = %s',13,10
db 'Length of the string = %u',0
.data?
buffer db 64 dup(?)
.code
RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD
mov ecx,str1
mov edx,buff
mov ebx,33
@@:
movzx eax,BYTE PTR [ecx]
mov BYTE PTR [edx],al
add ecx,1
cmp ebx,eax
adc edx,0
test eax,eax
jnz @b
finish:
mov eax,edx
sub eax,buff
ret
RemoveSpaces ENDP
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke printf,ADDR message,ADDR buffer,eax
invoke ExitProcess,0
END start
Today Intel processors, operating with super-scalar and hyper-threading technologies, instructions are pipelined and executed in parallel when possible.
To speed-up memory access data and instructions are pre-loaded in the fast-speed caches, where are also stored intermediate computation.
The execution pipeline is filled fetching instructions from the address cache while the processor logic assemble the flow for parallel (scalar/super-scalar) execution in the ALU.
In this scenario the caches holds a snapshot of memory accesses and the instruction pipeline a snapshot of what would be the execution flow, with the advantage of fast exchange with ALU and CPU registers. The caches are loaded/stored to physical slower memory transparently to the execution (in burst mode with multiword simultaneous transfers). This whole process contributes to faster execution.
Then what happen to cache contents and especially to the pipelines when a branch, following a comparison, is encountered? We have 2 options, if the flow will follow the instructions already cached the CPU will continue processing at full speed, but in the other case the processor stops while the pipeline is flushed and reloaded after caches refresh with the new instructions flow.
In the last case we loose all the advantages we had using instructions and data caching (eventually pre-calculated data in the wrong branch).
Advanced new processors are able to compute branch predictions to mitigate the problem, but the point remain substantially "avoid any branch" to speed-up execution.
Now it should be evident that the use of table with only one comparison, hopefully expecting that the pipelining can define the compare address in advance storing the value in cache before its use, is the fastes way to do this job.
Quote from: Vortex on June 13, 2024, 11:03:00 PM
Hi John,
Can you try the new version below?
great! Thanks Vortex for showing it and indulging my inexperience/out-of-date knowledge.
Quote from: frankie on June 14, 2024, 01:37:16 PM
Advanced new processors are able to compute branch predictions to mitigate the problem, but the point remain substantially "avoid any branch" to speed-up execution.
Thanks frankie - great explanation
John Z
Hi John,
You are welcome. I uploaded another version without the lookup table :
https://forum.pellesc.de/index.php?topic=11196.msg39303#msg39303
Quote from: Vortex on June 15, 2024, 04:48:20 PM
Hi John,
You are welcome. I uploaded another version without the lookup table :
https://forum.pellesc.de/index.php?topic=11196.msg39303#msg39303
Vortex - That version is very innovative and showing brain power!
John Z
Loop with shorter instructions :
.686
.model flat,stdcall
option casemap:none
ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG
includelib \PellesC\lib\Win\kernel32.lib
includelib \PellesC\lib\Win\user32.lib
includelib msvcrt.lib
.data
mystr db ' This Is A Test String.',0
message db 'Trimmed string = %s',13,10
db 'Length of the string = %u',0
.data?
buffer db 64 dup(?)
.code
RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD
mov ecx,str1
mov edx,buff
xor ebx,ebx
@@:
movzx eax,BYTE PTR [ecx]
mov BYTE PTR [edx],al
add ecx,1
xor al,32
setnz ah
xor al,41
setnz bl
and bl,ah
add edx,ebx
cmp al,9
jnz @b
finish:
mov eax,edx
sub eax,1
sub eax,buff
ret
RemoveSpaces ENDP
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke printf,ADDR message,ADDR buffer,eax
invoke ExitProcess,0
END start
Even more tricky (sophisticated?) One must realize that 9 xor 32 is 41 :)
neat!
John Z
Another version :
.686
.model flat,stdcall
option casemap:none
ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG
includelib \PellesC\lib\Win\kernel32.lib
includelib \PellesC\lib\Win\user32.lib
includelib msvcrt.lib
.data
mystr db ' This Is A Test String.',0
message db 'Trimmed string = %s',13,10
db 'Length of the string = %u',0
.data?
buffer db 64 dup(?)
.code
RemoveSpaces PROC str1:DWORD,buff:DWORD
mov ecx,str1
mov edx,buff
@@:
movzx eax,BYTE PTR [ecx]
mov BYTE PTR [edx],al
add ecx,1
xor al,32
mov ah,al
xor al,41
and ah,al
add ah,0FFh
adc edx,0
cmp al,9
jne @b
finish:
mov eax,edx
sub eax,buff
ret
RemoveSpaces ENDP
start:
invoke RemoveSpaces,ADDR mystr,ADDR buffer
invoke printf,ADDR message,ADDR buffer,eax
invoke ExitProcess,0
END start