Trimming spaces and tabs inside a string

Started by Vortex, June 09, 2024, 12:03:13 PM

Previous topic - Next topic

Vortex

Function trimming spaces and tabs inside a string :

.386
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

lookupTbl   db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)


.code

RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD

    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1
    mov     edi,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    movzx   edx,BYTE PTR [ebx+eax]
    mov     BYTE PTR [edi],al
    add     ecx,1
    add     edi,edx
    test    eax,eax
    jnz     @b   

finish:

    mov     eax,edi
    sub     eax,1
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...

John Z

Hi Vortex,

I'm wondering what is the advantage in using the table when there are only two cases that need to be checked.
I would think the table uses more space than coding a check for 09 and a check for 32.  Granted that two checks would need to be done every time byte was < 33 whereas with the table only one check always....

John Z

frankie

When just a couple of symbols are to be compared the efficiency isn't much between normal comparing vs tables lookup.
When more symbols are to be checked, i.e. word separators, a lookup table rocket speeds-up the execution  ;).
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

Vortex

Hello,

With today's hardware capacity, a lookup table of 256 bytes should not be an issue. The purpose of using assembly is to optimize code especially where execution speed matters. The lookup table avoids exta jumps and you need only one jmp instruction here. Notice that you can use the same code above with some small modifications in the lookup table to remove more unnecessary characters.
Code it... That's all...

Vortex

#4
Here is another version without the lookup table :

.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al

    add     ecx,1

    mov     ebx,eax
    xor     ebx,32
    xor     eax,9
    and     ebx,eax
    add     ebx,0FFFFFFFFh
    adc     edx,0

    cmp     eax,9
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...

John Z

frankie, Vortex

Thanks for clearing up the table usage.  I just thought a bit unusual for just two checks.  I used a similar method (table) in C to build my Base 64 encoder/decoder.  Wanted to ask to be sure I was understanding the assembly and not missing something.  It has been a while (pre 486  ;D )

Vortex,  in your versions w/o the table - wouldn't you want to check first if the character was less than 33? This should speed up the code because most of the 255 characters would only need one check while 32 need two checks, if I grok the code that is.

Thanks for the inputs.

John Z


Vortex

Hi John,

Quotewouldn't you want to check first if the character was less than 33?

Can you try the new version below?

.386
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    mov     ebx,33
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al

    add     ecx,1
    cmp     ebx,eax
    adc     edx,0
    test    eax,eax
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...

frankie

#7
Today Intel processors, operating with super-scalar and hyper-threading technologies, instructions are pipelined and executed in parallel when possible.
To speed-up memory access data and instructions are pre-loaded in the fast-speed caches, where are also stored intermediate computation.
The execution pipeline is filled fetching instructions from the address cache while the processor logic assemble the flow for parallel (scalar/super-scalar) execution in the ALU.
In this scenario the caches holds a snapshot of memory accesses and the instruction pipeline a snapshot of what would be the execution flow, with the advantage of fast exchange with ALU and CPU registers. The caches are loaded/stored to physical slower memory transparently to the execution (in burst mode with multiword simultaneous transfers). This whole process contributes to faster execution.
Then what happen to cache contents and especially to the pipelines when a branch, following a comparison, is encountered? We have 2 options, if the flow will follow the instructions already cached the CPU will continue processing at full speed, but in the other case the processor stops while the pipeline is flushed and reloaded after caches refresh with the new instructions flow.
In the last case we loose all the advantages we had using instructions and data caching (eventually pre-calculated data in the wrong branch).
Advanced new processors are able to compute branch predictions to mitigate the problem, but the point remain substantially "avoid any branch" to speed-up execution.

Now it should be evident that the use of table with only one comparison, hopefully expecting that the pipelining can define the compare address in advance storing the value in cache before its use, is the fastes way to do this job.
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide

John Z

Quote from: Vortex on June 13, 2024, 11:03:00 PM
Hi John,
Can you try the new version below?

great! Thanks Vortex for showing it and indulging my inexperience/out-of-date knowledge.

Quote from: frankie on June 14, 2024, 01:37:16 PM
Advanced new processors are able to compute branch predictions to mitigate the problem, but the point remain substantially "avoid any branch" to speed-up execution.

Thanks frankie - great explanation

John Z

Vortex

Code it... That's all...

John Z

Quote from: Vortex on June 15, 2024, 04:48:20 PM
Hi John,

You are welcome. I uploaded another version without the lookup table :

https://forum.pellesc.de/index.php?topic=11196.msg39303#msg39303

Vortex - That version is very innovative and showing brain power! 

John Z

Vortex

Loop with shorter instructions :

.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    xor     ebx,ebx
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    xor     al,32
    setnz   ah
    xor     al,41
    setnz   bl
    and     bl,ah
    add     edx,ebx

    cmp     al,9
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,1
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...

John Z

Even more tricky (sophisticated?)  One must realize that 9 xor 32 is 41  :)

neat!

John Z

Vortex

Another version :

.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

RemoveSpaces PROC str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    xor     al,32
    mov     ah,al
    xor     al,41
    and     ah,al
    add     ah,0FFh
    adc     edx,0

    cmp     al,9
    jne     @b

finish:

    mov     eax,edx
    sub     eax,buff
    ret

RemoveSpaces ENDP

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...