Topic: Trimming spaces and tabs inside a string

Offline Vortex

  • Member
  • *
  • Posts: 836
    • http://www.vortex.masmcode.com
Trimming spaces and tabs inside a string
« on: June 09, 2024, 12:03:13 PM »
A function that removes all spaces and tabs from a string:

Code: [Select]
.386
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

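; lookupTbl[c] is 0 for the bytes to drop (9 = tab, 32 = space) and 1 for
; every other byte, including the NUL terminator.
lookupTbl   db 1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1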
            db 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
            db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)


.code

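RemoveSpaces PROC uses edi ebx str1:DWORD,buff:DWORD

    mov     ebx,OFFSET lookupTbl
    mov     ecx,str1                ; ecx -> source string
    mov     edi,buff                ; edi -> next free byte in the output
@@:
    movzx   eax,BYTE PTR [ecx]      ; load the next character
    movzx   edx,BYTE PTR [ebx+eax]  ; edx = 0 for space/tab, 1 otherwise
    mov     BYTE PTR [edi],al       ; always store; a dropped byte gets overwritten
    add     ecx,1
    add     edi,edx                 ; advance the output only for kept bytes
    test    eax,eax                 ; stop once the NUL terminator has been copied
    jnz     @b

finish:

    mov     eax,edi
    sub     eax,1                   ; the NUL also advanced edi, so do not count it
    sub     eax,buff                ; eax = length of the result
    ret

RemoveSpaces ENDP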

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
Code it... That's all...
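For readers more at home in C, here is a rough model of the table-driven loop above. It is only a sketch: the names `advance_tbl`, `init_advance_tbl` and `remove_spaces_tbl` are made up for illustration, and the table is filled at run time instead of being spelled out as 256 `db` values.

```c
#include <assert.h>
#include <stddef.h>

/* advance_tbl[c] is 0 for the bytes to drop (tab, space) and 1 for all
   others, including the terminating NUL, like the 256-byte table above. */
static unsigned char advance_tbl[256];

static void init_advance_tbl(void)
{
    for (int i = 0; i < 256; i++)
        advance_tbl[i] = 1;
    advance_tbl['\t'] = 0;
    advance_tbl[' ']  = 0;
}

/* Copies src to dst without spaces and tabs and returns the new length.
   dst must have room for strlen(src) + 1 bytes. */
size_t remove_spaces_tbl(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    init_advance_tbl();

    char *out = dst;
    unsigned char c;
    do {
        c = (unsigned char)*src++;
        *out = (char)c;          /* always store, like mov [edi],al */
        out += advance_tbl[c];   /* keep the byte only if the table says 1 */
    } while (c != 0);

    /* The NUL advanced out once as well, so it is not counted. */
    return (size_t)(out - dst) - 1;
}
```

Called on the test string from the listing, this should give "ThisIsATestString." with a length of 18.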

Offline John Z

  • Member
  • *
  • Posts: 838
Re: Trimming spaces and tabs inside a string
« Reply #1 on: June 11, 2024, 01:32:04 PM »
Hi Vortex,

I'm wondering what the advantage of the table is when only two cases need to be checked.
I would think the table takes more space than coding one check for 9 and one check for 32.  Granted, the two checks would be needed every time the byte is < 33, whereas the table always needs only one lookup....

John Z

Offline frankie

  • Global Moderator
  • Member
  • *****
  • Posts: 2111
Re: Trimming spaces and tabs inside a string
« Reply #2 on: June 11, 2024, 02:43:39 PM »
When only a couple of symbols have to be compared, there is little difference in efficiency between plain comparisons and a table lookup.
When more symbols have to be checked, e.g. word separators, a lookup table speeds up execution like a rocket  ;).
"It is better to be hated for what you are than to be loved for what you are not." - Andre Gide
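To illustrate the word-separator case, here is a small C sketch. The function `count_words` and its parameters are hypothetical, not taken from any post in this thread. The point is that the inner loop does one table lookup per byte no matter how many separators there are.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the words in text; every byte listed in separators ends a word.
   The cost per input byte is one table lookup, however long the list is. */
size_t count_words(const char *text, const char *separators)
{
    assert(text != NULL && separators != NULL);

    unsigned char is_sep[256] = {0};
    for (; *separators != '\0'; separators++)
        is_sep[(unsigned char)*separators] = 1;

    size_t words = 0;
    int in_word = 0;
    for (; *text != '\0'; text++) {
        int sep = is_sep[(unsigned char)*text];
        words += (size_t)(!sep && !in_word);  /* a word starts here */
        in_word = !sep;
    }
    return words;
}
```

With plain comparisons, each extra separator would add another compare per byte; with the table, only the setup loop grows.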

Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #3 on: June 11, 2024, 08:07:37 PM »
Hello,

With today's hardware capacity, a lookup table of 256 bytes should not be an issue. The purpose of using assembly is to optimize code, especially where execution speed matters. The lookup table avoids extra jumps: you need only one conditional jump here, the one that closes the loop. Notice that you can reuse the code above, with small modifications to the lookup table, to remove other unwanted characters.
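As a sketch of that last point, the C model below builds the table from a list of unwanted characters, so the copy loop stays the same whatever is being removed. The name `remove_chars` and the `drop` parameter are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, leaving out every byte that appears in drop.
   Returns the new length; dst needs room for strlen(src) + 1 bytes. */
size_t remove_chars(const char *src, char *dst, const char *drop)
{
    assert(src != NULL && dst != NULL && drop != NULL);

    unsigned char advance[256];
    for (int i = 0; i < 256; i++)
        advance[i] = 1;                       /* keep by default, NUL included */
    for (; *drop != '\0'; drop++)
        advance[(unsigned char)*drop] = 0;    /* mark the bytes to remove */

    char *out = dst;
    unsigned char c;
    do {
        c = (unsigned char)*src++;
        *out = (char)c;       /* always store the byte */
        out += advance[c];    /* move on only if the byte is kept */
    } while (c != 0);

    return (size_t)(out - dst) - 1;   /* do not count the NUL */
}
```

For example, passing " \t,;\r\n" as `drop` removes spaces, tabs, commas, semicolons and line breaks in one pass.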

Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #4 on: June 12, 2024, 09:04:42 PM »
Here is another version without the lookup table:

Code: [Select]
.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

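RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al

    add     ecx,1

    mov     ebx,eax
    xor     ebx,32              ; ebx = 0 only for a space
    xor     eax,9               ; eax = 0 only for a tab
    imul    ebx,eax             ; product = 0 only for space or tab (an AND here
                                ; would also be 0 for bytes 0,1,8,33,40,41)
    add     ebx,0FFFFFFFFh      ; carry is set when ebx is nonzero
    adc     edx,0               ; advance the output only for kept bytes

    cmp     eax,9               ; eax = c xor 9, so 9 means c = 0 (end of string)
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,1               ; the NUL terminator advanced edx as well
    sub     eax,buff
    ret

RemoveSpaces ENDP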

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
« Last Edit: June 15, 2024, 04:46:41 PM by Vortex »
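A note on combining the two XOR results into a single zero test: the choice of operation matters. The C sketch below (function names are illustrative only) lists which byte values each combination treats as "drop". A bitwise AND of `c xor 32` and `c xor 9` is zero whenever the two results share no set bits, which happens for more bytes than just space and tab, while their product is zero only when one of the factors is zero.

```c
#include <assert.h>
#include <stdio.h>

/* "Drop" signal when the two XOR results are combined with AND. */
static int drops_with_and(unsigned c) { return ((c ^ 32u) & (c ^ 9u)) == 0; }

/* "Drop" signal when the two XOR results are multiplied instead. */
static int drops_with_mul(unsigned c) { return ((c ^ 32u) * (c ^ 9u)) == 0; }

/* Prints the byte values that the chosen test drops and returns how many. */
int count_dropped(int use_mul)
{
    assert(use_mul == 0 || use_mul == 1);

    int n = 0;
    for (unsigned c = 0; c < 256; c++) {
        int d = use_mul ? drops_with_mul(c) : drops_with_and(c);
        if (d) {
            printf("%u ", c);
            n++;
        }
    }
    printf("\n");
    return n;
}
```

With AND, the listed values are 0, 1, 8, 9, 32, 33, 40 and 41, so '!', '(' and ')' would be removed too; with the product only 9 and 32 are listed. The NUL terminator also falls on different sides, which decides whether the final length needs a `sub eax,1`.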

Offline John Z

Re: Trimming spaces and tabs inside a string
« Reply #5 on: June 13, 2024, 10:47:52 AM »
frankie, Vortex

Thanks for clearing up the table usage.  I just thought it a bit unusual for only two checks.  I used a similar table method in C to build my Base64 encoder/decoder.  I wanted to ask to be sure I was understanding the assembly and not missing something.  It has been a while (pre-486  ;D )

Vortex, in your version without the table, wouldn't you want to check first whether the character is less than 33? This should speed up the code, because most of the 255 characters would need only one check while 32 need two checks, if I grok the code, that is.

Thanks for the inputs.

John Z


Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #6 on: June 13, 2024, 11:03:00 PM »
Hi John,

Quote
wouldn't you want to check first if the character was less than 33?

Can you try the new version below?

Code: [Select]
.386
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

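; Note: this version drops every character below 33 (all control characters
; as well as the space), not only tab and space.
RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    mov     ebx,32              ; highest character code to drop (space)
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al

    add     ecx,1
    cmp     ebx,eax             ; carry is set when 32 < character
    adc     edx,0               ; advance the output only for characters above 32
    test    eax,eax
    jnz     @b

finish:

    mov     eax,edx             ; the NUL did not advance edx
    sub     eax,buff
    ret

RemoveSpaces ENDP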

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start
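The compare-and-add-carry idea boils down to one unsigned comparison per byte. Here is a C sketch of it, with the threshold as a parameter so that the effect of the constant can be checked. The name `remove_up_to` and the `limit` parameter are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, dropping every byte whose value is <= limit.
   Models "cmp limit,c / adc out,0": the carry is set only when limit < c.
   Returns the new length; dst needs room for strlen(src) + 1 bytes. */
size_t remove_up_to(const char *src, char *dst, unsigned limit)
{
    assert(src != NULL && dst != NULL);

    char *out = dst;
    unsigned c;
    do {
        c = (unsigned char)*src++;
        *out = (char)c;        /* always store the byte */
        out += (c > limit);    /* 1 when the byte is kept, else 0 */
    } while (c != 0);

    /* The NUL is never above the limit, so out was not advanced for it. */
    return (size_t)(out - dst);
}
```

With a limit of 32, "Hi !" followed by a line feed becomes "Hi!"; with a limit of 33 the '!' is dropped as well, so the constant has to be one less than the first character that should survive.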

Offline frankie

Re: Trimming spaces and tabs inside a string
« Reply #7 on: June 14, 2024, 01:37:16 PM »
Today's Intel processors are superscalar and support hyper-threading: instructions are pipelined and executed in parallel when possible.
To speed up memory access, data and instructions are preloaded into fast caches, which also hold intermediate results.
The execution pipeline is filled by fetching instructions from the instruction cache while the processor logic arranges the flow for parallel (scalar/superscalar) execution in the ALUs.
In this scenario the caches hold a snapshot of recent memory accesses, and the instruction pipeline holds a snapshot of the expected execution flow, with the advantage of fast exchange with the ALUs and CPU registers. The caches are loaded from and stored to the slower physical memory transparently to the execution (in burst mode, with multiword simultaneous transfers). This whole process contributes to faster execution.
So what happens to the cache contents, and especially to the pipeline, when a branch following a comparison is encountered? There are two cases: if the flow follows the instructions already in the pipeline, the CPU continues at full speed; otherwise the processor stalls while the pipeline is flushed and refilled with the new instruction flow, after a cache refill if needed.
In the latter case we lose the advantages of instruction and data caching, including any results already computed along the wrong branch.
Newer processors use branch prediction to mitigate the problem, but the point largely remains: avoid branches where you can to speed up execution.

Now it should be evident why the table, with only one comparison, is likely the fastest way to do this job, provided the processor can compute the lookup address early enough to have the value in cache before it is needed.
« Last Edit: June 14, 2024, 01:38:50 PM by frankie »
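To make the contrast concrete, here is a C sketch of the same job written once with a data-dependent branch per character and once without. Both function names are made up for this example. Keep in mind that a C compiler is free to turn either form into the other, so this shows the idea only; actual speed has to be measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Branching form: whether the store happens depends on the data, so the
   processor has to predict the outcome for every character. */
size_t remove_branchy(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    char *out = dst;
    for (; *src != '\0'; src++) {
        if (*src != ' ' && *src != '\t')
            *out++ = *src;
    }
    *out = '\0';
    return (size_t)(out - dst);
}

/* Branchless form: always store, then advance by 0 or 1. The only branch
   left is the loop-closing one, taken for every byte but the last. */
size_t remove_branchless(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    char *out = dst;
    unsigned char c;
    do {
        c = (unsigned char)*src++;
        *out = (char)c;
        out += (c != ' ') & (c != '\t');
    } while (c != 0);
    return (size_t)(out - dst) - 1;   /* the NUL advanced out too */
}
```

Both functions should return the same length and produce the same output for any NUL-terminated input.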

Offline John Z

Re: Trimming spaces and tabs inside a string
« Reply #8 on: June 15, 2024, 10:04:46 AM »
Quote
Can you try the new version below?

Great! Thanks, Vortex, for showing it and indulging my inexperience/out-of-date knowledge.

Quote
Advanced new processors are able to compute branch predictions to mitigate the problem, but the point remains substantially "avoid any branch" to speed-up execution.

Thanks frankie - great explanation

John Z

Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #9 on: June 15, 2024, 04:48:20 PM »
Hi John,

You are welcome. I uploaded another version without the lookup table:

https://forum.pellesc.de/index.php?topic=11196.msg39303#msg39303

Offline John Z

Re: Trimming spaces and tabs inside a string
« Reply #10 on: June 16, 2024, 11:32:24 AM »
Quote
I uploaded another version without the lookup table :

https://forum.pellesc.de/index.php?topic=11196.msg39303#msg39303

Vortex - That version is very innovative and shows real brain power!

John Z

Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #11 on: June 16, 2024, 03:39:40 PM »
A loop with shorter instructions:

Code: [Select]
.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

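RemoveSpaces PROC uses ebx str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
    xor     ebx,ebx             ; upper 24 bits of ebx stay zero; only bl changes
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    xor     al,32               ; al = c xor 32
    setnz   ah                  ; ah = 1 unless c is a space
    xor     al,41               ; al = c xor 32 xor 41 = c xor 9
    setnz   bl                  ; bl = 1 unless c is a tab
    and     bl,ah               ; bl = 1 only for bytes to keep
    add     edx,ebx             ; advance the output only for kept bytes

    cmp     al,9                ; c xor 9 = 9 means c = 0 (end of string)
    jnz     @b

finish:

    mov     eax,edx
    sub     eax,1               ; the NUL terminator advanced edx as well
    sub     eax,buff
    ret

RemoveSpaces ENDP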

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start

Offline John Z

Re: Trimming spaces and tabs inside a string
« Reply #12 on: June 17, 2024, 12:53:01 PM »
Even more tricky (sophisticated?). One must realize that 9 xor 32 is 41  :)

neat!

John Z
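The arithmetic behind that remark can be checked for every byte value with a few lines of C. This is only a verification sketch, and the function name is made up: XOR-ing with 32 and then with 41 is the same as XOR-ing with 9, and the result equals 9 only for the NUL terminator, which is what the final `cmp al,9` relies on.

```c
#include <assert.h>

/* Returns 1 if, for every byte value c, the chain (c ^ 32) ^ 41 equals
   c ^ 9 and the result is 9 exactly when c is 0; returns 0 otherwise. */
int xor_chain_holds(void)
{
    assert((9 ^ 32) == 41);

    for (unsigned c = 0; c < 256; c++) {
        unsigned char al = (unsigned char)c;
        al ^= 32;                       /* zero only for a space */
        al ^= 41;                       /* now c ^ 9: zero only for a tab */
        if (al != (unsigned char)(c ^ 9u))
            return 0;
        if ((al == 9) != (c == 0))      /* 9 must mean end of string */
            return 0;
    }
    return 1;
}
```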

Offline Vortex

Re: Trimming spaces and tabs inside a string
« Reply #13 on: June 21, 2024, 08:31:34 PM »
Another version:

Code: [Select]
.686
.model flat,stdcall
option casemap:none

ExitProcess PROTO :DWORD
printf PROTO C :DWORD,:VARARG

includelib  \PellesC\lib\Win\kernel32.lib
includelib  \PellesC\lib\Win\user32.lib
includelib  msvcrt.lib

.data

mystr       db '    This    Is   A       Test String.',0
message     db 'Trimmed string = %s',13,10
            db 'Length of the string = %u',0

.data?

buffer      db 64 dup(?)

.code

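RemoveSpaces PROC str1:DWORD,buff:DWORD

    mov     ecx,str1
    mov     edx,buff
@@:
    movzx   eax,BYTE PTR [ecx]
    mov     BYTE PTR [edx],al
    add     ecx,1

    ; An AND of (c xor 32) and (c xor 9) is also zero for bytes 0,1,8,33,40,41,
    ; so each case is tested on its own through the carry flag instead.
    xor     al,32               ; al = c xor 32
    cmp     al,1                ; carry is set only when c is a space
    sbb     edx,0FFFFFFFFh      ; edx = edx + 1 - carry
    xor     al,41               ; al = c xor 9
    cmp     al,1                ; carry is set only when c is a tab
    sbb     edx,0               ; undo the advance for a tab

    cmp     al,9                ; c xor 9 = 9 means c = 0 (end of string)
    jne     @b

finish:

    mov     eax,edx
    sub     eax,1               ; the NUL terminator advanced edx as well
    sub     eax,buff
    ret

RemoveSpaces ENDP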

start:

    invoke  RemoveSpaces,ADDR mystr,ADDR buffer
    invoke  printf,ADDR message,ADDR buffer,eax
    invoke  ExitProcess,0

END start