
Author Topic: Some intrinsics from the days when intel PL/M-86 was the language of choice!  (Read 4776 times)

oforshell

  • Guest
When I programmed in PL/M-86 during the eighties to mid-nineties there were some built-in functions that I liked a lot, especially for text searches.

int findb (void *,char,int);
int findrb (void *,char,int);
int skipb (void *,char,int);
int skiprb (void *,char,int);
int cmpb (void *,void *,int);
int cmprb (void *,void *,int);

findb/findrb scanned a buffer (void *) of x (int) length for the first/last match for a byte (char). Upon completion the returned value would contain the index of the first/last match. If there was no match 0xffff would be returned.
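
For readers without PL/M-86 at hand, here is a minimal portable C sketch of the two search functions as described above. The names findb_ref and findrb_ref are made up for illustration, and the sketch assumes a "no match" result of -1, which is what an all-ones value (0xffff in 16-bit PL/M-86) looks like as a signed int.

```c
/* Hypothetical reference versions; the _ref suffix is not PL/M-86's. */

/* Index of the first byte equal to c, or -1 if there is none. */
int findb_ref(const void *buf, char c, int len)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i;

    for (i = 0; i < len; i++)
        if (p[i] == (unsigned char)c)
            return i;
    return -1;
}

/* Index of the last byte equal to c, or -1 if there is none. */
int findrb_ref(const void *buf, char c, int len)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i;

    for (i = len - 1; i >= 0; i--)
        if (p[i] == (unsigned char)c)
            return i;
    return -1;
}
```

With the buffer "a,b,c" and length 5, findb_ref finds the comma at index 1 and findrb_ref finds the one at index 3.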

skipb/skiprb scanned a buffer (void *) of x (int) length for the first/last non-match for a byte (char). Upon completion the returned value would be the index of the first/last non-match. If there was no non-match 0xffff would be returned.
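
A similar portable sketch for the skip pair, again with made-up names (skipb_ref, skiprb_ref) and -1 assumed as the "everything matched" result. A typical use is trimming padding characters from either end of a field.

```c
/* Hypothetical reference versions; the _ref suffix is not PL/M-86's. */

/* Index of the first byte that differs from c, or -1 if all bytes equal c. */
int skipb_ref(const void *buf, char c, int len)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i;

    for (i = 0; i < len; i++)
        if (p[i] != (unsigned char)c)
            return i;
    return -1;
}

/* Index of the last byte that differs from c, or -1 if all bytes equal c. */
int skiprb_ref(const void *buf, char c, int len)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i;

    for (i = len - 1; i >= 0; i--)
        if (p[i] != (unsigned char)c)
            return i;
    return -1;
}
```

For the six-byte buffer "  hi  " and the byte ' ', skipb_ref returns 2 and skiprb_ref returns 3, which brackets the non-blank text.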

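cmpb/cmprb compared two buffers of x (int) length byte by byte and reported the position of the first non-match; cmprb worked from the end of the buffers, so its "first" non-match is the one at the highest index. If the buffers were equal 0xffff would be returned.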
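
The compare pair can be sketched in portable C the same way. The names cmpb_ref and cmprb_ref are hypothetical, -1 stands in for the all-ones "buffers are equal" result, and the reverse version is assumed to scan from the highest index downwards.

```c
/* Hypothetical reference versions; the _ref suffix is not PL/M-86's. */

/* Index of the first differing byte, scanning upwards; -1 if equal. */
int cmpb_ref(const void *a, const void *b, int len)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    int i;

    for (i = 0; i < len; i++)
        if (pa[i] != pb[i])
            return i;
    return -1;
}

/* Index of the first differing byte, scanning downwards; -1 if equal. */
int cmprb_ref(const void *a, const void *b, int len)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    int i;

    for (i = len - 1; i >= 0; i--)
        if (pa[i] != pb[i])
            return i;
    return -1;
}
```

Comparing "abcdef" with "abXdYf" over 6 bytes gives 2 from cmpb_ref and 4 from cmprb_ref.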


I've included the OpenWatcom inline assembly definitions for the 32-bit x86:

#pragma aux cmpb = \
"all_same_0:"\
       "  mov edx,ecx"\
       "  jecxz all_same_1"\
       "    repe cmpsb"\
       "    je short all_same_0"\
       "      sub edx,ecx"\
"all_same_1:"\
       "dec edx"\
        parm [esi][edi][ecx] modify exact [ecx edx esi edi] value [edx];
#pragma aux cmprb = \
       "jecxz all_same"\
       "  lea esi,[esi+ecx-1]"\
       "  lea edi,[edi+ecx-1]"\
       "  std"\
       "  repe cmpsb"\
       "  cld"\
       "  jne short not_same"\
"all_same:"\
       "dec ecx"\
"not_same:"\
        parm [esi][edi][ecx] modify exact [ecx esi edi] value [ecx];
#pragma aux findb = \
"not_found_0:"\
       "mov edx,ecx"\
       "jecxz not_found_1"\
       "  repne scasb"\
       "  jne short not_found_0"\
       "    sub edx,ecx"\
"not_found_1:"\
       "dec edx"\
       parm [edi][al][ecx] modify exact [ecx edx edi] value [edx];
#pragma aux skipb = \
"all_same_0:"\
       "mov edx,ecx"\
       "jecxz all_same_1"\
       "  repe scasb"\
       "  je short all_same_0"\
       "    sub edx,ecx"\
"all_same_1:"\
       "dec edx"\
       parm [edi][al][ecx] modify exact [ecx edx edi] value [edx];
#pragma aux findrb = \
       "jecxz not_found"\
       "  lea edi,[edi+ecx-1]"\
       "  std"\
       "  repne scasb"\
       "  cld"\
       "  je short found"\
"not_found:"\
       "dec ecx"\
"found:"\
       parm [edi][al][ecx] modify exact [ecx edi] value [ecx];
#pragma aux skiprb = \
       "jecxz not_found"\
       "  lea edi,[edi+ecx-1]"\
       "  std"\
       "  repe scasb"\
       "  cld"\
       "  jne short found"\
"not_found:"\
       "dec ecx"\
"found:"\
       parm [edi][al][ecx] modify exact [ecx edi] value [ecx];
« Last Edit: February 14, 2011, 11:39:59 PM by oforshell »

Nobody_1707

  • Guest
Wouldn't those have to be char pointers since you're dereferencing them?

Offline AlexN

  • Global Moderator
  • Member
  • *****
  • Posts: 394
    • Alex's Link Sammlung
Since Pelles C supports the inline function specifier (look up its usage in the help file), you could implement these yourself and post them under "User contributions" in the forum. ;)
best regards
 Alex ;)

Offline Pelle

  • Administrator
  • Member
  • *****
  • Posts: 2266
    • http://www.smorgasbordet.com
Not for 6.50, but I may add __cmpsb/__cmpsw/etc. and __scasb/__scasw/etc. later.

The thing is, intrinsics like these are usually added for speed and the Intel optimization guide says "... Using a REP prefix with string move instructions can provide high performance in the situations described above. However, using a REP prefix with string scan instructions (SCASB, SCASW, SCASD, SCASQ) or compare instructions (CMPSB, CMPSW, CMPSD, CMPSQ) is not recommended for high performance. Consider using SIMD instructions instead."
/Pelle
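
To make the "use SIMD instead" suggestion concrete, here is a rough sketch of a forward byte search using SSE2 intrinsics from <emmintrin.h>. The name findb_simd and the HAVE_SSE2_SKETCH macro are made up for illustration. The code assumes unaligned loads are acceptable and falls back to a plain loop when SSE2 is not available; it has not been tuned or benchmarked.

```c
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1   /* hypothetical feature macro for this sketch */
#endif

/* Index of the first byte equal to c, or -1 if there is none. */
int findb_simd(const void *buf, char c, int len)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i = 0;

#ifdef HAVE_SSE2_SKETCH
    const __m128i needle = _mm_set1_epi8(c);

    for (; i + 16 <= len; i += 16) {
        /* Load 16 bytes (unaligned) and compare each against the needle. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));

        if (mask != 0) {
            /* Bit k of mask is set when byte k matched; take the lowest. */
            int k = 0;
            while (!(mask & 1)) {
                mask >>= 1;
                k++;
            }
            return i + k;
        }
    }
#endif

    /* Scalar tail, or the whole buffer when SSE2 is unavailable. */
    for (; i < len; i++)
        if (p[i] == (unsigned char)c)
            return i;
    return -1;
}
```

The same compare-and-movemask pattern covers the skip and compare variants by inverting the mask or comparing two loaded chunks.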

oforshell

  • Guest
The efficiency of the rep block instructions has varied for quite some time. What is true for one manufacturer, processor generation or processor family may not be true for the next. I've experienced some big variations.

Introducing __cmps* and __scas* intrinsics misses the point to a certain degree, since their behaviour is governed by both the direction flag (cld/std) and the zero-flag condition tested by the repeat prefix (repe/repne). To me it isn't obvious which combinations should be implemented and which omitted.
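
To illustrate the four combinations for the byte scan, here is a portable C sketch with both choices exposed as parameters. scanb is a hypothetical helper, not an existing Pelles C or PL/M-86 function, and -1 is assumed as the "ran off the end" result.

```c
/* Hypothetical helper covering all four scasb variants.
   reverse:    0 = scan upwards (cld), 1 = scan downwards (std)
   stop_on_eq: 1 = stop at the first equal byte (repne scasb),
               0 = stop at the first unequal byte (repe scasb) */
int scanb(const void *buf, char c, int len, int reverse, int stop_on_eq)
{
    const unsigned char *p = (const unsigned char *)buf;
    int i = reverse ? len - 1 : 0;
    int step = reverse ? -1 : 1;
    int n;

    for (n = 0; n < len; n++, i += step)
        if ((p[i] == (unsigned char)c) == (stop_on_eq != 0))
            return i;
    return -1;
}
```

With this, findb is scanb(buf, c, len, 0, 1), findrb is (1, 1), skipb is (0, 0) and skiprb is (1, 0). A fixed-name intrinsic would have to pick one of these, or take the flags as extra arguments.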

In the same manner __movs* may also be regulated by the direction flag. One favorite of mine was using it for overlapping moves: when preparing an empty space in a buffer I would use the reverse direction move (i.e. beginning at the numerically highest address and working my way towards numerically lower addresses). When I had a space that had been vacated I'd use a normal move (moving from numerically lower to higher addresses).
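
The two overlapping-move cases can be sketched in portable C as follows. open_gap and close_gap are made-up names, and the caller is assumed to guarantee that the buffer has room for used + gap bytes. In standard C, memmove picks the safe direction by itself; the explicit loops are only there to show which direction each case needs.

```c
#include <stddef.h>

/* Open a gap of `gap` bytes at index `pos` in a buffer holding `used`
   bytes. The destination lies above the source, so copy from the top
   down (the std + rep movsb case). */
void open_gap(unsigned char *buf, size_t used, size_t pos, size_t gap)
{
    size_t i;

    for (i = used; i > pos; i--)
        buf[i - 1 + gap] = buf[i - 1];
}

/* Close a vacated gap of `gap` bytes at index `pos`. The destination
   lies below the source, so copy from the bottom up (the cld + rep
   movsb case). `used` counts the bytes including the gap. */
void close_gap(unsigned char *buf, size_t used, size_t pos, size_t gap)
{
    size_t i;

    for (i = pos + gap; i < used; i++)
        buf[i - gap] = buf[i];
}
```

Copying in the wrong direction in either case would overwrite source bytes before they had been read.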

Testing forward scasb (find and skip) on a single byte vector on a C2D 6400 @ 2.13 GHz gives 6 clks per byte scanned. Forward cmpsb on two vectors gives 5 clks/byte.