I want to do this:
Quote
STEP 1: Search for <img
STEP 2: If found => search for rc="
STEP 3: If found => search for .jpg
I use SCASD. STEP 1 is OK. The second SCASD goes too far.
I also tried another approach: searching for .jpg first, then looking backward for rc=". Always the same problem. It seems that only the first SCASD instruction works well.
I downloaded the AMD64 instruction set reference and read about SCAS.
Quote
Compares the AL, AX, EAX, or RAX register with the byte, word, doubleword, or quadword pointed
to by ES:rDI, sets the status flags in the rFLAGS register according to the results, and then increments
or decrements the rDI register according to the state of the DF flag in the rFLAGS register.
If the DF flag is 0, the instruction increments the rDI register; otherwise, it decrements it. The
instruction increments or decrements the rDI register by 1, 2, 4, or 8, depending on the size of the
operands.
The forms of the SCASx instruction with an explicit operand address the operand at ES:rDI. The
explicit operand serves only to specify the size of the values being compared.
The no-operands forms of the instruction use the ES:rDI registers to point to the value to be compared.
The mnemonic determines the size of the operands and the specific register containing the other
comparison value.
For block comparisons, the SCASx instructions support the REPE or REPZ prefixes (they are
synonyms) and the REPNE or REPNZ prefixes (they are synonyms). For details about the REP
prefixes, see "Repeat Prefixes" on page 12. A SCASx instruction can also operate inside a loop
controlled by the LOOPcc instruction.
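The quoted description can be modelled in C. This is only a rough sketch, assuming DF = 0 and a dword operand; the name repne_scasd_model is made up for this illustration and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models `repne scasd` with DF = 0.
   *rdi and *rcx play the roles of RDI and RCX, eax the role of EAX.
   Returns 1 if a match stopped the scan (ZF = 1), 0 if the count ran out. */
static int repne_scasd_model(const unsigned char **rdi, size_t *rcx, uint32_t eax)
{
    assert(rdi != NULL && rcx != NULL);
    while (*rcx != 0) {
        uint32_t value;
        memcpy(&value, *rdi, 4);  /* load the dword at [rdi] */
        *rdi += 4;                /* rdi always advances by 4 bytes */
        *rcx -= 1;                /* rcx counts dwords, not bytes */
        if (value == eax)
            return 1;             /* match: rdi already points past it */
    }
    return 0;                     /* count exhausted without a match */
}
```

Two things follow from this model. After a match, the pointer is 4 bytes past the matching dword, not on it. And a match in the very last dword also leaves the count at 0, so checking the count alone (as jrcxz does) cannot tell "found in the last dword" from "not found"; testing ZF after the scan, for example with jnz, separates the two cases.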
; http://www.alumnicheerleaders.com/ (URL FOR TEST)
; __________________________________________________________________________________
; _______________________ ParseJpgFile ________________________________________________
; __________________________________________________________________________________
ParseJpgFile PROC USES RDI RSI PARMAREA=4*QWORD
LOCAL _CurrentCount:QWORD
mov rax,OFFSET lpCurrentBuffer ; The html file
mov rdi,[rax]
mov rsi,rdi
mov rax,OFFSET BufferSize ; number of bytes in html buffer
mov rcx,[rax]
shr rcx,2 ; divided by four because I search 4 bytes
mov _CurrentCount,rcx ; save RCX
; =========================================================================
; =========================================================================
; Examples of bytes I can find, not always xxxx.jpg"
; <img src="//monsite.woopic.com/383/f/300x/p/imgtools/img/702daf7fa5fcd2ecd8ce3efffef17d96.jpg"
; <img src="http://api.ning.com/files/mCZ8T-S0x0y9t7RLp0ig6gEugQ1wv5aUW9HI0D80zOi*lwRSkSZVZ9d3IeAiZYkKYHAsJLdVVpj46F2pzoFgtUkDJ7jPbhru/Cheerboots.jpg?crop=1%3A1&width=40"
; <img class="photo photo" src="http://api.ning.com:80/files/LvNfiSNLzicE9UiR73X5lR4kgNr7OgFQzJRmYC23BDbRBzK5LtrgUXZdUkQkogajJ4ea6GCEPbP7bSMttklSreHCaQDsmQg2/1039001571.png?xgip=0%3A4%3A299%3A299%3B%3B&width=48&height=48&crop=1%3A1" alt="" />
@Loop :
cld
mov rcx,_CurrentCount
jrcxz @Finished
mov eax,'gmi<' ; '<img'
repne scasd
jrcxz @Finished ; OK
shr rdi,3 ; experiment: round RDI down to an 8-byte boundary
shl rdi,3
mov eax,'"=cr' ; 'rc="' (SRC=")
repne scasd ; TOO FAR !
jrcxz @Finished
mov rsi,rdi
mov eax,'gpj.' ; '.jpg'
repne scasd
jrcxz @Finished
mov BYTE PTR [rdi],0 ; End of jpeg file name
mov _CurrentCount,rcx
mov rax,OFFSET lpszFileNameFromHtml
mov [rax],rsi
call DownloadThisFile
jmp @Loop
; =========================================================================
; =========================================================================
@Finished :
ret
ParseJpgFile ENDP
Could someone help me?
Thanks
Philippe RIO
Now the problem is solved, I removed the SCASD.
I would like to have some advice from JJ2007 and Vortex (my masters) ::)
Here is a new version.
Better alignment of code and data.
Played with processor L1 cache.
Seems quicker.
Quote from: Jokaste on September 28, 2017, 08:47:49 PM
Now the problem is solved, I removed the SCASD.
I would like to have some advice from JJ2007 and Vortex (my masters) ::)
I feel honoured, Philippe ;-)
scasd advances in dword steps. So if you have a string like "where is the rc=?", then mov eax, "?=cr" is a good start but not sufficient, because it will find the match only if it happens to be at pos 0, 4, 8, ...
You probably need a combination of scasb and cmp [edi-1], eax.
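To make the difference concrete, here is a small C sketch of both search styles. It is only an illustration under my own assumptions: the function names find4_dword_steps and find4_any_offset are hypothetical, and the second one stands in for the "scasb, then compare the dword" idea rather than reproducing it instruction by instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: looks only at offsets 0, 4, 8, ... as scasd does.
   Returns the offset of the first match, or -1. */
static long find4_dword_steps(const char *buf, size_t len, const char *pat4)
{
    assert(buf != NULL && pat4 != NULL);
    for (size_t i = 0; i + 4 <= len; i += 4)
        if (memcmp(buf + i, pat4, 4) == 0)
            return (long)i;
    return -1;
}

/* Hypothetical helper: advances one byte at a time.
   Returns the offset of the first match, or -1. */
static long find4_any_offset(const char *buf, size_t len, const char *pat4)
{
    uint32_t want, have;
    assert(buf != NULL && pat4 != NULL);
    memcpy(&want, pat4, 4);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] != pat4[0])
            continue;                 /* byte scan for the first character */
        memcpy(&have, buf + i, 4);    /* then compare the whole dword */
        if (have == want)
            return (long)i;
    }
    return -1;
}
```

With the 17-byte string "where is the rc=?", the pattern "rc=?" starts at offset 13, which is not a multiple of 4: the dword-step version returns -1 and the byte-step version returns 13.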
I did not know that the first character had to be at a dword-aligned position. I was right when I said that you were my masters.
I suppose it is the same thing with CMPSx. In the end it is not a useful instruction here. It can easily be replaced by the good old CMP.
I have modified my software for better alignment, and I also added the MFENCE and PREFETCHNT1 instructions. Is it a good idea? Since my loop always scans the same buffer, keeping that buffer in the processor cache seemed to me a way to speed up the loop.
Is it possible to do the same loop using XMM registers?
Thanks GOD
Something with C too
#include <stdio.h> // printf
// https://www.w3.org/MarkUp/html3/img.html
char *FindPicUrl(char *str, char **pc1, char **pc2)
{
char *pch = str;
*pc1 = *pc2 = 0; // set both empty
do { // find "<img"
while (*pch && *pch != '<') pch++; // first char to test
if (!*pch || !*(pch+1) || !*(pch+2) || !*(pch+3)) break;
if (*(long*)pch == *(long*)"<img") {
pch += 4;
while (*pch && *pch == ' ') pch++; // remove spaces
while (*pch && *(long*)pch != *(long*)"src=") pch++; // remove attributes
if (*(long*)pch == *(long*)"src=") {
*pc1 = pch+4;
break;
}
}
if (*pch) pch++; // don't go past the end
} while (*pch);
if (*pc1) { // only when "src=" was found: find the closing " or ?
char *p2 = *pc1;
if (*p2 == '\"') p2++; // skip the opening quote
while (*p2 && *p2 != '\"' && *p2 != '?') p2++;
*pc2 = p2;
}
return *pc1;
}
int __cdecl main(void)
{
char str1[] = " <img src=\"//monsite.woopic.com/383/f/300x/p/imgtools/img/702daf7fa5fcd2ecd8ce3efffef17d96.jpg\" junk";
// char str2[] = " <imp src=\"//monsite.woopic.com/383/f/300x/p/imgtools/img/702daf7fa5fcd2ecd8ce3efffef17d96.jpg\" junk";
char str3[] = " <img class=\"photo\" src=\"http://api.foo.com:80/files/L/1039001571.png?xgip=0;width=48&height=48&crop=1%3A1\" alt=\"\" />";
// char str4[] = " <img src=\"' + content.mediaDocuments[i].thumbUrl + '\" class=\"imgClss' + i + '\">";
char *pt1, *pt2;
// pt1 = pt2 = 0;
pt1 = FindPicUrl(str3, &pt1, &pt2);
if (pt2 && *pt2) *(pt2+1) = 0; // cut string
printf("%s\n", pt1);
return 0;
}
Any comments?
Excellent (in French)
This site : http://www.alumnicheerleaders.com/ was a good test.
If that is OK with you, I will include it in my program. As a DLL?
Bravo (https://www.youtube.com/watch?v=k2rxQOVPjw0)
Here is my version (copyright TimoVJL)
#include <tchar.h>
#include <ctype.h>
// https://www.w3.org/MarkUp/html3/img.html
char *FindPicUrl(char *str, char **pc1, char **pc2)
{
char *pch = str;
*pc1 = *pc2 = 0; // set both empty
do { // find "<img"
while (*pch && *pch != '<') pch++; // first char to test
if (!*pch || !*(pch+1) || !*(pch+2) || !*(pch+3)) break;
if (*(long*)pch == *(long*)"<img") {
pch += 4;
while (*pch && *pch == ' ') pch++; // remove spaces
while (*pch && *(long*)pch != *(long*)"src=") pch++; // remove attributes
if (*(long*)pch == *(long*)"src=") {
*pc1 = pch+4;
break;
}
}
if (*pch) pch++; // don't go past the end
} while (*pch);
if (*pc1) { // only when "src=" was found: find the closing " or ?
char *p2 = *pc1;
if (*p2 == '\"') p2++; // skip the opening quote
while (*p2 && *p2 != '\"' && *p2 != '?') p2++;
*pc2 = p2;
}
return (*pc1) ;
}
char *UrlStringCopy(char *__lpszDestination,char *__lpszSource)
{
int _iLength ;
char *_lpDest ;
char _c ;
_iLength = 0 ;
_lpDest = __lpszDestination ;
do
{
_c = *__lpszSource++ ;
if(!((_c && (_iLength < 2048))))
break ;
*_lpDest++ = _c ;
_iLength++ ;
} while (_iLength < 2048) ; // Maximum length of an URL
*_lpDest = '\0' ;
return (__lpszDestination) ;
}
char *FindPictureUrl(char *__lpszStringToSearch,char *__lpszResult)
{
char *_pt1, *_pt2 ;
_pt1 = _pt2 = NULL ;
_pt1 = FindPicUrl(__lpszStringToSearch,&_pt1,&_pt2);
if(_pt2 && *_pt2)
{
*(_pt2 + 1) = 0; // cut string
return (UrlStringCopy(__lpszResult,_pt1)) ;
}
return ((char *) NULL) ;
}
Thanks a lot, TimoVJL, for your functions.
Here is the latest version.
Quote from: Jokaste on September 29, 2017, 03:22:52 PM
Is it possible, using XMM registers to make the same loop?
Sure. Search e.g. for
pcmpeqb xmm0, [edi+16] ; compare packed bytes in [m128] and xmm0 for equality
pmovmskb eax, xmm0 ; set byte mask in eax for second 16 byte chunk
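In C, the same pair of instructions is reachable through the SSE2 intrinsics _mm_cmpeq_epi8 (pcmpeqb) and _mm_movemask_epi8 (pmovmskb). The sketch below scans 16 bytes at a time for a single byte such as '<'; the function name find_byte_sse2 is made up for this example, and it assumes an x86 target with SSE2.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: offset of the first byte equal to c, or -1. */
static long find_byte_sse2(const char *buf, size_t len, char c)
{
    const __m128i needle = _mm_set1_epi8(c);  /* c copied into all 16 lanes */
    size_t i = 0;
    assert(buf != NULL);
    for (; i + 16 <= len; i += 16) {
        /* unaligned load, so buf needs no special alignment */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* pcmpeqb: 0xFF in every lane where the byte equals c */
        __m128i eq = _mm_cmpeq_epi8(chunk, needle);
        /* pmovmskb: one bit per lane, bit 0 = lowest address */
        int mask = _mm_movemask_epi8(eq);
        if (mask != 0) {
            size_t bit = 0;
            while (((mask >> bit) & 1) == 0)
                bit++;                        /* lowest set bit = first match */
            return (long)(i + bit);
        }
    }
    for (; i < len; i++)                      /* tail shorter than 16 bytes */
        if (buf[i] == c)
            return (long)i;
    return -1;
}
```

Once the position of '<' is known, the following bytes can be checked against "img" with an ordinary compare, so the SIMD part only has to do the fast skipping.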
For those who want to tweak it with poasm:
.code
FindPicUrl PROC
mov qword ptr [r8], 0
mov qword ptr [rdx], 0
jmp ?_002
?_001: add rcx, 1
?_002: cmp byte ptr [rcx], 0
jz ?_009
?_003: cmp byte ptr [rcx], 60 ; <
jnz ?_001
cmp byte ptr [rcx], 0
jz ?_009
cmp byte ptr [rcx+1H], 0
jz ?_009
cmp byte ptr [rcx+2H], 0
jz ?_009
cmp byte ptr [rcx+3H], 0
jz ?_009
cmp dword ptr [rcx], 676D693Ch ;"<img"
jnz ?_008
add rcx, 4
jmp ?_005
?_004: add rcx, 1
?_005: cmp byte ptr [rcx], 0
jz ?_007
cmp byte ptr [rcx], 32 ; ' '
jz ?_004
?_006: cmp byte ptr [rcx], 0
jz ?_007
cmp dword ptr [rcx], 3D637273h ;"src="
jz ?_014
add rcx, 1
jmp ?_006
?_007: cmp dword ptr [rcx], 3D637273h ;"src="
jz ?_014
?_008: cmp byte ptr [rcx], 0
jz ?_009
add rcx, 1
cmp byte ptr [rcx], 0
jnz ?_003
?_009: test rcx, rcx
jz ?_013
lea rax, qword ptr [rcx+5H]
jmp ?_011
?_010: add rax, 1
?_011: cmp byte ptr [rax], 0
jz ?_012
cmp byte ptr [rax], 34 ; "
jz ?_012
cmp byte ptr [rax], 63 ; ?
jnz ?_010
?_012: mov qword ptr [r8], rax
?_013: mov rax, qword ptr [rdx]
ret
?_014: lea rax, qword ptr [rcx+4H]
mov qword ptr [rdx], rax
jmp ?_009
ret
FindPicUrl ENDP
END
Source code generated with objconv.exe and edited for poasm.
Some bugs corrected.
Added some SSE/SSE2 instructions.
Changed alignment for some variables.
Replaced the strlen function with one written by Agner Fog (http://www.agner.org/optimize/).
I would like to use the entire library, but it seems to be written for another assembler (gas, nasm?), so I can't.
Hi Jokaste,
Sorry I could not be more helpful. Here is my image library converted to Poasm. Maybe you could use it for your projects.
Quote
LoadImageFromFile
LoadImageFromFile PROC pImageFileName:DWORD
This function loads a BMP, JPG, GIF or WMF image from disc and returns the handle to the image.
pImageFileName is a pointer to the FULL path name of the image file to be displayed
LoadImageFromMem
LoadImageFromMem PROC pImageAddr:DWORD,ImageLen:DWORD
This function returns the handle of an image stored in memory. Valid image formats are
BMP, JPG, GIF and WMF
pImageAddr is a pointer to the location of the image in memory.
ImageLen is the size of the image.
In case of error, both of the functions will return NULL.
Thanks, Vortex. I'll take it. :)