Pelles C forum

C language => Tips & tricks => Topic started by: Grincheux on August 05, 2017, 06:14:59 pm

Title: Perceptual Hash
Post by: Grincheux on August 05, 2017, 06:14:59 pm
Hello the Ghosts

What is a perceptual hash?

A perceptual hash is not a hash in the usual sense, like MD5 or SHA-x.
It is used to find similar images.

Imagine a large collection of images: finding the similar ones in it is difficult.

For each image
____Make a horizontal mirror
____Apply a Gaussian blur
____Resize the image without keeping the W/H ratio (important; it looks like a crop)
____Reduce the color depth by converting to gray (this reduces the color frequencies)

____Compute the average color, then create the hash
____For each color (pixel) i, compare it to the average color:
________IF Color(i) < AverageColor THEN Hash(i) = 0 ;
________IF Color(i) == AverageColor THEN Hash(i) = 1 ;
________IF Color(i) > AverageColor THEN Hash(i) = 2 ;
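The comparison step can be sketched in portable C on a gray buffer that has already been mirrored, blurred, resized and converted to gray. This is a minimal sketch, not the author's code: the function name `ternary_hash` and the one-byte-per-pixel layout are assumptions, and it uses the codes 0/1/2 (the same values as the GFL code in this post).

```c
#include <stddef.h>

/* Ternary "average hash" over n gray pixels (one byte each, hypothetical layout).
   out[i] = 0 if gray[i] is below the average, 1 if equal, 2 if above.
   The average is the truncated integer mean, as in the forum code. */
void ternary_hash(const unsigned char *gray, size_t n, unsigned char *out)
{
    unsigned long sum = 0;
    unsigned long avg;
    size_t i;

    if (n == 0)
        return;

    for (i = 0; i < n; i++)      /* first pass: sum of all pixels */
        sum += gray[i];
    avg = sum / n;               /* integer mean */

    for (i = 0; i < n; i++) {    /* second pass: compare each pixel */
        if (gray[i] < avg)
            out[i] = 0;
        else if (gray[i] == avg)
            out[i] = 1;
        else
            out[i] = 2;
    }
}
```

With a 24x24 image, n would be 576 and `out` would be the buffer that is later passed to the hash function.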


You can now hash it with MD5; identical digests then identify duplicate images.
If you don't hash it with MD5, you can compare the raw hashes to find similar images (80%, 85%, ... of matching values).
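Comparing two raw (unhashed) signatures position by position gives a percentage that can be thresholded at 80%, 85% and so on. A minimal sketch; the name `hash_similarity` and the "fraction of equal positions" metric are assumptions, one simple choice among several.

```c
#include <stddef.h>

/* Percentage of positions where two raw signatures of length n agree.
   Returns 0.0 when n == 0. */
double hash_similarity(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i, same = 0;

    if (n == 0)
        return 0.0;

    for (i = 0; i < n; i++)
        if (a[i] == b[i])
            same++;

    return 100.0 * (double)same / (double)n;
}
```

Two images could then be treated as similar when the result is, say, 85.0 or more; the threshold is a tuning choice.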

On the web, the image is generally resized to 8x8. I think that is too small; I use 32x32.

Code: [Select]
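gflFlipHorizontal(_GflBitmap,NULL) ;                      // Horizontal mirror
gflGaussianBlur(_GflBitmap,NULL,5) ;                      // Gaussian blur
gflResize(_GflBitmap,NULL,24,24,GFL_RESIZE_BILINEAR,0) ;  // Resize to 24x24, ignoring the W/H ratio
gflChangeColorDepth(_GflBitmap,NULL,GFL_MODE_TO_256GREY,GFL_MODE_NO_DITHER) ; // Convert to gray
_lpImage = Image_GflBitmapToImage(_GflBitmap) ;           // Get the image bits (one DWORD per pixel)

// First pass: average color
_lpBits = (LPDWORD) _lpImage->lpImageBits ;
_iAverage = 0 ;

for(_i = 0 ; _i < (24 * 24) ; _i++)
    _iAverage += *_lpBits++ ;

_iAverage /= (24 * 24) ;

// Second pass: 0 = below, 1 = equal, 2 = above the average
_lpBits = (LPDWORD) _lpImage->lpImageBits ;

for(_i = 0 ; _i < (24 * 24) ; _i++)
{
    if(((int) *_lpBits) < _iAverage) _szTmp[_i] = 0 ;
    else if(((int) *_lpBits) == _iAverage) _szTmp[_i] = 1 ;
    else _szTmp[_i] = 2 ;

    _lpBits++ ;  // Next pixel
}

ImageFree(_lpImage) ;  // Free the memory allocated by Image_GflBitmapToImage

// Hash the 24 * 24 values (_szTmp must be 24 * 24 bytes for sizeof) with Keccak-224, then store it as hex
Keccak((unsigned char *)_szTmp,sizeof(_szTmp),_cResult) ;
Hex2Str(_cResult,(unsigned char *) __lpImageInfos->szHashSimilar,224 / 8) ;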
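For readers without GFL, the mirror and blur steps can be sketched on a plain 8-bit gray buffer. The function names are hypothetical, and the blur is a 3x3 binomial kernel (1 2 1 / 2 4 2 / 1 2 1, divided by 16), a small Gaussian approximation; it is not what `gflGaussianBlur(...,5)` does internally.

```c
#include <stdlib.h>
#include <string.h>

/* Mirror each row of a w x h gray image in place (left <-> right). */
void mirror_horizontal(unsigned char *img, int w, int h)
{
    int x, y;
    for (y = 0; y < h; y++) {
        unsigned char *row = img + (size_t)y * w;
        for (x = 0; x < w / 2; x++) {
            unsigned char t = row[x];
            row[x] = row[w - 1 - x];
            row[w - 1 - x] = t;
        }
    }
}

/* Clamp a coordinate to [0, n-1] so border pixels are repeated. */
static int clampi(int v, int n)
{
    return v < 0 ? 0 : (v >= n ? n - 1 : v);
}

/* 3x3 binomial blur (approximates a small Gaussian).
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int blur3x3(unsigned char *img, int w, int h)
{
    static const int k[3][3] = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} };
    unsigned char *tmp;
    int x, y, dx, dy;

    tmp = malloc((size_t)w * h);
    if (tmp == NULL)
        return -1;

    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++) {
            int sum = 0;
            for (dy = -1; dy <= 1; dy++)
                for (dx = -1; dx <= 1; dx++)
                    sum += k[dy + 1][dx + 1] *
                           img[(size_t)clampi(y + dy, h) * w + clampi(x + dx, w)];
            tmp[(size_t)y * w + x] = (unsigned char)((sum + 8) / 16); /* weights sum to 16 */
        }

    memcpy(img, tmp, (size_t)w * h);
    free(tmp);
    return 0;
}
```

Calling `blur3x3()` several times widens the blur, roughly like a larger Gaussian radius.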

There is a library called pHash that does the same job. Unfortunately, I could not compile it with Pelles C... :-[ :-[

If this can help you, I will be pleased.

Some links

Title: Re: Perceptual Hash
Post by: TimoVJL on August 06, 2017, 01:19:42 pm
Did you have problems with that BlockHash?
It's MIT licensed.

The only fix it needed was in void blockhash(). In C, a void function cannot return an expression, so this:
Code: [Select]
    if (width % bits == 0 && height % bits == 0) {
        return blockhash_quick(bits, data, width, height, hash);
    }
becomes:
Code: [Select]
    if (width % bits == 0 && height % bits == 0) {
        blockhash_quick(bits, data, width, height, hash);
        return; /* skip the general (slow) path below */
    }
Title: Re: Perceptual Hash
Post by: Grincheux on August 09, 2017, 07:10:53 pm
No, but it requires an RGBA image, which would force me to make too many changes.
With it, I get all the images and must then select those within a similarity range (80..100%).
With BlockHash and pHash, you have to compact the hash by packing it into bytes.
The raw result looks like 0,0,1,0,0,0,1,0 and has to become 00100010 (binary).
This works when you only compare with the average color using the > operator:
If x > Average => Result = 1 ; else Result = 0.
If you compare with >, < and ==, you need 2 bits per value rather than one.
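A sketch of that packing, assuming the values are stored one per byte. The names `pack_bits1` and `pack_bits2` are hypothetical, and putting the first value in the most significant bits is a choice that matches the 00100010 example.

```c
#include <stddef.h>

/* Pack 1-bit values (0 or 1), most significant bit first:
   {0,0,1,0,0,0,1,0} -> 0x22 (binary 00100010).
   out must hold (n + 7) / 8 bytes. */
void pack_bits1(const unsigned char *v, size_t n, unsigned char *out)
{
    size_t i;
    for (i = 0; i < (n + 7) / 8; i++)
        out[i] = 0;
    for (i = 0; i < n; i++)
        if (v[i])
            out[i / 8] |= (unsigned char)(0x80u >> (i % 8));
}

/* Pack 2-bit values (0, 1 or 2 from the <, ==, > comparison),
   four per byte, first value in the two highest bits.
   out must hold (n + 3) / 4 bytes. */
void pack_bits2(const unsigned char *v, size_t n, unsigned char *out)
{
    size_t i;
    for (i = 0; i < (n + 3) / 4; i++)
        out[i] = 0;
    for (i = 0; i < n; i++)
        out[i / 4] |= (unsigned char)((v[i] & 3u) << (6 - 2 * (i % 4)));
}
```

With 2 bits per value, a 24x24 ternary signature shrinks from 576 bytes to 144.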
What I want is to find similar images automatically, and I think I have found a way.
I tested it on many images and found 95% of the similar images.

Code: [Select]
gflNegative(_GflBitmap,NULL) ; // Invert the colors (reverse all bits)
gflSoften(_GflBitmap,NULL,100) ; // Apply an effect that looks like a blur
gflRotateFine(_GflBitmap,NULL,90.0,NULL) ; // Rotate the image by 90°
gflChangeColorDepth(_GflBitmap,NULL,GFL_MODE_TO_BINARY,GFL_MODE_NO_DITHER) ; // Reduce to black and white
gflResize(_GflBitmap,NULL,8,8,GFL_RESIZE_BILINEAR,0) ; // Resize to 8x8, ignoring the W/H ratio
_lpImage = Image_GflBitmapToImage(_GflBitmap) ; // Get the image bits (8 * 8 pixels * 4 bytes = 256 bytes)
Keccak(_lpImage->lpImageBits,256,_cResult) ; // Classic hash (stronger than MD5); search Google for SPH.LIB
Hex2Str(_cResult,(unsigned char *) __lpImageInfos->szHashSimilar,256 / 8) ; // Convert the 32-byte hash to a hexadecimal string
ImageFree(_lpImage) ; // Free the memory allocated by Image_GflBitmapToImage
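Because this method produces an exact digest, images with the same FlashCode can be found by sorting the hash strings and looking for equal neighbours. A minimal sketch; the `ImageHash` record and `count_duplicates` are hypothetical and not part of the author's program or of GFL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: one image and its hexadecimal hash string. */
typedef struct {
    const char *name;
    const char *hash;
} ImageHash;

/* qsort comparator: order records by hash string. */
static int cmp_hash(const void *a, const void *b)
{
    const ImageHash *x = a, *y = b;
    return strcmp(x->hash, y->hash);
}

/* Sort items by hash (so equal hashes become adjacent) and return how many
   items share their hash with the item just before them, i.e. the number
   of images that duplicate an earlier one. */
size_t count_duplicates(ImageHash *items, size_t n)
{
    size_t i, dups = 0;

    if (n < 2)
        return 0;

    qsort(items, n, sizeof *items, cmp_hash);
    for (i = 1; i < n; i++)
        if (strcmp(items[i].hash, items[i - 1].hash) == 0)
            dups++;
    return dups;
}
```

After the call, each run of equal hash strings in `items` is one group of identical FlashCodes.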

After getting the image bits, I have 256 bytes (8 * 8 pixels * 4 bytes).
Normally I would compute the average color and compare each pixel to it;
this way I get a pseudo-hash. If I convert it to hexadecimal, I get 512 bytes.
wsprintf has a limit of 1024 bytes for its output, which would force me to modify many lines of code.
I tested this approach, but the results were not good enough.
I also tested many kinds of gray conversion.
I have read a lot about this, but most of what I found uses pHash.
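For comparison, the classic 1-bit average hash (x > Average gives 1, else 0) packs an 8x8 image into 64 bits, which prints as only 16 hexadecimal characters instead of 512. A sketch assuming one gray byte per pixel and a truncated integer mean; the names `average_hash64` and `hash64_to_hex` are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* 1-bit average hash of an 8x8 gray image (64 bytes, one per pixel):
   bit = 1 if the pixel is above the mean, else 0; first pixel -> highest bit. */
uint64_t average_hash64(const unsigned char gray[64])
{
    unsigned sum = 0, avg;
    uint64_t h = 0;
    int i;

    for (i = 0; i < 64; i++)
        sum += gray[i];
    avg = sum / 64;

    for (i = 0; i < 64; i++)
        h = (h << 1) | (gray[i] > avg ? 1u : 0u);
    return h;
}

/* Format the hash as 16 lowercase hexadecimal characters plus the terminator. */
void hash64_to_hex(uint64_t h, char out[17])
{
    snprintf(out, 17, "%016llx", (unsigned long long)h);
}
```

Two such hashes can also be compared bit by bit, which is what BlockHash and pHash-style tools do with their packed results.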

The main thing to keep in mind is to convert to gray and resize to 8x8. That gives a kind of FlashCode for the image.
It is very interesting because, across many thousands of (unique) images, every FlashCode is different!
With larger sizes (also tested), the results are bad.
When you resize the image, don't keep the W/H ratio.
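Both points (gray, then 8x8 without keeping the ratio) can be sketched without GFL. Assumptions: RGBA byte order with R first (if the buffer is BGRA, swap p[0] and p[2]), ITU-R BT.601 luma weights (one of several possible gray conversions), and a box-average downscale instead of GFL's bilinear resize. The function names are hypothetical.

```c
#include <stddef.h>

/* Convert packed RGBA pixels (4 bytes each, R first) to 8-bit gray
   with the BT.601 weights 0.299, 0.587, 0.114, in rounded integer math. */
void rgba_to_gray(const unsigned char *rgba, size_t count, unsigned char *gray)
{
    size_t i;
    for (i = 0; i < count; i++) {
        const unsigned char *p = rgba + 4 * i;
        gray[i] = (unsigned char)((299u * p[0] + 587u * p[1] + 114u * p[2] + 500u) / 1000u);
    }
}

/* Shrink a sw x sh gray image to dw x dh by averaging the source pixels in
   each destination cell. The x and y scale factors are independent, so the
   W/H ratio is NOT kept: the whole image is squeezed into dw x dh.
   Assumes 0 < dw <= sw and 0 < dh <= sh, so every cell has at least one pixel. */
void resize_box(const unsigned char *src, int sw, int sh,
                unsigned char *dst, int dw, int dh)
{
    int dx, dy, x, y;
    for (dy = 0; dy < dh; dy++) {
        int y0 = dy * sh / dh, y1 = (dy + 1) * sh / dh;
        for (dx = 0; dx < dw; dx++) {
            int x0 = dx * sw / dw, x1 = (dx + 1) * sw / dw;
            unsigned long sum = 0, cnt = 0;
            for (y = y0; y < y1; y++)
                for (x = x0; x < x1; x++) {
                    sum += src[(size_t)y * sw + x];
                    cnt++;
                }
            dst[(size_t)dy * dw + dx] = (unsigned char)(cnt ? (sum + cnt / 2) / cnt : 0);
        }
    }
}
```

For an 8x8 FlashCode, one would call `rgba_to_gray()` over all w * h pixels, then `resize_box(gray, w, h, small, 8, 8)`.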