Perceptual Hash

Grincheux · August 05, 2017, 06:14:59 PM

Hello the Ghosts

What is a perceptual hash?

A perceptual hash is not a hash in the usual sense, like MD5 or SHA-x.
It is used to find similar images.

Imagine a big collection of images in which it is difficult to find the similar ones.

For each image
____Make a horizontal mirror
____Apply a Gaussian blur
____Resize the image without keeping the W/H ratio (important) (it looks like a crop)
____Reduce the color depth by converting to gray (that reduces the color frequencies)

____Compute the average color and create the hash
____For Each Color
________Compare the current color to the average color
________IF Color(i) < AverageColor THEN Hash(i) = 1 ;
________IF Color(i) == AverageColor THEN Hash(i) = 2 ;
________IF Color(i) > AverageColor THEN Hash(i) = 3 ;
____EndFor
EndFor
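
As a rough illustration, the comparison step of the pseudocode above can be sketched in plain C. This is only a sketch under assumptions: the image is taken to be already mirrored, blurred, resized and converted to one gray byte per pixel, and the name three_level_hash is hypothetical, not part of GFL or any other library.

```c
#include <assert.h>
#include <stddef.h>

/* Fill hash[i] with 1, 2 or 3 depending on whether gray[i] is below,
   equal to, or above the integer average of all pixels.
   Assumption: one gray byte per pixel, count > 0. */
static void three_level_hash(const unsigned char *gray, size_t count,
                             unsigned char *hash)
{
    unsigned long sum = 0;
    unsigned long average;
    size_t i;

    assert(count > 0);

    for (i = 0; i < count; i++)
        sum += gray[i];

    /* Integer division: the fractional part of the average is dropped. */
    average = sum / count;

    for (i = 0; i < count; i++) {
        if (gray[i] < average)
            hash[i] = 1;
        else if (gray[i] == average)
            hash[i] = 2;
        else
            hash[i] = 3;
    }
}
```

For a tiny image holding the gray values 10, 20, 30, 20 the average is 20, so the hash values are 1, 2, 3, 2.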

You can now reduce the hash to an MD5; identical MD5 values then identify the duplicated images.
If you do not reduce it to an MD5, you can compare the hashes directly, which helps you to find similar images (80%, 85%...).
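
One possible way to get such a percentage is to count the positions at which two hashes hold the same value. This is an assumption made for illustration, since how the percentage is computed is not stated here; the name hash_similarity is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage (0..100) of positions at which two hashes of the same
   length hold the same value. Assumption: count > 0. */
static double hash_similarity(const unsigned char *a,
                              const unsigned char *b, size_t count)
{
    size_t i;
    size_t same = 0;

    assert(count > 0);

    for (i = 0; i < count; i++)
        if (a[i] == b[i])
            same++;

    return 100.0 * (double) same / (double) count;
}
```

Two images could then be treated as similar when hash_similarity() returns at least some threshold such as 80.0, while a result of 100.0 means the hashes are identical.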

On the web the image is generally resized to 8x8. I think that is too small; I use 32x32.

Code:

		// Normalize the image: mirror, blur, resize (ratio ignored), convert to 256 grays
		gflFlipHorizontal(_GflBitmap,NULL) ;
		gflGaussianBlur(_GflBitmap,NULL,5) ;
		gflResize(_GflBitmap,NULL,24,24,GFL_RESIZE_BILINEAR,0) ;
		gflChangeColorDepth(_GflBitmap,NULL,GFL_MODE_TO_256GREY,GFL_MODE_NO_DITHER) ;
		_lpImage = Image_GflBitmapToImage(_GflBitmap) ;

		// Compute the average pixel value (pixels are read as 32-bit values)
		_lpBits = (LPDWORD) _lpImage->lpImageBits ;
		_iAverage = 0 ;

		for(_i = 0 ; _i < (24 * 24) ;_i++)
			_iAverage += *_lpBits++ ;

		_iAverage /= (24 * 24) ;
		_lpBits = (LPDWORD) _lpImage->lpImageBits ;

		// Compare each pixel to the average: 0 = below, 1 = equal, 2 = above
		for(_i = 0 ; _i < (24 * 24) ; _i++)
		{
			if(((int) *_lpBits) < _iAverage)		_szTmp[_i] = 0 ;
			else if(((int) *_lpBits) == _iAverage)	_szTmp[_i] = 1 ;
			else 									_szTmp[_i] = 2 ;

			_lpBits++ ;
		}

		// Hash the comparison results, then convert the digest (224 / 8 bytes) to a hexadecimal string
		Keccak((unsigned char *)_szTmp,sizeof(_szTmp),_cResult) ;
		Hex2Str(_cResult,(unsigned char *) __lpImageInfos->szHashSimilar,224 / 8) ;

There is a library called pHash that does the same job. Unfortunately I could not compile it with Pelles C...

If this helps you, I will be pleased.

Some links
http://hzqtc.github.io/2013/04/image-duplication-detection.html
https://hosunghwang.wordpress.com/2015/03/12/perceptual-hash/
https://scholar.google.fr/scholar?hl=fr&q=perceptual+hash&btnG=&lr=
https://news.ycombinator.com/item?id=2614797

TimoVJL · August 06, 2017, 01:19:42 PM

Did you have problems with BlockHash?
It's MIT licensed.

The only fix it needed was in void blockhash(), changing it from

Code:

    if (width % bits == 0 && height % bits == 0) {
        return blockhash_quick(bits, data, width, height, hash);
    }

to

Code:

    if (width % bits == 0 && height % bits == 0) {
        blockhash_quick(bits, data, width, height, hash);
        return;
    }

Grincheux · August 09, 2017, 07:10:53 PM

No, but it requires an RGBA image, which would force me to make too many changes.
With it I get all the images back, and I must then select those within a range (80..100%).
With BlockHash and pHash, you have to reduce the hash by packing it into bytes.
The result looks like this: 0,0,1,0,0,0,1,0 and has to become 00100010 (binary).
This is true if you only compare with the average color using the > operator.
If x > Average => Result = 1 ; else Result = 0.
If you compare with >, < and == you need 2 bits per value rather than one.
What I want is to get the similar images automatically, and I think I have found a way.
I tested it on many images and found 95% of the similar images.
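
As an illustration of the packing described above (0,0,1,0,0,0,1,0 becoming 00100010), here is a minimal sketch in plain C. The function names pack_bits and pack_pairs are hypothetical, and the choice of putting the first value in the most significant bit is an assumption that simply matches the example; BlockHash and pHash may order their bits differently.

```c
#include <assert.h>
#include <stddef.h>

/* Pack 0/1 values into bytes, first value in the most significant bit.
   count must be a multiple of 8; out must hold count / 8 bytes. */
static void pack_bits(const unsigned char *bits, size_t count,
                      unsigned char *out)
{
    size_t i;

    assert(count % 8 == 0);

    for (i = 0; i < count / 8; i++)
        out[i] = 0;

    for (i = 0; i < count; i++)
        if (bits[i])
            out[i / 8] |= (unsigned char) (0x80u >> (i % 8));
}

/* Pack values in the range 0..2 (the <, == and > results) using
   2 bits each, four values per byte, first value in the top bits.
   count must be a multiple of 4; out must hold count / 4 bytes. */
static void pack_pairs(const unsigned char *vals, size_t count,
                       unsigned char *out)
{
    size_t i;

    assert(count % 4 == 0);

    for (i = 0; i < count / 4; i++)
        out[i] = 0;

    for (i = 0; i < count; i++) {
        unsigned shift = (unsigned) (6 - 2 * (i % 4));
        out[i / 4] |= (unsigned char) ((vals[i] & 3u) << shift);
    }
}
```

With pack_bits(), the eight values 0,0,1,0,0,0,1,0 give the single byte 0x22, which is 00100010 in binary. With pack_pairs(), the four values 0,1,2,1 give 0x19 (00 01 10 01).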

Code:

		gflNegative(_GflBitmap,NULL) ; // Invert the image (negative)
		gflSoften(_GflBitmap,NULL,100) ; // Apply an effect that looks like a blur
		gflRotateFine(_GflBitmap,NULL,90.0,NULL) ; // Rotate the image by 90°
		gflChangeColorDepth(_GflBitmap,NULL,GFL_MODE_TO_BINARY,GFL_MODE_NO_DITHER) ; // Reduce colors
		gflResize(_GflBitmap,NULL,8,8,GFL_RESIZE_BILINEAR,0) ; // Resize image to 8x8
		_lpImage = Image_GflBitmapToImage(_GflBitmap) ; // Get the image bits
		Keccak(_lpImage->lpImageBits,256,_cResult) ; // Create a classic hash (stronger than MD5); search for SPH.LIB with Google
		Hex2Str(_cResult,(unsigned char *) __lpImageInfos->szHashSimilar,256 / 8) ; // Convert the hash to a hexadecimal string
		ImageFree(_lpImage) ; // Free the memory allocated by Image_GflBitmapToImage

After getting the image bits I have 256 bytes (8 * 8 * 4).
Normally I would compute the average color and compare it to the image bits;
that gives a pseudo hash. If I convert it to hexadecimal I get 512 bytes.
sprintf and wsprintf have a limit of 1024 bytes for the result, so this would force me to modify many lines of code.
I tested this approach, but the results were not good enough.
I have also tested many kinds of gray conversion.
I have read many articles, but too many of them use pHash.

The main thing to keep in mind is to convert the image to gray and resize it to 8x8. That gives a FlashCode for the image.
It is very interesting because over many thousands of unique images, the FlashCode is different each time!
For greater sizes (also tested) the results are bad.
When you resize the image, do not preserve the aspect ratio.
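
To make the resize step concrete, here is a minimal sketch in plain C of shrinking a gray image to 8x8 while ignoring the aspect ratio. It is an assumption-laden illustration: it uses a simple box average rather than the bilinear filter of gflResize, it expects one gray byte per pixel, and the name shrink_to_8x8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIDE 8

/* Shrink a w x h gray image (one byte per pixel) to 8x8 by averaging
   the source pixels that fall into each destination cell. Width and
   height are scaled independently, so the aspect ratio is not kept.
   Assumption: w >= 8 and h >= 8. */
static void shrink_to_8x8(const unsigned char *src, size_t w, size_t h,
                          unsigned char *dst)
{
    size_t cx, cy, x, y;

    assert(w >= HASH_SIDE && h >= HASH_SIDE);

    for (cy = 0; cy < HASH_SIDE; cy++) {
        size_t y0 = cy * h / HASH_SIDE;
        size_t y1 = (cy + 1) * h / HASH_SIDE;

        for (cx = 0; cx < HASH_SIDE; cx++) {
            size_t x0 = cx * w / HASH_SIDE;
            size_t x1 = (cx + 1) * w / HASH_SIDE;
            unsigned long sum = 0;

            for (y = y0; y < y1; y++)
                for (x = x0; x < x1; x++)
                    sum += src[y * w + x];

            /* Average of the cell; each cell holds at least one pixel. */
            dst[cy * HASH_SIDE + cx] =
                (unsigned char) (sum / ((y1 - y0) * (x1 - x0)));
        }
    }
}
```

A 16x8 image and a 1600x8 image both end up as 64 bytes, so wide and tall versions of the same picture produce buffers that can be compared directly.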
