Different signature hashes between IM versions?

Posted: 2013-05-10T23:52:35-07:00
by trog
Hi,

I've just discovered the 'signature' field reported by the "identify" command line tool after looking for a way to compare a set of images with known properties - the same JPEG image data but different EXIF fields.

I thought it'd be a great way to identify duplicate images, but from initial testing it seems that the signature is reported differently between different ImageMagick versions.

I've tried on three different systems, all of which just happen to have different versions of ImageMagick installed. I get something like the following between them (each section includes the ImageMagick version, a sha1sum test to show that it is exactly the same file on each system, and then the identify signature command):

==== Windows box ====

identify.exe -version
Version: ImageMagick 6.8.1-10 2013-01-10 Q16 http://www.imagemagick.org

C:\sha1.exe test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

C:\>identify -verbose -format "%#" test1.jpg
313bfbd3915ba85b0aea12eac03e033ab62d0124d56815e6c1514bd9dbad49d8

==== Linux Box #1 ====

box:/home/vps/Dropbox# identify -version
Version: ImageMagick 6.6.0-4 2012-05-03 Q16 http://www.imagemagick.org

sha1sum test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

box:/home/Dropbox# identify -verbose -format "%#" test1.jpg
aa53e45042741398ecfe95f019af5c1331bb4d6678c4513f80adac4f784b0155

==== Linux Box #2 ====

[html]$ identify -version
Version: ImageMagick 6.2.8 02/25/09 Q16 file:/usr/share/ImageMagick-6.2.8/doc/index.html

sha1sum test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

[html]$ identify -verbose -format "%#" test1.jpg
26d775bb0beb2aff6e58ea551ed9348a069d1aac69c9451f05b0ca798b9b8d74

I basically just wanted to check to see if I'm doing anything obviously boneheaded here; I am wondering if the hashing algorithm might've changed between IM versions or something?

Thanks for any suggestions.

Re: Different signature hashes between IM versions?

Posted: 2013-05-11T06:24:49-07:00
by magick
The image signature algorithm has changed between some releases of ImageMagick: once for a bug fix, once to create invariant signatures between quantum depths of ImageMagick, and a final change in ImageMagick version 7, since it supports variable channels (e.g. grayscale is only 1 channel, whereas in IMv6 it's 3 channels).
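
Since signatures are only comparable when they come from the same algorithm, one cautious option is to store the ImageMagick version next to each signature. The following is a minimal Python sketch of that idea, not anything ImageMagick provides: the function names `parse_im_version` and `versioned_key` are made up for illustration, and the regular expression assumes the `identify -version` output format shown in the first post. Keying on the full version string is deliberately conservative, because the thread does not say which releases changed the algorithm.

```python
import re
from typing import Optional

# Assumption: the version line looks like the ones quoted in this thread,
# e.g. "Version: ImageMagick 6.8.1-10 2013-01-10 Q16 ...".
_VERSION_RE = re.compile(r"ImageMagick\s+(\d+(?:\.\d+)+(?:-\d+)?)")


def parse_im_version(version_output: str) -> Optional[str]:
    """Return the version string (e.g. '6.8.1-10'), or None if not found."""
    match = _VERSION_RE.search(version_output)
    return match.group(1) if match else None


def versioned_key(version_output: str, signature: str) -> str:
    """Combine version and signature so only same-version signatures compare equal."""
    version = parse_im_version(version_output)
    if version is None:
        raise ValueError("could not find an ImageMagick version string")
    return f"{version}:{signature}"
```

With keys built this way, the three signatures from the first post would never be mistaken for three different images; they would simply fail to match because their version prefixes differ.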

Re: Different signature hashes between IM versions?

Posted: 2013-05-11T16:11:26-07:00
by trog
No problems, assumed it was something like that. Thanks for the reply.

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T08:54:47-07:00
by hyanwong
magick wrote:The image signature algorithm has changed between some releases of ImageMagick.
Is this now considered stable and unchanging, or will it be subject to change in the future? If I want to store the image signature of many millions of images in a large and long-term database, how sure can I be that future upgrades of ImageMagick will still produce signature hashes that can be used to detect old duplicates?

Otherwise I could just take the SHA2 of the raw file, which should never change. But then I don't get the nice IM feature of ignoring differences in EXIF data associated with an image.
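
For reference, the raw-file alternative mentioned above can be sketched in a few lines of Python using the standard `hashlib` module. The helper name `file_sha256` and the chunk size are arbitrary choices for illustration. Because every byte of the file feeds the digest, two files that differ only in their EXIF block will produce different hashes, which is exactly the limitation described.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```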

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T09:15:29-07:00
by magick
The image signatures are stable. We do not anticipate any changes in the future.

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T10:49:08-07:00
by hyanwong
Great, thanks. If it does change in the future, perhaps you could consider adding a "backwards compatibility" flag?

Out of interest, are there any plans to try adding perceptual hashes in the future?

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T11:19:35-07:00
by magick
We do have plans to support perceptual hashes but do not have an ETA.

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T16:01:23-07:00
by trog
hyanwong wrote: Is this now considered stable and unchanging, or will it be subject to change in the future? If I want to store the image signature of many millions of images in a large and long-term database, how sure can I be that future upgrades of ImageMagick will still produce signature hashes that can be used to detect old duplicates?

Otherwise I could just take the SHA2 of the raw file, which should never change. But then I don't get the nice IM feature of ignoring differences in EXIF data associated with an image.
I worked around it in the end by using the neat PHP JPEG Metadata toolkit - I wrote up a bit of a description about the process here: http://trog.qgl.org/20130511/image-data ... jpeg-files

I need this to work more reliably across systems with a variety of older IMs installed where I can't get them upgraded, so this was a more portable solution for me.

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T18:35:39-07:00
by GreenKoopa
trog wrote: I worked around it in the end by using the neat PHP JPEG Metadata toolkit
Does your method hash the pixel data or the jpeg compressed data?

Re: Different signature hashes between IM versions?

Posted: 2013-07-03T19:10:53-07:00
by trog
GreenKoopa wrote:
trog wrote: I worked around it in the end by using the neat PHP JPEG Metadata toolkit
Does your method hash the pixel data or the jpeg compressed data?
I believe it is the JPEG compressed data - I can't recall (& don't have time to check - sorry!) but I believe the PHP JPEG Metadata Toolkit just pulls the data stream unaltered directly from the JPEG images.
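
The general idea of hashing the compressed JPEG stream while ignoring metadata can be sketched without ImageMagick at all. The Python below is an independent illustration, not a reproduction of the PHP JPEG Metadata Toolkit or of the linked write-up. It assumes that "metadata" means the APPn segments (markers 0xFFE0 to 0xFFEF, which include EXIF) and comment segments (0xFFFE), and that everything from the first start-of-scan marker onward should be hashed verbatim. The function name `jpeg_content_digest` is hypothetical.

```python
import hashlib
import struct


def jpeg_content_digest(data: bytes) -> str:
    """SHA-256 of a JPEG byte stream with APPn and COM segments left out."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.sha256()
    digest.update(data[:2])  # the SOI marker itself
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker == 0xDA:
            # Start of scan: hash the rest (scan header, entropy-coded
            # data, later segments, EOI) exactly as stored.
            digest.update(data[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        # Assumption: APP0..APP15 and COM are metadata for this purpose.
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(data[pos:end])
        pos = end
    raise ValueError("no start-of-scan marker found")
```

Two caveats follow from these assumptions. Dropping every APPn segment also drops the JFIF and Adobe headers, which can affect how a decoder interprets the image, so two files with equal digests are not guaranteed to render identically. And because this hashes compressed data rather than pixels, the same picture re-encoded at a different quality will not match.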