Different signature hashes between IM versions?

trog
Posts: 4
Joined: 2013-05-10T23:36:03-07:00

Different signature hashes between IM versions?

Post by trog »

Hi,

I've just discovered the 'signature' field reported by the "identify" command-line tool while looking for a way to compare a set of images with known properties: the same JPEG image data but different EXIF fields.

I thought it'd be a great way to identify duplicate images, but from initial testing it seems that the reported signature differs between ImageMagick versions.

I've tried on three different systems, each of which happens to have a different version of ImageMagick installed, and I get something like the following (each section shows the ImageMagick version, a sha1sum check to show that it is exactly the same file on each system, and then the identify signature command):

==== Windows box ====

identify.exe -version
Version: ImageMagick 6.8.1-10 2013-01-10 Q16 http://www.imagemagick.org

C:\sha1.exe test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

C:\>identify -verbose -format "%#" test1.jpg
313bfbd3915ba85b0aea12eac03e033ab62d0124d56815e6c1514bd9dbad49d8

==== Linux Box #1 ====

box:/home/vps/Dropbox# identify -version
Version: ImageMagick 6.6.0-4 2012-05-03 Q16 http://www.imagemagick.org

sha1sum test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

box:/home/Dropbox# identify -verbose -format "%#" test1.jpg
aa53e45042741398ecfe95f019af5c1331bb4d6678c4513f80adac4f784b0155

==== Linux Box #2 ====

[html]$ identify -version
Version: ImageMagick 6.2.8 02/25/09 Q16 file:/usr/share/ImageMagick-6.2.8/doc/index.html

sha1sum test1.jpg
238fd30e063585c8e2d3572de2c4765dd95bae1d test1.jpg

[html]$ identify -verbose -format "%#" test1.jpg
26d775bb0beb2aff6e58ea551ed9348a069d1aac69c9451f05b0ca798b9b8d74

I basically just wanted to check whether I'm doing anything obviously boneheaded here. Could the hashing algorithm have changed between IM versions?

Thanks for any suggestions.
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Different signature hashes between IM versions?

Post by magick »

The image signature algorithm has changed in a few releases of ImageMagick: once for a bug fix, once to make signatures invariant across the quantum depths of ImageMagick, and a final time in ImageMagick version 7, since it supports a variable number of channels (e.g. grayscale is only 1 channel, whereas in IMv6 it's 3 channels).
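As a rough illustration of why quantum depth and channel count can change a pixel-based signature, here is a minimal Python sketch. It is hypothetical and is not ImageMagick's actual signature code; it only assumes that a signature is a SHA-256 digest computed over pixel samples. Hashing the samples as stored makes a Q8 and a Q16 copy of the same image disagree, scaling every sample to [0, 1] first removes that difference, and storing grayscale as three channels instead of one still changes the digest. `raw_signature` and `normalized_signature` are made-up names for illustration.

```python
import hashlib
import struct

# Hypothetical illustration only; this is not ImageMagick's signature code.
# Samples are plain integers, e.g. one gray value per pixel.

def raw_signature(samples, depth):
    """SHA-256 over samples packed exactly as stored at `depth` bits (8 or 16)."""
    fmt = {8: ">B", 16: ">H"}[depth]  # 1 byte per sample for Q8, 2 bytes for Q16
    data = b"".join(struct.pack(fmt, s) for s in samples)
    return hashlib.sha256(data).hexdigest()

def normalized_signature(samples, depth):
    """SHA-256 over samples scaled to [0.0, 1.0], so the storage depth drops out."""
    max_value = (1 << depth) - 1  # 255 for Q8, 65535 for Q16
    data = b"".join(struct.pack(">d", s / max_value) for s in samples)
    return hashlib.sha256(data).hexdigest()

# The same three gray pixels at Q8 and at Q16 (65535 / 255 == 257).
q8 = [0, 128, 255]
q16 = [s * 257 for s in q8]

print(raw_signature(q8, 8) == raw_signature(q16, 16))                # False
print(normalized_signature(q8, 8) == normalized_signature(q16, 16))  # True

# Storing gray as three identical channels (R=G=B) still changes the digest.
gray_as_rgb = [s for s in q8 for _ in range(3)]
print(normalized_signature(gray_as_rgb, 8) == normalized_signature(q8, 8))  # False
```

Normalizing works here because 257 * v / 65535 and v / 255 are the same real number, so the floating-point division rounds both to the same double.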
trog
Posts: 4
Joined: 2013-05-10T23:36:03-07:00

Re: Different signature hashes between IM versions?

Post by trog »

No problem, I assumed it was something like that. Thanks for the reply.
hyanwong
Posts: 5
Joined: 2013-07-02T14:06:45-07:00

Re: Different signature hashes between IM versions?

Post by hyanwong »

magick wrote: The image signature algorithm has changed between some releases of ImageMagick.
Is this now considered stable and unchanging, or will it be subject to change in the future? If I want to store the image signature of many millions of images in a large and long-term database, how sure can I be that future upgrades of ImageMagick will still produce signature hashes that can be used to detect old duplicates?

Otherwise I could just take the SHA2 of the raw file, which should never change. But then I don't get the nice IM feature of ignoring differences in EXIF data associated with an image.
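For comparison, hashing the raw file as suggested above takes only a few lines with Python's standard `hashlib`; the function name `file_sha256` is just for illustration. Because every byte of the file is hashed, editing any EXIF field changes the digest, which is exactly the trade-off described above.

```python
import hashlib

def file_sha256(path, chunk_size=1 << 16):
    """SHA-256 of a file's raw bytes, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. print(file_sha256("test1.jpg"))
```

Unlike the IM signature, this digest depends only on the bytes and SHA-256 itself, so it will not change between ImageMagick versions.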
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Different signature hashes between IM versions?

Post by magick »

The image signatures are stable. We do not anticipate any changes in the future.
hyanwong
Posts: 5
Joined: 2013-07-02T14:06:45-07:00

Re: Different signature hashes between IM versions?

Post by hyanwong »

Great, thanks. If it does change in the future, perhaps you could consider adding a "backwards compatibility" flag?

Out of interest, are there any plans to try adding perceptual hashes in the future?
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Different signature hashes between IM versions?

Post by magick »

We do have plans to support perceptual hashes but do not have an ETA.
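As a generic illustration of what a perceptual hash is (not ImageMagick's planned or actual implementation), here is a minimal average-hash (aHash) sketch in Python. It assumes the image has already been decoded to grayscale and shrunk to a small grid, commonly 8x8; `average_hash` and `hamming_distance` are hypothetical helpers. Each bit records whether a pixel is brighter than the mean, so adding a constant brightness offset (without clipping) leaves the hash unchanged, and images are compared by Hamming distance rather than exact equality.

```python
def average_hash(gray):
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean.

    `gray` is a small grid (list of rows) of grayscale values, assumed to be
    already decoded and downscaled, e.g. to 8x8 for a 64-bit hash.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)  # first pixel ends up as the top bit
    return bits

def hamming_distance(a, b):
    """Number of differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")

# A tiny 4x4 example: dark left half, bright right half.
grid = [[10, 10, 200, 200]] * 4
brighter = [[p + 20 for p in row] for row in grid]
mirrored = [row[::-1] for row in grid]

print(hex(average_hash(grid)))                                        # 0x3333
print(hamming_distance(average_hash(grid), average_hash(brighter)))   # 0
print(hamming_distance(average_hash(grid), average_hash(mirrored)))   # 16
```

What distance counts as "the same image" is an application-specific threshold, which is the main practical difference from an exact signature.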
trog
Posts: 4
Joined: 2013-05-10T23:36:03-07:00

Re: Different signature hashes between IM versions?

Post by trog »

hyanwong wrote: Is this now considered stable and unchanging, or will it be subject to change in the future? If I want to store the image signature of many millions of images in a large and long-term database, how sure can I be that future upgrades of ImageMagick will still produce signature hashes that can be used to detect old duplicates?

Otherwise I could just take the SHA2 of the raw file, which should never change. But then I don't get the nice IM feature of ignoring differences in EXIF data associated with an image.
I worked around it in the end by using the neat PHP JPEG Metadata Toolkit. I wrote up a description of the process here: http://trog.qgl.org/20130511/image-data ... jpeg-files

I need this to work reliably across systems that have a variety of older IM versions installed and that I can't get upgraded, so this was a more portable solution for me.
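The general idea of this workaround, hashing the JPEG compressed data while leaving out the metadata, can also be sketched in Python. This is a hand-rolled marker walk under stated assumptions, not how the PHP JPEG Metadata Toolkit works: it treats the APP0 to APP15 and COM segments as metadata and skips them, hashes every other header segment, and hashes everything from the first start-of-scan (SOS) marker onward unaltered. `jpeg_data_sha256` and the marker-set names are hypothetical.

```python
import hashlib

# Assumption: APP0..APP15 (0xE0-0xEF) and COM (0xFE) segments count as metadata.
METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}
SOS, EOI = 0xDA, 0xD9
# Markers that stand alone, with no length field: RST0-RST7, EOI, TEM.
STANDALONE_MARKERS = set(range(0xD0, 0xD8)) | {EOI, 0x01}

def jpeg_data_sha256(data: bytes) -> str:
    """SHA-256 of a JPEG byte string with APPn/COM segments left out (sketch)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    h = hashlib.sha256(data[:2])  # keep the SOI marker
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < len(data) and data[pos] == 0xFF:  # skip 0xFF fill bytes
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated marker")
        marker = data[pos]
        start = pos - 1  # the 0xFF right before the marker code
        pos += 1
        if marker in STANDALONE_MARKERS:
            h.update(data[start:pos])
            if marker == EOI:
                break
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment length")
        length = int.from_bytes(data[pos:pos + 2], "big")  # includes its own 2 bytes
        end = pos + length
        if length < 2 or end > len(data):
            raise ValueError(f"bad segment length at offset {start}")
        if marker == SOS:
            # From the first scan on, hash the rest unaltered: entropy-coded data
            # plus any later markers (e.g. further scans in progressive JPEGs).
            h.update(data[start:])
            break
        if marker not in METADATA_MARKERS:
            h.update(data[start:end])
        pos = end
    return h.hexdigest()
```

Usage would be along the lines of `jpeg_data_sha256(open("test1.jpg", "rb").read())`. Two caveats of this design: since it hashes compressed bytes, re-encoding the same pixels gives a different digest, and skipping every APPn segment also drops ones that can affect decoding (such as an Adobe APP14 color-transform flag), which is a simplification.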
GreenKoopa
Posts: 457
Joined: 2010-11-04T17:24:08-07:00

Re: Different signature hashes between IM versions?

Post by GreenKoopa »

trog wrote: I worked around it in the end by using the neat PHP JPEG Metadata toolkit
Does your method hash the pixel data or the jpeg compressed data?
trog
Posts: 4
Joined: 2013-05-10T23:36:03-07:00

Re: Different signature hashes between IM versions?

Post by trog »

GreenKoopa wrote:
trog wrote: I worked around it in the end by using the neat PHP JPEG Metadata toolkit
Does your method hash the pixel data or the jpeg compressed data?
I believe it is the JPEG compressed data. I can't recall (and don't have time to check, sorry!), but I believe the PHP JPEG Metadata Toolkit just pulls the data stream, unaltered, directly from the JPEG images.