Problems with combining Set and Mogrify

PerlMagick is an object-oriented Perl interface to ImageMagick. Use this forum to discuss, make suggestions about, or report bugs concerning PerlMagick.
gaimrox
Posts: 11
Joined: 2012-05-24T15:34:58-07:00
Authentication code: 13

Problems with combining Set and Mogrify

Post by gaimrox »

Hi,

I'm attempting to detect duplicates among our internal image hosting service. Currently it has about 65K images, a fair number of which I suspect are duplicates.

We have been using ImageMagick to open the images as uploaded from the user, validate them, and then store them into our database for about 5 years.

My algorithm is as follows:
Open each image one at a time
Perform a mogrify->strip to remove all comment/exif data
If the image is a PNG, strip the date:create and date:modify that randomly started being added in like 2009 (prior to that the images lack this problem)
Store the image
Determine the MD5 of the image and store that for a later deduping sweep
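
The later deduping sweep over the stored digests could be sketched roughly as follows. This is only an illustration using core Perl: `find_duplicates` and the in-memory hash of blobs are hypothetical stand-ins for whatever the database layer provides, and the blobs are assumed to be already normalized.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: group image ids by the MD5 of their normalized blob.
# %$blobs maps an id (e.g. a database key) to the raw image bytes.
sub find_duplicates {
    my ($blobs) = @_;
    my %ids_by_digest;
    for my $id (sort keys %$blobs) {
        push @{ $ids_by_digest{ md5_hex($blobs->{$id}) } }, $id;
    }
    # Keep only the digests shared by more than one image.
    return [ grep { @$_ > 1 } map { $ids_by_digest{$_} } sort keys %ids_by_digest ];
}

my $groups = find_duplicates({
    a => "same bytes",
    b => "other bytes",
    c => "same bytes",
});
print join(',', @$_), "\n" for @$groups;    # prints "a,c"
```

Because the digest is the hash key, checking a new upload is a single lookup rather than a comparison against every stored image.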

In the case of lossy images it's interesting that cycled images (those added, downloaded and re-added) will not be detected as duplicates here, but there isn't too much I can do about that at this point.

So here is my problem: I wrote all the above code and deployed it. I then found that the PNG images still had date:create/date:modify even though the API call was made. I believe something is broken, because when I swapped the order of the mogrify and the date stripping, the output changed in an unexpected way.

Here is my code, along with the MD5 printed at various points of the processed image:

Code:

warn md5_hex($im->ImageToBlob());  # 2985ceb411ffc2ca80e845c09f389160

# strip out unique attrs from the image that might mess up the final file
$im->Mogrify('strip');

warn md5_hex($im->ImageToBlob()); # de4c581bde9a6c7d5b30d234ac37167e
  
$im->Set( 'date:modify' => '');
warn md5_hex($im->ImageToBlob()); # de4c581bde9a6c7d5b30d234ac37167e

$im->Set( 'date:create' => '');

warn md5_hex($im->ImageToBlob()); # de4c581bde9a6c7d5b30d234ac37167e

I then flipped the order of the mogrify and the date modify:

Code:

warn md5_hex($im->ImageToBlob()); # 5f6c94c736f6614a17449bdc6710fd96
 
$im->Set( 'date:modify' => '');
 
warn md5_hex($im->ImageToBlob()); # b5f0d9a8df86ff58ed1b345acc533b78

$im->Set( 'date:create' => '');

warn md5_hex($im->ImageToBlob()); # 9278c7386812b311593b5143331ced52

# strip out unique attrs from the image that might mess up the final file
$im->Mogrify('strip');
  
warn md5_hex($im->ImageToBlob()); # de4c581bde9a6c7d5b30d234ac37167e

Note that once Mogrify is run, the MD5 never changes. I have determined that Mogrify does not alter date:create/date:modify, so that is not good. I think this is a bug. Thoughts?

As a sidenote I hope others can find this post about how to clear date:modify and date:create as the documentation is VERY confusing about how to do this. I'm still not sure the above is the correct procedure.

Thanks for reading this far, I have been working on this over a week!
Last edited by gaimrox on 2012-05-25T13:23:06-07:00, edited 1 time in total.
magick
Site Admin
Posts: 11064
Joined: 2003-05-31T11:32:55-07:00

Re: Problems with combining Set and Mogrify

Post by magick »

For duplicates, you can use $im->Signature() or $im->Compare(). Both look at the image pixels themselves. Compare() allows for fuzziness: you can set a threshold so that two slightly modified images are still considered duplicates.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Hi - thanks for the suggestion. I am aware of the Signature method, but for our use case it makes more sense to normalize the image metadata and generate a signature that way.

I did not know of the Compare method, but we accept a large number of images per day, and repeatedly comparing an uploaded image to 65,000 stored images would be pretty expensive. This is all live and interactive on a website, so selecting all 65K images and then comparing the uploaded one to each of them would take quite a while. Thanks for the suggestion though.

I am nearly certain there is a bug in ImageMagick when combining these two method calls. I think that a call to mogrify sets a flag that records default "modify/create" values in the image regardless of you having "Set" them.

Are there any other ways of clearing the "modify/create" values for PNGs? I am surprised how hard this is.
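
One option that sidesteps the image library entirely is to filter the text chunks out of the encoded PNG bytes. The sketch below is only an illustration, not a PerlMagick feature: it assumes the dates are stored as PNG text chunks whose keyword starts with "date:" (e.g. date:create), and `drop_png_date_chunks` is a hypothetical helper name. It relies only on the standard PNG chunk layout (length, type, data, CRC).

```perl
use strict;
use warnings;

my $PNG_SIGNATURE = "\x89PNG\r\n\x1a\n";

# Walk the chunks of a PNG blob and return a copy without the text chunks
# (tEXt, zTXt, iTXt) whose keyword starts with "date:".
# Surviving chunks are copied byte-for-byte, so their CRCs stay valid.
sub drop_png_date_chunks {
    my ($blob) = @_;
    die "not a PNG\n" unless substr($blob, 0, 8) eq $PNG_SIGNATURE;

    my $out = $PNG_SIGNATURE;
    my $pos = 8;
    while ($pos + 12 <= length $blob) {
        # Layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        my ($len, $type) = unpack 'N a4', substr($blob, $pos, 8);
        my $chunk_size = 12 + $len;
        die "truncated chunk\n" if $pos + $chunk_size > length $blob;

        my $data    = substr($blob, $pos + 8, $len);
        my $is_text = $type eq 'tEXt' || $type eq 'zTXt' || $type eq 'iTXt';
        # In all three text chunk types the keyword runs up to the first NUL.
        my ($keyword) = $is_text ? split(/\0/, $data, 2) : ();

        $out .= substr($blob, $pos, $chunk_size)
            unless defined $keyword && $keyword =~ /^date:/;
        $pos += $chunk_size;
    }
    return $out;
}
```

Running the encoded PNG blob through this before computing the MD5 would make the digest independent of those chunks. The same loop, printing `$type` and `$keyword` instead of filtering, also gives a way to inspect which text chunks a blob actually carries.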

thanks
glennrp
Posts: 1147
Joined: 2006-04-01T08:16:32-07:00
Location: Maryland 39.26.30N 76.16.01W

Re: Problems with combining Set and Mogrify

Post by glennrp »

One simple way is to use "-define png:exclude-chunk=date" which prevents the
png encoder from writing the date-related text chunks. But "-strip" is supposed
to do that operation automatically; I don't see why you are still getting the
date chunks (perhaps it's because you are doing the stripping through an API
instead of through the commandline; if that is the case the API equivalent
of the "-define" should work for you).

Edit: Looking at the code, it seems that mogrify with "strip" does call
StripImage() and StripImage() does define out the PNG date chunks.
Are you running a recent version of ImageMagick on your server? This
feature was added at version 6.6.6 and 6.6.7 according to the ChangeLog.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Hi glennrp,

I am running "6.7.4.4_1" on FreeBSD, so this seemingly is an ongoing problem. I am willing to run all sorts of tests to ferret out the problem, I'm just not sure what else to do.

thanks.
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Authentication code: 8675308
Location: Brisbane, Australia

Re: Problems with combining Set and Mogrify

Post by anthony »

My understanding is that IM automatically creates 'date:modify' and 'date:create' image property strings when it reads in an image from a file.

You should be able to remove those properties before writing...

Code:

  convert logo: logo.jpg
  convert logo.jpg +set date:create +set date:modify logo1.png

  sleep 2; touch logo.jpg
  convert logo.jpg +set date:create +set date:modify logo2.png

  diff -s logo1.png logo2.png

The "diff" should report "Files logo1.png and logo2.png are identical", as the timestamps were removed.

NOTE: on read, those timestamps are overwritten by the timestamp of the file read! Really they are informational timestamps, and perhaps they should not be per-image 'properties' but per-image 'artifacts'. (Artifacts are per-image 'operational data' which is NOT meant to be written with the image.)

Comments and suggestions about this?
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Hi anthony,

What you are performing in your suggestion is unfortunately not directly available via the API.

If you search this forum for stripping off the "create/modify" values using the Perl API, you will find a number of posts that contradict each other.

It's not even clear to me right now if strip does remove it, or if I must manually remove it. It's also not clear if my manual removing below actually succeeds in manually removing the values, as there is no good way to look at those values.

I think it's safe to say that automatically setting those values on PNGs at read time is a recipe for problems. A hidden event occurs that you cannot disable, and you thereafter must guess as to how best to restore the file.

In my case I strip the file, remove the dates, and then save the record. If I run the script again it will discover that the date values are not removed, and will then remove them. At this point the file actually no longer has the dates.

My current course of action is to strip the file, output to blob, input back from blob, strip dates, and then save. There is some evidence that this produces a different result than all the other suggestions I've seen, and I think we can agree it is bad that I must jump through such hoops.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Yeah... so I wrote the code as I outlined above, and now it works properly. Either there is a bug, or some documentation needs to be written in order to explain what exactly is going on.

Here is my code to accept and normalize an image:

Code:

  my $im = Image::Magick->new();

  if ($im->BlobToImage($imagedata) == 0) {
    $self->log_error("Image rejected, corrupt binary data.");
    throw RWDE::DataBadException({ info => 'The image appears to be corrupted or of an unrecognized type.' });
  }

  # make sure the uploaded image is in an accepted format
  MM::Image->Check_extension({ extension => lc($im->Get('magick')) });

  # strip out unique attrs from the image that might mess up the final file
  $im->Mogrify('strip');

  # create a second object to work around bug in imagemagick
  my $im2 = Image::Magick->new();
  $im2->BlobToImage($im->ImageToBlob());
  $im = $im2;

  # remove the default date stamps that imageMagick adds
  $im->Set( 'date:modify' => '');
  $im->Set( 'date:create' => '');

It appears to me that calling Mogrify locks the image data in some way. You can see in my previous post that setting a null date after a strip actually does not work - but I was able to confirm that the date is still there... hence my conclusion.

In addition, I swapped the order of the strip and the date clearing, and the image was then left with a date within my database. This supports the above conclusion as well.

Re: Problems with combining Set and Mogrify

Post by anthony »

The date properties will always reappear any time IM reads a file. These are the date stamps of the file read!
Any date stamp saved in the file itself should be ignored.

Actually these probably should be stored as image artefacts rather than as image properties so they don't get saved with the image.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

I disagree with your assessment.

I do not have this problem with any images except for JPG. In addition I only have this problem with a small number of JPG, maybe 2% of the total 60K JPG that I have.

If this were something that was supposed to happen, it would happen for all images.

Also, please read my hack-solution above. This actually does fix the problem for 99.99% of the images that we currently accept. I have only had 1 single failure since I updated the code with the attached hack.

Ideally this problem would be totally solved though. I use this code as a rudimentary form of deduping images upon upload to an image hosting service. In the event that people upload the same image I want to block it. Of course JPG is lossy so there is a problem here, but it still cut down a huuuge amount of my duplicates.

I am considering transitioning over to GD because nobody seems to have any idea why or how this code is broken, and that's a little scary to me. I think the same backend lib is used in GD so probably my deduping would not be interrupted by switching over.

Or maybe somebody will read this who knows exactly what's going on and then I can just fix my app :)

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Also as a clarification, I do not have any files at all here.

I accept these images as BLOBs via Perl and then store them in a DB. I never physically write them to disk in the standard sense.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Reporting back a few months later and with a much newer version of ImageMagick - problem continues.

I'm on "6.7.7.7_1" now, and the exact same issue reported above continues to occur on about 1 of every 300 images I process. The strip functionality definitely appears to have some sort of long standing bug.

Re: Problems with combining Set and Mogrify

Post by gaimrox »

Reporting back many years later. Issue still persists.

Is there anyone with ideas on a possible workaround?
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Re: Problems with combining Set and Mogrify

Post by fmw42 »

Have you tried upgrading to the latest IM 6.9.7.9 or IM 7.0.5.0?