Visipics and finding similar images and duplicates with ImageMagick

Post by fmw42 »

I urge you to test the commands above in a simple way before trying to move the images.

So try this first

Code: Select all

threshold=20
for image1 in *.jpg; do
   for image2 in *.jpg; do
      if [ "$image1" != "$image2" ]; then
          value=$(compare -metric phash $image1 $image2 null: 2>&1);
       fi
       if [[ $value < $threshold ]]; then
          echo "$image1 $image2 pass"
       else
          echo "$image1 $image2 fail"
       fi
done
Always best to build up your command in stages to understand what is going on and to keep your images safe.

Test on a small set of images, and always back up your directory.
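
One way to follow that advice is to copy a handful of images into a scratch directory and experiment there. This is only a sketch: the function name make_test_set, the directory name phash_test and the default of 10 files are assumptions for illustration, not something from the commands above.

```shell
# Hypothetical helper: copy up to COUNT .jpg files from SRC into DEST,
# so commands can be tried on copies instead of the originals.
# Usage: make_test_set SRC DEST [COUNT]
make_test_set() {
    src=$1
    dest=$2
    count=${3:-10}
    mkdir -p "$dest" || return 1
    n=0
    for f in "$src"/*.jpg; do
        [ -e "$f" ] || continue            # the glob matched nothing
        [ "$n" -lt "$count" ] || break     # enough files copied
        cp -- "$f" "$dest"/ && n=$((n + 1))
    done
    echo "$n"                              # number of files copied
}

# Example (not run here): make_test_set . ./phash_test 10
```

The originals are never touched, so a mistake in a later mv command only affects the copies.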

Post by buchert »

Thanks for the explanations. I always try new commands on a test directory rather than using echo.

I ran this command from the command line after creating a subdirectory of the working directory.

Image

I tried it as one line:

Image

I tried it as a bash script and ran it from the terminal and also tried using it with Thunar Custom Actions. Shouldn't this part of the code:

Code: Select all

/results/$image1;
be changed to this:

Code: Select all

./results \;
It hangs in all cases. I tried using different thresholds. But I'm wondering if I'm canceling the process before it completes. I'm running it on 100 jpgs.
Also note, you might find one smaller version of a larger image and it might move the smaller version if found first, since the phash will see them as the same. So you might want to add another conditional to compare sizes so you move the larger images of $image1 and $image2.
I have a couple commands that might be useful for that:

Find files below certain dimension and move to subfolder.

Code: Select all

mkdir -p ./lowdimension
find . -iname '*.jpg' | while IFS= read -r img; do anytopnm "$img" | pamfile | perl -ane 'exit 1 if $F[3]<300 || $F[5]<300' || mv "$img" ./lowdimension; done
Bash script to rename jpgs according to dimensions:

Code: Select all

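#!/bin/bash

# Prefix each JPEG's name with its pixel dimensions, e.g. 800x600_name.jpg.
for filename in *.jpg* *.JPG* *.jpeg*; do
    # Skip patterns that matched nothing (bash leaves them as literal text).
    [ -e "$filename" ] || continue
    # Quote the filename so names containing spaces are handled correctly.
    inname=$(convert "$filename" -format "%t" info:)
    size=$(convert "$filename" -format "%wx%h" info:)
    mv -- "$filename" "${size}_${inname}.jpg"
done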

Post by snibgo »

I'm not a bash expert. I thought that every "do" needs a "done". Fred's script is missing a "done", which causes my bash (on Cygwin on Windows) to complain.
snibgo's IM pages: im.snibgo.com

Post by fmw42 »

Sorry, snibgo is right. I had not tested it and made a couple of mistakes. Here is corrected code (each line of code should be on a separate line)

Code: Select all

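threshold=20
for image1 in *.jpg; do
   for image2 in *.jpg; do
      # Skip comparing an image with itself.
      if [ "$image1" != "$image2" ]; then
         # compare writes the metric to stderr, so capture it with 2>&1.
         value=$(compare -metric phash "$image1" "$image2" null: 2>&1)
         # phash values are floating point, so compare numerically with awk;
         # [[ $value < $threshold ]] would compare them as strings.
         if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v < t) }'; then
            echo "$image1 $image2 pass"
         else
            echo "$image1 $image2 fail"
         fi
      fi
   done
done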
Note I also had a ; at the end of the value= statement that should not have been there, though it may not have made any difference.

If you copy and paste into a terminal window, better to remove any leading white spaces. So

Code: Select all

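threshold=20
for image1 in *.jpg; do
for image2 in *.jpg; do
# Skip comparing an image with itself.
if [ "$image1" != "$image2" ]; then
# compare writes the metric to stderr, so capture it with 2>&1.
value=$(compare -metric phash "$image1" "$image2" null: 2>&1)
# phash values are floating point, so compare numerically with awk;
# [[ $value < $threshold ]] would compare them as strings.
if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v < t) }'; then
echo "$image1 $image2 pass"
else
echo "$image1 $image2 fail"
fi
fi
done
done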

Post by snibgo »

Incidentally, comparing two images with metric phash is not massively fast, e.g. 0.03 seconds for small images, ignoring file I/O. If we have 1400 images, we need 1400*1399/2 = nearly 1,000,000 comparisons, so that takes about 8 hours. When we include file I/O and script overhead, the processing takes some days.

Most of the time is taken calculating the image moments, which are needed for the perceptual hashes of the images, so they can be compared. These are calculated for the two images, at each comparison. So the moments and PHs for each image are re-calculated 1400 times.

The actual comparison is simple and quick. The expensive part is calculating the perceptual hashes.

An alternative is to calculate the perceptual hashes of each image just once. This takes a couple of minutes. Then we do 2,000,000 comparisons in a compiled C program. This takes a stupidly small 0.06 seconds. Overall, a few days is reduced to a couple of minutes.

(Why 2,000,000 comparisons? Because I'm a lazy programmer, and saving 0.03 seconds isn't worth doing.)
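
The pair count and the 8-hour figure can be checked with a little arithmetic. The helper name pair_cost is made up for this illustration; the inputs are the 1400 images and 0.03 seconds per comparison quoted above.

```shell
# Hypothetical helper: estimate the cost of comparing every pair of n images.
# Arguments: number of images, seconds per comparison.
pair_cost() {
    awk -v n="$1" -v s="$2" 'BEGIN {
        pairs = n * (n - 1) / 2      # unordered pairs, each counted once
        printf "%d pairs, %.1f hours\n", pairs, pairs * s / 3600
    }'
}

pair_cost 1400 0.03
```

For these inputs it prints "979300 pairs, 8.2 hours", which excludes file I/O and script overhead.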

Post by fmw42 »

I have two scripts, phashconvert and phashcompare. The first converts the 42 IM moment values to a string of digits (not binary), which can then be stored. The other takes the string of digits, converts it back to the 42 values and then computes an RMS difference to get the comparison value. See my web site link below.
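
As a rough illustration of the RMS difference step only (this is not the phashcompare script, and the function name rms_diff is an assumption), two equal-length lists of stored values could be compared like this:

```shell
# Hypothetical sketch: root-mean-square difference between two
# space-separated lists of numbers of equal length.
rms_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, " ")
        m = split(b, y, " ")
        if (n != m || n == 0) exit 1          # lists must match in length
        for (i = 1; i <= n; i++) s += (x[i] - y[i]) ^ 2
        printf "%.4f\n", sqrt(s / n)
    }'
}

rms_diff "1 2 3" "1 2 5"
```

With 42 values per image the call is the same, just with longer lists. Because the values are computed once and stored, each comparison is only this small amount of arithmetic.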

Post by snibgo »

Yes, that's essentially the same scheme. Grayscale images only need 7 PH values, as the channels Red, Green, Blue and Luma are the same, and the Hue and Chroma channels have no information.

Post by fmw42 »

phash is not as accurate/reliable for grayscale since it only has 7 values.

Post by snibgo »

Have you found something better for grayscale?

Post by fmw42 »

snibgo wrote: Have you found something better for grayscale?
I have not done any further experimenting for any other perceptual hash, though I had researched it some before scripting the current phash prior to IM implementation. I chose it more for its robustness for color images compared to other methods.

However, it seems that many people want something with a binary hash that can be stored and then processed with Hamming distance, which is much faster.


see also
http://stackoverflow.com/questions/9986 ... g-used-for
http://www.phash.org/
http://bertolami.com/index.php?engine=b ... al-hashing

Post by snibgo »

Interesting. I devised a similar method to the one in your last link: take the FFT magnitude, depolar, and scale to a single row. This is the hash. Then, to compare hashes, duplicate one and append sideways, and search for the other in that long version. This makes it rotationally invariant. This works fine, but is slow to compute a hash and to compare two hashes.
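
The "duplicate and append, then search" idea can be illustrated with plain strings standing in for the single-row hash. This is only an analogy under that assumption: the real method searches one image row inside another and tolerates small differences, whereas this sketch (with the made-up name is_rotation) needs an exact match.

```shell
# Hypothetical sketch: succeed if string B is a cyclic shift of string A.
# Doubling A makes every rotation of A appear as a substring of "AA".
is_rotation() {
    [ "${#1}" -eq "${#2}" ] || return 1
    case "$1$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

if is_rotation ABCDEF DEFABC; then echo "match"; else echo "no match"; fi
```

A rotation of the image becomes a sideways shift of the depolar row, which is why searching in the doubled row finds it at any angle.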

I sometimes get unexpectedly large values from "-metric phash". This happens when there is a colour-shift between images, and the image contains reddish pixels, so the Hue channel moments are very different. This will be because of the discontinuity of Hue at 0 = 100%. Perhaps Lab or Luv would be better than HCL.
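
A small worked example of the discontinuity, assuming hue is expressed as a percentage from 0 to 100 (the function name hue_distance is made up, and this is not how IM computes its moments). Two reddish hues of 1% and 99% are 98 apart when treated as ordinary numbers, but only 2 apart around the colour wheel:

```shell
# Hypothetical sketch: distance between two hues given in percent,
# measured around the colour wheel so that 0 and 100 coincide.
hue_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d            # plain numeric difference
        if (d > 50) d = 100 - d      # go the short way round the wheel
        print d
    }'
}

hue_distance 1 99
```

Anything computed from the raw hue values sees the larger figure, which is consistent with the large phash values described above.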

A Hamming distance between two strings (or binary numbers) relies on all characters in the string (or bits in the numbers) having equal significance. Each of the 42 PH values could be quantized into, say one of 26 values, represented by 'A' to 'Z', so the hash is a string of 42 letters. The resulting Hamming difference between two strings might work well.
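
A minimal sketch of the Hamming distance on such letter strings, assuming the hashes have already been quantized to letters (the quantization itself is left out because it depends on the range of the PH values). The function name hamming is chosen only for this example.

```shell
# Hypothetical sketch: count the positions at which two equal-length
# strings differ. Fails (exit status 1) if the lengths differ.
hamming() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (length(a) != length(b)) exit 1
        d = 0
        for (i = 1; i <= length(a); i++)
            if (substr(a, i, 1) != substr(b, i, 1)) d++
        print d
    }'
}

hamming ABCDE ABXDZ
```

The example prints 2, because the strings differ in the third and fifth positions. Every position counts the same, which is the equal-significance assumption mentioned above.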

Post by fmw42 »

snibgo wrote: I sometimes get unexpectedly large values from "-metric phash". This happens when there is a colour-shift between images, and the image contains reddish pixels, so the Hue channel moments are very different. This will be because of the discontinuity of Hue at 0 = 100%. Perhaps Lab or Luv would be better than HCL.
I had not thought of that, but that may explain some issues I have seen recently in some work I am doing.

The original paper from which I got the idea uses YCbCr. I tested a number of colorspaces (YCbCr, LAB, HSI, OHTA) and found that HCLp seemed to give the best results overall for the (limited) test set of images that I used. However, I did not test specifically for reddish issues. You may be right that it should use LAB or YCbCr for that reason.



P.S.

I suspect it is not too hard to switch the HCLp colorspace to something else, such as LAB or YCbCr (with a -define or new argument), if anyone is willing to put in the time to test in a beta. Magick would have to let us know how hard that would be.

Also if someone has a better or alternate hash code, it could be added to IM at some point or contributed.

Post by fmw42 »

I have recently implemented several other perceptual hash techniques that create binary string hash values that can be stored in the image. They use the Hamming distance to compare the hash values between two images. See my scripts phashes and hamming at my link below.

Post by Elapido »

Can Visipics be used in a collection of 70000 images or would it crash, or take many days to finish?

Post by fmw42 »

This is not a Visipics forum. I suggest you ask there.