Find identical images in bulk?

Posted: 2016-05-03T03:26:48-07:00
by joshuafinny
Imagemagick version 6.9.3
Windows Platform

I have a folder with a number of images. I want to check whether each image in the folder has an identical image elsewhere in the same folder.

I would prefer a text output.

Re: Find identical images in bulk?

Posted: 2016-05-03T04:00:26-07:00
by snibgo
I've never used it, but hash should do the trick. I understand two identical images should give the same hash value. See http://www.imagemagick.org/script/escape.php

Create a loop that calls convert for every image, like this:

Code:
convert file.ext -format "%%# %%f\n" info:
Arrange to write them all out to a text file. That file has two fields: the hash, and the filename. Your computer may have tools to find duplicate hash values (e.g. sort, remove duplicates, and compare files).
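
A minimal Python sketch of the loop and the duplicate search described above, assuming ImageMagick 6's `convert` is on PATH (on Windows a system utility is also called `convert`, so the full path to ImageMagick's executable may be needed). The doubled `%%` in the command above is for a Windows batch file; a single `%` is used when the arguments are passed directly, as here. The function names are made up for illustration.

```python
import subprocess
from collections import defaultdict


def signature_lines(path):
    """Run `convert <path> -format "%# %f\\n" info:` and return its output lines.

    Assumption: ImageMagick's `convert` is the one found on PATH.
    A multi-frame file (e.g. an animated GIF) yields one line per frame.
    """
    result = subprocess.run(
        ["convert", str(path), "-format", "%# %f\\n", "info:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def group_by_hash(lines):
    """Group filenames by the hash in the first field of each line."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split once only, because the filename may contain spaces.
        image_hash, filename = line.split(" ", 1)
        groups[image_hash].append(filename)
    # Keep only the hashes shared by two or more files.
    return {h: names for h, names in groups.items() if len(names) > 1}
```

Collecting `signature_lines(p)` for every file in the folder, passing the combined list to `group_by_hash`, and printing each group gives the text output asked for.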

Re: Find identical images in bulk?

Posted: 2016-05-03T22:12:48-07:00
by joshuafinny
snibgo wrote:I've never used it, but hash should do the trick. I understand two identical images should give the same hash value. See http://www.imagemagick.org/script/escape.php

Create a loop that calls convert for every image, like this:

Code:
convert file.ext -format "%%# %%f\n" info:
Arrange to write them all out to a text file. That file has two fields: the hash, and the filename. Your computer may have tools to find duplicate hash values (e.g. sort, remove duplicates, and compare files).
I think you are talking about the image meta being identical. I want to check if they are visually similar images and the degree of similarity.

Re: Find identical images in bulk?

Posted: 2016-05-03T22:28:27-07:00
by snibgo
You first said "identical". I believe hash gives that: two identical images give the same hash value.

Now you say you want "the degree of similarity". If you want that for every pair of images, the only way is to compare every pair of images.
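
To give a feel for the cost: n images mean n*(n-1)/2 comparisons, so 100 images already need 4950. A rough Python sketch of such a pairwise loop, assuming ImageMagick's `compare` is on PATH and that both images in a pair have the same dimensions; the function names are hypothetical.

```python
import re
import subprocess
from itertools import combinations


def all_pairs(files):
    """Every unordered pair of files: n * (n - 1) / 2 pairs for n files."""
    return list(combinations(files, 2))


def parse_rmse(text):
    """Extract the normalised value from output such as '1929.59 (0.0294437)'."""
    match = re.search(r"\(([-+0-9.eE]+)\)", text)
    if match is None:
        raise ValueError(f"unexpected compare output: {text!r}")
    return float(match.group(1))


def rmse(a, b):
    """Run `compare -metric RMSE a b null:`; the metric is printed on stderr."""
    result = subprocess.run(
        ["compare", "-metric", "RMSE", str(a), str(b), "null:"],
        capture_output=True, text=True,
    )
    # compare exits with 1 when the images differ, so check=True is not used;
    # an exit status above 1 signals a real error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return parse_rmse(result.stderr)
```

Calling `rmse(a, b)` for each pair from `all_pairs(files)` yields one number per pair, where 0 means no measured difference.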

Re: Find identical images in bulk?

Posted: 2016-05-03T23:11:36-07:00
by fmw42
You would then have to use the IM compare function pair-by-pair.

See
http://www.imagemagick.org/script/compare.php
http://www.imagemagick.org/Usage/compare/


Or use the phash values stored in the verbose data. See viewtopic.php?f=4&t=24906. I also have Unix bash shell scripts to generate a simpler hash and also do the comparison. See my scripts, phashconvert and phashcompare at the links below. These work primarily on color images (sRGB).

See also identify -verbose -moments at http://www.imagemagick.org/script/identify.php

Re: Find identical images in bulk?

Posted: 2016-05-04T10:24:31-07:00
by BrianP007
CCleaner has a duplicate finder feature which works great on Windows and does produce a text report:
https://www.piriform.com/ccleaner/download
Tools -> dup_finder. And, it's free

Creating an md5 on everything will work but is gross overkill. Look for dup sizes first and only hash those.
Any file with a unique size can not be a dup.

In Perl: something like...

Code:
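use Digest::MD5::File qw(file_md5_hex);  # CPAN module, not in core Perl

$mydir = shift || '.';  # Directory to scan (defaults to the current one)
@file = `find $mydir -type f`;  chomp @file;  # Assumes a Unix-style find
@file = grep(/\.jpg$|\.tiff$|\.png$/i, @file);  # Filter IN your img types
%s2fa=();  # Size to file array hash (hash of arrays)
foreach $file (@file)  {
    $size=-s $file;  # Get file size quickly
    push @{$s2fa{$size}}, $file ;  # Populate size -> @file hash
}

foreach $size (keys %s2fa)  {
    @file=@{$s2fa{$size}};  # Get array of all files with this size
    next  unless scalar @file > 1;  # Unique size, not a dup
    # Do an MD5 on all files in @file to find dups:
    # another hash of arrays (exactly like size -> @file)
    # except with md5 rather than size as key.
    %m2fa=();
    push @{$m2fa{file_md5_hex($_)}}, $_  for @file;
    print join("\t", @$_), "\n"  for grep { @$_ > 1 } values %m2fa;  # One line per dup group
}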
There is also a great deal of speed difference between md5 programs. For example,
use Digest::MD5::File qw(dir_md5_hex file_md5_hex);
and then calling file_md5_hex($file) may be faster than shelling out to the
OS to do the crunching...
Re: Find identical images in bulk?

Posted: 2016-05-04T10:30:26-07:00
by fmw42
I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.

Re: Find identical images in bulk?

Posted: 2016-05-04T16:48:51-07:00
by snibgo
BrianP007 wrote:Creating an md5 on everything will work but is gross overkill. Look for dup sizes first and only hash those.
Any file with a unique size can not be a dup.
This is true for comparisons of files. It is not true for comparisons of images.

For example, two files may be entirely different but contain identical images. An MD5 hash on files, and file sizes, tells us nothing about whether the images are the same.
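
A small self-contained Python illustration of that point, using the PPM format because it can be decoded in a few lines: the same 2x1 image is stored once as ASCII (P3) and once as binary (P6). The decoder is a deliberately minimal sketch that assumes a comment-free header and a maxval of at most 255; the function names are made up.

```python
import hashlib
import re


def decode_ppm(data):
    """Decode a comment-free P3 (ASCII) or P6 (binary) PPM with maxval <= 255.

    Returns (width, height, pixel_bytes).
    """
    header = re.match(rb"(P[36])\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
    if header is None:
        raise ValueError("not a simple P3/P6 PPM")
    kind = header.group(1)
    width, height = int(header.group(2)), int(header.group(3))
    body = data[header.end():]
    # P3 stores samples as decimal text, P6 stores them as raw bytes.
    pixels = bytes(int(token) for token in body.split()) if kind == b"P3" else body
    return width, height, pixels


def file_hash(data):
    """Hash of the file bytes, as an MD5 of the file would give."""
    return hashlib.md5(data).hexdigest()


def image_hash(data):
    """Hash of the decoded image: its dimensions plus its pixel values."""
    width, height, pixels = decode_ppm(data)
    return hashlib.md5(b"%d %d " % (width, height) + pixels).hexdigest()


# The same 2x1 image (one red pixel, one blue pixel) in two encodings.
ascii_ppm = b"P3\n2 1\n255\n255 0 0  0 0 255\n"
binary_ppm = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])

print("same file size: ", len(ascii_ppm) == len(binary_ppm))
print("same file hash: ", file_hash(ascii_ppm) == file_hash(binary_ppm))
print("same image hash:", image_hash(ascii_ppm) == image_hash(binary_ppm))
```

The two files differ in size and in file hash, yet decode to the same pixels, so a size filter or a file-level MD5 would miss this pair.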

Re: Find identical images in bulk?

Posted: 2016-05-09T03:42:43-07:00
by joshuafinny
fmw42 wrote:I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
Yes, you are correct. Even resolution might differ in some cases but the subject is more or less the same.

Re: Find identical images in bulk?

Posted: 2016-05-09T09:47:31-07:00
by fmw42
joshuafinny wrote:
fmw42 wrote:I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
Yes, you are correct. Even resolution might differ in some cases but the subject is more or less the same.

The only direct way in ImageMagick is a perceptual hash. See viewtopic.php?f=4&t=24906
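
A hedged Python sketch of how a perceptual-hash comparison could be turned into the text report asked for at the top of the thread. It assumes the installed ImageMagick's `compare` accepts `-metric phash` and prints the value on stderr, with lower values meaning more similar images. The threshold is a hypothetical parameter that would need tuning on real images, and the function names are made up.

```python
import re
import subprocess


def first_number(text):
    """Return the first number found in a compare result string."""
    match = re.search(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", text)
    if match is None:
        raise ValueError(f"no number in compare output: {text!r}")
    return float(match.group(0))


def phash_distance(a, b):
    """Run `compare -metric phash a b null:` and return the reported value.

    Assumption: this ImageMagick build supports the `phash` metric.
    """
    result = subprocess.run(
        ["compare", "-metric", "phash", str(a), str(b), "null:"],
        capture_output=True, text=True,
    )
    if result.returncode > 1:  # 0 and 1 are normal; 2 signals an error
        raise RuntimeError(result.stderr.strip())
    return first_number(result.stderr)


def report(scores, threshold):
    """Format the pairs scoring at or below threshold, most similar first.

    `scores` maps (file_a, file_b) to a distance; lower means more similar.
    """
    close = sorted((s, a, b) for (a, b), s in scores.items() if s <= threshold)
    return "\n".join(f"{s:.4f}\t{a}\t{b}" for s, a, b in close)
```

Filling `scores` with `phash_distance(a, b)` for every pair of files and writing `report(scores, threshold)` to a file gives one tab-separated line per similar pair.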