Find identical images in bulk?

joshuafinny

Find identical images in bulk?

Post by joshuafinny »

ImageMagick version 6.9.3
Windows platform

I have a folder with a number of images. I want to check whether each image in the folder has an identical image elsewhere in the same folder.

I would prefer text output.
snibgo

Re: Find identical images in bulk?

Post by snibgo »

I've never used it, but the hash escape ("%#") should do the trick. I understand that two identical images give the same hash value. See http://www.imagemagick.org/script/escape.php

Create a loop that calls convert for every image, like this:

Code:

convert file.ext -format "%%# %%f\n" info:

Arrange to write them all out to a text file. That file has two fields: the hash and the filename. (The doubled "%%" is for use inside a Windows BAT file; at the command prompt, use a single "%".) Your computer may have tools to find duplicate hash values (e.g. sort, remove duplicates, and compare files).
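
As a rough illustration of that loop, here is a minimal Python sketch. It is not from the thread: it assumes ImageMagick 6's convert is on the PATH, and the function names (image_signature, group_by_hash, report_duplicates) are made up for this example. On Windows a bare "convert" can resolve to the system disk utility, so the full path to ImageMagick's convert.exe may be needed.

```python
import subprocess
from collections import defaultdict
from pathlib import Path


def image_signature(path):
    """Ask ImageMagick for the pixel-data hash (%#) of one image."""
    # Assumption: "convert" here is ImageMagick's, not the Windows utility.
    result = subprocess.run(
        ["convert", str(path), "-format", "%#", "info:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def group_by_hash(pairs):
    """Group (hash, filename) pairs; keep only hashes shared by 2+ files."""
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    return {d: names for d, names in groups.items() if len(names) > 1}


def report_duplicates(folder):
    """Return one text line per set of identical images in a folder."""
    pairs = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        try:
            pairs.append((image_signature(path), path.name))
        except subprocess.CalledProcessError:
            continue  # Not something ImageMagick could read; skip it
    groups = group_by_hash(pairs)
    return [digest + " " + " ".join(names) for digest, names in groups.items()]
```

Calling report_duplicates on a folder path and writing the returned lines to a file would give the text output asked for. Nothing is printed for images whose hash is unique.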
snibgo's IM pages: im.snibgo.com
joshuafinny

Re: Find identical images in bulk?

Post by joshuafinny »

I think you are talking about the image metadata being identical. I want to check whether the images are visually similar, and the degree of similarity.
snibgo

Re: Find identical images in bulk?

Post by snibgo »

You first said "identical". I believe hash gives that: two identical images give the same hash value.

Now you say you want "the degree of similarity". If you want that for every pair of images, the only way is to compare every pair of images.
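
A hedged sketch of what comparing every pair could look like in Python, assuming ImageMagick's compare is on the PATH and that its metric output has the usual "absolute (normalised)" form on stderr. The function names are invented for this example, and RMSE is only one possible metric. Note that n images give n*(n-1)/2 pairs, so this gets slow quickly for large folders.

```python
import itertools
import re
import subprocess

# Matches output such as "1929.82 (0.0294472)" and captures the normalised value.
METRIC_RE = re.compile(r"^\s*[-+0-9.eE]+\s+\(([-+0-9.eE]+)\)\s*$", re.MULTILINE)


def parse_metric(stderr_text):
    """Return the normalised metric as a float, or None if none was reported."""
    match = METRIC_RE.search(stderr_text)
    return float(match.group(1)) if match else None


def compare_pair(a, b, metric="RMSE"):
    """Run ImageMagick compare on two files and return the normalised distance."""
    # compare writes the metric to stderr and exits with status 1 when the
    # images differ, so a non-zero exit status is not treated as an error here.
    result = subprocess.run(
        ["compare", "-metric", metric, str(a), str(b), "null:"],
        capture_output=True, text=True,
    )
    return parse_metric(result.stderr)


def compare_all(files, metric="RMSE"):
    """Yield (file_a, file_b, distance) for every unordered pair of files."""
    for a, b in itertools.combinations(files, 2):
        yield a, b, compare_pair(a, b, metric)
```

A distance of 0 means the pixels are identical; larger values mean less similar. With RMSE, images of different dimensions produce an error rather than a number, which this sketch reports as None.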
fmw42

Re: Find identical images in bulk?

Post by fmw42 »

You would then have to use the IM compare function pair-by-pair.

See
http://www.imagemagick.org/script/compare.php
http://www.imagemagick.org/Usage/compare/

Or use the phash values stored in the verbose data. See viewtopic.php?f=4&t=24906. I also have Unix bash shell scripts to generate a simpler hash and also do the comparison. See my scripts, phashconvert and phashcompare, at the links below. These work primarily on color images (sRGB).

See also identify -verbose -moments at http://www.imagemagick.org/script/identify.php
BrianP007

Re: Find identical images in bulk?

Post by BrianP007 »

CCleaner has a duplicate-finder feature that works well on Windows and produces a text report:
https://www.piriform.com/ccleaner/download
Tools -> Duplicate Finder. And it's free.

Creating an MD5 on everything will work but is gross overkill. Look for duplicate sizes first and hash only those.
Any file with a unique size cannot be a duplicate.

In Perl: something like...

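Code:

use strict;
use warnings;
use File::Find;
use Digest::MD5;

my $mydir = @ARGV ? $ARGV[0] : '.';    # Directory to scan
my @files;
# Collect image files only; filter IN your image types here
find(sub { push @files, $File::Find::name if -f $_ && /\.(?:jpe?g|tiff?|png)$/i }, $mydir);

my %s2fa;    # Size to file array hash (hash of arrays)
foreach my $file (@files) {
    my $size = -s $file;              # Get file size quickly
    push @{ $s2fa{$size} }, $file;    # Populate size -> @file hash
}

foreach my $size (keys %s2fa) {
    my @same = @{ $s2fa{$size} };     # Get array of all files with this size
    next unless @same > 1;            # Unique size, not a dup
    my %m2fa;                         # MD5 to file array hash, same idea as %s2fa
    foreach my $file (@same) {
        open my $fh, '<:raw', $file or next;    # Skip unreadable files
        push @{ $m2fa{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
    }
    foreach my $md5 (grep { @{ $m2fa{$_} } > 1 } keys %m2fa) {
        print join("\t", $md5, @{ $m2fa{$md5} }), "\n";    # One line per set of dups
    }
}
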
Also, MD5 implementations differ a great deal in speed. With
use Digest::MD5::File qw(dir_md5_hex file_md5_hex);
calling file_md5_hex($file) may be faster than shelling out to the OS to do the crunching.
fmw42

Re: Find identical images in bulk?

Post by fmw42 »

I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
snibgo

Re: Find identical images in bulk?

Post by snibgo »

BrianP007 wrote:Creating an md5 on everything will work but is gross overkill. Look for dup sizes first and only hash those.
Any file with a unique size can not be a dup.
This is true for comparisons of files. It is not true for comparisons of images.

For example, two files may be entirely different but contain identical images. An MD5 hash on files, and file sizes, tells us nothing about whether the images are the same.
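
To make that point concrete, here is a small self-contained Python sketch. It uses a toy PPM decoder written for this example (not ImageMagick, and not anything from the thread) to build two files that hold the same 2x1 image in different encodings. The file sizes and file MD5s differ, while a hash computed over the decoded pixels matches, which is the kind of comparison an image hash such as "%#" is meant to provide.

```python
import hashlib
import re


def parse_ppm(data):
    """Decode a tiny 8-bit PPM (ASCII P3 or binary P6, no comments)."""
    if data[:2] == b"P3":
        tokens = data.split()
        width, height = int(tokens[1]), int(tokens[2])
        pixels = bytes(int(t) for t in tokens[4:])
    elif data[:2] == b"P6":
        # Header: magic, width, height, maxval, then exactly one whitespace byte.
        header = re.match(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s", data)
        width, height = int(header.group(1)), int(header.group(2))
        pixels = data[header.end():]
    else:
        raise ValueError("not a P3 or P6 PPM")
    return width, height, pixels


def file_md5(data):
    """Hash of the file bytes, as a file-level duplicate finder would see them."""
    return hashlib.md5(data).hexdigest()


def pixel_hash(data):
    """Hash of the decoded image (dimensions plus pixel values)."""
    width, height, pixels = parse_ppm(data)
    return hashlib.sha256(b"%d %d " % (width, height) + pixels).hexdigest()


# The same 2x1 image (one red pixel, one green pixel) stored two ways.
ascii_file = b"P3\n2 1\n255\n255 0 0 0 255 0\n"
binary_file = b"P6\n2 1\n255\n" + bytes([255, 0, 0, 0, 255, 0])
```

Here len(ascii_file) and len(binary_file) differ, so a size-first filter would never even hash the two files, yet pixel_hash returns the same value for both.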
joshuafinny

Re: Find identical images in bulk?

Post by joshuafinny »

fmw42 wrote:I think he wants to find images that are actually the same subject, but slightly changed, such as blurred or shifted a little, etc, not actually identical in every way or even duplicates in different directories.
Yes, you are correct. Even resolution might differ in some cases but the subject is more or less the same.
fmw42

Re: Find identical images in bulk?

Post by fmw42 »


The only direct way in ImageMagick is a perceptual hash. See viewtopic.php?f=4&t=24906