how to use ImageMagick to remove punch holes?

Adam32
Posts: 20
Joined: 2016-11-08T11:18:35-07:00
Authentication code: 1151

Post by Adam32 »

First of all, please bear with me: I am a Windows 7 user who only started using ImageMagick today, and I have very limited knowledge of ImageMagick or programming languages. In fact, this is the first command-line program I have ever used!

I am trying to remove punch holes from thousands of scanned documents. I can do this manually, but it is too time-consuming, so I am looking for an automatic solution.

The problem is that the punch holes are in different places on different documents. Some holes are not level, because the document was inserted into the puncher at an angle. Text is also often level with the punch holes; see the attached images, where I have highlighted such text in red. For these reasons a basic crop would cut off information. Is there a way to use ImageMagick to automatically identify the punch holes on each page, remove them, and fill them with the appropriate background colour (white, in the case of the example)?

[Attached images: two example scans]
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: how to use ImageMagick to remove punch holes?

Post by snibgo »

This is an ambitious project when you are not accustomed to command-line programs, let alone ImageMagick. It would need a complex script. Here are some thoughts.

The problem breaks down into:

1. Find the holes.

2. Replace them with the background colour.

To find the holes, we need to know what sets them apart from other parts of the image. If we threshold the image to black and white, we can find all the black components larger than a certain size, e.g. 190 pixels.

Code:

convert punch_holes_Page_2.jpg -threshold 50% -define connected-components:verbose=true -define connected-components:area-threshold=190 -define connected-components:mean-color=true -connected-components 8 t.png

The output t.png is black where we have holes and "big" text, otherwise white.

If the holes always occur in the left or right margins, a suitable crop will find just the holes. The result can be used to paint out the holes.

Code:

convert punch_holes_Page_2.jpg ( t.png -crop 10%x100%+0+0 -negate ) -compose Lighten -composite x.png

The result x.png isn't quite perfect; the holes could be dilated slightly.
snibgo's IM pages: im.snibgo.com

Post by Adam32 »

Many thanks for your help, snibgo. I have to say I never realised such a simple task was so difficult to automate. I see x.png is much better, as the punch holes are now somewhat faded, although as you point out it isn't quite perfect.

I like the logical approach of finding all components larger than a certain size; however, I feel this may cause problems if there are tables, drawings etc., as these would also be included. Is there no way to have ImageMagick identify the punch holes by geometry instead?

Post by snibgo »

Adam32 wrote:I have to say I never realised such a simple task was so difficult to automate.
It's simple for you because you have human intelligence. The difficult part of this problem is identifying the holes, with rules that work in all circumstances. Yes, if you have graphics in the margin of the document, my simple rules above will fail.

In an ideal world, you would give a few samples to an artificial intelligence system, which would work out the rules by itself. But IM can't do this.

With a bit more work, you can also eliminate shapes larger than a certain size, e.g. 220 pixels. You could do this in a script that processes the text output of "-connected-components".
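
One way such a script could look, as a minimal Python sketch rather than a tested solution. It assumes the verbose listing has the usual "id: bounding-box centroid area mean-color" layout. The names `parse_components` and `candidate_holes`, the spellings of black in `BLACK_NAMES`, the left-margin rule and the sample listing are all invented for illustration.

```python
import re

# One verbose line: id, bounding box (WxH+X+Y), centroid, area, mean colour.
COMPONENT_RE = re.compile(
    r"^\s*(?P<id>\d+):\s+"
    r"(?P<w>\d+)x(?P<h>\d+)\+(?P<x>\d+)\+(?P<y>\d+)\s+"
    r"(?P<cx>[\d.]+),(?P<cy>[\d.]+)\s+"
    r"(?P<area>\d+)\s+(?P<colour>\S+)\s*$"
)

# Assumption: how "black" is spelled in the colour column may vary by version.
BLACK_NAMES = {"srgb(0,0,0)", "gray(0)", "black"}


def parse_components(text):
    """Parse the verbose listing into dicts, skipping lines that do not match."""
    components = []
    for line in text.splitlines():
        m = COMPONENT_RE.match(line)
        if m:
            row = {k: int(m[k]) for k in ("id", "w", "h", "x", "y", "area")}
            row["colour"] = m["colour"]
            components.append(row)
    return components


def candidate_holes(components, page_width, min_area=190, max_area=220, margin=0.10):
    """Keep black components with an area in range that lie in the left margin."""
    limit = page_width * margin
    return [
        c for c in components
        if c["colour"] in BLACK_NAMES
        and min_area <= c["area"] <= max_area
        and c["x"] + c["w"] <= limit
    ]


# Hypothetical listing, invented for illustration only.
SAMPLE = """\
Objects (id: bounding-box centroid area mean-color):
  0: 1240x1754+0+0 620.3,877.1 2100000 srgb(255,255,255)
  12: 17x16+40+300 48.1,307.6 205 srgb(0,0,0)
  13: 16x17+41+1400 48.9,1408.2 198 srgb(0,0,0)
  27: 300x40+500+200 650.0,220.0 9000 srgb(0,0,0)
  31: 15x16+900+800 907.2,808.0 200 srgb(0,0,0)
"""

holes = candidate_holes(parse_components(SAMPLE), page_width=1240)
print([c["id"] for c in holes])  # prints [12, 13]
```

Component 27 is rejected for being too large and component 31 for lying outside the left margin. The area limits and margin would need tuning against real scans.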

If they are all crescent-moon shaped, then we can search for that (morphology hit-and-miss).

Incidentally, once the punch-holes are found, I would make them transparent and fill them with one of my "fill holes" scripts. That way, you don't need to find the colour of the paper, which might vary slightly across the scan. With "blurFill", the holes would be automatically filled seamlessly.

Post by Adam32 »

snibgo wrote: If they are all crescent-moon shaped, then we can search for that (morphology hit-and-miss).
How would I do this?

snibgo wrote:Incidentally, once the punch-holes are found, I would make them transparent and fill them with one of my "fill holes" scripts. That way, you don't need to find the colour of the paper, which might vary slightly across the scan. With "blurFill", the holes would be automatically filled seamlessly.
I downloaded your imsnibgoBats.zip and have extracted blurFill.bat. How do I use this with my example images posted? I would be really grateful if you could provide an example, as I am really struggling with this. Sorry for the stupid questions, but I am really new to all this. I think I need to do a lot of reading!

Many thanks for your help

Post by snibgo »

For morphology HMT, see http://www.imagemagick.org/Usage/morphology/#hmt
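
To show the idea behind hit-and-miss without any ImageMagick syntax, here is a minimal pure-Python sketch. It is not how IM implements it. The function name `hit_and_miss`, the use of `None` for "don't care" cells, and treating pixels outside the image as background are assumptions made for this example.

```python
def hit_and_miss(image, kernel):
    """Return the set of (row, col) positions where the kernel matches.

    image:  list of rows of 0 (background) or 1 (foreground).
    kernel: list of rows of 1 (must be foreground), 0 (must be background)
            or None (don't care). The kernel origin is its centre cell.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    hits = set()
    for y in range(ih):
        for x in range(iw):
            matched = True
            for ky in range(kh):
                for kx in range(kw):
                    want = kernel[ky][kx]
                    if want is None:
                        continue
                    iy, ix = y + ky - oy, x + kx - ox
                    inside = 0 <= iy < ih and 0 <= ix < iw
                    # Assumption: anything outside the image counts as background.
                    value = image[iy][ix] if inside else 0
                    if value != want:
                        matched = False
                        break
                if not matched:
                    break
            if matched:
                hits.add((y, x))
    return hits


# A kernel that matches an isolated foreground pixel.
ISOLATED = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

image = [[0] * 5 for _ in range(5)]
image[2][2] = 1
print(hit_and_miss(image, ISOLATED))  # prints {(2, 2)}
```

A kernel for a crescent would be larger, with 1s along the crescent, 0s just outside it and don't-care cells elsewhere; its exact shape would have to come from the scans.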

For examples of my blurFill script, see my "Filling holes" page.

My scripts call one another, so if you unzip just one, it may not work unless you edit it.

Here's a Windows BAT script to:

Find large black components, putting result in t2.png:

Code:

%IM%convert ^
  punch_holes_Page_2.jpg ^
  -threshold 50%% ^
  -define connected-components:verbose=true ^
  -define connected-components:area-threshold=190 ^
  -define connected-components:mean-color=true ^
  -connected-components 8 ^
  t2.png

From t2.png, crop the left 10%, erode the white (this dilates the black), and make those pixels transparent.

Code:

%IM%convert ^
  punch_holes_Page_2.jpg ^
  ( t2.png -crop 10%%x100%%+0+0 ^
    -morphology Erode disk:3 ^
  ) ^
  -alpha off ^
  -set option:compose:outside-overlay false ^
  -compose CopyOpacity -composite ^
  h.png

h.png is now transparent where the holes were.

Fill the transparent holes:

Code:

call %PICTBAT%blurFill h.png . h_bf.png

h_bf.png has the holes filled by the colour of the surrounding paper.

Post by Adam32 »

Many thanks for your help, but I can't seem to get this working. As you said your scripts may depend on each other, I extracted all the contents of imsnibgoBats.zip to a directory. This includes the blurFill.bat script. I also placed my test images in this directory. I then ran CMD from this directory and did the following:

1) "Find large black components, putting result in t2.png" - this works okay.

2) "From t2.png, crop the left 10%, erode the white (this dilates the black), and make those pixels transparent." - this works okay.

3) I then enter into CMD "call blurFill h.png . h_bf.png" but it does not work. I get the error message that it is "not recognised as an internal or external command". I have also tried "magick.exe call blurFill h.png . h_bf.png" but this does not work either. What am I doing wrong?

Sorry for my basic questions, as I am a very basic user. The only things I have used ImageMagick for so far are resizing and converting between file types (which it does really well).
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Post by fmw42 »

Perhaps go back to a simpler solution. Just sample the colors in a small block of pixels in the corners or the sides of the image and get an average color (in this case "white"). (Or just get the average color of the left side of the image, since that is closer to the holes.) Then, once you have the transparent image, just flatten it against that color using -background. That is easy to do in ImageMagick without the need for special code.

Post by snibgo »

All the commands I gave are for a BAT script. If you type them at the console, change every doubled %% to a single %.
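
The percent rule can be illustrated with a tiny Python helper. This is only a sketch: `bat_to_console` is a made-up name, and it does nothing except un-double percent signs, leaving variables such as %IM% and the ^ line continuations untouched.

```python
def bat_to_console(line):
    """Convert one BAT-file line to its console form by un-doubling percent signs."""
    return line.replace("%%", "%")


print(bat_to_console("  ( t2.png -crop 10%%x100%%+0+0 ^"))
# prints "  ( t2.png -crop 10%x100%+0+0 ^"
```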
Adam32 wrote: I then run CMD from this directory ...
Adam32 wrote: I then enter into CMD "call blurFill h.png . h_bf.png" ...
I don't understand. What do "run CMD" and "enter into CMD" mean? You just type this stuff (or paste it) into the console, yes?
Adam32 wrote: I get the error message that it is "not recognised as an internal or external command".
What exactly is not recognised? If the answer is "blurFill", then type "dir blurFill.bat". If the BAT file is there, but it isn't "recognised", I have no idea what the problem could be.

Post by Adam32 »

Yes I just type the commands at the console. When I type:

Code:

blurFill.bat h.png . h_bf.png

I just get the error:
"identify is not recognized as an internal or external command, operable program or batch file"

Post by snibgo »

Okay. My scripts assume that you have installed ImageMagick (convert.exe, identify.exe, compare.exe) to a directory on your path, or that you have an environment variable called IM that contains this directory.

Windows is telling you that neither is true. It can't find "identify". As you use v7, perhaps you can edit the script to change "%IM%identify" to "%IM%magick identify".
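
A quick way to check which of these programs Windows can actually find is a short Python sketch like the one below. The name `locate_tools` is invented here, and reading an `IM` environment variable simply mirrors the convention described above; `shutil.which` is the standard-library lookup that follows the normal PATH rules.

```python
import os
import shutil

TOOLS = ("convert", "identify", "compare", "magick")


def locate_tools(im_dir=None, tools=TOOLS):
    """Map each tool name to its full path, or None when it cannot be found.

    im_dir plays the role of the IM environment variable: when given, that
    directory is searched before the normal PATH.
    """
    found = {}
    for name in tools:
        path = shutil.which(name, path=im_dir) if im_dir else None
        found[name] = path or shutil.which(name)
    return found


if __name__ == "__main__":
    for name, path in locate_tools(os.environ.get("IM")).items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

One caution when reading the result on Windows: a bare "convert" can resolve to the operating system's own disk-conversion utility rather than ImageMagick, so the reported path is worth checking.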

Post by Adam32 »

fmw42 wrote:Perhaps go back to a simpler solution. Just sample the colors in a small block of pixels in the corners or the sides of the image and get an average color (in this case "white"). (Or just get the average color of the left side of the image, since that is closer to the holes). Then once you have the transparent image, just flatten it against a -background color. That is easy to do in Imagemagick without the need for special code.
I think that's a really good idea, as I am struggling with the BAT scripts and am a bit out of my depth. Would you mind providing an example with the images I posted?

Post by Adam32 »

snibgo wrote:Okay. My scripts assume that you have installed ImageMagick (convert.exe, identify.exe, compare.exe) to a directory on your path, or that you have an environment variable called IM that contains this directory.

Thank you so much, I did not realise this. I reinstalled ImageMagick and selected "install legacy components", and it now works. You are a genius! Thanks so much.

Post by fmw42 »

Adam32 wrote:
fmw42 wrote:Perhaps go back to a simpler solution. Just sample the colors in a small block of pixels in the corners or the sides of the image and get an average color (in this case "white"). (Or just get the average color of the left side of the image, since that is closer to the holes). Then once you have the transparent image, just flatten it against a -background color. That is easy to do in Imagemagick without the need for special code.
I think that's a really good idea as I seem to be struggling with the bat scripts, as I am a bit out of my depth. Would you mind providing an example with the images I posted.
I am on a Mac, so the following is Unix syntax, not Windows. I have replaced snibgo's last command, which used his script, with two other commands. The first gets the average color of a 2-pixel-wide strip on the left side of the image. The second flattens the image with that color as the background to fill the holes.

Code:

convert \
punch_holes_Page_2.jpg \
-threshold 50% \
-define connected-components:verbose=true \
-define connected-components:area-threshold=190 \
-define connected-components:mean-color=true \
-connected-components 8 \
t2.png

convert \
punch_holes_Page_2.jpg \
\( t2.png -crop 10%x100%+0+0 \
-morphology Erode disk:3 \
\) \
-alpha off \
-set option:compose:outside-overlay false \
-compose CopyOpacity -composite \
h.png

color=$(convert \
punch_holes_Page_2.jpg \
-crop 2x+0+0 +repage \
-scale 1x1 \
-format "%[pixel:u.p{0,0}]" info:)

convert h.png \
-background "$color" \
-flatten \
result.png
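
For a Windows user, one way to sidestep the differences between Unix and BAT quoting is to drive the same four steps from Python, passing each argument as a separate list item so that parentheses need no escaping and percent signs need no doubling. This is a minimal, untested sketch that reuses only the options shown above. The function names, the default file names and the `exe` parameter are assumptions made for this example.

```python
import subprocess


def build_commands(src, mask="t2.png", holed="h.png", exe="convert"):
    """Return the first three commands as argument lists."""
    find_components = [
        exe, src, "-threshold", "50%",
        "-define", "connected-components:verbose=true",
        "-define", "connected-components:area-threshold=190",
        "-define", "connected-components:mean-color=true",
        "-connected-components", "8", mask,
    ]
    punch_out = [
        exe, src,
        "(", mask, "-crop", "10%x100%+0+0", "-morphology", "Erode", "disk:3", ")",
        "-alpha", "off",
        "-set", "option:compose:outside-overlay", "false",
        "-compose", "CopyOpacity", "-composite", holed,
    ]
    sample_colour = [
        exe, src, "-crop", "2x+0+0", "+repage", "-scale", "1x1",
        "-format", "%[pixel:u.p{0,0}]", "info:",
    ]
    return find_components, punch_out, sample_colour


def flatten_command(holed, colour, out, exe="convert"):
    """Return the final command, which fills the transparent holes with colour."""
    return [exe, holed, "-background", colour, "-flatten", out]


def remove_holes(src, out, exe="convert"):
    """Run all four steps for one page; requires ImageMagick to be installed."""
    find_components, punch_out, sample_colour = build_commands(src, exe=exe)
    subprocess.run(find_components, check=True)
    subprocess.run(punch_out, check=True)
    colour = subprocess.run(
        sample_colour, check=True, capture_output=True, text=True
    ).stdout.strip()
    subprocess.run(flatten_command("h.png", colour, out, exe=exe), check=True)
```

Calling `remove_holes("punch_holes_Page_2.jpg", "result.png")` would process one page; looping over a folder of scans is then an ordinary Python loop. If "convert" is not the ImageMagick program on your system, `exe` would need to be a full path instead.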

Post by snibgo »

If the paper is a constant colour, then it can be sampled anywhere to get a colour to fill the holes.

If it isn't a constant colour, an average of the paper edges may be good enough. Interpolating between corners or edges is probably better, and sampling near the hole is probably better still. Unless there is wide variation around the hole, blurFill should work fine.

If there is wide variation around the hole, blurFill will create a visible "crease" effect. One of the process modules, fillholes or fillholespri, would be better. Best of all would be a relaxFill script (not yet published) which solves Laplacian or Poisson equations with Dirichlet boundary conditions.
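
To give a feel for what solving Laplace's equation with Dirichlet boundary conditions means here, the following is a minimal pure-Python sketch of Jacobi relaxation on a small greyscale grid. It is not the unpublished relaxFill script, whose details are unknown. The function name `relax_fill`, the iteration count and the 5x5 test "paper" are invented for illustration.

```python
def relax_fill(grid, hole, iterations=500):
    """Fill hole pixels by repeatedly averaging their four neighbours.

    grid: list of rows of floats. hole: set of (row, col) positions to fill.
    Pixels outside the hole are never changed, so they act as the fixed
    (Dirichlet) boundary. Neighbours beyond the image edge are ignored.
    """
    height, width = len(grid), len(grid[0])
    current = [list(row) for row in grid]
    for _ in range(iterations):
        updated = [list(row) for row in current]
        for y, x in hole:
            around = [
                current[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            updated[y][x] = sum(around) / len(around)
        current = updated
    return current


# Paper whose brightness rises steadily from left to right, with a 3x3 hole.
paper = [[10.0 * x for x in range(5)] for _ in range(5)]
hole = {(y, x) for y in (1, 2, 3) for x in (1, 2, 3)}
for y, x in hole:
    paper[y][x] = 0.0

filled = relax_fill(paper, hole)
print(round(filled[2][2], 3))  # prints 20.0
```

The hole is filled with values that continue the surrounding gradient smoothly, which a single flat fill colour cannot do. A real image would need this per channel and a much faster solver.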