Removing corners from scanned cards

pkz
Posts: 5
Joined: 2016-12-25T08:25:04-07:00

Removing corners from scanned cards

Post by pkz »

Hi! I'm looking at preprocessing scanned catalog cards before doing OCR. To reduce OCR noise, I want to remove the black areas at the top right and top left (the rounded corners). They differ in size, and sometimes additional dark areas appear from misaligned cards (see the first image below, top left and bottom right).

I would also like to remove the black circle in the lower center part. The roundness varies depending on card type. It would of course be possible to add a sufficiently large polygon for the corners, but is there some other strategy I could use? I am looking for something like "fill dark areas from the outside".

Examples of cards:

Image

Image

(see more examples at https://data.kb.se/datasets/2016/09/hs_nominalkatalog/)
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing corners from scanned cards

Post by fmw42 »

If you convert those areas to transparency, then user snibgo has a hole-filling script. See http://im.snibgo.com/fillholespri.htm and http://im.snibgo.com/fillholes.htm

Alternatively, you can do a fuzzy floodfill in each region:

Code: Select all

convert image -fuzz XX% -fill somecolor -draw "color x,y floodfill" resultimage
where XX% determines how much tolerance to use when filling the region located at x,y, and somecolor is your desired background color (the tan color in your image). See http://www.imagemagick.org/Usage/draw/#color
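The template above can be filled in with concrete values. This is a minimal sketch that seeds one fuzzy floodfill in each top corner, assuming the rounded corners touch pixels (0,0) and (w-1,0). The 30% fuzz, the tan fill color #d9c9a3 and the file names are illustrative assumptions, not values measured from the cards. With no argument it builds a small synthetic card so it can be tried as-is, and it writes PNG so the fill color stays exact.

```shell
#!/bin/sh
# Sketch: fuzzy floodfill seeded in both top corners of a card.
# Assumptions: corners touch (0,0) and (w-1,0); the 30% fuzz and the
# tan fill #d9c9a3 are illustrative guesses; file names are hypothetical.
in=${1:-}
if [ -z "$in" ]; then
  # No file given: build a synthetic tan card with black top corners.
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 14,14" -draw "rectangle 185,0 199,14" "$in"
fi

w=$(identify -format "%w" "$in")   # image width, needed for the right-hand seed
convert "$in" -fuzz 30% -fill '#d9c9a3' \
  -draw "color 0,0 floodfill" \
  -draw "color $((w - 1)),0 floodfill" \
  corners_filled.png
```

Each -draw "color x,y floodfill" starts from the color found at its own seed point, so corners of different sizes are handled without knowing their extent.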

Another way is to make the image into a binary mask and use connected components to label each isolated region, and then discard those regions, which will be the larger ones. Then use the filtered mask to recolor those regions with your tan background color. See http://magick.imagemagick.org/script/co ... onents.php
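A minimal sketch of the connected-components idea, assuming ImageMagick 6.8.9-10 or later (where -connected-components is available): threshold a copy of the card, let connected-components:area-threshold merge every dark object smaller than 100 pixels (the text) into the surrounding paper, and use what remains, the large dark regions, as the mask for recoloring. The 35% threshold, the 100-pixel area threshold, the paper color #d9c9a3 and the file names are assumptions to tune per card type; with no argument it runs on a synthetic card.

```shell
#!/bin/sh
# Sketch: keep only the LARGE dark regions in a mask, then recolor them.
# Assumptions: IM 6.8.9-10+; 35% threshold, 100 px area threshold and
# paper color #d9c9a3 are illustrative; file names are hypothetical.
in=${1:-}
if [ -z "$in" ]; then
  # No file given: synthetic card with two black corners and a small "text" dot.
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 14,14" -draw "rectangle 185,0 199,14" \
    -draw "rectangle 90,60 93,63" "$in"
fi

# images: 0 = original, 1 = solid paper color,
# 2 = mask (small objects merged away, negated so big dark areas are white)
convert "$in" \
  \( -clone 0 -fill '#d9c9a3' -colorize 100 \) \
  \( -clone 0 -colorspace gray -threshold 35% \
     -define connected-components:area-threshold=100 \
     -define connected-components:mean-color=true \
     -connected-components 8 -negate \) \
  -compose over -composite cc_cleaned.png
```

Because the small objects are merged before the mask is used, text stays untouched while the corners and the hole are recolored.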

This is a very simple way, but it leaves a small border around the regions. It gets the average color of your image, then creates a mask by thresholding and uses the mask to recolor the black parts of the image. Unix syntax.

Code: Select all

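# average color of the whole card, used as the fill color
color=`convert Nominal_20151207_103630_000098.jpg -scale 1x1 -format "%[pixel:u.p{0,0}]" info:`
# images: 0 = original, 1 = solid average color,
# 2 = mask that is white where the original is darker than about 35%
convert Nominal_20151207_103630_000098.jpg \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -threshold 35% -negate \) \
-compose over -composite result.jpg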

Please always provide your IM version and platform when asking questions, since syntax may vary.
sgbotsford
Posts: 3
Joined: 2016-12-16T10:58:22-07:00

Re: Removing corners from scanned cards

Post by sgbotsford »

I'm not fully familiar with ImageMagick yet. But when I was building pipes out of NetPBM, you could do this by adding white rectangles at the appropriate offsets. It's been nearly 20 years, but it would be something like

pnmadd originalfile.pgm, whitefile.pgm, -top -left | pnmadd - whitefile.pgm -top -right | ....

Offsets were handled in a reasonably flexible manner.

The math would do a pixel-by-pixel addition, then clip, so adding a white box made that part of the image white.
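In ImageMagick the same fixed-box approach is a couple of -draw "rectangle ..." operations. A minimal sketch, assuming no corner is larger than a hypothetical 15x15 pixels; the box size, the white fill and the file names are illustrative, and with no argument it runs on a synthetic card.

```shell
#!/bin/sh
# Sketch: paint fixed white boxes over both top corners.
# Assumption: a 15x15 box (hypothetical) covers the largest corner.
in=${1:-}
if [ -z "$in" ]; then
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 14,14" -draw "rectangle 185,0 199,14" "$in"
fi

w=$(identify -format "%w" "$in")   # image width, to place the right-hand box
box=15                             # box edge length in pixels (assumed)
convert "$in" -fill white \
  -draw "rectangle 0,0 $((box - 1)),$((box - 1))" \
  -draw "rectangle $((w - box)),0 $((w - 1)),$((box - 1))" \
  boxes_white.png
```

The drawback is the one raised in the reply below: the box size has to be chosen large enough for the biggest corner on any card.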
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing corners from scanned cards

Post by fmw42 »

You can overlay color boxes the same color as your background color at any point in the background image. So yes, you can do the same thing. But you have to know how big to make each box to cover each black region. That is where connected components comes in. It can tell you the bounding box of every isolated black area in your image or even make an overlay mask for each actual shaped region.
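A sketch of how to get those bounding boxes, assuming ImageMagick 6.8.9-10 or later and a 35% threshold (both assumptions): connected-components:verbose=true prints one line per object with its id, bounding box (WxH+X+Y), centroid, area and mean color, and null: discards the labeled image. A dark object's box WxH+X+Y maps to -draw "rectangle X,Y X+W-1,Y+H-1". The report is also saved to a hypothetical cc_regions.txt.

```shell
#!/bin/sh
# Sketch: report the bounding box of every isolated region.
# Assumptions: IM 6.8.9-10+, 35% threshold, hypothetical file names.
in=${1:-}
if [ -z "$in" ]; then
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 14,14" -draw "rectangle 185,0 199,14" "$in"
fi

# One line per object: "id: WxH+X+Y centroid area mean-color"
# (null: throws the labeled image away; only the text report is kept)
convert "$in" -colorspace gray -threshold 35% \
  -define connected-components:verbose=true \
  -connected-components 8 null: | tee cc_regions.txt
```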
pkz
Posts: 5
Joined: 2016-12-25T08:25:04-07:00

Re: Removing corners from scanned cards

Post by pkz »

Thank you! The fuzzy fill works very well. If I first add a black 10px border around the image, it touches all the black areas (corners, skew gaps, etc.), so the fill works for many scenarios.

Code: Select all

convert mypic.jpg \
  -bordercolor black -border 10x10 \
  -fuzz 30% -fill white -draw "color 5,5 floodfill" \
  mypic.clean.jpg
Image
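One side effect of the border trick is that the output is 20 pixels wider and taller than the scan. A minimal sketch that trims the 10px border back off with -shave 10x10, so coordinates match the original again. The file names are hypothetical, PNG output is used so the filled white stays exactly white, and with no argument it runs on a synthetic card.

```shell
#!/bin/sh
# Sketch: border + floodfill from the border, then remove the border again.
# Assumptions: 30% fuzz as in the command above; hypothetical file names.
in=${1:-}
if [ -z "$in" ]; then
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 14,14" -draw "rectangle 185,0 199,14" "$in"
fi

convert "$in" \
  -bordercolor black -border 10x10 \
  -fuzz 30% -fill white -draw "color 5,5 floodfill" \
  -shave 10x10 \
  card_clean.png
```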
pkz
Posts: 5
Joined: 2016-12-25T08:25:04-07:00

Re: Removing corners from scanned cards

Post by pkz »

After the area is filled, is there a simple way to shave off e.g. 2px or so to clean the darker part of the remaining paper edge? I guess one option would be to trace the contour and try to add an inset border somehow (preferably a few pixels wide with gradually diminishing transparency).
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing corners from scanned cards

Post by fmw42 »

Try this. But if you are going to run more than one command on an image, I suggest you do not save intermediate results as JPG, since JPG is lossy and constant colors do not remain constant.

Unix syntax.

Code: Select all

convert YimUDze.jpg \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:3 \) \
-compose over -composite result.jpg
It is best to combine this operation with the first one (the floodfill) in one command line.
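For the "gradually diminishing" edge asked about above, one option (an assumption, not something tested in this thread) is to blur the dilated mask before compositing, so the recolor fades out over a few pixels instead of stopping at a hard edge. The octagon:3 dilation is taken from the command above; the 0x2 blur, the file names and the synthetic pre-filled card are illustrative. Output is PNG, following the advice not to chain JPG intermediates.

```shell
#!/bin/sh
# Sketch: feathered version of the dilate-and-recolor step.
# Input is assumed to be ALREADY floodfilled with pure white.
# Assumptions: 0x2 blur and file names are illustrative.
in=${1:-}
if [ -z "$in" ]; then
  # Synthetic pre-filled card: white top-left corner with a dark seam beside it.
  in=synthetic_filled.png
  convert -size 200x120 xc:'#d9c9a3' -fill white -draw "rectangle 0,0 14,14" \
    -fill '#5a4a30' -draw "rectangle 15,0 16,14" "$in"
fi

# images: 0 = input, 1 = solid white,
# 2 = mask: pure-white areas, grown by 3 px, then softened by a blur
convert "$in" \
  \( -clone 0 -fill white -colorize 100 \) \
  \( -clone 0 -negate -threshold 1% -negate \
     -morphology dilate octagon:3 -blur 0x2 \) \
  -compose over -composite feathered.png
```

Since the mask is now gray at its edge, the composite blends white and the original there, which gives the fading seam rather than a hard cut.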
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing corners from scanned cards

Post by fmw42 »

So you can do this as one command.

Input:
Image

Code: Select all

convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill white -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:4 \) \
-compose over -composite result1.jpg
Image


Or use this to fill with a color close to your background color.

Code: Select all

color=`convert Nominal_20151207_103630_000098.jpg \
-fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`
convert Nominal_20151207_103630_000098.jpg \
-bordercolor black -border 10x10 \
-fuzz 30% -fill white -draw "color 5,5 floodfill" \
\( -clone 0 -fill "$color" -colorize 100 \) \
\( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
-compose over -composite result2.jpg
Image
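Why the -fuzz 50% -transparent black step matters for the sampled color: a minimal sketch comparing the red channel of a plain 1x1 average with the black-excluded one. It relies on -scale weighting pixels by their alpha, so the transparent (formerly near-black) pixels contribute almost nothing. The synthetic card with large black corners and the file names are assumptions.

```shell
#!/bin/sh
# Sketch: plain average vs. average with near-black pixels excluded.
# Assumptions: -scale averages with alpha weighting; hypothetical file names.
in=${1:-}
if [ -z "$in" ]; then
  in=synthetic_card.png
  convert -size 200x120 xc:'#d9c9a3' -fill black \
    -draw "rectangle 0,0 39,39" -draw "rectangle 160,0 199,39" "$in"
fi

# Plain average: the black corners pull the result toward black.
plain=$(convert "$in" -scale 1x1 -format "%[fx:u.p{0,0}.r]" info:)
# Near-black pixels made transparent first, so they barely count.
paper=$(convert "$in" -fuzz 50% -transparent black -scale 1x1 -alpha off \
  -format "%[fx:u.p{0,0}.r]" info:)
echo "red channel, plain average: $plain  black excluded: $paper"
printf '%s %s\n' "$plain" "$paper" > paper_color_check.txt
```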
pkz
Posts: 5
Joined: 2016-12-25T08:25:04-07:00

Re: Removing corners from scanned cards

Post by pkz »

Thank you! The complete cleanup script will look something like this. The hole will be floodfilled as well. This will hopefully cut down a lot of garbage in the Tesseract OCR output.

Code: Select all

#!/bin/sh
# usage: pass the scanned card as the first argument; writes "<input>.clean.jpg"

# coordinates for hole fill (50% across, 88% down), shifted by +10 to allow
# for the 10px border that is added before the floodfill below
hole_x=`convert "$1" -format "%[fx:50*w/100+10]" info:`
hole_y=`convert "$1" -format "%[fx:88*h/100+10]" info:`

# fill color: average of the card with near-black areas excluded
color=`convert "$1" -fuzz 50% -transparent black -scale 1x1 -alpha off -format "%[pixel:u.p{0,0}]" info:`

convert "$1" \
  -bordercolor black -border 10x10 \
  -fuzz 40% -fill "$color" -draw "color $hole_x,$hole_y floodfill" \
  -fuzz 30% -fill white -draw "color 5,5 floodfill" \
  \( -clone 0 -fill "$color" -colorize 100 \) \
  \( -clone 0 -negate -threshold 1% -negate -morphology dilate octagon:12 \) \
  -compose over -composite \
  -blur 1x65535 \
  -contrast -contrast \
  -normalize \
  -despeckle \
  -sharpen 1 \
  -posterize 2 \
  -colorspace Gray \
  "$1.clean.jpg"
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Removing corners from scanned cards

Post by fmw42 »

If you are going to do OCR, then try my script textcleaner, from my ImageMagick scripts web site. For example, on my result2.jpg:

Code: Select all

textcleaner -e normalize -f 20 -o 10 result2.jpg result3.jpg
Image
pkz
Posts: 5
Joined: 2016-12-25T08:25:04-07:00

Re: Removing corners from scanned cards

Post by pkz »

Thank you again. I will definitely look into that.