remove dots from picture before OCR

Posted: 2019-08-14T07:29:17-07:00
by faboloso
Good morning,

I am new to the forum, but I have been using ImageMagick for a while in a Python script in combination with OCR.

Last time I used it to remove lines from text before running OCR.

For this project, though, I am not sure which parameters to use. I hate to ask, and I am not asking anyone to spoon-feed me, but could someone guide me a bit on which parameters I should use to get clear text with no dots, or even a white background? (OCR already recognizes the picture as it is, but only maybe 60% correctly.)

Here is my example picture. Any help will be much appreciated.

Image

Thank you

Re: remove dots from picture before OCR

Posted: 2019-08-14T07:47:09-07:00
by Werty
Um, a question first: would we be helping you circumvent some kind of "Captcha"-like validation system?

Re: remove dots from picture before OCR

Posted: 2019-08-14T07:58:30-07:00
by faboloso
No, this is not intended for captcha and it does not bypass any captcha system. I am just reading the information from a picture.

Re: remove dots from picture before OCR

Posted: 2019-08-14T09:45:56-07:00
by fmw42
You can use -connected-components to remove small dots.

Image

convert img1.png \
  \( +clone -threshold 70% -negate -type bilevel \
     -define connected-components:area-threshold=5 \
     -define connected-components:mean-color=true \
     -connected-components 4 \) \
  -alpha off -compose copy_opacity -composite \
  -compose over -background white -flatten \
  img1_result.png
Image
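
Since the original question mentions calling ImageMagick from a Python script, here is a minimal sketch of how the command above might be wrapped with `subprocess`. The function names `build_dot_removal_command` and `remove_dots` and their keyword parameters are hypothetical and only for illustration; the ImageMagick options are exactly the ones in the command above. The comments describe what each step appears to do. The 70% threshold and the area threshold of 5 pixels were chosen for this particular picture and will probably need tuning for other images.

```python
import subprocess


def build_dot_removal_command(src, dst, threshold="70%", min_area=5,
                              connectivity=4, executable="convert"):
    """Build the argument list for the dot-removal command (hypothetical helper).

    Passing a list to subprocess avoids the shell, so the parentheses
    do not need the backslash escaping used on the command line.
    """
    return [
        executable, src,
        # Work on a copy: threshold it, then negate so that dark text and
        # dots become white regions on a black background.
        "(", "+clone",
        "-threshold", threshold,
        "-negate",
        "-type", "bilevel",
        # Regions smaller than min_area pixels are merged into their
        # surroundings, which is what removes the small dots.
        "-define", f"connected-components:area-threshold={min_area}",
        # Output region colors instead of region label numbers.
        "-define", "connected-components:mean-color=true",
        "-connected-components", str(connectivity),
        ")",
        # Use the cleaned mask as the alpha channel of the original image.
        "-alpha", "off",
        "-compose", "copy_opacity", "-composite",
        # Flatten onto white so everything outside the mask becomes white.
        "-compose", "over",
        "-background", "white", "-flatten",
        dst,
    ]


def remove_dots(src, dst, **options):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_dot_removal_command(src, dst, **options), check=True)
```

Calling `remove_dots("img1.png", "img1_result.png")` should then be equivalent to the shell command, assuming the `convert` executable is on the PATH. On ImageMagick 7 the executable is `magick`, which can be passed as `executable="magick"`.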