
How to remove all colors except 'redish'

Posted: 2018-02-20T08:29:57-07:00
by jcv
Hey guys,
Thanks for this active forum, helped me out a few times. But now I have a little issue myself:
I have some kind of 'exploded view': an image with numbers: https://ibin.co/3sMyGGx8AT7Z.png

Problem is that numbers are a bit fuzzy: multiple shades of red.
I want to extract the numbers from this image with OCR. OCR currently reads around 70% of the numbers correctly, but I want to improve on that.
I know it's possible to remove everything but one color from an image, but have some difficulty implementing this for this image.
I have tried a lot of solutions:
convert image-bg.png -fuzz 22% -fill black -opaque "#da392f" image-clr.png
convert image-bg.png -fill white -fuzz 26% +opaque "#dd4337" image-clr.png

but the result contains a lot of noise, which makes OCR difficult. Can anybody point me in the right direction? The best result would be a black-and-white image with only the numbers.

Thanks!

Re: How to remove all colors except 'redish'

Posted: 2018-02-20T08:52:08-07:00
by snibgo
Three methods are useful here:

1. "-level-colors" can map a particular colour to black and white to white. This will change the grays to something else. See http://www.imagemagick.org/script/comma ... vel-colors

2. Convert to HCL, separate the "G" channel (which holds chroma in HCL), and negate. The result is white where the input has zero chroma (no saturation): white or black or any shade of gray. Hence it can be used as a mask, to turn all those gray shades into white.

3. A morphology method can isolate the thin lines, to distinguish the red text from the thin red lines.

A combination of these will solve the problem.
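
As a rough illustration, the three methods could be sketched like this. The function names `level_red` and `isolate_red` are hypothetical, and the 25% threshold and `Disk:1.5` kernel are assumed starting values that would need tuning against the real image; the red "#da392f" is simply the one used in the first post.

```shell
#!/bin/sh
# Sketch only. Function names, the 25% threshold and the Disk:1.5 kernel are
# assumptions to tune, not values confirmed anywhere in this thread.

# Method 1: map one chosen red to black and white to white.
# Grays end up as some other tint, so this alone is unlikely to be enough.
# Usage: level_red in.png out.png
level_red() {
  convert "$1" -level-colors "#da392f",white "$2"
}

# Methods 2 and 3 combined:
#   - convert to HCL and separate the G channel (chroma), so grays go dark
#   - threshold, so strongly coloured pixels become white
#   - morphological Open, which removes white features thinner than the kernel
#   - negate, giving black marks on a white background for OCR
# Usage: isolate_red in.png out.png
isolate_red() {
  convert "$1" \
    -colorspace HCL -channel G -separate +channel \
    -threshold 25% \
    -morphology Open Disk:1.5 \
    -negate \
    "$2"
}
```

For example, `isolate_red image-bg.png image-clr.png` would write a black-on-white result. If the strokes of the digits are about as thin as the red lines, the Open step may erase them as well, so a smaller kernel (or dropping that step) might be needed.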

Re: How to remove all colors except 'redish'

Posted: 2018-02-22T00:25:50-07:00
by jcv
Thanks.
In the end I did not succeed. Cleaning up the image with enough clarity to be read with OCR turned out to be a bit far fetched.

I tried your recommendations but did not get a clear result. The best result I got was when I used an alpha channel to remove all but a few colors and then removed the alpha channel:

convert img-big.png -channel A -fuzz 7% -transparent "#dd4337" -transparent "#d82528" -transparent "#f7d7ca" -transparent "#ea9179" -transparent "#ecaa94" -transparent "#df5944" -transparent "#e47664" -negate +channel -alpha remove img-trans.png

But this was not clean enough for OCR to read: there is too much noise from the red lines. If I try to remove the noise and the red lines, the numbers get too damaged to read.

But thanks anyway.