Clean Up a Document for Faxing/OCR

Posted: 2015-01-02T07:29:50-07:00
by mattj
Hi All,

I'm trying to clean up a document using the .NET library, where the image might have some darkness or color in the background. I'd like to make the background white and improve the clarity of the text if possible.

I've been trying to reproduce the commands from this post: viewtopic.php?f=2&t=26744&hilit=contrast which is basically Fred's Textcleaner script: http://www.fmwconcepts.com/imagemagick/textcleaner/

Has anyone had any luck in doing something like this using the .NET library?

Thanks,

Matt

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-02T08:52:01-07:00
by dlemstra
What have you tried so far? The names of the methods in the post 'viewtopic.php?f=2&t=26744&hilit=contrast' are most likely methods of the MagickImage class. For example MagickQuantizeImage = MagickImage.Quantize.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-02T10:17:31-07:00
by mattj
Hi, I've tried to recreate this from the previous post:

Code:

MagickLevelImage(wand,0.0,0.25,MaxRGB);
MagickNegateImage(wand,false);
MagickAdaptiveThresholdImage(wand,30,30,10);
MagickNegateImage(wand,false);

as this in .NET:

Code:

imgReceipt.AutoLevel();
imgReceipt.Negate();
imgReceipt.AdaptiveThreshold(30, 30, 10);
imgReceipt.Negate();

But it's taking the dark background and making it darker. My images are very similar to the ones in Fred's Textcleaner script page: http://www.fmwconcepts.com/imagemagick/ ... /index.php

Also Fred's 2 color threshold script might really be all I need, but I'm having trouble coming up with an equivalent for the .NET code to match:

Code:

convert $infile +dither -colors 2 -colorspace gray -contrast-stretch 0 $outfile

For example, I don't see how to specify +dither in .NET.

Thanks for your help dlemstra, ImageMagick rocks!

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-02T12:38:49-07:00
by dlemstra
+dither corresponds to the DitherMethod property of QuantizeSettings, and -colors 2 to its Colors property. You can pass the QuantizeSettings to the Quantize method of MagickImage.
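A minimal sketch of how Fred's two-color command might map onto these settings. It assumes a recent Magick.NET release: names such as DitherMethod.No, ColorSpace.Gray and the Percentage overload of ContrastStretch come from newer versions and may differ in the release discussed in this thread. The built-in "logo:" image stands in for $infile, and the output file name is hypothetical.

```csharp
using ImageMagick;

// "logo:" is ImageMagick's built-in test image, used here in place of $infile.
using var image = new MagickImage("logo:");

var settings = new QuantizeSettings
{
    Colors = 2,                     // -colors 2
    DitherMethod = DitherMethod.No, // +dither (assumed equivalent)
};
image.Quantize(settings);

image.ColorSpace = ColorSpace.Gray; // -colorspace gray

// -contrast-stretch 0, assuming 0% is clipped at both ends of the histogram.
image.ContrastStretch(new Percentage(0), new Percentage(0));

image.Write("two-color.png"); // hypothetical output name
```

The operations are applied in the same order as on the command line, since quantizing before or after the grayscale conversion can pick different colors.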

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-05T14:22:46-07:00
by mattj
Thanks that did it. So I really need to use a technique more similar to Fred's TextCleaner and I'm looking at his sample of the ImageMagick command string.

Code:

convert \( $infile -colorspace gray -type grayscale -contrast-stretch 0 \) \
 \( -clone 0 -colorspace gray -negate -lat ${filtersize}x${filtersize}+${offset}% -contrast-stretch 0 \) \
 -compose copy_opacity -composite -fill "$bgcolor" -opaque none +matte \
 -deskew 40% -sharpen 0x1 \
 $outfile

So for the first two lines I've got:

Code:

MagickImage imgReceipt = new MagickImage("receipt.pdf");
QuantizeSettings qs = new QuantizeSettings();
qs.ColorSpace = ColorSpace.GRAY;
imgReceipt.Quantize(qs);
imgReceipt.ColorType = ColorType.Grayscale;
imgReceipt.ContrastStretch(0, 0);

QuantizeSettings qs2 = new QuantizeSettings();
MagickImage img2 = imgReceipt.Clone();
qs2.ColorSpace = ColorSpace.GRAY;
img2.Quantize(qs2);
img2.Negate();
img2.AdaptiveThreshold(15, 15, 10);
img2.ContrastStretch(0, 0);

But then on the third line I'm a little lost. I see that on the image I can set the Compose property, but there is no copy_opacity value. Also I can't find an equivalent for -opaque or +matte. Can you point me in the right direction?

Thanks again.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-05T15:08:36-07:00
by dlemstra
+matte disables the alpha channel of the image (MagickImage.Alpha(AlphaOption.Off)); -matte is the form that enables it
-composite is the Composite method of MagickImage
copy_opacity has been renamed to copy_alpha (CompositeOperator.CopyAlpha)
-opaque is MagickImage.Opaque
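Put together, the third line of the command might look roughly like the sketch below. It assumes a recent Magick.NET release (the Percentage overloads and enum names may differ in the version discussed here), a white background color, a 15x15 filter with a 5% offset, and the built-in "logo:" image in place of the receipt. +matte is mapped to AlphaOption.Off on the assumption that +matte turns the alpha channel off. The -deskew and -sharpen steps are left out.

```csharp
using ImageMagick;

// First parenthesised group: grayscale, stretched source image.
using var image = new MagickImage("logo:");
image.ColorSpace = ColorSpace.Gray;
image.ContrastStretch(new Percentage(0), new Percentage(0));

// Second group: negated, locally thresholded copy used as a mask
// (text becomes white, background becomes black).
using var mask = image.Clone();
mask.Negate();
mask.AdaptiveThreshold(15, 15, new Percentage(5)); // -lat 15x15+5%

// -compose copy_opacity -composite: use the mask as the alpha channel.
image.Composite(mask, CompositeOperator.CopyAlpha);

// -fill white -opaque none: paint the now-transparent background white.
image.Opaque(MagickColors.Transparent, MagickColors.White);

// +matte: drop the alpha channel again (assumed to mean "alpha off").
image.Alpha(AlphaOption.Off);

image.Write("cleaned.png"); // hypothetical output name
```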

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T13:12:36-07:00
by mattj
Thanks again, making more progress.

Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0 ? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.

Can you see any reason that this

Code:

//Create Mask
MagickImage imgMask = imgReceipt.Clone();
imgMask.ColorSpace = ColorSpace.GRAY;
imgMask.Negate();
imgMask.AdaptiveThreshold(15, 15, 5);  //lat
imgMask.ContrastStretch(0, 0);
//imgMask.AutoLevel();

is producing different results than this?

Code:

receipt.jpg -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0

The resulting image from the .NET code has a lot of white streaks in the background, whereas the command-line version produces a clean mask with almost all of the background black.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T13:48:55-07:00
by dlemstra
The last parameter of AdaptiveThreshold should be 5% of the QuantumRange (Quantum.Max). I just submitted a patch to add an overload of AdaptiveThreshold that accepts a percentage.

It also seems that there is a bug in ImageMagick 7. I tried the following in IM6 and IM7 and it produces different results:

Code:

convert logo: -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png

I will have to look into this.
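The bias conversion described at the start of this post can be sketched as follows. This assumes the AdaptiveThreshold(width, height, bias) overload that takes an absolute bias, and uses the built-in "logo:" image as a stand-in for the receipt. On a Q16 build Quantum.Max is 65535, so 5% works out to 3276.75.

```csharp
using ImageMagick;

// -lat 15x15+5% means an offset of 5% of the quantum range,
// not the literal value 5.
double bias = Quantum.Max * 5 / 100.0;

using var mask = new MagickImage("logo:");
mask.ColorSpace = ColorSpace.Gray;
mask.Negate();
mask.AdaptiveThreshold(15, 15, bias);
```

Passing the raw value 5 instead gives an offset close to zero on a Q16 build, which would fit the white streaks reported above.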

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T13:52:26-07:00
by fmw42
convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
You left off the minus before colorspace (i.e. -colorspace rather than colorspace)

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T14:13:20-07:00
by mattj
dlemstra wrote:The last parameter of AdaptiveThreshold should be 5% of the QuantumRange (Quantum.Max). I just submitted a patch to add an overload of AdaptiveThreshold that accepts a percentage.

It also seems that there is a bug in ImageMagick 7. I tried the following in IM6 and IM7 and it produces different results:

Code:

convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png

I will have to look into this.

Awesome, that was what I needed. It's generating a nicely cleaned-up document with a white background now.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T14:26:24-07:00
by dlemstra
fmw42 wrote:
convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
You left off the minus before colorspace (i.e. -colorspace rather than colorspace)

Thanks Fred, I have narrowed it down to the following:

Code:

convert logo: -lat 15x15+5% logo.png

I will have to check the code of our AdaptiveThreshold method in IM7. @Matt: You will probably get different results in the next release of Magick.NET.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-06T15:24:59-07:00
by dlemstra
The bug in AdaptiveThreshold has been found and will be fixed in the next release of Magick.NET.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-07T05:48:41-07:00
by mattj
Awesome thanks!

And did you see my question from yesterday about contrast stretch?

Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-07T07:58:41-07:00
by dlemstra
I did see it but I forgot about it :)

It looks like you will have to calculate the white point differently. I think it should be Width*Height. I will change the code in the next release of Magick.NET, which means you will have to change your code again after upgrading.

p.s. I have not tested this yet.
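One possible reading of the suggestion above, as a sketch only: it assumes the underlying routine takes both points as pixel counts measured from the dark end of the histogram, so that "clip 0% of the white pixels" becomes a white point of Width*Height rather than 0. StretchPoints is a hypothetical helper, not part of Magick.NET, and the thread does not confirm this behaviour.

```csharp
using System;

// Hypothetical helper: converts the two percentages into the
// black and white points expressed as pixel counts.
static (double Black, double White) StretchPoints(
    int width, int height, double blackPercent, double whitePercent)
{
    double pixels = (double)width * height;
    double black = pixels * blackPercent / 100.0;
    // The white point is counted from the dark end, so 0% clipped
    // means "all pixels", not "no pixels".
    double white = pixels - pixels * whitePercent / 100.0;
    return (black, white);
}

var (black, white) = StretchPoints(640, 480, 0, 0);
Console.WriteLine($"black={black}, white={white}");
```

For a 640x480 image this gives a black point of 0 and a white point of 307200. Under the same assumption, a white point of 0 would push almost every pixel to white, which would match the mostly white result reported earlier.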

Re: Clean Up a Document for Faxing/OCR

Posted: 2015-01-07T10:51:20-07:00
by fmw42
Since -lat produces a binary image, -contrast-stretch at the end should do nothing. If you want to use it, it should be used before -lat.
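A short sketch of the ordering suggested here, with the stretch moved ahead of the local adaptive threshold. It assumes a recent Magick.NET release (Percentage overloads, ColorSpace.Gray) and uses the built-in "logo:" image as a stand-in for the receipt.

```csharp
using System;
using ImageMagick;

using var mask = new MagickImage("logo:");
mask.ColorSpace = ColorSpace.Gray;
mask.Negate();

// Stretch first, while the image still has a full range of gray levels.
mask.ContrastStretch(new Percentage(0), new Percentage(0));

// Then threshold: every pixel ends up either black or white.
mask.AdaptiveThreshold(15, 15, new Percentage(5)); // -lat 15x15+5%

// Number of distinct colors left in the mask.
Console.WriteLine(mask.TotalColors);
```

Once the mask holds only black and white, a further stretch has no intermediate levels left to move.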