Clean Up a Document for Faxing/OCR

Magick.NET is an object-oriented C# interface to ImageMagick. Use this forum to discuss, make suggestions about, or report bugs concerning Magick.NET.
mattj
Posts: 11
Joined: 2015-01-02T07:02:18-07:00

Post by mattj » 2015-01-02T07:29:50-07:00

Hi All,

I'm trying to clean up a document using the .NET library, where the image might have some darkness or color in the background. I want to make the background white and improve the clarity of the text if possible.

I've been trying to reproduce the commands from this post: viewtopic.php?f=2&t=26744&hilit=contrast which is basically Fred's Textcleaner script: http://www.fmwconcepts.com/imagemagick/textcleaner/

Has anyone had any luck in doing something like this using the .NET library?

Thanks,

Matt

dlemstra
Posts: 1577
Joined: 2013-05-04T15:28:54-07:00

Post by dlemstra » 2015-01-02T08:52:01-07:00

What have you tried so far? The method names in the post 'viewtopic.php?f=2&t=26744&hilit=contrast' most likely map to methods of the MagickImage class. For example, MagickQuantizeImage corresponds to MagickImage.Quantize.
.NET + ImageMagick = Magick.NET https://github.com/dlemstra/Magick.NET, @MagickNET, Donate

Post by mattj » 2015-01-02T10:17:31-07:00

Hi, I've tried to recreate this from the previous post:

Code:

MagickLevelImage(wand,0.0,0.25,MaxRGB);
MagickNegateImage(wand,false);
MagickAdaptiveThresholdImage(wand,30,30,10);
MagickNegateImage(wand,false);

as this in .NET:

Code:

imgReceipt.AutoLevel();
imgReceipt.Negate();
imgReceipt.AdaptiveThreshold(30, 30, 10);
imgReceipt.Negate();

But it's taking the dark background and making it darker. My images are very similar to the ones in Fred's Textcleaner script page: http://www.fmwconcepts.com/imagemagick/ ... /index.php

Also Fred's 2 color threshold script might really be all I need, but I'm having trouble coming up with an equivalent for the .NET code to match:

Code:

convert $infile +dither -colors 2 -colorspace gray -contrast-stretch 0 $outfile

For example, I don't see how to specify +dither in .NET.

Thanks for your help dlemstra, ImageMagick rocks!

Post by dlemstra » 2015-01-02T12:38:49-07:00

+dither corresponds to the DitherMethod property of QuantizeSettings, and -colors 2 to its Colors property. You can pass the QuantizeSettings to the Quantize method of MagickImage.
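
As a rough illustration of that mapping, the two-color command from the previous post might translate to something like the sketch below. It assumes a recent Magick.NET release in which DitherMethod.No stands in for +dither, the enum member is spelled ColorSpace.Gray (older releases such as the one used in this thread spell it GRAY), and ContrastStretch accepts a Percentage. The TwoColorSketch class and its Apply method are names made up for this example.

```csharp
using ImageMagick;

public static class TwoColorSketch
{
    // Rough equivalent of:
    //   convert $infile +dither -colors 2 -colorspace gray -contrast-stretch 0 $outfile
    public static void Apply(MagickImage image)
    {
        var settings = new QuantizeSettings
        {
            Colors = 2,                     // -colors 2
            DitherMethod = DitherMethod.No, // +dither (assumed mapping)
        };
        image.Quantize(settings);

        image.ColorSpace = ColorSpace.Gray;       // -colorspace gray
        image.ContrastStretch(new Percentage(0)); // -contrast-stretch 0
    }
}
```

Typical use would be to load the scan with new MagickImage("receipt.jpg"), call TwoColorSketch.Apply on it, and then Write the result.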

Post by mattj » 2015-01-05T14:22:46-07:00

Thanks, that did it. It turns out I really need a technique closer to Fred's TextCleaner, so I'm looking at his sample ImageMagick command string.

Code:

convert \( $infile -colorspace gray -type grayscale -contrast-stretch 0 \) \
  \( -clone 0 -colorspace gray -negate -lat ${filtersize}x${filtersize}+${offset}% -contrast-stretch 0 \) \
  -compose copy_opacity -composite -fill "$bgcolor" -opaque none +matte \
  -deskew 40% -sharpen 0x1 $outfile

So for the first two lines I've got:

Code:

MagickImage imgReceipt = new MagickImage("receipt.pdf");
QuantizeSettings qs = new QuantizeSettings();
qs.ColorSpace = ColorSpace.GRAY;
imgReceipt.Quantize(qs);
imgReceipt.ColorType = ColorType.Grayscale;
imgReceipt.ContrastStretch(0, 0);

QuantizeSettings qs2 = new QuantizeSettings();
MagickImage img2 = imgReceipt.Clone();
qs2.ColorSpace = ColorSpace.GRAY;
img2.Quantize(qs2);
img2.Negate();
img2.AdaptiveThreshold(15, 15, 10);
img2.ContrastStretch(0, 0);

But then on the third line I'm a little lost. I see that on the image I can set the Compose property, but there is no copy_opacity value. Also I can't find an equivalent for -opaque or +matte. Can you point me in the right direction?

Thanks again.

Post by dlemstra » 2015-01-05T15:08:36-07:00

+matte disables the alpha channel of the image (MagickImage.Alpha(AlphaOption.Deactivate)); -matte is the form that enables it (AlphaOption.Activate)
-composite is the Composite method of MagickImage
copy_opacity has been renamed to copy_alpha (CompositeOperator.CopyAlpha)
-opaque is MagickImage.Opaque
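
Putting those mappings together with the TextCleaner command quoted earlier in the thread, the whole pipeline could look roughly like this. It is only a sketch and has not been checked against Fred's script: it assumes a recent Magick.NET release (Percentage overloads for ContrastStretch, AdaptiveThreshold and Deskew, ColorSpace.Gray spelling), hard-codes a 15x15 window, a 5% offset and a white background, and treats +matte as switching the alpha channel off. TextCleanerSketch and Clean are invented names.

```csharp
using ImageMagick;

public static class TextCleanerSketch
{
    public static void Clean(MagickImage image)
    {
        // ( $infile -colorspace gray -type grayscale -contrast-stretch 0 )
        image.ColorSpace = ColorSpace.Gray;
        image.ColorType = ColorType.Grayscale;
        image.ContrastStretch(new Percentage(0));

        // ( -clone 0 -colorspace gray -negate -lat 15x15+5% )
        // The trailing -contrast-stretch is left out because the
        // thresholded mask only holds black and white.
        using var mask = image.Clone();
        mask.Negate();
        mask.AdaptiveThreshold(15, 15, new Percentage(5));

        // -compose copy_opacity -composite: the mask becomes the alpha channel
        image.Composite(mask, CompositeOperator.CopyAlpha);

        // -fill white -opaque none: paint fully transparent pixels white
        image.Opaque(MagickColors.Transparent, MagickColors.White);

        // +matte: drop the alpha channel again (assumed mapping)
        image.Alpha(AlphaOption.Deactivate);

        // -deskew 40% -sharpen 0x1
        image.Deskew(new Percentage(40));
        image.Sharpen(0, 1);
    }
}
```

The caller would load the scan, run TextCleanerSketch.Clean on it, and write the result; the window size and offset usually need tuning per document.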

Post by mattj » 2015-01-06T13:12:36-07:00

Thanks again, making more progress.

Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.

Can you see any reason why this

Code:

// Create mask
MagickImage imgMask = imgReceipt.Clone();
imgMask.ColorSpace = ColorSpace.GRAY;
imgMask.Negate();
imgMask.AdaptiveThreshold(15, 15, 5);  // lat
imgMask.ContrastStretch(0, 0);
//imgMask.AutoLevel();

is producing different results than this?

Code:

receipt.jpg -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0

The resulting image from the .NET code has a lot of white streaks in the background, whereas the command-line version leaves a nice mask with almost all of the background black.

Post by dlemstra » 2015-01-06T13:48:55-07:00

The last parameter of AdaptiveThreshold should be 5% of the QuantumRange (Quantum.Max). I just submitted a patch to add an overload of AdaptiveThreshold that accepts a percentage.

It also seems that there is a bug in ImageMagick 7. I tried the following in IM6 and IM7 and it produces different results:

Code:

convert logo: -colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png

I will have to look into this.
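
To make the point about the last parameter concrete: passing 5 means five quantum units, not 5%. A small helper along these lines converts a percentage into an absolute bias. The QuantumBias class and its members are hypothetical names for this sketch, and the two constants are the usual maximum values for Q8 and Q16 builds; inside Magick.NET you would pass Quantum.Max instead.

```csharp
public static class QuantumBias
{
    // Maximum quantum values for 8-bit and 16-bit ImageMagick builds.
    public const double Q8Max = 255.0;
    public const double Q16Max = 65535.0;

    // Converts a percentage (for example 5 for "5%") into an absolute
    // offset on the given quantum scale.
    public static double FromPercent(double percent, double quantumMax)
        => quantumMax * percent / 100.0;
}
```

On a Q16 build, 5% works out to 3276.75, so a call such as imgMask.AdaptiveThreshold(15, 15, QuantumBias.FromPercent(5, Quantum.Max)) applies a far larger offset than the literal 5 used earlier in the thread.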

fmw42
Posts: 25664
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Post by fmw42 » 2015-01-06T13:52:26-07:00

convert logo: colorspace gray -negate -lat 15x15+5% -contrast-stretch 0 logo.png
You left off the minus before colorspace (i.e. -colorspace rather than colorspace)

Post by mattj » 2015-01-06T14:13:20-07:00

Awesome, that was what I needed. It's generating a nicely cleaned-up document with a white background now.

Post by dlemstra » 2015-01-06T14:26:24-07:00

Thanks Fred, I have narrowed it down to the following:

Code:

convert logo: -lat 15x15+5% logo.png

I will have to check the code of our AdaptiveThreshold method in IM7. @Matt: You will probably get different results in the next release of Magick.NET.

Post by dlemstra » 2015-01-06T15:24:59-07:00

The bug in AdaptiveThreshold has been found and will be fixed in the next release of Magick.NET.

Post by mattj » 2015-01-07T05:48:41-07:00

Awesome, thanks!

Also, did you see my question from yesterday about contrast stretch?
Is MagickImage.ContrastStretch(0,0) the equivalent of -contrast-stretch 0? MagickImage.ContrastStretch(0,0) seems to turn my image mostly white, while MagickImage.AutoLevel() seems to work much better.

Post by dlemstra » 2015-01-07T07:58:41-07:00

I did see it but I forgot about it :)

It looks like you will have to calculate the white point differently. I think it should be Width*Height. I will change the code in the next release of Magick.NET, which means you will have to change your code again after that release.

p.s. I have not tested this yet.
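
A toy model may help show why a white point of 0 could wash the image out while Width*Height does not. The sketch below is not the Magick.NET or ImageMagick implementation. It assumes that the underlying routine treats both arguments as pixel counts measured from the dark end of the histogram, which is one reading consistent with the Width*Height hint above. ContrastStretchModel, FindLevels and Map are invented names.

```csharp
using System;

public static class ContrastStretchModel
{
    // Finds the (black, white) cut levels of an 8-bit histogram.
    // blackCount: pixels allowed to clip to black, counted from the dark end.
    // whiteCount: ASSUMED to be counted from the dark end as well, so the
    //             total pixel count means "clip nothing at the bright end".
    public static (int Black, int White) FindLevels(long[] histogram, long blackCount, long whiteCount)
    {
        long total = 0;
        foreach (long count in histogram) total += count;

        int black = 0;
        long seen = 0;
        for (int level = 0; level < histogram.Length; level++)
        {
            seen += histogram[level];
            black = level;
            if (seen > blackCount) break;
        }

        int white = 0; // stays 0 when the scan below never passes the limit
        seen = 0;
        for (int level = histogram.Length - 1; level > 0; level--)
        {
            seen += histogram[level];
            if (seen > total - whiteCount) { white = level; break; }
        }
        return (black, white);
    }

    // Maps one 8-bit value through the cut levels.
    public static int Map(int value, int black, int white)
    {
        if (value < black) return 0;
        if (value > white) return 255;
        if (black == white) return value;
        return (int)Math.Round(255.0 * (value - black) / (white - black));
    }
}
```

For a 100-pixel histogram with 10 pixels at level 50, 80 at 128 and 10 at 200, FindLevels(h, 0, 0) yields a white level of 0, so every occupied level maps to 255, which matches the "mostly white" symptom. FindLevels(h, 0, 100) yields levels 50 and 200, a normal stretch.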

Post by fmw42 » 2015-01-07T10:51:20-07:00

Since -lat produces a binary image, -contrast-stretch at the end should do nothing. If you want to use it, it should be applied before -lat.
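
The point about binary output can be seen in a simplified one-dimensional model. This sketch is not ImageMagick's code: it ignores the way the real operator handles image edges and works on a single row of 8-bit values. It follows the documented rule that a pixel lighter than the local average plus the offset becomes white and everything else becomes black. LatModel, Threshold and Stretch are invented names.

```csharp
using System;
using System.Linq;

public static class LatModel
{
    // Local adaptive threshold over a 1-D row: a pixel becomes max when it
    // is brighter than the mean of its window plus the offset, else 0.
    public static int[] Threshold(int[] pixels, int radius, double offset, int max = 255)
    {
        var result = new int[pixels.Length];
        for (int i = 0; i < pixels.Length; i++)
        {
            int lo = Math.Max(0, i - radius);
            int hi = Math.Min(pixels.Length - 1, i + radius);
            double sum = 0;
            for (int k = lo; k <= hi; k++) sum += pixels[k];
            double mean = sum / (hi - lo + 1);
            result[i] = pixels[i] > mean + offset ? max : 0;
        }
        return result;
    }

    // Plain min-max stretch to the range 0..max.
    public static int[] Stretch(int[] pixels, int max = 255)
    {
        int lo = pixels.Min();
        int hi = pixels.Max();
        if (lo == hi) return (int[])pixels.Clone();
        return pixels.Select(p => (int)Math.Round((double)max * (p - lo) / (hi - lo))).ToArray();
    }
}
```

For the row 10, 12, 11, 200, 12, 10, 11 with radius 1 and offset 5, Threshold returns 0, 0, 0, 255, 0, 0, 0, and running Stretch on that result returns the same values, since the output already spans the full range.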