Convert a PDF to TIFF without loss of quality

Posted: 2017-07-10T03:44:23-07:00
by HariK
Hi All,

I am trying to convert a PDF file to a TIFF file without losing quality. However, the conversion does lose quality: when I OCR the resulting TIFF with Tesseract, words are misread. I am using the Magick.NET-Q16-AnyCPU DLL, version 7.0.0.0, in my C# application.

Here is my code for creating a TIFF file from PDF bytes:

public void ConvertToTIFF(byte[] bytes)
{
    ImageMagick.MagickReadSettings settings = new ImageMagick.MagickReadSettings();
    settings.Density = new ImageMagick.Density(300, 300);
    settings.UseMonochrome = true;
    settings.CompressionMethod = CompressionMethod.LZW;

    using (MagickImageCollection images = new MagickImageCollection())
    {
        images.Read(bytes, settings);
        images.Write(targetFile);
    }
}

I even tried increasing the DPI from 300 to 400 and 500, but I don't see much difference in quality. I am looking for suggestions on how to retain quality during the TIFF conversion.

Thanks in Advance,
Hari

Re: Convert a PDF to TIFF without loss of quality

Posted: 2017-07-10T04:11:08-07:00
by snibgo
What version of IM? What version of Ghostscript?
HariK wrote:... without losing its quality.
What do you mean by quality? Please show examples. You can upload to somewhere like dropbox.com and paste the URLs here.

What does "UseMonochrome" do? If it converts the image to black and white only, that is a major drop in quality, and generally makes OCR more difficult. Stretching so paper is white and letters are black, with antialias between them, is better.
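
A rough, untested sketch of that idea in Magick.NET, in case it helps. It simply leaves UseMonochrome unset so the rasterized pages keep their antialiased edges, flattens any transparency onto white, converts to grayscale, and stretches the contrast with Normalize(). The class name, method name, and targetFile parameter are made up for illustration. The CompressionMethod property is copied from the code in the first post; newer Magick.NET releases may name it Compression instead, so adjust for your version.

```csharp
using ImageMagick;

public static class PdfToTiffSketch
{
    // Hypothetical helper, not the original poster's method.
    public static void ConvertToGrayscaleTiff(byte[] pdfBytes, string targetFile)
    {
        MagickReadSettings settings = new MagickReadSettings();
        settings.Density = new Density(300, 300);            // rasterization resolution in DPI
        // UseMonochrome is deliberately left unset so antialiased edges survive.
        settings.CompressionMethod = CompressionMethod.LZW;  // lossless; name as in the first post

        using (MagickImageCollection images = new MagickImageCollection())
        {
            images.Read(pdfBytes, settings);

            foreach (var page in images)
            {
                // Flatten any transparent page background onto white paper.
                page.BackgroundColor = new MagickColor("white");
                page.Alpha(AlphaOption.Remove);

                // Keep gray levels instead of thresholding to pure black and white.
                page.ColorType = ColorType.Grayscale;

                // Stretch contrast so paper tends toward white and text toward black.
                page.Normalize();
            }

            // Writes all pages to one multi-page TIFF.
            images.Write(targetFile);
        }
    }
}
```

Whether this actually helps Tesseract depends on the source PDF, so treat it as a starting point to compare against the monochrome output rather than a guaranteed fix.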

Re: Convert a PDF to TIFF without loss of quality

Posted: 2017-07-10T09:27:21-07:00
by fmw42
It would be helpful if you provide an example PDF. You can upload to some free hosting service and put the URL here.

Re: Convert a PDF to TIFF without loss of quality

Posted: 2017-07-11T05:19:33-07:00
by HariK
Thanks, snibgo and fmw42, for your replies. I am using the Magick.NET-Q16-AnyCPU DLL, version 7.0.0.0, which I installed from NuGet. I haven't installed Ghostscript, but I am using "gsdll64.dll" and "gswin64c.exe" and pointing to them with MagickNET.SetGhostscriptDirectory(@"~/somepath").

I understand that a sample file would help you more, but sorry, I cannot upload either the PDF or the TIFF files as they are confidential.
As you suggested, I tried "Antialias = true", but that did not improve OCR accuracy. I have also tried the following methods: MagickImage.Enhance(), MagickImage.Sharpen(), MagickImage.Magnify() and MagickImage.Normalize(). They gave a slight improvement in OCR accuracy, but they also make TIFF creation take longer.

Can you suggest any ImageMagick methods that would produce a TIFF file of the same or better quality than the source PDF (basically, a precise, sharp TIFF even at a high zoom level, say 800%) and thus improve OCR accuracy?

Thanks in Advance,
Hari

Re: Convert a PDF to TIFF without loss of quality

Posted: 2017-07-11T08:08:04-07:00
by snibgo
If you could show us a crop of a single word, that might help us understand what you mean by "quality". Otherwise we can only guess. Perhaps the letters are too small. Perhaps they are smudged. Perhaps you need a higher density. Perhaps ...