Convert a PDF to TIFF without loss of quality

Questions and postings pertaining to the development of ImageMagick, feature enhancements, and ImageMagick internals. ImageMagick source code and algorithms are discussed here. Usage questions which are too arcane for the normal user list should also be posted here.
HariK
Posts: 5
Joined: 2017-06-09T00:24:58-07:00
Authentication code: 1151

Convert a PDF to TIFF without loss of quality

Post by HariK » 2017-07-10T03:44:23-07:00

Hi All,

I am trying to convert a PDF file to a TIFF file without losing quality, but I see a loss of quality: when I OCR the TIFF file using Tesseract, words are misread. I am using the Magick.NET-Q16-AnyCPU DLL, version 7.0.0.0, in my C# application.

Here is my code for creating a TIFF file from PDF bytes:

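using ImageMagick;

// Renders every page of the PDF (via Ghostscript) and writes them to one multi-page TIFF.
public void ConvertToTIFF(byte[] bytes, string targetFile)
{
    MagickReadSettings settings = new MagickReadSettings();
    settings.Density = new Density(300, 300);   // render the PDF at 300 dpi
    settings.UseMonochrome = true;
    settings.CompressionMethod = CompressionMethod.LZW;

    using (MagickImageCollection images = new MagickImageCollection())
    {
        images.Read(bytes, settings);   // one image per PDF page
        images.Write(targetFile);       // output format is inferred from the extension, e.g. ".tif"
    }
}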
I even tried increasing the DPI from 300 to 400 or 500, but I don't see much difference in quality. I'm looking for input on how to retain quality during the TIFF conversion.
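The density numbers can be put in perspective with a little arithmetic. The sketch below is a minimal illustration, assuming a US Letter page (8.5 x 11 inches); DensityMath and its methods are hypothetical helpers, not part of Magick.NET. It shows that the pixel count grows with the square of the density, so every later operation has more pixels to process.

```csharp
using System;

public static class DensityMath
{
    // Pixel dimensions of a page rendered at a given density (dots per inch).
    public static (int Width, int Height) PagePixels(double widthIn, double heightIn, double dpi)
    {
        return ((int)Math.Round(widthIn * dpi), (int)Math.Round(heightIn * dpi));
    }

    // Total pixel count; it grows with the square of the density.
    public static long PixelCount(double widthIn, double heightIn, double dpi)
    {
        var (w, h) = PagePixels(widthIn, heightIn, dpi);
        return (long)w * h;
    }

    // Short human-readable summary, e.g. for logging.
    public static string Describe(double widthIn, double heightIn, double dpi)
    {
        var (w, h) = PagePixels(widthIn, heightIn, dpi);
        return $"{dpi} dpi: {w} x {h} pixels";
    }
}
```

For a Letter page this gives 2550 x 3300 pixels at 300 dpi and 4250 x 5500 at 500 dpi, roughly 2.8 times as many pixels. One possible reason a higher density changes little (an assumption, since the PDF could not be shared) is that the PDF contains scanned page images: rendering at a higher density then only resamples the existing pixels rather than adding detail.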

Thanks in Advance,
Hari

snibgo
Posts: 9399
Joined: 2010-01-23T23:01:33-07:00
Authentication code: 1151
Location: England, UK

Re: Convert a PDF to TIFF without loss of quality

Post by snibgo » 2017-07-10T04:11:08-07:00

What version of IM? What version of Ghostscript?
HariK wrote: "... without losing its quality."
What do you mean by quality? Please show examples. You can upload them to somewhere like dropbox.com and paste the URLs here.

What does "UseMonochrome" do? If it converts the image to black and white only, that is a major drop in quality, and it generally makes OCR more difficult. Stretching the levels so the paper is white and the letters are black, with antialiased gray levels between them, is better.
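To illustrate the difference between reducing to black and white and stretching the levels, here is a minimal C# sketch operating on a row of 8-bit gray values. It assumes a simple min/max linear stretch and a fixed threshold; neither is claimed to be what ImageMagick or UseMonochrome does internally, and GrayLevels is a hypothetical helper.

```csharp
using System;
using System.Linq;

public static class GrayLevels
{
    // Linear contrast stretch: map the darkest input value to 0 (ink) and the
    // lightest to 255 (paper). Intermediate gray levels, such as antialiased
    // glyph edges, keep their relative positions between the two.
    public static byte[] Stretch(byte[] gray)
    {
        if (gray.Length == 0) return new byte[0];
        byte min = gray.Min();
        byte max = gray.Max();
        if (max == min) return (byte[])gray.Clone(); // flat input: nothing to stretch

        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
        {
            double t = (gray[i] - min) / (double)(max - min);
            result[i] = (byte)Math.Round(t * 255.0);
        }
        return result;
    }

    // Hard threshold to pure black/white: every edge pixel becomes 0 or 255,
    // so the partial-coverage information carried by gray levels is lost.
    public static byte[] Threshold(byte[] gray, byte cutoff)
    {
        return gray.Select(v => v < cutoff ? (byte)0 : (byte)255).ToArray();
    }
}
```

For an antialiased edge with values 60, 120, 180, 200, Stretch returns 0, 109, 219, 255, keeping the intermediate levels, while Threshold with a cutoff of 128 returns 0, 0, 255, 255.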
snibgo's IM pages: im.snibgo.com

fmw42
Posts: 22101
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Convert a PDF to TIFF without loss of quality

Post by fmw42 » 2017-07-10T09:27:21-07:00

It would be helpful if you could provide an example PDF. You can upload it to a free hosting service and post the URL here.

Re: Convert a PDF to TIFF without loss of quality

Post by HariK » 2017-07-11T05:19:33-07:00

Thanks snibgo and fmw42 for your replies. I am using the Magick.NET-Q16-AnyCPU DLL, version 7.0.0.0, which I installed from NuGet. I haven't installed GS, but I am using the files "gsdll64.dll" and "gswin64c.exe" and referencing them by means of MagickNET.SetGhostscriptDirectory(@"~/somepath").

I understand that a sample file would help, but sorry, I cannot upload either the PDF or the TIFF files, as they are confidential.
Following your comment, I tried setting "Antialias = true", but that did not improve OCR accuracy. I have also tried the following methods: MagickImage.Enhance(), MagickImage.Sharpen(), MagickImage.Magnify() and MagickImage.Normalize(). They gave a slight improvement in OCR accuracy, but TIFF creation takes noticeably longer with them.

Can you suggest any ImageMagick methods that would help generate a TIFF file of the same or better quality than the source PDF (basically, a precise and sharp TIFF even at a high zoom level, say 800%), and thus improve OCR accuracy?

Thanks in Advance,
Hari

Re: Convert a PDF to TIFF without loss of quality

Post by snibgo » 2017-07-11T08:08:04-07:00

If you could show us a crop of a single word, that might help us understand what you mean by "quality". Otherwise we can only guess. Perhaps the letters are too small. Perhaps they are smudged. Perhaps you need a higher density. Perhaps ...
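One way to check whether the letters are too small is to compute how many pixels a glyph spans at a given density. This is a minimal sketch using the fact that a PostScript point is 1/72 inch; GlyphSize is a hypothetical helper, and any target pixel height you choose is your own assumption rather than a documented Tesseract requirement. Lowercase letters occupy only part of the em (the x-height depends on the font), so the em height overstates the size of most letters.

```csharp
using System;

public static class GlyphSize
{
    // 1 point = 1/72 inch, so a font's em (its nominal point size) spans
    // pointSize / 72 inches, i.e. pointSize * dpi / 72 pixels.
    public static double EmHeightPixels(double pointSize, double dpi)
    {
        if (pointSize <= 0 || dpi <= 0) throw new ArgumentOutOfRangeException();
        return pointSize * dpi / 72.0;
    }

    // Density needed for a font of the given point size to span
    // targetPixels per em (the inverse of EmHeightPixels).
    public static double DensityForEmHeight(double pointSize, double targetPixels)
    {
        if (pointSize <= 0 || targetPixels <= 0) throw new ArgumentOutOfRangeException();
        return targetPixels * 72.0 / pointSize;
    }
}
```

For example, 12 pt text at 300 dpi spans 50 pixels per em, while 6 pt text spans only 25; rendering the 6 pt text at 600 dpi brings it back to 50.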