Unable to process a poor quality image

sekhar.hari
Posts: 4
Joined: 2017-09-21T20:46:13-07:00

Post by sekhar.hari » 2017-09-21T21:09:30-07:00

Hello there -

I started using ImageMagick recently, so I am not yet fully familiar with it. I have been trying to process the attached images for the past few days with a number of IM options. However, none of them gives a satisfactory result, and the downstream OCR (tesseract) completely fails to extract the text from the converted document. I converted the source JPG document into a TIF file with the following options:

-auto-level -contrast -contrast -contrast -compress none -density 300 -depth 8 -colorspace gray -negate -strip -background white -alpha off -sharpen 0x1.0 -modulate 100,110,100 -threshold 50% -morphology close diamond

The resulting image has a higher resolution and is human-readable, but it fails during OCR.

If you can offer suggestions regarding the IM options and values, I would be most grateful.

Images: http://52.178.205.206/VIGIL-TMF/

Many thanks,
Sekhar H.

fmw42
Posts: 22079
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Post by fmw42 » 2017-09-21T21:25:35-07:00

The resolution of your images makes the smaller fonts too small for good OCR. You would need to re-scan the documents at a higher density/resolution.

Post by sekhar.hari » 2017-09-21T22:03:40-07:00

Thanks for the quick reply. Is there a way to increase the font size using IM while the resolution is increased through -density 300 (or maybe -density 400)?

Cheers,
Sekhar H.

Post by fmw42 » 2017-09-21T22:16:10-07:00

If you had scanned it as a PDF, you could increase the density, but not when it is scanned as a raster image.

ozbigben
Posts: 24
Joined: 2012-03-25T02:15:27-07:00

Post by ozbigben » 2017-09-21T22:48:29-07:00

You would need to scan it at twice the resolution (at least). Other OCR programs will recognise text in the images, but the low resolution will reduce accuracy. Most recordkeeping standards require a minimum of 200 dpi (~2400px for an A4 page), and most OCR engines work best with 400 dpi images. You can't restore fidelity to the shape of characters by upsampling.
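
The resolution figures above can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original post: the helper names pixels_needed and effective_dpi are made up for illustration, and the 1100 px scan height is a hypothetical value, not a measurement of the images in this thread. It assumes a full-page A4 scan (210 x 297 mm). Note that the exact long side at 200 dpi comes out at 2339 px, so the ~2400px quoted above is a round figure.

```python
# Sketch: pixel dimensions a page needs at a given scan resolution,
# and the effective resolution of an existing full-page scan.
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # ISO 216 A4 width and height in millimetres


def pixels_needed(dpi, page_mm=A4_MM):
    """Return (width_px, height_px) for a page scanned at `dpi`."""
    return tuple(round(side / MM_PER_INCH * dpi) for side in page_mm)


def effective_dpi(height_px, page_height_mm=A4_MM[1]):
    """Estimate the scan resolution from the pixel height of a full-page scan."""
    return height_px / (page_height_mm / MM_PER_INCH)


if __name__ == "__main__":
    for dpi in (200, 300, 400):
        # 200 dpi gives (1654, 2339); 400 dpi gives (3307, 4677)
        print(dpi, pixels_needed(dpi))
    # A hypothetical 1100 px tall A4 scan works out to roughly 94 dpi.
    print(round(effective_dpi(1100)))
```

Comparing the pixel height of an existing scan against these numbers gives a quick indication of whether a rescan is needed before any preprocessing is attempted.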

Post by sekhar.hari » 2017-09-21T22:49:48-07:00

If I convert the images to PDF using IM, would it be possible to increase the density?

Thanks,
Sekhar H.

GeeMack
Posts: 469
Joined: 2015-12-01T22:09:46-07:00
Location: Central Illinois, USA

Post by GeeMack » 2017-09-22T05:16:12-07:00

sekhar.hari wrote (2017-09-21T22:49:48-07:00):
"If I convert the images to PDF using IM, would it be possible to increase the density?"

In a word, no. As fmw42 already mentioned, you can't reliably repair the quality of text in a scanned image if its resolution is too low to start with.
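
Why detail lost at scan time cannot be recovered can be illustrated with a toy example. This is a simplified Python sketch, not how ImageMagick or any scanner actually resamples: the pixel rows are invented for illustration, downsampling is modelled as plain averaging, and upsampling as pixel repetition. Two different rows of pixels collapse to the same low-resolution row, so no later step can tell which one was on the page.

```python
# Toy 1-D illustration: two different "strokes" become identical once
# downsampled, so no upsampling step can tell them apart afterwards.


def downsample(row, factor=2):
    """Average each group of `factor` neighbouring pixels into one."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


def upsample(row, factor=2):
    """Nearest-neighbour upsampling: repeat each pixel `factor` times."""
    return [value for value in row for _ in range(factor)]


if __name__ == "__main__":
    # Hypothetical pixel rows (0 = black ink, 1 = white paper).
    stroke_a = [0, 1, 0, 1, 1, 0, 1, 0]
    stroke_b = [1, 0, 1, 0, 0, 1, 0, 1]

    low_a = downsample(stroke_a)
    low_b = downsample(stroke_b)
    print(low_a == low_b)               # True: the low-resolution rows match
    print(upsample(low_a) == stroke_a)  # False: the detail does not come back
```

Smarter interpolation filters give smoother results than pixel repetition, but they face the same limit: they can only guess at detail that the low-resolution scan never recorded.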
