
Unable to process a poor quality image

Posted: 2017-09-21T21:09:30-07:00
by sekhar.hari
Hello there -

I started using ImageMagick only recently, so I am not yet fully familiar with it. I have been trying to process the attached images for the past few days with a number of IM options. However, none of them gives a satisfactory result, and the downstream OCR (tesseract) completely fails to extract the text from the converted document. I converted the source JPG document into a TIF file with the following options:

-auto-level -contrast -contrast -contrast -compress none -density 300 -depth 8 -colorspace gray -negate -strip -background white -alpha off -sharpen 0x1.0 -modulate 100,110,100 -threshold 50% -morphology close diamond

The resulting image's resolution is increased and it is human-readable, but it still fails during OCR.

If you can offer suggestions regarding the IM options and their values, I would be most grateful.

Images: http://52.178.205.206/VIGIL-TMF/

Many thanks,
Sekhar H.

Re: Unable to process a poor quality image

Posted: 2017-09-21T21:25:35-07:00
by fmw42
The resolution of your images makes the smaller fonts too small for good OCR. You would need to re-scan the documents at a higher density/resolution.

Re: Unable to process a poor quality image

Posted: 2017-09-21T22:03:40-07:00
by sekhar.hari
Thanks for the quick reply. Is there a way to increase the font size using IM while the resolution is increased through -density 300 (or maybe -density 400)?

Cheers,
Sekhar H.

Re: Unable to process a poor quality image

Posted: 2017-09-21T22:16:10-07:00
by fmw42
If you had scanned it as PDF, you could increase the density. But not when it is scanned as a raster image.

Re: Unable to process a poor quality image

Posted: 2017-09-21T22:48:29-07:00
by ozbigben
You would need to scan it at twice the resolution (at least). Other OCR programs will recognise text in the images but the low resolution will reduce accuracy. Most recordkeeping standards require at least 200dpi as a minimum (~2400px for an A4 page) with most OCR engines operating best with 400dpi images. You can't restore fidelity to the shape of characters by upsampling.
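
To make the dpi figures above concrete, here is a minimal Python sketch of the arithmetic. It assumes an A4 page of 210 x 297 mm; the function name `pixels_for_page` is made up for illustration and is not part of ImageMagick or any OCR tool.

```python
# Assumption: an A4 page measures 210 mm x 297 mm.
A4_MM = (210.0, 297.0)
MM_PER_INCH = 25.4


def pixels_for_page(dpi, page_mm=A4_MM):
    """Return (width_px, height_px) needed to scan a page at the given dpi."""
    return tuple(round(side_mm / MM_PER_INCH * dpi) for side_mm in page_mm)


if __name__ == "__main__":
    # 200 dpi is the record-keeping minimum and 400 dpi the OCR
    # sweet spot mentioned in the post above.
    for dpi in (200, 400):
        width_px, height_px = pixels_for_page(dpi)
        print(f"{dpi} dpi -> {width_px} x {height_px} px")
```

At 200 dpi the long side of an A4 page comes out at 2339 px, which matches the "~2400px" figure quoted above. A scan with far fewer pixels than that simply does not contain the detail an OCR engine needs.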

Re: Unable to process a poor quality image

Posted: 2017-09-21T22:49:48-07:00
by sekhar.hari
If I convert the images to PDF using IM, would it be possible to increase the density?

Thanks,
Sekhar H.

Re: Unable to process a poor quality image

Posted: 2017-09-22T05:16:12-07:00
by GeeMack
sekhar.hari wrote: 2017-09-21T22:49:48-07:00
If I convert the images to PDF using IM, would it be possible to increase the density?
In a word, no. As fmw42 already mentioned, you can't reliably repair the quality of text in a scanned image if it's already too low of a resolution to start with.
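
A rough Python sketch of why relabelling the density does not help: in a raster image the density is only a metadata tag, while the detail available to OCR depends on how many pixels cover the physical page. The `Raster` class, the helper names, and the 827 x 1169 px example scan are all hypothetical, chosen only to illustrate the point; they are not measurements of the posted images.

```python
from dataclasses import dataclass

# Assumption: the original paper was A4, i.e. 297 mm tall.
A4_HEIGHT_IN = 297.0 / 25.4


@dataclass(frozen=True)
class Raster:
    """Toy stand-in for a scanned image (hypothetical, for illustration)."""
    width_px: int
    height_px: int
    density_dpi: int  # metadata tag only; it does not describe pixel content


def retag_density(img, new_dpi):
    """Change only the density tag, leaving the pixel grid untouched."""
    return Raster(img.width_px, img.height_px, new_dpi)


def effective_dpi(img, page_height_in):
    """Pixels actually captured per inch of the original paper."""
    return img.height_px / page_height_in


if __name__ == "__main__":
    scan = Raster(width_px=827, height_px=1169, density_dpi=72)
    retagged = retag_density(scan, 300)
    # Both lines report the same effective resolution.
    for img in (scan, retagged):
        print(img.density_dpi, round(effective_dpi(img, A4_HEIGHT_IN), 1))
```

Upsampling would raise the pixel count, but the extra pixels are interpolated from the existing ones, so the character shapes are no better defined than before.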