PAID: Need to enhance text within Image/PDF

Do you need consulting from ImageMagick experts and are willing to pay for their expertise? Or are you well versed in ImageMagick and offer paid consulting? If so, post here otherwise post elsewhere for free assistance.
osutra
Posts: 2
Joined: 2015-08-25T11:17:30-07:00
Authentication code: 1151

Post by osutra » 2015-08-25T11:54:52-07:00

Hi

We are looking for an experienced ImageMagick (IM) developer who can help enhance the text within TIFF/PDF files to improve OCR success rates.

The text quality within the files we use isn't very good and some enhancements to it "might" help improve the OCR output quality.

Samples attached. Please PM me if interested and we can discuss this further.

100% Size : http://www.tiikoni.com/tis/view/?id=fd41a35
150% size: http://www.tiikoni.com/tis/view/?id=8211a93

This will be a paid/compensated effort.

fmw42
Posts: 22083
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: PAID: Need to enhance text within Image/PDF

Post by fmw42 » 2015-08-25T12:07:49-07:00

You should not have any trouble with OCR on these two files. They are pretty good. The only thing you might need is to deskew them so that the lines of text are horizontal. See -deskew at http://www.imagemagick.org/script/comma ... php#deskew
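As a rough illustration of the -deskew suggestion, here is a minimal Python sketch that builds the ImageMagick command line. The file names and the helper name deskew_command are hypothetical, the 40% threshold is only a commonly used starting value, and "convert" assumes ImageMagick 6 (ImageMagick 7 uses "magick").

```python
import os
import shutil
import subprocess


def deskew_command(src, dst, threshold="40%"):
    """Build an ImageMagick argv that straightens skewed lines of text."""
    # "convert" is the ImageMagick 6 entry point (assumption); use "magick" on v7.
    return ["convert", src, "-deskew", threshold, dst]


# Hypothetical file names, for illustration only.
cmd = deskew_command("scan.tif", "scan_deskewed.tif")
print(" ".join(cmd))

# Run the command only if ImageMagick is installed and the input file exists.
if shutil.which(cmd[0]) and os.path.exists("scan.tif"):
    subprocess.run(cmd, check=True)
```

The threshold may need tuning per document; comparing OCR output before and after is the practical check.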

If you have images that have non-white backgrounds, and you are using Unix (Linux, Mac OS X or Windows with Cygwin), then you might try my script, textcleaner, at the link below.

Post by osutra » 2015-08-25T13:31:17-07:00

Hi Fred,

We have done the best we can and are using Google's Tesseract. Most words show up fine in the extraction, but quite a few get garbled.

The image I posted was a crop of the entire PDF file. Here is the OCR output for the same crop: http://www.tiikoni.com/tis/view/?id=6a4e21d

Words like "-rash", at the end of the 5th line from the bottom, show up as "-ra5h1". There is the same issue with "-cce", and with "+BS", which became "+35".

Any thoughts on how we can get this fixed?

Would you like me to PM the actual PDF?

Thanks a ton!

Post by fmw42 » 2015-08-25T16:06:59-07:00

Sorry, I am not an OCR expert, nor have I done much OCR at all. I do not know what to suggest at this point beyond what I have already suggested.

Have you tried -deskew to see if better horizontal alignment helps? You might also try some sharpening using -unsharp.
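For the sharpening suggestion, a minimal sketch along these lines might be a starting point. The helper name sharpen_command and the file names are hypothetical, "0x1" (radius chosen automatically, sigma 1) is just an assumed starting value rather than a recommendation from this thread, and "convert" again assumes ImageMagick 6.

```python
import os
import shutil
import subprocess


def sharpen_command(src, dst, unsharp="0x1"):
    """Build an ImageMagick argv that applies an unsharp mask to an image."""
    # -unsharp takes radius x sigma (plus optional gain and threshold).
    return ["convert", src, "-unsharp", unsharp, dst]


# Hypothetical file names, for illustration only.
cmd = sharpen_command("scan.tif", "scan_sharp.tif")
print(" ".join(cmd))

# Run the command only if ImageMagick is installed and the input file exists.
if shutil.which(cmd[0]) and os.path.exists("scan.tif"):
    subprocess.run(cmd, check=True)
```

Over-sharpening can add halos around glyphs, so it is worth trying a few sigma values and checking the OCR result for each.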

If the PDF is pure vector and not a raster image inside a PDF container, you could try giving it more density when rasterizing, which makes the text larger.
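A sketch of the density idea, assuming a vector PDF: -density is given before the input file so that it applies while the PDF is rendered. The helper name rasterize_pdf_command, the file names and the 300 dpi value are assumptions for illustration, and ImageMagick needs Ghostscript installed to read PDFs.

```python
import os
import shutil
import subprocess


def rasterize_pdf_command(pdf, dst, dpi=300):
    """Build an ImageMagick argv that renders a vector PDF at a given density."""
    # -density must come before the input so it affects rendering, not the output tag.
    return ["convert", "-density", str(dpi), pdf, dst]


# Hypothetical file names, for illustration only.
cmd = rasterize_pdf_command("document.pdf", "document.tif")
print(" ".join(cmd))

# Run the command only if ImageMagick is installed and the input file exists.
if shutil.which(cmd[0]) and os.path.exists("document.pdf"):
    subprocess.run(cmd, check=True)
```

If the PDF only wraps a scanned raster image, a higher density just resamples the existing pixels and is unlikely to help.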
