
Extracting wrong characters from image

Posted: 2016-03-14T00:58:15-07:00
by seopower
Hi,

I have a simple image (attached). When I try to extract its text with OCR, I get wrong characters, which result in misspelled words. Please suggest what I can do to extract the text correctly.


I'm using Tesseract on Ubuntu, called from PHP like this:

exec('tesseract temp/' . $filename . '.png temp/' . $filename);


Thanks,

Re: Extracting wrong characters from image

Posted: 2016-03-14T03:14:15-07:00
by snibgo
In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
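
A rough way to act on this advice is to work out how much to enlarge the image from the measured text height, then upscale it before running Tesseract. The sketch below assumes ImageMagick's `convert` command is installed and that you have measured the glyph height yourself. The helper names `upscalePercent` and `buildUpscaleCommand`, and the 20 px default target, are illustrative choices, not part of Tesseract or ImageMagick.

```php
<?php
// Sketch only: upscalePercent() and buildUpscaleCommand() are hypothetical
// helpers. The 20 px default target follows the "20 is better" rule of thumb.

// Percentage by which to enlarge an image so that text measured at
// $glyphHeightPx pixels ends up at least $targetHeightPx pixels tall.
function upscalePercent(int $glyphHeightPx, int $targetHeightPx = 20): int
{
    if ($glyphHeightPx <= 0) {
        throw new InvalidArgumentException('glyph height must be positive');
    }
    // Round up so the enlarged text never falls short of the target.
    return (int) ceil(100 * $targetHeightPx / $glyphHeightPx);
}

// Build an ImageMagick command that writes an enlarged copy of $in to $out.
// File names are shell-escaped because the string is meant for exec().
function buildUpscaleCommand(string $in, string $out, int $percent): string
{
    return 'convert ' . escapeshellarg($in)
        . ' -resize ' . $percent . '%'
        . ' ' . escapeshellarg($out);
}
```

For 9 px text, `upscalePercent(9)` gives 223, so the image would be enlarged to 223%. The returned command string can be passed to `exec()`, and Tesseract then pointed at the enlarged file instead of the original.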

Re: Extracting wrong characters from image

Posted: 2016-03-14T09:30:18-07:00
by markt
Try adding an extra border around the extracted text image. I seemed to get better recognition from Tesseract with an additional 20x20 white border.
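
One way to add such a border before OCR is with ImageMagick's `-bordercolor` and `-border` options; `-border 20x20` adds 20 px to the left and right and 20 px to the top and bottom. The sketch below assumes ImageMagick's `convert` is available; `buildBorderCommand` is a hypothetical helper name, not an existing API.

```php
<?php
// Sketch only: buildBorderCommand() is a hypothetical helper.
// -bordercolor must come before -border so the frame is drawn in that colour.
function buildBorderCommand(string $in, string $out, int $width = 20, int $height = 20): string
{
    return 'convert ' . escapeshellarg($in)
        . ' -bordercolor white'
        . ' -border ' . $width . 'x' . $height   // e.g. "20x20"
        . ' ' . escapeshellarg($out);
}
```

The resulting string can be run with `exec()` before the Tesseract call, with Tesseract reading the bordered file.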

Re: Extracting wrong characters from image

Posted: 2016-03-14T09:51:17-07:00
by seopower
snibgo wrote: "In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high."
Thanks. I doubled the size of the image and now the text is recognised correctly, but the space between two of the words is still missing.