Detect Text vs Picture for OCR

Posted: 2016-02-23T02:22:25-07:00
by gvandyk
I have an image that contains both text and pictures.

I need to clean the image for the OCR process and am using Fred's textcleaner script. Unfortunately, because this script cleans the image to prepare it for OCR, I lose the pictures inside the image.

What is the best approach to run the textcleaner script only on images that contain just text, and to skip it for images that contain both pictures and text?

Or, what do you suggest doing with images like this one, which contain both text and pictures, when preparing them for OCR?

Example:

https://www.dropbox.com/s/0vqr1oo8jqt9dw2/page.jpg?dl=0