Help with OCR preprocessing

A plethora of command-line scripts that perform geometric transforms, blurs, sharpens, edging, noise removal, and color manipulations.
daddym85
Posts: 1
Joined: 2016-01-03T09:42:03-07:00


Post by daddym85 » 2016-01-03T09:49:50-07:00

Hi there,
I am a newbie. I would like to extract text from some images captured by a webcam (1280x720 resolution). An example (in Italian) is here:
https://drive.google.com/file/d/0B-X1ZT ... sp=sharing
I need to preprocess the images before running OCR with tesseract. I plan to use the textcleaner script, but I am unsure about its parameters and options.
Any idea?
Thanks

fmw42
Posts: 22103
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Help with OCR preprocessing

Post by fmw42 » 2016-01-03T12:29:35-07:00

My scripts can only be run on Unix systems. What is your IM version and platform? See viewtopic.php?f=1&t=9620

What are your questions about the script usage? It should be self-explanatory from the documentation and examples at http://www.fmwconcepts.com/imagemagick/ ... /index.php.

The main arguments are -f and -o.

"-f filtersize ... FILTERSIZE is the size of the filter used to clean up the background. Values are integers>0. The filtersize needs to be larger than the thickness of the writing, but the smaller the better beyond this. Making it larger will increase the processing time and may lose text. The default is 15."

"-o offset ... OFFSET is the offset threshold in percent used by the filter to eliminate noise. Values are integers>=0. Values too small will leave much noise and artifacts in the result. Values too large will remove too much text leaving gaps. The default is 5."
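
To get a feel for how the two parameters interact, here is a rough sketch in Python. It is not textcleaner's code: it assumes the background-cleaning filter behaves like a local-mean adaptive threshold (similar in spirit to ImageMagick's -lat operator), where a pixel counts as text when it is darker than the mean of its filtersize x filtersize neighbourhood by more than offset percent of full scale. The function name local_threshold is hypothetical.

```python
def local_threshold(img, filtersize, offset_pct, maxval=255):
    """Return a binary image: 0 for text (dark), maxval for background.

    img is a list of rows of grayscale values in 0..maxval.
    A pixel is text when it is darker than the mean of its
    filtersize x filtersize neighbourhood by more than offset_pct
    percent of maxval. Windows are clipped at the image borders.
    """
    h, w = len(img), len(img[0])
    r = filtersize // 2
    offset = offset_pct / 100.0 * maxval
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Clip the neighbourhood window to the image.
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sum(img[j][i] for j in range(y0, y1) for i in range(x0, x1))
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean - offset else maxval)
        out.append(row)
    return out


# A light background (200) with a one-pixel-wide dark stroke (50) in column 3.
img = [[200, 200, 200, 50, 200, 200, 200] for _ in range(5)]
for row in local_threshold(img, filtersize=3, offset_pct=5):
    print(row)
```

In this toy model the stroke survives with -f 3 -o 5, disappears when the offset is too large (-o 50), and also disappears when the filter is no wider than the stroke (-f 1), which mirrors the advice in the two option descriptions.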

The best approach is to start with only a few arguments and experiment to find the best -f and -o values. Then add other arguments as needed.

Try this to start:

Code:

textcleaner -f 25 -o 5 s3.jpg result.png
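
One way to test several -f and -o values at once is a small sweep script. The sketch below is a hypothetical helper, not part of textcleaner: it assumes textcleaner is on your PATH (use "./textcleaner" if it sits in the current directory), the grid values are arbitrary starting points, and the optional tesseract step assumes the Italian language data ("ita") is installed.

```python
import os
import shutil
import subprocess
from itertools import product


def sweep_commands(infile, filtersizes, offsets):
    """Build one textcleaner command per (-f, -o) pair.

    Output files are named after the parameters (e.g. result_f25_o5.png)
    so the results can be compared side by side.
    """
    cmds = []
    for f, o in product(filtersizes, offsets):
        outfile = f"result_f{f}_o{o}.png"
        cmds.append(["textcleaner", "-f", str(f), "-o", str(o), infile, outfile])
    return cmds


def run_sweep(infile, filtersizes=(15, 25, 35), offsets=(5, 10, 15), lang="ita"):
    """Run every command; if tesseract is available, OCR each result too."""
    if shutil.which("textcleaner") is None:
        raise SystemExit("textcleaner not found on PATH")
    have_tesseract = shutil.which("tesseract") is not None
    for cmd in sweep_commands(infile, filtersizes, offsets):
        subprocess.run(cmd, check=True)
        if have_tesseract:
            outfile = cmd[-1]
            outbase = os.path.splitext(outfile)[0]
            # tesseract writes <outbase>.txt next to the cleaned image.
            subprocess.run(["tesseract", outfile, outbase, "-l", lang], check=True)
```

For example, run_sweep("s3.jpg") produces nine cleaned images (and their .txt files if tesseract is installed), so you can pick the pair that gives the most readable OCR output.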

If you scan the whole page showing the page borders, you could correct the page's perspective first. See -distort perspective or my script unperspective.
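
For the -distort route, the control points are given as source,destination pairs, one per corner. The sketch below builds such a command; the function name perspective_command and the corner coordinates are hypothetical (you would read the four page corners off the image by hand), and it assumes IM 6 "convert" (IM 7 uses "magick" instead).

```python
def perspective_command(infile, outfile, corners, width, height):
    """Build a convert command that maps four page corners onto a
    width x height rectangle using -distort Perspective.

    corners: (top-left, top-right, bottom-right, bottom-left), each an
    (x, y) pixel position in the source image.
    """
    targets = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    # Each control point is written as "sx,sy dx,dy".
    pairs = " ".join(f"{sx},{sy} {dx},{dy}" for (sx, sy), (dx, dy) in zip(corners, targets))
    return [
        "convert", infile,
        "-distort", "Perspective", pairs,
        # Keep only the rectified page area.
        "-crop", f"{width}x{height}+0+0", "+repage",
        outfile,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), and the flattened image then fed to textcleaner.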
