OCR and specific regions

galv
Posts: 62
Joined: 2010-05-23T17:35:59-07:00
Authentication code: 8675308

OCR and specific regions

Post by galv »

I'm using Linux and Mac OS X, and I want to use a free OCR program to locate a specific word in a screenshot, then apply an IM filter to just that area. I know about "-draw", but how would I get the exact position and dimensions of the word's rectangle from the OCR program?
I've heard of gocr, ocrad, and tesseract but have never used them. Does anyone have a solution or ideas?
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Authentication code: 1152
Location: Sunnyvale, California, USA

Re: OCR and specific regions

Post by fmw42 »

You should probably post to an OCR list for information about OCR box sizes.
galv

Re: OCR and specific regions

Post by galv »

Which free OCR software do you guys recommend?
fmw42

Re: OCR and specific regions

Post by fmw42 »

I am on a Mac and have used ReadIrisPro, but I don't recall that it was free.
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Authentication code: 8675308
Location: Brisbane, Australia

Re: OCR and specific regions

Post by anthony »

I experimented with Gocr on some screenshots and found it did not work very well, even though the captured text was perfect and free of noise.

The problem, I figured out, was that the OCR was optimized for scanned documents at 300 to 600 dpi rather than perfect screen captures at 90 to 120 dpi. When I scaled or resized the image to a higher resolution, I had more success.
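
The resizing step can be sketched as a small calculation. This is only an illustration, assuming a 100 dpi screenshot and a 300 dpi target. The function names `upscale_percent` and `upscale_command` are made up for this example, and the best scale factor will depend on the OCR program and the font.

```python
def upscale_percent(screen_dpi, target_dpi):
    """Return the percentage to give ImageMagick's -resize so that text
    captured at screen_dpi ends up at roughly target_dpi."""
    if screen_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    return round(100 * target_dpi / screen_dpi)


def upscale_command(src, dst, screen_dpi=100, target_dpi=300):
    """Build (but do not run) an ImageMagick command line that enlarges
    a screenshot before it is handed to the OCR program."""
    percent = upscale_percent(screen_dpi, target_dpi)
    return ["convert", src, "-resize", f"{percent}%", dst]


if __name__ == "__main__":
    # The file names are placeholders for this example.
    print(" ".join(upscale_command("shot.png", "shot_big.png")))
```

The list returned by `upscale_command` can be passed to `subprocess.run` if ImageMagick is installed. Keep the scale factor around, since any coordinates reported by the OCR program will refer to the enlarged image.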

I really miss the old days on my Commodore 64 and Amiga, which had software that could look at a boxed region of on-screen text and tell you exactly what the text was, for copy and paste. But then, that software knew exactly what font was being used and could match up the symbols perfectly.

Perhaps if you know the font being used, you could DIY a solution by doing a morphology matching operation on the boxed text. That is, segment the box into letters to find the 'grid' being used, and then match up the letter in each box. A screen-resolution text capture would work well for this approach.
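
A minimal sketch of the grid idea, under strong assumptions: the font is monospaced, the cell size is already known, and the capture is a clean two-level bitmap. It uses an exact cell-by-cell comparison instead of a real morphology operation, and the 3x5 `FONT` table is a hypothetical stand-in for glyphs rendered from the actual screen font.

```python
# Hypothetical 3x5 bitmap font ('#' = ink, '.' = background).
# A real table would be rendered from the known screen font.
FONT = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "M": ("#.#", "###", "#.#", "#.#", "#.#"),
    " ": ("...", "...", "...", "...", "..."),
}
CELL_W, CELL_H = 3, 5


def render(text):
    """Draw text with FONT, giving a bitmap as a list of row strings."""
    return ["".join(FONT[ch][y] for ch in text) for y in range(CELL_H)]


def read_grid(bitmap):
    """Cut the bitmap into CELL_W x CELL_H cells and look each cell up
    in FONT. Cells that match no glyph are reported as '?'."""
    lookup = {glyph: ch for ch, glyph in FONT.items()}
    lines = []
    for top in range(0, len(bitmap), CELL_H):
        rows = bitmap[top:top + CELL_H]
        text = ""
        for left in range(0, len(rows[0]), CELL_W):
            cell = tuple(row[left:left + CELL_W] for row in rows)
            text += lookup.get(cell, "?")
        lines.append(text)
    return lines


if __name__ == "__main__":
    print(read_grid(render("IM I")))
```

Exact matching only works when the capture is pixel-perfect. Anti-aliased text would need a tolerant comparison, such as counting differing pixels per cell and taking the closest glyph.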


And please let us know what you find out. People are interested, but few report back their findings.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
Wolfgang Woehl
Posts: 34
Joined: 2010-02-25T15:22:50-07:00
Authentication code: 8675308

Re: OCR and specific regions

Post by Wolfgang Woehl »

tesseract is quite ok. Here's what it outputs from a screenshot of this thread:

$ tesseract
tesseract:Error:Usage:tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]
$ tesseract english.tif text -l eng
$ cat text.txt
---
OCR and specific regions
POSTFIEPLY Ié io Search this topic". Search
OCR and specific regions l?°°”°'E I
Iby gaw » 2010-05-Z4T01:43:59+00:00
l'm using Linux and Mac OS X and I want to use a free OCR to identify a specific word from a screenshot. I want to apply an IM filter at
that specific area. I know about "-draw" but how would I get the exact dimensions of the rectangle from the OCR program?
I've heard about gocr, ocrad, tesseract but never used them. Anyone has a solution or ideas?
Re: OCR and specific regions “¤¤¤TE I
I by rmwaz » 2010-05-Z4T03:35:28+00:0O
probably should post to an OCR list for information about OCR box size
Re: OCR and specific regions .
Iby gaw » 2010-05-Z4T04:0Z:11+00:00
Which free OCR software do you guys recommend?
---

Note that it does no layout analysis, etc.
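
For the original question of getting a word's rectangle, one possible route is word-level box output. The sketch below assumes a newer Tesseract than the one shown above, one that can write TSV output (for example with `tesseract image outputbase tsv`), and it assumes the usual TSV columns `level`, `left`, `top`, `width`, `height`, and `text`, with word rows at level 5. The sample data, file names, and helper names are invented for illustration. The script only builds an ImageMagick command using `-region`; it does not run it.

```python
import csv
import io

# Invented sample in the assumed Tesseract TSV layout (not real output).
HEADER = ["level", "page_num", "block_num", "par_num", "line_num",
          "word_num", "left", "top", "width", "height", "conf", "text"]
ROWS = [
    ["4", "1", "1", "1", "1", "0", "10", "20", "200", "18", "-1", ""],
    ["5", "1", "1", "1", "1", "1", "10", "20", "40", "18", "95", "OCR"],
    ["5", "1", "1", "1", "1", "2", "58", "20", "72", "18", "93", "regions"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [HEADER] + ROWS)


def find_word_boxes(tsv_text, word):
    """Return (left, top, width, height) for every word-level row whose
    text equals `word`, ignoring case."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    boxes = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if row.get("level") == "5" and text.lower() == word.lower():
            boxes.append(tuple(int(row[key])
                               for key in ("left", "top", "width", "height")))
    return boxes


def region_command(src, dst, box, operation=("-blur", "0x8")):
    """Build an ImageMagick command that applies `operation` only inside
    the given box, using -region WxH+X+Y."""
    left, top, width, height = box
    geometry = f"{width}x{height}+{left}+{top}"
    return ["convert", src, "-region", geometry, *operation, dst]


if __name__ == "__main__":
    for box in find_word_boxes(SAMPLE_TSV, "regions"):
        print(" ".join(region_command("shot.png", "out.png", box)))
```

If the screenshot was enlarged before OCR, the box values have to be divided by the same scale factor before they are applied to the original image.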