Can we extract blocks of text as images?

FrereTuck
Posts: 19
Joined: 2017-10-13T02:50:15-07:00

Post by FrereTuck » 2018-03-18T14:24:50-07:00

Hi,

My final goal is to cut the words out of a scanned text as individual images, have Tesseract recognize the text in each one, and rename each image file after the word it contains.
[Image: sample of the scanned text]
For the first line of this image, there would be one image containing FRUIT, then another containing VINES, and so on, each saved under a random name at first. If that is possible, I would then give each image to Tesseract and rename each file after the recognized word: the first one would be called FRUIT.png, the second VINES.png, and so on.
I would then be able to rearrange the word images to form new groups of words (FRUIT VINES).

Do you think the first step could be done with ImageMagick?

Thanks a lot.

muccigrosso
Posts: 64
Joined: 2017-10-03T10:39:52-07:00

Re: Can we extract as an image blocks of text?

Post by muccigrosso » 2018-03-18T19:53:15-07:00

Isn't this what Tesseract does? That is, it finds the text in images, and it will output the bounding-box coordinates of what it finds, too. Look at the man page, especially the hOCR output.
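
To illustrate the suggestion, here is a minimal Python sketch of how the hOCR output could drive the cropping. It assumes the hOCR file was produced with something like `tesseract scan.png out hocr`, that each word appears as a `span` of class `ocrx_word` whose `title` attribute carries `bbox x0 y0 x1 y1`, and that word spans contain no nested spans. The names `WordBoxParser`, `parse_words`, `crop_command`, and `scan.png` are made up for the example, and the `convert` command assumes ImageMagick 6 (ImageMagick 7 would use `magick`).

```python
import re
from html.parser import HTMLParser


class WordBoxParser(HTMLParser):
    """Collect (text, x0, y0, x1, y1) for every hOCR word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bounding box of the word currently being read
        self._text = []     # text fragments seen inside that word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            # The title looks like "bbox 36 92 96 116; x_wconf 95".
            m = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", attrs.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: the first closing </span> ends the current word.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text,) + self._bbox)
            self._bbox = None


def parse_words(hocr):
    """Return a list of (text, x0, y0, x1, y1) tuples from an hOCR string."""
    parser = WordBoxParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def crop_command(source, word, x0, y0, x1, y1):
    """Build an ImageMagick argument list that crops one word to <word>.png."""
    geometry = f"{x1 - x0}x{y1 - y0}+{x0}+{y0}"  # WxH+X+Y
    return ["convert", source, "-crop", geometry, "+repage", f"{word}.png"]


# Hand-written hOCR fragment in the shape described above (not real output).
SAMPLE = """<span class='ocr_line' title='bbox 36 92 300 116'>
<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>FRUIT</span>
<span class='ocrx_word' title='bbox 110 92 170 116; x_wconf 93'>VINES</span>
</span>"""

for word in parse_words(SAMPLE):
    print(" ".join(crop_command("scan.png", *word)))
```

Each argument list could be passed to `subprocess.run` to do the actual cropping. Since Tesseract has already recognized the word at that point, the file can be named directly without a second OCR pass. Two things the sketch leaves out: repeated words would overwrite each other's files, and recognized text may contain characters that are not safe in file names.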

FrereTuck
Posts: 19
Joined: 2017-10-13T02:50:15-07:00

Re: Can we extract as an image blocks of text?

Post by FrereTuck » 2018-03-19T08:59:25-07:00

I will have a look at it, thanks!
