How to find the text?

ghostmansd · Post by **ghostmansd** » 2011-05-05T01:22:04-07:00

Dear users, is it possible to find a piece of text in an image? What I need: a script that scans the image in areas of 50x50 pixels. If an area contains text, the script remembers the coordinates ($POSX and $POSY) of the first background pixel in that area and stops. An example is below.

In other words, the script must select a pixel that matches the background of the text and pass it to Fred's Magic Wand. Magic Wand will then recolor the background white.

Post by **fmw42** » 2011-05-05T09:18:04-07:00

IM is a pixel processor and does not know about text. So I doubt IM can do what you want. I know of no way to handle that. But perhaps Anthony or someone else might know otherwise.

ghostmansd · Post by **ghostmansd** » 2011-05-05T10:31:44-07:00

Hm, maybe it's possible to do it the following way:
1) IM converts the image to a monochrome PBM; every PBM is a sequence of the symbols 1 and 0, where 1 is black and 0 is white;
2) IM moves through the PBM and finds where the combination of 1 and 0 looks like text;
3) IM remembers the square of the first place that looks like text;
4) IM takes the coordinates of the first white pixel and passes them to the script.
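
Steps 1 and 4 of the list above can be sketched in plain Python, assuming the image has already been saved as a plain (ASCII, "P1") PBM. The function names `parse_plain_pbm` and `first_white_pixel` are hypothetical, and the sketch deliberately leaves out step 2: deciding whether a tile "looks like text" is the undefined part of the proposal.

```python
def parse_plain_pbm(text):
    """Parse a plain (ASCII, 'P1') PBM string into rows of 0/1 ints."""
    # Comments run from '#' to the end of the line; drop them first.
    lines = [line.split('#', 1)[0] for line in text.splitlines()]
    tokens = ' '.join(lines).split()
    if not tokens or tokens[0] != 'P1':
        raise ValueError('not a plain PBM file')
    width, height = int(tokens[1]), int(tokens[2])
    # Bits may be written with or without whitespace between them,
    # so join the remaining tokens and read one character per pixel.
    bits = [int(ch) for ch in ''.join(tokens[3:])]
    if len(bits) != width * height:
        raise ValueError('pixel count does not match the header')
    return [bits[r * width:(r + 1) * width] for r in range(height)]


def first_white_pixel(grid, left, top, size):
    """Return (x, y) of the first 0 (white) pixel in a size x size tile, or None."""
    for y in range(top, min(top + size, len(grid))):
        for x in range(left, min(left + size, len(grid[0]))):
            if grid[y][x] == 0:
                return (x, y)
    return None


# Hypothetical 4x3 sample image: 1 is black, 0 is white.
SAMPLE = """P1
# tiny sample
4 3
1 1 1 1
1 0 1 1
0 0 0 1
"""
grid = parse_plain_pbm(SAMPLE)
print(first_white_pixel(grid, 0, 0, 50))  # prints (1, 1)
```

The tile is clipped to the image bounds, so a 50x50 search area also works on images smaller than 50 pixels.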

Post by **fmw42** » 2011-05-05T10:44:04-07:00

define "where combination of 1 and 0 looks like text;"

IM has no knowledge of the shapes of text and cannot distinguish 1 and 0 combinations in images from those of text. But again Anthony may have a better idea how to proceed.

ghostmansd · Post by **ghostmansd** » 2011-05-05T11:23:13-07:00

Yeah, that was really foolish.

I will wait for Anthony: it seems he knows something about this, but he deleted his old post in my previous topic. Anyway, big thanks again!

Post by **fmw42** » 2011-05-05T12:27:19-07:00

If all you want is to colorize the background, then just use the color of pixel 0,0

color=`convert image -format "%[pixel:u.p{0,0}]" info:`
convert image -fuzz XX% -fill newcolor -opaque $color resultimage

Here -fuzz XX% gives you some flexibility to match colors close to $color and recolor them all; image, XX, newcolor and resultimage are placeholders for your own values.
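
To make the idea concrete, here is a minimal pure-Python sketch of "recolor everything close to the color of pixel 0,0". It is not IM's implementation: the Euclidean RGB distance and the way the percentage is scaled are simplifying assumptions, and `color_distance` and `recolor_similar` are hypothetical names.

```python
import math


def color_distance(c1, c2):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def recolor_similar(pixels, fuzz_percent, new_color):
    """Replace every pixel whose color is within fuzz_percent of pixel (0, 0)."""
    target = pixels[0][0]
    # Assumption: 100% fuzz equals the black-to-white distance for 8-bit RGB.
    max_distance = color_distance((0, 0, 0), (255, 255, 255))
    limit = max_distance * fuzz_percent / 100.0
    return [
        [new_color if color_distance(px, target) <= limit else px for px in row]
        for row in pixels
    ]


# Hypothetical 2x2 image: light gray background with one black pixel.
image = [
    [(200, 200, 200), (205, 198, 202)],
    [(0, 0, 0), (200, 200, 200)],
]
result = recolor_similar(image, 5, (255, 255, 255))
```

Note that the replacement is global: any pixel anywhere in the image that is close enough to the reference color is recolored, not only the pixels connected to the corner.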

ghostmansd · Post by **ghostmansd** » 2011-05-05T12:45:06-07:00

That affects the picture in the corner too. In TIFF files, at least.

Post by **fmw42** » 2011-05-05T13:55:41-07:00

ghostmansd wrote: That affects the picture in the corner too. In TIFF files, at least.

You could do a floodfill, but then the background enclosed by any text with holes in it, such as the letter O, will not get recolored. I am afraid there is not likely to be an optimal solution that works as you would like. But let's see what Anthony suggests.
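
The limitation with enclosed holes can be seen in a small sketch. This is a generic 4-connected flood fill in plain Python, not IM's floodfill code; `flood_fill` and the sample grid are hypothetical.

```python
from collections import deque


def flood_fill(grid, start_x, start_y, new_value):
    """Recolor the 4-connected region containing (start_x, start_y); returns a new grid."""
    height, width = len(grid), len(grid[0])
    old_value = grid[start_y][start_x]
    result = [row[:] for row in grid]
    if old_value == new_value:
        return result
    result[start_y][start_x] = new_value
    queue = deque([(start_x, start_y)])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and result[ny][nx] == old_value:
                result[ny][nx] = new_value
                queue.append((nx, ny))
    return result


# 0 = background, 1 = ink; the ring of 1s is a crude letter "O".
letter_o = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = flood_fill(letter_o, 0, 0, 9)
print(filled[0][0], filled[2][2])  # prints 9 0
```

The outer background becomes 9, but the pixel inside the ring keeps its old value because the fill cannot cross the ink to reach it.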

ghostmansd · Post by **ghostmansd** » 2011-05-06T02:58:41-07:00

Here is an example in Python (with the Python Imaging Library).
Example image: a photo of a baboon (pavian.png) with text.

Code: Select all

from PIL import Image

# Convert to 8-bit grayscale so that every pixel is a single intensity value.
im = Image.open('D:/pavian.png').convert('L')
w, h = im.size
a = [[0] * w for _ in range(h)]    # pixel intensities
b = [[0] * w for _ in range(h)]    # score of the square each pixel belongs to
for i in range(h):
    for j in range(w):
        a[i][j] = im.getpixel((j, i))

# (row, column) offsets of the 8 neighbors of a pixel
d = [[-1, -1], [-1, 0], [-1, 1], [0, 1], [1, 1], [1, 0], [1, -1], [0, -1]]

c = 40    # threshold for the intensity difference between two neighbors
s = 20    # side length of the squares


def foo(p, q):
    """Count strong intensity jumps inside the s x s square at row p, column q."""
    cnt = 0
    for i in range(p + 1, p + s - 1):
        for j in range(q + 1, q + s - 1):
            for k in range(8):
                if abs(a[i][j] - a[i + d[k][0]][j + d[k][1]]) > c:
                    cnt += 1
    return cnt


# Score every square and remember the highest score.
z = 0
for i in range(0, h - s + 1, s):
    for j in range(0, w - s + 1, s):
        p = foo(i, j)
        for k in range(s):
            for l in range(s):
                b[i + k][j + l] = p
        z = max(z, p)

# Squares scoring above half the maximum become white, the rest black.
for i in range(h):
    for j in range(w):
        v = 255 if b[i][j] > z / 2 else 0
        im.putpixel((j, i), v)

im.save('D:/pavian_out.png')

Is it possible to implement something like this using IM?

Post by **anthony** » 2011-05-09T21:51:04-07:00

As Fred said, it all comes down to...
define "where combination of 1 and 0 looks like text;"

The only thing I can think of is to use a combination of morphology and segmentation to locate rows of small segments, which generally make up characters and words, and thus mean 'text'.

For example, in your thumbnail image above, a morphology search for long thin horizontal lines can indicate 'text'.

On the other hand, defining "what is an image" in the page image may in fact be a lot easier!
Again you would use morphology and segmentation to learn about what makes up the page, but in this case an 'image' would be any segment larger than, say, 2 or 3 typical text rows.
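
As a rough illustration of the segmentation idea (not the multi_crop script and not IM's morphology operators), the sketch below labels connected groups of non-background pixels and calls a segment an 'image' when its bounding box is taller than a multiple of a typical text row. The names `find_segments` and `split_by_height`, the factor of 2, and the tiny sample page are all assumptions made for the example.

```python
from collections import deque


def find_segments(grid):
    """Return bounding boxes (left, top, right, bottom) of 8-connected groups of 1s."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            left, top, right, bottom = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def split_by_height(boxes, text_row_height, factor=2):
    """Boxes taller than factor * text_row_height count as images, the rest as text."""
    images, text = [], []
    for box in boxes:
        box_height = box[3] - box[1] + 1
        (images if box_height > factor * text_row_height else text).append(box)
    return images, text


# 1 = non-background. Four one-pixel "characters" and one 3x4 "image".
page = [
    [1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1],
]
images, text = split_by_height(find_segments(page), text_row_height=1)
print(images)  # prints [(4, 1, 6, 4)]
```

On real scans the page would first need thresholding, and probably some morphology to merge the strokes of a character, before the segment sizes mean much.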

I have done this using Fred's "Multi Crop" to locate the images on a page. (My own need was for the images, not the text.)
My own modified version is in..
http://www.imagemagick.org/Usage/scripts/multi_crop
This does a sparse grid search for any large segments (defined as NOT the background color). If a segment is too small (a small character), it gets ignored. A small change will let it output a list of rectangles that it thinks are areas of 'non-text'.

WARNING: whatever you do, you will need a fast 'preview' to check that it worked. In my own use I came across pages with overlapping images, extra lines and boxes, slight image rotations, or text inserts in larger images, which needed some extra work on those specific pages. But in general it worked and saved me a LOT of work compared with manually processing each and every page (about a thousand pages).

ghostmansd · Post by **ghostmansd** » 2011-05-23T11:42:14-07:00


Anthony, many thanks! That's an amazingly useful tool! Now all I have to do is:
1. Make a black-and-white copy of the image.
2. Cut the images out of the page image.
3. Insert the images into their correct positions.

The last is the most difficult. The script must remember the BEGINNING(x,y) and ENDING(x,y) coordinates of each image (if your script treats each image as a rectangle). I think the best way is to record the name of each image in a text file. For example:

Code: Select all

/tmp/image-1.png,20,71,95,100
/tmp/image-2.png,300,150,374,200
/tmp/image-3.png,700,503,729,625

The first column is the filename, the second and third are the x- and y-coordinates of the beginning, and the fourth and fifth are the x- and y-coordinates of the ending. Is this possible to implement?
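
Reading such a list back is straightforward. The sketch below parses the proposed format with Python's `csv` module and converts each record to an ImageMagick-style `WxH+X+Y` geometry string. `Region`, `read_regions` and `to_geometry` are hypothetical names, and treating the ending coordinates as inclusive is an assumption, since the post does not say.

```python
import csv
import io
from typing import NamedTuple


class Region(NamedTuple):
    filename: str
    x0: int
    y0: int
    x1: int
    y1: int


def read_regions(stream):
    """Parse 'filename,x0,y0,x1,y1' lines into Region records."""
    regions = []
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        name, *coords = row
        x0, y0, x1, y1 = (int(value) for value in coords)
        regions.append(Region(name, x0, y0, x1, y1))
    return regions


def to_geometry(region):
    """Format a region as WxH+X+Y, assuming the ending corner is inclusive."""
    width = region.x1 - region.x0 + 1
    height = region.y1 - region.y0 + 1
    return f'{width}x{height}+{region.x0}+{region.y0}'


listing = io.StringIO(
    '/tmp/image-1.png,20,71,95,100\n'
    '/tmp/image-2.png,300,150,374,200\n'
    '/tmp/image-3.png,700,503,729,625\n'
)
regions = read_regions(listing)
print(to_geometry(regions[0]))  # prints 76x30+20+71
```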

Post by **fmw42** » 2011-05-23T12:29:07-07:00

You don't have to remember the coordinates. If you use -crop without adding +repage and save in PNG format, which preserves the virtual canvas, then you can flatten the cropped images back into their original places, as -flatten will use the virtual canvas information stored in the file.

see

http://www.imagemagick.org/Usage/crop/#crop
http://www.imagemagick.org/Usage/layers/#flatten
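
Conceptually, a crop that keeps its virtual canvas carries its own offset and the original page size along with the pixels, which is why no separate coordinate list is needed. The plain-Python sketch below models that idea on lists of numbers; it is only an illustration, not how IM or PNG actually store the data, and `crop` and `flatten` here are hypothetical helper functions.

```python
def crop(canvas, left, top, width, height):
    """Cut a tile out of canvas, keeping its offset and the canvas size with it."""
    pixels = [row[left:left + width] for row in canvas[top:top + height]]
    return {
        'pixels': pixels,
        'offset': (left, top),
        'canvas_size': (len(canvas[0]), len(canvas)),
    }


def flatten(tiles, background):
    """Paste every tile at its stored offset onto a background-filled canvas."""
    canvas_w, canvas_h = tiles[0]['canvas_size']
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    for tile in tiles:
        left, top = tile['offset']
        for dy, row in enumerate(tile['pixels']):
            for dx, value in enumerate(row):
                canvas[top + dy][left + dx] = value
    return canvas


# Hypothetical 4x3 page; 0 is the background.
page = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 5],
]
tile_a = crop(page, 0, 0, 2, 2)
tile_b = crop(page, 3, 2, 1, 1)
restored = flatten([tile_a, tile_b], background=0)
print(restored == page)  # prints True
```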

Post by **anthony** » 2011-05-23T22:04:55-07:00

PNG images saved by IM also include an IM-specific profile (very tiny) that stores the original page size too.

Just read all the images in, set the background color, and flatten.

I have done something similar in IM Examples, in restoring tile cropping (where offsets and page size were preserved).
http://www.imagemagick.org/Usage/crop/#crop_tile

Just be careful about PNG images with virtual offsets in web browsers. Some browsers go really screwy!

Legacy ImageMagick Discussions Archive
