
Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T08:46:18-07:00
by aporthog
I'm trying to clean up and center some bitonal page images from books. I've done this with a perl script that divides the page into a grid and looks for the percentage of white in each region. So the page essentially looks like this, except bigger:

Code:

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 
0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 
The script works but it is unacceptably slow for a 400-page book. Right now I put "convert" into a nested loop, something like this:

Code:

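   # region height in pixels, derived from the page height and $height_incr
   my $y = int (($height / $height_incr) - .5);
   my @rows;     # one array ref of white percentages per band
   my $dy = 0;
   until (($dy + $y) > $height) {
      my $dx = 0;
      my @values;
      # step across the band one $width_incr-wide region at a time
      until (($dx + $width_incr) > $width) {
         my $region = "${width_incr}x$y+$dx+$dy";
         # read the image first, then crop; 100*mean = percentage of white
         my $cmd = "convert temp.tif -crop $region +repage -format \"%[fx:100*mean]\" info:";
         open (my $info_fh, '-|', $cmd) or die "cannot run convert: $!";
         my $info = <$info_fh>;
         chomp ($info);
         close ($info_fh);
         push @values, $info;
         $dx += $width_incr;
      }
      push @rows, [@values];
      $dy += $y;
   }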
(temp.tif is a version of the original image downsized to 25%)

The problem for me is the crop. I tried using -region so that I could change it repeatedly within one command, but ImageMagick seems to ignore it here and gives me the percentage of white for the whole image. So I have to re-read and re-crop the image over 600 times per page. Any ideas on how to speed the process up?

I have PerlMagick installed and I'm thinking there might be a more efficient way with that, read the image once into memory and operate repeatedly on that. Is there a way to crop a region in PerlMagick to get the info I want, then undo the crop and recrop at the new coordinates? I see in the documentation that omitting the x and y offsets for the crop command will generate a series of images. That sounds like it could be more efficient. Can PerlMagick do this and store the segments in an array or something so I can loop through them all? Or is there a simple ImageMagick command that I've overlooked?

Arvin
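A rough sketch of the read-once idea in plain Perl, avoiding PerlMagick calls since their exact interface isn't shown in this thread: assuming the page is already in memory as a 2-D array of 0 (black) and 1 (white) pixels, a single pass can accumulate the white percentage of every tile at once, which is what the 600 separate crops were computing. `tile_white_percent` is a hypothetical name, and getting the pixels into the array is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: $pixels is an array ref of rows, each row an array
# ref of 0 (black) or 1 (white). Tiles are $tw x $th pixels; partial tiles
# at the right/bottom edges are averaged over the pixels they actually hold.
# Returns an array ref of rows of white percentages, one entry per tile.
sub tile_white_percent {
    my ($pixels, $tw, $th) = @_;
    my (@sum, @count);
    for my $y (0 .. $#$pixels) {
        my $row = $pixels->[$y];
        my $ty  = int($y / $th);
        for my $x (0 .. $#$row) {
            my $tx = int($x / $tw);
            $sum[$ty][$tx]   += $row->[$x];
            $count[$ty][$tx] += 1;
        }
    }
    my @percent;
    for my $ty (0 .. $#sum) {
        for my $tx (0 .. $#{ $sum[$ty] }) {
            $percent[$ty][$tx] = 100 * $sum[$ty][$tx] / $count[$ty][$tx];
        }
    }
    return \@percent;
}

# 4x4 toy page split into 2x2 tiles: the top-left tile is half black.
my $page = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
];
my $p = tile_white_percent($page, 2, 2);
printf "%s\n", join ' ', @$_ for @$p;   # prints "50 100" then "100 0"
```

The cost is one read plus one pass over the pixels, instead of one process launch and one image read per region.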

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T09:10:36-07:00
by snibgo
The obvious answer is to resize the image such that each new pixel corresponds to one region. Then the value of each pixel gives the mean value of each region. Only one convert per page, instead of 600.
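To relate each pixel of the resized image back to the page, a small helper can compute which area of the original one pixel summarises. This is a sketch assuming a resize to exactly `$cols` x `$rows` with `!` (aspect ratio ignored); `region_geometry` is a hypothetical name. Page sizes rarely divide evenly, so boundaries are rounded down and neighbouring regions may differ by a pixel, and ImageMagick's resize filters may blend slightly across these boundaries, so treat them as approximate.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: for a $width x $height page resized to $cols x $rows
# (e.g. -resize 100x30!), return the crop geometry "WxH+X+Y" of the page
# area that pixel ($px, $py) of the small image stands for.
sub region_geometry {
    my ($width, $height, $cols, $rows, $px, $py) = @_;
    my $x0 = floor($px       * $width  / $cols);
    my $x1 = floor(($px + 1) * $width  / $cols);
    my $y0 = floor($py       * $height / $rows);
    my $y1 = floor(($py + 1) * $height / $rows);
    return sprintf '%dx%d+%d+%d', $x1 - $x0, $y1 - $y0, $x0, $y0;
}

# A 2550x3300 page squeezed to 100x30: pixel (0,0) covers the top-left
# 25x110 block, pixel (99,29) the bottom-right one.
print region_geometry(2550, 3300, 100, 30, 0, 0), "\n";    # 25x110+0+0
print region_geometry(2550, 3300, 100, 30, 99, 29), "\n";  # 26x110+2524+3190
```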

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T09:30:21-07:00
by aporthog
I will experiment with that but I'm not sure it would work. Normal values for a region with text are between about 86% and 94% white for the region sizes I'm using now, so the mean would always come out to white for every pixel. But perhaps I should convert the bitonal image to grayscale or color first. It's an interesting idea. I could reduce the size of the regions as well, which I couldn't do before due to the number of reads. I'll play around with it.

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T09:43:43-07:00
by snibgo
You wouldn't resize to bitonal, of course. Other than that, it should work. In my experience, the mean of an area is equal (to within rounding at the QuantumDepth) to the same area resized to a single pixel. If you can distinguish text from non-text by taking the mean of a region, it should also work per pixel.
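A worked example of why the resized image must stay grayscale: assuming 8-bit output, a region that is 90% white averages to a gray level of about 0.90 × 255 ≈ 230, while a blank region stays at 255, so the two remain distinguishable. Forcing the result back to bitonal would threshold both to white, which is the problem raised above. The helper names and the 250 cut-off below are hypothetical, picked only to sit between the 86% to 94% white range reported for text regions and pure white; the range would need re-checking at the new region size.

```perl
use strict;
use warnings;

# Expected 8-bit gray level for a region with the given fraction of white
# pixels (0..1), rounded to the nearest integer.
sub gray_for_fraction {
    my ($fraction) = @_;
    return int($fraction * 255 + 0.5);
}

# Hypothetical classification: anything noticeably darker than pure white
# is treated as containing text. The cut-off is an assumption to tune.
my $TEXT_CUTOFF = 250;
sub has_text { return $_[0] < $TEXT_CUTOFF }

for my $fraction (0.86, 0.94, 1.0) {
    my $gray = gray_for_fraction($fraction);
    printf "%.0f%% white -> gray %d -> %s\n",
        100 * $fraction, $gray, has_text($gray) ? 'text' : 'blank';
}
```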

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T09:53:52-07:00
by snibgo
You might also think about what you are trying to achieve. If you are simply removing noise and trimming white space, there are simpler methods.
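snibgo doesn't name the simpler methods. One common ImageMagick recipe for this kind of cleanup, offered here as an assumption rather than what he had in mind, is to despeckle, trim the white border with a small fuzz, then pad the result back to a fixed page size with the content centred using `-gravity center -extent`. The Perl below only builds the `convert` argument list; `centre_page_args`, the file names, the 5% fuzz and the page size are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: argument list for convert that removes specks,
# trims the (nearly) white border, then re-pads to a fixed page size with
# the content centred. Values are placeholders to tune per book.
sub centre_page_args {
    my (%opt) = @_;
    my $fuzz = $opt{fuzz} // '5%';
    return (
        $opt{in},
        '-despeckle',
        '-fuzz', $fuzz, '-trim', '+repage',
        '-background', 'white', '-gravity', 'center',
        '-extent', "$opt{width}x$opt{height}",
        $opt{out},
    );
}

my @args = centre_page_args(
    in    => 'page001.tif', out    => 'page001_centred.tif',
    width => 2550,          height => 3300,
);
# The list form of system() bypasses the shell, so '!' and '%' need no
# quoting. Uncomment to actually run ImageMagick:
# system('convert', @args) == 0 or die "convert failed: $?";
print join(' ', 'convert', @args), "\n";
```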

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T11:39:02-07:00
by aporthog
Hey! I think that will work.

convert -type Grayscale -resize 100x20! image.tif temp.bmp

Then read the pixels using:

convert temp.bmp -scale 2x2\! txt:-

I'm not at all familiar with IM's formatting options. Is there an easy way to output the format as a gray value between 0 and 255?

These are the types of lines I'm getting:

Code:

73,8: (185,185,185)  #B9B9B9  srgb(185,185,185)
74,8: (171,171,171)  #ABABAB  grey67
18,8: (255,255,255)  #FFFFFF  white
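On the 0 to 255 question in Perl terms: every `txt:` pixel line starts with `x,y:` followed by the channel values in parentheses, so a regular expression can pull the first channel into a 2-D array indexed `[y][x]`. This sketch assumes 8-bit grayscale output like the lines above (so R = G = B); `parse_txt` is a hypothetical name, and any line not starting with `x,y:`, such as the `# ImageMagick pixel enumeration` header, is skipped.

```perl
use strict;
use warnings;

# Parse ImageMagick txt: output into $grid->[y][x] = first channel value.
# Lines that do not start with "x,y:" are ignored.
sub parse_txt {
    my ($fh) = @_;
    my @grid;
    while (my $line = <$fh>) {
        next unless $line =~ /^(\d+),(\d+):\s*\(\s*([\d.]+)/;
        $grid[$2][$1] = $3;
    }
    return \@grid;
}

# Usage with a live pipe (list-form open, no shell involved):
#   open(my $fh, '-|', 'convert', 'image.tif', '-type', 'Grayscale',
#        '-resize', '100x30!', 'txt:-') or die "convert: $!";
#   my $grid = parse_txt($fh);

# Self-contained demo using the lines quoted above.
my $sample = <<'END';
73,8: (185,185,185)  #B9B9B9  srgb(185,185,185)
74,8: (171,171,171)  #ABABAB  grey67
18,8: (255,255,255)  #FFFFFF  white
END
open(my $demo, '<', \$sample) or die $!;
my $grid = parse_txt($demo);
print "$grid->[8][73] $grid->[8][74] $grid->[8][18]\n";   # 185 171 255
```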

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T12:00:48-07:00
by snibgo
Why do it in two commands, saving the intermediate result? Personally, I wouldn't use the BMP format unless I really had to. And why scale the result? Scaling means that each pixel no longer corresponds to one region. The results are already coming out as gray values between 0 and 255: the 3rd, 4th or 5th number on each line.

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T12:16:06-07:00
by aporthog
I was getting inconsistent results when I tried it in one command so I just tried two instead. Also I wanted to open the temp files and have a look at them. But I think I've got it now. In fact, without the intermediate bmp the results are clearer:

convert -type Grayscale -resize 100x30! image.tif -scale 2x2\! txt:-

Gives me all lines like:

26,12: (148,148,148) #949494 gray(148,148,148)

That makes total sense, now that I think about it, given the BMP format.

Since I'm only processing the vertical edges of the pages, I only needed to divide the page into about ten parts vertically with my original regions, although I need a lot of horizontal resolution. Viewing these temp files, though, I saw that 10 wasn't enough, so I upped it to 30. Squishing the image down gives me the information I need and makes processing a lot easier.

Thanks for your help! I'm good to go.
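Since only the vertical edges matter here, one way to use the 100x30 grid (a sketch, not aporthog's actual script) is to find the leftmost and rightmost columns darker than a cut-off across all rows, then turn unequal left and right margins into a horizontal shift. `content_edges`, `centring_shift` and the 250 cut-off are hypothetical; the shift is in grid columns, so multiply by page width / 100 to get original pixels.

```perl
use strict;
use warnings;

# Hypothetical helper: given $grid->[y][x] gray values (0..255), return the
# leftmost and rightmost columns that contain anything darker than $cutoff,
# or an empty list if the page is blank.
sub content_edges {
    my ($grid, $cutoff) = @_;
    my ($left, $right);
    for my $row (@$grid) {
        for my $x (0 .. $#$row) {
            next unless defined $row->[$x] && $row->[$x] < $cutoff;
            $left  = $x if !defined $left  || $x < $left;
            $right = $x if !defined $right || $x > $right;
        }
    }
    return defined $left ? ($left, $right) : ();
}

# Columns to move the content right (positive) or left (negative) so the
# left and right margins become equal.
sub centring_shift {
    my ($width, $left, $right) = @_;
    my $left_margin  = $left;
    my $right_margin = $width - 1 - $right;
    return ($right_margin - $left_margin) / 2;
}

# 10-column toy grid: text occupies columns 1..5, so the page is off-centre.
my @grid = map { [ (255) x 10 ] } 1 .. 3;
$grid[$_][1] = $grid[$_][5] = 120 for 0 .. 2;
my ($l, $r) = content_edges(\@grid, 250);
printf "edges %d..%d, shift %+.1f columns\n", $l, $r, centring_shift(10, $l, $r);
```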

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T12:52:56-07:00
by snibgo
If you don't need the pixel coordinates, you can get a more compact output from:

Code:

convert {blah blah} -compress None ppm:-
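With `-compress None` the PPM output is plain text: the magic number `P3`, width, height, maximum value, then R G B triples row by row, with `#` starting a comment. A sketch of reading it into the same `[y][x]` layout, keeping only the red channel on the assumption that the image is grayscale; `parse_plain_ppm` is a hypothetical name.

```perl
use strict;
use warnings;

# Parse plain (ASCII, "P3") PPM text into $grid->[y][x] = red value.
# Assumes a grayscale image, so R = G = B and one channel is enough.
sub parse_plain_ppm {
    my ($text) = @_;
    $text =~ s/#[^\n]*//g;                        # drop comments
    my ($magic, $w, $h, $max, @values) = split ' ', $text;
    die "not a plain PPM\n" unless defined $magic && $magic eq 'P3';
    die "short raster\n" unless @values >= 3 * $w * $h;
    my @grid;
    for my $y (0 .. $h - 1) {
        for my $x (0 .. $w - 1) {
            $grid[$y][$x] = $values[3 * ($y * $w + $x)];
        }
    }
    return (\@grid, $max);
}

# Usage with a live pipe (list-form open, slurping the whole output):
#   my $text = do { local $/; open(my $fh, '-|', 'convert', 'image.tif',
#       '-type', 'Grayscale', '-resize', '100x30!', '-compress', 'None',
#       'ppm:-') or die "convert: $!"; <$fh> };

my ($grid, $max) = parse_plain_ppm("P3\n# demo\n3 2\n255\n"
    . "255 255 255  148 148 148  0 0 0\n"
    . "10 10 10  20 20 20  30 30 30\n");
print "$grid->[0][1] $grid->[1][2] max=$max\n";   # 148 30 max=255
```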

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T16:47:01-07:00
by aporthog
Ah yes, I do like that. The other output gave me automatic 2-d array subscripts which are handy, but I think I prefer this output.

Re: Getting colors from multiple regions of an image (perl)

Posted: 2013-02-12T22:22:17-07:00
by anthony
For more information, see:

Enumerated Pixel Data (txt)
http://www.imagemagick.org/Usage/files/#txt

And PbmPlus/NetPbm image file format
http://www.imagemagick.org/Usage/formats/#pbmplus

Note that you can also crop slices from the image before outputting them to text, to make it easier.
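Cropping slices first fits the edge-only use case: only the left and right strips of the resized grid need printing. A sketch that builds one `convert` argument list per strip, assuming the 100x30 grid used earlier in the thread and a hypothetical strip width of 20 columns; `strip_args` is a hypothetical name. `+repage` resets the canvas offset, so `txt:` x values restart at 0 in each strip and the right-hand strip's offset has to be added back when reading it.

```perl
use strict;
use warnings;

# Hypothetical helper: convert arguments that resize the page to a
# $cols x $rows grid and print only one vertical strip of it as txt:.
sub strip_args {
    my ($in, $cols, $rows, $strip_x, $strip_w) = @_;
    return ($in, '-type', 'Grayscale',
            '-resize', "${cols}x${rows}!",
            '-crop', "${strip_w}x${rows}+${strip_x}+0", '+repage',
            'txt:-');
}

my ($cols, $rows, $strip_w) = (100, 30, 20);
my @left  = strip_args('image.tif', $cols, $rows, 0, $strip_w);
my @right = strip_args('image.tif', $cols, $rows, $cols - $strip_w, $strip_w);
# When reading the right-hand strip, add $cols - $strip_w back to each x.
print join(' ', 'convert', @$_), "\n" for \@left, \@right;
```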


As an example of this in use, the "image2kernel" Perl script
http://www.imagemagick.org/Usage/scripts/image2kernel
uses txt: output to extract data from an image and convert it into an array of floating-point numbers for use as a morphological kernel.