Split image horizontally while avoiding to cut text

visitor x
Posts: 16
Joined: 2013-07-27T13:26:38-07:00

Split image horizontally while avoiding to cut text

Post by visitor x »

Hi guys,

I have images which are mostly text, black on white.
I need to cut them horizontally, into two pieces (nearly 50/50), but text should not be cut in the middle.

Example (red is where the image gets cut):

Bad:
Image

Good:
Image

What is the easiest way to achieve this?

Thanks!
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

Average the image down to one column using -scale 1xH! (where H is the image height).
Convert that column to text (the txt: format).
Find the brightest (whitest) pixel near the middle, which should then be a space between lines of text.
Crop at that location.
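
A minimal sketch of the "find the brightest pixel near the middle" step. It assumes the one-column image has been dumped with something like `convert in.png -scale 1x! -depth 8 txt:-` and that each data line looks like `0,5: (255)  #FFFFFF  gray(255)`. The txt: layout differs between ImageMagick versions, so the field positions are an assumption. The function name and its window argument are hypothetical, not part of ImageMagick.

```shell
# Hypothetical helper, not part of ImageMagick.
# Reads a "txt:" pixel enumeration of a 1-pixel-wide image on stdin and prints
# the y of the brightest pixel within WINDOW rows of the middle row.
# WINDOW is the optional first argument (default: a quarter of the height).
# Ties go to the row nearest the middle.
brightest_row_near_middle() {
  awk -v win="${1:-}" -F'[,:() ]+' '
    BEGIN { n = 0 }
    /^#/ { next }                               # skip the enumeration header
    NF >= 3 { y[n] = $2 + 0; v[n] = $3 + 0; n++ }  # assumed fields: x, y, first channel
    END {
      if (n == 0) exit 1
      mid = (n - 1) / 2
      if (win == "") win = n / 4
      best = -1
      for (i = 0; i < n; i++) {
        d = y[i] - mid; if (d < 0) d = -d
        if (d > win) continue                   # too far from the middle
        if (best < 0 || v[i] > bv || (v[i] == bv && d < bd)) {
          best = y[i]; bv = v[i]; bd = d
        }
      }
      print best
    }'
}
```

It could be used as `y=$(convert in.png -scale 1x! -depth 8 txt:- | brightest_row_near_middle)`, with the result then feeding the crop.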

see
http://www.imagemagick.org/script/comma ... .php#scale
http://www.imagemagick.org/Usage/files/#txt

To make it easier, you can also automatically trim the outer white, then the outer black, then the next area of white outside the text. That way the black stripes around the sides and bottom will not contribute, and you should then be able to find a white pixel in the column near the middle. That coordinate should then be used to crop, after compensating for the amount trimmed.

If you start by cropping in half and use just the bottom part, then the first white pixel (or the middle of the first run of white pixels) can be found and used for the crop coordinates of the original, after adjusting for the height of the top section.
visitor x
Posts: 16
Joined: 2013-07-27T13:26:38-07:00

Re: Split image horizontally while avoiding to cut text

Post by visitor x »

Thanks for your help.

I couldn't figure out how to "Find the brightest (whitest) pixel near the middle".
I ended up using a PHP CLI script to do the work, as I'm a bit more familiar with PHP than ImageMagick.

In general, an ImageMagick solution would require a script too, right?
I probably wouldn't be able to do it just with command line arguments.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Split image horizontally while avoiding to cut text

Post by snibgo »

The result of fmw42's process will be an image 1 pixel wide by (n) pixels high. Suppose this is "w2.png". To find the first (highest) white pixel:

Code: Select all

compare -metric RMSE -subimage-search w2.png xc:White NULL:
The result (sent to stderr) might be:

Code: Select all

0 (0) @ 0,5
So the pixel at y=5 (the sixth one down, since counting starts at zero) is white.
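
A small sketch of how that result could be picked up in a shell script, assuming the output has the `score (normalized) @ x,y` shape shown above. The function name is hypothetical.

```shell
# Hypothetical helper: extract the y offset from a line such as "0 (0) @ 0,5",
# which is what compare prints on stderr for a subimage search.
# It keys on the "@ x,y" part, so a fractional score does not confuse it.
compare_match_y() {
  printf '%s\n' "$1" | sed -n 's/.*@ *[0-9][0-9]*,\([0-9][0-9]*\).*/\1/p'
}
```

For example: `yoff=$(compare_match_y "$(compare -metric RMSE -subimage-search w2.png xc:White NULL: 2>&1)")`.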
snibgo's IM pages: im.snibgo.com
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

visitor x wrote: In general, an ImageMagick solution would require a script too, right?
I probably wouldn't be able to do it just with command line arguments.

Yes, that is correct, except in the simple case where you use only the bottom half of the image and do what user snibgo suggested above.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Split image horizontally while avoiding to cut text

Post by snibgo »

The subimage-search technique could find the white pixel nearest the centre: crop into two, "-flip" the top half, "+append" them together, then search for the first white pixel. Since "+append" puts the halves side by side, the x-coordinate tells whether it is in the top or bottom half, and the y-coordinate gives its distance from the centre.
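
A sketch of mapping such a match back to a row of the original image. It assumes the two one-pixel columns sit side by side after "+append", with the flipped top half at x=0 and the bottom half at x=1, and that the height of the top half is known. The function name is hypothetical.

```shell
# Hypothetical helper: map a match at (x, y) in the two-column image
# [flipped top half | bottom half] back to a row of the original image.
# Arguments: height of the top half, match x, match y.
original_row() {
  top_h=$1 x=$2 y=$3
  if [ "$x" -eq 0 ]; then
    echo $(( top_h - 1 - y ))   # flipped: y=0 is the last row of the top half
  else
    echo $(( top_h + y ))       # the bottom half starts at row top_h
  fi
}
```

With a top half 100 pixels high, `original_row 100 0 3` gives row 96 and `original_row 100 1 3` gives row 103.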
snibgo's IM pages: im.snibgo.com
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

Actually using compare needs some modification. If you let it run its full course, it would find the pixel with the largest match score (closest to white), which may be further down the image. You need to choose some threshold in the rmse value and use -similarity-threshold, so it stops at the first acceptable value. So you would need to get the column stats first and decide on an rmse value for the -similarity-threshold.

see
http://www.imagemagick.org/script/comma ... -threshold
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

Here is a short set of command lines.

I first took your second image and removed the red line. So note that there is a line there that will be brighter than where you want the split. Also the bottom of the image will be brighter in the column, since it has no black border there. Thus one needs to stop the compare at the first closest match. The use of -dissimilarity-threshold is there so that the compare does not stop because the white pixel has too large an rmse when compared to any black pixel. This forces the search not to stop for too large a mismatch.

Input:
Image

Commands:

Crop the image into two nearly equal halves vertically.
Get the image width for use later when doing the final crop.
Get the height of the top half for use when computing where to do the final crop.
Scale the bottom half to one column.
Get the y offset from the results of the compare.
Add the y offset to the height of the top half to compute the y location in the full image to do the crop.
Crop the original image into two parts defined by the compare offset.

Code: Select all

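# Split into two nearly equal halves, one above the other: _0 is the top, _1 the bottom.
convert Fql1c1.png -crop 1x2@ +repage Fql1c1_%d.png

# Width of the original, needed for the final crop.
WW=$(convert Fql1c1.png -format "%w" info:)

# Height of the top half, needed to convert the offset back to full-image coordinates.
topH=$(convert Fql1c1_0.png -format "%h" info:)

# Average the bottom half down to a single column.
convert Fql1c1_1.png -scale 1x! Fql1c1_1_col.png

# Find the first near-white pixel in the column. compare reports "score (normalized) @ x,y" on stderr.
# Take the y after the comma rather than counting digit fields, so a fractional score cannot shift the result.
yoff=$(compare -metric rmse -subimage-search -similarity-threshold 0.01% \
-dissimilarity-threshold 100% Fql1c1_1_col.png xc:white null: \
2>&1 | sed -n 's/.*@ *[0-9]*,\([0-9]*\).*/\1/p')

# y location of the split in the full image.
newH=$((topH+yoff))

# Crop the original into pieces of that height.
convert Fql1c1.png -crop ${WW}x${newH} +repage Fql1c1_crop_%d.png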
Results:

Image

Image

If you know the thickness of the spacing, you can add half the spacing to the newH computation so that it splits it in the middle of the spacing.
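
As a sketch of that adjustment, assuming the gap thickness in pixels is something you measure and supply yourself (the function and argument names are hypothetical):

```shell
# Hypothetical helper: crop height that lands in the middle of the inter-line gap.
# Arguments: height of the top half, y offset of the first white row in the
# bottom half, known thickness of the white gap in pixels.
crop_height_mid_gap() {
  echo $(( $1 + $2 + $3 / 2 ))   # integer division: odd gaps round down
}
```

With the variables from the commands above and a 10-pixel gap, this would be `newH=$(crop_height_mid_gap "$topH" "$yoff" 10)`.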
Last edited by fmw42 on 2013-07-28T17:42:53-07:00, edited 1 time in total.
B_Gaspar
Posts: 16
Joined: 2013-07-25T15:34:49-07:00

Re: Split image horizontally while avoiding to cut text

Post by B_Gaspar »

Cool results. I'm working on a way to split the left page from the right page. Following this logic, I could find the binding, which should be the darkest point between the two pages.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

Caution:

My solution above works only in a nearly ideal situation.

If you are scanning text from a book, my solution above may not work well, because the text may not end up perfectly horizontal. Thus the spaces between lines of text will not be distinguishable when averaged down to one column. If both pages are scanned together, each page would need to be separated and then unrotated (possibly with -deskew if the rotation is small). Even so, the curvature of the spine may distort the text such that the lines of text curve, so that even when unrotated you still have a similar problem after averaging down to one column.
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Split image horizontally while avoiding to cut text

Post by fmw42 »

B_Gaspar wrote: Cool results. I'm working on a way to split the left page from the right page. Following this logic, I could find the binding, which should be the darkest point between the two pages.

Looking at

Image

if averaging down to one row, that may not be the case, because you have some very large, dark text, so some other column may average down to a pixel that is darker than the center column. But if the margins are wide, you should be able to find the darkest pixel near the center that has a rather light area on either side. Also, the image is rotated, so the binding would not fall in any one column. You would need to find the center of the darker region near the middle of the picture, surrounded by a section of very light pixels.

I would also suggest that you floodfill the outside to white before looking for the center line, so that you do not have extra black around the outside, which would make all results darker and make it harder to distinguish pixels in the one averaged-down row.
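
A rough sketch of locating "the center of the darker region near the middle". It assumes the image has already been cleaned up and averaged down to one row, for example with `convert in.png -scale x1! -depth 8 txt:-`, and that each data line looks like `12,0: (40)  #282828  gray(40)`. The txt: layout varies between versions, and the function name and threshold argument are hypothetical.

```shell
# Hypothetical helper, not part of ImageMagick.
# Reads a "txt:" enumeration of a 1-pixel-high image on stdin (pixels in x order)
# and prints the centre x of the run of dark pixels (value < THRESH) whose
# centre is closest to the middle of the row. THRESH is the first argument.
dark_run_center() {
  awk -v thresh="$1" -F'[,:() ]+' '
    BEGIN { n = 0 }
    /^#/ { next }                         # skip the enumeration header
    NF >= 3 { v[n++] = $3 + 0 }           # assumed fields: x, y, first channel
    END {
      mid = (n - 1) / 2; best = -1
      for (i = 0; i < n; i++) {
        if (v[i] >= thresh) continue      # light pixel, not part of a run
        s = i
        while (i + 1 < n && v[i + 1] < thresh) i++
        c = (s + i) / 2                   # centre of this dark run
        d = c - mid; if (d < 0) d = -d
        if (best < 0 || d < bd) { best = c; bd = d }
      }
      if (best < 0) exit 1                # no dark pixel found
      print int(best)
    }'
}
```

This only picks the dark run nearest the middle. It does not check that the run is flanked by light margins, so a threshold chosen from the column statistics still matters.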