horizontal splitting by white line

thes
Posts: 2
Joined: 2019-03-26T23:58:28-07:00

Post by thes » 2019-03-27T00:34:17-07:00

Hi, I have a scanned document. I would like to cut it into multiple image files, one file per visible line of text.

Some of the options I tried:
I looked at Fred's ImageMagick Scripts page and thought of searching for a horizontal white strip (as a small image) inside the bigger image, in this case the scanned document, but could not find a way to do it. I also looked at the -splice option of convert and searched for keywords such as "horizontal splitting", but found nothing relevant. Hence this question.

I would appreciate any help. By the way, I have been using ImageMagick for around two decades, so I would like to take this chance to say: thank you.

Version: ImageMagick 7.0.8-26 Q16 x86_64 20190203

snibgo
Posts: 11809
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: horizontal splitting by white line

Post by snibgo » 2019-03-27T06:13:07-07:00

What platform?

Please show a sample image.
snibgo's IM pages: im.snibgo.com

fmw42
Posts: 25135
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: horizontal splitting by white line

Post by fmw42 » 2019-03-27T19:55:11-07:00

thes wrote (2019-03-27T00:34:17-07:00):
"hi, I have a scanned document. I would like to cut it into multiple image files: each visible text line into an image file."

Average the image down to one column using -scale, then threshold to black and white so that everything not near white becomes black. You should then see alternating runs of black and white. Then use -connected-components to get the height and offset of the bounding box of each black region, and use those to crop the lines of text.
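To make the "alternating runs" idea concrete, here is a small bash/awk sketch of the run-finding step on its own, without ImageMagick. It assumes, hypothetically, that the one-column thresholded profile has already been dumped as one value per row, top to bottom, with 0 for a black (text) row and 1 for a white row; the function name find_text_rows is made up for this illustration. It is a plain run-length scan, a stand-in for what -connected-components reports for full-width black bands, and it prints one crop geometry WxH+0+Y per run.

```bash
#!/usr/bin/env bash
# Hypothetical helper (illustration only, not part of ImageMagick).
# Input on stdin: one value per image row, top to bottom,
#   0 = row contains text (black after thresholding), 1 = blank (white) row.
# Argument: image width W. Output: one crop geometry WxH+0+Y per black run.
find_text_rows() {
  local width=$1
  awk -v w="$width" '
    BEGIN { start = -1 }                # no run of black rows open yet
    $1 == 0 {                           # black (text) row
      if (start < 0) start = NR - 1     # zero-based y where the run begins
      next
    }
    start >= 0 {                        # first white row after a black run
      printf "%dx%d+0+%d\n", w, (NR - 1) - start, start
      start = -1
    }
    END {                               # a run that reaches the bottom edge
      if (start >= 0) printf "%dx%d+0+%d\n", w, NR - start, start
    }
  '
}

# Example: 10 rows, two text lines, image 400 pixels wide
printf '%s\n' 1 1 0 0 0 1 1 0 0 1 | find_text_rows 400
```

For the example profile this prints 400x3+0+2 and 400x2+0+7, i.e. rows 2 to 4 and rows 7 to 8, which are the same kind of geometries the crop loop consumes.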

thes
Posts: 2
Joined: 2019-03-26T23:58:28-07:00

Re: horizontal splitting by white line

Post by thes » 2019-03-27T20:07:17-07:00

Hi snibgo,

It is openSUSE Tumbleweed. I am using the following image for my test:
https://upload.wikimedia.org/wikipedia/ ... e_Text.jpg

Hi Fred,

Thanks for the pointers. I will explore them.

Thanks for the replies.

fmw42
Posts: 25135
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: horizontal splitting by white line

Post by fmw42 » 2019-03-27T21:36:59-07:00

Try the following. I put your image in a folder called test on my desktop. I am on a Mac, so the following is a bash (Unix) script.

First, I get the image dimensions.
Then I blur, scale to 1 column, scale back to the size of the input, and threshold so that every row that is not nearly pure white becomes black.
Then I run connected components and keep only the bounding boxes of the black regions, which are your crop areas. I put them in an array.
Finally, I loop over the crop regions, crop each one from your image, add a 2-pixel white border, and write the result to the test directory.

The reason for the blur is to make sure the quotes are included in the same black region as the text after thresholding.

Code:

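cd
cd desktop/test
# image dimensions as WxH
WxH=`convert image.jpg -format "%wx%h" info:`
# blur, average to 1 column, stretch back to full size,
# turn every row that is not nearly white into black,
# then keep the bounding box of each black region (one per text line)
cropArr=(`convert image.jpg -blur 0x3 \
-scale 1x! -scale ${WxH}! \
-negate -threshold 2% -negate \
-type bilevel \
-define connected-components:verbose=true \
-connected-components 4 null: | grep "gray(0)" | awk '{print $2}'`)
num=${#cropArr[*]}
# crop each region, pad it with a 2-pixel white border,
# and save it as image_line_01.jpg, image_line_02.jpg, ...
for ((i=0; i<num; i++)); do
j=$((i+1))
jj=`printf %02d $j`
convert image.jpg -crop "${cropArr[$i]}" +repage -bordercolor white -border 2 image_line_$jj.jpg
done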

godfried76
Posts: 6
Joined: 2019-06-12T03:27:00-07:00

Re: horizontal splitting by white line

Post by godfried76 » 2019-06-12T14:28:18-07:00

This produces image files with the slices, but they don't seem to come out in the right order (top to bottom as 01..99).

Is there a way to fix this?
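One likely fix, sketched here rather than confirmed in this thread, is to sort the bounding boxes by their Y offset before cropping. The geometries printed by the awk step have the form WxH+X+Y, so splitting on "+" makes the Y offset the third field, and a numeric sort on that field alone gives top-to-bottom order. The function name sort_by_y is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read crop geometries (WxH+X+Y), one per line, on stdin
# and print them ordered top to bottom by their Y offset.
# With "+" as the field separator the fields are WxH, X and Y,
# so -k3,3n sorts numerically on the Y offset only.
sort_by_y() {
  sort -t '+' -k3,3n
}

# Example: three boxes listed out of order
printf '%s\n' 800x30+0+200 800x25+0+12 800x40+0+95 | sort_by_y
```

In the script above, the same effect should come from appending | sort -t '+' -k3,3n after awk '{print $2}' inside the backticks that fill cropArr, so that image_line_01.jpg is the topmost line of text.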
