
horizontal splitting by white line

Posted: 2019-03-27T00:34:17-07:00
by thes
hi, I have a scanned document. I would like to cut it into multiple image files: each visible text line into an image file.

Some of the options I tried:
I looked at Fred's ImageMagick Scripts page and thought of searching for a horizontal white strip image inside the bigger file (in this case, the scanned document image), but could not find a way to do it. I also looked at the -splice option in convert and searched with the keywords "horizontal splitting", but could not find anything relevant. Hence this question.

Kindly help. By the way, I have been using ImageMagick for around two decades, so let me take this chance to say thank you.

Version: ImageMagick 7.0.8-26 Q16 x86_64 20190203

Re: horizontal splitting by white line

Posted: 2019-03-27T06:13:07-07:00
by snibgo
What platform?

Please show a sample image.

Re: horizontal splitting by white line

Posted: 2019-03-27T19:55:11-07:00
by fmw42
thes wrote: 2019-03-27T00:34:17-07:00 hi, I have a scanned document. I would like to cut it into multiple image files: each visible text line into an image file.
Average the image down to one column using -scale and threshold to black and white such that everything not near white is black. You should then see alternating runs of black and white. Then use -connected-components to get the bounding box heights and offsets of each black region. Use those to crop the lines of text.
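To illustrate the idea without ImageMagick, here is a minimal sketch of what "runs of black" means for a one-column image. It assumes the thresholded column has already been dumped as one value per row (0 = black, 1 = white); the function name black_runs and the sample values are made up for this example and are not part of any ImageMagick tool.

```shell
# Hypothetical illustration: each input line is one pixel row of the
# thresholded single-column image, 0 = black (text), 1 = white (gap).
# Prints "start_row height" (0-based start) for every run of black rows.
black_runs() {
  awk '
    $1 == 0 && !in_run { start = NR - 1; in_run = 1 }              # a black run begins
    $1 != 0 && in_run  { print start, NR - 1 - start; in_run = 0 } # the run just ended
    END { if (in_run) print start, NR - start }                    # run reaches the last row
  '
}

# Made-up column: white, 3 black rows, 2 white rows, 2 black rows, white.
printf '%s\n' 1 0 0 0 1 1 0 0 1 | black_runs
# prints:
# 1 3
# 6 2
```

Each start/height pair corresponds to the offset and height of one crop region, which is the information -connected-components reports as a bounding box.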

Re: horizontal splitting by white line

Posted: 2019-03-27T20:07:17-07:00
by thes
hi snibgo,

It is "openSUSE Tumbleweed". I am using the following image for my test:
https://upload.wikimedia.org/wikipedia/ ... e_Text.jpg

hi Fred,

Thanks for the pointers. Will explore the same.

Thanks for the replies.

Re: horizontal splitting by white line

Posted: 2019-03-27T21:36:59-07:00
by fmw42
Try the following. I put your image in a folder called test on my desktop. I am on a Mac. So the following is bash Unix scripting.

First, I get the image dimensions.
Then, I blur, scale to 1 column, scale back to the size of the input, and threshold so that everything not near white is black.
Then, I run connected components and keep only the bounding boxes of the black regions, which are your crop areas. I put them in an array.
Then, I loop over each crop region, crop your image, add a 2-pixel white border, and write each result to the test directory.

The reason for the blur is to make sure the quotes are included in the same black region as the text after thresholding.

Code:

cd
cd desktop/test
# get the input dimensions as WxH
WxH=$(convert image.jpg -format "%wx%h" info:)
# blur, average to 1 column, scale back to full size, threshold,
# then list the bounding boxes of the black (text) regions
cropArr=($(convert image.jpg -blur 0x3 \
-scale 1x! -scale ${WxH}! \
-negate -threshold 2% -negate \
-type bilevel \
-define connected-components:verbose=true \
-connected-components 4 null: | grep "gray(0)" | awk '{print $2}'))
num=${#cropArr[*]}
# crop each region and add a 2-pixel white border
for ((i=0; i<num; i++)); do
  j=$((i+1))
  jj=$(printf %02d $j)
  convert image.jpg -crop "${cropArr[$i]}" +repage -bordercolor white -border 2 image_line_$jj.jpg
done

Re: horizontal splitting by white line

Posted: 2019-06-12T14:28:18-07:00
by godfried76
This produces image files with the slices, but they don't seem to come out in the right order (top-down ~ 01..99).

Could this be fixed in any way?
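One possible fix, offered as a sketch rather than a tested answer: as far as I know, the verbose connected-components listing is ordered by region area rather than by position, so the crop geometries (WxH+X+Y) can be sorted by their Y offset before the loop. The helper name sort_by_y and the sample geometries below are made up for illustration.

```shell
# Hypothetical helper: sort WxH+X+Y geometry strings numerically by the
# Y offset (third "+"-separated field), i.e. top to bottom.
sort_by_y() {
  sort -t+ -k3,3n
}

# Made-up geometries in an arbitrary (not top-down) order.
printf '%s\n' "500x40+0+120" "500x30+0+10" "500x25+0+60" | sort_by_y
# prints:
# 500x30+0+10
# 500x25+0+60
# 500x40+0+120
```

In the script posted above, this would amount to appending | sort -t+ -k3,3n after the awk '{print $2}' step, so that cropArr is filled top-down and the 01..99 numbering follows the order of the lines on the page.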