textdeskew regression_Arr: bad array subscript

fmw42
Posts: 22086
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: textdeskew regression_Arr: bad array subscript

Post by fmw42 » 2017-10-22T15:06:10-07:00

P.S. If you cannot figure out how to install FFTW through Homebrew, try uninstalling the Homebrew version and installing the IM binary from http://www.imagemagick.org/script/download.php#macosx instead. Or install IM 6 from MacPorts.

Post by fmw42 » 2017-10-22T15:08:51-07:00

Search Google for "install imagemagick from homebrew with FFTW". There are some references to that issue. For example, one suggested simply reinstalling with FFTW:

brew reinstall imagemagick --with-fftw


Post by fmw42 » 2017-10-22T15:19:30-07:00

I do not know Homebrew. When I install, I first install all my delegates from MacPorts and then build IM from source. In general, I think you need to install the needed delegates before installing IM, and they need to be installed where IM can find them.
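
One quick way to confirm whether an installed IM actually picked up a delegate is to look at the Delegates line that convert -version prints. A minimal sketch, assuming a POSIX shell; the has_delegate name is made up for illustration and is not part of ImageMagick:

```shell
# Hypothetical helper (not part of ImageMagick): succeed if the named
# delegate appears on the "Delegates" line of `convert -version` output.
# Reads the version text on stdin; $1 is the delegate name, e.g. fftw.
has_delegate() {
  grep '^Delegates' | tr ' ' '\n' | grep -qx "$1"
}

# Usage (needs ImageMagick installed):
#   convert -version | has_delegate fftw && echo "FFTW delegate present"
```

If the check fails right after a reinstall, the build most likely did not find FFTW at configure time.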

wrumble
Posts: 11
Joined: 2017-10-22T06:03:07-07:00
Authentication code: 1151

Post by wrumble » 2017-10-22T15:21:56-07:00

Yeah, so I found that and did the same, but the resulting install is without FFTW. I will try again though.

Running convert -version on my docker image gives:

Version: ImageMagick 6.8.9-9 Q16 x86_64 2017-07-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2014 ImageMagick Studio LLC
Features: DPC Modules OpenMP
Delegates: bzlib cairo djvu fftw fontconfig freetype jbig jng jpeg lcms lqr ltdl lzma openexr pangocairo png rsvg tiff wmf x xml zlib

When I run the updated script from the Docker image, the output is now:

136,136 gray=100%
236,134 gray=77.862%
258,139 gray=73.721%
36,138 gray=71.446%
235,113 gray=70.399%
14,133 gray=69.438%
236,155 gray=68.846%
259,160 gray=67.991%
136,157 gray=67.962%
220,135 gray=67.948%
136,115 gray=67.924%
186,135 gray=67.451%
137,177 gray=67.304%
135,95 gray=67.135%
138,217 gray=67.073%
134,55 gray=66.865%
228,128 gray=66.796%
138,238 gray=65.933%
134,34 gray=65.756%
258,119 gray=65.58%
awk: line 38: syntax error at or near *

rnum=0;
/home/work/textdeskew: line 468: regression_Arr: bad array subscript
/home/work/textdeskew: line 474: regression_Arr: bad array subscript
Rotating Image 90 degrees

I've run brew uninstall imagemagick and then brew install imagemagick --with-fftw, and that is how I got the install on my Mac that I just showed you. I will try the from-source version now.

Post by fmw42 » 2017-10-22T16:39:31-07:00

For some reason your environment is having trouble with AWK and is giving you rnum=0 after finding all the points. Perhaps you do not have the Unix utility AWK installed on your Docker system.

When I run it on my Mac OSX Sierra in IM 6.8.9.9, I get

136,136 gray=100%
236,134 gray=77.862%
258,139 gray=73.721%
36,138 gray=71.446%
235,113 gray=70.399%
14,133 gray=69.438%
236,155 gray=68.846%
259,160 gray=67.991%
136,157 gray=67.962%
220,135 gray=67.948%
136,115 gray=67.924%
186,135 gray=67.451%
137,177 gray=67.304%
135,95 gray=67.135%
138,217 gray=67.073%
134,55 gray=66.865%
228,128 gray=66.796%
138,238 gray=65.933%
134,34 gray=65.756%
258,119 gray=65.58%

rnum=27;
Residual1:0.963979
Residual2:2.86863
Residual3:1.72728
Residual4:4.79659
Residual5:23.8468
Residual6:0.200683
Residual7:18.1278
Residual8:22.7054
Residual9:21.9605
Residual10:1.57553
Residual11:20.0325
Residual12:0.952328
Residual13:41.9388
Residual14:40.0108
Residual15:81.9137
Residual16:79.9858
Residual17:8.72099
Residual18:102.91
Residual19:100.982
Residual20:18.2694
res_ave:29.7245
res_std:33.4134
phi:-1.55247
r:-132.52
Slope:0.0183326
Intercept:132.543
Angle:1.05026
res_std=33.4134; res_std2=66.8268;
res_0=0.963979; res_std2=66.8268; test2=0;
res_1=2.86863; res_std2=66.8268; test2=0;
res_2=1.72728; res_std2=66.8268; test2=0;
res_3=4.79659; res_std2=66.8268; test2=0;
res_4=23.8468; res_std2=66.8268; test2=0;
res_5=0.200683; res_std2=66.8268; test2=0;
res_6=18.1278; res_std2=66.8268; test2=0;
res_7=22.7054; res_std2=66.8268; test2=0;
res_8=21.9605; res_std2=66.8268; test2=0;
res_9=1.57553; res_std2=66.8268; test2=0;
res_10=20.0325; res_std2=66.8268; test2=0;
res_11=0.952328; res_std2=66.8268; test2=0;
res_12=41.9388; res_std2=66.8268; test2=0;
res_13=40.0108; res_std2=66.8268; test2=0;
res_14=81.9137; res_std2=66.8268; test2=1;
res_15=79.9858; res_std2=66.8268; test2=1;
res_16=8.72099; res_std2=66.8268; test2=0;
res_17=102.91; res_std2=66.8268; test2=1;
res_18=100.982; res_std2=66.8268; test2=1;
res_19=18.2694; res_std2=66.8268; test2=0;
newnum=16
136,136 236,134 258,139 36,138 235,113 14,133 236,155 259,160 136,157 220,135 136,115 186,135 137,177 135,95 228,128 258,119

rnum=23;
Residual1:0.607143
Residual2:1.79797
Residual3:3.11287
Residual4:3.01225
Residual5:22.7937
Residual6:1.89858
Residual7:19.2019
Residual8:24.1086
Residual9:21.607
Residual10:0.733154
Residual11:20.3927
Residual12:0.595412
Residual13:41.6028
Residual14:40.3885
Residual15:7.76551
Residual16:16.887
res_ave:14.1566
res_std:13.4165
phi:-1.56675
r:-134.841
Slope:0.00405128
Intercept:134.842
Angle:0.23212
Rotating Image 89.7679 degrees

Post by wrumble » 2017-10-23T03:27:18-07:00

So I installed IM at work and managed to get an error-free response from textdeskew, but it rotated the image 89.7679 degrees like yours did. I got a slightly better response with unrotate, which rotated it 6.34 degrees but actually made it worse, haha. See https://imgur.com/aabtztD. But I kept getting the following error when using textcleaner on the original image, https://imgur.com/JKxDCP3:

convert: geometry does not contain image `./textcleaner_1_62052.mpc' @ warning/attribute.c/GetImageBoundingBox/240.

This was all on my work laptop. Still no joy on my Docker image. When I run unrotate on the Docker image, I get the following response:

/home/work/unrotate: line 335: bc: command not found
/home/work/unrotate: line 336: bc: command not found
/home/work/unrotate: line 335: bc: command not found
/home/work/unrotate: line 336: bc: command not found
/home/work/unrotate: line 335: bc: command not found
/home/work/unrotate: line 336: bc: command not found
/home/work/unrotate: line 335: bc: command not found
/home/work/unrotate: line 336: bc: command not found
/home/work/unrotate: line 531: bc: command not found
/home/work/unrotate: line 532: bc: command not found
/home/work/unrotate: line 533: bc: command not found
/home/work/unrotate: line 533: [: -eq: unary operator expected
/home/work/unrotate: line 537: bc: command not found

What would you recommend I do or use if I'm trying to clean up a restaurant receipt image, taken with a smartphone, that needs to go through OCR with tesseract? Thanks for all your help so far, by the way. Amazing.

Post by fmw42 » 2017-10-23T10:46:29-07:00

Your Docker OS is missing the unix bc (basic calculator) command. The script uses bc. Install bc and try again.

Perhaps Docker is also missing many of the other Unix commands, such as awk. That could explain why you are having such problems on Docker. Sorry, I do not know Docker.
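
The errors in this thread point at tools the scripts call out to, so it may be worth checking for them before running anything in a fresh container. A minimal sketch, assuming a POSIX shell; missing_tools is a made-up name, and the tools listed in the usage line are only the ones this thread mentions, not a complete list of what the scripts need:

```shell
# Hypothetical helper: print the name of every command given as an
# argument that cannot be found on PATH. Prints nothing if all are found.
# Note: this checks presence only, not whether the installed awk variant
# accepts the syntax the script uses.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage:
#   missing_tools awk bc convert
```

Anything it prints needs to be installed with the image's package manager before the scripts will run.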

Post your receipt image to some free hosting service such as dropbox.com and put the URL here, so I can see what it looks like. Some receipts just do not process well.

Post by fmw42 » 2017-10-23T10:57:01-07:00

The gray background on this image makes it very hard to process and requires custom processing.

My textcleaner does not work on this. Nor does my textdeskew or unrotate. However, you can clean it up by turning the gray into white and then using the ImageMagick -deskew option to correct the rotation, since it is less than about 5 degrees.

convert JKxDCP3.png -contrast-stretch 10,60% -background white -deskew 40% JKxDCP3_proc.png

You can play with the contrast-stretch arguments to try to make it cleaner.
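
One way to play with those arguments is to run the same command over a small grid of values and compare the outputs by eye. A minimal sketch, assuming a POSIX shell; stretch_variants is a made-up name, the value grids are arbitrary starting points, and the CONVERT variable is only a convenience added here for dry runs:

```shell
# Hypothetical helper: run the convert command from the post above once per
# pair of -contrast-stretch values so the results can be compared side by side.
# The grids (5 10 15 and 50 60 70) are arbitrary starting points.
# Set CONVERT=echo to print each argument list instead of running convert.
stretch_variants() {
  in=$1
  base=${in%.*}
  for black in 5 10 15; do
    for white in 50 60 70; do
      "${CONVERT:-convert}" "$in" -contrast-stretch "${black},${white}%" \
        -background white -deskew 40% "${base}_b${black}_w${white}.png"
    done
  done
}

# Usage:
#   CONVERT=echo stretch_variants JKxDCP3.png   # dry run, prints 9 lines
#   stretch_variants JKxDCP3.png                # writes 9 output files
```

The output names encode the two values, so the cleanest-looking file tells you which pair to keep.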

Post by wrumble » 2017-10-23T11:48:22-07:00

Amazing. I installed gawk and bc on the Docker image, and both unrotate and textdeskew now run on the image, but they give the results you have already seen.
Here is a link to the file via Dropbox: https://www.dropbox.com/s/ef4xnr97pwyd1 ... e.png?dl=0. I wonder if it is worth it, as I'm sure much better quality images will be taken with a smartphone.

But do you have any recommendations for making receipt images render better for OCR? My app tells the user to crop the photos so they look like these:

https://www.dropbox.com/s/yp08k9vwqc31c ... 1.png?dl=0
https://www.dropbox.com/s/6w5zhl5omdk6m ... 2.png?dl=0
https://www.dropbox.com/s/raynwi1i5rgip ... 3.png?dl=0

Just wondering if you had any recommendations for preprocessing that would work for most receipts before OCR.

Post by fmw42 » 2017-10-23T12:39:07-07:00

Most of your other images should process reasonably well with textcleaner. The other ones do not need much rotation, so you could just use -deskew. The second one has small fonts, so preprocessing may not be able to help the OCR, since OCR does not work well with fonts that are too small. I am not an expert with OCR. I processed your original image in my previous post, but it too has small fonts. Its gray background makes it very hard to preprocess with textcleaner, so I had to use -contrast-stretch.

Higher resolution will help.
Clean white backgrounds will help.
Straight-on views as opposed to oblique views will help.

Some of my scripts that may help are: textcleaner, unrotate, textdeskew, whiteboard, unperspective.

You can also preprocess with -contrast-stretch and rotate with -deskew.

If you have spot noise, you can try either my script, isonoise, or use -connected-components to remove spots smaller than some area (but never as big as the dot in the i character.)
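
For the -connected-components route, a minimal sketch is below. It assumes the image is already close to black text on white, and that the IM build is recent enough to have -connected-components and these two defines, which appears to mean a release later than the 6.8.9-9 shown earlier in the thread, so check yours. remove_spots is a made-up name, the default area of 5 pixels is only a placeholder, and CONVERT is a convenience added here for dry runs:

```shell
# Hypothetical helper: merge connected regions smaller than a given pixel
# area into their surroundings, which removes isolated specks.
#   $1 = input image, $2 = output image, $3 = area threshold in pixels
# The default of 5 is a placeholder; keep it below the area of the dot on
# an "i" in your image. Set CONVERT=echo to print the arguments instead.
remove_spots() {
  "${CONVERT:-convert}" "$1" \
    -define connected-components:area-threshold="${3:-5}" \
    -define connected-components:mean-color=true \
    -connected-components 4 "$2"
}

# Usage:
#   remove_spots receipt_clean.png receipt_nospots.png 8
```

Start with a small threshold and raise it slowly, checking that periods and the dots on i and j survive.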

Some receipts are just oddball bad ones for which likely nothing will work.

Post by wrumble » 2017-10-23T12:51:29-07:00

This is great, thank you so much for all your help
