I have a large number of scanned images from a book of exam questions. Examples:
https://dl.dropboxusercontent.com/u/639 ... ge-236.png
https://dl.dropboxusercontent.com/u/639 ... ge-237.png
https://dl.dropboxusercontent.com/u/639 ... ge-238.png
https://dl.dropboxusercontent.com/u/639 ... ge-329.png
https://dl.dropboxusercontent.com/u/639 ... ge-240.png
https://dl.dropboxusercontent.com/u/639 ... ge-239.png
What I am trying to achieve is the following:
1. Clean the scanner noise, meaning the little dots and dashes around the text
2. Rotate the image so that the middle vertical line is perpendicular to the image's top and bottom edges
3. Crop each question into a separate image
4. Remove the white space from each individual image
I managed to partially achieve step 1 (cleaning the scanner noise) using the following commands:
convert source.png -write MPR:source -morphology close rectangle:3x2 result.png
convert source.png -write MPR:source -morphology close diamond result.png
Both give relatively satisfactory results. The problem arises when there is a drawing on the page: the cleaning also removes some pixels from the drawings. If someone can recommend a better method for cleaning the noise, that would be great.
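One alternative that may be gentler on drawings is to remove specks by size rather than by shape, using connected-component labelling. The sketch below is not a tested recipe for these scans: the helper name despeckle, the 60% threshold and the default minimum area of 12 pixels are all assumptions, and it relies on the installed build supporting the connected-components defines.

```shell
#!/bin/sh
# Hypothetical helper: merge every connected blob smaller than MIN_AREA pixels
# into the region that surrounds it, so isolated specks take the page colour.
# Usage: despeckle INPUT OUTPUT [MIN_AREA]
despeckle() {
  in=$1
  out=$2
  min_area=${3:-12}   # assumed default; tune to the scan resolution

  # Binarize first so that each speck, letter and drawing stroke becomes
  # its own connected component, then drop the small ones.
  convert "$in" -colorspace Gray -threshold 60% \
    -define connected-components:area-threshold="$min_area" \
    -define connected-components:mean-color=true \
    -connected-components 8 \
    "$out"
}
```

For example: despeckle source.png result.png 20. The lines of a drawing usually form one large connected blob, so they should survive, but the dots on letters such as i and full stops can vanish if the minimum area is set too high. Small enclosed white holes below the minimum area are filled in as well.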
For step 2 (rotating the image) I tried the unrotate script from Fred's scripts (http://fmwconcepts.com/imagemagick/unrotate/index.php), but I did not manage to make it work. Can someone advise how I can approach this?
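A possible starting point that needs no external script is ImageMagick's built-in -deskew operator. This is only a sketch: the helper name deskew_page is hypothetical, and the 40% threshold is the commonly suggested value rather than something tuned for these pages. Note that -deskew estimates the skew from the page content as a whole, not specifically from the middle vertical line, and it is meant for small angles.

```shell
#!/bin/sh
# Hypothetical helper: straighten a slightly rotated scan.
# Usage: deskew_page INPUT OUTPUT [THRESHOLD]
deskew_page() {
  in=$1
  out=$2
  threshold=${3:-40%}   # assumed default; adjust if the result is off

  # -background sets the colour used for the corners exposed by the rotation;
  # +repage resets the virtual canvas afterwards.
  convert "$in" -background white -deskew "$threshold" +repage "$out"
}
```

For example: deskew_page source.png straight.png. Running it after the noise cleaning may help, since stray specks can disturb the angle estimate.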
For step 3 (cropping each question into a separate image) I am not even sure this is possible with ImageMagick alone. Maybe I will need some OCR that detects where each question starts and ends, and with these coordinates I can use ImageMagick to crop the image into several pieces? Any suggestions for tools/libraries will be highly appreciated.
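Without OCR, one idea is a row projection profile: find horizontal bands of blank rows and cut there. The sketch below assumes a lot that may not hold for these scans: that the page is already cleaned and straightened, that questions are separated by a taller blank gap than the normal line spacing, and that a blank row shows up as #FFFFFF in the txt: output. The helper name split_on_blank_rows, the 60% and 99% thresholds and the 40-pixel default gap are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: cut INPUT into horizontal bands wherever at least
# MIN_GAP consecutive rows are (almost) blank.
# Writes PREFIX-0.png, PREFIX-1.png, ... with the white margins trimmed.
# Usage: split_on_blank_rows INPUT PREFIX [MIN_GAP]
split_on_blank_rows() {
  in=$1
  prefix=$2
  min_gap=${3:-40}   # assumed gap height in pixels; tune per scan
  w=$(convert "$in" -format '%w' info:)
  h=$(convert "$in" -format '%h' info:)
  i=0

  # Row profile: binarize, average each row down to a single pixel, then
  # keep a row white only if less than about 1% of it is ink.
  convert "$in" -colorspace Gray -threshold 60% -scale "1x${h}!" \
      -threshold 99% -depth 8 txt:- |
  awk -v gap="$min_gap" '
    /^#/ { next }                          # skip the txt: header line
    {
      y = n++                              # rows arrive in top-to-bottom order
      if ($0 ~ /#FFFFFF/) {                # blank row
        run++
        if (inband && run == gap) { print start, last - start + 1; inband = 0 }
      } else {                             # row containing ink
        if (!inband) { start = y; inband = 1 }
        last = y
        run = 0
      }
    }
    END { if (inband) print start, last - start + 1 }
  ' |
  while read -r top height; do
    # Crop the band out of the original image, then trim its margins.
    convert "$in" -crop "${w}x${height}+0+${top}" +repage \
        -trim +repage "${prefix}-${i}.png"
    i=$((i + 1))
  done
}
```

For a two-column page, each column would probably have to be cut out first, for example with convert page.png -crop 2x1@ +repage col-%d.png, and then passed to split_on_blank_rows col-0.png question 40. If the gap between questions is no taller than the gap between lines, this approach cannot tell them apart and something like OCR would still be needed.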
Step 4 is clear; I have done it before.
I am using ImageMagick's command-line tool convert on macOS Sierra, version:
Version: ImageMagick 6.9.6-3 Q16 x86_64 2016-10-31 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2016 ImageMagick Studio LLC
Features: Cipher DPC Modules
Delegates (built-in): bzlib freetype jng jpeg ltdl lzma png tiff xml zlib
If you need more information about the tools I am using or about the images, I am ready to assist.
Any help or directions for achieving this output would be really appreciated.