$2000 for cleaning up scanned images

Posted: 2014-05-14T03:15:11-07:00
by alasdairdf

I can pay $2000 for an IM expert to help out with some scripts for cleaning up scanned pages from books. I would be even more interested if anyone can combine IM with some programming, or OpenCV. But just IM would be OK too.

Each scan is a single image that can contain text, pictures, noise, artifacts, etc. The tasks I need help with are:

Background removal - currently I do this by manufacturing a predicted gradient background for the image and then dividing the original image by this predicted gradient. I think this could be optimized much further into something quite sophisticated.
Thresholding - probably local adaptive is the best bet, but will need some testing and optimization.
Dithering of pictures - I have a good technique for this, but perhaps someone can improve on it.
Noise removal - removing noise without affecting any "wanted" parts of the image, such as punctuation, pictures, borders, horizontal or vertical lines, etc.
Autorotation - this can be done without IM with a bit of trig on the OCR coordinates, but maybe IM has a good way of doing it. Worth a try, especially since it might be useful for picture-only pages, where I can't use the OCR results.
Removal of page edges - detecting where the actual page starts and removing everything that is not the page.
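
For anyone picturing the first two items, here is a minimal sketch in plain Python (not IM, and not my actual script). It assumes the background can be approximated by a plane fitted by least squares, divides the image by that plane, and then applies a simple local-mean threshold. The function names, the `radius` and `ratio` parameters, and the plane model itself are illustrative assumptions; a real predicted background would likely need something less crude.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(img):
    """Least-squares fit of z = a*x + b*y + c to a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    sx = sy = sxx = syy = sxy = sz = sxz = syz = 0.0
    for y in range(h):
        for x in range(w):
            z = img[y][x]
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
            sz += z; sxz += x * z; syz += y * z
    # Normal equations, solved with Cramer's rule (needs at least a 2x2 image).
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(h * w)]]
    rhs = [sxz, syz, sz]
    d = _det3(A)
    coeffs = []
    for i in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][i] = rhs[r]
        coeffs.append(_det3(m) / d)
    return tuple(coeffs)


def flatten_background(img):
    """Divide each pixel by the fitted plane and rescale so paper maps to about 255."""
    a, b, c = fit_plane(img)
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, z in enumerate(row):
            bg = max(a * x + b * y + c, 1.0)  # guard against division by ~0
            new_row.append(min(255, max(0, round(255.0 * z / bg))))
        out.append(new_row)
    return out


def adaptive_threshold(img, radius=1, ratio=0.85):
    """Mark a pixel black (0) if it is darker than ratio * local mean, else white (255)."""
    h, w = len(img), len(img[0])
    # Integral image with a zero border so window sums are O(1).
    integ = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integ[y + 1][x + 1] = integ[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = integ[y1][x1] - integ[y0][x1] - integ[y1][x0] + integ[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean * ratio else 255)
        out.append(row)
    return out
```

Usage would be something like `bw = adaptive_threshold(flatten_background(gray))`, where `gray` is a list of rows of 0-255 values. Dividing first means the threshold window sees roughly uniform paper, so the local-mean test only has to pick out ink.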

Please let me know if anyone is interested. More details will be provided to anyone interested.


Re: $2000 for cleaning up scanned images

Posted: 2014-05-14T03:18:38-07:00
by dlemstra
If you want to combine IM with some programming you should probably also add your OS to the post. And can the latest version be used or does your system use an older version?

Re: $2000 for cleaning up scanned images

Posted: 2014-05-14T05:15:59-07:00
by snibgo
You might also say if the pictures on the pages are colour or monochrome. This can make a big difference to the complexity of the processing.

Re: $2000 for cleaning up scanned images

Posted: 2014-05-14T07:26:41-07:00
by alasdairdf
The OS I'm working on is CentOS 6, 64-bit. I actually use an older version of IM so that the effects of one particular feature are always the same. But I don't suppose it would be an issue to install two different versions of IM at the same time? (The latest version and my preferred older version.)

These scans come in RGB, but the result I want at the end is B&W dithered. At some point they have to be converted from color to grayscale and then to B&W; at which stage that happens depends on what gives the best results.
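
To make the RGB to grayscale to B&W chain concrete, here is a rough sketch in plain Python. It is not my own dithering technique: it assumes Rec. 601 luma weights for the grayscale step and uses standard Floyd-Steinberg error diffusion as a stand-in for the dither. The function names and the fixed 128 cut-off are illustrative assumptions.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to grayscale using Rec. 601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def floyd_steinberg(gray):
    """Dither a grayscale image (rows of 0-255 values) to pure 0/255 pixels."""
    h, w = len(gray), len(gray[0])
    buf = [[float(v) for v in row] for row in gray]  # working copy that absorbs error
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = old - new
            # Push the quantization error onto unvisited neighbours
            # with the usual 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out
```

Calling `floyd_steinberg(to_gray(rgb))` gives a 0/255 image whose local density of white pixels roughly tracks the original brightness. In a real pipeline the dither would presumably be applied only to picture regions, with text going through a plain threshold instead.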