How to split a page in two and reduce colors?

valexiev

Post by valexiev » 2011-09-19T14:50:27-07:00

Hi! I'm totally new to IM and truly amazed and overwhelmed by its richness.
I want to scan several textbooks for my kid and convert them to PDF for her Kindle, since her backpack is way too heavy.
I need to:
1. crop (I think I'll do it with interactive software since the pages are not well aligned)
2. split the page X.jpg in two, producing two pages (e.g. X0.jpg and X1.jpg)
3. reduce to 4-bit gray (I'll try to scan better for the next book)
4. What's the best image format to embed in PDF? Maybe PNG?

Could you please give me some pointers? For step 3, I guess I need to read http://www.imagemagick.org/Usage/quantize/
An example page is at http://personal.sirma.bg/vladimir/page000.jpg

fmw42

Post by fmw42 » 2011-09-19T16:09:04-07:00

Are the pages aligned well enough that a single X position splits every image into two non-overlapping parts? If so, you can write a script that loops over every file: crop, trim the excess white, optionally pad with a little white, reduce to 4-bit gray, and then convert to PDF, without having to save an intermediate file in any particular format unless you want to.

See
http://www.imagemagick.org/Usage/crop/#crop
http://www.imagemagick.org/Usage/crop/#trim
http://www.imagemagick.org/Usage/crop/#border
http://www.imagemagick.org/script/comma ... colorspace
http://www.imagemagick.org/script/comma ... php#colors
http://www.imagemagick.org/Usage/quantize/#colors
http://www.imagemagick.org/script/comma ... ns.php#lat


Try something like one of these two, which split the image into two equal halves. The first reduces it to 4 gray shades (for true 4-bit gray, i.e. 16 shades, use -colors 16). The second binarizes to black and white, but removes most of the background gray.



convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a_%d.png

or

convert page000.jpg -crop 2x1@ +repage -colorspace gray -negate -lat 15x15+10% -negate page000b_%d.png


Or, to write a PDF directly:

convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a.pdf

or

convert page000.jpg -crop 2x1@ +repage -colorspace gray -negate -lat 15x15+10% -negate page000b.pdf
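Wrapped into the loop described at the top of this post, the first variant might look like the sketch below. It assumes the scans are named page000.jpg, page001.jpg, ... so that they sort in reading order, and that the split falls exactly in the middle. The helper names split_spread and build_pdf, the 10% fuzz and the 20-pixel border are illustrative choices, not anything prescribed in the thread.

```shell
#!/bin/sh
# split_spread FILE (hypothetical helper): cut a two-page scan into equal
# left/right halves, trim the near-white margins (10% fuzz), pad each half
# with a 20 px white border, and reduce it to 4 gray shades.
# Output: <FILE without extension>_0.png (left) and _1.png (right).
split_spread() {
  base=${1%.*}
  convert "$1" -crop 2x1@ +repage \
    -fuzz 10% -trim +repage \
    -bordercolor white -border 20 \
    -colorspace gray +dither -colors 4 \
    "${base}_%d.png"
}

# build_pdf OUT.pdf PAGE... (hypothetical helper): write the given page
# images, in the order given, into one multi-page PDF. convert holds every
# page in memory at once, so a long book can need a lot of RAM.
build_pdf() {
  out=$1
  shift
  convert "$@" -compress Zip "$out"
}
```

For example, run `for f in page*.jpg; do split_spread "$f"; done` and then `build_pdf book.pdf page*_[01].png`; the glob lists page000_0.png, page000_1.png, page001_0.png, ... in reading order.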

anthony

Post by anthony » 2011-09-19T22:51:06-07:00

Also, since the pages are scanned, they will remain raster images, even when later saved as PDF.
See "A word about Vector Image formats":
http://www.imagemagick.org/Usage/formats/#vector

As such, you may want to work out exactly which resolution and pixel size for the final PDF (which contains raster images) is most appropriate for the Kindle.

We cannot help you with the Kindle itself, but if you do find that information, posting it here (or a pointer to it) would help future readers.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/

valexiev

Post by valexiev » 2011-09-19T23:31:08-07:00

Thanks for the quick reply!
As for what resolution and pixel size is most appropriate for the Kindle:
The Kindle DX is 825 x 1200 px, 150 dpi, 4-bit gray (E-Ink, 9.7" diagonal, 5.5 x 8 in, 140 x 203 mm, 0.6875 aspect).
It has a very decent PDF reader by Adobe that does a good job of scaling.
It's important to cut 2-up pages and crop them to the bare bones to get the maximum reading size.
I even crop out the page number, and use Acrobat's "Number pages" to put the same numbers as in the original.
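Using those figures, a single already-split page could be fitted to the screen roughly as in this sketch. It assumes the page margins are close enough to white for a 10% fuzz -trim, and it only records 150 dpi as metadata; the Kindle's PDF reader still does its own scaling. fit_kindle_dx is a hypothetical name.

```shell
#!/bin/sh
# fit_kindle_dx IN OUT (hypothetical helper): crop one page to its content,
# keep a thin white margin, then scale it to fit inside the Kindle DX screen
# size quoted above (825x1200 px), preserving the aspect ratio.
# -density only sets the stored resolution (150 dpi); it does not resample.
fit_kindle_dx() {
  convert "$1" -fuzz 10% -trim +repage \
    -bordercolor white -border 10 \
    -resize 825x1200 \
    -units PixelsPerInch -density 150 \
    "$2"
}
```

A tall page ends up 1200 px high with its width below 825 px; a wide page is limited by the 825 px width instead.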

The JPEG above clearly reflects poor scanning choices, but my kid scanned 160 pages, so I don't want to throw them away.

I'm on Windows 7 64-bit with CYGWIN_NT-6.1-WOW64 1.7.9(0.237/5/3) 2011-03-29.
That includes the ImageMagick-6.4.0.6 package (there isn't a more recent one at cygwin.com).
The first command eats up a lot of memory and then fails with an error:

$ convert page000.jpg -crop 2x1@ +repage -colorspace gray +dither -colors 4 page000a_%d.png
convert: UnableToConcatenateString `Cannot allocate memory'.
Warning: recursive semaphore lock detected!
Which binary distribution should I upgrade to? I found these at ftp://gd.tuwien.ac.at/pub/graphics/Imag ... /binaries/

ImageMagick-6.7.2-7-Q16-windows-dll.exe         16.8 MB 9/17/11 4:45:00 PM
ImageMagick-6.7.2-7-Q16-windows-static.exe      34.6 MB 9/17/11 4:46:00 PM
ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe     16.5 MB 9/17/11 4:47:00 PM
ImageMagick-6.7.2-7-Q16-windows-x64-static.exe  36.5 MB 9/17/11 4:48:00 PM
ImageMagick-6.7.2-7-Q8-windows-dll.exe          16.8 MB 9/17/11 4:49:00 PM
ImageMagick-6.7.2-7-Q8-windows-static.exe       34.5 MB 9/17/11 4:50:00 PM
ImageMagick-6.7.2-Q16-windows.zip               43.1 MB 9/17/11 7:13:00 PM
ImageMagick-i686-pc-cygwin.tar.gz               43.0 MB 9/10/11 4:29:00 PM
What's Q8 vs Q16?
I'll now try ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe

valexiev

Post by valexiev » 2011-09-19T23:53:54-07:00

ImageMagick-6.7.2-7-Q16-windows-x64-dll.exe worked like a charm.
Both methods give satisfactory results; I'll have to try them on the Kindle to choose one.

Original jpeg: 1.5M: http://personal.sirma.bg/vladimir/page000.jpg
+dither (gray): 200k: http://personal.sirma.bg/vladimir/page000a_1.png
-negate (binarized): 60k: http://personal.sirma.bg/vladimir/page000b_1.png

A minor point: page000a_1.png is reported as 8 bpp, not 4 bpp:
- imagine.exe: 8 BPP
- identify -verbose: Depth: 8-bit; Channel depth: gray: 8-bit; Colors: 4
I guess this has little effect on file size, since the unused bits compress away?
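The PNG format itself can store 2- and 4-bit grayscale, so the extra bits can be avoided rather than merely compressed. A sketch, assuming an ImageMagick build whose PNG coder honours the png:bit-depth and png:color-type defines (check the -define documentation for your version). gray4_png and png_bit_depth are hypothetical names; the second just reads the bit-depth byte from the PNG header, so the result can be checked without imagine.exe.

```shell
#!/bin/sh
# gray4_png IN OUT (hypothetical helper): reduce a page to 4 gray shades and
# write it as a 4-bit grayscale PNG (PNG colour type 0).
# -depth 4 rounds the gray values to 16 evenly spaced levels, which a 4-bit
# PNG can hold exactly; the defines ask the PNG coder for that layout.
gray4_png() {
  convert "$1" -colorspace gray +dither -colors 4 -depth 4 \
    -define png:bit-depth=4 -define png:color-type=0 \
    "$2"
}

# png_bit_depth FILE (hypothetical helper): print the bit depth stored in the
# IHDR chunk, i.e. byte 24 of the file
# (8-byte signature + 4-byte length + "IHDR" + 4-byte width + 4-byte height).
png_bit_depth() {
  od -An -tu1 -j24 -N1 "$1" | tr -d ' '
}
```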

I'll now read up on the documentation to figure out the commands.
I may also try expanding to 300dpi before binarizing, to see if Kindle's PDF reader can do something good with the extra pixels.
Thanks for your help!!
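If you try the 300 dpi idea, one way is to double the size before the -lat step from the second command above. A sketch, assuming the scan was made at 150 dpi (so 2x gives 300 dpi) and that doubling the -lat window from 15x15 to 30x30 keeps it covering the same area of the page; binarize_2x is a hypothetical name.

```shell
#!/bin/sh
# binarize_2x IN OUT (hypothetical helper): enlarge the page 2x so the
# threshold is applied on a finer pixel grid, then binarize it with the same
# negate / -lat / negate sequence as before. The -lat window is doubled
# (15x15 -> 30x30) to match the doubled resolution.
# -density 300 assumes the original scan was 150 dpi.
binarize_2x() {
  convert "$1" -colorspace gray -resize 200% \
    -negate -lat 30x30+10% -negate \
    -units PixelsPerInch -density 300 \
    "$2"
}
```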

anthony

Post by anthony » 2011-09-20T16:30:18-07:00

I would probably try not to threshold the image, and instead preserve the anti-aliased edges.

Also, the image is 8 bpp because it is grayscale rather than binary; 1 bpp is binary.
That first page, however, has color in it, which makes it non-grayscale.

What I would do first is try to clean up the background. For example, see Composite Division:
http://www.imagemagick.org/Usage/compose/#divide
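A minimal sketch of that division trick, assuming the text is small compared with a 20-pixel blur, so that the blurred copy approximates the bare paper. It uses the Divide_Src compose method of newer ImageMagick releases, which divides the destination (the page) by the source (the blurred copy); older 6.x builds such as 6.7.2 only have Divide, so check the operand order on the Composite Division page before adapting it. flatten_background is a hypothetical name.

```shell
#!/bin/sh
# flatten_background IN OUT (hypothetical helper): estimate the paper shade
# with a heavy blur, then divide the page by it. Where the page matches the
# local paper shade the result is white, while darker text stays dark, so a
# gray or unevenly lit background is whitened without thresholding the text.
flatten_background() {
  convert "$1" -colorspace gray \
    \( +clone -blur 0x20 \) \
    -compose Divide_Src -composite \
    "$2"
}
```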

At this point I might also try to remove any extra scan noise using -morphology Smooth Square (or perhaps just Open or Close instead of Smooth). Yes, morphology was designed with binary images in mind, but it works well with grayscale images too.
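For example, as a sketch: despeckle_page is a hypothetical name, and Square:1 is the smallest square kernel (3x3); whether it is small enough to spare thin strokes depends on your scan resolution.

```shell
#!/bin/sh
# despeckle_page IN OUT (hypothetical helper): remove small scan noise with
# grayscale morphology. Smooth is an Open followed by a Close:
#   Open  (erode, then dilate) removes light specks smaller than the kernel,
#   Close (dilate, then erode) removes dark specks smaller than the kernel.
# Close also removes dark lines thinner than the kernel, so if text strokes
# are only 1-2 px wide, Open alone may be the safer choice.
despeckle_page() {
  convert "$1" -colorspace gray -morphology Smooth Square:1 "$2"
}
```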

Now, to separate the pages, I would use vertical compression: -resize {width}x1\! where {width} is the current image width. The result is a single row of pixels from which you can algorithmically locate the gap between the two pages and split them there.
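A sketch of that vertical-compression idea, assuming the gap between the pages shows up as the lightest column in the middle third of the scan (if your scans have a dark spine shadow there, look for the darkest column instead). The helper names are hypothetical; the single row is written as raw 8-bit gray bytes and turned into one number per line with od, and the halves are named X0.png and X1.png, following the naming in the first post.

```shell
#!/bin/sh
# column_profile FILE (hypothetical helper): squash the page to a single row,
# one pixel per column (-resize {width}x1!), and print each column's gray
# value (0-255), one per line.
column_profile() {
  w=$(identify -format '%w' "$1")
  convert "$1" -colorspace gray -resize "${w}x1!" -depth 8 gray:- |
    od -An -v -tu1 | tr -s ' ' '\n' | grep -v '^$'
}

# find_gutter (hypothetical helper): read column values on stdin and print the
# 0-based index of the lightest column in the middle third (the first one if
# several tie).
find_gutter() {
  awk '{ v[NR - 1] = $1 }
    END {
      lo = int(NR / 3); hi = int(2 * NR / 3); best = lo
      for (i = lo; i <= hi; i++) if (v[i] > v[best]) best = i
      print best
    }'
}

# split_at_gutter FILE (hypothetical helper): cut FILE at the detected gap
# into <base>0.png (left page) and <base>1.png (right page).
split_at_gutter() {
  x=$(column_profile "$1" | find_gutter)
  w=$(identify -format '%w' "$1")
  h=$(identify -format '%h' "$1")
  base=${1%.*}
  convert "$1" -crop "${x}x${h}+0+0" +repage "${base}0.png"
  convert "$1" -crop "$((w - x))x${h}+${x}+0" +repage "${base}1.png"
}
```

Restricting the search to the middle third keeps the white outer margins from being picked as the "gap".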

After that, it is just an optional -deskew and saving the page images however you like.
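That last step might look like this sketch, assuming each page has already been split out. finish_page is a hypothetical name; 40% is the -deskew threshold the ImageMagick option documentation suggests as a starting point, and -background white is set so the corners exposed by the rotation should come out white.

```shell
#!/bin/sh
# finish_page IN OUT (hypothetical helper): straighten one page, then trim
# the near-white border left around it and save it under OUT.
finish_page() {
  convert "$1" -background white -deskew 40% +repage \
    -fuzz 10% -trim +repage \
    "$2"
}
```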


NOTE: all of the above has been built into specialised page-scanning software. ImageMagick provides low-level tools to do things yourself exactly as you like, but other software may be better suited to this specialised task. And yes, there are free versions too.

One free version I have found is Scan Tailor:
http://scantailor.sourceforge.net/
I have not tried it but it seems to be something like what you are after.
Also see the DIY Book Scanner forum for Scan Tailor (and other book-scanning software!):
http://www.diybookscanner.org/forum/viewforum.php?f=8
I have noticed that ImageMagick is mentioned regularly in those forums as a low-level image processor that a number of book scanners use for their tasks 8)

valexiev

Post by valexiev » 2011-10-20T00:42:46-07:00

ScanTailor worked perfectly!
