How to convert a large PDF with limited resources

Questions and postings pertaining to the usage of ImageMagick regardless of the interface. This includes the command-line utilities, as well as the C and C++ APIs. Usage questions are like "How do I use ImageMagick to create drop shadows?".
liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

How to convert a large PDF with limited resources

Post by liys_0 » 2016-02-12T18:22:12-07:00

We are using ImageMagick in our program to convert PDF files to images, with -limit disk and -limit memory set so that the program does not consume too much memory or disk space.

When the PDF files are very large, this naturally leads to the error CacheResourcesExhausted.

Is "stream" the only way to handle this, or can we use some functions to convert the PDF portion by portion and then merge the portions into one image?

Looking forward to your suggestions :)

GeeMack
Posts: 722
Joined: 2015-12-01T22:09:46-07:00
Location: Central Illinois, USA

Re: How to convert a large PDF with limited resources

Post by GeeMack » 2016-02-12T22:13:04-07:00

liys_0 wrote:Is "stream" the only way to handle this, or can we use some functions to convert the PDF portion by portion and then merge the portions into one image?
It would be helpful if you could describe your set-up, like what platform you're working on and what version of ImageMagick.

I use ImageMagick version 7 on a 64-bit Windows 7 machine. I haven't done much with PDF files, but if it's a multi-page document you can open just one page, several individual pages, or a range of pages by specifying the page(s) in square brackets, as in these examples.

This would convert just page 5. (The indexing starts at number 0.)...

Code:

convert -density 300 document1.pdf[4] -resize 2550x3300 document1_04.jpg
This would convert pages 7 through 12 and output 6 individual files. (The setting "-scene 6" will start numbering the output files at 06.)...

Code:

convert -density 300 document1.pdf[6-11] -resize 2550x3300 -scene 6 document1_%02d.jpg
This would convert pages 7 through 12 then append them to a single very long file...

Code:

convert -density 300 document1.pdf[6-11] -resize 2550x3300 -append document1_06-11.jpg
Maybe you can stay inside your resource limit by building a "for" loop to process your PDFs in pieces smaller than the entire file at once.
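A minimal sketch of such a loop, assuming a Unix-like shell for brevity (a Windows BAT "for" loop would follow the same shape); document1.pdf and its 12-page length are placeholder assumptions, and the convert commands are only echoed so the loop can be dry-run safely:

```shell
#!/bin/sh
# Sketch: process a multi-page PDF one page per "convert" run, so only one
# page's raster is ever in memory or on disk at a time.
# document1.pdf and the page count of 12 are placeholders; the commands are
# echoed (dry run) -- pipe the output to sh, or drop the echo, to run them.
emit_page_cmds() {
  pages=$1
  i=0
  while [ "$i" -lt "$pages" ]; do
    n=$(printf '%02d' "$i")
    echo "convert -density 300 document1.pdf[$i] -resize 2550x3300 document1_$n.jpg"
    i=$((i + 1))
  done
}

emit_page_cmds 12
```

Processing one page per invocation keeps each run's pixel cache small, at the cost of re-opening the PDF for every page.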

atariZen
Posts: 25
Joined: 2016-02-09T12:58:42-07:00

Re: How to convert a large PDF with limited resources

Post by atariZen » 2016-02-13T05:11:39-07:00

ImageMagick may not be the best tool for this, depending on the source file.

If your source PDF is simply one large raster image per page, with no vector graphics or other PDF features, then it's better to use the pdfimages tool to extract each image to a file; that is not computationally intensive, and it is lossless as well. Of course, if these are fancier vector PDFs, then you must render each page, in which case you'll need ImageMagick, LaTeX, or the burst feature of pdftk.
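If the pages really are plain raster scans, that route is short; a dry-run sketch, assuming the poppler-utils pdfimages tool (scan.pdf is a placeholder, and the function only builds the command lines rather than executing them):

```shell
#!/bin/sh
# Dry-run sketch of the pdfimages route, assuming poppler-utils is installed;
# scan.pdf is a placeholder name. Pipe the output to sh to actually run it.
pdfimages_cmds() {
  pdf=$1
  # -list reports the images the PDF embeds, without extracting anything.
  echo "pdfimages -list $pdf"
  # -png extracts every embedded image losslessly to out-000.png, out-001.png,
  # ... (older poppler releases lack -png and write PBM/PPM instead).
  echo "pdfimages -png $pdf out"
}

pdfimages_cmds scan.pdf
```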

If disk space is important and marginal lossiness is acceptable, consider converting the documents to the DjVu format.

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-02-14T19:23:14-07:00

GeeMack wrote:Maybe you can stay inside your resource limit by building a "for" loop to process your PDFs in pieces smaller than the entire file at once.

Thanks GeeMack. Your suggestion is helpful in some cases. :)
We are now working on Windows 7 using ImageMagick-6.9.3-4-Q8-x86.

Some of our PDF files contain only one very large page.
E.g. we use -density 200 to convert a one-page PDF to a .png file, and the generated file is 26400x35200 pixels.
We had to increase the disk limit to 4GB for the conversion, and this is surely not the largest file we will get from our users.

In this case, can we convert one page portion by portion to use fewer resources? :)

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-02-14T19:33:47-07:00

atariZen wrote:ImageMagick may not be the best tool for this, depending on the source file.

If your source PDF container is simply one large raster image per page, with no vector graphics or other PDF features, then it's better to use the pdfimages tool to extract each image to a file, which is not computationally intensive, and in fact non-lossy as well. Of course, if these are fancier vector PDFs, then you must render each page, in which case you'll need ImageMagick, LaTeX, or the burst feature of pdftk.

If disk space is important is marginal lossyness is acceptable, consider converting the documents to the DjVu format.
Thanks for your suggestions : )

The PDF files are vector PDFs. Do you mean that we convert the PDFs to Djvu first and then convert the Djvu to image files?

GeeMack
Posts: 722
Joined: 2015-12-01T22:09:46-07:00
Location: Central Illinois, USA

Re: How to convert a large PDF with limited resources

Post by GeeMack » 2016-02-14T23:28:15-07:00

liys_0 wrote:In this case, can we convert one page portion by portion to use fewer resources? :)
If you know the PDF is just a single page document, you could start by using ImageMagick to determine the output dimensions of the image using something like this from the command line...

Code:

convert -density 200 largefile.pdf info:
That will print what the dimensions will be after converting it to a 200dpi image, and some other information.

Then when you know the size in pixels, you can read just a part of the PDF into IM's "convert" by putting the geometry of the requested portion in square brackets at the end of the file name.

For example, I checked the dimensions of my "largefile.pdf" using the command above and found it will make a 2400x2400 pixel PNG. Then if I want to get just the top left quarter of the PDF I would use a command like this...

Code:

convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
With that command IM will take just a 1200x1200 piece starting at the first pixel in the upper left corner, pixel "+0+0". I can get all four quarters of the PDF into separate PNG files with a series of commands like this...

Code:

convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
convert -density 200 largefile.pdf[1200x1200+1200+0] -flatten part1B.png
convert -density 200 largefile.pdf[1200x1200+0+1200] -flatten part2A.png
convert -density 200 largefile.pdf[1200x1200+1200+1200] -flatten part2B.png
To reassemble those four PNG images into a single image later I could use a command like this...

Code:

convert ( part1A.png part1B.png +append ) ( part2A.png part2B.png +append ) -append largefile.png 
Or maybe tile them back together with a properly constructed IM "montage" command.

To automate the disassembly so it can handle varying sizes of input files would require a slightly tricky BAT file with some nested "for" loops to get the image size into variables, break the image into parts calculated from those variables, and create unique, meaningful file names for all the output files. But that's a Windows programming issue, not an ImageMagick one.
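The arithmetic part of that loop can be sketched quickly; a Unix-shell version for brevity (a BAT file would do the same sums with "set /a"), using the 2400x2400 example size from above:

```shell
#!/bin/sh
# Sketch of the tiling arithmetic: given the rendered page size, emit one
# WxH+X+Y geometry spec per quarter. Each spec then plugs into a command like
#   convert -density 200 largefile.pdf[SPEC] -flatten partN.png
tile_specs() {
  w=$1; h=$2
  tw=$((w / 2)); th=$((h / 2))   # tile width and height (half the page)
  for y in 0 "$th"; do           # top row, then bottom row
    for x in 0 "$tw"; do         # left column, then right column
      echo "${tw}x${th}+${x}+${y}"
    done
  done
}

tile_specs 2400 2400
```

For a 2400x2400 page this prints the same four geometry specs used in the convert commands above, in reading order.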

I don't know how to run a "convert" command on a particular page of a multi-page PDF and have it use just a segment of the page, since both those processes use square brackets at the end of the file names to specify the details. I tried using two sets of square brackets and had no success. Someone else here might know a way to make that happen. It may not be possible.

atariZen
Posts: 25
Joined: 2016-02-09T12:58:42-07:00

Re: How to convert a large PDF with limited resources

Post by atariZen » 2016-02-15T00:07:11-07:00

liys_0 wrote: The PDF files are vector PDFs. Do you mean that we convert the PDFs to Djvu first and then convert the Djvu to image files?
Since you're starting with vector PDFs, and must have non-djvu images in the end, I don't think my suggestions will help much. I would only use the lossy djvu format as a middle step if you desperately need to hack around a problem. But I don't think that will help you. In fact, djvu processing is very resource intensive.

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-02-15T00:42:27-07:00

GeeMack wrote:
liys_0 wrote:In this case, can we convert one page portion by portion to use fewer resources? :)
I don't know how to run a "convert" command on a particular page of a multi-page PDF and have it use just a segment of the page, since both those processes use square brackets at the end of the file names to specify the details. I tried using two sets of square brackets and had no success. Someone else here might know a way to make that happen. It may not be possible.
Thanks very much for your detailed explanation! This is just what I thought. :)

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-02-15T00:42:59-07:00

atariZen wrote:Since you're starting with vector PDFs, and must have non-djvu images in the end, I don't think my suggestions will help much.
Thanks :D

snibgo
Posts: 12272
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: How to convert a large PDF with limited resources

Post by snibgo » 2016-02-15T01:31:19-07:00

GeeMack wrote:convert -density 200 largefile.pdf[1200x1200+0+0] -flatten part1A.png
But does that help the resource problem? With a "-verbose" we can see the Ghostscript command. I get the same command with or without the geometry spec.

Putting [0] as the page spec, I get a different GS command that includes "-dFirstPage=1 -dLastPage=1".

With the geometry spec, I conclude that IM tells GS to render all the pages, and all of each page, and then IM discards much of the data. So it may not help much (depending on how much memory GS uses).

For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
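One GS avenue that may be worth testing (a hedged sketch, not verified here: -dFIXEDMEDIA with a reduced media size plus a /PageOffset shift is Ghostscript's usual way of clipping output to a window of the page, but the offset signs and the bottom-left, points-based origin need checking against the GS documentation; largefile.pdf is a placeholder and the command is only echoed):

```shell
#!/bin/sh
# Dry-run sketch of rendering only a window of a PDF page with Ghostscript:
# pin the media to the window size (-dFIXEDMEDIA) and shift the page so the
# wanted region lands on it (/PageOffset, in points, 1/72 inch).
# largefile.pdf is a placeholder; offsets assume a bottom-left origin and the
# signs may need flipping -- check against the Ghostscript docs before use.
gs_window_cmd() {
  wpts=$1; hpts=$2; xoff=$3; yoff=$4
  echo "gs -dBATCH -dNOPAUSE -sDEVICE=png16m -r200 -dFIXEDMEDIA" \
       "-dDEVICEWIDTHPOINTS=$wpts -dDEVICEHEIGHTPOINTS=$hpts -o part.png" \
       "-c '<</PageOffset [-$xoff -$yoff]>> setpagedevice' -f largefile.pdf"
}

# Example: a 306x396pt window starting 306pt across and 396pt up the page.
gs_window_cmd 306 396 306 396
```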
snibgo's IM pages: im.snibgo.com

GeeMack
Posts: 722
Joined: 2015-12-01T22:09:46-07:00
Location: Central Illinois, USA

Re: How to convert a large PDF with limited resources

Post by GeeMack » 2016-02-15T18:35:56-07:00

snibgo wrote:But does that help the resource problem?
I just drive the thing. I don't know what goes on under the hood. :wink: It stands to reason there would be no substantial savings if IM is calling the same GS command in either instance.
snibgo wrote:For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
Sounds like a good plan. GS can break a PDF document into pieces and output them directly as PNG images without any help from IM at all.
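A sketch of that GS-only route (input.pdf is a placeholder and the command is echoed as a dry run; in recent Ghostscript releases -o both sets the output file template and implies -dBATCH -dNOPAUSE):

```shell
#!/bin/sh
# Dry-run sketch: Ghostscript rendering each PDF page straight to its own
# PNG at 200dpi, with no ImageMagick in the pipeline. input.pdf is a
# placeholder; %03d numbers the output files page_001.png, page_002.png, ...
gs_pages_cmd() {
  echo "gs -dBATCH -dNOPAUSE -sDEVICE=png16m -r200 -o page_%03d.png $1"
}

gs_pages_cmd input.pdf
```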

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-03-01T00:48:35-07:00

snibgo wrote:For this type of problem, I would ignore IM, and see what options GS has for rendering parts of pages.
Thank you. This sounds good. Is there any way to check the memory used by Ghostscript when doing the PDF-to-image conversion?

snibgo
Posts: 12272
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: How to convert a large PDF with limited resources

Post by snibgo » 2016-03-01T01:27:34-07:00

Your operating system will have tools to monitor memory usage.
snibgo's IM pages: im.snibgo.com

liys_0
Posts: 7
Joined: 2016-02-12T02:03:12-07:00

Re: How to convert a large PDF with limited resources

Post by liys_0 » 2016-03-02T22:47:58-07:00

snibgo wrote:Your operating system will have tools to monitor memory usage.
Thanks. Already done this. I found that Ghostscript uses far fewer resources than ImageMagick.

To convert a PDF file with dimensions 132 in x 176 in using density 200:

IM consumes about 4GB of disk; GS takes less than 200MB.
Their memory usage is almost the same.
