Collating a collection of images into one PDF without holding them all in memory at once

Questions and postings pertaining to the development of ImageMagick, feature enhancements, and ImageMagick internals. ImageMagick source code and algorithms are discussed here. Usage questions which are too arcane for the normal user list should also be posted here.
wbn
Posts: 1
Joined: 2018-10-15T17:04:29-07:00

Post by wbn »

One of the things I find most useful about `magick`/`convert` is its ability to combine a collection of images into a single PDF. However, I've found that with large collections the utility tends to hang indefinitely. I ran it under a debugger and confirmed that it attempts to load all of the image data into memory before writing anything out, which makes it infeasible to collate arbitrarily large collections of images.

Is it possible to write the PDF without holding all the image data in memory at once? Poking around in the source, I found `PingImage`/`PingImages`, which looks like it should be useful here, since it reads only the metadata and a reference to the on-disk data rather than the pixels. However, when I run the utility with the `-ping` flag, the resulting PDF contains only the image dimensions, not the image data.
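For reference, `-ping` is the command-line counterpart of `PingImage`: it reads image attributes such as width and height without decoding the pixels, which fits the observation that the PDF comes out with the right page sizes but no image content. A quick way to see what a ping actually returns (the helper name `ping_dims` is hypothetical):

```shell
# Hypothetical helper: print "WIDTH HEIGHT" for each image argument.
# -ping makes ImageMagick read only the image attributes (header metadata),
# without decoding the pixel data, so this stays cheap even for huge files.
ping_dims() {
  identify -ping -format '%w %h\n' "$@"
}
```

Usage: `ping_dims scan001.jpg` prints the dimensions only; there are no pixels available to write into a PDF.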

I understand that in many situations - for instance, when you need to perform multiple transformations on a number of images before writing them - it's much more efficient to hold all the image data in memory rather than reading and writing it to disk multiple times. However, I'm wondering if it's possible to ask the utility (or the API) to optimize for memory efficiency in this case.

Cheers.
-wbn
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Collating a collection of images into one PDF without holding them all in memory at once

Post by fmw42 »

I believe you would have to write a script that loops over the input images and adds them to your PDF one at a time. For example, with the image files passed as the script's arguments:

first=$1
shift
convert "$first" image.pdf
for inputimage in "$@"; do
    convert image.pdf "$inputimage" image.pdf
done

Each pass adds the next image as a new page at the end of the PDF.
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Collating a collection of images into one PDF without holding them all in memory at once

Post by snibgo »

I would convert each image to its own PDF, then use "pdfunite" (part of Poppler) to merge the PDFs into one.
snibgo's IM pages: im.snibgo.com
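A minimal sketch of the per-image approach in the same shell style, assuming ImageMagick 6's `convert` (use `magick` on IM7) and Poppler's `pdfunite` are on the PATH. The function name `collate_pdf` and the temporary `page-NNNNNN.pdf` naming are hypothetical choices for illustration, not part of either tool.

```shell
# Hypothetical helper: collate_pdf OUTPUT.pdf IMAGE...
# Converts each image to its own single-page PDF, then merges them with pdfunite.
collate_pdf() {
  if [ "$#" -lt 2 ]; then
    echo "usage: collate_pdf OUTPUT.pdf IMAGE..." >&2
    return 1
  fi
  out=$1
  shift

  tmpdir=$(mktemp -d) || return 1
  i=0
  for img in "$@"; do
    i=$((i + 1))
    # Zero-padded names make the glob below expand in the same order as the input.
    page=$(printf '%s/page-%06d.pdf' "$tmpdir" "$i")
    # Each convert process decodes only one input file, so peak memory is
    # roughly bounded by the largest single image, not the whole collection.
    if ! convert "$img" "$page"; then
      rm -rf "$tmpdir"
      return 1
    fi
  done

  # pdfunite merges the finished PDF pages without re-rasterizing them;
  # its last argument is the destination file.
  pdfunite "$tmpdir"/page-*.pdf "$out"
  status=$?
  rm -rf "$tmpdir"
  return "$status"
}
```

Usage: `collate_pdf album.pdf scans/*.jpg`. The reason to prefer this over appending with `convert image.pdf inputimage image.pdf` is that reading a PDF back in makes ImageMagick rasterize every existing page through Ghostscript on each pass, so memory use and run time grow with the page count and earlier pages are re-rendered each time. For very large collections, the glob passed to pdfunite may hit the system's argument-length limit; merging in batches would be one way around that.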