no file is written until all output images are complete

matteosistisette
Posts: 17
Joined: 2011-10-10T09:04:08-07:00

no file is written until all output images are complete

Post by matteosistisette »

When you use -crop WxH (without offsets) to produce a sequence of "tiles" from a big image, ImageMagick accumulates all the output images in memory and doesn't start writing the first file until the last output image is complete.

This has two huge drawbacks: (1) it wastes an enormous amount of memory, and (2) if an error happens while processing even the last image, you lose ALL the images. This often happens simply because you run out of memory (ImageMagick is pretty sloppy about checking memory availability, and if there is not enough you'll end up with all kinds of nonsensical error messages from which you can only GUESS that you have run out of memory).

If you're slicing a huge image into many small ones, you end up needing almost twice as much memory as is really necessary.

Every single generated cropped image should be written to file as soon as it is complete, and the memory it uses freed before the next output image is generated.
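
For concreteness, here is a minimal example of the kind of command I mean (the image name and tile size are just placeholders):

Code: Select all

   convert big_image.png -crop 128x128 +repage tile_%03d.png

This produces tile_000.png, tile_001.png, and so on, but nothing appears on disk until every tile has been generated in memory.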
anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: no file is written until all output images are complete

Post by anthony »

That is how IM "convert" (and most other IM commands) is designed to work, and it is appropriate in most situations. It is only with 'Really Massive Images' that you have problems with memory use.

Using a Q8 version of IM is your first step for handling large images. That simple change halves memory usage, but makes more complex processing (beyond simple crops) produce lower-quality images.

However, IM can still process large images by offloading pixel data to a memory-mapped disk cache. This happens automatically when memory starts to become a problem, but you can also force it using the -limit option. It is, however, much slower.
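
For example, something along these lines caps RAM use and forces the disk cache to kick in sooner (the limit values are only placeholders; tune them to your machine):

Code: Select all

   convert -limit memory 256MB -limit map 512MB \
           big_image.png -crop 128x128 +repage tile_%03d.png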

See IM Examples, File Handling, Really Massive Image Handling
http://www.imagemagick.org/Usage/files/#massive


One solution to reduce memory use is to do each tile crop and save the result, one tile (or perhaps one row) at a time.

Code: Select all

   convert input_image.png \
               \( +clone -crop 128x128+0+0 +repage -write tile_00_00.png +delete \) \
               \( +clone -crop 128x128+128+0 +repage -write tile_01_00.png +delete \) \
               \( +clone -crop 128x128+256+0 +repage -write tile_02_00.png +delete \) \
               \( +clone -crop 128x128+384+0 +repage -write tile_03_00.png +delete \) \
               \( +clone -crop 128x128+512+0 +repage -write tile_04_00.png +delete \) \
               ... \
               \( +clone -crop 128x128+0+128 +repage -write tile_00_01.png +delete \) \
               \( +clone -crop 128x128+128+128 +repage -write tile_01_01.png +delete \) \
               \( +clone -crop 128x128+256+128 +repage -write tile_02_01.png +delete \) \
               ... \
               null:
The command can be programmatically generated, though it may hit command-line length limits.
Even with the "clone"s it has a rough memory cost of just: original_image + one tile_image :D
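
As a sketch of the programmatic generation (it assumes the image dimensions are known and divide evenly by the tile size; the dimensions and filenames are just examples):

Code: Select all

   #!/bin/sh
   # Build the per-tile clauses and run a single "convert".
   width=640; height=256; tile=128
   cmd="convert input_image.png"
   i=0
   for y in $(seq 0 $tile $((height - tile))); do
     for x in $(seq 0 $tile $((width - tile))); do
       n=$(printf '%03d' $i)
       cmd="$cmd ( +clone -crop ${tile}x${tile}+${x}+${y} +repage -write tile_${n}.png +delete )"
       i=$((i + 1))
     done
   done
   $cmd null: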

ASIDE: IMv7 (in development) will have 'co-processing' capabilities. That is, you can run a "convert"-like command in the background, then retrieve information about images and send IM image processing commands from scripted loops. It would be ideal for doing image processing like the above.

Alternative... Streaming...

The better way is not to read the WHOLE image into memory, and this is where streaming image processing comes in.
Streaming processors read only one 'row of pixels' into memory at a time, and as such have a far lower memory footprint.

The "stream" command can for example extract one 'crop' area from the input image. However that means reading an image once for event tile you want to extract. This is probably not what you are want either. However it is likely to be faster than a memory mapped solution.

Stream processors also exist in the image processing packages "PbmPlus" and "NetPbm", but again they seem to be limited to extracting just one crop area from an image.

The ideal solution would be a "stream"-like image processor that, while reading each row from the input image, also opens M output images for a crop of MxN tiles of WxH pixels. Then, as each row is processed, it outputs each segment of W pixels to the M individual streams. After H rows, the M images are closed and M new ones are opened for the next set.

Something like this would be a great addition to any image processing library (PbmPlus or IM).
Unfortunately I know of no such program :(

Even the lesser goal of a stream image processor that can separate a large image into a stream of 'rows of tiles' images would be a useful addition! Each smaller row could then be read in and processed individually at a smaller memory cost; a rough sketch of this idea follows.
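
Here is that sketch, using "stream" itself (the dimensions and filenames are assumptions; note it re-reads the source once per strip, but it only ever holds one strip of tiles in memory):

Code: Select all

   #!/bin/sh
   width=10240; tile=128; rows=80
   for r in $(seq 0 $((rows - 1))); do
     y=$((r * tile))
     # extract one full-width strip as raw pixels
     stream -map rgb -storage-type char \
            -extract ${width}x${tile}+0+${y} huge_image.png strip.rgb
     # tile just that strip; memory cost is one strip, not the whole image
     convert -depth 8 -size ${width}x${tile} rgb:strip.rgb \
             -crop ${tile}x${tile} +repage strip_${r}_tile_%03d.png
   done
   rm -f strip.rgb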

Please report back here if you come across any other solutions; there are a lot of people interested in this type of thing. Especially me!

Of special interest is actually the reverse of tile cropping... That is, converting multiple tiles back into a huge image without using a lot of memory (e.g. a streaming "montage").
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
matteosistisette
Posts: 17
Joined: 2011-10-10T09:04:08-07:00

Re: no file is written until all output images are complete

Post by matteosistisette »

anthony wrote: That is how IM "convert" (and most other IM commands) is designed to work,
And I think that is a serious design flaw (in the cases when it is not strictly needed).
anthony wrote: and it is appropriate in most situations.
I don't think it's ever "appropriate" to use N times as much memory as needed (N being the number of output images, in this case). It may be "not an issue" in most situations, but I wouldn't call it "appropriate".
anthony wrote: It is only with 'Really Massive Images' that you have problems with memory use.
That is to say, it only betrays you when you most need it ;)

Really Tiny Memory may also be another case...


Besides, this is not only a memory issue. If an error happens while processing the last bit of the last image, you lose ALL of the N-1 images that may have been processed successfully.
anthony wrote: Using a Q8 version of IM is your first step for handling large images. That simple change halves memory usage, but makes more complex processing (beyond simple crops) produce lower-quality images.
Not an option. That gives you a factor of 2, and possibly lower quality, when you could gain a factor of N with no quality loss.
anthony wrote: One solution to reduce memory use is to do each tile crop and save the result, one tile (or perhaps one row) at a time.

Code: Select all

   convert input_image.png \
               \( +clone -crop 128x128+0+0 +repage -write tile_00_00.png +delete \) \
               \( +clone -crop 128x128+128+0 +repage -write tile_01_00.png +delete \) \
               \( +clone -crop 128x128+256+0 +repage -write tile_02_00.png +delete \) \
               \( +clone -crop 128x128+384+0 +repage -write tile_03_00.png +delete \) \
               \( +clone -crop 128x128+512+0 +repage -write tile_04_00.png +delete \) \
               ... \
               \( +clone -crop 128x128+0+128 +repage -write tile_00_01.png +delete \) \
               \( +clone -crop 128x128+128+128 +repage -write tile_01_01.png +delete \) \
               \( +clone -crop 128x128+256+128 +repage -write tile_02_01.png +delete \) \
               ... \
               null:
All these are workarounds. I would expect the tile crop to do exactly that by itself, without my having to manually write (or programmatically generate) such a command line.

I do appreciate the information about this workaround.
The "stream" command can for example extract one 'crop' area from the input image. However that means reading an image once for event tile you want to extract. This is probably not what you are want either. However it is likely to be faster than a memory mapped solution.
Another INTERESTING workaround.


Besides all this, I was tile-cropping a huge VECTOR image (a PDF) into small RASTER images, and I found out that ImageMagick actually renders the _whole_ image to raster and then tile-crops it. That's a separate issue, which I've reported separately (it's not necessarily related to _tile_ cropping). I would have expected ImageMagick to rasterize the tiles one at a time and write them one at a time, without ever holding in memory a (much) bigger raster image than a single tile.

I ended up successfully solving my task by writing a shell script that uses gs (Ghostscript) directly rather than ImageMagick. My script first crops the big PDF into a tile vectorially, then rasterizes that tile into an image, writes it, and continues to the next tile.
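
A sketch of the kind of gs invocation that can do this (this is not my exact script; the tile position/size in points, resolution, and filenames are placeholders):

Code: Select all

   #!/bin/sh
   x=72; y=144; w=288; h=288   # tile origin and size in PostScript points
   gs -q -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 \
      -dDEVICEWIDTHPOINTS=$w -dDEVICEHEIGHTPOINTS=$h -dFIXEDMEDIA \
      -sOutputFile=tile.png \
      -c "<</PageOffset [-$x -$y]>> setpagedevice" \
      -f input.pdf

Each tile is rendered straight from the vector source, so memory use stays at one raster tile at a time.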

My point is that that's what I would have expected imagemagick to do in the first place.

Anyway, I realise that the don't-rasterise-the-whole-pdf-in-advance part may be questionable (in the case of a tile crop), as the memory efficiency may come at the cost of time efficiency. Probably an option to choose between the two would be best. However, when there is no trade-off involved, I always expect the program to do the most efficient thing, especially when it is pretty obvious.

I also understand that in more complex cases processing the whole thing _may_ be the only option (for example, if you apply some kind of filter where one tile affects the surrounding ones, I guess), but when that's not the case, it should be avoided.


Thanks
m.