Threading slows down 'convert'

Post by bmomjian » 2011-12-18T10:06:40+00:00

I am testing the performance of the convert utility in ImageMagick 6.6.0-4 on Debian Squeeze on a dual quad-core (8 cores, 16 hardware threads) Intel 5620 server.

I tested converting 400MB of JPEG images to a lower resolution. On this server, I found the following timings:

Configuration                        Time
MAGICK_THREAD_LIMIT=1                150 sec
normal                               55 sec
waitloop                             28 sec
waitloop && MAGICK_THREAD_LIMIT=1    13 sec

Waitloop is a bash script I use to force 16 copies of convert to run in the background. I know convert uses multiple threads automatically to convert an image, so the slow timing of the first item (using only one thread) is expected. It is also expected that using my waitloop tool would improve performance because the normal test only has the CPUs at 40%.

What is surprising is that by disabling threading in convert and forcing 16 convert processes to run simultaneously, I get a further 2x speedup over the waitloop case (13 sec versus 28 sec). Perhaps your documentation should be clearer about the benefits of setting MAGICK_THREAD_LIMIT=1 when you are already running multiple convert processes; at present it does not make clear that setting it to "1" might improve performance:
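
The waitloop script itself was not posted. As a rough sketch of the same idea, the shell function below starts one convert process per file, at most a given number at a time, with each process limited to a single thread through MAGICK_THREAD_LIMIT. The function name resize_all, its arguments, the -resize operation, and the simple batch-then-wait strategy are all assumptions for illustration, not the poster's actual script.

```shell
# Hypothetical stand-in for "waitloop"; not the original script.
# Usage: resize_all JOBS OUTDIR GEOMETRY FILE...
resize_all() (
    njobs=$1; outdir=$2; geometry=$3
    shift 3
    mkdir -p "$outdir" || exit 1
    export MAGICK_THREAD_LIMIT=1        # one thread per convert process
    running=0
    for file in "$@"; do
        # Start each conversion in the background.
        convert "$file" -resize "$geometry" "$outdir/$(basename "$file")" &
        running=$((running + 1))
        if [ "$running" -ge "$njobs" ]; then
            wait                        # let the current batch finish
            running=0
        fi
    done
    wait                                # wait for the final partial batch
)
```

A call such as `resize_all 16 small 50% *.jpg` would then keep up to 16 single-threaded convert processes busy; the 50% geometry and the output directory name are placeholders. Because the function waits for a whole batch before starting the next, one slow image can leave cores idle, so a real script might prefer a sliding window of jobs.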

http://www.imagemagick.org/script/architecture.php

If you want more details or a self-contained test case, please let me know.

Re: Threading slows down 'convert'

Post by magick » 2011-12-18T15:28:08+00:00

It can be difficult to predict behavior in a parallel environment. Performance can depend on a number of factors, including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of memory, whether hyperthreading is enabled, the mix of applications executing concurrently with ImageMagick, and the particular image-processing algorithm you use. The only way to be certain of the optimal number of threads is to benchmark.

ImageMagick 6.7.4-1 Beta includes progressive threading when benchmarking a command and returns the elapsed time and efficiency for one or more threads. This can help you identify how many threads are most efficient in your environment. Here is an example benchmark for threads 1-8:

    convert -bench 40 model.png -sharpen 0x1 null:
    Performance[1]: 10i 0.712ips 1.000e 14.000u 0:14.040
    Performance[2]: 10i 1.362ips 0.657e 14.550u 0:07.340
    Performance[3]: 10i 2.033ips 0.741e 14.530u 0:04.920
    Performance[4]: 10i 2.667ips 0.789e 14.590u 0:03.750
    Performance[5]: 10i 3.236ips 0.820e 14.970u 0:03.090
    Performance[6]: 10i 3.802ips 0.842e 15.280u 0:02.630
    Performance[7]: 10i 4.274ips 0.857e 15.540u 0:02.340
    Performance[8]: 10i 4.831ips 0.872e 15.680u 0:02.070

In certain cases, it might be optimal to set the number of threads to 1 or to disable OpenMP completely.
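
To compare thread counts at a glance, the benchmark lines can be reduced to a speedup relative to the one-thread run. The sketch below is a hypothetical helper (bench_speedup is not part of ImageMagick) that reads the Performance[...] lines on standard input and prints the thread count, the speedup computed from the "ips" column, and the speedup divided by the thread count. This is the textbook per-thread efficiency; it is not an attempt to reproduce the "e" column that ImageMagick prints, which is evidently computed differently.

```shell
# Hypothetical helper: summarize "convert -bench" output read from stdin.
# Prints: THREADS SPEEDUP SPEEDUP_PER_THREAD
bench_speedup() {
    awk '
        /^ *Performance\[[0-9]+\]:/ {
            n = $1; gsub(/[^0-9]/, "", n); n += 0   # thread count from "Performance[N]:"
            ips = $3; sub(/ips$/, "", ips); ips += 0 # iterations per second
            if (n == 1) base = ips                   # one-thread run is the baseline
            if (base > 0)
                printf "%d %.2f %.2f\n", n, ips / base, ips / base / n
        }'
}
```

With the output above saved to a file, `bench_speedup < bench.txt` (the file name is a placeholder) gives a speedup of about 6.8 for 8 threads, or roughly 0.85 per thread.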
User avatar
magick
Site Admin
 
Posts: 9630
Joined: 2003-05-31T11:32:55+00:00

Re: Threading slows down 'convert'

Post by NicolasRobidoux » 2011-12-22T16:37:46+00:00

Rule of thumb

If the number of independent compute-intensive processes is comparable to (or larger than) the number of cores you have, they will run faster in "single core" mode (no single task spread across multiple processors).

(This is why embarrassingly parallel methods generally should run without any communication between parts: you're better off chopping the big tasks into very roughly equal parts than constantly rebalancing.)

Rule of thumb

If processes have high I/O or memory requirements, and it is not possible to make them run in parallel without significant I/O collisions, they should be run sequentially (one after another, instead of at once).

Example conclusion

If your server typically must handle more image processing requests than there are cores, each ImageMagick task should probably run on a single core (disable OpenMP). The possible exception is when the input and/or output images are so large that the jobs are better run sequentially (more or less one after another), each with OpenMP enabled, in order to minimize the I/O bottleneck.
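
As a minimal sketch of the first rule of thumb, the hypothetical helper below suggests a MAGICK_THREAD_LIMIT value from the number of concurrent jobs and the number of cores. It returns 1 when the job count is comparable to or larger than the core count. Splitting the cores evenly when there are far fewer jobs than cores is an added assumption, not something stated above, and the helper ignores the I/O and memory considerations of the second rule.

```shell
# Hypothetical helper: suggest a MAGICK_THREAD_LIMIT value.
# Usage: pick_thread_limit CONCURRENT_JOBS CORES
pick_thread_limit() (
    njobs=$1; cores=$2
    [ "$njobs" -ge 1 ] || njobs=1       # treat "no jobs" as a single job
    if [ "$njobs" -ge "$cores" ]; then
        echo 1                          # as many jobs as cores (or more): one thread each
    else
        echo $(( cores / njobs ))       # assumption: split the cores evenly
    fi
)
```

For example, `pick_thread_limit 16 8` prints 1 and `pick_thread_limit 2 8` prints 4. Integer division means that 7 jobs on 8 cores also yields 1, which matches the "comparable" wording of the rule. The result is only a starting point; benchmarking remains the way to settle the value.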

-----

Overcoming these rules of thumb generally requires very tricky tuning or programming.
Last edited by NicolasRobidoux on 2012-03-11T18:58:35+00:00, edited 7 times in total.

Re: Threading slows down 'convert'

Post by anthony » 2011-12-22T18:16:34+00:00

Just a note on IM parallelization...

Within IM, only individual image processing operations are parallelized. So the savings come mainly when processing large images, not when processing large numbers of images.

See Making IM Faster (in general)
http://www.imagemagick.org/Usage/api/#speed
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
http://www.imagemagick.org/Usage/