GPU vs CPU on command line benchmark

Post by gegupta »

I am trying to follow the directions at http://gd.tuwien.ac.at/graphics/ImageMa ... allel.html
I am setting the environment variable MAGICK_OCL_DEVICE and trying to see whether it has any effect on computation when I run ImageMagick from the command line. Does it make the command run on the GPU?

IM version:

identify -version
Version: ImageMagick 6.9.9-26 Q16 x86_64 2018-01-17 http://www.imagemagick.org
Copyright: © 1999-2017 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Features: Cipher DPC OpenCL OpenMP
Delegates (built-in): jbig jng jpeg ltdl lzma png tiff webp zlib

Here is the output. I don't see any difference:

# export MAGICK_OCL_DEVICE=ON
# convert -bench 10 1M.jpg -resize 400x400 1M-resize.jpg
Performance[1]: 10i 2.660ips 1.000e 4.430u 0:03.760
Performance[2]: 10i 17.544ips 0.868e 1.500u 0:00.570
Performance[3]: 10i 17.857ips 0.870e 1.490u 0:00.560
Performance[4]: 10i 17.857ips 0.870e 1.490u 0:00.560
Performance[5]: 10i 17.544ips 0.868e 1.490u 0:00.570
Performance[6]: 10i 17.857ips 0.870e 1.490u 0:00.560
Performance[7]: 10i 17.857ips 0.870e 1.480u 0:00.560
Performance[8]: 10i 17.857ips 0.870e 1.490u 0:00.560
Performance[9]: 10i 17.857ips 0.870e 1.490u 0:00.560
Performance[10]: 10i 17.857ips 0.870e 1.510u 0:00.560
Performance[11]: 10i 17.857ips 0.870e 1.480u 0:00.560
Performance[12]: 10i 17.544ips 0.868e 1.490u 0:00.570
Performance[13]: 10i 17.544ips 0.868e 1.510u 0:00.570
Performance[14]: 10i 17.857ips 0.870e 1.500u 0:00.560
Performance[15]: 10i 17.857ips 0.870e 1.480u 0:00.560
Performance[16]: 10i 17.857ips 0.870e 1.510u 0:00.560

# export MAGICK_OCL_DEVICE=OFF
# convert -bench 10 1M.jpg -resize 400x400 1M-resize.jpg
Performance[1]: 10i 2.639ips 1.000e 4.510u 0:03.790
Performance[2]: 10i 16.949ips 0.865e 1.580u 0:00.590
Performance[3]: 10i 16.949ips 0.865e 1.580u 0:00.590
Performance[4]: 10i 16.949ips 0.865e 1.570u 0:00.590
Performance[5]: 10i 16.667ips 0.863e 1.610u 0:00.600
Performance[6]: 10i 16.949ips 0.865e 1.590u 0:00.590
Performance[7]: 10i 16.949ips 0.865e 1.560u 0:00.590
Performance[8]: 10i 16.949ips 0.865e 1.590u 0:00.590
Performance[9]: 10i 16.949ips 0.865e 1.560u 0:00.590
Performance[10]: 10i 17.241ips 0.867e 1.560u 0:00.580
Performance[11]: 10i 16.667ips 0.863e 1.590u 0:00.600
Performance[12]: 10i 17.241ips 0.867e 1.580u 0:00.580
Performance[13]: 10i 16.667ips 0.863e 1.610u 0:00.600
Performance[14]: 10i 16.949ips 0.865e 1.610u 0:00.590
Performance[15]: 10i 16.949ips 0.865e 1.510u 0:00.590
Performance[16]: 10i 17.241ips 0.867e 1.560u 0:00.580
Performance[17]: 10i 16.949ips 0.865e 1.570u 0:00.590
Performance[18]: 10i 16.949ips 0.865e 1.590u 0:00.590
Performance[19]: 10i 16.949ips 0.865e 1.580u 0:00.590
Performance[20]: 10i 16.667ips 0.863e 1.590u 0:00.600
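
To put a number on "no difference", the ips column of the two runs can be averaged and compared. The following is only a minimal sketch: the helper names `parse_ips` and `steady_state_mean` are made up for this illustration, the sample strings are abbreviated copies of the output above, and skipping the first line is a choice made here because it is a clear outlier in both runs.

```python
import re
from statistics import mean

# Matches lines such as "Performance[2]: 10i 17.544ips 0.868e 1.500u 0:00.570".
BENCH_LINE = re.compile(r"Performance\[(\d+)\]:\s+\d+i\s+([\d.]+)ips")


def parse_ips(bench_output):
    """Return the ips value of every Performance[n] line, in order."""
    return [float(m.group(2)) for m in BENCH_LINE.finditer(bench_output)]


def steady_state_mean(ips_values):
    """Average the ips values, skipping the first (outlier) entry."""
    return mean(ips_values[1:])


# Abbreviated samples copied from the two runs above (hypothetical variable names).
run_on = """\
Performance[1]: 10i 2.660ips 1.000e 4.430u 0:03.760
Performance[2]: 10i 17.544ips 0.868e 1.500u 0:00.570
Performance[3]: 10i 17.857ips 0.870e 1.490u 0:00.560
"""
run_off = """\
Performance[1]: 10i 2.639ips 1.000e 4.510u 0:03.790
Performance[2]: 10i 16.949ips 0.865e 1.580u 0:00.590
Performance[3]: 10i 16.949ips 0.865e 1.580u 0:00.590
"""

on_mean = steady_state_mean(parse_ips(run_on))
off_mean = steady_state_mean(parse_ips(run_off))
print(f"ON:  {on_mean:.3f} ips")
print(f"OFF: {off_mean:.3f} ips")
print(f"relative difference: {(on_mean - off_mean) / off_mean:.1%}")
```

Pasting the full output of each run into the two strings gives a fairer comparison than the three-line samples used here.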

Here is the nvidia-smi output. It doesn't show any process running on the GPUs, but it does show some minimal memory usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:08:00.0 Off |                    0 |
| N/A   32C    P0    30W / 250W |     10MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   28C    P0    33W / 250W |     10MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:16:00.0 Off |                    0 |
| N/A   29C    P0    30W / 250W |     10MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:19:00.0 Off |                    0 |
| N/A   30C    P0    31W / 250W |     10MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The same parameter works if I use the ImageMagick APIs: I can see processes using all GPUs in nvidia-smi.

Currently my device profile looks like this:

vi $HOME/.cache/ImageMagick/ImagemagickOpenCLDeviceProfile.xml
<devices>
  <device name="CPU" score="7.689"/>
  <device platform="NVIDIA CUDA" vendor="NVIDIA Corporation" name="Tesla P100-PCIE-16GB" version="384.81" maxClockFrequency="1328" maxComputeUnits="56" score="0.218"/>
</devices>
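
For comparing the entries, the XML profile can be read with a few lines of Python. This is just a sketch: the function name `device_scores` is hypothetical, the XML is pasted from the file above rather than read from disk, and nothing here says whether a lower or a higher score is the preferred one.

```python
import xml.etree.ElementTree as ET

# Contents of ImagemagickOpenCLDeviceProfile.xml as shown above.
profile_xml = """\
<devices>
  <device name="CPU" score="7.689"/>
  <device platform="NVIDIA CUDA" vendor="NVIDIA Corporation" name="Tesla P100-PCIE-16GB" version="384.81" maxClockFrequency="1328" maxComputeUnits="56" score="0.218"/>
</devices>
"""


def device_scores(xml_text):
    """Return (name, score) pairs for every <device> element, in file order."""
    root = ET.fromstring(xml_text)
    return [(d.get("name"), float(d.get("score"))) for d in root.findall("device")]


scores = device_scores(profile_xml)
# Print the devices ordered by ascending score.
for name, score in sorted(scores, key=lambda item: item[1]):
    print(f"{name}: {score}")
```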

# vi $HOME/.cache/ImageMagick/ImagemagickOpenCLDeviceProfile
<version>ImageMagick Device Selection v0.9</version>
<device><type>^A^@^@^@</type><name>Tesla P100-PCIE-16GB</name><driver>384.81</driver><max cu>56</max cu><max clock>1328</max clock><score>42.0000</score></device>
<device><type>^A^@^@^@</type><name>Tesla P100-PCIE-16GB</name><driver>384.81</driver><max cu>56</max cu><max clock>1328</max clock><score>42.0000</score></device>
<device><type>^A^@^@^@</type><name>Tesla P100-PCIE-16GB</name><driver>384.81</driver><max cu>56</max cu><max clock>1328</max clock><score>42.0000</score></device>
<device><type>^A^@^@^@</type><name>Tesla P100-PCIE-16GB</name><driver>384.81</driver><max cu>56</max cu><max clock>1328</max clock><score>42.0000</score></device>
<device><type>^@^@^@^@</type><score>3.3090</score></device>

I am not sure which of the two profiles above to use; their scores point in opposite directions.

GPU device info:

lshw -C display
  *-display
       description: 3D controller
       product: NVIDIA Corporation
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:08:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:3e80-3e7f iomemory:3ec0-3ebf irq:152 memory:98000000-98ffffff memory:3e800000000-3ebffffffff memory:3ec00000000-3ec01ffffff

Am I missing something, or have I misunderstood something?

Thanks

Post by gegupta »

Just wanted to ask if there were any updates for this post?

Post by holden »

*I'm not an expert and someone may have better information*

The link you posted points to OpenMP, a CPU technology, whereas GPUs use either OpenCL or CUDA for high performance crunching operations.

** I would also guess that, given the nature of image manipulation, CPUs are the best option, if only for access to more RAM.

Post by gegupta »

I was assuming that if the underlying library is built with OpenCL, then whether we access it through the API or the command line, I should be able to use the GPU with the correct environment variable.