ImageMagick v6 Examples --
Fourier Transforms

Index
ImageMagick Examples Preface and Index
Introduction
The Fourier Transform
FFT/IFT In ImageMagick
Properties Of The Fourier Transform
Practical Applications
Advanced Applications
 
FFT Multiplication and Division (low level examples - sub-page)


Introduction

One of the hardest concepts to comprehend in image processing is Fourier Transforms. There are two reasons for this. First, it is mathematically advanced and second, the resulting images, which do not resemble the original image, are hard to interpret.

Nevertheless, utilizing Fourier Transforms can provide new ways to do familiar processing such as enhancing brightness and contrast, blurring, sharpening and noise removal. But it can also provide new capabilities that one cannot do in the normal image domain. These include deconvolution (also known as deblurring) of typical camera distortions such as motion blur and lens defocus and image matching using normalized cross correlation.

It is the goal of this page to try to explain the background and simplified mathematics of the Fourier Transform and to give examples of the processing that one can do by using the Fourier Transform.

If you find this too much, you can skip it and simply focus on the properties and examples, starting with FFT/IFT In ImageMagick

For those interested, another nice simple discussion, including analogies to optics, can be found at An Intuitive Explanation of Fourier Theory.

The lecture notes from Vanderbilt University School Of Engineering are also very informative for the more mathematically inclined: 1 & 2 Dimensional Fourier Transforms and Frequency Filtering.

Other mathematical references include Wikipedia pages on Fourier Transform, Discrete Fourier Transform and Fast Fourier Transform as well as Complex Numbers.

My thanks to Sean Burke for his coding of the original demo and to ImageMagick's creator for integrating it into ImageMagick. Both were heroic efforts.

Many of the examples use a HDRI Version of ImageMagick which is needed to preserve accuracy of the transformed images. It is recommened that you compile a personal HDRI version if you want to make the most of these techniques.


The Fourier Transform

An image normally consists of an array of 'pixels' each of which are defined by a set of values: red, green, blue and sometimes transparency as well. But for our purposes here we will ignore transparency. Thus each of the red, green and blue 'channels' contain a set of 'intensity' or 'grayscale' values.

This is known as a raster image 'in the spatial domain'. This is just a fancy way of saying, the image is defined by the 'intensity values' it has at each 'location' or 'position in space'.

But an image can also be represented in another way, known as the image's 'frequency domain'. In this domain, each image channel is represented in terms of sinusoidal waves.

In such a 'frequency domain', each channel has 'amplitude' values that are stored in locations based not on X,Y 'spatial' coordinates, but on X,Y 'frequencies'. Since this is a digital representation, the frequencies are multiples of a 'smallest' or unit frequency and the pixel coordinates represent the indices or integer multiples of this unit frequency.

This follows from the principal that "any well-behaved function can be represented by a superposition (combination or sum) of sinusoidal waves". In other words, the 'frequency domain' representation is just another way to store and reproduce the 'spatial domain' image.

But how can an image be represented as a 'wave'?

Images are Waves

Well if we take a single row or column of pixel from any image, and graph it (generated using "gnuplot" using the script "im_profile"), you will find that it looks rather like a wave.

  convert holocaust_tn.gif -colorspace gray miff:- |\
    im_profile -s - image_profile.gif
[IM Output] ==> [IM Output]

If the fluctuations were more regular in spacing and amplitude, you would get something more like a wave pattern, such as...

  convert -size 20x150 gradient: -rotate 90 \
          -function sinusoid 3.5,0,.4   wave.gif
  im_profile -s wave.gif wave_profile.gif
[IM Output] ==> [IM Output]

However while this regular wave pattern is vaguely similar to the image profile shown above, it is too regular.

However if you were to add more waves together you can make a pattern that is even closer to the one from the image.

  convert -size 1x150 gradient: -rotate 90 \
          -function sinusoid 3.5,0,.25,.25     wave_1.png
  convert -size 1x150 gradient: -rotate 90 \
          -function sinusoid 1.5,-90,.13,.15   wave_2.png
  convert -size 1x150 gradient: -rotate 90 \
          -function sinusoid 0.6,-90,.07,.1    wave_3.png

  convert wave_1.png wave_2.png wave_3.png \
          -evaluate-sequence add added_waves.png
[IM Output]  + [IM Output]  + [IM Output] ==> [IM Output]

See also Adding Biased Gradients for a alternative example to the above.

This 'wave superposition' (addition of waves) is much closer, but still does not exactly match the image pattern. However, you can continue in this manner, adding more waves and adjusting them, so the resulting composite wave gets closer and closer to the actual profile of the original image. Eventually by adding enough waves you can exactly reproduce the original profile of the image.

This was the discovery made by the mathematician Joseph Fourier. A modern interpretation of which states that "any well-behaved function can be represented by a superposition of sinusoidal waves".

In other words by adding together a sufficient number of sine waves of just the right frequency and amplitude, you can reproduce any fluctuating pattern. Therefore, the 'frequency domain' representation is just another way to store and reproduce the 'spatial domain' image.

The 'Fourier Transform' is then the process of working out what 'waves' comprise an image, just as was done in the above example.

2 Dimensional Waves in Images

The above shows one example of how you can approximate the profile of a single row of an image with multiple sine waves. However images are 2 dimensional, and as such the waves used to represent an image in the 'frequency domain' also needs to be two dimensional.

Here is an example of one such 2 dimensional wave.

The wave has a number of components to it.

Image example



Using FFT/IFT In ImageMagick

Implementation Notes

ImageMagick makes use of the FFTW, Discrete Fourier Transform Library which requires images to be converted to and from floating point values (complex numbers), and was first implemented in IM version 6.5.4-3.

To make it work as people generally expect for images, any non-square image or one with an odd dimension will be padded (using Virtual Pixels to be square of the maximum width or height of the image. To allow for proper centering of the 'FFT origin' in the center of the image, it is also forced to have a even (multiple of 2) dimensions. The consequence of this is that after applying the Inverse Fourier Transform, the image will need to be cropped back to its original dimensions to remove the padding.

As the Fourier Transform is composed of "Complex Numbers", the result of the transform cannot be visualized directly. Therefore, the complex transform is separated into two component images in one of two forms.

[Diagram]
Complex Number
Real/Imaginary

Real and Imaginary

The normal mathematical and numerical representation of the "Complex Numbers", is a pair of floating point values consisting of 'Real' (a) and 'Imaginary' (b) components. Unfortunately these two numbers may contain negative values and thus do not form viewable images.

As such this representation can not be using in a normal version of ImageMagick, whcih would clip such images (see example below of the resulting effects.

However when using a HDRI version of ImageMagick you can still generate, use, and even save this representation of a Fourier Transformed image. They may not be useful or even viewable as images in their own right, but you can still apply many mathematical operations to them.

To generate this representation we use the 'plus' form of the operators, "+fft" and "+ift", and will be looked at in detail below in FFT as Real-Imaginary Components.

[Diagram]
Complex Polar
Magnitude/Phase

Magnitude and Phase

Direct numerical representation of the "Complex Numbers" is not very useful for image work. But by plotting the values onto a 2-dimentional plane, you can then convert the value into a Polar Representation consisting of 'Magnitude' (r) and 'Phase' (θ) components.

This form is very useful in image processing, especially the magnitude component, which essentually specifies all the frequencies that go to make up the image.

The 'Magnitude' component only contains positive values, and is just directly mapped into image values. It has no fixed range of values, though except for the DC or zero frequency color, the values will generally be quite small. As a consequence of this the magnitude image generally will appear to be very dark (practically black).

By scaling the magnitude and applying a log transform of its intensity values usually will be needed to bring out any visual detail. The resulting 'log-transformed' magnitude image is known as the image's 'spectrum'. However remember that it is the 'magnitude' image, and not the 'spectrum' image, that should be used for the inverse transform.

The DC (Short for "Direct Current") or "Zero Frequency", color that appears at the central 'origin' of the image, will be the average color value for the whole image. Also as input images not contain 'imaginary' components, the DC phase value will also always have zero phase, producing a pure-gray color.

The 'Phase' component however ranges from -π to +π. This is first biased into a 0 to 2π range, then scaled into actual image values ranging from 0 to QuantumRange (as determined by the Compile Time Memory Quality). As a consequence of this, a zero phase will have a pure-gray value (as appropriate for each channel), while a negated phase will be a pure-black ('0') value. Note that a pure-white ('QuantumRange') is almost but not quite the same thing.

A Magnitude and Phase FFT representation of an image is generated using the normal FFT operators, "+fft" and "+ift". This will be looked at first in Generating FFT Images and its Inverse.

Generating FFT Images and its Inverse
(Magnitude and Phase)

Now, lets simply try a Fourier Transform round trip on the Lena image. That is, we simply do the forward transform and immediately apply the inverse transform to get back the original image. Then we will compare the results to see the level of quality produced.


  time convert lena.png -fft -ift lena_roundtrip.png

  echo -n "RMSE = "
  compare -metric RMSE lena.png lena_roundtrip.png null:
  echo -n "PAE = "
  compare -metric PAE  lena.png lena_roundtrip.png null:
[IM Output] ==> [IM Output]
[IM Text]

The "compare" program above returns a measure of how different the two images are. In this case you can see that the general difference is very small, of about 0.22%. With a peak value difference in at least one pixel of about ("PAE", Peak Absolute Error) of just about 1%.

You can improve this by using a HDRI version of ImageMagick. (See FFT with HDRI below).

Lets take a closer look at the FFT images that were generated in the above round trip.

  convert lena.png -fft    +depth +adjoin lena_fft_%d.png
[IM Output]
Original
==> [IM Output]
Magnitude
[IM Output]
Phase

As John M. Brayer said about Fourier Transforms... We generally do not display PHASE images because most people who see them shortly thereafter succumb to hallucinogenics or end up in a Tibetan monastery.

Note that the "-fft" operator generated two images, the first image is the 'magnitude' component (yes it is mostly black with a single colored dot in the middle), while the second, almost random looking image, contains the 'phase' component.

PNG images can only store one image per file as such neither the "+adjoin" or the '%d' in the output filename was actually needed, as IM would handle this. However I include the options in the above for completeness, so as to make it clear I am generating two separate image files, not one. See Writing a Multi-Image Sequence for more details.

As two images are generated the magnitude image (first of zeroth image) is saved into "lena_fft_0.png" and phase image (second image) into "lena_fft_1.png".

To prevent any chance of distortions resulting from saving FFT images, It is best not to save them to disk at all, but hold them in memory while you process the image.

If you must save then it is best to use the Magick File Format "MIFF" so as to preserve the image at its highest quality (bit depth). This format can also save multiple images in the one file. For script work you can also use the verbose "TXT" Enumerated Pixel Format.

DO NOT save them using "JPEG", "GIF" image formats.

If you must save these images into files for actual viewing, such as for a web browser, use the image format "PNG" with a "+depth" reset to the internal default, (as we do in these examples). However it can only store one image per file.

The "TIFF" file format can also be used though is not as acceptable for web browsers, though it does allow multiple images per file.

The best way of saving intermediate images into a single file is to use the "MIFF", file format...

  convert lena.png -fft  +depth lena_fft.miff

Or you can save them into complete separate filenames using "-write" (See Writing Images)...

  convert lena.png -fft  +depth \
          \( -clone 0 -write lena_magnitude.png +delete \) \
          \( -clone 1 -write lena_phase.png +delete \) \
          null:

Note that in the above I used the special "NULL:" image format to junk the two images which are still preserved in memory for further processing.

And finally we again read in the two images again, so as to convert it back into a normal 'spatial' image...

  convert lena_magnitude.png lena_phase.png -ift lena_restored.png
[IM Output] [IM Output] ==> [IM Output]

Both images generated by the FFT process are very sensitive to modification, where even small changes can result greatly distorted results. As such it is important to never save them in any image format that could distort those values.

It important to remember that both images are needed when recovering the image from the frequency domain. So it is no good saving one image, and junking the other, if you plan on using them for image reconstruction.


Magnitude or Phase Only Images

Finally, lets try reconstructing an image from just its magnitude component or just its phase component.


  convert lena_fft_0.png  -size 128x128 xc:'gray(50%)' \
                                                  -ift lena_magitude_only.png

  convert -size 128x128 xc:gray1  lena_fft_1.png  -ift lena_phase_only.png
[IM Output]
Magnitude Only
[IM Output]
Phase Only

You will note from this that it is the phase image that actually contains most of the position information of the image, while the magnitude actually holds much of the color information. This is not exact, as there is some overlap in the information, but that is generally the case.

The 'Magnitude Only' image will always have white corners, as a constant 50% phase image was used. You can remove those white patchs by using a randomized phase image. However make sure the phase of the center pixel is a perfect 50% gray, or the whole image will be dimmed.

The 'Phase Only' image used a constant 1% gray (almost pure black) magnitude image for the conversion. Even with this constant magnitude it still produces patches of very intense pixels, especially along edges.

You just need to remember that both images are needed to reconstruct the original image.

Frequency Spectrum Image

You will have noted that the magnitude image (the first or zeroth image), as appears almost totally black. It isn't really, but to our eyes all the values are very very small. Such an image isn't really very interesting to look at to study, so lets enhance the result with a log transform to produce the a 'frequency spectrum' image.

This is done by applying a strong Evaluate Log Transform to a Normalized 'magnitude' image.

  convert lena_fft_0.png -auto-level -evaluate log 10000 \
          lena_spectrum.png
[IM Output] ==> [IM Output]

Now we can see the detail in the spectrum version of the magnitude image.

You may even see some specific colors in the spectrum image, but generally such colors are unimportant in a spectrum image. It is the overall intensity of each frequency, and the patterns they produce that is far more important. As such you may also like to gray-scale the spectrum image after enhancement.

How much of a log enhancement you need to use depends on the image, so you should adjust it until you get the amount of detail you need to clearly see the pattern of the images frequency spectrum.

Alternatively you can use the following small shell script, to calculate a log scaling factor to use for the specific magnitude image.

  scale=`convert lena_fft_0.png -auto-level \
          -format "%[fx:exp(log(mean)/log(0.5))]" info:`
  convert lena_fft_0.png -auto-level \
          -evaluate log $scale    lena_spectrum_auto.png
[IM Output]

However remember that you can not use a spectrum image, for the inverse "-ift" transform as it will produce an overly bright image.

  convert lena_spectrum.png lena_fft_1.png -ift lena_roundtrip_fail.png

Basically as you have enhanced the 'magnitude' image, you have also enhanced the resulting image in the same way, producing the badly 'clipped' result shown.

[IM Output]

HDRI FFT Images

When we mapped the results of the Fourier Transform into a image representation we scaled and converted the values from floating point "Complex Numbers", into integer image values. This naturally produced Rounding Errors, and other "Quantum" Effects, especially in the smaller lower frequence magnitudes.

If accuracy is important in your image processing, then you will either need to use a Bit Quality (such as Q32 or Q64 bit versions of ImageMagick), or better still use a HDRI version ImageMagick so that the values are stored as floating point numbers.

When using a HDRI version of IM with a Magnitude and Phase representation of the Fourier transform, the magnitude component will still all be positive values, and as such can still be used as shown above, just much more exactly. The phase component will however still be biased and scaled, as previously shown.

In otherwords, the magnitude and phase representation in HDRI is exactly the same, just much more accurate.


For example here I use a HDRI version ImageMagick to generate another 'round trip' conversion of an image.

  # HDRI version of IM used
  time convert lena.png -fft -ift lena_roundtrip_hdri.png

  echo -n "RMSE = "
  compare -metric RMSE lena.png lena_roundtrip_hdri.png null:
  echo -n "PAE = "
  compare -metric PAE  lena.png lena_roundtrip_hdri.png null:
[IM Output]
[IM Text]

If you compare the results above with the previous non-HDRI comparison...
[IM Text]
You will see that the HDRI version of IM produced a much more accurate result, at roughly the same speed as before (speed may vary depending on your computer). Though it will have required much more memory than a a normal Q16 IM (See Compile-Time Quality).

However such images, while more exactly representing the frequency components of the FFT of the image, can contain negative and fractional values can only be saved using special HDRI capable File Formats that can handle floating-point values.

Floating point compatible file formats include "MIFF", "TIFF", "PFM" and the HDRI specific "EXR" file format. However you may need to set "-define quantum:format=floating-point" for it to work.

In later examples processing an FFT of an image, will need such accuracy to produce good results. As such as we proceed with using Fast Fourier Transforms, a HDRI version ImageMagick will become a requirement.

FFT as Real-Imaginary Components

So far we have only look at the 'Magnitude' and a 'Phase' representaion of Fourier Transformed images. But if you have compiled a HDRI version of IM, you can also process images using floating point 'Real' and 'Imaginary', components.

For example here I used a HDRI version of IM to also perform a 'round trip' FFT of an image, but this time generating Real/Imaginary images.

  # HDRI version of IM used
  time convert lena.png   +fft +ift   lena_roundtrip_ri.png

  echo -n "RMSE = "
  compare -metric RMSE lena.png lena_roundtrip_ri.png null:
  echo -n "PAE = "
  compare -metric PAE  lena.png lena_roundtrip_ri.png null:
[IM Output]
[IM Text]

You must use a HDRI version when you use the plus forms to generate Real/Imaginary FFT images. If you don't the resulting image with have a 'dirty' look about them due to the 'clipping' of negative values. For example...

  # non-HDRI Q16 version of IM used  -- THIS IS BAD
  convert lena.png   +fft +ift   lena_roundtrip_ri_bad.png
[IM Output]

The other thing to remember is that whichever form of FFT images you generate will also effect ALL image processing operations you want to apply to the FFT images. They are very different images, and as such they must be processed in very different ways, with different mathematical operations.

Also as before you must have both Real and Imaginary component images to restore the final image. For example, here is what happens if we substitute a 'black' image for one of the components.

  # HDRI version of IM used
  convert lena.png +fft -delete 1 \
          -size 128x128 xc:black +ift lena_real_only.png
  convert lena.png +fft -delete 0 \
          -size 128x128 xc:black +ift lena_imaginary_only.png
[IM Output]
Real Only
[IM Output]
Imaginary Only

You can see from this that both Real/Imaginary FFT images contain vital information about the original image fairly equally. The biggest difference between the two is that the special DC or 'average color' has no imaginary component and as such is only presetn in the magnitude image.

The diagonal mirror effect you see in both images is caused by the loss of the 'sign' information contained in the othe component. Without the other component, the wave could be thought to be 180 degrees out of phase, causing that mirror look. This information loss is equal between the two types of image.


Properties Of The Fourier Transform

FFT of a Constant Image

Lets demonstrate some of these properties.

First lets simply take a constant color image and get its magnitude.

  convert -size 128x128 xc:gold constant.png
  convert constant.png -fft +delete constant_magnitude.png
[IM Output] ==> [IM Output]

Note that the magnitude image in this case really is pure-black, except for a single colored pixel in the very center of the image, at the pixel location width/2, height/2. This pixel is the zero frequency or DC ('Direct Current') value of the image, and is the one pixel that does not represent a sine wave. In other words this value is just the FFT constant component!

To see this single pixel more clearly lets also magnify that area of the image...

  convert constant_magnitude.png -gravity center -extent 5x5 \
           -scale 2000% constant_dc_zoom.gif
[IM Output]

Note that the color of the DC point is the same as the original image.

Actually it is a good idea to remember that what you are seeing is three values. That is the image generated is actually three separate Fast Fourier transforms. A FFT for each of the three red, green and blue image channels. The FFT itself has no real knowledge about colors, only the color values or 'gray-levels'.

In fact a FFT transform could be applied to just about any colorspace, as really... it does not care! To a fourier transform a image is just an array of values, and that is all.

While the 'phase' of the DC value is not important, it should always be a 'zero' angle (a colour value of 50% gray). If it is not set to 50% gray, the DC value will have a 'unreal' component, and its value modulated by the angle given.

Effects of the DC Color

In a more typical non-constant image, the DC value is the average color of the image. The color you should generally get if you had completely blurred, averaged, or resized the image down to a single pixel or color.

For example lets extract the DC pixel from the FFT of the "Lena" image.

  convert lena.png -fft +delete lena_magnitude.png
  convert lena_magnitude.png -gravity center -extent 1x1 \
          -scale 60x60   lena_dc_zoom.gif
[IM Output] ==> [IM Output] ==> [IM Output]

As you can see the average color for the image is a sort of 'dark pink' color.

Another way of thinking about this special pixel is that it represents the center 'bias' level, around which all the other sine waves modify the image colors.

For example lets replace that 'dark pink' DC pixel with some other color such as the more orange color 'tomato'...

  convert lena.png -fft \
          \( -clone 0  -draw "fill tomato  color 64,64 point" \) \
          -swap 0 +delete -ift lena_dc_replace.png
[IM Output]

What is really happening is that by changing the DC value in the FFT images you are changing the whole image in that same way. Actually any change in the DC value (the difference) will be added (or subtracted) from each and every pixel in the resulting image.

This is just as if we really were really adding some constant to each pixel in the original image. As such the final pixel colors in the reconstructed image could also be clipped by the maximum (white) or minimum (black) limits. As such this is not a recommended method of color tinting an image. This is simpler to apply than modifying every pixel in the whole image, though the FFT round trip will make it overall a much slower color tinting technique.

Spectrum Of A Sine Wave Image

Next, lets take a look at the spectrum from a single sine (or cosine) wave image with 4 cycles across the image


  convert -size 128x129 gradient: -chop 0x1 -rotate 90 -evaluate sine 4 \
          sine4.png
  convert sine4.png -fft +delete \
          -auto-level -evaluate log 100  sine4_spectrum.png
[IM Output] ==> [IM Output]

The unusual creation of the gradient image in the above is necessary to ensure that the resulting sine wave image tiles perfectly across the image.

A normal "gradient:" image does not perfectly tile, so neither does a sine wave generated from it. A FFT transform of such a im-perfect tile, will result in an array of undesired harmonics, rather than single 'dots' in the Fourier Transform Spectrum.

See Generating the Perfect Gradient for more details about this problem.

In the spectrum image (enhanced magnitude image) above, we can see that it has 3 dots. The center dot is as before the average DC value. The other two dots represent the perfect sine wave that the Fourier Operator found in the image. As the frequency across the width of the image is exactly 4 cycles, and as a result two frequency pixels are exactly 4 pixels away from the center DC value.

But why two pixels?

Well that is because a sine single wave can be described in two completely different ways, (one with a negative direction and phase). Both descriptions are mathematically correct, and a fourier transform does not distinguish between them.

If we repeat this with a sine wave with 16 cycles, then again we see that it has 3 dots, but the dots are further apart. In this case the side dots are spaced 16 pixels to the left and right sides of the center dot.

  convert -size 128x129 gradient: -chop 0x1 -rotate 90 -evaluate sine 16 \
          -write sine16.png -fft -delete 1 \
          -auto-level -evaluate log 100 sine16_spectrum.png
[IM Output] ==> [IM Output]

From this you can see that perfect sine waves will be represented simply by two dots in the appropriate position. The distance this position is from the center DC value determine the frequency of the sine wave. The smaller the wave length the higher the frequency, so the further the dots will be from the DC value.

In fact by dividing size of the image by the frequency (distance of the dots from the center), will give you the wavelength (distance between peaks) of the wave. In the above case: 128 pixels divided by 16 cycles, gives you a wavelength of 8 pixels between each 'band'.

This is one of the most important distinguishing features of the FFT transformation. A pattern of small features on the original image require small wave lengths, and thus large frequencies. That results in large scale effects in the frequency domain. Similarly large features, use smaller frequencies, so generate small scale patterns, especially close in to toward the center.

In a Fourier transform...
Small becomes large and large becomes small.

This is one of the most vital aspects to remember when dealing with Fourier Transforms, as it is the key to removing noise (small features) from an image, while preserving the overall larger aspects of the image.

Lets take a closer look at these three 'frequencies' by plotting their original magnitudes (not the logarithmic spectrum).

  convert sine16.png -fft -delete 1  miff:- |\
     im_profile - sine16_magnitude_pf.png
[IM Output]

Notice that the DC value (average or bias of image) has a value of 1/2 which is to be expected (the average value of the image is a perfect 50% gray), but that the actual magnitude of the two 16 cycle sine-waves the fourier transform found is only 1/4 of the maximum value.

The magnitude of the original sine-save is really 1/2 but the fourier transform divided that magnitude into two, sharing the results across both plotted frequency waves, so each of the two components only has a magnitude of 1/4. That is a normal part of fourier trasforms.

[IM Output] This duality of positive and negative frequencies in FFT images explains why all FFT image spectrum's (such as the Lena spectrum repeated left) is always symmetrical about the center. For every dot on one side of the image, you will always get a similar 'dot' mirrored across the center of the image.

The same thing happens with the 'phase' component of FFT image pair, but with a 180 degree shift (a negative phase) in value as well.

That means half each image is really a duplicate of the other half, but you need BOTH images to recreate the original image. In other words the two images still contain exactly the same amount of information, half in one image, and half in the other. Together they produce a whole.

During generation the FFT algorithm only generates the left half the images. The other half is generated by rotations and duplication of the generated data.

When converting Frequency Domain images back to a Spatial Domain Image, the algorithm again only looks at the left half of the image. The right half is completely ignored, as it is only a duplicate.

As such when (in later examples) you 'notch filter' a FFT magnitude image, you only really need to filter the left hand side of the magnitude image. You can save yourself some work by also ignoring the right half. However for clarity I will 'notch' both halves.

Generating FFT Images Directly

Now we can use the above information to actually generate a image of a sine wave. All you need to do is create a black and 50% gray image pair, and add 'dots' with the appropriate magnitude, and phase.

For example...

  convert -size 128x128  xc:black \
          -draw 'fill gray(50%)  color 64,64 point' \
          -draw 'fill gray(50%)  color 50,68 point' \
          -draw 'fill gray(25%)  color 78,60 point' \
          generated_magnitude.png
  convert generated_magnitude.png \
          -auto-level -evaluate log 3  generated_spectrum.png
  convert -size 128x128  xc:gray50  generated_phase.png
  convert generated_magnitude.png generated_phase.png \
          -ift  generated_wave.png
[IM Output] [IM Output] ==> [IM Output]

And presto a perfect angled (and tilable) sine wave.

Of course you can only generate perfect sine waves at particular frequencies, and are only tilable in square images (unless resized later).

Unfortunately all the frequencies will also be a power of two in any horizontal or vertical direction, and that is the main limitation of this technique.

Actually only the first (left most) 'gray25' dot was needed to generate the sine wave as the IFT transform completely ignores the right half of the image as this should simply be a mirror of the left half.

The phase of the DC value must have a 'zero angle' (50% gray color). If you don't ensure that is the case the DC color value will be modulated by its non-zero phase, producing a darker, possibily 'clipped' image.

The other pixels in the phase can be any grey level you like, and will effectively 'roll' the sine wave across the image. Again only the phase of the left most dot actually matters. The right hand side is completely ignored. Just ensure the center DC phase pixel remains 50% grey.

FUTURE: Perlin Noise Generator using FFT

Spectrum of a Vertical Line

Show the FFT spectrum of a thin and thick line

Demonstrate how small features become 'big' and big features become 'small' in the FFT of the image. Link that back to the sine wave which could be regarded as a 'line' with a single harmonic. Rotate the line

Spectrum of a Rectangle Pattern Image

Next, lets look at the spectrum of white rectangle of width 8 and height 16 inside a black background.


  convert -size 8x16 xc:white -gravity center \
          -gravity center -background black -extent 128x128 rectangle.png
  convert rectangle.png -fft +delete \
          -auto-level -evaluate log 100 rect_spectrum.png
[IM Output] ==> [IM Output]

As you can see the resulting image has a very particular pattern, with a lot of harmonic frequencies.

You can also see that the rectangle appeared to be rotated 90 degrees. That is incorrect, what you are seeing is that same rule we mentioned before.. big features become small and small features become big. As such the smaller dimension of the rectangle became larger and the larger dimension became smaller.

Now, lets rotate the rectangle by 45 degrees. We find that the spectrum is also rotated in the same direction by 45 degrees.


  convert rectangle.png -rotate 45 -gravity center -extent 128x128 \
          -write rect_rot45.png -fft -delete 1 \
          -auto-level -evaluate log 100 rect_rot45_spectrum.png
[IM Output] ==> [IM Output]

As you can see the same rotation in the frequency domain. That is the effect of some rotated object will also be rotated in its Fourier Transform.

However if we now move the rectangle...

  convert rectangle.png -rotate 45  -geometry +30+20 -extent 128x128 \
          -write rect_rot45off.png -fft -delete 1 \
          -auto-level -evaluate log 100 rect_rot45off_spectrum.png
[IM Output] ==> [IM Output]

The frequency pattern did not move. That is because all the positioning information is contained in the phase image. The frequency pattern (magnitude or its spectrum does not change because it moved).

This position separation, is one of the key features of the Fourier Transform that makes it so very important. It will allow you to search for specific image pattern within a larger image, regradless of the location of the object that produced that fourier spectrum pattern.

Spectrum Of A Flat Circular Pattern Image

Next, lets look at the spectrum from an image with a white, flat circular pattern, in one case with diameters of 12 (radius 6) and in another case with diameter of 24 (radius 12).


  convert -size 128x128 xc:black -fill white  \
          -draw "circle 64,64 64,70" -write circle6.png -fft -delete 1 \
          -auto-level -evaluate log 100 circle6_spectrum.png

  convert -size 128x128 xc:black -fill white  \
          -draw "circle 64,64 64,76" -write circle12.png -fft -delete 1 \
          -auto-level -evaluate log 100 circle12_spectrum.png
[IM Output] ==> [IM Output]
[IM Output] ==> [IM Output]

Note that the first image is very close to what we generated for the jinc example further above. It is however a little broken up. These artifacts occurs due to the small size of the circle. Since it is represented digitally, its perimeter is not perfectly circular. Again we see that the small details become large in the transformed frequency space.

The transform of the larger circle is better as its perimeter is a closer approximation of a true circle. We therefore conclude that indeed the transform of the flat circular shape is a jinc function and that the image containing the smaller diameter circle produces transform features that are more spread out and wider.

According to the mathematical properties of a Fourier Transform, the distance from the center to the middle of the first dark ring in the spectrum will be 1.22*N/d. When the diameter of the circle is d=12, we get a distance of 1.22*128/12=13. Likewise when the diameter of the circle is d=24, we get a distance of 1.22*128/24=6.5.

Spectrum Of A Gaussian Pattern Image

Next, lets look at the spectrum from two images, each with a white Gaussian circular pattern having sigmas of 8 and 16, respectively


  convert -size 128x128 xc:black -fill white \
          -draw "point 64,64" -gaussian-blur 0x8 -auto-level \
          -write gaus8.png -fft -delete 1 \
          -auto-level -evaluate log 1000 gaus8_spectrum.png

  im_profile -s gaus8.png gaus8_pf.gif
  im_profile -s gaus8_spectrum.png gaus8_spectrum_pf.gif
[IM Output] ==> [IM Output]
[IM Output] ==> [IM Output]


  convert -size 128x128 xc:black -fill white \
          -draw "point 64,64" -gaussian-blur 0x16 -auto-level \
          -write gaus16.png -fft -delete 1 \
          -auto-level -evaluate log 1000 gaus16_spectrum.png

  im_profile -s gaus16.png gaus16_pf.gif
  im_profile -s gaus16_spectrum.png gaus16_spectrum_pf.gif
[IM Output] ==> [IM Output]
[IM Output] ==> [IM Output]

Other than the noise produced by the rectangular array of the pattern, the result was that a Gaussian pattern produced a almost identical Gaussian frequency pattern. More importantally that pattern was quite clean in appearance.

Of course there is a size difference, again following that same rule, big becomes small and small become big.

From the mathematical properties, the sigma in the spectrum will be just N/(2*sigma), where sigma is from the original image. So for an image of size N=128 and sigma=8, the sigma in the spectrum will be 128/16=8. Similarly if the image's sigma is 16, then the sigma in the spectrum will be 128/32=4.

This is the mathematical relationship of the "big becomes small and visa-versa" rule, and it can be useful to know.

Spectrum Of A Grid Pattern Image

Next, lets transform an image containing just a set of grid lines spaced 16x8 pixels apart.

  convert -size 16x8 xc:white -fill black \
          -draw "line 0,0 15,0" -draw "line 0,0 0,7" \
          -write mpr:tile +delete \
          -size 128x128 tile:mpr:tile \
          -write grid16x8.png -fft -delete 1 \
          -auto-level -evaluate log 100000 grid16x8_spectrum.png
[IM Output] ==> [IM Output]

The resulting spectrum is just an array of dots where the grid lines that are more closely spaced produce dots further apart and vice versa. According to the properties above, since the grid lines are spaced 16x8 pixels apart, then the dots should be spaced N/a=128/16=8 and M/b=128/8=16, which is just what is measured in this image.

This pattern is of particular importance as it will let you know the relationship for the Fourier Transform to a regular tiling pattern in an image. Such a tiling patterns produces very strong non-central grid patterns in its Fourier Transform.

The key point here is that the shape information is in the center, but tiling information is in a grid like array away from the center of its fourier transform.

More Spectrum Information

Here are some links if you like to know more about spectrum images and there properities.


Practical Applications

OK, now that we have covered the basics, what are the practical applications of using the Fourier Transform?

Some of the things that can be done include: 1) increasing or decreasing the contrast of an image, 2) blurring, 3) sharpening, 4) edge detection and 5) noise removal.

Changing The Contrast Of An Image - Coefficient Rooting

One can adjust the contrast in an image by performing the forward Fourier transform, raising the magnitude image to a power and then using that with the phase in the inverse Fourier transform. To increase, the contrast, one uses an exponent slightly less than one and to decrease the contrast, one uses an exponent slightly greater than one. So lets first increase the contrast on the Lena image using an exponent of 0.9 and then decrease the contrast using an exponent of 1.1.


  convert lena.png -fft \
          \( -clone 0 -evaluate pow 0.9 \) -delete 0 \
          +swap -ift lena_plus_contrast.png

  convert lena.png -fft \
          \( -clone 0 -evaluate pow 1.1 \) -delete 0 \
          +swap -ift lena_minus_contrast.png
[IM Output] ==> [IM Output]
[IM Output] ==> [IM Output]

However doing this to the original image would also have the same effect as doing this to the original image. That is a global modification of the magnitudes has the same effect as if you did a global modification of the original image.

Blurring An Image - Low Pass Filtering

One of the most important properties of Fourier Transforms is that convolution in the spatial domain is equivalent to simple multiplication in the frequency domain. In the spatial domain, one uses small, square-sized, simple convolution filters (kernels) to blur an image with the -convole option. This is called a low pass filter. The simplest filter is just a an equally-weighted, square array. That is all the values are ones, which are normalized by dividing by their sum before applying the convolution. This is equivalent to a local or neighborhood average. Another low pass filter is the Gaussian-weighted, circularly shaped filter provided by either -gaussian-blur or -blur.

In the frequency domain, one type of low pass blurring filter is just a constant intensity white circle surrounded by black. This would be similar to a circularly shaped averaging convolution filter in the spatial domain. However, since convolution in the spatial domain is equivalent to multiplication in the frequency domain, all we need do is perform a forward Fourier transform, then multiply the filter with the magnitude image and finally perform the inverse Fourier transform. We note that a small sized convolution filter will correspond to a large circle in the frequency domain. Multiplication is carried out via -composite with a -compose multiply setting.

So lets try doing this with two sizes of circular filters, one of diameter 40 (radius 20) and the other of diameter 28 (radius 14).


  convert -size 128x128 xc:black -fill white \
          -draw "circle 64,64 44,64" circle_r20.png
  convert lena.png -fft \
       \( -clone 0 circle_r20.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_blur_r20_spec.png +delete \) \
       -swap 0 +delete -ift lena_blur_r20.png

  convert -size 128x128 xc:black -fill white \
          -draw "circle 64,64 50,64" circle_r14.png
  convert lena.png -fft \
       \( -clone 0 circle_r14.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_blur_r14_spec.png +delete \) \
       -swap 0 +delete -ift lena_blur_r14.png
[IM Output] ==> [IM Output] x [IM Output] ==> [IM Output] ==> [IM Output]
[IM Output] ==> [IM Output] ==> [IM Output]

So we see that the image that used the smaller diameter filter produced more blurring. We also note the 'ringing' or 'ripple' effect near edges in the resulting images. This occurs because the Fourier Transform of a circle, as we saw earlier, is a jinc function, which has decreasing oscillations as it progresses outward from the center. Here however, the jinc function and the oscillations are in the spatial domain rather than in the frequency domain, as we had demonstrated earlier above.

So what can we do about this? The simplest thing is to taper the edges of the circles using various Windowing Functions. Alternately, one can use a filter such as a Gaussian shape that is already by definition tapered.

So lets do the latter and use two Gaussian blurred circles to remove most of the sever 'ringing' effects.

  convert circle_r20.png -blur 0x4 -auto-level gaussian_r20.png
  convert lena.png -fft \
       \( -clone 0 gaussian_r20.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_gblur_r20_spec.png +delete \) \
       -swap 0 +delete -ift lena_gblur_r20.png

  convert circle_r14.png -blur 0x4 -auto-level gaussian_r14.png
  convert lena.png -fft \
       \( -clone 0 gaussian_r14.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_gblur_r14_spec.png +delete \) \
       -swap 0 +delete -ift lena_gblur_r14.png
[IM Output] ==> [IM Output] x [IM Output] ==> [IM Output] ==> [IM Output]
[IM Output] ==> [IM Output] ==> [IM Output]

This of course is much better.

The ideal low-pass filter is not to blur circles at all, but actually use a proper gaussian curve of sigma rather than a radius.

Of course in this example we ended up doing a blur, to do a blur! However the blur pattern that is multiplied against the FFT magnitude image used is fixed, and could in fact be retrieved from a pre-generated cache. Also the multiplying image does not need to be the full size of the original image, you can use a smaller image. As such the above can be a lot faster for large images, and in the case of handling lots of images.

The more important point is for large strong blurs, the frequency domain image is small, and only does a single multiply, rather then having to average lots of pixels, for each and every pixel in the original image.

For small sized blurs you may be better with the more direct convolution blur.

Detecting Edges In An Image - High Pass Filtering

In the spatial domain, high pass filters that extract edges from an image are often implemented as convolutions with positive and negative weights such that they sum to zero.

Things are much simpler in the frequency domain. Here a high pass filter is just the negated version of the low pass filter. That is where the low pass filter is bright, the high pass filter is dark and vice versa. So in ImageMagick, all we need do is to -negate the low pass filter image.

So lets apply high pass filters to the Lena image using a circle image. And then again using a purely gaussian curve.


  convert circle_r14.png -negate circle_r14i.png
  convert lena.png -fft \
       \( -clone 0 circle_r14i.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_edge_r14_spec.png +delete \) \
       -delete 0 +swap -ift -normalize lena_edge_r14.png

  convert -size 128x128 xc: -draw "point 64,64" -blur 0x14 \
          -auto-level   gaussian_s14i.png
  convert lena.png -fft \
       \( -clone 0 gaussian_s14i.png -compose multiply -composite \) \
       \( +clone -evaluate log 10000 -write lena_edge_s14_spec.png +delete \) \
       -delete 0 +swap -ift -normalize lena_edge_s14.png
[IM Output] ==> [IM Output] x [IM Output] ==> [IM Output] ==> [IM Output]
[IM Output] ==> [IM Output] ==> [IM Output]

Carefully examining these two results, we see that the simple circle is not quite as good as the gaussian, as it has 'ringing' artifacts and is not quite as sharp.

Sharpening An Image - High Boost Filtering

The simplest way to sharpen an image is to high pass filter it (without the normalization stretch) and then blend it with the original image.

  convert lena.png -fft \
       \( -size 128x128 xc: -draw "point 64,64" -blur 0x14 -auto-level \
          -clone 0 -compose multiply -composite \) \
       -delete 0 +swap -ift \
       lena.png -compose blend -set option:compose:args 100x100 -composite \
       lena_sharp14.png
[IM Output] ==> [IM Output]

Here a high pass filter, is done in the frequency domain and the result transformed back to the spatial domain where it is blended with the original image, to enhance the edges of the image.

Noise Removal - Notch Filtering

Many noisy images contain some kind of patterned noise. This kind of noise is easy to remove in the frequency domain as the patterns show up as either a pattern of a few dots or lines. Recall a simple sine wave is a repeated pattern and shows up as only 3 dots in the spectrum.

In order to remove this noise, one simply, but unfortunately, has to manually mask (or notch) out the dots or lines in the magnitude image. We do this by transforming to the frequency domain, create a grayscale version of the spectrum, mask the dots or lines, threshold it, multiply the binary mask image with the magnitude image and then transform back to the spatial domain.

Lets try this on the clown image, which contains a diagonally striped dither-like pattern. First we transform the clown image to create its magnitude and phase images.


  convert clown.jpg -fft \
          \( +clone  -write clown_phase.png +delete \) +delete \
          -write clown_magnitude.png  -colorspace gray \
          -auto-level -evaluate log 100000  clown_spectrum.png
[IM Output]
Original
==> [IM Output]
Spectrum
[IM Output]
Phase

We see that the spectrum contains four bright star-like dots, one in each quadrant. These unusal points represent the pattern in the image we want to get rid of.

The bright dot and lines in the middle of the image are of no concern as they represent the DC (average image color), and effects of the edges from the image, and should not be modified.

Note that when generating the spectrum image I forced the resulting image to be a pure gray-scale image. This is so I can now loaded the image into an editor, and using any non-gray color (such as red), I masked out the area of those 4 star like patterns.

When finished editing I can extract the areas I colored, by extracting a difference image against teh unedited version. Like this...

  convert clown_spectrum_edited.png clown_spectrum.png \
          -compose difference -composite \
          -threshold 0 -negate clown_spectrum_mask.png
[IM Output] ==> [IM Output]

Now we simply multiply the mask with the magnitude and use the result with the original phase image to transform back to the spatial domain. We display the original image next to it for comparison


  convert clown_magnitude.png clown_spectrum_mask.png \
          -compose multiply -composite \
          clown_phase.png -ift clown_filtered.png
[IM Output] ==> [IM Output]

A very good result. But we can do even better.

As you saw in the previous examples, simple 'circles' are not particularly friendly to a FFT image, so lets blur the mask slightly...

  convert clown_spectrum_mask.png \
          -blur 0x5 -level 50x100%  clown_mask_blurred.png
[IM Output]

And filter the clown, this time re-generating the FFT images in memory.

  convert clown.jpg -fft \
          \( -clone 0 clown_mask_blurred.png -compose multiply -composite \) \
          -swap 0 +delete -ift clown_filtered_2.png
[IM Output]

A simply amazing result! And one that could posibly be improved further by adjusting that mask to fit the 'star' shapes better.

We can even take the difference between the original and the result to create an image of the areas where noise was removed.

  convert clown.jpg clown_filtered_2.png -compose difference \
          -composite -normalize clown_noise.png
[IM Output]

Lets try this on an another example. This time on a "Twigs" image found on the RoboRealm website, which contains an irregular pattern of horizontal and vertical stripes.

Again we extract a grey-scale spectrum image, just as we did before.

  convert twigs.jpg -fft +delete -colorspace gray \
          -auto-level -evaluate log 100000 twigs_spectrum.png
[IM Output] ==> [IM Output]

In this case, as the noise in the image is horizontally and vertically oriented, this shows up as thick horizontal and vertical bands along the center lines but not in the actual center of the image.

Again we mask out the parts using a image editor, this time using a 'blue' color (it really doesn't matter which color is used)...

  convert twigs_spectrum_edited.png twigs_spectrum.png \
          -compose difference -composite \
          -threshold 0 -negate twigs_spectrum_mask.png
[IM Output] ==> [IM Output]

Now we again multiply the mask with the FFT magnitude image, and reconstruct the image.

  convert twigs.jpg -fft \
          \( -clone 0 twigs_spectrum_mask.png -compose multiply -composite \) \
          -swap 0 +delete  -ift twigs_filtered.png
[IM Output] ==> [IM Output]

And we can take the difference between the original and the result to create an image of the areas where noise was removed.


  convert twigs.jpg twigs_filtered.png -compose difference -composite \
          -normalize twigs_noise.png
[IM Output]

Adding a little blur to the mask, could again improve the results even more.

As a exercise, try removing the string from the image. As a hint remember how the effect of a line in a real image is rotated 90 degrees in the FFT. If you get this wrong, you'll probably remove the twig instead.

Advanced Applications

Some of the other more advanced applications of using the Fourier Transform include: 1) deconvolution (deblurring) of motion blurred and defocused images and 2) normalized cross correlation to find where a small image best matches within a larger image.

Examples of FFT Multiplication and Division (deconvolution) moved to a sub-directory as it is waiting a more formally defined image processing operators.


Created: 22 July 2000
Updated: 27 October 2011
Author: Fred Weinhaus, <fmw at alink dot net> with editing and formating by Anthony Thyssen, <A.Thyssen@griffith.edu.au>
Examples Generated with: [version image]
URL: http://www.imagemagick.org/Usage/fourier/