Cheaper sinc computation for resize.c

Questions and postings pertaining to the development of ImageMagick, feature enhancements, and ImageMagick internals. ImageMagick source code and algorithms are discussed here. Usage questions which are too arcane for the normal user list should also be posted here.
NicolasRobidoux
Posts: 1944
Joined: 2010-08-28T11:16:00-07:00
Location: Montreal, Canada

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-05T19:00:00-07:00

I'm just mentioning this so it's out there (even though I don't think it's really useful).

Given that IM computes with long doubles (when the quantum depth is high), we could produce versions of most of the filtering kernels (Sinc, for example) for which filtering results are within 1 of the "infinite precision" result for 64-bit output images. Needless to say, you can't get that kind of precision with the standard math library.

As I mentioned earlier, I do not think that reproducing kernels very precisely is important for image quality. However if, for any reason, you'd like to reproduce kernels "at 64-bit quality," let us know. Now that we have the boost minimax coefficient computation code tweaked and tuned, and the axiom verification code written and optimized, adding a bit depth is no big deal.

Of course, the code will then run more slowly than now when MAGICKCORE_QUANTUM_DEPTH > 32.

anthony
Posts: 8883
Joined: 2004-05-31T19:27:03-07:00
Location: Brisbane, Australia

Re: Cheaper sinc computation for resize.c

Post by anthony » 2010-09-05T21:45:35-07:00

I do not mind seeing a faster version, but would it really provide any great speed increase?

The filter is only set up once for each row and column of the output image, and the results are cached for that whole row and column. Also, as it is range limited, you would then have to test whether the filter is being used within that range, or fall back to using actual sin(x) calls.

I would be very, very careful of any changes, especially when you have expert situations such as using a windowed sinc function with a different window for the actual filter output versus the window function itself.

I welcome new programmers, but I would not like to see the hard work I put into the resize/filter modules destroyed.

For a summary of the resize expert options see
http://www.imagemagick.org/Usage/resize/#filter_options

You may notice that Lanczos is not actually a filter in the IM code, but is defined as a sinc-windowed sinc function. Its default is 3-lobed, but all the other sinc-windowed filters are 4-lobed.

Also note that, while I have yet to make good use of it, I also have cylindrical filters such as the Bessel function. Its cylindrical filter usage would be to fill a 1000-entry cache, which is then used for all pixel weightings in an EWA filter of the Distort function. I just never seem to get the time to get back to reworking the EWA filters.
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-06T04:05:17-07:00

anthony wrote:I do not mind seeing a faster version, but would it really provide any great speed increase?

The filter is only set up once for each row and column of the output image, and the results cached for that whole row and column.
I come from the "pull" (demand-driven) world (VIPS and GEGL), and had forgotten that IM resize is "push" (data-driven).

Given that, I agree that no great speed increase should be expected, unless the code is restructured so that the row/column coefficients are not cached (and this would be a fairly major code change, the effect of which on speed is hard to estimate ahead of time: it's the usual computation/memory trade-off).
anthony wrote: Also as it is range limited you would then have to test whether the filter is being used in that range or fall back to using actual sin(x) calls.
True.

Note, however, that the number of branches is the same, because there is no need to check for x == 0 (Sinc's 0/0 removable singularity), and that math.h trig code itself generally contains quite a few branches (which is largely why it's slow). This being said, I chose the main branch so that it covers [-4,4] for Sinc; in my understanding, this covers 99% of the calls to it? (Correct me if I'm wrong.) I imagine that, since you are caching the row/column results, you may call it with arguments outside of that range to fill some entries with zeros? (I have not looked at how you store the "matrix" of coefficients, so I don't know whether/how you pad with zeros to have a fixed band.)

If you want, I could track down exactly which locations are fed to Sinc when the row and column coefficient arrays are computed, and adjust the interval of approximation so that the math.c branch is never taken (at least when expert settings are not used). This should not be too hard to do. The number of flops needed to compute the polynomial increases very slowly as one enlarges the domain; for example, at most 6 more flops per call (across the board) should be enough to extend the interval to [-5,5]. Once this is done, the code will be as good as if the trig call was not there on chips with reasonable branch prediction.
anthony wrote:I would be very very careful of any changes. Especially when you have expert situation such as
using a windowed sinc function with a different window for the actual filter output vs the window function itself.
If Chantal and I continue putting together polynomial approximations of the filter kernels, it sounds like we'll have to look really carefully at the choice of interval of approximation.
anthony wrote:I welcome new programmers, but I would not like to see the hard work I put into the resize/filter modules destroyed.
Given that the filter weights are computed once for all rows, and once for all columns, I certainly would understand if you decide to revert to <<straight math.c calls>>. For one thing, it makes the code read less like what's in math.c itself.

Also, more of the speed benefit of using polynomial approximations is obtained if the windowed sincs (say) are approximated directly. I was indeed wondering whether approximating, say, lanczos 2 or 3 directly fits in IM's resize.c. It appears that the answer is "not really."

In any case, thank you for considering our code for inclusion. Chantal and I were pretty elated when it made it in, and we'll be sad if it's pulled out. But we'll understand.

magick
Site Admin
Posts: 11076
Joined: 2003-05-31T11:32:55-07:00

Re: Cheaper sinc computation for resize.c

Post by magick » 2010-09-06T07:02:14-07:00

Anthony, it's your call. Do the approximations stay or do we revert?

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-06T09:12:53-07:00

In light of the above discussion, I think that it may be a good thing if I produce a version with slightly tighter tolerances.

That is, up to c7 for up to 8-bit, up to c9 for up to 16-bit. c11 (the current version) should be more than enough for up to 32-bit, and above, c13 should take care of up to 64-bit nicely.

(This is assuming that I don't enlarge the range per careful consideration of how the coefficient arrays are constructed.)

Just let me know whether you want approximations to stay or go.

anthony

Re: Cheaper sinc computation for resize.c

Post by anthony » 2010-09-06T16:51:29-07:00

NicolasRobidoux wrote:I chose the main branch so it covers [-4,4] for Sinc; this, in my understanding, covers 99% of the calls to it? (Correct me if I'm wrong.)
It does.
I imagine that since you are caching the row/column results you may call it with arguments outside of that range to fill some entries with zeros? (I have not looked at how you store the "matrix" of coefficients, so I don't know whether/how you pad with zeros to have a fixed band.)
If I remember correctly, it allocates the cache with just the values required to weight the pixels in the specific row or column, within the filter's window of operation.

The cylindrical EWA filter, however, caches 1000 values over a fixed 0-to-2 window (though the index values into this window are squared, to avoid needing to calculate a square root to get the actual elliptical radius). It currently uses a very blurry Gaussian filter by default, but I want to try other filters, and will be looking at making those 1000 values (or so) cover the filter window as appropriate. Of course it would use Bessel functions rather than Sinc for cylindrical filtering.


Hmmm... you do know that we actually have the Lagrange filter, which is a piece-wise polynomial approximation of the sinc filter? Though the pieces can be disjoint, especially for an even number of pieces.
http://www.imagemagick.org/Usage/resize/#lagrange

My re-design of resize was basically to make it a collection of functions which can be put together to generate both direct and windowed filters.

I think if you implement your polynomials as separate functions, then it will fit nicely into this scheme.

You then do not need to modify the existing functions, though the optimization of the 'Blackman' filter should remain.

The additions or changes then only need to be made in the filter selection tables (in the filter setup function). Later in the setup, a check can be made for the range, and the polynomial sinc selection (or sinc selection) can be overridden as appropriate. Later still, expert options can override those default selections. I already do something similar for the choice between using Sinc and the Bessel function, depending on whether the filter is declared as being two-pass rectangular (-resize) or one-pass cylindrical (-distort).

This would be easy to install (just new table and function entries), allow automatic fallback if the window range is too large, and, once the defaults are worked out, allow expert options to override those function selection defaults. It also leaves the existing functions untouched, without if-then selections or other changes.

The advantage of this is that there will then be no 'window range' decision needed once the filter is set up, as it was already made and merged into the filter function pointer. Both filters (Sinc and the polynomial Sinc filter, or whatever it is called) will be available, either as defaults for 'named' filters, or via expert options.

After that, a user selection of a 'Blackman' filter can select the 'Blackman' windowing function and your 'polynomial sinc', unless an expert option overrides. Lanczos could then call your 'polynomial lanczos', without any windowing function. Clean and simple.

It may require a separate set of 'function' identifiers for the expert options, as opposed to the user-selected 'filter' identifiers, which use a table lookup to select 'common' function and window ranges. This separation of the two types of identifiers was something I managed to avoid in my redesign, as each filter typically specified a specific function combination; however, with the possibility of selecting 'Polynomial Sinc' over 'Trigonometric Sinc', that may no longer be the case.

I was indeed wondering if approximating, say, lanczos 2 or 3 directly fits in IM's resize.c. It appears that the answer is "not really."
A Lanczos approximation can be added. But it would not be a windowed 'sinc' filter but a direct filter, much like Gaussian, box, triangle, and the cubic filters.
In any case, thank you for considering our code for inclusion. Chantal and I were pretty elated when it made it in, and we'll be sad if it's pulled out. But we'll understand.
I can definitely see a place for it. IM has the widest image filter selection I know of, with expert controls to make any filter researcher happy! I made it that way on purpose, with many new additions added during that last redesign, as well as many bug fixes -- before that redesign, Blackman was being used directly and not as a sinc windowing filter -- Arrgghhhh -- though the expert controls still allow you to do this, if you really want. I don't see why we can't add even more!

Now that I have remembered what I was doing, I see I made some good decisions, and your code can fit in easily with minimal impact on the existing code. The only major change I see needed is a separate set of expert 'function' identifiers, versus user 'filter' identifiers, to allow differentiating the new polynomial functions from the trigonometric functions.

Hmmm, you may like to look at how I implemented the 'Cubic' filters. I use the same function, but with different 'constants' for the different filters. Your polynomial functions may be implemented in a similar way!
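The "same function, different constants" scheme can be sketched with the well-known Mitchell-Netravali BC parameterization of cubic filters (whether resize.c uses these exact polynomials should be checked against the source): B=1, C=0 gives the cubic B-spline; B=0, C=1/2 gives Catmull-Rom; B=C=1/3 gives the Mitchell filter.

```c
#include <math.h>

/* Sketch: one cubic kernel function, parameterized by the Mitchell-Netravali
   (B, C) constants; each named cubic filter is this function with a
   different constant pair. Support is 2. */
static double bc_cubic(double x, double B, double C)
{
  double ax = fabs(x);
  if (ax < 1.0)
    return ((12.0 - 9.0*B - 6.0*C)*ax*ax*ax
            + (-18.0 + 12.0*B + 6.0*C)*ax*ax
            + (6.0 - 2.0*B)) / 6.0;
  if (ax < 2.0)
    return ((-B - 6.0*C)*ax*ax*ax
            + (6.0*B + 30.0*C)*ax*ax
            + (-12.0*B - 48.0*C)*ax
            + (8.0*B + 24.0*C)) / 6.0;
  return 0.0;
}
```

For example, `bc_cubic(x, 0.0, 0.5)` evaluates Catmull-Rom, which is interpolating (value 1 at x=0, value 0 at the other integers), while `bc_cubic(x, 1.0, 0.0)` evaluates the smoothing B-spline.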


Now if we could just do something similar for the un-scaled 'interpolation' filters :-)

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-06T19:45:44-07:00

Anthony:

Given the (apparently) heavy caching of coefficient values in resize.c, would you agree that there is little to be gained by using an approximate, as opposed to "exact", Sinc, even if the approximate one is much faster?

Of course I'd like to boast that my stuff is in IM, but not to the extent of pushing useless stuff in.

The only point I would see to keeping the polynomial approximations in (unless I feed you one which actually is MORE accurate than the direct approach, which I can do) is that it would allow "experts" to check for themselves whether the polynomial approximations make much of a difference when (say) resizing, so that they can be convinced to use them within their own, non coefficient-caching, image processing systems. This is treating IM as documentation, basically. If you are OK with that, many many many thanks. But I certainly would understand "thanks but no thanks."

-----

I'll look carefully at the question of whether some sort of PML 2/3 (Poor Man's Lanczos) makes sense for IM.
First, I need to understand better how resize.c uses kernel functions. If "direct" filters don't normalize the weights by their sum, directly or indirectly, the resulting resampler will not be so good for downsampling, only for upsampling and "warping." It sounds from your discussion that any function can be used either way: with or without normalizing the weights (and with or without a fixed footprint). Hopefully, I'll be able to figure out how it works when I take the time to look more carefully into the source code.

Cheers,

nicolas

anthony

Re: Cheaper sinc computation for resize.c

Post by anthony » 2010-09-06T21:22:31-07:00

NicolasRobidoux wrote:Given the (apparently) heavy caching of coefficient values in resize.c, would you agree that there is little to be gained by using an approximate, as opposed to "exact", Sinc, even if the approximate one is much faster?
It may not be a blinding speed increase, but it may still be useful.

As I said, it can be added without interfering with the existing functions relatively easily.

Have you done any timing tests on a small resize of a large digital-photo-sized image?
The only point I would see to keeping the polynomial approximations in (unless I feed you one which actually is MORE accurate than the direct approach, which I can do) is that it would allow "experts" to check for themselves whether the polynomial approximations make much of a difference when (say) resizing, so that they can be convinced to use them within their own, non coefficient-caching, image processing systems.
I agree. I'll see if I can find a little time to merge it properly tonight. I have downloaded the changes, while keeping a copy of the previous version, so I can make them two separate functions.
First, I need to understand better how resize.c uses kernel functions. If "direct" filters don't normalize the weights by their sum, directly or indirectly, the resulting resampler will not be so good for downsampling, only for upsampling and "warping."
The filter functions do normalize the weights. In fact, when transparency becomes involved, weight normalization even has to happen on a per-pixel basis, as fully-transparent pixels should not provide any color to the results!

Similarly, edge pixels can only be weighted against pixels which do not fall outside the image, unless some type of -virtual-pixel setting is used. The -resize operator does not use -virtual-pixels, but simply ignores sampling outside the image proper. This is handled during the weight caching. Transparency, however, changes the weights on a per-pixel basis, so when transparency is involved, normalization needs to be handled for every pixel! (Twice, with the two-pass orthogonal resize.)

If you do not take transparent pixels into account, you end up with 'halo' effects such as shown in this very old bug report (fixed in 6.2.4). This same problem was also present in the early days of my -distort function additions!
http://www.imagemagick.org/Usage/bugs/resize_halo/
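The per-pixel, alpha-weighted normalization described above can be sketched like this: each source pixel's color weight is scaled by its alpha, and the color accumulators are divided by the sum of those effective weights, so fully transparent pixels contribute no color. This is an illustrative 1-D version; the struct and names are mine, not resize.c identifiers.

```c
/* Sketch of alpha-weighted filtering: transparent pixels contribute weight
   to the output alpha but no color, avoiding the 'halo' effect described
   in the bug report above. Illustrative names, not resize.c's. */
typedef struct { double r, g, b, a; } DemoPixel;   /* alpha in [0,1] */

static DemoPixel filter_pixels(const DemoPixel *src, const double *weights,
                               int first, int count)
{
  double r = 0.0, g = 0.0, b = 0.0, a = 0.0, color_norm = 0.0;
  for (int i = 0; i < count; i++) {
    const DemoPixel *p = &src[first + i];
    double w = weights[i];           /* assumed pre-normalized to sum to 1 */
    double wa = w * p->a;            /* color weight scaled by coverage */
    r += wa * p->r;
    g += wa * p->g;
    b += wa * p->b;
    a += wa;                         /* output alpha: weighted average alpha */
    color_norm += wa;
  }
  DemoPixel out = { 0.0, 0.0, 0.0, a };
  if (color_norm > 0.0) {            /* re-normalize color per output pixel */
    out.r = r / color_norm;
    out.g = g / color_norm;
    out.b = b / color_norm;
  }
  return out;
}
```

Averaging an opaque red pixel with a fully transparent green one this way yields a half-transparent pure red, rather than a muddy red-green halo.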

As for 'warping' as per -distort: it needs to use a cylindrical filter (using an Elliptical Weighted Average, or EWA, algorithm) rather than the two-pass orthogonal filter that resize uses. This was actually why I ended up re-developing the resize filters! I needed to be able to access the filters in a way that allowed me to call them so as to generate a 1000-value cache to use for EWA filtering.

Actually, the job was so large that by the time I finished the re-development, I had run out of time to complete the full use of the filters in -distort -- specifically, full use of the EWA support filter beyond its original 2.0-unit cylindrical support window. I really should try to finish that job, at least enough to allow rotation experiments using a 'Bessel' equivalent of the Lanczos filter, and get -distort away from its default use of a blurry Gaussian filter. :(
Basically, while many filters will normalize, they don't actually need to, as any normalization will generally need to be done by the calling operator. It very much depends on exactly what 'filter weights' are needed in a specific situation, and the filter has no way of knowing that situation.
Exactly!

Weight normalization is the caller's responsibility. So is caching of filter values, as only the caller knows what values it will actually need. EWA filtering of distorts also uses caching, to avoid needing to do the square-root step of the radius calculations.

anthony

Re: Cheaper sinc computation for resize.c

Post by anthony » 2010-09-07T04:23:16-07:00

I have re-coded the resize.c module so that both the original Sinc() and the new SincPolynomial() functions are available for use as resize filters. They are attached to the filter names "Sinc" and "SincPolynomial", and both by default use a Blackman windowing function (support=4.0).

The IM Q16 build was configured with --disable-openmp

Code: Select all

convert -version
Version: ImageMagick 6.6.4-1 2010-09-07 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2010 ImageMagick Studio LLC
Features: OpenMP 
My laptop is a new one with an i7 4-core, 8-thread CPU; however, tests while monitoring each of the 8 CPU threads show that, while OpenMP is listed as a feature, it is in fact disabled: the resize does not run multi-threaded (one CPU is at maximum, the rest at minimal levels). The 'features' report above is misleading, and probably a bug.

Initial test... a Sinc function with a Blackman window, using 4 lobes, as I defined in the internal filter setup tables.

Code: Select all

  convert rose: -filter Sinc -resize 200x200 sinc_orig.png
  convert rose: -filter SincPolynomial -resize 200x200 sinc_test.png
  compare sinc_orig.png sinc_test.png sinc_cmp.png
This shows that both produce exactly the same output images on my IM Q16.
Another test against an older version of IM also shows that the output has not been changed by the additions.

Right, so far so good; now let's try some timing tests. The actual image colors do not matter for a timing test, and the image is very long but thin, so as to maximize the filter calls and minimize the actual resize processing of the image.

This creates an equivalent Lanczos filter (Sinc-windowed Sinc, support 4) from each type of sinc, with minimal use of any other image processing operator.

Code: Select all

time convert -size 100000000x1 xc:  \
       -filter Sinc -set option:filter:window Sinc \
       -resize 1000x1 null:
Result over three runs (user time): 14.62 14.71 14.49 seconds

Code: Select all

time convert -size 100000000x1 xc:  \
      -filter SincPolynomial -set option:filter:window SincPolynomial \
      -resize 1000x1 null:
results of three runs: 12.78 12.42 13.02

Repeated runs produce similar results, with less than half a second of variation.
So the polynomial filter is faster -- but not by a huge factor, and on a more typical image the difference would probably be minor.

ASIDE: three runs with exactly the same options, except for the '-resize', to see the setup time, produced... 0.05 0.04 0.04 seconds.

Repeating the tests with a more realistic resize of a 2500x2500 image to 1000x1000 generated the following results:
Sinc: 1.57 1.55 1.58
SincPolynomial: 1.55 1.52 1.60

So the polynomial approximation for Sinc is faster, but when a caching resize operator is used it really makes little difference, as the number of function calls, versus the actual image processing, is of minimal impact. That, of course, would not be the case if no caching were performed.

ASIDE: filter caching was part of the original image resizing program, "zoom", created by Professor Paul Heckbert in the late 1980s. IM's resize (and just about every other image resize operator, including Photoshop's) was based on this program. As such, it is not really surprising that filter caching is a key component of IM's resizing function.

-------------------

Update: On my work machine (3 years old, with a dual-core processor and OpenMP enabled), the above Lanczos tests, using a starting image 1/10 the length (yes, it is that much older and slower a machine), produced...

Code: Select all

time convert -size 10000000x1 xc:  \
       -filter Sinc -set option:filter:window Sinc \
       -resize 1000x1 null:
real 0m12.621s
user 0m22.365s
sys 0m0.253s

Code: Select all

time convert -size 10000000x1 xc:  \
      -filter SincPolynomial -set option:filter:window SincPolynomial \
      -resize 1000x1 null:
real 0m8.858s
user 0m16.634s
sys 0m0.235s

This shows a much greater difference, but only in this special case, for which caching filter values does not help very much.

Note that 'real' is roughly half the total 'user' time, as both processor cores were used.
You can imagine how fast this test would have been on my home laptop with its i7 8-thread CPU if I had also enabled OpenMP on it for testing purposes!

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-07T14:39:28-07:00

Anthony:

This is all wonderful.

More later.

nicolas

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-08T19:54:54-07:00

Arcane question:

Should I code assuming that IM may be compiled on a chip on which long doubles are actually no larger than ordinary doubles?

anthony

Re: Cheaper sinc computation for resize.c

Post by anthony » 2010-09-08T21:02:43-07:00

It can. If the compiler defines doubles as really being floats, IM will happily accept that.

Of course, that is also what MagickRealType is for, so it could be re-defined later if need be.

Image data, however, is defined based on a 'Q-level' compile-time option.
But if you build IM with HDRI, then doubles are used for image data rather than integers. In this case, HDRI is a misnomer, as it just means 'use floating point storage' and not its acronym's meaning of "High Dynamic Range Images". I would like to see the term changed.

However, in HDRI mode the 'white' value in the in-memory data is still defined as 65535.0, so as to allow functions that think in integer terms to continue to operate correctly.
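On the long double question, one portable way to find out at compile time whether long double actually carries more precision than double is to compare the mantissa widths from <float.h>, rather than assuming anything about the platform. A sketch (the `SINC_POLY_ORDER` macro is hypothetical, just to show how a polynomial order could be chosen):

```c
#include <float.h>

/* Sketch: <float.h> tells us at compile time whether long double actually
   carries more mantissa bits than double, so the polynomial order can be
   chosen accordingly instead of assumed. SINC_POLY_ORDER is hypothetical. */
#if LDBL_MANT_DIG > DBL_MANT_DIG
  #define SINC_POLY_ORDER 13   /* extra precision actually available */
#else
  #define SINC_POLY_ORDER 11   /* long double is no wider than double */
#endif

static int have_extended_long_double(void)
{
  return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```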
Anthony Thyssen -- Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-09T04:23:24-07:00

anthony wrote:It can. If the compiler defines doubles as being really floats. IM will happilly accept that.
In this case, I won't propose a very high order version of Sinc to use with quantum > 32.

I'll stick to "moderate" levels of approximation. (Bye bye e-17.)

---------

Question:

As I mentioned before, the "Poor Man" treatment can be done to every single filter.

I have a smart Masters student (Chantal Racette) who needs something to do.

Would you be interested in having another filter function turned into a polynomial with the minimax machinery?
All of them? Which one should we start with?

Or should I leave it at sinc? (Which makes it pretty clear that this approach works.)

Or should a separate, one-polynomial, lanczos 3 be added?

(Note that Chantal and I have a formula for direct evaluation of lanczos 3 which uses only one trig call instead of two; it uses the triple-angle formula. We also have one for lanczos 2. In other words, what we did to Blackman we did to the Lanczoses.)
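The triple-angle refactoring can be spelled out as follows (my reconstruction of the algebra behind the formula):

```latex
\begin{align*}
\mathrm{Lanczos}_3(x)
  &= \operatorname{sinc}(x)\,\operatorname{sinc}(x/3)
   = \frac{\sin(\pi x)}{\pi x}\cdot\frac{\sin(\pi x/3)}{\pi x/3}
   = \frac{3\sin(\pi x)\sin(\pi x/3)}{\pi^2 x^2}.\\
\intertext{With $s=\sin(\pi x/3)$, the triple-angle formula
$\sin(3\theta)=3\sin\theta-4\sin^3\theta$ gives
$\sin(\pi x)=3s-4s^3$, so}
\mathrm{Lanczos}_3(x)
  &= \frac{3\,(3s-4s^3)\,s}{\pi^2 x^2}
   = \frac{9s^2-12s^4}{\pi^2 x^2},
\end{align*}
```

which needs only the single trig evaluation $s=\sin(\pi x/3)$.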

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-09T04:28:12-07:00

(Not completely fitted to IM, but just so you get the picture, here is one "lanczos 3 with one trig call".)

Code: Select all

  /*
    Sinc(x)*Sinc(x/3) refactored by Nicolas Robidoux and Chantal
    Racette down to two conditionals, 1 trig call and 7 flops.
  */
  const double xx = x*x;
  if (xx == 0.0)
    return(1.0);
  if (xx <= 9.0)
    {
      const double s = sin((MagickPI/3.0)*x);
      const double ss = s*s;
      const double pp = MagickPI*MagickPI;
      return(ss*((9.0/pp)-ss*(12.0/pp))/xx);
    }
  return(0.0);

NicolasRobidoux

Re: Cheaper sinc computation for resize.c

Post by NicolasRobidoux » 2010-09-09T06:37:42-07:00

My last mods to resize.c for a while (gotta teach):

1) Loosened the error a bit for Q32 (and cut four flops). Now it's 2.2e-8 < 1/2^25. At least in principle, filtering with SincPolynomial may now deviate from "infinite precision" sinc by more than 1 for 32-bit output images.

2) Added a very high precision one for Q64: 7.8e-17 if long doubles are "actual" long doubles, and 8.7e-14 if they turn out to be "ordinary" doubles. With "true" long doubles, this is better than with math.c.

3) Put the "-" with the sqrt to make it more obvious that the unary minus is done at compile time.

As usual:

http://web.cs.laurentian.ca/nrobidoux/misc/IM/resize.c
