Cheaper sinc computation for resize.c

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
I've not triplechecked my recent changes at all quantum levels.
May take me a while.
May take me a while.

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
Anthony:
Thanks for the updated benchmarks.

If you want me to revert the Q32 version so that the "within 1 guarantee" holds, let me know.
Thanks for the updated benchmarks.

If you want me to revert the Q32 version so that the "within 1 guarantee" holds, let me know.
 anthony
 Posts: 8884
 Joined: 20040531T19:27:0307:00
 Authentication code: 8675308
 Location: Brisbane, Australia
Re: Cheaper sinc computation for resize.c
Can this be generalized?NicolasRobidoux wrote:(Note completely fit to IM, but just so you get the picture one "lanczos 3 with one trig call".)
Code: Select all
/* Sinc(x)*Sinc(x/3) refactored by Nicolas Robidoux and Chantal Racette down to two conditionals, 1 trig call and 7 flops. */ const double xx = x*x; if (xx == 0.0) return(1.0); if (xx <= 9.0) { const double s = sin((MagickPI/3.0)*x); const double ss = s*s; const double pp = MagickPI*MagickPI; return(ss*((9.0/pp)ss*(12.0/pp))/xx); } return(0.0);
Note that the filter function is called with a pointer to the _ResizeFilter (typically it isn't used, except for cubic functions, so it can use the precalculated cubic coefficients). In that stucture is both the requested 'support' of the filter, and the 'window_support' (which is usally the same value unless overridden by an expert option from some strange reason).
However there is currently no indication if the function call is for the base filter function, or as a windowing filter function. Though if the window() function point points to the Box() filter function, you can be sure no windowing function is being used to scale the base function.
HOWEVER, for a Lanczos() function, can be regarded as purely a base function.
That would be similar to Lagrange() which uses support values for to determine the 'order' of the Lagrange filter.
Anthony Thyssen  Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
https://imagemagick.org/Usage/

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
You mean: Can Chantal and I produce some magical formula which computes Sinc(x)Sinc(x/n) with one trig call for arbitrary positive integer n?anthony wrote: Can this be generalized?
Answer: No.
We can only do things one n at a time, and for some n's, it's going to be really complicated.

With separate formulas, 2,3,4 are no problem, for example.
 anthony
 Posts: 8884
 Joined: 20040531T19:27:0307:00
 Authentication code: 8675308
 Location: Brisbane, Australia
Re: Cheaper sinc computation for resize.c
Expanded the output of the set option:filter:verbose 1 expert option so that it understands the used of the precision setting and to also expanded the details of exactly what functions was selected in the comments at the top verbose report.
For example you can now verify that the Lanczos filter is currently defined in terms
of a Sinc windowed Sinc support 3.
http://www.imagemagick.org/Usage/resize/#filter
For example you can now verify that the Lanczos filter is currently defined in terms
of a Sinc windowed Sinc support 3.
Code: Select all
convert xc: set option:filter:verbose 1 filter Lanczos \
resize 2x2 null:  grep '^#'
The other 'noncommented' values are used allow checking and graphing of the filters output values, such as the many graphs of filters in IM Examples, Resize Filters#
# Resize Filter (for graphing)
#
# filter = Sinc
# window = Sinc
# support = 3
# winsupport = 3
# blur = 1
# blurred_support = 3
# B,C = 0,0
#
http://www.imagemagick.org/Usage/resize/#filter
Anthony Thyssen  Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
https://imagemagick.org/Usage/
 anthony
 Posts: 8884
 Joined: 20040531T19:27:0307:00
 Authentication code: 8675308
 Location: Brisbane, Australia
Re: Cheaper sinc computation for resize.c
I just wonder how complicated? The constants in the above are integers, and also divisible by 3 so I wondered similar results was present for other values of 'n'. At least an integer 'n', seeing as noninteger n in Lanczos while allowed, is rather useless.NicolasRobidoux wrote:You mean: Can Chantal and I produce some magical formula which computes Sinc(x)Sinc(x/n) with one trig call for arbitrary positive integer n?anthony wrote: Can this be generalized?
Answer: No.
We can only do things one n at a time, and for some n's, it's going to be really complicated.
Anthony Thyssen  Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
https://imagemagick.org/Usage/

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
Oops! The lanczos 3 snippet actually only needs 6 flops, B/C MagickPI*MagickPI is computed at compile time.

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
P.S. The following is FINE. (Thanks Chantal. Note to self: Do not triple check things when called to make dinner.)
How to compute all lanczos n with only ONE TRIG CALL
Summary: You can use a recursive formula for the sines of multiples of an angle (http://en.wikipedia.org/wiki/List_of_tr ... hev_method) to compute any lanczos n with only one trig call.
For pi lovers: The interesting thing (to a mathematician) is that the Chebyshev method does NOT allow computing all sines or cosines of multiples of an angle with only ONE trig call (unless one involves square roots and some mechanism for computing root signs, two trig calls are needed). There is something "just right" about products of sines a.k.a the expensive part of the Lanczos kernel formula.
Details: Nothing speaks like code:
Warning:
I can't compile this because this needs to be fitted in the bigger code in ways that I don't have time to figure just now.
The main point, however, is that (assuming the chip's branch prediction figures out what needs to be computed and what does not):
lanczos2(x) can be computed with 1 trig & 1/ & 5* & 1 & 1+
computing lanczos3(x) adds 2* & 1+ to the cost of lanczos2(x) (if the chip's branch prediction is really good, it should only add 1*)
and, from then on, increasing n by 1 adds 1* & 1 & 1+.
(Note that these counts take into account the fact that some operations will be performed at compile time, not at runtime. Only the runtime flops are included.)
I stopped at n = 10, but clearly it is only for very large n that a second trig call beats the recursion (unless the chip's branch prediction sucks).
How to compute all lanczos n with only ONE TRIG CALL
Summary: You can use a recursive formula for the sines of multiples of an angle (http://en.wikipedia.org/wiki/List_of_tr ... hev_method) to compute any lanczos n with only one trig call.
For pi lovers: The interesting thing (to a mathematician) is that the Chebyshev method does NOT allow computing all sines or cosines of multiples of an angle with only ONE trig call (unless one involves square roots and some mechanism for computing root signs, two trig calls are needed). There is something "just right" about products of sines a.k.a the expensive part of the Lanczos kernel formula.
Details: Nothing speaks like code:
Code: Select all
static MagickRealType Lanczos(const MagickRealType x,
const ResizeFilter *resize_filter)
{
/*
Lanczos(n) filter function computed with only one call to a
trigonometric function with a recursive formula, based on the
Chebyshev method for the computation of sines and cosines of
multiples of an angle, discovered by Nicolas Robidoux (pending the
discovery of an earlier discoverer) with the assistance of Chantal
Racette.
*/
const double support = resize_filter>support;
const MagickRealType x2 = x*x;
if (x2 == 0)
return(1.0);
{
const MagickRealType pi2 = MagickPIL*MagickPIL;
const MagickRealType c = cos((double) ((MagickPIL/support)*x));
const MagickRealType s2 = 1  c * c;
const MagickRealType s2c = s2 * c;
const MagickRealType ss2 = s2c + s2c;
/*
Really, we want to use Lanczos 2 if the support is <= 2, but doing
things as follows allows for inaccuracy in the support
computation.
*/
if (support<2.5)
return((2/pi2)/x2*ss2);
{
const MagickRealType ss3 = s2*(3+4*s2);
if (support<3.5)
return((3/pi2)/x2*ss3);
{
const MagickRealType t4 = c*ss3;
const MagickRealType ss4 = t4ss2+t4;
if (support<4.5)
return((4/pi2)/x2*ss4);
{
const MagickRealType t5 = c*ss4;
const MagickRealType ss5 = t5ss3+t5;
if (support<5.5)
return((5/pi2)/x2*ss5);
{
const MagickRealType t6 = c*ss5;
const MagickRealType ss6 = t6ss4+t6;
if (support<6.5)
return((6/pi2)/x2*ss6);
{
const MagickRealType t7 = c*ss6;
const MagickRealType ss7 = t7ss5+t7;
if (support<7.5)
return((7/pi2)/x2*ss7);
{
const MagickRealType t8 = c*ss7;
const MagickRealType ss8 = t8ss6+t8;
if (support<8.5)
return((8/pi2)/x2*ss8);
{
const MagickRealType t9 = c*ss8;
const MagickRealType ss9 = t9ss7+t9;
if (support<9.5)
return((9/pi2)/x2*ss9);
if (support<10.5)
{
const MagickRealType t10 = c*ss9;
const MagickRealType ss10 = t10ss8+t10;
return((10/pi2)/x2*ss10);
}
return(
Sinc((double) x,resize_filter)
*
Sinc((double) (x/nearbyint(support)),resize_filter));
}}}}}}}}
}
I can't compile this because this needs to be fitted in the bigger code in ways that I don't have time to figure just now.
The main point, however, is that (assuming the chip's branch prediction figures out what needs to be computed and what does not):
lanczos2(x) can be computed with 1 trig & 1/ & 5* & 1 & 1+
computing lanczos3(x) adds 2* & 1+ to the cost of lanczos2(x) (if the chip's branch prediction is really good, it should only add 1*)
and, from then on, increasing n by 1 adds 1* & 1 & 1+.
(Note that these counts take into account the fact that some operations will be performed at compile time, not at runtime. Only the runtime flops are included.)
I stopped at n = 10, but clearly it is only for very large n that a second trig call beats the recursion (unless the chip's branch prediction sucks).
Last edited by NicolasRobidoux on 20100911T05:10:3807:00, edited 40 times in total.

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
Anthony: This is just a conjecture, but I think that the more noticeable relative difference in execution time may be due to the fact that you used openmp. I am guessing that the pixel computation is more parallelized than the coefficient computation, so anything which shortens the pixel computation leads to a larger relative speedup of "Poor Man's Sinc" when compared to "math.c sin sinc."anthony wrote: ...
Update: On my work machine (3 years old) with a Dual Core processor, with OpenMP enabled, the above Lanczos tests, using a starting image 1/10 the length (yes it is that much of an older and slower machine) produced...real 0m12.621sCode: Select all
time convert size 10000000x1 xc: \ filter Sinc set option:filter:window Sinc \ resize 1000x1 null:
user 0m22.365s
sys 0m0.253s
real 0m8.858sCode: Select all
time convert size 10000000x1 xc: \ filter SincPolynomial set option:filter:window SincPolynomial \ resize 1000x1 null:
user 0m16.634s
sys 0m0.235s
this shows a much greater difference, but only in this special case for which caching filter values does not help very much.
...

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
Anthony Thyssen privately pointed out that I had changed the behavior of the gaussian filter.
I, indeed, had misread the code (I'm used to formulas having spacing to group things; what a lame excuse!).
In any case, here is code which computes the gaussian filter with three multiplies and a call to the exponential function. This code, in addition, makes it easy to change the value of the variance by changing a macro. (Me and my student Adam Turcotte used to customize resize.c for our purposes, so this is not unheard of.)
As usual: http://web.cs.laurentian.ca/nrobidoux/misc/IM/resize.c
(Based on this morning's svn, but not manically checked. Anthony's nice automated check option will notice any bad stuff anyway.)
I, indeed, had misread the code (I'm used to formulas having spacing to group things; what a lame excuse!).
In any case, here is code which computes the gaussian filter with three multiplies and a call to the exponential function. This code, in addition, makes it easy to change the value of the variance by changing a macro. (Me and my student Adam Turcotte used to customize resize.c for our purposes, so this is not unheard of.)
Code: Select all
static MagickRealType Gaussian(const MagickRealType x,
const ResizeFilter *magick_unused(resize_filter))
{
/*
Normalized Gaussian with variance 1/2 (by default):
1/sqrt(2 pi sigma^2) exp(x^2/(2 sigma^2))
*/
#define MagickGAUSSIANSIGMAL 0.5L
/*
Change the value of MagickGAUSSIANSIGMAL if you want to override
the default.
*/
const MagickRealType sigma2 = MagickGAUSSIANSIGMAL*MagickGAUSSIANSIGMAL;
const MagickRealType alpha = 1.0/(2.0*sigma2);
const MagickRealType normalizer = sqrt((double) (1.0/(2.0*MagickPIL*sigma2)));
return(normalizer*exp((double) (alpha*x*x)));
}
(Based on this morning's svn, but not manically checked. Anthony's nice automated check option will notice any bad stuff anyway.)

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
I did the same tests Anthony did on my 3 year old laptop with Intel Core 2 and 2GB running 64 bit mint, namely
and a similar one with just Sinc, and I got best of 5 Real time 8.054s for Sinc, and 5.497s for SincPolynomial.
(Vanilla IM, so with Q16 and openmp enabled.)
Best of 5 runs for 2500x2500 to 1000x1000 gives Real time .858s for Sinc, .854 for SincPolynomial.
Conclusion: Clearly the coefficients are computed much faster, but this leads to very little difference for "mainstream" resizing.
Code: Select all
time convert size 10000000x1 xc: \
filter SincPolynomial set option:filter:window SincPolynomial \
resize 1000x1 null:
(Vanilla IM, so with Q16 and openmp enabled.)
Best of 5 runs for 2500x2500 to 1000x1000 gives Real time .858s for Sinc, .854 for SincPolynomial.
Conclusion: Clearly the coefficients are computed much faster, but this leads to very little difference for "mainstream" resizing.
Last edited by NicolasRobidoux on 20100911T06:42:5007:00, edited 1 time in total.
 anthony
 Posts: 8884
 Joined: 20040531T19:27:0307:00
 Authentication code: 8675308
 Location: Brisbane, Australia
Re: Cheaper sinc computation for resize.c
Yes as I thought at the begining. The Caching makes the calculations a once off for each row and column. For Distorts a cache of 1000 values is generated as a lookup table for a cylindrical filter that is used for a "elliptical weighted averge" technique.NicolasRobidoux wrote:Conclusion: Clearly the coefficients are computed much faster, but this leads to very little difference for "mainstream" resizing.
As such for 'typical' resizing and distorts, the caching vastly reduces the cost of trig functions.
Only in the extreme case where 'caching' provides little benefit, does the polynomial form show a major improvement. However that does not mean we would not want to make use of the improvements. Especially when the error difference is so low. We may even make the polynomial function the default!

Aside: The EWA technique by the way was also developed originally by the same author as the "zoom" program, Professor Paul Heckbert, that fathered basically every image resize function that is used on computers today.
See Area Resampling for Distorts...
http://www.imagemagick.org/Usage/distor ... a_resample
and Introduction to Resize Filters
http://www.imagemagick.org/Usage/resize/#fillter
At this time the EWA default to the use of a builtin 2D Gaussian filter (it is a cylindrical filter).
Now that I have a improved understanding, due to this discussion, I may be able to make more direct use of resize filters for cylindrical filters, like I originally planned.
However while lots of information is available for 2pass orthogonal filters and the many alternatives, very little is available on 2D cylindrical variants. For example a 2D Lanczos equivalent (Bessel windowed Bessel???) could generate much better and less blurry image distortions (such as perspective correction, or Barrel distortion correction. Perhaps someone knows of something like a simpler Cubic function for cylindrical use (like Mitchell in resize).
Anthony Thyssen  Webmaster for ImageMagick Example Pages
https://imagemagick.org/Usage/
https://imagemagick.org/Usage/

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
P.S. My suggestion below is a BAD IDEA. Ignore it!
I don't know enough to make a really educated suggestion, but nonetheless how about you simply use lanczos3(r) instead of a tensor product of lanczos3 in x and y (r is the distance to the center)? (lanczos2 would also be a good choice, I would guess.) To make a more educated choice, I'd need to do some careful 2D spectral analysis, and this is not likely to happen for quite a while (unless I can find someone to do it for me, like a Masters student ).
Of course, Chantal Racette and I can provide very cheap polynomial approximations of the lanczos2/3 kernels!
Sounds like I have my next Masters student's topic.anthony wrote:However while lots of information is available for 2pass orthogonal filters and the many alternatives, very little is available on 2D cylindrical variants. For example a 2D Lanczos equivalent (Bessel windowed Bessel???) could generate much better and less blurry image distortions (such as perspective correction, or Barrel distortion correction. Perhaps someone knows of something like a simpler Cubic function for cylindrical use (like Mitchell in resize).
I don't know enough to make a really educated suggestion, but nonetheless how about you simply use lanczos3(r) instead of a tensor product of lanczos3 in x and y (r is the distance to the center)? (lanczos2 would also be a good choice, I would guess.) To make a more educated choice, I'd need to do some careful 2D spectral analysis, and this is not likely to happen for quite a while (unless I can find someone to do it for me, like a Masters student ).
Of course, Chantal Racette and I can provide very cheap polynomial approximations of the lanczos2/3 kernels!

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
A "less bad" idea:
Try lanczos2(r^2) with support up to r^2=2.
(Again, my understanding of cylindrical is a bit primitive, which is why I'd need a new grad student to explain things to me, but if it's easy to try things...)
And, of course, Chantal and I can provide ridiculously cheap accurate polynomial approximations of lanczos2(x) (into which you'd plug r^2).
I'd also suggest to simply filter, as opposed to doing an (approximate) exact area computation (if I get the drift without reading more about this). In my experience, there is little benefit to using an exact area approach. And I've done it: http://registry.gimp.org/node/19582
Try lanczos2(r^2) with support up to r^2=2.
(Again, my understanding of cylindrical is a bit primitive, which is why I'd need a new grad student to explain things to me, but if it's easy to try things...)
And, of course, Chantal and I can provide ridiculously cheap accurate polynomial approximations of lanczos2(x) (into which you'd plug r^2).
I'd also suggest to simply filter, as opposed to doing an (approximate) exact area computation (if I get the drift without reading more about this). In my experience, there is little benefit to using an exact area approach. And I've done it: http://registry.gimp.org/node/19582
Last edited by NicolasRobidoux on 20100911T07:30:0507:00, edited 1 time in total.

 Posts: 1944
 Joined: 20100828T11:16:0007:00
 Authentication code: 8675308
 Location: Montreal, Canada
Re: Cheaper sinc computation for resize.c
May I suggest calling the polynomial approximations "Sinc" and their more accurate versions "SincTrigonometric"? (!)anthony wrote: As such for 'typical' resizing and distorts, the caching vastly reduces the cost of trig functions.
Only in the extreme case where 'caching' provides little benefit, does the polynomial form show a major improvement. However that does not mean we would not want to make use of the improvements. Especially when the error difference is so low. We may even make the polynomial function the default!