All formulas
Num. approx.
Misc. math.
Delirium !

Boris Gourévitch
The world of Pi - V2.57
modif. 13/04/2013

Pi-Day in
Pi Day Countdown
Home Version history Guestbook Who I am Some pictures (fr) Acknowledgements Last modifications Contact

Cette page en français This page in English


J.W. Cooley et J.W. Tukey

Fast Multiplication by FFT

1 Back to school

If you were asked to multiply 32*14 in your head, how would you go about it? Most likely, while painfully rembering those every long rules taught in primary school, use 32=3*10+2, 14=10+4,  then multiply each part with each part, remembering any caries over etc. Phew, it seems to be the easiest way... But is it really? This sounds like a weird question, no?

Since the comming a calculators, that is since the late 40s, the gap of useful algorithm for multiplication force us to break each number into smaller parts, for example at school          20       10
B = B210   +B110   + B0  . Hence the multiplicaiton of size n  needed then a time (or number of operations) proportional to  2
n  without taking into account the usage of a memory storage proportional to n  . So to conserve the progression speed of the record of decimals of  calculated during the 50s (see History  or Decimals page), some theoretical and algorithmical improvement were in great need. So it was during this period that things speeded up.

In 1965, Cooley and Tukey introduced in it's modern form a method to reduce the complexity of calculating Fourier's serie, that is now known as Fast Fourier Transform (FFT) [1]. It seems that Gauss had already guessed the trick of critical factorisation in 1805. No doubt, he was a pure genius... Anyway, Schönhage and Strassen produced an algorithm to multiply to big integers with complexity O (n.log(n).log(log(n)))  which is considerably much better than  ( 2)
O n [2] !

This page explain how this algorithm for fast multiplication of two big integers by FFT works, which did so much for the calculation of the decimals of p  ! For the computers, this is great... But for us, well, hum... actually our brain are not wired to function in that way !

2 Algorithm for fast multiplication of two large integers by FFT

Let X and Y be two large integers of size n  in base B = 10    (or of power 10  ).

2.1 Polynomial decomposition

We write X  and Y  in polynomial form by decomposing them (in a unique way) in base B


where x
 j  and y
 j  are therefore respectively the j  th decimales of X  and Y  (0 < x ,y < B - 1
     j j  ). This calculation is roughly of complexity proportional to n  . We now look how to "quickly" calculate the multiplication R(z) = P(z)× Q(z)  then in the end evaluate R(B)  . R(z)  is a polynomial of degree < 2n  and can be found by interpolation by evaluating it in 2n  points, in other words by evaluating P  and Q  in those same 2n  points. Choosing those points, is the magical trick of this algorithm! We use the 2n  th root of unity, i.e.

e2i2knp= wk,    w = e22ipn

since this evaluation (which is none the other than the calculation of a Fourier's serie) can be done in complexity O(n.logn)  if n  is a power of 2  and with a bit of cleverness. This is Fast Fourier Transform or FFT.

2.2 Fast Fourier Transform

By keeping the previous notation, let's evaluate the polynomials P (wk) and Q (wk) with the hypothesis that n = 2d  . Then


and      2
y = z  . So we get

P(z) = P1(y)+ z.P2(y).

Now, we remember that since w  is a 2n  th root of unity, then for k  (-  {1,...,2n}

       (      )
(wk)2 = w 22n+k 2 = (wn+k)2 .

From this, evaluating P  to the 2n  th root of unity come down to evaluating both P1  and P2  at the n  points (  )  (  )     (  )
 w2 1, w2 2,..., w2 n  and put the result back in P  with the help of equation 6. If F (2n)  is the number of elementary opreations (addition, multiplication) needed to evaluate P  in w1,w2,...,w2n  then in accordance to the previous principle and of 6, we get

F(2n) = 2F (n)+ 2× 2n, F(1) = 0

where the term 2 × 2n  comes from the last additions and multiplications needed to obtain P  from P1  and P2  in 6.

Since  2
w  is a n  th root of unity, we can reapply the same process for each polynomial P1  and P2  . It follows that, since      d
n = 2  is a power of 2  , the process (and hence 6) is iterated an  other d  times. We finnaly get a number of operations

                          (        )
F(2n) = 2 × 2n× (d+ 1) = 4n log2 + 1

where the complexity is O(n.log n)  of FFT. Similarly for the inverse FFT , which uses w- k  instead of wk  .

2.3 Interpolation

We have now worked out our   ( k)
P  w and   ( k)
Q w for k = 0,...,2n- 1  . We create the 2n  products   ( k)    ( k)  (  k)
R  w  =  P w   Q w whcih comes back to elementary multiplication since   ( k)
P  w has obviously modulo less than  sum n -1
  j=0 xj < n(B - 1)  using 1, similarly for Q  . Since we are looking to find R(z) =  sum 2nj=-0 1rjzj  , we need to interpolate from the 2n  values        (  )
ak = R  wk . In other words we are looking to solve the system

W  (r0,r1,...,r2n-1) = (a0,a1,...,a2n-1)


    (                              )
       1    1      12    ...    12n-1
       1    w2    w4        w 4n-2
W =    1   w      w         w
       ...                      ...
       1  w2n-1  w4n-2      w2n2- n

which is equivalent to

(r0,r1,...,r2n-1) = W   (a0,a1,...,a2n-1)

where   -1
W  has the great idea of being simply

          ( 1     1       1    ...    1    )
            1    w-1     w-2        w-2n+1
  -1   -1-  1    w-2     w-4        w-4n+2
W    = 2n   .                          .     .
            ..                          ..2
            1  w- 2n+1  w- 4n+2      w -2n+n

Calculating (r0,r1,...,r2n-1)  is no other than the conjugate of Fourier Transform's of (a0,a1,...,a2n-1) , in other words it's inverse Fourier Transform, so we saw that it is in O(n.log n)  like FFT. We then get R(z)  where we can find R(B) = XY  . This last operation is roughly of complexity O(n)  since the r0,r1,...,r2n-1  are already close to the decimals of XY  , plus or minus the carry overs (as soon as rj > B  ). All the carry overs are simultaneously calculated, then carried through, then we recalculate any new carry over created, etc.... (an operation called vectorizing).

The combination of all those algorithm and a few small refining done by programmers allow us to reach roughly the theoretical optimal barrier of Schönhage and Strassen for the FFT in O (n.log(n).log(log(n)))  .


[1]   J. W. Cooley, J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series”, Math. Comp. 1965, vol 19, pp. 297-301.

[2]   A. Schönhage, V. Strassen, “Schennelle Multiplikation grosser Zahlen”, Computing, 1971, vol. 7, pp. 281-292.

back to home page