


J.W. Cooley et J.W. Tukey
Fast Multiplication by FFT1 Back
to school
If you were asked to multiply 32*14 in your head, how would you go
about it? Most likely, while painfully rembering those every long rules
taught in primary school, use 32=3*10+2, 14=10+4, then multiply
each part with each part, remembering any caries over etc. Phew, it
seems to be the easiest way... But is it really? This sounds like a
weird question, no?


(3) 
since this evaluation (which is none the other than the calculation of a Fourier's serie) can be done in complexity if is a power of and with a bit of cleverness. This is Fast Fourier Transform or FFT.
By keeping the previous notation, let's
evaluate the polynomials and with the hypothesis that . Then
and . So we get

(6) 
Now, we remember that since is a th root of unity, then for

(7) 
From this, evaluating to the th root of unity come down to evaluating both and at the points and put the result back in with the help of equation 6. If is the number of elementary opreations (addition, multiplication) needed to evaluate in then in accordance to the previous principle and of 6, we get

(8) 
where the term comes from the last additions and multiplications needed to obtain from and in 6.
Since is a th root of unity, we can reapply the same process for each polynomial and . It follows that, since is a power of , the process (and hence 6) is iterated an other times. We finnaly get a number of operations

(9) 
where the complexity is of FFT. Similarly for the inverse FFT , which uses instead of .
We have now worked out our and for . We create the products whcih comes back to elementary multiplication since has obviously modulo less than using 1, similarly for . Since we are looking to find , we need to interpolate from the values . In other words we are looking to solve the system

(10) 
where

(11) 
which is equivalent to

(12) 
where has the great idea of being simply

(13) 
Calculating is no other than the conjugate of Fourier Transform's of , in other words it's inverse Fourier Transform, so we saw that it is in like FFT. We then get where we can find . This last operation is roughly of complexity since the are already close to the decimals of , plus or minus the carry overs (as soon as ). All the carry overs are simultaneously calculated, then carried through, then we recalculate any new carry over created, etc.... (an operation called vectorizing).
The combination of all those algorithm and a few small refining done by programmers allow us to reach roughly the theoretical optimal barrier of Schönhage and Strassen for the FFT in .
[1] J. W. Cooley, J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series”, Math. Comp. 1965, vol 19, pp. 297301.
[2] A. Schönhage, V. Strassen, “Schennelle Multiplikation grosser Zahlen”, Computing, 1971, vol. 7, pp. 281292.