Fast Multiplication by FFT

Since the comming a calculators, that is since the late 40s, the gap of useful algorithm for multiplication force us to break each number into smaller parts, for example at school $20 10 B = B210 +B110 + B0$ . Hence the multiplicaiton of size $n$ needed then a time (or number of operations) proportional to $2 n$ without taking into account the usage of a memory storage proportional to $n$ . So to conserve the progression speed of the record of decimals of calculated during the 50s (see History or Decimals page), some theoretical and algorithmical improvement were in great need. So it was during this period that things speeded up.

In 1965, Cooley and Tukey introduced in it's modern form a method to reduce the complexity of calculating Fourier's serie, that is now known as Fast Fourier Transform (FFT) [1]. It seems that Gauss had already guessed the trick of critical factorisation in 1805. No doubt, he was a pure genius... Anyway, Schönhage and Strassen produced an algorithm to multiply to big integers with complexity $O (n.log(n).log(log(n)))$ which is considerably much better than $( 2) O n$ [2] !

This page explain how this algorithm for fast multiplication of two big integers by FFT works, which did so much for the calculation of the decimals of $p$ ! For the computers, this is great... But for us, well, hum... actually our brain are not wired to function in that way !

2 Algorithm for fast multiplication of two large integers by FFT

Let X and Y be two large integers of size $n$ in base $B = 10$ (or of power $10$ ).

2.1 Polynomial decomposition

We write $X$ and $Y$ in polynomial form by decomposing them (in a unique way) in base $B$

where $x j$ and $y j$ are therefore respectively the $j$ th decimales of $X$ and $Y$ ( $0 < x ,y < B - 1 j j$ ). This calculation is roughly of complexity proportional to $n$ . We now look how to "quickly" calculate the multiplication $R(z) = P(z)× Q(z)$ then in the end evaluate $R(B)$ . $R(z)$ is a polynomial of degree $< 2n$ and can be found by interpolation by evaluating it in $2n$ points, in other words by evaluating $P$ and $Q$ in those same $2n$ points. Choosing those points, is the magical trick of this algorithm! We use the $2n$ th root of unity, i.e.

since this evaluation (which is none the other than the calculation of a Fourier's serie) can be done in complexity $O(n.logn)$ if $n$ is a power of $2$ and with a bit of cleverness. This is Fast Fourier Transform or FFT.

2.2 Fast Fourier Transform

By keeping the previous notation, let's evaluate the polynomials $P (wk)$ and $Q (wk)$ with the hypothesis that $n = 2d$ . Then

Now, we remember that since $w$ is a $2n$ th root of unity, then for $k (- {1,...,2n}$

From this, evaluating $P$ to the $2n$ th root of unity come down to evaluating both $P1$ and $P2$ at the $n$ points $( ) ( ) ( ) w2 1, w2 2,..., w2 n$ and put the result back in $P$ with the help of equation 6. If $F (2n)$ is the number of elementary opreations (addition, multiplication) needed to evaluate $P$ in $w1,w2,...,w2n$ then in accordance to the previous principle and of 6, we get

where the term $2 × 2n$ comes from the last additions and multiplications needed to obtain $P$ from $P1$ and $P2$ in 6.

Since $2 w$ is a $n$ th root of unity, we can reapply the same process for each polynomial $P1$ and $P2$ . It follows that, since $d n = 2$ is a power of $2$ , the process (and hence 6) is iterated an other $d$ times. We finnaly get a number of operations

where the complexity is $O(n.log n)$ of FFT. Similarly for the inverse FFT , which uses $w- k$ instead of $wk$ .

2.3 Interpolation

We have now worked out our $( k) P w$ and $( k) Q w$ for $k = 0,...,2n- 1$ . We create the $2n$ products $( k) ( k) ( k) R w = P w Q w$ whcih comes back to elementary multiplication since $( k) P w$ has obviously modulo less than $sum n -1 j=0 xj < n(B - 1)$ using 1, similarly for $Q$ . Since we are looking to find $R(z) = sum 2nj=-0 1rjzj$ , we need to interpolate from the $2n$ values $( ) ak = R wk$ . In other words we are looking to solve the system

Calculating $(r0,r1,...,r2n-1)$ is no other than the conjugate of Fourier Transform's of $(a0,a1,...,a2n-1)$ , in other words it's inverse Fourier Transform, so we saw that it is in $O(n.log n)$ like FFT. We then get $R(z)$ where we can find $R(B) = XY$ . This last operation is roughly of complexity $O(n)$ since the $r0,r1,...,r2n-1$ are already close to the decimals of $XY$ , plus or minus the carry overs (as soon as $rj > B$ ). All the carry overs are simultaneously calculated, then carried through, then we recalculate any new carry over created, etc.... (an operation called vectorizing).

The combination of all those algorithm and a few small refining done by programmers allow us to reach roughly the theoretical optimal barrier of Schönhage and Strassen for the FFT in $O (n.log(n).log(log(n)))$ .

References

[1] J. W. Cooley, J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series”, Math. Comp. 1965, vol 19, pp. 297-301.

[2] A. Schönhage, V. Strassen, “Schennelle Multiplikation grosser Zahlen”, Computing, 1971, vol. 7, pp. 281-292.