
FFTW on the Cell Processor

Version 3.2.2 of FFTW contains specific support for the Cell Broadband Engine ("Cell") processor, which was added to FFTW in 2007. This page summarizes that support, which is also described in the FFTW manual and in the README.Cell file included in FFTW. We also provide some benchmarks from an IBM Cell Blade and a PlayStation 3.

Cell support was removed in FFTW version 3.3 in 2011, primarily because we lack a machine to test on and because of a perceived lack of user interest over the preceding few years. Users who wish to employ FFTW on the Cell can continue to use version 3.2.2.


The Cell code in FFTW was written and graciously donated to the FFTW project by the IBM Austin Research Laboratory. We are grateful to Pat Bohrer and Lorraine Herger of IBM for this generous contribution.


Cell consists of one PowerPC core ("PPE") and a number of Synergistic Processing Elements ("SPEs") to which the PPE can delegate computation. The IBM QS20 Cell blade offers 8 SPEs per Cell chip. The Sony PlayStation 3 contains 6 usable SPEs.

This version of FFTW fully utilizes the SPEs for one- and multi-dimensional complex FFTs of sizes that can be factored into small primes, both in single and double precision. Transforms of real data use SPEs only partially at this time. If FFTW cannot use the SPEs, it falls back to a slower computation on the PPE.
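As a rough illustration of what "sizes that can be factored into small primes" means, the following C sketch tests whether a transform length has only small prime factors. The function name is hypothetical and the set of primes (2, 3, 5, 7) is an assumption chosen for this example; the actual size restrictions of the SPE code are those described in the FFTW manual and README.Cell.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the FFTW API: returns 1 if n > 0 and
   every prime factor of n is in {2, 3, 5, 7}, and 0 otherwise.
   The choice of primes is an assumption made for this example. */
int has_only_small_prime_factors(unsigned long n)
{
    static const unsigned long small_primes[] = { 2, 3, 5, 7 };
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < sizeof small_primes / sizeof small_primes[0]; ++i)
        while (n % small_primes[i] == 0)
            n /= small_primes[i];   /* strip this prime completely */
    return n == 1;                  /* a remainder means a larger prime factor */
}
```

Under these assumptions, 1024 and 1000 qualify, while 1001 = 7 x 11 x 13 does not.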

This library is meant to use the SPEs transparently without user intervention. However, certain caveats apply, which are discussed later in this document.


To enable support for Cell in double precision:

   configure --enable-cell
   make install

In single precision:

   configure --enable-cell --enable-single
   make install

In addition, the PPE supports the Altivec (or VMX) instruction set, in single precision only. (Altivec is Apple/Freescale terminology; VMX is IBM terminology for the same thing.) You can enable support for Altivec with the "--enable-altivec" flag.

The software compiles with the Cell SDK 2.0, and probably with earlier ones as well.



The SPEs are fully IEEE-754 compliant in double precision. In single precision, they only implement round-towards-zero as opposed to the standard round-to-even mode. (The PPE is fully IEEE-754 compliant like all other PowerPC implementations.) Because of the rounding mode, FFTW is less accurate when running on the SPEs than on the PPE. The accuracy loss is hard to quantify in general, but as a rough guideline, the L2 norm of the relative roundoff error for random inputs is 4-8 times larger than that of the corresponding calculation in round-to-even arithmetic. In other words, expect to lose 2 to 3 bits of accuracy (a factor of 4 is 2 bits, a factor of 8 is 3 bits).
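To make the difference between the two rounding modes concrete, here is a minimal C sketch that emulates round-towards-zero in software on an ordinary IEEE-754 host and compares it with the host's default rounding on a repeated single-precision sum. This is a toy illustration only: the helper names are hypothetical, it does not run on or model the SPE hardware, and it does not reproduce FFTW's error measurements. It assumes that float is the 32-bit IEEE-754 format and that the host converts double to float with round-to-nearest-even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFTW: convert d to float,
   rounding towards zero (emulated in software). */
float to_float_toward_zero(double d)
{
    float f = (float)d;   /* host conversion, assumed round-to-nearest-even */

    if ((d > 0.0 && (double)f > d) || (d < 0.0 && (double)f < d)) {
        /* f overshot away from zero: step back by one float.
           IEEE-754 floats are sign-magnitude, so decrementing the
           bit pattern moves the value one step towards zero. */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits -= 1;
        memcpy(&f, &bits, sizeof f);
    }
    return f;
}

/* Add term to an accumulator n times, rounding each partial sum to
   single precision. If truncate is nonzero, round towards zero;
   otherwise use the host's default conversion. */
float repeated_sum(float term, long n, int truncate)
{
    float acc = 0.0f;
    long i;

    for (i = 0; i < n; ++i) {
        /* form the sum in double, then round once to float */
        double wide = (double)acc + (double)term;
        acc = truncate ? to_float_toward_zero(wide) : (float)wide;
    }
    return acc;
}
```

For a positive term, every truncated partial sum is rounded downwards, so the errors all push in the same direction and repeated_sum(term, n, 1) can never exceed repeated_sum(term, n, 0). With round-to-even the individual errors can have either sign and tend to cancel in part, which is the intuition behind the accuracy loss described above.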

FFTW currently does not use any algorithm that degrades accuracy to gain performance on the SPE. One implication of this choice is that large 1D transforms run slower than they would if we were willing to sacrifice another bit or so of accuracy.


These benchmarks show the results of running benchFFT on an IBM Cell Blade and a PlayStation 3. Note that, of the programs benchmarked, only FFTW uses the Cell SPEs.
