
#### 6.4.3 Transposed distributions

Internally, FFTW’s MPI transform algorithms work by first transforming the data local to each process, then globally transposing the data to redistribute it among the processes, then transforming the data that is now local to each process, and finally transposing back. For example, a two-dimensional `n0` by `n1` array, distributed across the `n0` dimension, is transformed by: (i) transforming the `n1` dimension, which is local to each process; (ii) transposing to an `n1` by `n0` array, distributed across the `n1` dimension; (iii) transforming the `n0` dimension, which is now local to each process; (iv) transposing back.

However, many applications can accept a multidimensional DFT whose results are produced in transposed order (e.g., `n1` by `n0` in two dimensions). Doing so provides a significant performance advantage, because the final transposition step (iv) can be omitted. FFTW supports this optimization, which you request by passing the flag `FFTW_MPI_TRANSPOSED_OUT` to the planner routines. To compute the inverse transform of transposed output, you pass `FFTW_MPI_TRANSPOSED_IN` to tell the planner that the input is transposed. In this section, we explain how to interpret the output format of such a transform.

Suppose you are transforming multidimensional data with (at least two) dimensions $n_0 \times n_1 \times n_2 \times \cdots \times n_{d-1}$. As always, it is distributed along the first dimension $n_0$. Now, if we compute its DFT with the `FFTW_MPI_TRANSPOSED_OUT` flag, the resulting output data are stored with the first two dimensions transposed: $n_1 \times n_0 \times n_2 \times \cdots \times n_{d-1}$, distributed along the $n_1$ dimension. Conversely, if we take the $n_1 \times n_0 \times n_2 \times \cdots \times n_{d-1}$ data and transform it with the `FFTW_MPI_TRANSPOSED_IN` flag, then the format goes back to the original $n_0 \times n_1 \times n_2 \times \cdots \times n_{d-1}$ array.
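Concretely, the transposed layout only changes how the first two indices are combined into a row-major offset. The helpers below (`offset_normal` and `offset_transposed` are hypothetical names, and the sketch ignores the distribution across processes) compute where the same logical element $(k_0, k_1, k_2, \ldots)$ lives in each layout.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of logical index k[0..d-1] in an array stored in the
   normal order n[0] x n[1] x ... x n[d-1]. */
static size_t offset_normal(int d, const size_t *n, const size_t *k)
{
    size_t off = 0;
    for (int i = 0; i < d; ++i)
        off = off * n[i] + k[i];
    return off;
}

/* Offset of the same logical element when the first two dimensions are
   transposed, i.e. stored as n[1] x n[0] x n[2] x ... x n[d-1]. */
static size_t offset_transposed(int d, const size_t *n, const size_t *k)
{
    assert(d >= 2);
    size_t off = k[1] * n[0] + k[0]; /* n1 index is now the slowest-varying */
    for (int i = 2; i < d; ++i)
        off = off * n[i] + k[i];     /* remaining dimensions are unchanged */
    return off;
}
```

For a $2 \times 3 \times 4$ array, element $(1, 0, 3)$ is at offset 15 in the normal layout and at offset 7 in the transposed $3 \times 2 \times 4$ layout.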

There are two ways to find the portion of the transposed array that resides on the current process. First, you can simply call the appropriate ‘local_size’ function, passing $n_1 \times n_0 \times n_2 \times \cdots \times n_{d-1}$ (the transposed dimensions). This means calling the ‘local_size’ function twice, once for the transposed and once for the non-transposed dimensions. Alternatively, you can call one of the ‘local_size_transposed’ functions, which return both the non-transposed and the transposed data distribution from a single call. For example, for a 3d transform with transposed output (or input), you might call:

```c
ptrdiff_t fftw_mpi_local_size_3d_transposed(
    ptrdiff_t n0, ptrdiff_t n1, ptrdiff_t n2, MPI_Comm comm,
    ptrdiff_t *local_n0, ptrdiff_t *local_0_start,
    ptrdiff_t *local_n1, ptrdiff_t *local_1_start);
```

Here, `local_n0` and `local_0_start` give the size and starting index of the `n0` dimension for the non-transposed data, as in the previous sections. For transposed data (e.g. the output for `FFTW_MPI_TRANSPOSED_OUT`), `local_n1` and `local_1_start` give the size and starting index of the `n1` dimension, which is the first dimension of the transposed data (`n1` by `n0` by `n2`).
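Combining these values with the row-major layout gives the position of an element inside the local array. The sketch below assumes that each process's local slab is stored contiguously in row-major order, as in the preceding sections; the helper names are hypothetical, and in a real program the `local_n0`, `local_0_start`, `local_n1`, and `local_1_start` arguments would come from the `fftw_mpi_local_size_3d_transposed` call above.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of global element (k0, k1, k2) in this process's slab of the
   non-transposed n0 x n1 x n2 array, which holds the rows
   local_0_start <= k0 < local_0_start + local_n0. */
static ptrdiff_t local_offset(ptrdiff_t n1, ptrdiff_t n2,
                              ptrdiff_t local_n0, ptrdiff_t local_0_start,
                              ptrdiff_t k0, ptrdiff_t k1, ptrdiff_t k2)
{
    assert(k0 >= local_0_start && k0 < local_0_start + local_n0);
    return ((k0 - local_0_start) * n1 + k1) * n2 + k2;
}

/* Offset of the same element in this process's slab of the transposed
   n1 x n0 x n2 array, which holds local_1_start <= k1 < local_1_start + local_n1. */
static ptrdiff_t local_offset_transposed(ptrdiff_t n0, ptrdiff_t n2,
                                         ptrdiff_t local_n1, ptrdiff_t local_1_start,
                                         ptrdiff_t k0, ptrdiff_t k1, ptrdiff_t k2)
{
    assert(k1 >= local_1_start && k1 < local_1_start + local_n1);
    return ((k1 - local_1_start) * n0 + k0) * n2 + k2;
}
```

With `n0 = 4`, `n1 = 6`, `n2 = 2` and the hypothetical values `local_0_start = 2`, `local_n0 = 2`, `local_1_start = 3`, `local_n1 = 3`, global element (3, 4, 1) sits at offset 21 of the non-transposed slab and at offset 15 of the transposed slab.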

(Note that `FFTW_MPI_TRANSPOSED_IN` is completely equivalent to performing `FFTW_MPI_TRANSPOSED_OUT` and passing the first two dimensions to the planner in reverse order, or vice versa. If you pass both the `FFTW_MPI_TRANSPOSED_IN` and `FFTW_MPI_TRANSPOSED_OUT` flags, it is equivalent to swapping the first two dimensions passed to the planner and passing neither flag.)
