FFTW MPI Fortran Interface (FFTW 3.3.10)

6.13 FFTW MPI Fortran Interface

The FFTW MPI interface is callable from modern Fortran compilers supporting the Fortran 2003 iso_c_binding standard for calling C functions. As described in Calling FFTW from Modern Fortran, this means that you can directly call FFTW’s C interface from Fortran with only minor changes in syntax. There are, however, a few things specific to the MPI interface to keep in mind:

Instead of including fftw3.f03 as in Overview of Fortran interface, you should include 'fftw3-mpi.f03' (after use, intrinsic :: iso_c_binding as before). The fftw3-mpi.f03 file includes fftw3.f03, so you should not include them both yourself. (You will also want to include the MPI header file, usually via include 'mpif.h' or similar, although though this is not needed by fftw3-mpi.f03 per se.) (To use the ‘fftwl_’ long double extended-precision routines in supporting compilers, you should include fftw3f-mpi.f03 in addition to fftw3-mpi.f03. See Extended and quadruple precision in Fortran.)
Because of the different storage conventions between C and Fortran, you reverse the order of your array dimensions when passing them to FFTW (see Reversing array dimensions). This is merely a difference in notation and incurs no performance overhead. However, it means that, whereas in C the first dimension is distributed, in Fortran the last dimension of your array is distributed.
In Fortran, communicators are stored as integer types; there is no MPI_Comm type, nor is there any way to access a C MPI_Comm. Fortunately, this is taken care of for you by the FFTW Fortran interface: whenever the C interface expects an MPI_Comm type, you should pass the Fortran communicator as an integer.⁸
Because you need to call the ‘local_size’ function to find out how much space to allocate, and this may be larger than the local portion of the array (see MPI Data Distribution), you should always allocate your arrays dynamically using FFTW’s allocation routines as described in Allocating aligned memory in Fortran. (Coincidentally, this also provides the best performance by guaranteeding proper data alignment.)
Because all sizes in the MPI FFTW interface are declared as ptrdiff_t in C, you should use integer(C_INTPTR_T) in Fortran (see FFTW Fortran type reference).
In Fortran, because of the language semantics, we generally recommend using the new-array execute functions for all plans, even in the common case where you are executing the plan on the same arrays for which the plan was created (see Plan execution in Fortran). However, note that in the MPI interface these functions are changed: fftw_execute_dft becomes fftw_mpi_execute_dft, etcetera. See Using MPI Plans.

For example, here is a Fortran code snippet to perform a distributed L × M complex DFT in-place. (This assumes you have already initialized MPI with MPI_init and have also performed call fftw_mpi_init.)

  use, intrinsic :: iso_c_binding
  include 'fftw3-mpi.f03'
  integer(C_INTPTR_T), parameter :: L = ...
  integer(C_INTPTR_T), parameter :: M = ...
  type(C_PTR) :: plan, cdata
  complex(C_DOUBLE_COMPLEX), pointer :: data(:,:)
  integer(C_INTPTR_T) :: i, j, alloc_local, local_M, local_j_offset

!   get local data size and allocate (note dimension reversal)
  alloc_local = fftw_mpi_local_size_2d(M, L, MPI_COMM_WORLD, &
                                       local_M, local_j_offset)
  cdata = fftw_alloc_complex(alloc_local)
  call c_f_pointer(cdata, data, [L,local_M])

!   create MPI plan for in-place forward DFT (note dimension reversal)
  plan = fftw_mpi_plan_dft_2d(M, L, data, data, MPI_COMM_WORLD, &
                              FFTW_FORWARD, FFTW_MEASURE)

! initialize data to some function my_function(i,j)
  do j = 1, local_M
    do i = 1, L
      data(i, j) = my_function(i, j + local_j_offset)
    end do
  end do

! compute transform (as many times as desired)
  call fftw_mpi_execute_dft(plan, data, data)

  call fftw_destroy_plan(plan)
  call fftw_free(cdata)

Note that when we called fftw_mpi_local_size_2d and fftw_mpi_plan_dft_2d with the dimensions in reversed order, since a L × M Fortran array is viewed by FFTW in C as a M × L array. This means that the array was distributed over the M dimension, the local portion of which is a L × local_M array in Fortran. (You must not use an allocate statement to allocate an L × local_M array, however; you must allocate alloc_local complex numbers, which may be greater than L * local_M, in order to reserve space for intermediate steps of the transform.) Finally, we mention that because C’s array indices are zero-based, the local_j_offset argument can conveniently be interpreted as an offset in the 1-based j index (rather than as a starting index as in C).

If instead you had used the ior(FFTW_MEASURE, FFTW_MPI_TRANSPOSED_OUT) flag, the output of the transform would be a transposed M × local_L array, associated with the same cdata allocation (since the transform is in-place), and which you could declare with:

  complex(C_DOUBLE_COMPLEX), pointer :: tdata(:,:)
  ...
  call c_f_pointer(cdata, tdata, [M,local_L])

where local_L would have been obtained by changing the fftw_mpi_local_size_2d call to:

  alloc_local = fftw_mpi_local_size_2d_transposed(M, L, MPI_COMM_WORLD, &
                           local_M, local_j_offset, local_L, local_i_offset)

Footnotes

(8)

Technically, this is because you aren’t actually calling the C functions directly. You are calling wrapper functions that translate the communicator with MPI_Comm_f2c before calling the ordinary C interface. This is all done transparently, however, since the fftw3-mpi.f03 interface file renames the wrappers so that they are called in Fortran with the same names as the C interface functions.