One-sided communication: functions

Questions

  • What functions should you use for RMA?

Objectives

  • Learn how to create memory windows.

  • Learn how to access remote memory windows.

RMA anatomy

One-sided communication in MPI is achieved in three steps, which map onto three sets of functions:

Windows

Make memory available on each process for remote memory accesses. We use memory windows, which are objects of type MPI_Win providing handles to remotely-accessible memory. MPI provides four collective routines for the creation of memory windows:

  • MPI_Win_create exposes memory that has already been allocated.

  • MPI_Win_allocate allocates new memory and exposes it in a single call.

  • MPI_Win_allocate_shared allocates memory that processes on the same node can also access with direct loads and stores.

  • MPI_Win_create_dynamic creates a window with no memory attached; memory is attached later with MPI_Win_attach.

A handle of type MPI_Win manages memory made available for remote operations on all ranks in the communicator. Memory windows must be explicitly freed after use with MPI_Win_free.

Load/store

Load, store, and transform data in remote windows. We can identify an origin and a target process. In contrast with two-sided communication, the origin process fully specifies the data transfer: where the data comes from and where it is going to. There are three main groups of MPI routines for this purpose:

  • Put routines, such as MPI_Put, which store data into the remote window.

  • Get routines, such as MPI_Get, which load data from the remote window.

  • Accumulate routines, such as MPI_Accumulate, which update data in the remote window with a reduction operation.
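For orientation, here is a minimal sketch of an MPI_Put call; the buffer, target rank, and window names are placeholders, and the call must happen inside a synchronization epoch (see below). MPI_Get takes the same arguments, with the origin buffer receiving the data instead.

// sketch: write 10 doubles from the local send_buf into the window of
// process target, starting at displacement 0 in the target window
MPI_Put(send_buf,        // origin address: where the data comes from
        10, MPI_DOUBLE,  // origin count and datatype
        target,          // rank of the target process
        0,               // displacement into the target window
        10, MPI_DOUBLE,  // target count and datatype
        win);            // window object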

Synchronization

Ensure that the data is available for remote memory accesses. The load/store routines are non-blocking and the programmer must take care that subsequent accesses are safe and correct. How synchronization is achieved depends on the one-sided communication paradigm adopted:

  • Active if both origin and target processes play a role in the synchronization. Conceptually, this is closely related to the message passing model of parallel computation.

  • Passive if the origin process orchestrates data transfer and synchronization. Conceptually, this is closely related to the shared memory model of parallel computation: the window is the shared memory in the communicator and every process can operate on it, seemingly independently of each other.

There are three sets of synchronization routines currently available in MPI:

  • Fence synchronization with MPI_Win_fence (active target).

  • Post/start/complete/wait synchronization with MPI_Win_post, MPI_Win_start, MPI_Win_complete, and MPI_Win_wait (active target).

  • Lock/unlock synchronization with MPI_Win_lock and MPI_Win_unlock (passive target).

We will discuss synchronization further in the next episode One-sided communication: synchronization.
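As a quick preview, here is a minimal sketch of active target synchronization with MPI_Win_fence; the window win and buffer send_buf are placeholder names, assumed to be set up as in the examples below.

// open an epoch on every process in the communicator
MPI_Win_fence(0, win);

if (rank == 1) {
  // rank 1 stores 10 doubles into rank 0's window during the epoch
  MPI_Put(send_buf, 10, MPI_DOUBLE, 0, 0, 10, MPI_DOUBLE, win);
}

// close the epoch: the data is now visible in rank 0's window
MPI_Win_fence(0, win);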

[Figure: E02-RMA_timeline-coarse.svg]

The timeline of window creation, calls to RMA routines, and synchronization in an application using MPI one-sided communication. The creation of MPI_Win objects on each process in the communicator allows the execution of RMA routines. Each access to the window must be synchronized to ensure the safety and correctness of the application. Note that any interaction with the memory window must be protected by calls to synchronization routines, even local load/store operations and two-sided communication. The events in between synchronization calls are said to happen within epochs.

Non-blocking vs. RMA

At first glance, one-sided and non-blocking communication appear similar. The key difference lies in the mechanism used for synchronization.
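A minimal sketch of the contrast, assuming two processes, a rank variable obtained with MPI_Comm_rank, and a buffer buf and window win set up as in the rest of this episode: in the non-blocking case each rank completes its own request, while in the one-sided case only the origin issues the transfer and completion is tied to synchronization on the window.

// two-sided, non-blocking: both ranks are involved and each completes its
// own request explicitly with MPI_Wait
MPI_Request req;
if (rank == 0) {
  MPI_Isend(buf, 10, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
  MPI_Wait(&req, MPI_STATUS_IGNORE);
} else if (rank == 1) {
  MPI_Irecv(buf, 10, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
  MPI_Wait(&req, MPI_STATUS_IGNORE);
}

// one-sided: only the origin (rank 0) issues the transfer; completion is
// tied to synchronization on the window, not to a per-message request
MPI_Win_fence(0, win);
if (rank == 0) {
  MPI_Put(buf, 10, MPI_DOUBLE, 1, 0, 10, MPI_DOUBLE, win);
}
MPI_Win_fence(0, win);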

Window creation

The creation of MPI_Win objects is a collective operation: each process in the communicator will reserve the specified memory for remote memory accesses.

We can expose an array of 10 doubles for RMA with:

// allocate window
double *buf;
MPI_Win win;
MPI_Win_allocate((MPI_Aint)(10 * sizeof(double)), sizeof(double),
                 MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);

// do something with win

// free window and the associated memory
MPI_Win_free(&win);

What if we want to create a window from memory that we allocate ourselves? We can use MPI_Win_create, and we advise allocating the memory with MPI_Alloc_mem:

// allocate memory
double *buf;
MPI_Alloc_mem((MPI_Aint)(10 * sizeof(double)), MPI_INFO_NULL, &buf);

// create window
MPI_Win win;
MPI_Win_create(buf, (MPI_Aint)(10 * sizeof(double)), sizeof(double),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

// do something with win

// free window
MPI_Win_free(&win);

// free memory
MPI_Free_mem(buf);

You must explicitly call MPI_Free_mem to deallocate memory obtained with MPI_Alloc_mem.

Note

The memory window is usually a single array: the size of the window then coincides with the size of the array in bytes. If the base type of the array is a simple type, the displacement unit is the size of that type, e.g. sizeof(double) for an array of double. Otherwise, you should use a displacement unit of 1.
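For the latter case, a hypothetical sketch: a window over an array of a compound type, where we set the displacement unit to 1 and compute displacements in bytes ourselves.

// hypothetical compound element type: not a simple MPI datatype, so we use
// a displacement unit of 1 (bytes) when exposing the array
typedef struct {
  double value;
  int tag;
} item_t;

item_t *items;
MPI_Win win;
MPI_Win_allocate((MPI_Aint)(10 * sizeof(item_t)), 1 /* displacement unit: bytes */,
                 MPI_INFO_NULL, MPI_COMM_WORLD, &items, &win);

// an RMA access to element i then uses (MPI_Aint)(i * sizeof(item_t))
// as the target displacement

MPI_Win_free(&win);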

Window creation

Let’s look again at the initial example in the type-along, where we published an already-allocated buffer as a memory window. Use the examples above to figure out how to switch to MPI_Win_allocate.

You can find a scaffold for the code in the content/code/day-3/02_rma-win-allocate folder. A working solution is in the solution subfolder.

RMA operations

Using MPI_Put

Reorganize the sample code of the previous exercise such that rank 1 stores values into rank 0's memory window with MPI_Put, rather than rank 0 loading them with MPI_Get.

You can find a scaffold for the code in the content/code/day-3/03_rma-put folder. A working solution is in the solution subfolder.

Using MPI_Accumulate

You can find a scaffold for the code in the content/code/day-3/04_rma-accumulate folder. Follow the prompts and complete the function calls to:

  1. Create a window object from an allocated buffer:

    int buffer = 42;
    
  2. Let each process accumulate its rank in the memory window of the process with rank 0. We want to obtain the sum of the accumulated values.

With 2 processes, you should get the following output to screen:

[MPI process 0] Value in my window_buffer before MPI_Accumulate: 42.
[MPI process 1] I accumulate data 1 in MPI process 0 window via MPI_Accumulate.
[MPI process 0] Value in my window_buffer after MPI_Accumulate: 43.

A working solution is in the solution subfolder.
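If you want to check your reasoning, the central call looks roughly like the sketch below; the window name and the fence synchronization are assumptions, not necessarily what the scaffold uses.

int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

MPI_Win_fence(0, window);
if (my_rank != 0) {
  // add my rank into the single int exposed by rank 0's window
  MPI_Accumulate(&my_rank, 1, MPI_INT,  // origin: one int holding my rank
                 0,                     // target rank
                 0, 1, MPI_INT,         // target displacement, count, datatype
                 MPI_SUM,               // reduction operation
                 window);
}
MPI_Win_fence(0, window);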

Describe the sequence of MPI calls connecting the before and after schemes.

  1. [Figure: E02-win_allocate.svg]
    1. Window creation with MPI_Win_allocate.

    2. Window creation with MPI_Win_create followed by MPI_Alloc_mem.

    3. Dynamic window creation with MPI_Win_create_dynamic.

    4. Memory allocation with MPI_Alloc_mem followed by window creation MPI_Win_create.

  2. [Figure: E02-win_create_put.svg]
    1. Window creation with MPI_Win_allocate and MPI_Get from origin process 2 to target process 1.

    2. Window creation with MPI_Win_create_dynamic and MPI_Put from origin process 1 to target process 2.

    3. Window creation with MPI_Win_create and MPI_Get from origin process 1 to target process 2.

    4. Window creation with MPI_Win_create and MPI_Put from origin process 2 to target process 1.

Note

There are other routines for RMA operations. We give here a list without going into details:

Request-based variants

These routines return a handle of type MPI_Request and their completion can be waited on with MPI_Wait; see the sketch after this list.

  • MPI_Rget

  • MPI_Rput

  • MPI_Raccumulate

  • MPI_Rget_accumulate
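A minimal sketch of a request-based put, assuming a window win and a buffer buf set up as in the earlier examples. Request-based RMA operations are only valid within passive target epochs, hence the lock/unlock pair.

MPI_Request req;

MPI_Win_lock(MPI_LOCK_SHARED, 1 /* target rank */, 0, win);
MPI_Rput(buf, 10, MPI_DOUBLE,        // origin buffer, count, datatype
         1, 0, 10, MPI_DOUBLE,       // target rank, displacement, count, datatype
         win, &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);   // local completion of this transfer
MPI_Win_unlock(1, win);              // ensure completion at the target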

Specialized accumulation variants

These functions perform specialized accumulations, but are conceptually similar to MPI_Accumulate; a sketch with MPI_Fetch_and_op follows the list.

  • MPI_Get_accumulate

  • MPI_Fetch_and_op

  • MPI_Compare_and_swap
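For instance, a minimal sketch of MPI_Fetch_and_op, assuming a window win exposing a single int on rank 0: it atomically adds one to that value and fetches its previous value.

int one = 1, old;

MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
MPI_Fetch_and_op(&one, &old, MPI_INT,  // add `one`, return previous value in `old`
                 0,                    // target rank
                 0,                    // target displacement
                 MPI_SUM, win);
MPI_Win_unlock(0, win);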

See also

  • The lecture covering MPI RMA from EPCC is available here

  • Chapter 3 of Using Advanced MPI by William Gropp et al. [GHTL14]

Keypoints

  • The MPI model for remote memory accesses.

  • Window objects and memory windows.

  • Timeline of RMA and the importance of synchronization.