Derived datatypes: MPI_Datatype
Questions
- How can you use your own derived datatypes as content of messages?
Objectives
- Understand how MPI handles datatypes.
- Learn to send and receive messages using composite datatypes.
- Learn how to represent homogeneous collections as MPI datatypes.
- Learn how to represent your own derived datatypes as MPI datatypes.
The ability to define custom datatypes is one of the hallmarks of a modern programming language, since it allows programmers to structure their code in a way that enhances readability and maintainability. How can this be done in MPI? Recall that MPI is a standard describing a library to enable parallel programming in the message passing model.
In the C language, types are primitive constructs: they are defined by the standard and enforced by the compiler. The MPI types are instead variants of the MPI_Datatype enumeration: to the compiler, they all appear as the same type. This is a fundamental difference which influences the way custom datatypes are handled.
In the C language, you would declare a struct such as the following:
typedef struct {
    int first;
    char second;
} Pair;
Pair is a new type. From the compiler's point of view, it has status on par with the fundamental datatypes introduced above. The C standard sets requirements on how to represent this in memory, and the compiler will generate machine code that complies with them.
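You can already inspect the compiler's layout choices. The following is a minimal sketch (our own illustration, not part of the lesson code) using sizeof and offsetof:
#include <stddef.h> /* offsetof */
#include <stdio.h>

typedef struct {
    int first;
    char second;
} Pair;

int main(void) {
    /* on most architectures this prints 0, 4, and 8: the compiler
       pads Pair to a multiple of the alignment of int */
    printf("offsetof(Pair, first)  = %zu\n", offsetof(Pair, first));
    printf("offsetof(Pair, second) = %zu\n", offsetof(Pair, second));
    printf("sizeof(Pair)           = %zu\n", sizeof(Pair));
    return 0;
}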
MPI does not know how to represent user-defined datatypes in memory by itself:
- How much memory does the datatype need? Recall that MPI deals with groups of processes. For portability, you can never assume that two processes share the same architecture!
- How are the components of Pair laid out in memory? Are they always contiguous? Or are they padded?
The programmer needs to provide this low-level information, such that the MPI runtime can send and receive custom datatypes as messages over a heterogeneous network of processes.
Representation of datatypes in MPI
The representation of datatypes in MPI uses a few low-level concepts. The type signature of a custom datatype is the list of its basic datatypes:
\[\textrm{Typesig}[\texttt{T}] = [\textrm{Datatype}_{0}, \ldots, \textrm{Datatype}_{n-1}]\]
The typemap is the associative array (map) with datatypes, as understood by MPI, as keys and displacements, in bytes, as values:
\[\textrm{Typemap}[\texttt{T}] = \{\textrm{Datatype}_{0}: \textrm{Displacement}_{0}, \ldots, \textrm{Datatype}_{n-1}: \textrm{Displacement}_{n-1}\}\]
The displacements are relative to the buffer the datatype describes.
Assuming that an int
takes 4 bytes of memory, the typemap for our Pair
datatype would be: \(\textrm{Typemap}[\texttt{Pair}] = \{ \texttt{int}: 0,
\texttt{char}: 4\}\). Note again that the displacements are relative.
Knowledge of the typemap and type signature is not enough for a full description of the type to the MPI runtime: the underlying programming language might mandate architecture-specific alignment of the basic datatypes. The data structure would then be laid out in memory inconsistently with the displacements in its typemap. We need a few more concepts. Given a typemap \(m\) we can define:
- Lower bound
  The first byte occupied by the datatype.
  \[\textrm{LB}[m] = \min_{j}[\textrm{Displacement}_{j}]\]
- Upper bound
  The last byte occupied by the datatype.
  \[\textrm{UB}[m] = \max_{j}[\textrm{Displacement}_{j} + \texttt{sizeof}(\textrm{Datatype}_{j})] + \textrm{Padding}\]
- Extent
  The amount of memory needed to represent the datatype, taking into account architecture-specific alignment.
  \[\textrm{Extent}[m] = \textrm{UB}[m] - \textrm{LB}[m]\]
The C language (and Fortran) require that data occurs in memory at well-defined addresses: the data needs to be aligned. The address, in bytes, of any item must be a multiple of the size of that item in bytes. This is the so-called natural alignment.
For our Pair data structure, the first element is an int and occupies 4 bytes. An int aligns to 4-byte boundaries: when allocating a new int in memory, the compiler will insert padding to reach the alignment boundary. The second field is a char and requires just 1 byte: it can sit right after the int, so a single Pair occupies 5 bytes of data.
To insert yet another Pair item right after it, as in an array, we first need to reach the alignment boundary of int with a padding of 3 bytes. Thus the extent of Pair is 8 bytes, while its size is only 5 bytes.
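Plugging the numbers into the definitions above (a worked example, assuming a 4-byte int and natural alignment):
\[\textrm{LB}[\texttt{Pair}] = \min(0, 4) = 0\]
\[\textrm{UB}[\texttt{Pair}] = \max(0 + 4, 4 + 1) + 3 = 8\]
\[\textrm{Extent}[\texttt{Pair}] = 8 - 0 = 8\]
The size, by contrast, is just \(4 + 1 = 5\) bytes.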
Which of the following statements about the size and extent of an MPI datatype is true?
1. The size is always greater than the extent.
2. The size and extent can be equal.
3. The extent is always greater than the size.
4. None of the above.
Solution
The size and extent can be equal (option 2) when no padding is required. It's best not to rely on this even when it is true, because your code, compiler, or MPI library can change.
MPI offers functions to query the extent and size of its types: they all take a variant of the MPI_Datatype enumeration as argument.
Returns the lower bound and extent of a type:
int MPI_Type_get_extent(MPI_Datatype type,
                        MPI_Aint *lb,
                        MPI_Aint *extent)
Parameters
- type: The datatype whose extent we're querying.
- lb: The lower bound of the datatype. MPI_Aint is a type designed to hold any valid address.
- extent: The extent of the datatype. MPI_Aint is a type designed to hold any valid address.
Returns the number of bytes occupied by entries in the datatype:
int MPI_Type_size(MPI_Datatype type,
                  int *size)
Parameters
- type: The datatype whose size we're querying.
- size: The number of bytes occupied by the entries in the datatype.
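As a minimal sketch of how the two queries combine inside an initialized MPI program (variable names are our own):
MPI_Aint lb, extent;
int size;
MPI_Type_get_extent(MPI_INT, &lb, &extent);
MPI_Type_size(MPI_INT, &size);
/* for a basic type such as MPI_INT, extent and size typically coincide */
printf("MPI_INT: lb = %ld; extent = %ld; size = %d\n", (long)lb, (long)extent, size);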
Extents and sizes
We will now play around a bit with the compiler and MPI to gain further understanding of padding, alignment, extents, and sizes.
What are the extents and sizes of the basic datatypes char, int, float, and double on your architecture? Do the numbers conform to your expectations? What is the result of sizeof for these types?
// char
printf("sizeof(char) = %ld\n", sizeof(char));
MPI_Type_get_extent(MPI_CHAR, &.., &..);
MPI_Type_size(MPI_CHAR, &..);
printf("For MPI_CHAR:\n lowerbound = %ld; extent = %ld; size = %d\n", .., .., ..);
You can find the file with the complete source code in the content/code/day-1/03_basic-extent-size/solution folder.
Let's now look at the Pair data structure. We first need to declare the data structure to MPI. The following code, which we will study in much detail later on, achieves the purpose:
// build up the typemap for Pair
// the type signature for Pair
MPI_Datatype typesig[2] = {MPI_INT, MPI_CHAR};
// how many of each type in a "block" of Pair
int block_lengths[2] = {1, 1};
// displacements of data members in Pair
MPI_Aint displacements[2];
// why not use pointer arithmetic directly?
MPI_Get_address(&my_pair.first, &displacements[0]);
MPI_Get_address(&my_pair.second, &displacements[1]);
// create and commit the new type
MPI_Datatype mpi_pair;
MPI_Type_create_struct(2, block_lengths, displacements, typesig, &mpi_pair);
MPI_Type_commit(&mpi_pair);
What are the size and the extent? Do they match up with our pen-and-paper calculation? Try different combinations of datatypes and adding other fields to the struct.
You can find the file with the complete source code in the content/code/day-1/04_struct-extent-size/solution folder.
Extents and the count parameter
Let us reiterate: the extent of a custom datatype is not its size. The extent tells the MPI runtime how to get to the next item in an array of a given type, much like a stride.
We can send an array of n ints with a single MPI_Send:
if (rank == 0) {
    fprintf(stdout, "rank %d send\n", rank);
    for (int i = 0; i < SIZE; ++i) {
        fprintf(stdout, "buffer[%d] = %d\n", i, buffer[i]);
    }
    MPI_Send(buffer, SIZE, MPI_INT, 1, 0, comm);
} else {
    MPI_Recv(buffer, SIZE, MPI_INT, 0, 0, comm, &status);
    fprintf(stdout, "rank %d recv\n", rank);
    for (int i = 0; i < SIZE; ++i) {
        fprintf(stdout, "buffer[%d] = %d\n", i, buffer[i]);
    }
}
or with n such calls:
if (rank == 0) {
    for (int i = 0; i < SIZE; ++i) {
        fprintf(stdout, "rank %d send: buffer[%d] = %d\n", rank, i, buffer[i]);
        /* extent is in bytes, so cast to char* to step byte-wise */
        MPI_Send((char *)buffer + (i * extent), 1, MPI_INT, 1, 0, comm);
    }
} else {
    for (int i = 0; i < SIZE; ++i) {
        MPI_Recv((char *)buffer + (i * extent), 1, MPI_INT, 0, 0, comm, &status);
        fprintf(stdout, "rank %d recv: buffer[%d] = %d\n", rank, i,
                buffer[i]);
    }
}
In the latter case, we must program explicitly how to get the next element in the array by using the extent of the datatype.
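For completeness, here is a sketch of how the extent used above might be obtained (our own reconstruction; the complete sources live in the lesson's solution folders):
MPI_Aint lb, extent;
MPI_Type_get_extent(MPI_INT, &lb, &extent); /* extent of MPI_INT, in bytes */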
Any type you like: datatype constructors in MPI
The typemap concept allows us to provide a low-level description of any compound
datatype. The class of functions MPI_Type_*
offers facilities for portable type
manipulations in the MPI standard.
At a glance, each custom datatype goes through a well-defined lifecycle in an MPI application (see the sketch after this list):
1. We construct our new datatype with a type constructor. The new type will be a variable of MPI_Datatype type.
2. We publish our new type to the runtime with MPI_Type_commit.
3. We use the new type in any of the MPI communication routines, as needed.
4. We free the new type from memory with MPI_Type_free.
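In code, the lifecycle might look as follows (a minimal sketch; buffer, comm, and the destination rank are assumed to exist):
MPI_Datatype newtype;
MPI_Type_contiguous(4, MPI_INT, &newtype); // 1. construct: 4 contiguous ints
MPI_Type_commit(&newtype);                 // 2. publish to the runtime
MPI_Send(buffer, 1, newtype, 1, 0, comm);  // 3. use in communication
MPI_Type_free(&newtype);                   // 4. free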
It is not always necessary to go all the way down to a typemap to construct new datatypes in MPI. The following types can be created with convenience functions, side-stepping the explicit computation of a typemap. In MPI nomenclature, these types are:
- Contiguous
  A homogeneous collection of a given datatype. The returned new type will describe a collection of count times the old type. Elements are contiguous: \(n\) and \(n-1\) are separated by the extent of the old type.
  int MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
- Vector
  A slight generalization of the contiguous type: the count elements in the new type can be separated by a stride that is an arbitrary multiple of the extent of the old type (see the sketch after this list).
  int MPI_Type_vector(int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype)
- Hvector
  Yet another generalization of the contiguous datatype. The separation between elements in a hvector is expressed in bytes, rather than as a multiple of the extent.
  int MPI_Type_create_hvector(int count, int blocklength, MPI_Aint stride, MPI_Datatype oldtype, MPI_Datatype *newtype)
- Indexed
  This type allows non-homogeneous separations between the elements. Each displacement is intended as a multiple of the extent of the old type.
  int MPI_Type_indexed(int count, const int array_of_blocklengths[], const int array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype)
- Hindexed
  This is a generalization of the indexed type, analogous to the hvector. The non-homogeneous separations between the elements are expressed in bytes, rather than as multiples of the extent.
  int MPI_Type_create_hindexed(int count, const int array_of_blocklengths[], const MPI_Aint array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype)
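As an example of the vector constructor, here is a sketch (our own, with hypothetical ROWS and COLS constants) describing one column of a row-major integer matrix:
#define ROWS 4
#define COLS 5

int matrix[ROWS][COLS];

// one column: ROWS blocks of 1 int each, consecutive elements
// separated by a stride of COLS ints
MPI_Datatype column_t;
MPI_Type_vector(ROWS, 1, COLS, MPI_INT, &column_t);
MPI_Type_commit(&column_t);

// e.g. send column 2 to rank 1 (assuming two ranks in MPI_COMM_WORLD)
MPI_Send(&matrix[0][2], 1, column_t, 1, 0, MPI_COMM_WORLD);

MPI_Type_free(&column_t);
Note that MPI_Type_contiguous(count, oldtype, &newtype) is equivalent to MPI_Type_vector(count, 1, 1, oldtype, &newtype).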
Before using the output parameter newtype, it needs to be "published" to the runtime with MPI_Type_commit:
int MPI_Type_commit(MPI_Datatype *type)
newtype is a variable of type MPI_Datatype. The programmer must ensure proper release of the memory used at the end of the program by calling MPI_Type_free:
int MPI_Type_free(MPI_Datatype *type)
In practice, it may be that none of the previous convenience constructors suits your application. As we glimpsed in a previous challenge, the general type constructor MPI_Type_create_struct will then suit your needs:
int MPI_Type_create_struct(int count,
const int array_of_block_lengths[],
const MPI_Aint array_of_displacements[],
const MPI_Datatype array_of_types[],
MPI_Datatype *newtype)
Parameters
- count: Number of fields (blocks in MPI nomenclature) of the datatype. This is the length of the array_of_block_lengths, array_of_displacements, and array_of_types parameters.
- array_of_block_lengths: Number of elements in each field of the datatype.
- array_of_displacements: Displacements, in bytes, for each field of the datatype.
- array_of_types: Types for each field of the datatype, i.e. the type signature.
- newtype: The new datatype.
The MPI version of the Pair datatype
We saw code for this earlier on, but without explanation. Let's dive into it now!
You can find the file with the complete source code in the content/code/day-1/04_struct-extent-size/solution folder.
Pair has two fields, hence count = 2 in the call to MPI_Type_create_struct. All array arguments to this function will have length 2.
The type signature is:
MPI_Datatype typesig[2] = {MPI_INT, MPI_CHAR};
We have one int in the first field and one char in the second field, hence the array_of_block_lengths argument is:
int block_lengths[2] = {1, 1};
The calculation of displacements is slightly more involved. We will use MPI_Get_address to fill the displacements array. Notice that its elements are of type MPI_Aint:
MPI_Aint displacements[2];
MPI_Get_address(&my_pair.first, &displacements[0]);
MPI_Get_address(&my_pair.second, &displacements[1]);
We cannot use pointer arithmetic to compute displacements. Always keep in mind that your program might be deployed on heterogeneous architectures: you have to program for correctness and portability.
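Note that MPI_Get_address stores absolute addresses in displacements. A common refinement, not shown in the lesson code, is to convert them to offsets relative to the start of the struct with MPI_Aint_diff (available since MPI 3.1):
displacements[1] = MPI_Aint_diff(displacements[1], displacements[0]);
displacements[0] = 0;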
We are now ready to call the type constructor and commit our type:
MPI_Datatype mpi_pair;
MPI_Type_create_struct(2, block_lengths, displacements, typesig, &mpi_pair);
MPI_Type_commit(&mpi_pair);
And clean up after use, of course!
MPI_Type_free(&mpi_pair);
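To round off, a sketch of how the committed type might be used between two ranks (our own example; it assumes the displacements were made relative to the start of the struct, as noted above):
Pair my_pair = {42, 'x'};

if (rank == 0) {
    MPI_Send(&my_pair, 1, mpi_pair, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {
    MPI_Recv(&my_pair, 1, mpi_pair, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank 1 received: first = %d, second = %c\n", my_pair.first, my_pair.second);
}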
See also
- The lecture covering MPI datatypes from EPCC is available on GitHub
- Chapter 5 of the Using MPI book by William Gropp et al. [GLS14]
- Chapter 6 of the Parallel Programming with MPI book by Peter Pacheco [Pac97]
Keypoints
- Typemaps are essential to enable MPI communication of complex datatypes.
- MPI offers many type constructors to portably use your own datatypes in message passing.
- Usage of the type constructors can be quite involved, but it ensures your programs will be portable.