Data environment

Objectives

Understand explicit and implicit data movement
Understand structured and unstructured data clauses
Understand different mapping types

Data mapping

Because the host and the device have distinct memory spaces, data must be transferred between them. OpenMP uses a combination of explicit and implicit data mapping to do this.

The MAP clause on a device construct explicitly specifies how items are mapped from the host to the device data environment. Commonly mapped items are arrays (array sections), scalars, pointers, and structure elements. The various forms of the map clause are summarised in the following table:

`map([map-type]:list)`	map clause
`map(to:list)`	On entering the region, variables in the list are initialized on the device using the original values from the host
`map(from:list)`	At the end of the target region, the values of the variables in the list are copied into the original variables on the host. On entering the region, the variables on the device are not initialized
`map(tofrom:list)`	Combines the effects of `map(to:list)` and `map(from:list)`
`map(alloc:list)`	On entering the region, data is allocated on the device but left uninitialized
`map(list)`	Equivalent to `map(tofrom:list)`

If variables are not explicitly mapped, the compiler maps them implicitly:

Since OpenMP 4.5, a scalar is mapped as firstprivate, so its value is not copied back to the host
a non-scalar variable is mapped with map-type tofrom
a C/C++ pointer is mapped as a zero-length array section
note that only the pointer value is mapped, not the data it points to

Note

When mapping data arrays or pointers, be careful about the array section notation:

In C/C++: array[lower-bound:length]. The notation :N is equivalent to 0:N.
In Fortran: array(lower-bound:upper-bound). The notation :N is equivalent to 1:N.

Exercise04: explicit and implicit data mapping

Explicitly add the map clauses for data transfer between the host and the device
Offload the part that “calculates the sum”

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

/* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
     vecA[i] = pow(r, i);
     vecB[i] = 1.0;
  }

/* dot product of two vectors */
  #pragma omp target teams distribute
  for (int i = 0; i < NX; i++) {
     vecC[i] = vecA[i] * vecB[i];
  }

  double sum = 0.0;
  /* calculate the sum */
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
program dotproduct
  implicit none

  integer, parameter :: nx = 102400
  real, parameter :: r=0.2

  real, dimension(nx) :: vecA,vecB,vecC
  real    :: sum
  integer :: i

  ! Initialization of vectors
  do i = 1, nx
     vecA(i) = r**(i-1)
     vecB(i) = 1.0
  end do

  ! Dot product of two vectors 
  !$omp target teams distribute 
  do i = 1, nx
     vecC(i) =  vecA(i) * vecB(i)
  end do
  !$omp end target teams distribute

  sum = 0.0
  ! Calculate the sum 
  do i = 1, nx
     sum =  vecC(i) + sum
  end do

  write(*,*) 'The sum is: ', sum

end program dotproduct

Solution

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

/* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
     vecA[i] = pow(r, i);
     vecB[i] = 1.0;
  }

/* dot product of two vectors */
  #pragma omp target teams distribute map(from:vecC[0:NX]) map(to:vecA[0:NX],vecB[0:NX])
  for (int i = 0; i < NX; i++) {
     vecC[i] = vecA[i] * vecB[i];
  }

  double sum = 0.0;
  /* calculate the sum */
  #pragma omp target map(tofrom:sum)
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
program dotproduct
  implicit none

  integer, parameter :: nx = 102400
  real, parameter :: r=0.2

  real, dimension(nx) :: vecA,vecB,vecC
  real    :: sum
  integer :: i

  ! Initialization of vectors
  do i = 1, nx
     vecA(i) = r**(i-1)
     vecB(i) = 1.0
  end do

  ! Dot product of two vectors 
  !$omp target teams distribute map(from:vecC) map(to:vecA,vecB) 
  do i = 1, nx
     vecC(i) =  vecA(i) * vecB(i)
  end do
  !$omp end target teams distribute

  sum = 0.0
  ! Calculate the sum
  !$omp target map(tofrom:sum)
  do i = 1, nx
     sum =  vecC(i) + sum
  end do
  !$omp end target
  write(*,*) 'The sum is: ', sum

end program dotproduct

Data region

The ways in which storage is created on the device, data is transferred, and storage is removed again are classified into two categories: structured data regions and unstructured data regions.

Structured Data Regions

The TARGET DATA construct creates a structured data region, which is convenient for keeping data persistent on the device so that it can be used by subsequent target constructs.

Syntax

#pragma omp target data clause [clauses]
      structured-block

clause:
if( [target data:]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
use_device_ptr(list)

!$omp target data clause [clauses]
        structured-block
!$omp end target data

clause:
if( [target data:]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
use_device_ptr(list)

Unstructured Data Regions

The TARGET DATA construct, however, is often inconvenient in real applications because the data region is tied to a single structured block. The unstructured data constructs (TARGET ENTER DATA and TARGET EXIT DATA) offer much more freedom: data can be created on and deleted from the device at any appropriate point.

Syntax

#pragma omp target enter data [clauses]

#pragma omp target exit data [clauses]

clause:
if(scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
depend(dependence-type:list)
nowait

!$omp target enter data [clauses]

!$omp target exit data [clauses]

clause:
if(scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
depend(dependence-type:list)
nowait

Keypoints

Structured Data Region

start and end points within a single subroutine
Memory exists within the data region

Unstructured Data Region

multiple start and end points across different subroutines
Memory exists until explicitly deallocated

TARGET UPDATE construct

The TARGET UPDATE construct is used to keep a variable consistent between the host and the device. Data can be updated within a target data region, with the transfer direction specified in the clause.

Syntax

#pragma omp target update [clause]

clause is motion-clause or one of:
if(scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend(dependence-type:list)

motion-clause:
to(list)
from(list)

!$omp target update clause

clause is motion-clause or one of:
if(scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend(dependence-type:list)

motion-clause:
to(list)
from(list)

Exercise05: TARGET DATA structured region

Create a data region using TARGET DATA and add map clauses for data transfer.

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

/* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
     vecA[i] = pow(r, i);
     vecB[i] = 1.0;
  }

/* dot product of two vectors */
     #pragma omp target
     for (int i = 0; i < NX; i++) {
        vecC[i] = vecA[i] * vecB[i];
     }

/* Initialization of vectors again */
     for (int i = 0; i < NX; i++) {
        vecA[i] = 1.0;
        vecB[i] = 1.0;
     }

     #pragma omp target 
     for (int i = 0; i < NX; i++) {
        vecC[i] = vecC[i] + vecA[i] * vecB[i];
     }
  double sum = 0.0;
  /* calculate the sum */
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
program dotproduct
  implicit none

  integer, parameter :: nx = 102400
  real, parameter :: r=0.2

  real, dimension(nx) :: vecA,vecB,vecC
  real    :: sum
  integer :: i

  ! Initialization of vectors
  do i = 1, nx
     vecA(i) = r**(i-1)
     vecB(i) = 1.0
  end do

  ! Dot product of two vectors 
  !$omp target 
  do i = 1, nx
     vecC(i) =  vecA(i) * vecB(i)
  end do
  !$omp end target 

  ! Initialization of vectors again
  do i = 1, nx
     vecA(i) = r**(i-1)
     vecB(i) = 1.0
  end do

  !$omp target
  do i = 1, nx
     vecC(i) =  vecC(i) + vecA(i) * vecB(i)
  end do
  !$omp end target

  sum = 0.0
  ! Calculate the sum
  do i = 1, nx
     sum =  vecC(i) + sum
  end do
  write(*,'(A,F18.6)') 'The sum is: ', sum

end program dotproduct

Solution

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

/* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
     vecA[i] = pow(r, i);
     vecB[i] = 1.0;
  }

/* dot product of two vectors */
  #pragma omp target data map(from:vecC[0:NX])
  {
     #pragma omp target map(to:vecA[0:NX],vecB[0:NX])
     for (int i = 0; i < NX; i++) {
        vecC[i] = vecA[i] * vecB[i];
     }

/* Initialization of vectors again */
     for (int i = 0; i < NX; i++) {
        vecA[i] = 0.5;
        vecB[i] = 2.0;
     }

     #pragma omp target map(to:vecA[0:NX],vecB[0:NX])
     for (int i = 0; i < NX; i++) {
        vecC[i] = vecC[i] + vecA[i] * vecB[i];
     }
  }
  double sum = 0.0;
  /* calculate the sum */
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
program dotproduct
  implicit none

  integer, parameter :: nx = 102400
  real, parameter :: r=0.2

  real, dimension(nx) :: vecA,vecB,vecC
  real    :: sum
  integer :: i

  ! Initialization of vectors
  do i = 1, nx
     vecA(i) = r**(i-1)
     vecB(i) = 1.0
  end do

  ! Dot product of two vectors 
  !$omp target data map(from:vecC) 
  !$omp target map(to:vecA,vecB)
  do i = 1, nx
     vecC(i) =  vecA(i) * vecB(i)
  end do
  !$omp end target 

  ! Initialization of vectors again
  do i = 1, nx
     vecA(i) = 0.5 
     vecB(i) = 2.0
  end do

  !$omp target map(to:vecA,vecB)
  do i = 1, nx
     vecC(i) =  vecC(i) + vecA(i) * vecB(i)
  end do
  !$omp end target
  !$omp end target data 

  sum = 0.0
  ! Calculate the sum
  do i = 1, nx
     sum =  vecC(i) + sum
  end do
  write(*,'(A,F18.6)') 'The sum is: ', sum

end program dotproduct

Exercise06: TARGET UPDATE

Try to figure out the values of the variable on the host and on the device at each check point.

/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
int main(void)
{
  int x = 0;

  #pragma omp target data map(tofrom:x)
  {
/* check point 1 */
    x = 10;                        
/* check point 2 */
  #pragma omp target update to(x)       
/* check point 3 */
  }

return 0;
}

! Copyright (c) 2021 ENCCS
program dotproduct
  implicit none

  integer :: x

  x = 0
  !$omp target data map(tofrom:x) 
  ! check point 1 
  x = 10                        
  ! check point 2 
  !$omp target update to(x)       
  ! check point 3 
  !$omp end target data

end program dotproduct

Optimize Data Transfers

Explicitly map the data instead of relying on implicit mapping
Reduce the amount of data mapped between the host and the device, and get rid of unneeded data transfers
Keep the data environment resident on the target device for as long as possible

Exercise: Data Movement

This exercise is about optimization and about moving data explicitly with the “target data” family of constructs. Three incomplete functions for explicit data movement have been added to core.cpp or core.F90. You need to add the data movement directives to them.

The exercise is under /content/exercise/data_mapping

// Copyright (c) 2019 CSC Training
// Copyright (c) 2021 ENCCS
// Main routine for heat equation solver in 2D.

#include <stdio.h>
#include <omp.h>

#include "heat.h"

int main(int argc, char **argv)
{
    // Image output interval
    int image_interval = 1500;

    // Number of time steps
    int nsteps;
    // Current and previous temperature fields
    field current, previous;
    initialize(argc, argv, &current, &previous, &nsteps);

    // Output the initial field 
    write_field(&current, 0);

    double average_temp = average(&current);
    printf("Average temperature at start: %f\n", average_temp);

    // Diffusion constant
    double a = 0.5;

    // Compute the largest stable time step
    double dx2 = current.dx * current.dx;
    double dy2 = current.dy * current.dy;
    // Time step
    double dt = dx2 * dy2 / (2.0 * a * (dx2 + dy2));

    // Get the start time stamp
    double start_clock = omp_get_wtime();

    // Copy fields to device 
    enter_data(&current, &previous);

    // Time evolution
    for (int iter = 1; iter <= nsteps; iter++) {
        evolve(&current, &previous, a, dt);
        if (iter % image_interval == 0) {
	  // update data on host for output
            update_host(&current);
            write_field(&current, iter);
        }
        // Swap current field so that it will be used
        // as previous for next iteration step
        swap_fields(&current, &previous);
    }
  
    // copy data back to host
    exit_data(&current, &previous);

    double stop_clock = omp_get_wtime();

    // Average temperature for reference
    average_temp = average(&previous);

    // Determine the CPU time used for all the iterations
    printf("Iterations took %.3f seconds.\n", (stop_clock - start_clock));
    printf("Average temperature: %f\n", average_temp);
    if (argc == 1) {
        printf("Reference value with default arguments: 59.281239\n");
    }

    // Output the final field
    write_field(&previous, nsteps);

    return 0;
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
! Heat equation solver in 2D.

program heat_solve
  use heat
  use core
  use io
  use setup
  use utilities
  use omp_lib

  implicit none

  real(dp), parameter :: a = 0.5 ! Diffusion constant
  type(field) :: current, previous    ! Current and previous temperature fields

  real(dp) :: dt     ! Time step
  integer :: nsteps       ! Number of time steps
  integer, parameter :: image_interval = 1500 ! Image output interval

  integer :: iter

  real(dp) :: average_temp   !  Average temperature

  real(kind=dp) :: start, stop ! Timers

  call initialize(current, previous, nsteps)

  ! Draw the picture of the initial state
  call write_field(current, 0)

  average_temp = average(current)
  write(*,'(A,F9.6)') 'Average temperature at start: ', average_temp

  ! Largest stable time step
  dt = current%dx**2 * current%dy**2 / &
       & (2.0 * a * (current%dx**2 + current%dy**2))

  ! Main iteration loop

  start =  omp_get_wtime()

  ! copy data to device
  call enter_data(current, previous)

  do iter = 1, nsteps
     call evolve(current, previous, a, dt)
     if (mod(iter, image_interval) == 0) then
        ! update data on host for output
        call update_host(current)
        call write_field(current, iter)
     end if
     call swap_fields(current, previous)
  end do

  ! copy data back to host
  call exit_data(current, previous)

  stop = omp_get_wtime()

  ! Average temperature for reference
  average_temp = average(previous)

  write(*,'(A,F7.3,A)') 'Iteration took ', stop - start, ' seconds.'
  write(*,'(A,F9.6)') 'Average temperature: ',  average_temp
  if (command_argument_count() == 0) then
      write(*,'(A,F9.6)') 'Reference value with default arguments: ', 59.281239
  end if

  call finalize(current, previous)

end program heat_solve

Solution

// Copyright (c) 2019 CSC Training
// Copyright (c) 2021 ENCCS
// Main solver routines for heat equation solver

#include "heat.h"

// Update the temperature values using five-point stencil
// Arguments:
//   curr: current temperature values
//   prev: temperature values from previous time step
//   a: diffusivity
//   dt: time step
void evolve(field *curr, field *prev, double a, double dt)
{
  // Help the compiler avoid being confused by the structs
  double *currdata = curr->data.data();
  double *prevdata = prev->data.data();
  int nx = curr->nx;
  int ny = curr->ny;

  // Determine the temperature field at next time step
  // As we have fixed boundary conditions, the outermost gridpoints
  // are not updated.
  double dx2 = prev->dx * prev->dx;
  double dy2 = prev->dy * prev->dy;
  #pragma omp target teams distribute parallel for 
  for (int i = 1; i < nx + 1; i++) {
    for (int j = 1; j < ny + 1; j++) {
      int ind = i * (ny + 2) + j;
      int ip = (i + 1) * (ny + 2) + j;
      int im = (i - 1) * (ny + 2) + j;
      int jp = i * (ny + 2) + j + 1;
      int jm = i * (ny + 2) + j - 1;
      currdata[ind] = prevdata[ind] + a*dt*
	    ((prevdata[ip] - 2.0*prevdata[ind] + prevdata[im]) / dx2 +
	     (prevdata[jp] - 2.0*prevdata[ind] + prevdata[jm]) / dy2);
    }
  }
}

// Start a data region and copy temperature fields to the device 
void enter_data(field *curr, field *prev)
{
    int nx, ny;
    double *currdata, *prevdata;

    currdata = curr->data.data();
    prevdata = prev->data.data();
    nx = curr->nx;
    ny = curr->ny;

// adding data mapping here
    #pragma omp target enter data \
    map(to: currdata[0:(nx+2)*(ny+2)], prevdata[0:(nx+2)*(ny+2)])
}

// End a data region and copy temperature fields back to the host 
void exit_data(field *curr, field *prev)
{
    int nx, ny;
    double *currdata, *prevdata;

    currdata = curr->data.data();
    prevdata = prev->data.data();
    nx = curr->nx;
    ny = curr->ny;

// adding data mapping here
    #pragma omp target exit data \
    map(from: currdata[0:(nx+2)*(ny+2)], prevdata[0:(nx+2)*(ny+2)])
}

// Copy a temperature field from the device to the host 
void update_host(field *temperature)
{
    int nx, ny;
    double *data;

    data = temperature->data.data();
    nx = temperature->nx;
    ny = temperature->ny;

// adding data mapping here
    #pragma omp target update from(data[0:(nx+2)*(ny+2)])
}

! Copyright (c) 2019 CSC Training
! Copyright (c) 2021 ENCCS
! Main solver routines for heat equation solver
module core
  use heat

contains

  ! Update the temperature values using five-point stencil
  ! Arguments:
  !   curr (type(field)): current temperature values
  !   prev (type(field)): temperature values from previous time step
  !   a (real(dp)): diffusivity
  !   dt (real(dp)): time step
  subroutine evolve(curr, prev, a, dt)

    implicit none

    type(field),target, intent(inout) :: curr, prev
    real(dp) :: a, dt
    integer :: i, j, nx, ny
    real(dp) :: dx, dy
    real(dp), pointer, contiguous, dimension(:,:) :: currdata, prevdata

    ! Help the compiler avoid being confused
    nx = curr%nx
    ny = curr%ny
    dx = curr%dx
    dy = curr%dy
    currdata => curr%data
    prevdata => prev%data

    ! Determine the temperature field at next time step As we have
    ! fixed boundary conditions, the outermost gridpoints are not
    ! updated.
    !$omp target teams distribute parallel do  
    do j = 1, ny
       do i = 1, nx
          currdata(i, j) = prevdata(i, j) + a * dt * &
               & ((prevdata(i-1, j) - 2.0 * prevdata(i, j) + &
               &   prevdata(i+1, j)) / dx**2 + &
               &  (prevdata(i, j-1) - 2.0 * prevdata(i, j) + &
               &   prevdata(i, j+1)) / dy**2)
       end do
    end do
    !$omp end target teams distribute parallel do 
  end subroutine evolve

  ! Start a data region and copy temperature fields to the device
  !   curr (type(field)): current temperature values
  !   prev (type(field)): values from previous time step
  subroutine enter_data(curr, prev)
    implicit none
    type(field), target, intent(in) :: curr, prev
    real(kind=dp), pointer, contiguous :: currdata(:,:), prevdata(:,:)

    currdata => curr%data
    prevdata => prev%data

  ! adding data mapping here
    !$omp target enter data map(to: currdata, prevdata)

  end subroutine enter_data

  ! End a data region and copy temperature fields back to the host
  !   curr (type(field)): current temperature values
  !   prev (type(field)): values from previous time step
  subroutine exit_data(curr, prev)
    implicit none
    type(field), target :: curr, prev
    real(kind=dp), pointer, contiguous :: currdata(:,:), prevdata(:,:)

    currdata => curr%data
    prevdata => prev%data

  ! adding data mapping here
    !$omp target exit data map(from: currdata, prevdata)

  end subroutine exit_data

  ! Copy a temperature field from the device to the host
  !   temperature (type(field)): temperature field
  subroutine update_host(temperature)
    implicit none
    type(field), target :: temperature
    real(kind=dp), pointer, contiguous :: tempdata(:,:)

    tempdata => temperature%data

  ! adding data mapping here
    !$omp target update from(tempdata)

  end subroutine update_host

end module core

Solution to Exercise06: value of x at each check point

check point	x on host	x on device
check point 1	0	0
check point 2	10	0
check point 3	10	10