Data environment

Objectives

  • Understand explicit and implicit data movement

  • Understand structured and unstructured data clauses

  • Understand different mapping types

Data mapping

Because the host and the device have distinct memory spaces, data must be transferred between them. OpenMP uses a combination of explicit and implicit data mapping to do this.

The map clause on a device construct explicitly specifies how items are mapped from the host data environment to the device data environment. Commonly mapped items include arrays (array sections), scalars, pointers, and structure elements. The general form is map([map-type:] list); the map types are summarised in the following table.

map clause          Effect
map(to:list)        On entering the region, the variables in list are initialized on the device with the original values from the host.
map(from:list)      On exiting the region, the values of the variables in list are copied back into the original variables on the host. On entering the region, the device copies are not initialized.
map(tofrom:list)    The effect of both map(to:list) and map(from:list).
map(alloc:list)     On entering the region, storage is allocated on the device but not initialized.
map(list)           Equivalent to map(tofrom:list).
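As an illustration of choosing a map type per variable, the sketch below uses a hypothetical helper, axpy_to_from, that is not part of the course material. It assumes a C compiler with OpenMP offloading; without a device (or without OpenMP) the loop simply runs on the host and gives the same result.

```c
/* Hypothetical helper: y = alpha * x + y, then z = 2 * y, on the device. */
void axpy_to_from(int n, double alpha, const double *x, double *y, double *z)
{
    /* x is only read on the device      -> map(to:)
     * y is read and then updated        -> map(tofrom:)
     * z is only written on the device   -> map(from:)
     * n and alpha are scalars and are implicitly firstprivate. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n]) map(from: z[0:n])
    for (int i = 0; i < n; i++) {
        y[i] = alpha * x[i] + y[i];
        z[i] = 2.0 * y[i];
    }
}
```

Mapping z with from instead of tofrom avoids copying its (unused) initial contents to the device.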

If a variable referenced in a target region is not mapped explicitly, the compiler maps it implicitly:

  • Since OpenMP 4.5, a scalar variable is treated as firstprivate: its value is copied to the device, but it is not copied back to the host.

  • A non-scalar variable (such as an array) is mapped with map-type tofrom.

  • A C/C++ pointer is mapped as a zero-length array section. Only the pointer value is mapped (it refers to the device copy if the data it points to is already mapped); the data it points to is not copied.
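The implicit rules above can be seen side by side in the following sketch. The function implicit_demo and the size N are hypothetical, chosen only for illustration; the comments mark which variables are mapped implicitly and which need an explicit clause.

```c
#define N 8

/* Hypothetical example of the implicit mapping rules. */
double implicit_demo(double *buf, double factor)
{
    double fixed[N];   /* array of known size: implicitly map(tofrom: fixed) */
    double sum = 0.0;  /* scalar whose result we need back: map it explicitly */
    for (int i = 0; i < N; i++)
        fixed[i] = 1.0;

    /* factor is a scalar not listed in any clause, so it is implicitly
     * firstprivate: copied in, never copied back.
     * buf is a pointer: without the explicit section buf[0:N], only the
     * pointer value would be mapped, not the N doubles it points to.
     * A plain target region runs the loop on a single device thread. */
    #pragma omp target map(tofrom: buf[0:N], sum)
    for (int i = 0; i < N; i++) {
        fixed[i] *= factor;
        buf[i] *= factor;
        sum += fixed[i] + buf[i];
    }
    return sum;
}
```

If sum were left out of the map clause, it would be firstprivate and the host would still see 0.0 after the region on a real device.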

Note

When mapping arrays or pointers, be careful with the array section notation:
  • In C/C++: array[lower-bound:length]. The notation :N is equivalent to 0:N.

  • In Fortran: array(lower-bound:upper-bound). For an array with the default lower bound of 1, the notation :N is equivalent to 1:N.
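The C notation can be exercised with a partial section. The helper double_section below is hypothetical; it maps only len elements starting at index lo, so a[lo:len] means "start at lo, take len elements", not "from lo to len".

```c
/* Hypothetical helper: doubles a[lo], ..., a[lo + len - 1] on the device.
 * In C the section a[lo:len] is (start, length), so only these len
 * elements are transferred; a[:len] would mean a[0:len]. */
void double_section(double *a, int lo, int len)
{
    #pragma omp target teams distribute parallel for map(tofrom: a[lo:len])
    for (int i = lo; i < lo + len; i++)
        a[i] *= 2.0;
}
```

For example, on a six-element array double_section(a, 2, 3) transfers and updates a[2], a[3] and a[4] only.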

Exercise04: explicit and implicit data mapping

  1. Explicitly add the map clauses for data transfer between the host and the device.

  2. Offload the part that “calculates the sum”.

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

  /* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
    vecA[i] = pow(r, i);
    vecB[i] = 1.0;
  }

  /* dot product of two vectors */
  #pragma omp target teams distribute
  for (int i = 0; i < NX; i++) {
    vecC[i] = vecA[i] * vecB[i];
  }

  double sum = 0.0;
  /* calculate the sum */
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

Data region

The ways in which target constructs create storage on the device, transfer data to and from it, and remove that storage fall into two categories: structured data regions and unstructured data regions.

Structured Data Regions

The TARGET DATA construct creates a structured data region. The mapped data persists on the device for the whole region, so subsequent target constructs inside the region can reuse it without transferring it again.

Syntax

#pragma omp target data clause [clauses]
      structured-block
clause:
if( [target data:]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
use_device_ptr(list)
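A structured data region pays off when several kernels share the same data. In the sketch below, two_kernels is a hypothetical helper: the inputs are copied in once, a scratch buffer exists only on the device (map(alloc:)), and the result is copied out once when the region ends.

```c
#include <stdlib.h>

/* Hypothetical helper: c[i] = 2 * (a[i] + b[i]), computed by two kernels
 * that share a device-only scratch buffer inside one TARGET DATA region.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int two_kernels(const double *a, const double *b, double *c, int n)
{
    double *tmp = malloc((size_t)n * sizeof *tmp); /* host copy is never read */
    if (tmp == NULL)
        return -1;

    /* a and b: copied to the device once.
     * tmp: storage allocated on the device, never transferred.
     * c: copied back to the host once, at the end of the region. */
    #pragma omp target data map(to: a[0:n], b[0:n]) map(alloc: tmp[0:n]) \
                            map(from: c[0:n])
    {
        /* the pointers refer to data already mapped by the region above */
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            tmp[i] = a[i] + b[i];

        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            c[i] = 2.0 * tmp[i];
    }

    free(tmp);
    return 0;
}
```

Without the enclosing TARGET DATA region, each target construct would map its data separately and tmp would have to travel between host and device.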

Unstructured Data Regions

A TARGET DATA region must begin and end within a single structured block, which is often inconvenient in real applications where data is allocated and freed in different routines. The unstructured data constructs (TARGET ENTER DATA and TARGET EXIT DATA) are much more flexible: they create and delete data on the device at any appropriate point in the program.

Syntax

#pragma omp target enter data [clauses]
#pragma omp target exit data [clauses]
clause:
if(scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
depend(dependence-type:list)
nowait
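Because the device lifetime is no longer tied to a block, creation and deletion can live in different functions. The three helpers below (device_vector_create, device_vector_scale, device_vector_exit) are hypothetical names used only for this sketch.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n doubles on the host, sets them to 1.0,
 * and creates an initialized copy on the device. */
double *device_vector_create(int n)
{
    double *v = malloc((size_t)n * sizeof *v);
    if (v == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
        v[i] = 1.0;
    /* on enter data the map type is to or alloc */
    #pragma omp target enter data map(to: v[0:n])
    return v;
}

/* Hypothetical helper: works on the data already present on the device,
 * so no transfer happens here. */
void device_vector_scale(double *v, int n, double s)
{
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; i++)
        v[i] *= s;
}

/* Hypothetical helper: copies the final values back and releases the
 * device copy; the host memory is still owned by the caller.
 * On exit data the map type is from, release or delete. */
void device_vector_exit(double *v, int n)
{
    #pragma omp target exit data map(from: v[0:n])
}
```

A caller would use device_vector_create, then any number of device_vector_scale calls, then device_vector_exit, and finally free the host buffer.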

Keypoints

Structured Data Region
  • Starts and ends within a single subroutine

  • Device memory exists only within the data region

Unstructured Data Region
  • Can have multiple start and end points, across different subroutines

  • Device memory exists until it is explicitly deallocated

TARGET UPDATE construct

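The TARGET UPDATE construct keeps a mapped variable consistent between the host and the device. It is a standalone directive, typically placed inside a TARGET DATA region or between TARGET ENTER DATA and TARGET EXIT DATA, and the transfer direction is given by its motion clause: to(list) copies host values to the device, and from(list) copies device values back to the host.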

Syntax

#pragma omp target update clause [clauses]
clause is a motion-clause or one of:
if(scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend(dependence-type:list)

motion-clause:
to(list)
from(list)
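Both motion clauses appear in the sketch below, in which the host changes mapped data between two kernels. The function update_demo is hypothetical; the comments describe what each update transfers.

```c
/* Hypothetical example: v stays mapped for the whole region while the host
 * modifies it between two kernels; TARGET UPDATE keeps both copies in sync. */
void update_demo(double *v, int n)
{
    #pragma omp target data map(tofrom: v[0:n])
    {
        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            v[i] += 1.0;

        /* device -> host: let the host see the first kernel's result */
        #pragma omp target update from(v[0:n])

        for (int i = 0; i < n; i++)   /* host-side change */
            v[i] *= 10.0;

        /* host -> device: the second kernel must see the host's change */
        #pragma omp target update to(v[0:n])

        #pragma omp target teams distribute parallel for
        for (int i = 0; i < n; i++)
            v[i] += 1.0;
    }   /* map(tofrom:): the device values are copied back here */
}
```

Starting from zeros, every element ends up as 11.0. On a real device, dropping either update would leave one side working on stale data and give a different result.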

Exercise05: TARGET DATA structured region

Create a data region using TARGET DATA and add map clauses for data transfer.

/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#include <math.h>
#define NX 102400

int main(void)
{
  double vecA[NX],vecB[NX],vecC[NX];
  double r=0.2;

  /* Initialization of vectors */
  for (int i = 0; i < NX; i++) {
    vecA[i] = pow(r, i);
    vecB[i] = 1.0;
  }

  /* dot product of two vectors */
  #pragma omp target
  for (int i = 0; i < NX; i++) {
    vecC[i] = vecA[i] * vecB[i];
  }

  /* Initialization of vectors again */
  for (int i = 0; i < NX; i++) {
    vecA[i] = 1.0;
    vecB[i] = 1.0;
  }

  #pragma omp target
  for (int i = 0; i < NX; i++) {
    vecC[i] = vecC[i] + vecA[i] * vecB[i];
  }

  double sum = 0.0;
  /* calculate the sum */
  for (int i = 0; i < NX; i++) {
    sum += vecC[i];
  }
  printf("The sum is: %8.6f \n", sum);
  return 0;
}

Exercise06: TARGET UPDATE

Try to work out the value of x on the host and on the device at each check point.

/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
int main(void)
{
  int x = 0;

  #pragma omp target data map(tofrom:x)
  {
    /* check point 1 */
    x = 10;
    /* check point 2 */
    #pragma omp target update to(x)
    /* check point 3 */
  }

  return 0;
}

Optimize Data Transfers

  • Map data explicitly instead of relying on implicit mapping

  • Reduce the amount of data mapped between host and device, and eliminate unneeded transfers

  • Keep data resident on the target device for as long as possible
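The last guideline can be sketched with a time-stepping loop in which the data stays on the device across all iterations. The function smooth and its averaging stencil are hypothetical, chosen only to mimic an iterative solver; data crosses between host and device once at entry and once at exit, no matter how many steps are run.

```c
/* Hypothetical sketch: u is smoothed for nsteps steps while staying resident
 * on the device; tmp is device-only scratch space. The boundary values
 * u[0] and u[n-1] are left unchanged. */
void smooth(double *u, double *tmp, int n, int nsteps)
{
    /* one transfer in: u is copied, tmp is only allocated */
    #pragma omp target enter data map(to: u[0:n]) map(alloc: tmp[0:n])

    for (int step = 0; step < nsteps; step++) {
        /* both kernels use the device copies: no per-step transfers */
        #pragma omp target teams distribute parallel for
        for (int i = 1; i < n - 1; i++)
            tmp[i] = 0.5 * u[i] + 0.25 * (u[i - 1] + u[i + 1]);

        #pragma omp target teams distribute parallel for
        for (int i = 1; i < n - 1; i++)
            u[i] = tmp[i];
    }

    /* one transfer out: copy u back, discard tmp without copying it */
    #pragma omp target exit data map(from: u[0:n]) map(delete: tmp[0:n])
}
```

If intermediate results were needed on the host (for example, for output every few steps), a target update from(u[0:n]) inside the loop would fetch them without ending the device lifetime.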

Exercise: Data Movement

This exercise is about optimizing data movement by explicitly moving the data with the “target data” family of constructs. Three incomplete functions in core.cpp (or core.F90) move the data explicitly; in the main routine below they are called as enter_data, update_host and exit_data. Add the directives for data movement to them.

The exercise is under /content/exercise/data_mapping

// Copyright (c) 2019 CSC Training
// Copyright (c) 2021 ENCCS
// Main routine for heat equation solver in 2D.

#include <stdio.h>
#include <omp.h>

#include "heat.h"

int main(int argc, char **argv)
{
    // Image output interval
    int image_interval = 1500;

    // Number of time steps
    int nsteps;
    // Current and previous temperature fields
    field current, previous;
    initialize(argc, argv, &current, &previous, &nsteps);

    // Output the initial field
    write_field(&current, 0);

    double average_temp = average(&current);
    printf("Average temperature at start: %f\n", average_temp);

    // Diffusion constant
    double a = 0.5;

    // Compute the largest stable time step
    double dx2 = current.dx * current.dx;
    double dy2 = current.dy * current.dy;
    // Time step
    double dt = dx2 * dy2 / (2.0 * a * (dx2 + dy2));

    // Get the start time stamp
    double start_clock = omp_get_wtime();

    // Copy fields to device
    enter_data(&current, &previous);

    // Time evolution
    for (int iter = 1; iter <= nsteps; iter++) {
        evolve(&current, &previous, a, dt);
        if (iter % image_interval == 0) {
            // update data on host for output
            update_host(&current);
            write_field(&current, iter);
        }
        // Swap current field so that it will be used
        // as previous for next iteration step
        swap_fields(&current, &previous);
    }

    // copy data back to host
    exit_data(&current, &previous);

    double stop_clock = omp_get_wtime();

    // Average temperature for reference
    average_temp = average(&previous);

    // Determine the CPU time used for all the iterations
    printf("Iterations took %.3f seconds.\n", (stop_clock - start_clock));
    printf("Average temperature: %f\n", average_temp);
    if (argc == 1) {
        printf("Reference value with default arguments: 59.281239\n");
    }

    // Output the final field
    write_field(&previous, nsteps);

    return 0;
}