Data environment
Objectives
Understand explicit and implicit data movement
Understand structured and unstructured data clauses
Understand different mapping types
Data mapping
Due to distinct memory spaces on host and device, transferring data becomes inevitable. A combination of both explicit and implicit data mapping is used.
The MAP
cluase on a device construct explicitly
specifies how items are mapped from the host to the device data
environment. The common mapped items consist of arrays(array
sections), scalars, pointers, and structure elements. The various
forms of the map cluase are summarised in the following table
|
map clause |
|
On entering the region, variables in the list are initialized on the device using the original values from the host |
|
At the end of the target region, the values from variables in the list are copied into the original variables on the host. On entering the region, the initial value of the variables on the device is not initialized |
|
the effect of both a map-to and a map-from |
|
On entering the region, data is allocated and uninitialized on the device |
|
equivalent to ``map(tofrom:list)`` |
- If the variables are not explicitly mapped, the compiler will do it implicitly:
Since v4.5, scalar is mapped as firstprivate, and the variable is not copied back to the host
non-scalar variables are mapped with a map-type tofrom
a C/C++ pointer is mapped as a zero-length array section
note that only the pointer value is mapped, but not the data it points to
Note
- When mapping data arrays or pointers, be careful about the array section notation:
In C/C++: array[lower-bound:length]. The notation :N is equivalent to 0:N.
In Fortran:array[lower-bound:upper-bound]. The notation :N is equivalent to 1:N.
Exercise04: explicit and implicit data mapping
explicitly adding the
map
clauses for data transfer between the host and deviceoffloading the part where it “calculates the sum”
1/* Copyright (c) 2019 CSC Training */
2/* Copyright (c) 2021 ENCCS */
3#include <stdio.h>
4#include <math.h>
5#define NX 102400
6
7int main(void)
8{
9 double vecA[NX],vecB[NX],vecC[NX];
10 double r=0.2;
11
12/* Initialization of vectors */
13 for (int i = 0; i < NX; i++) {
14 vecA[i] = pow(r, i);
15 vecB[i] = 1.0;
16 }
17
18/* dot product of two vectors */
19 #pragma omp target teams distribute
20 for (int i = 0; i < NX; i++) {
21 vecC[i] = vecA[i] * vecB[i];
22 }
23
24 double sum = 0.0;
25 /* calculate the sum */
26 for (int i = 0; i < NX; i++) {
27 sum += vecC[i];
28 }
29 printf("The sum is: %8.6f \n", sum);
30 return 0;
31}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3program dotproduct
4 implicit none
5
6 integer, parameter :: nx = 102400
7 real, parameter :: r=0.2
8
9 real, dimension(nx) :: vecA,vecB,vecC
10 real :: sum
11 integer :: i
12
13 ! Initialization of vectors
14 do i = 1, nx
15 vecA(i) = r**(i-1)
16 vecB(i) = 1.0
17 end do
18
19 ! Dot product of two vectors
20 !$omp target teams distribute
21 do i = 1, nx
22 vecC(i) = vecA(i) * vecB(i)
23 end do
24 !$omp end target teams distribute
25
26 sum = 0.0
27 ! Calculate the sum
28 do i = 1, nx
29 sum = vecC(i) + sum
30 end do
31
32 write(*,*) 'The sum is: ', sum
33
34end program dotproduct
Solution
1/* Copyright (c) 2019 CSC Training */
2/* Copyright (c) 2021 ENCCS */
3#include <stdio.h>
4#include <math.h>
5#define NX 102400
6
7int main(void)
8{
9 double vecA[NX],vecB[NX],vecC[NX];
10 double r=0.2;
11
12/* Initialization of vectors */
13 for (int i = 0; i < NX; i++) {
14 vecA[i] = pow(r, i);
15 vecB[i] = 1.0;
16 }
17
18/* dot product of two vectors */
19 #pragma omp target teams distribute map(from:vecC[0:NX]) map(to:vecA[0:NX],vecB[0:NX])
20 for (int i = 0; i < NX; i++) {
21 vecC[i] = vecA[i] * vecB[i];
22 }
23
24 double sum = 0.0;
25 /* calculate the sum */
26 #pragma omp target map(tofrom:sum)
27 for (int i = 0; i < NX; i++) {
28 sum += vecC[i];
29 }
30 printf("The sum is: %8.6f \n", sum);
31 return 0;
32}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3program dotproduct
4 implicit none
5
6 integer, parameter :: nx = 102400
7 real, parameter :: r=0.2
8
9 real, dimension(nx) :: vecA,vecB,vecC
10 real :: sum
11 integer :: i
12
13 ! Initialization of vectors
14 do i = 1, nx
15 vecA(i) = r**(i-1)
16 vecB(i) = 1.0
17 end do
18
19 ! Dot product of two vectors
20 !$omp target teams distribute map(from:vecC) map(to:vecA,vecB)
21 do i = 1, nx
22 vecC(i) = vecA(i) * vecB(i)
23 end do
24 !$omp end target teams distribute
25
26 sum = 0.0
27 ! Calculate the sum
28 !$omp target map(tofrom:sum)
29 do i = 1, nx
30 sum = vecC(i) + sum
31 end do
32 !$omp end target
33 write(*,*) 'The sum is: ', sum
34
35end program dotproduct
Data region
How the TARGET
construct creates storage, transfer data, and remove
storage on the device are clasiffied as two categories: structured
data region and unstructured data region.
Structured Data Regions
The TARGET DATA
construct is used to create a structured data region
which is convenient for providing persistent data on the device which
could be used for subseqent target constructs.
Syntax
#pragma omp target data clause [clauses]
structured-block
clause:
if( [target data:]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
use_device_ptr(list)
!$omp target data clause [clauses]
structured-block
!$omp end target data
clause:
if( [target data:]scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
use_device_ptr(list)
Unstructured Data Regions
The TARGET DATA
construct however is inconvenient in real applications.
The unstructured data constructs (TARGET ENTER DATA
and TARGET EXIT DATA
)
have much more freedom in creating and deleting of data on the device at any appropriate point.
Syntax
#pragma omp target enter data [clauses]
#pragma omp target exit data [clauses]
clause:
if(scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
depend(dependence-type:list)
nowait
!$omp target enter data [clauses]
!$omp target exit data [clauses]
clause:
if(scalar-logical-expression)
device(scalar-integer-expression)
map([map-type :] list)
depend(dependence-type:list)
nowait
Keypoints
- Structured Data Region
start and end points within a single subroutine
Memory exists within the data region
- Unstructured Data Region
multiple start and end points across different subroutines
Memory exists until explicitly deallocated
TARGET UPDATE construct
The TARGET UPDATE
construct is used to keep the variable consistent between the host and the device.
Data can be updated within a target regions with the transfer direction specified in the clause.
Syntax
#pragma omp target update [clause]
clause is motion-clause or one of:
if(scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend(dependence-type:list)
motion-clause:
to(list)
from(list)
!$omp target udpate clause
clause is motion-clause or one of:
if(scalar-logical-expression)
device(scalar-integer-expression)
nowait
depend(dependence-type:list)
motion-clause:
to(list)
from(list)
Exercise05: TARGET DATA
structured region
Create a data region using TARGET DATA
and add map
clauses for data transfer.
1/* Copyright (c) 2019 CSC Training */
2/* Copyright (c) 2021 ENCCS */
3#include <stdio.h>
4#include <math.h>
5#define NX 102400
6
7int main(void)
8{
9 double vecA[NX],vecB[NX],vecC[NX];
10 double r=0.2;
11
12/* Initialization of vectors */
13 for (int i = 0; i < NX; i++) {
14 vecA[i] = pow(r, i);
15 vecB[i] = 1.0;
16 }
17
18/* dot product of two vectors */
19 #pragma omp target
20 for (int i = 0; i < NX; i++) {
21 vecC[i] = vecA[i] * vecB[i];
22 }
23
24/* Initialization of vectors again */
25 for (int i = 0; i < NX; i++) {
26 vecA[i] = 1.0;
27 vecB[i] = 1.0;
28 }
29
30 #pragma omp target
31 for (int i = 0; i < NX; i++) {
32 vecC[i] = vecC[i] + vecA[i] * vecB[i];
33 }
34 double sum = 0.0;
35 /* calculate the sum */
36 for (int i = 0; i < NX; i++) {
37 sum += vecC[i];
38 }
39 printf("The sum is: %8.6f \n", sum);
40 return 0;
41}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3program dotproduct
4 implicit none
5
6 integer, parameter :: nx = 102400
7 real, parameter :: r=0.2
8
9 real, dimension(nx) :: vecA,vecB,vecC
10 real :: sum
11 integer :: i
12
13 ! Initialization of vectors
14 do i = 1, nx
15 vecA(i) = r**(i-1)
16 vecB(i) = 1.0
17 end do
18
19 ! Dot product of two vectors
20 !$omp target
21 do i = 1, nx
22 vecC(i) = vecA(i) * vecB(i)
23 end do
24 !$omp end target
25
26 ! Initialization of vectors again
27 do i = 1, nx
28 vecA(i) = r**(i-1)
29 vecB(i) = 1.0
30 end do
31
32 !$omp target
33 do i = 1, nx
34 vecC(i) = vecC(i) + vecA(i) * vecB(i)
35 end do
36 !$omp end target
37
38 sum = 0.0
39 ! Calculate the sum
40 do i = 1, nx
41 sum = vecC(i) + sum
42 end do
43 write(*,'(A,F18.6)') 'The sum is: ', sum
44
45end program dotproduct
Solution
1/* Copyright (c) 2019 CSC Training */
2/* Copyright (c) 2021 ENCCS */
3#include <stdio.h>
4#include <math.h>
5#define NX 102400
6
7int main(void)
8{
9 double vecA[NX],vecB[NX],vecC[NX];
10 double r=0.2;
11
12/* Initialization of vectors */
13 for (int i = 0; i < NX; i++) {
14 vecA[i] = pow(r, i);
15 vecB[i] = 1.0;
16 }
17
18/* dot product of two vectors */
19 #pragma omp target data map(from:vecC[0:NX])
20 {
21 #pragma omp target map(to:vecA[0:NX],vecB[0:NX])
22 for (int i = 0; i < NX; i++) {
23 vecC[i] = vecA[i] * vecB[i];
24 }
25
26/* Initialization of vectors again */
27 for (int i = 0; i < NX; i++) {
28 vecA[i] = 0.5;
29 vecB[i] = 2.0;
30 }
31
32 #pragma omp target map(to:vecA[0:NX],vecB[0:NX])
33 for (int i = 0; i < NX; i++) {
34 vecC[i] = vecC[i] + vecA[i] * vecB[i];
35 }
36 }
37 double sum = 0.0;
38 /* calculate the sum */
39 for (int i = 0; i < NX; i++) {
40 sum += vecC[i];
41 }
42 printf("The sum is: %8.6f \n", sum);
43 return 0;
44}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3program dotproduct
4 implicit none
5
6 integer, parameter :: nx = 102400
7 real, parameter :: r=0.2
8
9 real, dimension(nx) :: vecA,vecB,vecC
10 real :: sum
11 integer :: i
12
13 ! Initialization of vectors
14 do i = 1, nx
15 vecA(i) = r**(i-1)
16 vecB(i) = 1.0
17 end do
18
19 ! Dot product of two vectors
20 !$omp target data map(from:vecC)
21 !$omp target map(to:vecA,vecB)
22 do i = 1, nx
23 vecC(i) = vecA(i) * vecB(i)
24 end do
25 !$omp end target
26
27 ! Initialization of vectors again
28 do i = 1, nx
29 vecA(i) = 0.5
30 vecB(i) = 2.0
31 end do
32
33 !$omp target map(to:vecA,vecB)
34 do i = 1, nx
35 vecC(i) = vecC(i) + vecA(i) * vecB(i)
36 end do
37 !$omp end target
38 !$omp end target data
39
40 sum = 0.0
41 ! Calculate the sum
42 do i = 1, nx
43 sum = vecC(i) + sum
44 end do
45 write(*,'(A,F18.6)') 'The sum is: ', sum
46
47end program dotproduct
Exercise06: TARGET UPDATE
Trying to figure out the variable values on host and device at each check point.
1/* Copyright (c) 2021 ENCCS */
2#include <stdio.h>
3int main(void)
4{
5 int x = 0;
6
7 #pragma omp target data map(tofrom:x)
8 {
9/* check point 1 */
10 x = 10;
11/* check point 2 */
12 #pragma omp target update to(x)
13/* check point 3 */
14 }
15
16return 0;
17}
1! Copyright (c) 2021 ENCCS
2program dotproduct
3 implicit none
4
5 integer :: x
6
7 x = 0
8 !$omp target data map(tofrom:x)
9 ! check point 1
10 x = 10
11 ! check point 2
12 !$omp target update to(x)
13 ! check point 3
14 !$omp end target data
15
16end program dotproduct
check point |
x on host |
x on device |
---|---|---|
check point1 |
0 |
0 |
check point2 |
10 |
0 |
check point3 |
10 |
10 |
Optimize Data Transfers
Explicitely map the data instead of using the implicit mapping
Reduce the amount of data mapping between host and device, get rid of unneeded data transfer
Try to keep data environment residing on the target device as long as possible
Exercise: Data Movement
This exercise is about optimization and explicitly moving the data using the “target data” family constructs. Three incomplete functions are added to explicitly move the data around in core.cpp or core.F90. You need to add the directives for data movement for them.
The exercise is under /content/exercise/data_mapping
1// Copyright (c) 2019 CSC Training
2// Copyright (c) 2021 ENCCS
3// Main routine for heat equation solver in 2D.
4
5#include <stdio.h>
6#include <omp.h>
7
8#include "heat.h"
9
10int main(int argc, char **argv)
11{
12 // Image output interval
13 int image_interval = 1500;
14
15 // Number of time steps
16 int nsteps;
17 // Current and previous temperature fields
18 field current, previous;
19 initialize(argc, argv, ¤t, &previous, &nsteps);
20
21 // Output the initial field
22 write_field(¤t, 0);
23
24 double average_temp = average(¤t);
25 printf("Average temperature at start: %f\n", average_temp);
26
27 // Diffusion constant
28 double a = 0.5;
29
30 // Compute the largest stable time step
31 double dx2 = current.dx * current.dx;
32 double dy2 = current.dy * current.dy;
33 // Time step
34 double dt = dx2 * dy2 / (2.0 * a * (dx2 + dy2));
35
36 // Get the start time stamp
37 double start_clock = omp_get_wtime();
38
39 // Copy fields to device
40 enter_data(¤t, &previous);
41
42 // Time evolution
43 for (int iter = 1; iter <= nsteps; iter++) {
44 evolve(¤t, &previous, a, dt);
45 if (iter % image_interval == 0) {
46 // update data on host for output
47 update_host(¤t);
48 write_field(¤t, iter);
49 }
50 // Swap current field so that it will be used
51 // as previous for next iteration step
52 swap_fields(¤t, &previous);
53 }
54
55 // copy data back to host
56 exit_data(¤t, &previous);
57
58 double stop_clock = omp_get_wtime();
59
60 // Average temperature for reference
61 average_temp = average(&previous);
62
63 // Determine the CPU time used for all the iterations
64 printf("Iterations took %.3f seconds.\n", (stop_clock - start_clock));
65 printf("Average temperature: %f\n", average_temp);
66 if (argc == 1) {
67 printf("Reference value with default arguments: 59.281239\n");
68 }
69
70 // Output the final field
71 write_field(&previous, nsteps);
72
73 return 0;
74}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3! Heat equation solver in 2D.
4
5program heat_solve
6 use heat
7 use core
8 use io
9 use setup
10 use utilities
11 use omp_lib
12
13 implicit none
14
15 real(dp), parameter :: a = 0.5 ! Diffusion constant
16 type(field) :: current, previous ! Current and previus temperature fields
17
18 real(dp) :: dt ! Time step
19 integer :: nsteps ! Number of time steps
20 integer, parameter :: image_interval = 1500 ! Image output interval
21
22 integer :: iter
23
24 real(dp) :: average_temp ! Average temperature
25
26 real(kind=dp) :: start, stop ! Timers
27
28 call initialize(current, previous, nsteps)
29
30 ! Draw the picture of the initial state
31 call write_field(current, 0)
32
33 average_temp = average(current)
34 write(*,'(A,F9.6)') 'Average temperature at start: ', average_temp
35
36 ! Largest stable time step
37 dt = current%dx**2 * current%dy**2 / &
38 & (2.0 * a * (current%dx**2 + current%dy**2))
39
40 ! Main iteration loop
41
42 start = omp_get_wtime()
43
44 ! copy data to device
45 call enter_data(current, previous)
46
47 do iter = 1, nsteps
48 call evolve(current, previous, a, dt)
49 if (mod(iter, image_interval) == 0) then
50 ! update data on host for output
51 call update_host(current)
52 call write_field(current, iter)
53 end if
54 call swap_fields(current, previous)
55 end do
56
57 ! copy data back to host
58 call exit_data(current, previous)
59
60 stop = omp_get_wtime()
61
62 ! Average temperature for reference
63 average_temp = average(previous)
64
65 write(*,'(A,F7.3,A)') 'Iteration took ', stop - start, ' seconds.'
66 write(*,'(A,F9.6)') 'Average temperature: ', average_temp
67 if (command_argument_count() == 0) then
68 write(*,'(A,F9.6)') 'Reference value with default arguments: ', 59.281239
69 end if
70
71 call finalize(current, previous)
72
73end program heat_solve
Solution
1// Copyright (c) 2019 CSC Training
2// Copyright (c) 2021 ENCCS
3// Main solver routines for heat equation solver
4
5#include "heat.h"
6
7// Update the temperature values using five-point stencil
8// Arguments:
9// curr: current temperature values
10// prev: temperature values from previous time step
11// a: diffusivity
12// dt: time step
13void evolve(field *curr, field *prev, double a, double dt)
14{
15 // Help the compiler avoid being confused by the structs
16 double *currdata = curr->data.data();
17 double *prevdata = prev->data.data();
18 int nx = curr->nx;
19 int ny = curr->ny;
20
21 // Determine the temperature field at next time step
22 // As we have fixed boundary conditions, the outermost gridpoints
23 // are not updated.
24 double dx2 = prev->dx * prev->dx;
25 double dy2 = prev->dy * prev->dy;
26 #pragma omp target teams distribute parallel for
27 for (int i = 1; i < nx + 1; i++) {
28 for (int j = 1; j < ny + 1; j++) {
29 int ind = i * (ny + 2) + j;
30 int ip = (i + 1) * (ny + 2) + j;
31 int im = (i - 1) * (ny + 2) + j;
32 int jp = i * (ny + 2) + j + 1;
33 int jm = i * (ny + 2) + j - 1;
34 currdata[ind] = prevdata[ind] + a*dt*
35 ((prevdata[ip] - 2.0*prevdata[ind] + prevdata[im]) / dx2 +
36 (prevdata[jp] - 2.0*prevdata[ind] + prevdata[jm]) / dy2);
37 }
38 }
39}
40
41// Start a data region and copy temperature fields to the device
42void enter_data(field *curr, field *prev)
43{
44 int nx, ny;
45 double *currdata, *prevdata;
46
47 currdata = curr->data.data();
48 prevdata = prev->data.data();
49 nx = curr->nx;
50 ny = curr->ny;
51
52// adding data mapping here
53 #pragma omp target enter data \
54 map(to: currdata[0:(nx+2)*(ny+2)], prevdata[0:(nx+2)*(ny+2)])
55}
56
57// End a data region and copy temperature fields back to the host
58void exit_data(field *curr, field *prev)
59{
60 int nx, ny;
61 double *currdata, *prevdata;
62
63 currdata = curr->data.data();
64 prevdata = prev->data.data();
65 nx = curr->nx;
66 ny = curr->ny;
67
68// adding data mapping here
69 #pragma omp target exit data \
70 map(from: currdata[0:(nx+2)*(ny+2)], prevdata[0:(nx+2)*(ny+2)])
71}
72
73// Copy a temperature field from the device to the host
74void update_host(field *temperature)
75{
76 int nx, ny;
77 double *data;
78
79 data = temperature->data.data();
80 nx = temperature->nx;
81 ny = temperature->ny;
82
83// adding data mapping here
84 #pragma omp target update from(data[0:(nx+2)*(ny+2)])
85}
1! Copyright (c) 2019 CSC Training
2! Copyright (c) 2021 ENCCS
3! Main solver routines for heat equation solver
4module core
5 use heat
6
7contains
8
9 ! Update the temperature values using five-point stencil
10 ! Arguments:
11 ! curr (type(field)): current temperature values
12 ! prev (type(field)): temperature values from previous time step
13 ! a (real(dp)): diffusivity
14 ! dt (real(dp)): time step
15 subroutine evolve(curr, prev, a, dt)
16
17 implicit none
18
19 type(field),target, intent(inout) :: curr, prev
20 real(dp) :: a, dt
21 integer :: i, j, nx, ny
22 real(dp) :: dx, dy
23 real(dp), pointer, contiguous, dimension(:,:) :: currdata, prevdata
24
25 ! Help the compiler avoid being confused
26 nx = curr%nx
27 ny = curr%ny
28 dx = curr%dx
29 dy = curr%dy
30 currdata => curr%data
31 prevdata => prev%data
32
33 ! Determine the temperature field at next time step As we have
34 ! fixed boundary conditions, the outermost gridpoints are not
35 ! updated.
36 !$omp target teams distribute parallel do
37 do j = 1, ny
38 do i = 1, nx
39 currdata(i, j) = prevdata(i, j) + a * dt * &
40 & ((prevdata(i-1, j) - 2.0 * prevdata(i, j) + &
41 & prevdata(i+1, j)) / dx**2 + &
42 & (prevdata(i, j-1) - 2.0 * prevdata(i, j) + &
43 & prevdata(i, j+1)) / dy**2)
44 end do
45 end do
46 !$omp end target teams distribute parallel do
47 end subroutine evolve
48
49 ! Start a data region and copy temperature fields to the device
50 ! curr (type(field)): current temperature values
51 ! prev (type(field)): values from previous time step
52 subroutine enter_data(curr, prev)
53 implicit none
54 type(field), target, intent(in) :: curr, prev
55 real(kind=dp), pointer, contiguous :: currdata(:,:), prevdata(:,:)
56
57 currdata => curr%data
58 prevdata => prev%data
59
60 ! adding data mapping here
61 !$omp target enter data map(to: currdata, prevdata)
62
63 end subroutine enter_data
64
65 ! End a data region and copy temperature fields back to the host
66 ! curr (type(field)): current temperature values
67 ! prev (type(field)): values from previous time step
68 subroutine exit_data(curr, prev)
69 implicit none
70 type(field), target :: curr, prev
71 real(kind=dp), pointer, contiguous :: currdata(:,:), prevdata(:,:)
72
73 currdata => curr%data
74 prevdata => prev%data
75
76 ! adding data mapping here
77 !$omp target exit data map(from: currdata, prevdata)
78
79 end subroutine exit_data
80
81 ! Copy a temperature field from the device to the host
82 ! temperature (type(field)): temperature field
83 subroutine update_host(temperature)
84 implicit none
85 type(field), target :: temperature
86 real(kind=dp), pointer, contiguous :: tempdata(:,:)
87
88 tempdata => temperature%data
89
90 ! adding data mapping here
91 !$omp target update from(tempdata)
92
93 end subroutine update_host
94
95end module core