Page 1

Pre-Exascale Machines on the Horizon

You are sitting at your desk. You are looking at your code: a carefully constructed masterpiece, with solid foundations and a beautiful user interface.

A new fleet of (pre-)exascale machines is being built around Europe. LUMI is expected to deliver 550 PFLOP/s: the most powerful of its generation.

These machines will open new avenues for high-performance computing. But you wonder, is the code ready to harness these machines? And if not, how can you port it to the next generation of supercomputers?

If your code can already run on GPUs, turn to page 37.
If your code cannot run on GPUs, turn to page 75.
Realizing that your boss may be looking over your shoulder, you quickly hit the BOSS key.

Page 37

A Box Full of GPUs

Your code is already taking advantage of GPU acceleration, likely using the CUDA programming language or OpenACC directives.

The LUMI GPU partition will be powered by AMD Instinct MI250X accelerators: some changes are in order!

There is a dizzying array of options in front of you: you roll up your sleeves and start exploring.

If you write C++ turn to page 38.
If you write C turn to page 39.
If you write Fortran turn to page 42.

Too many decisions to take! You procrastinate for another day and go back to page 1.

Page 75

A Box Full of GPUs

Your code does not use GPU acceleration, yet.

The LUMI GPU partition will be powered by AMD Instinct MI250X accelerators: some changes are in order!

There is a dizzying array of options in front of you: you roll up your sleeves and start exploring.

If you write C++ turn to page 38.
If you write C turn to page 39.
If you write Fortran turn to page 42.

Too many decisions to take! You procrastinate for another day and go back to page 1.

Page 38

Porting C++ Code

C++ is a multi-paradigm programming language. It is flexible and allows you to get performance where you need it.

It enables you to build high-level, zero-cost abstractions.

It is also a complex language: you have many options to choose from in your quest for a path to GPU acceleration.

You are adventurous. You take this opportunity to rewrite your performance-critical kernels using one of the new high-level, parallel programming libraries and turn to page 77.
You are intimately familiar with CUDA. You decide to try the automated conversion tools to the HIP language and turn to page 88.
You are more comfortable with a framework for descriptive parallelism. You decide to use pragma-based technology and turn to page 56.

Too many decisions to take! You procrastinate for another day and go back to page 1.

Page 39

Porting C Code

C is a low-level, systems-programming language. You can be as close to the metal as you need to squeeze performance from your application.

You have two options to choose from in your quest for a path to GPU acceleration.

You are intimately familiar with CUDA. You decide to try the automated conversion tools to the HIP language and turn to page 88.
You are more comfortable with a framework for descriptive parallelism. You decide to use pragma-based technology and turn to page 56.

Too many decisions to take! You procrastinate for another day and go back to page 1.

Page 42

Porting Fortran Code

Fortran is the veteran of languages in computational sciences and engineering.

You have two options to choose from in your quest for a path to GPU acceleration.

You are intimately familiar with CUDA Fortran and the iso_c_binding module from the Fortran 2003 standard. You decide to try hipfort and turn to page 88.
You are more comfortable with a framework for descriptive parallelism. You decide to use pragma-based technology and turn to page 56.

Too many decisions to take! You procrastinate for another day and go back to page 1.

Page 77

High-level Libraries for Portable High Performance

High-level C++ language extensions, such as SYCL, and libraries, such as Kokkos and Alpaka, aim to enable portable, high-performance code.

These efforts aim to maximize programmer productivity: you do not need deep knowledge of hardware details or low-level programming toolchains to start working with these frameworks.
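To get a feel for the idea, here is a minimal sketch in plain C++. It is not the API of SYCL, Kokkos, or Alpaka: `toy_parallel_for`, `HostSerial`, `HostBlocked`, and `scale` are hypothetical names invented for this illustration, and both "backends" run on the host. What it does show is the pattern these frameworks share: you write the kernel once as a callable, and a backend chosen at compile time decides how the index space is traversed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical backend tags. A real framework would offer backends
// such as CUDA, HIP, or OpenMP here; these two are host-only toys.
struct HostSerial {};
struct HostBlocked { std::size_t block_size; };

// Backend 1: a plain loop over the index space [0, n).
template <typename Kernel>
void toy_parallel_for(HostSerial, std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Backend 2: the same index space, walked block by block,
// loosely imitating how a GPU grid is split into blocks.
template <typename Kernel>
void toy_parallel_for(HostBlocked backend, std::size_t n, Kernel kernel) {
    assert(backend.block_size > 0);
    for (std::size_t start = 0; start < n; start += backend.block_size) {
        const std::size_t stop =
            start + backend.block_size < n ? start + backend.block_size : n;
        for (std::size_t i = start; i < stop; ++i) kernel(i);
    }
}

// The kernel (the lambda) is written once and does not mention the backend.
template <typename Backend>
void scale(Backend backend, std::vector<double>& v, double factor) {
    double* data = v.data();
    toy_parallel_for(backend, v.size(),
                     [=](std::size_t i) { data[i] *= factor; });
}
```

Calling `scale(HostSerial{}, v, 2.0)` or `scale(HostBlocked{2}, v, 2.0)` doubles every element of `v` either way: only the traversal changes, not the kernel.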

Your mileage may vary:

The frameworks are based on fairly recent C++ standards.
Furthermore, they are actively developed: compiler support may differ between vendors.

You have reached the end of this adventure: you leaf through to the last page.

Page 88

The Importance of Being HIP

HIP code can be compiled to work on both AMD and Nvidia accelerators, minimizing the risk of code divergence.

Existing CUDA codebases can be automatically converted to HIP using the hipify tool.
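Much of the CUDA runtime API has a HIP counterpart with a matching name, which is what makes automated conversion feasible. The sketch below is a toy illustration of that renaming step only: `toy_hipify` and `replace_all` are hypothetical helpers written for this example, the table holds just a handful of well-known pairs, and the real conversion tools are considerably more careful than plain substring replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Replace every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    assert(!from.empty());  // an empty pattern would never advance
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Toy stand-in for the renaming step of a CUDA-to-HIP conversion.
// This is NOT how the actual tools are implemented: substring matching
// is deliberately crude and the table is far from complete.
std::string toy_hipify(std::string source) {
    const std::vector<std::pair<std::string, std::string>> renames = {
        {"cuda_runtime.h", "hip/hip_runtime.h"},
        {"cudaMalloc", "hipMalloc"},
        {"cudaMemcpy", "hipMemcpy"},  // also rewrites cudaMemcpyHostToDevice
        {"cudaFree", "hipFree"},
        {"cudaDeviceSynchronize", "hipDeviceSynchronize"},
    };
    for (const auto& rename : renames) {
        source = replace_all(std::move(source), rename.first, rename.second);
    }
    return source;
}
```

For instance, `toy_hipify("cudaMalloc(&d_x, bytes);")` returns `"hipMalloc(&d_x, bytes);"`. Anything outside the table is left untouched, which hints at why a real conversion may still need manual work afterwards.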

Your mileage may vary:

The automatic conversion tools may not be able to convert the entirety of a CUDA codebase to HIP.
HIP code may not run optimally on Nvidia hardware.

You have reached the end of this adventure: you leaf through to the last page.

Page 56

A `pragma`tic Approach

The use of a pragma-based framework allows an incremental port of computational kernels to GPU hardware.

The OpenACC project was the first effort in this direction. OpenMP has also added GPU offloading capabilities in recent versions of its standard.
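As a minimal sketch of what an incremental port might look like, consider a SAXPY loop annotated with OpenMP target directives. The function name `saxpy` and the choice of kernel are assumptions made for this illustration. Whether the loop actually runs on a GPU depends on your compiler and its offloading flags; a compiler without OpenMP support simply ignores the pragma and runs the loop serially on the host, which is precisely what lets you port one loop at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compute y <- a * x + y.
// With an offloading-capable OpenMP compiler, the loop may run on the GPU.
// Without OpenMP, the pragma is ignored and the loop runs on the host,
// producing the same result.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const float* xp = x.data();
    float* yp = y.data();

    // map(to: ...) copies x to the device before the loop;
    // map(tofrom: ...) copies y to the device and back afterwards.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop body is unchanged from its serial form: the directive is the only addition, so the rest of the code can stay as it is while you move on to the next kernel.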

Your mileage may vary:

Some computational kernels might have to be rewritten in low-level HIP code to perform optimally.
OpenMP and OpenACC are continuously evolving: compiler support may differ between vendors.

You have reached the end of this adventure: you leaf through to the last page.

Page 98

Epilogue

You feel ready to go on and write some high-performance code.

Want to talk with HPC experts about your software? Send us an email: info@enccs.se. Good luck!

Before moving on to your next adventure:

You subscribe to the ENCCS Newsletter and get the latest news on LUMI and HPC training.
"Maybe I should this," you say to no one in particular.
You decide to start over and turn back to page 1.

Your Code and LUMI

The Many Paths to Exascale
