You are sitting at your desk. You are looking at your code: a carefully constructed masterpiece, with solid foundations and a beautiful user interface.
A new fleet of (pre-)exascale machines is being built around Europe. LUMI is expected to deliver 550 PFLOP/s: the most powerful of its generation.
These machines will open new avenues for high-performance computing. But you wonder, is the code ready to harness these machines? And if not, how can you port it to the next generation of supercomputers?
Your code is already taking advantage of GPU acceleration, likely using the CUDA programming language or OpenACC directives.
The LUMI GPU partition will be powered by AMD Instinct MI250X accelerators: some changes are in order!
There is a dizzying array of options in front of you: you roll up your sleeves and start exploring.
Your code does not use GPU acceleration, yet.
The LUMI GPU partition will be powered by AMD Instinct MI250X accelerators: some changes are in order!
There is a dizzying array of options in front of you: you roll up your sleeves and start exploring.
C++ is a multi-paradigm programming language. It is flexible and allows you to get performance where you need it.
It enables you to build high-level, zero-cost abstractions.
It is also a complex language: you have many options to choose from in your quest for a path to GPU acceleration.
pragma-based technology and turn to page 56.
C is a low-level, systems-programming language. You can be as close to the metal as you need to squeeze performance from your application.
You have two options to choose from in your quest for a path to GPU acceleration.
pragma-based technology and turn to page 56.
Fortran is the veteran of languages in computational sciences and engineering.
You have two options to choose from in your quest for a path to GPU acceleration.
iso_c_binding Fortran 2003 standard module. You decide to try hipfort and turn to page 88.
pragma-based technology and turn to page 56.
High-level C++ language extensions, such as SYCL, and libraries, such as Kokkos and Alpaka, aim at providing solutions to enable portable, high-performance code.
These efforts want to ensure programmer productivity at the highest possible level: deep knowledge of hardware details and low-level programming toolchains is not needed to start working with these frameworks.
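To get a feel for the idea before picking a framework, here is a minimal sketch in plain C++. It is not the API of SYCL, Kokkos, or Alpaka: `SerialBackend`, `parallel_for`, and `scale` are hypothetical names made up for this illustration, and only a serial CPU backend is provided. The point is the shape of the code: the loop body is written once as a callable, and the backend tag decides how it is executed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend tag (an assumption for this sketch). A real
// framework would also provide GPU backends behind the same interface.
struct SerialBackend {};

// Hypothetical dispatch function: calls body(i) for every i in [0, n).
// This overload simply runs the iterations one after another on the CPU.
template <typename Body>
void parallel_for(SerialBackend, std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) {
        body(i);
    }
}

// Application code is written once against the abstraction and never
// mentions how or where the loop is executed.
template <typename Backend>
void scale(Backend backend, float a, std::vector<float>& v) {
    float* p = v.data();
    parallel_for(backend, v.size(), [=](std::size_t i) { p[i] *= a; });
}
```

Calling `scale(SerialBackend{}, 2.0f, v)` doubles every element of `v`. Supporting another backend would mean adding one more `parallel_for` overload, while `scale` stays untouched.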
Your mileage may vary:
You have reached the end of this adventure: you leaf through to the last page.
HIP code can be compiled to work on both AMD and Nvidia accelerators, minimizing the risk of code divergence.
Existing CUDA codebases can be automatically converted to HIP using the hipify tool.
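If you have written CUDA before, HIP will look familiar. The sketch below is a SAXPY (`y = a*x + y`) written against the HIP runtime, where calls such as `hipMalloc` and `hipMemcpy` mirror their `cudaMalloc` and `cudaMemcpy` counterparts. The names `saxpy`, `saxpy_kernel`, and `check` are chosen for this illustration. As an assumption made only so that the example also builds with an ordinary C++ compiler, the GPU path is guarded by `__HIPCC__` and a plain host loop is used otherwise; a real port would not necessarily be organized this way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// Abort on any HIP runtime error (minimal error handling for the sketch).
static void check(hipError_t err) {
    if (err != hipSuccess) std::abort();
}

// One GPU thread updates one element of y.
__global__ void saxpy_kernel(std::size_t n, float a, const float* x, float* y) {
    const std::size_t i =
        static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
#endif

// Computes y = a*x + y. Assumes x and y have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    if (n == 0) return;
#if defined(__HIPCC__)
    const std::size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    check(hipMalloc(reinterpret_cast<void**>(&d_x), bytes));
    check(hipMalloc(reinterpret_cast<void**>(&d_y), bytes));
    check(hipMemcpy(d_x, x.data(), bytes, hipMemcpyHostToDevice));
    check(hipMemcpy(d_y, y.data(), bytes, hipMemcpyHostToDevice));

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    hipLaunchKernelGGL(saxpy_kernel, dim3(blocks), dim3(threads), 0, 0,
                       n, a, d_x, d_y);
    check(hipGetLastError());

    // This blocking copy on the default stream runs after the kernel.
    check(hipMemcpy(y.data(), d_y, bytes, hipMemcpyDeviceToHost));
    check(hipFree(d_x));
    check(hipFree(d_y));
#else
    // Host fallback, used when the file is not compiled with hipcc.
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
#endif
}
```

With `x = {1, 2, 3}` and `y = {10, 20, 30}`, calling `saxpy(2.0f, x, y)` leaves `y` holding `{12, 24, 36}` on either path.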
Your mileage may vary:
You have reached the end of this adventure: you leaf through to the last page.
pragmatic Approach
The use of a pragma-based framework allows an incremental port of computational kernels to GPU hardware.
The OpenACC project was the first such effort in this direction. OpenMP has also added GPU offloading capabilities in its latest standards.
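To see what "incremental" means in practice, consider this minimal sketch of a SAXPY loop annotated with an OpenMP `target` directive. The function name `saxpy_offload` is made up for this illustration, and whether the loop really runs on a GPU is an assumption about your toolchain: it needs a compiler built with offloading support for your device. A compiler without OpenMP enabled ignores the directive and runs the loop serially on the host, so the unannotated code keeps working while you port one kernel at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y = a*x + y. Assumes x and y have the same length.
void saxpy_offload(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    // Raw pointers are used because map clauses work on array sections.
    const float* xp = x.data();
    float* yp = y.data();

    // Copy x to the device, copy y there and back, and spread the
    // iterations over the device's teams and threads.
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop body is exactly what you would write for the CPU; only the directive above it is new. An OpenACC version would follow the same pattern with its own directives and data clauses.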
Your mileage may vary:
You have reached the end of this adventure: you leaf through to the last page.
You feel ready to go on and write some high-performance code.
Want to talk with HPC experts about your software? Send us an email: info@enccs.se. Good luck!
Before moving on to your next adventure: