Multi-device applications require distributing kernel executions and, even worse, array portions that must be kept coherent among the different device memories and the host memory. In addition, when devices with different characteristics participate in a computation, optimally distributing the work among them is not trivial. In this paper we extend an existing framework for programming accelerators, the Heterogeneous Programming Library (HPL), with three kinds of improvements that facilitate these tasks. The first two are the ability to define subarrays and subkernels, which distribute kernels on different devices.
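To make the work-distribution problem concrete, here is a minimal sketch of one simple static strategy: splitting an index range into contiguous chunks in proportion to an assumed relative throughput per device. This is not HPL's API or the paper's method; the function name `split_range` and the weights are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def split_range(n: int, weights: List[float]) -> List[Tuple[int, int]]:
    """Split range(n) into contiguous [start, end) chunks, one per device.

    `weights` are assumed relative throughputs (hypothetical values, e.g.
    obtained from a short benchmark); a device with a larger weight
    receives a proportionally larger chunk.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        # Round the cumulative share so chunk boundaries never overlap.
        bounds.append(round(n * acc / total))
    bounds[-1] = n  # guard against floating-point drift on the last boundary

    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


if __name__ == "__main__":
    # Hypothetical setup: a GPU assumed to be three times faster than a CPU.
    for device, (start, end) in zip(["cpu", "gpu"], split_range(1000, [1.0, 3.0])):
        print(device, start, end)
```

With the assumed weights `[1.0, 3.0]`, the slower device receives elements 0 to 249 and the faster one elements 250 to 999. A static split like this ignores transfer costs and the coherence of array portions across device memories, which is part of why the problem is described as non-trivial.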
![cudalaunch nvprof cudalaunch nvprof](https://files.speakerdeck.com/presentations/c4210f2fa76e4a59abeb690cf39d3d8c/slide_46.jpg)
Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space.
![cudalaunch nvprof cudalaunch nvprof](https://files.speakerdeck.com/presentations/c4210f2fa76e4a59abeb690cf39d3d8c/slide_45.jpg)
Focus is given to NVIDIA GPUs, which are most widespread today. The GPGPU Best Practice Guide is based on the PRACE-2IP GPGPU Best Practice Mini-Guide. The following new topics were added: GPU architecture (especially highlighting new features of NVIDIA Pascal), OpenCL programming, and OpenMP 4.x offloading. Several existing sections have been updated to reflect recent changes.
CUDALAUNCH NVPROF HOW TO
This Best Practice Guide describes general purpose computation on Graphics Processing Units (GPUs). GPUs were originally developed for computer gaming and other graphical tasks, but for many years have been exploited for general purpose computing across a number of areas. They offer advantages over traditional CPUs because they have greater computational capability and use high-bandwidth memory systems (memory bandwidth being the main bottleneck for many scientific applications). The guide includes information on how to get started with programming GPUs, which cannot be used in isolation but only as "accelerators" in conjunction with CPUs, and on how to get good performance.