Evaluation of Directive-Based Performance Portable Programming Models

M. Graham Lopez; Wayne Joubert; Verónica Larrea; Oscar Hernandez; Azzam Haidar; Stanimire Tomov; Jack Dongarra

Submitted by scrawford on Fri, 12/07/2018 - 11:58

Title	Evaluation of Directive-Based Performance Portable Programming Models
Publication Type	Journal Article
Year of Publication	2019
Authors	Lopez, M. G., W. Joubert, V. Larrea, O. Hernandez, A. Haidar, S. Tomov, and J. Dongarra
Journal	International Journal of High Performance Computing and Networking
Volume	14
Issue	2
Pagination	165-182
Date Published	2019-07
Keywords	OpenACC, OpenMP 4, performance portability, Programming models
Abstract	We present an extended exploration of the performance portability of directives provided by OpenMP 4 and OpenACC to program various types of node architecture with attached accelerators, both self-hosted multicore and offload multicore/GPU. Our goal is to examine how successful OpenACC and the newer offload features of OpenMP 4.5 are for moving codes between architectures, and we document how much tuning might be required and what lessons we can learn from these experiences. To do this, we use examples of algorithms with varying computational intensities for our evaluation, as both compute and data access efficiency are important considerations for overall application performance. To better understand fundamental compute vs. bandwidth bound characteristics, we add the compute-bound Level 3 BLAS GEMM kernel to our linear algebra evaluation. We implement the kernels of interest using various methods provided by newer OpenACC and OpenMP implementations, and we evaluate their performance on various platforms including both x86_64 and Power8 with attached NVIDIA GPUs, x86_64 multicores, self-hosted Intel Xeon Phi KNL, as well as an x86_64 host system with Intel Xeon Phi coprocessors. We update these evaluations with the newest version of the NVIDIA Pascal architecture (P100), Intel KNL 7230, Power8+, and the newest supporting compiler implementations. Furthermore, we present in detail what factors affected the performance portability, including how to pick the right programming model, its programming style, its availability on different platforms, and how well compilers can optimise and target multiple platforms.
DOI	10.1504/IJHPCN.2017.10009064

File:

icl-utk-1339-2019.pdf
