Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers

Hartwig Anzt; Stanimire Tomov; Jack Dongarra

Submitted by webmaster on Thu, 03/05/2015 - 20:33

Title	Energy Efficiency and Performance Frontiers for Sparse Computations on GPU Supercomputers
Publication Type	Conference Paper
Year of Publication	2015
Authors	Anzt, H., S. Tomov, and J. Dongarra
Conference Name	Sixth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM '15)
Date Published	2015-02
Publisher	ACM
Conference Location	San Francisco, CA
ISBN Number	978-1-4503-3404-4
Abstract	In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art implementations of the sparse matrix-vector (SpMV) product in libraries like cuSPARSE, MKL, and MAGMA, and their use in the LOBPCG eigensolver. LOBPCG is chosen as a benchmark for this study because it combines an interesting mix of sparse and dense linear algebra operations with potential for hardware-aware optimizations. Most notably, LOBPCG includes a blocking technique that is a common performance optimization for many applications. In particular, multiple memory-bound SpMV operations are blocked into a SpM-matrix product (SpMM), which achieves significantly higher performance than a sequence of SpMVs. We provide details about the GPU kernels we use for the SpMV and SpMM and about the LOBPCG implementation design, and we study performance and energy consumption compared to CPU solutions. While a typical sparse computation like the SpMV reaches only a fraction of the peak of current GPUs, we show that the SpMM achieves up to a 6x performance improvement over the GPU's SpMV, and that the GPU-accelerated LOBPCG based on this kernel is 3x to 5x faster than multicore CPUs with the same power draw, e.g., a K40 GPU vs. two Sandy Bridge CPUs (16 cores). In practice, though, we show that currently available CPU implementations are much slower due to missed optimization opportunities. These performance results translate to similar improvements in energy consumption, and are indicative of today's frontiers in energy efficiency and performance for sparse computations on supercomputers.
DOI	10.1145/2712386.2712387
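
The blocking idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's GPU kernel and not the cuSPARSE, MKL, or MAGMA API; it is plain Python over a CSR matrix, and the function names spmv_csr, spmm_csr, and matrix_value_loads are hypothetical names chosen for this illustration. It shows only why one SpMM over k vectors touches each stored nonzero once, whereas k separate SpMVs touch it k times.

```python
# Illustrative sketch only: pure-Python CSR kernels, not the GPU kernels
# from the paper. All function names here are hypothetical.

def spmv_csr(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[p] * x[col_idx[p]]
        y[i] = acc
    return y


def spmm_csr(row_ptr, col_idx, vals, X):
    """Compute Y = A @ X, where X is a dense block of k vectors.

    X is stored row-major as a list of rows, each of length k, so the
    k entries multiplied by one nonzero of A sit next to each other.
    """
    n_rows = len(row_ptr) - 1
    k = len(X[0])
    Y = [[0.0] * k for _ in range(n_rows)]
    for i in range(n_rows):
        out = Y[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a = vals[p]            # each nonzero is loaded once ...
            x_row = X[col_idx[p]]
            for j in range(k):     # ... and reused for all k vectors
                out[j] += a * x_row[j]
    return Y


def matrix_value_loads(nnz, k, blocked):
    """Loads of stored nonzeros: nnz for one SpMM, k * nnz for k SpMVs."""
    return nnz if blocked else k * nnz


# Small example: A = [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]

# Block of k = 2 vectors, one per column of X.
X = [[1.0, 0.0],
     [2.0, 1.0],
     [3.0, 2.0]]

Y_blocked = spmm_csr(row_ptr, col_idx, vals, X)

# Same result via a sequence of SpMVs, one per column of X.
columns = [[row[j] for row in X] for j in range(2)]
Y_sequential = [spmv_csr(row_ptr, col_idx, vals, c) for c in columns]
```

In this sketch the arithmetic is identical in both variants; only the number of passes over the sparse matrix differs. For a memory-bound kernel, that reduced matrix traffic is the kind of saving the abstract attributes to blocking, although the speedups reported there come from the authors' GPU kernels and cannot be reproduced with this code.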

Project Tags:

magma

File:

icl-utk-790-2015.pdf