Domain Overlap for Iterative Sparse Triangular Solves on GPUs

Title: Domain Overlap for Iterative Sparse Triangular Solves on GPUs
Publication Type: Conference Proceedings
Year of Publication: 2016
Authors: Anzt, H., E. Chow, D. Szyld, and J. Dongarra
Editor: Bungartz, H-J., P. Neumann, and W. E. Nagel
Conference Name: Software for Exascale Computing - SPPEXA
Series Title: Lecture Notes in Computational Science and Engineering
Volume: 113
Pagination: 527–545
Date Published: 2016-09
Publisher: Springer International Publishing
Abstract: Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution if an approximation of the solution is acceptable. On modern hardware, performance benefits are available as iterative methods allow for better parallelization. In this paper, we investigate how block-iterative triangular solves can benefit from using overlap. Because the matrices are triangular, we use “directed” overlap, depending on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. For GPUs and other cases where the problem must be overdecomposed, i.e., more subdomains and threads than cores, there is a preference in processing or scheduling the subdomains in a specific order, following the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap with subdomain scheduling can improve convergence and time-to-solution.
DOI: 10.1007/978-3-319-40528-5_24
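
Illustrative sketch (not from the paper): the abstract contrasts exact forward substitution with iterative triangular solves. The Python below is a minimal sketch of that contrast using plain synchronous Jacobi sweeps on a small dense lower triangular matrix. It does not reproduce the paper's block-asynchronous GPU method, directed overlap, or subdomain scheduling; the function names, dense list-of-rows storage, zero initial guess, and example values are all assumptions chosen for illustration.

```python
def forward_substitution(L, b):
    """Exact solve of L x = b; row i depends on rows 0..i-1 (sequential)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def jacobi_lower_triangular(L, b, sweeps):
    """Approximate solve of L x = b with synchronous Jacobi sweeps.

    Within a sweep every row reads only the previous iterate, so all rows
    could be updated in parallel. Assumes a nonzero diagonal and x0 = 0.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_new = [0.0] * n
        for i in range(n):  # rows are independent within one sweep
            s = sum(L[i][j] * x[j] for j in range(i))
            x_new[i] = (b[i] - s) / L[i][i]
        x = x_new
    return x


# Hypothetical 3x3 example system.
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 5.0, 32.0]

exact = forward_substitution(L, b)
approx_1 = jacobi_lower_triangular(L, b, sweeps=1)
approx_n = jacobi_lower_triangular(L, b, sweeps=len(b))
```

Because the Jacobi iteration matrix of a triangular system is strictly triangular and therefore nilpotent, n sweeps reproduce the exact solution of an n-by-n system in exact arithmetic. The practical appeal described in the abstract is that far fewer sweeps may give an acceptable approximation while each sweep parallelizes well.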