Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation

Title: Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Faverge, M., J. Langou, Y. Robert, and J. Dongarra
Conference Name: IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Date Published: 2017-05
Publisher: IEEE
Conference Location: Orlando, FL
Keywords: Algorithm design and analysis, Approximation algorithms, Kernel, Multicore processing, Shape, Software algorithms, Transforms
Abstract

We study tiled algorithms for going from a “full” matrix to a condensed “band bidiagonal” form using orthogonal transformations: (i) the tiled bidiagonalization algorithm BIDIAG, which is a tiled version of the standard scalar bidiagonalization algorithm; and (ii) the R-bidiagonalization algorithm R-BIDIAG, which is a tiled version of the algorithm which consists in first performing the QR factorization of the initial matrix, then performing the band-bidiagonalization of the R-factor. For both BIDIAG and R-BIDIAG, we use four main types of reduction trees, namely FLATTS, FLATTT, GREEDY, and a newly introduced auto-adaptive tree, AUTO. We provide a study of critical path lengths for these tiled algorithms, which shows that (i) R-BIDIAG has a shorter critical path length than BIDIAG for tall and skinny matrices, and (ii) GREEDY-based schemes are much better than earlier proposed algorithms with unbounded resources. We provide experiments on a single multicore node, and on a few multicore nodes of a parallel distributed shared-memory system, to show the superiority of the new algorithms on a variety of matrix sizes, matrix shapes and core counts.
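The difference between the two approaches named in the abstract can be illustrated with a minimal dense sketch in NumPy. It is not the paper's tiled implementation: it uses plain Householder reflections on the whole matrix, reduces to scalar (not band) bidiagonal form, and involves no reduction trees. The function names `householder`, `bidiagonalize`, and `r_bidiagonalize` are hypothetical and chosen here for illustration only; the input is assumed to have at least as many rows as columns.

```python
import numpy as np

def householder(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1."""
    v = x.astype(float).copy()
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v  # zero vector: the reflection degenerates to the identity
    v[0] += np.copysign(norm, v[0])  # sign choice avoids cancellation
    return v / np.linalg.norm(v)

def bidiagonalize(A):
    """Scalar (non-tiled) bidiagonalization of an m x n matrix, m >= n (assumed)."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero out column k below the diagonal.
        v = householder(B[k:, k])
        B[k:, k:] -= 2.0 * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflection: zero out row k beyond the superdiagonal.
            w = householder(B[k, k + 1:])
            B[k:, k + 1:] -= 2.0 * np.outer(B[k:, k + 1:] @ w, w)
    return B

def r_bidiagonalize(A):
    """QR factorization first, then bidiagonalization of the n x n R factor."""
    R = np.linalg.qr(A, mode="r")
    return bidiagonalize(R)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 8))  # tall and skinny test matrix
    B1 = bidiagonalize(A)              # 200 x 8, upper bidiagonal
    B2 = r_bidiagonalize(A)            # 8 x 8, upper bidiagonal
    s = np.linalg.svd(A, compute_uv=False)
    print(np.allclose(np.linalg.svd(B1, compute_uv=False), s))
    print(np.allclose(np.linalg.svd(B2, compute_uv=False), s))
```

Both routines apply only orthogonal transformations, so the singular values of the input are preserved. In the R-bidiagonalization variant, the two-sided reflections act on the small square R factor rather than on the full tall matrix.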

DOI: 10.1109/IPDPS.2017.46