Publications
Author: Jakub Kurzak; titles beginning with "I" (10 results)
“Implementation of the C++ API for Batch BLAS,”
SLATE Working Notes, no. 07, ICL-UT-18-04: Innovative Computing Laboratory, University of Tennessee, June 2018.
“Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs,”
IEEE Transactions on Parallel and Distributed Systems, ISSN 1045-9219, November 2015.
“Implementing a systolic algorithm for QR factorization on multicore clusters with PaRSEC,”
LAWN 277, no. UT-CS-13-709, May 2013.
“An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,”
University of Tennessee Computer Science Technical Report (also LAWN 283), no. UT-EECS-13-720: University of Tennessee, October 2013.
“An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware,”
Supercomputing 2013, Denver, CO, November 2013.
“An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,”
Applied Parallel and Scientific Computing, vol. 7133, pp. 248-257, 2012.
“Implementation of Mixed Precision in Solving Systems of Linear Equations on the Cell Processor,”
Concurrency and Computation: Practice and Experience, vol. 19, no. 10, pp. 1371-1385, July 2007.
“The Impact of Multicore on Math Software,”
PARA 2006, Umeå, Sweden, June 2006.
“Implementation of the Mixed-Precision High Performance LINPACK Benchmark on the CELL Processor,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-580, LAPACK Working Note #177, September 2006.
“Implementing Linear Algebra Routines on Multi-Core Processors with Pipelining and a Look Ahead,”
University of Tennessee Computer Science Tech Report, no. UT-CS-06-581, LAPACK Working Note #178, January 2006.