Publications

2023

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements, St. Petersburg, FL, 28th HIPS Workshop, May 2023.

Barry, D., H. Jagode, A. Danalis, and J. Dongarra, “Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, Florida, IEEE, August 2023.

Tsai, Y-H. Mike, N. Beams, and H. Anzt, “Mixed Precision Algebraic Multigrid on GPUs,” Parallel Processing and Applied Mathematics (PPAM 2022), vol. 13826, Cham, Springer International Publishing, April 2023.

Schuchart, J., and G. Bosilca, “MPI Continuations And How To Invoke Them,” Sustained Simulation Performance 2021, Cham, Springer International Publishing, pp. 67-83, February 2023.

2022

Cayrols, S., J. Li, G. Bosilca, S. Tomov, A. Ayala, and J. Dongarra, “Mixed precision and approximate 3D FFTs: Speed for accuracy trade-off with GPU-aware MPI and run-time data compression,” ICL Technical Report, no. ICL-UT-22-04, May 2022.

2021

Tomov, S., MAGMA: Evolution and Revolution, Knoxville, TN, ICL Lunch Talk Seminar, July 2021.

Spannaus, A., K. J. H. Law, P. Luszczek, F. Nasrin, C. Putman Micucci, P. K. Liaw, L. J. Santodonato, D. J. Keffer, and V. Maroulas, “Materials fingerprinting classification,” Computer Physics Communications, pp. 108019, 2021.

Benoit, A., R. Elghazi, and Y. Robert, “Max-Stretch Minimization on an Edge-Cloud Platform,” IPDPS'2021, the 34th IEEE International Parallel and Distributed Processing Symposium: IEEE Computer Society Press, 2021.

Tsai, Y. M., P. Luszczek, and J. Dongarra, “Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices,” ICL Technical Report, no. ICL-UT-21-05, August 2021.

Sharp, D., M. Stoyanov, S. Tomov, and J. Dongarra, “A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms,” ICL Technical Report, no. ICL-UT-21-04: University of Tennessee, August 2021.

2020

Farhan, M. Al, A. Abdelfattah, S. Tomov, M. Gates, D. Sukkari, A. Haidar, R. Rosenberg, and J. Dongarra, “MAGMA Templates for Scalable Linear Algebra on Emerging Architectures,” The International Journal of High Performance Computing Applications, vol. 34, issue 6, pp. 645-658, November 2020.

Tomov, S., MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Seattle, WA, 2020 NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Principal Investigator Meeting, February 2020.

Abdelfattah, A., S. Tomov, and J. Dongarra, “Matrix Multiplication on Batches of Small Matrices in Half and Half-Complex Precisions,” Journal of Parallel and Distributed Computing, vol. 145, pp. 188-201, November 2020.

Lopez, F., and T. Mary, “Mixed Precision LU Factorization on GPU Tensor Cores: Reducing Data Movement and Memory Footprint,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-13: University of Tennessee, September 2020.

Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Iterative Refinement using Tensor Cores on GPUs to Accelerate Solution of Linear Systems,” Proceedings of the Royal Society A, vol. 476, issue 2243, November 2020.

Haidar, A., H. Bayraktar, S. Tomov, J. Dongarra, and N. J. Higham, “Mixed-Precision Solution of Linear Systems Using Accelerator-Based Computing,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-05: University of Tennessee, May 2020.

Goebel, F., H. Anzt, T. Cojean, G. Flegar, and E. S. Quintana-Orti, “Multiprecision Block-Jacobi for Iterative Triangular Solves,” European Conference on Parallel Processing (Euro-Par 2020): Springer, August 2020.

2019

Ng, L., S. Chen, A. Gessinger, D. Nichols, S. Cheng, A. Meenasorna, K. Wong, S. Tomov, A. Haidar, E. D'Azevedo, et al., MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs: University of Tennessee, January 2019.

Nichols, D., K. Wong, S. Tomov, L. Ng, S. Chen, and A. Gessinger, “MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, “MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

Kurzak, J., Y. Tsai, M. Gates, A. Abdelfattah, and J. Dongarra, “Massively Parallel Automated Software Tuning,” 48th International Conference on Parallel Processing (ICPP 2019), Kyoto, Japan, ACM Press, August 2019.

Bai, Z., J. Dongarra, D. Lu, and I. Yamazaki, “Matrix Powers Kernels for Thick-Restart Lanczos with Explicit External Deflation,” International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, IEEE, May 2019.

2018

Abdelfattah, A., J. Dongarra, A. Haidar, S. Tomov, and I. Yamazaki, MATEDOR: MAtrix, TEnsor, and Deep-learning Optimized Routines, Dallas, TX, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Research Poster, November 2018.

Haidar, A., S. Tomov, A. Abdelfattah, I. Yamazaki, and J. Dongarra, MAtrix, TEnsor, and Deep-learning Optimized Routines (MATEDOR), Washington, DC, NSF PI Meeting, Poster, April 2018.

Benoit, A., A. Cavelan, Y. Robert, and H. Sun, “Multi-Level Checkpointing and Silent Error Detection for Linear Workflows,” Journal of Computational Science, vol. 28, pp. 398–415, September 2018.

2017

Tomov, S., and A. Haidar, MAGMA Tensors and Batched Computing for Accelerating Applications on GPUs, San Jose, CA, GPU Technology Conference (GTC17), Presentation in Session S7728, May 2017.

Ng, L., K. Wong, A. Haidar, S. Tomov, and J. Dongarra, MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.

Anzt, H., E. Boman, J. Dongarra, G. Flegar, M. Gates, M. Heroux, M. Hoemmen, J. Kurzak, P. Luszczek, S. Rajamanickam, et al., “MAGMA-sparse Interface Design Whitepaper,” Innovative Computing Laboratory Technical Report, no. ICL-UT-17-05, September 2017.

2016

Dong, T., A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra, “MAGMA Batched: A Batched BLAS Approach for Small Matrix Factorizations and Applications on GPUs,” Innovative Computing Laboratory Technical Report, no. ICL-UT-16-02: University of Tennessee, August 2016.

2015

Haidar, A., S. Tomov, P. Luszczek, and J. Dongarra, “MAGMA Embedded: Towards a Dense Linear Algebra Library for Energy Efficient Extreme Computing,” 2015 IEEE High Performance Extreme Computing Conference (HPEC ’15), (Best Paper Award), Waltham, MA, IEEE, September 2015.

Anzt, H., J. Dongarra, M. Gates, A. Haidar, K. Kabir, P. Luszczek, S. Tomov, and I. Yamazaki, MAGMA MIC: Optimizing Linear Algebra for Intel Xeon Phi, Frankfurt, Germany, ISC High Performance (ISC15), Intel Booth Presentation, June 2015.

Yamazaki, I., S. Tomov, J. Kurzak, J. Dongarra, and J. Barlow, “Mixed-precision Block Gram Schmidt Orthogonalization,” 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, Austin, TX, ACM, November 2015.

Yamazaki, I., S. Tomov, and J. Dongarra, “Mixed-Precision Cholesky QR Factorization and its Case Studies on Multicore CPU with Multiple GPUs,” SIAM Journal on Scientific Computing, vol. 37, no. 3, pp. C203-C330, May 2015.

Yamazaki, I., J. Barlow, S. Tomov, J. Kurzak, and J. Dongarra, “Mixed-precision orthogonalization process Performance on multicore CPUs with GPUs,” 2015 SIAM Conference on Applied Linear Algebra, Atlanta, GA, SIAM, October 2015.

Faverge, M., J. Herrmann, J. Langou, B. Lowery, Y. Robert, and J. Dongarra, “Mixing LU-QR Factorization Algorithms to Design High-Performance Dense Linear Algebra Solvers,” Journal of Parallel and Distributed Computing, vol. 85, pp. 32-46, November 2015.

2014

Marin, G., J. Dongarra, and D. Terpstra, “MIAMI: A Framework for Application Performance Diagnosis,” IPASS-2014, Monterey, CA, IEEE, March 2014.

Yamazaki, I., S. Tomov, T. Dong, and J. Dongarra, “Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs,” VECPAR 2014 (Best Paper), Eugene, OR, June 2014.

Dongarra, J., A. Haidar, J. Kurzak, P. Luszczek, S. Tomov, and A. YarKhan, “Model-Driven One-Sided Factorizations on Multicore, Accelerated Systems,” Supercomputing Frontiers and Innovations, vol. 1, issue 1, 2014.

Bouteiller, A., T. Herault, and G. Bosilca, “A Multithreaded Communication Substrate for OpenSHMEM,” 8th International Conference on Partitioned Global Address Space Programming Models (PGAS), Eugene, OR, October 2014.

2013

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria checkpointing strategies: optimizing response-time versus resource utilization,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-01, February 2013.

Bouteiller, A., F. Cappello, J. Dongarra, A. Guermouche, T. Herault, and Y. Robert, “Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization,” Euro-Par 2013, Aachen, Germany, Springer, August 2013.

Kurzak, J., P. Luszczek, A. YarKhan, M. Faverge, J. Langou, H. Bouwmeester, and J. Dongarra, “Multithreading in the PLASMA Library,” Multi and Many-Core Processing: Architecture, Programming, Algorithms, & Applications: Taylor & Francis, 2013.

2012

Tomov, S., J. Dongarra, A. Haidar, I. Yamazaki, T. Dong, T. Schulthess, and R. Solcà, MAGMA: A Breakthrough in Solvers for Eigenvalue Problems, San Jose, CA, GPU Technology Conference (GTC12), Presentation, May 2012.

Dongarra, J., T. Dong, M. Gates, A. Haidar, S. Tomov, and I. Yamazaki, MAGMA: A New Generation of Linear Algebra Library for GPU and Multicore Architectures, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), Presentation, November 2012.

Dongarra, J., M. Gates, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov, MAGMA MIC: Linear Algebra Library for Intel Xeon Phi Coprocessors, Salt Lake City, UT, The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC12), November 2012.

Gates, M., MAGMA Tutorial, Atlanta, GA, Keeneland Workshop, February 2012.

Agullo, E., G. Bosilca, C. Castagnède, J. Dongarra, H. Ltaief, and S. Tomov, “Matrices Over Runtime Systems at Exascale,” Supercomputing '12 (poster), Salt Lake City, Utah, November 2012.

Weaver, V. M., M. Johnson, K. Kasichayanula, J. Ralph, P. Luszczek, D. Terpstra, and S. Moore, “Measuring Energy and Power with PAPI,” International Workshop on Power-Aware Systems and Architectures, Pittsburgh, PA, September 2012.

2011

Tomov, S., MAGMA - LAPACK for GPUs, Atlanta, GA, Keeneland GPU Tutorial, April 2011.

Tomov, S., and J. Dongarra, MAGMA - LAPACK for HPC on Heterogeneous Architectures, Oak Ridge, TN, Titan Summit at Oak Ridge National Laboratory, Presentation, August 2011.
