Publications

2023

Li, J., G. Bosilca, A. Bouteiller, and B. Nicolae, “Elastic deep learning through resilient collective operations,” SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Denver, CO, ACM, November 2023.

Aggarwal, I., P. Nayak, A. Kashi, and H. Anzt, “Preconditioners for Batched Iterative Linear Solvers on GPUs,” Smoky Mountains Computational Sciences and Engineering Conference, vol. 169075: Springer Nature Switzerland, pp. 38–53, January 2023.

2022

Kashi, A., P. Nayak, D. Kulkarni, A. Scheinberg, P. Lin, and H. Anzt, “Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations,” 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.

Schuchart, J., P. Nookala, M. Mahdi Javanmard, T. Herault, E. F. Valeev, G. Bosilca, and R. J. Harrison, “Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment,” 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Lyon, France, IEEE, July 2022.

Anzt, H., T. Cojean, G. Flegar, F. Göbel, T. Grützmacher, P. Nayak, T. Ribizel, Y. Mike Tsai, and E. S. Quintana-Ortí, “Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,” ACM Transactions on Mathematical Software, vol. 48, issue 12, pp. 1–33, March 2022.

Whitlock, M., N. Morales, G. Bosilca, A. Bouteiller, B. Nicolae, K. Teranishi, E. Giem, and V. Sarkar, “Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach,” 2022 IEEE International Conference on Cluster Computing (CLUSTER 2022), Heidelberg, Germany, September 2022.

Schuchart, J., P. Nookala, T. Herault, E. F. Valeev, and G. Bosilca, “Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG,” 2022 IEEE International Conference on Cluster Computing (CLUSTER), Heidelberg, Germany, IEEE, September 2022.

Nance, D., S. Tomov, and K. Wong, “A Python Library for Matrix Algebra on GPU and Multicore Architectures,” 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, IEEE, December 2022.

Cao, Q., S. Abdulah, R. Alomairy, Y. Pei, P. Nag, G. Bosilca, J. Dongarra, M. G. Genton, D. Keyes, H. Ltaief, et al., “Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications,” 2022 International Conference for High Performance Computing, Networking, Storage and Analysis (SC22), Dallas, TX, IEEE Press, November 2022.

2021

Schuchart, J., P. Samfass, C. Niethammer, J. Gracia, and G. Bosilca, “Callback-based completion notification using MPI Continuations,” Parallel Computing, pp. 102793, 2021.

Iqbal, Z., S. Nooshabadi, I. Yamazaki, S. Tomov, and J. Dongarra, “Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems,” IEEE Access, 2021.

Anzt, H., N. Beams, T. Cojean, F. Göbel, T. Grützmacher, A. Kashi, P. Nayak, T. Ribizel, and Y. M. Tsai, Ginkgo: A Sparse Linear Algebra Library for HPC: 2021 ECP Annual Meeting, April 2021.

Spannaus, A., K. J. H. Law, P. Luszczek, F. Nasrin, C. Putman Micucci, P. K. Liaw, L. J. Santodonato, D. J. Keffer, and V. Maroulas, “Materials fingerprinting classification,” Computer Physics Communications, pp. 108019, 2021.

Schuchart, J., C. Niethammer, J. Gracia, and G. Bosilca, “Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication,” EuroMPI'21, Garching, Munich, Germany, 2021.

2020

Nicolae, B., J. Li, J. M. Wozniak, G. Bosilca, M. Dorier, and F. Cappello, “DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models,” 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, IEEE, May 2020.

Nayak, P., T. Cojean, and H. Anzt, “Evaluating Asynchronous Schwarz Solvers on GPUs,” International Journal of High Performance Computing Applications, August 2020.

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, and Y-H. Tsai, “Ginkgo: A High Performance Numerical Linear Algebra Library,” Journal of Open Source Software, vol. 5, issue 52, August 2020.

Anzt, H., T. Cojean, Y-C. Chen, F. Goebel, T. Gruetzmacher, P. Nayak, T. Ribizel, Y-H. Tsai, and J. Dongarra, Ginkgo: A Node-Level Sparse Linear Algebra Library for HPC (Poster), Houston, TX, 2020 Exascale Computing Project Annual Meeting, February 2020.

Wong, K., S. Tomov, D. Nichols, R. Febbo, F. Lopez, J. Halloy, and X. Ma, How to Build Your Own Deep Neural Network: PEARC20, July 2020.

Tomov, S., K. Wong, J. Dongarra, R. Archibald, E. Chow, E. D'Azevedo, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, et al., Integrating Deep Learning in Domain Science at Exascale (MagmaDNN), virtual, DOD HPCMP seminar, December 2020.

Archibald, R., E. Chow, E. D'Azevedo, J. Dongarra, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, S. Tomov, K. Wong, et al., “Integrating Deep Learning in Domain Sciences at Exascale,” Innovative Computing Laboratory Technical Report, no. ICL-UT-20-10: University of Tennessee, August 2020.

Archibald, R., E. Chow, E. D'Azevedo, J. Dongarra, M. Eisenbach, R. Febbo, F. Lopez, D. Nichols, S. Tomov, K. Wong, et al., “Integrating Deep Learning in Domain Sciences at Exascale,” 2020 Smoky Mountains Computational Sciences and Engineering Conference (SMC 2020), August 2020.

Anzt, H., T. Cojean, C. Yen-Chen, J. Dongarra, G. Flegar, P. Nayak, S. Tomov, Y. M. Tsai, and W. Wang, “Load-Balancing Sparse Matrix Vector Product Kernels on GPUs,” ACM Transactions on Parallel Computing, vol. 7, issue 1, March 2020.

Abdelfattah, A., H. Anzt, E. Boman, E. Carson, T. Cojean, J. Dongarra, M. Gates, T. Gruetzmacher, N. J. Higham, S. Li, et al., “A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic,” SLATE Working Notes, no. 15, ICL-UT-20-08: University of Tennessee, July 2020.

Bosilca, G., R. Harrison, T. Herault, M. Mahdi Javanmard, P. Nookala, and E. Valeev, “The Template Task Graph (TTG) - An Emerging Practical Dataflow Programming Paradigm for Scientific Simulation at Extreme Scale,” 2020 IEEE/ACM 5th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2): IEEE, November 2020.

2019

Badia, R. M., M. Beck, F. Bodin, T. Boku, F. Cappello, A. Choudhary, C. Costa, E. Deelman, N. Ferrier, K. Fujisawa, et al., “A Collection of Presentations from the BDEC2 Workshop in Kobe, Japan,” Innovative Computing Laboratory Technical Report, no. ICL-UT-19-09: University of Tennessee, Knoxville, February 2019.

Ng, L., S. Chen, A. Gessinger, D. Nichols, S. Cheng, A. Meenasorna, K. Wong, S. Tomov, A. Haidar, E. D'Azevedo, et al., MagmaDNN 0.2 High-Performance Data Analytics for Manycore GPUs and CPUs: University of Tennessee, January 2019.

Nichols, D., K. Wong, S. Tomov, L. Ng, S. Chen, and A. Gessinger, “MagmaDNN: Accelerated Deep Learning Using MAGMA,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Nichols, D., N-S. Tomov, F. Betancourt, S. Tomov, K. Wong, and J. Dongarra, “MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing,” ISC High Performance, Frankfurt, Germany, Springer International Publishing, June 2019.

Betancourt, F., K. Wong, E. Asemota, Q. Marshall, D. Nichols, and S. Tomov, “OpenDIEL: A Parallel Workflow Engine and Data Analytics Framework,” Practice and Experience in Advanced Research Computing (PEARC ’19), Chicago, IL, ACM, July 2019.

Anzt, H., Y. Chen Chen, T. Cojean, J. Dongarra, G. Flegar, P. Nayak, E. S. Quintana-Orti, Y. M. Tsai, and W. Wang, “Towards Continuous Benchmarking,” Platform for Advanced Scientific Computing Conference (PASC 2019), Zurich, Switzerland, ACM Press, June 2019.

Tseng, S-M., B. Nicolae, G. Bosilca, E. Jeannot, A. Chandramowlishwaran, and F. Cappello, “Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring,” 2019 European Conference on Parallel Processing (Euro-Par 2019), Göttingen, Germany, Springer, August 2019.

Li, J., B. Nicolae, J. M. Wozniak, and G. Bosilca, “Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training,” 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), Denver, CO, IEEE, November 2019.

2018

Balaprakash, P., J. Dongarra, T. Gamblin, M. Hall, J. Hollingsworth, B. Norris, and R. Vuduc, “Autotuning in High-Performance Computing Applications,” Proceedings of the IEEE, vol. 106, issue 11, pp. 2068–2083, November 2018.

Bernholdt, D. E., S. Boehm, G. Bosilca, M. G. Venkata, R. E. Grant, T. Naughton, H. P. Pritchard, M. Schulz, and G. R. Vallee, “A Survey of MPI Usage in the US Exascale Computing Project,” Concurrency and Computation: Practice and Experience, September 2018.

2017

Ng, L., K. Wong, A. Haidar, S. Tomov, and J. Dongarra, MagmaDNN – High-Performance Data Analytics for Manycore GPUs and CPUs, Knoxville, TN, 2017 Summer Research Experiences for Undergraduate (REU), Presentation, December 2017.

Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “Structure-aware Linear Solver for Realtime Convex Optimization for Embedded Systems,” IEEE Embedded Systems Letters, vol. 9, issue 3, pp. 61–64, May 2017.

2016

Anzt, H., E. Chow, D. Szyld, and J. Dongarra, “Domain Overlap for Iterative Sparse Triangular Solves on GPUs,” Software for Exascale Computing - SPPEXA, vol. 113: Springer International Publishing, pp. 527–545, September 2016.

Newburn, C. J., G. Bansal, M. Wood, L. Crivelli, J. Planas, A. Duran, P. Souza, L. Borges, P. Luszczek, S. Tomov, et al., “Heterogeneous Streaming,” The Sixth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), IPDPS 2016, Chicago, IL, IEEE, May 2016.

Yamazaki, I., S. Nooshabadi, S. Tomov, and J. Dongarra, “High Performance Realtime Convex Solver for Embedded Systems,” University of Tennessee Computer Science Technical Report, no. UT-EECS-16-745, October 2016.

2014

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” VMware Technical Journal, vol. Winter 2013, January 2014.

2013

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” ICL Technical Report, no. ICL-UT-13-02, August 2013.

Marin, G., C. McCurdy, and J. Vetter, “Diagnosis and Optimization of Application Prefetching Performance,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.

Weaver, V., D. Terpstra, H. McCraw, M. Johnson, K. Kasichayanula, J. Ralph, J. Nelson, P. Mucci, T. Mohan, and S. Moore, PAPI 5: Measuring Power, Energy, and the Cloud, Austin, TX, 2013 IEEE International Symposium on Performance Analysis of Systems and Software, April 2013.

Haidar, A., M. Gates, S. Tomov, and J. Dongarra, “Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication,” Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13), Eugene, Oregon, USA, ACM Press, June 2013.

2012

Kurzak, J., R. Nath, P. Du, and J. Dongarra, “An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs,” Applied Parallel and Scientific Computing, vol. 7133, pp. 248–257, 2012.

Johnson, M., H. McCraw, S. Moore, P. Mucci, J. Nelson, D. Terpstra, V. M. Weaver, and T. Mohan, “PAPI-V: Performance Monitoring for Virtual Machines,” CloudTech-HPC 2012, Pittsburgh, PA, September 2012.
