Publications

2014

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” VMware Technical Journal, vol. Winter 2013, January 2014.

Genet, D., A. Guermouche, and G. Bosilca, “Assembly Operations for Multicore Architectures using Task-Based Runtime Systems,” Euro-Par 2014, Porto, Portugal, Springer International Publishing, August 2014.

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the Impact of ABFT and Checkpoint Composite Strategies,” 16th Workshop on Advances in Parallel and Distributed Computational Models, IPDPS 2014, Phoenix, AZ, IEEE, May 2014.

2013

Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” ACM Transactions on Mathematical Software (also LAWN 246), vol. 39, issue 2, February 2013.

Nelson, J., “Analyzing PAPI Performance on Virtual Machines,” ICL Technical Report, no. ICL-UT-13-02, August 2013.

Bosilca, G., A. Bouteiller, T. Herault, Y. Robert, and J. Dongarra, “Assessing the impact of ABFT and Checkpoint composite strategies,” University of Tennessee Computer Science Technical Report, no. ICL-UT-13-03, 2013.

2012

Dong, T., T. Kolev, R. Rieben, V. Dobrev, S. Tomov, and J. Dongarra, “Acceleration of the BLAST Hydro Code on GPU,” Supercomputing '12 (poster), Salt Lake City, Utah, SC12, November 2012.

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-Based Fault Tolerance for Dense Matrix Factorization,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP 2012, New Orleans, LA, USA, ACM, pp. 225-234, February 2012.

Donfack, S., J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “On Algorithmic Variants of Parallel Gaussian Elimination: Comparison of Implementations in Terms of Performance and Numerical Properties,” University of Tennessee Computer Science Technical Report, no. UT-CS-13-715, July 2013, 2012.

Luszczek, P., and J. Dongarra, “Anatomy of a Globally Recursive Embedded LINPACK Benchmark,” 2012 IEEE High Performance Extreme Computing Conference, Waltham, MA, pp. 1-6, September 2012.

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMM Kernels for the Fermi GPU,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, November 2012.

2011

Baboulin, M., J. Dongarra, J. Herrmann, and S. Tomov, “Accelerating Linear System Solutions Using Randomization Techniques,” INRIA RR-7616 / LAWN #246 (presented at International AMMCS’11), Waterloo, Ontario, Canada, July 2011.

Dongarra, J., M. Faverge, H. Ltaief, and P. Luszczek, “Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization,” University of Tennessee Computer Science Technical Report (also as a LAWN), no. ICL-UT-11-08, September 2011.

Agullo, E., L. Giraud, A. Guermouche, A. Haidar, S. Lanteri, and J. Roman, “Algebraic Schwarz Preconditioning for the Schur Complement: Application to the Time-Harmonic Maxwell Equations Discretized by a Discontinuous Galerkin Method,” The Twentieth International Conference on Domain Decomposition Methods, La Jolla, California, February 2011.

Du, P., A. Bouteiller, G. Bosilca, T. Herault, and J. Dongarra, “Algorithm-based Fault Tolerance for Dense Matrix Factorizations,” University of Tennessee Computer Science Technical Report, no. UT-CS-11-676, Knoxville, TN, August 2011.

Haidar, A., H. Ltaief, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” University of Tennessee Computer Science Technical Report, UT-CS-11-666 (also LAWN 243), March 2011.

You, H., B. Rekapalli, Q. Liu, and S. Moore, “Autotuned Parallel I/O for Highly Scalable Biosequence Analysis,” TeraGrid'11, Salt Lake City, Utah, July 2011.

Kurzak, J., S. Tomov, and J. Dongarra, “Autotuning GEMMs for Fermi,” University of Tennessee Computer Science Technical Report, UT-CS-11-671 (also LAWN 245), April 2011.

2010

Nath, R., S. Tomov, and J. Dongarra, “Accelerating GPU Kernels for Dense Linear Algebra,” Proc. of VECPAR'10, Berkeley, CA, June 2010.

Tomov, S., G. Bosilca, and C. Augonnet, “Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers,” 2010 Symposium on Application Accelerators in High-Performance Computing (SAAHPC'10), Tutorial, July 2010.

Tomov, S., R. Nath, and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg, Tridiagonal, and Bidiagonal Forms through Hybrid GPU-Based Computing,” Parallel Computing, vol. 36, no. 12, pp. 645-654, 2010.

Haidar, A., H. Ltaief, A. YarKhan, and J. Dongarra, “Analysis of Dynamically Scheduled Tile Algorithms for Dense Linear Algebra on Multicore Architectures,” Submitted to Concurrency and Computation: Practice and Experience, November 2010.

Luszczek, P., and J. Dongarra, “Analysis of Various Scalar, Vector, and Parallel Implementations of RandomAccess,” Innovative Computing Laboratory (ICL) Technical Report, no. ICL-UT-10-03, June 2010.

Nath, R., S. Tomov, E. Agullo, and J. Dongarra, “Autotuning Dense Linear Algebra Libraries on GPUs,” Sixth International Workshop on Parallel Matrix Algorithms and Applications (PMAA 2010), Basel, Switzerland, June 2010.

2009

Baboulin, M., A. Buttari, J. Dongarra, J. Kurzak, J. Langou, J. Langou, P. Luszczek, and S. Tomov, “Accelerating Scientific Computations with Mixed Precision Algorithms,” Computer Physics Communications, vol. 180, issue 12, pp. 2526-2533, December 2009.

Tomov, S., and J. Dongarra, “Accelerating the Reduction to Upper Hessenberg Form through Hybrid GPU-Based Computing,” University of Tennessee Computer Science Technical Report, UT-CS-09-642 (also LAPACK Working Note 219), May 2009.

Demmel, J., J. Dongarra, A. Fox, S. Williams, V. Volkov, and K. Yelick, “Accelerating Time-To-Solution for Computational Science and Engineering,” SciDAC Review, 2009.

Dongarra, J., G. Bosilca, R. Delmas, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” Journal of Parallel and Distributed Computing, vol. 69, pp. 410-416, 2009.

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems,” IEEE Cluster 2009, New Orleans, August 2009.

2008

Chen, Z., and J. Dongarra, “Algorithm-Based Fault Tolerance for Fail-Stop Failures,” IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 12, January 2008.

Bosilca, G., R. Delmas, J. Dongarra, and J. Langou, “Algorithmic Based Fault Tolerance Applied to High Performance Computing,” University of Tennessee Computer Science Technical Report, UT-CS-08-620 (also LAPACK Working Note 205), January 2008.

Song, F., S. Moore, and J. Dongarra, “Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms,” University of Tennessee Computer Science Technical Report, UT-CS-08-626, January 2008.

2007

You, H., K. Seymour, J. Dongarra, and S. Moore, “Automated Empirical Tuning of a Multiresolution Analysis Kernel,” ICL Technical Report, no. ICL-UT-07-01, pp. 10, January 2007.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic Analysis of Inefficiency Patterns in Parallel Applications,” Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pp. 1481-1496, August 2007.

2006

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” IPDPS 2006, 20th IEEE International Parallel and Distributed Processing Symposium, Rhodes Island, Greece, January 2006.

Bhowmick, S., V. Eijkhout, Y. Freund, E. Fuentes, and D. Keyes, “Application of Machine Learning to the Selection of Sparse Linear Solvers,” International Journal of High Performance Computing Applications (submitted), 2006.

Dongarra, J., N. Emad, and S. Abolfazl Shahzadeh-Fazeli, “An Asynchronous Algorithm on NetSolve Global Computing System,” Future Generation Computer Systems, vol. 22, issue 3, pp. 279-290, February 2006.

Seymour, K., H. You, and J. Dongarra, “ATLAS on the BlueGene/L – Preliminary Results,” ICL Technical Report, no. ICL-UT-06-10, January 2006.

2005

Chen, Z., and J. Dongarra, “Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Computations on Volatile Resources,” University of Tennessee Computer Science Department Technical Report, vol. –05-561, November 2005.

Andersson, U., and P. Mucci, “Analysis and Optimization of Yee_Bench using Hardware Performance Counters,” Proceedings of Parallel Computing 2005 (ParCo), Malaga, Spain, January 2005.

Wolf, F., B. Mohr, J. Dongarra, and S. Moore, “Automatic analysis of inefficiency patterns in parallel applications,” Concurrency and Computation: Practice and Experience, Special issue "Automatic Performance Analysis" (submitted), 2005.

Bhatia, N., F. Song, F. Wolf, J. Dongarra, B. Mohr, and S. Moore, “Automatic Experimental Analysis of Communication Patterns in Virtual Topologies,” In Proceedings of the International Conference on Parallel Processing, Oslo, Norway, IEEE Computer Society, June 2005.

2004

Dongarra, J., S. Moore, P. Mucci, K. Seymour, and H. You, “Accurate Cache and TLB Characterization Using Hardware Counters,” International Conference on Computational Science (ICCS 2004), Krakow, Poland, Springer, June 2004.

Beck, M., J. Dongarra, J. Huang, T. Moore, and J. Plank, “Active Logistical State Management in the GridSolve/L,” 4th International Symposium on Cluster Computing and the Grid (CCGrid 2004) (submitted), Chicago, Illinois, January 2004.

Song, F., F. Wolf, N. Bhatia, J. Dongarra, and S. Moore, “An Algebra for Cross-Experiment Performance Analysis,” 2004 International Conference on Parallel Processing (ICPP-04), Montreal, Quebec, Canada, August 2004.

Emad, N., S. A. S. Fazeli, and J. Dongarra, “An Asynchronous Algorithm on NetSolve Global Computing System,” PRiSM - Laboratoire de recherche en informatique, Université de Versailles St-Quentin Technical Report, March 2004.

Yi, Q., K. Kennedy, H. You, K. Seymour, and J. Dongarra, “Automatic Blocking of QR and LU Factorizations for Locality,” 2nd ACM SIGPLAN Workshop on Memory System Performance (MSP 2004), Washington, DC, ACM, June 2004.

Mucci, P., J. Dongarra, R. Kufrin, S. Moore, F. Song, and F. Wolf, “Automating the Large-Scale Collection and Analysis of Performance,” 5th LCI International Conference on Linux Clusters: The HPC Revolution, Austin, Texas, May 2004.

2003

Eidson, T., J. Dongarra, and V. Eijkhout, “Applying Aspect-Oriented Programming Concepts to a Component-based Programming Model,” IPDPS 2003, Workshop on NSF-Next Generation Software, Nice, France, March 2003.

Wolf, F., and B. Mohr, “Automatic performance analysis of hybrid MPI/OpenMP applications,” Journal of Systems Architecture, Special Issue 'Evolutions in parallel distributed and network-based processing', vol. 49, no. 10-11, Elsevier, pp. 421-439, November 2003.
