Authors | J. D. Trotter, J. Langguth and X. Cai |
Title | Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication |
Affiliation | Scientific Computing |
Project(s) | Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing |
Status | Published |
Publication Type | Journal Article |
Year of Publication | 2020 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 144 |
Pagination | 189–205 |
Date Published | 06/2020 |
Publisher | Elsevier |
ISSN | 0743-7315 |
Keywords | AMD Epyc, Cache simulation, Intel Xeon, Performance model, Sparse matrix–vector multiplication |
Abstract | Parallel computations with irregular memory access patterns are often limited by the memory subsystems of multi-core CPUs, though it can be difficult to pinpoint and quantify performance bottlenecks precisely. We present a method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches. Further, we describe a performance model based on these estimates that applies to bandwidth-limited computations. As a case study, we consider two standard algorithms for sparse matrix–vector multiplication, a widely used, irregular kernel. Using three different multi-core CPU systems and a set of matrices that induce a range of irregular memory access patterns, we demonstrate that our cache simulation combined with the proposed performance model accurately quantifies performance bottlenecks that would not be detected using standard best- or worst-case estimates of the data traffic volume. |
URL | http://www.sciencedirect.com/science/article/pii/S0743731520302999 |
DOI | 10.1016/j.jpdc.2020.05.020 |
Citation Key | TROTTER2020189 |