|Authors||C. Jarvis, G. T. Lines, J. Langguth, K. Nakajima and X. Cai|
|Title||Combining algorithmic rethinking and AVX-512 intrinsics for efficient simulation of subcellular calcium signaling|
|Project(s)||Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing|
|Publication Type||Proceedings, refereed|
|Year of Publication||2019|
|Conference Name||International Conference on Computational Science (ICCS 2019)|
Calcium signaling is vital for the contraction of the heart. Physiologically realistic simulation of this subcellular process requires nanometer resolution and a complicated mathematical model of differential equations. Since the subcellular space is composed of several irregularly shaped and intricately connected physiological domains with distinct properties, one particular challenge is to correctly compute the diffusion-induced calcium fluxes between these domains. The common approach is to pre-calculate the effective diffusion coefficients between all pairs of neighboring computational voxels and store them in large arrays. Such a strategy avoids complicated if-tests when looping through the computational mesh, but suffers from substantial memory overhead. In this paper, we adopt a memory-efficient strategy that uses a small lookup table of diffusion coefficients. Both the memory footprint and the memory traffic are drastically reduced, and the if-tests are still avoided. However, the new strategy requires more instructions at the processor level. To offset this potential performance pitfall, we use AVX-512 intrinsics to effectively vectorize the code. Performance measurements on a Knights Landing processor and a quad-socket Skylake server show a clear performance advantage of the manually vectorized implementation that uses lookup tables over its counterpart that uses coefficient arrays.
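The difference between the two strategies described in the abstract can be illustrated with a minimal scalar sketch in C. It is not the authors' implementation: the one-dimensional mesh, the number of domains (`NUM_DOMAINS`), the function names, and the flat table layout are all assumptions made for illustration, and vectorization with AVX-512 intrinsics is left aside. The first function reads one pre-calculated coefficient per voxel interface from a large array; the second stores only a one-byte domain label per voxel and fetches the coefficient from a small table indexed by the labels of the two neighboring voxels. Neither loop contains an if-test.

```c
#include <stddef.h>

/* Hypothetical number of physiological domains; the abstract does not list them. */
enum { NUM_DOMAINS = 3 };

/* Coefficient-array strategy (assumed 1D layout): coeff[i] holds the
 * pre-calculated effective diffusion coefficient between voxels i and i+1,
 * so the array is as large as the number of voxel interfaces. */
void flux_coeff_array(const double *c, const double *coeff,
                      double *flux, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i)
        flux[i] = coeff[i] * (c[i + 1] - c[i]);
}

/* Lookup-table strategy (assumed layout): domain[i] is a one-byte label of
 * the domain that voxel i belongs to, and table is a flat
 * NUM_DOMAINS x NUM_DOMAINS array of coefficients. The extra index
 * arithmetic replaces the load from the large coefficient array. */
void flux_lookup_table(const double *c, const unsigned char *domain,
                       const double *table, double *flux, size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i) {
        size_t idx = (size_t)domain[i] * NUM_DOMAINS + domain[i + 1];
        flux[i] = table[idx] * (c[i + 1] - c[i]);
    }
}
```

In this sketch the lookup-table variant reads one byte per voxel plus a nine-entry table instead of one double per interface, at the cost of additional index computation in the loop body. That mirrors the trade-off the abstract describes between memory traffic and instruction count.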