Authors: J. D. Trotter, X. Cai and S. W. Funke
Title: On memory traffic and optimisations for low-order finite element assembly algorithms on multi-core CPUs
Affiliation: Scientific Computing
Project(s): Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing
Publication Type: Journal Article
Year of Publication: 2022
Journal: ACM Transactions on Mathematical Software
Date Published: 05/2022
Publisher: Association for Computing Machinery (ACM)

Motivated by the wish to understand the achievable performance of finite element assembly on unstructured computational meshes, we dissect the standard cellwise assembly algorithm into four kernels, two of which are dominated by irregular memory traffic. Several optimisation schemes are studied together with associated lower and upper bounds on the estimated memory traffic volume. Apart from properly reordering the mesh entities, the two most significant optimisations include adopting a lookup table in adding element matrices or vectors to their global counterparts, and using a row-wise assembly algorithm for multi-threaded parallelisation. Rigorous benchmarking shows that, due to the various optimisations, the actual volumes of memory traffic are in many cases very close to the estimated lower bounds. These results confirm the effectiveness of the optimisations, while also providing a recipe for developing efficient software for finite element assembly.
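The lookup-table optimisation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a toy example under stated assumptions: a hypothetical two-triangle mesh, a placeholder element matrix, a global matrix stored in CSR form, and illustrative function names. It compares scattering element matrices by searching each CSR row for the target column against scattering through a table of precomputed positions in the CSR value array.

```python
from bisect import bisect_left

# Hypothetical toy mesh (an assumption for illustration): two triangles
# sharing the edge between vertices 1 and 2.
cells = [(0, 1, 2), (1, 3, 2)]
num_vertices = 4


def build_csr_pattern(cells, num_rows):
    """Return (row_ptr, col_idx) of the vertex-to-vertex coupling pattern."""
    cols = [set() for _ in range(num_rows)]
    for cell in cells:
        for i in cell:
            cols[i].update(cell)
    row_ptr, col_idx = [0], []
    for row in cols:
        col_idx.extend(sorted(row))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def csr_position(row_ptr, col_idx, i, j):
    """Find entry (i, j) in the CSR value array by binary search in row i."""
    start, end = row_ptr[i], row_ptr[i + 1]
    pos = bisect_left(col_idx, j, start, end)
    assert pos < end and col_idx[pos] == j, "entry not in sparsity pattern"
    return pos


def element_matrix(cell):
    """Placeholder element matrix (assumption): 2 on the diagonal, 1 elsewhere."""
    n = len(cell)
    return [[2.0 if a == b else 1.0 for b in range(n)] for a in range(n)]


def assemble_with_search(cells, row_ptr, col_idx):
    """Cellwise assembly that searches the CSR row for every added entry."""
    values = [0.0] * len(col_idx)
    for cell in cells:
        Ae = element_matrix(cell)
        for a, i in enumerate(cell):
            for b, j in enumerate(cell):
                values[csr_position(row_ptr, col_idx, i, j)] += Ae[a][b]
    return values


def build_lookup_table(cells, row_ptr, col_idx):
    """Precompute, per cell, the CSR position of each local (row, column) pair."""
    return [
        [csr_position(row_ptr, col_idx, i, j) for i in cell for j in cell]
        for cell in cells
    ]


def assemble_with_lookup(cells, lookup, nnz):
    """Cellwise assembly that scatters through the precomputed positions."""
    values = [0.0] * nnz
    for cell, positions in zip(cells, lookup):
        Ae = element_matrix(cell)
        n = len(cell)
        for k, pos in enumerate(positions):
            values[pos] += Ae[k // n][k % n]
    return values


row_ptr, col_idx = build_csr_pattern(cells, num_vertices)
lookup = build_lookup_table(cells, row_ptr, col_idx)
values_search = assemble_with_search(cells, row_ptr, col_idx)
values_lookup = assemble_with_lookup(cells, lookup, len(col_idx))
```

In this sketch the table is built once and can be reused for every assembly on the same mesh, so the per-entry search disappears from the assembly loop. The price is one stored index per element matrix entry, which itself has to be read from memory during assembly.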

Citation Key: 28251
