|Authors||J. D. Trotter, X. Cai, S. W. Funke and J. Langguth|
|Title||Compiling finite element variational forms for GPU-based assembly|
|Project(s)||Meeting Exascale Computing with Source-to-Source Compilers, Department of High Performance Computing, Department of Numerical Analysis and Scientific Computing|
|Publication Type||Talks, contributed|
|Year of Publication||2019|
|Location of Talk||FEniCS’19, Washington DC, USA|
|Keywords||Code translation, GPU, HPC|
We present an experimental form compiler for exploring GPU-based algorithms for assembling vectors, matrices, and higher-order tensors from finite element variational forms.
Previous studies by Cecka et al. (2010), Markall et al. (2013), and Reguly and Giles (2015) have explored different strategies for using GPUs for finite element assembly, demonstrating the potential rewards and highlighting some of the difficulties in offloading assembly to a GPU. Even though these studies are limited to a few specific cases, mostly related to the Poisson problem, they already indicate that the assembly strategy needed to achieve high performance depends on the problem at hand and the chosen discretisation.
By using a form compiler to automatically generate code for GPU-based assembly, we can explore a range of problems based on different variational forms and finite element discretisations. In this way, we aim to get a better picture of the potential benefits and challenges of assembling finite element variational forms on a GPU. Ultimately, the goal is to explore algorithms based on automated code generation that offload entire finite element methods to a GPU, including assembly of vectors and matrices and solution of linear systems.
In this talk, we give an exact characterisation of the class of finite element variational forms supported by our compiler, comprising a small subset of the Unified Form Language that is used by FEniCS and Firedrake. Furthermore, we describe a denotational semantics that explains how expressions in the form language are translated to low-level C or CUDA code for performing assembly over a computational mesh. We also present some initial results and discuss the performance of the generated code.
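To make the notion of "assembly over a computational mesh" concrete, the following is a minimal sketch of the kind of cell loop that a form compiler might emit for the Poisson bilinear form a(u, v) = ∫ u'v' dx. It is an illustration only, not output from the compiler described in this talk: the function name `assemble_poisson_1d`, the uniform 1D mesh on [0, 1], the piecewise linear elements, and the dense nested-list matrix are all assumptions chosen to keep the example self-contained.

```python
def assemble_poisson_1d(num_cells):
    """Assemble the P1 stiffness matrix for a(u, v) = integral of u'v' on [0, 1].

    Hypothetical illustration: uniform mesh, linear elements, dense storage.
    """
    num_nodes = num_cells + 1
    h = 1.0 / num_cells  # cell length on the uniform mesh

    # Dense global matrix as nested lists, to avoid external dependencies.
    A = [[0.0] * num_nodes for _ in range(num_nodes)]

    # Element matrix for the two linear basis functions on a cell of length h.
    A_local = [[1.0 / h, -1.0 / h],
               [-1.0 / h, 1.0 / h]]

    # Loop over cells: on a GPU, this is the loop that would be parallelised.
    for cell in range(num_cells):
        dofs = (cell, cell + 1)  # local-to-global degree-of-freedom map
        for i in range(2):
            for j in range(2):
                # Scatter-add of local contributions into the global matrix.
                # Neighbouring cells write to shared entries, so a parallel
                # version needs atomics, colouring, or a similar strategy.
                A[dofs[i]][dofs[j]] += A_local[i][j]
    return A
```

The scatter-add in the innermost loop is where neighbouring cells update the same global entries, which is one reason why the choice of assembly strategy matters once the cell loop runs in parallel.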