Apply the DGX-2 computing powerhouse to unstructured mesh computations
The DGX-2 is currently the most powerful single-box commodity computer. It contains 16 latest-generation NVIDIA V100 GPUs, as well as two high-end multicore Intel Xeon processors. The GPUs are tightly connected by NVLink, which gives the system very high aggregate computing power. While computations associated with deep learning and structured meshes are known to run efficiently on GPUs, the suitability of the DGX-2 remains unclear for the much more challenging unstructured mesh computations, because these bring extra difficulties in work partitioning and in preserving data locality.
This master's project will rigorously test unstructured mesh computations on the DGX-2 box inside eX3, the national infrastructure for experimental exploration of exascale computing. The research topics include mesh partitioning, data re-ordering, and overlapping communication with computation, all with the common goal of achieving good performance for unstructured mesh computations. Existing CUDA software libraries will be examined to identify where further adaptation and extension are needed. Performance modeling and profiling will also be used to ensure a good understanding of the achieved performance. Last but not least, the findings will be applied to a real-world program.
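To give a flavour of what data re-ordering means in this context, the following is a minimal Python sketch, not the method prescribed by the project. It renumbers the nodes of a tiny, hypothetical mesh graph in breadth-first order (in the spirit of the Cuthill-McKee algorithm) so that neighbouring nodes receive nearby indices, and it measures the effect by the largest index distance across any edge. The function names, the example edge list, and the choice of this particular metric are all assumptions made for illustration.

```python
from collections import deque


def build_adjacency(num_nodes, edges):
    """Build an undirected adjacency list from a list of (i, j) mesh edges."""
    adj = [[] for _ in range(num_nodes)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def bandwidth(adj, new_id=None):
    """Largest |new_id[i] - new_id[j]| over all edges (i, j).

    A small value means that neighbouring nodes are stored close together.
    """
    n = len(adj)
    if new_id is None:
        new_id = list(range(n))  # identity numbering
    return max(
        (abs(new_id[i] - new_id[j]) for i in range(n) for j in adj[i]),
        default=0,
    )


def bfs_reorder(adj):
    """Return new_id, where new_id[old] is the BFS position of node `old`.

    Each connected component is started from its lowest-degree unvisited
    node, and neighbours are visited in order of increasing degree.
    """
    n = len(adj)
    new_id = [-1] * n
    next_id = 0
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if new_id[start] != -1:
            continue  # already numbered as part of an earlier component
        new_id[start] = next_id
        next_id += 1
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if new_id[w] == -1:
                    new_id[w] = next_id
                    next_id += 1
                    queue.append(w)
    return new_id


if __name__ == "__main__":
    # Hypothetical example: a chain of 8 nodes with a scrambled numbering.
    edges = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3), (3, 7)]
    adj = build_adjacency(8, edges)
    new_id = bfs_reorder(adj)
    print("bandwidth before:", bandwidth(adj))          # 4
    print("bandwidth after: ", bandwidth(adj, new_id))  # 1
```

On a real unstructured mesh the same idea would be applied to the node or cell connectivity before the data is copied to the GPUs. Whether this ordering or a different one works best on the DGX-2 is the kind of question the project is meant to investigate.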
The candidate will become an expert in CUDA programming and the related performance optimisation techniques. The candidate will also gain substantial knowledge about parallel programming in general. Such knowledge and skills are considered an important part of the expertise needed by the future workforce in scientific and technical computing.
The candidate must either have experience in parallel programming (not necessarily CUDA programming) or have adequate knowledge of numerical methods. Very important: the candidate must be hard-working and eager to learn and explore new skills and knowledge.
- Xing Cai
- Johannes Langguth