Benchmarking Collective Communication on GPU-based Supercomputers
Communication among GPUs is often the most performance-critical part of scientific computing and AI workloads on today's supercomputers. In this thesis, the student will tune, analyze, and compare collective communication algorithms from state-of-the-art GPU communication libraries.
Goal
- Survey the literature on collective communication algorithms for GPUs, and review implementations provided by libraries such as HPC-X (MPI), NCCL and NVSHMEM.
- Measure, analyze, and tune the performance of collective operations on GPU systems (see the benchmark sketch after this list).
- Use these findings to accelerate real-world applications and workloads in scientific computing and machine learning/AI.
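To make the measurement task concrete, a micro-benchmark for one collective might look like the following minimal sketch: a single-process NCCL all-reduce timed over all visible GPUs, loosely in the spirit of NVIDIA's nccl-tests suite. The message size, iteration count, and bus-bandwidth formula (the nccl-tests convention for all-reduce) are illustrative assumptions, and error checking is omitted for brevity.

```cuda
// Minimal sketch: time an in-place NCCL all-reduce across all visible GPUs
// from a single process. Illustrative only; error checking omitted.
// Build (paths and flags may vary per system):
//   nvcc allreduce_bench.cu -lnccl -o allreduce_bench
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
  int ndev = 0;
  cudaGetDeviceCount(&ndev);

  const size_t count = 1 << 24;                 // 16 Mi floats = 64 MiB per GPU
  const size_t bytes = count * sizeof(float);

  std::vector<ncclComm_t> comms(ndev);
  std::vector<float*> bufs(ndev);
  std::vector<cudaStream_t> streams(ndev);

  // One NCCL rank per visible GPU, all within this process.
  ncclCommInitAll(comms.data(), ndev, nullptr);
  for (int i = 0; i < ndev; ++i) {
    cudaSetDevice(i);
    cudaMalloc(&bufs[i], bytes);
    cudaStreamCreate(&streams[i]);
  }

  // When one thread drives several communicators, the per-rank calls must be
  // fused into one logical collective with ncclGroupStart/ncclGroupEnd.
  auto allreduce_once = [&]() {
    ncclGroupStart();
    for (int i = 0; i < ndev; ++i)
      ncclAllReduce(bufs[i], bufs[i], count, ncclFloat, ncclSum,
                    comms[i], streams[i]);
    ncclGroupEnd();
  };
  auto sync_all = [&]() {
    for (int i = 0; i < ndev; ++i) {
      cudaSetDevice(i);
      cudaStreamSynchronize(streams[i]);
    }
  };

  allreduce_once();                             // warm-up iteration
  sync_all();

  const int iters = 20;
  auto t0 = std::chrono::steady_clock::now();
  for (int it = 0; it < iters; ++it)
    allreduce_once();
  sync_all();
  auto t1 = std::chrono::steady_clock::now();

  double sec = std::chrono::duration<double>(t1 - t0).count() / iters;
  double alg_bw = bytes / sec / 1e9;            // GB/s per rank
  // Bus bandwidth as reported by nccl-tests: algBw * 2*(n-1)/n for all-reduce.
  double bus_bw = alg_bw * 2.0 * (ndev - 1) / ndev;
  printf("%d GPUs, %zu MiB: %.3f ms/iter, algBw %.2f GB/s, busBw %.2f GB/s\n",
         ndev, bytes >> 20, sec * 1e3, alg_bw, bus_bw);

  for (int i = 0; i < ndev; ++i) {
    ncclCommDestroy(comms[i]);
    cudaSetDevice(i);
    cudaFree(bufs[i]);
    cudaStreamDestroy(streams[i]);
  }
  return 0;
}
```

In practice, such measurements would be swept over message sizes, GPU counts, and library back-ends (e.g., MPI and NVSHMEM equivalents of the same collective), and compared against the bandwidth expected from the underlying ring or tree algorithm.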
Learning outcome
- Enhanced GPU programming skills with CUDA.
- Understanding of collective communication algorithms and their use in machine learning and scientific computing applications.
- Experience with high-performance GPU computing on supercomputers.
Qualifications
- Experience with C/C++ programming.
- Familiarity with GPU programming is very helpful.
Supervisors
- Johannes Langguth
- Xing Cai
- James D Trotter