GPU Performance Optimization: Perfect Matchings and the Wasserstein distance
The Wasserstein distance is a key metric in machine learning, but it is hard to compute optimally on GPUs. By using approximation algorithms, we can unlock the full power of the GPU.
Computing the Wasserstein distance requires finding a minimum weight perfect matching using methods such as the Hungarian algorithm which is hard to implement efficiently on GPUs. The Heavy Weight Perfect Matching algorithm is much faster and computes solutions close to the optimum, but it has never been implemented on the GPU.
Goal
The goal of this thesis is to implement the Heavy Weight Perfect Matching algorithm on a GPU using CUDA, to optimise its performance for dense instances, and to test it against existing implementation of the Hungarian algorithm.
Learning outcome
- Enhanced CUDA programming skills.
- Enhanced understanding of graph algorithms.
Qualifications
- Good understanding of graph algorithms, CUDA/OpenMP programming.
Supervisors
- Johannes Langguth
References
- A distributed-memory algorithm for computing a heavy-weight perfect matching on bipartite graphs. (https://arxiv.org/pdf/1801.09809)