Authors | M. Sourouri, T. Gillberg, S. Baden and X. Cai |
Title | Effective Multi-GPU Communication Using Multiple CUDA Streams and Threads |
Affiliation | Scientific Computing, Scientific Computing, , |
Project(s) | Center for Biomedical Computing (SFF) |
Status | Published |
Publication Type | Proceedings, refereed |
Year of Publication | 2014 |
Conference Name | 20th International Conference on Parallel and Distributed Systems (ICPADS 2014) |
Pagination | 981-986 |
Publisher | IEEE |
Abstract | In the context of multiple GPUs that share the same PCIe bus, we propose a new communication scheme that leads to a more effective overlap of communication and computation. Multiple CUDA streams and OpenMP threads are adopted so that data can simultaneously be sent and received. A representative 3D stencil example is used to demonstrate the effectiveness of our scheme. We compare the performance of our new scheme with an MPI-based state-of-the-art scheme. Results show that our approach outperforms the state-of-the-art scheme, being up to 1.85× faster. However, our performance results also indicate that the current underlying PCIe bus architecture needs improvements to handle the future scenario of many GPUs per node. |
DOI | 10.1109/PADSW.2014.7097919 |
Citation Key | 19138 |