Effective Multi-GPU Communication Using Multiple CUDA Streams and Threads