Learn about the latest developments in the MVAPICH2-GDR library, which helps MPI developers achieve maximum performance and scalability on HPC clusters with NVIDIA GPUs. We'll highlight multiple designs that boost the performance of HPC applications, focusing on GPUDirect RDMA (GDR) Async, non-blocking collectives, support for unified memory, and datatype processing. Furthermore, targeting emerging deep learning (DL) frameworks, we'll present novel designs and enhancements to the MVAPICH2-GDR library that accommodate the large-message and dense-GPU computing requirements of these frameworks. Using a co-designed scheme between MVAPICH2-GDR and the Caffe workflow, we'll present OSU-Caffe, which provides an MPI-based, distributed, and scalable DL framework. We'll also present performance and scalability results for OSU-Caffe across various system configurations and datasets.