Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support communication calls that operate directly on GPU memory. We'll discuss simple extensions to the OpenSHMEM model to address this issue. We'll also present challenges and solutions in designing NVIDIA CUDA-aware runtimes that support these extensions and optimize data movement using the CUDA IPC and GPUDirect RDMA features. And we'll demonstrate the impact of these concepts on application performance.