We'll evaluate the impact of CUDA 8's new unified memory on applications using benchmarks, and share best practices for tuning or building high-performance apps. Since CUDA 6, unified memory has aimed to simplify heterogeneous memory management while maintaining good performance. However, practical limitations have prevented applications from taking full advantage of it. The CUDA 8 release highlights an updated unified memory that both simplifies programming and improves performance, especially when paired with the new Pascal GPU architecture. Our benchmarks of the new system, and the tuning practices we derived from them, may serve as a useful reference for app developers. In addition, we'll explore options and solutions for moving and exchanging data efficiently between heterogeneous devices, such as NVMe/NVRAM, in modern data center or cloud environments.