Learn about recent developments in middleware design that boost the performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various NVIDIA CUDA features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communication. We'll focus on the challenges of combining GPUDirect RDMA (GDR) and InfiniBand hardware multicast, and of fully utilizing both in tandem, to design a high-performance heterogeneous broadcast operation for streaming applications. We'll also present the challenges and designs associated with supporting reliability on clusters with multi-HCA and multi-GPU configurations. Finally, we'll present and analyze a performance evaluation of the proposed designs on various system configurations.