The Kokkos library provides C++ HPC applications with a performance-portable programming model for disparate manycore architectures such as NVIDIA Pascal, AMD Fusion, and Intel Xeon Phi. Until last year, Kokkos supported only the composition of data-parallel patterns (for-each, reduce, and scan) with range and hierarchical team parallel execution policies. Our latest parallel pattern is a dynamic, directed acyclic graph (DAG) of heterogeneous tasks, where each task supports internal data parallelism. At GTC16 we presented preliminary results based on just-in-time access to an early release of NVIDIA CUDA 8. Since then we have had a year to mature this highly challenging task-DAG capability, and we now present results on the NVIDIA Pascal GPU.