How to Solve BFS on Massive Graphs with GPU Architectures: The Case of the Graph500 Benchmark

Our work presents results ranging from CPUs to Tesla Kepler GPUs and then to the new P100 GPU provided in the DGX-1. This session will present algorithms for single- and multi-GPU BFS on large graphs, with results on up to 256 GPUs of the French cluster ROMEO at the University of Reims. We will discuss ways to solve this kind of highly irregular problem and detail the algorithms. We will show that even though the algorithms do not fit the GPU architecture well, the real limitation remains the communication between nodes. Nevertheless, using the InfiniBand QDR interconnect with a GPU-aware MPI and GPUDirect allowed us to obtain very interesting results. We will also show the performance we obtain by applying the new NVIDIA DGX-1 to these kinds of problems.
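As background on the kernel being accelerated, here is a minimal sequential sketch of level-synchronous (frontier-based) top-down BFS. It is a generic textbook formulation, not the algorithm presented in the session. It assumes the graph is stored in CSR form, with the array names `row_ptr` and `col_idx` and the function name `bfs_parents` chosen here for illustration. Like the Graph500 BFS kernel, it returns a parent array rather than distances. Each iteration expands the current frontier into the next one. On a GPU, that expansion is what gets parallelized. In a multi-GPU setting, the exchange of newly discovered vertices between levels is typically where inter-node communication occurs.

```python
from typing import List


def bfs_parents(row_ptr: List[int], col_idx: List[int], root: int) -> List[int]:
    """Level-synchronous top-down BFS over a CSR graph (illustrative sketch).

    Returns a parent array: parent[v] is the vertex from which v was
    discovered, parent[root] == root, and -1 marks unreachable vertices.
    """
    n = len(row_ptr) - 1
    parent = [-1] * n
    parent[root] = root
    frontier = [root]
    while frontier:
        next_frontier = []
        # Expand every vertex of the current level; a GPU version would
        # spread this loop (and the inner edge loop) across threads.
        for u in frontier:
            for e in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[e]
                if parent[v] == -1:  # first visit claims the vertex
                    parent[v] = u
                    next_frontier.append(v)
        # Swapping frontiers acts as the barrier between BFS levels.
        frontier = next_frontier
    return parent


# Undirected graph with edges 0-1, 0-2, 1-3, 2-3 and an isolated vertex 4.
row_ptr = [0, 2, 4, 6, 8, 8]
col_idx = [1, 2, 0, 3, 0, 3, 1, 2]
print(bfs_parents(row_ptr, col_idx, 0))  # → [0, 0, 0, 1, -1]
```

In a parallel version, several threads may try to claim the same vertex in the `parent[v] == -1` check. Any valid BFS tree is an acceptable result, so implementations typically resolve this with an atomic compare-and-swap or tolerate the benign race. The irregular, data-dependent memory accesses in the inner loop are what make this kernel a poor fit for GPU architectures.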