Recurrent Neural Networks (RNNs) are a powerful tool for solving sequence-based problems, but their execution time depends on the size of the network. Baidu introduced persistent RNNs to address this issue by caching the network parameters on chip, minimizing the bandwidth spent re-loading them, but the size of on-chip storage imposes a strict upper limit on the network size. Model pruning can significantly reduce the number of RNN parameters, making the network sparse. We design an efficient method for accelerating sparse RNNs that includes several optimizations: Lamport barriers, wide memory loads, and a bank-aware weight layout. With these optimizations, on GP100, we achieve 1) ~4.5 TFLOP/s for a hidden layer of size 1792, a batch size of 4, and a density of 10%; and 2) 18 TFLOP/s (a 36× speedup over the cuDNN RNN) with 45 SMs for a hidden layer of size 5760, a batch size of 2, and a density of 1%.