We'll introduce neural machine translation (NMT) and practical methods for making full use of GPUs during both the training and testing phases of NMT systems. With the rapid development of deep learning, neural network-based machine translation, that is, NMT, has become one of the most popular approaches in the machine translation community and has achieved state-of-the-art performance on many language pairs. We'll briefly introduce the typical sequence-to-sequence architecture used in NMT, along with the attention mechanism that allows the model to attend to specific source words during translation. We'll then compare two parallel frameworks for training NMT on multiple GPUs, asynchronous and synchronous training, and show that synchronous training performs better and achieves a good speedup. Finally, we'll describe a number of techniques that substantially speed up decoding in large-scale NMT systems.
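To make the attention mechanism concrete, here is a minimal sketch of one common variant, dot-product attention, in pure Python. At each decoding step, the decoder's query vector is scored against the encoder's key vectors, the scores are normalized with a softmax, and the resulting weights form a context vector as a weighted average of the value vectors. This is an illustrative toy, not the exact formulation used in any particular NMT system; all function names and the tiny vectors below are made up for the example.

```python
import math

def attention_weights(query, keys):
    """Softmax over scaled dot-product scores between one query and all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Context vector: attention-weighted average of the value vectors."""
    w = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)]

# Toy example: the query is most similar to the second key,
# so the second value dominates the context vector.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 1.0]
weights = attention_weights(query, keys)
context = attend(query, keys, values)
```

In a full sequence-to-sequence model the keys and values come from the encoder's hidden states and the query from the decoder state, but the weighting computation is the same.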
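The difference between the two parallel training frameworks can be sketched with a toy synchronous step: each simulated device computes a gradient on its own data shard, the gradients are averaged, and a single update is applied to the shared parameters (in asynchronous training, by contrast, each device would apply its possibly stale gradient immediately). The 1-D linear model and all names below are hypothetical, chosen only to illustrate the averaging step.

```python
def grad(w, batch):
    """Gradient of mean squared error for the toy model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def sync_step(w, shards, lr=0.1):
    """One synchronous data-parallel step: average per-shard gradients, update once."""
    g = sum(grad(w, shard) for shard in shards) / len(shards)
    return w - lr * g

# Data generated from y = 3x, split across two simulated devices.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = sync_step(w, shards)
# w converges toward the true weight 3.0
```

Because every device contributes to the same averaged gradient before the parameters move, synchronous training behaves like large-batch SGD, which is one intuition for why it can outperform asynchronous updates with stale gradients.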