We'll describe a method for converting FP32 models to 8-bit integer (INT8) models for more efficient inference. Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32), and by default inference on these models uses FP32 as well. Our conversion method does not require retraining or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8 with Top-1 and Top-5 inference accuracy comparable to that of the original FP32 models. The method is implemented in TensorRT and can be executed on GPUs that support the new INT8 inference instructions.
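
The abstract above does not spell out the FP32-to-INT8 mapping itself. As a rough illustration only, the sketch below assumes symmetric linear quantization with a single per-tensor scale, which is one common choice for post-training conversion and may differ from what TensorRT actually does. The names `quantize_symmetric`, `dequantize`, and the `threshold` parameter are hypothetical and are not TensorRT APIs.

```python
def quantize_symmetric(values, threshold=None):
    """Map FP32 values to INT8 codes in [-127, 127] using one scale.

    Assumption: `threshold` is the saturation point. When omitted, the
    largest absolute value in `values` is used, so nothing is clipped.
    """
    if threshold is None:
        threshold = max((abs(v) for v in values), default=0.0)
    if threshold == 0:
        # All-zero (or empty) input: any scale works, so pick 1.0.
        return [0] * len(values), 1.0
    scale = threshold / 127.0
    # Round to the nearest code, then saturate anything beyond the threshold.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27]
    codes, scale = quantize_symmetric(weights)
    print(codes)                     # [50, -127, 0, 127]
    print(dequantize(codes, scale))  # close to the original weights
```

With no clipping, each reconstructed value differs from its original by at most half a quantization step (`scale / 2`). Passing a `threshold` smaller than the largest magnitude trades saturation of outliers for finer resolution of the remaining values; how such a threshold would be chosen without retraining is not covered by this sketch.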