To increase performance, high-performance computing (HPC) systems are adopting a heterogeneous approach through the use of accelerators such as GPUs, which deliver their gains through massive parallelism. Unfortunately, HPC systems, with or without accelerators, are hitting a wall: a growing divergence between compute capability and bandwidth. As core counts have increased while bandwidth at all levels of the system has stagnated, data movement has become the performance bottleneck at multiple points between subsystems: storage, network, accelerator, and memory. To address these bandwidth issues in heterogeneous systems, we developed cuZFP, a lossy fixed-rate compression algorithm for the GPU. The ZFP compressor specifically addresses the needs of lossy compression for floating-point data such as those used in high-performance scientific codes. By extending lossy compression to the GPU, compression is up to an order of magnitude faster than the CPU version. Further, bandwidth limitations can be eased directly on the accelerator without copying the data back to the CPU.