The NVIDIA Tegra K1 and X1 have revolutionized embedded computing. Combining ARM cores with a powerful GPU, these devices have found their way into everything from cars to low-power sensor systems. The high computational efficiency of Tegra SoCs opens potential new markets that have long been held by FPGAs. However, some applications do not map well onto the typical CUDA execution model. Persistent threading (PT) is a relatively unexplored model for GPU computing that enables FPGA-like behavior. Like an FPGA design, a PT kernel executes until the device is reset or a rare halt condition is met. Because the PT kernel runs in parallel with the host application, memory management and application synchronization shift from the NVIDIA API to the developer. By leveraging the Tegra unified memory model, PT reduces API overhead to the launch of the kernel and the scheduler workload.
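The host/kernel arrangement described above can be illustrated with a minimal CUDA sketch. It is not the authors' implementation: the `Mailbox` struct, the command codes, and `runPersistentDemo` are hypothetical names chosen for illustration, and the shared region is assumed to be zero-copy (mapped, pinned) memory, since on GPUs of this generation the host cannot access managed memory while a kernel is running. A single GPU thread polls the mailbox, doubles each input as stand-in work, and exits only on an explicit halt command.

```cuda
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical command codes and mailbox layout (illustrative only).
enum { CMD_IDLE = 0, CMD_WORK = 1, CMD_HALT = 2 };

struct Mailbox {
    volatile int command;  // written by host, cleared by the kernel
    volatile int done;     // set by the kernel when a work item finishes
    volatile int input;
    volatile int output;
};

// Persistent kernel: launched once, loops until the host requests a halt.
__global__ void persistentKernel(Mailbox *mb) {
    while (true) {
        int cmd = mb->command;
        if (cmd == CMD_HALT) break;
        if (cmd == CMD_WORK) {
            mb->output = mb->input * 2;  // stand-in for real work
            mb->command = CMD_IDLE;
            __threadfence_system();      // publish results before signaling
            mb->done = 1;
        }
    }
}

// Feeds n inputs to the running kernel; returns 0 on success, -1 on error.
int runPersistentDemo(const int *inputs, int *outputs, int n) {
    Mailbox *hostMb = nullptr, *devMb = nullptr;
    cudaSetDeviceFlags(cudaDeviceMapHost);
    if (cudaHostAlloc((void **)&hostMb, sizeof(Mailbox),
                      cudaHostAllocMapped) != cudaSuccess) return -1;
    hostMb->command = CMD_IDLE;
    hostMb->done = 0;
    if (cudaHostGetDevicePointer((void **)&devMb, (void *)hostMb, 0)
            != cudaSuccess) { cudaFreeHost(hostMb); return -1; }

    persistentKernel<<<1, 1>>>(devMb);   // the only launch in the program

    for (int i = 0; i < n; ++i) {
        hostMb->input = inputs[i];
        hostMb->done = 0;
        std::atomic_thread_fence(std::memory_order_seq_cst);
        hostMb->command = CMD_WORK;      // hand the work item to the GPU
        while (!hostMb->done) { }        // developer-managed synchronization
        outputs[i] = hostMb->output;
    }

    hostMb->command = CMD_HALT;          // the rare halt condition
    cudaError_t err = cudaDeviceSynchronize();
    cudaFreeHost(hostMb);
    return err == cudaSuccess ? 0 : -1;
}

int main() {
    int in[3] = {1, 2, 3}, out[3] = {0, 0, 0};
    if (runPersistentDemo(in, out, 3) != 0) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < 3; ++i) printf("%d -> %d\n", in[i], out[i]);
    return 0;
}
```

After the single launch, every exchange goes through plain loads and stores on shared memory, with no further runtime API calls until shutdown. A real PT design would use many threads and a work scheduler. Note also that a kernel spinning like this may be terminated by a display watchdog on systems where the GPU also drives a screen.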