Convolutional neural networks have achieved impressive success in many computer vision tasks. However, they come at a high memory and computational cost, which makes it difficult for deep learning to be commercially viable. In addition, selecting the architecture is still a manual engineering process. We'll introduce DecomposeMe, an efficient architecture based on filter compositions. This architecture can be trained quickly and is capable of real-time operation on embedded platforms (250+ fps on an NVIDIA Jetson TX1). We'll also introduce our approach to automatically determining the number of neurons in the architecture during training. Finally, we'll introduce a novel approach to quantizing the network parameters.
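To give a rough sense of why composing filters can reduce cost, the sketch below compares weight counts for a standard k x k convolution and for a k x 1 convolution followed by a 1 x k convolution. It assumes that "filter compositions" means stacks of 1D kernels, which the abstract does not spell out. The function names, channel counts, and kernel size are illustrative choices, not values from the talk.

```python
def conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def composed_1d_params(c_in, c_mid, c_out, k):
    """Weights in a k x 1 convolution followed by a 1 x k convolution (bias omitted).

    c_mid is the number of channels between the two 1D convolutions
    (a hypothetical design parameter for this sketch).
    """
    return c_in * c_mid * k + c_mid * c_out * k


if __name__ == "__main__":
    # Illustrative layer: 64 input channels, 64 output channels, 3 x 3 kernel.
    full = conv2d_params(64, 64, 3)
    composed = composed_1d_params(64, 64, 64, 3)
    print(f"2D kernel weights:       {full}")
    print(f"composed 1D weights:     {composed}")
    print(f"ratio (composed / full): {composed / full:.2f}")
```

Under these assumptions the composed layer needs about two thirds of the weights at k = 3, and the saving grows with kernel size because the cost scales with 2k rather than k squared.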