Building intelligent systems that can extract meaningful representations from high-dimensional data lies at the core of many artificial intelligence tasks, including visual object recognition, information retrieval, speech perception, and language understanding. We'll first introduce a broad class of deep learning models and show that they can learn useful hierarchical representations from large volumes of high-dimensional data, with applications in information retrieval, object recognition, and speech perception. We'll next introduce deep models that can extract a unified representation fusing multiple data modalities. In particular, we'll introduce models that can generate natural language descriptions (captions) of images, as well as models that generate images from captions using an attention mechanism. Finally, we'll discuss an approach for unsupervised learning of a generic, distributed sentence encoder, and introduce multiplicative and fine-grained gating mechanisms with applications to question answering and reading comprehension.