We'll demonstrate recent advances in deep learning and computer vision for scene understanding from images, presenting two research works. The first applies deep learning to monocular simultaneous localization and mapping (SLAM) and semantic segmentation, yielding a technique that performs accurate real-time semantic mapping and 3D reconstruction from a single RGB camera. Because in many computer vision problems a single prediction cannot express the uncertainty or ambiguity present in a scene, the second work uses deep learning to solve ambiguous prediction problems. Finally, we'll demonstrate how the two approaches can be combined to enable robust real-time extraction of 3D semantic information, such as pixel-wise labeling and object detection, using a simple webcam.