Learn how Microsoft is extending WebRTC to enable real-time, interactive 3D streaming from the cloud to any remote device. The purpose is to provide an open toolkit that enables industries to leverage remote cloud rendering in their service and product pipelines. This is required in many industries where the scale and complexity of 3D models, scenes, physics, and rendering is beyond the capabilities of a mobile device platform. We are extending the industry-standard WebRTC framework to 3D scenarios, including mixed reality, and will walk through the work we are doing to realize the goal of delivering high-quality 3D applications to any client - web, mobile, desktop, and embedded. This is only possible using the NVIDIA nvencode pipeline for server-side rendering in the cloud. 25-minute Talk Tyler Gibson - Senior Software Engineer, Microsoft
3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation; in other words, the positional accuracy of deep learning object detection is inherently poor. Combining deep learning and 3D model fitting, our 3D DeepObject has the best of both worlds: deep learning can detect an object (a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The bounding boxes output by deep learning are the input for 3D model fitting, and a bounding box can significantly reduce the search space for 3D model fitting. Our latest test indicates that 3D DeepObject can achieve much higher positional accuracy than either deep learning or 3D model fitting alone. 25-minute Talk Bingcai Zhang - Tech Fellow, BAE Systems
Improvements in 3D printing allow for unique processes, finer details, better quality control, and a wider range of materials as printing hardware improves. With these improvements comes the need for greater computational power and control over 3D-printed objects. We introduce NVIDIA GVDB Voxels as an open source SDK for voxel-based 3D printing workflows. Traditional workflows are based on processing polygonal models and STL files for 3D printing. However, such models don't allow for continuous interior changes in color or density, for descriptions of heterogeneous materials, or for user-specified support lattices. Using the new NVIDIA GVDB Voxels SDK, we demonstrate practical examples of design workflows for complex 3D printed parts with high-quality ray-traced visualizations, direct data manipulation, and 3D printed output. 25-minute Talk Rama Hoetzlein - Graphics Research Engineer, NVIDIA
Learn how to build a platform for processing and streaming 4K video on the NVIDIA Jetson TX1 processor. To achieve real-time video processing, the diverse processing resources of this high-performance embedded architecture need to be employed optimally. The heterogeneous system architecture of the Jetson TX1 allows capturing, processing, and streaming video with a single chip. The main challenges lie in the optimal utilization of the Jetson TX1's different hardware resources (CPU, GPU, dedicated hardware blocks) and in the software frameworks. We'll discuss variants, identify bottlenecks, and show the interaction between hardware and software. Simply capturing and displaying 4K video can be achieved using existing out-of-the-box methods; however, GPU-based enhancements were developed and integrated for real-time video processing tasks (scaling and video mixing). 25-minute Talk Tobias Kammacher - Researcher, Zurich University of Applied Sciences
We'll describe a method for converting FP32 models to 8-bit integer (INT8) models for improved efficiency. Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32) and, by default, inference on these models employs FP32 as well. Our conversion method doesn't require re-training or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8 and have achieved comparable Top 1 and Top 5 inference accuracy. The methods are implemented in TensorRT and can be executed on GPUs that support new INT8 inference instructions. 25-minute Talk Szymon Migacz - CUDA Library Software Engineer, NVIDIA
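The core idea behind such a conversion can be illustrated with symmetric linear quantization, where a per-tensor scale maps FP32 values onto the INT8 range. This is a minimal pure-Python sketch of the general technique, not the TensorRT implementation (which also calibrates activation ranges); the function names are illustrative.

```python
def quantize_int8(values):
    """Symmetric linear quantization of FP32 values to INT8.

    The scale maps the largest magnitude onto 127, so no re-training
    of the original network is needed -- only this post-hoc mapping.
    """
    scale = max(abs(v) for v in values) / 127.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover an FP32 approximation from the INT8 representation."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# round-trip error is bounded by half a quantization step
worst = max(abs(w - a) for w, a in zip(weights, approx))
```

The comparable Top 1/Top 5 accuracy reported above rests on the same observation this sketch makes concrete: the quantization error per weight is bounded by half a step of the scale.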
In this lab, you will learn how to use a GPU-accelerated graph visualization engine in combination with a GPU-accelerated database. By combining these technologies, we can visually explore a large network dataset and identify port scan, distributed denial-of-service, and data exfiltration events. By the end of this lab, you will know how to load data for accelerated querying and analysis, build graph visualizations using the GPU-accelerated database as a data source, and explore large-scale data visualization. Prerequisites: No prerequisite skills are necessary, but basic knowledge of SQL and Python would be helpful. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120-minute Instructor-Led Lab Keith Kraus - Senior Engineer of Applied Solutions Engineering, NVIDIA
Companies of all sizes and in all industries are driven toward digital transformation. Failure to adapt places businesses at increased risk in current and future competitive markets. Limited by slow compute, enterprises struggle to gain valuable insights quickly, monetize their data, enhance customer experience, optimize operational efficiency, and prevent fraudulent attacks all at the same time. NVIDIA helps provide deeper insights, enable dynamic correlation, and deliver predictive outcomes at superhuman speed, accuracy, and scale. We'll highlight specific accelerated analytics use cases -- powered by the NVIDIA Tesla platform, DGX-1 AI supercomputer, and NVIDIA GPU-accelerated cloud computing -- in the finance, oil and gas, manufacturing, retail, and telco industries. 25-minute Talk Renee Yao - Product Marketing Manager, Deep Learning and Analytics, NVIDIA
Get an overview of how GPUs are used by computational astrophysicists to perform numerical simulations and process massive survey data. Astrophysics represents one of the most computationally heavy sciences, where supercomputers are used to analyze enormous amounts of data or to simulate physical processes that cannot be reproduced in the lab. Astrophysicists strive to stay on the cutting edge of computational methods to simulate the universe or process data faster and with more fidelity. We'll discuss two important applications of GPU supercomputing in astrophysics. We'll describe the astrophysical fluid dynamics code CHOLLA that runs on the GPU-enabled supercomputer Titan at Oak Ridge National Lab and can perform some of the largest astrophysical simulations ever attempted. Then we'll describe the MORPHEUS deep learning framework that classifies galaxy morphologies using the NVIDIA DGX-1 deep learning system. 25-minute Talk Brant Robertson - Associate Professor of Astronomy and Astrophysics, University of California, Santa Cruz
Deep learning practitioners have traditionally been forced to spend protracted cycle time cobbling together platforms using consumer-grade components and unsupported open source software. Learn (1) the benefits of rapid experimentation and deep learning framework optimization as a precursor to scalable production training in the data center, (2) the technical challenges that must be overcome for extending deep learning to more practitioners across the enterprise, and (3) how many organizations can benefit from a powerful enterprise-grade solution that's pre-built, simple to manage, and readily accessible to every practitioner. 25-minute Talk Markus Weber - Senior Product Manager, NVIDIA
This session describes the design and implementation of ISAAC, an open-source framework for GEMM and CONV that provides improved performance over cuBLAS and cuDNN. Attendees will learn about input-aware auto-tuning, a technique that relies on machine learning models to automatically derive input- and hardware-portable PTX kernels. Benchmarks will be provided for GEMM and CONV in the context of LINPACK, DeepBench, ICA, and SVD, showing up to 3x performance gains over vendor libraries on a GTX 980 and a Tesla P100. 25-minute Talk Philippe Tillet - Ph.D. Candidate, Harvard University
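The premise of input-aware tuning can be reduced to a simple idea: the best kernel variant depends on the input shape, so the tuner selects a winner per shape and caches the decision. A hypothetical pure-Python sketch of that pattern follows -- ISAAC itself generates PTX kernels and ranks them with learned models rather than by exhaustive timing, so this is only the skeleton of the concept:

```python
import time

def matmul_ijk(A, B):
    # textbook triple loop, standing in for one generated kernel variant
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C

def matmul_ikj(A, B):
    # alternate loop order with better locality, a second "kernel" variant
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a, row, Ci = A[i][k], B[k], C[i]
            for j in range(p):
                Ci[j] += a * row[j]
    return C

CANDIDATES = [matmul_ijk, matmul_ikj]
_tuning_cache = {}

def autotuned_matmul(A, B):
    # key the tuning decision on the input shape, not just the operation
    key = (len(A), len(B), len(B[0]))
    if key not in _tuning_cache:
        best, best_t = None, float("inf")
        for kernel in CANDIDATES:
            t0 = time.perf_counter()
            kernel(A, B)
            elapsed = time.perf_counter() - t0
            if elapsed < best_t:
                best, best_t = kernel, elapsed
        _tuning_cache[key] = best
    return _tuning_cache[key](A, B)
```

All candidates compute the same result; only the per-shape performance differs, which is exactly why tuning on the actual input beats a single fixed implementation.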
Analyzing vast amounts of enterprise cyber security data to find threats is hard. Cyber threat detection is also a continuous task, and because of financial pressure, companies have to find optimized solutions for this volume of data. We'll discuss the evolution of big data architectures used for cyber defense and how GPUs are allowing enterprises to detect threats more efficiently. We'll cover (1) briefly, the evolution from traditional platforms to lambda architectures with new approaches like Apache Kudu, to ultimately GPU-accelerated solutions; (2) current GPU-accelerated database, analysis, and visualization technologies (such as MapD and Graphistry) and the problems they solve; (3) the need to move beyond traditional table-based data stores to graphs for more advanced data exploration, analytics, and visualization; and (4) the latest advances in GPU-accelerated graph analytics and their importance for improved cyber threat detection. 50-minute Talk Joshua Patterson - Applied Solutions Engineering Director, NVIDIA
We'll explain how GPUs can accelerate the development of HD maps for autonomous vehicles. Traditional mapping techniques take weeks to produce highly detailed maps because massive volumes of data, collected by survey vehicles with numerous sensors, are processed, compiled, and registered offline manually. We'll describe how Japan's leading mapping company uses the concept of a cloud-to-car AI-powered HD mapping system to automate and accelerate the HD mapping process, including actual examples of GPU data processing that use real-world data collected from roads in Japan.
25-minute Talk Shigeyuki Iwata - Manager, Research & Development Office, ZENRIN Corporation
The highly nonlinear, multiscale dynamics of large earthquakes is a difficult physics problem that challenges HPC systems at extreme scale. This presentation will introduce our optimized CUDA implementation of Drucker-Prager plasticity in AWP-ODC, which utilizes the GPU's memory bandwidth highly efficiently and helps the code scale to the full size of the Titan system. We demonstrate the dramatic reduction in the level of shaking in the Los Angeles basin by performing a nonlinear M 7.7 earthquake simulation on the southern San Andreas fault for frequencies up to 4 Hz using Blue Waters and Titan. Fully realizing the projected gains of using nonlinear ground-motion simulations for controlling sources will improve hazard estimates, with broad impact on risk reduction and enhanced community resilience, especially for critical facilities such as large dams, nuclear power plants, and energy transportation networks. 25-minute Talk Daniel Roten - Computational Scientist, SDSC
We'll cover the optimization details and inspiring performance results of using NVIDIA Kepler GPUs to accelerate a 10th-order three-dimensional elastic reverse time migration (RTM) algorithm. An essential migration method used in seismic applications to image underground geology, the RTM algorithm is particularly complex in its computational workflow and is generally the most time-consuming kernel. In particular, RTM algorithms based on elastic wave equations (elastic RTM) are generally more computationally intense than RTM methods for acoustic constant-density media (acoustic RTM). In recent years, the desire to cover larger regions and acquire better resolution has further increased the algorithmic complexity of RTM, so computing platforms and optimization methods that can meet these challenges are in great demand. In this work, we first modify the backward process of the RTM in matrix format by adding extra layers, generating a straightforward stencil that fits well with the GPU architecture. A set of optimization techniques, such as memory tuning and occupancy configuration, is then applied to exploit performance across a set of different GPU cards. By further using the streaming mechanism, we obtain communication-computation overlap among multiple GPUs. The best performance, employing four Tesla K40 GPU cards, is 28 times faster than a fully optimized reference running on two E5-2697 CPUs. This work demonstrates the great potential of NVIDIA GPU accelerators in future geophysics exploration algorithms. 25-minute Talk Lin Gan - Dr., Tsinghua University
Across graphics, audio, video, and physics, the NVIDIA VRWorks suite of technologies helps developers maximize performance and immersion for VR applications. We'll explore the latest features of VRWorks, explain the VR-specific challenges they address, and provide application-level tips and tricks to take full advantage of these features. Special focus will be given to the details and inner workings of our latest VRWorks feature, Lens Matched Shading, along with the latest VRWorks integrations into Unreal Engine and Unity. 50-minute Talk Edward Liu - Sr. Developer Technology Engineer, NVIDIA
Deep learning has seen a surge of success in imaging and speech applications, owing to its relatively automatic feature generation and, particularly for convolutional neural networks, high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (as architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, multi-node evolutionary neural networks for deep learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms. MENNDL is capable of evolving not only the numeric hyper-parameters (for example, the number of hidden nodes or convolutional kernel size), but also the arrangement of layers within the network. 25-minute Talk Steven Young - Research Scientist in Deep Learning, Oak Ridge National Laboratory
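Genetic hyper-parameter optimization of the kind this session describes can be sketched in miniature: a population of candidate configurations is scored, the fittest survive, and mutants explore the search space. The toy fitness function below is an assumption standing in for validation accuracy (the real MENNDL system evaluates actual network trainings across cluster nodes, and also evolves layer arrangements):

```python
import random

# hypothetical search space of numeric hyper-parameters
SEARCH_SPACE = {
    "hidden_nodes": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
    "learning_rate": [1e-1, 1e-2, 1e-3],
}

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(ind, rng):
    # re-sample one hyper-parameter at random
    child = dict(ind)
    key = rng.choice(list(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child

def evolve(fitness, generations=100, pop_size=16, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

# toy fitness standing in for the validation accuracy of a trained network
def toy_fitness(ind):
    return ((ind["hidden_nodes"] == 64)
            + (ind["kernel_size"] == 5)
            + (ind["learning_rate"] == 1e-2))

best = evolve(toy_fitness)
```

On a cluster, each fitness evaluation is an independent network training, which is what makes this style of search embarrassingly parallel across nodes.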
We'll address how next-generation informational ADAS experiences are created by combining machine learning, computer vision, and real-time signal processing with GPU computing. Computer vision and augmented reality (CVNAR) is a real-time software solution, which encompasses a set of advanced algorithms that create mixed augmented reality for the driver by utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization, driver infographics, driver health monitoring, lane keeping, advanced parking assistance, adaptive cruise control, and autonomous driving. Our approach augments drivers' visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays.
25-minute Talk Sergii Bykov - Technical Lead, Luxoft
Machine learning has recently leaped into the computing mainstream and is now advancing across all enterprise applications. GPU usage models are penetrating new industries, and advanced servers with GPUs will take deep learning to new performance levels that augment artificial intelligence. New server architecture innovations will drive higher levels of performance in ML applications, and as GPUs become more powerful, GPU networks will need to become more efficient as well. Supermicro has advanced the state of the art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest on GPU server architectures, along with customer case studies of how incredible deep learning results were achieved with Supermicro solutions. 50-minute Talk Jason Pai - Director, GPU Servers, Super Micro Computer Inc.
IBM PowerAI provides the easiest on-ramp for enterprise deep learning. PowerAI helped users break deep learning training benchmarks on AlexNet and VGGNet thanks to the world's only CPU-to-GPU NVIDIA NVLink interface. See how new feature development and performance optimizations will advance the future of deep learning over the next twelve months, including NVIDIA NVLink 2.0, leaps in distributed training, and tools that make it easier to create the next deep learning breakthrough. Learn how you can harness a faster, more performant experience for the future of deep learning.
50-minute Talk Sumit Gupta - VP, HPC, AI, and Analytics
We'll cover state-of-the-art algorithms for image classification, object detection, object instance segmentation, and human pose prediction that we recently developed at Facebook AI Research. Our image classification results are based on the recently developed "ResNeXt" model, which surpasses ResNet's accuracy on ImageNet and, much more importantly, yields better features with stronger generalization performance on object detection tasks. Using ResNeXt as a backbone, we'll present a unified approach to detailed object instance recognition tasks, such as instance segmentation and human pose estimation. This model builds on our prior work on the Faster R-CNN system with Feature Pyramid Networks, which enables efficient multiscale recognition. We'll describe our platform for object detection research that enables a fast and flexible research cycle. Our platform is implemented on Caffe2 and can train many of these state-of-the-art models on the COCO dataset in 1-2 days using sync SGD over eight GPUs on a single Big Sur server. 25-minute Talk Ross Girshick - Research Scientist, Facebook
The need to help elderly individuals and couples remain in their homes is increasing as our global population ages. Cognitive computing can assist the elderly by processing sensor information to identify opportunities for caregivers to offer assistance and support. This project seeks to demonstrate means to improve the elderly's ability to age at home through understanding of daily activities inferred from passive sensor analysis. It is an exploration of the IBM Watson Cloud and edge docker-based Blue Horizon platforms for high-fidelity, low-latency, private sensing and responding at the edge using a Raspberry Pi, including deep learning using NVIDIA DIGITS software, K80 GPU servers in the IBM Cloud, and Jetson TX2 edge computing. 50-minute Talk David C Martin - Hacker-in-residence, IBM Watson Cloud CTO Office
We'll talk about how artificial intelligence has led to market-leading innovation in trading and the huge opportunity for deep learning in trading today. There are three dominant trades: fast information extraction ("speed trade"), trade construction ("stat arb"), and prediction ("market timing"). AI has been very successful in all three. We have been key innovators in the speed trade, having started with a $10,000 risk limit and, over the last 10 years, making more than $1.4 billion in profits. The reason is a purist adherence to AI. There is a huge opportunity for deep learning in the prediction part of the trade, which is not latency sensitive and is mostly about high accuracy. Our mission is to make investing a science, a research-driven utility, rather than the competition or game it is today. Deep learning has had a lot of success in bringing method to social science settings. We believe that over the next five to 10 years every trading operation will become deep learning based, and at this time there is a lot of opportunity for innovation using deep learning in trading. 25-minute Talk Gaurav Chakravorty - Head of Trading Strategy Development, qplum
The security domain presents a unique landscape for the application of artificial intelligence. Defenders in the security space are often charged with securing ever-changing and complex networks, while attackers continue to probe for and exploit any system weakness. We'll dive into the state of cyber security, why it is well suited to artificial intelligence-based approaches, and how AI is actively defending against attacks today. 50-minute Talk Matt Wolff - Chief Data Scientist, Cylance
Vahana started in early 2016 as one of the first projects at A³, the advanced projects outpost of Airbus Group in Silicon Valley. The aircraft we're building doesn't need a runway, is self-piloted, and can automatically detect and avoid obstacles and other aircraft. Designed to carry a single passenger or cargo, Vahana is meant to be the first certified passenger aircraft without a pilot. We'll discuss the key challenges of developing the autonomous systems of a self-piloted air taxi that can be operated in urban environments. 25-minute Talk Arne Stoschek - Head of Autonomous Systems, Airbus A3
Modern computing hardware and NVIDIA Jetson TX1 performance create new possibilities for drones and enable autonomous AI systems, where image processing can be done on-board during flight. We'll present how Magma Solutions developed the AirVision system to cover advanced vision processing tasks for drones, e.g., image stabilization, moving object detection, tracking, and classification using deep neural networks, and visual position estimation using preloaded maps. We'll describe how Magma Solutions used the software frameworks Caffe with cuDNN, OpenVX/NVIDIA VisionWorks, and NVIDIA CUDA to achieve real-time vision processing and object recognition. The AirVision system is in part developed with Lithuanian Ministry of Defence funding and is being used as a tactical UAV system prototype. 25-minute Talk Mindaugas Eglinskas - CEO, Magma Solutions, UAB
It's simple to take the output of one type of sensor in multiple cars and produce a map based on that data. However, a map created in this way will not have sufficient coverage, attribution, or quality for autonomous driving. Our multi-source, multi-sensor approach leads to HD maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. In this session, we will discuss how we have created the world's largest HD map, are able to continuously update it, and are making autonomous driving safer and more comfortable.
25-minute Talk Willem Strijbosch - Head of Autonomous Driving, TomTom
We'll examine an innovative approach using an optimized algorithm to create a decision tree as the basis for regime-dependent and pattern classification of financial and macroeconomic time-series data. Implemented in a supervised and unsupervised learning framework, the algorithm relies on the GPU for high-performance computing and the host processor to further integrate the results in a deep learning framework. We also implement random number generation, in part, using a hardware quantum-based true random number generator, balanced with the pseudo-random number generator in CUDA, so as to optimize overall speed where an exhaustive search is not feasible. 25-minute Talk Yigal Jhirad - Head of Quantitative and Derivatives Strategies, Cohen & Steers
Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology. 50-minute Talk Andrew Schilling - Chief Infrastructure Officer, CannonDesign
Join us for an informative introductory tutorial intended for those new to CUDA, which serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with the essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data parallelism before focusing on the GPU programming model. We'll explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. We'll deliver a programming demonstration of a simple CUDA kernel. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
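The kernel and thread-hierarchy model this tutorial introduces can be previewed with a sequential stand-in: every thread computes a global index from its block and thread coordinates and guards against running past the data. Below is a Python sketch of that pattern; the `launch` helper is a hypothetical serial emulation of a CUDA `<<<grid, block>>>` launch, shown only to illustrate the indexing.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    # kernel body: one "thread" handles one element of the vectors
    i = block_idx * block_dim + thread_idx   # global thread index
    if i < len(x):                           # guard against overrunning the data
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    # serial stand-in for a CUDA <<<grid_dim, block_dim>>> launch
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

n = 10
x = list(range(n))
y = [1.0] * n
out = [0.0] * n
block = 4
grid = (n + block - 1) // block              # enough blocks to cover all n elements
launch(saxpy_kernel, grid, block, 2.0, x, y, out)
```

The `(n + block - 1) // block` grid-size computation and the `i < len(x)` guard are the two idioms the real CUDA kernel uses verbatim (as `(n + blockDim.x - 1) / blockDim.x` and `if (i < n)`), since the grid usually overshoots the data size.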
This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend session 1, "An Introduction to GPU Programming," to learn the basics of CUDA programming that are required for session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, Streamsets for data flow, Spark for distributed GPUs, and Deeplearning for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM. 25-minute Talk David Kale - Deep Learning Engineer, Skymind
Predictive AI is often associated with product recommenders. We present a landscape of multi-domain behavioral models that predict multi-modal user preferences and behavior. This session takes the audience from first principles of the new Correlated Cross-Occurrence (CCO) algorithms, showing the important innovations that lead to new ways to predict behavior, into a deep dive into a variety of use cases: using dislikes to predict likes, using search terms to predict purchases, and using conversions to augment search indexes with behavioral data to produce behavioral search. Some of these are nearly impossible to address without this new technique. We show the tensor algebra that makes up the landscape, then walk through the computation using real-world data. Finally, we show how Mahout's generalized CPU/GPU integration and recently added CUDA support bring significant reductions in the time and cost of calculating CCO models. The audience will come away with an understanding of the kinds of applications that can be built with CCO and how to do so in a performant, cost-reducing way.
50-minute Talk Pat Ferrel - Chief Consultant, PMC member of Apache Mahout, ActionML
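The cross-occurrence idea at the heart of CCO can be sketched with plain counting: for each item a user converted on, tally which secondary-action indicators (e.g., search terms) the same user also exhibited. Below is a minimal pure-Python illustration with hypothetical data; Mahout's implementation additionally filters these raw counts with a log-likelihood-ratio test to keep only significant correlations, and expresses the computation as matrix algebra that can run on CPU or GPU.

```python
from collections import defaultdict

def cross_occurrence(primary, secondary):
    """Count how often each secondary-action indicator co-occurs with
    each primary-action item across users -- the core of CCO.

    primary:   {user: set of items the user converted on}
    secondary: {user: set of indicators, e.g., search terms used}
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, items in primary.items():
        for item in items:
            for indicator in secondary.get(user, ()):
                counts[item][indicator] += 1
    return counts

# hypothetical example: conversions correlated with search terms
purchases = {
    "u1": {"laptop"},
    "u2": {"laptop", "mouse"},
    "u3": {"mouse"},
}
searches = {
    "u1": {"gaming laptop"},
    "u2": {"gaming laptop", "wireless mouse"},
    "u3": {"wireless mouse"},
}
cooc = cross_occurrence(purchases, searches)
```

Here "gaming laptop" co-occurs with laptop purchases twice but with mouse purchases only once, which is exactly the kind of correlation used to augment a search index with behavioral data.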