NVIDIA GTC San Jose 2017

Login is required to reserve a seat in Instructor-Led Labs for Conference + Training passholders.

Times and locations subject to change.

S7149 - 3D DeepObject for Precision 3D Mapping 3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation; in other words, the positional accuracy of deep learning object detection is inherently poor. By combining deep learning and 3D model fitting, our 3D DeepObject has the best of both worlds: deep learning can detect an object (a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The bounding boxes output by deep learning are the input for 3D model fitting, and a bounding box from deep learning significantly reduces the search space for 3D model fitting. Our latest tests indicate that 3D DeepObject achieves much higher positional accuracy than deep learning or 3D model fitting alone. 25-minute Talk Bingcai Zhang, Tech Fellow, BAE Systems
S7289 - 3D Human Motion Capture from 2D Video Using Cloud-Based CNNs This talk provides a brief overview of how to apply GPU-based deep learning techniques to extract 3D human motion capture from standard 2D RGB video. We describe in detail the stages of our CUDA-based pipeline from training to cloud-based deployment. Our training system is a novel mix of real world data collected with Kinect cameras and synthetic data based on rendering thousands of virtual humans generated in the Unity game engine. Our execution pipeline is a series of connected models including 2D video to 2D pose estimation and 2D pose to 3D pose estimation. We describe how this system can be integrated into a variety of mobile applications ranging from social media to sports training. A live demo using a mobile phone connected into an AWS GPU cluster will be presented. 25-minute Talk Paul Kruszewski, Founder, wrnch
S7425 - 3D Printing with NVIDIA GVDB Voxels As printing hardware improves, 3D printing allows for unique processes, finer details, better quality control, and a wider range of materials. With these improvements comes the need for greater computational power and control over 3D-printed objects. We introduce NVIDIA GVDB Voxels as an open source SDK for voxel-based 3D printing workflows. Traditional workflows are based on processing polygonal models and STL files for 3D printing. However, such models don't allow for continuous interior changes in color or density, for descriptions of heterogeneous materials, or for user-specified support lattices. Using the new NVIDIA GVDB Voxels SDK, we demonstrate practical examples of design workflows for complex 3D-printed parts with high-quality ray-traced visualizations, direct data manipulation, and 3D-printed output. 25-minute Talk Rama Hoetzlein, Graphics Research Engineer, NVIDIA
Jun Zeng, Principal Scientist, HP Labs
S7197 - 4K Video Processing and Streaming Platform on TX1 Learn how to build a platform for processing and streaming 4K video on the NVIDIA Jetson TX1 processor. To achieve real-time video processing, the diverse processing resources of this high-performance embedded architecture need to be employed optimally. The heterogeneous system architecture of the Jetson TX1 allows capturing, processing, and streaming of video with a single chip. The main challenges lie in the optimal utilization of the different hardware resources of the Jetson TX1 (CPU, GPU, dedicated hardware blocks) and in the software frameworks. We'll discuss variants, identify bottlenecks, and show the interaction between hardware and software. Simple capturing and displaying 4K video can be achieved using existing out-of-the-box methods. However, GPU-based enhancements were developed and integrated for real-time video processing tasks (scaling and video mixing). 25-minute Talk Tobias Kammacher, Researcher, Zurich University of Applied Sciences
S7310 - 8-Bit Inference with TensorRT We'll describe a method for converting FP32 models to 8-bit integer (INT8) models for improved efficiency. Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32) and, by default, inference on these models employs FP32 as well. Our conversion method doesn't require re-training or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8 and have achieved comparable Top 1 and Top 5 inference accuracy. The methods are implemented in TensorRT and can be executed on GPUs that support new INT8 inference instructions. 25-minute Talk Szymon Migacz, CUDA Library Software Engineer, NVIDIA
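The core idea behind such FP32-to-INT8 conversion, symmetric linear quantization, can be sketched as follows (an illustrative toy only; TensorRT's actual calibration procedure is more sophisticated, and the function names here are hypothetical):

```python
# Symmetric linear quantization: map FP32 values to INT8 through one scale factor.
def quantize_int8(values):
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to the nearest representable step and clamp to the INT8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# Each dequantized value lies within one quantization step of the original.
assert all(abs(a - w) <= s for a, w in zip(approx, weights))
```

The small per-weight error this introduces is why the converted networks in the talk can retain comparable Top-1 and Top-5 accuracy.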
L7132 - Accelerated Analytics and Graph Visualization In this lab, you will learn how to use a GPU-accelerated graph visualization engine in combination with a GPU-accelerated database. By combining these technologies, we can visually explore a large network dataset and identify port scans, distributed denial-of-service attacks, and data exfiltration events. By the end of this lab, you will know how to load data for accelerated querying and analysis, build graph visualizations using the GPU-accelerated database as a data source, and explore large-scale data visualization. Prerequisites: No prerequisite skills are necessary, but basic knowledge of SQL and Python would be helpful. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120-minute Instructor-Led Lab Keith Kraus, Senior Applied Solutions Engineer, NVIDIA
Michael Balint, Senior Manager Applied Solutions Engineering, NVIDIA
Deepti Jain, Senior Applied Solutions Engineer, NVIDIA
S7774 - Accelerated Analytics Industry Use Cases Companies of all sizes and in all industries are driven toward digital transformation. Failure to adapt to this movement places businesses at increased risk in current and future competitive markets. Limited by slow compute, enterprises struggle to quickly gain valuable insights, monetize their data, enhance customer experience, optimize operational efficiency, and prevent fraudulent attacks all at the same time. NVIDIA helps provide deeper insights, enable dynamic correlation, and deliver predictive outcomes at superhuman speed, accuracy, and scale. We'll highlight specific accelerated analytics use cases -- powered by the NVIDIA Tesla platform, DGX-1 AI supercomputer, and NVIDIA GPU-accelerated cloud computing -- in the finance, oil and gas, manufacturing, retail, and telco industries. 25-minute Talk Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA
S7332 - Accelerated Astrophysics: Using NVIDIA DGX-1 to Simulate and Understand the Universe Get an overview of how GPUs are used by computational astrophysicists to perform numerical simulations and process massive survey data. Astrophysics represents one of the most computationally heavy sciences, where supercomputers are used to analyze enormous amounts of data or to simulate physical processes that cannot be reproduced in the lab. Astrophysicists strive to stay on the cutting edge of computational methods to simulate the universe or process data faster and with more fidelity. We'll discuss two important applications of GPU supercomputing in astrophysics. We'll describe the astrophysical fluid dynamics code CHOLLA that runs on the GPU-enabled supercomputer Titan at Oak Ridge National Lab and can perform some of the largest astrophysical simulations ever attempted. Then we'll describe the MORPHEUS deep learning framework that classifies galaxy morphologies using the NVIDIA DGX-1 deep learning system. 25-minute Talk Brant Robertson, Associate Professor of Astronomy and Astrophysics, University of California, Santa Cruz
S7204 - Accelerated Compute Workloads on Azure Learn how you can scale your traditional HPC-based applications or workloads in Azure using powerful NVIDIA Tesla-based GPUs and Azure's low-latency networking backed by InfiniBand infrastructure. This is a great session to learn about Azure's accelerated offerings and future roadmap. This session will dig into specific workloads such as deep learning and ray-traced rendering, along with exciting customer case studies and partner solutions. Learn how everyone can now have a supercomputer at their fingertips! 25-minute Talk Karan Batta, Senior Program Manager, Azure HPC Team, Microsoft
S7753 - Accelerated Deep Learning Within Reach - Supercomputing Comes to Your Cube Deep learning practitioners have traditionally been forced to spend protracted cycle time cobbling together platforms using consumer-grade components and unsupported open source software. Learn (1) the benefits of rapid experimentation and deep learning framework optimization as a precursor to scalable production training in the data center, (2) the technical challenges that must be overcome for extending deep learning to more practitioners across the enterprise, and (3) how many organizations can benefit from a powerful enterprise-grade solution that's pre-built, simple to manage, and readily accessible to every practitioner. 25-minute Talk Markus Weber, Senior Product Manager, NVIDIA
S7117 - Accelerating Cross-Validation in Spark Using GPU Learn how to better utilize GPUs to accelerate cross-validation in Spark, which is widely used in many big data analytics and machine learning applications. 25-minute Talk Minsik Cho, Research Staff Member, IBM Research
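As a reminder of the computation being parallelized here, a plain k-fold split (no Spark or GPU specifics; `kfold_indices` is a hypothetical helper name) looks like:

```python
# Plain k-fold cross-validation split: every index appears in exactly one
# validation fold, and each fold's complement forms the training set.
def kfold_indices(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, val

# With n=6, k=3 the validation folds are [0, 3], [1, 4], [2, 5];
# each split partitions all six indices.
for train, val in kfold_indices(6, 3):
    assert sorted(train + val) == list(range(6))
```

Each of the k train/validate passes is independent, which is what makes cross-validation a natural fit for parallel GPU execution.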
S7150 - Accelerating cuBLAS/cuDNN Using Input-Aware Auto-Tuning: The ISAAC Library This session describes the design and implementation of ISAAC, an open-source framework for GEMM and CONV that provides improved performance over cuBLAS and cuDNN. Attendees will learn about input-aware auto-tuning, a technique that relies on machine learning models to automatically derive input- and hardware-portable PTX kernels. Benchmarks will be provided for GEMM and CONV in the context of LINPACK, DeepBench, ICA, and SVD, showing up to 3x performance gains over vendor libraries on a GTX 980 and a Tesla P100. 25-minute Talk Philippe Tillet, Ph.D. Candidate, Harvard University
S7383 - Accelerating Cyber Threat Detection with GPU Analyzing vast amounts of enterprise cyber security data to find threats is hard. Cyber threat detection is also a continuous task, and because of financial pressure, companies have to find optimized solutions for this volume of data. We'll discuss the evolution of big data architectures used for cyber defense and how GPUs are allowing enterprises to do better threat detection more efficiently. We'll briefly discuss (1) the evolution from traditional platforms to lambda architectures with new approaches like Apache Kudu and, ultimately, to GPU-accelerated solutions; (2) current GPU-accelerated database, analysis, and visualization technologies (such as Kinetica and Graphistry) and the problems they solve; (3) the need to move beyond traditional table-based data stores to graphs for more advanced data exploration, analytics, and visualization; and (4) the latest advances in GPU-accelerated graph analytics and their importance for improved cyber threat detection. 50-minute Talk Joshua Patterson, Applied Solutions Engineering Director, NVIDIA
Michael Wendt, Senior Applied Solutions Engineer, NVIDIA
S7321 - Accelerating Document Retrieval and Ranking for Cognitive Applications Based on a comprehensive performance study of Watson workloads, we'll deep dive into optimizing critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is a time-consuming task. Only after completing it can any advanced natural language processing algorithms be effective. 25-minute Talk David Wendt, Programmer, IBM
Tim Kaldewey, Performance Architect, IBM Watson
S7656 - Accelerating HD Map Creations with GPUs We'll explain how GPUs can accelerate the development of HD maps for autonomous vehicles. Traditional mapping techniques take weeks to result in highly detailed maps because massive volumes of data, collected by survey vehicles with numerous sensors, are processed, compiled, and registered offline manually. We'll describe how Japan's leading mapping company uses the concept of a cloud-to-car AI-powered HD mapping system to automate and accelerate the HD mapping process, including actual examples of GPU data processing that use real-world data collected from roads in Japan. 25-minute Talk Shigeyuki Iwata, Manager, Research & Development Office, ZENRIN Corporation
S7831 - Accelerating High-Frequency Nonlinear Earthquake Simulations on OLCF Titan and NCSA Blue Waters The highly nonlinear, multiscale dynamics of large earthquakes is a difficult physics problem that challenges HPC systems at extreme scale. This presentation will introduce our optimized CUDA implementation of Drucker-Prager plasticity in AWP-ODC, which utilizes the GPU's memory bandwidth highly efficiently and helps the code scale to the full size of the Titan system. We demonstrate the dramatic reduction in the level of shaking in the Los Angeles basin by performing a nonlinear M 7.7 earthquake simulation on the southern San Andreas fault for frequencies up to 4 Hz using Blue Waters and Titan. Full realization of the projected gains in using nonlinear ground-motion simulations for controlling sources will improve hazard estimates, which has a broad impact on risk reduction and enhanced community resilience, especially for critical facilities such as large dams, nuclear power plants, and energy transportation networks. 25-minute Talk Daniel Roten, Computational Scientist, SDSC
Yifeng Cui, Lab Director, San Diego Supercomputing Center
S7291 - Accelerating Semi-Global Block Matching for Stereo Image Processing Using CUDA Real-time stereo matching is needed by many practical applications, so matching algorithms are required to perform at high speeds. We'll present a semi-global block matching (SGBM) algorithm, which has several advantages, and our hybrid implementation of it, which achieves around 23x performance over well-known OpenCV implementations. We'll present a simplified approach that breaks the problem into multiple modules, ports suitable sections to CUDA, and optimizes sequential sections for the CPU itself. Our CUDA implementation is accelerated on a Tesla K20 card with Kepler architecture. We focused on basic CUDA performance optimizations such as coalesced access patterns, collapsing of nested loops, and reduction of iterative data transfers between CPU and GPU. 25-minute Talk Amit Kalele, Consultant, Tata Consultancy Services Limited
Anubhav Jain, IT Analyst, Tata Consultancy Services Limited
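The local matching cost at the heart of block matching can be illustrated on a single scanline (a toy sketch that omits the semi-global aggregation step; `best_disparity` and the penalty value are hypothetical):

```python
# Toy 1-D block matching: for a pixel in the left scanline, pick the disparity
# that minimizes the sum of absolute differences (SAD) over a small window.
def best_disparity(left, right, x, max_disp, win=1):
    costs = []
    for d in range(max_disp + 1):
        cost = 0
        for k in range(-win, win + 1):
            xl, xr = x + k, x + k - d   # left pixel x matches right pixel x - d
            if 0 <= xl < len(left) and 0 <= xr < len(right):
                cost += abs(left[xl] - right[xr])
            else:
                cost += 255             # penalize out-of-bounds samples
        costs.append(cost)
    return min(range(len(costs)), key=costs.__getitem__)

# The right scanline is the left one shifted by 2 pixels, so disparity = 2.
left  = [10, 20, 80, 90, 80, 20, 10, 10]
right = [80, 90, 80, 20, 10, 10, 10, 10]
assert best_disparity(left, right, 3, max_disp=4) == 2
```

Because the cost at every pixel and disparity is independent, this inner loop maps naturally onto one GPU thread per pixel, which is where optimizations like coalesced access pay off.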
S7593 - Accelerating the 3D Elastic Reverse-Time-Migration Algorithms Through NVIDIA GPUs We'll cover the optimization details and the inspiring performance results of using NVIDIA Kepler GPUs to accelerate the 10th-order three-dimensional elastic reverse-time-migration (RTM) algorithm. As an essential migration method used in seismic applications to image underground geology, the RTM algorithm is particularly complex due to its computational workflow and is generally the most time-consuming kernel. In particular, RTM algorithms based on elastic wave equations (elastic RTM) are generally more computationally intense than RTM methods for acoustic constant-density media (acoustic RTM). In recent years, the desire to cover larger regions and acquire better resolution has further increased the algorithmic complexity of RTM. Therefore, computing platforms and optimization methods that can better meet such challenges in seismic applications are in great demand. In this work, we first modify the backward process in the RTM matrix format by adding extra layers, to generate a straightforward stencil that fits well with the GPU architecture. A set of optimization techniques, such as memory tuning and compute-occupancy configuration, is then applied to improve performance across a set of different GPU cards. By further using the streaming mechanism, we manage to overlap communication and computation among multiple GPUs. The best performance, employing four Tesla K40 GPU cards, is 28 times better than a fully optimized reference based on a socket with two E5-2697 CPUs. This work demonstrates the great potential of employing NVIDIA GPU accelerators in future geophysics exploration algorithms. 25-minute Talk Lin Gan, Dr., Tsinghua University
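The finite-difference stencil at the core of such wave-propagation kernels can be illustrated in one dimension (a low-order toy sketch; the talk's kernel is a 10th-order, three-dimensional version, and the function name here is hypothetical):

```python
# Toy 1-D second-derivative stencil: the building block of wave propagation.
def second_derivative(u, dx):
    # 2nd-order central difference; boundary points are left at zero.
    out = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (dx * dx)
    return out

# On u(x) = x^2 the second derivative is exactly 2 at every interior point.
xs = [0.1 * i for i in range(10)]
d2 = second_derivative([x * x for x in xs], 0.1)
assert all(abs(v - 2.0) < 1e-9 for v in d2[1:-1])
```

Each output point depends only on a fixed neighborhood of inputs, which is exactly the access pattern that GPU memory tuning and streaming overlap exploit in the talk.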
S7578 - Accelerating your VR Applications with VRWorks Across graphics, audio, video, and physics, the NVIDIA VRWorks suite of technologies helps developers maximize performance and immersion for VR applications. We'll explore the latest features of VRWorks, explain the VR-specific challenges they address, and provide application-level tips and tricks to take full advantage of these features. Special focus will be given to the details and inner workings of our latest VRWorks feature, Lens Matched Shading, along with the latest VRWorks integrations into Unreal Engine and Unity. 50-minute Talk Edward Liu, Senior Developer Technology Engineer, NVIDIA
Cem Cebenoyan, Director of Engineering, NVIDIA
S7810 - Acceleration of Multi-Object Detection and Classification Training Process with NVIDIA Iray SDK (Presented by SAP) Many works using deep CNNs for multi-object detection and classification observe that a high-quality training dataset is even more important for the best results than the choice of network type. We employ the NVIDIA Iray rendering engine and SDK for the automatic generation of synthetic images and their annotations, which can either be combined with real, manually annotated images as input for the training process or be used on their own. In most cases, adding a new entity to the classification/detection list requires reviewing the existing dataset and relabeling it. Our contribution accelerates this process dramatically and allows the training set to be specialized. 50-minute Talk Tatiana Surazhsky, 3D Graphics Research Expert, SAP Labs Israel LTD
S7564 - Accelerator Programming Ecosystems Emerging heterogeneous systems are opening up a wealth of programming opportunities. This panel will discuss the latest developments in accelerator programming, where programmers can choose among OpenMP, OpenACC, CUDA, and Kokkos for GPU programming. The panel will shed light on the primary objectives behind the choice of a model: availability across multiple platforms, a rich feature set, applicability to a certain type of scientific code, compiler stability, or other factors. This will be an interactive Q&A session where participants can discuss their experiences with programming-model experts and developers. 50 minutes Panel Michael Wolfe, Senior Compiler Engineer, NVIDIA
Christian Trott, Senior Member Technical Staff, Sandia National Laboratories
Fernanda Foertter, HPC User Support Specialist/Programmer, Oak Ridge National Laboratory
Stephen Olivier, Principal Member of Technical Staff, Sandia National Laboratories
Mark Harris, Chief Technologist, GPU Computing Software, NVIDIA
Randy Allen, Director of Advanced Research, Mentor Graphics
S7193 - Achieving Portable Performance for GTC-P with OpenACC on GPU, Multi-Core CPU, and Sunway Many-Core Processor The Gyrokinetic Toroidal Code developed in Princeton (GTC-P) delivers highly scalable plasma turbulence simulations at extreme scales on world-leading supercomputers such as Tianhe-2 and Titan. The aim of this work is to achieve portable performance in a single source code for GTC-P. We developed the first OpenACC implementation for GPU, CPU, and the Sunway processor. The results showed that the OpenACC version achieved nearly 90% of the performance of the NVIDIA CUDA version on GPU and of the OpenMP version on CPU; the Sunway OpenACC version achieved a 2.5X speedup in the entire code. Our work demonstrates that OpenACC can deliver portable performance to complex real-science codes like GTC-P. In addition, we request adding thread-id support to the OpenACC standard to avoid expensive atomic operations for reductions. 25-minute Talk Stephen Wang, GPU Specialist, Shanghai Jiao Tong University
S7435 - Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks There has been a surge of success in using deep learning in imaging and speech applications, thanks to its relatively automatic feature generation and, particularly for convolutional neural networks, high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (as architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, multi-node evolutionary neural networks for deep learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms. MENNDL is capable of evolving not only numeric hyper-parameters (for example, the number of hidden nodes or convolutional kernel size), but also the arrangement of layers within the network. 25-minute Talk Steven Young, Research Scientist in Deep Learning, Oak Ridge National Laboratory
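The mutate-and-select loop of such a genetic search can be sketched for a single numeric hyper-parameter (a toy sketch only; MENNDL itself evolves whole layer arrangements, and all names here are hypothetical):

```python
import random

# Toy genetic search over one integer hyper-parameter (e.g., hidden-unit count):
# keep the fittest half, mutate each survivor by +/-1 to form the next generation.
def evolve(fitness, population, generations=100, seed=0):
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: len(ranked) // 2]          # select the best half
        children = [max(1, p + rng.choice([-1, 1])) for p in parents]
        population = parents + children               # survivors plus mutants
    return max(population, key=fitness)

# Fitness peaks at 16 units; the search climbs there from a poor start.
best = evolve(lambda h: -(h - 16) ** 2, [2, 3, 4, 5])
```

Because the best individual is always retained, the population's peak fitness never decreases; in MENNDL the expensive part is evaluating each candidate's fitness, which is what the cluster's GPUs parallelize.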
S7312 - ADAS Computer Vision and Augmented Reality Solution We'll address how next-generation informational ADAS experiences are created by combining machine learning, computer vision, and real-time signal processing with GPU computing. Computer vision and augmented reality (CVNAR) is a real-time software solution, which encompasses a set of advanced algorithms that create mixed augmented reality for the driver by utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization, driver infographics, driver health monitoring, lane keeping, advanced parking assistance, adaptive cruise control, and autonomous driving. Our approach augments drivers' visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays. 25-minute Talk Sergii Bykov, Technical Lead, Luxoft
S7641 - Additive Manufacturing Simulation on the GPU Learn how GPUs can accelerate large-scale finite element-based additive manufacturing (AM) simulation. We'll discuss the computational challenges underlying AM simulation, followed by their solution through fast GPU solvers. We'll also present case studies of metal AM and fused-deposition-modeling simulation, with experimental results. 25-minute Talk Krishnan Suresh, Professor, University of Wisconsin, Madison
S7347 - A Deep Hierarchical Model for Joint Object Detection and Semantic Segmentation How do we tackle multiple vision tasks from within the same deep neural network? We'll address this problem by proposing a neural network architecture that can simultaneously segment and detect objects within an image. We'll begin with a brief overview of deep learning as applied to computer vision, and various popular methods for object detection and semantic segmentation. We'll then propose our model: a hierarchical architecture that explicitly allows fine-grain information from one task to aid in the performance of coarser tasks. We'll show that our multi-task network outperforms and is faster than networks trained to tackle each task independently. We'll then visualize our network results on the Cityscapes data set and discuss potential applications of our ideas, especially in the context of autonomous driving. 25-minute Talk Zhao Chen, Machine Learning Software Intern, NVIDIA
S7834 - Advanced GPU Server Architectures and Deep Learning Training for HPC Customers (Presented by Super Micro Computer Inc.) Machine learning has recently leaped into the computing mainstream, and ML is now advancing across all enterprise applications. GPU usage models are penetrating new industries, and advanced servers with GPUs will take deep learning to new performance levels that augment artificial intelligence. New server-architecture innovations will drive higher levels of performance in ML applications, and as GPUs become more powerful, GPU networks will need to become more efficient as well. Supermicro has advanced the state of the art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest in GPU server architectures and deep learning customer case studies showing how customers achieved incredible deep learning results with Supermicro solutions. 50-minute Talk Jason Pai, Senior Product Manager, Super Micro Computer Inc.
Don Clegg, VP Marketing & WW Business Development, Super Micro Computer, Inc.
S7482 - Advances in Real-Time Graphics at Pixar Explore how real-time graphics are used at Pixar Animation Studios. We'll describe the unique needs for film production and our custom solutions, including Presto and our open-source projects Universal Scene Description (USD), OpenSubdiv, and Hydra. Don't miss this great opportunity to learn about graphics, algorithms, and movies! 50-minute Talk Pol Jeremias-Vila, Graphics Engineer, Pixar
David Yu, Senior Graphics Software Engineer, Pixar Animation Studios
Dirk Van Gelder, Software Engineer, Pixar Animation Studios
S7647 - Advancing Our Understanding of Evolutionary Histories Using GPUs: The BEAGLE Library Estimating the evolutionary history of organisms, phylogenetic inference, is a critical step in many analyses involving biological sequence data such as DNA. These phylogenetic relationships are essential in understanding the evolutionary dynamics of organisms. The likelihood calculations at the heart of the most effective methods for phylogenetic analyses are extremely computationally intensive, and hence these analyses become a bottleneck in many studies. In collaboration with some of the foremost researchers in our area, we have developed an open source library, BEAGLE, which uses GPUs to greatly accelerate phylogenetic analyses. BEAGLE is used by some of the leading programs in the field. We'll describe the phylogenetic inference problem and its importance, and go into details on how we used GPU computing to achieve broad impact in the field. 25-minute Talk Daniel L. Ayres, Graduate Student, University of Maryland
Michael P Cummings, Professor, University of Maryland
S7783 - A Fast, Unified Method for Object Detection, Instance Segmentation, and Human Pose Estimation We'll cover state-of-the-art algorithms for image classification, object detection, object instance segmentation, and human pose prediction that we recently developed at Facebook AI Research. Our image classification results are based on the recently developed "ResNeXt" model that supersedes ResNet's accuracy on ImageNet, but much more importantly yields better features with stronger generalization performance on object detection tasks. Using ResNeXt as a backbone, we'll present a unified approach for detailed object instance recognition tasks, such as instance segmentation and human pose estimation. This model builds on our prior work on the Faster R-CNN system with Feature Pyramid Networks, which enables efficient multiscale recognition. We'll describe our platform for object detection research that enables a fast and flexible research cycle. Our platform is implemented on Caffe2 and can train many of these state-of-the-art models on the COCO dataset in 1-2 days using sync SGD over eight GPUs on a single Big Sur server. 25-minute Talk Ross Girshick, Research Scientist, Facebook
S7262 - A General Framework for Hybrid Stochastic Model Calibration on the GPU We'll present an overview of a GPU-based approach to calibrating hybrid models in finance, that is, calibrating multi-factor correlated stochastic processes to market data (term structure and volatility surfaces). Examples of such models range from the relatively benign 3-factor JY inflation model, to single-currency and forex equity baskets, up to a completely general basket of rate/inflation/equity/forex/credit processes described by a global correlation matrix. Due to the inherently multi-threaded nature of Monte Carlo path generation, and the availability of cuRAND, a GPU implementation vastly outperforms CPU or PDE solvers, which are plagued by high dimensionality. Details of the algorithm, as well as a demonstration and analysis of timings and memory limitations, will be covered. 25-minute Talk Mark York, Senior Quantitative Analyst, Renaissance Risk Management Labs
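In two dimensions, generating correlated driver increments for such processes reduces to the familiar Cholesky construction (a CPU sketch of what cuRAND produces per path on the GPU; the function name is hypothetical):

```python
import math, random

# Correlated standard-normal pairs from independent draws:
# x1 = z1, x2 = rho*z1 + sqrt(1 - rho^2)*z2 applies the 2x2 Cholesky factor
# of the correlation matrix [[1, rho], [rho, 1]].
def correlated_normals(rho, n, seed=42):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return out

samples = correlated_normals(0.8, 100_000)
# The mean of the products estimates the correlation; it lands near rho = 0.8.
corr = sum(a * b for a, b in samples) / len(samples)
assert abs(corr - 0.8) < 0.03
```

The same factorization idea extends to the talk's general global correlation matrix, with one Cholesky factor applied to a full vector of independent draws per time step.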
S7286 - A High-Quality and Fast Maximal Independent Set Algorithm for GPUs Learn how to efficiently parallelize Maximal Independent Set computations for GPUs. Our CUDA implementation is at least three times faster than the leading GPU codes on every one of the 16 real-world and synthetic graphs we tested. Moreover, it produces a larger maximal independent set in all but one case. It is asynchronous, atomic free, and requires fewer than 30 kernel statements. We'll present the included code optimizations to achieve heretofore unreached performance and describe how to exploit monotonicity to minimize the memory footprint of this important irregular graph algorithm. 25-minute Talk Martin Burtscher, Professor, Texas State University
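For reference, the sequential greedy baseline that such GPU algorithms parallelize (typically via per-vertex random priorities) can be written as follows (`maximal_independent_set` is a hypothetical name):

```python
# Sequential greedy maximal independent set: take each vertex unless a
# neighbor was already taken; excluded vertices can never join later.
def maximal_independent_set(adj):
    chosen, excluded = set(), set()
    for v in sorted(adj):            # deterministic visit order
        if v not in excluded:
            chosen.add(v)
            excluded.update(adj[v])  # neighbors of a chosen vertex are out
    return chosen

# 5-cycle 0-1-2-3-4-0: this greedy order picks {0, 2}.
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {0, 3}}
mis = maximal_independent_set(adj)
assert mis == {0, 2}
```

The result is maximal (every excluded vertex has a chosen neighbor) but not necessarily maximum; the talk's contribution is doing this asynchronously on the GPU while still producing large sets.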
S7592 - AI and Deep Learning in Trading We'll talk about how artificial intelligence has led to market-leading innovation in trading and the huge opportunity of using deep learning in trading today. There are three dominant trades: fast information extraction ("speed trade"), trade construction ("stat arb"), and prediction ("market timing"). AI has been very successful in all three aspects. We have been key innovators in the speed trade, having started with a $10,000 risk limit and, over the last 10 years, making more than $1.4 billion in profits. The reason is a purist adherence to AI. There is a huge opportunity for using deep learning in the prediction part of the trade, which is not latency sensitive and is mostly about high accuracy. Our mission is to make investing a science, a research-driven utility, rather than the competition or game that it is today. Deep learning has had a lot of success in bringing method to social science settings. We believe that over the next five to 10 years every trading operation will become deep learning based. However, at this time there is a lot of opportunity for innovation using deep learning in trading. 25-minute Talk Gaurav Chakravorty, Head of Trading Strategy Development, qplum
S7739 - AI and the Battle for Cyber Security The security domain presents a unique landscape for the application of artificial intelligence. Defenders in the security space are often charged with securing ever-changing and complex networks, while attacks continue to probe for and exploit any system weakness. We'll dive into the state of cyber security, why it is well suited for artificial intelligence-based approaches, and how AI is actively defending against attacks today. 50-minute Talk Matt Wolff, Chief Data Scientist, Cylance
S7770 - AI in Healthcare: Beyond Deep Learning in Medical Imaging We'll give an overview of how deep learning in healthcare can be used beyond medical imaging when applied to clinical decision support and medical asset management. Deep learning is capable of addressing many, if not all, of the main challenges facing caregivers: information overflow, work overload, reduced accuracy due to data constraints, optimism bias, and optimal utilization of medical equipment. This requires involving multiple data sources and dealing with data harmonization, semantic interoperability, and different health data types. Deep learning in healthcare has three main aspects: medical imaging, multi-data (structured, unstructured, streaming, etc.) based decision support, and asset utilization data. 25-minute Talk Michael Dahlweid, Chief Medical Officer, GE Healthcare
S7805 - Airbus Vahana - Development of a Self-Piloted Air Taxi Vahana started in early 2016 as one of the first projects at A³, the advanced projects outpost of Airbus Group in Silicon Valley. The aircraft we're building doesn't need a runway, is self-piloted, and can automatically detect and avoid obstacles and other aircraft. Designed to carry a single passenger or cargo, Vahana is meant to be the first certified passenger aircraft without a pilot. We'll discuss the key challenges of developing the autonomous systems of a self-piloted air taxi that can be operated in urban environments. 25-minute Talk Arne Stoschek, Head of Autonomous Systems, Airbus A3
S7313 - AirVision: AI Based, Real-Time Computer Vision System for Drones Modern computing hardware and NVIDIA Jetson TX1 performance create new possibilities for drones and enable autonomous AI systems, where image processing can be done on-board during flight. We'll present how Magma Solutions developed the AirVision system to cover advanced vision processing tasks for drones, e.g., image stabilization, moving object detection, tracking, and classification using deep neural networks, and visual position estimation using preloaded maps. We'll describe how Magma Solutions used software frameworks Caffe with cuDNN, OpenVX /NVIDIA VisionWorks, and NVIDIA CUDA to achieve real-time vision processing and object recognition. The AirVision system is in part developed with Lithuanian Ministry of Defence funding and is being used as a tactical UAV system prototype. 25-minute Talk Mindaugas Eglinskas, CEO, Magma Solutions, UAB
S7674 - All That Glisters Is Not Convnets: Hybrid Architectures for Faster, Better Solvers Convolutional neural networks have proven themselves to be very effective parametric learners of complex functions. However, the non-linearities present in conventional networks are not strong; both halves of a (possibly leaky) ReLU are linear, and the non-linearity is computed independently for each channel. We'll present techniques that create decision tree and RBF units designed to respond non-linearly to complex joint distributions across channels. This makes it possible to pack more non-linearity into a small space, and it is a particularly valuable replacement for the latter layers of a network, in particular the solver. The result is hybrid networks that outperform conventional pure neural networks and can be trained orders of magnitude more quickly. 50-minute Talk Tom Drummond, Professor, Monash University
S7809 - A Multi-Source, Multi-Sensor Approach to HD Map Creation It's simple to take the output of one type of sensor in multiple cars and produce a map based on that data. However, a map created in this way will not have sufficient coverage, attribution, or quality for autonomous driving. Our multi-source, multi-sensor approach leads to HD maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. In this session, we will discuss how we have created the world's largest HD map, are able to continuously update it, and are making autonomous driving safer and more comfortable.   25-minute Talk Willem Strijbosch, Head of Autonomous Driving, TomTom
S7404 - An Approach to a High-Performance Decision Tree Optimization Within a Deep Learning Framework for Investment and Risk Management We'll examine an innovative approach using an optimized algorithm to create a decision tree for the basis of regime dependent and pattern classification of financial and macroeconomic time-series data. Implemented in a supervised and unsupervised learning framework, the algorithm relies on the GPU for high performance computing and the host processor to further integrate the results in a deep learning framework. Also, we implement random number generation, in part, using a hardware quantum based true random number generator, balanced with the pseudo-random number generator in CUDA, so as to optimize overall speed where an exhaustive search is not feasible. 25-minute Talk Blay Tarnoff, Senior Application Developer and Database Architect, Cohen & Steers
Yigal Jhirad, Head of Quantitative and Derivatives Strategies, Cohen & Steers
S7174 - An Architectural Design Firm's Journey Through Virtual GPU Technology for Global Collaboration Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology. 50-minute Talk Andrew Schilling, Director of Information Technology, CannonDesign
Jimmy Rotella, Design Application Specialist, CannonDesign
S7252 - An Efficient Connected Components Algorithm for Massively Parallel Devices Learn how to efficiently parallelize connected components, an important irregular graph algorithm. Our CUDA implementation is asynchronous, lock free, converges rapidly, and employs load balancing. It is faster than other GPU codes on all 18 real-world and synthetic graphs we tested. We'll describe how to parallelize this graph algorithm by exploiting algorithmic properties, discuss important optimizations to improve the efficiency, and compare the performance with some of the fastest prior GPU implementations of connected components. 25-minute Talk Jayadharini Jaiganesh, Graduate Student, Texas State University
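As a minimal illustration of the connected components computation being parallelized (a sequential Python sketch, not the authors' CUDA code), label propagation repeatedly spreads minimum labels along edges until a fixed point:

```python
def connected_components(n, edges):
    """Iterative min-label propagation over edges (sequential sketch).

    Every vertex starts with its own id as its label; each sweep,
    both endpoints of every edge adopt the smaller of their two
    labels, until no label changes. A GPU runs each sweep over all
    edges in parallel; this loop keeps the same structure.
    """
    label = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            lo = min(label[u], label[v])
            if label[u] != lo or label[v] != lo:
                label[u] = label[v] = lo
                changed = True
    return label
```

Vertices sharing a final label belong to the same component; the convergence and load-balancing optimizations the talk describes address exactly the repeated-sweep cost visible here.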
S7261 - A New Approach to Active Learning by Query Synthesis Using Deep Generative Networks We'll introduce a new active learning algorithm that is made practical using GPUs. Active learning concerns carefully choosing training data to minimize human labeling effort. In a nutshell, we apply deep generative models to synthesize informative "queries" that, when answered by a human labeler, allow the learner to learn faster. The learning is "active" in the sense that these questions are synthesized in an online manner adaptive to the current knowledge, thus minimizing the number of queries needed. Unlike traditional supervised machine training, our training is performed mostly on machine-synthesized data. To our knowledge, this is the first work that shows promising results in active learning by query synthesis. 25-minute Talk Jia-Jie Zhu, Postdoctoral Fellow, Boston College
S7594 - An Industrial Perspective on the Next Generation of Social Consumer Robots: Needs, Challenges, and Potentials We aim to create awareness around both the R&D needs and the potential at the intersection of robotics, AI and IoT. As robotics technology evolves, we at SoftBank Robotics predict that the acceleration of personal, social robots will be the next big thing in the robotics sector. We believe that robots will play a key role in everyday life, and that we will soon co-exist with robots, leading to smarter, healthier, and happier lives. The SoftBank Robotics R&D and Innovation team is quickly establishing itself as a leader in the field of humanoid robotics by being committed to creating an intelligent and harmonious ecosystem through IoT with robots. We'll illustrate some of the use cases SoftBank Robotics has implemented to better understand the companionship between humans and robots, highlight some of the research and subsequent results towards achieving that goal, present the feedback from users, and conclude by outlining some of the grand challenges ahead. 50-minute Talk Amit Kumar Pandey, Head Principal Scientist (Chief Scientist), SoftBank Robotics (formerly Aldebaran Robotics)
S7699 - An Introduction to CUDA Programming Presented by Acceleware (Session 1 of 4) Join us for an informative introductory tutorial intended for those new to CUDA, which serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We'll explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. We'll deliver a programming demonstration of a simple CUDA kernel. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason, Technical Product Manager, Acceleware Ltd.
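The thread-hierarchy idea this tutorial introduces can be previewed with a pure-Python emulation of a kernel launch (illustrative only; real CUDA expresses the same mapping with blockIdx, blockDim, and threadIdx):

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y):
    """Body of a SAXPY kernel: one thread handles one element.

    In CUDA this index is `i = blockIdx.x * blockDim.x + threadIdx.x`.
    """
    i = block_idx * block_dim + thread_idx
    if i < len(x):          # guard: the last block may overhang the data
        y[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a <<<grid_dim, block_dim>>> launch with nested loops."""
    for b in range(grid_dim):
        for t in range(block_dim):
            kernel(b, block_dim, t, *args)

# y = 2*x + y over 10 elements, 3 blocks of 4 threads each
x = list(range(10))
y = [1.0] * 10
launch(saxpy_kernel, 3, 4, 2.0, x, y)
```

On a GPU the two loops disappear: every (block, thread) pair runs concurrently, which is why the bounds guard inside the kernel matters.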
S7700 - An Introduction to the GPU Memory Model - Presented by Acceleware (Session 2 of 4) This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend Session 1, "An Introduction to CUDA Programming," to learn the basics of CUDA programming that are required for Session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason, Technical Product Manager, Acceleware Ltd.
S7143 - Anomaly Detection for Network Intrusions Using Deep Learning We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, Streamsets for data flow, Spark for distributed GPUs, and Deeplearning4j for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM. 25-minute Talk David Kale, Deep Learning Engineer, Skymind
Adam Gibson, CTO, Skymind
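As a highly simplified stand-in for the anomaly-scoring step in a pipeline like the one above (a z-score detector rather than the deep net and Deeplearning4j stack the talk describes; all names are illustrative), the flag-if-score-exceeds-threshold logic looks like this:

```python
from statistics import mean, stdev

def anomaly_scores(train, stream, threshold=3.0):
    """Flag stream points whose z-score against training data is large.

    A deep autoencoder would instead use reconstruction error as the
    score, but the thresholding step is the same shape.
    """
    mu, sigma = mean(train), stdev(train)
    return [(x, abs(x - mu) / sigma > threshold) for x in stream]
```

In the production setting described above, the scored stream would feed the visualization that lets administrators prioritize which flagged events to investigate.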
S7829 - Apache Mahout's New Recommender Algorithm and Using GPUs to Speed Model Creation Predictive AI is often associated with product recommenders. We present a landscape of multi-domain behavioral models that predict multi-modal user preferences and behavior. This session will take the audience from first principles of the new Correlated Cross-Occurrence (CCO) algorithm, showing the important innovations that lead to new ways of predicting behavior, then dive deep into a variety of use cases: for instance, using dislikes to predict likes, using search terms to predict purchases, and using conversions to augment search indexes with behavioral data to produce behavioral search. Some of these are nearly impossible to address without this new technique. We show the tensor algebra that makes up the landscape. Next, we walk through the computation using real-world data. Finally, we show how Mahout's generalized CPU/GPU integration and recently added CUDA support bring significant reductions in the time and cost of calculating CCO models. The audience will come away with an understanding of the kinds of applications that can be built with CCO, and how to build them in a performant, cost-reducing way. 50-minute Talk Pat Ferrel, Chief Consultant, PMC member of Apache Mahout, ActionML
S7510 - Apache Spark and GPUs for Scaling Deep Learning Libraries Apache Spark has become a popular tool for data warehousing, ETL, and advanced analytics. Meanwhile, deep learning has become one of the most powerful classes of machine learning methods, in large part due to the computational power of modern machines with GPUs and specialized hardware. Spark and GPUs combine well for large deep learning workflows: Spark can handle ETL and data management, and it can distribute data parallel tasks to scale out across many GPUs. 50-minute Talk Tim Hunter, Software Engineer, Databricks, Inc
Joseph Bradley, Software Engineer, Databricks, Inc
S7649 - Applications of Deep Learning: Hardware QA Hardware testing is a multifaceted challenge, but one that stands to benefit greatly from the advances in deep learning. The tricky formula of balancing good coverage against risk is consistently challenged with the rapid evolution of the problem space. The landscape in the industry today points to one that has been more or less linearly refined and improved upon, with the constant refrain of more resources being touted as the go-to solution. We'll discuss one of the ways we're working to evolve the approach to test: by harnessing the available tools in the deep learning space, offering a far more efficient path to providing better quality, while providing the flexibility of better coverage/risk decisions. 25-minute Talk Martina Sourada, Senior Director, SWQA, NVIDIA
S7513 - Applications of Generative Adversarial Networks to Drug Discovery in Oncology and Infectious Diseases Recent advances in deep learning and specifically in generative adversarial networks have demonstrated surprising results in generating new images and videos upon request, even using natural language as input. We'll present the first application of generative adversarial autoencoders (AAE) for generating novel molecules with a defined set of parameters. In the first proof of concept experiment, we developed a seven-layer AAE architecture with the latent middle layer serving as a discriminator. As an input and output, the AAE uses a vector of binary fingerprints and concentration of the molecule. In the latent layer, we also introduced a neuron responsible for growth inhibition percentage, which, when negative, indicates the reduction in the number of tumor cells after the treatment. To train the AAE, we used the NCI-60 cell line assay data for 6252 compounds profiled on MCF-7 cell line. The output of the AAE was used to screen 72 million compounds in PubChem and select candidate molecules with potential anti-cancer properties. This approach is a proof of concept of an artificially intelligent drug discovery engine, where AAEs are used to generate new molecular fingerprints with the desired molecular properties. We'll also present the applications of this approach to discovering new anti-infective drugs and present the roadmap for generating drugs for rare diseases and even for individual patients. 50-minute Talk Polina Mamoshina, Sr. Research Scientist, Pharmaceutical Artificial Intelligence, Insilico Medicine, Inc
Artur Kadurin, Chief AI Officer, Insilico Medicine, Inc
Alex Zhavoronkov, CEO, Insilico Medicine, Inc
S7696 - Applying Deep Learning to Financial Market Signal Identification with News Data We'll discuss how natural language processing techniques can be used for predicting financial markets from news data. By adapting techniques from other natural language processing applications to news data and market signals, predictive models can be built. Due to the large volume of news data available, models must be trained, optimized, and tested using GPU acceleration. 25-minute Talk Steven Thornton, Data Scientist, Triumph Asset Management
S7351 - Applying GPU Technology to Combat System Integration and Maintenance Lockheed Martin Rotary and Mission Systems has a rich history of integrating combat systems into naval ships and buildings. The integration of complex radar and support systems into modern war-fighting entities demands the use of a unique set of design and simulation tools to verify and optimize engineering designs before production begins. After the combat system is in the field, it is important to equip the warfighter with informative training and maintenance systems. The goal is to keep the combat system fully operational at all times. GPU technologies such as OpenGL, CUDA, OptiX, and Iray, along with virtual reality and augmented reality, make these unique design and maintenance environments possible. These design practices are being examined in the Surface Navy Innovation Center through dedicated research for domestic and international combat system integration and maintenance. 25-minute Talk Rich Rabbitz, Principal Member of Engineering Staff, Lockheed Martin
Christopher Crouch, Associated Member of Engineering Staff, Lockheed Martin
S7623 - Approach to Practical Application of Deep Learning in Manufacturer's Production Line We'll present how deep learning is applied in a manufacturer's production line. Fujikura and OPTOENERGY are introducing a visual inspection system incorporating deep learning in the production process of semiconductor lasers. The same inspection accuracy as skilled workers was achieved by optimizing the image size and the hyperparameters of a CNN model. The optimized image size is less than one quarter of the image size required for visual inspection by skilled workers, which leads to a large cost reduction in the production line. It was also confirmed that the highlighted regions in the heatmaps of NG images didn't meet the criteria of the visual inspection. The visual inspection incorporating deep learning is being applied to other products such as optical fibers and electrical cables. 25-minute Talk Masahiro Kashiwagi, Manager, Fujikura Ltd.
S7295 - Are We Done with Object Recognition? The R1-Robot Perspective. Today deep learning has achieved such stunning results in visual recognition as to raise the question of whether this problem is actually solved. Should this be the case, the advantages for robotics could be dramatic. Indeed, the lack of reliable visual skills is a major bottleneck for robot deployment in everyday life. With this in mind, we started an effort to quantify the benefits and limits, if any, of DL in the context of robot vision. By exploiting R1, our latest humanoid equipped with an NVIDIA Jetson TX1, we investigated key differences between robot vision and other applications where DL typically excels, such as image retrieval. Our study identified critical issues to be tackled via computer vision and machine learning, while taking advantage of a robot platform. Our results confirm the huge impact of DL, testified by the great real-time recognition capabilities of R1, while pointing at specific open challenges that need to be addressed for seamless deployment in robotics. 25-minute Talk Giulia Pasquale, Ph.D. Candidate, Istituto Italiano di Tecnologia
S7777 - A Road to 3D for Everyone 3D content remains extremely expensive and difficult to create. With virtual reality opening up an opportunity for many industries to create both consumer and professional experiences, we'll present Unbound's approach to make it easy for everyone to create things in 3D. We'll share our R&D journey, experimental engines, and how CUDA ultimately helped us to create the powerful parallel algorithms necessary to enable robust volumetric modeling and rendering in VR. This has immediate utility for content creators, professional and novice alike. 25-minute Talk Florian Hoenig, CEO, Unbound Technologies, Inc.
S7622 - A Robust and Scalable CUDA Parallel Programming Model The next release of CUDA introduces Cooperative Groups, a new programming model that significantly improves cooperative thread programming. Cooperative Groups, along with new warp synchronous primitives, enables threads and blocks within a CUDA grid to synchronize, exchange data, and perform collective operations in a safe, explicit, and reliable manner. Cooperative Groups is an elegant and scalable programming model for expressing synchronization and communication between groups of parallel threads ranging in size from a subset of a warp to an entire CUDA grid launch. Both Cooperative Groups and the lower-level warp-synchronous primitives offer a safe and explicit mechanism for high-performance intra-warp communications. We'll cover the new programming model features in depth, including best practice examples. 50-minute Talk Yuan Lin, Principal Engineer, NVIDIA
Kyrylo Perelygin, Senior Systems Software Engineer, NVIDIA
S7723 - ArrayFire Graph: Dynamic Graph Library for GPUs ArrayFire Graph is an out-of-core dynamic graph library that runs on NVIDIA GPUs. It enables users to create and update graphs at a very high rate. AF Graph has a number of high-performance graph analytic algorithms that can be run on the dynamic data. Dynamic graphs allow users to provide incremental edge updates instead of rebuilding the whole graph. AF Graph's out-of-core support can handle graphs that cannot fit in GPU memory and can handle billions of edges. 25-minute Talk Kumar Aatish, Software Engineer, ArrayFire LLC
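AF Graph's API isn't shown in the abstract; as a hypothetical, minimal illustration of the incremental-update idea (a union-find structure maintaining component counts under edge insertions, not ArrayFire's implementation), a dynamic graph can avoid rebuilding on every update:

```python
class DynamicGraph:
    """Maintain connected components incrementally as edges arrive."""

    def __init__(self, n):
        self.parent = list(range(n))  # union-find forest
        self.components = n

    def _find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def add_edge(self, u, v):
        """Incremental edge insert: O(alpha(n)) instead of a rebuild."""
        ru, rv = self._find(u), self._find(v)
        if ru != rv:
            self.parent[ru] = rv
            self.components -= 1
```

The contrast this sketch makes concrete: each insertion touches only the two endpoints' trees, whereas a static library would reconstruct the whole graph per batch of updates.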
S7239 - Artificial General Intelligence for the Internet of Things What do we need to achieve artificial general intelligence? How do we distribute intelligence over the Internet of Things? We'll dive deep into the heart of the matter, which is machine reasoning. Following recent advances in mathematical foundations and homotopy type theory, we conclude that the crux is to formally separate intents from implementations. We can teach neural networks to understand these intents and to use a divide-and-conquer method for compiling these intents into implementations. Our goal is to outline a distributed strategy for accomplishing this moonshot. 25-minute Talk Shaowei Lin, Assistant Professor, Singapore University of Technology and Design
S7677 - Artificial Intelligence for Digital Pathology We'll introduce why artificial intelligence is needed for digital pathology and how it can be used to diagnose breast and prostate cancer. Applying AI to these two types of cancer diagnosis shows what challenges exist in digital pathology and how we overcome them. First, we'll introduce a system for predicting tumor proliferation in breast cancer. Predicting tumor proliferation can be integrated into current prognostic grading systems, making them more relevant to actual clinical practice. In addition, we'll present a system for predicting the Gleason score, an important factor in the diagnosis of prostate cancer. A system for accurate and consistent diagnosis based on artificial intelligence will bring much value to digital pathology. 25-minute Talk Kyunghyun Paeng, Research Scientist, Lunit Inc.
S7787 - Artificial Intelligence on Benchmark We'll discuss AI developments over the last decade through the lens of public academic benchmarks. Xiaodi believes benchmarks like CityScapes and KITTI are helpful for the development of AI worldwide; however, these benchmarks have disadvantages, and new datasets need to be proposed to incorporate more autonomous driving sections into computer vision benchmarks. 25-minute Talk Yinan Sun, Business Development Manager, TuSimple
S7187 - Artificial Reality: Deep Learning With Synthetic Driving Data Learn how to boost your deep learning training process by utilizing features of a driving simulation. Besides a customizable source of video camera input, enhanced driving simulations can also provide information from non-visual sensors like lidar, radar, or ultrasound simultaneously. Train deep learning algorithms with visual, non-visual, or intermediate data like point clouds, bounding boxes, or object lists. Instead of labeling real videos by hand, use the information from the simulation to feed back and correct the results of your neural network. Run your simulation faster than real time for distributed headless simulations, or trigger every frame of the simulation to capture data for further processing. Embed your algorithms within the simulation (software in the loop) and test your AI in unusual situations that are too risky in reality. Artificial reality? Not perfect, but a perfect complement in developing AI algorithms for autonomous driving. 25-minute Talk Bernhard Bieder, Software Engineer, VIRES GmbH
Daniel Wiesenhutter, Software Engineer, VIRES GmbH
S7755 - AR & VR Showcase The VR Showcase is an opportunity for 10 companies or teams using augmented or virtual reality to present their innovative work for a chance to win up to $30,000 in cash and prizes along with valuable venture capital, PR, and marketing exposure. Companies will have the opportunity to pitch their idea on stage for 5 minutes, with 3 minutes for questions. A judging panel will have the chance to try the applicants' demos on the GTC expo floor. Demos will be featured in a 12x12 booth in the NVIDIA VR Village Startup Pavilion. At the end of the showcase, a winner will be selected by the judging committee and presented with the cash award and prizes. 80-minute Talk Mark Rein, VP & Co-Founder, Epic Games
Jeff Herbst, Vice President of Business Development, NVIDIA
Victoria Rege, Business Development, NVIDIA
S7626 - A Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA Learn a simple strategy guideline for optimizing application runtime. The strategy is based on four steps and illustrated on a two-dimensional Discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the different steps that allowed us to speed the code up on a GPU by around 149 times its original runtime (performance evaluated on a K20Xm). The same optimization strategy, applied to the CPU code, yields a speedup of around 35 times the original runtime (performance evaluated on an E5-1650v3 processor). Finally, different hardware architectures (Xeon CPUs, GPUs, KNL) are benchmarked with the native CUDA implementation and one based on OpenACC. 25-minute Talk Ludomir Oteski, Postdoctoral researcher, ONERA
S7339 - A Sleepless Eye on Patient Monitors: Real-Time AI in Healthcare Critical medical decisions are made each second, and are often informed by the real-time interpretation of complex or subtle patterns in continuous patient monitoring data. Manual review is intermittent and imperfect, but traditional automation attempts have been unreliable and often suffer from high false positive rates, limiting their practical utility in clinical settings. Recent advances in deep learning algorithms and GPU acceleration enable the creation of streaming systems that reliably, continuously, and tirelessly pick out patterns and trends to support timely and appropriate clinical decisions for the benefit of the patient. We'll describe the purpose, design, and impact of one such system, as created by Delta Brain Inc. 25-minute Talk Kevin Lung, Co-Founder & Director of Engineering, Delta Brain Inc.
Adam Lichtl, Founder & CEO, Delta Brain Inc.
S7441 - Assembly Chain Training with Professional VR by Optis Optis has been involved in advanced optical simulation for the past 25 years and has recently invested in VR for virtual prototyping. Its latest HIM, built for human ergonomics evaluation, combined with advanced, real-time, physics-based rendering, enables precise environment reproduction for appropriate prototyping or training. We'll present the latest integration for assembly line training with HTC Vive and feedback powered by NVIDIA PhysX. Companies such as Tesla Motors and Bentley are proud early adopters of this solution. We'll demonstrate our software and show customer use cases and their data to explain how to improve the VR experience with haptics and audio simulation in the future. 25-minute Talk Nicolas Dalmasso, Innovation Director, Optis
S7705 - Asynchronous Operations and Dynamic Parallelism in CUDA - Presented by Acceleware (Session 3 of 4) This tutorial builds on the two previous sessions ("An Introduction to CUDA Programming" and "An Introduction to the GPU Memory Model") and is intended for those with a basic understanding of CUDA programming. This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We'll demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. In the second part of the session, we'll focus on dynamic parallelism. We'll deliver a programming demo involving asynchronous operations. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason, Technical Product Manager, Acceleware Ltd.
S7426 - Automated Truck Driving and Platooning with DRIVE PX 2 We'll present achievements in the field of automated truck driving, specifically the use case of lane keeping in platooning scenarios based on mirror cameras. Lane detection, generating control parameters, controller, and arbitration functions all run on the NVIDIA DRIVE PX 2 with three cameras attached to it.  25-minute Talk Devid Will, Manager Automated Driving Functions, fka Forschungsgesellschaft Kraftfahrwesen mbH Aachen
S7267 - Automatic Compiler-Based Optimization of Graph Analytics for the GPU Learn how to use IrGL, our newly developed language and compiler, to obtain high-speed graph algorithm implementations without writing a lot of low-level NVIDIA CUDA code. IrGL can be used for parallel graph algorithm research, graph analytics, and graph database query processing. IrGL performance for graph algorithms meets or exceeds the performance of low-level handwritten CUDA code because our optimizing compiler automatically tackles three key challenges encountered in writing graph algorithms -- atomics, load imbalance due to serialization of loops, and kernel launch throughput -- freeing up the programmer to focus on higher-level optimizations. We'll introduce the IrGL language and its compiler, and show how to use IrGL to target problems with irregular data-parallelism. 50-minute Talk Sreepathi Pai, Postdoctoral Research Fellow, The University of Texas at Austin
S7648 - Automating High-Content Screening Image Analysis with Deep Learning Deep learning can automate the analysis of the hundreds of thousands of images produced by automated microscopy systems each day. High-content screening (HCS) systems that combine high-throughput biotechnology with automated microscopy are revolutionizing drug development and cell biology research. The images produced by these systems provide valuable insight into how cells respond to many chemical or genetic perturbations. Existing image analysis pipelines rely on hand-tuning the segmentation, feature extraction, and machine learning steps for each screen. For many research groups, tuning these pipelines remains a bottleneck in implementing HCS. We'll demonstrate how deep learning-based pipelines overcome this bottleneck and outperform existing methods. We'll show improved results on classifying sub-cellular protein localization in genome-wide screens of the GFP-tagged yeast collection. 25-minute Talk Oren Kraus, PhD Student, University of Toronto
S7215 - Automating VR and Photoreal Imagery From Siemens Teamcenter Learn how manufacturers are automating and in-housing their digital photorealistic and VR/AR visualization pipelines out of Siemens Teamcenter and NX through JT. This is leading to improved efficiency and cost reduction and, crucially, enabling manufacturer control over digital assets that allows them to be repurposed across the business. We'll demonstrate how to set up an automated visual digital pipeline out of Siemens Teamcenter into NVIDIA Iray and Epic Unreal Engine, accounting for configuration rules and buildability. 25-minute Talk Dave Coldron, Product Director, Lightwork Design Ltd.
S7826 - Autonomous Driving, Redefined 25-minute Talk Gu Weihao, General Manager of Baidu Intelligent Vehicle Business Unit, Baidu
S7172 - Autonomous Drone Navigation with Deep Learning We'll present an autonomous drone piloted by a deep neural network (DNN) that can autonomously navigate through a forest by following trails and can avoid obstacles. The DNN takes video frames from the onboard drone camera as its input and computes high-level control commands as its output. The control commands are sent to the drone's low-level autopilot for execution. Our DNN runs onboard an NVIDIA Tegra TX1 in real time. The drone uses the open source PX4 flight stack for low-level control and ROS for its runtime. We'll present the DNN's architecture and describe how we train it and run it as a ROS node. We'll also show flight videos and some qualitative analysis of the autonomous flights. 50-minute Talk Nikolai Smolyanskiy, Principal Software Engineer, NVIDIA
Alexey Kamenev, Senior Deep Learning and Computer Vision Engineer, NVIDIA
Jeffrey Smith, Senior Computer Vision Software Engineer, NVIDIA
S7263 - Bayesian Inference and Markov Chain Monte Carlo Algorithms on GPUs We'll discuss the Bayesian statistical paradigm and Markov Chain Monte Carlo (MCMC) algorithms - the cornerstone of modern Bayesian computation. Scalable MCMC for big datasets and complex models is currently an open research question. Using GPUs provides a promising and largely unexplored avenue for accelerating these algorithms, but is nontrivial, because MCMC is inherently sequential and has traditionally been considered difficult to parallelize. We'll show how Gibbs sampling, a widely used MCMC algorithm, can be effectively parallelized on GPUs for a large class of exchangeable hierarchical Bayesian models. Participants will learn the mathematical and hardware/software challenges in bringing GPUs to the Bayesian community. Background in Bayesian statistics or MCMC is not assumed. 25-minute Talk Alexander Terenin, PhD Student, UC Santa Cruz
David Draper, Professor, UC Santa Cruz
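Gibbs sampling, the algorithm at the heart of this talk, is easy to sketch in a toy setting. Here is a minimal, CPU-only illustration (our own example, not the speakers' GPU implementation): a two-variable Gibbs sampler for a standard bivariate normal with correlation rho, alternating draws from the two one-dimensional conditionals.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each conditional is x | y ~ N(rho * y, 1 - rho^2), and symmetrically
    for y | x, so the chain alternates two one-dimensional Gaussian draws.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)  # conditional standard deviation
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # draw x given the current y
        y = rng.gauss(rho * x, sd)   # draw y given the new x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)

# The sample correlation should approach rho as the chain mixes.
xs = [s[0] for s in samples]
ys = [s[1] for s in samples]
n = len(samples)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in samples) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
corr = cov / math.sqrt(vx * vy)
```

The GPU angle in the talk is that, for exchangeable hierarchical models, many such conditional draws within one sweep are mutually independent and can therefore be issued in parallel.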
S7325 - Behavioral Additive Manufacturing: Adaptive 3D Printing Using Multi-Agent Systems and Deep Learning We'll introduce autonomously constructed architecture using multi-agent systems (MAS) and deep learning. The 3D printing path adapts in real time to unpredictable material behavior, using an NVIDIA Jetson module on an industrial robotic arm. We'll explain path generation, real-time visual tracking of material, recomputation of robotic targets, and finally experiments with real-time MAS adaptation for emergent stable structures, through code and video recordings of the 3D printing process and its printed structures. 25-minute Talk Alisa Andrasek, Director, University College London, Wonderlab/Biothing
S7362 - Benchmarking the New Unified Memory of CUDA 8 We'll evaluate the impact of CUDA 8's new unified memory on applications with benchmarks and share practices on how to tune or build high-performance apps. Since CUDA 6, unified memory has aimed at simplifying the programmability of heterogeneous memory management while maintaining good performance. However, practical limitations prevent applications from fully taking advantage of it. The CUDA 8 release highlights an updated unified memory that both simplifies programmability and improves performance, especially when married with the new Pascal GPU architecture. We'll evaluate the new system, benchmark its performance, and share our best practices in tuning code, which can serve as a good reference for app developers. In addition, we'll explore options and solutions for moving and exchanging data efficiently between heterogeneous devices, such as NVMe/NVRAM in modern data center or cloud environments. 25-minute Talk Frank Zhao, Software Architect, Dell EMC
Yifan Sun, College Coop Student, Dell EMC
L7106 - Best GPU Code Practices Combining OpenACC, CUDA, and OmpSs We'll guide you step by step to port and optimize an oil-and-gas mini application to efficiently leverage the amazing computing power of NVIDIA GPUs. While OpenACC focuses on coding productivity and portability, CUDA enables extracting the maximum performance from NVIDIA GPUs. OmpSs, on the other hand, is a GPU-aware task-based programming model that may be combined with CUDA, and recently with OpenACC as well. Using OpenACC, we'll start benefiting from GPU computing, obtaining great coding productivity and a nice performance improvement. We can next fine-tune the critical application parts by developing CUDA kernels to hand-optimize the problem. OmpSs combined with either OpenACC or CUDA will enable seamless task parallelism leveraging all system devices. Prerequisites: Basic knowledge of OpenACC and CUDA. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120-minute Instructor-Led Lab Antonio J. Pena, Senior Researcher, Barcelona Supercomputing Center (BSC)
Guray Ozen, Research Assistant, Barcelona Supercomputing Center
Pau Farre, Software Engineer, Barcelona Supercomputing Center (BSC)
S7786 - Beyond Games: How Unreal Engine is Putting the Reality into Virtual Reality Epic Games presents a panel discussion with partners who are using Unreal Engine to bring real-time, high-fidelity interactive experiences to their customers. From product design and visualization, to virtual production, photorealism, and final pixels, VR content creators are uncovering the power of Unreal Engine. Hear from company executives, technology partners, and customers about applying game engine technology to revolutionize the conventions of filmmaking, product design, and the future of customer engagement. 50-minute Panel Marc Petit, General Manager, Epic Games (Unreal Engine)
Mark Roberts, Design Operations Manager, McLaren Automotive Limited
Matthew Noyes, Aerospace Technologist/Hybrid Reality Lab Software Lead, National Aeronautics and Space Administration
Doug Wolff, Partner Technology Manager, Epic Games (Unreal Engine)
S7817 - Beyond Visualization, Harnessing the Power of Compute for Design Autodesk Project Dreamcatcher takes the next step in the world of computation, artificial intelligence, and machine learning by harnessing the power of computing to deliver on the promise of Computer Aided Design. Today's GPUs allow for massive exploration of the design space for any problem, letting computational capacity truly aid designers and engineers in design and problem solving. Come learn how Autodesk is harnessing the power of computation in the cloud, powered by tomorrow's next generation hardware, to help everyone make better decisions. 25-minute Talk Brian Frank, Sr. Product Line Manager | Simulation, Autodesk
S7170 - Bicycle Green Waves Powered by Deep Learning We'll explore using deep learning to improve urban traffic signaling. Bicycles (both self-powered and pedelecs) are the future of urban transport alongside (self-driving) electric cars, buses, and rail services. Green waves make cycling more efficient, attractive, and safer. Instead of fixed "green wave" timings or priorities, a work in progress system is presented that learns to increase the flow of bicycle traffic while minimizing the impact on other traffic actors -- and in many use cases also results in improvements in general traffic times. Using low power efficient SoCs -- Tegra X1 -- the "smarts" are integrated in traffic lights and provide V2I interfaces -- also to mobile phones of cyclists -- about signal changes and warn of pedestrians or cyclists. Dispensing with inductive loop, magnetometer, or radar-based sensors buried in the pavement makes the system inexpensive. We'll present initial results from pilot testing in a German city. 25-minute Talk Edward Zimmermann, Principal Consultant, Nonmonotonic Networks / joint R&D with GESIG. Gesellschaft fur Signalanlagen
S7178 - Bidirectional Recurrent Convolutional Networks and Their Applications to Video Super-Resolution We'll discuss a fully convolutional version of recurrent neural networks, namely bidirectional recurrent convolutional networks, which can greatly reduce the number of learning parameters from millions to several hundreds. We'll demonstrate its effectiveness by achieving significant performance and running time improvements for the task of video super-resolution. Using GPUs can further accelerate the speed by 20 times. 25-minute Talk Yan Huang, Research Assistant, Institute of Automation, Chinese Academy of Sciences
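The claimed drop from millions of parameters to several hundred comes from replacing fully connected recurrent weights with small shared convolution kernels. A back-of-the-envelope count (illustrative sizes of our own choosing, not the paper's exact architecture) makes the scale concrete:

```python
# Hypothetical sizes for illustration only.
height, width, channels = 64, 64, 1
hidden_units = height * width          # one hidden unit per pixel

# Fully connected recurrence: every hidden unit connects to every other.
dense_recurrent_params = hidden_units * hidden_units

# Convolutional recurrence: one shared k x k kernel, independent of image size.
k = 3
conv_recurrent_params = k * k * channels

print(dense_recurrent_params)  # 16777216 weights
print(conv_recurrent_params)   # 9 weights
```

The convolutional recurrence also keeps the parameter count constant as the frame resolution grows, which is what makes the approach practical for video.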
S7405 - Bifrost: A Python/C++ Framework for Easy High-Throughput Computing Bogged down trying to build a fast GPU processing pipeline? We'll present a solution: Bifrost, a framework for rapidly composing real-time data collection and analysis pipelines. Real-time data processing lies at the heart of most modern radio telescopes, and while hardware capabilities and data collection rates advance to the petascale regime, development of efficient real-time processing codes remains difficult and time-consuming. Bifrost solves this problem by combining a TensorFlow-like Python API with a library of common algorithms and highly efficient data transport. We'll describe the design and implementation of this framework, and demonstrate its use as the backend for a large radio telescope. 25-minute Talk Miles Cranmer, Research Assistant, Harvard-Smithsonian Center for Astrophysics
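Bifrost composes pipelines from blocks connected by ring buffers. As a rough sketch of that dataflow style using only Python's standard library (this is not Bifrost's actual API; `source`, `scale`, and the bounded queues are stand-ins):

```python
import queue
import threading

def source(out_q, frames):
    """Producer block: pushes raw 'frames' into the ring buffer."""
    for frame in frames:
        out_q.put(frame)
    out_q.put(None)  # sentinel: end of stream

def scale(in_q, out_q, gain):
    """Transform block: reads from one buffer, writes to the next."""
    while True:
        frame = in_q.get()
        if frame is None:
            out_q.put(None)
            break
        out_q.put([x * gain for x in frame])

# Bounded queues stand in for Bifrost's fixed-size ring buffers:
# a full buffer applies back-pressure to the upstream block.
q1 = queue.Queue(maxsize=4)
q2 = queue.Queue(maxsize=4)
frames = [[1, 2], [3, 4]]

threads = [
    threading.Thread(target=source, args=(q1, frames)),
    threading.Thread(target=scale, args=(q1, q2, 10)),
]
for t in threads:
    t.start()

results = []
while True:
    frame = q2.get()
    if frame is None:
        break
    results.append(frame)
for t in threads:
    t.join()

print(results)  # [[10, 20], [30, 40]]
```

In the real framework the blocks run concurrently over GPU memory and the buffers carry telescope data at high rates; the composition pattern, however, is the same.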
S7475 - Big Data, Little Cluster: Using a Small Footprint of GPU Servers to Interactively Query and Visualize Massive Datasets We'll discuss the approach to and advantages of using GPUs to not only power through large-scale database queries but also use the graphics pipeline of the GPU to rapidly and efficiently visualize the outputs of billions of rows of data. The application of the GPU for both query and render results in a fast system for multi-terabyte scale analytic challenges. We'll cover the high-level benefits of the approach and delve into the technical details associated with GPU-powered databases, server side rendering, and other software refinements needed to squeeze the maximum amount of performance from this exceptional hardware platform. 50-minute Talk Todd Mostak, Founder and CEO, MapD
S7481 - Big Image-Omics Data Analytics for Clinical Outcome Prediction We'll introduce how to develop big image-omics data analytics algorithms with GPU computing tools for clinical outcome prediction from pathological images and cell profiling data of cancer patients. Recent technological innovations are enabling scientists to capture image-omics data at increasing speed and resolution, where the image-omics refers to both image data (pathology images or radiology images) and omics data (genomics, proteomics, or metabolomics) captured from the same patient. This is generating a deluge of heterogeneous data from different views. Thus, a compelling need exists to develop novel data analytics tools to foster and fuel the next generation of scientific discovery in image-omics data-related research. However, the major computational challenges are due to the unprecedented scale and complexity of heterogeneous image-omics data analytics. There is a critical need for large-scale modeling and mining strategies to bridge the gap and facilitate knowledge discovery from complex image-omics data. We'll introduce our recent work on developing novel deep learning methods to detect cells in the terapixel histopathological images with 10,000+ speedup and automatically discovering biomarkers for clinical outcome prediction. 25-minute Talk Junzhou Huang, Associate Professor, University of Texas at Arlington
S7298 - Blasting Sand with NVIDIA CUDA: MPM Sand Simulation for VFX We'll present our challenges and solutions for creating a material point method (MPM)-based simulation system that meets the production demands of fast turnaround for artistic look development. Our method fully utilizes the GPU and performs an order of magnitude faster than the latest published results. With this improvement, the technique's main limiting factor - its speed - has been eliminated, making MPM appealing for a wider range of VFX applications. Practitioners in computational physics and related fields are likely to benefit from attending the session as our techniques are applicable to other hybrid Eulerian-Lagrangian simulations. 25-minute Talk Ken Museth, Director of R&D, DreamWorks Animation
Gergely Klar, Software Engineer, DreamWorks Animation
S7652 - Blending the Worlds of Machine Learning and Deep Learning to Make the Fastest AI Platform on GPUs Deep learning algorithms have benefited greatly from the recent performance gains of GPUs. However, it has been unclear whether GPUs can speed up data manipulations such as joins and aggregations and machine learning algorithms such as generalized linear modeling, random forests, gradient boosting machines, and clustering. H2O.ai, the leading open source AI company, is bringing the best-of-breed data science and machine learning algorithms to GPUs, not just deep learning. In addition, H2O.ai is porting data.table to GPUs, already the fastest open-source columnar data frame library and the world's fastest implementation of the sort algorithm. This powerful combination will enable the fastest data science and machine learning pipelines for AI transformations for applications such as IoT time series, fraud prevention, anomaly detection, and many more. We'll demonstrate benchmarks for the most common algorithms relevant to enterprise AI and showcase performance gains as compared to running on CPUs. 25-minute Talk Arno Candel, CTO, H2O.ai
SriSatish Ambati, CEO and Co-Founder, H2O
S7450 - Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA We'll present how cloud computing facilities using GPUs can boost overall performance while generating increased economic benefits. To achieve these important improvements, we'll propose to move from the traditional model for using GPUs within virtual machines to a new model that leverages the remote GPU virtualization mechanism. This mechanism allows GPUs to be detached, in a logical way, from the nodes where they are installed so that GPUs now can be transparently used from any node of the cluster. Furthermore, the remote GPU virtualization mechanism allows GPUs to be concurrently shared among many different applications. We'll use the rCUDA middleware as a case study for demonstrating how GPUs can be concurrently shared among virtual machines in a cloud computing deployment. We'll show performance results to quantify the improvements attained by using rCUDA in cloud deployments.  25-minute Talk Federico Silla, Associate Professor, Technical University of Valencia
S7436 - Boosting Visual Object Tracking Using Deep Features and GPU Implementations We'll explain how to use deep features for enabling state-of-the-art results in visual object tracking. Visual object tracking is a difficult task in three respects, since (1) it needs to be performed in real time, (2) the only available information about the object is an image region in the first frame, and (3) the internal object model needs to be updated in each frame. The use of deep features gives significant improvements in the accuracy and robustness of the object tracker, but straightforward frame-wise updates of the object model become prohibitively slow for real-time performance. By introducing a compact representation of deep features, a smart updating mechanism, and systematically exploiting GPU implementations for feature extraction and optimization, real-time performance is achievable without jeopardizing tracking quality. 25-minute Talk Michael Felsberg, Professor, Linkoping University
S7819 - Bringing Gaming, VR, and AR to Life with Deep Learning Game development is a complex and labor-intensive effort. Game environments, storylines, and character behaviors are carefully crafted requiring graphics artists, storytellers, and software to work in unison. Often games end up with a delicate mix of hard-wired behavior in the form of traditional code and somewhat more responsive behavior in the form of large collections of rules. Over the last few years, data-intensive machine learning solutions have obliterated rule-based systems in the enterprise -- think Amazon, Netflix, and Uber. At Unity, we've explored the use of deep learning in content creation and deep reinforcement learning in character development. We'll share our learnings and the Unity APIs we use with the audience. 25-minute Talk Danny Lange, Vice President, Unity Technologies
S7250 - Bringing Low-Latency and Fault-Tolerant Computing to Tegra SoCs with Persistent Threading The NVIDIA Tegra K1 and X1 have revolutionized embedded computing. Combining ARM cores and a powerful GPU, these devices have found their way into everything from cars to low-power sensor systems. The high computational efficiency of Tegra SoCs enables potential new markets that have long been held by FPGAs. However, some apps do not map well into the typical CUDA execution model. Persistent threading (PT) is a relatively unexplored model for GPU computing, enabling FPGA-like behavior. Like an FPGA, PT executes until the device is reset or a rare halt condition is met. Memory management and application synchronization are shifted from the NVIDIA API to the developer as the PT kernel runs in parallel with the host application. Leveraging the Tegra unified memory model, PT is able to reduce API overhead to only launch of the kernel and scheduler workload. 25-minute Talk Andrew Milluzzi, Doctoral Candidate, University of Florida
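The persistent-threading idea, launch once and then feed work through memory until a halt condition, can be mimicked on the CPU with a long-lived worker thread. This is an analogy only (a host-side sketch, not a CUDA PT kernel):

```python
import threading
import queue

def persistent_worker(work_q, result_q, halt):
    """Runs until explicitly halted, like a persistent-threads kernel:
    launch overhead is paid once, then work arrives through memory."""
    while not halt.is_set():
        try:
            item = work_q.get(timeout=0.1)
        except queue.Empty:
            continue
        result_q.put(item * item)  # stand-in for the real computation

work_q, result_q = queue.Queue(), queue.Queue()
halt = threading.Event()
worker = threading.Thread(target=persistent_worker,
                          args=(work_q, result_q, halt))
worker.start()

for item in [2, 3, 4]:           # stream work without re-launching anything
    work_q.put(item)
results = [result_q.get() for _ in range(3)]

halt.set()                        # the 'rare halt condition'
worker.join()
print(results)  # [4, 9, 16]
```

On a Tegra SoC the shared queues would live in unified memory visible to both host and device, which is what lets the PT kernel and the host application run side by side with minimal API overhead.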
S7324 - Bringing NVIDIA GPUs to the PGAS/OpenSHMEM World: Challenges and Solutions Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. We'll discuss simple extensions to the OpenSHMEM model to address this issue. We'll also present challenges and solutions in designing NVIDIA CUDA-aware runtimes to support these extensions and optimize data movement using CUDA IPC and GPUDirect RDMA features. And we'll demonstrate the impact of these concepts on application performance. 25-minute Talk Dhabaleswar K. (DK) Panda, Professor and University Distinguished Scholar, The Ohio State University
S7223 - Bring the Power of CUDA to Small Devices Learn how to bring the power of GPUs and CUDA to small machines and IoT edge devices. Experience the development process from proof of concept to a production-ready device. The NVIDIA Tegra K1 and X1 SoCs allow, for the first time, the use of high-performance GPGPUs on small, power-constrained devices. For many, the complexity and cost of getting from a maker board like the Jetson TK1 to a customer-ready hardware design prevent progress. We'll explain how computer modules like the Jetson TX1 module can be used to simplify the process and get you to market faster and cheaper. We'll go step by step through a typical development process. You'll learn what skills and resources you require to create an industrial-grade device. We'll evaluate how this approach compares to other solutions like single board computers and designs from scratch. If you know the power of GPUs, but don't know how to bring it to machines or IoT devices, this talk is for you! 50-minute Talk Daniel Lang, CTO, Toradex Inc.
S7634 - Build a Neural Translation System from Scratch with PyTorch As recently covered by the New York Times, Google has totally revamped its Translate tool using deep learning. We'll learn about what's behind this system and similar state-of-the-art systems, including some more recent advances that haven't yet found their way into Google's tool. We'll start by looking at the original encoder-decoder model that neural machine translation is based on, and will discuss the various potential applications of this kind of sequence-to-sequence algorithm. We'll then look at attentional models, including applications in computer vision (where they are useful for large and complex images). In addition, we'll investigate stacking layers, both in the form of bidirectional layers and deep RNN architectures. We'll focus on the practical details of training real-world translation systems, and show how to take advantage of PyTorch's dynamic nature to heavily customize an RNN as required for modern translation approaches. 50-minute Tutorial Jeremy Howard, Entrepreneur
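One building block this tutorial covers, the attentional model, reduces to a few lines of arithmetic: score each encoder state against the decoder state, softmax the scores, and form a weighted context vector. Here is a pure-Python sketch of dot-product attention (our own illustration, not the tutorial's PyTorch code):

```python
import math

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    decoder state, softmax the scores, return weights and the
    weighted-sum context vector."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(decoder_state)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Three toy encoder states; the decoder state aligns with the first and third.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

In a real translation model the same computation runs over learned hidden states at every decoding step, and frameworks like PyTorch simply batch this arithmetic on the GPU.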
S7366 - Building a GPU-enabled OpenStack Cloud for HPC M3 is the latest generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash's R@CMon Research Cloud team. Built to support Monash University's high-throughput instrument processing requirements, M3 is half-half GPU-accelerated and CPU-only. We'll discuss the design and tech used to build this innovative platform as well as detailing approaches and challenges to building GPU-enabled and HPC clouds. 25-minute Talk Blair Bethwaite, Lead Cloud Architect, Monash University
S7704 - Building an L4 Autonomous Driving R&D Platform We'll give a step-by-step description of how to use NVIDIA DRIVE PX 2 and the NVIDIA DriveWorks SDK to enable Level 4 autonomous research vehicles. We'll consider choice of sensors (camera, lidar, radar) and mounting locations for highway and urban autonomous driving. We'll also discuss optimal use of DriveWorks for sensor data gathering and processing using NVIDIA's AI solutions. The presentation will include video demonstrations of real-life examples showcasing the utilization of DRIVE PX 2 and DriveWorks as an end-to-end deep learning platform for automated driving.   25-minute Talk Wolfgang Juchmann, VP Sales and Business Development , AutonomouStuff
S7350 - Building a Successful Deep Learning Platform: Experiences in Building GPU-Enabled HPC Clusters Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, GPU-enabled high performance computing solution for machine learning and data science by drawing on the experiences gained while IBM Research built its Cognitive Computing Cluster. We'll start by discussing how to build a secure, shared-resource computing cluster optimized for deep learning. Next, we'll cover how to provide deep learning frameworks supporting speech, vision, language, and text processing and their underlying primitives. Finally, we'll discuss how to build a best practice knowledge base to improve research quality and accelerate discovery. 25-minute Talk Brian Belgodere, Research Software Engineer, IBM Research
S7670 - Building Emotionally Aware Cars Advanced and autonomous AI systems surround us daily, but as smart as these are, they lack the ability to sense and adapt to human emotions. At Affectiva, our mission is to humanize technology by bringing artificial emotional intelligence (Emotion AI) to the digital world. Using computer vision and deep learning, Affectiva measures facial expressions of emotions. We'll explore the applications of Emotion AI in automotive. We'll show how driver's emotion can be measured in human-driven cars and (semi-) autonomous vehicles to improve road safety and deliver a more personalized transportation experience. In addition, we'll share our findings from over 28 hours of in-car data collected, such as the most frequently observed emotions. 25-minute Talk Abdelrahman Mahmoud, Product Manager, Affectiva
S7780 - Building Exascale Deep Text Comprehension Tools for Effective Cancer Surveillance We'll share our experience in developing novel text comprehension tools for enabling population-level cancer surveillance and research at scale to support the National Cancer Institute's Surveillance, Epidemiology, and End Results program. 25-minute Talk Arvind Ramanathan, Staff Scientist, Oak Ridge National Laboratory
S7815 - Building Scale-out Deep Learning Infrastructure: Lessons Learned from Facebook A.I. Research Facebook AI Research (FAIR) in partnership with NVIDIA has designed a scale-out infrastructure built on NVIDIA DGX-1. This initiative began with an extensive evaluation of design approaches for multi-system scale, as well as considerations for networking and storage supporting one of the world's largest DGX-1 clusters. Attend this session to gain valuable insights into how one of the world's leading AI innovators is building a scale-out infrastructure for deep learning, learn architectural best practices, and participate in Q&A with featured panelists from FAIR and NVIDIA. 25-minute Talk Soumith Chintala, Facebook AI Research, NVIDIA
S7585 - Building the World's First AI for Retail Banking, or, How to Do Deep Learning in a 185-Year-Old Bank We'll discuss putting into production the world's first AI for retail banking. This work was done in 2015 by a software startup for a large international bank. The algorithm we built ingests vast amounts of data and, over time, learns how to select the most effective treatment option for a customer based on past behavior, for example, through phone calls, email alerts, or SMS notifications. Our system is the foundation for the bank's multiyear plan to embed AI into virtually all areas of its retail business. We'll discuss (1) training and iterating through different neural network models on a GPU cloud cluster, hardware that at the time was foreign to the bank; (2) strategies for working within the constraints of a large organization with inconsistent and disparate datasets and embedded legacy systems; and (3) navigating the bank's privacy concerns as an external software company. 25-minute Talk Stephen Piron, Founder
S7595 - Building Truly Large-Scale Medical Image Databases: Deep Label Discovery and Open-Ended Recognition The recent rapid and tremendous success of deep neural networks on many challenging computer vision tasks derives from the accessibility of the well-annotated ImageNet and PASCAL VOC datasets. Nevertheless, unsupervised image categorization (that is, without ground-truth labeling) is much less investigated, critically important, and difficult when annotations are extremely hard to obtain in the conventional way of "Google Search" + crowdsourcing (exactly how ImageNet was constructed). We'll present recent work on building two truly large-scale radiology image databases at NIH to boost the development in this important domain. The first one is a chest X-ray database of 110,000+ images from 30,000+ patients, where the image labels were obtained by sophisticated natural language processing-based text mining and the image recognition benchmarks were conducted using weakly supervised deep learning. The other database contains about 216,000 CT/MRI images with key medical findings from 61,845 unique patients, where a new looped deep pseudo-task optimization framework is proposed for joint mining of deep CNN features and image labels. Both medical image databases will be released to the public. 50-minute Talk Le Lu, Staff Scientist, National Institutes of Health
S7792 - Building Exascale Deep Learning Tools to Help Understand Cancer Biology at the Molecular Scale Understanding the biology of cancer at the molecular scale is a critical challenge for the RAS oncogene family of cancers. We are developing an adaptive molecular dynamics simulation framework that uses multi-scale models to achieve simulation time scales that allow biologically interesting behaviors to emerge. We'll develop new deep learning techniques that can help identify phase transitions, the formation of complex structures, and the detection of interesting events between the RAS protein and cell membrane. This molecular dynamics simulation data will drive the need for new techniques in both model and data parallelism within deep learning toolkits, and require the capabilities of next-generation supercomputers such as SIERRA and Summit at LLNL and ORNL, respectively. 25-minute Talk Brian Van Essen, Computer Scientist, Lawrence Livermore National Laboratory
S7438 - Build Systems: Combining CUDA and Modern CMake Learn all about CMake's new CUDA support and how best to combine it with "modern" CMake usage requirements. CMake is an open-source, cross-platform meta build generator. This year CMake was updated to fully support CUDA as a first-class language on all major platforms. This enables projects to fully leverage "modern" target-based features inside projects that require CUDA compilation. We'll iteratively develop the CMake logic for a sample project using modern CMake with a focus on CUDA. We'll cover transitive usage requirements, how to request language standard levels, mix language libraries, CUDA separable compilation, and generating export configuration files. We expect people to already have some familiarity with the CMake language. 25-minute Talk Robert Maynard, Staff R&D Engineer, Kitware, Inc.
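A minimal sketch of what "CUDA as a first-class language" looks like in practice (target and file names are hypothetical; requires CMake 3.8 or later):

```cmake
# CUDA listed alongside CXX as a project language -- no FindCUDA module needed.
cmake_minimum_required(VERSION 3.8)
project(particles LANGUAGES CXX CUDA)

# A mixed-language library: .cu and .cpp sources compiled by one target.
add_library(particles_lib STATIC kernels.cu host_api.cpp)

# "Modern" usage requirements propagate automatically to consumers.
target_include_directories(particles_lib
    PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_features(particles_lib PUBLIC cxx_std_11)

# Enable CUDA separable compilation (relocatable device code).
set_target_properties(particles_lib PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON)

add_executable(particles_app main.cpp)
target_link_libraries(particles_app PRIVATE particles_lib)
```

Because the include directories and language-standard requirement are attached to the library target with `PUBLIC` visibility, `particles_app` inherits them through `target_link_libraries` rather than through global variables.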
S7636 - Cache Directive Optimization in OpenACC Programming Model OpenACC is a directive-based programming model that provides a simple interface to exploit GPU computing. As the GPU employs deep memory hierarchy, appropriate management of memory resources becomes crucial to ensure performance. The OpenACC programming model offers the cache directive to use on-chip hardware (read-only data cache) or software-managed (shared memory) caches to improve memory access efficiency. We have implemented several strategies to promote the shared memory utilization in our PGI compiler suite. We'll briefly discuss our investigation of cases that can be potentially optimized by the cache directive and then dive into the underlying implementation. Our compiler is evaluated with self-written micro-benchmarks as well as some real-world applications.  25-minute Talk Xiaonan Tian, GPU Compiler Engineer, NVIDIA
S7540 - CAE Productivity and GPU Technology We'll present performance results for the NVIDIA Tesla P100. Simulation is the key to greater productivity in many areas of product development and GPU technology plays a crucial role in achieving that goal. We'll use the simulation of a full 3D particle compaction process to compare run times with the NVIDIA Tesla K40. The results are generated from a commercially available nonlinear explicit transient dynamic finite element solver that takes full advantage of GPU technology for parallelization. The commercial software used to create the finite element mesh includes newly developed meshing techniques that make it easy to create the model. We'll also discuss details of the commercially available hardware used to perform the simulation, which has been certified for the P100. 25-minute Talk Wayne Mindle, Director of Sales & Marketing, CertaSIM, LLC
S7601 - Caffe2: A New Lightweight, Modular, and Scalable Deep Learning Framework Caffe2 is a new lightweight, modular, and scalable deep learning framework refactored from the previous Caffe. Caffe2 is widely used at Facebook for production to enable new AI experiences. We'll explain the strengths of Caffe2 and many improvements we made from the original Caffe. 25-minute Talk Yangqing Jia, Research Scientist, Facebook
S7731 - Can an Artificial Intelligence Win a Nobel Prize? We're investigating if deep learning can help scientists exploring fundamental physics with ultra-cold atoms. The Nobel prize was awarded to scientists who first discovered how to cool atoms to near absolute zero to create a special phase of matter called a Bose-Einstein Condensate (BEC). In a BEC all atoms are in the same quantum state, meaning they move together as if they are one super atom. We can use BECs to make ultra-precise measurements of gravity, potentially allowing us to make gravitational images to see hidden features in the world around us. BECs are made using a process of evaporative cooling, where the boundaries that trap the atoms are changed over time to let the hotter atoms escape. This approach has hit a limit, and BECs have remained around the same size for the last 10 years. We are handing over control of our ultra-cold atom experiment to a deep-learning algorithm, and investigating if it can find entirely new ways to make BECs. In particular we let the deep learning algorithm take control of not only the boundaries of the atoms but the interactions between the atoms as well. 25-minute Talk Michael Hush, Lecturer, University of New South Wales
S7788 - CANDLE: Predicting Tumor Cell Response to Drug Treatments We'll focus on one of the three pilots of the DOE and NCI partnership on precision oncology and the Cancer Moonshot, namely predicting tumor cell response to drug treatments with deep learning. Predicting tumor cell response to drug treatments is a critical challenge for accomplishing the promise of precision medicine in oncology. As part of a joint project between DOE and NCI to develop advanced computing solutions for cancer, we are developing a deep learning-based framework for modeling tumor-drug interaction and predicting dose response in pre-clinical screening. 25-minute Talk Fangfang Xia, Computer Scientist, Argonne National Laboratory
S7698 - CanvoX: High-Resolution VR Painting for Large Volumetric Canvas Unlike Tilt Brush and Quill, which are not voxel based, we present a VR voxel painting system whose canvas is both large (40km^3) and detailed (0.3mm^3). We use an array of octrees of depth 24, with five indices per cell: parent, child, and three neighbors to accelerate ray traversal. We adaptively refine or coarsen the octree on the CPU, sync it with the GPU, and then ray cast front to back. To accelerate rendering, we developed a foveated rendering algorithm: we design a quadtree render target whose resolution is dynamically adjusted to a heat map, traverse rays, and then interpolate the color in screen space. We traverse rays through upper-level cells as the ray cone widens. We analyze floating-point error propagation to thoroughly understand precision problems in deep cells and ray intersections. 25-minute Talk Yeojin Kim, PhD Student, Ewha Womans University
S7606 - Capture and Rendering of Interactive 3D Audio for Virtual and Augmented Reality The goal of VR and AR is to immerse the user in a created world by fooling the human perceptual system into perceiving rendered objects as real. This must be done without the brain experiencing fatigue: accurate audio representation plays a crucial role in achieving this. Unlike vision, with its narrow foveated field of view, human hearing covers all directions in full 3D. When the rendered audio and vision do not agree, the user falls out of the experience. The importance of audio for VR and AR is increasingly being recognized, and VisiSonics is developing a comprehensive toolset to address the needs of industry. We'll describe several products developed by VisiSonics that are based on over a decade of research. These include propagation engines that are embedded in standard authoring workflows for gaming (Unity, Unreal, Wwise, FMOD) and movie postproduction (Adobe, ProTools); capture of audio into high-order ambisonics and MPEG-H; and personalization of 3D audio to the individual's head shape via customization of the head-related transfer function, among others. We'll demonstrate workflow solutions designed to enrich audio immersion for gaming, video post-production, and capture in VR/AR. 25-minute Talk Ramani Duraiswami, CEO, VisiSonics Corporation
S7202 - Capturing Real-Time 360 Stereo Video from 3D Applications 360 video is a new and exciting way to share immersive content with other people. We'll describe both the techniques required to optimize performance and the best practices to avoid various visual artifacts. We'll cover efficient cube-map rendering, stereo-conversion of the cube-map, and handling of translucent objects. We'll share some of the pitfalls of working with particles, billboards, lighting, tone mapping, screen-space effects, etc. 50-minute Talk Alexey Panteleev, Senior Developer Technology Engineer, NVIDIA
S7600 - ChainerMN: Scalable Distributed Deep Learning with Chainer We'll present ChainerMN, a multi-node distributed deep learning framework, together with the basics of distributed deep learning. Even though GPUs are continuously gaining more computation throughput, it is still very time-consuming to train state-of-the-art deep neural network models. For better scalability and productivity, it is paramount to accelerate the training process by using multiple GPUs. To enable high-performance and flexible distributed training, we developed ChainerMN, built on top of Chainer. We'll first introduce the basic approaches to distributed deep learning. Then, we'll explain the design choice, basic usage, and implementation details of Chainer and ChainerMN. We'll report benchmark results and discuss the future directions of distributed deep learning. 25-minute Talk Takuya Akiba, Researcher, Preferred Networks, Inc.
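The data-parallel scheme underlying frameworks like ChainerMN can be sketched without MPI: each worker computes gradients on its own minibatch, and an allreduce replaces every worker's gradients with their mean so that all replicas apply the identical update. This is a conceptual sketch in which "workers" are simulated as array rows; it is not ChainerMN's implementation.

```c
#include <assert.h>

/* Conceptual sketch of the allreduce-average step in data-parallel
 * distributed training (not ChainerMN code). `grads` holds one row of
 * nparams gradient entries per worker; after the call every row holds
 * the mean, so all model replicas make the same parameter update. */
void allreduce_average(double *grads, int nworkers, int nparams)
{
    for (int p = 0; p < nparams; ++p) {
        double sum = 0.0;
        for (int w = 0; w < nworkers; ++w)
            sum += grads[w * nparams + p];
        double avg = sum / nworkers;
        for (int w = 0; w < nworkers; ++w)
            grads[w * nparams + p] = avg;  /* every worker receives the mean */
    }
}
```

In a real cluster this exchange is the communication bottleneck, which is why the design choices ChainerMN makes around it matter for scalability.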
S7280 - CLBlast: A Tuned BLAS Library for Faster Deep Learning We'll demonstrate how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at deep learning training and inference and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the convolutional layers: the computational heart of all deep-learning frameworks (TensorFlow, Caffe, etc.). CLBlast has three main advantages over other BLAS libraries: 1) it can be explicitly tuned for specific matrix sizes and hardware platforms, 2) it runs fast even on less common devices, such as embedded and low-power GPUs, and 3) it can perform operations in half-precision FP16 format, saving precious bandwidth, time, and power. 25-minute Talk Cedric Nugteren, GPU / deep learning specialist, TomTom
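The reason a fast GEMM accelerates convolutional layers is the standard im2col lowering: the input patch under each output pixel becomes one matrix column, and the convolution reduces to a single matrix multiply. A minimal single-channel sketch of the idea (our own; CLBlast itself dispatches tuned OpenCL kernels rather than this naive triple loop):

```c
#include <assert.h>

/* im2col + GEMM lowering of a single-channel, valid 2-D convolution.
 * Illustrative only -- a tuned BLAS library replaces the naive GEMM. */

/* Naive GEMM: C[m][n] = sum_k A[m][k] * B[k][n], row-major. */
static void gemm(int M, int N, int K,
                 const float *A, const float *B, float *C)
{
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[m * K + k] * B[k * N + n];
            C[m * N + n] = acc;
        }
}

/* im2col: the KxK patch under each output pixel becomes one column. */
static void im2col(const float *img, int H, int W, int K, float *col)
{
    int OH = H - K + 1, OW = W - K + 1;
    for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
            for (int oy = 0; oy < OH; ++oy)
                for (int ox = 0; ox < OW; ++ox)
                    col[((ky * K + kx) * OH + oy) * OW + ox] =
                        img[(oy + ky) * W + (ox + kx)];
}

/* Convolution as a 1 x (K*K) by (K*K) x (OH*OW) matrix product.
 * colbuf must hold K*K*(H-K+1)*(W-K+1) floats. */
void conv2d_gemm(const float *img, int H, int W,
                 const float *kernel, int K, float *out, float *colbuf)
{
    int OH = H - K + 1, OW = W - K + 1;
    im2col(img, H, W, K, colbuf);
    gemm(1, OH * OW, K * K, kernel, colbuf, out);
}
```

With multiple input/output channels the weight matrix simply grows more rows and the column matrix more rows per patch; the GEMM shape is what the library tunes for.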
S7837 - Cloud and Edge Deep Learning Platform for Various Real Business Fields The ABEJA Platform is a PaaS (Platform as a Service) architected for the "Society 5.0" and "Industry 4.0" ecosystem of IoT, big data, and AI. It collects sensor data from IoT devices, combines it with existing data, trains deep learning models in the cloud on the resulting big data, runs inference at the edge and in the cloud using the trained models, and exposes the inferred data via API. In the training phase, the platform distributes work across GPUs. In the inference phase, it manages versioned models and deploys them to a cloud-based distributed inference system or to edge-side computers (for example, Jetson). The cloud system scales automatically with request load, and it also monitors edge-side computers so users can control them just like cloud resources. 25-minute Talk Naoki Tonogi, COO/CFO, ABEJA
S7654 - Cloud-Based Deep Learning as the Radiologist's Best Friend Sad but true: most of radiology is mind-numbing tedium. Radiologists spend countless hours on tasks that are onerous and error-prone, resulting in high costs and frequent misdiagnoses. Our first product designed to address these deficiencies is Arterys Cardio DL, a web-based, zero-footprint cardiac MRI postprocessing suite. Arterys Cardio DL includes a deep learning-based contouring algorithm that vastly reduces the time required to diagnose heart disease in cardiac MRI. Arterys Cardio DL is the first technology ever to be cleared by the FDA that leverages cloud computing and deep learning in a clinical setting. We'll discuss the technology behind the software and how we proved its safety and efficacy to secure FDA clearance in the United States and the CE Mark in Europe. 25-minute Talk Daniel Golden, Director of Machine Learning, Arterys
S7657 - CloudBrain: AI SaaS Case Study in China CloudBrain is providing deep learning/AI SaaS to enterprises to automatically optimize their key performance indexes. We build a deep learning/AI platform using the latest NVIDIA GPUs and CUDA technologies, which enables us to research and implement state-of-the-art learning/inference algorithms with fast iterations. This platform also reduces the hardware/operation cost and hence improves clients' return on investment. We'll present two case studies in the fintech and energy sectors. 25-minute Talk Benyu Zhang, CEO, CloudBrain
S7296 - CloudLightning: Merging GPU-Based HPC with Cloud Services Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies. This presentation highlights our efforts as part of the EU Horizon 2020 project CloudLightning, where we look at how to integrate heterogeneous computing with cloud technologies. 25-minute Talk Anne C Elster, Professor of High Performance Computing, Norwegian University of Science & Technology / Univ. of Texas at Austin
S7489 - Clustering GPUs with Ethernet As GPUs get more widely deployed for machine learning, training is being done over larger datasets than ever before, resulting in longer training times. Reducing training time from days to hours or less requires clustering large numbers of GPUs. As more users see the benefits of machine learning for their businesses, there is also a need to provide on-demand access to these data center-based clusters. The ideal technology for such large-scale clustering in the data center is Ethernet. We'll discuss the work Broadcom is doing with NVIDIA to enable GPUDirect using its RoCE v2 line of Ethernet NICs. 25-minute Talk Fazil Osman, Distinguished Engineer, Broadcom Limited
S7471 - Combining NVIDIA Docker and Databases to Enhance Agile Development and Optimize Resource Allocation Learn how to use NVIDIA Docker combined with database analysis to improve your agile development process, generalize hardware requirements, speed up deployment, and identify optimal configurations. Discover how to leverage the resource isolation of Docker containers to test different GPU-architecture performances and resource allocation to optimize system use and maximize processing throughput. Learn how to test this resource isolation using agile methods including development of a processing chain from multi-threaded CPU, to single GPU, and finally to multi-GPU architecture. Hear our observations about compilation timing, execution performance, resource allocation, and generation of CUDA binaries within containers while showcasing an automated image registration pipeline. 50-minute Talk Sophie Voisin, Research & Development Associate, Oak Ridge National Laboratory
Christopher Davis, Geospatial Software Engineer , Oak Ridge National Laboratory
S7423 - Community Detection on the GPU Community detection is a key kernel in the analysis of complex networks for a variety of fields. We'll present our implementation of a new GPU algorithm for community detection based on the Louvain Method. Our approach parallelizes the access to individual edges, enabling load balancing of networks with nodes of highly varying degrees. We're able to obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads. 25-minute Talk Antonino Tumeo, Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar, Research Scientist, Pacific Northwest National Laboratory
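At the heart of the Louvain method is the modularity gain evaluated when a node is tentatively moved into a neighboring community; parallelizing over individual edges changes how the terms are gathered, not the formula itself. A scalar sketch of the gain from Blondel et al.'s original paper (the speakers' GPU kernel is of course structured differently):

```c
#include <assert.h>

/* Modularity gain for moving an isolated node i into community C
 * (Blondel et al., 2008). Scalar reference sketch, not the GPU kernel.
 *   sum_in : total weight of edges inside C
 *   sum_tot: total weight of edges incident to nodes of C
 *   k_i    : weighted degree of node i
 *   k_i_in : weight of edges from i into C
 *   m      : total edge weight of the graph */
static double sq(double x) { return x * x; }

double modularity_gain(double sum_in, double sum_tot,
                       double k_i, double k_i_in, double m)
{
    double after  = (sum_in + 2.0 * k_i_in) / (2.0 * m)
                  - sq((sum_tot + k_i) / (2.0 * m));
    double before = sum_in / (2.0 * m)
                  - sq(sum_tot / (2.0 * m))
                  - sq(k_i / (2.0 * m));
    return after - before;
}
```

The per-edge parallelization described in the abstract amounts to computing the k_i_in terms for all candidate communities concurrently, which is what makes load balancing across highly varying node degrees possible.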
S7472 - Comparative Study of CNN Models for Detection of Clouds in Overhead Imagery Learn how to improve pixel-wise image quality and geolocation accuracy by leveraging high-end hybrid computing resources. This particular test case involves the use of deep learning in the detection and masking of cloud objects, and imagery content that reduces image quality and usability, from overhead imagery. Timely results are attained through expediting selection and deployment of a deep learning model for overhead imagery for the cloud detection problem. An optimum deep learning model is selected through evaluation of a set of convolutional neural networks for their ability to detect cloud objects. Evaluation of each network is performed using a number of open-source neural network packages to give comparative performance results. In addition, two complementary image segmentation techniques are implemented in parallel, one operating on CPUs and the other on GPUs, to rapidly obtain candidate regions for cloud objects at a fine resolution. 25-minute Talk Byung Hoon Park, R&D Staff Scientist, Oak Ridge National Laboratory
S7635 - Comparison of OpenACC and OpenMP4.5 Offloading: Speeding Up Simulations of Stellar Explosions Learn about a case-study comparing OpenACC and OpenMP4.5 in the context of stellar explosions. Modeling supernovae requires multi-physics simulation codes to capture hydrodynamics, nuclear burning, gravitational forces, etc. As a nuclear detonation burns through the stellar material, it also increases the temperature. An equation of state (EOS) is then required to determine, say, the new pressure associated with this temperature increase. In fact, an EOS is needed after the thermodynamic conditions are changed by any physics routines. This means it is called many times throughout a simulation, requiring the need for a fast EOS implementation. Fortunately, these calculations can be performed independently during each time step, so the work can be offloaded to GPUs. Using the IBM/NVIDIA early test system (precursor to the upcoming Summit supercomputer) at Oak Ridge National Laboratory, we use a hybrid MPI+OpenMP (traditional CPU threads) driver program to offload work to GPUs. We'll compare the performance results as well as some of the currently available features of OpenACC and OpenMP4.5. 25-minute Talk Tom Papatheodore, Solutions Architect, NVIDIA
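The per-cell independence the abstract describes is what makes both directive models fit: the same EOS-style loop can be offloaded with either an OpenACC or an OpenMP 4.5 directive. A minimal sketch using the ideal-gas relation p = rho*R*T as a stand-in for a real stellar EOS (the talk's code is far more involved); with a non-offloading compiler both pragmas are ignored and the loops run on the CPU.

```c
#include <assert.h>

#define R_GAS 8.314  /* J/(mol*K); stand-in constant for this sketch */

/* OpenACC version of the independent per-cell EOS update. */
void eos_openacc(const double *rho, const double *T, double *p, int n)
{
    #pragma acc parallel loop copyin(rho[0:n], T[0:n]) copyout(p[0:n])
    for (int i = 0; i < n; ++i)
        p[i] = rho[i] * R_GAS * T[i];
}

/* OpenMP 4.5 target-offload version of the same loop. */
void eos_openmp45(const double *rho, const double *T, double *p, int n)
{
    #pragma omp target teams distribute parallel for \
        map(to: rho[0:n], T[0:n]) map(from: p[0:n])
    for (int i = 0; i < n; ++i)
        p[i] = rho[i] * R_GAS * T[i];
}
```

The comparison in the talk is precisely about how these two directive styles map the same loop onto the GPU and what each model's data clauses cost in practice.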
S7334 - Computational Focus-Tunable Near-eye Displays We'll explore unprecedented display modes afforded by computational focus-tunable near-eye displays with the goal of increasing visual comfort and providing more realistic and effective visual experiences in virtual and augmented reality. Applications of VR/AR systems range from communication, entertainment, education, collaborative work, simulation, and training to telesurgery, phobia treatment, and basic vision research. In every immersive experience, the primary interface between the user and the digital world is the near-eye display. Many characteristics of near-eye displays that define the quality of an experience, such as resolution, refresh rate, contrast, and field of view, have been significantly improved over the last years. However, a pervasive source of visual discomfort prevails: the vergence-accommodation conflict (VAC). Further, natural focus cues are not supported by any existing near-eye display. 25-minute Talk Nitish Padmanaban, PhD Student, Stanford Computational Imaging Lab
S7507 - Compute Preemption and TotalView Have Made Debugging Pascal Much More Seamless With Pascal, NVIDIA built compute preemption right into the card. Debugging is now much smoother because stopping a thread on the GPU no longer stops the whole GPU, enabling interactive debugging on single-GPU systems and debugging of multiple processes sharing the same GPU. In addition, TotalView, the leading multi-threaded Linux debugger, has invested in improving its architecture to support multi-GPU systems at scale, resulting in a much more seamless debugging experience. Come get a better understanding of the latest technology and where we are looking to go next. 25-minute Talk Martin Bakal, Product Manager, Rogue Wave Software
Larry Edelstein, Sales Engineer, Rogue Wave Software
S7277 - Computer Virtual Experiment on Fluidized Beds Using GPU-Accelerated CFD-DEM Method Learn how to use GPUs to accelerate CFD-DEM, the computational fluid dynamics - discrete element method, to achieve computer virtual experiments on fluidized beds in the chemical engineering field. We'll discuss how to organize the gas- and solid-phase equations solved concurrently by CPUs and GPUs in a heterogeneous supercomputing system. With systematic optimization of the model, numerical method, software, and hardware, we can simulate lab- to pilot-scale fluidized beds at quasi-realtime speed, and conduct demos of such systems. Our method enables real applications that require very long simulations. 25-minute Talk Ji Xu, Associate Professor, Institute of Process Engineering, Chinese Academy of Sciences
S7173 - Concept to Production: An Architectural Design Firm's Jump to Virtualization About a year ago, CannonDesign embarked on a journey to relocate and upgrade its entire data center, implementing NVIDIA GRID technology, to allow us to collaborate on architectural and engineering design projects throughout all of our offices worldwide. Now we're using our graphics-intensive applications on virtual desktops in our new data center. The design of the infrastructure and implementation of the migration was not without its hurdles, but we're here to share our journey. We'll give some insight into our designs for the virtual desktops, how the machines performed compared to our initial benchmarks, lessons learned, recommendations of tweaks we made, and a glimpse into some of our future plans. If you're planning a virtual desktop infrastructure, interested in creating a virtual environment designed around graphics-intensive applications, or are looking to upgrade and tweak your current environment, come learn from our journey. 50-minute Talk Andrew Schilling, Director of Information Technology, CannonDesign
Jimmy Rotella, Design Application Specialist, CannonDesign
H7113 - Connect with the Experts: Accelerated Graph & Data Analytics Learn about the latest capabilities for Accelerated Graph & Data Analytics. How do GPUs excel at communication driven workloads like graph analytics? Come and find out! We will discuss libraries, benchmarks, tools and frameworks. Share your experiences, suggestions and questions regarding GPUs as a platform for batch and streaming analytics. 1 Hour Connect with the Experts Frank Eaton, Technical Lead, Accelerated Graph & Data Analytics, NVIDIA
H7129 - Connect with the Experts: Accelerated Libraries - cuFFT, cuSPARSE, cuSOLVER, nvGRAPH This Connect with the Experts session focuses on GPU-accelerated libraries and gives attendees an opportunity to connect with NVIDIA engineers. The libraries covered in this session are cuFFT/cuFFTW, cuSPARSE, cuSOLVER, and nvGRAPH. 1 Hour Connect with the Experts Alex Fit-Florea, Library Engineering Manager, NVIDIA
H7125 - Connect with the Experts: Advanced Deep Learning Attend this session to get your technical questions about Deep Neural Network architectures and scaling Deep Learning applications answered. Learn more about strategies you can employ to explore the right neural network architectures for your problem and train at scale to converge to your solution faster. NVIDIA deep learning research and HPC experts can provide you with the right guidance to maximize the performance and accuracy of your Deep Learning based solution. 1 Hour Connect with the Experts Bryan Catanzaro, Mgmt, Hardware (Engineering), NVIDIA
Michael Houston, Distinguished Engineer, NVIDIA
Ujval Kapasi, Mgmt, Sys SW, NVIDIA
Sylvain Jeaugey, Senior Communication and Computing Engineer, NVIDIA
H7114 - Connect with the Experts: Building Autonomous Vehicles using DRIVE Platforms Connect with NVIDIA experts and discuss why autonomous technologies powered by deep learning have become a key focus for every car manufacturer, as well as transportation services and technology companies. The car needs to know exactly where it is, recognize the objects around it, and continuously calculate the optimal path for a safe driving experience. This situational and contextual awareness of the car and its surroundings demands a powerful visual computing system that can merge data from cameras and other sensors, plus navigation sources, while also figuring out the safest path - all in real-time. This autonomous driving platform is NVIDIA DRIVE PX. 1 Hour Connect with the Experts Shri Sundaram
H7116 - Connect with the Experts: Build Product (FOR INCEPTION PROGRAM PARTNERS) This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how to build your product. We will be focusing on product design and how you can scale out. Speak with experts about your technology stack (data, compute, model, deployment, etc.), and consider scaling out training and inference through the cloud or in-house. 1 Hour Connect with the Experts
H7120 - Connect with the Experts: Containers for GPU Applications Interactive session to answer any question you might have regarding using GPUs with Linux container technologies (such as Docker, Rkt, or Singularity) and how to deploy GPU applications in your cluster with container orchestrators (such as Kubernetes or Mesos). We will also share tips on how to tune containers for high-performance applications. This session complements the presentation "S7177 - Using Containers for GPU-Accelerated Applications"; container technologies are evolving very quickly, so bring any use case that presentation doesn't cover. 1 Hour Connect with the Experts Jonathan Calmels, Systems Software Engineer, NVIDIA
Felix Abecassis, Systems Software Engineer, NVIDIA
Renaud Gaubert
H7109 - Connect with the Experts: Creating Efficient OpenCL Software In this free-format interactive session, get to meet and interact directly with engineers who build the NVIDIA OpenCL system software. Key focus areas for the session are efficient memory management and performance optimizations, but all topics welcome! 1 Hour Connect with the Experts Karthik Raghavan Ravi, Engineering Manager, Compute SW, NVIDIA
H7124 - Connect with the Experts: Deep Learning Applications Attend this session to get your questions answered on deep learning applications in computer vision, signal processing, natural language processing and others. Learn more about the different types of deep neural networks and algorithms used in various applications. NVIDIA experts can help you choose the right approach for your application and project. 1 Hour Connect with the Experts Dennis Lui, Solutions Architect, NVIDIA
Jeremy Appleyard, Engineer, Tech SW, NVIDIA
Julie Bernauer, Mgmt, Solutions Architect, NVIDIA
Su Inn Park, Architect, Solutions, NVIDIA
Nathan Luehr, Engineer, Tech SW, NVIDIA
H7121 - Connect with the Experts: Deep Learning Basics Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic. 1 Hour Connect with the Experts Joohoon Lee, Certified Instructor, NVIDIA
Jonathan Bentz, Certified Instructor, NVIDIA
Slawomir Stepniewski, Engineer, Tech SW, NVIDIA
Lick-Kong Tam, Architect, Solutions, NVIDIA
Lawrence Brown, Mgmt, Solutions Architect, NVIDIA
Khairul Kabir, Engineer, Tech SW, NVIDIA
Simon Layton, Engineer, Tech SW, NVIDIA
Scott Yokim, Engineer, Tech SW, NVIDIA
Jonathan Barker, Architect, NVIDIA
Ben Barsdell, Developer Technology Engineer, NVIDIA
Natalia Gimelshein, Engineer, Tech SW, NVIDIA
H7123 - Connect with the Experts: Deep Learning Deployment (Cloud, Datacenter and Embedded) Attend this session to get your questions on deep neural network deployment answered. Learn more about deployment platforms such as cloud, datacenters and embedded and merits and limitations of each approach. NVIDIA experts can help you choose the right deployment platform for your application and project. 1 Hour Connect with the Experts Kismat Singh, Mgmt, Dev Tech SW, NVIDIA
Sharanyan Chetlur, Engineer, Tech SW, NVIDIA
Mostafa Hagog, Mgmt, Dev Tech SW, NVIDIA
Micah Villmow, Engineer, Tech SW, NVIDIA
Dilip Sequeira, Engineer, Tech SW, NVIDIA
H7132 - Connect with the Experts: Deep Technical Dive into NVIDIA GRID We will be taking a deep dive on both the software and hardware for NVIDIA GRID technology. Maybe you are in the process of implementing GRID technology for your enterprise. Maybe you are just curious. Stop by for a chat. 1 Hour Connect with the Experts
H7122 - Connect with the Experts: Frameworks for Training Deep Neural Networks Attend this session to get your questions on deep learning frameworks answered. Learn more about widely used deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, CNTK, and MXNet, and let NVIDIA experts help you choose the right framework for your research or project. 1 Hour Connect with the Experts Luke Yeager, Engineer, Sys SW, NVIDIA
Deyu Fu, Engineer, Tech SW, NVIDIA
Boris Ginsburg, Deep Learning Engineer, NVIDIA
Khairul Kabir, Engineer, Tech SW, NVIDIA
Sharanyan Chetlur, Engineer, Tech SW, NVIDIA
Kevin Vincent, Engineer, Tech SW, NVIDIA
Michael O'Connor, Senior Engineering Manager, Deep Learning, NVIDIA
John Woolley, Mgmt, Dev Tech SW, NVIDIA
Ryan Olson, Architect, Solutions, NVIDIA
Allison Gray, Solutions Architect, NVIDIA
H7118 - Connect with the Experts: Go To Market Strategy (FOR INCEPTION PROGRAM PARTNERS) This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how to go-to-market. We will be focusing on your startup's go-to-market strategy and how you can scale out. Speak with experts on how to optimize your vertical strategy and leverage partnerships. Enhance your marketing efforts and perfect your business models. 1 Hour Connect with the Experts
H7133 - Connect with the Experts: How Many Users Per Host with NVIDIA GRID Learn how many users per host NVIDIA GRID can support, depending on the user profile. Learn from engineers what the best options are for your business. 1 Hour Connect with the Experts
H7137 - Connect with the Experts: HPC Visualization in Virtual Reality Attend this special session and get a first glimpse of scientific data in ParaView being exported to a scene composed with the Unreal Engine. We will discuss the workflow that shows how geometry generated in a running ParaView session can be uploaded to a running pre-composed Unreal VR scene on-the-fly. You can now enjoy the immersive experience from consumer VR while taking advantage of easy-to-create high-quality graphical environment for scientific data. 3-Hour Connect with the Experts Kees van Kooten, Scientific Visualization Software Engineer, NVIDIA
H7110 - Connect with the Experts: Jetson Developer Kit and Software Development Connect with the experts to learn about the Jetson developer kit and processor module. Experts will be on-hand to discuss the Jetson platform and answer your questions. NVIDIA Jetson with GPU-accelerated parallel processing is the world's leading embedded visual computing platform. It features high-performance, low-energy computing for deep learning and computer vision making the Jetson platform ideal for compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson Developer Kit and module to explore the future of embedded computing. 1 Hour Connect with the Experts Eric Brower
H7111 - Connect with the Experts: MDL Ask questions about MDL or the NVIDIA vMaterials library. Exchange ideas or just give feedback. 1 Hour Connect with the Experts Jan Jordan, Software Product Manager MDL, NVIDIA
H7108 - Connect with the Experts: Mental Ray and Iray Rendering Workflows Come discuss rendering workflows with the experts on NVIDIA Mental Ray and NVIDIA Iray. NVIDIA Mental Ray rendering software generates images of outstanding quality and unsurpassed realism. It combines physically based light simulation with full programmability to let you create any imaginable visual effect. NVIDIA Iray is a highly interactive and intuitive physically based rendering technology that generates photorealistic imagery by simulating the physical behavior of light and materials. It's a highly predictive approach that marries with the scalable, world-class performance across NVIDIA GPUs to give constant feedback and rapid results. 1 Hour Connect with the Experts Barton Gawboy, Product Designer, NVIDIA mental ray for Maya, NVIDIA
Peter de Lappe, Product Manager, NVIDIA mental ray, NVIDIA
Jay Axe, Technical Product Manager, NVIDIA Corporation
H7117 - Connect with the Experts: Moving from Machine Learning to Deep Learning (For Inception Program Partners) This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how you can switch from machine learning to deep learning. Speak with experts on how to use GPUs at the edge. This is your chance to show us what you are working on and work with experts on pushing your company further in the deep learning space. Learn more about how the Deep Learning Institute can help, too. 1 Hour Connect with the Experts
H7119 - Connect with the Experts: Multi-GPU Programming Wondering how to scale your code to multiple GPUs in a node or cluster? Need to discuss some CUDA-aware MPI details? Interested in knowing more about the newest entry in the GPUDirect technologies, GPUDirect Async? This is the right session for your beginner or expert questions on multi-GPU programming, GPUDirect, NCCL, NVSHMEM, and MPI. 1 Hour Connect with the Experts Jiri Kraus, Senior Devtech Compute, NVIDIA
Sreeram Potluri, Senior CUDA Software Engineer, NVIDIA
H7135 - Connect with the Experts: NVIDIA Data Center Tools Attendees will learn the latest about the NVIDIA Data Center Tools, including Data Center GPU Manager (DCGM), NVIDIA Validation Suite (NVVS), NVIDIA Management Library (NVML), and new tools to verify system health. 1 Hour Connect with the Experts Brent Stolle, Software Engineer, NVIDIA
Scott McMillan, Software Architect, NVIDIA
H7128 - Connect with the Experts: NVIDIA Deep Learning Institute Certified instructors from the NVIDIA Deep Learning Institute (DLI) will share how developers, data scientists, and researchers can access hands-on technical training from NVIDIA to solve challenging problems with deep learning. This session will cover everything you need to know about DLI, including which labs are offered, how to access labs online, how to find a workshop near you, and more. Plus, our experts are available to answer your technical questions about deep learning for Automotive, Healthcare, Finance, and other important industries. 1 Hour Connect with the Experts Charles Killam, Senior Certified Instructor, NVIDIA
Kelvin Lwin, Senior Deep Learning Institute Instructor, NVIDIA
H7130 - Connect with the Experts: NVIDIA GPUDirect Technologies on Mellanox Network Interconnects The NVIDIA GPUDirect family of technologies accelerates data exchange in GPU-accelerated applications. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. Since 2013, Mellanox has worked with NVIDIA to enable GPUDirect support, with large-scale deployments in HPC and artificial intelligence. During this session, we will briefly discuss the state-of-the-art capabilities of GPUDirect RDMA and GPUDirect Async, while devoting most of the time to a Q&A session with users. 1 Hour Connect with the Experts Davide Rossetti, Senior SW engineer, NVIDIA
Gil Bloch, Principal Architect, Mellanox
Scot Schultz, Sr. Director, HPC/Artificial Intelligence & Technical Computing, Mellanox
H7134 - Connect with the Experts: NVIDIA GRID Architects Speak with NVIDIA engineers and architects to get your data center questions answered. This is the best place to get your visualization queries addressed, from user level to developer level. Learn how to achieve GPU-accelerated graphics while maintaining security, and get answers on the best methods for implementing NVIDIA GRID in your enterprise. 1 Hour Connect with the Experts
H7112 - Connect with the Experts: NVIDIA Video and Capture SDK Join this Connect with the Experts session to get answers to your specific questions and to share feedback as a user of the NVIDIA Video SDK and NVIDIA Capture SDK. 1 Hour Connect with the Experts Abhijit Patait, Director, System Software, NVIDIA
H7103 - Connect with the Experts: OpenACC: Start with GPUs and Optimize Your Code This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC directives are used by others. 1 Hour Connect with the Experts Guido Juckeland, IT Architect and Leader Hardware Accelerator Group, TU Dresden - ZIH
Jiri Kraus, Senior Devtech Compute, NVIDIA
Michael Wolfe, Senior Compiler Engineer, NVIDIA
Jeff Larkin, DevTech Software Engineer, NVIDIA
Mathew Colgrove, Developer Technologist, NVIDIA
Sunita Chandrasekaran, Associate Professor, University of Delaware
H7115 - Connect with the Experts: OpenGL and CUDA Come by to ask questions on OpenGL and CUDA. 1 Hour Connect with the Experts Chris Hebert, Computer Graphics Engineer, Developer Technology, NVIDIA
H7102 - Connect with the Experts: Programming at Scale Discover the forthcoming features and techniques for harvesting parallelism on large-scale systems. Bigger, better, faster. Scaling up and out can help get us there. NVIDIA platforms have unique features that offer better power efficiency, easier access, higher bandwidth, and lower communication latency than our competition. But how do we write, refactor, and optimize codes to get to scale? NVIDIA offers a vision that encompasses work distribution, efficient data communication, and ease of programming. It covers asynchronous task parallelism, pushing control and communication down to where the data is, and selective control over how work and data are mapped to the underlying platform. Expect an informative and engaging discussion about how NVIDIA tech applies to your work. 1 Hour Connect with the Experts CJ Newburn, HPC Lead, Compute SW, NVIDIA
H7107 - Connect with the Experts: VR: GL, DX & VK Come talk to us about anything VR related. We invite you to discuss anything from efficient rendering and multi-GPU rendering to the newest hardware features. 1 Hour Connect with the Experts Ingo Esser, Sr. DevTech Engineer, NVIDIA
Christoph Kubisch, Sr. Developer Technology Engineer, NVIDIA
Patrick Mours, DevTech Engineer, NVIDIA
Robert Menzel, DevTech Engineer, NVIDIA
H7127 - Connect with the Experts: VRWorks Tools Come meet experts from the NVIDIA software, developer technology, and tech marketing teams to learn how to use DesignWorks, VRWorks, and GameWorks to improve your VR experience. 1 Hour Connect with the Experts Vincent Brisebois, Senior Product Marketing Manager, DesignWorks/VRWorks, NVIDIA
Manuel Kraemer, Engineer, Tech SW, NVIDIA
Rochelle Pereira, Mgmt, Sys SW, NVIDIA
Edward Liu, Senior Developer Technology Engineer, NVIDIA
H7104 - Connect with the Experts: Vulkan, OpenGL, Graphics pipeline An open discussion of anything around real-time rendering using the OpenGL or Vulkan APIs, including NVIDIA-specific features that can help speed up the rendering process. 1 Hour Connect with the Experts Christoph Kubisch, Sr. Developer Technology Engineer, NVIDIA
Tristan Lorach
Chris Hebert, Computer Graphics Engineer, Developer Technology, NVIDIA
S7568 - Continuous GPU-Based Tensorflow AI Model Deployments across Hybrid Cloud and On-Premise Environments In a completely demo-based session, we'll show the latest 100% open source research in high-scale, fault-tolerant Spark ML and TensorFlow AI model serving using NVIDIA GPUs across a hybrid AWS, Google, and Azure deployment environment. We'll focus on continuous ML/AI model deployment, auto-scaling within a cloud environment, and "auto-shifting" between cloud environments for eXtreme High Availability (XHA) and cost savings. We'll use 100% open source tools, including Jupyter Notebook, Docker, Kubernetes, Airflow, and NetflixOSS Microservices for all demos. This talk will be one of a kind, as nobody else in the world is using this type of advanced ML/AI deployment strategy. And we'll do it live on stage! 50-minute Tutorial Chris Fregly, Research Scientist, PipelineAI
S7294 - Controlling Hundreds of GPU-Powered Plasma-Physics Simulations with Machine Learning Algorithms Better hardware and algorithms have made plasma-physics particle-in-cell codes much faster. Instead of running individual simulations, it's now common to explore the space of physical parameters with large sets of simulations. However, predefined regularly spaced parameter scans can be inefficient and expensive. Instead, we use an adaptive algorithm that learns from previous simulations and determines the most promising parameters to try next. We illustrate this method on the problem of electron injection in laser-wakefield acceleration. Using hundreds of GPU-powered simulations with the code FBPIC on the Titan cluster at ORNL, the algorithm quickly focuses on the most relevant regions of the explored parameter space. 25-minute Talk Remi Lehe, Postdoctoral Fellow, Lawrence Berkeley National Laboratory
S7605 - Convolutional Neural Networks for Modeling Temporal Biomarkers and Disease Predictions Lab values and biomarkers are often irregularly and asynchronously measured, making them difficult to use in predictive modeling. However, temporal trends can still be recovered from these measurements and are important for predicting disease onsets. We'll present a novel model of high-dimensional temporal input and high-dimensional output. Our model is composed of two convolutional neural network components. The first component is an efficient convolution-based formulation of multivariate kernel regression, which allows us to estimate each biomarker at each time point from the rest of the biomarker time series. The second component is a multi-resolution, multi-task convolutional neural network that recovers temporal trends most predictive of up to 170 diseases. We'll show how this multi-task formulation allows us to retain the correlation structure among the diseases throughout the training. Our experiments on data from 298K individuals over 8 years, up to 100 common lab measurements, and 171 diseases show that the temporal signatures learned via convolution are significantly more predictive than baselines commonly used for early disease diagnosis. 25-minute Talk Narges Razavian, Assistant Professor, New York University Langone Medical Center
S7440 - Create High-Quality Materials from Scans with MDL and Substance A worldwide leader for procedural texturing in the gaming industry with its Substance technology, Allegorithmic has partnered with NVIDIA to release Substance Designer 5.5, the first MDL visual editor to efficiently author material and transport the material definition across all supporting software. We'll present a full customer workflow, from high-resolution image scanning to actual MDL-defined material that could serve as reference, similarly to those available through Substance Source. We'll demonstrate customer use cases and present results (at GTC 2016 we showcased Hyundai and Harley-Davidson) with a live demo of Substance solutions with NVIDIA Iray rendering on an NVIDIA VCA cluster, as well as an update on new features of Substance Designer 6.0 released in February 2017. 50-minute Talk Pierre Maheut, Product Manager, Allegorithmic
Jerome Derel, Chief Product Officer, Allegorithmic
S7552 - Creating & Exploring Enterprise VR Content Enterprise Virtual Reality offers the promise of accelerating and disrupting traditional design and modeling workflows. By working in VR, architects, designers, and artists can experience their data at life-scale; and they can collaboratively explore design options in a shared virtual environment. But for enterprise VR experiences to become pervasive, easy to use content creation and exploration tools are required. In this presentation, we will discuss challenges and solutions for creating and exploring enterprise VR content. 25-minute Talk David Weinstein, NVIDIA
S7823 - Crowdsourcing 3D Semantic Maps for Vehicle Cognition Extracting context from the vehicle's environment remains one of the major challenges to autonomy. While this can be achieved in highly controlled scenarios today, scalable solutions are not yet deployed. In this talk we explore the crucial role of 3D semantic maps in providing cognition to autonomous vehicles. We will look at how Civil Maps uses swarm methods to rapidly crowdsource these maps, and how they are utilized by automotive systems in real time.     25-minute Talk Fabien Chraim, VP of Research and Development, Civil Maps
Scott Harvey, Senior Machine Vision Engineer, Civil Maps
SE7142 - CUDA Developer Tools Round Table This session gathers major CUDA developer tools vendors, including NVIDIA and PGI, to share their latest feature development. In addition, each vendor will share what they believe are the major application development challenges and the solutions they may be working on to tackle them. Each panelist will give a short presentation and/or demo of their latest feature set, or illustrate their focus on the types of development problems they feel are being tackled. The panelists come from the HPC, workstation, and embedded business verticals, so the audience can appreciate where CUDA is present and which challenges might be specific to one platform versus another, while also being exposed to common development patterns as hardware system topologies converge. The moderator will then bring up a variety of discussion topics meant to steer participation from the panel: why a given problem is or isn't solved, how developers can successfully develop on systems with particular limitations, the convergence of HPC and embedded systems, and the inadequacy of developer tools for certain types of applications, agreeing to disagree on what is missing. The moderator will engage the audience with surveys and shows of hands to validate statements made by the panelists, and will open the floor for questions and comments from developers themselves. 2-Hour Special Event Rafael Campana, Senior Engineering Manager, Developer Tools, NVIDIA
David Lecomber, Senior Director, HPC Tools, ARM
Sheridan Ethier, Director Engineering, Middleware and Verticals, QNX
Allen Malony, Professor, University of Oregon
Annemarie Southwell, Mgmt, Sys SW, NVIDIA
Martin Bakal, Product Manager, Rogue Wave Software
Ken Jackson, SVP, Real-Time and Linux, Concurrent
Sebastien Domine, SW VP, Developer Tools, NVIDIA
S7122 - CUDA Optimization Tips, Tricks and Techniques Optimizing your code can be one of the most challenging tasks in GPU programming, but also one of the most rewarding: the performance difference between an initial version and well-tuned code can be a factor of 10 or more. Some optimizations can be quite straightforward while others require care and deep understanding of how the code is executing. A particular focus will be on optimization of the CPU part of your code, which is frequently overlooked even though it is often easier to tune and just as effective. Sometimes the biggest obstacle is just knowing what to look for, so we'll cover a range of techniques that everyone from beginners to CUDA ninjas might not have thought of before. 50-minute Talk Stephen Jones, Principal Software Engineer, NVIDIA
L7108 - CUDA Programming in Python with Numba In this lab, we'll teach you how to do GPU-accelerated numerical computing from Python using the Numba compiler. Numba is an open source compiler that can translate Python functions for execution on the GPU, all without having to write any C or C++ code. Numba's just-in-time compilation ability makes it easy to interactively experiment with GPU computing in the Jupyter notebook. We'll teach you techniques for automatically parallelizing certain kinds of array functions, as well as how to create and launch CUDA kernels entirely from Python. At the end of the lab, we'll demonstrate how Numba can be combined with Dask for distributed computing on a GPU cluster. Prerequisites: familiarity with CUDA, Python, and NumPy. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab Stanley Seibert, Director of Community Innovation, Continuum Analytics
Siu Kwan Lam, Software Developer, Continuum Analytics
S7127 - cuMF_sgd: Fast and Scalable Matrix Factorization on GPUs Matrix factorization (MF) has been widely used in recommender systems, topic modeling, word embedding, and more. Stochastic gradient descent (SGD) for MF is memory bound, and single-node CPU systems with caching perform well only for small datasets. Distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connections. This observation inspired us to accelerate MF by utilizing GPUs' high memory bandwidth and fast intra-node connections. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload scheduling schemes, batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. batch-Hogwild!, a vectorized version of Hogwild!, especially overcomes the issue of memory discontinuity. On three datasets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1x to 28.2x as fast as state-of-the-art CPU solutions on 1 to 64 CPU nodes. 25-minute Talk Wei Tan, Research Staff Member, IBM T. J. Watson Research Center
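To make the computation concrete, here is a tiny CPU-only NumPy sketch of the SGD-for-matrix-factorization idea that cuMF_SGD parallelizes across thousands of GPU cores. All sizes, the learning rate, and the names are our own toy choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 15, 4                                       # "users", "items", latent factors
R = rng.normal(size=(m, k)) @ rng.normal(size=(n, k)).T   # toy rank-k ratings matrix

P = rng.normal(scale=0.1, size=(m, k))                    # user factor matrix
Q = rng.normal(scale=0.1, size=(n, k))                    # item factor matrix
obs = np.array([(i, j) for i in range(m) for j in range(n)])  # observed (user, item) pairs
lr, reg = 0.05, 1e-4

for epoch in range(50):
    rng.shuffle(obs)                                      # stochastic visit order
    for i, j in obs:
        e = R[i, j] - P[i] @ Q[j]                         # prediction error on one entry
        P[i] += lr * (e * Q[j] - reg * P[i])              # SGD update of the two factor rows
        Q[j] += lr * (e * P[i] - reg * Q[j])

rmse = np.sqrt(np.mean((R - P @ Q.T) ** 2))               # small after training
```

Each update touches only two short factor rows, so memory access dominates; cuMF_SGD's batch-Hogwild! scheme runs many such updates concurrently without locks, which is why GPU memory bandwidth and the scheduling scheme matter so much.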
S7255 - cuTT: A High-Performance Tensor Transpose Library for GPUs We'll introduce cuTT, a tensor transpose library for GPUs that on average achieves over 70% of the attainable memory bandwidth, independent of tensor rank. Tensor transposing is important in many applications such as multi-dimensional Fast Fourier Transforms and deep learning, and in quantum chemistry calculations. Until now, no runtime library existed that fully utilized the remarkable memory bandwidth of GPUs and could perform well independent of tensor rank. We'll describe two transpose algorithms, "Tiled" and "Packed," which achieve high-memory bandwidth in most use cases, as well as their variations that take care of many important corner cases. We'll also discuss a heuristic method based on GPU performance modeling that helps cuTT choose the optimal algorithm for the particular use case. Finally, we'll present benchmarks for tensor ranks 2 to 12 and show that cuTT, a fully runtime library, performs as well as an approach based on code generation. 25-minute Talk Antti-Pekka Hynninen, Developer Technology Engineer, NVIDIA
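For readers unfamiliar with the operation, a tensor transpose is an axis permutation with the result materialized in the new memory layout; in NumPy terms (the rank and permutation below are chosen arbitrarily for illustration):

```python
import numpy as np

a = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)  # rank-4 tensor
perm = (3, 0, 2, 1)                          # new ordering of the old axes
b = np.ascontiguousarray(a.transpose(perm))  # actually move the data, not just a view

# Under this permutation, b[j0, j1, j2, j3] == a[j1, j3, j2, j0]
assert b.shape == (5, 2, 4, 3)
```

`a.transpose` alone only creates a strided view; the copy into a contiguous layout is the bandwidth-bound step that a library like cuTT optimizes on GPUs.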
S7452 - Cutting Edge OptiX Ray Tracing Techniques for Visualization of Biomolecular and Cellular Simulations in VMD We'll present the latest advances in the use of NVIDIA OptiX for high-fidelity rendering of state-of-the-art biomolecular and cellular simulations. We'll present the latest technical advances in the OptiX-based ray-tracing engines in VMD, which are heavily used both for interactive progressive ray tracing (local and remote) and for batch-mode in-situ or post-hoc visualization of petascale molecular dynamics simulations. 25-minute Talk John Stone, Senior Research Programmer, University of Illinois Urbana-Champaign
S7401 - Daino: A High-level Framework for Parallel and Efficient AMR on GPUs We'll present a high-level framework for producing parallel and efficient adaptive mesh refinement code on GPU-accelerated supercomputers. AMR methods reduce computational requirements of problems by increasing resolution for only areas of interest. However, in practice, efficient AMR implementations are difficult, considering that the mesh hierarchy management must be optimized for the underlying hardware. Architecture complexity of GPUs can render efficient AMR to be particularity challenging in GPU-accelerated supercomputers. We'll present a compiler-based, high-level framework that can automatically transform serial uniform mesh code annotated by the user into parallel adaptive mesh code optimized for GPU-accelerated supercomputers. We show experimental results on three production applications. The speedups of code generated by our framework are comparable to hand-written AMR code while achieving good strong and weak scaling up to 3,640 GPUs. 25-minute Talk Mohamed Wahib, Postdoctoral Researcher, RIKEN Advanced Institute for Computational Science
S7577 - Data Science Bowl Lung Challenge Deep learning is currently overhauling the field of medical image analysis and computer-aided diagnosis. Recent results in various areas show that deep networks that analyze the contents of medical images, trained with large amounts of data, obtain results close to or better than human experts for diagnostic tasks in radiology, pathology, ophthalmology, and dermatology. One particular area is the analysis of chest computed tomography (CT) scans. This is of special interest because screening with low-dose CT for lung cancer is currently being implemented on a large scale in the United States and other countries, after large studies have shown that this is the most promising strategy to reduce the number of deaths due to lung cancer, by far the largest cancer killer. Screening for lung cancer will produce many millions of CT scans that under current guidelines would have to be analyzed by radiologists. Automation could streamline and improve that process, and reduce the high costs associated with screening. We'll show the background of CT image analysis, explain how clinical experts read CT scans following the current guidelines, and show results from deep learning. 25-minute Talk Bram van Ginneken, Professor of Medical Image Analysis, Radboud University Medical Center
S7693 - Data Science Bowl to Improve Lung Cancer Screening The Data Science Bowl (DSB), sponsored by Booz Allen Hamilton and Kaggle, is the premier data science for social good competition, catalyzing the world's data science community. DSB 2017, organized in collaboration with the National Cancer Institute, seeks to improve on the accuracy of low dose computed tomography, currently the best method for lung cancer screening. Teams competed to develop open-source algorithms using artificial intelligence techniques to reduce false positives. Hear about the research leading up to DSB 2017 and about the top placing teams' prize-winning algorithms ($1M prize purse provided by the Laura and John Arnold Foundation). 80-minute Tutorial Keyvan Farahani, Program Director, National Cancer Institute
William Cukierski, Head of Competitions, Kaggle
Steven Mills, Director of Machine Intelligence, Booz Allen
Mark-Jan Harte, Aidence
Anna Fernandez, Health Informatics/Precision Medicine Lead, Booz Allen
Josh Sullivan, Senior Vice President, Booz Allen
Eric Syphard, Chief Technologist, Booz Allen
S7667 - DeepAD: Alzheimer's Disease Classification via Deep Convolutional Neural Networks Using MRI and fMRI Applications of deep learning have expanded rapidly into the healthcare industry, including medical imaging, over the past few years. Alzheimer's disease is a type of dementia that causes problems with memory and behavior. Symptoms usually develop slowly and worsen over time. Although there is currently no definitive treatment for Alzheimer's disease and research is still ongoing, early prediction of this brain disorder will save lives and dramatically decrease the cost of care for patients. We developed a deep learning-based pipeline using convolutional neural networks architecture to recognize Alzheimer's structural and functional magnetic resonance imaging data in older adults. In this work, we placed emphasis on both data augmentation and architecture when designing the pipeline. The pipeline resulted in high accuracy rates for both MRI and fMRI data. The flexibility of the design enables researchers to use the pipeline to predict other brain disorders as well. 25-minute Talk Saman Sarraf, Algorithm Engineer - Machine Learning Scientist, Konica Minolta
S7752 - Deep Dive on DGX Deep Learning Frameworks: Engineered for Performance Data science practitioners can find themselves investing significant effort in tuning popular open source distributions to improve deep learning performance. NVIDIA engineering teams bring extensive skills and expertise in improving today's popular deep learning frameworks for maximized performance on NVIDIA DGX systems. Attend this session to learn: (1) the genesis for NVIDIA's unique, integrated software stack built on NVDocker container technology, (2) how NVIDIA engineering optimizes deep learning frameworks for I/O data path performance, along with integration with cuDNN and cuBLAS, and how multi-GPU scale and performance is maximized with NCCL, and (3) why DGX users can quickly deploy a system, and expect a seamless, streamlined out of the box experience. 50-minute Talk Michael O'Connor, Senior Engineering Manager, Deep Learning, NVIDIA
S7680 - Deep Incremental Scene Understanding We'll demonstrate recent advances in the field of deep learning and computer vision aimed at scene understanding from images. We'll present two research works on this subject. The first one relates to the use of deep learning for monocular simultaneous localization and mapping (SLAM) and semantic segmentation. The outcome is a technique able to carry out accurate real-time semantic mapping and 3D reconstruction from a single RGB camera. Since in many computer vision problems a single prediction cannot express the uncertainty or ambiguity that is given in a scene, the second research work that we'll present employs deep learning for solving ambiguous prediction problems. Finally, we'll demonstrate how the two approaches can be merged together to enable robust extraction of 3D semantic information such as pixel-wise labeling and object detection in real time by means of a simple webcam. 25-minute Talk Federico Tombari, Senior Research Scientist, Technical University of Munich (TUM)
Christian Rupprecht, Graduate Student, Technical University of Munich (TUM)
S7549 - Deep Learning Acceleration of Progress toward Delivery of Fusion Energy Expediting delivery of fusion power -- identified by the 2015 CNN "Moonshots for the 21st Century" series as one of six grand challenges for the modern world -- can be enabled by engaging big-data-driven machine/deep learning predictive methods. Princeton's associated project has access to over half a petabyte of the EUROFUSION/JET disruption database, and its new FRNN (Fusion Recurrent Neural Net) code exhibits excellent scaling to nearly 200 GPUs. We'll target extending this exciting trend on NVIDIA's powerful SATURN V to its nearly 1,000 GPUs (124 nodes with eight Pascal P100 GPUs per node) in time for presentation at GTC 2017. 50-minute Talk William Tang, Principal Research Physicist, Princeton University
S7844 - Deep Learning: An Artificial Brain That Detects Any Type of Cyber Threat Join our presentation on the first application of deep learning to cybersecurity. Deep learning is inspired by the brain's ability to learn: once a brain learns to identify an object, its identification becomes second nature. Similarly, as a deep learning-based artificial brain learns to detect any type of cyber threat, its prediction capabilities become instinctive. As a result, the most evasive and unknown cyber-attacks are immediately detected and prevented. We'll cover the evolution of artificial intelligence, from old rule-based systems through conventional machine learning models to current state-of-the-art deep learning models. 25-minute Talk Eli David, CTO, Deep Instinct
S7554 - Deep Learning Application Development on Multi-GPU/ Multi-Node Environment We'll show a brief overview of our deep learning applications such as image recognition and taxi demand forecasts and how we have accelerated our development using NVIDIA Docker, the NVIDIA DGX-1 AI supercomputer, and tens of GPU servers. As deep learning applications become widespread, it becomes more essential for engineers to quickly adapt deep learning to new data and to efficiently seek optimal configurations. To improve the development speed by engineers on the shared GPU resources, we developed a job management system, which provides the separated learning environment for each engineer using NVIDIA Docker and queuing functions on the multi-GPU/multi-node system. This system helps us improve our productivity and create more sophisticated solutions to offer better services. 25-minute Talk Toshiki Sakai, Data Scientist, NTT DOCOMO, INC.
S7210 - Deep Learning Applications for Embedded Avionics on the Jetson Platform We'll discuss the uses and tradeoffs of semantic segmentation and detection networks when deployed on the Jetson TX1. There is significant research into deep learning semantic segmentation and detection networks, since these can both detect and localize numerous objects within an image. We use FCN (fully convolutional network) as an example of a semantic segmentation network, and the DIGITS DetectNet as an example of a detection network. These networks require significant computing resources for inferencing, and within embedded avionics applications we wish to provide the best tradeoff of performance per watt by leveraging these networks on the Jetson TX1. We'll explore characteristics of these deep learning networks, how their capabilities can be utilized on the Jetson TX1 platform, and characterize their runtime performance on the Jetson TX1 compared to larger GPU systems. 25-minute Talk Aaron Mosher, Design and Analysis Engineer, The Boeing Company
S7378 - Deep Learning Approaches to Timeseries Data A survey of successful deep learning (DL) applications within several domains featuring continuous streaming (time-series) data, with an overview of which network architectures have yielded results and why these networks work. Architectures reviewed include RNNs (dynamic models and prediction), CNNs (for frequency-transformed time-series data, i.e., spectrograms), autoencoders (anomaly detection and unsupervised data-structure visualization), and deep MLPs (sliding-window event detection and classification). Example case studies: industrial (industrial robotics, automotive telematics, prognostics/zero-downtime), IoT (event and anomaly detection, information leakage attacks/defenses), and financial (limit books, mortgage risk markets). 25-minute Talk Jeff Weiss, Director, Solution Architects West Territory, NVIDIA
Miro Enev, Solution Architect & Certified Instructor, Deep Learning Institute, NVIDIA
S7437 - Deep Learning-Based Accelerated Analytics for Medical Imaging Medical Accelerated Analytics spans electronic health records, medical imaging, genomic data, and more, with medical imaging accounting for more than 90 percent of the data. How can medical big data be applied in clinical practice? This question concerns medical and computational researchers alike, and deep learning and GPU computing provide an excellent answer. We'll introduce our research on deep learning-based disease diagnosis, such as for Alzheimer's disease and mild cognitive impairment, and discuss the current status and approaches of deep learning-based medical Accelerated Analytics. 25-minute Talk Di Zhao, Dr., Chinese Academy of Sciences
S7457 - Deep Learning Demystified What is deep learning? In what fields is it useful, and how does it relate to artificial intelligence? Join this session to get a working understanding of deep learning and why this powerful new technology is getting so much attention. Learn how deep neural networks are trained to perform tasks with super-human accuracy, and the challenges organizations face in adopting this new approach. We'll also cover the software, hardware, and training resources that many organizations are using to overcome the challenges and deliver breakthrough results. 50-minute Tutorial Will Ramey, Director, Developer Marketing, NVIDIA
S7465 - Deep Learning for 3D Design and Making We'll look at the application of deep learning to design information to provide AI-assisted 3D design as well as AI-assisted robotic assembly during the manufacturing process. Autodesk is working on facilitating a more efficient and open design-manufacture-use cycle using intelligent sensors, data aggregation, and deep learning. We'll discuss the DeepForm project for generating novel 3D forms as well as an intelligent robotic assembly project for making industrial robotic assembly a closed loop, general-purpose solution that is amenable to environmental and design changes. 50-minute Talk Yotto Koga, Software Architect, Autodesk, Inc.
Massimiliano Meneghin, Principal Research Scientist, Autodesk, Inc.
S7732 - Deep Learning for Condition Assessment of Civil Infrastructure Systems We'll present the use of deep learning for autonomous condition assessment of civil infrastructure systems. Regular inspection of civil infrastructure systems is crucial for safe operations. Manual inspection is currently the predominant method of inspection and is time-consuming, tedious, and subjective. A less time-consuming and inexpensive alternative is the use of optical instrumentation (for example, digital cameras), where the feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading experts in the field. Due to recent advances in using CNNs, the vision-based classification performance of computers has improved significantly. A CNN learns the appropriate classification features that in traditional algorithms were hand-engineered. Eliminating the dependence on prior knowledge and human effort in designing features is a major advantage of CNNs. We'll discuss CNN-based approaches for condition assessment of infrastructure systems, including a new framework that combines a deep convolutional neural network and a naive Bayes classifier to detect cracks in videos. The crack patches are spatially and temporally clustered, and the posterior probabilities of their being real cracks are derived. Experimental tests have been carried out to evaluate the performance of the proposed system. 25-minute Talk Mohammad Jahanshahi, Assistant Professor, Purdue University
L7147 - Deep Learning for Genomics using DragoNN with Keras and Theano (Presented by NVIDIA Deep Learning Institute) In this lab, we use the dragonn toolkit on simulated and real regulatory genomic data, demystify popular DragoNN (Deep RegulAtory GenOmics Neural Network) architectures, and provide guidelines for modeling and interpreting regulatory sequence using DragoNN models. We will answer questions such as: When is a DragoNN a good choice for a learning problem in genomics? How does one design a high-performance model? And, more importantly, can we interpret these models to discover predictive genome sequence patterns and gain new biological insights? 120 Instructor-Led Lab Charles Killam, Senior Certified Instructor, NVIDIA
Johnny Israeli, Biophysics PhD Candidate & SIGF Bio-X Fellow, Stanford
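Conceptually, the first layer of such a model is a 1D convolution scanning one-hot-encoded DNA for sequence motifs. This hand-written NumPy sketch uses a fixed filter where a DragoNN model would learn one; the motif, sequence, and all names are illustrative and not taken from the lab.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    return np.array([[b == base for base in BASES] for b in seq], dtype=float)

def conv_scan(seq, filt):
    """Slide the filter along the sequence, scoring every window (valid convolution)."""
    x, w = one_hot(seq), filt.shape[0]
    return np.array([np.sum(x[i:i + w] * filt) for i in range(len(seq) - w + 1)])

motif_filter = one_hot("TGACGT")        # fixed toy filter; a trained model learns these weights
scores = conv_scan("AATGACGTCC", motif_filter)
best = int(np.argmax(scores))           # window where the motif matches
```

In the lab, stacks of such learned filters (plus nonlinearities and pooling) are trained with Keras and Theano, and the interpretation methods ask which input positions drive the model's score.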
S7381 - Deep Learning for Human-Centered Semi-Autonomous Driving We'll show how deep convolutional networks can be used to sense both the state of the driver and the external driving scene to achieve a safe, semi-autonomous driving experience. At the core of our talk is a demonstration using NVIDIA DGX-1 and NVIDIA DRIVE PX 2 to train and run, respectively, a deep end-to-end network. The network takes the visual scene both inside and outside the car as inputs, and produces shared-control decisions as output. The demo presents a case study of a distracted driver in imminent danger, and shows how an intelligent shared autonomy system can step in to determine a safe trajectory that avoids hazards. We will also address the challenges of semi-autonomous driving and discuss how deep learning can help solve those challenges with both decoupled sensing-planning and end-to-end learning approaches. 50-minute Talk Lex Fridman, Postdoctoral Researcher, Massachusetts Institute of Technology (MIT)
L7126 - Deep Learning for Image and Video Captioning (Presented by NVIDIA Deep Learning Institute) Effective descriptions of the content within images and video clips have been generated with convolutional and recurrent neural networks. Attendees will apply a deep learning technique via a framework to caption images and generate their own captions. Prerequisite: Familiarity with deep learning and a framework. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 240 Instructor-Led Lab Allison Gray, Solutions Architect, NVIDIA
S7711 - Deep Learning for Long-Term Value Investing We'll introduce the work being done at Quantenstein GmbH, a joint venture between Swiss AI startup NNAISENSE and Acatis Investment, that harnesses the latest advances in deep learning to automatically build custom portfolios for long-term value investing based on company fundamentals. The efficient GPU implementation of deep learning architectures and distributed computation are essential to Quantenstein's mission, enabling the testing of financial models in a walk-forward fashion, where retraining the entire system can be done monthly. We'll introduce the trading framework, learning process, principles guiding our design decisions, and show how deep learning and GPU computing make it possible to learn everything end-to-end, taking the human out of the loop. 25-minute Talk Jonathan Masci, General Manager, Quantenstein GmbH
L7135 - Deep Learning for Medical Image Analysis using R and MXNet (Presented by NVIDIA Deep Learning Institute) Convolutional neural networks (CNNs) have proven to be just as effective in visual recognition tasks involving non-visible image types as they are with regular RGB camera imagery. One important application of these capabilities is medical image analysis, where we wish to detect features indicative of medical conditions and use them to infer patient status. In addition to processing non-visible imagery, such as CT scans and MRI, these applications often require us to process higher-dimensionality imagery that may be volumetric and have a temporal component. In this lab you will use the deep learning framework MXNet to train a CNN to infer the volume of the left ventricle of the human heart from a time series of volumetric MRI data. You will learn how to extend the canonical 2D CNN to apply to this more complex data and how to directly predict the ventricle volume rather than generating an image classification. In addition to the standard Python API, you will also see how to use MXNet through R, which is an important data science platform in the medical research community. Prerequisites: Basic knowledge of CNNs. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab Charles Killam, Senior Certified Instructor, NVIDIA
Abel Brown, Solutions Architect, NVIDIA
S7653 - Deep Learning for Medical Knowledge Extraction from Unstructured Biomedical Text We'll present work in progress on a deep learning system that extracts expert-level knowledge from the published and less formal medical literature. Using a large curated source of 5 million biomedical journal articles, disease encyclopedias such as The Merck Manuals and The Mayo Clinic's Guide to Diseases and Conditions, as well as hospital-based physician reference material, we'll demonstrate that it's possible to infer existing medical concepts such as disease-disease, disease-symptom, and disease-drug relationships with an unsupervised deep learning model. We'll extend this model to show that it's capable of answering multiple-choice medical questions that are typically given to medical students as part of the licensing examination. 25-minute Talk Andrew Beam, Postdoctoral Fellow, Harvard Medical School
S7587 - Deep Learning for Predictive Maintenance This talk is dedicated to predicting machine failures (predictive maintenance, or PdM). We'll clearly set the goal, present the methodology, and sketch estimates of the size of the market, including automotive, oil and gas, chemistry, energy, etc. We'll then present new prediction techniques, including deep learning, along with a broad performance comparison to state-of-the-art PdM methods and an approach to long-period prediction with DL models. We'll show the gain and its origins in detail. We'll introduce two approaches: a centralized PdM system and autonomous predictive maintenance devices. The former is the best option for IIoT-type problems – where all the monitored devices are constantly connected to the internet – and the latter broadens the range of PdM to devices for which network connections are costly or unavailable, such as cars, trains, or mining equipment. Within the centralized system we use NVIDIA Tesla GPUs, and for the autonomous devices we use NVIDIA Tegra chipsets, which gives us both energy and computational efficiency. Finally, we'll present case studies of real, production data and the experience gathered while implementing solutions for our clients. 25-minute Talk Pawel Morkisz, CTO, Reliability Solutions
Mateusz Marzec, CEO, Reliability Solutions Sp. z o.o.
S7690 - Deep Learning for Retail Analytics and Reference Data Management We'll show how state-of-the-art deep learning techniques can be applied to retail analytics. Namely, we'll show how one can retrieve various kinds of information about a product, including its category and ingredients, using a mixture of visual and textual information. We'll start by depicting the business scenario and operational needs of such a system, and then move into a technical, in-depth discussion of the underlying deep learning pipeline. The solution is based on an interplay of region-based convolutional neural networks and NLP techniques. This is a joint effort of Nielsen and a partner company. 25-minute Talk Alessandro Zolla, VP Technology - Machine Learning Program Lead, Nielsen
Robert Bogucki, Chief Science Officer,
S7701 - Deep Learning for the IoT: Leveraging Representation Learning Machine learning applications for the Internet of Things (IoT) pose unique challenges and necessitate understanding of large-scale multi-dimensional heterogeneous sensor data at varying granularities. We'll highlight the unique challenges posed by IoT applications especially for deep learning algorithms and we'll present some work on leveraging representation learning in conjunction with deep learning to design successful algorithms for these problems. We'll demonstrate the effectiveness of the proposed approaches on real-world IoT use cases. The proposed deep representation learning models are each trained using an NVIDIA Tesla M40 GPU. Finally, we'll discuss a technology view of deep learning in the context of IoT. 25-minute Talk Mohak Shah, Head of Data Science, Bosch AI Research
S7737 - Deep Learning Frameworks with Spark and GPUs Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. In parallel, GPU clusters are fast becoming the default way to quickly develop and train deep learning models. As data science teams and data-savvy companies mature, they'll need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage. We'll discuss and show in action TensorFlowOnSpark, CaffeOnSpark, DeepLearning4J, IBM's SystemML, and Intel's BigDL, as well as distributed versions of various deep learning frameworks, namely TensorFlow, Caffe, and Torch. 50-minute Talk Subbu Rama, CEO, Bitfusion
S7348 - Deep Learning in's Autonomous Vehicles We'll provide an overview of the models and Ford are using to fuse sensor information, and give examples of the performance optimization. and Ford are leveraging deep learning for autonomous vehicle perception across a multitude of sensors. It is important that these models have optimized performance to process high-resolution images, lidar point clouds, and other sensor inputs in a timely fashion. We will discuss how and Ford are exploring a variety of methods to push the run-time performance to new limits and maximize the use of the resources available, including modifying the underlying models, data structures, and the inference engine itself.   25-minute Talk Bryan Goodman, Staff Engineer, Machine Learning, Argo AI
S7360 - Deep Learning in Business Conversation Analysis Gridspace uses GPU-accelerated deep learning to analyze conversational speech on phone calls. We'll outline our DNN-based approach as well as several commercial applications of call grading. Our GPU-based software stack provides a novel way to process large-scale speech data. Results from a recent case study show call grading to be as accurate as human call grading and highly scalable in production. Deep call analysis with 100% coverage has never been achieved before. We'll also discuss how this system can be improved by training continuously without expert supervision. 25-minute Talk Anthony Scodary, EVP of Engineering, Co-founder, Gridspace
Wonkyum Lee, S/W Engineer, Gridspace
H7126 - Deep Learning Inference with TensorRT Are you ready to start using deep learning to enable features or capabilities in an app or device? Do you need more throughput for a DNN in the cloud or lower latency in an embedded device? Attend to learn about the TensorRT deep learning inference software. Experts will be standing by to talk about your use case and to discuss recent developments such as reduced-precision inference, user-defined custom layers, and recurrent neural network (LSTM/GRU) support. 1 Hour Connect with the Experts Chris Gottbrath, Accelerated Computing Product Manager, NVIDIA
S7639 - Deep Learning in Medical Imaging: Opportunities and New Developments Learn about some of the key opportunities for deep learning in medical imaging, some of the current challenges, and exciting recent developments that are tackling them. We'll begin with a brief overview of medical imaging, current challenges for human observers of these images, and key applications for deep learning for improving image interpretation. We'll follow with descriptions of several specific use cases for deep learning in radiology, pathology, urology, and ophthalmology imaging, including improvements in image diagnosis that are besting state-of-the-art computerized diagnosis algorithms, approaches for visualizing and explaining to physicians what deep networks have learned to improve confidence in using the information they provide to guide decision making, and new, freely available tools to dramatically enhance the efficiency of creating new deep learning models. We'll provide links for more information about tools and information so attendees can try their hand at tackling problems in this exciting domain. Finally, we'll give a live demonstration for a portable deep learning package optimized for medical imaging. 50-minute Talk Darvin Yi, Graduate Student, Stanford University
Daniel Rubin, Associate Professor of Biomedical Data Science, Radiology, Medicine (Biomedical Informatics Research), and by courtesy, Ophthalmology, Stanford University
S7222 - Deep Learning in the Connected Kitchen We'll present Innit's work applying deep learning technology to build a platform that powers the connected kitchen of the near future. We've been carrying out pioneering work in the applications of modern computing technology to tackle problems in the food space, with a specific focus on empowering the very personal relationship between people and food. Throughout the food ritual (from planning and shopping to cooking and serving), Innit connects information about food with personal preferences and needs, and delivers actionable information via multiple channels such as mobile apps and embedded user interfaces at home and at the store. Deep learning makes multiple appearances in this process, from the latest in CNN-based object detection and classification, to using CNN features for image retrieval and matching, to advanced sensing in extreme environments such as an operating oven. 25-minute Talk Hristo Bojinov, CTO, Innit, Inc.
S7722 - Deep Learning in the Healthcare Enterprise Deep learning tools present a tremendous opportunity to improve healthcare. By increasing the efficiency and accuracy of diagnostic testing, and elevating meaning from vast troves of clinical data, deep learning provides a pathway to true precision care. However, there are challenges in the translation of this technology to the clinic: model performance, infrastructure development, data privacy, hospital policy, and vendor relationships are all critical components of this effort. We'll discuss the early experience of the MGH & BWH Center for Clinical Data Science in supporting the translation of deep learning technologies in medicine, touching upon many of the existing and emerging technical, clinical, and cultural challenges that this work presents. 25-minute Talk Mark Michalski, Executive Director, Massachusetts General Hospital & Brigham and Women's Hospital Center for Clinical Data Science
S7157 - Deep Learning Meets Motor Sports at ROBORACE Self-driving technology meets motorsport with the Roborace series. Learn how the tech is making its way onto the track, experience exciting milestones achieved and discover what to expect in the near future. This session will cover relevant AI technologies in the Robocar and highlight how software is defining the future of the auto industry and motor racing.  25-minute Talk John Waraniak, Vice President of Vehicle Technology, Specialty Equipment Market Association, SEMA
Bruce Falls, Director of Engineering, AVL
S7768 - Deep Learning Models for Time Series Data Analysis with Applications to Healthcare Many emerging applications of big data involve time series data. We'll discuss a collection of deep learning models to effectively analyze and model large-scale time series data. We'll show experiment results to demonstrate the effectiveness of our models in healthcare. 50-minute Talk Yan Liu, Associate Professor, University of Southern California
S7420 - Deep Learning of Cancer Images for Precision Medicine We'll demonstrate a deep learning framework to predict survival of lung cancer patients by using convolutional networks to learn high-dimensional representations of tumor phenotypes from CT images and clinical parameters. We'll evaluate our framework on three independent cohorts with survival data, and show how the addition of clinical data improves performance. Furthermore, we'll describe how image noise can improve the robustness of our model to delineation errors and introduce the concept of priming, which helps improve performance when trained on one cohort and tested on another. 25-minute Talk Olivier Gevaert, Assistant Professor, Stanford University
S7562 - Deep Learning to Enable Real-Time Gravitational Wave and Multimessenger Astrophysics The aLIGO Advanced Laser Interferometer Gravitational-Wave Observatory came online last year and very rapidly produced data confirming Einstein's prediction of gravitational waves. This discovery and the success of the detection device open the door for another dimension to be added to and combined with other electromagnetic detection devices (telescopes, radio telescopes, etc.) to dramatically increase the potential to understand the workings of deep space and astronomical phenomena at the origins of the universe. The project used data produced by the CACTUS HPC simulation to produce datasets that were used to train a DNN using the MXNet framework. The result was that prediction accuracy increased over classical waveform analysis while the hardware required dropped from hundreds of CPUs to a single GPU, with prediction achieved at a latency of 1 millisecond. The work was done on the Blue Waters supercomputer and at the Innovation Lab at NCSA. The reduction in the "pipeline size" (the number of CPUs needed to make a detection) and the improved latency open up the potential for multi-messenger astrophysics, where an observation that is "heard" with the gravitational wave detector can be used to tell a detector in the visible or EM spectrum where to look. 25-minute Talk Eliu Huerta, Gravity Group Leader, University of Illinois at Urbana-Champaign
Daniel George, Scientist, University of Illinois at Urbana-Champaign, National Center for Supercomputing Applications
L7104 - Deep Learning Using Microsoft Cognitive Toolkit This lab will provide hands-on experience with Microsoft's open-source, production-grade deep learning Cognitive Toolkit, formerly CNTK. The Cognitive Toolkit is used in several Microsoft products for training and evaluating deep neural networks. The same features are available to everyone outside Microsoft and are supported on both Windows and Linux platforms with Python/C++ APIs. The Cognitive Toolkit supports feed-forward, convolutional, and recurrent networks, and reinforcement learning, for speech, vision, and text data, also in combination. The hands-on lab will help you build end-to-end use cases, from a basic FCN to more advanced CNNs, RNNs/LSTMs, and auto-encoders in different domains. You'll also learn how the toolkit leverages multiple GPUs for advanced optimization, and run the models on the Azure cloud. Attendees need to install the CNTK binaries on their local machines if they want to have a hands-on experience on their local machines. The instructions can be found here - This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab Sayan Pathak, Principal Engineer and ML Scientist, Microsoft
S7520 - DeepLumen: Fast and Accurate Segmentation of Coronary Arteries for Improved Cardiovascular Care Learn about HeartFlow's unique approach for better diagnosis and treatment of cardiovascular disease. From CT images, HeartFlow creates a complete geometric and physiologic model of the patient's coronary anatomy. Blood flow is simulated using computational fluid dynamics to functionally assess narrowings of the coronary artery. HeartFlow's approach is approved by regulatory bodies and in commercial use around the world today. We'll focus on DeepLumen, the fast and highly accurate method for extracting coronary arteries from a CT scan. It is formulated as a novel 3D rotational CNN that exploits translational and cyclic symmetries. DeepLumen is shown to be at least as accurate as expert radiologists in quantifying disease compared to invasive catheterization measurements. 25-minute Talk Kersten Petersen, Senior Medical Imaging Researcher, HeartFlow
L7136 - Deep Multitask Prediction with Digital Health Data In multitask learning, we aim to improve performance on multiple prediction tasks by solving them simultaneously using models that are related. Neural networks can especially benefit from multitask training in ways that simpler (linear) models cannot. Although multitask neural nets, which were first proposed over 20 years ago, are conceptually simple to design, they can present unexpected challenges. In this lab, we will demonstrate how to build and successfully train multitask neural networks to predict multiple clinical outcomes simultaneously from publicly available digital health data using DeepLearning4J (DL4J). We will also show how to train a similar model using the Keras frontend for TensorFlow and import the resulting model into DL4J for deployment. Prerequisite: Basic knowledge of any programming language. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab David Kale, Deep Learning Engineer, Skymind
S7373 - Deep Neural Networks for Non-Equilibrium Molecular Dynamics Molecular dynamics simulation of matter far from equilibrium presents one possible approach to the discovery of non-equilibrium constitutive relations, but is limited to coarse-grained Hamiltonians that include electronic effects only implicitly. We'll explore the possibility that deep neural networks -- when trained over the appropriate atomic states -- may provide the Hamiltonian for a molecular dynamics simulation, thus providing a sub-grid representation of variables at spatial and temporal scales that cannot otherwise be explicitly resolved. The advent of GPU-accelerated training of deep neural networks, and specifically recent improvements to the cuDNN library, now makes it feasible to handle the large, high-dimensional datasets incumbent to such systems. Finally, we'll elucidate a few of the challenges inherent in DNN-coupled dynamics, such as obeying the constraints of momentum and energy conservation. 25-minute Talk Jonathan Belof, Physicist, Lawrence Livermore National Laboratory
Edward W. Lowe, Jr. (Will), Senior Data Scientist, FitNow, Inc
S7468 - Deep Packet Inspection Using GPUs In high-speed networks, packet-based network traffic monitoring and analysis applications require a large amount of computing power and high I/O throughput. These applications face extreme performance and scalability challenges. GPUs have been widely applied to accelerate general-purpose scientific and engineering computing. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. Fermilab network research group's prototype GPU-based network traffic monitoring and analysis system consists of two major components: a lossless packet capture engine that supports 10/40GE commodity NICs, using our WireCAP technology; and a complete set of GPU libraries for network traffic analysis. Our GPU libraries now support per-packet-based deep inspection analysis, and are anticipated to support per-flow-based deep inspection analysis very shortly. 25-minute Talk Wenji Wu, Principal Network Research Investigator, Fermilab
S7563 - Deep Patient: Predict the Medical Future of Patients with Deep Learning Precision medicine initiatives bring tremendous opportunities to speed up scientific discovery and promote quality improvement in medicine. However, it also raises big challenges in dealing with massive data from heterogeneous sources, such as electronic health records (EHRs), -omics, and wearables. Traditional data mining and statistical learning methods tend to favor clean and structured data, which may not be able to effectively utilize the rich information embedded in biomedical data. The latest breakthrough in deep learning technologies provides a unique opportunity to retrieve information from complex and heterogeneous sources. We'll review advances in deep learning applied to precision medicine and next-generation healthcare, with a special focus on Deep Patient, a general-purpose patient representation from EHRs that facilitates clinical predictive modeling and medical analysis. 50-minute Talk Riccardo Miotto, Research / Data Scientist, Icahn School of Medicine at Mount Sinai, New York
Joel Dudley, Associate Professor, Icahn School of Medicine at Mount Sinai, New York
L7110 - Deep Reinforcement Learning Agents on Atari 2600 Games (Presented by NVIDIA Deep Learning Institute) Learn the basic principles of reinforcement learning and develop a learning agent (a deep CNN trained with Q-learning) capable of playing classic Atari games. In this context, the neural network improves through in-game experience so as to choose the next best possible action by interpreting the screen's raw pixels along with the current score (action-value Q-learning). At the beginning of the lab, students will be given an "intermediate" agent (trained for ~20 hours) and asked to continue the improvement/training process on NVIDIA-provided GPUs. At the end of the lab, students will be able to play against their best network and take home code that they can use to train agents on other Atari games. Prerequisites: Introductory knowledge of Lua and/or Python. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab Jeff Weiss, Director, Solution Architects West Territory, NVIDIA
Eric Harper, Solutions Architect, NVIDIA
Miro Enev, Solution Architect & Certified Instructor, Deep Learning Institute, NVIDIA
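The action-value update at the heart of this lab's agent can be sketched in a few lines. This is a tabular toy standing in for the deep Q-network (the lab approximates Q with a CNN over raw pixels); the function name and constants are illustrative.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q(s, a) toward the
    bootstrapped target r + gamma * max_a' Q(s_next, a')."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    target = r + gamma * best_next
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Two states, two actions; the agent already values "left" in s1.
Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", 1.0, "s1")  # nudges Q("s0","right") toward 1.99
```

In the deep variant used in the lab, the table lookup is replaced by a CNN forward pass and the update becomes a gradient step on the same target.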
L7137 - Deep Reinforcement Learning for Gameplay and Robotics In this lab, you will learn the basics of Chainer and how to use ChainerRL by training an agent to play text-based games with OpenAI Gym on a Jupyter notebook. ChainerRL contains a set of Chainer implementations of deep reinforcement learning (DRL) algorithms. Following the success of DeepMind's Deep Q-Network (DQN) algorithm on Atari games, DRL has been applied to many tasks, from playing Go to robot control. ChainerRL runs on top of Chainer, one of the popular Python-based deep learning frameworks, which enables users to intuitively implement many kinds of models with great flexibility and with GPU performance comparable to other frameworks. ChainerRL already includes state-of-the-art DRL algorithms from DQN to DDPG to A3C, so that users can apply them in their reinforcement learning applications. Prerequisites: Basic knowledge of Python, deep learning, and reinforcement learning. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120 Instructor-Led Lab Shohei Hido, Chief Research Officer, Preferred Networks
S7621 - Deep Reinforcement Learning for Robotics Using DIANNE We'll show how a mobile robot arm can learn to locate and retrieve objects, such as soda cans, using deep reinforcement learning and the DIANNE framework. The robot is equipped with a Jetson TX1 embedded GPU to efficiently process sensory input generated by laser scanners, placed both in the environment and on the robot itself. Deep reinforcement learning allows an intelligent agent to solve complex planning problems with high-dimensional inputs in an efficient and generalisable way. While very promising for the field of robotics, integrating such learning into a physical system is not trivial, and additional simulation is often required to speed up the learning process. 25-minute Talk Sam Leroux, Ph.D. Researcher, Ghent University - imec
S7514 - Deep Representation and Reinforcement Learning for Anomaly Detection and Control in Multi-Modal Aerospace Applications We'll discuss how deep auto-encoders (DAE) and deep reinforcement learning (DRL) can be formulated to address multimodal anomaly detection and additive manufacturing control problems in the aerospace domain. DAE-based representation learning is constructed with a multi-layered neural-net architecture to model complex data non-linearity. We use DAEs via an NVIDIA GPU implementation for: (1) unsupervised fault disambiguation from big multimodal data, and (2) structural health monitoring (crack detection) from experimental video frames of aerospace material. In the second half of the talk, we show how a guided policy search (GPS) based DRL framework can be implemented for optimally planning and generalizing nozzle trajectory dynamics in a wide range of cold-spray additive manufacturing applications. 50-minute Talk Soumalya Sarkar, Senior Research Scientist, United Technologies Research Center
S7551 - Deep Unconstrained Gaze Estimation with Synthetic Data Gaze tracking in unconstrained conditions, including inside cars, is challenging where traditional gaze trackers fail. We've developed a CNN-based algorithm for unconstrained, head-pose- and subject-independent gaze tracking, which requires only consumer-quality color images of the eyes to determine gaze direction, and points along the boundary of the eye, pupil, and iris. We'll describe how we successfully trained the CNN with millions of synthetic photorealistic eye images, which we rendered on the NVIDIA GPU for a wide range of head poses, gaze directions, subjects, and illumination conditions. Among appearance-based gaze estimation techniques, our algorithm has best-in-class accuracy. 25-minute Talk Shalini De Mello, Senior Research Scientist, NVIDIA
S7588 - Deep Watershed Transform for Instance Segmentation Learn about the design, training, and analysis of a state-of-the-art, deep learning-based, instance-level segmentation pipeline enabled by NVIDIA DGX-1. Instance segmentation is the task of assigning semantic class labels to each pixel of an image (for example, car, person, etc.), as well as a coherent instance identifier such that every pixel belonging to the same object instance shares the same identifier. This has a wide array of applications, including object recognition and tracking, pose estimation, and scene understanding. In the context of autonomous driving, this will allow vehicles to accurately delineate multiple vehicles and pedestrians within an image. We'll present a simple yet powerful end-to-end convolutional neural network to tackle this task with state-of-the-art performance on the challenging Cityscapes Instance-Level Segmentation task. Our model consists of two independently trained individual deep neural networks with innovative training targets, followed by joint fine-tuning. The 30 million parameter network is trained on the new NVIDIA DGX-1 deep learning accelerator in approximately 30 hours. This is a 50% speedup compared to the NVIDIA Maxwell TITAN X, and is immeasurably faster than any CPU implementation. 25-minute Talk Min Bai, PhD Student, University of Toronto
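The notion of an instance identifier shared by every pixel of one object, central to the abstract above, can be illustrated in a drastically simplified, non-learned form by connected-component labeling of a binary mask. This sketch is my own illustration, not the talk's deep watershed method, which learns the grouping end-to-end.

```python
from collections import deque

def label_instances(mask):
    """Assign a distinct instance id (1, 2, ...) to each 4-connected
    region of 1s in a binary mask; background pixels keep label 0."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and labels[i][j] == 0:
                next_id += 1                     # new instance found
                queue = deque([(i, j)])
                labels[i][j] = next_id
                while queue:                     # flood-fill the region
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id
```

Connected components fail exactly where instance segmentation is hard (touching objects of the same class merge into one region), which is the gap the learned watershed transform addresses.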
S7763 - Deliver a Transformative 3D Graphics User Experience with VMware Horizon, Blast Extreme Accelerated Transport, and NVIDIA GRID Discover the benefits of virtualizing any desktop or application using VMware Horizon and NVIDIA GRID. Learn how NVIDIA GRID and VMware Blast Extreme Adaptive Transport (BEAT) now deliver a transformational user experience for LAN and WAN users; understand the graphics use cases enabled by BEAT; review NVIDIA GRID Performance Engineering benchmarks and results for BEAT; and see high-performance graphics environment demos, along with deployment and TCO considerations. 50-minute Talk Kiran Rao, Director, Product Management, VMware
S7203 - Delivering Immersive Experiences Through GPU Virtualization and Streaming We'll introduce the transition from the traditional workstation to the immersive experience workspace, and present novel NVIDIA and ESI technologies that combine streaming and virtualization for GPUs to provide scalable immersive virtual and augmented reality. We'll discuss the challenges in advancing to the immersive workspace for mobile, desk-side, or team-size immersive experiences through on-premise and cloud-based virtual engineering applications. 50-minute Talk Jan Wurster, Team Leader Software Development, ESI Group
S7808 - Deploying Embedded GPUs into Military Applications (Presented by Abaco) We'll explore how GPUs are being used in military applications (ground vehicles and avionics) and how we can ruggedize GPU technology for use in the harshest environments. Learn how high-bandwidth applications can stream data into the Jetson TX2 for real-time processing and situational awareness. We'll show how data-heavy networks coupled with embedded GPUs can be deployed into mobile platforms and deliver increasing capabilities and greater autonomy. Military open standards now cater to future technology insertion and GPU technology can be deployed into existing and future platforms to deliver deep learning at the edge of the battlefield. 50-minute Talk Ross Newman, Senior Field Applications Engineer, Abaco Systems
S7458 - Deploying Unique DL Networks as Micro-Services with TensorRT, User-Extensible Layers, and GPU Rest Engine Once you have trained your neural network to do some unique and interesting task, you might wonder how to make it available to colleagues, collaborators, or perhaps the world. One of the best ways to do that is to create a REST-based microservice. Then anyone with the URL can make a request and get an answer from your neural network. We'll show how three technologies come together to make that possible: 1. TensorRT provides low-latency, high-throughput inference; 2. Custom layer support in TensorRT allows you to express your unique deep learning secret sauce within TensorRT; 3. GPU Rest Engine gives you a fast and easy way to create a GPU-powered microservice. We'll show the steps necessary for you to start creating your own deep learning-powered microservices. 25-minute Talk Chris Gottbrath, Accelerated Computing Product Manager, NVIDIA
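The pattern this session describes, wrapping an inference engine behind a REST endpoint, can be sketched with nothing but the Python standard library. The `infer` function below is a hypothetical stand-in for a TensorRT execution context, not GPU Rest Engine's actual API; it exists only so the request/response plumbing is runnable.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def infer(pixels):
    """Stand-in for a TensorRT inference call: returns a fake class score.

    A real handler would copy `pixels` to the GPU, run the engine's
    execution context, and copy the output back.
    """
    return {"class": "cat", "score": round(sum(pixels) / (255.0 * len(pixels)), 3)}

class InferenceHandler(BaseHTTPRequestHandler):
    """POST a JSON list of pixel values, receive a JSON prediction back."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        pixels = json.loads(self.rfile.read(length))
        body = json.dumps(infer(pixels)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve requests: HTTPServer(("", 8000), InferenceHandler).serve_forever()
```

In a production setting the handler would also batch concurrent requests before invoking the engine, which is where most of the GPU throughput win comes from.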
S7822 - Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: malignant carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an AI capable of classifying skin cancer with dermatologist-level accuracy. 25-minute Talk Andre Esteva, PhD Candidate, Stanford University - Sebastian Thrun's Lab
S7687 - Designing Autonomous Vehicle Applications with Real-Time Multisensor Frameworks As embedded software in intelligent vehicles becomes more complex, researchers and engineers need more efficient tools and integration frameworks that simultaneously align ease-of-use, dynamism, execution performance, and portability. We'll introduce Intempora's RTMaps (Real-Time Multisensor applications) framework, which is a component-based design and execution middleware for software development, integration, and testing. This framework reduces software development cycle times and provides easy access to the DRIVE PX 2 capabilities. RTMaps supports most automotive sensors on the market for real-time execution, and also provides recording and synchronized playback capabilities for offline development, testing, and validation. RTMaps is now available on DRIVE PX 2. It offers a drag-and-drop approach for GPU-based computer-vision and AI systems, including an integration of the NVIDIA DriveWorks software modules as independent building-blocks. 25-minute Talk Nicolas du Lac, CEO, Intempora
S7614 - Design with Virtual Reality in Architecture, Engineering & Construction Learn how Gensler is using the latest technology in virtual reality across all aspects of the design process for the AEC industry. We'll cover how VR has added value to the process when using different kinds of VR solutions. Plus we'll talk about some of the challenges Gensler has faced with VR in terms of hardware, software, and workflows. Along with all of this, NVIDIA's latest VR visualization tools are helping with the overall process and realism of our designs. 25-minute Talk Scott DeWoody, Firmwide Creative Media Manager, Gensler
S7293 - Detecting Topological Changes in Dynamic Delaunay Triangulations Using CUDA Learn how to detect topological changes that occur in dynamic 2D Delaunay triangulations using CUDA. We'll present a novel, unified approach that can be applied in all those cases (pedestrian tracking, flocking, moving bubbles, etc.) where objects are triangulated starting from a density map. Topological changes are detected by comparing two subsequent triangulations, and they show up as "flipped edges." We'll show new physics results, enabled by the unprecedented detection statistics our implementation allows, on irreversible topological changes occurring in the triangulation of the droplets of a Lattice Boltzmann emulsion. Such changes are associated with the so-called plastic events that are responsible for the complex behavior of emulsions possessing both liquid and solid features at the same time. In our implementation, we used a suitable mix of in-house developed CUDA kernels and primitives from existing CUDA libraries. 25-minute Talk Massimo Bernaschi, Prof., National Research Council of Italy
S7519 - Developer Tools for Automotive, Drone, and Intelligent Camera Applications Embedded development systems are getting more powerful than ever. With this trend comes the ever-growing complexity of delivering real-time applications that can capitalize on all the potential computational horsepower of the system. The application developer needs to be able to design new software IP, easily port the application to the embedded system, and then optimize and maximize CPU and GPU utilization, data acquisition, and transfers to provide a reliable real-time visual computing experience that can fulfill even the most demanding computational requirements. In this tutorial, the audience will learn about recommended development flows for the latest embedded systems. We will cover the overall developer tools offering available for each of the software development kits provided for the Automotive, Embedded, and Mobile platforms. For each of these platforms, we will dissect and present important lessons from the development of showcase applications demonstrating advanced autonomous driving and intelligent video analytics use cases. The audience will learn what tools are available for each platform, the purpose of each tool, and the value proposition each one offers. 50-minute Talk Sebastien Domine, SW VP, Developer Tools, NVIDIA
S7824 - Developer Tools update in CUDA 9 This session will provide an overview of developer tools and what is changing in Nsight Eclipse for CUDA 9.0. 25-minute Talk Rafael Campana, Senior Engineering Manager, Developer Tools, NVIDIA
S7388 - Developing an Improved Generalized Eigensolver with Limited CPU Offloading We'll explore strategies to reduce CPU dependencies within existing hybrid CPU/GPU LAPACK routines, such as those implemented with the open-source MAGMA library. This will be carried out within the context of developing an improved generalized eigensolver, written in CUDA Fortran for the open-source Quantum ESPRESSO library. The solver aims to replace offloaded subblock CPU computations within the existing hybrid algorithms with GPU-resident subblock computations to limit dependencies on available CPU resources. Performance considerations and strategies used in developing the solver, including the use of profiling tools available within the CUDA toolkit, will be covered. Additionally, we'll provide an example of developing software using CUDA Fortran. 25-minute Talk Joshua Romero, Graduate Student, Stanford University
S7838 - Developing an Open Vehicle Platform for Active Safety Systems Using Deep Learning We'll give an overview of our research activities in the field of automated driving in Austria in collaboration with NVIDIA. (1) We are working on an open vehicle platform to provide a controllable car that can be used for various research projects; its open and well-defined interfaces allow partners to use it in joint projects. (2) We are investigating a proof of concept: whether traffic data, along with machine learning, can be used directly for the function design of an active safety system. (3) We'll discuss the requirements and possibilities of a diversified public test track in the alpine region for automated driving. (4) Finally, we'll present a fully digital development and test chain that allows real-world data (from public and non-public test tracks) to be used seamlessly to motivate new automated driving functions and optimize existing ones. 25-minute Talk Jost Bernasch, CEO, Virtual Vehicle Research Center
S7573 - Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge We'll bring CUDA into a compute-intensive application by learning how to use CUDA-enabled development tools in the process of profiling, optimization, editing, building, and debugging. Using the Allinea Forge development toolkit, we'll cover how to profile an existing application and identify the most compute-intensive code regions. We'll then replace these regions with CUDA implementations and review the results before turning to the task of debugging the GPU-enabled code to fix an error introduced during the exercise. We'll learn debugging techniques for CUDA and debug using Allinea Forge to produce correct, working, high-performance GPU-accelerated code. As we'll be using GPUs hosted in the cloud, all attendees need to bring is a laptop with a modern browser. 50-minute Talk Ryan Hulguin, Applications Engineer, ARM
S7617 - Developing Your Own Wake Word Engine Just Like 'Alexa' and 'OK Google' A wake word is a word or phrase like "Alexa" and "OK Google." It provides an always-listening capability to a microphone-enabled device. Developers who wanted their own wake word had no such solution until KITT.AI released its Snowboy product, a developer-facing, always-on, offline, real-time wake word engine. It's trained on clusters of GPUs with hundreds of people's voices to provide robustness, while it works on small embedded devices like a $5 Raspberry Pi Zero. We'll demo how to use Snowboy for developing home automation or hands-free projects, and we'll show how we used GPUs to build the Snowboy product. 50-minute Tutorial Xuchen Yao, CEO, KITT.AI
Guoguo Chen, CTO, KITT.AI
Yuan Cao, Software Engineer, KITT.AI
S7281 - Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster Learn how GPUs can be time-shared between multiple hosts connected in a PCIe cluster using a method called device lending. Unlike approaches for sharing GPUs that typically require specific programming models, device lending makes a GPU appear to the operating system as if it is locally installed. This allows the GPU to be controlled and used by a remote host without any modifications to existing software. We'll present how device lending is implemented using standard PCIe and non-transparent bridging. As a proof-of- concept, we accelerate EIR, a computer-aided medical diagnosis system using machine learning and computer vision to do polyp detection, from being an offline tool to giving real-time feedback by dynamically borrowing remote GPU resources. 25-minute Talk Jonas Markussen, PhD student, Simula Research Laboratory
S7643 - Diet Networks: Thin Parameters for Fat Genomics Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting when training deep learning models. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. We propose a novel neural network parameterization, called Diet Networks, that considerably reduces the number of free parameters in the model. The Diet Networks parameterization is based on the idea that we can first learn or provide an embedding for each input feature and then learn how to map a feature's representation to the parameters linking the value of the feature to each of the hidden units of the classifier network. We experiment on a population stratification task of interest to medical studies and show that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier. This work was accepted at ICLR 2017. 25-minute Talk Adriana Romero Soriano, Postdoc, University of Montreal, Montreal Institute for Learning Algorithms
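The core idea described above, predicting each feature's weight row from a feature embedding rather than storing it as a free parameter, can be sketched in a few lines of numpy. The sizes and the simple linear prediction network below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, embed_dim, n_hidden = 10_000, 32, 64

# One embedding per input feature (in the paper these are learned or derived
# from per-feature summary statistics of the training data).
feature_embeddings = rng.normal(size=(n_features, embed_dim))

# The only free parameters: a small map from embedding space to weight rows.
W_pred = rng.normal(size=(embed_dim, n_hidden)) * 0.01

# The "fat" input layer is predicted on the fly, never stored as free params.
W_fat = feature_embeddings @ W_pred      # shape: (n_features, n_hidden)

free_params = W_pred.size                # 32 * 64 = 2,048
direct_params = n_features * n_hidden    # 10,000 * 64 = 640,000
```

Because the number of free parameters no longer grows with the number of input features, the same trick scales to genomic inputs with hundreds of thousands of features.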
S7620 - Digital Twin, AI, and Industrial Internet of Things We'll cover the emerging area of industrial IoT and the application of deep learning and AI to this space. Digital Twin was named one of the top five tech trends for 2017 by Gartner, and it is the foundational technology for GE's industrial internet platform, Predix. A Digital Twin is a live digital representation of a physical system that is predictive in nature and uses continuous learning to get better as new data comes in from the physical system. Digital Twins coupled with AI technologies such as deep learning and high performance computing are used to precisely predict future behavior under new scenarios and optimize the system: can we get an extra 1% in efficiency and save millions of dollars worth of fuel, can we produce 1% more output from a manufacturing plant, can we optimize a hospital, can we detect smaller lesions and do so earlier? Deep learning and GPUs play a key role in harnessing the value from massive streams of IIoT data, from anomaly detection to video analytics to optimization. 50-minute Talk Babu Narayanan, Senior Principal Scientist, General Electric
S7565 - Distributed Deep Learning on AWS Using MXNet Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this tutorial, members of Amazon's machine learning team will provide a short background on deep learning, focusing on relevant application domains and an introduction to using the powerful and scalable deep learning framework MXNet. You'll gain hands-on experience targeting a variety of applications, including computer vision and recommendation engines, as well as exposure to how to use preconfigured deep learning AMIs and CloudFormation templates to help speed your development. 50-minute Talk Joseph Spisak, Sr. Mgr - Product Management, Amazon
Mu Li, Sr. Applied Scientist, Amazon
S7803 - Distributed TensorFlow TensorFlow gives you the flexibility to scale up to hundreds of GPUs, train models with a huge number of parameters, and customize every last detail of the training process. We'll provide a bottom-up introduction to distributed TensorFlow, showing all the tools available for harnessing this power. 50-minute Talk Wolff Dobson, Developer Programs, Google
L7128 - DIY Deep Learning: a Hands-On Lab with Caffe2 Caffe2 is a new lightweight, modular, and scalable deep learning framework, evolving from the previous Caffe library. This is a hands-on Caffe2 lab. You'll learn how to design, train, and deploy state-of-the-art deep learning models, use GPUs to achieve large-scale distributed training, and learn ways to incorporate such deep learning into applications. For Caffe users, you'll also learn how to seamlessly migrate your current Caffe models to Caffe2 and stay productive. In more detail, the lab will cover: • Introductory material on deep learning, its motivations and background • Migration from Caffe to Caffe2 • Training convolutional models for image classification • Recurrent neural network examples and demos for natural language processing • Efficient deep learning and distributed training with multiple GPU machines 120 Instructor-Led Lab Yangqing Jia, Research Scientist, Facebook
Pieter Noordhuis, Software Engineer, Facebook
Alexander Sidorov, Software Engineer, Facebook
S7493 - DNA for Automated Driving We'll showcase an architecture that enables discrete driver assistance systems to all work in tandem. This framework is enabling automakers to develop complex systems more quickly and efficiently, reducing time to market for ADAS functionality. As part of our discussion we'll share a reference implementation that demonstrates a valet parking function, which was built by using the architecture and accessing maps from the cloud.   25-minute Talk Jeremy Dahan, Innovation Project Manager, Elektrobit
S7136 - DNA Sequences Alignment in Multi-GPUs: Energy Payoff on Speculative Executions Find out the energy cost of launching speculative executions when handling data dependencies to enhance parallelism on multi-GPU platforms. We present CUDAlign 4.0 as a case study: a multi-GPU execution for an optimal alignment of huge DNA sequences using the exact Smith-Waterman algorithm. Our speculative approach easily attains 10-20x speed-up versus the baseline pipelined version, where GPUs are idle waiting for dependencies to be solved. But working on mispredictions, GPUs waste energy. In the green computing era, where GFLOPS/w is the trending metric, we need to know which is worse: wasting time or power. Our experimental study analyzes speculation hit ratios to evaluate extra performance and measures energy spent on mispredictions, to conclude to what extent the speculative approach jeopardizes the GFLOPS/w ratio. 25-minute Talk Manuel Ujaldon, Full Professor and NVIDIA CUDA Fellow, University of Malaga (Spain), Computer Architecture Department
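For reference, the Smith-Waterman recurrence that CUDAlign parallelizes across GPUs looks like this as a plain sequential Python baseline; the match/mismatch/gap scores are illustrative choices, not CUDAlign's defaults.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the optimal local alignment between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # DP matrix, first row/col stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Cells on the same anti-diagonal of H depend only on previous anti-diagonals, which is the wavefront parallelism that GPU implementations exploit; the speculation the talk studies goes further by guessing cross-GPU boundary values before they are ready.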
S7624 - Driver Monitoring: A Deep Learning Approach for Gaze Estimation A driver monitoring camera will be a valuable component when it comes to autonomous driving at Levels 3 to 4. The camera is able to determine the area of the driver's attention, which requires estimating the driver's gaze. In addition to signaling "eyes on road," the user experience for HMI can be significantly improved. We'll present a deep learning approach that trains a neural network in an end-to-end manner. Small patches of the eye serve as input to a convolutional neural network. The tradeoff between a deep and a shallow net is an important aspect when it comes to a commercial product. The massive use of GPUs can help find the best tradeoff between accuracy and the number of needed FLOPS, as well as the best-suited DNN architecture. 25-minute Talk Cornelius Wefelscheid, Machine Learning Expert - Advanced Development, Leopold Kostal GmbH & Co. KG
S7427 - DriveWorks: A Look Inside NVIDIA's Autonomous Driving SDK We'll introduce NVIDIA DriveWorks, a software development kit for autonomous driving and processing sensor data through perception, mapping, localization, and path planning steps. DriveWorks provides a rich set of functionalities: sensor abstraction layer, algorithm modules, DNNs, applications, UI and tools for sensor setup and management. The SDK is modular, optimized for GPUs, and runs on top of OS, CUDA/cuDNN, TensorRT, and VPI. This is the foundation for developers working on autonomous vehicle applications, and the session will highlight how to leverage it. 50-minute Tutorial Miguel Sainz, Senior Director, NVIDIA
Gaurav Agarwal
S7781 - Driving Shareholder Value in the Enterprise with GPU Hardware AI is moving from consumer applications to the enterprise and will soon affect all parts of operations, from the customer to the product to the enterprise. Stephen Pratt, CEO and former head of Watson for IBM GBS, presents a shareholder value perspective on why enterprise artificial intelligence will be the single largest competitive differentiator in business over the next five years, and what you can do to end up on top: (1) a framework for why AI will be key to creating shareholder value; (2) how to determine where to start and how to progress (with case studies); (3) how to manage the spread of AI in your enterprise (with lessons from the past); (4) how to ensure proper adoption of AI solutions; and (5) early results of applying the DGX-1 to business process optimization challenges. 25-minute Talk Stephen Pratt, CEO
S7449 - Driving the Assembly of the Zebrafish Connectome through Deep Learning Tracing pathways through large volumes of data is an incredibly tedious, time-consuming process that significantly encumbers progress in neuroscience and the tracing of neurons through an organism. We'll explore the potential for applying deep learning to the automation of high-resolution scanning electron microscope image data segmentation. We've started with neural pathway tracing through 5.1GB of whole-brain serial-section slices from larval zebrafish collected by the Center for Brain Science at Harvard. This kind of manual image segmentation requires years of careful work to properly trace the neural pathways in an organism as small as a zebrafish larvae, which is approximately 5mm in total body length. Automating this process could vastly improve productivity, which would lead to faster data analysis and more breakthroughs in understanding the complexity of the brain. 50-minute Talk Nick Nystrom, Senior Director of Research, Pittsburgh Supercomputing Center
Ishtar Nyawira, Co-President, Timmy Global Health: Pitt Chapter, University of Pittsburgh
S7124 - Drone Net: Using Tegra for Multi-Spectral Detection and Tracking in Shared Air Space The challenge and opportunity presented by use of UAS "drones" in the national airspace has historic significance. The FAA estimates that by 2020 the drone market will be $98 billion with 7 million drones added annually. How drones ranging from professional service to hobby will safely share airspace is unclear. Preliminary research at Embry-Riddle to develop a drone detector, which can be placed on rooftops and networked with other detectors and information services, has shown that multi-spectral electro-optical/infrared detection is quite effective. Our team is using NVIDIA Jetson systems in an EO/IR detector system. The Tegra coprocessor, based on the NVIDIA Kepler architecture, provides real-time object detection for aircraft and drones using salient object detection algorithms accelerated by GPUs. We'll present the power efficiency and real-time processing advantages GPGPU computing provides compared to the FPGA and multicore alternatives we've also tested for this application. 25-minute Talk Sam Siewert, Assistant Professor, Embry-Riddle Aeronautical University
S7596 - DSD: Dense-Sparse-Dense Training for Deep Neural Networks Learn a new technique to prevent deep learning optimizers from getting stuck in a local minima, and to produce better optimization results. We'll introduce DSD, a dense-sparse-dense training method that regularizes neural networks by pruning and then restoring connections. Our method learns which connections are important during the initial dense solution. Then it regularizes the network by pruning the unimportant connections and retraining to a sparser and more robust solution with the same or better accuracy. Finally, the pruned connections are restored and the entire network is retrained again. This increases the dimensionality of parameters, and thus model capacity, from the sparser model. DSD training achieves superior optimization performance. We'll highlight our experiments using GoogLeNet, VGGNet, and ResNet on ImageNet; NeuralTalk on Flickr-8K; and DeepSpeech-1&2 on the WSJ dataset. This shows that the accuracy of CNNs, RNNs, and LSTMs can significantly benefit from DSD training. At training time, DSD incurs only one extra hyper-parameter: the sparsity ratio in the S step. At testing time, DSD doesn't change the network architecture or incur any inference overhead. The consistent and significant performance gain of DSD in our numerical experiments highlights the inadequacy of current deep learning training methods, while DSD effectively achieves superior optimization performance for finding better solutions. 25-minute Talk Song Han, Ph.D. candidate, Stanford University
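The dense-sparse-dense schedule described above can be illustrated on a toy linear model in numpy: train dense, prune the smallest-magnitude half of the weights, retrain under the sparsity mask, then restore the pruned connections and retrain dense. The learning rate, step counts, and the 50% sparsity ratio are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(256, 20))
w_true = np.zeros(20)
w_true[:5] = rng.normal(size=5)          # a mostly-sparse target model
y = X @ w_true

def train(w, mask, steps=200, lr=0.05):
    """Gradient descent on least squares; masked (pruned) weights stay zero."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(X)
        w = (w - lr * grad) * mask
    return w

w = train(np.zeros(20), np.ones(20))                # D: train the dense model
mask = np.abs(w) >= np.quantile(np.abs(w), 0.5)     # S: prune the smallest 50%
w = train(w * mask, mask)                           # S: retrain the sparse model
w = train(w, np.ones(20))                           # D: restore pruned weights, retrain
final_mse = float(np.mean((X @ w - y) ** 2))
```

The same three-phase loop applies to deep networks, where pruning and regrowth act on each layer's weight tensors instead of a single weight vector.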
S7176 - Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Networks We propose to use recurrent neural networks for analyzing facial properties from videos. Facial analysis from consecutive video frames, including head pose estimation and facial landmark localization, is key for many applications such as in-car driver monitoring, facial animation capture, and human-computer interaction. Compared with the traditional Bayesian filtering methods for facial tracking, we show RNNs are a more generic, end-to-end approach for joint estimation and tracking. With the proposed RNN method, we achieved state-of-the-art performance for head pose estimation and facial landmark localization on benchmark datasets. 25-minute Talk Jinwei Gu, Senior Research Scientist, NVIDIA
L7105 - Easy Camera interoperability with CUDA and OpenGL using EGLStreams These are the key takeaways from the lab: 1) Participants will get an overview of the EGLStreams implementation. 2) We will talk about a wrapper over EGLStreams that is easy to plug and play. 3) We will describe how to create an EGLStream camera producer and how to connect it to an EGLStream CUDA consumer; the consumer will do CUDA processing on frames received from the camera. 4) We will describe the means of connecting an EGLStream camera producer to an EGLStream OpenGL consumer. 5) We will describe a means to have multiple EGLStreams at the camera producer and different ways to connect these to CUDA and OpenGL consumers. 6) We will also talk about cross-process and cross-partition EGLStreams. Platform requirements: TX1 with E3326 camera. 120 Instructor-Led Lab Yogesh Kini, Manager, System Software, NVIDIA
Senthil Ramalingam, Software Engineer, NVIDIA
Praveen K, System Software Engineer, NVIDIA
Venugopala Madumbu, Software Architect and Engineering Manager, NVIDIA
S7802 - Edge-AI for Intelligent User Experience We'll showcase how Mercedes-Benz is enabling edge AI in the car by utilizing powerful embedded hardware for sensor processing and fusion in the cabin interior. The focus of AI work today has been dominated by the cloud environment. The availability of computation power, combined with technologies for scaling with massive datasets, makes the cloud a perfect ecosystem for the application of AI technologies. However, there are a myriad of AI applications today that can’t fully live on the cloud, such as an AI application in a moving vehicle where connectivity to the cloud is not guaranteed. In such cases, AI in the edge computing space faces a number of challenges not always present in today's cloud environment. Chief among them is a sense of autonomy: when the edge AI encounters problems that require prompt decision making, the problems have to be resolved by its own intelligence. We’ll talk about how Mercedes-Benz is enabling edge AI to address this issue.  25-minute Talk Kal Mos, VP Connected Car, User Interaction & Telematics, Mercedes-Benz Research and Development North America
S7543 - Effectively Scaling Deep Learning Frameworks to 40 GPUs and Beyond A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data parallel training on many GPUs remains a challenge, as the default utilities for inter-device and inter-node communication provided by these frameworks are often not optimal. Using examples from several frameworks, we demonstrate that linear strong scaling to many nodes and many devices can be achieved by augmenting deep learning frameworks with CUDA-aware MPI allreduce and allgather operations, which allow them to be used in an HPC setting where multi-GPU nodes are connected by high-speed InfiniBand interconnects. We'll show that these operations allow us to quickly train very large speech recognition models. 25-minute Talk Andrew Gibiansky, Machine Learning Engineer, Baidu SVAIL
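A ring allreduce, a common way to implement the allreduce collective this session relies on for data-parallel gradient summation, can be simulated in pure Python to show its reduce-scatter and allgather phases. Worker count and gradient sizes below are toy values; real implementations move GPU buffers with CUDA-aware MPI or NCCL rather than Python lists.

```python
def ring_allreduce(worker_grads):
    """Return every worker's buffer after summing all gradients over a ring."""
    n = len(worker_grads)                    # number of workers in the ring
    c = len(worker_grads[0]) // n            # chunk length (gradient split n ways)
    buf = [list(g) for g in worker_grads]    # each worker's local buffer

    def chunk(w, i):
        return buf[w][i * c:(i + 1) * c]

    def set_chunk(w, i, vals):
        buf[w][i * c:(i + 1) * c] = vals

    # Reduce-scatter: at step s, worker w sends chunk (w - s) to its right
    # neighbour, which accumulates it. After n-1 steps, worker w holds the
    # fully reduced chunk (w + 1) % n.
    for s in range(n - 1):
        for w in range(n):
            i, right = (w - s) % n, (w + 1) % n
            set_chunk(right, i, [a + b for a, b in zip(chunk(right, i), chunk(w, i))])

    # Allgather: circulate the fully reduced chunks until everyone has all n.
    for s in range(n - 1):
        for w in range(n):
            i, right = (w + 1 - s) % n, (w + 1) % n
            set_chunk(right, i, chunk(w, i))
    return buf
```

Each worker sends roughly 2(n-1)/n of one gradient's worth of data in total, independent of the worker count, which is why this collective keeps scaling as GPUs are added.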
S7240 - Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs We'll present a method for highly efficient lattice Monte Carlo simulations with correlation-free updates. Achieving freedom from erroneous correlations requires random selection of lattice sites for updates, which must be restricted by suitable domain decomposition to create parallelism. While approaches based on caching limit the number of allowed states, the multisurface-type approach presented here allows arbitrarily complex states. The effectiveness of the method is illustrated in the fact that it allowed us to solve a long-standing dispute around surface growth under random kinetic deposition in the KPZ-universality class. The method has also been applied to Potts models and is suitable for spin-glass simulations, such as those required to test quantum annealers, like D-Wave. 25-minute Talk Jeffrey Kelling, Scientist, Helmholtz-Zentrum Dresden-Rossendorf
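The standard domain decomposition that such parallel lattice Monte Carlo schemes build on can be sketched as a checkerboard update for a 2D Ising model: sites of one sublattice have no nearest neighbours in the same sublattice, so all of them can be updated concurrently. Lattice size, temperature, and sweep count below are illustrative, and this simple two-colour scheme is the baseline whose update-order correlations the talk's method avoids.

```python
import numpy as np

rng = np.random.default_rng(2)
L, beta = 32, 1.0
spins = rng.choice([-1, 1], size=(L, L))
ii, jj = np.indices((L, L))

def half_sweep(spins, parity):
    # Sum of the four nearest neighbours, with periodic boundaries.
    nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
          np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
    dE = 2 * spins * nn                               # energy cost of flipping each site
    accept = rng.random((L, L)) < np.exp(-beta * dE)  # Metropolis criterion
    mask = ((ii + jj) % 2 == parity) & accept         # one sublattice per half-sweep
    spins[mask] *= -1

for _ in range(10):
    half_sweep(spins, 0)   # all "black" sites, mutually independent
    half_sweep(spins, 1)   # all "white" sites
```

On a GPU, each half-sweep maps naturally to one thread per lattice site of the active sublattice.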
S7130 - Efficient Deep Model Selection Convolutional neural networks have achieved impressive success in many tasks in computer vision. However, they come at a high memory and computational cost, thus making it difficult for deep learning to be commercially viable. In addition, selecting the architecture is still an engineering process. We'll introduce DecomposeMe, an efficient architecture based on filter compositions. This architecture can be trained quickly and is capable of achieving real-time operation in embedded platforms (250+ fps on an NVIDIA Jetson TX1). We'll also introduce our approach to automatically determining the number of neurons of the architecture during the training process. Finally, we'll introduce a novel approach to quantizing the network parameters. 25-minute Talk Jose Alvarez, Researcher, Commonwealth Scientific and Industrial Research Organisation(CSIRO)
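The filter-composition idea behind architectures like DecomposeMe can be illustrated with a rank-1 kernel: a kx1 convolution followed by a 1xk convolution reproduces the full kxk result with 2k instead of k*k weights. The kernel size, the naive convolution loop, and the random data below are illustrative, not the paper's layers.

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.normal(size=5)                 # 5x1 vertical filter
h = rng.normal(size=5)                 # 1x5 horizontal filter
K = np.outer(v, h)                     # the equivalent (rank-1) 5x5 kernel
img = rng.normal(size=(64, 64))

def conv_valid(x, k2d):
    """Naive 'valid' 2D cross-correlation, written for clarity, not speed."""
    kh, kw = k2d.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k2d)
    return out

full = conv_valid(img, K)                                        # 25 weights
separable = conv_valid(conv_valid(img, v[:, None]), h[None, :])  # 10 weights
```

The two outputs match to machine precision; for general (non-rank-1) filters, the composition is an approximation, which is the tradeoff such architectures learn to manage.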
S7125 - Efficient Imaging in Radio Astronomy Using GPUs Realizing the next generation of radio telescopes such as the Square Kilometre Array requires both more efficient hardware and algorithms than today's technology provides. We'll present our work on the recently introduced Image-Domain Gridding (IDG) algorithm that tries to avoid the performance bottlenecks of traditional AW-projection gridding. We'll demonstrate how we implemented this algorithm on various architectures. By applying a modified roofline analysis, we show that our parallelization approaches and optimization leads to nearly optimal performance on all architectures. The analysis also indicates that, by leveraging dedicated hardware to evaluate trigonometric functions, NVIDIA GPUs are much faster and more energy-efficient than regular CPUs. This makes IDG on GPUs a candidate for meeting the computational and energy-efficiency constraints for future telescopes. 25-minute Talk Bram Veenboer, PhD Researcher, Astron
S7544 - Efficient Inference for WaveNet Audio Synthesis Models WaveNet is a generative neural network architecture for audio in the time domain. Due to the high sampling frequency of audio signals and the sequential dependencies between timesteps, inference in a WaveNet model is incredibly expensive, and can take many minutes to generate a single second of audio with an unoptimized implementation. We implement custom WaveNet inference kernels and demonstrate that an efficient implementation on a CPU or a GPU can provide faster than realtime audio generation, even though neither platform is perfectly suited to such a task due to the effective lack of parallelism and high compute requirements. To our knowledge, this is the first demonstration that neural audio generation can be done efficiently enough to deploy in a production text-to-speech system. 50-minute Talk Andrew Gibiansky, Machine Learning Engineer, Baidu SVAIL
S7370 - Efficient Maximum Flow Algorithm and Applications Maximizing data flow is one of the most important graph problems and has numerous applications across various computational domains: transportation networks, power routing, image segmentation, social network clustering, and recommendation systems. There are many efficient algorithms that have been developed for this problem, most of them trying to minimize computational complexity. However, not all these algorithms map well to massively parallel architectures like GPUs. We'll present a novel GPU-friendly approach based on the MPM algorithm that achieves from 5 to 20 times speedup over the state-of-the-art multithreaded CPU implementation from Galois library on general graphs with various diameters. We'll also discuss some real-world applications of the maximum flow problem in computer vision for image segmentation and in data analytics to find communities in social networks. 25-minute Talk Nikolay Sakharnykh, Senior Developer Technology Engineer, NVIDIA
Hugo Braun, MSc, Ecole Polytechnique
S7153 - Efficient Observations Forecast for the World's Biggest Eye Using DGX-1 Have you heard about the largest ground-based telescope ever built? Are you interested in the newest NVIDIA DGX-1 hardware accelerator? Come and learn how the DGX-1 architecture gives the computational astronomy community a dramatic leap forward in designing major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting matrix computations, combined with an efficient task-based programming model, help design the next generation of telescope instruments. 50-minute Talk Hatem Ltaief, Senior Research Scientist, KAUST
Damien Gratadour, Associate Professor, Universite Paris Diderot & Observatoire de Paris
S7515 - Eliminating the Regular Expression with Neural Networks Regular expressions are as old as computing itself. Our deep learning-based approaches aim to retire this tool from the modern data scientist's tool bag. The regular expression is often introduced to computer scientists as part of their early college education, often in their first discrete structures course. In this context, they are an incredible tool used to describe languages, grammars, and syntax. In practice, though, developers all over the world use them to detect data types or parse certain structures. Even for common use cases such as email or phone validation, regular expressions that capture the full breadth of cases can become untenably large. We show how neural networks can learn approximations of regular expressions so that modern data scientists and developers never have to write one again. 25-minute Talk Tim Delisle, CEO, Datalogue
S7190 - Embedded Bayesian Perception and V2X Communications for Autonomous Driving We'll present technologies developed by the Inria Chroma team that robustly perceive and interpret dynamic environments using Bayesian systems (such as BOF, HSBOF, and CMCDOT) relying on embedded sensor input and V2X communications (vehicle to vehicle and vehicle to infrastructure). These technologies were initially developed in collaboration with industrial partners such as Toyota, Renault, and Probayes SA. We'll demonstrate how heterogeneous sensors can be used efficiently, merged, and filtered in real time into probabilistic grids, and discuss how to compute collision risks in an optimized way on embedded GPU platforms like the NVIDIA Jetson. 25-minute Talk Christian Laugier, First Class Research Director, Inria Grenoble
S7505 - Enable GPU-Accelerated Simulation Practices on the Cloud with Rescale We'll review the benefits of leveraging NVIDIA GPU technology through Rescale, a cloud-based simulation platform. Through concrete engineering use cases and benchmark results, we'll illustrate performance gains with GPUs across a large selection of simulation software. 25-minute Talk Fanny Treheux, Director of Solutions, Rescale
S7846 - Enabling Intelligent Enterprises with SAP Clea We'll talk about how SAP is realizing its vision to make enterprise applications intelligent. We'll provide a glimpse of the breadth of machine learning use cases SAP addresses through its Clea portfolio, then take a deep dive into one of the applications with a detailed business process view. We'll also provide a detailed view of the underlying technology stack and how NVIDIA GPUs are enabling SAP to build machine learning solutions at scale. 25-minute Talk Markus Noga, Vice President, Machine Learning, SAP
S7708 - Enabling Scientific Discovery with Large-Scale Interactive Visualization and Tiled Displays We'll focus on leveraging large-scale visualization and large tiled displays to enable scientific discovery. We'll present a case study where domain scientists evaluate the complicated, hierarchical microstructure of enamel in primate teeth to gain insight into the principles governing the evolution of mineralized biological tissues. We integrate X-ray micro-tomography with large-scale visualization and analysis techniques to explore the internal structure of mineralized biological tissues. In an interactive visualization session, we'll bring domain scientists and visualization experts together to collaborate. We'll explore a high-resolution visualization streaming from a GPU-based visualization cluster on a large tiled display, along with a distributed global illumination algorithm, which helps scientists improve depth perception in rendered images. Analyzing the data interactively, domain scientists are able to identify structures previously unseen in the data. 25-minute Talk Silvio Rizzi, Assistant Computer Scientist, Argonne National Laboratory
S7686 - Encrypted Deep Learning: A Guide to Privacy-Preserving Speech Processing In today's cloud, to make your data searchable, you give up its contents to your cloud provider, even if they then encrypt it. While you gain the speed and power of the cloud, you do so by sacrificing the privacy of your data, a common barrier to cloud adoption. Hence, to encourage the migration of sensitive data from behind the firewall to the cloud, we need to process that data without ever decrypting it. We'll demonstrate the state of the art of processing encrypted data using a GPU-accelerated cloud. We'll also present a roadmap for near-future plans for cryptographic schemes for secure transcription. Inspired by fully homomorphically encrypted convolution nets for secure image processing, so-called CryptoNets, we'll demonstrate a CNN-based acoustic model and discuss in broader terms how the CryptoNet idea extends to other types of deep learning networks, such as RNNs. 25-minute Talk Nigel Cannings, CTO, Intelligent Voice
S7415 - Enhance Multi-Contrast MRI Reconstruction for Improved Diagnosis with Deep Learning Powered by NVIDIA GPUs Advanced computation powered by GPUs is changing the clinical decision-making process. We'll present an exciting example of using NVIDIA GPUs for multi-contrast magnetic resonance imaging exams. Neurological disorders result in great clinical challenges and high societal burdens. Multi-contrast MRI exams are frequently used for diagnosis because the various tissue contrasts provide complementary diagnostic information to distinguish normal tissue from pathology. However, the cost of acquiring these multiple sequences is extensive scanning time, which significantly increases both the diagnosis cost and patients' discomfort and limits the acquired image quality. We'll propose a new approach to accelerate multi-contrast imaging using a deep learning approach powered by GPUs. Validated on both patients and healthy subjects, we'll demonstrate that we can significantly reduce scanning time while improving image resolution and quality and preserving the diagnostic information. 25-minute Talk Enhao Gong, PhD Candidate, Stanford University
S7138 - Enhancing Pricing Performance and Quants Productivity in a Cloud Based Development Environment Misys quants use a Groovy-based DSL to write efficient GPU-enabled pricing models without any OpenCL or NVIDIA CUDA knowledge. Allowing progressive migration from legacy code to GPU-enabled models, this framework leverages GPGPU strengths to achieve high-performance pricing with a very short learning curve. We'll start with an overview of the framework, and then focus on the online ecosystem Misys provides to allow third parties to develop and run their custom code on GPUs in the cloud through a PaaS-like interface. 25-minute Talk Nicolas Blanc, Software Engineer, Misys
S7589 - Enterprise AR: Industry Opportunities and Technology Challenges We'll discuss the state of the art and upcoming opportunities and challenges for AR in the enterprise. We'll focus on how enterprise end-users are using AR to accelerate their workflows and reduce project costs; how ISVs are developing new applications and UX models to leverage AR technology, and the challenges they face in UI design and in developing for cutting-edge technology; and the technical and design challenges that AR headset manufacturers are facing as they create portable, powerful displays that smoothly integrate with enterprise workflows. Application areas will include service and maintenance (for example, automotive, BIM), education, and product and building design. 50-minute Panel Ryan Pamplin, VP, Partnerships and Sales, Meta
Kyle Szostek, Sr. Virtual Construction Engineer, Gilbane Building Company
Dace Campbell, Senior Customer Success Manager, Autodesk
Eric Trabold, VP Sales & Marketing, Avegant
William Newell, CEO, North South Studios
S7747 - Envrmnt: Real-Time Streaming VR Learn how Verizon's R&D built a VR graphics engine and platform that streams HD video and game experiences to massive audiences using GPU scaling and streaming techniques. We'll share architecture and configuration that enables us to serve real-time networked game and augmented reality experiences. We'll also discuss how GameWorks VR was instrumental in our rendering pipeline and how GPUs are being used in our cloud and network to enhance streaming VR. Finally, we'll walk through a 15-minute example of Envrmnt's tools and show a demo of a livestreamed networked Vive experience. 25-minute Talk Mohammad Raheel Khalid, CTO / Chief Engineer, Verizon Labs
S7706 - Essential CUDA Optimization Techniques - Presented by Acceleware (Session 4 of 4) This tutorial is for those with some background in CUDA, including an understanding of the CUDA memory model and streaming multiprocessor. Our previous three tutorials provide the background information necessary for this session. This informative tutorial will provide an overview of the analysis performance tools and key optimization strategies for compute, latency, and memory bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. It'll also include code examples and a programming demonstration highlighting the optimal global memory access pattern applicable to all GPU architectures. We'll provide printed copies of the material to all attendees for each session; collect all four! 80-minute Tutorial Chris Mason, Technical Product Manager, Acceleware Ltd.
S7181 - Evaluating Windows 10: Learn Why Your Users Need GPU Acceleration Learn why EVERY remote user should have GPU resources available to them. We'll discuss the advantages end-users experience once their virtual desktops/sessions have GPU capabilities. Recent data from the NVIDIA GRID Performance Engineering team shows the significant impact that GPUs like the Tesla M10 have on knowledge workers. The data includes real user testing and scientific data like latency, bandwidth, and CPU utilization, which all play a significant role in the overall user experience. 50-minute Talk Uday Kurkure, Staff Engineer, VMware
Lan Vu, Senior Member of Technical Staff, VMware
Hari Sivaraman, Staff Engineer, VMware
Jason Kyungho Lee, Sr. Performance Engineer, NVIDIA GRID, NVIDIA
S7430 - Expert and Customer Roundtable: GPU-Accelerated Desktops and Apps with NVIDIA GRID and Citrix XenDesktop Experts from various industries join us for a roundtable discussion of their experiences implementing GPU-accelerated virtual desktops and apps. Learn: (1) how Windows 10 is creating new urgency around including GPUs in your VDI deployment architecture, (2) how to design your environment for greater scale, superior user experience, and lower cost, and (3) how the latest features in Citrix XenDesktop and NVIDIA GRID make desktop virtualization for every use case a reality. 50-minute Talk Luke Wignall, Sr Mgr Perf Eng / Tech Marketing, NVIDIA
S7429 - Expert and Customer Roundtable: Real-World Tales of GPU-Accelerated Desktops and Apps - Implementers Share Best Practices Experts from various industries join us for a roundtable discussion of their experiences implementing GPU-accelerated virtual desktops and apps. You'll learn how Windows 10 is creating new urgency around including GPUs in VDI deployment architectures; how to design environments for greater scale, superior user experience, and lower cost; and how the latest features in VMware Horizon and NVIDIA GRID can make desktop virtualization for every use case a reality. 50-minute Talk Huong Vu, Director Engineer, Cerner
Luke Wignall, Sr Mgr Perf Eng / Tech Marketing, NVIDIA
Pat Lee, VP Product Management, VMware
Stuart Jackson, Sr. Technology Architect, Cerner
S7175 - Exploratory Visualization of Petascale Particle Data in NVIDIA DGX-1 Learn to leverage the visualization capabilities of the NVIDIA DGX-1 system to visualize particle data. We'll cover techniques suitable for exploratory visualization such as parallel dataset reading and reduction on demand with the ADIOS I/O library, GPU-based optimization techniques for particle rendering such as radar view frustum culling, occlusion culling, texture-less point sprites, and OpenGL near-zero driver overhead methods. We'll also include implementation details to take advantage of the eight NVIDIA Pascal GPUs included in the NVIDIA DGX-1. 25-minute Talk Benjamin Hernandez, Computer Scientist, Oak Ridge National Laboratory
S7688 - Exploring Machine Learning in Visual Effects Some aspects of visual effects production are ideally suited to machine learning technology. Whether it comes from the digital cameras on set, from a motion capture session, or from other sources, a huge amount of data is captured during the production of a movie. Models are built to modify this data or create new effects from it. Instead of building these models by hand, can machine learning systems be trained to do the same thing? We'll present active research projects where we are using machine learning to either accelerate a process in visual effects or allow artists to create novel visual effects. This is a work-in-progress report: some of the techniques show promise but are not yet fully developed. 25-minute Talk Doug Roble, CEO, Digital Domain
S7553 - Exploring Sparsity in Recurrent Neural Networks Recurrent neural networks are widely used to solve a variety of problems. As the quantity of data and the amount of available compute have increased, model sizes have also grown. We'll describe an approach to reduce the parameter count of RNNs using a simple pruning schedule without increasing the training time. The reduction in parameters achieves two goals. It helps reduce the size of the neural network, allowing it to be deployed on mobile and embedded devices. It also helps speed up evaluation time for inference. We'll demonstrate how this technique works for vanilla RNNs and the more complex gated recurrent units. 25-minute Talk Sharan Narang, Researcher, Baidu
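The abstract above describes pruning RNN parameters on a schedule without extending training. As a rough illustration of the general idea (the cubic ramp and all hyperparameters below are assumptions for this sketch, not the speaker's exact schedule), here is gradual magnitude pruning in NumPy:

```python
import numpy as np

# Minimal sketch of gradual magnitude pruning for one weight matrix.
# The cubic ramp and the specific hyperparameters are illustrative
# assumptions, not the schedule from the talk.

def sparsity_at(step, start, end, final_sparsity):
    """Target sparsity ramps from 0 to final_sparsity between `start` and
    `end` training steps, rising quickly at first and then leveling off."""
    if step < start:
        return 0.0
    t = min(1.0, (step - start) / (end - start))
    return final_sparsity * (1.0 - (1.0 - t) ** 3)

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    return weights * (np.abs(weights) > threshold)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))                  # one recurrent weight matrix
w = prune(w, sparsity_at(step=5000, start=1000, end=8000, final_sparsity=0.9))
achieved = 1.0 - np.count_nonzero(w) / w.size    # roughly 0.83 at this step
```

In a real training loop, `prune` would be applied periodically (typically via a persistent mask), and the resulting sparse matrices are what enable the smaller model footprint and faster inference the abstract mentions.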
S7608 - Exploring the Latent Visual Space Between Adjectives with Generative Adversarial Networks Generative adversarial networks (GANs) have been applied to multiple use cases, such as image generation and image completion. One interesting feature of GANs is the exploration of latent space, where new elements can appear through interpolation between two seed elements. With this in mind, we're interested in exploring latent space in terms of adjective-noun pairs (ANPs) able to capture subjectivity in visual content such as "cloudy sky" vs. "pretty sky." Although it is challenging for humans to find a smooth transition between two ANPs (similar to a color gradient or color progression), the presented GANs are capable of generating such a gradient in the adjective domain and finding new ANPs that lie in this (subjective) transition. As a result, GANs offer a more quantified interpretation of this subjective progression and an explainability of the underlying latent space. 50-minute Talk Federico Raue, Researcher, German Research Center for Artificial Intelligence (DFKI)
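The latent-space interpolation described above is commonly implemented by walking between two seed latent vectors and decoding each intermediate point. A minimal sketch of one popular traversal, spherical interpolation (an illustration of the general technique, not the authors' code; `z_cloudy`, `z_pretty`, and the dimensions are made up):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors, a common way to
    traverse GAN latent space smoothly."""
    u0, u1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if omega < 1e-8:                       # nearly parallel: fall back to lerp
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(1)
z_cloudy, z_pretty = rng.normal(size=64), rng.normal(size=64)  # two seed latents
path = [slerp(z_cloudy, z_pretty, t) for t in np.linspace(0.0, 1.0, 8)]
# Each point on `path` would be fed to the generator G(z) to render one frame
# of the "cloudy sky" -> "pretty sky" progression.
```

Spherical rather than linear interpolation is often preferred for Gaussian latent priors because intermediate points keep a typical norm instead of collapsing toward the origin.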
Damian Borth, Director Deep Learning Competence Center, German Research Center for Artificial Intelligence (DFKI)
S7572 - Extending Mahout-Samsara Linear Algebra DSL to Support GPU Clusters Data scientists love tools like R and Scikit-Learn, as they offer a convenient and familiar syntax for analysis tasks. However, these systems are limited to operating serially on datasets that can fit on a single node and don't allow for distributed execution. Mahout-Samsara is a linear algebra environment that offers both an easy-to-use Scala DSL and efficient distributed execution for linear algebra operations. Data scientists transitioning from R to Mahout can use the Samsara DSL for large-scale data sets with familiar R-like semantics. Machine learning and deep learning algorithms built with the Mahout-Samsara DSL are automatically parallelized and optimized to execute on distributed processing engines like Apache Spark and Apache Flink accelerated natively by CUDA, OpenCL, and OpenMP. We'll look at Mahout's distributed linear algebra capabilities and demonstrate an EigenFaces classification using Distributed SSVD executing on a GPU cluster. Machine learning practitioners will come away from this talk with a better understanding of how Samsara's linear algebra environment can help simplify developing highly scalable, CPU/GPU-accelerated machine learning and deep learning algorithms: practitioners focus solely on the declarative specification of the algorithm, without worrying about the implementation details of a scalable distributed engine or having to program with native math libraries. 25-minute Talk Suneel Marthi, Senior Principal Engineer , Redhat Inc
Trevor Grant, Open Source Analytics Technical Evangelist Committer, Apache Mahout Project, IBM
S7691 - Facial Expression and Emotion Detection for Mobile We'll outline how Affectiva employs CNN-based approaches for the task of detecting individual facial movements (facial actions) from real-world data. Affectiva's mission is to humanize technology by bringing artificial emotional intelligence (emotion AI) to the digital world. Using computer vision and deep learning, Affectiva measures facial expressions of emotion. We'll discuss challenges encountered and advantages from using deep learning models as well as share experimental results. Models explored will include those trying to push accuracy as well as the tradeoff incurred in trying to run smaller models that can operate in environments with more constraints (such as mobile). 25-minute Talk Jay Turcot, Director of Applied AI, Affectiva
S7314 - Fast Flow-Based Distance Quantification and Interpolation for High-Resolution Density Distributions We'll discuss our GPU-targeted algorithm design for the efficient computation of distances and interpolates between high-resolution density distributions (based on the Earth Mover's Distance / the Wasserstein metric). We particularly focus on the changes - and their rationale - to transition from our previous multicore approach to a manycore design (utilizing NVIDIA CUDA, CUB, and Thrust) that yields a massive improvement in performance. Expressive distances and interpolates are a crucial building block for numerous applications in computer vision, computer graphics, and visualization, and we'll give examples from different areas to demonstrate both utility and performance of our improved approach. 25-minute Talk Steffen Frey, Postdoc, University of Stuttgart, Visualization Research Center
S7480 - Fast Forward Poster Program for the Top 20 Posters The GTC Fast Forward Poster program is an accelerated poster presentation program that serves as a catalyst for the advancement of an array of innovations that come from universities, research labs, and industry. The GTC Poster Review Committee selected the best 20 posters submitted to GTC 2017. This program gives each author a chance to present his or her GPU project in front of the top technology developers working in a vast array of industries. 80-minute Tutorial
S7268 - Fast GPU Monte Carlo Simulation for Radiotherapy, DNA Ionization and Beyond Learn about techniques used to accelerate a Monte Carlo particle physics simulator. The strategies discussed include sorting to minimize thread divergence and data structures for efficient memory access. The software, named MPEXS, is primarily focused on X-ray radiotherapy and has been recently extended to cellular and DNA levels. Simulation of DNA ionization is particularly challenging, because large numbers of low energy particles have to be managed. Implementation of these strategies has both improved the run-time performance and reduced the memory usage. The results from the performance analysis are likely to be of use in other domains that rely on discrete event simulation. Extension of physics coverage for proton and carbon therapy and neutron radiation protection is envisioned. 50-minute Talk Shogo Okada, Research Associate, Kobe University
Nick Henderson, Research Associate, Stanford University
S7303 - Finding Parallelism in General-Purpose Linear Programming Get to know two different techniques for retrieving parallelism hidden in general-purpose linear programs (LPs), which are broadly used in operations research, computer vision, and machine learning. With conventional solvers often restricted to serial computation, we'll show two ways of retrieving inherent parallelism, using: (1) parallel sparse linear algebra techniques with an interior-point method, and (2) a higher-level automatic LP decomposition. After a quick introduction to the topic, we'll present details and results for a diverse range of applications on the GPU. 25-minute Talk Daniel Thuerck, Ph.D. Student, Technical University Darmstadt, Graphics, Capture and Massively Parallel Computing
Maxim Naumov, Senior Research Scientist, NVIDIA
S7607 - Floating Point Array Compression on the GPU To increase performance, high-performance systems are adopting a heterogeneous approach through the use of accelerators (for example, GPUs). These accelerators provide this performance increase with massive parallelization. Unfortunately, these HPC systems, with or without accelerators, are hitting a wall: an increasing divergence between compute and bandwidth. As core counts have increased and bandwidth at all levels of the system has stagnated, data movement has become the bottleneck for performance at multiple places between subsystems: storage, network, accelerator, and memory levels. To address these bandwidth issues in heterogeneous systems, we developed a lossy fixed-rate compression algorithm, cuZFP, for the GPU. The ZFP compressor specifically addresses the needs of lossy compression for high-performance floating-point data like those used in scientific codes. By extending lossy compression to the GPU, the compression is up to an order of magnitude faster than the CPU version. Further, bandwidth limitations can be eased directly on the accelerator without copying the data back to the CPU. 25-minute Talk Mark Kim, Postdoctoral Researcher, Oak Ridge National Lab
S7196 - FMM with Periodic Boundaries Support on GPU The direct solution of the N-body problem is a simple, yet scientifically important and ubiquitous showcase algorithm for modern GPUs. However, its computational complexity is O(N^2). The fast multipole method is an algorithm that reduces runtime and complexity to optimal O(N) for any required precision. We'll present an optimized, fully NVIDIA CUDA-enabled, templated C++ implementation of the FMM, which considers all stages of the method, from particle input to force extraction. We compare different parallelization approaches and show the performance improvement when going from a dynamic parallelization to a presorted list-based approach that fits particular system constraints such as periodic boundary conditions. We'll discuss how to exploit the FMM operators such that both memory access overhead and the number of complex multiplications are minimized. This moves the kernels into the compute-bound range and increases performance. 25-minute Talk Bartosz Kohnke, Software Developer, Max Planck Institute for Biophysical Chemistry
S7742 - Frame Cloud Workstation Platform: The Promise and the Reality of Cloud Graphics We are still in the early days of the cloud graphics revolution, but things are about to change dramatically. Major cloud providers, like AWS, Microsoft Azure, and Google Cloud, are all rapidly adding or upgrading GPU capabilities. Great user experience, low-latency application delivery, and strong security of a cloud workspace environment are drawing interest from millions of enterprise users around the world. We'll share our experiences from 3+ years on the forefront of the cloud graphics movement, from the early days of GPUs on AWS in 2013, through the recent launch of N-Series on Microsoft Azure, in December. We'll present encoding and rendering benchmarks, share the details of Frame's graphics stack, and profile NVIDIA optimizations. Finally, we'll share customer stories from global enterprise leaders, like PTC, HP, and Adobe, who all use Frame to power their cloud applications delivery services.  25-minute Talk Justin Boitano, VP of Marketing, Frame
Nikola Bozinovic, CEO, Frame
S7575 - From Cracks to Hard Hats: Focusing on Industrial Computer Vision We'll present, in a case study driven presentation, specific examples of how GPU-enabled deep neural networks are powering new methods for analyzing the content of photos and videos from industrial contexts. First, we'll present a collaboration with Engineering News-Record, the leading publication in the architecture, engineering, and construction vertical. This ongoing initiative leverages computer vision techniques and semantic approaches to help identify and indicate safe and unsafe situations in jobsite photos. Second, we'll present a collaboration with Arup, a London-based engineering firm, on the use of specific classifiers to localize and measure cracks and related defects in infrastructure. 25-minute Talk Sean True, Director of Machine Learning,, Inc.
Josh Kanner, Founder & CEO,, Inc.
S7244 - From Desktop to Cloud to Embedded GPUs: Designing, Training, and Compiling Vision Algorithms and Deep Learning Using MATLAB Learn how to adopt a MATLAB-centric workflow to design, develop, and deploy computer vision and deep learning applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms, including Jetson TK1/TX1 and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease of use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, those networks are trained using MATLAB's GPU and parallel computing support either on the desktop, a local compute cluster, or in the cloud. Finally, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. We'll use examples of common computer vision algorithms and deep learning networks to describe this workflow, and we'll present their performance benchmarks, including training with multiple GPUs on an Amazon P2 cloud instance. 50-minute Talk Avi Nehemiah, Product Manager- Computer Vision and Automated Driving, MathWorks
Joss Knight, Senior Developer, MathWorks Ltd
Girish Venkataramani, Development Manager, MathWorks
S7675 - From Model to Product: How Did Infervision Become Radiologists' Real Vision? A model is different from a real product. We'll share Infervision's journey from designing algorithms for medical image analysis to actually implementing models inside hospitals' PACS systems. A product differs from a model in three respects: (1) Products make a real difference. Robustness, reliability, and accuracy are no longer simple numbers reported in articles, but criteria that judge the efficacy of algorithms from time to time. (2) Products solve real problems. Models serve deep learning science, whereas products serve medical decisions. When designing a medical image diagnosis product, we need to identify radiologists' real needs and solve problems that matter to clinical decisions. (3) Products take into account all complexities in a real application context. We'll give a brief introduction to China's medical system with an emphasis on radiology imaging diagnosis. We'll also share some challenges and achievements Infervision experienced when attempting to insert A.I. products into radiologists' daily workflow. 50-minute Talk Kuan Chen, CEO, Infervision
S7771 - From PLM to Virtual Reality: The Future Design Pipeline Where does VR fit into the workflow? What are the challenges with today's VR systems and the legacy pipeline? How can these new capabilities be used specifically and what value would it add to the design pipeline? We'll discuss how NVIDIA's VR visualization tools are enhancing the complexity and realism of design, showing example workflows from recent projects. 25-minute Talk Tim Bates, Chief IT Visualization Strategist & Architect, General Motors
L7118 - From Trained Neural Network Model to Deployment for Inference (Presented by NVIDIA Deep Learning Institute) NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. This lab provides hands-on experience using TensorRT to optimize, validate, and deploy trained neural networks for inference in a self-driving car application. Prerequisites: C/C++ programming and basic knowledge of deep learning. 120 Instructor-Led Lab Joohoon Lee, Certified Instructor, NVIDIA
Steve Byun, Certified Instructor, Deep Learning Institute, NVIDIA
Chris Gottbrath, Accelerated Computing Product Manager, NVIDIA
S7372 - Functional Safety: Developing ISO 26262 Compliant GPU Applications Functional safety is an important consideration for many applications of GPU computing, especially autonomous driving, robotics, and healthcare. We'll cover what it means to be compliant with current functional safety standards, learn the basics of functional safety, and uncover how the prevailing standard, ISO 26262, can apply to GPUs and GPU programming. Often the development of an application's core features takes precedence, leaving functional safety considerations until the end of the development cycle. If functional safety is considered and planned from the start, results can improve while cost decreases. We'll explain the support that NVIDIA has implemented inside GPUs for functional safety and the various tools and methodologies that are available to support ISO 26262 compliance for both hardware and software. 25-minute Talk Richard Bramley, GPU Architecture: Functional Safety Architect, NVIDIA
S7235 - Fusing Vision and 3D Sensors with AI to Build Cognition Systems Learn how to use GPUs to run 3D and camera deep learning fusion applications for autonomous driving. Cameras provide high resolution 2D information, while lidar has relatively low resolution but provides 3D data. Smart fusing of both RGB and 3D information, in combination with AI software, enables the building of ultra-high reliability classifiers. This facilitates the required cognition application for semi-autonomous and fully autonomous driving.   50-minute Talk Youval Nehmadi, CTO, VayaVision
Ido Goren, SW Manager, VayaVision
S7169 - GA3C: A Hybrid CPU/GPU Implementation of A3C for Deep Reinforcement Learning We'll introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We'll analyze its computational traits and concentrate on the critical aspects to leverage the GPU's computational power. We'll introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed-up compared to a CPU implementation and is publicly available to other researchers. 25-minute Talk Iuri Frosio, Senior Research Scientist, NVIDIA
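The queue system summarized above can be sketched in miniature with standard Python threads: agents enqueue prediction requests, and a single predictor thread drains the queue into batches. This is a toy illustration of the batching idea only (the real GA3C implementation, per the abstract, is TensorFlow-based with separate predictor and trainer queues):

```python
import queue
import threading

# Toy sketch of GA3C's queue idea: many CPU agents enqueue prediction
# requests; one predictor thread drains the queue into a batch and answers
# all requests at once, which is where the GPU batching win would come from.
prediction_q = queue.Queue()

def agent(agent_id, n_requests, results):
    for t in range(n_requests):
        reply = queue.Queue(maxsize=1)       # per-request reply channel
        prediction_q.put((agent_id, t, reply))
        results.append(reply.get())          # block until the predictor answers

def predictor(total_requests, batch_log):
    served = 0
    while served < total_requests:
        batch = [prediction_q.get()]         # block for at least one request
        while not prediction_q.empty():      # then drain whatever else queued up
            batch.append(prediction_q.get())
        batch_log.append(len(batch))
        for agent_id, t, reply in batch:     # stand-in for one batched forward pass
            reply.put(("action", agent_id, t))
        served += len(batch)

results, batch_log = [], []
pred = threading.Thread(target=predictor, args=(20, batch_log))
pred.start()
agents = [threading.Thread(target=agent, args=(i, 5, results)) for i in range(4)]
for a in agents:
    a.start()
for a in agents:
    a.join()
pred.join()
print(len(results), "predictions served in", len(batch_log), "batches")
```

The opportunistic draining loop is the key design choice: batch size adapts to however many agents are waiting, trading a little per-request latency for much better device utilization.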
S7502 - Generative Adversarial Networks Generative adversarial networks are machine learning models that can generate new data drawn from the same distribution as the training data. They are widely used for image generation tasks and are beginning to be used for video generation and reinforcement learning. We'll describe the basics of how GANs work and summarize their latest applications. 50-minute Talk Ian Goodfellow, Research Scientist, Google
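The adversarial objective behind GANs can be illustrated numerically. This is a hedged sketch of the standard formulation (the probabilities below are made-up example values, not trained model outputs): the discriminator D maximizes E[log D(x)] + E[log(1 − D(G(z)))], while the generator in the common "non-saturating" variant maximizes log D(G(z)).

```python
import math

def discriminator_loss(d_real, d_fake):
    # Negative of the discriminator's objective, averaged over samples:
    # -E[log D(x)] - E[log(1 - D(G(z)))].
    n = len(d_real)
    return -sum(math.log(p) for p in d_real) / n \
           - sum(math.log(1 - p) for p in d_fake) / n

def generator_loss_nonsaturating(d_fake):
    # Non-saturating generator loss: -E[log D(G(z))].
    return -sum(math.log(p) for p in d_fake) / len(d_fake)

d_real = [0.9, 0.8]   # discriminator outputs on real data (example values)
d_fake = [0.2, 0.1]   # discriminator outputs on generated data

print(round(discriminator_loss(d_real, d_fake), 3))        # → 0.329
print(round(generator_loss_nonsaturating(d_fake), 3))      # → 1.956
```

As the generator improves, d_fake rises toward 0.5, the generator loss falls, and the discriminator loss climbs; training is the tug-of-war between these two quantities.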
L7138 - Getting Started with CUDA C/C++ In this hands-on lab, you will learn how to work with the CUDA platform to accelerate C and C++ code on a massively parallel NVIDIA GPU. We'll start with the basics of writing in a CUDA-enabled language, work through accelerating sections of code on the GPU, learn how to error check, and more! As we'll be using GPUs hosted in the cloud, all you are required to bring is a laptop with a modern browser. Prerequisites: None This lab utilizes GPU resources in the cloud; you are required to bring your own laptop. 120-minute Instructor-Led Lab Jonathan Bentz, Certified Instructor, NVIDIA
S7349 - Getting Started with GPUs for Linux Virtual Desktops on VMware Horizon You've just been tasked with building a Linux VDI environment for an engineering team with graphics requirements. Now what? Join an NVIDIA GRID Community Advisor to learn the basics of setting up Linux VDI desktops with GPU capabilities and see the results we captured when we built it in the lab. This is a session for those wanting to get started with Linux virtual desktops that need GPU capabilities. 50-minute Talk Trey Johnson, Sr. Solutions Architect, Dell EMC
Tony Foster, Principal Technical Marketing Engineer for EUC Solutions, Dell EMC
S7525 - GI Next: Global Illumination for Production Rendering on GPUs Learn how to accelerate the computation of global illumination (a very expensive part of the rendering process) with the aid of GPUs. Porting a production renderer to take advantage of GPUs is a considerable effort and often requires rewriting the whole engine; moreover, custom shaders may not be accessible in source code and often introduce performance penalties if not especially adapted to the accelerator. However, function calls to the renderer's API from within shaders may be intercepted and thus costly functions in the render core may be accelerated outside of the shader code. One such render core API function is the calculation of the global illumination contribution, and it is this part that we accelerate on the GPU. 25-minute Talk Rajko Yasui-Schoeffel, Senior Graphics Software Engineer, NVIDIA
Enzo Catalano, Senior Graphics Software Engineer, NVIDIA
S7625 - Going Deeper in Finance How widely applicable is deep learning in finance? We'll provide an overview of promising deep learning applications in finance. We'll then focus on deep (variational) autoencoders, showing how they can learn hidden representations of unlabeled data and generate new data. This opens interesting new applications in anomaly detection, risk analysis, price prediction, and algorithmic trading. We'll explore some of these use cases with real FX data and illustrate the concepts with interactive notebooks, showing how to build the models using frameworks such as TensorFlow and Keras, and how to use the latest Tesla P100 GPUs for training. 25-minute Talk Daniel Egloff, Partner, QuantAlea and InCube
S7282 - GPU-Accelerated Convolutional Neural Networks for Protein-Ligand Scoring We'll describe a convolutional neural network that takes as input a comprehensive 3D representation of a protein-ligand interaction and predicts whether the ligand (a small molecule, like a drug) binds to the protein. We'll provide a brief orientation in structure-based drug design, describe how we effectively use the GPU to efficiently train, evaluate, and visualize our neural networks, and discuss preliminary results and current limitations. Our CNN scoring function outperforms the conventional AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening. 25-minute Talk David Koes, Assistant Professor, University of Pittsburgh
S7397 - GPU-Accelerated Deep Learning Framework for Cyber-Enabled Manufacturing We'll present a GPU-accelerated deep-learning framework for cyber-manufacturing, which enables real-time feedback to designers regarding the manufacturability of a computer-aided design model. We'll talk about a 3D-convolutional neural network-based approach for learning the manufacturability of a mechanical component. The 3D-CNN can recognize the features in a CAD model and classify it to be manufacturable or non-manufacturable with a greater accuracy than traditional rule-based methods. We'll discuss a novel GPU-accelerated voxelization algorithm used to discretize the CAD model and prepare it for deep learning. We'll briefly outline the challenges in training a 3D-CNN using complex CAD models on a GPU (NVIDIA TITAN X) with limited memory. Finally, we'll touch upon different methods to extend the framework to other manufacturing processes, such as additive manufacturing and milling. 25-minute Talk Adarsh Krishnamurthy, Assistant Professor, Iowa State University
Aditya Balu, Ph.D. Student, Iowa State University
S7290 - GPU-Accelerated Natural Language Processing We'll give an introduction to natural language processing on GPUs. So far, GPUs are not used in big data as much as they should be. We'll show how GPUs can bring deep learning techniques into production for large big data systems. We'll discuss some of the possible use cases of NLP, and we'll see why the techniques used until now haven't been enough. We'll talk about vector embeddings, and see in a live demo why they do convey the semantic information we're looking for when processing language. 50-minute Talk Guillermo Molini, CTO, Wavecrafters
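The claim that vector embeddings convey semantic information can be shown with a toy example. The vectors below are tiny made-up illustrations, not trained embeddings: the point is only that semantically related words should score a higher cosine similarity than unrelated ones.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: the angle between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-dimensional "embeddings" (real ones have hundreds of dims).
emb = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

related = cosine(emb["king"], emb["queen"])
unrelated = cosine(emb["king"], emb["apple"])
print(related > unrelated)  # → True
```

Real embeddings (e.g. word2vec or GloVe) are trained so that words appearing in similar contexts end up close in this sense, which is what makes them useful as GPU-friendly dense inputs to downstream models.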
S7367 - GPU-Accelerated Similarity Searching in a Database of Short DNA Sequences The challenge: do interactive similarity searching in a SQL database that contains billions of short DNA sequences. The response: this database query is amenable to GPU acceleration because efficient numerical computation can be carried out in parallel on large numbers of independent data items. Implementation details and performance will be discussed, with emphasis on the integration of GPU computation with the database server environment. 25-minute Talk Richard Wilton, Associate Research Scientist, Johns Hopkins University
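The data-parallel pattern this abstract relies on — many independent sequence comparisons — can be sketched with a vectorized Hamming-distance search. This is a hedged illustration of the core idea, not the talk's implementation: short DNA sequences are encoded as small integer arrays so the query can be compared against every database entry at once, the same shape of computation a GPU accelerates.

```python
import numpy as np

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(seq):
    # Pack a DNA string into a compact integer array.
    return np.array([CODE[c] for c in seq], dtype=np.uint8)

def hamming_search(query, database):
    # Compare the query against all sequences in one vectorized operation;
    # each row of the comparison is independent (GPU-friendly parallelism).
    q = encode(query)
    db = np.stack([encode(s) for s in database])
    return (db != q).sum(axis=1)

db = ["ACGT", "ACGA", "TTTT"]
dists = hamming_search("ACGT", db)
print(dists.tolist())  # → [0, 1, 3]
```

Scaling this from three sequences to billions is mainly a matter of memory layout and batching, which is where integration with the database server environment becomes the interesting engineering problem.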