NVIDIA GTC San Jose 2017

Recordings now available to registered pass holders.

S7859 - 3D Cloud Streaming for Mobile and Web Applications

Learn how Microsoft is extending WebRTC to enable real-time, interactive 3D Streaming from the cloud to any remote device. The purpose is to provide an open toolkit to enable industries to leverage remote cloud rendering in their service and product pipelines. This is required for many industries where the scale and complexity of 3D models, scenes, physics and rendering is beyond the capabilities of a mobile device platform.  We are extending the industry standard WebRTC framework to 3D scenarios including mixed reality and will walk through the work we are doing to realize the goal of delivering high-quality 3D applications to any client - web, mobile, desktop and embedded. This is only possible using the NVIDIA nvencode pipeline for server-side rendering on the cloud.

25-minute Talk Tyler Gibson - Senior Software Engineer, Microsoft
S7149 - 3D DeepObject for Precision 3D Mapping

3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation. In other words, the positional accuracy from deep learning object detection is inherently poor. Combining deep learning and 3D model fitting, our 3D DeepObject has the best of both worlds. Deep learning can detect an object (as a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The output of deep learning (bounding boxes) is the input for 3D model fitting. A bounding box from deep learning can significantly reduce the search space for 3D model fitting. Our latest test indicates that 3D DeepObject can achieve much higher positional accuracy than either deep learning or 3D model fitting alone.

25-minute Talk Bingcai Zhang - Tech Fellow, BAE Systems
S7289 - 3D Human Motion Capture from 2D Video Using Cloud-Based CNNs

This talk provides a brief overview of how to apply GPU-based deep learning techniques to extract 3D human motion capture from standard 2D RGB video. We describe in detail the stages of our CUDA-based pipeline from training to cloud-based deployment. Our training system is a novel mix of real world data collected with Kinect cameras and synthetic data based on rendering thousands of virtual humans generated in the Unity game engine. Our execution pipeline is a series of connected models including 2D video to 2D pose estimation and 2D pose to 3D pose estimation. We describe how this system can be integrated into a variety of mobile applications ranging from social media to sports training. A live demo using a mobile phone connected into an AWS GPU cluster will be presented.

25-minute Talk Paul Kruszewski - Founder & CEO, wrnch
S7425 - 3D Printing with NVIDIA GVDB Voxels

Improvements in 3D printing allow for unique processes, finer details, better quality control, and a wider range of materials as printing hardware improves. With these improvements comes the need for greater computational power and control over 3D-printed objects. We introduce NVIDIA GVDB Voxels as an open source SDK for voxel-based 3D printing workflows. Traditional workflows are based on processing polygonal models and STL files for 3D printing. However, such models don't allow for continuous interior changes in color or density, for descriptions of heterogeneous materials, or for user-specified support lattices. Using the new NVIDIA GVDB Voxels SDK, we demonstrate practical examples of design workflows for complex 3D printed parts with high-quality ray-traced visualizations, direct data manipulation, and 3D printed output.

25-minute Talk Rama Hoetzlein - Graphics Research Engineer, NVIDIA
Jun Zeng - Principal Scientist, HP Labs
S7197 - 4K Video Processing and Streaming Platform on TX1

Learn how to build a platform for processing and streaming 4K video on the NVIDIA Jetson TX1 processor. To achieve real-time video processing, the diverse processing resources of this high-performance embedded architecture need to be employed optimally. The heterogeneous system architecture of the Jetson TX1 allows capturing, processing, and streaming of video with a single chip. The main challenges lie in the optimal utilization of the different hardware resources of the Jetson TX1 (CPU, GPU, dedicated hardware blocks) and in the software frameworks. We'll discuss variants, identify bottlenecks, and show the interaction between hardware and software. Simple capturing and displaying 4K video can be achieved using existing out-of-the-box methods. However, GPU-based enhancements were developed and integrated for real-time video processing tasks (scaling and video mixing).

25-minute Talk Tobias Kammacher - Researcher, Zurich University of Applied Sciences
S7310 - 8-Bit Inference with TensorRT

We'll describe a method for converting FP32 models to 8-bit integer (INT8) models for improved efficiency. Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32) and, by default, inference on these models employs FP32 as well. Our conversion method doesn't require re-training or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8 and have achieved comparable Top 1 and Top 5 inference accuracy. The methods are implemented in TensorRT and can be executed on GPUs that support new INT8 inference instructions.

25-minute Talk Szymon Migacz - CUDA Library Software Engineer, NVIDIA
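
As a rough illustration of what mapping FP32 values to INT8 involves, the Python sketch below applies symmetric linear quantization with a single max-abs scale. This is an assumption made for illustration only: the abstract does not describe how the talk's method chooses its scales, and nothing here uses the TensorRT API. The function names and sample weights are hypothetical.

```python
def max_abs_scale(values):
    """Simplest scale choice (an assumption): map the largest magnitude to 127."""
    return max(abs(v) for v in values) / 127.0


def quantize_int8(values, scale):
    """Round each float to the nearest step and clamp to the range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical FP32 weights from one layer.
weights = [0.02, -0.5, 0.25, 1.27, -1.0]
scale = max_abs_scale(weights)
quantized = quantize_int8(weights, scale)
restored = dequantize(quantized, scale)

# Rounding error per value is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With a max-abs scale nothing is clipped, so the only error is rounding; a scale chosen smaller than this trades clipping of outliers for finer resolution elsewhere.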
L7132 - Accelerated Analytics and Graph Visualization

In this lab, you will learn how to use a GPU-accelerated graph visualization engine in combination with a GPU-accelerated database. By combining these technologies, we can visually explore a large network dataset and identify port scan, distributed denial of service, and data exfiltration events. By the end of this lab, you will have learned how to load data for accelerated querying and analysis, build graph visualizations using the GPU-accelerated database as a data source, and explore large-scale data visualization. Prerequisites: No prerequisite skills are necessary, but basic knowledge of SQL and Python would be helpful. This lab uses GPU resources in the cloud; you are required to bring your own laptop.

120-minute Instructor-Led Lab Keith Kraus - Senior Engineer of Applied Solutions Engineering, NVIDIA
Michael Balint - Senior Manager Applied Solutions Engineering, NVIDIA
Deepti Jain - Senior Applied Solutions Engineer, NVIDIA
S7774 - Accelerated Analytics Industry Use Cases

Companies of all sizes and in all industries are driven towards digital transformation. Failure to adapt to this movement places businesses at an increased risk in current and future competitive markets. Limited by slow compute, enterprises struggle to gain valuable insights fast, monetize the data, enhance customer experience, optimize operational efficiency, and prevent fraudulent attacks all at the same time. NVIDIA helps provide deeper insights, enable dynamic correlation, and deliver predictive outcomes at superhuman speed, accuracy, and scale. We'll highlight specific accelerated analytics use cases -- powered by the NVIDIA Tesla platform, DGX-1 AI supercomputer, and NVIDIA GPU-accelerated cloud computing -- in the finance, oil and gas, manufacturing, retail, and telco industries.

25-minute Talk Renee Yao - Product Marketing Manager, Deep Learning and Analytics, NVIDIA
S7332 - Accelerated Astrophysics: Using NVIDIA DGX-1 to Simulate and Understand the Universe

Get an overview of how GPUs are used by computational astrophysicists to perform numerical simulations and process massive survey data. Astrophysics represents one of the most computationally heavy sciences, where supercomputers are used to analyze enormous amounts of data or to simulate physical processes that cannot be reproduced in the lab. Astrophysicists strive to stay on the cutting edge of computational methods to simulate the universe or process data faster and with more fidelity. We'll discuss two important applications of GPU supercomputing in astrophysics. We'll describe the astrophysical fluid dynamics code CHOLLA that runs on the GPU-enabled supercomputer Titan at Oak Ridge National Lab and can perform some of the largest astrophysical simulations ever attempted. Then we'll describe the MORPHEUS deep learning framework that classifies galaxy morphologies using the NVIDIA DGX-1 deep learning system.

25-minute Talk Brant Robertson - Associate Professor of Astronomy and Astrophysics, University of California, Santa Cruz
S7753 - Accelerated Deep Learning Within Reach - Supercomputing Comes to Your Cube

Deep learning practitioners have traditionally been forced to spend protracted cycle time cobbling together platforms using consumer-grade components and unsupported open source software. Learn (1) the benefits of rapid experimentation and deep learning framework optimization as a precursor to scalable production training in the data center, (2) the technical challenges that must be overcome for extending deep learning to more practitioners across the enterprise, and (3) how many organizations can benefit from a powerful enterprise-grade solution that's pre-built, simple to manage, and readily accessible to every practitioner.

25-minute Talk Markus Weber - Senior Product Manager, NVIDIA
S7117 - Accelerating Cross-Validation in Spark Using GPU

Learn how to better utilize GPUs to accelerate cross-validation in Spark, which is widely used in many big data analytics and machine learning applications.

25-minute Talk Minsik Cho - Research Staff Member, IBM Research

This session describes the design and implementation of ISAAC, an open-source framework for GEMM and CONV that provides improved performance over cuBLAS and cuDNN. Attendees will learn about input-aware auto-tuning, a technique that relies on machine learning models to automatically derive input- and hardware- portable PTX kernels. Benchmarks will be provided for GEMM and CONV in the context of LINPACK, DeepBench, ICA and SVD, showing up to 3x performance gains over vendor libraries on a GTX980 and a Tesla P100.

25-minute Talk Philippe Tillet - Ph.D. Candidate, Harvard University
S7383 - Accelerating Cyber Threat Detection with GPU

Analyzing vast amounts of enterprise cyber security data to find threats is hard. Cyber threat detection is also a continuous task, and because of financial pressure, companies have to find optimized solutions for this volume of data. We'll discuss the evolution of big data architectures used for cyber defense and how GPUs are allowing enterprises to do better threat detection more efficiently. We'll discuss (1) briefly the evolution of traditional platforms to lambda architectures with new approaches like Apache Kudu to ultimately GPU-accelerated solutions; (2) current GPU-accelerated database, analysis, and visualization technologies (such as MapD and Graphistry), and discuss the problems they solve; (3) the need to move beyond traditional table-based data-stores to graphs for more advanced data explorations, analytics, and visualization; and (4) the latest advances in GPU-accelerated graph analytics and their importance all for improved cyber threat detection.

50-minute Talk Joshua Patterson - Applied Solutions Engineering Director, NVIDIA
Michael Wendt - Manager of Applied Solutions Engineering, NVIDIA
S7321 - Accelerating Document Retrieval and Ranking for Cognitive Applications

Based on a comprehensive performance study of Watson workloads, we'll deep dive into optimizing critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is a time-consuming task. Only after completing it can any advanced natural language processing algorithms be effective.

25-minute Talk David Wendt - Programmer, IBM
Tim Kaldewey - Performance Architect, IBM Watson
S7656 - Accelerating HD Map Creations with GPUs

We'll explain how GPUs can accelerate the development of HD maps for autonomous vehicles. Traditional mapping techniques take weeks to result in highly detailed maps because massive volumes of data, collected by survey vehicles with numerous sensors, are processed, compiled, and registered offline manually. We'll describe how Japan's leading mapping company uses the concept of a cloud-to-car AI-powered HD mapping system to automate and accelerate the HD mapping process, including actual examples of GPU data processing that use real-world data collected from roads in Japan.


25-minute Talk Shigeyuki Iwata - Manager, Research & Development Office, ZENRIN Corporation
S7831 - Accelerating High-Frequency Nonlinear Earthquake Simulations on OLCF Titan and NCSA Blue Waters

The highly nonlinear, multiscale dynamics of large earthquakes is a difficult physics problem that challenges HPC systems at extreme scale. This presentation will introduce our optimized CUDA implementation of the Drucker-Prager plasticity in AWP-ODC that utilizes the GPU's memory bandwidth highly efficiently, which helps to scale to the full size of the Titan system. We demonstrate the dramatic reduction in the level of shaking in the Los Angeles basin by performing a nonlinear M 7.7 earthquake simulation on the southern San Andreas fault for frequencies up to 4 Hz using Blue Waters and Titan. Full realization of the projected gains in using nonlinear ground-motion simulations for controlling sources will improve the hazard estimates, which has a broad impact on risk-reduction and enhanced community resilience, especially for critical facilities such as large dams, nuclear power plants, and energy transportation networks.

25-minute Talk Daniel Roten - Computational Scientist, SDSC
Yifeng Cui - Lab Director, San Diego Supercomputing Center
S7593 - Accelerating the 3D Elastic Reverse-Time-Migration Algorithms Through NVIDIA GPUs

We'll cover the optimization details and the inspiring performance results of using NVIDIA Kepler GPUs to accelerate the 10th-order three-dimensional elastic Reverse-Time-Migration (RTM) algorithm. As an essential migration method in seismic applications for imaging the underground geology, the RTM algorithm is particularly complex due to its computational workflow and is generally the most time-consuming kernel. In particular, RTM algorithms based on elastic wave equations (elastic RTM) are generally more computationally intense than RTM methods for acoustic constant-density media (acoustic RTM). In recent years, the desire to cover larger regions and acquire better resolution has further increased the algorithmic complexity of RTM. Therefore, computing platforms and optimization methods that can better meet such challenges in seismic applications are in great demand. In this work, we first modify the backward process in the RTM matrix format by adding extra layers, to generate a straightforward stencil that fits well with GPU architecture. A set of optimization techniques, such as memory tuning and computing occupancy configuration, is then applied to exploit the performance of a set of different GPU cards. By further using the streaming mechanism, we manage to overlap communication and computation among multiple GPUs. The best performance employing four Tesla K40 GPU cards is 28 times better than a fully optimized reference based on a socket with two E5-2697 CPUs. This work demonstrates the great potential of NVIDIA GPU accelerators in future geophysics exploration algorithms.

25-minute Talk Lin Gan - Dr., Tsinghua University
S7578 - Accelerating your VR Applications with VRWorks

Across graphics, audio, video, and physics, the NVIDIA VRWorks suite of technologies helps developers maximize performance and immersion for VR applications. We'll explore the latest features of VRWorks, explain the VR-specific challenges they address, and provide application-level tips and tricks to take full advantage of these features. Special focus will be given to the details and inner workings of our latest VRWorks feature, Lens Matched Shading, along with the latest VRWorks integrations into Unreal Engine and Unity.

50-minute Talk Edward Liu - Sr. Developer Technology Engineer, NVIDIA
Cem Cebenoyan - Director of Engineering, NVIDIA
S7810 - Acceleration of Multi-Object Detection and Classification Training Process with NVIDIA Iray SDK (Presented by SAP)

Many works using deep CNN for multi-object detection and classification observe that a high-quality dataset for the training is even more important than the choice of a network type for the best results. We employ the NVIDIA Iray rendering engine and SDK for the automatic generation of the synthetic images and their annotation that can be either combined with real manually annotated images and used as the input for the training process or used on their own. In most cases, adding a new entity to the classification/detection list requires reviewing the existing dataset and relabeling it. Our contribution allows the acceleration of the process dramatically and allows for the specialization of the training set.

50-minute Talk Tatiana Surazhsky - 3D Graphics Research Expert, SAP Labs Israel LTD
S7564 - Accelerator Programming Ecosystems

Emerging heterogeneous systems are opening up tons of programming opportunities. This panel will discuss the latest developments in accelerator programming where the programmers have a choice among OpenMP, OpenACC, CUDA and Kokkos for GPU programming. This panel will throw light on what would be the primary objective(s) for a choice of model, whether its availability across multiple platforms, its rich feature set or its applicability for a certain type of scientific code or compilers' stability or other factors. This will be an interactive Q/A session where participants can discuss their experiences with programming model experts and developers.

50-minute Panel Michael Wolfe - Engineer, NVIDIA
Christian Trott - Senior Member Technical Staff, Sandia National Laboratories
Stephen Olivier - Principal Member of Technical Staff, Sandia National Laboratories
Mark Harris - Chief Technologist, GPU Computing Software, NVIDIA
Randy Allen - Director of Advanced Research, Mentor Graphics
Fernanda Foertter - HPC User Support Specialist/Programmer, Oak Ridge National Laboratory
S7193 - Achieving Portable Performance for GTC-P with OpenACC on GPU, Multi-Core CPU, and Sunway Many-Core Processor

Gyrokinetic Toroidal Code developed in Princeton (GTC-P) delivers highly scalable plasma turbulence simulations at extreme scales on world-leading supercomputers such as Tianhe-2 and Titan. The aim of this work is to achieve portable performance in a single source code for GTC-P. We developed the first OpenACC implementation for GPU, CPU, and Sunway processor. The results showed the OpenACC version achieved nearly 90% of the performance of the NVIDIA CUDA version on GPU and of the OpenMP version on CPU; the Sunway OpenACC version achieved a 2.5X speedup in the entire code. Our work demonstrates OpenACC can deliver portable performance to complex real-science codes like GTC-P. In addition, we request adding thread-id support to the OpenACC standard to avoid expensive atomic operations for reductions.

25-minute Talk Stephen Wang - GPU Specialist, Shanghai Jiao Tong University
S7435 - Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks

There has been a surge of success in using deep learning in imaging and speech applications for its relatively automatic feature generation and, in particular, for convolutional neural networks, high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (as architecture construction) through hyper-parameter choices remains a tedious and highly intuition driven task. To address this, multi-node evolutionary neural networks for deep learning (MENNDL) is proposed as a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms. MENNDL is capable of evolving not only the numeric hyper-parameters (for example, number of hidden nodes or convolutional kernel size), but is also capable of evolving the arrangement of layers within the network.

25-minute Talk Steven Young - Research Scientist in Deep Learning, Oak Ridge National Laboratory
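
To make the idea of evolving numeric hyper-parameters with a genetic algorithm concrete, here is a minimal single-process Python sketch. It is not MENNDL: the selection, crossover, and mutation rules, the `evolve` function, and the toy fitness function (a stand-in for training and scoring a network) are all assumptions chosen for illustration, and the sketch does not evolve layer arrangements.

```python
import random


def evolve(fitness, bounds, population_size=20, generations=30, seed=0):
    """Evolve integer hyper-parameters within inclusive bounds; higher fitness is better."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return {n: rng.randint(*bounds[n]) for n in names}

    def crossover(a, b):
        # Each hyper-parameter is inherited from one of the two parents.
        return {n: rng.choice([a[n], b[n]]) for n in names}

    def mutate(individual):
        # Nudge one randomly chosen hyper-parameter by one step, staying in bounds.
        child = dict(individual)
        name = rng.choice(names)
        low, high = bounds[name]
        child[name] = min(high, max(low, child[name] + rng.choice([-1, 1])))
        return child

    population = [random_individual() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # keep the fitter half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


# Hypothetical search space and a toy fitness that peaks at 64 hidden nodes, kernel size 5.
bounds = {"hidden_nodes": (8, 128), "kernel_size": (1, 9)}


def toy_fitness(individual):
    return -((individual["hidden_nodes"] - 64) ** 2 + (individual["kernel_size"] - 5) ** 2)


best = evolve(toy_fitness, bounds)
```

In a real setting each fitness evaluation would be a full training run, which is why distributing the population across the nodes of a cluster is attractive.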
S7312 - ADAS Computer Vision and Augmented Reality Solution

We'll address how next-generation informational ADAS experiences are created by combining machine learning, computer vision, and real-time signal processing with GPU computing. Computer vision and augmented reality (CVNAR) is a real-time software solution, which encompasses a set of advanced algorithms that create mixed augmented reality for the driver by utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization, driver infographics, driver health monitoring, lane keeping, advanced parking assistance, adaptive cruise control, and autonomous driving. Our approach augments drivers' visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays.


25-minute Talk Sergii Bykov - Technical Lead, Luxoft
S7641 - Additive Manufacturing Simulation on the GPU

Learn how GPUs can accelerate large-scale finite element-based additive manufacturing (AM) simulation. We'll discuss the computational challenges underlying AM simulation, followed by their solution through fast GPU solvers. We'll also present case studies of metal AM and fused-deposition-modeling simulation, with experimental results.

25-minute Talk Krishnan Suresh - Professor, University of Wisconsin, Madison
S7347 - A Deep Hierarchical Model for Joint Object Detection and Semantic Segmentation

How do we tackle multiple vision tasks from within the same deep neural network? We'll address this problem by proposing a neural network architecture that can simultaneously segment and detect objects within an image. We'll begin with a brief overview of deep learning as applied to computer vision, and various popular methods for object detection and semantic segmentation. We'll then propose our model: a hierarchical architecture that explicitly allows fine-grain information from one task to aid in the performance of coarser tasks. We'll show that our multi-task network outperforms and is faster than networks trained to tackle each task independently. We'll then visualize our network results on the Cityscapes data set and discuss potential applications of our ideas, especially in the context of autonomous driving.

25-minute Talk Zhao Chen - Machine Learning Software Intern, NVIDIA
S7834 - Advanced GPU Server Architectures and Deep Learning Training for HPC Customers (Presented by Super Micro Computer Inc.)

Recently, Machine Learning leaped into the computing mainstream and now ML is advancing across all enterprise applications. GPU usage models are penetrating new industries and advanced servers with GPUs will take deep learning to new performance levels that augment Artificial Intelligence. New server architecture innovations will drive higher levels of performance in ML applications. As GPUs become more powerful, GPU networks will need to be more efficient as well. Supermicro has advanced the state-of-the-art in GPU-optimized server architectures, perfect for the emerging deep learning applications. Hear the latest in GPU server architectures and deep learning customer case-studies of how customers achieved incredible deep learning results from Supermicro solutions.

50-minute Talk Jason Pai - Director, GPU Servers, Super Micro Computer Inc.
Don Clegg - VP Marketing & WW Business Development, Super Micro Computer, Inc.
S7482 - Advances in Real-Time Graphics at Pixar

Explore how real-time graphics are used at Pixar Animation Studios. We'll describe the unique needs for film production and our custom solutions, including Presto and our open-source projects Universal Scene Description (USD), OpenSubdiv, and Hydra. Don't miss this great opportunity to learn about graphics, algorithms, and movies!

50-minute Talk Pol Jeremias-Vila - Sr. Graphics Engineer, Pixar
David Yu - Senior Graphics Software Engineer, Pixar Animation Studios
Dirk Van Gelder - Software Engineer, Pixar Animation Studios
S7862 - Advancing Accelerated Deep Learning with IBM PowerAI

IBM PowerAI provides the easiest on-ramp for enterprise deep learning. PowerAI helped users break deep learning training benchmarks AlexNet and VGGNet thanks to the world's only CPU-to-GPU NVIDIA NVLink interface. See how new feature development and performance optimizations will advance the future of deep learning in the next twelve months, including NVIDIA NVLink 2.0, leaps in distributed training, and tools that make it easier to create the next deep learning breakthrough. Learn how you can harness a faster, better and more performant experience for the future of deep learning.  


50-minute Talk Sumit Gupta - VP, HPC, AI, and Analytics
S7647 - Advancing Our Understanding of Evolutionary Histories Using GPUs: The BEAGLE Library

Estimating the evolutionary history of organisms, phylogenetic inference, is a critical step in many analyses involving biological sequence data such as DNA. These phylogenetic relationships are essential in understanding the evolutionary dynamics of organisms. The likelihood calculations at the heart of the most effective methods for phylogenetic analyses are extremely computationally intensive, and hence these analyses become a bottleneck in many studies. In collaboration with some of the foremost researchers in our area, we have developed an open source library, BEAGLE, which uses GPUs to greatly accelerate phylogenetic analyses. BEAGLE is used by some of the leading programs in the field. We'll describe the phylogenetic inference problem and its importance, and go into details on how we used GPU computing to achieve broad impact in the field.

25-minute Talk Daniel L. Ayres - Graduate Student, University of Maryland
Michael P Cummings - Professor, University of Maryland
S7783 - A Fast, Unified Method for Object Detection, Instance Segmentation, and Human Pose Estimation

We'll cover state-of-the-art algorithms for image classification, object detection, object instance segmentation, and human pose prediction that we recently developed at Facebook AI Research. Our image classification results are based on the recently developed "ResNeXt" model that supersedes ResNet's accuracy on ImageNet, but much more importantly yields better features with stronger generalization performance on object detection tasks. Using ResNeXt as a backbone, we'll present a unified approach for detailed object instance recognition tasks, such as instance segmentation and human pose estimation. This model builds on our prior work on the Faster R-CNN system with Feature Pyramid Networks, which enables efficient multiscale recognition. We'll describe our platform for object detection research that enables a fast and flexible research cycle. Our platform is implemented on Caffe2 and can train many of these state-of-the-art models on the COCO dataset in 1-2 days using sync SGD over eight GPUs on a single Big Sur server.

25-minute Talk Ross Girshick - Research Scientist, Facebook
S7857 - AgeAtHome - Deep Learning at the Edge (Presented by IBM)

The need for helping elderly individuals or couples remain in their home is increasing as our global population ages. Cognitive processing offers opportunities to assist the elderly by processing information to identify opportunities for caregivers to offer assistance and support. This project seeks to demonstrate means to improve the ability of the elderly to age at home through understanding of daily activities inferred from passive sensor analysis. This project is an exploration of the IBM Watson Cloud and Edge docker-based Blue Horizon platforms for the use of high-fidelity, low-latency, private sensing and responding at the edge using a RaspberryPi, including deep learning using NVIDIA DIGITS software, K80 GPU servers in the IBM Cloud, and Jetson TX2 edge computing.

50-minute Talk David C Martin - Hacker-in-residence, IBM Watson Cloud CTO Office
Dima Rekesh - Senior Distinguished Engineer, Optum Technology
S7262 - A General Framework for Hybrid Stochastic Model Calibration on the GPU

We'll present an overview of a GPU-based approach to calibrating hybrid models in finance, that is, multi-factor correlated stochastic processes to market data (term structure and volatility surfaces). Examples of such models range from the relatively benign 3-factor JY inflation model, to single currency and forex equity baskets, up to a completely general basket of rate/inflation/equity/forex/credit processes described by a global correlation matrix. Due to the inherently multi-threaded nature of Monte Carlo path generation, and the availability of cuRAND, a GPU implementation vastly outperforms CPU or PDE solvers, which are plagued by high dimensionality. Details of the algorithm, as well as a demonstration and analysis of timings and memory limitations will be covered.

25-minute Talk Mark York - Senior Quantitative Analyst, Renaissance Risk Management Labs
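
One building block of Monte Carlo simulation for correlated factors is turning independent normal draws into draws that follow a given correlation matrix. The Python sketch below does this on the CPU with a Cholesky factor; it is a generic illustration, not the calibration algorithm from the talk, and the two-factor matrix, seed, and function names are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def correlated_normals(lower, rng):
    """Turn independent standard normals z into correlated ones via L * z."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(lower))]


def sample_correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical two-factor correlation matrix (for example, a rate and an equity factor).
correlation = [[1.0, 0.6], [0.6, 1.0]]
lower = cholesky(correlation)
rng = random.Random(42)
draws = [correlated_normals(lower, rng) for _ in range(20000)]
estimate = sample_correlation([d[0] for d in draws], [d[1] for d in draws])
```

Each draw is independent of the others, which is what makes path generation map naturally onto many GPU threads.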
S7286 - A High-Quality and Fast Maximal Independent Set Algorithm for GPUs

Learn how to efficiently parallelize Maximal Independent Set computations for GPUs. Our CUDA implementation is at least three times faster than the leading GPU codes on every one of the 16 real-world and synthetic graphs we tested. Moreover, it produces a larger maximal independent set in all but one case. It is asynchronous, atomic free, and requires fewer than 30 kernel statements. We'll present the included code optimizations to achieve heretofore unreached performance and describe how to exploit monotonicity to minimize the memory footprint of this important irregular graph algorithm.

25-minute Talk Martin Burtscher - Professor, Texas State University
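
For readers unfamiliar with the problem, the Python sketch below computes a maximal independent set with a generic priority-based scheme in the style of Luby's algorithm, simulated sequentially in rounds. It is not the CUDA implementation described in the talk; the function name, the random priorities, and the example graph are assumptions for illustration.

```python
import random


def maximal_independent_set(adjacency, seed=0):
    """Return a maximal independent set of an undirected graph.

    adjacency maps each vertex to a list of its neighbors and must be symmetric.
    """
    rng = random.Random(seed)
    # The vertex id breaks ties, so no two priorities are ever equal.
    priority = {v: (rng.random(), v) for v in adjacency}
    undecided = set(adjacency)
    selected = set()
    while undecided:
        # A vertex joins the set when it outranks all of its undecided neighbors.
        # On a GPU, every vertex could perform this check in parallel.
        winners = {
            v for v in undecided
            if all(priority[v] > priority[u] for u in adjacency[v] if u in undecided)
        }
        selected |= winners
        # Winners and their neighbors are now decided and leave the graph.
        decided = set(winners)
        for v in winners:
            decided.update(adjacency[v])
        undecided -= decided
    return selected


# Hypothetical example: a cycle of five vertices.
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
result = maximal_independent_set(graph)
```

The set is independent because two adjacent vertices cannot both win a round, and maximal because every vertex that leaves the graph is either selected or adjacent to a selected vertex.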
S7592 - AI and Deep Learning in Trading

We'll talk about how artificial intelligence has led to market-leading innovation in trading and the huge opportunity of using deep learning in trading today. There are three dominant trades: fast information extraction ("speed trade"), trade construction ("stat arb"), and prediction ("market timing"). AI has been very successful in all three. We have been key innovators in the speed trade, having started with a $10,000 risk limit and, over the last 10 years, making more than $1.4 billion in profits. The reason is a purist adherence to AI. There is a huge opportunity for using deep learning in the prediction part of the trade, which is not latency sensitive and is mostly about high accuracy. Our mission is to make investing a science, a research-driven utility, rather than the competition or game that it is today. Deep learning has had a lot of success in bringing method to social science settings. We believe that over the next five to 10 years every trading operation will become deep learning based. However, at this time there is a lot of opportunity for innovation using deep learning in trading.

25-minute Talk Gaurav Chakravorty - Head of Trading Strategy Development, qplum
S7739 - AI and the Battle for Cyber Security

The security domain presents a unique landscape for the application of artificial intelligence. Defenders in the security space are often charged with securing ever-changing and complex networks, while attackers continue to probe for and exploit any system weakness. We'll dive into the state of cyber security, why it is well suited for artificial intelligence-based approaches, and how AI is actively defending against attacks today.

50-minute Talk Matt Wolff - Chief Data Scientist, Cylance
Andrew Davis - Staff Data Scientist, Cylance
S7770 - AI in Healthcare: Beyond Deep Learning in Medical Imaging

We'll give an overview of how deep learning in healthcare can be utilized beyond medical imaging when applied to clinical decision support and medical asset management. Deep learning is capable of addressing many, if not all, of the main challenges for caregivers: information overflow, work overload, impacted accuracy due to data constraints, optimism bias, and optimal utilization of medical equipment. This needs to involve multiple data sources and deal with data harmonization, semantic interoperability, and different health data types. Deep learning in healthcare has three main aspects: medical imaging, multi-data (structured, unstructured, streaming, etc.) based decision support, and asset utilization data.

25-minute Talk Dr. Michael Dahlweid - Chief Medical Officer, Digital, GE Healthcare
S7805 - Airbus Vahana - Development of a Self-Piloted Air Taxi

Vahana started in early 2016 as one of the first projects at A³, the advanced projects outpost of Airbus Group in Silicon Valley. The aircraft we're building doesn't need a runway, is self-piloted, and can automatically detect and avoid obstacles and other aircraft. Designed to carry a single passenger or cargo, Vahana is meant to be the first certified passenger aircraft without a pilot. We'll discuss the key challenges to develop the autonomous systems of a self-piloted air taxi that can be operated in urban environments.

25-minute Talk Arne Stoschek - Head of Autonomous Systems, Airbus A3
S7313 - AirVision: AI Based, Real-Time Computer Vision System for Drones

Modern computing hardware and NVIDIA Jetson TX1 performance create new possibilities for drones and enable autonomous AI systems, where image processing can be done on-board during flight. We'll present how Magma Solutions developed the AirVision system to cover advanced vision processing tasks for drones, e.g., image stabilization, moving object detection, tracking, and classification using deep neural networks, and visual position estimation using preloaded maps. We'll describe how Magma Solutions used software frameworks Caffe with cuDNN, OpenVX/NVIDIA VisionWorks, and NVIDIA CUDA to achieve real-time vision processing and object recognition. The AirVision system is in part developed with Lithuanian Ministry of Defence funding and is being used as a tactical UAV system prototype.

25-minute Talk Mindaugas Eglinskas - CEO, Magma Solutions, UAB
S7674 - All That Glisters Is Not Convnets: Hybrid Architectures for Faster, Better Solvers

Convolutional neural networks have proven themselves to be very effective parametric learners of complex functions. However, the non-linearities present in conventional networks are not strong; both halves of a (possibly leaky) ReLU are linear and the non-linearity is computed independently for each channel. We'll present techniques that create decision tree and RBF units that are designed to respond non-linearly to complex joint distributions across channels. This makes it possible to pack more non-linearity into a small space, which makes these units a particularly valuable replacement for the latter layers of a network, in particular the solver. The result is hybrid networks that outperform conventional pure neural networks and can be trained orders of magnitude more quickly.

50-minute Talk Tom Drummond - Professor, Monash University
S7809 - A Multi-Source, Multi-Sensor Approach to HD Map Creation

It's simple to take the output of one type of sensor in multiple cars and produce a map based on that data. However, a map created in this way will not have sufficient coverage, attribution, or quality for autonomous driving. Our multi-source, multi-sensor approach leads to HD maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. In this session, we will discuss how we have created the world's largest HD map, are able to continuously update it, and are making autonomous driving safer and more comfortable.


25-minute Talk Willem Strijbosch - Head of Autonomous Driving, TomTom
S7404 - An Approach to a High-Performance Decision Tree Optimization Within a Deep Learning Framework for Investment and Risk Management

We'll examine an innovative approach using an optimized algorithm to create a decision tree for the basis of regime dependent and pattern classification of financial and macroeconomic time-series data. Implemented in a supervised and unsupervised learning framework, the algorithm relies on the GPU for high performance computing and the host processor to further integrate the results in a deep learning framework. Also, we implement random number generation, in part, using a hardware quantum based true random number generator, balanced with the pseudo-random number generator in CUDA, so as to optimize overall speed where an exhaustive search is not feasible.

25-minute Talk Yigal Jhirad - Head of Quantitative and Derivatives Strategies, Cohen & Steers
Blay Tarnoff - Senior Application Developer and Database Architect, Cohen & Steers
S7174 - An Architectural Design Firm's Journey Through Virtual GPU Technology for Global Collaboration

Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology.

50-minute Talk Andrew Schilling - Chief Infrastructure Officer, CannonDesign
Jimmy Rotella - Design Application Specialist, CannonDesign
S7252 - An Efficient Connected Components Algorithm for Massively Parallel Devices

Learn how to efficiently parallelize connected components, an important irregular graph algorithm. Our CUDA implementation is asynchronous, lock free, converges rapidly, and employs load balancing. It is faster than other GPU codes on all 18 real-world and synthetic graphs we tested. We'll describe how to parallelize this graph algorithm by exploiting algorithmic properties, discuss important optimizations to improve the efficiency, and compare the performance with some of the fastest prior GPU implementations of connected components.

25-minute Talk Jayadharini Jaiganesh - Graduate Student, Texas State University
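
For readers unfamiliar with the problem, here is a minimal sequential sketch of connected-component labeling in Python, using a union-find structure with path halving. It is not the speakers' CUDA implementation; the function name `connected_components`, the edge-list input, and the choice of union-find are assumptions made for illustration.

```python
def connected_components(num_vertices, edges):
    """Label each vertex 0..num_vertices-1 with the smallest vertex id in its component."""
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Hook the larger root under the smaller one, so each root
            # stays the smallest vertex id of its component.
            if root_u < root_v:
                parent[root_v] = root_u
            else:
                parent[root_u] = root_v

    return [find(v) for v in range(num_vertices)]
```

For example, `connected_components(6, [(0, 1), (1, 2), (3, 4)])` groups vertices 0 to 2 together, vertices 3 and 4 together, and leaves vertex 5 on its own.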
S7261 - A New Approach to Active Learning by Query Synthesis Using Deep Generative Networks

We'll introduce a new active learning algorithm that is made practical using GPUs. Active learning concerns carefully choosing training data to minimize human labeling effort. In a nutshell, we apply deep generative models to synthesize informative "queries" that, when answered by a human labeler, allow the learner to learn faster. The learning is "active" in the sense that these questions are synthesized in an online manner adaptive to the current knowledge, thus minimizing the number of queries needed. Unlike traditional supervised machine training, our training is performed mostly on machine-synthesized data. To our knowledge, this is the first work that shows promising results in active learning by query synthesis.

25-minute Talk Jia-Jie Zhu - Postdoctoral Fellow, Boston College
S7699 - An Introduction to CUDA Programming Presented by Acceleware (Session 1 of 4)

Join us for an informative introductory tutorial intended for those new to CUDA and which serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We'll explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. We'll deliver a programming demonstration of a simple CUDA kernel. We'll also provide printed copies of the material to all attendees for each session - collect all four!

80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7700 - An Introduction to the GPU Memory Model - Presented by Acceleware (Session 2 of 4)

This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend Session 1, "An Introduction to GPU Programming," to learn the basics of CUDA programming that are required for Session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session - collect all four!

80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7143 - Anomaly Detection for Network Intrusions Using Deep Learning

We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, Streamsets for data flow, Spark for distributed GPUs, and Deeplearning for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM.

25-minute Talk David Kale - Deep Learning Engineer, Skymind
Adam Gibson - CTO, Skymind
S7829 - Apache Mahout's New Recommender Algorithm and Using GPUs to Speed Model Creation

Predictive AI is often associated with product recommenders. We present a landscape of multi-domain behavioral models that predict multi-modal user preferences and behavior. This session will take the audience from first principles of the new Correlated Cross-Occurrence (CCO) algorithms, showing the important innovations that lead to new ways to predict behavior, into a deep dive into a variety of different use cases, for instance using dislikes to predict likes, using search terms to predict purchase, and using conversion to augment search indexes with behavioral data to produce behavioral search. Some of these are nearly impossible to address without this new technique. We show the tensor algebra that makes up the landscape. Next, we walk through the computation using real-world data. Finally, we show how Mahout's generalized CPU/GPU integration and recently added CUDA support bring significant reductions in the time and cost of calculating the CCO models. We expect the audience to come away with an understanding of the kinds of applications that can be built with CCO and how to build them in performant, cost-reducing ways.


50-minute Talk Pat Ferrel - Chief Consultant, PMC member of Apache Mahout, ActionML
Andy Palumbo - Data Scientist, Cylance
S7510 - Apache Spark and GPUs for Scaling Deep Learning Libraries

Apache Spark has become a popular tool for data warehousing, ETL, and advanced analytics. Meanwhile, deep learning has become one of the most powerful classes of machine learning methods, in large part due to the computational power of modern machines with GPUs and specialized hardware. Spark and GPUs combine well for large deep learning workflows: Spark can handle ETL and data management, and it can distribute data parallel tasks to scale out across many GPUs.

50-minute Talk Tim Hunter - Software Engineer, Databricks, Inc
Joseph Bradley - Software Engineer, Databricks, Inc
S7649 - Applications of Deep Learning: Hardware QA

Hardware testing is a multifaceted challenge, but one that stands to benefit greatly from the advances in deep learning. The tricky formula of balancing good coverage against risk is consistently challenged with the rapid evolution of the problem space. The landscape in the industry today points to one that has been more or less linearly refined and improved upon, with the constant refrain of more resources being touted as the go-to solution. We'll discuss one of the ways we're working to evolve the approach to test: by harnessing the available tools in the deep learning space, offering a far more efficient path to providing better quality, while providing the flexibility of better coverage/risk decisions.

25-minute Talk Martina Sourada - Senior Director, SWQA, NVIDIA
S7513 - Applications of Generative Adversarial Networks to Drug Discovery in Oncology and Infectious Diseases

Recent advances in deep learning and specifically in generative adversarial networks have demonstrated surprising results in generating new images and videos upon request, even using natural language as input. We'll present the first application of generative adversarial autoencoders (AAE) for generating novel molecules with a defined set of parameters. In the first proof of concept experiment, we developed a seven-layer AAE architecture with the latent middle layer serving as a discriminator. As an input and output, the AAE uses a vector of binary fingerprints and concentration of the molecule. In the latent layer, we also introduced a neuron responsible for growth inhibition percentage, which, when negative, indicates the reduction in the number of tumor cells after the treatment. To train the AAE, we used the NCI-60 cell line assay data for 6252 compounds profiled on MCF-7 cell line. The output of the AAE was used to screen 72 million compounds in PubChem and select candidate molecules with potential anti-cancer properties. This approach is a proof of concept of an artificially intelligent drug discovery engine, where AAEs are used to generate new molecular fingerprints with the desired molecular properties. We'll also present the applications of this approach to discovering new anti-infective drugs and present the roadmap for generating drugs for rare diseases and even for individual patients.

50-minute Talk Polina Mamoshina - Sr. Research Scientist, Pharmaceutical Artificial Intelligence, Insilico Medicine, Inc
Artur Kadurin - Chief AI Officer, Insilico Medicine, Inc
Aleksandrs Zavoronkovs - CEO, Insilico Medicine, Inc
S7696 - Applying Deep Learning to Financial Market Signal Identification with News Data

We'll discuss how natural language processing techniques can be used for predicting financial markets from news data. By adapting techniques from other natural language processing applications to news data and market signals, predictive models can be built. Due to the large volume of news data available, models must be trained, optimized, and tested using GPU acceleration.

25-minute Talk Rafael Nicolas Fermin Cota - Partner, Triumph Asset Management Financial Services
Andrew Tan - Data Scientist, Triumph Asset Management
S7351 - Applying GPU Technology to Combat System Integration and Maintenance

Lockheed Martin Rotary and Mission Systems has a rich history of integrating combat systems into naval ships and buildings. The integration of complex radar and support systems into modern war-fighting entities demands the use of a unique set of design and simulation tools to verify and optimize engineering designs before production begins. After the combat system is in the field, it is important to equip the warfighter with informative training and maintenance systems. The goal is to keep the combat system fully operational at all times. GPU technologies such as OpenGL, CUDA, OptiX, and Iray, along with virtual reality and augmented reality, make these unique design and maintenance environments possible. These design practices are being examined in the Surface Navy Innovation Center through dedicated research for domestic and international combat system integration and maintenance.

25-minute Talk Rich Rabbitz - Principal Member of Engineering Staff, Lockheed Martin
Christopher Crouch - Associate Member of Engineering Staff, Lockheed Martin
S7623 - Approach to Practical Application of Deep Learning in Manufacturer's Production Line

We'll present how deep learning is applied in a manufacturer's production line. Fujikura and OPTOENERGY are introducing a visual inspection system incorporating deep learning in the production process of semiconductor lasers. The same inspection accuracy as skilled workers was achieved by optimizing the image size and the hyper parameters of a CNN model. The optimized image size is less than one quarter of the image size required for the visual inspection by skilled workers, which leads to large cost reduction of the production line. It was also confirmed that the highlighted region in the heatmaps of NG images didn't meet the criteria of the visual inspection. The visual inspection incorporating deep learning is being applied to other products such as optical fibers and electrical cables.

25-minute Talk Masahiro Kashiwagi - Manager, Fujikura Ltd.
S7295 - Are We Done with Object Recognition? The R1-Robot Perspective.

Deep learning has achieved such stunning results in visual recognition that it raises the question of whether this problem is actually solved. Should this be the case, the advantages for robotics could be dramatic. Indeed, the lack of reliable visual skills is a major bottleneck for robot deployment in everyday life. With this in mind, we started an effort to quantify the benefits and limits, if any, of DL in the context of robot vision. By exploiting R1, our latest humanoid equipped with an NVIDIA Jetson TX1, we investigated key differences between robot vision and other applications where DL typically excels, such as image retrieval. Our study identified critical issues to be tackled via computer vision and machine learning, while taking advantage of a robot platform. Our results confirm the huge impact of DL, demonstrated by the strong real-time recognition capabilities of R1, while pointing at specific open challenges that need to be addressed for seamless deployment in robotics.

25-minute Talk Giulia Pasquale - Ph.D. Candidate, Istituto Italiano di Tecnologia
S7777 - A Road to 3D for Everyone

3D content remains extremely expensive and difficult to create. With virtual reality opening up an opportunity for many industries to create both consumer and professional experiences, we'll present Unbound's approach to make it easy for everyone to create things in 3D. We'll share our R&D journey, experimental engines, and how CUDA ultimately helped us to create the powerful parallel algorithms necessary to enable robust volumetric modeling and rendering in VR. This has immediate utility for content creators, professional and novice alike.

25-minute Talk Florian Hoenig - CEO, Unbound Technologies, Inc.
S7622 - A Robust and Scalable CUDA Parallel Programming Model

The next release of CUDA introduces Cooperative Groups, a new programming model that significantly improves cooperative thread programming. Cooperative Groups, along with new warp synchronous primitives, enables threads and blocks within a CUDA grid to synchronize, exchange data, and perform collective operations in a safe, explicit, and reliable manner. Cooperative Groups is an elegant and scalable programming model for expressing synchronization and communication between groups of parallel threads ranging in size from a subset of a warp to an entire CUDA grid launch. Both Cooperative Groups and the lower-level warp-synchronous primitives offer a safe and explicit mechanism for high-performance intra-warp communications. We'll cover the new programming model features in depth, including best practice examples.

50-minute Talk Yuan Lin - Principal Engineer, NVIDIA
Kyrylo Perelygin - Senior Systems Software Engineer, NVIDIA
S7723 - ArrayFire Graph: Dynamic Graph Library for GPUs

ArrayFire Graph is an out-of-core dynamic graph library that runs on NVIDIA GPUs. It enables users to create and update graphs at a very high rate. AF Graph has a number of high-performance graph analytic algorithms that can be run on the dynamic data. Dynamic graphs allow users to provide incremental edge updates instead of rebuilding the whole graph. AF Graph's out-of-core support can handle graphs that cannot fit in GPU memory and can handle billions of edges.

25-minute Talk Kumar Aatish - Software Engineer, ArrayFire LLC
S7239 - Artificial General Intelligence for the Internet of Things

What do we need to achieve artificial general intelligence? How do we distribute intelligence over the internet-of-things? We'll dive deep into the heart of the matter, which is machine reasoning. Following recent advances in mathematical foundations and homotopy-type theory, we conclude that the crux is to formally separate intents from implementations. We can teach neural networks to understand these intents and to use a divide-and-conquer method for compiling these intents into implementations. Our goal is to outline a distributed strategy for accomplishing this moonshot.

25-minute Talk Shaowei Lin - Assistant Professor, Singapore University of Technology and Design
S7677 - Artificial Intelligence for Digital Pathology

We'll introduce why artificial intelligence is needed for digital pathology and how it can be used to diagnose breast and prostate cancer. By applying AI to two types of cancer diagnosis, we'll show what challenges exist in digital pathology and how we overcome them. First, we'll introduce a system for predicting the tumor proliferation in breast cancer. Predicting the tumor proliferation can be integrated into current prognostic grading systems, being more relevant to actual clinical practice. In addition, we'll present a system for predicting Gleason's score, an important factor in the diagnosis of prostate cancer. A system for accurate and consistent diagnosis based on artificial intelligence will bring much value to digital pathology.

25-minute Talk Kyunghyun Paeng - Research Scientist, Lunit Inc.
S7187 - Artificial Reality: Deep Learning With Synthetic Driving Data

Learn how to boost your deep learning training process by utilizing features of a driving simulation. Besides a customizable source of video camera input, enhanced driving simulations can also provide information from non-visual sensors like lidar, radar, or ultrasound simultaneously. Train deep learning algorithms with visual, non-visual, or intermediate data like point clouds, bounding boxes, or object lists. Instead of labeling real videos by hand, use the information from the simulation to feed back and correct the results of your neural network. Run your simulation faster than real time for distributed headless simulations or trigger every frame of the simulation to capture data for further processing. Embed your algorithms within the simulation (software in the loop) and test your AI in unusual situations that are too risky in reality. Artificial reality? Not perfect, but a perfect complement in developing AI algorithms for autonomous driving.

25-minute Talk Bernhard Bieder - Software Engineer, VIRES GmbH
Daniel Wiesenhutter - Software Engineer, VIRES GmbH
S7626 - A Simple Guideline for Code Optimizations on Modern Architectures with OpenACC and CUDA

Learn a simple strategy guideline to optimize application runtime. The strategy is based on four steps and illustrated on a two-dimensional Discontinuous Galerkin solver for computational fluid dynamics on structured meshes. Starting from a sequential CPU code, we guide the audience through the steps that allowed us to speed up the code on a GPU by a factor of around 149 relative to the original runtime (measured on a K20Xm). The same optimization strategy applied to the CPU code yields a speedup of around 35 times over the original runtime (measured on an E5-1650v3 processor). Finally, different hardware architectures (Xeon CPUs, GPUs, KNL) are benchmarked with the native CUDA implementation and one based on OpenACC.

25-minute Talk Ludomir Oteski - Postdoctoral researcher, ONERA
S7339 - A Sleepless Eye on Patient Monitors: Real-Time AI in Healthcare

Critical medical decisions are made each second, and are often informed by the real-time interpretation of complex or subtle patterns in continuous patient monitoring data. Manual review is intermittent and imperfect, but traditional automation attempts have been unreliable and often suffer from high false positive rates, limiting their practical utility in clinical settings. Recent advances in deep learning algorithms and GPU acceleration enable the creation of streaming systems that reliably, continuously, and tirelessly pick out patterns and trends to support timely and appropriate clinical decisions for the benefit of the patient. We'll describe the purpose, design, and impact of one such system, as created by Delta Brain Inc.

25-minute Talk Kevin Lung - Co-Founder & Director of Engineering, Delta Brain Inc.
Adam Lichtl - Founder & CEO, Delta Brain Inc.
S7441 - Assembly Chain Training with Professional VR by Optis

Optis has been involved in advanced optical simulation for the past 25 years and has recently invested in VR for virtual prototyping. Its latest HIM built for human ergonomics evaluation in combination with advanced, real-time, physics-based rendering enables precise environment reproduction for appropriate prototyping or training. We'll present the latest integration for assembly line training with HTC Vive and feedback powered by NVIDIA PhysX. Companies such as Tesla Motors and Bentley are the proud early adopters of this solution. We'll demonstrate our software and show customer use cases and their data to explain how to improve the VR experience with haptics and audio simulation in the future.

25-minute Talk Nicolas Dalmasso - Innovation Director, Optis
S7705 - Asynchronous Operations and Dynamic Parallelism in CUDA - Presented by Acceleware (Session 3 of 4)

This tutorial builds on the two previous sessions ("An Introduction to GPU Programming" and "An Introduction to GPU Memory Model") and is intended for those with a basic understanding of CUDA programming. This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We'll demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. In the second part of the session, we'll focus on dynamic parallelism. We'll deliver a programming demo involving asynchronous operations. We'll also provide printed copies of the material to all attendees for each session - collect all four!

80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7426 - Automated Truck Driving and Platooning with DRIVE PX 2

We'll present achievements in the field of automated truck driving, specifically the use case of lane keeping in platooning scenarios based on mirror cameras. Lane detection, generating control parameters, controller, and arbitration functions all run on the NVIDIA DRIVE PX 2 with three cameras attached to it. 

25-minute Talk Devid Will - Manager Automated Driving Functions, fka Forschungsgesellschaft Kraftfahrwesen mbH Aachen
S7267 - Automatic Compiler-Based Optimization of Graph Analytics for the GPU

Learn how to use IrGL, our newly developed language and compiler, to obtain high-speed graph algorithm implementations without writing a lot of low-level NVIDIA CUDA. IrGL can be used for parallel graph algorithm research, graph analytics, and graph database query processing. IrGL performance for graph algorithms meets or exceeds the performance of low-level handwritten CUDA code because our optimizing compiler automatically tackles three key challenges encountered in writing graph algorithms -- atomics, load imbalance due to serialization of loops, and kernel launch throughput -- freeing up the programmer to focus on higher-level optimizations. We'll introduce the IrGL language and its compiler, and show how you can use IrGL to target problems with irregular data-parallelism.

50-minute Talk Sreepathi Pai - Postdoctoral Fellow, The University of Texas at Austin
S7648 - Automating High-Content Screening Image Analysis with Deep Learning

Deep learning can automate the analysis of the hundreds of thousands of images produced by automated microscopy systems each day. High-content screening (HCS) systems that combine high-throughput biotechnology with automated microscopy are revolutionizing drug development and cell biology research. The images produced by these systems provide valuable insight into how cells respond to many chemical or genetic perturbations. Existing image analysis pipelines rely on hand-tuning the segmentation, feature extraction, and machine learning steps for each screen. For many research groups, tuning these pipelines remains a bottleneck in implementing HCS. We'll demonstrate how deep learning-based pipelines overcome this bottleneck and outperform existing methods. We'll show improved results on classifying sub-cellular protein localization in genome-wide screens of the GFP-tagged yeast collection.

25-minute Talk Oren Kraus - PhD Student, University of Toronto
S7215 - Automating VR and Photoreal Imagery From Siemens Teamcenter

Learn how manufacturers are automating and in-housing their digital photorealistic and VR/AR visualization pipelines out of Siemens Teamcenter and NX through JT. This is leading to improved efficiency and cost reduction and, crucially, enabling manufacturer control over digital assets that allows them to be repurposed across the business. We'll demonstrate how to set up an automated visual digital pipeline out of Siemens Teamcenter into NVIDIA Iray and Epic Unreal Engine, accounting for configuration rules and buildability.

25-minute Talk Dave Coldron - Product Director, Lightwork Design Ltd.
S7787 - Autonomous Driving on Benchmark

We'll discuss AI developments within the last decade with the help of public academic benchmarks. Xiaodi believes benchmarks like CityScapes and KITTI are helpful for the development of AI worldwide; however, these benchmarks have disadvantages, and there is a need to propose new datasets that incorporate more autonomous driving sections in computer vision benchmarks.

25-minute Talk Xiaodi Hou - Chief Technology Officer, TuSimple
S7172 - Autonomous Drone Navigation with Deep Learning

We'll present an autonomous drone piloted by a deep neural network (DNN) that can navigate through a forest by following trails and avoiding obstacles. The DNN gets video frames from the onboard drone camera as its input and computes high-level control commands as its output. The control commands are sent to the drone's low-level autopilot for execution. Our DNN runs onboard an NVIDIA Tegra TX1 in real time. The drone uses the open source PX4 flight stack for low-level control and ROS for its runtime. We'll present the DNN's architecture and describe how we train it and run it as a ROS node. We'll also demo the flight videos and show some qualitative analysis of the autonomous flights.

50-minute Talk Nikolai Smolyanskiy - Principal Software Engineer, NVIDIA
Alexey Kamenev - Senior Deep Learning and Computer Vision Engineer, NVIDIA
Jeffrey Smith - Senior Computer Vision Software Engineer, NVIDIA
S7263 - Bayesian Inference and Markov Chain Monte Carlo Algorithms on GPUs

We'll discuss the Bayesian statistical paradigm and Markov Chain Monte Carlo (MCMC) algorithms - the cornerstone of modern Bayesian computation. Scalable MCMC for big datasets and complex models is currently an open research question. Using GPUs provides a promising and largely unexplored avenue for accelerating these algorithms, but is nontrivial, because MCMC is inherently sequential and has traditionally been considered difficult to parallelize. We'll show how Gibbs sampling, a widely used MCMC algorithm, can be effectively parallelized on GPUs for a large class of exchangeable hierarchical Bayesian models. Participants will learn the mathematical and hardware/software challenges in bringing GPUs to the Bayesian community. Background in Bayesian statistics or MCMC is not assumed.

25-minute Talk Alexander Terenin - PhD Student, UC Santa Cruz
David Draper - Professor, UC Santa Cruz
S7325 - Behavioral Additive Manufacturing: Adaptive 3D Printing Using Multi-Agent Systems and Deep Learning

We'll introduce autonomously constructed architecture using multi-agent systems (MAS) and deep learning. The 3D printing path adapts in real time to unpredictable material behavior, using an NVIDIA Jetson card on an industrial robotic arm. We'll explain path generation, real-time visual tracking of material, and recomputing of robotic targets, and finally present experiments with real-time MAS adaptation for emergent stable structures through code and video recordings of the 3D printing process and its printed structures.

25-minute Talk Alisa Andrasek - Director, University College London, Wonderlab/Biothing
S7362 - Benchmarking the New Unified Memory of CUDA 8

We'll evaluate the impact of CUDA 8's new unified memory on applications with benchmarks and share practices on how to tune or build high-performance apps. Since CUDA 6, unified memory has aimed at simplifying the programmability of heterogeneous memory management while maintaining good performance. However, practical limitations prevent applications from fully taking advantage of it. The CUDA 8 release highlights an updated unified memory that both simplifies programmability and improves performance, especially when married with the new Pascal GPU architecture. We'll evaluate the new system, benchmark its performance, and share our best practices in tuning code, which could be a good reference for app developers. In addition, we'll explore options and solutions on moving/exchanging data efficiently between heterogeneous devices, such as NVMe/NVRAM in modern data center or cloud environments.

25-minute Talk Frank Zhao - Software Architect, Dell EMC
Yifan Sun - College Coop Student, Dell EMC
L7106 - Best GPU Code Practices Combining OpenACC, CUDA, and OmpSs

We'll guide you step by step to port and optimize an oil-and-gas mini application to efficiently leverage the amazing computing power of NVIDIA GPUs. While OpenACC focuses on coding productivity and portability, CUDA enables extracting the maximum performance from NVIDIA GPUs. OmpSs, on the other hand, is a GPU-aware task-based programming model that may be combined with CUDA, and recently with OpenACC as well. Using OpenACC, we'll start benefiting from GPU computing, obtaining great coding productivity and a nice performance improvement. We can next fine-tune the critical application parts by developing CUDA kernels to hand-optimize the problem. OmpSs combined with either OpenACC or CUDA will enable seamless task parallelism leveraging all system devices. Prerequisites: Basic knowledge of OpenACC and CUDA. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.

120-minute Instructor-Led Lab Antonio J. Pena - Senior Researcher, Barcelona Supercomputing Center (BSC)
Guray Ozen - Research Assistant, Barcelona Supercomputing Center
Pau Farre - Software Engineer, Barcelona Supercomputing Center (BSC)
S7786 - Beyond Games: How Unreal Engine is Putting the Reality into Virtual Reality

Epic Games presents a panel discussion with partners who are using Unreal Engine to bring real-time, high-fidelity interactive experiences to their customers. From product design and visualization, to virtual production, photorealism, and final pixels, VR content creators are uncovering the power of Unreal Engine. Hear from company executives, technology partners, and customers about applying game engine technology to revolutionize the conventions of filmmaking, product design, and the future of customer engagement.

50-minute Panel Emre Deniz - Director, Opaque Space
Marc Petit - General Manager, Epic Games (Unreal Engine)
Mark Roberts - Design Operations Manager, McLaren Automotive Limited
Stephen Phillips - Co-Founder / CTO, Theia Interactive
Matthew Noyes - Aerospace Technologist/Hybrid Reality Lab Software Lead, National Aeronautics and Space Administration
S7817 - Beyond Visualization, Harnessing the Power of Compute for Design

Autodesk Project Dreamcatcher takes the next step in the world of computation, artificial intelligence, and machine learning by harnessing the power of computing to deliver on the promise of Computer Aided Design. Today's GPUs allow for massive exploration of the design space for any problem, empowering designers and engineers to truly allow computation capacity to aid them in design and problem solving. Come learn how Autodesk is harnessing the power of computation in the cloud, powered by tomorrow's next generation hardware, to help everyone make better decisions.

25-minute Talk Brian Frank - Sr. Product Line Manager | Simulation, Autodesk
S7170 - Bicycle Green Waves Powered by Deep Learning

We'll explore using deep learning to improve urban traffic signaling. Bicycles (both self-powered and pedelecs) are the future of urban transport alongside (self-driving) electric cars, buses, and rail services. Green waves make cycling more efficient, attractive, and safer. Instead of fixed "green wave" timings or priorities, we'll present a work-in-progress system that learns to increase the flow of bicycle traffic while minimizing the impact on other traffic actors -- and in many use cases also results in improvements in general traffic times. Using low-power, efficient SoCs -- Tegra X1 -- the "smarts" are integrated in traffic lights and provide V2I interfaces -- also to mobile phones of cyclists -- about signal changes and warn of pedestrians or cyclists. Dispensing with inductive loop, magnetometer, or radar-based sensors buried in the pavement makes the system inexpensive. We'll present initial results from pilot testing in a German city.

25-minute Talk Edward Zimmermann - Principal Consultant, Nonmonotonic Networks / joint R&D with GESIG. Gesellschaft fur Signalanlagen
S7178 - Bidirectional Recurrent Convolutional Networks and Their Applications to Video Super-Resolution

We'll discuss a fully convolutional version of recurrent neural networks, namely bidirectional recurrent convolutional networks, which can greatly reduce the number of learning parameters from millions to several hundred. We'll demonstrate its effectiveness by achieving significant performance and running time improvements for the task of video super-resolution. Using GPUs can further speed up processing by 20 times.

25-minute Talk Qi Zhang - Assistant Professor, Chinese Academy of Sciences, Institute of Automation
S7405 - Bifrost: A Python/C++ Framework for Easy High-Throughput Computing

Bogged down trying to build a fast GPU processing pipeline? We'll present a solution: Bifrost, a framework for rapidly composing real-time data collection and analysis pipelines. Real-time data processing lies at the heart of most modern radio telescopes, and while hardware capabilities and data collection rates advance to the petascale regime, development of efficient real-time processing codes remains difficult and time-consuming. Bifrost solves this problem by combining a TensorFlow-like Python API with a library of common algorithms and highly efficient data transport. We'll describe the design and implementation of this framework, and demonstrate its use as the backend for a large radio telescope.

25-minute Talk Miles Cranmer - Research Assistant, Harvard-Smithsonian Center for Astrophysics
S7475 - Big Data, Little Cluster: Using a Small Footprint of GPU Servers to Interactively Query and Visualize Massive Datasets

We'll discuss the approach to and advantages of using GPUs to not only power through large-scale database queries but also use the graphics pipeline of the GPU to rapidly and efficiently visualize the outputs of billions of rows of data. The application of the GPU for both query and render results in a fast system for multi-terabyte scale analytic challenges. We'll cover the high-level benefits of the approach and delve into the technical details associated with GPU-powered databases, server side rendering, and other software refinements needed to squeeze the maximum amount of performance from this exceptional hardware platform.

50-minute Talk Todd Mostak - Founder and CEO, MapD
S7481 - Big Image-Omics Data Analytics for Clinical Outcome Prediction

We'll introduce how to develop big image-omics data analytics algorithms with GPU computing tools for clinical outcome prediction from pathological images and cell profiling data of cancer patients. Recent technological innovations are enabling scientists to capture image-omics data at increasing speed and resolution, where the image-omics refers to both image data (pathology images or radiology images) and omics data (genomics, proteomics, or metabolomics) captured from the same patient. This is generating a deluge of heterogeneous data from different views. Thus, a compelling need exists to develop novel data analytics tools to foster and fuel the next generation of scientific discovery in image-omics data-related research. However, the major computational challenges are due to the unprecedented scale and complexity of heterogeneous image-omics data analytics. There is a critical need for large-scale modeling and mining strategies to bridge the gap and facilitate knowledge discovery from complex image-omics data. We'll introduce our recent work on developing novel deep learning methods to detect cells in the terapixel histopathological images with 10,000+ speedup and automatically discovering biomarkers for clinical outcome prediction.

25-minute Talk Junzhou Huang - Associate Professor, University of Texas at Arlington
S7298 - Blasting Sand with NVIDIA CUDA: MPM Sand Simulation for VFX

We'll present our challenges and solutions for creating a material point method (MPM)-based simulation system that meets the production demands of fast turnaround for artistic look development. Our method fully utilizes the GPU and performs an order of magnitude faster than the latest published results. With this improvement, the technique's main limiting factor - its speed - has been eliminated, making MPM appealing for a wider range of VFX applications. Practitioners in computational physics and related fields are likely to benefit from attending the session as our techniques are applicable to other hybrid Eulerian-Lagrangian simulations.

25-minute Talk Ken Museth - Director of R&D, DreamWorks Animation
Gergely Klar - Software Engineer, DreamWorks Animation
S7652 - Blending the Worlds of Machine Learning and Deep Learning to Make the Fastest AI Platform on GPUs

Deep learning algorithms have benefited greatly from the recent performance gains of GPUs. However, it has been unclear whether GPUs can speed up data manipulations such as joins and aggregations and machine learning algorithms such as generalized linear modeling, random forests, gradient boosting machines, and clustering. H2O, the leading open source AI company, is bringing the best-of-breed data science and machine learning algorithms to GPUs, not just deep learning. In addition, H2O is porting data.table to GPUs, already the fastest open-source columnar data frame library and the world's fastest implementation of the sort algorithm. This powerful combination will enable the fastest data science and machine learning pipelines for AI transformations for applications such as IoT time series, fraud prevention, anomaly detection, and many more. We'll demonstrate benchmarks for the most common algorithms relevant to enterprise AI and showcase performance gains as compared to running on CPUs.

25-minute Talk Arno Candel - CTO, H2O
SriSatish Ambati - CEO and Co-Founder, H2O
S7450 - Boosting Performance and Earnings of Cloud Computing Deployments with rCUDA

We'll present how cloud computing facilities using GPUs can boost overall performance while generating increased economic benefits. To achieve these important improvements, we'll propose to move from the traditional model for using GPUs within virtual machines to a new model that leverages the remote GPU virtualization mechanism. This mechanism allows GPUs to be detached, in a logical way, from the nodes where they are installed so that GPUs now can be transparently used from any node of the cluster. Furthermore, the remote GPU virtualization mechanism allows GPUs to be concurrently shared among many different applications. We'll use the rCUDA middleware as a case study for demonstrating how GPUs can be concurrently shared among virtual machines in a cloud computing deployment. We'll show performance results to quantify the improvements attained by using rCUDA in cloud deployments. 

25-minute Talk Federico Silla - Associate Professor, Technical University of Valencia
S7436 - Boosting Visual Object Tracking Using Deep Features and GPU Implementations

We'll explain how to use Deep Features for enabling state-of-the-art results in visual object tracking. Visual object tracking is a difficult task in three respects, since (1) it needs to be performed in real-time, (2) the only available information about the object is an image region in the first frame, and (3) the internal object model needs to be updated in each frame. The use of Deep Features gives significant improvements regarding accuracy and robustness of the object tracker, but straightforward frame-wise updates of the object model become prohibitively slow for real-time performance. By introducing a compact representation of Deep Features, a smart updating mechanism, and systematically exploiting GPU implementations for feature extraction and optimization, real-time performance is achievable without jeopardizing tracking quality.

25-minute Talk Michael Felsberg - Professor, Linkoping University
S7819 - Bringing Gaming, VR, and AR to Life with Deep Learning

Game development is a complex and labor-intensive effort. Game environments, storylines, and character behaviors are carefully crafted requiring graphics artists, storytellers, and software to work in unison. Often games end up with a delicate mix of hard-wired behavior in the form of traditional code and somewhat more responsive behavior in the form of large collections of rules. Over the last few years, data-intensive machine learning solutions have obliterated rule-based systems in the enterprise -- think Amazon, Netflix, and Uber. At Unity, we've explored the use of deep learning in content creation and deep reinforcement learning in character development. We'll share our learnings and the Unity APIs we use with the audience.

25-minute Talk Danny Lange - Vice President, Unity Technologies
S7250 - Bringing Low-Latency and Fault-Tolerant Computing to Tegra SoCs with Persistent Threading

The NVIDIA Tegra K1 and X1 have revolutionized embedded computing. Combining ARM cores and a powerful GPU, these devices have found their way into everything from cars to low-power sensor systems. The high computational efficiency of Tegra SoCs enables potential new markets that have long been held by FPGAs. However, some apps do not map well into the typical CUDA execution model. Persistent threading (PT) is a relatively unexplored model for GPU computing, enabling FPGA-like behavior. Like an FPGA, PT executes until the device is reset or a rare halt condition is met. Memory management and application synchronization are shifted from the NVIDIA API to the developer as the PT kernel runs in parallel with the host application. Leveraging the Tegra unified memory model, PT is able to reduce API overhead to only the launch of the kernel and scheduler workload.

25-minute Talk Andrew Milluzzi - Doctoral Candidate, University of Florida
S7324 - Bringing NVIDIA GPUs to the PGAS/OpenSHMEM World: Challenges and Solutions

Learn about techniques and solutions that bring GPU computing to the world of partitioned global address space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. We'll discuss simple extensions to the OpenSHMEM model to address this issue. We'll also present challenges and solutions in designing NVIDIA CUDA-aware runtimes to support these extensions and optimize data movement using CUDA IPC and GPUDirect RDMA features. And we'll demonstrate the impact of these concepts on application performance.

25-minute Talk Dhabaleswar K. (DK) Panda - Professor and University Distinguished Scholar, The Ohio State University
S7223 - Bring the Power of CUDA to Small Devices

Learn how to bring the power of GPUs and CUDA to small machines and IoT edge devices. Experience the development process from proof of concept to a production-ready device. NVIDIA TK1 and Jetson TX1 SoCs allow for the first time the use of high-performance GPGPUs on small, power-constrained devices. For many, the complexity and cost of getting from a maker board like the Jetson TK1 to a customer-ready hardware design prevent progress. We'll explain how computer modules like the Jetson X1 module can be used to simplify the process and get you to market faster and cheaper. We'll go step by step through a typical development process. You'll learn what skills and resources you require to create an industrial-grade device. We'll evaluate how this approach compares to other solutions like single board computers and designs from scratch. If you know the power of GPUs, but don't know how to bring it to machines or IoT devices, this talk is for you!

50-minute Talk Daniel Lang - CTO, Toradex Inc.
S7634 - Build a Neural Translation System from Scratch with PyTorch

As recently covered by the New York Times, Google has totally revamped its Translate tool using deep learning. We'll learn about what's behind this system and similar state-of-the-art systems, including some more recent advances that haven't yet found their way into Google's tool. We'll start by looking at the original encoder-decoder model that neural machine translation is based on, and will discuss the various potential applications of this kind of sequence-to-sequence algorithm. We'll then look at attentional models, including applications in computer vision (where they are useful for large and complex images). In addition, we'll investigate stacking layers, both in the form of bidirectional layers and deep RNN architectures. We'll focus on the practical details of training real-world translation systems, and show how to take advantage of PyTorch's dynamic nature to heavily customize an RNN as required for modern translation approaches.

50-minute Tutorial Jeremy Howard - Entrepreneur,
S7366 - Building a GPU-enabled OpenStack Cloud for HPC

M3 is the latest generation system of the MASSIVE project, an HPC facility specializing in characterization science (imaging and visualization). Using OpenStack as the compute provisioning layer, M3 is a hybrid HPC/cloud system, custom-integrated by Monash's R@CMon Research Cloud team. Built to support Monash University's high-throughput instrument processing requirements, M3 is half-half GPU-accelerated and CPU-only. We'll discuss the design and tech used to build this innovative platform as well as detailing approaches and challenges to building GPU-enabled and HPC clouds.

25-minute Talk Blair Bethwaite - Lead Cloud Architect, Monash University
S7704 - Building an L4 Autonomous Driving R&D Platform

We'll give a step-by-step description of how to use NVIDIA DRIVE PX 2 and the NVIDIA DriveWorks SDK to enable Level 4 autonomous research vehicles. We'll consider choice of sensors (camera, lidar, radar) and mounting locations for highway and urban autonomous driving. We'll also discuss optimal use of DriveWorks for sensor data gathering and processing using NVIDIA's AI solutions. The presentation will include video demonstrations of real-life examples showcasing the utilization of DRIVE PX 2 and DriveWorks as an end-to-end deep learning platform for automated driving.

25-minute Talk Wolfgang Juchmann - VP Sales and Business Development, AutonomouStuff
S7350 - Building a Successful Deep Learning Platform: Experiences in Building GPU-Enabled HPC Clusters

Conducting deep learning research and development requires a combination of cutting-edge hardware, elastic software frameworks, and a collaborative research community. We'll provide the scaffolding for participants to construct an enterprise-scale, GPU-enabled high performance computing solution for machine learning and data science by drawing on the experiences gained while IBM Research built its Cognitive Computing Cluster. We'll start by discussing how to build a secure, shared-resource computing cluster optimized for deep learning. Next, we'll cover how to provide deep learning frameworks supporting speech, vision, language, and text processing and their underlying primitives. Finally, we'll discuss how to build a best practice knowledge base to improve research quality and accelerate discovery.

25-minute Talk Brian Belgodere - Research Software Engineer, IBM Research
S7670 - Building Emotionally Aware Cars

Advanced and autonomous AI systems surround us daily, but as smart as these are, they lack the ability to sense and adapt to human emotions. At Affectiva, our mission is to humanize technology by bringing artificial emotional intelligence (Emotion AI) to the digital world. Using computer vision and deep learning, Affectiva measures facial expressions of emotions. We'll explore the applications of Emotion AI in automotive. We'll show how a driver's emotions can be measured in human-driven cars and (semi-) autonomous vehicles to improve road safety and deliver a more personalized transportation experience. In addition, we'll share our findings from over 28 hours of collected in-car data, such as the most frequently observed emotions.

25-minute Talk Abdelrahman Mahmoud - Product Manager, Affectiva
S7780 - Building Exascale Deep Text Comprehension Tools for Effective Cancer Surveillance

We'll share our experience in developing novel text comprehension tools for enabling population-level cancer surveillance and research at scale to support the National Cancer Institute's Surveillance, Epidemiology, and End Results program.

25-minute Talk Arvind Ramanathan - Staff Scientist, Oak Ridge National Laboratory
S7858 - Building low-latency, production-grade Deep Learning platforms: the unsexy journey towards real-life results (Presented by Twitter)

You'll learn how Cortex built some components of its Deep Learning platform to fit into the highly optimized Twitter Timelines ranking pipeline. After navigating through a range of specific machine learning/deep learning workflows, found in different parts of the industry, we will focus on one that does not make the headlines of Recode, but is at the core of the business: the data-first deep learning pipeline. We'll see how we had to build new, custom code on CPU and GPU to even think of pushing this to production, and how what matters even more than the actual algorithm is how it can be deployed, tested and debugged. We'll discuss the opportunity that such an optimized production workflow generates for a company the scale of Twitter, and the challenges ahead.

25-minute Talk Nicolas Koumchatzky - Lead for Twitter Cortex, Twitter
S7815 - Building Scale-out Deep Learning Infrastructure: Lessons Learned from Facebook A.I. Research

Facebook AI Research (FAIR) in partnership with NVIDIA has designed a scale-out infrastructure built on NVIDIA DGX-1. This initiative began with an extensive evaluation of design approaches for multi-system scale, as well as considerations for networking and storage supporting one of the world's largest DGX-1 clusters. Attend this session to gain valuable insights into how one of the world's leading AI innovators is building a scale-out infrastructure for deep learning, learn architectural best practices, and participate in Q&A with featured panelists from FAIR and NVIDIA.

25-minute Talk Soumith Chintala - AI Research Engineer, Facebook
Howard Mansell - Engineering Manager, Facebook
S7595 - Building Truly Large-Scale Medical Image Databases: Deep Label Discovery and Open-Ended Recognition

The recent rapid and tremendous success of deep neural networks on many challenging computer vision tasks derives from the accessibility of the well-annotated ImageNet and PASCAL VOC datasets. Nevertheless, unsupervised image categorization (that is, without ground-truth labeling) is much less investigated, critically important, and difficult when annotations are extremely hard to obtain in the conventional way of "Google Search" + crowd sourcing (exactly how ImageNet was constructed). We'll present recent work on building two truly large-scale radiology image databases at NIH to boost the development in this important domain. The first one is a chest X-ray database of 110,000+ images from 30,000+ patients, where the image labels were obtained by sophisticated natural language processing-based text mining and the image recognition benchmarks were conducted using weakly supervised deep learning. The other database contains about 216,000 CT/MRI images with key medical findings from 61,845 unique patients, where a new looped deep pseudo-task optimization framework is proposed for joint mining of deep CNN features and image labels. Both medical image databases will be released to the public.

50-minute Talk Le Lu - Staff Scientist, National Institutes of Health
S7792 - Building Exascale Deep Learning Tools to Help Understand Cancer Biology at the Molecular Scale

Understanding the biology of cancer at the molecular scale is a critical challenge for the RAS oncogene family of cancers. We are developing an adaptive molecular dynamics simulation framework that uses multi-scale models to achieve simulation time scales that allow biologically interesting behaviors to emerge. We'll develop new deep learning techniques that can help identify phase transitions, the formation of complex structures, and the detection of interesting events between the RAS protein and cell membrane. This molecular dynamics simulation data will drive the need for new techniques in both model and data parallelism within deep learning toolkits, and require the capabilities of next-generation supercomputers such as SIERRA and Summit at LLNL and ORNL, respectively.

25-minute Talk Brian Van Essen - Computer Scientist, Lawrence Livermore National Laboratory
S7438 - Build Systems: Combining CUDA and Modern CMake

Learn all about CMake's new CUDA support and how best to combine it with "modern" CMake usage requirements. CMake is an open-source, cross-platform meta build generator. This year CMake was updated to fully support CUDA as a first-class language on all major platforms. This enables projects to fully leverage "modern" target-based features inside projects that require CUDA compilation. We'll iteratively develop the CMake logic for a sample project using modern CMake with a focus on CUDA. We'll cover transitive usage requirements, how to request language standard levels, mix language libraries, CUDA separable compilation, and generating export configuration files. We expect people to already have some familiarity with the CMake language.

25-minute Talk Robert Maynard - Staff R&D Engineer, Kitware, Inc.
S7636 - Cache Directive Optimization in OpenACC Programming Model

OpenACC is a directive-based programming model that provides a simple interface to exploit GPU computing. As the GPU employs a deep memory hierarchy, appropriate management of memory resources becomes crucial to ensure performance. The OpenACC programming model offers the cache directive to use on-chip hardware (read-only data cache) or software-managed (shared memory) caches to improve memory access efficiency. We have implemented several strategies to promote shared memory utilization in our PGI compiler suite. We'll briefly discuss our investigation of cases that can potentially be optimized by the cache directive and then dive into the underlying implementation. Our compiler is evaluated with self-written micro-benchmarks as well as some real-world applications.

25-minute Talk Xiaonan Tian - GPU Compiler Engineer, NVIDIA
S7540 - CAE Productivity and GPU Technology

We'll present performance results for the NVIDIA Tesla P100. Simulation is the key to greater productivity in many areas of product development and GPU technology plays a crucial role in achieving that goal. We'll use the simulation of a full 3D particle compaction process to compare run times with the NVIDIA Tesla K40. The results are generated from a commercially available nonlinear explicit transient dynamic finite element solver that takes full advantage of GPU technology for parallelization. The commercial software used to create the finite element mesh includes newly developed meshing techniques that make it easy to create the model. We'll also discuss details of the commercially available hardware used to perform the simulation, which has been certified for the P100.

25-minute Talk Wayne Mindle - Director of Sales & Marketing, CertaSIM, LLC
S7601 - Caffe2: A New Lightweight, Modular, and Scalable Deep Learning Framework

Caffe2 is a new lightweight, modular, and scalable deep learning framework refactored from the previous Caffe. Caffe2 is widely used at Facebook for production to enable new AI experiences. We'll explain the strengths of Caffe2 and the many improvements we made over the original Caffe.

25-minute Talk Yangqing Jia - Research Scientist, Facebook
S7731 - Can an Artificial Intelligence Win a Nobel Prize?

We're investigating if deep learning can help scientists exploring fundamental physics with ultra-cold atoms. The Nobel prize was awarded to scientists who first discovered how to cool atoms to near absolute zero to create a special phase of matter called a Bose-Einstein Condensate (BEC). In a BEC all atoms are in the same quantum state, meaning they move together as if they are one super atom. We can use BECs to make ultra-precise measurements of gravity, potentially allowing us to make gravitational images to see hidden features in the world around us. BECs are made using a process of evaporative cooling, where the boundaries that trap the atoms are changed over time to let the hotter atoms escape. This approach has hit a limit, and BECs have remained around the same size for the last 10 years. We are handing over control of our ultra-cold atom experiment to a deep-learning algorithm, and investigating if it can find entirely new ways to make BECs. In particular we let the deep learning algorithm take control of not only the boundaries of the atoms but the interactions between the atoms as well.

25-minute Talk Michael Hush - Lecturer, University of New South Wales
S7788 - CANDLE: Predicting Tumor Cell Response to Drug Treatments

We'll focus on one of the three pilots of the DOE and NCI partnership on precision oncology and the Cancer Moonshot, namely predicting tumor cell response to drug treatments with deep learning. Predicting tumor cell response to drug treatments is a critical challenge for accomplishing the promise of precision medicine in oncology. As part of a joint project between DOE and NCI to develop advanced computing solutions for cancer, we are developing a deep learning-based framework for modeling tumor-drug interaction and predicting dose response in pre-clinical screening.

25-minute Talk Fangfang Xia - Computer Scientist, Argonne National Laboratory
S7698 - CanvoX: High-Resolution VR Painting for Large Volumetric Canvas

As Tilt Brush and Quill are not voxel based, a new VR-based voxel painting system with a large (40km^3) and detailed (0.3mm^3) canvas would be interesting. We develop an array of octrees of depth 24, using 5 indices per cell (parent, child, and 3 neighbors) to accelerate ray traversal. We adaptively refine or coarsen the octree on the CPU and sync it with the GPU, and then ray cast front to back. To accelerate rendering, we develop a foveated rendering algorithm. We design a quadtree render target whose resolution is dynamically adjusted to a heat map, traverse rays, and then interpolate the color in screen space. We traverse rays through upper-level cells as the ray cone widens. We analyze floating-point error propagation to thoroughly understand precision problems in deep cells and ray intersections.

25-minute Talk Yeojin Kim - PhD Student, Ewha Womans University
S7606 - Capture and Rendering of Interactive 3D Audio for Virtual and Augmented Reality

The goal of VR and AR is to immerse the user in a created world by fooling the human perceptual system into perceiving rendered objects as real. This must be done without the brain experiencing fatigue: accurate audio representation plays a crucial role in achieving this. Unlike vision with a narrow foveated field of view, human hearing covers all directions in full 3D. When the rendered audio and vision do not agree, the user falls out of the experience. The importance of audio for VR and AR is being increasingly realized, and VisiSonics is developing a comprehensive toolset to address the needs of industry. We'll describe several products developed by VisiSonics that are based on over a decade of research. These include propagation engines that are embedded in standard authoring workflows for gaming (Unity, Unreal, Wwise, FMOD) and movie postproduction (Adobe, ProTools); capture of audio into high-order ambisonics and MPEG-H; personalization of 3D audio to the individual's head shape via customization of the head-related transfer function; and others. We'll demonstrate workflow solutions designed to enrich audio immersion for gaming, video post-production, and capture in VR/AR.

25-minute Talk Ramani Duraiswami - CEO, VisiSonics Corporation
S7202 - Capturing Real-Time 360 Stereo Video from 3D Applications

360 video is a new and exciting way to share immersive content with other people. We'll describe both the techniques required to optimize performance and the best practices to avoid various visual artifacts. We'll cover efficient cube-map rendering, stereo-conversion of the cube-map, and handling of translucent objects. We'll share some of the pitfalls of working with particles, billboards, lighting, tone mapping, screen-space effects, etc.

50-minute Talk Alexey Panteleev - Senior Developer Technology Engineer, NVIDIA
S7600 - ChainerMN: Scalable Distributed Deep Learning with Chainer

We'll present ChainerMN, a multi-node distributed deep learning framework, together with the basics of distributed deep learning. Even though GPUs are continuously gaining more computation throughput, it is still very time-consuming to train state-of-the-art deep neural network models. For better scalability and productivity, it is paramount to accelerate the training process by using multiple GPUs. To enable high-performance and flexible distributed training, we developed ChainerMN, built on top of Chainer. We'll first introduce the basic approaches to distributed deep learning. Then, we'll explain the design choice, basic usage, and implementation details of Chainer and ChainerMN. We'll report benchmark results and discuss the future directions of distributed deep learning.

25-minute Talk Takuya Akiba - Researcher, Preferred Networks, Inc.
S7280 - CLBlast: A Tuned BLAS Library for Faster Deep Learning

We'll demonstrate how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at deep learning training and inference and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the convolutional layers: the computational heart of all deep-learning frameworks (TensorFlow, Caffe, etc.). CLBlast has three main advantages over other BLAS libraries: 1) it can be explicitly tuned for specific matrix-sizes and hardware platforms, 2) it runs on less common devices (and it is fast), such as embedded and low-power GPUs, and 3) it can perform operations in half-precision FP16 format, saving precious bandwidth, time, and power.

25-minute Talk Cedric Nugteren - GPU / deep learning specialist, TomTom
S7837 - Cloud and Edge Deep Learning Platform for various real business fields (Presented by ABEJA)

"ABEJA Platform" is a PaaS (Platform as a Service) architected for the "Society5.0" and "Industry4.0" ecosystem of IoT, big data, and AI. It collects sensor data from IoT devices, combines it with existing data, trains deep learning models in the cloud on that data, runs inference on the edge and in the cloud with the trained models, and outputs the inferred data via an API. In the training phase, the platform is optimized for distributed training on GPUs. In the inference phase, it manages multiple versioned models and deploys them to a cloud-based distributed inference system or to an edge-side computer (e.g., Jetson). The cloud system scales automatically with request load. It also monitors the edge-side computers, and users can control them in the same way as the cloud system.

25-minute Talk Yousuke Okada - Founder & CEO / CTO, ABEJA, Inc.
S7654 - Cloud-Based Deep Learning as the Radiologist's Best Friend

Sad but true: most of radiology is mind-numbing tedium. Radiologists spend countless hours on tasks that are onerous and error-prone, resulting in high costs and frequent misdiagnoses. Our first product designed to address these deficiencies is Arterys Cardio DL, a web-based, zero-footprint cardiac MRI postprocessing suite. Arterys Cardio DL includes a deep learning-based contouring algorithm that vastly reduces the time required to diagnose heart disease in cardiac MRI. Arterys Cardio DL is the first technology ever to be cleared by the FDA that leverages cloud computing and deep learning in a clinical setting. We'll discuss the technology behind the software and how we proved its safety and efficacy to secure FDA clearance in the United States and the CE Mark in Europe.

25-minute Talk Daniel Golden - Director of Machine Learning, Arterys
S7657 - CloudBrain: AI SaaS Case Study in China

CloudBrain provides deep learning/AI SaaS to enterprises to automatically optimize their key performance indexes. We build a deep learning/AI platform using the latest NVIDIA GPUs and CUDA technologies, which enables us to research and implement state-of-the-art learning/inference algorithms with fast iterations. This platform also reduces hardware and operating costs and hence improves clients' return on investment. We'll present two case studies in the fintech and energy sectors.

25-minute Talk Benyu Zhang - CEO, CloudBrain
S7296 - CloudLighting: Merging GPU-based HPC with Cloud Services

Learn how you can integrate GPU-enabled HPC and cloud computing by building on recent container technologies and integration. This presentation will highlight our efforts as part of the EU Horizon 2020 project CloudLighting, where we look at how to integrate heterogeneous computing with cloud technologies.

25-minute Talk Anne C Elster - Professor of High Performance Computing, Norwegian University of Science & Technology / Univ. of Texas at Austin
S7489 - Clustering GPUs with Ethernet

As GPUs get more widely deployed for machine learning, training is being done over larger datasets than ever before, resulting in longer training times. Reducing training time from days to hours or less requires clustering a large number of GPUs. As more users are starting to see the benefits of machine learning to their businesses, there is also a need to provide on-demand access to the users of these data center-based clusters. The ideal technology for such large-scale clustering in the data center is Ethernet. We'll discuss the work Broadcom is doing with NVIDIA to enable GPUDirect using its RoCE v2 line of Ethernet NICs.

25-minute Talk Fazil Osman - Distinguished Engineer, Broadcom Limited
S7850 - Cognitive Augmented Design with AI powered 3DEXPERIENCE

As we are entering the experience economy, all innovative companies are rethinking their entire development pipelines and processes, fueled with new disruptions for generative multi-physics, machine and deep learning based design and next generation engineering automation. We'll present how the 3D experience platform can provide intelligent, deep learning-based and automated accelerated design approaches leveraging enterprise patrimony and knowledge, data and models to catalyze next generation 3D engineering practices and product design experiences. 

50-minute Talk Romain Perron - CATIA R&D Web Apps and Services Director, 3DS
S7471 - Combining NVIDIA Docker and Databases to Enhance Agile Development and Optimize Resource Allocation

Learn how to use NVIDIA Docker combined with database analysis to improve your agile development process, generalize hardware requirements, speed up deployment, and identify optimal configurations. Discover how to leverage the resource isolation of Docker containers to test different GPU-architecture performances and resource allocation to optimize system use and maximize processing throughput. Learn how to test this resource isolation using agile methods including development of a processing chain from multi-threaded CPU, to single GPU, and finally to multi-GPU architecture. Hear our observations about compilation timing, execution performance, resource allocation, and generation of CUDA binaries within containers while showcasing an automated image registration pipeline.

50-minute Talk Sophie Voisin - Research & Development Associate, Oak Ridge National Laboratory
Chris Davis - Geospatial Software Engineer, Oak Ridge National Laboratory
S7423 - Community Detection on the GPU

Community detection is a key kernel in the analysis of complex networks for a variety of fields. We'll present our implementation of a new GPU algorithm for community detection based on the Louvain Method. Our approach parallelizes the access to individual edges, enabling load balancing of networks with nodes of highly varying degrees. We're able to obtain speedups up to a factor of 270 compared to the sequential algorithm. The algorithm consistently outperforms other recent shared memory implementations and is only one order of magnitude slower than the current fastest parallel Louvain method running on a Blue Gene/Q supercomputer using more than 500K threads.

25-minute Talk Antonino Tumeo - Research Scientist, Pacific Northwest National Laboratory
Mahantesh Halappanavar - Research Scientist, Pacific Northwest National Laboratory
S7472 - Comparative Study of CNN Models for Detection of Clouds in Overhead Imagery

Learn how to improve pixel-wise image quality and geolocation accuracy by leveraging high-end hybrid computing resources. This particular test case involves the use of deep learning in the detection and masking of cloud objects, and imagery content that reduces image quality and usability, from overhead imagery. Timely results are attained through expediting selection and deployment of a deep learning model for overhead imagery for the cloud detection problem. An optimum deep learning model is selected through evaluation of a set of convolutional neural networks for their ability to detect cloud objects. Evaluation of each network is performed using a number of open-source neural network packages to give comparative performance results. In addition, two complementary image segmentation techniques are implemented in parallel, one operating on CPUs and the other on GPUs, to rapidly obtain candidate regions for cloud objects at a fine resolution.

25-minute Talk Byung Hoon Park - R&D Staff Scientist, Oak Ridge National Laboratory
S7635 - Comparison of OpenACC and OpenMP4.5 Offloading: Speeding Up Simulations of Stellar Explosions

Learn about a case study comparing OpenACC and OpenMP4.5 in the context of stellar explosions. Modeling supernovae requires multi-physics simulation codes to capture hydrodynamics, nuclear burning, gravitational forces, etc. As a nuclear detonation burns through the stellar material, it also increases the temperature. An equation of state (EOS) is then required to determine, say, the new pressure associated with this temperature increase. In fact, an EOS is needed after the thermodynamic conditions are changed by any physics routine. This means it is called many times throughout a simulation, so a fast EOS implementation is needed. Fortunately, these calculations can be performed independently during each time step, so the work can be offloaded to GPUs. Using the IBM/NVIDIA early test system (precursor to the upcoming Summit supercomputer) at Oak Ridge National Laboratory, we use a hybrid MPI+OpenMP (traditional CPU threads) driver program to offload work to GPUs. We'll compare the performance results as well as some of the currently available features of OpenACC and OpenMP4.5.

25-minute Talk Tom Papatheodore - Solutions Architect, NVIDIA
S7334 - Computational Focus-Tunable Near-eye Displays

We'll explore unprecedented display modes afforded by computational focus-tunable near-eye displays with the goal of increasing visual comfort and providing more realistic and effective visual experiences in virtual and augmented reality. Applications of VR/AR systems range from communication, entertainment, education, collaborative work, simulation, and training to telesurgery, phobia treatment, and basic vision research. In every immersive experience, the primary interface between the user and the digital world is the near-eye display. Many characteristics of near-eye displays that define the quality of an experience, such as resolution, refresh rate, contrast, and field of view, have been significantly improved over the last years. However, a pervasive source of visual discomfort prevails: the vergence-accommodation conflict (VAC). Further, natural focus cues are not supported by any existing near-eye display.

25-minute Talk Nitish Padmanaban - PhD Student, Stanford Computational Imaging Lab
S7507 - Compute Preemption and TotalView Have Made Debugging Pascal Much More Seamless

With Pascal, NVIDIA released compute preemption built right into the card. Debugging is now much smoother because when we stop a thread on the GPU we no longer stop the whole GPU, enabling interactive debugging on single-GPU systems and debugging of multiple processes using the same GPU. Having said that, TotalView, the leading multi-threaded Linux debugger, has invested in improving its architecture to support multi-GPU systems at scale, resulting in a much more seamless debugging experience. Come get a better understanding of the latest technology and how and where we are looking to go next.

25-minute Talk Martin Bakal - Product Manager, Rogue Wave Software
Larry Edelstein - Sales Engineer, Rogue Wave Software
S7277 - Computer Virtual Experiment on Fluidized Beds Using GPU Accelerated CFD-DEM Method

Learn how to use GPUs to accelerate CFD-DEM, the computational fluid dynamics - discrete element method, to carry out computer virtual experiments on fluidized beds in the chemical engineering field. We'll discuss how to organize the gas- and solid-phase equations solved concurrently by CPUs and GPUs in a heterogeneous supercomputing system. With systematic optimization of the model, numerical method, software, and hardware, we can simulate lab- to pilot-scale fluidized beds at quasi-realtime speed, and conduct demos of such systems. Our method enables real applications that require very long simulation times.

25-minute Talk Wei Ge - Professor, Institute of Process Engineering
S7853 - Computer Vision and Natural Language Processing with Apache MXNet (Presented by Amazon)

By attending this lab you'll gain hands-on experience using Apache MXNet with preconfigured Deep Learning AMIs and CloudFormation Templates to help speed your computer vision and natural language processing development. Topics covered include a walk-through on setting up AMIs, CloudFormation Templates, and other deep learning frameworks on AWS; a comparison of MXNet with other deep learning frameworks; and using Apache MXNet NDArrays and Symbols.

120-minute Instructor-Led Lab Joseph Spisak - Sr. Mgr - Product Management, Amazon
Sunil Mallya - Deep Learning Solutions Architect, Amazon Web Services
Mu Li - Sr. Applied Scientist, Amazon
S7173 - Concept to Production: An Architectural Design Firm's Jump to Virtualization

About a year ago, CannonDesign embarked on a journey to relocate and upgrade its entire data center, implementing NVIDIA GRID technology, to allow us to collaborate on architectural and engineering design projects throughout all of our offices worldwide. Now we're using our graphics-intensive applications on virtual desktops in our new data center. The design of the infrastructure and implementation of the migration was not without its hurdles, but we're here to share our journey. We'll give some insight into our designs for the virtual desktops, how the machines performed compared to our initial benchmarks, lessons learned, recommendations of tweaks we made, and a glimpse into some of our future plans. If you're planning a virtual desktop infrastructure, interested in creating a virtual environment designed around graphics-intensive applications, or are looking to upgrade and tweak your current environment, come learn from our journey.

50-minute Talk Andrew Schilling - Chief Infrastructure Officer, CannonDesign
Jimmy Rotella - Design Application Specialist, CannonDesign
H7113 - Connect with the Experts: Accelerated Graph & Data Analytics

Learn about the latest capabilities for Accelerated Graph & Data Analytics. How do GPUs excel at communication driven workloads like graph analytics? Come and find out! We will discuss libraries, benchmarks, tools and frameworks. Share your experiences, suggestions and questions regarding GPUs as a platform for batch and streaming analytics.

1 Hour Connect with the Experts Frank Eaton - Technical Lead, Accelerated Graph & Data Analytic, NVIDIA
H7129 - Connect with the Experts: Accelerated Libraries - cuFFT, cuSPARSE, cuSOLVER, nvGRAPH

This Connect with the Experts session focuses on GPU-accelerated libraries and gives attendees an opportunity to connect with NVIDIA engineers. The libraries covered in this session are cuFFT and cuFFTW, cuSPARSE, cuSOLVER, and nvGRAPH.

1 Hour Connect with the Experts Alexandre Fender - Software Engineer, NVIDIA
Lung-Sheng Chien - Software Engineer, NVIDIA
Lukasz Ligowski - Software Engineer, NVIDIA
H7125 - Connect with the Experts: Advanced Deep Learning

Attend this session to get your technical questions about Deep Neural Network architectures and scaling Deep Learning applications answered. Learn more about strategies you can employ to explore the right neural network architectures for your problem and train at scale to converge to your solution faster. NVIDIA deep learning research and HPC experts can provide you with the right guidance to maximize the performance and accuracy of your Deep Learning based solution.

1 Hour Connect with the Experts Michael Houston - Sr. Distinguished Engineer, NVIDIA
Sylvain Jeaugey - Senior Communication and Computing Engineer, NVIDIA
H7114 - Connect with the Experts: Building Autonomous Vehicles using DRIVE Platforms

Connect with NVIDIA experts and discuss why autonomous technologies powered by deep learning have become a key focus for every car manufacturer, as well as transportation services and technology companies. The car needs to know exactly where it is, recognize the objects around it, and continuously calculate the optimal path for a safe driving experience. This situational and contextual awareness of the car and its surroundings demands a powerful visual computing system that can merge data from cameras and other sensors, plus navigation sources, while also figuring out the safest path - all in real-time. This autonomous driving platform is NVIDIA DRIVE PX.

1 Hour Connect with the Experts Shri Sundaram - Senior Product Manager - DRIVE PX 2, NVIDIA
Richard Albayaty - Solutions Architect, NVIDIA
Aaraadhya Narra - Solutions Architect - Autonomous Driving, NVIDIA
H7116 - Connect with the Experts: Building Your AI Products and Services (FOR INCEPTION PROGRAM PARTNERS)

This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how to build your product. We will be focusing on product design and how you can scale out. Speak with experts on your technology stack (data, computer, model, deployment, etc.), and consider scaling out training and inference through the cloud or in-house.

1 Hour Connect with the Experts Chris Gottbrath - Accelerated Computing Product Manager, NVIDIA
Ryan Olson - Architect, Solutions, NVIDIA
Kari Briski, NVIDIA
Louis Capps - Solution Architect, NVIDIA
H7120 - Connect with the Experts: Containers for GPU Applications

Interactive session to answer any question you might have regarding using GPUs with Linux container technologies (such as Docker, Rkt, or Singularity) and how to deploy GPU applications in your cluster with container orchestrators (such as Kubernetes or Mesos). We will also share tips on how to tune containers for high-performance applications. This session complements the presentation "S7177 - Using Containers for GPU-Accelerated Applications". Container technologies are evolving very quickly, so your use case might not have been covered in that presentation.

1 Hour Connect with the Experts Jonathan Calmels - Systems Software Engineer, NVIDIA
Felix Abecassis - Systems Software Engineer, NVIDIA
Renaud Gaubert - System Software Engineering Intern, Deep Learning Software Platform, NVIDIA
H7109 - Connect with the Experts: Creating Efficient OpenCL Software

In this free-format interactive session, get to meet and interact directly with engineers who build the NVIDIA OpenCL system software. Key focus areas for the session are efficient memory management and performance optimizations, but all topics welcome!

1 Hour Connect with the Experts Karthik Raghavan Ravi - Engineering Manager, OpenCL, NVIDIA
H7124 - Connect with the Experts: Deep Learning Applications

Attend this session to get your questions answered on deep learning applications in computer vision, signal processing, natural language processing and others. Learn more about the different types of deep neural networks and algorithms used in various applications. NVIDIA experts can help you choose the right approach for your application and project.

1 Hour Connect with the Experts Dennis Lui - Solutions Architect, NVIDIA
Jeremy Appleyard - Engineer, Tech SW, NVIDIA
Julie Bernauer, NVIDIA
Josh Park - Architect, Solutions, NVIDIA
Nathan Luehr - Engineer, Tech SW, NVIDIA
H7121 - Connect with the Experts: Deep Learning Basics

Attend this session to get your questions on deep learning basics and concepts answered. NVIDIA experts can help you with the fundamentals and provide guidance on how and when to apply Deep Learning and GPUs to your work. No question is too basic.

1 Hour Connect with the Experts Joohoon Lee - Certified Instructor, NVIDIA
Jonathan Bentz - Solutions Architect, NVIDIA
Slawek Stephniewski - Engineer, Tech SW, NVIDIA
Leo Tam
Larry Brown, NVIDIA
Philippe Vandermersch - Engineer, Sys SW, NVIDIA
Simon Layton - Engineer, Tech SW, NVIDIA
Scott Yokim - Engineer, Tech SW, NVIDIA
Jonathan Barker - Architect, NVIDIA
Ben Barsdell
Khairul Kabir, NVIDIA
Natalia Gimelshein - Engineer, Tech SW, NVIDIA
H7123 - Connect with the Experts: Deep Learning Deployment (Cloud, Datacenter and Embedded)

Attend this session to get your questions on deep neural network deployment answered. Learn more about deployment platforms such as cloud, datacenter, and embedded, and the merits and limitations of each approach. NVIDIA experts can help you choose the right deployment platform for your application and project.

1 Hour Connect with the Experts Kismat Singh, NVIDIA
Sharan Chetlur, NVIDIA
Mostafa Hagog, NVIDIA
Micah Villmow - Engineer, Tech SW, NVIDIA
Dilip Sequeira, NVIDIA
H7132 - Connect with the Experts: Deep Technical Dive into NVIDIA GRID

We will be taking a deep dive on both the software and hardware for NVIDIA GRID technology. Maybe you are in the process of implementing GRID technology for your enterprise. Maybe you are just curious. Stop by for a chat.

1 Hour Connect with the Experts Chenghuan Jia - Software Architect, NVIDIA
Jeremy Main - Lead Solution Architect - GRID, NVIDIA
Andrew Currid - System Architect, NVIDIA
Jared Cowart - Sr. Solutions Architect, NVIDIA
H7122 - Connect with the Experts: Frameworks for Training Deep Neural Networks

Attend this session to get your questions on deep learning frameworks answered. Learn more about widely used deep learning frameworks such as Caffe, Theano, Torch, TensorFlow, CNTK, and MXNet, and let NVIDIA experts help you choose the right framework for your research or project.

1 Hour Connect with the Experts Luke Yeager - Engineer, Sys SW, NVIDIA
Deyu Fu - Engineer, Tech SW, NVIDIA
Boris Ginsburg - Deep Learning Engineer, NVIDIA
Khairul Kabir, NVIDIA
Allison Gray - Solutions Architect, NVIDIA
Sharan Chetlur, NVIDIA
Michael O'Connor - Senior Engineering Manager, Deep Learning, NVIDIA
Cliff Woolley - Director, Developer Technology Engineering, NVIDIA
Ryan Olson - Architect, Solutions, NVIDIA
Sergei Nikolaev - Engineer, Tech SW, NVIDIA
Kevin Vincent
H7118 - Connect with the Experts: Go To Market Strategy (FOR INCEPTION PROGRAM PARTNERS)

This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how to go-to-market. We will be focusing on your startup's go-to-market strategy and how you can scale out. Speak with experts on how to optimize your vertical strategy and leverage partnerships. Enhance your marketing efforts and perfect your business models.

1 Hour Connect with the Experts Jeremy Barnish - Worldwide Field Ops, NVIDIA
Lasandra Brill - Head of Digital Planning and Analytics, NVIDIA
Lisa Lahde - AI and Deep Marketing Lead, NVIDIA
Izumi Barker - Healthcare Campaign Marketing Manager, NVIDIA
Laura Fay - Vice President Enterprise Marketing, Corporate Communications, Global Events, NVIDIA
H7133 - Connect with the Experts: How Many Users Per Host with NVIDIA GRID

Learn how many users per host can be supported with NVIDIA GRID, depending on the user profile. Learn from engineers what the best options are for your business.

1 Hour Connect with the Experts Luke Wignall - GRID Performance Engineer, NVIDIA
H7137 - Connect with the Experts: HPC Visualization in Virtual Reality

Attend this special session and get a first glimpse of scientific data in ParaView being exported to a scene composed with the Unreal Engine. We will discuss the workflow that shows how geometry generated in a running ParaView session can be uploaded on-the-fly to a running pre-composed Unreal VR scene. You can now enjoy the immersive experience of consumer VR while taking advantage of an easy-to-create, high-quality graphical environment for scientific data.

3-Hour Connect with the Experts Kees van Kooten - Scientific Visualization Software Engineer, NVIDIA
H7110 - Connect with the Experts: Jetson Developer Kit and Software Development

Connect with the experts to learn about the Jetson developer kit and processor module. Experts will be on-hand to discuss the Jetson platform and answer your questions. NVIDIA Jetson with GPU-accelerated parallel processing is the world's leading embedded visual computing platform. It features high-performance, low-energy computing for deep learning and computer vision making the Jetson platform ideal for compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers and hobbyists can use the NVIDIA Jetson Developer Kit and module to explore the future of embedded computing.

1 Hour Connect with the Experts Eric Brower - Director, Tegra Linux Platform Software, NVIDIA
Karan Jhavar, NVIDIA
Philip Lawrence - Program Manager, NVIDIA
Frank Chen
Saurabh Maniktala
Nathan Lord
Rohit Vaswani - Engineer, Sys SW, NVIDIA
Winnie Hsu, NVIDIA
Daniel Horowitz, NVIDIA
Vincent Nguyen Quang Do - Architect, Solutions, NVIDIA
James Jeun - Sr. Product Manager, NVIDIA
Lynette Farinas, NVIDIA
Andrey Trachenko, NVIDIA
Ying Zhou - Engineer, Sys SW, NVIDIA
Sean Pieper, NVIDIA
David Wang - Engineer, Sr.Sys SW, NVIDIA
Sebastien Domine - VP Software Engineering, Developer Tools, NVIDIA
H7111 - Connect with the Experts: MDL

Ask questions about MDL, or the NVIDIA vMaterial library. Exchange ideas or just give feedback.

1 Hour Connect with the Experts Daniela Flamm Jackson - Technical Product Mktg, NVIDIA
Tom-Michael Thamm, NVIDIA
Jan Jordan - Software product manager MDL, NVIDIA
H7108 - Connect with the Experts: Mental Ray and Iray Rendering Workflows

Come discuss rendering workflows with the experts on NVIDIA Mental Ray and NVIDIA Iray. NVIDIA Mental Ray rendering software generates images of outstanding quality and unsurpassed realism. It combines physically based light simulation with full programmability to let you create any imaginable visual effect. NVIDIA Iray is a highly interactive and intuitive physically based rendering technology that generates photorealistic imagery by simulating the physical behavior of light and materials. It's a highly predictive approach that combines with scalable, world-class performance across NVIDIA GPUs to give constant feedback and rapid results.

1 Hour Connect with the Experts Barton Gawboy - Product Designer, NVIDIA mental ray for Maya, NVIDIA
Dave Coldron - Product Director, Lightwork Design Ltd.
Peter de Lappe - Product Manager, NVIDIA mental ray, NVIDIA
Jay Axe - Technical Product Manager, NVIDIA Corporation
H7117 - Connect with the Experts: Moving from Machine Learning to Deep Learning (For Inception Program Partners)

This is one of three Connect with the Experts sessions created exclusively for the members of our AI and deep learning startup program, Inception, and will focus on how you can switch from machine learning to deep learning. Speak with experts on how to use GPUs on the edge. This is your chance to show and tell us what you are working on and to work with experts on pushing your company further in the deep learning space. Learn more about how the Deep Learning Institute can help, too.

1 Hour Connect with the Experts Lynette Farinas, NVIDIA
Mark Ebersole, NVIDIA
H7119 - Connect with the Experts: Multi-GPU Programming

Wondering how to scale your code to multiple GPUs in a node or cluster? Need to discuss some CUDA-aware MPI details? Interested in knowing more about the newest entry in GPUDirect Technologies, GPUDirect Async? This is the right session to ask your beginner or expert questions on Multi-GPU Programming, GPUDirect, NCCL, NVSHMEM and MPI.

1 Hour Connect with the Experts Jiri Kraus - Senior Devtech Compute, NVIDIA
Sreeram Potluri - Senior Software Engineer, NVIDIA
H7135 - Connect with the Experts: NVIDIA Data Center Tools

Attendees will learn the latest about the NVIDIA Data Center Tools, including Data Center GPU Manager (DCGM), NVIDIA Validation Suite (NVVS), NVIDIA Management Library (NVML), and new tools to verify system health.

1 Hour Connect with the Experts Brent Stolle - Software Engineer, NVIDIA
Scott McMillan - Software Architect, NVIDIA
H7128 - Connect with the Experts: NVIDIA Deep Learning Institute

Certified instructors from the NVIDIA Deep Learning Institute (DLI) will share how developers, data scientists, and researchers can access hands-on technical training from NVIDIA to solve challenging problems with deep learning. This session will cover everything you need to know about DLI, including which labs are offered, how to access labs online, how to find a workshop near you, and more. Plus, our experts are available to answer your technical questions about deep learning for Automotive, Healthcare, Finance, and other important industries.

1 Hour Connect with the Experts Ryan Olson - Architect, Solutions, NVIDIA
Kelvin Lwin - Senior Deep Learning Institute Instructor, NVIDIA
Jonathan Bentz - Solutions Architect, NVIDIA
H7130 - Connect with the Experts: NVIDIA GPUDirect Technologies on Mellanox Network Interconnects

The NVIDIA GPUDirect family of technologies is meant to accelerate data exchange in GPU-accelerated applications. Mellanox is a leader in high-performance, scalable, low-latency network interconnects for both InfiniBand and Ethernet. Since 2013, Mellanox has worked with NVIDIA to enable GPUDirect support, with large-scale deployments in HPC and Artificial Intelligence. During this session, we will briefly discuss the state-of-the-art capabilities of GPUDirect RDMA and GPUDirect Async, while devoting most of the time to a Q&A session with users.

1 Hour Connect with the Experts Davide Rossetti - Senior Software Engineer, NVIDIA
Gil Bloch - Principal Architect, Mellanox
Scot Schultz - Sr. Director, HPC/Artificial Intelligence & Technical Computing, Mellanox
H7134 - Connect with the Experts: NVIDIA GRID Architects

Speak with NVIDIA engineers and architects to get your datacenter questions answered. This is the best place to get your queries concerning visualization answered, from user level to developer level. Learn how to achieve GPU-accelerated graphics while maintaining security, and get your questions answered on the best methods for implementing NVIDIA GRID for your enterprise.

1 Hour Connect with the Experts Luke Wignall - GRID Performance Engineer, NVIDIA
Jeff Weiss - Director, West Territory SAs, NVIDIA
H7112 - Connect with the Experts: NVIDIA Video and Capture SDK

Join this Connect with the Experts session to get answers to your specific questions and to share your feedback as a customer of the NVIDIA Video SDK and NVIDIA Capture SDK.

1 Hour Connect with the Experts Abhijit Patait - Director, Multimedia System Software, NVIDIA
Ganapathy Raman Kasi
H7103 - Connect with the Experts: OpenACC: Start with GPUs and Optimize Your Code

This session is designed for anyone who is either looking to start with GPUs or already accelerating their code with OpenACC on GPUs or CPUs. Join OpenACC experts and your fellow OpenACC developers to get expert advice, discuss your code, and learn how OpenACC Directives are used by others.

1 Hour Connect with the Experts Jeff Larkin - DevTech Software Engineer, NVIDIA
Andreas Herten - Post-Doctoral Researcher GPU in HPC, Julich Supercomputing Centre, Forschungszentrum Julich
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
Filippo Spiga - Head of Research Software Engineering, University of Cambridge
Stephane Chauveau - Engineer, Tech SW, NVIDIA
Carl Ponder, NVIDIA
H7115 - Connect with the Experts: OpenGL and CUDA

Come by to ask questions on OpenGL and CUDA.

1 Hour Connect with the Experts Chris Hebert, NVIDIA
Kyrylo Perelygin - Senior Systems Software Engineer, NVIDIA
Sebastian Jodlowski - System Software Engineer, NVIDIA
H7102 - Connect with the Experts: Programming at Scale

Discover the forthcoming features and techniques for harvesting parallelism on large-scale systems. Bigger, better, faster. Scaling up and out can help get us there. NVIDIA platforms have unique features that offer better power efficiency, easier access, higher bandwidth and lower communication latency than our competition. But how do we write, refactor and optimize codes to get to scale? NVIDIA offers a vision that encompasses work distribution, efficient data communication, and ease of programming. It covers asynchronous task parallelism, pushing control and communication down to where the data is, and selective exercise of control over how work and data are mapped to the underlying platform. Expect an informative and engaging discussion about how NVIDIA tech applies to your work.

1 Hour Connect with the Experts CJ Newburn - Principal HPC Architect for Compute SW, NVIDIA
H7107 - Connect with the Experts: VR: GL, DX & VK

Come talk to us about anything VR related. We invite you to discuss anything from efficient rendering over multi-GPU rendering to the newest hardware features.

1 Hour Connect with the Experts Ingo Esser - Senior Developer Technology Engineer, NVIDIA
Christoph Kubisch - Sr. Developer Technology Engineer, NVIDIA
Patrick Mours - DevTech Engineer, NVIDIA
Robert Menzel - DevTech Engineer, NVIDIA
H7127 - Connect with the Experts: VRWorks Tools

Come meet experts from the NVIDIA software, developer technology, and technical marketing teams to learn how to use DesignWorks, VRWorks, and GameWorks to improve your VR experience.

1 Hour Connect with the Experts Vincent Brisebois - Senior Technical Marketing Manager, NVIDIA
Manuel Kraemer - Sr. DevTech Engineer, NVIDIA
Rochelle Pereira - Senior Software Engineering Manager, NVIDIA
Edward Liu - Sr. Developer Technology Engineer, NVIDIA
H7104 - Connect with the Experts: Vulkan, OpenGL, Graphics pipeline

An open discussion of anything related to real-time rendering with the OpenGL or Vulkan API, including NVIDIA-specific features that can speed up the rendering process.

1 Hour Connect with the Experts Chris Hebert, NVIDIA
Christoph Kubisch - Sr. Developer Technology Engineer, NVIDIA
S7294 - Controlling Hundreds of GPU-Powered Plasma-Physics Simulations with Machine Learning Algorithms

Better hardware and algorithms have made plasma-physics particle-in-cell codes much faster. Instead of running individual simulations, it's now common to explore the space of physical parameters with large sets of simulations. However, predefined regularly spaced parameter scans can be inefficient and expensive. Instead, we use an adaptive algorithm that learns from previous simulations and determines the most promising parameters to try next. We illustrate this method on the problem of electron injection in laser-wakefield acceleration. Using hundreds of GPU-powered simulations with the code FBPIC on the Titan cluster at ORNL, the algorithm quickly focuses on the most relevant regions of the explored parameter space.

25-minute Talk Remi Lehe - Postdoctoral Fellow, Lawrence Berkeley National Laboratory
S7605 - Convolutional Neural Networks for Modeling Temporal Biomarkers and Disease Predictions

Lab values and biomarkers are often irregularly and asynchronously measured, making them difficult to use in predictive modeling. However, temporal trends can still be recovered from these measurements and are important for predicting disease onsets. We'll present a novel model of high-dimensional temporal input and high-dimensional output. Our model is composed of two convolutional neural network components. The first component is an efficient convolution-based formulation of multivariate kernel regression, which allows us to estimate each biomarker at each time point from the rest of the biomarker time series. The second component is a multi-resolution, multi-task convolutional neural network that recovers temporal trends most predictive of up to 170 diseases. We'll show how this multi-task formulation allows us to retain the correlation structure among the diseases throughout the training. Our experiments on data from 298K individuals over 8 years, up to 100 common lab measurements, and 171 diseases show that the temporal signatures learned via convolution are significantly more predictive than baselines commonly used for early disease diagnosis.

25-minute Talk Narges Razavian - Assistant Professor, New York University Langone Medical Center
S7440 - Create High-Quality Materials from Scans with MDL and Substance

A worldwide leader for procedural texturing in the gaming industry with its Substance technology, Allegorithmic has partnered with NVIDIA to release Substance Designer 5.5, the first MDL visual editor to efficiently author materials and transport the material definition across all supporting software. We'll present a full customer workflow, from high-resolution image scanning to an actual MDL-defined material that could serve as a reference, similar to those available through Substance Source. We'll demonstrate customer use cases and present results (at GTC 2016 we showcased Hyundai and Harley-Davidson) with a live demo of Substance solutions with NVIDIA Iray rendering on an NVIDIA VCA cluster, as well as an update on new features of Substance Designer 6.0, released in February 2017.

50-minute Talk Pierre Maheut - Product Manager, Allegorithmic
Jerome Derel - Chief Product Officer, Allegorithmic
S7552 - Creating & Exploring Enterprise VR Content

Enterprise Virtual Reality offers the promise of accelerating and disrupting traditional design and modeling workflows. By working in VR, architects, designers, and artists can experience their data at life scale, and they can collaboratively explore design options in a shared virtual environment. But for enterprise VR experiences to become pervasive, easy-to-use content creation and exploration tools are required. In this presentation, we will discuss challenges and solutions for creating and exploring enterprise VR content.

25-minute Talk David Weinstein - Director Pro VR, NVIDIA
S7823 - Crowdsourcing 3D Semantic Maps for Vehicle Cognition

Extracting context from the vehicle's environment remains one of the major challenges to autonomy. While this can be achieved in highly controlled scenarios today, scalable solutions are not yet deployed. In this talk we explore the crucial role of 3D semantic maps in providing cognition to autonomous vehicles. We will look at how Civil Maps uses swarm methods to rapidly crowdsource these maps, and how they are utilized by automotive systems in real time.

25-minute Talk Scott Harvey - Senior Machine Vision Engineer, Civil Maps
Andy Chen - Head of Global Partners & GM, Asia, Civil Maps
Fabien Chraim - VP of Research and Development, Civil Maps
S7132 - CUDA 9 and Beyond

CUDA is NVIDIA's parallel computing platform and programming model. You'll learn about new programming model enhancements and performance improvements in the latest release of CUDA; preview upcoming GPU programming technology; and gain insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.

50-minute Talk Mark Harris - Chief Technologist, GPU Computing Software, NVIDIA
SE7142 - CUDA Developer Tools Round Table

This session gathers major CUDA developer tools vendors, including NVIDIA and PGI, to share their latest feature development. Each vendor will also share what they believe are the major application development challenges and the solutions they might be working on to tackle them. Each panelist will give a short presentation and/or demo of their latest feature set, or illustrate the type of development problems they are focused on. The panelists will come from the HPC, workstation, and embedded business verticals, so the audience can appreciate where CUDA is present and what types of challenges might be specific to one platform versus another, and also be exposed to common development patterns, as there might be convergence of hardware system topologies. The moderator will then raise a variety of discussion topics meant to draw out the panel on why a given problem is or isn't solved and how developers can successfully develop on systems with particular limitations, and to debate the convergence of HPC and embedded systems and the inadequacy of developer tools for certain types of applications, agreeing to disagree on what is missing. The moderator will engage the audience with surveys and shows of hands to validate statements made by the panelists, and will open the floor to questions and comments from developers themselves.

2-Hour Special Event Rafael Campana - Senior Engineering Manager, Developer Tools, NVIDIA
David Lecomber - Senior Director, HPC Tools, ARM
Sheridan Ethier - Director Engineering, Middleware and Verticals, QNX
Ken Jackson - SVP, Real-Time and Linux, Concurrent
Allen Malony - Professor, University of Oregon
Annemarie Southwell - PGI Software Engineering Manager, NVIDIA
Martin Bakal - Product Manager, Rogue Wave Software
Sebastien Domine - VP Software Engineering, Developer Tools, NVIDIA
S7122 - CUDA Optimization Tips, Tricks and Techniques

Optimizing your code can be one of the most challenging tasks in GPU programming, but also one of the most rewarding: the performance difference between an initial version and well-tuned code can be a factor of 10 or more. Some optimizations can be quite straightforward while others require care and deep understanding of how the code is executing. A particular focus will be on optimization of the CPU part of your code, which is frequently overlooked even though it is often easier to tune and just as effective. Sometimes the biggest obstacle is just knowing what to look for, so we'll cover a range of techniques that everyone from beginners to CUDA ninjas might not have thought of before.

50-minute Talk Stephen Jones - Principal Software Engineer, NVIDIA
L7108 - CUDA Programming in Python with Numba

In this lab, we'll teach you how to do GPU-accelerated numerical computing from Python using the Numba compiler. Numba is an open source compiler that can translate Python functions for execution on the GPU, all without having to write any C or C++ code. Numba's just-in-time compilation ability makes it easy to interactively experiment with GPU computing in the Jupyter notebook. We'll teach you techniques for automatically parallelizing certain kinds of array functions, as well as how to create and launch CUDA kernels entirely from Python. At the end of the lab, we'll demonstrate how Numba can be combined with Dask for distributed computing on a GPU cluster. Prerequisites: Familiarity with CUDA, Python, and NumPy. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.

120-minute Instructor-Led Lab Stanley Seibert - Director of Community Innovation, Continuum Analytics
Siu Kwan Lam - Software Developer, Continuum Analytics
S7127 - cuMF_sgd: Fast and Scalable Matrix Factorization on GPUs

Matrix factorization (MF) has been widely used in recommender systems, topic modeling, word embedding, and more. Stochastic gradient descent (SGD) for MF is memory bound. Meanwhile, single-node CPU systems with caching perform well only for small datasets. Distributed systems have higher aggregated memory bandwidth but suffer from relatively slow network connections. This observation inspires us to accelerate MF by utilizing GPUs' high memory bandwidth and fast intra-node connection. We present cuMF_SGD, a CUDA-based SGD solution for large-scale MF problems. On a single GPU, we design two workload scheduling schemes, batch-Hogwild! and wavefront-update, that fully exploit the massive number of cores. In particular, batch-Hogwild!, a vectorized version of Hogwild!, overcomes the issue of memory discontinuity. On three datasets with only one Maxwell or Pascal GPU, cuMF_SGD runs 3.1x to 28.2x as fast as state-of-the-art CPU solutions on 1 to 64 CPU nodes.

25-minute Talk Wei Tan - Research Staff Member, IBM T. J. Watson Research Center
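For readers unfamiliar with SGD-based matrix factorization, the following is a minimal serial sketch in plain Python of the basic update rule that the cuMF_sgd abstract refers to. It is not the cuMF_SGD implementation: it runs on the CPU, uses no CUDA, and does not show the batch-Hogwild! or wavefront-update scheduling schemes. The function names, hyperparameters, and toy data are all hypothetical and chosen only for illustration.

```python
import random


def sgd_matrix_factorization(ratings, n_rows, n_cols, rank=2,
                             lr=0.02, reg=0.01, epochs=1000, seed=0):
    """Approximate a sparse matrix R by P * Q^T using plain serial SGD.

    ratings: list of (row, col, value) tuples for the observed entries.
    Returns the factor matrices (P, Q) as lists of lists.
    """
    rng = random.Random(seed)
    # Small random initialization of both factor matrices.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    samples = list(ratings)
    for _ in range(epochs):
        rng.shuffle(samples)  # visit observed entries in random order
        for u, v, r in samples:
            pred = sum(P[u][k] * Q[v][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                pu, qv = P[u][k], Q[v][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qv - reg * pu)
                Q[v][k] += lr * (err * pu - reg * qv)
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of the factorization on the observed entries."""
    total = 0.0
    for u, v, r in ratings:
        pred = sum(pu * qv for pu, qv in zip(P[u], Q[v]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


if __name__ == "__main__":
    # Toy data: observed entries of a small 3x2 matrix (hypothetical values).
    toy = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 2.0),
           (1, 1, 4.0), (2, 0, 3.0), (2, 1, 6.0)]
    P, Q = sgd_matrix_factorization(toy, n_rows=3, n_cols=2)
    print("training RMSE:", rmse(toy, P, Q))
```

Each update touches only one row of P and one row of Q, which is why the abstract describes the method as memory bound and why lock-free parallel schemes in the Hogwild! family are a natural fit for it.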
S7255 - cuTT: A High-Performance Tensor Transpose Library for GPUs

We'll introduce cuTT, a tensor transpose library for GPUs that on average achieves over 70% of the attainable memory bandwidth, independent of tensor rank. Tensor transposing is important in many applications such as multi-dimensional Fast Fourier Transforms and deep learning, and in quantum chemistry calculations. Until now, no runtime library existed that fully utilized the remarkable memory bandwidth of GPUs and could perform well independent of tensor rank. We'll describe two transpose algorithms, "Tiled" and "Packed," which achieve high memory bandwidth in most use cases, as well as their variations that take care of many important corner cases. We'll also discuss a heuristic method based on GPU performance modeling that helps cuTT choose the optimal algorithm for the particular use case. Finally, we'll present benchmarks for tensor ranks 2 to 12 and show that cuTT, a fully runtime library, performs as well as an approach based on code generation.

25-minute Talk Antti-Pekka Hynninen - Developer Technology Engineer, NVIDIA
S7452 - Cutting Edge OptiX Ray Tracing Techniques for Visualization of Biomolecular and Cellular Simulations in VMD

We'll present the latest advances in the use of NVIDIA OptiX for high-fidelity rendering of state-of-the-art biomolecular and cellular simulations. We'll cover the latest technical advances in the OptiX-based ray-tracing engines in VMD, which are heavily used both for interactive progressive ray tracing (local and remote) and for batch-mode in-situ or post-hoc visualization of petascale molecular dynamics simulations.

25-minute Talk John Stone - Senior Research Programmer, University of Illinois Urbana-Champaign
S7401 - Daino: A High-level Framework for Parallel and Efficient AMR on GPUs

We'll present a high-level framework for producing parallel and efficient adaptive mesh refinement (AMR) code on GPU-accelerated supercomputers. AMR methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult, considering that the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can make efficient AMR particularly challenging on GPU-accelerated supercomputers. We'll present a compiler-based, high-level framework that can automatically transform serial uniform mesh code annotated by the user into parallel adaptive mesh code optimized for GPU-accelerated supercomputers. We show experimental results on three production applications. The speedups of code generated by our framework are comparable to hand-written AMR code while achieving good strong and weak scaling up to 3,640 GPUs.

25-minute Talk Mohamed Wahib - Postdoctoral Researcher, RIKEN Advanced Institute for Computational Science
S7577 - Data Science Bowl Lung Challenge

Deep learning is currently overhauling the field of medical image analysis and computer-aided diagnosis. Recent results in various areas show that deep networks that analyze the contents of medical images, trained with large amounts of data, obtain results close to or better than human experts for diagnostic tasks in radiology, pathology, ophthalmology, and dermatology. One particular area is the analysis of chest computed tomography (CT) scans. This is of particular interest because screening with low-dose CT for lung cancer is currently being implemented on a large scale in the United States and other countries, after large studies have shown that this is the most promising strategy to reduce the number of deaths due to lung cancer, by far the largest cancer killer. Screening for lung cancer will produce many millions of CT scans that under current guidelines would have to be analyzed by radiologists. Automation could streamline and improve that process, and reduce the high costs associated with screening. We'll show the background of CT image analysis, explain how clinical experts read CT scans following the current guidelines, and show results from deep learning, in particular

25-minute Talk Bram van Ginneken - Professor of Medical Image Analysis, Radboud University Medical Center
S7693 - Data Science Bowl to Improve Lung Cancer Screening

The Data Science Bowl (DSB), sponsored by Booz Allen Hamilton and Kaggle, is the premier data science for social good competition, catalyzing the world's data science community. DSB 2017, organized in collaboration with the National Cancer Institute, seeks to improve on the accuracy of low-dose computed tomography, currently the best method for lung cancer screening. Teams competed to develop open-source algorithms using artificial intelligence techniques to reduce false positives. Hear about the research leading up to DSB 2017 and about the top-placing teams' prize-winning algorithms ($1M prize purse provided by the Laura and John Arnold Foundation).

80-minute Tutorial Keyvan Farahani - Program Director, National Cancer Institute
Elias Vansteenkiste - Post-doctoral researcher, Ghent University
William Cukierski - Head of Competitions, Kaggle
Mark-Jan Harte - CEO, Aidence
Anna Fernandez - Health Informatics/Precision Medicine Lead, Booz Allen
Leon Chen - CEO,
Eric Syphard - Chief Technologist, Booz Allen
S7752 - Deep Dive on DGX Deep Learning Frameworks: Engineered for Performance

Data science practitioners can find themselves investing significant effort in tuning popular open source distributions to improve deep learning performance. NVIDIA engineering teams bring extensive skills and expertise in improving today's popular deep learning frameworks for maximized performance on NVIDIA DGX systems. Attend this session to learn: (1) the genesis of NVIDIA's unique, integrated software stack built on NVDocker container technology, (2) how NVIDIA engineering optimizes deep learning frameworks for I/O data path performance, along with integration with cuDNN and cuBLAS, and how multi-GPU scale and performance are maximized with NCCL, and (3) why DGX users can quickly deploy a system and expect a seamless, streamlined out-of-the-box experience.

50-minute Talk Michael O'Connor - Senior Engineering Manager, Deep Learning, NVIDIA
Michael Houston - Sr. Distinguished Engineer, NVIDIA
S7680 - Deep Incremental Scene Understanding

We'll demonstrate recent advances in the field of deep learning and computer vision aimed at scene understanding from images. We'll present two research works on this subject. The first one relates to the use of deep learning for monocular simultaneous localization and mapping (SLAM) and semantic segmentation. The outcome is a technique able to carry out accurate real-time semantic mapping and 3D reconstruction from a single RGB camera. Since in many computer vision problems a single prediction cannot express the uncertainty or ambiguity that is given in a scene, the second research work that we'll present employs deep learning for solving ambiguous prediction problems. Finally, we'll demonstrate how the two approaches can be merged together to enable robust extraction of 3D semantic information such as pixel-wise labeling and object detection in real time by means of a simple webcam.

25-minute Talk Federico Tombari - Senior Research Scientist, Technical University of Munich (TUM)
Christian Rupprecht - Graduate Student, Technical University of Munich (TUM)
S7549 - Deep Learning Acceleration of Progress toward Delivery of Fusion Energy

Expediting delivery of fusion power -- identified by the 2015 CNN "Moonshots for the 21st Century" series as one of six grand challenges for the modern world -- can be enabled by engaging big-data-driven machine/deep learning predictive methods. Princeton's associated project has access to over a half-petabyte of the EUROFUSION/JET disruption database, and its new FRNN (Fusion Recurrent Neural Net) code exhibits excellent scaling to nearly 200 GPUs. We'll target extending this exciting trend on NVIDIA's powerful SATURN V to its nearly 1,000 GPUs (124 nodes with eight Pascal P100 GPUs per node) in time for presentation at GTC 2017.

50-minute Talk William Tang - Principal Research Physicist, Princeton University
S7844 - Deep Learning: An Artificial Brain That Detects Any Type of Cyber Threat

Join our presentation on the first application of deep learning to cybersecurity. Deep learning is inspired by the brain's ability to learn: once a brain learns to identify an object, its identification becomes second nature. Similarly, as a deep learning-based artificial brain learns to detect any type of cyber threat, its prediction capabilities become instinctive. As a result, the most evasive and unknown cyber-attacks are immediately detected and prevented. We'll cover the evolution of artificial intelligence, from old rule-based systems to conventional machine learning models to current state-of-the-art deep learning models.

25-minute Talk Eli David - CTO, Deep Instinct
S7554 - Deep Learning Application Development on Multi-GPU/Multi-Node Environment

We'll give a brief overview of our deep learning applications, such as image recognition and taxi demand forecasts, and show how we have accelerated our development using NVIDIA Docker, the NVIDIA DGX-1 AI supercomputer, and tens of GPU servers. As deep learning applications become widespread, it becomes more essential for engineers to quickly adapt deep learning to new data and to efficiently seek optimal configurations. To improve the development speed of engineers on shared GPU resources, we developed a job management system that provides a separate learning environment for each engineer using NVIDIA Docker, along with queuing functions on the multi-GPU/multi-node system. This system helps us improve our productivity and create more sophisticated solutions to offer better services.

25-minute Talk Toshiki Sakai - Data Scientist, NTT DOCOMO, INC.
S7582 - Deep Learning Applications Across the Breadth of GEOINT

This presentation will cover research and development Harris has performed in the application of deep learning to key challenges in the geospatial intelligence community. Because the application of deep learning to 2D overhead imagery has provided a proven baseline approach to object detection and ID, Harris has been able to expand deep learning applications to other GEOINT datasets in order to make synergistic decisions. Tested data sources include video from ground-based and drone sensors, 3D point clouds from LiDAR sensors and derived from stereo pairs of imagery, and high-temporal-resolution data from small-sat providers. A discussion of combining multiple streams of data to add value in decision making will follow the data examples.

50-minute Talk David Gorodetzky - Research Lead / Remote Sensing and Machine Learning, Harris Corporation
William Rorrer - Program Manager, Harris
S7210 - Deep Learning Applications for Embedded Avionics on the Jetson Platform

We'll discuss the uses and tradeoffs of semantic segmentation and detection networks when deployed on the Jetson TX1. There is significant research into deep learning semantic segmentation and detection networks since these can both detect and localize numerous objects within the image. We use FCN as an example of a semantic segmentation network, and the DIGITS DetectNet as an example of a detection network. These networks require significant computing resources for inferencing, and within embedded avionics applications we wish to provide the best tradeoff of performance-per-watt by leveraging these networks on the Jetson TX1. We'll explore characteristics of these deep learning networks, how these deep learning capabilities can be utilized on the Jetson TX1 platform, and characterize their runtime performance on the Jetson TX1 compared to larger GPU systems.

25-minute Talk Aaron Mosher - Design and Analysis Engineer, The Boeing Company
S7378 - Deep Learning Approaches to Timeseries Data

A survey of successful deep learning (DL) applications within several domains featuring continuous streaming (time-series) data, with an overview of which network architectures have yielded results and why these networks work. Network architectures reviewed include RNNs (dynamic models and prediction), CNNs (for frequency-transformed time-series data, i.e., spectrograms), autoencoders (anomaly detection and unsupervised data-structure visualization), and deep MLPs (sliding-window event detection and classification). Example case studies: industrial (industrial robotics, automotive telematics, prognostics/zero-down-time), IoT (event and anomaly detection, information leakage attacks/defenses), and financial (limit books, mortgage risk markets).

25-minute Talk Jeff Weiss - Director, West Territory SAs, NVIDIA
Miro Enev - Solution Architect, Deep Learning, NVIDIA
S7437 - Deep Learning-Based Accelerated Analytics for Medical Imaging

Medical accelerated analytics covers electronic health records, medical imaging, genomic data, and more, with medical imaging accounting for more than 90 percent of this data. How can medical big data be applied in clinical practice? This question concerns both medical and computational researchers, and deep learning and GPU computing provide an excellent answer. We'll introduce our research on deep learning-based diagnosis of diseases such as Alzheimer's disease and mild cognitive impairment, and discuss the current status of, and approaches to, deep learning-based medical accelerated analytics.

25-minute Talk Di Zhao - Dr., Chinese Academy of Sciences
S7457 - Deep Learning Demystified

What is deep learning? In what fields is it useful, and how does it relate to artificial intelligence? Join this session to get a working understanding of deep learning and why this powerful new technology is getting so much attention. Learn how deep neural networks are trained to perform tasks with super-human accuracy, and the challenges organizations face in adopting this new approach. We'll also cover the software, hardware, and training resources that many organizations are using to overcome the challenges and deliver breakthrough results.

50-minute Tutorial Will Ramey - Director, Developer Marketing, NVIDIA
S7465 - Deep Learning for 3D Design and Making

We'll look at the application of deep learning to design information to provide AI-assisted 3D design as well as AI-assisted robotic assembly during the manufacturing process. Autodesk is working on facilitating a more efficient and open design-manufacture-use cycle using intelligent sensors, data aggregation, and deep learning. We'll discuss the DeepForm project for generating novel 3D forms as well as an intelligent robotic assembly project for making industrial robotic assembly a closed loop, general-purpose solution that is amenable to environmental and design changes.

50-minute Talk Yotto Koga - Software Architect, Autodesk, Inc.
Massimiliano Meneghin - Principal Research Scientist, Autodesk, Inc.
S7732 - Deep Learning for Condition Assessment of Civil Infrastructure Systems

We'll present the use of deep learning for autonomous condition assessment of civil infrastructure systems. Regular inspection of civil infrastructure systems is crucial for safe operations. Manual inspection is currently the predominant method of inspection and is time-consuming, tedious, and subjective. A faster and less expensive alternative is the use of optical instrumentation (for example, digital cameras), where the feasibility of using image processing techniques to detect deterioration in structures has been acknowledged by leading experts in the field. Recent advances in CNNs have significantly improved the vision-based classification performance of computers. A CNN learns the appropriate classification features that in traditional algorithms were hand-engineered. Eliminating the dependence on prior knowledge and human effort in designing features is a major advantage of CNNs. We'll discuss CNN-based approaches for condition assessment of infrastructure systems, including a new framework that combines a deep convolutional neural network and a Naive Bayes classifier to detect cracks in videos. The crack patches are spatially and temporally clustered and the posterior probabilities of being real cracks are derived. Experimental tests have been carried out to evaluate the performance of the proposed system.

25-minute Talk Mohammad Jahanshahi - Assistant Professor, Purdue University
L7147 - Deep Learning for Genomics using DragoNN with Keras and Theano (Presented by NVIDIA Deep Learning Institute)

In this lab, we use the dragonn toolkit on simulated and real regulatory genomic data, demystify popular DragoNN (Deep RegulAtory GenOmics Neural Network) architectures, and provide guidelines for modeling and interpreting regulatory sequence using DragoNN models. We will answer questions such as: When is a DragoNN a good choice for a learning problem in genomics? How does one design a high-performance model? And more importantly, can we interpret these models to discover predictive genome sequence patterns and gain new biological insights?

120 Instructor-Led Lab Charles Killam - Curriculum Designer & Certified Instructor, NVIDIA
Johnny Israeli - Biophysics PhD Candidate & SIGF Bio-X Fellow, Stanford University
L7126 - Deep Learning for Image and Video Captioning (Presented by NVIDIA Deep Learning Institute)

Effective descriptions of content within images and video clips have been produced with convolutional and recurrent neural networks. Attendees will apply a deep learning technique via a framework to create captions on data and generate their own captions. Prerequisite: Familiarity with deep learning and a framework. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.

240 Instructor-Led Lab Allison Gray - Solutions Architect, NVIDIA
S7711 - Deep Learning for Long-Term Value Investing

We'll introduce the work being done at Quantenstein GmbH, a joint venture between Swiss AI startup NNAISENSE and Acatis Investment, that harnesses the latest advances in deep learning to automatically build custom portfolios for long-term value investing based on company fundamentals. The efficient GPU implementation of deep learning architectures and distributed computation are essential to Quantenstein's mission, enabling the testing of financial models in a walk-forward fashion, where retraining the entire system can be done monthly. We'll introduce the trading framework, learning process, principles guiding our design decisions, and show how deep learning and GPU computing make it possible to learn everything end-to-end, taking the human out of the loop.

25-minute Talk Jonathan Masci - General Manager, Quantenstein GmbH
L7135 - Deep Learning for Medical Image Analysis using R and MXNet (Presented by NVIDIA Deep Learning Institute)

Convolutional neural networks (CNNs) have proven to be just as effective in visual recognition tasks involving non-visible image types as regular RGB camera imagery. One important application of these capabilities is medical image analysis, where we wish to detect features indicative of medical conditions and use them to infer patient status. In addition to processing non-visible imagery, such as CT scans and MRI, these applications often require us to process higher dimensionality imagery that may be volumetric and have a temporal component. In this lab you will use the deep learning framework MXNet to train a CNN to infer the volume of the left ventricle of the human heart from a time-series of volumetric MRI data. You will learn how to extend the canonical 2D CNN to be applied to this more complex data and how to directly predict the ventricle volume rather than generating an image classification. In addition to the standard Python API, you will also see how to use MXNet through R, which is an important data science platform in the medical research community. Prerequisites: Basic knowledge of CNNs. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Charles Killam - Curriculum Designer & Certified Instructor, NVIDIA
Abel Brown, NVIDIA
S7653 - Deep Learning for Medical Knowledge Extraction from Unstructured Biomedical Text

We'll present work in progress on a deep learning system that extracts expert-level knowledge from the published and less formal medical literature. Using a large curated source of 5 million biomedical journal articles, disease encyclopedias such as The Merck Manuals and The Mayo Clinic's Guide to Diseases and Conditions, as well as hospital-based physician reference material, we'll demonstrate that it's possible to infer existing medical concepts such as disease-disease, disease-symptom, and disease-drug relationships with an unsupervised deep learning model. We'll extend this model to show that it's capable of answering multiple-choice medical questions that are typically given to medical students as part of the licensing examination.

25-minute Talk Andrew Beam - Postdoctoral Fellow, Harvard Medical School
S7587 - Deep Learning for Predictive Maintenance

This talk covers machine failure prediction (predictive maintenance, PdM). We'll set out the goal, present the methodology, and sketch estimates of the market size, including automotive, oil and gas, chemistry, energy, etc. We'll then present new prediction techniques, including deep learning, along with a broad performance comparison against state-of-the-art PdM methods and an idea for handling long-period prediction with DL models. We'll show the gain and its origins in detail. We'll introduce two approaches: a centralized PdM system and autonomous predictive maintenance devices. The former is the best option for IIoT-type problems, where all the monitored devices are constantly connected to the internet, and the latter broadens the range of PdM to devices with or without costly network connections, such as cars, trains, or mining equipment. Within the centralized system we use NVIDIA Tesla GPUs, and for the autonomous devices we use NVIDIA Tegra chipsets, which give us both energy and computational efficiency. Finally, we'll present case studies of real production data and the experience gathered while implementing solutions for our clients.

25-minute Talk Pawel Morkisz - CTO, Reliability Solutions
Mateusz Marzec - CEO, Reliability Solutions Sp. z o.o.
S7690 - Deep Learning for Retail Analytics and Reference Data Management

We'll show how state-of-the-art deep learning techniques can be applied to retail analytics. Namely, we'll show how one can retrieve various information about the product, including its category and ingredients, using a mixture of visual and textual information. We'll start with depicting the business scenario and operational needs of such a system, and then move into a technical and in-depth discussion of the underlying deep learning pipeline. The solution is based on an interplay of region-based convolutional neural networks and NLP techniques. This is a joint effort of Nielsen and

25-minute Talk Alessandro Zolla - VP technology - Machine Learning Program Lead, Nielsen
Robert Bogucki - Chief Science Officer,
S7701 - Deep Learning for the IoT: Leveraging Representation Learning

Machine learning applications for the Internet of Things (IoT) pose unique challenges and necessitate understanding of large-scale multi-dimensional heterogeneous sensor data at varying granularities. We'll highlight the unique challenges posed by IoT applications especially for deep learning algorithms and we'll present some work on leveraging representation learning in conjunction with deep learning to design successful algorithms for these problems. We'll demonstrate the effectiveness of the proposed approaches on real-world IoT use cases. The proposed deep representation learning models are each trained using an NVIDIA Tesla M40 GPU. Finally, we'll discuss a technology view of deep learning in the context of IoT.

25-minute Talk Mohak Shah - Head of Data Science, Bosch AI Research
S7737 - Deep Learning Frameworks with Spark and GPUs

Spark is a powerful, scalable, real-time data analytics engine that is fast becoming the de facto hub for data science and big data. In parallel, GPU clusters are fast becoming the default way to quickly develop and train deep learning models. As data science teams and data-savvy companies mature, they'll need to invest in both platforms if they intend to leverage both big data and artificial intelligence for competitive advantage. We'll examine and demonstrate TensorFlowOnSpark, CaffeOnSpark, DeepLearning4J, IBM's SystemML, and Intel's BigDL, along with distributed versions of various deep learning frameworks, namely TensorFlow, Caffe, and Torch.

50-minute Talk Subbu Rama - CEO, Bitfusion
S7348 - Deep Learning in's Autonomous Vehicles

We'll provide an overview of the models and Ford are using to fuse sensor information, and give examples of the performance optimization. and Ford are leveraging deep learning for autonomous vehicle perception across a multitude of sensors. It is important that these models have optimized performance to process high-resolution images, lidar point clouds, and other sensor inputs in a timely fashion. We will discuss how and Ford are exploring a variety of methods to push the run-time performance to new limits and maximize the use of the resources available, including modifying the underlying models, data structures, and the inference engine itself.


25-minute Talk Bryan Goodman - Staff Engineer, Machine Learning, Ford Motor Company / Argo AI
S7360 - Deep Learning in Business Conversation Analysis

Gridspace uses GPU-accelerated deep learning to analyze conversational speech on phone calls. We'll outline our DNN-based approach as well as several commercial applications of call grading. Our GPU-based software stack provides a novel way to process large-scale speech data. Results from a recent case study show call grading to be as accurate as human call grading and highly scalable in production. Deep call analysis with 100% coverage has never been achieved before. Also we'll discuss how this system can be improved by training continuously without expert supervision.

25-minute Talk Anthony Scodary - EVP of Engineering, Co-founder, Gridspace
Wonkyum Lee - S/W Engineer, Gridspace
H7126 - Deep Learning Inference with TensorRT

Are you ready to start using Deep Learning to enable features or capabilities in an app or device? 

Do you need more throughput for a DNN in the cloud or lower latency in an embedded device?

Attend to learn about the TensorRT Deep Learning Inference Software. Experts will be standing by to talk about your use case and also to discuss recent developments like: reduced precision inference, user defined custom layers, and recurrent neural network (LSTM/GRU) support.

1 Hour Connect with the Experts Chris Gottbrath - Accelerated Computing Product Manager, NVIDIA
S7639 - Deep Learning in Medical Imaging: Opportunities and New Developments

Learn about some of the key opportunities for deep learning in medical imaging, some of the current challenges, and exciting recent developments that are tackling them. We'll begin with a brief overview of medical imaging, current challenges for human observers of these images, and key applications for deep learning for improving image interpretation. We'll follow with descriptions of several specific use cases for deep learning in radiology, pathology, urology, and ophthalmology imaging, including improvements in image diagnosis that are besting state-of-the-art computerized diagnosis algorithms, approaches for visualizing and explaining to physicians what deep networks have learned to improve confidence in using the information they provide to guide decision making, and new, freely available tools to dramatically enhance the efficiency of creating new deep learning models. We'll provide links for more information about tools and information so attendees can try their hand at tackling problems in this exciting domain. Finally, we'll give a live demonstration for a portable deep learning package optimized for medical imaging.

50-minute Talk Darvin Yi - Graduate Student, Stanford University
Daniel Rubin - Associate Professor of Biomedical Data Science, Radiology, Medicine (Biomedical Informatics Research), and by courtesy, Ophthalmology, Stanford University
S7222 - Deep Learning in the Connected Kitchen

We'll present Innit's work applying deep learning technology to build a platform that powers the connected kitchen of the near future. We've been carrying out pioneering work in the applications of modern computing technology to tackle problems in the food space, with a specific focus on empowering the very personal relationship between people and food. Throughout the food ritual (from planning and shopping to cooking and serving), Innit connects information about food with personal preferences and needs, and delivers actionable information via multiple channels such as mobile apps and embedded user interfaces at home and at the store. Deep learning makes multiple appearances in this process, from the latest in CNN-based object detection and classification, to using CNN features for image retrieval and matching, to advanced sensing in extreme environments such as an operating oven.

25-minute Talk Hristo Bojinov - CTO, Innit, Inc.
S7722 - Deep Learning in the Healthcare Enterprise

Deep learning tools present a tremendous opportunity to improve healthcare. By increasing efficiency and accuracy of diagnostic testing, and elevating meaning from vast troves of clinical data, deep learning provides a pathway to true precision care. However, there are challenges in the translation of this technology to the clinic: model performance, infrastructure development, data privacy, hospital policy, and vendor relationships are all critical components to this effort. We'll discuss the early experience of the MGH & BWH Center for Clinical Data Science in supporting the translation of deep learning technologies in medicine, touching upon many of the existing and emerging technical, clinical, and cultural challenges that this work presents.

25-minute Talk Mark Michalski - Executive Director, MGH & Brigham Women's Hospital Center for Clinical Data Science
S7849 - Deep Learning Lifecycle: A Better Approach to GPU Scaling (Presented by Dell)

Explore new techniques for developing an end-to-end application life cycle for deep learning in the enterprise space. We'll cover numerous use cases and summarize studies done by Dell EMC and Bitfusion on a high-performance, heterogeneous, elastic rack of Dell C4130 servers with NVIDIA P100 GPUs. Use cases we'll cover in detail include the ability to bring on-demand GPU acceleration beyond the rack and across the enterprise with easily attachable elastic GPUs for deep learning development, as well as the creation of a cost-effective, software-defined, high-performance elastic multi-GPU system that combines multiple Dell C4130 servers at runtime for deep learning training.

50-minute Talk Bhavesh Patel - Technical Staff, Dell EMC
Mazhar Memon - CTO,
S7157 - Deep Learning Meets Motor Sports at ROBORACE

Self-driving technology meets motorsport with the Roborace series. Learn how the tech is making its way onto the track, experience exciting milestones achieved and discover what to expect in the near future. This session will cover relevant AI technologies in the Robocar and highlight how software is defining the future of the auto industry and motor racing. 

25-minute Talk John Waraniak - Vice President of Vehicle Technology, Specialty Equipment Market Association, SEMA
Bryn Balcombe - CTO, Roborace
S7768 - Deep Learning Models for Time Series Data Analysis with Applications to Healthcare

Many emerging applications of big data involve time series data. We'll discuss a collection of deep learning models to effectively analyze and model large-scale time series data. We'll show experiment results to demonstrate the effectiveness of our models in healthcare.

50-minute Talk Yan Liu - Associate Professor, University of Southern California
S7420 - Deep Learning of Cancer Images for Precision Medicine

We'll demonstrate a deep learning framework to predict survival of lung cancer patients by using convolutional networks to learn high-dimensional representations of tumor phenotypes from CT images and clinical parameters. We'll evaluate our framework from three independent cohorts with survival data, and show how the addition of clinical data improves performance. Furthermore, we'll describe how image noise can improve the robustness of our model to delineation errors and introduce the concept of priming, which helps improve performance when trained on one cohort and tested on another.

25-minute Talk Mu Zhou - Post-Doc Fellow, Stanford University
Edward Lee
S7562 - Deep Learning to Enable Real-Time Gravitational Wave and Multimessenger Astrophysics

The Advanced Laser Interferometer Gravitational-Wave Observatory (aLIGO) went online last year and very rapidly produced data confirming Einstein's theory of gravitational waves. This discovery and the success of the detector open the door for another dimension to be combined with electromagnetic detection devices (telescopes, radio telescopes, etc.) to dramatically increase the potential to understand the workings of deep space and astronomical phenomena at the origins of the universe. The project used data produced by the CACTUS HPC simulation to build datasets that were used to train a DNN with the MXNet framework. Prediction accuracy increased over classical waveform analysis, and the number of processors needed fell from hundreds of CPUs to one GPU, with prediction achieved at a latency of 1 millisecond. The work was done on the Blue Waters supercomputer and at the Innovation Lab at NCSA. The reduction in the "pipeline size" (number of CPUs needed to make a detection) and the improved latency open up the potential for multi-messenger astrophysics, where an observation that is "heard" with the gravitational wave detector can be used to steer a detector in the visible or EM spectrum toward where to look.

25-minute Talk Eliu Huerta - Gravity Group Leader, University of Illinois at Urbana-Champaign
Daniel George - Scientist, University of Illinois at Urbana-Champaign
L7104 - Deep Learning Using Microsoft Cognitive Toolkit

This lab will provide hands-on experience with Microsoft's open-source, production-grade deep learning Cognitive Toolkit, formerly CNTK. The Cognitive Toolkit is used in several Microsoft products for training and evaluating deep neural networks. The same features are available to everyone outside Microsoft and are supported on both Windows and Linux platforms with a Python/C++ API. The Cognitive Toolkit supports feed-forward, convolutional, and recurrent networks, and reinforcement learning for speech, vision, and text data, also in combination. The hands-on lab will help you build end-to-end use cases, from a basic FCN to more advanced CNN, RNN/LSTM, and auto-encoder models in different domains. You'll also learn how the toolkit leverages multiple GPUs for advanced optimization and how to run the models on the Azure cloud. Attendees who want hands-on experience on their local machines need to install the CNTK binaries. The instructions can be found here - This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.

120 Instructor-Led Lab Sayan Pathak - Principal ML Scientist, Microsoft
S7520 - DeepLumen: Fast and Accurate Segmentation of Coronary Arteries for Improved Cardiovascular Care

Learn about HeartFlow's unique approach for better diagnosis and treatment of cardiovascular disease. From CT images, HeartFlow creates a complete geometric and physiologic model of the patient's coronary anatomy. Blood flow is simulated using computational fluid dynamics to functionally assess narrowings of the coronary artery. HeartFlow's approach is approved by regulatory bodies and in commercial use around the world today. We'll focus on DeepLumen, the fast and highly accurate method for extracting coronary arteries from a CT scan. It is formulated as a novel 3D rotational CNN that exploits translational and cyclic symmetries. DeepLumen is shown to be at least as accurate as expert radiologists in quantifying disease compared to invasive catheterization measurements.

25-minute Talk Kersten Petersen - Senior Medical Imaging Researcher, HeartFlow
L7136 - Deep Multitask Prediction with Digital Health Data

In multitask learning, we aim to improve performance on multiple prediction tasks by solving them simultaneously using models that are related. Neural networks can especially benefit from multitask training in ways that simpler (linear) models cannot. Although multitask neural nets, which were first proposed over 20 years ago, are conceptually simple to design, they can present unexpected challenges. In this lab, we will demonstrate how to build and successfully train multitask neural networks to predict multiple clinical outcomes simultaneously from publicly available digital health data using DeepLearning4J (DL4J). We will also show how to train a similar model using the Keras frontend for TensorFlow and import the resulting model into DL4J for deployment. Prerequisite: Basic knowledge of any programming language. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.

120 Instructor-Led Lab David Kale - Deep Learning Engineer, Skymind
S7373 - Deep Neural Networks for Non-Equilibrium Molecular Dynamics

Molecular dynamics simulation of matter far from equilibrium presents one possible approach to the discovery of non-equilibrium constitutive relations, but it is limited to coarse-grained Hamiltonians that include electronic effects only implicitly. We'll explore the possibility that deep neural networks, when trained over the appropriate atomic states, may provide the Hamiltonian for a molecular dynamics simulation, thus providing a sub-grid representation of variables at spatial and temporal scales that cannot otherwise be explicitly resolved. The advent of GPU-accelerated training of deep neural networks, and specifically recent improvements to the cuDNN library, now makes it feasible to handle the large, high-dimensional datasets that such systems entail. Finally, we'll elucidate a few of the challenges inherent in DNN-coupled dynamics, such as obeying the constraints of momentum and energy conservation.

25-minute Talk Jonathan Belof - Physicist, Lawrence Livermore National Laboratory
Edward W. Lowe Jr. (Will) - Senior Data Scientist, FitNow, Inc
S7468 - Deep Packet Inspection Using GPUs

In high-speed networks, packet-based network traffic monitoring and analysis applications require a large amount of computing power and high I/O throughput. These applications face extreme performance and scalability challenges. GPUs have been widely applied to accelerate general-purpose scientific and engineering computing. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. The Fermilab network research group's prototype GPU-based network traffic monitoring and analysis system consists of two major components: a lossless packet capture engine that supports 10/40GE commodity NICs, using our WireCAP technology; and a complete set of GPU libraries for network traffic analysis. Our GPU libraries now support per-packet-based deep inspection analysis and are expected to support per-flow-based deep inspection analysis very shortly.

25-minute Talk Wenji Wu - Principal Network Research Investigator, Fermilab
S7563 - Deep Patient: Predict the Medical Future of Patients with Deep Learning

Precision medicine initiatives bring tremendous opportunities to speed up scientific discovery and promote quality improvement in medicine. However, they also raise big challenges in dealing with massive data from heterogeneous sources, such as electronic health records (EHRs), -omics, and wearables. Traditional data mining and statistical learning methods tend to favor clean and structured data, and may not be able to effectively utilize the rich information embedded in biomedical data. The latest breakthroughs in deep learning technologies provide a unique opportunity to retrieve information from complex and heterogeneous sources. We'll review advances in deep learning applied to precision medicine and next-generation healthcare, with a special focus on Deep Patient, a general-purpose patient representation from EHRs that facilitates clinical predictive modeling and medical analysis.

50-minute Talk Riccardo Miotto - Research / Data Scientist, Icahn School of Medicine at Mount Sinai, New York
Joel Dudley - Associate Professor, Icahn School of Medicine at Mount Sinai, New York
L7110 - Deep Reinforcement Learning Agents on Atari 2600 Games (Presented by NVIDIA Deep Learning Institute)

Learn the basic principles of reinforcement learning and develop a learning agent (Deep Learning Network -- CNN network trained with Q Learning) capable of playing classic Atari games. In this context, the neural network improves through in-game experience so as to choose the next best possible action by interpreting the screen's raw pixels along with the current score (action-value Q learning). At the beginning of the lab, students will be given an "intermediate" agent (trained for ~20 hours) and asked to continue the improvement/training process on NVIDIA-provided GPUs. At the end of the lab, students will be able to play against their best network and take home code that they can use to train agents in other Atari games. Prerequisites: Introductory knowledge of Lua and/or Python. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.
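
For readers new to action-value Q learning, the core update can be sketched in a few lines. This is a simplified tabular version written for illustration, not the lab's code: the lab describes a CNN that estimates action values from raw pixels, whereas the dictionary-based table below, the function name `q_update`, and the `alpha` and `gamma` defaults are all assumptions made for this example.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new value.

    q is a dict mapping (state, action) pairs to estimated returns;
    pairs that have not been seen yet are treated as 0.0.
    """
    # Value of the best action available from the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    table = {}
    moves = ["left", "right"]
    print(q_update(table, "s0", "left", 1.0, "s1", moves,
                   alpha=0.5, gamma=0.9))
```

In a deep Q-network the table lookup is replaced by a forward pass through the network, and the same target drives a gradient step instead of a direct assignment.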

120 Instructor-Led Lab Jeff Weiss - Director, West Territory SAs, NVIDIA
Eric Harper - Solutions Architect, NVIDIA
Miro Enev - Solution Architect, Deep Learning, NVIDIA
L7137 - Deep Reinforcement Learning for Gameplay and Robotics

In this lab, you will learn the basics of Chainer and how to use ChainerRL by training an agent to play text-based games with OpenAI Gym on a Jupyter notebook. ChainerRL contains a set of Chainer implementations of deep reinforcement learning (DRL) algorithms. Following the success of DeepMind's Deep Q-Network (DQN) algorithm on Atari games, DRL has been applied to many tasks from playing Go to robot control. ChainerRL runs on top of Chainer, one of the popular Python-based deep learning frameworks, which enables users to intuitively implement many kinds of models, with a lot of flexibility and comparable performance with GPUs. ChainerRL already includes state-of-the-art DRL algorithms from DQN to DDPG to A3C, so that users can use them on their reinforcement learning applications. Prerequisites: Basic knowledge of Python, deep learning and reinforcement learning. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Shohei Hido - Chief Research Officer, Preferred Networks
S7621 - Deep Reinforcement Learning for Robotics Using DIANNE

We'll show how a mobile robot arm can learn to locate and retrieve objects, such as soda cans, using deep reinforcement learning and the DIANNE framework. The robot is equipped with a Jetson TX1 embedded GPU to efficiently process sensory input generated by laser scanners, placed both in the environment and on the robot itself. Deep reinforcement learning allows an intelligent agent to solve complex planning problems with high-dimensional inputs in an efficient and generalisable way. While very promising for the field of robotics, integration of and learning in a physical system is not trivial, and additional simulation is often required to speed up the learning process.

25-minute Talk Sam Leroux - Ph.D. Researcher, Ghent University - imec
S7514 - Deep Representation and Reinforcement Learning for Anomaly Detection and Control in Multi-Modal Aerospace Applications

We'll discuss how deep auto-encoders (DAE) and deep reinforcement learning (DRL) can be formulated to address multimodal anomaly detection and additive manufacturing control problems in the aerospace domain. DAE-based representation learning uses a multi-layered neural network architecture to model complex non-linearity in the data. We use a DAE implemented on NVIDIA GPUs for: (1) unsupervised fault disambiguation from big multimodal data, and (2) structural health monitoring (crack detection) from experimental video frames of aerospace material. In the second half of the talk, we show how a DRL framework based on guided policy search (GPS) can be implemented to optimally plan and generalize nozzle trajectory dynamics across a wide range of cold spray additive manufacturing applications.

50-minute Talk Soumalya Sarkar - Senior Research Scientist, United Technologies Research Center
S7381 - DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning

This talk will introduce DeepTraffic, a deep reinforcement learning competition at MIT that has received over 10,000 submissions and is preparing for its second iteration. It's accessible to both beginners and experts. Whether with JavaScript or TensorFlow, the task is to drive faster than anyone else in the world. We will introduce deep reinforcement learning through the case study of motion planning in a dense micro-traffic simulation, and describe the emergent behavior achieved through crowdsourced hyper-parameter tuning of policy networks. Go deep, go fast at

50-minute Talk Lex Fridman - Postdoctoral Researcher, Massachusetts Institute of Technology (MIT)
S7551 - Deep Unconstrained Gaze Estimation with Synthetic Data

Gaze tracking in unconstrained conditions, including inside cars, is challenging, and traditional gaze trackers fail there. We've developed a CNN-based algorithm for unconstrained, head-pose- and subject-independent gaze tracking. It requires only consumer-quality color images of the eyes to determine gaze direction and to locate points along the boundary of the eye, pupil, and iris. We'll describe how we successfully trained the CNN with millions of synthetic photorealistic eye images, which we rendered on NVIDIA GPUs for a wide range of head poses, gaze directions, subjects, and illumination conditions. Among appearance-based gaze estimation techniques, our algorithm has best-in-class accuracy.

25-minute Talk Shalini De Mello - Senior Research Scientist, NVIDIA
S7588 - Deep Watershed Transform for Instance Segmentation

Learn about the design, training, and analysis of a state-of-the-art, deep learning-based, instance-level segmentation pipeline enabled by NVIDIA DGX-1. Instance segmentation is the task of assigning semantic class labels to each pixel of an image (for example, car, person, etc.), as well as a coherent instance identifier such that every pixel belonging to the same object instance shares the same identifier. This has a wide array of applications, including object recognition and tracking, pose estimation, and scene understanding. In the context of autonomous driving, this will allow vehicles to accurately delineate multiple vehicles and pedestrians within an image. We'll present a simple yet powerful end-to-end convolutional neural network to tackle this task with state-of-the-art performance on the challenging Cityscapes Instance-Level Segmentation task. Our model consists of two independently trained individual deep neural networks with innovative training targets, followed by joint fine-tuning. The 30 million parameter network is trained on the new NVIDIA DGX-1 deep learning accelerator in approximately 30 hours. This is a 50% speedup compared to the NVIDIA Maxwell TITAN X, and is immeasurably faster than any CPU implementation.

25-minute Talk Min Bai - PhD Student, University of Toronto
S7763 - Deliver a Transformative 3D Graphics User Experience with VMware Horizon, Blast Extreme Adaptive Transport, and NVIDIA GRID

Discover the benefits of virtualizing any desktop or application using VMware Horizon and NVIDIA GRID. Learn how NVIDIA GRID and VMware Blast Extreme Adaptive Transport now deliver a transformational user experience for LAN and WAN users. We'll cover the graphics use cases enabled by Blast Extreme, NVIDIA GRID Performance Engineering benchmarking and results for Blast Extreme Adaptive Transport, demos of high-performance graphics environments, and deployment and TCO considerations.

50-minute Talk Luke Wignall - GRID Performance Engineer, NVIDIA
Kiran Rao - Director, Product Management, VMware
S7203 - Delivering Immersive Experiences Through GPU Virtualization and Streaming

As the traditional workstation gives way to the immersive experience workspace, hear about novel NVIDIA and ESI technologies that combine streaming and virtualization for GPUs to provide scalable immersive virtual and augmented reality. We'll discuss the challenges in advancing to the immersive workspace for mobile, desk-side, or team-size immersive experiences through on-premises and cloud-based virtual engineering applications.

50-minute Talk Jan Wurster - Team Leader Software Development, ESI Group
S7826 - Democratize Autonomous Driving

Most of you have probably already heard about Project Apollo, which we announced a couple of weeks ago at the Shanghai Motor Show. Baidu was one of the first major tech companies to embrace artificial intelligence and machine learning, and its autonomous vehicle push began with road testing in Beijing in 2015. In this presentation, you will learn more about Project Apollo. We will also share the application scenarios of autonomous driving, key practices in applying GPUs, and the vision of Baidu Intelligent Vehicle.

25-minute Talk Gu Weihao - General Manager of Baidu Intelligent Vehicle Business Unit, Baidu
S7808 - Deploying Embedded GPUs into Military Applications (Presented by Abaco)

We'll explore how GPUs are being used in military applications (ground vehicles and avionics) and how we can ruggedize GPU technology for use in the harshest environments. Learn how high-bandwidth applications can stream data into the Jetson TX2 for real-time processing and situational awareness. We'll show how data-heavy networks coupled with embedded GPUs can be deployed into mobile platforms and deliver increasing capabilities and greater autonomy. Military open standards now cater to future technology insertion, and GPU technology can be deployed into existing and future platforms to deliver deep learning at the edge of the battlefield.

50-minute Talk Ross Newman - Senior Field Applications Engineer, Abaco Systems
S7458 - Deploying Unique DL Networks as Micro-Services with TensorRT, User Extensible Layers, and GPU Rest Engine

Once you have trained your neural network to do some unique and interesting task, you might wonder how to make it available to colleagues, collaborators, or perhaps the world. One of the best ways to do that is to create a REST-based microservice. Then anyone with the URL can make a request and get an answer from your neural network. We'll show how three technologies come together to make that possible: (1) TensorRT provides low-latency, high-throughput inference; (2) custom layer support in TensorRT allows you to express your unique deep learning secret sauce within TensorRT; (3) GPU Rest Engine gives you a fast and easy way to create a GPU-powered microservice. We'll show the steps necessary for you to start creating your own deep learning-powered microservices.

25-minute Talk Chris Gottbrath - Accelerated Computing Product Manager, NVIDIA
S7822 - Dermatologist-Level Classification of Skin Cancer with Deep Neural Networks

Skin cancer, the most common human malignancy, is primarily diagnosed visually, beginning with an initial clinical screening and followed potentially by dermoscopic analysis, a biopsy and histopathological examination. Here we demonstrate classification of skin lesions using a single CNN, trained end-to-end from images directly. We test its performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: malignant carcinomas versus benign seborrheic keratoses; and malignant melanomas versus benign nevi. The CNN achieves performance on par with all tested experts across both tasks, demonstrating an AI capable of classifying skin cancer with dermatologist-level accuracy.

25-minute Talk Andre Esteva - PhD Candidate, Stanford University - Sebastian Thrun's Lab
S7687 - Designing Autonomous Vehicle Applications with Real-Time Multisensor Frameworks

As embedded software in intelligent vehicles becomes more complex, researchers and engineers need more efficient tools and integration frameworks that simultaneously align ease-of-use, dynamism, execution performance, and portability. We'll introduce Intempora's RTMaps (Real-Time Multisensor applications) framework, which is a component-based design and execution middleware for software development, integration, and testing. This framework reduces software development cycle times and provides easy access to the DRIVE PX 2 capabilities. RTMaps supports most automotive sensors on the market for real-time execution, and also provides recording and synchronized playback capabilities for offline development, testing, and validation. RTMaps is now available on DRIVE PX 2. It offers a drag-and-drop approach for GPU-based computer-vision and AI systems, including an integration of the NVIDIA DriveWorks software modules as independent building-blocks.

25-minute Talk Nicolas Du lac - CEO, Intempora
S7614 - Design with Virtual Reality in Architecture, Engineering & Construction

Learn how Gensler is using the latest technology in virtual reality across all aspects of the design process for the AEC industry. We'll cover how VR has added value to the process when using different kinds of VR solutions. Plus we'll talk about some of the challenges Gensler has faced with VR in terms of hardware, software, and workflows. Along with all of this, NVIDIA's latest VR visualization tools are helping with the overall process and realism of our designs.

25-minute Talk Scott DeWoody - Firmwide Creative Media Manager, Gensler
S7293 - Detecting Topological Changes in Dynamic Delaunay Triangulations Using CUDA

Learn how to detect topological changes that occur in dynamic 2D Delaunay triangulations using CUDA. We'll present a novel, unified approach that can be applied in all cases (pedestrian tracking, flocking, moving bubbles, etc.) where objects are triangulated starting from a density map. Topological changes are detected by comparing two subsequent triangulations, where they show up as "flipped edges." Our implementation collects unprecedented statistics on irreversible topological changes in the triangulation of the droplets of a Lattice Boltzmann emulsion, and we'll show the new physics results this enables. Such changes are associated with the so-called plastic events that are responsible for the complex behavior of emulsions, which possess both liquid and solid features at the same time. In our implementation, we used a suitable mix of in-house developed CUDA kernels and primitives from existing CUDA libraries.

25-minute Talk Matteo Lulli, University of Rome Tor Vergata
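
The comparison of two subsequent triangulations described in this abstract can be illustrated with a minimal CPU sketch in Python. This is not the speakers' CUDA implementation. It assumes that vertex indices identify the same objects in both frames and that each triangulation is given as a list of vertex-index triples; the function names `edges_of` and `flipped_edges` are hypothetical.

```python
def edges_of(triangles):
    """Collect the undirected edges of a triangulation as sorted index pairs."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return edges


def flipped_edges(tri_before, tri_after):
    """Return (edges that disappeared, edges that appeared) between two frames.

    Assumption: vertex indices refer to the same objects in both frames,
    so an edge flip removes one diagonal and creates the other.
    """
    before = edges_of(tri_before)
    after = edges_of(tri_after)
    return sorted(before - after), sorted(after - before)
```

For a quadrilateral with vertices 0 to 3, switching the diagonal from (0, 2) to (1, 3) is reported as one removed edge and one created edge.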
S7519 - Developer Tools for Automotive, Drones and Intelligent Cameras Applications

Embedded development systems are more powerful than ever. With this trend comes the ever-growing complexity of delivering real-time applications that can capitalize on all the potential computational horsepower of the system. The application developer needs to be able to design new software IP, easily port the application to the embedded system, and then optimize and maximize CPU and GPU utilization, data acquisition, and transfers to provide a reliable real-time visual computing experience that can fulfill even the most demanding computational requirements. In this tutorial/talk, the audience will learn about recommended development flows for the latest embedded systems. We will cover the developer tools available for each of the Software Development Kits provided for Automotive, Embedded, and Mobile platforms. For each of these platforms, we will dissect and present important lessons from the development of showcase applications demonstrating advanced Autonomous Driving and Intelligent Video Analytics use cases. The audience will learn what tools are available for each platform, the purpose of each tool, and the value it offers.

50-minute Talk Sebastien Domine - VP Software Engineering, Developer Tools, NVIDIA
S7824 - Developer Tools update in CUDA 9

This session will provide an overview of developer tools and what is changing in Nsight Eclipse for CUDA 9.0.

25-minute Talk Sanjiv Satoor - Senior Engineering Manager, Developer Tools, NVIDIA
S7388 - Developing an Improved Generalized Eigensolver with Limited CPU Offloading

We'll explore strategies to reduce CPU dependencies within existing hybrid CPU/GPU LAPACK routines, such as those implemented in the open-source MAGMA library. We'll do this in the context of developing an improved generalized eigensolver, written in CUDA Fortran for the open-source Quantum ESPRESSO library. The solver aims to replace the subblock computations that existing hybrid algorithms offload to the CPU with GPU-resident subblock computations, limiting dependencies on available CPU resources. We'll cover performance considerations and strategies used in developing the solver, including the use of profiling tools available within the CUDA toolkit. Additionally, we'll provide an example of developing software using CUDA Fortran.

25-minute Talk Joshua Romero - Graduate Student, Stanford University
S7573 - Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge

We'll bring CUDA into a compute-intensive application by learning how to use CUDA-enabled development tools for profiling, optimization, editing, building, and debugging. Using the Allinea Forge development toolkit, we'll cover how to profile an existing application and identify the most compute-intensive code regions. We'll then replace these regions with CUDA implementations and review the results, before turning to the task of debugging the GPU-enabled code to fix an error introduced during the exercise. We'll learn debugging techniques for CUDA and debug using Allinea Forge to produce correct, working, high-performance GPU-accelerated code. As we'll be using GPUs hosted in the cloud, all attendees need to bring is a laptop with a modern browser.

50-minute Talk Ryan Hulguin - Applications Engineer, ARM
S7617 - Developing Your Own Wake Word Engine Just Like 'Alexa' and 'OK Google'

A wake word is a word or phrase, like "Alexa" or "OK Google," that gives a microphone-enabled device an always-listening capability. Developers who wanted their own wake word had no such solution until KITT.AI released Snowboy, a developer-facing, always-on, offline, real-time wake word engine. It's trained on clusters of GPUs with hundreds of people's voices for robustness, yet it runs on small embedded devices like a $5 Raspberry Pi Zero. We'll demo how to use Snowboy for developing home automation or hands-free projects, and we'll show how we used GPUs to build the Snowboy product.

50-minute Tutorial Xuchen Yao - CEO, KITT.AI
Guoguo Chen - CTO, KITT.AI
Yuan Cao - Software Engineer, KITT.AI
S7281 - Device Lending: Dynamic Sharing of GPUs in a PCIe Cluster

Learn how GPUs can be time-shared between multiple hosts connected in a PCIe cluster using a method called device lending. Unlike approaches for sharing GPUs that typically require specific programming models, device lending makes a GPU appear to the operating system as if it is locally installed. This allows the GPU to be controlled and used by a remote host without any modifications to existing software. We'll present how device lending is implemented using standard PCIe and non-transparent bridging. As a proof-of-concept, we accelerate EIR, a computer-aided medical diagnosis system using machine learning and computer vision to do polyp detection, from being an offline tool to giving real-time feedback by dynamically borrowing remote GPU resources.

25-minute Talk Jonas Markussen - PhD student, Simula Research Laboratory
S7643 - Diet Networks: Thin Parameters for Fat Genomics

Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting when training deep learning models. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. We propose a novel neural network parameterization, which we call Diet Networks, that considerably reduces the number of free parameters in the model. The Diet Networks parameterization is based on the idea that we can first learn or provide an embedding for each input feature and then learn how to map a feature's representation to the parameters linking the value of the feature to each of the hidden units of the classifier network. We experiment on a population stratification task of interest to medical studies and show that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier. This work was accepted at ICLR 2017.

25-minute Talk Adriana Romero Soriano - Postdoc, University of Montreal, Montreal Institute for Learning Algorithms
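
To make the parameter saving described in this abstract concrete, here is a rough Python sketch rather than the authors' model. It assumes that the per-feature embeddings are provided (not learned) and that the mapping from an embedding to a feature's first-layer weights is a single linear layer. The function names and the sizes used in the usage note are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def predicted_first_layer(embeddings, aux_weights):
    """Predict one row of first-layer weights per input feature.

    embeddings:  n_features rows of length embed_dim (assumed to be provided)
    aux_weights: n_hidden rows of length embed_dim (the only free parameters here)
    """
    return [matvec(aux_weights, e) for e in embeddings]


def param_counts(n_features, n_hidden, embed_dim):
    """Free parameters of a directly learned first layer vs. the linear auxiliary map."""
    direct = n_features * n_hidden
    predicted = embed_dim * n_hidden
    return direct, predicted
```

With, say, 500,000 input features, 100 hidden units, and 50-dimensional embeddings, the directly learned layer has 50,000,000 free weights while the linear auxiliary map in this sketch has 5,000.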
S7620 - Digital Twin, AI, and Industrial Internet of Things

We'll cover the emerging area of industrial IoT (IIoT) and the application of deep learning and AI to this space. Digital Twin was named one of the top five tech trends for 2017 by Gartner, and it is the foundational technology for GE's industrial internet platform, Predix. A Digital Twin is a live digital representation of a physical system that is predictive in nature and uses continuous learning to get better as new data comes in from the physical system. Digital Twins coupled with AI technologies such as deep learning and high performance computing are used to precisely predict future behavior under new scenarios and optimize the system: can we get an extra 1% in efficiency and save millions of dollars' worth of fuel, can we produce 1% more output from a manufacturing plant, can we optimize a hospital, can we detect smaller lesions and do so earlier? Deep learning and GPUs play a key role in harnessing the value from massive streams of IIoT data, from anomaly detection to video analytics to optimization.

50-minute Talk Babu Narayanan - Senior Principal Scientist, General Electric
S7856 - Disrupting Cancer Diagnostics - Cloud-based Deep Learning AI for Gigantic Pathology Images

We'll introduce a novel approach to digital pathology analytics, which brings together a powerful image server and deep learning-based image analysis on a cloud platform. Recent advances in AI, and deep learning in particular, show great promise in several fields of medicine, including pathology. Human expert judgement augmented by deep learning algorithms has the potential to speed up the diagnostic process and to make diagnostic assessments more reproducible. One of the major advantages of the novel AI-based algorithms is the ability to train classifiers for morphologies that exhibit a high level of complexity. We will present examples of context-intelligent image analysis applications, including a fully automated epithelial cell proliferation assay and tumor grading. We will also present other examples of complex image analysis algorithms, all of which run on demand on whole-slide images in the cloud computing environment. Our WebMicroscope® Cloud is offered as software as a service (SaaS), which makes it extremely easy to set up from a user perspective: no local software or hardware installation is needed, and the solution can immediately be scaled to projects of any size.

25-minute Talk Kaisa Helminen - CEO, Fimmic
S7803 - Distributed TensorFlow

TensorFlow gives you the flexibility to scale up to hundreds of GPUs, train models with a huge number of parameters, and customize every last detail of the training process. We'll provide a bottom-up introduction to distributed TensorFlow, showing all the tools available for harnessing this power.

50-minute Talk Wolff Dobson - Developer Programs Engineer, Google
L7128 - DIY Deep Learning: a Hands-On Lab with Caffe2

Caffe2 is a new lightweight, modular, and scalable deep learning framework, evolving from the previous Caffe library. This is a hands-on lab on Caffe2. You'll learn how to design, train, and deploy state-of-the-art deep learning models, use GPUs to achieve large-scale distributed training, and learn ways to incorporate such deep learning into applications. If you are a Caffe user, you'll also learn how to seamlessly migrate your current Caffe models to Caffe2 and stay productive.

In more detail, the lab will cover:
• Introductory material on deep learning, its motivations and background
• Migration from Caffe to Caffe2
• Training convolutional models for image classification
• Recurrent Neural Network examples and demos for natural language processing
• Efficient deep learning & distributed training with multiple GPU machines

120-minute Instructor-Led Lab Yangqing Jia - Research Scientist, Facebook
Pieter Noordhuis - Software Engineer, Facebook
Alexander Sidorov - Software Engineer, Facebook
S7493 - DNA for Automated Driving

We'll showcase an architecture that enables discrete driver assistance systems to all work in tandem. This framework is enabling automakers to develop complex systems more quickly and efficiently, reducing time to market for ADAS functionality. As part of our discussion we'll share a reference implementation that demonstrates a valet parking function, which was built by using the architecture and accessing maps from the cloud.


25-minute Talk Jeremy Dahan - Innovation Project Manager, Elektrobit
S7136 - DNA Sequences Alignment in Multi-GPUs: Energy Payoff on Speculative Executions

Find out the energy cost of launching speculative executions when handling data dependencies to enhance parallelism on multi-GPU platforms. As a case study, we present CUDAlign 4.0, a multi-GPU implementation for optimal alignment of huge DNA sequences using the exact Smith-Waterman algorithm. Our speculative approach easily attains a 10-20x speed-up over the baseline pipelined version, where GPUs sit idle waiting for dependencies to be resolved. But when working on mispredictions, GPUs waste energy. In the green computing era, where GFLOPS/W is the trending metric, we need to know which is worse: wasting time or wasting power. Our experimental study analyzes speculation hit ratios to evaluate the extra performance and measures the energy spent on mispredictions, to conclude to what extent the speculative approach jeopardizes the GFLOPS/W ratio.

25-minute Talk Manuel Ujaldon - Full Professor and NVIDIA CUDA Fellow, University of Malaga (Spain), Computer Architecture Department
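
For readers unfamiliar with the Smith-Waterman recurrence mentioned in this abstract, a minimal single-threaded Python sketch follows. It is not CUDAlign. The scoring values (match +2, mismatch -1, gap -1) and the linear gap penalty are assumptions chosen for illustration, and the function name is hypothetical. Each cell depends on its upper, left, and upper-left neighbors, which is the kind of data dependency a multi-GPU split has to wait on or speculate across.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty assumed)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Keeping only two rows in memory is enough to obtain the score; recovering the alignment itself would need the full matrix or a traceback scheme, which this sketch omits.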
S7865 - Doctors & Developers: Combining Expertise With VR and AI To Improve Medical Training and Simulation

Learn how doctors aided in the design process to create authentic VR trauma room scenarios, how expert content and simulation developers crafted a VR experience that would have impact in a world where there's no room for error, and why Oculus supports the program.
Experiential learning is among the best ways to practice for pediatric emergencies. However, hospitals are spending millions on expensive and inefficient mannequin-based training that does not consistently offer an authentic experience for med students or offer convenient repeatability. Join us for a case study on a groundbreaking pilot program that brought together Children's Hospital Los Angeles with two unique VR and AI dev teams to deliver VR training simulations for the highest-stakes emergencies hospitals see: pediatric trauma.

25-minute Talk Shauna Heller - CEO, Clay Park VR
S7624 - Driver Monitoring: A Deep Learning Approach for Gaze Estimation

A driver monitoring camera will be a valuable component for autonomous driving at levels 3 and 4. The camera can determine where the driver's attention is directed, which requires estimating the driver's gaze. In addition to signaling "eyes on road," gaze estimation can significantly improve the user experience for HMI. We'll present a deep learning approach that trains a neural network in an end-to-end manner. Small patches of the eye serve as input to a convolutional neural network. The tradeoff between a deep and a shallow net is an important aspect for a commercial product. Extensive use of GPUs can help find the best tradeoff between accuracy and the number of FLOPS needed, as well as the best-suited DNN architecture.

25-minute Talk Cornelius Wefelscheid - Machine Learning Expert - Advanced Development, Leopold Kostal GmbH & Co. KG
S7427 - DriveWorks: A Look Inside NVIDIA's Autonomous Driving SDK

We'll introduce NVIDIA DriveWorks, a software development kit for autonomous driving and processing sensor data through perception, mapping, localization, and path planning steps. DriveWorks provides a rich set of functionalities: sensor abstraction layer, algorithm modules, DNNs, applications, UI and tools for sensor setup and management. The SDK is modular, optimized for GPUs, and runs on top of OS, CUDA/cuDNN, TensorRT, and VPI. This is the foundation for developers working on autonomous vehicle applications, and the session will highlight how to leverage it.

50-minute Tutorial Dennis Lui - Solutions Architect, NVIDIA
Miguel Sainz - Senior Director, NVIDIA
Gaurav Agarwal - Senior Product Manager, NVIDIA
S7781 - Driving Shareholder Value in the Enterprise with GPU Hardware

AI is moving from consumer applications to the enterprise and will soon affect all parts of operations from the customer to the product to the enterprise. Stephen Pratt, the CEO of and former head of Watson for IBM GBS, presents a shareholder value perspective on why enterprise artificial intelligence will be the single largest competitive differentiator in business over the next five years, and what you can do to end up on top: (1) a framework for why AI will be key to creating shareholder value, (2) how to determine where to start and how to progress (with case studies), (3) how to manage the spread of AI in your enterprise (with lessons from the past), (4) how to ensure proper adoption of AI solutions, and (5) early results of applying the DGX-1 to business process optimization challenges.

25-minute Talk Stephen Pratt - CEO,
S7449 - Driving the Assembly of the Zebrafish Connectome through Deep Learning

Tracing pathways through large volumes of data is an incredibly tedious, time-consuming process that significantly encumbers progress in neuroscience and the tracing of neurons through an organism. We'll explore the potential for applying deep learning to automate the segmentation of high-resolution scanning electron microscope image data. We've started with neural pathway tracing through 5.1 GB of whole-brain serial-section slices from larval zebrafish collected by the Center for Brain Science at Harvard. Done manually, this kind of image segmentation requires years of careful work to properly trace the neural pathways in an organism as small as a zebrafish larva, which is approximately 5 mm in total body length. Automating this process could vastly improve productivity, which would lead to faster data analysis and more breakthroughs in understanding the complexity of the brain.

50-minute Talk Nick Nystrom - Senior Director of Research, Pittsburgh Supercomputing Center
Ishtar Nyawira - Co-President, Timmy Global Health: Pitt Chapter, University of Pittsburgh
S7124 - Drone Net: Using Tegra for Multi-Spectral Detection and Tracking in Shared Air Space

The challenge and opportunity presented by use of UAS "drones" in the national airspace has historic significance. The FAA estimates that by 2020 the drone market will be $98 billion with 7 million drones added annually. How drones ranging from professional service to hobby will safely share airspace is unclear. Preliminary research at Embry Riddle to develop a drone detector, which can be placed on rooftops and networked with other detectors and information services, has shown that multi-spectral electro-optical/infrared detection is quite effective. Our team is using NVIDIA Jetson systems in an EO/IR detector system. The NVIDIA Kepler architecture-based NVIDIA Tegra co-processor provides real-time object detection for aircraft and drones using salient object detection algorithms accelerated by GPUs. We'll present the power efficiency and real-time processing advantages GP-GPU provides compared to FPGA and multi-core, which we've also tested for this application.

25-minute Talk Sam Siewert - Assistant Professor, Embry-Riddle Aeronautical University
S7596 - DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Learn a new technique to prevent deep learning optimizers from getting stuck in a local minimum, and to produce better optimization results. We'll introduce DSD, a dense-sparse-dense training method that regularizes neural networks by pruning and then restoring connections. Our method learns which connections are important during the initial dense training. Then it regularizes the network by pruning the unimportant connections and retraining to a sparser and more robust solution with the same or better accuracy. Finally, the pruned connections are restored and the entire network is retrained. This increases the dimensionality of the parameters, and thus model capacity, relative to the sparser model. We'll highlight our experiments using GoogLeNet, VGGNet, and ResNet on ImageNet; NeuralTalk on Flickr-8K; and DeepSpeech-1&2 on the WSJ dataset. These show that the accuracy of CNNs, RNNs, and LSTMs can significantly benefit from DSD training. At training time, DSD incurs only one extra hyper-parameter: the sparsity ratio in the S step. At testing time, DSD doesn't change the network architecture or incur any inference overhead. The consistent and significant performance gain of DSD in our numerical experiments highlights the inadequacy of current deep learning training methods, while DSD effectively achieves superior optimization performance for finding better solutions.

25-minute Talk Song Han - Ph.D. candidate, Stanford University
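
The three phases described in this abstract can be sketched on a toy problem in Python. This is an illustration under stated assumptions, not the speaker's implementation: it treats "unimportant" as "smallest absolute value," restarts restored connections at zero, and uses plain gradient descent on a hypothetical quadratic loss (`toy_grad`). All function names are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """0/1 mask that drops the `sparsity` fraction of smallest-|w| weights (assumed criterion)."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def train(weights, grad_fn, mask=None, lr=0.1, steps=200):
    """Plain gradient descent; weights that are masked out are held at zero."""
    w = list(weights)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        if mask is not None:
            w = [wi * mi for wi, mi in zip(w, mask)]
    return w


def dsd(weights, grad_fn, sparsity=0.5):
    """Dense phase, sparse phase under a pruning mask, then a final dense phase."""
    dense1 = train(weights, grad_fn)               # D: initial dense training
    mask = magnitude_mask(dense1, sparsity)        # choose which connections to prune
    start = [w * m for w, m in zip(dense1, mask)]
    sparse = train(start, grad_fn, mask=mask)      # S: retrain with pruned weights fixed at 0
    dense2 = train(sparse, grad_fn)                # D: restored weights restart from 0
    return dense1, mask, sparse, dense2


def toy_grad(w, target=(1.0, 0.05, -2.0, 0.1)):
    """Gradient of the toy loss sum((w_i - target_i)^2)."""
    return [2 * (wi - ti) for wi, ti in zip(w, target)]
```

Calling `dsd([0.0] * 4, toy_grad, sparsity=0.5)` prunes the two smallest-magnitude weights during the sparse phase and lets them be relearned in the final dense phase. The sparsity ratio is the only extra hyper-parameter, matching the abstract.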
S7176 - Dynamic Facial Analysis: From Bayesian Filtering to Recurrent Neural Networks

We propose to use recurrent neural networks for analyzing facial properties from videos. Facial analysis from consecutive video frames, including head pose estimation and facial landmark localization, is key for many applications such as in-car driver monitoring, facial animation capture, and human-computer interaction. Compared with the traditional Bayesian filtering methods for facial tracking, we show RNNs are a more generic, end-to-end approach for joint estimation and tracking. With the proposed RNN method, we achieved state-of-the-art performance for head pose estimation and facial landmark localization on benchmark datasets.

25-minute Talk Jinwei Gu - Senior Research Scientist, NVIDIA
S7802 - Edge-AI for Intelligent User Experience

We'll showcase how Mercedes-Benz is enabling edge AI in the car by utilizing powerful embedded hardware for sensor processing and fusion in the cabin interior. The focus of AI work today has been dominated by the cloud environment. The availability of computation power, combined with technologies for scaling with massive datasets, makes the cloud a perfect ecosystem for the application of AI technologies. However, there are a myriad of AI applications today that can’t fully live on the cloud, such as an AI application in a moving vehicle where connectivity to the cloud is not guaranteed. In such cases, AI in the edge computing space faces a number of challenges not always present in today's cloud environment. Chief among them is a sense of autonomy: when the edge AI encounters problems that require prompt decision making, the problems have to be resolved by its own intelligence. We’ll talk about how Mercedes-Benz is enabling edge AI to address this issue. 

25-minute Talk Kal Mos - VP Connected Car, User Interaction & Telematics, Mercedes-Benz Research and Development North America
S7543 - Effectively Scaling Deep Learning Frameworks to 40 GPUs and Beyond

A variety of deep learning frameworks now make it simple to train deep neural networks of many types. However, scaling deep learning frameworks to large models with data parallel training on many GPUs remains a challenge, as the default utilities for inter-device and inter-node communication provided by these frameworks are often not optimal. Using examples from several frameworks, we demonstrate that linear strong scaling to many nodes and many devices can be achieved by augmenting deep learning frameworks with CUDA-aware MPI allreduce and allgather operations, which allow them to be used in an HPC setting where multi-GPU nodes are augmented with high-speed InfiniBand interconnects. We'll show that these operations allow us to quickly train very large speech recognition models.

25-minute Talk Andrew Gibiansky - Machine Learning Engineer, Baidu SVAIL
S7240 - Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs

We'll present a method for highly efficient lattice Monte Carlo simulations with correlation-free updates. Achieving freedom from erroneous correlations requires random selection of lattice sites for updates, which must be restricted by suitable domain decomposition to create parallelism. While approaches based on caching limit the number of allowed states, the multisurface-type approach presented here allows arbitrarily complex states. The effectiveness of the method is illustrated in the fact that it allowed us to solve a long-standing dispute around surface growth under random kinetic deposition in the KPZ-universality class. The method has also been applied to Potts models and is suitable for spin-glass simulations, such as those required to test quantum annealers, like D-Wave.

25-minute Talk Jeffrey Kelling - Scientist, Helmholtz-Zentrum Dresden-Rossendorf
S7130 - Efficient Deep Model Selection

Convolutional neural networks have achieved impressive success in many tasks in computer vision. However, they come at a high memory and computational cost, thus making it difficult for deep learning to be commercially viable. In addition, selecting the architecture is still an engineering process. We'll introduce DecomposeMe, an efficient architecture based on filter-compositions. This architecture can be trained quickly and is capable of achieving real-time operation on embedded platforms (250+ fps on an NVIDIA Jetson TX1). We'll also introduce our approach to automatically determining the number of neurons of the architecture during the training process. Finally, we'll introduce a novel approach to quantizing the network parameters.

25-minute Talk Jose Alvarez - Researcher, Commonwealth Scientific and Industrial Research Organisation (CSIRO)
S7125 - Efficient Imaging in Radio Astronomy Using GPUs

Realizing the next generation of radio telescopes such as the Square Kilometre Array requires both more efficient hardware and more efficient algorithms than today's technology provides. We'll present our work on the recently introduced Image-Domain Gridding (IDG) algorithm that tries to avoid the performance bottlenecks of traditional AW-projection gridding. We'll demonstrate how we implemented this algorithm on various architectures. By applying a modified roofline analysis, we show that our parallelization approaches and optimizations lead to nearly optimal performance on all architectures. The analysis also indicates that, by leveraging dedicated hardware to evaluate trigonometric functions, NVIDIA GPUs are much faster and more energy-efficient than regular CPUs. This makes IDG on GPUs a candidate for meeting the computational and energy-efficiency constraints for future telescopes.

25-minute Talk Bram Veenboer - PhD Researcher, Astron
S7544 - Efficient Inference for WaveNet Audio Synthesis Models

WaveNet is a generative neural network architecture for audio in the time domain. Due to the high sampling frequency of audio signals and the sequential dependencies between timesteps, inference in a WaveNet model is incredibly expensive, and can take many minutes to generate a single second of audio with an unoptimized implementation. We implement custom WaveNet inference kernels and demonstrate that an efficient implementation on a CPU or a GPU can provide faster than realtime audio generation, even though neither platform is perfectly suited to such a task due to the effective lack of parallelism and high compute requirements. To our knowledge, this is the first demonstration that neural audio generation can be done efficiently enough to deploy in a production text-to-speech system.

50-minute Talk Andrew Gibiansky - Machine Learning Engineer, Baidu SVAIL
S7370 - Efficient Maximum Flow Algorithm and Applications

Maximizing data flow is one of the most important graph problems and has numerous applications across various computational domains: transportation networks, power routing, image segmentation, social network clustering, and recommendation systems. There are many efficient algorithms that have been developed for this problem, most of them trying to minimize computational complexity. However, not all these algorithms map well to massively parallel architectures like GPUs. We'll present a novel GPU-friendly approach based on the MPM algorithm that achieves a 5x to 20x speedup over the state-of-the-art multithreaded CPU implementation from the Galois library on general graphs with various diameters. We'll also discuss some real-world applications of the maximum flow problem in computer vision for image segmentation and in data analytics to find communities in social networks.

25-minute Talk Nikolay Sakharnykh - Senior Developer Technology Engineer, NVIDIA
Hugo Braun - Intern, Ecole Polytechnique
S7153 - Efficient Observations Forecast for the World's Biggest Eye Using DGX-1

Have you heard about the largest ground-based telescope ever built? Are you interested in the newest NVIDIA DGX-1 hardware accelerator? Come and learn how the DGX-1 architecture helps the computational astronomy community take a dramatic leap forward in designing major, multimillion-dollar optical instruments for the European Extremely Large Telescope. Starting from the mathematical model up to the high-performance implementation on distributed-memory systems with hardware accelerators, we'll explain how the resulting matrix computations associated with an efficient task-based programming model help design the next generation of telescope instruments.

50-minute Talk Hatem Ltaief - Senior Research Scientist, KAUST
Damien Gratadour - Associate Professor, Universite Paris Diderot & Observatoire de Paris
L7105 - EglStreams: Interoperability for Camera, CUDA and OpenGL

Key takeaways from the lab:
1) Participants will get an overview of the eglstreams implementation.
2) We will talk about a wrapper over eglstreams that is easy to plug and play.
3) We will describe how to create an eglstream camera producer and how to connect it to an eglstream CUDA consumer. The consumer will perform CUDA processing on frames received from the camera.
4) We will describe the means of connecting an eglstream camera producer to an eglstream OpenGL consumer.
5) We will describe a means to have multiple eglstreams at the camera producer and different ways to connect these to CUDA and OpenGL consumers.
6) We will also talk about cross-process eglstreams.

Platform requirements: TX1 with E3326 camera


120-minute Instructor-Led Lab Yogesh Kini - Manager, System Software, NVIDIA
Senthil Ramalingam - Software Engineer, NVIDIA
Praveen K - System Software Engineer, NVIDIA
Venugopala Madumbu - Software Architect, NVIDIA
S7585 - Elevating the Enterprise: Bringing One of the World’s Largest Banks to New Frontiers With Deep Learning

Evidence for deep learning’s potential to transform business increases each day. While deep learning offers today’s business leaders tremendous possibilities, ensuring the technology’s success requires the right data, knowledge and resources. In this talk, Stephen Piron will share how his Toronto-based startup helped one of the world’s largest banks successfully incorporate deep learning into its complex organizational network. From GPUs to customized machine learning training, Piron will uncover the toolkit making deep learning transformations possible for the world’s largest organizations.  

25-minute Talk Stephen Piron - founder,
S7515 - Eliminating the Regular Expression with Neural Networks

Regular expressions are as old as computing itself. Our deep learning-based approaches aim to retire this tool from the modern data scientist's tool bag. Regular expressions are often introduced to computer scientists as part of their early college education, often in their first discrete structures course. In this context, they are an incredible tool used to describe languages, grammars, and syntax. In practice though, developers all over the world use them to detect data types or parse certain structures. Even for common use cases such as email or phone validation, regular expressions that capture the full breadth of cases can become untenably large. We show how neural networks can learn approximations of regular expressions so that modern data scientists and developers never have to write one again.

25-minute Talk Tim Delisle - CEO, Datalogue
S7190 - Embedded Bayesian Perception and V2X Communications for Autonomous Driving

We'll present technologies developed by the Inria Chroma team that robustly perceive and interpret dynamic environments using Bayesian systems (such as BOF, HSBOF, and CMCDOT) relying on embedded sensor input and V2X communications (vehicle to vehicle and vehicle to infrastructure). These technologies were initially developed in collaboration with industrial partners such as Toyota, Renault, and Probayes SA. We'll demonstrate how heterogeneous sensors can be used efficiently, merged, and filtered in real time into probabilistic grids, and discuss how to compute collision risks in an optimized way on embedded GPU platforms like the NVIDIA Jetson.

25-minute Talk Christian Laugier - First Class Research Director, Inria Grenoble
S7505 - Enable GPU-Accelerated Simulation Practices on the Cloud with Rescale

We'll review the benefits of leveraging NVIDIA GPU technology through Rescale, a cloud-based simulation platform. Through concrete engineering use cases and benchmark results, we'll illustrate performance gains with GPUs across a large selection of simulation software.

25-minute Talk Fanny Treheux - Director of Solutions, Rescale
S7846 - Enabling Intelligent Enterprises with SAP Machine Learning (Presented by SAP)

We'll talk about how SAP is realizing its vision to make enterprise applications intelligent. We'll provide a glimpse of the breadth of machine learning use cases SAP addresses through its Machine Learning Portfolio. We'll then take a deep dive into one of the applications with a detailed business process view. Finally, we'll provide a detailed view of the underlying technology stack and how NVIDIA GPUs are enabling SAP to build machine learning solutions at scale.

25-minute Talk Markus Noga - Vice President, Machine Learning, SAP
S7708 - Enabling Scientific Discovery with Large-Scale Interactive Visualization and Tiled Displays

We'll focus on leveraging large-scale visualization and large tiled displays to enable scientific discovery. We'll present a case study where domain scientists evaluate the complicated, hierarchical microstructure of enamel in primate teeth to gain insight into the principles governing the evolution of mineralized biological tissues. We integrate X-ray micro-tomography with large-scale visualization and analysis techniques to explore the internal structure of mineralized biological tissues. In an interactive visualization session, we'll bring domain scientists and visualization experts together to collaborate. We'll explore a high-resolution visualization streaming from a GPU-based visualization cluster on a large tiled display, along with a distributed global illumination algorithm, which helps scientists improve depth perception in rendered images. Analyzing the data interactively, domain scientists are able to identify structures previously unseen in the data.

25-minute Talk Silvio Rizzi - Assistant Computer Scientist, Argonne National Laboratory
S7686 - Encrypted Deep Learning: A Guide to Privacy-Preserving Speech Processing

In today's cloud, to make your data searchable, you give up its contents to your cloud provider, even if they then encrypt it. While you gain the speed and power of the cloud, you do so by sacrificing the privacy of your data, a common barrier to cloud adoption. Hence, to encourage the migration of sensitive data from behind the firewall to the cloud, we need to process that data without ever decrypting it. We'll demonstrate the state of the art of processing encrypted data using a GPU-accelerated cloud. We'll also present a roadmap for near-future plans for cryptographic schemes for secure transcription. Inspired by fully homomorphically encrypted convolution nets for secure image processing, so-called CryptoNets, we'll demonstrate a CNN-based acoustic model and discuss in broader terms how the CryptoNet idea extends to other types of deep learning networks, such as RNNs.

25-minute Talk Nigel Cannings - CTO, Intelligent Voice
S7415 - Enhance Multi-Contrast MRI Reconstruction for Improved Diagnosis with Deep Learning Powered by NVIDIA GPUs

Advanced computation powered by GPUs is changing the clinical decision-making process. We'll present an exciting example of using NVIDIA GPUs for multi-contrast magnetic resonance imaging exams. Neurological disorders result in great clinical challenges and high societal burdens. Multi-contrast MRI exams are frequently used for diagnosis because the various tissue contrasts provide complementary diagnostic information to distinguish normal tissue from pathology. However, the cost of acquiring these multiple sequences is extensive scanning time, which significantly increases both the diagnosis cost and patients' discomfort and limits the acquired image quality. We'll propose a new approach to accelerate multi-contrast imaging using deep learning powered by GPUs. Validated on both patients and healthy subjects, we'll demonstrate that we can significantly reduce scanning time while improving image resolution and quality and preserving the diagnostic information.

25-minute Talk Enhao Gong - PhD Candidate, Stanford University
S7138 - Enhancing Pricing Performance and Quants Productivity in a Cloud Based Development Environment

Misys quants use a Groovy-based DSL to write efficient GPU-enabled pricing models without any OpenCL or NVIDIA CUDA knowledge. Allowing progressive migration from legacy code to GPU-enabled models, this framework leverages GPGPU strengths to achieve high-performance pricing with a very short learning curve. We'll start with an overview of the framework, and then focus on the online ecosystem Misys provides to allow third parties to develop and run their custom code on GPUs in the cloud through a PaaS-like interface.

25-minute Talk Nicolas Blanc - Software Engineer, Misys
S7589 - Enterprise AR: Industry Opportunities and Technology Challenges

We'll discuss the state of the art and upcoming opportunities and challenges for AR in the enterprise. We'll focus on how enterprise end-users are using AR to accelerate their workflows and reduce project costs; how ISVs are developing new applications and UX models to leverage AR technology, and the challenges they face in UI design and in developing for cutting-edge technology; and the technical and design challenges that AR headset manufacturers are facing as they create portable, powerful displays that smoothly integrate with enterprise workflows. Application areas will include service and maintenance (for example, automotive, BIM), education, and product and building design.

50-minute Panel Ryan Pamplin - VP, Partnerships and Sales, Meta
Juba Hadj Ali, Dassault Systems
Dace Campbell - Senior Customer Success Manager, Autodesk
William Newell - CEO, North South Studios
Eric Trabold - VP Sales & Marketing, Avegant
Kyle Szostek - Sr. Virtual Construction Engineer, Gilbane Building Company
S7747 - Envrmnt: Real-Time Streaming VR

Learn how Verizon's R&D built a VR graphics engine and platform that streams HD video and game experiences to massive audiences using GPU scaling and streaming techniques. We'll share architecture and configuration that enables us to serve real-time networked game and augmented reality experiences. We'll also discuss how GameWorks VR was instrumental in our rendering pipeline and how GPUs are being used in our cloud and network to enhance streaming VR. Finally, we'll walk through a 15-minute example of Envrmnt's tools and show a demo of a livestreamed networked Vive experience.

25-minute Talk Mohammad Raheel Khalid - CTO / Chief Engineer, Verizon Labs
S7706 - Essential CUDA Optimization Techniques - Presented by Acceleware (Session 4 of 4)

This tutorial is for those with some background in CUDA, including an understanding of the CUDA memory model and streaming multiprocessor. Our previous three tutorials provide the background information necessary for this session. This informative tutorial will provide an overview of the performance analysis tools and key optimization strategies for compute-, latency-, and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. It'll also include code examples and a programming demonstration highlighting the optimal global memory access pattern applicable to all GPU architectures. We'll provide printed copies of the material to all attendees for each session. Collect all four!

80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7181 - Evaluating Windows 10: Learn Why Your Users Need GPU Acceleration

Learn why EVERY remote user should have GPU resources available to them. We'll discuss the advantages end-users experience once their virtual desktops/sessions have GPU capabilities. Recent data from the NVIDIA GRID Performance Engineering team shows the significant impact that GPUs like the Tesla M10 have on knowledge workers. The data includes real user testing and scientific data like latency, bandwidth, and CPU utilization, which all play a significant role in the overall user experience.

50-minute Talk Uday Kurkure - Staff Engineer, VMware
Lan Vu - Senior Member of Technical Staff, VMware
Hari Sivaraman - Staff Engineer, VMware
Jason Kyungho Lee - Sr. Performance Engineer, NVIDIA GRID, NVIDIA
S7429 - Expert and Customer Roundtable: Real-World Tales of GPU-Accelerated Desktops and Apps - Implementers Share Best Practices

Experts from various industries join us for a roundtable discussion of their experiences implementing GPU-accelerated virtual desktops and apps. You'll learn how Windows 10 is creating new urgency around including GPUs in VDI deployment architectures; how to design environments for greater scale, superior user experience, and lower cost; and how the latest features in VMware Horizon and NVIDIA GRID can make desktop virtualization for every use case a reality.

50-minute Talk Huong Vu - Director Engineer, Cerner
Luke Wignall - GRID Performance Engineer, NVIDIA
Pat Lee - VP Product Management, VMware
Stuart Jackson - Sr. Technology Architect, Cerner
S7430 - Expert Roundtable: GPU-Accelerated Desktops and Apps with NVIDIA GRID and Citrix XenDesktop

Experts from various industries join us for a roundtable discussion of their experiences implementing GPU-accelerated virtual desktops and apps. Learn: (1) how Windows 10 is creating new urgency around including GPUs in your VDI deployment architecture, (2) how to design your environment for greater scale, superior user experience, and lower cost, and (3) how the latest features in Citrix XenDesktop and NVIDIA GRID make desktop virtualization for every use case a reality.

50-minute Talk James Hsu - Senior Technology Architect, Partners, Windows App Delivery, Citrix
Derek Thorslund - Director of Product Management, HDX, Citrix
Luke Wignall - GRID Performance Engineer, NVIDIA
Jared Cowart - Sr. Solutions Architect, NVIDIA
John Fanelli - VP of Product, NVIDIA GRID, NVIDIA
S7175 - Exploratory Visualization of Petascale Particle Data in NVIDIA DGX-1

Learn to leverage the visualization capabilities of the NVIDIA DGX-1 system to visualize particle data. We'll cover techniques suitable for exploratory visualization such as parallel dataset reading and reduction on demand with the ADIOS I/O library, GPU-based optimization techniques for particle rendering such as radar view frustum culling, occlusion culling, texture-less point sprites, and OpenGL near zero driver overhead methods. We'll also include implementation details to take advantage of the eight NVIDIA Pascal GPUs included in the NVIDIA DGX-1.

25-minute Talk Benjamin Hernandez - Computer Scientist, Oak Ridge National Laboratory
S7688 - Exploring Machine Learning in Visual Effects

Some aspects of visual effects production are ideally suited to using machine learning technology. Whether it's coming from the digital cameras on set, from motion capture sessions, or from other sources, huge amounts of data are captured during the production of a movie. Models are built to modify this data or create new effects from it. Instead of building these models by hand, can machine learning systems be trained to do the same thing? We'll present active research projects where we are using machine learning to either accelerate a process in visual effects or allow the artists to create novel visual effects. This is very much a work-in-progress report: some of the techniques show promise but are not fully developed at this time.

25-minute Talk Doug Roble - Director of Software R&D, Digital Domain
S7553 - Exploring Sparsity in Recurrent Neural Networks

Recurrent neural networks are widely used to solve a variety of problems. As the quantity of data and the amount of available compute have increased, model sizes have also grown. We'll describe an approach to reduce the parameter count of RNNs using a simple pruning schedule without increasing the training time. The reduction in parameters achieves two goals. It helps reduce the size of the neural network, allowing it to be deployed on mobile and embedded devices. It also helps speed up evaluation time for inference. We'll demonstrate how this technique works for vanilla RNNs and the more complex gated recurrent units.

25-minute Talk Sharan Narang - Researcher, Baidu
S7608 - Exploring the Latent Visual Space Between Adjectives with Generative Adversarial Networks

Generative adversarial networks (GANs) have been applied in multiple cases, such as image generation and image completion. One interesting feature of GANs is the exploration of latent space, where new elements can emerge from the interpolation between two seed elements. With this in mind, we're interested in exploring latent space in terms of adjective-noun pairs (ANPs), which capture subjectivity in visual content such as "cloudy sky" vs. "pretty sky." Although it is challenging for humans to find a smooth transition between two ANPs (similar to a color gradient or color progression), the presented GANs are capable of generating such a gradient in the adjective domain and of finding new ANPs that lie in this (subjective) transition. As a result, GANs offer a more quantified interpretation of this subjective progression and an explainability of the underlying latent space.

50-minute Talk Federico Raue - Researcher, German Research Center for Artificial Intelligence (DFKI)
Damian Borth - Director Deep Learning Competence Center, German Research Center for Artificial Intelligence (DFKI)
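The interpolation between two seed elements mentioned in the abstract above can be illustrated with a small sketch. The abstract does not state which interpolation scheme the speakers use, so plain linear interpolation is an assumption here, and `lerp` and `interpolation_path` are hypothetical helper names. In a real setup each intermediate vector would be passed to a trained generator, which is not shown.

```python
def lerp(z_a, z_b, t):
    """Linear blend of two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latent vectors from z_a to z_b.

    Requires steps >= 2 so that both endpoints are included.
    """
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]


# Two toy seed vectors standing in for the latent codes of two
# adjective-noun pairs (the values are made up for illustration).
z_cloudy = [0.0, 2.0]
z_pretty = [1.0, 0.0]
path = interpolation_path(z_cloudy, z_pretty, steps=3)
```

The middle entries of `path` are the candidates for the "in-between" content the abstract describes.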
S7572 - Extending Mahout-Samsara Linear Algebra DSL to Support GPU Clusters

Data scientists love tools like R and Scikit-Learn, as they offer a convenient and familiar syntax for analysis tasks. However, these systems are limited to operating serially on datasets that can fit on a single node and don't allow for distributed execution. Mahout-Samsara is a linear algebra environment that offers both an easy-to-use Scala DSL and efficient distributed execution for linear algebra operations. Data scientists transitioning from R to Mahout can use the Samsara DSL for large-scale data sets with familiar R-like semantics. Machine learning and deep learning algorithms built with the Mahout-Samsara DSL are automatically parallelized and optimized to execute on distributed processing engines like Apache Spark and Apache Flink accelerated natively by CUDA, OpenCL, and OpenMP. We'll look at Mahout's distributed linear algebra capabilities and demonstrate an EigenFaces classification using Distributed SSVD executing on a GPU cluster. Machine learning practitioners will come away from this talk with a better understanding of how Samsara's linear algebra environment can help simplify developing highly scalable, CPU/GPU-accelerated machine learning and deep learning algorithms by focusing solely on the declarative specification of the algorithm without having to worry about the implementation details of a scalable distributed engine or having to learn to program with native math libraries.

50-minute Talk Suneel Marthi - Senior Principal Engineer, Redhat Inc
Trevor Grant - Open Source Analytics Technical Evangelist Committer, Apache Mahout Project, IBM
S7691 - Facial Expression and Emotion Detection for Mobile

We'll outline how Affectiva employs CNN-based approaches for the task of detecting individual facial movements (facial actions) from real-world data. Affectiva's mission is to humanize technology by bringing artificial emotional intelligence (emotion AI) to the digital world. Using computer vision and deep learning, Affectiva measures facial expressions of emotion. We'll discuss challenges encountered and advantages from using deep learning models as well as share experimental results. Models explored will include those trying to push accuracy as well as the tradeoff incurred in trying to run smaller models that can operate in environments with more constraints (such as mobile).

25-minute Talk Jay Turcot - Director of Applied AI, Affectiva
S7314 - Fast Flow-Based Distance Quantification and Interpolation for High-Resolution Density Distributions

We'll discuss our GPU-targeted algorithm design for the efficient computation of distances and interpolates between high-resolution density distributions (based on the Earth Mover's Distance / the Wasserstein metric). We particularly focus on the changes, and their rationale, needed to transition from our previous multicore approach to a manycore design (utilizing NVIDIA CUDA, CUB, and Thrust) that yields a massive improvement in performance. Expressive distances and interpolates are a crucial building block for numerous applications in computer vision, computer graphics, and visualization, and we'll give examples from different areas to demonstrate both utility and performance of our improved approach.

25-minute Talk Steffen Frey - Postdoc, University of Stuttgart, Visualization Research Center
S7480 - Fast Forward Poster Program for the Top 20 Posters

The GTC Fast Forward Poster program is an accelerated poster presentation program that serves as a catalyst for the advancement of an array of innovations that come from universities, research labs, and industry. The GTC Poster Review Committee selected the best 20 posters submitted to GTC 2017. This program gives each author a chance to present his or her GPU project in front of the top technology developers working in a vast array of industries.

80-minute Tutorial
S7268 - Fast GPU Monte Carlo Simulation for Radiotherapy, DNA Ionization and Beyond

Learn about techniques used to accelerate a Monte Carlo particle physics simulator. The strategies discussed include sorting to minimize thread divergence and data structures for efficient memory access. The software, named MPEXS, is primarily focused on X-ray radiotherapy and has been recently extended to cellular and DNA levels. Simulation of DNA ionization is particularly challenging, because large numbers of low energy particles have to be managed. Implementation of these strategies has both improved the run-time performance and reduced the memory usage. The results from the performance analysis are likely to be of use in other domains that rely on discrete event simulation. Extension of physics coverage for proton and carbon therapy and neutron radiation protection is envisioned.

50-minute Talk Shogo Okada - Research Associate, Kobe University
Nick Henderson - Research Associate, Stanford University
S7303 - Finding Parallelism in General-Purpose Linear Programming

Get to know two different techniques for retrieving parallelism hidden in general-purpose linear programs (LPs), which are broadly used in operations research, computer vision, and machine learning. With conventional solvers often being restricted to serial computation, we'll show two ways of retrieving inherent parallelism, using: (1) parallel sparse linear algebra techniques with an interior-point method, and (2) a higher-level automatic LP decomposition. After a quick introduction to the topic, we'll present details and results for a diverse range of applications on the GPU.

25-minute Talk Daniel Thuerck - Ph.D. Student, Technical University Darmstadt
Maxim Naumov - Senior Research Scientist, NVIDIA
S7607 - Floating Point Array Compression on the GPU

To increase performance, high-performance systems are adopting a heterogeneous approach through the use of accelerators (for example, GPUs). These accelerators provide this performance increase with massive parallelization. Unfortunately, these HPC systems, with or without accelerators, are hitting a wall: an increasing divergence between compute and bandwidth. As core counts have increased and bandwidth at all levels of the system has stagnated, data movement has become the bottleneck for performance at multiple places between subsystems: storage, network, accelerator, and memory levels. To address these bandwidth issues in heterogeneous systems, we developed a lossy fixed-rate compression algorithm, cuZFP, for the GPU. The ZFP compressor specifically addresses the needs of lossy compression for high-performance floating-point data like those used in scientific codes. By extending lossy compression to the GPU, the compression is up to an order of magnitude faster than the CPU version. Further, bandwidth limitations can be eased directly on the accelerator without copying the data back to the CPU.

25-minute Talk Mark Kim - Postdoctoral Researcher, Oak Ridge National Lab
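
To make "lossy fixed-rate compression" concrete, here is a minimal Python sketch. It is not the ZFP or cuZFP algorithm: it only illustrates the fixed-rate idea with plain block-floating-point quantization, where every block of values is stored as one shared exponent plus the same number of integer bits per value. The names `BLOCK`, `BITS`, `compress_block`, and the other helpers are assumptions made for this illustration.

```python
import math

BLOCK = 4   # values per block (hypothetical choice for this sketch)
BITS = 8    # signed-integer bits kept per value, identical for every block


def compress_block(values):
    """Quantize one block to a shared exponent plus BITS-bit signed integers."""
    largest = max(abs(v) for v in values)
    if largest == 0.0:
        return 0, [0] * len(values)
    # frexp returns largest = m * 2**exp with 0.5 <= m < 1, so |v| < 2**exp.
    _, exp = math.frexp(largest)
    scale = 2.0 ** (BITS - 1 - exp)
    limit = 2 ** (BITS - 1) - 1
    # Round to the nearest integer and clamp so each value fits in BITS bits.
    ints = [max(-limit, min(limit, round(v * scale))) for v in values]
    return exp, ints


def decompress_block(exp, ints):
    """Invert compress_block up to the quantization error."""
    scale = 2.0 ** (BITS - 1 - exp)
    return [i / scale for i in ints]


def compress(data):
    """Split data into blocks; each block costs the same number of bits."""
    return [compress_block(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def decompress(blocks):
    out = []
    for exp, ints in blocks:
        out.extend(decompress_block(exp, ints))
    return out
```

Because the storage cost per block is constant, the compressed size is known in advance and any block can be located by index, at the price of an error that grows with the largest magnitude in the block (at most `2 ** (exp - BITS + 1)` per value in this sketch).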
S7196 - FMM with Periodic Boundaries Support on GPU

The direct solution of the N-body problem is a simple, yet scientifically important and ubiquitous showcase algorithm for modern GPUs. However, its computational complexity is O(N^2). The fast multipole method (FMM) is an algorithm that reduces runtime and complexity to an optimal O(N) for any required precision. We'll present an optimized, fully NVIDIA CUDA-enabled, templated C++ implementation of the FMM, which considers all stages of the method, from particle input to force extraction. We compare different parallelization approaches and show the performance improvement when going from dynamic parallelization to a presorted list-based approach that fits particular system constraints such as periodic boundary conditions. We'll discuss how to exploit the FMM operators such that both memory access overhead and the number of complex multiplications are minimized. This moves the kernels into the compute-bound range and increases performance.

25-minute Talk Bartosz Kohnke - Software Developer, Max Planck Institute for Biophysical Chemistry
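
For reference, the O(N^2) direct summation that the FMM replaces can be sketched in a few lines of Python. This is only the baseline, not the FMM and not the speakers' code. It assumes a 1/r^2 pair force between point charges with the physical constant omitted, open (non-periodic) boundaries, and hypothetical names such as `direct_forces`.

```python
import math


def direct_forces(positions, charges):
    """Forces from direct summation over all pairs: O(N^2) work."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            # Magnitude q_i * q_j / r^2 along the unit vector d / r.
            s = charges[i] * charges[j] / (r2 * math.sqrt(r2))
            for k in range(3):
                forces[i][k] += s * d[k]   # repulsive for like charges
                forces[j][k] -= s * d[k]   # equal and opposite on particle j
    return forces
```

The double loop visits N(N-1)/2 pairs, so doubling the particle count roughly quadruples the work; this is the cost that hierarchical methods such as the FMM avoid.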
S7841 - Forget Catastrophic Forgetting: AI That Learns After Deployment

One of the major hassles of deep learning is the need to fully retrain the network on a server every time new data becomes available in order to preserve previous knowledge. The underlying problem is called 'catastrophic forgetting', and it severely impairs the ability to develop a truly autonomous AI. We present patent-pending technology that solves this problem by training on new objects on the fly, without retraining on the old ones. Our results show not only state-of-the-art accuracy, but also real-time performance suitable for deploying AI directly on the edge, thus moving AI out of the server room and into the hands of consumers. Imagine a toy that can learn to recognize and react to its owner, or a drone that can learn and detect objects of interest identified while in flight.

25-minute Talk Anatoly Gorshechnikov - CTO, Neurala
S7742 - Frame Cloud Workstation Platform: The Promise and the Reality of Cloud Graphics

We are still in the early days of the cloud graphics revolution, but things are about to change dramatically. Major cloud providers, like AWS, Microsoft Azure, and Google Cloud, are all rapidly adding or upgrading GPU capabilities. Great user experience, low-latency application delivery, and strong security of a cloud workspace environment are drawing interest from millions of enterprise users around the world. We'll share our experiences from 3+ years at the forefront of the cloud graphics movement, from the early days of GPUs on AWS in 2013 through the recent launch of N-Series on Microsoft Azure in December. We'll present encoding and rendering benchmarks, share the details of Frame's graphics stack, and profile NVIDIA optimizations. Finally, we'll share customer stories from global enterprise leaders, like PTC, HP, and Adobe, who all use Frame to power their cloud application delivery services.

25-minute Talk Carsten Puls - Chief Product Officer, Frame
Justin Boitano - VP of Marketing, Frame
Nikola Bozinovic - CEO, Frame
S7575 - From Cracks to Hard Hats: Focusing on Industrial Computer Vision

We'll present, in a case-study-driven presentation, specific examples of how GPU-enabled deep neural networks are powering new methods for analyzing the content of photos and videos from industrial contexts. First, we'll present a collaboration between and Engineering News-Record, the leading publication in the architecture, engineering, and construction vertical. This ongoing initiative leverages computer vision techniques and semantic approaches to help identify and indicate safe and unsafe situations in jobsite photos. Second, we'll present a collaboration with Arup, a London-based engineering firm, on the use of specific classifiers to localize and measure cracks and related defects in infrastructure.

25-minute Talk Sean True - Director of Machine Learning,, Inc.
Josh Kanner - Founder & CEO,, Inc.
S7244 - From Desktop to Cloud to Embedded GPUs: Designing, Training, and Compiling Vision Algorithms and Deep Learning Using MATLAB

Learn how to adopt a MATLAB-centric workflow to design, develop, and deploy computer vision and deep learning applications onto GPUs, whether on your desktop, a cluster, or embedded Tegra platforms, including Jetson TK1/TX1 and DRIVE PX boards. The workflow starts with algorithm design in MATLAB, which enjoys universal appeal among engineers and scientists because of its expressive power and ease of use. The algorithm may employ deep learning networks augmented with traditional computer vision techniques and can be tested and verified within MATLAB. Next, those networks are trained using MATLAB's GPU and parallel computing support, either on the desktop, a local compute cluster, or in the cloud. Finally, a compiler auto-generates portable and optimized CUDA code from the MATLAB algorithm, which is then cross-compiled and deployed to the Tegra board. We'll use examples of common computer vision algorithms and deep learning networks to describe this workflow, and we'll present their performance benchmarks, including training with multiple GPUs on an Amazon P2 cloud instance.

50-minute Talk Avi Nehemiah - Product Manager- Computer Vision and Automated Driving, MathWorks
Joss Knight - Senior Developer, MathWorks Ltd
Girish Venkataramani - Development Manager, MathWorks
S7675 - From Model to Product: How Did Infervision Become Radiologists' Real Vision?

A model is different from a real product. We'll share Infervision's journey from designing algorithms for medical image analysis to actually implementing models inside hospitals' PACS systems. A product differs from a model in three respects: (1) Products make a real difference. Robustness, reliability, and accuracy are no longer simple numbers reported in articles, but criteria that judge the efficacy of algorithms from time to time. (2) Products solve real problems. Models serve deep learning science, whereas products serve medical decisions. When designing a medical image diagnosis product, we need to identify radiologists' real needs and solve problems that matter to clinical decisions. (3) Products take into account all complexities in a real application context. We'll give a brief introduction to China's medical system with an emphasis on radiology imaging diagnosis. We'll also share some challenges and achievements Infervision experienced when attempting to insert AI products into radiologists' daily workflow.

50-minute Talk Kuan Chen - CEO, Infervision
L7118 - From Trained Neural Network Model to Deployment for Inference (Presented by NVIDIA Deep Learning Institute)

NVIDIA TensorRT is a high-performance neural network inference engine for production deployment of deep learning applications. This lab provides hands-on experience using TensorRT to optimize, validate, and deploy trained neural networks for inference in a self-driving car application. Prerequisites: C/C++ programming and basic knowledge of deep learning.

120-minute Instructor-Led Lab Joohoon Lee - Certified Instructor, NVIDIA
Steve Byun - Certified Instructor, Deep Learning Institute, NVIDIA
Chris Gottbrath - Accelerated Computing Product Manager, NVIDIA
S7372 - Functional Safety: Developing ISO 26262 Compliant GPU Applications

Functional safety is an important consideration for many applications of GPU computing, especially autonomous driving, robotics, and healthcare. We'll cover what it means to be compliant with current functional safety standards, learn the basics of functional safety, and uncover how the prevailing standard, ISO 26262, can apply to GPUs and GPU programming. Often the development of an application's core features takes precedence, leaving functional safety considerations until the end of the development cycle. If functional safety is considered and planned from the start, results can improve while cost decreases. We'll explain the support that NVIDIA has implemented inside GPUs for functional safety and the various tools and methodologies that are available to support ISO 26262 compliance for both hardware and software.


25-minute Talk Richard Bramley
S7235 - Fusing Vision and 3D Sensors with AI to Build Cognition Systems

Learn how to use GPUs to run 3D and camera deep learning fusion applications for autonomous driving. Cameras provide high resolution 2D information, while lidar has relatively low resolution but provides 3D data. Smart fusing of both RGB and 3D information, in combination with AI software, enables the building of ultra-high reliability classifiers. This facilitates the required cognition application for semi-autonomous and fully autonomous driving.


50-minute Talk Ronny Cohen - CEO, VayaVision
Ido Goren - SW Manager, VayaVision
S7169 - GA3C: A Hybrid CPU/GPU Implementation of A3C for Deep Reinforcement Learning

We'll introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We'll analyze its computational traits and concentrate on the critical aspects to leverage the GPU's computational power. We'll introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speed-up compared to a CPU implementation and is publicly available to other researchers.

25-minute Talk Iuri Frosio - Senior Research Scientist, NVIDIA
S7502 - Generative Adversarial Networks

Generative adversarial networks are machine learning models that can generate new data drawn from the same distribution as the training data. They are widely used for image generation tasks and are beginning to be used for video generation and reinforcement learning. We'll describe the basics of how GANs work and summarize their latest applications.

50-minute Talk Ian Goodfellow - Research Scientist, Google
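
As background to "how GANs work", the standard formulation trains a generator G against a discriminator D in a two-player minimax game. The objective below is the commonly cited original form; the abstract does not say which variant the talk covers, so treat it as orientation only. Here p_data is the data distribution and p_z is the noise prior fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

D is trained to assign high probability to real samples and low probability to generated ones, while G is trained to make D(G(z)) large.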
S7565 - Getting Started with Apache MXNet

Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this session, members of Amazon's deep learning team will provide a short background on deep learning, how it is applied at Amazon, and what Amazon's strategy is for investing in the MXNet project. You'll also learn how to get started quickly using NVIDIA GPUs in the AWS cloud, easily scaling to hundreds of GPUs in a matter of minutes.

50-minute Talk Joseph Spisak - Sr. Mgr - Product Management, Amazon
Mu Li - Sr. Applied Scientist, Amazon
L7138 - Getting Started with CUDA C/C++

In this hands-on lab, you will learn how to work with the CUDA platform to accelerate C and C++ code on a massively parallel NVIDIA GPU. We'll start with the basics of writing in a CUDA-enabled language, work through accelerating sections of code on the GPU, learn how to error check, and more! As we'll be using GPUs hosted in the cloud, all you are required to bring is a laptop with a modern browser. Prerequisites: None.

120-minute Instructor-Led Lab Jonathan Bentz - Solutions Architect, NVIDIA
S7349 - Getting Started with GPUs for Linux Virtual Desktops on VMware Horizon

You've just been tasked with building a Linux VDI environment for an engineering team with graphics requirements. Now what? Join an NVIDIA GRID Community Advisor to learn the basics of setting up Linux VDI desktops with GPU capabilities and see the results we captured when we built it in the lab. This is a session for those wanting to get started with Linux virtual desktops that need GPU capabilities.

50-minute Talk Trey Johnson - Sr. Solutions Architect, Dell EMC
Tony Foster - Principal Technical Marketing Engineer for EUC Solutions, Dell EMC
S7525 - GI Next: Global Illumination for Production Rendering on GPUs

Learn how to accelerate the computation of global illumination (a very expensive part of the rendering process) with the aid of GPUs. Porting a production renderer to take advantage of GPUs is a considerable effort and often requires rewriting the whole engine; moreover, custom shaders may not be accessible in source code and often introduce performance penalties if not especially adapted to the accelerator. However, function calls to the renderer's API from within shaders may be intercepted and thus costly functions in the render core may be accelerated outside of the shader code. One such render core API function is the calculation of the global illumination contribution, and it is this part that we accelerate on the GPU.

25-minute Talk Rajko Yasui-Schoeffel - Senior Graphics Software Engineer, NVIDIA
Enzo Catalano - Senior Graphics Software Engineer, NVIDIA
S7625 - Going Deeper in Finance

How widely is deep learning applicable in finance? We'll provide an overview of promising deep learning applications in finance. We'll then focus on deep (variational) autoencoders, showing how they can learn hidden representations of unlabeled data and generate new data. This opens interesting new applications in anomaly detection, risk analysis, price prediction, and algorithmic trading. We'll explore some of these use cases with real FX data and illustrate the concepts with interactive notebooks, showing how to build the models using frameworks such as TensorFlow and Keras, and how to use the latest Tesla P100 GPUs for training.

25-minute Talk Daniel Egloff - Partner, QuantAlea and InCube
S7282 - GPU-Accelerated Convolutional Neural Networks for Protein-Ligand Scoring

We'll describe a convolutional neural network that takes as input a comprehensive 3D representation of a protein-ligand interaction and predicts whether the ligand (a small molecule, like a drug) binds to the protein. We'll provide a brief orientation in structure-based drug design, describe how we effectively use the GPU to efficiently train, evaluate, and visualize our neural networks, and discuss preliminary results and current limitations. Our CNN scoring function outperforms the conventional AutoDock Vina scoring function when ranking poses both for pose prediction and virtual screening.

25-minute Talk David Koes - Assistant Professor, University of Pittsburgh
S7397 - GPU-Accelerated Deep Learning Framework for Cyber-Enabled Manufacturing

We'll present a GPU-accelerated deep-learning framework for cyber-manufacturing, which enables real-time feedback to designers regarding the manufacturability of a computer-aided design model. We'll talk about a 3D-convolutional neural network-based approach for learning the manufacturability of a mechanical component. The 3D-CNN can recognize the features in a CAD model and classify it to be manufacturable or non-manufacturable with a greater accuracy than traditional rule-based methods. We'll discuss a novel GPU-accelerated voxelization algorithm used to discretize the CAD model and prepare it for deep learning. We'll briefly outline the challenges in training a 3D-CNN using complex CAD models on a GPU (NVIDIA TITAN X) with limited memory. Finally, we'll touch upon different methods to extend the framework to other manufacturing processes, such as additive manufacturing and milling.

25-minute Talk Adarsh Krishnamurthy - Assistant Professor, Iowa State University
Aditya Balu - Ph.D. Student, Iowa State University
S7290 - GPU-Accelerated Natural Language Processing

We'll give an introduction to natural language processing on GPUs. So far, GPUs are not used in big data as much as they should be. We'll show how GPUs can bring deep learning techniques into production for large big data systems. We'll discuss some of the possible use cases of NLP, and we'll see why the techniques used up until now haven't been enough. We'll talk about vector embeddings, and see in a live demo why they convey the semantic information we're looking for when processing language.

50-minute Talk Guillermo Molini - Madrid, Wavecrafters
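
The claim that vector embeddings convey semantic information is usually checked by comparing vectors with cosine similarity: words with related meanings should point in similar directions. The Python sketch below shows that comparison. The three-dimensional vectors in `toy_embeddings` are made up for illustration and are not output from any trained model or from the talk's demo.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real embeddings have hundreds of dimensions
# and are learned from text rather than written by hand.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest in angle to `word`."""
    return max(
        (w for w in embeddings if w != word),
        key=lambda w: cosine_similarity(embeddings[word], embeddings[w]),
    )
```

With these toy values, `most_similar("cat", toy_embeddings)` picks "dog" over "car". At production scale the same dot products are computed for millions of vectors at once, which is where GPU acceleration becomes relevant.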
S7367 - GPU-Accelerated Similarity Searching in a Database of Short DNA Sequences

The challenge: do interactive similarity searching in a SQL database that contains billions of short DNA sequences. The response: this database query is amenable to GPU acceleration because efficient numerical computation can be carried out in parallel on large numbers of independent data items. Implementation details and performance will be discussed, with emphasis on the integration of GPU computation with the database server environment.

25-minute Talk Richard Wilton - Associate Research Scientist, Johns Hopkins University
S7390 - GPU-Accelerated VDI for Car Design Environments

Honda's evolutionary new project, internally called the "Next-gen Engineering Workstation (EWS) Project," is designed to optimize usage of our CAD-VDI environment for R&D offices and factories. The project's challenge is to move from the existing physical EWS and pass-through VDI environments to an NVIDIA GRID vGPU environment, all while improving user density (CCU/server), usage monitoring, resource optimization for designers, and flexible resource reallocation. Honda successfully deployed more than 4,000 concurrent CAD-VDI users in its initial phase, with aggressive plans to further increase utilization. This session will review the project's challenges and Honda's future vision.

25-minute Talk Hiroshi Konno - Assistant Project Leader, Honda R&D Co., Ltd.
Yuma Takahashi - CAD Administrator, Honda R&D Co., Ltd.
Masashi Okubo - Large Project Leader, Honda R&D Co., Ltd.
S7759 - GPU Acceleration in Intuit’s SmartLook

TurboTax, Intuit's leading tax product for consumers and small businesses, is getting even more personal with expert human help available on demand, within the product at the point of need. Exclusively with TurboTax SmartLook, customers can now connect live, via one-way video, to credentialed CPAs or enrolled agents to get personalized, real-time answers to their tax questions whenever they need them. Intuit's team of tax experts is located throughout the country and leverages the SmartLook application from virtualized desktops. Learn how Intuit is transforming the customer experience with the help of GPU acceleration while also ensuring the security, performance, and manageability of their virtual application solution.

50-minute Talk Bill Schuller - Domain Architect, Intuit
S7238 - GPU Acceleration of Airway Reconstruction Guided Deep Learning on Lung Cancer Detection

Recent research in deep learning has reached state-of-the-art accuracy in various domains, including image classification, voice recognition, natural language processing, music generation, drug discovery, and genomics. In the diagnosis of lung diseases, the structure of the airway is critical for doctors to recognize abnormal sites such as cancer or tumors. The process of 3D airway reconstruction can work as feature extraction to help the recognition of benign tumors. Both the reconstruction and the deep learning require large computational resources and memory, and both are time-consuming. With the advent of ever-improving GPUs, parallel programming can greatly enhance the performance of lung cancer detection.

25-minute Talk Yuwei Chang - Student, National Taiwan University
S7561 - GPU Acceleration of a Large Eddy Simulation Software for High-Pressure, Supercritical Reacting Flows

RAPTOR is a massively parallel flow solver for the simulation of turbulent combustion. In preparation for the upcoming Summit system at the Oak Ridge Leadership Computing Facility, a performance-portable and GPU-ready version of RAPTOR has been developed. A combination of programming models has been used to convert the distributed-memory parallel code to a hybrid parallel code with multiple levels of parallelism. Major performance-critical kernels have been reimplemented in C++ using the Kokkos programming model. The main flow solver has been accelerated using OpenMP compiler directives. We'll present the performance characteristics of RAPTOR on the IBM Minsky system for a high-pressure, supercritical reacting flow problem with applications in the aerospace and energy industries.

25-minute Talk Ramanan Sankaran - Computational Scientist, Oak Ridge National Laboratory
Levi Barnes - Engineer, NVIDIA
S7417 - GPU Acceleration of Monte Carlo Simulation for Capital Markets and Insurance

Learn about CUDA-based GPU acceleration of Monte Carlo simulations in the financial industry for pricing, risk management, and regulatory calculations. We'll provide an overview of three use cases. (1) Pricing with tens or hundreds of thousands of Monte Carlo "paths" or scenarios, depending on the complexity of the financial instrument. (2) For the new international regulatory capital requirements introduced in January 2016, and for the new margin requirements that have been in effect since September 2016, we'll discuss the calculation of cost of capital and margin throughout the life of a portfolio, which requires nested Monte Carlo simulation. (3) Since the insurance industry uses a smaller number of Monte Carlo paths for pricing, we'll consider other approaches to take advantage of GPU acceleration, such as grouping similar policies together and policy code optimizations. We stress the importance of NVLink for accelerating the pricing of insurance policies.

25-minute Talk Serguei Issakov - Global Head of Quantitative Research and Development, Senior Vice President, Numerix
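
To illustrate what pricing with Monte Carlo "paths" means in use case (1), the Python sketch below prices a plain European call option. It is a textbook example, not the speaker's method: it assumes geometric Brownian motion for the underlying, a single time step per path, and hypothetical function names. The closed-form Black-Scholes price is included only as a check on the simulation.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Discounted average payoff over independently simulated terminal prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    for _ in range(n_paths):
        # Each path is independent of all others, which is why this
        # workload maps naturally onto one GPU thread per path.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        total += max(terminal - strike, 0.0)
    return math.exp(-rate * maturity) * total / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form reference price for the same option."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)
```

The statistical error shrinks only as one over the square root of the path count, so halving the error costs four times as many paths. Nested simulation, as in use case (2), runs an inner simulation like this one at every step of every outer scenario, which multiplies the cost further.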
S7735 - GPU Acceleration of the HiGrad Computational Fluid Dynamics Code with Mixed OpenACC and CUDA Fortran

We'll present the strategy and results for porting an atmospheric fluids code, HiGrad, to the GPU. HiGrad is a cross-compiled, mixed-language code that includes C, C++, and Fortran, and is used for atmospheric modeling. Deep subroutine calls necessitate detailed control of the GPU data layout with CUDA Fortran. We'll present initial kernel accelerations with OpenACC, then discuss tuning with OpenACC and a comparison with specially curated CUDA kernels. We'll demonstrate the performance improvement and the different techniques used for porting this code to GPUs, using a mixed CUDA Fortran and OpenACC implementation for single-node performance, and scaling studies conducted with MPI on local supercomputers and Oak Ridge National Laboratory's Titan supercomputer, on different architectures including the Tesla K40 and Tesla P100.

25-minute Talk Jenniffer Estrada - Researcher, Los Alamos National Laboratory
S7400 - GPU-Cloud Photorealistic Rendering for the Next Generation of Cloud CAD Tools

We'll introduce OneRender, a photorealistic elastic cloud solution for accelerated rendering. OneRender is connected with Onshape, a leading cloud CAD solution. We'll present a general overview of these two platforms and explain how they connect to each other. We'll talk about the challenges of, and solutions for, translating complex geometries from the Onshape CAD format to the OneRender format and continuously keeping them consistent with any change in the former. The OneRender core engine is built on top of the NVIDIA OptiX framework for ray tracing and GPU-based acceleration. Furthermore, OneRender can launch multiple GPU clouds in parallel to accelerate the rendering process. Then, we'll give an overview of GPU usage versus user arrival and workload growth. Finally, we'll show some examples of real-world CAD designs and photorealistic rendering visualizations.

25-minute Talk Miguel Arias - CEO, Prefixa
S7248 - GPU Computing for the Construction Industry: AR/VR for Learning, Planning, and Safety

We'll dive headfirst into some of the current challenges of the construction industry, how we're addressing them, and how we're planning to utilize virtual/augmented reality and real-time GPU computing to address them. To optimize the construction of a building, site logistics must be planned, and all systems analyzed and coordinated to confirm constructability. Along with the use of building information modeling (BIM) and the advent of inexpensive GPU and AR/VR hardware, we're building tools to redefine the planning and analysis process for construction management. No longer are virtual and augmented reality systems just for entertainment; they can help us plan faster, help confirm our client's design goals, and facilitate stronger communication among our team members before and during the construction process.

25-minute Talk Kyle Szostek - Sr. Virtual Construction Engineer, Gilbane Building Company
Ken Grothman - Sr. VDC Engineer, Gilbane Building Company
S7342 - GPU Data Mining in Neuroimaging Genomics

Large datasets of imaging and genomic data have become available for research into the correlation between genome and brain structure for Alzheimer's disease. We'll present a GPU-enabled tool that permits interactive correlation between the attributes of the MRI voxels and single nucleotide polymorphisms in DNA sequences of Alzheimer's patients. The system runs on a desktop PC and is several orders of magnitude faster than the MATLAB version.

25-minute Talk Robert Zigon - Sr Staff Research Engineer, Beckman Coulter
S7156 - GPU-Enabled Comparative Genomics Calculations on Leadership-Class HPC Systems

We'll describe recent work to map comparative genomics algorithms to GPU-accelerated leadership-class systems. The explosion in availability of genomic data holds promise for enabling determination of the genetic causes of phenotypic characteristics, with applications to problems such as the discovery of the genetic roots of diseases. The growing sizes of these datasets and the quadratic and cubic scaling properties of the algorithms necessitate use of leadership-scale accelerated computing. We'll discuss the mapping of two-way and three-way algorithms for comparative genomics calculations to large-scale GPU-accelerated systems. Focusing primarily on the Proportional Similarity metric and the Custom Correlation Coefficient, we'll discuss issues of optimal mapping of the algorithms to GPUs, eliminating redundant calculations due to symmetries, and efficient mapping to many-node parallel systems. We'll also present results scaled to thousands of GPUs on the ORNL Titan system.

25-minute Talk Wayne Joubert - Computational Scientist, Oak Ridge National Laboratory
S7254 - GPU-Enabled Differential Dependency Network Analysis of Large Datasets

We present EDDY-GPU, a GPU-accelerated algorithm to identify pathways enriched with differential dependencies between two conditions. High sensitivity has been one benefit of this statistical rigor, yet it comes at considerable computational cost, which limits the size of data for EDDY analysis. However, the ample and regular compute, coupled with a small memory footprint, positioned EDDY as an ideal candidate for GPU acceleration. Now complete, EDDY-GPU delivers a performance improvement of two orders of magnitude. Such improvement provides new opportunities for EDDY-GPU, such as (1) TCGA pan-cancer analysis to identify pathways perturbed by multiple mutations compared to wild-type, and (2) personalized target discovery for an individual tumor patient enabled by single-cell RNAseq profiles of tumor samples.

25-minute Talk Gil Speyer - Senior Postdoctoral Fellow, The Translational Genomics Research Institute
S7645 - GPU Open Analytics Initiative (GOAI) Panel Discussion

This panel discussion and Q&A is a great opportunity to hear first-hand about the goals and current progress of the GPU Open Analytics Initiative (GOAI) from its founding members, MapD, H2O, and Continuum.

25-minute Panel Jim McHugh - VP and GM, NVIDIA
Stanley Seibert, University of Pennsylvania
SriSatish Ambati - CEO and Co-Founder, H2O
Todd Mostak - Founder and CEO, MapD
S7105 - GPU Scheduling and Synchronization for ADAS

Learn how the GPU schedules different workloads, and how it addresses the challenges that arise when developing ADAS systems. In these systems, some functionalities are expected to execute in a deterministic manner, and even to be prioritized and synchronized with other functionalities that involve the GPU. We'll discuss the preemption feature in different GPU architectures, and also introduce two different approaches for achieving deterministic and prioritized execution of different GPU functionalities.

25-minute Talk Venugopala Madumbu - Software Architect, NVIDIA
S7382 - GPUs Unleashed: Analysis of Petascale Molecular Simulations with VMD

We'll showcase recent successes in the use of GPUs to accelerate challenging molecular simulation analysis tasks on the latest NVIDIA Tesla P100 GPUs on both Intel and IBM/OpenPOWER hardware platforms, and large-scale runs on petascale computers such as Titan and Blue Waters. We'll highlight the performance benefits obtained from die-stacked memory on the Tesla P100, the NVIDIA NVLink interconnect on the IBM "Minsky" platform, and the use of NVIDIA CUDA just-in-time compilation to increase the performance of data-driven algorithms. We will present results obtained with OpenACC parallel programming directives, current challenges, and future opportunities. Finally, we'll describe GPU-accelerated machine learning algorithms for tasks such as clustering of structures resulting from molecular dynamics simulations.

50-minute Talk John Stone - Senior Research Programmer, University of Illinois Urbana-Champaign
S7764 - GPUs: Using HMM to Blur the Lines Between CPU and GPU Programming

Heterogeneous memory management (HMM) is the name of an upcoming Linux kernel patchset, authored by Red Hat's Jerome Glisse. The patchset enables GPU programmers (CUDA programmers, for example) to write code that treats "a pointer as a pointer": the same pointer values can be used in both CPU and GPU code. This significantly simplifies writing new CUDA programs and porting older C/C++ (or even Fortran) programs to use GPU acceleration. In other words, malloc(3) can be called to allocate a buffer on the CPU, and that buffer's address can be passed to a CUDA kernel that runs on the GPU. HMM migrates the pages automatically. This session covers the improved programming model, some bandwidth and tuning considerations, and kernel details.

25-minute Talk John Hubbard - Principal Software Engineer, NVIDIA
S7309 - Graph500: From Kepler to Pascal

We'll show how to run breadth-first search (BFS) on massive graphs on GPU architectures, using the Graph500 benchmark as a case study. We'll present results ranging from CPUs to Tesla Kepler GPUs and then the new P100 GPU provided in the DGX-1. This session will present the algorithms for single- and multi-GPU BFS on large graphs, with results on up to 256 GPUs on the French cluster ROMEO at the University of Reims. We'll detail the algorithms and discuss ways to solve this kind of very irregular problem. We'll show that even if the algorithms do not fit the GPU architecture well, the real limitation remains the communication between nodes. Still, using the InfiniBand QDR interconnect with a GPU-aware MPI and GPUDirect implementation allowed us to obtain very interesting results. We'll also show the performance we get by applying the new NVIDIA DGX-1 to these kinds of problems.

25-minute Talk Julien Loiseau - Ph.D. Student, URCA/CReSTIC
Michael Krajecki - Professor, University of Reims Champagne-Ardenne
SE7143 - GTC Party

Celebrate with peers at the San Jose Tech Museum of Innovation. Previous events included laser mazes, a robot bartender, an AR funny mirror, a drone race track, and more. Don't miss it this year.

4-Hour Special Event
S7676 - Half Precision Benchmarking for HPC

With Tegra X1 and Pascal architecture Tesla P100 GPUs, NVIDIA introduced hardware-based computation on FP16 numbers, also called half-precision arithmetic. We'll introduce the steps required to build a viable benchmark for this new arithmetic format. This will include the connections to established IEEE floating point standards and existing HPC benchmarks. The discussion will focus on performance and numerical stability issues that are important for this kind of benchmarking and how they relate to NVIDIA platforms.

25-minute Talk Piotr Luszczek - Research Director, University of Tennessee
S7840 - Harnessing AI in Healthcare

As computers outperform humans at complex cognitive tasks, disruptive innovation will increasingly remap the familiar with waves of creative destruction.  And in healthcare, nowhere is this more apparent or imminent than at the crossroads of Radiology and the emerging field of Clinical Data Science. As leaders in our field, we must shepherd the innovations of cognitive computing by defining its role within diagnostic imaging, while first and foremost ensuring the continued safety of our patients.  If we are dismissive, defensive or self-motivated - industry, payers and provider entities will innovate around us achieving different forms of disruption, optimized to serve their own needs.  To maintain our leadership position, as we enter the era of machine learning, it is essential that we serve our patients by directly managing the use of clinical data science towards the improvement of care—a position which will only strengthen our relevance in the care process as well as in future federal, commercial and accountable care discussions. We'll explore the state of clinical data science in medical imaging and its potential to improve the quality and relevance of radiology as well as the lives of our patients.

50-minute Talk Dr. Keith Dreyer - Vice Chairman and Assnt Professor of Radiology, Massachusetts General Hospital and Harvard Professor
S7785 - Harnessing the Power of Anaconda for Scalable Data Science

Many data scientists use Anaconda and Python to increase their productivity, but don't realize they can leverage these technologies for scalable analysis. We'll survey the landscape of Python tools that empower data scientists to take their work to the next level, harnessing the growing computing capability of GPUs and clusters. We'll show the power of Python to drive distributed computation with Spark and Dask, execute large-scale machine learning with TensorFlow, and visualize large datasets right in the web browser.

50-minute Talk Stanley Seibert - Director of Community Innovation, Continuum Analytics
Peter Wang - CTO & Co-Founder, Anaconda Powered By Continuum Analytics
SE7102 - Heterogeneous Hierarchical Async Tasking: Making it Real

Special Event

3-Hour Special Event CJ Newburn - Principal HPC Architect for Compute SW, NVIDIA
S7247 - High-Bandwidth 3D Image Compression to Boost Predictive Life Sciences

Modern microscopes easily produce large data volumes (terabyte datasets) at high rates (1,000 megabytes/s is no exception), which makes using them almost impossible. Once an acquisition is started, it typically has to be stopped again as the hard drives run full. We'll share how GPUs helped us bring this nightmare to an end. We'll introduce our open-source package, called sqeazy, that is capable of compressing microscopic data at faster speeds than a hard drive can spin. We'll show how GPUs provided a crucial boost in this endeavor, and we'll share what technical challenges we overcame interfacing with modern video encoding libraries, like libavcodec of ffmpeg. Finally, we'll discuss how NVENC provides portable performance that helps scientists to observe living developing specimens over long time spans. This may be the foundation for modern predictive biology of the 21st century. Join us for a tour of how modern media technology straight from Hollywood can boost science!

25-minute Talk Jeffrey Kelling - Scientist, Helmholtz-Zentrum Dresden-Rossendorf
S7205 - High-End Design & Visualizations on Azure

Learn how you can easily provision cloud-based workstations and VDI infrastructure to give designers, engineers, and scientists the ability to create high-fidelity models, designs, and visualizations. Powered by NVIDIA Tesla GPUs and NVIDIA GRID capabilities, you can run hardware-accelerated applications to design the next concept car or industrial part, or even create the next Hollywood blockbuster. Hear customer stories and learn about partner solutions that enable anyone to have a high-end workstation in their back pocket!

25-minute Talk Karan Batta - Senior Program Manager, Azure HPC Team, Microsoft
Nikola Bozinovic - CEO, Frame
S7266 - Higher Performance LBM Simulation on GPUs

The Lattice Boltzmann method (LBM) has been used widely in simulations of turbulence, porous media flow, and multiphase flows. It's efficient thanks to its high parallelism and scalability. However, due to the low ratio of computational operations to memory accesses, LBM simulations are memory-bound, and their actual performance is typically 10 to 15% of peak performance on both CPUs and GPUs. We'll introduce our efforts to boost performance by reducing memory access and increasing the computational operations, by considering more complex physical processes and integrating statistical and visualization operations for interactive dynamic simulation of multiphase flows. The direct numerical simulation of gas-solid flow is carried out using NVIDIA Tesla K80 and P100 GPUs with encouraging results.

25-minute Talk Wei Ge - Professor, Institute of Process Engineering
S7835 - High-Fidelity Light Field VR Playback Using NVIDIA GPUs

We'll outline the process of producing content with Lytro Immerge, a production-ready Light Field video solution for virtual reality, and then dive deeply into the unique challenges of playing back a high-fidelity Light Field at 90 frames per second using NVIDIA Pascal GPUs. Lytro Immerge was used to produce Hallelujah, recently shown at Tribeca Film Festival 2017. Lytro's 6DoF playback allows the viewer to move around freely in the viewing experience, complete with parallax, view-dependent illumination, and perfect stereo in all directions, no matter the orientation of the viewer's head.

25-minute Talk Nikhil Karnad - Member of Technical Staff, Lytro
Tim Milliron - VP of Engineering, Lytro
S7345 - High-Performance Broadcast Designs for Streaming Applications on Multi-GPU InfiniBand Clusters

Learn about recent developments in middleware design to boost performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various NVIDIA CUDA features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communications. We'll focus on the challenges in combining and fully utilizing GPUDirect RDMA (GDR) and hardware InfiniBand multicast technologies in tandem to design support for high-performance heterogeneous broadcast operation for streaming applications. Further, we'll present associated challenges and designs in supporting reliability for clusters with multi-HCA and multi-GPU configurations. Performance evaluation of the proposed designs on various system configurations will be presented and analyzed.

25-minute Talk Dhabaleswar K. (DK) Panda - Professor and University Distinguished Scholar, The Ohio State University
S7569 - High-Performance Data Loading and Augmentation for Deep Neural Network Training

Next-generation GPUs have revealed that data loading and augmentation can be a major bottleneck to accelerating deep neural network training on many-GPU distributed systems. This work presents the design and implementation of a high-performance data loading and augmentation system for the Expresso deep learning framework developed by Samsung. Our system leverages multiple levels of parallelism and automatic runtime performance tuning to achieve speedups of 15.5% on average across our experiments.

50-minute Talk Trevor Gale - Student, Northeastern University
Steven Eliuk - Project Lead
S7571 - High-Performance Deep Learning on Embedded Devices with MXNet

Learn how to compile and run an optimized version of the MXNet deep learning framework for various embedded (IoT) devices, and see the wide range of exciting applications opened up by running deep-network inference in near real time on "edge" devices. Specifically, we'll show performance numbers for a variety of deep learning models based on MXNet running on Raspberry Pis as well as TK1 processors, demonstrating the massive efficiency gains on embedded devices that MXNet yields over comparable frameworks. We'll then demo the power of real-time image processing via deep learning models with an example application walkthrough. Finally, we'll demonstrate how to use AWS IoT services to massively augment the flexibility and reliability of the models running in our example application.

50-minute Talk Aran Khanna - Software Engineer, Amazon Web Services
Miro Enev - Solution Architect, Deep Learning, NVIDIA
S7413 - High-Performance Machine Learning for Weather Prediction Applications

Learn how statistical modeling is revolutionizing weather/climate prediction applications. Such models offer high fidelity in theory and are increasingly viewed as potential replacements for actual simulations. The main drawbacks of such models are the high number of flops and the memory footprint overhead of computations on the large dense covariance matrix, which make them unrealistic in practice. By exploiting the low-rank structure of the matrix and redesigning the underlying linear algebra in terms of batch operations, not only is the fidelity of the model maintained, but the corresponding performance achieved on GPUs is also unprecedented. Low-rank matrix computations on GPUs boost existing machine learning algorithms for weather prediction applications and open new research directions.

25-minute Talk Hatem Ltaief - Senior Research Scientist, KAUST
S7727 - High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks

We'll describe our efforts in building an efficient convolutional neural network capable of automating breast cancer screening. First, we'll highlight fundamental differences between natural and medical images, as well as differences in current practices when training neural networks on these types of data. Second, we'll describe the architecture of our network, its training process, and promising experimental results. Then we'll demonstrate how decisions of our network can be explained by visualizing the parts of the image that had the greatest influence on the predictions made. Our visualization reveals surprising agreement between radiologists and the network in spotting important regions of interest. Finally, we'll discuss future directions of research necessary to automate early diagnosis of breast cancer and beyond using medical imaging technology.

25-minute Talk Krzysztof J. Geras - Postdoctoral Researcher, New York University
S7545 - High-Speed Robotic Weeding

Blue River Technology builds "See & Spray" robots for agricultural applications. Its current product sees, detects, optimizes, and acts on 10% of the lettuce produced in the U.S. and is capable of plant-by-plant care. We'll go through the milestones in developing and deploying computer vision systems into a market where high reliability is expected, data is biased, compute platforms need to be rugged, and the system needs to run in real time.

25-minute Talk Lee Redden - CTO, Blue River Technology
S7583 - Homebyme: How Iray, VCA, Deep Learning and VR Help you Experience your New Apartment Before it is Built

Learn how the space planning experience can be enhanced with realistic rendering using NVIDIA Iray. We'll present functionalities designed to help consumers picture themselves in their soon-to-be apartment, such as 360 renderings and light baking for an immersive experience in VR; luminosity studies for better lighting of your flat; and inspirational rendering for a quick preview of various combinations of materials and furniture in your room. These functionalities are based on Iray, supported by a combination of VCA architecture and Amazon instances for content production, and a WebGL planner for an out-of-the-box experience for the consumer.

50-minute Talk Ankit Patel - Senior Product Manager, NVIDIA
Jonathan Merlet - 3DVIA Software Engineer, Dassault Systemes SE
S7661 - How Artificial Intelligence and Edge Computing Are Transforming Driver Safety, Recognition, and Retention

Through the application of artificial intelligence and deep learning, "computing at the edge" is changing how safety systems detect, capture, analyze, and apply reasoning to events. Using real-time analysis of the data from cameras and inertial sensors mounted on a vehicle, we can not only detect unsafe driving events but also analyze the chain of events that leads to unsafe situations. We can recognize a driver's positive performance in addition to areas where best practices need to be reinforced. Power-efficient and powerful deep learning processors enable us to process all of this data in real time at the edge of the network. This allows us to create an accurate and comprehensive record of driving performance that fleet managers can use to create incentives for safer driving. Insurance companies can also use this information to set proper premiums customized for individual drivers and potentially adjusted dynamically to reflect the driving environment.

25-minute Talk Avneesh Agrawal - CEO, Netradyne
S7660 - How Deep Learning is Powering Intelligent Video Analytics in AI Cities

Get an in-depth understanding of how Dahua is solving the challenge of city-scale video analysis with deep learning accelerated on Tesla GPUs. We'll cover real-time decoding over many camera streams combined with high-accuracy object detection and analysis. We'll discuss the challenges of solving for detection and classification video use cases on flexible, scalable, and efficient architectures leveraging NVIDIA SDKs like TensorRT.

25-minute Talk Ping Chen - Senior Researcher, Zhejiang Dahua Technology Co., Ltd.
S7114 - How GPUs and Deep Learning Help to Make Dental Care More Affordable

Learn about the unique challenges being solved using deep learning on GPUs in a large-scale mass customization of medical devices. Deep neural networks have been successfully applied to some of the most difficult problems in computer vision, natural language processing, and robotics. But we still haven't seen the full potential of this technology used in manufacturing. Glidewell Labs daily produces thousands of patient-specific items, such as dental restorations, implants, and appliances. Our goal is to make high-quality restorative dentistry affordable to more patients. This goal can only be achieved with flexible, highly autonomous CAD/CAM systems, which rely on AI for real-time decision making.

25-minute Talk Sergei Azernikov - Machine Learning Team Lead, Glidewell Dental
S7618 - How GPUs Power Comcast's X1 Voice Remote and Smart Video Analytics

We'll describe the deep learning models behind Comcast's X1 Voice Remote and Smart Video Analytics and how we use GPUs to train and run these models. We'll explain how we can accurately parse the millions of voice queries we receive every day, how we automatically determine the domain of a query (TV, sports, billing, etc.), and how deep learning helps us understand what is happening on TV at any given moment. We'll also go into detail about how our distributed multi-GPU clusters speed up training the models and enable inference on millions of voice commands and hundreds of thousands of video clips every day.

25-minute Talk Jan Neumann - Director, Technical R&D, Comcast
S7433 - How to Achieve Real-Time Analytics on a Data Lake Using GPUs

The complexities associated with development and ongoing management of a data lake that aims to deliver real-time analytic response can be costly and overwhelming. To get real-time analytic response on live, streaming data, consider plugging a GPU-accelerated database into your data lake. GPUs are often embedded in compute-intensive technologies like video games, cars, and mobile devices. They're now gaining traction in the data center. This talk will describe how a GPU-accelerated, scale-out, in-memory database brings orders of magnitude more compute power, with a significantly smaller hardware footprint, to provide unrivaled analytic capabilities. Get the latest information on GPUs, and how their multi-core architecture can process many computations efficiently and quickly, making them ideal for today's streaming datasets and IoT use cases.

25-minute Talk Mark Brooks - Principal Solutions Engineer, Kinetica
S7836 - How to Become a Self-Driving Car Engineer

Learn how Udacity trains engineers to work on autonomous vehicles! Topics include deep learning, computer vision, sensor fusion, localization, control, path planning, and system integration. You'll cover the technical challenges and trends of self-driving cars and the autonomous vehicle industry. Review examples of the projects that Udacity students build to learn and showcase their autonomous vehicle skills.

25-minute Talk David Silver - Self-Driving Car Team Lead, Udacity
S7184 - How to Bring Engineering Datasets on Head-Mounted Displays

Hear visualization experts explain why people in professional visualization, in particular virtual engineering, are great candidates to unleash the full potential of HMDs and how close today's technology pushes application developers to the finish line of discovering massive datasets with HMDs. Learn about new hardware (NVIDIA Pascal-powered NVIDIA Quadro GPUs), extensions, APIs (NVIDIA VRWorks: NVIDIA SLI VR, Single Pass Stereo), techniques (GPU culling), and next steps that enable ESI to create amazing VR experiences even with high node and triangle count.

50-minute Talk Andreas Mank - Line Manager ADAS, Elektrobit
Ingo Esser - Senior Developer Technology Engineer, NVIDIA
S7128 - How to Enable NVIDIA CUDA Stream Synchronous Communications Using GPUDirect

Learn how to enable CUDA stream synchronous communications in your applications by employing novel GPUDirect features.

50-minute Talk Elena Agostini, NVIDIA
Davide Rossetti - Senior Software Engineer, NVIDIA
S7842 - How Triage is Detecting Skin Cancer from Smartphones with Deep Learning (Presented by Triage)

You'll learn how Triage is using deep learning to diagnose skin cancer from any smartphone. 1 in 3 cancer diagnoses is skin cancer and 1 in 5 Americans will develop skin cancer in their lifetime. The average wait time to see a dermatologist in the United States is 1 month and even greater in other parts of the world. In that time skin disorders can worsen or become life threatening. Triage's Co-Founder and CEO, Tory Jarmain, will demonstrate how they trained a Convolutional Neural Network to instantly detect 9 in 10 cancer cases with beyond dermatologist-level accuracy. Tory will also show how Triage's technology can identify skin disorders across 23 different categories including acne, eczema, warts and more using Deep Residual Networks.

25-minute Talk Tory Jarmain - CEO, Triage
S7439 - How Video Analytics Help to Improve Efficiency for Broadcasting Industry

Arcvideo is a top video solution provider in China, targeting broadcasting companies, TV stations, and the recently booming game/entertainment live broadcasting and online education markets. Video codec, intelligent video analytics, universal end device player, and cloud video service are the four pillars of our product line. We'll discuss GPU-accelerated intelligent video analytics, which plays an increasingly important role in video-related products and services, bringing more efficiency to handling tons of emerging video content, and better interaction between end users and their video interests.

25-minute Talk Jin Huang - CTO, Arcvideo, Inc.
S7725 - HPC and Machine Learning-Based Applications at Shell

We'll cover applications that range from the traditional HPC world to the more exploratory machine learning-based setup. On the traditional HPC front, we have tested the speedup and scalability of the open-source computational chemistry packages VASP and LAMMPS on HPC clusters fitted with GPGPUs and compared against CPU-only nodes. A performance enhancement of at least 3x is observed, which agrees with available literature. On the machine learning front, we're experimenting with Google's TensorFlow and the applicability of deep learning approaches for a set of challenging problems at Shell, including (a) searching for an optimal dispatch strategy under uncertainty, (b) searching for price prediction patterns in European energy data, and (c) fault detection in raw seismic data. The latter was presented at GTC last year. This time, we'll show scalability results using TensorFlow. GPGPUs have been key in speeding up model training for all these applications.

25-minute Talk Mauricio Araya-Polo - Senior Researcher Computer Science, Shell International Exploration and Production Inc.
S7340 - Hydra: A Framework for Data Analysis in Massively Parallel Platforms

We'll discuss Hydra, a templatized, header-only, C++11-compliant library for data analysis on massively parallel platforms targeting, but not limited to, the field of high-energy physics research. Hydra supports the description of particle decays via the generation of phase-space Monte Carlo, generic function evaluation, data fitting, multidimensional adaptive numerical integration, and histogramming.

25-minute Talk Antonio Augusto Alves Junior - Post-doc, University of Cincinnati
S7123 - IBM Watson: AI in VR

Learn about the intersection between the emerging fields of artificial intelligence and virtual reality. With applications in science, training, therapy, rehabilitation, productivity, education, and, yes, gaming, attendees can expect to learn from concrete examples drawn from the speaker's own experience, see industry trends, and understand more about where AI and VR are going in both the near- and long-term future.

25-minute Talk Michael Ludden - Senior Product Manager at IBM Watson, IBM Watson
S7477 - IFM: Intelligent Flying Machines Counting Inventory

We'll describe how Intelligent Flying Machines is leveraging NVIDIA GPU technology to fully automate inventory counting in warehouses with flying robots. Three different topics will be covered: First, we'll talk about the challenges of commercializing advanced robotics technology for industrial applications and how IFM has developed a framework that enables effective deployment and implementation. Then, we'll discuss recent advances in leveraging an onboard Jetson TX1 GPU for highly accurate, long-distance visual inertial navigation in a warehouse environment. Finally, we'll show how IFM is using deep learning to enable its flying robots to adapt to different types of warehouses and identify key pieces of information in the environment.

50-minute Talk Marc Gyongyosi - CEO and Founder, Intelligent Flying Machines
Sushobhan Ghosh - AI Lead Engineer, Intelligent Flying Machines
S7832 - Illuminating AI: Understanding the AI's Goals, Reasoning and Compromises

Why does existing AI need to be explainable? Applying AI where someone may get hurt requires: safety regulation, explanation & understanding. But the internal decisions of AI are not explainable, earning a well-deserved designation of "black-box". Thus until AI becomes more transparent, applications will remain limited in fields where safety and understanding are paramount such as medicine, self-driving cars and banking. Illuminated AI helps debug your models by revealing goals of the network and insights from test cases. With this knowledge you, your customers and those who certify AI algorithms can be more confident of your AI. We enter a new era of introspective AI that is more trusted and safer, allowing broader applications in medicine, self-driving cars and banking.

50-minute Talk Tsvi Achler - CEO, Optimizing Mind
L7144 - Image Classification and Object Detection using NVIDIA Jetson TX2 (Presented by NVIDIA Deep Learning Institute)

Learn to build an end-to-end Deep Learning pipeline. You'll develop the skills to not only train a deep neural network but also how to deploy it in a production environment. In this lab, you take pre-trained image classification and object detection networks and deploy them on Jetson TX1 or TX2 Developer Kits. You will then test these networks using the built-in camera to classify and detect several real-world objects. The networks will be deployed in a variety of programming environments, and we will even cover how to optimize classification and detection performance at runtime using NVIDIA's TensorRT inference engine library. Prerequisites: None

120 Instructor-Led Lab Michael Mendelson - Curriculum Designer & Certified Instructor, NVIDIA Deep Learning Institute
L7129 - Image Classification & Object Detection using Deep Learning & MATLAB

This lab will use an object recognition/image classification example to teach how to apply deep learning to practical problems. You will learn how to: import and manage large datasets; train, evaluate and compare different deep learning models; extract discriminative information from images; and use transfer learning to fine-tune neural networks for new tasks. We will use the new MATLAB framework for deep learning and real-world examples including data used for ADAS and autonomous driving. Prerequisites: Prior experience or familiarity with MATLAB is preferred as MATLAB is used for all examples and exercises. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Avi Nehemiah - Product Manager- Computer Vision and Automated Driving, MathWorks
L7131 - Image Classification using the Theano Python Library

You will learn how to use the Theano framework, a software compiler/library based on Python, to classify images using the LeNet model as well as work through a few other useful machine learning examples accelerated on NVIDIA GPU. Prerequisites: Some experience with Python. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Frederic Bastien - Team Lead Software Infrastructure, Universite de Montreal
L7120 - Image Classification with DIGITS (Presented by NVIDIA Deep Learning Institute)

Learn how to leverage deep neural networks (DNN) within the deep learning workflow to solve a real-world image classification problem using NVIDIA DIGITS. You will walk through the process of data preparation, model definition, model training and troubleshooting. You will use validation data to test and try different strategies for improving model performance using GPUs. On completion of this lab, you will be able to use DIGITS to train a DNN on your own image classification application. Prerequisites: None. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Michael Mendelson - Curriculum Designer & Certified Instructor, NVIDIA Deep Learning Institute
Charles Killam - Curriculum Designer & Certified Instructor, NVIDIA
L7145 - Image Classification with TensorFlow: Radiomics - 1p19q Chromosome Status Classification using Deep Learning (Presented by NVIDIA Deep Learning Institute)

Thanks to work being performed at Mayo Clinic, approaches using deep learning techniques to detect Radiomics from MRI imaging can lead to more effective treatments and yield better health outcomes for patients with brain tumors. Radiogenomics, specifically Imaging Genomics, refers to the correlation between cancer imaging features and gene expression. Imaging Genomics (Radiomics) can be used to create biomarkers that identify the genomics of a disease without the use of an invasive biopsy. The focus of this lab is detection of the 1p19q co-deletion biomarker using deep learning - specifically convolutional neural networks - using Keras and TensorFlow. What is remarkable about this research and lab is the novelty and promising results of utilizing deep learning to predict Radiomics. Prerequisites: Basic understanding of convolutional neural networks and genomics. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Charles Killam - Curriculum Designer & Certified Instructor, NVIDIA
S7447 - Image Restoration with Neural Networks

We'll show how image restoration tasks, such as image denoising and demosaicking, super-resolution, and JPEG deblocking, can beat the state-of-the-art methods when performed with neural networks. In particular, we'll show that even a shallow network can produce good results when it is trained to evaluate images in the same way in which humans do, that is, when perceptual loss functions are used in training. We will also discuss the strengths and limitations of different perceptual loss functions.

25-minute Talk Orazio Gallo, NVIDIA
L7122 - Image Segmentation with TensorFlow (Presented by NVIDIA Deep Learning Institute)

There are a variety of important applications that need to go beyond detecting individual objects within an image, and that instead need to segment the image into spatial regions of interest. An example of image segmentation involves medical imagery analysis, where it is often important to separate the pixels corresponding to different types of tissue, blood or abnormal cells, so that you can isolate a particular organ. Another example includes self-driving cars, where segmenting an image into distinct areas is needed to understand road scenes. In this lab, you will learn how to train and evaluate an image segmentation network using TensorFlow. Prerequisites: Basic knowledge of TensorFlow. This lab utilizes GPU resources in the cloud, you are required to bring your own laptop.

120 Instructor-Led Lab Jonathan Bentz - Solutions Architect, NVIDIA
S7757 - Immersive Optical-See-Through AR with Meta 2

Meta is soon releasing its second-generation optical see-through display. Different from video see-through displays, which first capture the view and then augment it with additional graphics content, in optical see-through displays, the user sees the environment without any latency, at the speed of light, yielding no conflicts between the visual input and the vestibular system. Many optical see-through technologies produce a relatively narrow field of view, but Meta's design offers a large, 90-degree one. A large field of view provides a higher sense of immersion, instead of providing a floating window to an AR world. Our current system is tethered to a PC with a powerful GPU. We'll give an overview of the Meta 2 display, and the sensors it uses for tracking the environment and the user.

25-minute Talk Kari Pulli - CTO, Meta Co.
L7141 - Immersive VR with NVIDIA VR Funhouse

In this hands-on lab, learn Unreal Engine 4 VR content creation using NVIDIA VR Funhouse. Attendees will use the mod editor, blueprints, and assets to learn how to create VR experiences. Attendees will be able to use VR-ready laptops and HMDs. Prerequisites: Familiarity with Epic's Unreal Engine 4 is recommended, but not required.

120 Instructor-Led Lab Lou Rohan - Engineer, Tech SW, NVIDIA
Terry Mosier - Technician, NVIDIA
Amanda Bott - Artist, NVIDIA
S7524 - Impacts and Paradigms Enabled by GPUs in Engineering Simulations of Discrete Elements

We'll explore the impact of the GPU in engineering simulations of discrete elements and glimpse into the future of simulations and engineering training. We consider the roles played by the open-source Blaze-DEMGPU framework we developed, as well as the commercial framework XPS, developed specifically for the pharmaceutical industry by RCPE GmbH (Research Center Pharmaceutical Engineering GmbH), which allows engineers to simulate process changes before they are actually implemented. Industrial-scale discrete element simulations remain a big challenge, but the GPU architecture is changing that perception fast, as demonstrated by the open-source framework Blaze-DEM and the commercial framework XPS. However, engineering simulation remains characterized by either the analyze-wait-modify-analyze cycle or, more recently, the batch analyze-wait-modify-batch analyze cycle. The GPU is enabling a new, alternative paradigm called interactive simulation and design (ISD), as demonstrated by Blaze-DEMGPU. We'll explore the algorithmic development of Blaze-DEMGPU in detail, with a short historical tour outlining its development as GPU architectures changed from Kepler to Pascal, enabling higher-fidelity models, in addition to the natural progression from the conventional analysis cycle towards ISD and the various roles machine learning can play.

25-minute Talk Daniel N. Wilke - Senior Lecturer (PhD) in the Department of Mechanical and Aeronautical Engineering, University of Pretoria
Nicolin Govender - Senior Scientist, Research Center Pharmaceutical Engineering GmbH / CSIR
S7166 - Implementing High-Resolution Fluid Dynamics Solver in a Performance Portable Way

We'll report on the use of the Kokkos C++ library for designing new performance-portable implementations of the algorithms used in astrophysics computational fluid dynamics applications. Among other libraries with similar features, Kokkos, which is developed at Sandia National Laboratories, provides a very promising way of designing high-performance parallel computing applications with performance portability across multiple hardware architectures, code readability, and high productivity in mind. Many scientific domains use community codes developed by tens of developers, and such a high-level approach will help them use today's GPUs and the next generations productively. We'll illustrate several advantages of our new Kokkos-based implementation of the computationally intensive compressible magneto-hydrodynamics kernels involved in the code RamsesGPU, and demonstrate its efficiency on a multi-GPU platform (NVIDIA Pascal P100).

25-minute Talk Pierre Kestener - Research Engineer, CEA
S7504 - Improving Consumer Compliance Through Better Product Recommendation - New Skin Advisor Tool Powered by AI

Consumers currently struggle to find the right cosmetic skin care products suited to their personal needs and preferences. Hundreds of brands and product forms sit next to each other on store shelves without simple and intuitive means for consumers to determine what's right for them. A new skin advisor tool has been developed to deliver a personalized beauty consultation, tailored to a consumer's unique skin needs, right at her fingertips. We identified that getting the right level of educational information to the consumer, combined with understanding her concerns and aesthetic preferences, can drive product compliance. We collected over 50,000 images of women of known chronological age and built a deep convolutional neural network model that could not only predict a woman's visible skin age with great accuracy but also identify which areas of her face she should focus her skincare on to improve her skin appearance. Skin age accuracy was validated against image gradings from over 350 dermatologists. We'll discuss how we used NVIDIA GPUs and deep learning techniques to develop this new tool.

25-minute Talk Matthew L. Barker, Ph.D. - Principal Data Scientist, Procter & Gamble
S7317 - Improving Network Accuracy With Augmented Imagery Training Data

One of the biggest challenges in machine learning today is producing the training data. We'll compare different methods for augmenting a medical imagery training dataset for supervised learning. The different augmentation methods are assessed with respect to their impact on cost, network accuracy, and overfitting. We'll focus on prostate cancer data from the Joint Pathology Center, which is being used in the White House Cancer Moonshot project.

25-minute Talk Niels Olson - Pathology Resident, Naval Medical Center San Diego
Theodore Hromadka - Senior Software Engineer, Integrity Applications Incorporated
S7107 - Improving Patient Care Using EchoPixel's Interactive Virtual Reality Technology

Get the latest information on how virtual reality is being used to change healthcare outcomes. EchoPixel, a company focused on VR in healthcare, has developed the True 3D Viewer, a real-time, interactive VR platform. It offers physicians an unprecedented opportunity to view and interact with patient tissues and organs in an open 3D space as if they were real, physical objects. The resulting improvement in clinical efficacy and workflow has had a significant positive impact on patient care.

25-minute Talk Janet Goldenstein - Lead Engineer, EchoPixel
L7115 - In-Depth Performance Analysis for OpenACC/CUDA/OpenCL Applications with Score-P and Vampir

Work with Score-P/Vampir to learn how to dive into the execution properties of CUDA and OpenACC applications. We'll show how to use Score-P to generate a trace file and how to study it with Vampir. Additionally, we'll use the newly established OpenACC tools interface to show how OpenACC applications can be studied for performance bottlenecks. Prerequisites: Basic knowledge of CUDA/OpenACC and MPI is recommended but not required. This lab uses GPU resources in the cloud, so you are required to bring your own laptop.

120 Instructor-Led Lab Robert Henschel - Director Science Community Tools, Indiana University
Jiri Kraus - Senior Devtech Compute, NVIDIA
Guido Juckeland - Head of Computational Science Group, Helmholtz-Zentrum Dresden-Rossendorf
S7756 - Industrial-Grade Haptics with HaptX and PhysX

Virtual reality can realize its full potential through advanced haptic technology. By adding motion, force feedback, and the full spectrum of touch sensations, a whole new class of applications becomes possible. We'll dive into the components of realistic touch: tactile, vibration, force feedback, and thermal. Using these four modes of haptic feedback, AxonVR's HaptX platform can reproduce virtually any sensation. HaptX extracts the physical properties of virtual objects with the help of PhysX. It then renders this information as realistic haptic feedback through HaptX Skin, a smart textile that enables users to feel what they see in VR. The combination of NVIDIA's PhysX and AxonVR's HaptX enables an entirely new class of enterprise and entertainment applications in VR.

25-minute Talk Bob Crockett - Co-founder and Lead Engineer, AxonVR
S7679 - Industrial-Level Deep Learning Training Infrastructure: the Practice and Experience from SenseTime

We'll share the practice and experience of how to build an industrial-level deep learning training infrastructure at SenseTime, a leading artificial intelligence company. First, we'll share a new deep learning training framework that was developed by SenseTime from scratch. Second, we'll share the experience of how to build a specially optimized GPU supercomputer for deep learning. Finally, we'll show some applications developed from this training platform.

25-minute Talk Shengen Yan - R&D Director, SenseTime Group Limited
S7821 - Industrial Strength AI & Imaging Analytics

We'll focus on some of the knowledge gained from producing industrial-strength imaging systems. At General Electric, any imaging analytics and AI are required to perform at a very high level. False positive and false negative rates need to be sustained at very low levels in live production environments. We'll look at some of the very challenging use cases, including how we conduct inspection on-wing (i.e., without removing an engine from an aircraft) using a borescope, as well as in repair shops, where high-performing imaging AI is needed to meet very strict requirements when repairing aircraft engine parts. We will also take a look at drone-based aerial inspection use cases that put very high demands on the accuracy of anomaly detection.

50-minute Talk Ser Nam Lim - Director of Advance Analytics and Machine Learning, Senior Principal, General Electric
S7487 - Infrastructure Differentiation in Satellite Imagery with Convolutional Neural Networks

We'll discuss efforts to apply state-of-the-art deep learning frameworks to the task of broad area search in satellite imagery. Infrastructure projects with very different purposes often look very similar in satellite imagery, and we'll explore the ability of deep learning frameworks to disentangle such classes. A similar complication applies to vehicles viewed from space, where object sizes are often only a couple dozen pixels.

25-minute Talk Adam Van Etten - Research Scientist, IQT
S7798 - Inside Volta

The NVIDIA Volta architecture powers the world’s most advanced data center GPU for AI, HPC, and Graphics. Features like Independent Thread Scheduling and game-changing Tensor Cores enable Volta to simultaneously deliver the fastest and most accessible performance of any comparable processor. Join two lead hardware and software architects for Volta on a tour of the features that will make Volta the platform for your next innovation in AI and HPC supercomputing.

50-minute Talk Olivier Giroux - Principal Architect, NVIDIA
Luke Durant - Principal Engineer, CUDA Software, NVIDIA
S7790 - Insights From the First Year of VR

There are a myriad of choices to make when jumping into VR development. We'll explore how to navigate those decisions, and what the lessons from this first generation of VR content mean for future titles.

25-minute Talk Jason Holtman - Head of Publishing, Oculus
S7689 - Integrating Deep Learning Platforms Within Enterprise Level Medical Imaging Environments

We'll discuss how deep learning algorithms could be successfully utilized in medical imaging. We'll mainly focus on integration strategies between NVIDIA DIGITS and clinical systems, such as the Picture Archive and Communications System, electronic health records, and clinical and research data warehouses for secondary use of captured clinical data. Attendees should already be familiar with the fundamentals of electronic health record communication standards, clinical data warehouses, image processing, and deep learning.

25-minute Talk Barbaros Erdal - Assistant Professor, The Ohio State University Wexner Medical Center
S7864 - Intelligent Automation using Deep Learning in Financial Services - Banking to Insurance

The long-term goal of any financial institution is to serve its users with the best possible experience within the limits of its resources. This is only possible when financial institutions adopt intelligent systems, and the success of such systems depends heavily on their intelligence. Deep learning has given financial institutions a huge opportunity to start planning and building large-scale intelligent systems that are multifunctional and adaptive. In this talk, we will discuss how we used deep learning, with Vega as the platform and GPUs, to build large-scale automation use cases, from fraud detection to complex process automation, in both banking and insurance.

25-minute Talk Vinay Kumar Sankarapu - CEO/Founder,
S7630 - Intelligent Chatbot on WeChat

As one of the biggest social networks in the world, WeChat is innovating the way people acquire needed information, knowledge, and services. We'll present the chatbot development effort made by the WeChat AI team. Both conversational and service-oriented bots are covered. We'll share our vision, our technology, and some good lessons learned while building real-world chatbot products. In particular, we'll demonstrate how deep learning technologies are used to overcome some hard natural language understanding problems: for example, how the fasttext algorithm is tuned to improve short-question understanding, how to make better use of dialog context with a recurrent neural network, and how to diversify the dialog response within the encoder-decoder framework. We'll also report our algorithm performance on different GPU cards.

50-minute Talk Cheng Niu - WeChat AI Lab Principal staff, Tencent
S7199 - Interactive HPC: Large Scale In-situ Visualization using NVIDIA IndeX in ALYA MultiPhysics

We'll discuss how NVIDIA IndeX Advanced Rendering Tools are helping researchers get more insight through in-situ visualizations. HPC applications have always been centered around large computations, small input, and extremely large simulated output. HPC applications running on big supercomputers are executed using a queuing system, where researchers have to wait a couple of hours before analyzing the outputs. We've designed essential software components that allow in-situ visualizations of sparse volume data from the ALYA multiphysics simulation code (Barcelona Supercomputing Center) using NVIDIA IndeX. ALYA multiphysics is one of the two European exascale benchmarks and is used in targeted medicine, cardiac modeling, renewable energy, etc. We'll guide you through the techniques that have been used to enable in-situ rendering and analysis of data.

50-minute Talk Christopher Lux - Senior Graphics Software Engineer, NVIDIA IndeX R&D, NVIDIA
Marc Nienhaus - Sr. Engineering Manager, Product Technology Lead, NVIDIA IndeX, NVIDIA
Vishal Mehta - Senior Engineer, Barcelona Supercomputing Center
L7146 - Interactive HPC Volume Visualization in ParaView

During this lab session, you will learn about the robust features of the NVIDIA IndeX volume visualization tool and how you can take advantage of these features inside ParaView with a simple drop-down menu. The IndeX plugin enables large-scale, high-fidelity visualization at interactive frame rates. For the tutorial, we will analyze unstructured data. We also encourage you to bring your own dataset in any ParaView-supported format on a flash drive or hard drive so that you can experience the interactivity and the ease of analyzing your data. We are limiting the dataset size to 15-20GB for a structured grid and 75-100 million cells for an unstructured grid, since the lab sessions will run on a single-GPU system. IndeX is scalable, and you can expect the same level of interactivity on an HPC system.

120 Instructor-Led Lab Mahendra Roopa - Software Product Manager, NVIDIA IndeX Plugins, NVIDIA