NVIDIA GTC San Jose 2017

Recordings now available to registered pass holders.

S7859 - 3D Cloud Streaming for Mobile and Web Applications
Learn how Microsoft is extending WebRTC to enable real-time, interactive 3D streaming from the cloud to any remote device. The goal is to provide an open toolkit that lets industries leverage remote cloud rendering in their service and product pipelines. This is essential in industries where the scale and complexity of 3D models, scenes, physics, and rendering are beyond the capabilities of a mobile device. We are extending the industry-standard WebRTC framework to 3D scenarios, including mixed reality, and will walk through the work we are doing to realize the goal of delivering high-quality 3D applications to any client: web, mobile, desktop, and embedded. This is only possible using the NVIDIA NVENC pipeline for server-side rendering in the cloud.
25-minute Talk
Tyler Gibson - Senior Software Engineer, Microsoft
S7149 - 3D DeepObject for Precision 3D Mapping
3D DeepObject achieves mapping-level positional accuracy. In the geospatial intelligence space, positional accuracy is as important as precision and recall. Unfortunately, convolutional networks in deep learning are invariant to translation; in other words, the positional accuracy of deep learning object detection is inherently poor. By combining deep learning and 3D model fitting, 3D DeepObject gets the best of both worlds: deep learning can detect an object (a bounding box) with close to human-level accuracy, while 3D model fitting can achieve pixel-level positional accuracy. The bounding boxes output by deep learning become the input for 3D model fitting, significantly reducing its search space. Our latest tests indicate that 3D DeepObject achieves much higher positional accuracy than either deep learning or 3D model fitting alone.
25-minute Talk
Bingcai Zhang - Tech Fellow, BAE Systems
S7289 - 3D Human Motion Capture from 2D Video Using Cloud-Based CNNs
This talk provides a brief overview of how to apply GPU-based deep learning techniques to extract 3D human motion capture from standard 2D RGB video. We describe in detail the stages of our CUDA-based pipeline, from training to cloud-based deployment. Our training system is a novel mix of real-world data collected with Kinect cameras and synthetic data based on rendering thousands of virtual humans generated in the Unity game engine. Our execution pipeline is a series of connected models, including 2D video to 2D pose estimation and 2D pose to 3D pose estimation. We describe how this system can be integrated into a variety of mobile applications, ranging from social media to sports training. A live demo using a mobile phone connected to an AWS GPU cluster will be presented.
25-minute Talk
Paul Kruszewski - Founder & CEO, wrnch
S7425 - 3D Printing with NVIDIA GVDB Voxels
As printing hardware improves, 3D printing allows for unique processes, finer details, better quality control, and a wider range of materials. With these improvements comes the need for greater computational power and control over 3D-printed objects. We introduce NVIDIA GVDB Voxels, an open source SDK for voxel-based 3D printing workflows. Traditional workflows are based on processing polygonal models and STL files for 3D printing. However, such models don't allow for continuous interior changes in color or density, for descriptions of heterogeneous materials, or for user-specified support lattices. Using the new NVIDIA GVDB Voxels SDK, we demonstrate practical examples of design workflows for complex 3D-printed parts with high-quality ray-traced visualizations, direct data manipulation, and 3D-printed output.
25-minute Talk
Rama Hoetzlein - Graphics Research Engineer, NVIDIA
Jun Zeng - Principal Scientist, HP Labs
S7197 - 4K Video Processing and Streaming Platform on TX1
Learn how to build a platform for processing and streaming 4K video on the NVIDIA Jetson TX1 processor. To achieve real-time video processing, the diverse processing resources of this high-performance embedded architecture need to be employed optimally. The heterogeneous system architecture of the Jetson TX1 allows capturing, processing, and streaming of video with a single chip. The main challenges lie in the optimal utilization of the different hardware resources of the Jetson TX1 (CPU, GPU, dedicated hardware blocks) and in the software frameworks. We'll discuss variants, identify bottlenecks, and show the interaction between hardware and software. Simple capturing and displaying of 4K video can be achieved using existing out-of-the-box methods. However, GPU-based enhancements were developed and integrated for real-time video processing tasks (scaling and video mixing).
25-minute Talk
Tobias Kammacher - Researcher, Zurich University of Applied Sciences
S7310 - 8-Bit Inference with TensorRT
We'll describe a method for converting FP32 models to 8-bit integer (INT8) models for improved efficiency. Traditionally, convolutional neural networks are trained using 32-bit floating-point arithmetic (FP32) and, by default, inference on these models employs FP32 as well. Our conversion method doesn't require re-training or fine-tuning of the original FP32 network. A number of standard networks (AlexNet, VGG, GoogLeNet, ResNet) have been converted from FP32 to INT8 and have achieved comparable Top 1 and Top 5 inference accuracy. The methods are implemented in TensorRT and can be executed on GPUs that support new INT8 inference instructions.
25-minute Talk
Szymon Migacz - CUDA Library Software Engineer, NVIDIA
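The FP32-to-INT8 conversion described above rests on linear quantization with a per-tensor scale. Here is a minimal sketch of that idea, using simple max-abs scaling rather than TensorRT's actual entropy-calibration machinery; the function names and the scale choice are illustrative assumptions, not the talk's method:

```python
# Minimal sketch of symmetric linear quantization from FP32 to INT8.
# TensorRT chooses scales via calibration; here we use simple max-abs
# scaling for illustration (an assumption, not TensorRT's API).

def quantize_int8(values):
    """Map FP32 values to INT8 with a per-tensor scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return [qi * scale for qi in q]

weights = [0.05, -1.27, 0.63, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# per-element quantization error is bounded by half the scale
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The key point the talk's calibration addresses is choosing `scale` well: a naive max-abs scale wastes INT8 range on outliers, which is why TensorRT picks a saturation threshold instead.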
L7132 - Accelerated Analytics and Graph Visualization
In this lab, you'll learn how to use a GPU-accelerated graph visualization engine in combination with a GPU-accelerated database. By combining these technologies, we can visually explore a large network dataset and identify port scan, distributed denial of service, and data exfiltration events. By the end of this lab, you'll know how to load data for accelerated querying and analysis, build graph visualizations using the GPU-accelerated database as a data source, and explore large-scale data visualization. Prerequisites: No prerequisite skills are necessary, but basic knowledge of SQL and Python would be helpful. This lab utilizes GPU resources in the cloud; you are required to bring your own laptop.
120-minute Instructor-Led Lab
Keith Kraus - Senior Engineer of Applied Solutions Engineering, NVIDIA
Michael Balint - Senior Manager Applied Solutions Engineering, NVIDIA
Deepti Jain - Senior Applied Solutions Engineer, NVIDIA
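As a stand-in for the lab's load-then-query workflow, here is a minimal sketch using SQLite in place of the GPU-accelerated database; the table, columns, and the port-scan heuristic are invented for illustration:

```python
import sqlite3

# Load a small flow table, then aggregate to surface a suspicious host.
# The actual lab uses a GPU-accelerated database; SQLite stands in here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flows (src TEXT, dst TEXT, dst_port INTEGER)")
conn.executemany("INSERT INTO flows VALUES (?, ?, ?)",
                 # one source probing many ports (port-scan pattern) ...
                 [("10.0.0.5", "10.0.0.9", p) for p in range(20, 30)]
                 # ... plus ordinary repeated HTTPS traffic
                 + [("10.0.0.7", "10.0.0.9", 443)] * 3)

# many distinct destination ports from one source suggests a port scan
rows = conn.execute("""
    SELECT src, COUNT(DISTINCT dst_port) AS ports
    FROM flows GROUP BY src ORDER BY ports DESC
""").fetchall()
assert rows[0] == ("10.0.0.5", 10)
```

The same aggregate-then-rank pattern is what feeds the graph visualization layer in the lab.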
S7774 - Accelerated Analytics Industry Use Cases
Companies of all sizes and in all industries are driven toward digital transformation. Failure to adapt places businesses at increased risk in current and future competitive markets. Limited by slow compute, enterprises struggle to quickly gain valuable insights, monetize their data, enhance customer experience, optimize operational efficiency, and prevent fraudulent attacks, all at the same time. NVIDIA helps provide deeper insights, enable dynamic correlation, and deliver predictive outcomes at superhuman speed, accuracy, and scale. We'll highlight specific accelerated analytics use cases -- powered by the NVIDIA Tesla platform, the DGX-1 AI supercomputer, and NVIDIA GPU-accelerated cloud computing -- in the finance, oil and gas, manufacturing, retail, and telco industries.
25-minute Talk
Renee Yao - Product Marketing Manager, Deep Learning and Analytics, NVIDIA
S7332 - Accelerated Astrophysics: Using NVIDIA DGX-1 to Simulate and Understand the Universe
Get an overview of how GPUs are used by computational astrophysicists to perform numerical simulations and process massive survey data. Astrophysics represents one of the most computationally heavy sciences, where supercomputers are used to analyze enormous amounts of data or to simulate physical processes that cannot be reproduced in the lab. Astrophysicists strive to stay on the cutting edge of computational methods to simulate the universe or process data faster and with more fidelity. We'll discuss two important applications of GPU supercomputing in astrophysics. We'll describe the astrophysical fluid dynamics code CHOLLA that runs on the GPU-enabled supercomputer Titan at Oak Ridge National Lab and can perform some of the largest astrophysical simulations ever attempted. Then we'll describe the MORPHEUS deep learning framework that classifies galaxy morphologies using the NVIDIA DGX-1 deep learning system.
25-minute Talk
Brant Robertson - Associate Professor of Astronomy and Astrophysics, University of California, Santa Cruz
S7753 - Accelerated Deep Learning Within Reach - Supercomputing Comes to Your Cube
Deep learning practitioners have traditionally been forced to spend protracted cycle time cobbling together platforms using consumer-grade components and unsupported open source software. Learn (1) the benefits of rapid experimentation and deep learning framework optimization as a precursor to scalable production training in the data center, (2) the technical challenges that must be overcome for extending deep learning to more practitioners across the enterprise, and (3) how many organizations can benefit from a powerful enterprise-grade solution that's pre-built, simple to manage, and readily accessible to every practitioner.
25-minute Talk
Markus Weber - Senior Product Manager, NVIDIA
S7117 - Accelerating Cross-Validation in Spark Using GPU
Learn how to better utilize GPUs to accelerate cross-validation in Spark, which is widely used in many big data analytics and machine learning applications.
25-minute Talk
Minsik Cho - Research Staff Member, IBM Research
S7150 - Accelerating cuBLAS/cuDNN Using Input-Aware Auto-Tuning: The ISAAC Library
This session describes the design and implementation of ISAAC, an open-source framework for GEMM and CONV that provides improved performance over cuBLAS and cuDNN. Attendees will learn about input-aware auto-tuning, a technique that relies on machine learning models to automatically derive input- and hardware-portable PTX kernels. Benchmarks will be provided for GEMM and CONV in the context of LINPACK, DeepBench, ICA, and SVD, showing up to 3x performance gains over vendor libraries on a GTX 980 and a Tesla P100.
25-minute Talk
Philippe Tillet - Ph.D. Candidate, Harvard University
S7383 - Accelerating Cyber Threat Detection with GPU
Analyzing vast amounts of enterprise cyber security data to find threats is hard. Cyber threat detection is also a continuous task, and because of financial pressure, companies have to find optimized solutions for this volume of data. We'll discuss the evolution of big data architectures used for cyber defense and how GPUs are allowing enterprises to do better threat detection more efficiently. We'll (1) briefly cover the evolution from traditional platforms to lambda architectures with new approaches like Apache Kudu, and ultimately to GPU-accelerated solutions; (2) survey current GPU-accelerated database, analysis, and visualization technologies (such as MapD and Graphistry) and the problems they solve; (3) discuss the need to move beyond traditional table-based data stores to graphs for more advanced data exploration, analytics, and visualization; and (4) present the latest advances in GPU-accelerated graph analytics and their importance for improved cyber threat detection.
50-minute Talk
Joshua Patterson - Applied Solutions Engineering Director, NVIDIA
Michael Wendt - Manager of Applied Solutions Engineering, NVIDIA
S7321 - Accelerating Document Retrieval and Ranking for Cognitive Applications
Based on a comprehensive performance study of Watson workloads, we'll deep dive into optimizing critical retrieve and rank functions using GPU acceleration. The performance of cognitive applications like answering natural language questions heavily depends on quickly selecting the relevant documents needed to generate a correct answer. While analyzing the question to determine appropriate search terms, weights, and relationships is relatively quick, retrieving and ranking a relevant subset from millions of documents is a time-consuming task. Only after completing it can any advanced natural language processing algorithms be effective.
25-minute Talk
David Wendt - Programmer, IBM
Tim Kaldewey - Performance Architect, IBM Watson
S7656 - Accelerating HD Map Creation with GPUs
We'll explain how GPUs can accelerate the development of HD maps for autonomous vehicles. Traditional mapping techniques take weeks to produce highly detailed maps because massive volumes of data, collected by survey vehicles with numerous sensors, are processed, compiled, and registered offline manually. We'll describe how Japan's leading mapping company uses the concept of a cloud-to-car AI-powered HD mapping system to automate and accelerate the HD mapping process, including actual examples of GPU data processing that use real-world data collected from roads in Japan.
25-minute Talk
Shigeyuki Iwata - Manager, Research & Development Office, ZENRIN Corporation
S7831 - Accelerating High-Frequency Nonlinear Earthquake Simulations on OLCF Titan and NCSA Blue Waters
The highly nonlinear, multiscale dynamics of large earthquakes is a difficult physics problem that challenges HPC systems at extreme scale. We'll introduce our optimized CUDA implementation of Drucker-Prager plasticity in AWP-ODC, which utilizes the GPU's memory bandwidth highly efficiently and helps the code scale to the full size of the Titan system. We demonstrate the dramatic reduction in the level of shaking in the Los Angeles basin by performing a nonlinear M 7.7 earthquake simulation on the southern San Andreas fault for frequencies up to 4 Hz using Blue Waters and Titan. Fully realizing the projected gains of using nonlinear ground-motion simulations for controlling sources will improve hazard estimates, with broad impact on risk reduction and enhanced community resilience, especially for critical facilities such as large dams, nuclear power plants, and energy transportation networks.
25-minute Talk
Daniel Roten - Computational Scientist, SDSC
Yifeng Cui - Lab Director, San Diego Supercomputing Center
S7593 - Accelerating 3D Elastic Reverse-Time-Migration Algorithms on NVIDIA GPUs
We'll cover the optimization details and the inspiring performance results of using NVIDIA Kepler GPUs to accelerate the 10th-order three-dimensional elastic reverse-time-migration (RTM) algorithm. As an essential migration method in seismic applications for imaging underground geology, the RTM algorithm is particularly complex due to its computational workflow and is generally the most time-consuming kernel. In particular, RTM algorithms based on elastic wave equations (elastic RTM) are generally more computationally intense than RTM methods for acoustic constant-density media (acoustic RTM). In recent years, the desire to cover larger regions and acquire better resolution has further increased the algorithmic complexity of RTM. Therefore, computing platforms and optimization methods that can better meet such challenges in seismic applications are in great demand. In this work, we first modify the backward process in the RTM matrix format by adding extra layers, to generate a straightforward stencil that fits well with the GPU architecture. A set of optimization techniques, such as memory tuning and occupancy configuration, is then applied to exploit performance across a set of different GPU cards. By further using the streaming mechanism, we achieve communication-computation overlap among multiple GPUs. The best performance, employing four Tesla K40 GPU cards, is 28 times faster than a fully optimized reference based on a socket with two E5-2697 CPUs. This work proves the great potential of NVIDIA GPU accelerators in future geophysics exploration algorithms.
25-minute Talk
Lin Gan - Dr., Tsinghua University
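The wave-propagation kernel at the heart of RTM is a finite-difference stencil. A deliberately simplified sketch of that pattern follows: 1D and second-order in space for brevity, whereas the talk's code is a 10th-order 3D elastic CUDA kernel, so the scheme and coefficients here are illustrative only:

```python
# Toy 1D wave-equation stencil illustrating the RTM propagation pattern.
# Not the talk's production scheme: that is 10th-order, 3D, and elastic.

def step_wave(u_prev, u_curr, c=0.5):
    """One leapfrog update of the 1D wave equation (Courant number c)."""
    n = len(u_curr)
    u_next = [0.0] * n  # fixed (zero) boundaries at both ends
    for i in range(1, n - 1):
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c * c * laplacian
    return u_next

# a point impulse in the middle of the domain spreads outward
u_prev, u_curr = [0.0] * 101, [0.0] * 101
u_curr[50] = 1.0
for _ in range(20):
    u_prev, u_curr = u_curr, step_wave(u_prev, u_curr)
assert any(abs(x) > 1e-6 for x in u_curr[:50])  # energy propagated left
```

In the GPU version, each thread owns grid points and the stencil's neighbor reads are what make memory tuning and multi-GPU halo exchange (overlapped via streams) the dominant optimization targets.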
S7578 - Accelerating Your VR Applications with VRWorks
Across graphics, audio, video, and physics, the NVIDIA VRWorks suite of technologies helps developers maximize performance and immersion for VR applications. We'll explore the latest features of VRWorks, explain the VR-specific challenges they address, and provide application-level tips and tricks to take full advantage of these features. Special focus will be given to the details and inner workings of our latest VRWorks feature, Lens Matched Shading, along with the latest VRWorks integrations into Unreal Engine and Unity.
50-minute Talk
Edward Liu - Sr. Developer Technology Engineer, NVIDIA
Cem Cebenoyan - Director of Engineering, NVIDIA
S7810 - Acceleration of Multi-Object Detection and Classification Training Process with NVIDIA Iray SDK (Presented by SAP)
Many works using deep CNNs for multi-object detection and classification observe that a high-quality training dataset is even more important than the choice of network type for the best results. We employ the NVIDIA Iray rendering engine and SDK to automatically generate synthetic images and their annotations, which can either be combined with real, manually annotated images as input for the training process or used on their own. In most cases, adding a new entity to the classification/detection list requires reviewing the existing dataset and relabeling it. Our contribution dramatically accelerates this process and allows the training set to be specialized.
50-minute Talk
Tatiana Surazhsky - 3D Graphics Research Expert, SAP Labs Israel LTD
S7564 - Accelerator Programming Ecosystems
Emerging heterogeneous systems are opening up a wealth of programming opportunities. This panel will discuss the latest developments in accelerator programming, where programmers can choose among OpenMP, OpenACC, CUDA, and Kokkos for GPU programming. The panel will shed light on the primary objectives behind the choice of a model: availability across multiple platforms, a rich feature set, applicability to a certain type of scientific code, compiler stability, or other factors. This will be an interactive Q&A session where participants can discuss their experiences with programming model experts and developers.
50-minute Panel
Michael Wolfe - Engineer, NVIDIA
Christian Trott - Senior Member Technical Staff, Sandia National Laboratories
Stephen Olivier - Principal Member of Technical Staff, Sandia National Laboratories
Mark Harris - Chief Technologist, GPU Computing Software, NVIDIA
Randy Allen - Director of Advanced Research, Mentor Graphics
Fernanda Foertter - HPC User Support Specialist/Programmer, Oak Ridge National Laboratory
S7193 - Achieving Portable Performance for GTC-P with OpenACC on GPU, Multi-Core CPU, and Sunway Many-Core Processor
The Gyrokinetic Toroidal Code developed at Princeton (GTC-P) delivers highly scalable plasma turbulence simulations at extreme scales on world-leading supercomputers such as Tianhe-2 and Titan. The aim of this work is to achieve portable performance from a single source code for GTC-P. We developed the first OpenACC implementation for GPU, CPU, and the Sunway processor. The results show that the OpenACC version achieved nearly 90% of the performance of the NVIDIA CUDA version on GPU and of the OpenMP version on CPU; the Sunway OpenACC version achieved a 2.5x speedup across the entire code. Our work demonstrates that OpenACC can deliver portable performance for complex real-science codes like GTC-P. In addition, we propose adding thread-id support to the OpenACC standard to avoid expensive atomic operations for reductions.
25-minute Talk
Stephen Wang - GPU Specialist, Shanghai Jiao Tong University
S7435 - Adapting DL to New Data: An Evolutionary Algorithm for Optimizing Deep Networks
Deep learning has seen a surge of success in imaging and speech applications, thanks to its relatively automatic feature generation and, particularly for convolutional neural networks, high-accuracy classification abilities. While these models learn their parameters through data-driven methods, model selection (architecture construction) through hyper-parameter choices remains a tedious and highly intuition-driven task. To address this, we propose multi-node evolutionary neural networks for deep learning (MENNDL), a method for automating network selection on computational clusters through hyper-parameter optimization performed via genetic algorithms. MENNDL is capable of evolving not only numeric hyper-parameters (for example, the number of hidden nodes or the convolutional kernel size), but also the arrangement of layers within the network.
25-minute Talk
Steven Young - Research Scientist in Deep Learning, Oak Ridge National Laboratory
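The genetic hyper-parameter search described above can be sketched in miniature. Here a toy fitness function stands in for a trained network's validation accuracy (which MENNDL would evaluate on a cluster), and the two-gene layout (hidden units, kernel size) is an assumption for illustration:

```python
import random

# Toy genetic algorithm over hyper-parameters; illustrative only,
# not the MENNDL implementation.

def fitness(genes):
    hidden, kernel = genes
    # surrogate: pretend 64 hidden units and kernel size 3 are optimal
    return -abs(hidden - 64) - 10 * abs(kernel - 3)

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randrange(1, 257), rng.randrange(1, 8))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = (a[0], b[1])                 # crossover of genes
            if rng.random() < 0.3:               # mutate hidden-unit gene
                child = (max(1, child[0] + rng.randrange(-8, 9)), child[1])
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Elitism guarantees the best fitness never decreases across generations; in MENNDL each fitness evaluation is itself a GPU training run, which is what makes the cluster parallelism essential.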
S7312 - ADAS Computer Vision and Augmented Reality Solution
We'll address how next-generation informational ADAS experiences are created by combining machine learning, computer vision, and real-time signal processing with GPU computing. Computer vision and augmented reality (CVNAR) is a real-time software solution, which encompasses a set of advanced algorithms that create mixed augmented reality for the driver by utilizing vehicle sensors, map data, telematics, and navigation guidance. The broad range of features includes augmented navigation, visualization, driver infographics, driver health monitoring, lane keeping, advanced parking assistance, adaptive cruise control, and autonomous driving. Our approach augments drivers' visual reality with supplementary objects in real time, and works with various output devices such as head unit displays, digital clusters, and head-up displays.
25-minute Talk
Sergii Bykov - Technical Lead, Luxoft
S7641 - Additive Manufacturing Simulation on the GPU
Learn how GPUs can accelerate large-scale finite element-based additive manufacturing (AM) simulation. We'll discuss the computational challenges underlying AM simulation, followed by their solution through fast GPU solvers. We'll also present case studies of metal AM and fused-deposition-modeling simulation, with experimental results.
25-minute Talk
Krishnan Suresh - Professor, University of Wisconsin, Madison
S7347 - A Deep Hierarchical Model for Joint Object Detection and Semantic Segmentation
How do we tackle multiple vision tasks from within the same deep neural network? We'll address this problem by proposing a neural network architecture that can simultaneously segment and detect objects within an image. We'll begin with a brief overview of deep learning as applied to computer vision, and various popular methods for object detection and semantic segmentation. We'll then propose our model: a hierarchical architecture that explicitly allows fine-grain information from one task to aid in the performance of coarser tasks. We'll show that our multi-task network outperforms and is faster than networks trained to tackle each task independently. We'll then visualize our network results on the Cityscapes data set and discuss potential applications of our ideas, especially in the context of autonomous driving.
25-minute Talk
Zhao Chen - Machine Learning Software Intern, NVIDIA
S7834 - Advanced GPU Server Architectures and Deep Learning Training for HPC Customers (Presented by Super Micro Computer Inc.)
Machine learning has recently leaped into the computing mainstream and is now advancing across all enterprise applications. GPU usage models are penetrating new industries, and advanced servers with GPUs will take deep learning to new performance levels that augment artificial intelligence. New server architecture innovations will drive higher levels of performance in ML applications. As GPUs become more powerful, GPU networks will need to become more efficient as well. Supermicro has advanced the state of the art in GPU-optimized server architectures, perfect for emerging deep learning applications. Hear the latest on GPU server architectures, along with deep learning customer case studies showing how customers achieved incredible results with Supermicro solutions.
50-minute Talk
Jason Pai - Director, GPU Servers, Super Micro Computer Inc.
Don Clegg - VP Marketing & WW Business Development, Super Micro Computer, Inc.
S7482 - Advances in Real-Time Graphics at Pixar
Explore how real-time graphics are used at Pixar Animation Studios. We'll describe the unique needs for film production and our custom solutions, including Presto and our open-source projects Universal Scene Description (USD), OpenSubdiv, and Hydra. Don't miss this great opportunity to learn about graphics, algorithms, and movies!
50-minute Talk
Pol Jeremias-Vila - Sr. Graphics Engineer, Pixar
David Yu - Senior Graphics Software Engineer, Pixar Animation Studios
Dirk Van Gelder - Software Engineer, Pixar Animation Studios
S7862 - Advancing Accelerated Deep Learning with IBM PowerAI
IBM PowerAI provides the easiest on-ramp for enterprise deep learning. PowerAI helped users break deep learning training benchmarks for AlexNet and VGGNet thanks to the world's only CPU-to-GPU NVIDIA NVLink interface. See how new feature development and performance optimizations will advance the future of deep learning in the next twelve months, including NVIDIA NVLink 2.0, leaps in distributed training, and tools that make it easier to create the next deep learning breakthrough. Learn how you can harness a faster and more capable experience for the future of deep learning.
50-minute Talk
Sumit Gupta - VP, HPC, AI, and Analytics
S7647 - Advancing Our Understanding of Evolutionary Histories Using GPUs: The BEAGLE Library
Estimating the evolutionary history of organisms, phylogenetic inference, is a critical step in many analyses involving biological sequence data such as DNA. These phylogenetic relationships are essential in understanding the evolutionary dynamics of organisms. The likelihood calculations at the heart of the most effective methods for phylogenetic analyses are extremely computationally intensive, and hence these analyses become a bottleneck in many studies. In collaboration with some of the foremost researchers in our area, we have developed an open source library, BEAGLE, which uses GPUs to greatly accelerate phylogenetic analyses. BEAGLE is used by some of the leading programs in the field. We'll describe the phylogenetic inference problem and its importance, and go into details on how we used GPU computing to achieve broad impact in the field.
25-minute Talk
Daniel L. Ayres - Graduate Student, University of Maryland
Michael P Cummings - Professor, University of Maryland
S7783 - A Fast, Unified Method for Object Detection, Instance Segmentation, and Human Pose Estimation
We'll cover state-of-the-art algorithms for image classification, object detection, object instance segmentation, and human pose prediction that we recently developed at Facebook AI Research. Our image classification results are based on the recently developed "ResNeXt" model that supersedes ResNet's accuracy on ImageNet, but much more importantly yields better features with stronger generalization performance on object detection tasks. Using ResNeXt as a backbone, we'll present a unified approach for detailed object instance recognition tasks, such as instance segmentation and human pose estimation. This model builds on our prior work on the Faster R-CNN system with Feature Pyramid Networks, which enables efficient multiscale recognition. We'll describe our platform for object detection research that enables a fast and flexible research cycle. Our platform is implemented on Caffe2 and can train many of these state-of-the-art models on the COCO dataset in 1-2 days using sync SGD over eight GPUs on a single Big Sur server.
25-minute Talk
Ross Girshick - Research Scientist, Facebook
S7857 - AgeAtHome - Deep Learning at the Edge (Presented by IBM)
The need to help elderly individuals or couples remain in their homes is increasing as the global population ages. Cognitive processing can assist the elderly by analyzing information to identify opportunities for caregivers to offer assistance and support. This project seeks to demonstrate ways to improve the elderly's ability to age at home through an understanding of daily activities inferred from passive sensor analysis. The project is an exploration of the IBM Watson Cloud and the Docker-based Blue Horizon edge platform for high-fidelity, low-latency, private sensing and responding at the edge using a Raspberry Pi, including deep learning using NVIDIA DIGITS software, K80 GPU servers in the IBM Cloud, and Jetson TX2 edge computing.
50-minute Talk
David C Martin - Hacker-in-residence, IBM Watson Cloud CTO Office
Dima Rekesh - Senior Distinguished Engineer, Optum Technology
S7262 - A General Framework for Hybrid Stochastic Model Calibration on the GPU
We'll present an overview of a GPU-based approach to calibrating hybrid models in finance, that is, multi-factor correlated stochastic processes, to market data (term structure and volatility surfaces). Examples of such models range from the relatively benign 3-factor JY inflation model, to single currency and forex equity baskets, up to a completely general basket of rate/inflation/equity/forex/credit processes described by a global correlation matrix. Due to the inherently multi-threaded nature of Monte Carlo path generation, and the availability of cuRAND, a GPU implementation vastly outperforms CPU or PDE solvers, which are plagued by high dimensionality. Details of the algorithm, as well as a demonstration and analysis of timings and memory limitations, will be covered.
25-minute Talk
Mark York - Senior Quantitative Analyst, Renaissance Risk Management Labs
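The path-generation core of such a calibrator can be sketched as follows. This toy CPU version draws correlated normals via a Cholesky factor and evolves two lognormal factors; the production version would generate one path per GPU thread with cuRAND, and all parameter names and values here are illustrative assumptions:

```python
import math, random

# Toy sketch of correlated Monte Carlo path generation, the kernel
# at the heart of hybrid-model calibration. Illustrative only.

def cholesky_2x2(rho):
    """Cholesky factor L of the correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]

def simulate_paths(s0, vol, rho, steps, n_paths, dt=1.0 / 252, seed=0):
    rng = random.Random(seed)
    L = cholesky_2x2(rho)
    paths = []
    for _ in range(n_paths):
        s = list(s0)
        for _ in range(steps):
            z = [rng.gauss(0, 1), rng.gauss(0, 1)]
            # correlate the independent draws: w = L @ z
            w = [L[0][0] * z[0], L[1][0] * z[0] + L[1][1] * z[1]]
            for i in range(2):
                s[i] *= math.exp(-0.5 * vol[i] ** 2 * dt
                                 + vol[i] * math.sqrt(dt) * w[i])
        paths.append(tuple(s))
    return paths

paths = simulate_paths(s0=(100.0, 100.0), vol=(0.2, 0.3),
                       rho=0.5, steps=10, n_paths=200)
```

Because each path is independent, the loop over `n_paths` maps directly onto GPU threads, which is why Monte Carlo calibration parallelizes so well compared with PDE solvers in high dimensions.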
S7286 - A High-Quality and Fast Maximal Independent Set Algorithm for GPUs
Learn how to efficiently parallelize Maximal Independent Set computations for GPUs. Our CUDA implementation is at least three times faster than the leading GPU codes on every one of the 16 real-world and synthetic graphs we tested. Moreover, it produces a larger maximal independent set in all but one case. It is asynchronous, atomic free, and requires fewer than 30 kernel statements. We'll present the included code optimizations to achieve heretofore unreached performance and describe how to exploit monotonicity to minimize the memory footprint of this important irregular graph algorithm.
25-minute Talk
Martin Burtscher - Professor, Texas State University
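The classic random-priority (Luby-style) scheme that GPU maximal-independent-set codes typically parallelize can be sketched serially; this is an illustrative rendition of the underlying algorithm, not the speaker's CUDA implementation:

```python
import random

# Random-priority maximal independent set: a node joins the set when
# its priority beats those of all still-undecided neighbors. On a GPU,
# each round's winner check runs as one thread per node.

def maximal_independent_set(adj, seed=0):
    rng = random.Random(seed)
    priority = {v: rng.random() for v in adj}
    undecided = set(adj)
    in_set = set()
    while undecided:
        # nodes whose priority is a local maximum among undecided neighbors
        winners = {v for v in undecided
                   if all(priority[v] > priority[u]
                          for u in adj[v] if u in undecided)}
        in_set |= winners
        undecided -= winners            # winners are decided (in the set)
        for v in winners:
            undecided -= adj[v]         # their neighbors are decided (out)
    return in_set

# 4-cycle 0-1-2-3-0: any MIS consists of two opposite vertices
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
mis = maximal_independent_set(adj)
assert all(u not in adj[v] for v in mis for u in mis)   # independent
assert all(v in mis or adj[v] & mis for v in adj)       # maximal
```

Each round removes at least the globally highest-priority undecided node, so the loop terminates; two adjacent nodes can never win the same round because only one can have the higher priority.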
S7592 - AI and Deep Learning in Trading
We'll talk about how artificial intelligence has led to market-leading innovation in trading and the huge opportunity for deep learning in trading today. There are three dominant trades: fast information extraction ("speed trade"), trade construction ("stat arb"), and prediction ("market timing"). AI has been very successful in all three. We have been key innovators in the speed trade, having started with a $10,000 risk limit and, over the last 10 years, making more than $1.4 billion in profits. The reason is a purist adherence to AI. There is a huge opportunity for deep learning in the prediction part of the trade, which is not latency sensitive and is mostly about high accuracy. Our mission is to make investing a science, a research-driven utility, not the competition or game that it is today. Deep learning has had a lot of success in bringing method to social science settings. We believe that over the next five to 10 years, every trading operation will become deep learning based. At this time, however, there is a lot of opportunity for innovation using deep learning in trading.
25-minute Talk
Gaurav Chakravorty - Head of Trading Strategy Development, qplum
S7739 - AI and the Battle for Cyber Security The security domain presents a unique landscape for the application of artificial intelligence. Defenders in the security space are often charged with securing ever-changing and complex networks, while attackers continue to probe for and exploit any system weakness. We'll dive into the state of cyber security, why it is well suited for artificial intelligence-based approaches, and how AI is actively defending against attacks today. 50-minute Talk Matt Wolff - Chief Data Scientist, Cylance
Andrew Davis - Staff Data Scientist, Cylance
S7770 - AI in Healthcare: Beyond Deep Learning in Medical Imaging We'll give an overview of how deep learning in healthcare can be utilized beyond medical imaging when applied to clinical decision support and medical asset management. Deep learning is capable of addressing many, if not all, of the main challenges facing caregivers: information overflow, work overload, accuracy impacted by data constraints, optimism bias, and optimal utilization of medical equipment. This requires involving multiple data sources and dealing with data harmonization, semantic interoperability, and different health data types. Deep learning in healthcare has three main aspects: medical imaging; decision support based on multiple data types (structured, unstructured, streaming, etc.); and asset utilization data. 25-minute Talk Dr. Michael Dahlweid - Chief Medical Officer, Digital, GE Healthcare
S7805 - Airbus Vahana - Development of a Self-Piloted Air Taxi Vahana started in early 2016 as one of the first projects at A³, the advanced projects outpost of Airbus Group in Silicon Valley. The aircraft we're building doesn't need a runway, is self-piloted, and can automatically detect and avoid obstacles and other aircraft. Designed to carry a single passenger or cargo, Vahana is meant to be the first certified passenger aircraft without a pilot. We'll discuss the key challenges to develop the autonomous systems of a self-piloted air taxi that can be operated in urban environments. 25-minute Talk Arne Stoschek - Head of Autonomous Systems, Airbus A3
S7313 - AirVision: AI Based, Real-Time Computer Vision System for Drones Modern computing hardware and NVIDIA Jetson TX1 performance create new possibilities for drones and enable autonomous AI systems, where image processing can be done on board during flight. We'll present how Magma Solutions developed the AirVision system to cover advanced vision processing tasks for drones, e.g., image stabilization, moving object detection, tracking and classification using deep neural networks, and visual position estimation using preloaded maps. We'll describe how Magma Solutions used the Caffe framework with cuDNN, OpenVX/NVIDIA VisionWorks, and NVIDIA CUDA to achieve real-time vision processing and object recognition. The AirVision system is in part developed with Lithuanian Ministry of Defence funding and is being used as a tactical UAV system prototype. 25-minute Talk Mindaugas Eglinskas - CEO, Magma Solutions, UAB
S7674 - All That Glisters Is Not Convnets: Hybrid Architectures for Faster, Better Solvers Convolutional neural networks have proven themselves to be very effective parametric learners of complex functions. However, the non-linearities present in conventional networks are not strong; both halves of a (possibly leaky) ReLU are linear and the non-linearity is computed independently for each channel. We'll present techniques that create decision tree and RBF units designed to respond non-linearly to complex joint distributions across channels. This makes it possible to pack more non-linearity into a small space, which is a particularly valuable replacement for the latter layers of a network - in particular the solver. The result is hybrid networks that outperform conventional pure neural networks and can be trained orders of magnitude more quickly. 50-minute Talk Tom Drummond - Professor, Monash University
S7809 - A Multi-Source, Multi-Sensor Approach to HD Map Creation It's simple to take the output of one type of sensor in multiple cars and produce a map based on that data. However, a map created in this way will not have sufficient coverage, attribution, or quality for autonomous driving. Our multi-source, multi-sensor approach leads to HD maps that have greater coverage, are more richly attributed, and have higher quality than single-source, single-sensor maps. In this session, we will discuss how we have created the world's largest HD map, are able to continuously update it, and are making autonomous driving safer and more comfortable.   25-minute Talk Willem Strijbosch - Head of Autonomous Driving, TomTom
S7404 - An Approach to a High-Performance Decision Tree Optimization Within a Deep Learning Framework for Investment and Risk Management We'll examine an innovative approach using an optimized algorithm to create a decision tree for the basis of regime-dependent and pattern classification of financial and macroeconomic time-series data. Implemented in a supervised and unsupervised learning framework, the algorithm relies on the GPU for high-performance computing and the host processor to further integrate the results in a deep learning framework. Also, we implement random number generation, in part, using a hardware quantum-based true random number generator, balanced with the pseudo-random number generator in CUDA, so as to optimize overall speed where an exhaustive search is not feasible. 25-minute Talk Yigal Jhirad - Head of Quantitative and Derivatives Strategies , Cohen & Steers
Blay Tarnoff - Senior Application Developer and Database Architect, Cohen & Steers
S7174 - An Architectural Design Firm's Journey Through Virtual GPU Technology for Global Collaboration Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons for virtualization, along with its management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology. 50-minute Talk Andrew Schilling - Chief Infrastructure Officer, CannonDesign
Jimmy Rotella - Design Application Specialist, CannonDesign
S7252 - An Efficient Connected Components Algorithm for Massively Parallel Devices Learn how to efficiently parallelize connected components, an important irregular graph algorithm. Our CUDA implementation is asynchronous, lock-free, converges rapidly, and employs load balancing. It is faster than other GPU codes on all 18 real-world and synthetic graphs we tested. We'll describe how to parallelize this graph algorithm by exploiting algorithmic properties, discuss important optimizations that improve efficiency, and compare the performance with some of the fastest prior GPU implementations of connected components. 25-minute Talk Jayadharini Jaiganesh - Graduate Student, Texas State University
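As a point of reference for what such a GPU algorithm parallelizes, here is a minimal sequential label-propagation sketch in Python (illustrative only, not the presenters' implementation): every vertex starts in its own component, and labels repeatedly collapse to the minimum across each edge, akin to the hooking step a GPU applies to all edges at once.

```python
def connected_components(edges, n):
    """Each vertex starts as its own component; sweep the edge list,
    collapsing both endpoints to the smaller label, until a full pass
    changes nothing. Labels only decrease, so the loop terminates."""
    label = list(range(n))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if label[u] != label[v]:
                m = min(label[u], label[v])
                label[u] = label[v] = m
                changed = True
    return label

# Two components {0,1,2} and {3,4}, plus an isolated vertex 5
label = connected_components([(0, 1), (1, 2), (3, 4)], 6)
```

A GPU version processes all edges concurrently and adds tricks such as pointer jumping to flatten label chains quickly, which is where the load balancing and rapid convergence claims come in.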
S7261 - A New Approach to Active Learning by Query Synthesis Using Deep Generative Networks We'll introduce a new active learning algorithm that is made practical using GPUs. Active learning concerns carefully choosing training data to minimize human labeling effort. In a nutshell, we apply deep generative models to synthesize informative "queries" that, when answered by a human labeler, allow the learner to learn faster. The learning is "active" in the sense that these questions are synthesized in an online manner adaptive to the current knowledge, thus minimizing the number of queries needed. Unlike traditional supervised machine learning, our training is performed mostly on machine-synthesized data. To our knowledge, this is the first work that shows promising results in active learning by query synthesis. 25-minute Talk Jia-Jie Zhu - Postdoctoral Fellow, Boston College
S7699 - An Introduction to CUDA Programming Presented by Acceleware (Session 1 of 4) Join us for an informative introductory tutorial intended for those new to CUDA that serves as the foundation for our following three tutorials. Those with no previous CUDA experience will leave with essential knowledge to start programming in CUDA. For those with previous CUDA experience, this tutorial will refresh key concepts required for subsequent tutorials on CUDA optimization. The tutorial will begin with a brief overview of CUDA and data parallelism before focusing on the GPU programming model. We'll explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax, and thread hierarchy. We'll deliver a programming demonstration of a simple CUDA kernel. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7700 - An Introduction to the GPU Memory Model - Presented by Acceleware (Session 2 of 4) This tutorial is for those with a basic understanding of CUDA who want to learn about the GPU memory model and optimal storage locations. Attend session 1, "An Introduction to CUDA Programming," to learn the basics of CUDA programming that are required for session 2. We'll begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We'll define shared, constant, and global memory, and discuss the best locations to store your application data for optimized performance. We'll deliver a programming demonstration of shared and constant memory. We'll also provide printed copies of the material to all attendees for each session - collect all four! 80-minute Tutorial Chris Mason - Technical Product Manager, Acceleware Ltd.
S7143 - Anomaly Detection for Network Intrusions Using Deep Learning We'll describe how deep learning can be applied to detect anomalies, such as network intrusions, in a production environment. In part one of the talk, we'll build an end-to-end data pipeline using Hadoop for storage, StreamSets for data flow, Spark for distributing work across GPUs, and Deeplearning4j for anomaly detection. In part two, we'll showcase a demo environment that demonstrates how a deep net uncovers anomalies. This visualization will illustrate how system administrators can view malicious behavior and prioritize efforts to stop attacks. It's assumed that registrants are familiar with popular big data frameworks on the JVM. 25-minute Talk David Kale - Deep Learning Engineer, Skymind
Adam Gibson - CTO, Skymind
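The core idea behind such a pipeline, model normal behavior and flag records the model reconstructs poorly, can be sketched without the full Hadoop/Spark stack. Below is a hedged NumPy toy using a linear PCA "autoencoder" (not the deep net or pipeline from the talk; all names and parameters are illustrative assumptions): train on normal traffic features, then score new records by reconstruction error.

```python
import numpy as np

def fit_anomaly_scorer(normal, k=2):
    """Fit a rank-k linear model of normal traffic; the anomaly score
    of a record is how badly the low-rank model reconstructs it."""
    mean = normal.mean(axis=0)
    _, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
    comps = vt[:k]                           # top-k principal directions
    def score(x):
        z = (x - mean) @ comps.T             # encode into k dimensions
        recon = z @ comps + mean             # decode back to feature space
        return np.linalg.norm(x - recon, axis=1)
    return score

rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))             # normal traffic lies near a plane
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 10))
score = fit_anomaly_scorer(normal, k=2)
intrusion = 5 * rng.normal(size=(1, 10))     # off-subspace record scores high
```

A deep autoencoder generalizes this by making the encode/decode maps non-linear; the thresholding and prioritization logic on top of the score stays the same.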
S7829 - Apache Mahout's New Recommender Algorithm and Using GPUs to Speed Model Creation Predictive AI is often associated with product recommenders. We present a landscape of multi-domain behavioral models that predict multi-modal user preferences and behavior. This session will take the audience from first principles of the new Correlated Cross-Occurrence (CCO) algorithm, showing the important innovations that lead to new ways to predict behavior, into a deep dive into a variety of different use cases: for instance, using dislikes to predict likes, using search terms to predict purchases, and using conversions to augment search indexes with behavioral data to produce behavioral search. Some of these are nearly impossible to address without this new technique. We show the tensor algebra that makes up the landscape. Next, we walk through the computation using real-world data. Finally, we show how Mahout's generalized CPU/GPU integration and recently added CUDA support bring significant reductions in time and cost to calculate the CCO models. We expect the audience to come away with an understanding of the kinds of applications that can be built with CCO and how to do so in performant, cost-reducing ways. 50-minute Talk Pat Ferrel - Chief Consultant, PMC member of Apache Mahout, ActionML
Andy Palumbo - Data Scientist, Cylance
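At the heart of CCO is a log-likelihood ratio (LLR) test that decides which cross-occurrences are statistically surprising rather than merely frequent. Here is a minimal Python sketch of that test (consistent with the Dunning LLR used by Mahout's LogLikelihood utility, but illustrative rather than Mahout's actual code):

```python
import math

def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized Shannon entropy of a vector of counts."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 contingency table:
    k11 = users who did both A and B, k12 = A only, k21 = B only,
    k22 = neither. Large values mean A and B co-occur far more often
    than independence would predict; independence gives ~0."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)
```

CCO computes a score like this for every (item, cross-action item) pair, keeping only high-LLR pairs as indicators, which is how behaviors such as search terms or dislikes become usable predictors of purchases.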
S7510 - Apache Spark and GPUs for Scaling Deep Learning Libraries Apache Spark has become a popular tool for data warehousing, ETL, and advanced analytics. Meanwhile, deep learning has become one of the most powerful classes of machine learning methods, in large part due to the computational power of modern machines with GPUs and specialized hardware. Spark and GPUs combine well for large deep learning workflows: Spark can handle ETL and data management, and it can distribute data parallel tasks to scale out across many GPUs. 50-minute Talk Tim Hunter - Software Engineer, Databricks, Inc
Joseph Bradley - Software Engineer, Databricks, Inc
S7649 - Applications of Deep Learning: Hardware QA Hardware testing is a multifaceted challenge, but one that stands to benefit greatly from advances in deep learning. The tricky formula of balancing good coverage against risk is consistently challenged by the rapid evolution of the problem space. The landscape in the industry today is one that has been more or less linearly refined and improved upon, with the constant refrain of more resources touted as the go-to solution. We'll discuss one of the ways we're working to evolve the approach to testing: harnessing the available tools in the deep learning space, offering a far more efficient path to better quality while preserving the flexibility of better coverage/risk decisions. 25-minute Talk Martina Sourada - Senior Director, SWQA, NVIDIA