2017 GTC San Jose

S7596 - DSD: Dense-Sparse-Dense Training for Deep Neural Networks

Session Speakers
Session Description

Learn a new technique that prevents deep learning optimizers from getting stuck in a local minimum and produces better optimization results. We'll introduce DSD, a dense-sparse-dense training method that regularizes neural networks by pruning and then restoring connections. Our method learns which connections are important during the initial dense training. It then regularizes the network by pruning the unimportant connections and retraining to a sparser, more robust solution with the same or better accuracy. Finally, the pruned connections are restored and the entire network is retrained again; restoring the connections increases the dimensionality of the parameter space, and thus the model capacity, relative to the sparse model. We'll highlight our experiments using GoogLeNet, VGGNet, and ResNet on ImageNet; NeuralTalk on Flickr-8K; and DeepSpeech-1&2 on the WSJ dataset, which show that the accuracy of CNNs, RNNs, and LSTMs can significantly benefit from DSD training. At training time, DSD adds only one extra hyper-parameter: the sparsity ratio in the S step. At testing time, DSD doesn't change the network architecture or incur any inference overhead. The consistent and significant performance gains of DSD in our experiments highlight the inadequacy of current deep learning training methods and show that DSD finds better solutions.
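The following is a minimal sketch of the dense-sparse-dense recipe described above, written with PyTorch. The model, data loader, sparsity ratio, learning rates, and epoch counts are illustrative placeholders and are not taken from the talk.

# Minimal DSD sketch (PyTorch). Model, loader, and hyper-parameters are
# illustrative placeholders, not values from the talk.
import torch
import torch.nn as nn


def magnitude_masks(model, sparsity):
    """Binary masks that drop the smallest-magnitude weights per layer."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / norm params
            continue
        k = max(1, int(sparsity * param.numel()))
        threshold = param.detach().abs().flatten().kthvalue(k).values
        masks[name] = (param.detach().abs() > threshold).float()
    return masks


def train(model, loader, epochs, lr, masks=None):
    """One training phase; with masks, pruned weights are held at zero."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            if masks is not None:                # S step: re-apply sparsity
                with torch.no_grad():
                    for name, param in model.named_parameters():
                        if name in masks:
                            param.mul_(masks[name])


def dsd(model, loader, sparsity=0.3, epochs=(10, 10, 10)):
    train(model, loader, epochs[0], lr=1e-2)     # D: initial dense training
    masks = magnitude_masks(model, sparsity)     # learn which weights matter
    with torch.no_grad():                        # prune the unimportant ones
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])
    train(model, loader, epochs[1], lr=1e-2, masks=masks)   # S: sparse retrain
    train(model, loader, epochs[2], lr=1e-3)     # D: restore pruned weights (from zero) and retrain

Note that the only extra hyper-parameter here is the sparsity ratio used in the S step, and the final model has the original dense architecture, so inference is unchanged.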


Additional Session Information
Level: All
Session Type: Talk
Topics: Algorithms; Deep Learning and AI
Industry: Higher Education / Research
Duration: 25 minutes
Session Schedule