Asynchronous optimization


Table of contents

Introduction 
I Asynchronous Optimization for Machine Learning 
1 Modern optimization for machine learning 
1.1 Motivation
1.2 Modern challenges
1.2.1 The big data era
1.2.2 Variance reduction
1.2.3 Asynchronous optimization
1.2.4 Non-smooth regularization
1.3 Goal and contributions
2 Improved asynchronous parallel optimization analysis for stochastic methods 
2.1 Introduction
2.2 Revisiting the perturbed iterate framework for asynchronous analysis
2.2.1 Perturbed Iterate Framework
2.2.2 On the difficulty of labeling the iterates
2.3 HOGWILD analysis
2.3.1 Useful properties and assumptions
2.3.2 Convergence and speedup results
2.3.3 Proof outlines
2.3.4 Key Lemmas
3 Asynchronous parallel variance reduction 
3.1 Introduction
3.2 Asynchronous Parallel Sparse SAGA
3.2.1 Sparse SAGA
3.2.2 Asynchronous Parallel Sparse SAGA
3.2.3 Convergence and speedup results
3.2.4 Key Lemmas
3.3 Asynchronous Parallel SVRG with the “After Read” labeling
3.3.1 SVRG algorithms
3.3.2 Extension to the SVRG variant from Hofmann et al. (2015)
3.3.3 Fast convergence and speedup rates for KROMAGNON
3.3.4 Proof outline
3.4 Empirical results
3.4.1 Experimental setup
3.4.2 Implementation details
3.4.3 Comparison of sequential algorithms: Sparse SAGA vs Lagged updates
3.4.4 ASAGA vs. KROMAGNON vs. HOGWILD
3.4.5 Effect of sparsity
3.4.6 Theoretical speedups
3.4.7 A closer look at the constant
3.5 Conclusion and future work
4 Asynchronous composite optimization 
4.1 Introduction
4.1.1 Related work
4.1.2 Definitions and notations
4.2 Sparse Proximal SAGA
4.3 Asynchronous Sparse Proximal SAGA
4.3.1 Analysis framework
4.3.2 Properties and assumptions
4.3.3 Theoretical results
4.3.4 Proof outline
4.3.5 Comparison to related work
4.4 Experiments
4.4.1 Implementation details
4.4.2 Comparison of Sparse Proximal SAGA with sequential methods
4.4.3 Comparison of PROXASAGA with asynchronous methods
4.4.4 Theoretical speedups
4.4.5 Timing benchmarks
4.5 Conclusion and future work
5 Conclusion 
II Improving RNNs Training through Global-Local Losses 
6 A brief introduction to recurrent neural networks 
6.1 What is a recurrent neural network (RNN)?
6.1.1 A concrete example
6.1.2 Interpretation as a graphical model
6.1.3 The encoder-decoder architecture
6.1.4 Decoding
6.2 Traditional RNN training and its limitations
6.2.1 Maximum likelihood training (MLE)
6.2.2 Limitations
6.3 Alternative training approaches
6.3.1 Imitation learning approaches
6.3.2 RL-inspired approaches
6.3.3 Other methods
6.4 Goal and contributions
7 SEARNN 
7.1 Introduction
7.2 Links between RNNs and learning to search
7.3 Improving RNN training with L2S
7.3.1 The SEARNN Algorithm
7.3.2 Adaptation to RNNs
7.3.3 Expected and empirical benefits
7.4 Scaling up SEARNN
7.5 Neural Machine Translation
7.5.1 Experimental results
7.5.2 In-depth analysis
7.6 Model confidence and beam search
7.7 Scaling SEARNN up further
7.8 Discussion and related work
7.8.1 Traditional L2S approaches
7.8.2 L2S-inspired approaches
7.8.3 RL-inspired approaches
7.8.4 Other methods
7.8.5 An unexpected cousin: AlphaGo Zero
8 Conclusion and future work 
A HOGWILD analysis using the “After Read” framework
A.1 Initial recursive inequality derivation
A.2 Proof of Lemma 11 (inequality in $g_t := g(\hat{x}_t; i_t)$)
A.3 Proof of Lemma 14 (suboptimality bound on $\mathbb{E}\|g_t\|^2$)
A.4 Proof of Theorem 9 (convergence guarantee and rate of HOGWILD)
A.5 Proof of Theorem 8 (convergence result for serial SGD)
A.6 Proof of Corollary 10 (speedup regimes for HOGWILD)
B Asynchronous variance reduction
B.1 Sparse SAGA
B.1.1 Proof of Theorem 15
B.1.2 Proof of Theorem 16
B.2 ASAGA – Proof of Theorem 18 and Corollary 19
B.2.1 Proof of Lemma 20 (suboptimality bound on $\mathbb{E}\|g_t\|^2$)
B.2.2 Lemma 20 for AHSVRG
B.2.3 Master inequality derivation
B.2.4 Lyapunov function and associated recursive inequality
B.2.5 Proof of Lemma 21 (convergence condition for ASAGA)
B.2.6 Proof of Theorem 18 (convergence guarantee and rate of ASAGA)
B.2.7 Proof of Corollary 19 (speedup regimes for ASAGA)
B.3 KROMAGNON – Proof of Theorem 22 and Corollary 25
B.3.1 Proof of Lemma 26 (suboptimality bound on $\mathbb{E}\|g_t\|^2$)
B.3.2 Proof of Theorem 22 (convergence rate of KROMAGNON)
B.3.3 Proof of Corollaries 23, 24 and 25 (speedup regimes)
B.4 On the difficulty of parallel lagged updates
B.5 Additional empirical details
B.5.1 Detailed description of datasets
B.5.2 Implementation details
B.5.3 Biased update in the implementation
C Extension to non-smooth objectives
C.1 Basic properties
C.2 Sparse Proximal SAGA
C.3 ProxASAGA
C.4 Comparison of bounds with Liu and Wright (2015)
D SEARNN
D.1 Algorithms
D.2 Design decisions
D.3 Additional machine translation experimental details
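
For orientation on the HOGWILD analysis listed in Chapter 2 and Appendix A, the sketch below shows the kind of lock-free parallel SGD update loop that algorithm performs (Niu et al., 2011). It is an illustrative toy, not the thesis implementation: the least-squares objective, the fixed step size, and the thread counts are assumptions, and CPython's GIL means the snippet demonstrates the inconsistent-read / lock-free-write pattern rather than actual parallel speedups.

```python
# Minimal HOGWILD-style lock-free SGD sketch (Niu et al., 2011).
# Illustration only: the objective, step size, and sizes are assumptions,
# not values taken from the thesis.
import threading
import numpy as np

def hogwild_sgd(A, b, step=0.01, n_threads=4, iters_per_thread=10_000):
    """Lock-free parallel SGD on the least-squares objective (1/2n) * ||Ax - b||^2."""
    n, d = A.shape
    x = np.zeros(d)  # shared iterate, read and written by all threads without locks

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(iters_per_thread):
            i = rng.integers(n)                  # sample one data point uniformly
            grad = (A[i] @ x - b[i]) * A[i]      # stochastic gradient, computed from a
                                                 # possibly inconsistent read of x
            x[:] = x - step * grad               # lock-free in-place write to shared x

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 20))
    x_true = rng.standard_normal(20)
    b = A @ x_true                               # noiseless targets, so SGD can interpolate
    x_hat = hogwild_sgd(A, b)
    print("distance to optimum:", np.linalg.norm(x_hat - x_true))
```

The perturbed iterate framework of Chapter 2 models exactly the situation in the inner loop above: the gradient is evaluated at an inconsistent snapshot $\hat{x}_t$ of the shared parameters rather than at a well-defined iterate $x_t$.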
