The computational bottleneck


Table of contents

List of Figures
List of Tables
Introduction
1 From Discrete to Neural Language Models
1.1 Discrete language models
1.2 Neural network language models
1.2.1 Feedforward language models
1.2.2 Recurrent neural network language models
1.3 Practical considerations
1.3.1 Evaluation
1.3.2 Choosing hyperparameters
1.3.3 The computational bottleneck
2 Avoiding direct normalization: Existing strategies
2.1 Hierarchical language models
2.2 Importance Sampling
2.2.1 Application to Language Modeling
2.2.2 Target Sampling
2.2.3 Complementary Sum-Sampling
2.3 Density estimation as a classification task: discriminative objectives
2.3.1 Noise Contrastive Estimation
2.3.2 BlackOut
2.3.3 Negative Sampling
2.4 Avoiding normalization by constraining the partition function
2.5 Conclusions
3 Detailed analysis of Sampling-Based Algorithms
3.1 Choosing k and Pn: impact of the parametrization of sampling
3.1.1 Effects on Importance Sampling
3.1.2 Effects on Noise-Contrastive Estimation
3.2 Impact of the partition function on the training behaviour of NCE
3.2.1 Self-normalization is crucial for NCE
3.2.2 Influence of the shape of Pn on self-normalization
3.2.3 How do these factors affect learning?
3.3 Easing the training of neural language models with NCE
3.3.1 Helping the model by learning to scale
3.3.2 Helping the model with a well-chosen initialization
3.3.3 Summary of results with sampling-based algorithms
3.4 Conclusions
4 Extending Sampling-Based Algorithms
4.1 Language model objective functions as Bregman divergences
4.1.1 Learning by minimizing a Bregman divergence
4.1.2 Directly learning the data distribution
4.2 Learning un-normalized models using Bregman divergences
4.2.1 Learning by matching the ratio of data and noise distributions
4.2.2 Experimenting with learning un-normalized models
4.3 From learning ratios to directly learning classification probabilities
4.3.1 Minimizing the divergence between posterior classification probabilities and link to NCE
4.3.2 Directly applying f-divergences to binary classification
4.4 Conclusions
5 Output Subword-based representations for language modeling
5.1 Representing words
5.1.1 Decomposition into characters
5.1.2 Decomposing morphologically
5.2 Application to language modeling
5.3 Experiments on Czech with subword-based output representations
5.3.1 Influence of the vocabulary size
5.3.2 Effects of the representation choice
5.3.3 Influence of the word embeddings vocabulary size
5.4 Supplementary results and conclusions
5.4.1 Training with improved NCE on Czech
5.4.2 Comparative experiments on English
5.5 Conclusions
Conclusion
List of publications
References
Appendices
A Proofs on Bregman divergences
B Subword-based models: supplementary results with NCE
C Subword-based models: supplementary results on the influence of embedding sizes
D Previous work on subword-based POS tagging
