Table of contents
1 Introduction
1.1 Motivation: language endangerment
1.1.1 Magnitude of the issue
1.1.2 Consequences of language loss
1.1.3 Response of the linguistic community
1.2 Computational language documentation
1.2.1 Recent work
1.2.2 The BULB project
1.2.3 Challenges
1.3 Scope and contributions
1.3.1 Unsupervised word discovery
1.3.2 Outline of the thesis
1.3.3 Contributions
1.3.4 Author’s publications
2 Background
2.1 Word segmentation and alignment
2.1.1 Two sides of the same problem
2.1.2 Evaluation
2.1.3 Remarks
2.2 Early models for unsupervised string segmentation
2.2.1 Pioneer work
2.2.2 Multigrams
2.2.3 Minimum description length principle
2.3 Learning paradigms
2.3.1 Signatures
2.3.2 Signatures as finite state automata
2.3.3 Paradigms
2.4 Nonparametric Bayesian models
2.4.1 Stochastic processes
2.4.2 Sampling
2.4.3 Goldwater’s language models
2.4.4 Nested language models
2.4.5 Adaptor Grammars
2.5 Automatic word alignment
2.5.1 Probabilistic formulation
2.5.2 A series of increasingly complex parameterizations
2.5.3 Parameter estimation
2.5.4 Alignment extraction
2.6 Joint models for segmentation and alignment
2.6.1 Segment, then align
2.6.2 Jointly segment and align
2.7 Conclusion and open questions
3 Preliminary Word Segmentation Experiments
3.1 Introduction
3.1.1 A favorable scenario
3.1.2 Challenges for low-resource languages
3.2 Three corpora
3.2.1 Elements of linguistic description for Mboshi and Myene
3.2.2 Data and representations
3.3 Experiments and discussion
3.3.1 Models and parameters
3.3.2 Discussion
3.4 Conclusion
4 Adaptor Grammars and Expert Knowledge
4.1 Introduction
4.1.1 Using expert knowledge
4.1.2 Testing hypotheses
4.1.3 Related work
4.2 Word segmentation using Adaptor Grammars
4.3 Grammars
4.3.1 Structuring grammar sets
4.3.2 The full grammar landscape
4.4 Experiments and discussion
4.4.1 Word segmentation results
4.4.2 How can this help a linguist?
4.5 Conclusion
5 Towards Tonal Models
5.1 Introduction
5.2 A preliminary study: supervised word segmentation
5.2.1 Data and representations
5.2.2 Disambiguating word boundaries with decision trees
5.3 Nonparametric segmentation models with tone information
5.3.1 Language model
5.3.2 A spelling model with tones
5.4 Experiments and discussion
5.4.1 Representations
5.4.2 Tonal modeling
5.5 Conclusion
6 Word Segmentation with Attention
6.1 Introduction
6.2 Encoder-decoder with attention
6.2.1 RNN encoder-decoder
6.2.2 The attention mechanism
6.3 Attention-based word segmentation
6.3.1 Align to segment
6.3.2 Extensions: towards joint alignment and segmentation
6.4 Experiments and discussion
6.4.1 Implementation details
6.4.2 Data and evaluation
6.4.3 Discussion
6.5 Conclusion
7 Conclusion
7.1 Summary
7.1.1 Findings
7.1.2 Synthesis of the main results for Mboshi
7.2 Future work
7.2.1 Word alignment
7.2.2 Towards speech
7.2.3 Leveraging weak supervision
7.3 Perspectives in CLD




