Table of contents
Acknowledgements
Contents
Introduction
Current practices in bitext alignment
Issues and challenges
Improving alignments with discriminative techniques
I Bitext Alignment
1 The Alignment Problem: An Overview
1.1 Bitext Alignment
1.2 Translation and Alignment
1.2.1 Identifying the Translation Unit
1.2.1.1 Meaning-language interface
Words and concepts
Word lexical ambiguity
Word order
1.2.1.2 Translation strategy
1.2.2 Translation Units and Alignment Difficulty
1.2.3 Translation Unit and Alignment-Context Bound
1.3 Alignment Granularity
1.3.1 Document Alignment
1.3.2 Sentence Alignment
1.3.3 Sub-sentential Alignment
1.3.3.1 Word alignment
1.3.3.2 Phrase alignment
1.3.3.3 Structure and tree alignment
1.4 Applications
1.5 A Generic Framework for Alignment
1.6 Alignment Space and Constraints
1.6.1 Segment Constraints
1.6.1.1 Contiguity constraints
1.6.1.2 Length constraints
1.6.1.3 Structural constraints
1.6.2 Alignment Constraints
1.6.2.1 Structural constraints
1.6.2.2 Range constraint
1.6.2.3 Functional constraints
1.6.2.4 Bijectivity constraints
1.7 Evaluation Methods
1.7.1 Intrinsic Measures
1.7.1.1 Alignment Error Rate (AER)
1.7.1.2 Balanced F-measure
1.7.1.3 Other word-level measures
1.7.1.4 Phrase-level measures
1.7.2 Extrinsic Measures
1.7.3 Correlation
1.8 Summary
2 Alignment Models
2.1 Word-Based Alignment Models
2.2 Asymmetric One-to-Many Methods
2.2.1 Heuristic Alignments
2.2.2 Unsupervised Generative Sequence Models
2.2.2.1 Conditional Bayesian networks
Parameter estimation
Expectation-Maximization (EM)
IBM model 1
Inference and EM
Limitations
IBM Model 2
Hidden Markov Model (HMM) alignment
Inference and EM
IBM model 3
Inference and EM
IBM model 4 and beyond
Local log-linear parameterization
Discussion
2.2.2.2 Conditional Random Fields
Inference
Unsupervised parameter estimation
2.2.3 Supervised Discriminative Sequence Models
2.2.3.1 Maximum entropy models
2.2.3.2 Conditional Random Fields
Supervised parameter estimation
2.2.3.3 Large-Margin methods
2.3 Symmetric Many-to-Many Methods
2.3.1 Symmetrization and Alignment Combination
2.3.1.1 Symmetrization heuristics
Grow-diag-final-and (GDFA)
Generalizing the symmetrization
Application-driven combination
2.3.1.2 Agreement constraints
2.3.1.3 Discriminative combination
2.3.2 Weighted Matrix Based Methods
2.3.2.1 Minimum Bayes-risk decoding
2.3.2.2 One-to-many constraints
2.3.2.3 One-to-one constraints
2.3.2.4 Alignment as assignment
2.3.2.5 Alignment as matrix factorization
2.3.3 Generative Many-to-Many Models
2.3.4 Global Discriminative Models
2.3.4.1 CRF-based matrix modeling
2.3.4.2 Other models
2.4 Syntactic and Hierarchical Alignments
2.4.1 Inversion Transduction Grammars
2.4.2 Parameterization and Learning
2.4.3 Syntactic Constraints
2.4.4 Other Syntax-Based Models
2.5 Phrase-Based Alignment Models
2.5.1 Bisegmentation
2.5.1.1 Generative models
Hidden semi-Markov models
The degeneracy problem
2.5.1.2 Bayesian models
2.5.1.3 Discriminative models
2.5.2 Generalized Phrase Alignment
2.5.2.1 Extraction heuristics
The standard approach
Weighted phrase-based matrix
2.5.2.2 Translation spotting
2.5.2.3 Discriminative models
2.6 Features
2.6.1 Type
2.6.2 Indicators of alignment
2.6.3 Scope
2.7 Summary
3 Phrase-Based SMT
3.1 Phrase-Based Translation Model
3.2 Modeling and Parameter Estimation
3.2.1 Discriminative Translation Models
3.2.2 Bilexicon Induction
3.2.3 Features
3.2.4 The Phrase Table
3.2.5 Learning in Discriminative Models
3.3 Decoding
3.4 Evaluating Machine Translation
3.5 Summary
II Improving Alignment with Discriminative Learning Techniques for Statistical Machine Translation
Research Statement
4 MaxEnt for Word-Based Alignment Models
4.1 Word Alignment as a Structured Prediction Problem
4.2 The Maximum Entropy Framework
4.3 Minimum Bayes-Risk Decoding
4.4 Parameter Estimation
4.5 The Set of Input Links
4.6 Features
4.6.1 Word Features
4.6.2 Alignment Matrix Features
4.6.3 Partitioning Features
4.7 Stacked Inference
4.7.1 The Stacking Algorithm
4.7.2 A K-fold Selection Process
4.7.3 Stacking for Word Alignment
4.8 Experimental Methodology
4.8.1 Experimental Setup and Data
4.8.2 Arabic Pre-processing
4.8.3 Remapping Alignments
4.9 Results
4.9.1 Comparison to Generative “Viterbi” Alignments
4.9.1.1 Baselines: IBM and HMM models
4.9.1.2 MaxEnt and stacking
4.9.2 Pruning and Oracle Study
4.9.3 Discriminative Training Set Size
4.9.4 Feature Analysis
4.9.4.1 First feature group
4.9.4.2 Second feature group
4.9.5 Precision-Recall Balance
4.9.6 Regularization
4.9.7 Search Space and Window Size
4.9.8 Input Alignment Quality
4.9.9 Model and Feature Selection
4.9.10 A Comparison with Weighted Matrix Based Alignments
4.9.10.1 Viterbi IBM and HMM models
4.9.10.2 N-best heuristic
4.9.10.3 PostCAT
4.9.10.4 CRFs
4.9.10.5 MaxEnt
4.10 Error Analysis
4.11 Summary
5 MaxEnt Alignments in SMT
5.1 Phrase Table Construction
5.1.1 A General Framework
5.1.2 Viterbi-Based (Standard) Approach
5.1.3 WAM-based Instantiation
5.1.3.1 Evaluation and counting functions
5.1.3.2 Alignment constraints and selection criteria
5.1.3.3 Translation model scores
5.2 Experiments
5.2.1 Viterbi-Based Extraction
5.2.1.1 Large scale systems
MaxEnt vs. IBM and HMM models
Correlation between AER and BLEU
5.2.1.2 A study of alignment characteristics
5.2.2 Weighted Matrix Based Extraction
5.2.2.1 Results and discussion
MGIZA++
N-best WAM
PostCAT
CRF
Maximum Entropy (MaxEnt)
5.2.2.2 Discussion
5.3 Summary
6 Supervised Phrase Alignment with SCC
6.1 Supervised Phrase-Pair Extraction
6.1.1 Single-Class Classification (SCC)
6.1.2 Phrase Translation Model Training Algorithm
6.1.3 Balancing Precision and Recall
6.2 Learning the Single-Class Classifier
6.2.1 One-Class SVM (OC-SVM)
6.2.2 Mapping Convergence (MC)
6.2.3 P̂P Measure and Classifier Selection
6.3 Oracle Decoder for Building the Set of Positive Examples
6.4 Feature Functions
6.4.1 Weighted Alignment Matrix (WAM)
6.4.2 Word Alignments (WA)
6.4.3 Bilingual and Monolingual Information (BI, MI)
6.4.4 Statistical Significance (Pval)
6.4.5 Morpho-Syntactic Similarity (MS)
6.4.6 Lexical Probability (LEX)
6.5 Experiments
6.5.1 Data and Experimental Setup
6.5.2 Classification Performance: P̂P
6.5.3 Translation Performance: BLEU
6.5.3.1 Phrase-pair scoring method
6.5.3.2 Using additional phrase table features
6.5.4 Discussion
6.6 Summary
Conclusion
Contributions
Future Work
Publications by the Author
Bibliography