Human-Machine Collaboration in Statistical Machine Translation


Exploitation of Human Knowledge

Once PE is finished, the MT system receives the feedback and exploits it by performing an online adaptation of its models to the user's new input. In the case of SMT, this adaptation can be applied to the model weights as well as to the model itself (see section 3.3.1). The next sections briefly describe the technical details of those updates, which vary depending on the amount of feedback. We also present ways of exploiting PRE feedback in SMT. We then turn to the compatibility of those updates with real-life PE scenarios: we discuss the permanency of updates and the selectiveness towards this feedback (see section 3.3.2). Finally, we describe some scenarios that use online updates for domain adaptation (see section 3.3.3), as well as the benefits of applying online updates to an automatic PE system rather than to an MT system (see section 3.3.4).

Online Adaptation

The adaptation of system weights is usually performed by means of online learning (OL) mechanisms. OL, as opposed to batch learning, updates models on a per-example basis instead of going through the whole example set. An OL algorithm proceeds in the following steps:
1. it receives an instance;
2. it predicts the label of that instance;
3. it receives the true label;
4. it updates the model.
Such online training schedules fit the PE scenario well, since the human-machine interaction usually happens on a per-sentence or per-document basis.
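The four steps above can be sketched with a toy binary perceptron. This is only an illustration of the generic OL protocol, not the update rule of any particular SMT system; the names OnlinePerceptron and run_online, the feature vectors and the learning rate are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence, Tuple


class OnlinePerceptron:
    """Toy linear classifier updated one example at a time (hypothetical)."""

    def __init__(self, n_features: int, learning_rate: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        # Label is +1 when the linear score is non-negative, -1 otherwise.
        score = sum(w * v for w, v in zip(self.weights, x))
        return 1 if score >= 0 else -1

    def update(self, x: Sequence[float], y_true: int, y_pred: int) -> None:
        # Perceptron rule: change the weights only after a wrong prediction.
        if y_pred != y_true:
            self.weights = [
                w + self.learning_rate * y_true * v
                for w, v in zip(self.weights, x)
            ]


def run_online(
    model: OnlinePerceptron,
    stream: Iterable[Tuple[Sequence[float], int]],
) -> int:
    """Process a stream of (instance, true label) pairs; return the mistake count."""
    mistakes = 0
    for x, y_true in stream:              # step 1: receive an instance
        y_pred = model.predict(x)         # step 2: predict its label
        if y_pred != y_true:              # step 3: the true label is revealed
            mistakes += 1
        model.update(x, y_true, y_pred)   # step 4: update the model
    return mistakes
```

In a PE setting, the instance would correspond to a source sentence with its MT output, and the true label to the post-edited translation; the model is therefore adapted after every sentence rather than after a full pass over the data.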

Selectiveness towards Human Feedback (Active Learning)

The influence of additional data on MT quality is a well-studied issue [Turchi et al., 2008; Gascó et al., 2012; Haddow and Koehn, 2012]. In the absence of real-time constraints, the utility of new data can be verified through the time-consuming process of re-training and testing the system. Regular minor online updates make such verification almost impossible. In a real-life setting, the process risks becoming even more complex when multiple post-editors provide their feedback at the same time.
One of the approaches that addresses this issue is Active Learning (AL) [Settles, 2009]. It refers to a group of methods for choosing the training samples that are most likely to improve a system. The initial goal of these techniques is to save human effort by reducing the amount of data to annotate; for MT followed by PE, the focus usually shifts to being selective about the data used for updates. AL is commonly applied to SMT at the sentence level. The main intuition is to choose for updates the most informative sentences, i.e., those that contain the maximum amount of new information [Eck et al., 2005; Haffari and Sarkar, 2009; Haffari et al., 2009; González-Rubio et al., 2012; Du et al., 2015]. For instance, Eck et al. [2005] weight sentences according to the number of previously unseen frequent n-grams they contain.
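The intuition of favouring sentences with many unseen but frequent n-grams can be sketched as a greedy selection loop. This is a simplified illustration and not the exact weighting of Eck et al. [2005]: the scoring formula (pool frequency of unseen n-grams, normalised by sentence length), whitespace tokenisation and the function names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], max_n: int = 2) -> List[NGram]:
    """All n-grams of order 1..max_n in a token sequence."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def select_sentences(
    pool: Sequence[str],
    seen: Iterable[NGram],
    budget: int,
    max_n: int = 2,
) -> List[str]:
    """Greedily pick up to `budget` sentences rich in unseen, frequent n-grams."""
    tokenized = [sentence.split() for sentence in pool]
    # Frequency of an n-gram = number of pool sentences that contain it.
    pool_counts = Counter(
        gram for tokens in tokenized for gram in set(ngrams(tokens, max_n))
    )
    known: Set[NGram] = set(seen)
    remaining = list(range(len(pool)))
    chosen: List[str] = []

    def score(index: int) -> float:
        tokens = tokenized[index]
        if not tokens:
            return 0.0
        unseen = set(ngrams(tokens, max_n)) - known
        # Assumed weighting: summed pool frequency, normalised by length.
        return sum(pool_counts[gram] for gram in unseen) / len(tokens)

    while remaining and len(chosen) < budget:
        best = max(remaining, key=score)
        if score(best) == 0:
            break  # nothing new left to learn from the pool
        chosen.append(pool[best])
        # n-grams of the chosen sentence now count as seen.
        known.update(ngrams(tokenized[best], max_n))
        remaining.remove(best)
    return chosen
```

Because the set of known n-grams grows after each pick, a sentence that merely repeats material already selected loses its score, which is what makes the selection favour new information over redundancy.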

Automatic Post-Editing

Another way to exploit human feedback is the online adaptation of Automatic Post-Editing (APE) systems instead of MT models [Simard and Foster, 2013; Lagarda et al., 2015; Chatterjee et al., 2016, 2017a]. APE seeks to automatically correct errors in the MT output before it is presented to the user. The motivations for using APE systems are diverse: those systems have access to new information not available to an MT system and are able to perform more direct changes [Parton et al., 2012]; they are also lighter and can be more easily adapted to a style or a domain [Chatterjee et al., 2017a].
APE can be SMT-based (“translation” from MT into PE) [Simard et al., 2007a,b], rule-based [Rosa et al., 2012] or neural [Pal et al., 2016, 2017]. For instance, Chatterjee et al. [2017a] simultaneously learn several domain-specific APE models from user feedback. The authors use a domain-aware sampling technique to build per-sentence APE models for each new MT output. When no relevant data is available to build a model, the MT output is left uncorrected. User feedback is exploited to update the rules that record MT bi-phrases and their corrections. Those rules have the following form: (f̄#ē, PE). A dynamic knowledge base stores positive statistics (correct application of a rule) and negative statistics (application of a rule in a wrong context, or a rule not used by the decoder) in order to increase the reliability of APE modifications.
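To make the idea of a dynamic knowledge base concrete, the sketch below keeps positive and negative counts for each correction rule and applies a rule only when its estimated reliability exceeds a threshold. It is a hypothetical illustration, not the implementation of Chatterjee et al. [2017a]: the class names, the add-one smoothing, the 0.5 threshold and the example rule are all assumptions.

```python
from typing import Dict, Tuple

# A rule pairs an MT bi-phrase (source phrase, MT phrase) with its correction.
RuleKey = Tuple[str, str, str]  # (source phrase, MT phrase, post-edited phrase)


class RuleStats:
    """Positive and negative usage counts for one correction rule."""

    def __init__(self) -> None:
        self.positive = 0  # rule applied and accepted by the post-editor
        self.negative = 0  # rule applied in a wrong context, or not used

    def reliability(self) -> float:
        # Assumed estimate: add-one smoothed share of positive evidence.
        return (self.positive + 1) / (self.positive + self.negative + 2)


class ApeKnowledgeBase:
    """Hypothetical store of rule statistics, updated after each post-edit."""

    def __init__(self) -> None:
        self._stats: Dict[RuleKey, RuleStats] = {}

    def record(self, rule: RuleKey, correct: bool) -> None:
        stats = self._stats.setdefault(rule, RuleStats())
        if correct:
            stats.positive += 1
        else:
            stats.negative += 1

    def is_reliable(self, rule: RuleKey, threshold: float = 0.5) -> bool:
        # Rules never observed are not trusted.
        stats = self._stats.get(rule)
        return stats is not None and stats.reliability() > threshold


# Usage with a made-up French-English rule.
kb = ApeKnowledgeBase()
rule = ("traitement", "processing", "treatment")
kb.record(rule, correct=True)
kb.record(rule, correct=True)
kb.record(rule, correct=False)
print(kb.is_reliable(rule))
```

Keeping both kinds of evidence lets a rule that was learned from a single post-edit be demoted again if later feedback shows that it fires in the wrong contexts.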
In our view, APE is more suitable for production scenarios with experienced professional post-editors, who perform reliable corrections.

Computer-Assisted Translation Systems

All the components of the human-machine collaboration described above (PE, PRE, IMT, online updates) are commonly accommodated by Computer-Assisted Translation (CAT, also called machine-assisted, or machine-aided translation) environments. The primary goal of those environments is to help the human during the process of translation. Thus, the key component of such systems is a user-friendly interface.
Figure 3.5 shows an example of the translation process in the MATECAT tool. The interface is plain, giving easy access to useful functionalities: e.g., change of case, search, access to external help (glossaries, translation memory suggestions), etc. [Cattelan, 2014].
Figure 3.5 – Illustration of the translation process in the MATECAT tool
Other CAT functionalities provide:
• PRE help: spell and grammar checkers; CL suggestions, paraphrase suggestions [Seretan et al., 2014; Wu et al., 2016].
• PE help: access to terminology databases [Sheremetyeva, 2014], electronic dictionaries, Internet searches, translation memory matches, concordancer searches, etc.; visualization of MT-related procedures: e.g., word alignments, n-best translation hypotheses [Koehn et al., 2015], etc.
• Optimization of editing: an e-pen can be used to post-edit translations by means of proofreading gestures (to imitate such actions as delete, move, substitute, etc.); voice commands can be used as well [Alabau and Leiva, 2014].

Table of contents:

Abbreviations
1 Introduction
1.1 Human-Machine Collaboration in Machine Translation
1.2 Towards a New Protocol for Improved Human-Machine Collaboration
1.3 Automatic Translation of Cochrane Medical Review Abstracts
1.4 Contributions
1.5 Thesis Outline
2 A Statistical Machine Translation Primer
2.1 Basic Principles and System Types
2.2 Phrase-Based Statistical Machine Translation
2.2.1 Formal Definition
2.3 The Translation Model
2.3.1 Word Alignments
2.3.2 Phrase-Table Building
2.3.3 Reordering Models
2.4 The Language Model
2.5 Scoring
2.6 Decoding
2.7 Automatic Evaluation
2.8 Summary
3 Human-Machine Collaboration in Statistical Machine Translation
3.1 Injection of Human Knowledge
3.1.1 Post-Edition
3.1.2 Pre-Edition
3.1.3 Quality Estimation and its Role in Post- and Pre-Edition
3.2 Interactive Machine Translation
3.3 Exploitation of Human Knowledge
3.3.1 Online Adaptation
3.3.2 Selectiveness towards Human Feedback (Active Learning)
3.3.3 Domain Adaptation
3.3.4 Automatic Post-Editing
3.4 Computer-Assisted Translation Systems
3.5 Summary
4 Diagnosing High-Quality Statistical Machine Translation within the Cochrane Context
4.1 Human Evaluation and Error Analysis
4.2 Automatic Evaluation and Error Analysis
4.3 Automatic Translation of Cochrane Review Abstracts
4.3.1 Cochrane Production Context and Corpus
4.3.2 Manual Error Analysis of Post-Edits
4.3.3 Cochrane High-Quality Statistical Machine Translation System
4.3.4 Methodology for Diagnosing High-Quality Machine Translation
4.3.5 Results and Analysis
4.4 Summary
5 Detection of Translation Difficulties
5.1 Methodology
5.1.1 Gold Annotations and Segmentations
5.1.2 Main Features
5.1.3 Classification Algorithms
5.2 Detection of Difficulties as a Classification Problem
5.3 Intrinsic Evaluation: Experiments in the MEDICAL domain
5.3.1 Data and Systems
5.3.2 Choice of the Classification Algorithm
5.3.3 Classifier Feature Evaluation
5.4 Intrinsic Evaluation: Experiments in the UN domain
5.4.1 Features
5.4.2 Data
5.4.3 System Building
5.4.4 Source Translation Difficulty Analysis
5.4.5 Classifier Feature Evaluation
5.5 Summary
6 Resolution of Translation Difficulties with Human Help
6.1 Pre-Edition vs. Post-Edition
6.2 Human-Assisted Machine Translation Protocol
6.3 Evaluation of Pre-Translation
6.4 HAMT: a Sentence-Level Scenario
6.5 Experiments in a Simulated Setting for MEDICAL
6.5.1 Comparison to Post-Edition
6.6 Experiments in a Simulated Setting for UN
6.7 HAMT: a Document-Level Approach
6.7.1 Document-Level Human-Assisted Machine Translation
6.7.2 Selection of Crucial Difficult-to-Translate Segments
6.7.3 Update of Translation Models
6.7.4 Cochrane Abstracts: Experiments in a Simulated Setting
6.7.5 Cochrane Abstracts: Experiments in a Real-life Setting
6.8 Summary
7 Conclusion and Perspectives
7.1 Contributions
7.2 Perspectives
Appendix A Extracts from the Cochrane Corpus
A.1 Cochrane Reference Corpus
A.2 Cochrane Post-editing Corpus 1
Appendix B Extracts of Cochrane API Code
Appendix C Extracts of Cochrane UI Code
Appendix D Examples of Medical Text Challenges
Appendix E Standard Features for Translation Difficulty Detection
E.1 List of word-level standard features
E.2 List of standard phrase-level features
Appendix F Feature Ablation Experiments
Appendix G Examples of the Impact on the Context
Appendix H Cochrane Review Abstract Pre- and Post-Edited by Humans
Appendix I Publications by the Author
Bibliography