Using contextual information for Machine Translation: strategies and evaluation

Ambiguity and the problem of translation

To better understand why ambiguity is a problem, it is worth first taking a step back to reflect on what the process of translation involves from a theoretical point of view. Translation is the transfer of a segment of text from one language into another, preserving as best as possible the intended meaning of the original segment. Our definition of meaning is very wide and refers to the communicative intention and content of an utterance: its semantic and pragmatic content, speaker attitude, style, formality, etc. The size of the segment depends on the translation situation, and, if performed by a machine, on the computational and modelling capacity of the machine and of the method used: it may be a whole text, a paragraph, a sentence or even a single word. Whilst humans typically translate whole texts, we shall see in the next chapter that MT systems must work with much smaller segments for computational reasons.
Ambiguity arises in the translation process when there is a choice between several formulations for a given input segment. However large the segment is, the potential for ambiguity is always there. One way in which it can be reduced is by increasing the size of the segment being translated. This equates to adding more textual content that may provide some of the information necessary to disambiguate the ambiguous elements within the segment.
Translation is particularly difficult, because there are multiple stages at which ambiguity can arise, which require different types of context to be resolved. The reason for this is that translation involves two different language systems. Each language individually has the potential for ambiguity. However, additional ambiguity emerges between the two systems due to different conceptual mappings in the languages, as we shall discuss shortly. Figure 2.1 gives a simplified representation of these three types of ambiguity, which we will describe in more detail below: (i) source language ambiguity, (ii) cross-lingual meaning transfer ambiguity and (iii) target language ambiguity.

Source language ambiguity

The first type (step (i) in Figure 2.1) is similar to the ambiguity encountered by any NLP analysis task when dealing with a single language. It concerns the semantic interpretation of the source segment and the fact that a single source segment can have multiple interpretations and meanings. Examples include syntactic ambiguity, homonymy and polysemy, as mentioned above. Two such examples are given below in Examples (5) and (6), in which an inherent ambiguity in English must be resolved in the French translation: lexical semantic ambiguity in the first example and lexical semantic and syntactic ambiguity in the second.
(5) EN: It was far too steep.
FR: C’était bien trop cher/raide.
(6) EN: Christopher Robin saw her duck.
FR: Christopher Robin a vu son canard vs. Christopher Robin l’a vue baisser la tête.
Gloss: ‘Christopher Robin saw the duck belonging to her vs. Christopher Robin saw her lower her head.’
Unlike in NLP tasks concentrating on monolingual disambiguation, ambiguity present in the source language does not necessarily need to be resolved if the ambiguity can be preserved in the target language. This is highly dependent on the language pair involved in translation. Take for example the potential lexical disambiguation of the polysemous English word glass: glass can be translated into French using the same word verre whether the meaning is the solid, transparent material or the drinking receptacle. These two specific meanings therefore do not need to be disambiguated for this language pair. Similarly, the two separate meanings of the homonymous English word crane, ‘wading bird’ or ‘hoisting machine’, can also be expressed using a single word grue in French. In cases such as these, the inherent ambiguity does not pose a problem for translation, because disambiguation, a choice between several target forms corresponding to each of the two meanings, is not necessary. The same cannot be said however for the translation of crane into Spanish, because of the necessity to disambiguate between the two forms grúa ‘hoisting machine’ and grulla ‘wading bird’, corresponding to the two meanings of crane. We will therefore focus only on ambiguity that is relevant in the translation process (specific to a language direction), which needs to be resolved.
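The dependence on the language pair can be made concrete with a small sketch. It assumes a toy sense-to-translation lexicon built only from the crane and glass examples above; the name `needs_disambiguation` and the sense labels are illustrative and are not part of any system described in this thesis.

```python
# Toy lexicon: source word -> target language -> {sense: target form}.
# Entries are limited to the examples discussed in the text.
LEXICON = {
    "crane": {
        "fr": {"wading bird": "grue", "hoisting machine": "grue"},
        "es": {"wading bird": "grulla", "hoisting machine": "grúa"},
    },
    "glass": {
        "fr": {"material": "verre", "drinking receptacle": "verre"},
    },
}


def needs_disambiguation(word: str, target_lang: str) -> bool:
    """Return True if the senses of `word` map to more than one target form."""
    forms_by_sense = LEXICON[word][target_lang]
    return len(set(forms_by_sense.values())) > 1


print(needs_disambiguation("crane", "fr"))  # False: grue covers both senses
print(needs_disambiguation("crane", "es"))  # True: grúa vs. grulla
print(needs_disambiguation("glass", "fr"))  # False: verre covers both senses
```

The same source ambiguity is therefore harmless for one target language and must be resolved for another.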

Cross-lingual meaning transfer ambiguity

The second type of ambiguity (step (ii) in Figure 2.1) is specific to translation and concerns the passage from the meaning in the source language to the meaning in the target language. Ambiguity can arise during this transfer due to differences and mismatches in the conceptual spaces of the source and target languages. A simple example is the translation of English owl into French, which in everyday usage does not have a perfectly equivalent translation, there being instead two words, hibou and chouette, used to refer to two kinds of owl. A similar example is given in Example (7), for the translation of the English word river, which in French must be translated as either fleuve or rivière depending on whether or not the river flows into the sea.
(7) EN: They went swimming in the river.
FR: Ils ont nagé dans le fleuve/la rivière.
Gloss: ‘They swam in the river (flowing into the sea)/the river (tributary of another river)’
This type of ambiguity is famously seen in the differences in perception of colours and their naming conventions. The mapping of colours to colour names is not universal, as shown by the illustrations in Figure 2.2, adapted from Regier et al. (2007). Translating colour names between languages whose conceptual mapping is different is therefore a complex feat requiring cultural knowledge.
Other common problems are linked to concepts that are closely associated with a particular culture or country, such that the concept is consequently language-specific. For example, there is no simple bijective semantic mapping between English lawyer, solicitor and attorney on the one hand and the French avocat, juriste and notaire on the other, since the functions are specific to the legal systems of the countries in which the languages are spoken.
Other obvious examples of problematic elements for translation are national specialities such as pasty, haggis and scone, which do not have translation equivalents in most languages. As such, they do not pose a problem of ambiguity in the traditional sense, since there is no translation equivalent at all rather than several to choose from. However, they do pose a problem of conceptual mapping, as in the previous examples. Social conventions involving politeness (which constitute useful information to be communicated), such as the use of honorifics (for example French tu ‘you[informal]’ and vous ‘you[formal]’), also fit into this category. These do not always exist in the same form across languages. For example, English you is used for all second person references regardless of familiarity or politeness, and at the other end of the scale, Japanese has a highly complex and hierarchical honorific system. Generating translations that correctly take such distinctions into account can be seen as essential for communicating the correct style and attitude.

Human versus machine translation

What has been discussed so far is not specific per se to the problem of machine translation. A human translator is faced with much of the same ambiguity and must use all available information to choose the correct interpretation of the source segment and the most adequate form in the target language. The added difficulty for an MT system is that the information available for translation is more restricted than for the human translator, who is equipped with social and cultural knowledge, and is more likely to have access to the entire text when translating. The most restricted scenario possible is the translation of words on an individual basis, without taking into account any surrounding context, and much of the difficulty in MT lies in giving the system access to surrounding context when translating. The idea of using neighbouring words to help disambiguation in word-based translation is far from new. In his 1949 memorandum, entitled Translation, Weaver (1955) writes:
“... if ... one can see not only the central word in question, but also say N words on either side, then, if N is large enough one can unambiguously decide the meaning of the central word.”
As astutely remarked by Bar-Hillel (1960), this may be true for “intelligent readers” such as humans, but is insufficient for “electronic machines”, which lack the encyclopaedic knowledge necessary to use the context in a reasoned manner. Whilst humans can use context to guide their interpretation of a translation segment, the task is much more difficult for a machine, which must also be provided with a mechanism for using contextual information. MT systems must find ways of approximating the transfer of meaning from one language to another, which generally means learning a correspondence between the wordforms of the source and target segments. The way in which wordforms are modelled and the mechanism by which the correspondence is learnt determine how expressive the models can be. As will be discussed in detail in Chapter 3, much of the progress made in MT has been achieved thanks to (i) a better form of representing wordforms, the minimal unit of translation, and (ii) changes within MT architectures, enabling an expansion of the size of the translation segment (from words to phrases and then to sentences), resulting in a better use of context within the sentence.
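Weaver's suggestion of looking at N words on either side of an ambiguous word can be illustrated with a minimal sketch, here applied to steep from Example (5). The cue-word sets in `SENSE_CUES` are hypothetical and hand-written for the illustration; they stand in for exactly the kind of knowledge that, following Bar-Hillel's remark, a machine does not have by default. This is not how the MT systems discussed in this thesis operate.

```python
def context_window(tokens, index, n):
    """Return up to n tokens on either side of tokens[index]."""
    left = tokens[max(0, index - n):index]
    right = tokens[index + 1:index + 1 + n]
    return left + right


# Hypothetical cue words for the two French translations of "steep".
SENSE_CUES = {
    "cher": {"price", "cost", "bill"},
    "raide": {"hill", "slope", "climb"},
}


def choose_translation(tokens, index, n):
    """Pick the target form whose cue words overlap most with the window.

    Returns None when the window contains no cue word at all.
    """
    window = set(context_window(tokens, index, n))
    scores = {form: len(cues & window) for form, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


tokens = "the price was far too steep".split()
position = tokens.index("steep")
print(choose_translation(tokens, position, 2))  # None: window is too small
print(choose_translation(tokens, position, 4))  # cher: "price" is now visible
```

With N = 2 the window contains no informative word, whereas N = 4 brings price into view, which mirrors the "if N is large enough" condition in the quotation.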

The importance of context in MT

A major drawback of most MT systems until now has been that the maximal translation unit has been the sentence, which amounts to translating sentences independently of each other. Beyond the level of the sentence, the context of the sentences themselves is most often ignored, both within the text (linguistic context) and outside of the text (extra-linguistic context). For certain sentences, this means that the correct translation remains out of reach, however well the intra-sentential linguistic content is modelled. Take for instance Examples (9) and (10).
(9) EN: My sentence doesn’t need context to be correctly translated.
FR: Ma phrase[fem] n’a pas besoin de contexte pour être traduite[fem] correctement.
(10) EN: But mine does.
FR: Mais la[fem] mienne[fem] si.
FR: #Mais le[masc] mien[masc] si.
Whereas the English source sentence in Example (9) can be correctly translated into French without the need for extra information, in Example (10), the correct French translation of mine requires knowing the grammatical gender of its antecedent (the French word phrase ‘sentence’) in order to choose the correct translation, the feminine variant la mienne ‘mine[fem]’, over the erroneous masculine variant, le mien ‘mine[masc]’ (marked with a # indicating that it is discursively inaccurate). This example illustrates the fact that a text may be structured syntactically into sentences, but is above all a coherent unit, in which discourse phenomena and links span across sentence boundaries. Ambiguity within a sentence may be resolvable with intra-sentential context, but this is not always the case, and it is important to be able to look beyond the sentence to context within the surrounding sentences and even outside the text, to better guide translation.
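The dependency in Example (10) can be sketched as follows. The sketch assumes that the French antecedent of mine has already been identified in the previous sentence, which is itself the hard part of the problem; the gender lexicon and the function name `translate_mine` are hypothetical and serve only to show why a sentence translated in isolation lacks the required information.

```python
from typing import Optional

# Hypothetical toy gender lexicon for French nouns.
FRENCH_GENDER = {"phrase": "fem", "livre": "masc"}

# French singular forms of the possessive pronoun "mine", by gender.
MINE_FORMS = {"fem": "la mienne", "masc": "le mien"}


def translate_mine(antecedent_fr: Optional[str]) -> Optional[str]:
    """Translate English "mine" given the French antecedent noun.

    Returns None when no antecedent is available or its gender is unknown,
    as is the case when the sentence is translated without its context.
    """
    if antecedent_fr is None:
        return None
    gender = FRENCH_GENDER.get(antecedent_fr)
    return MINE_FORMS.get(gender)


print(translate_mine("phrase"))  # la mienne
print(translate_mine(None))      # None: no cross-sentence context
```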

Coherence-based phenomena

A text is coherent when it is logical and semantically consistent, and the sentences within it form a logical succession of ideas that are relevant and well linked. When translating a text from one language to another, the text’s coherence should be preserved regardless of the fact that translation may be performed on a segment smaller than the whole text, for example at the sentence level. In the specific case of translation, we shall assume that the source text is already a coherent text, and therefore the task of producing a coherent translation amounts to conserving the coherent nature of the text as best as possible.
The difficulty when translating is the fact that the language systems of the source and target languages differ, creating potential ambiguity, which, if unresolved, could lead to an incoherent translation. Here we discuss three different aspects contributing to discourse coherence: lexical coherence (concerning the semantically relevant word choice), the translation of discourse connectives, and information structure (how information within a sentence is packaged).
Lexical coherence (and word sense disambiguation). Lexical coherence concerns the semantic connections between the words of the text, and therefore how well a particular lexical choice fits semantically (and pragmatically) within the current discourse. Choosing the correct translation in keeping with lexical coherence means choosing target forms for words that together preserve their source meaning, despite possible ambiguity either in the source language or in the conceptual mapping between the source and target languages. In this respect, ensuring lexical coherence is treated within this thesis as equivalent to lexical disambiguation in the context of discourse. The cases on which we shall focus are therefore those in which the textual elements are ambiguous with respect to their translation, concerning both the first and second kinds of ambiguity in Figure 2.1.
At the very beginning of this chapter, we cited a number of examples of ambiguity types in natural language: morpho-syntactic ambiguity, syntactic ambiguity and semantic ambiguity. Yet here, in the context of translation, we only appear to discuss one of these types, lexical (semantic) ambiguity. The reason for this is that many forms of ambiguity are not present without there also being lexical ambiguity of some sort within the sentence. If this lexical ambiguity is resolved, this also often disambiguates the other forms of ambiguity. A clear example of this is the previously mentioned English sentence I saw her duck, which contains ambiguity on three linguistic levels: (i) morpho-syntactic ambiguity of her and duck (her as either an object pronoun or a possessive pronoun and duck as either a noun or a verb), (ii) syntactic ambiguity (her duck as the direct object of saw or her as the object of saw and subject of duck) and (iii) lexical ambiguity of the same two words (duck signifying either the bird or an action of lowering one’s head).
If we were to translate this sentence into another language in which these ambiguities cannot be preserved, the choice between the two interpretations could in practice be made based uniquely on the disambiguation of the single lexical item duck. If an element of context enables us to ascertain that duck refers to the bird (or that it refers to lowering one’s head), then the morphological and syntactic ambiguities are instantly resolved. It is often far easier to consider such examples in this light, because it simplifies the ways in which we perform disambiguation. As we shall see in the following chapter, the standard MT systems we will be using in this thesis do not rely on explicit morphological, syntactic or semantic analysis, and therefore all ambiguity comes down to a choice of the best sequence of translated wordforms given the other word choices within the sentence.
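The way a single lexical decision fixes the other levels of analysis can be written down as a small lookup, purely as an illustration. The table `READINGS` and the function `resolve_from_sense` are hypothetical; the labels follow the description of I saw her duck given above.

```python
# Each sense of "duck" bundles the part-of-speech tags and the syntactic
# analysis that go with it (labels follow the description in the text).
READINGS = {
    "bird": {
        "duck": "noun",
        "her": "possessive pronoun",
        "syntax": "'her duck' is the direct object of 'saw'",
    },
    "lower one's head": {
        "duck": "verb",
        "her": "object pronoun",
        "syntax": "'her' is the object of 'saw' and the subject of 'duck'",
    },
}


def resolve_from_sense(duck_sense):
    """Return the full analysis implied by the chosen sense of 'duck'."""
    return READINGS[duck_sense]


analysis = resolve_from_sense("bird")
print(analysis["duck"])  # noun
print(analysis["her"])   # possessive pronoun
```

Once the sense of duck is known, nothing else remains to be decided, which is why the disambiguation can be treated as a purely lexical choice.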

Table of contents:

Introduction and overview
1.1 Motivation for Contextual Machine Translation
1.2 Structure and detailed summary of this thesis
1.3 Publications related to this thesis
I State of the Art: Contextual Machine Translation
2 The Role of Context
2.1 Ambiguity and the problem of translation
2.1.1 Source language ambiguity
2.1.2 Cross-lingual meaning transfer ambiguity
2.1.3 Target language ambiguity
2.1.4 Human versus machine translation
2.2 The importance of context in MT
2.2.1 What is context?
2.2.2 Nature and use of context
2.3 Conclusion
3 Sentence-level Machine Translation
3.1 Statistical Machine Translation (SMT)
3.1.1 Word alignments
3.1.2 Phrase-based translation models
3.1.3 Domain adaptation
3.1.4 Successes and Limitations of SMT
3.2 Neural Machine Translation (NMT)
3.2.1 Neural networks for NLP
3.2.2 Sequence-to-sequence NMT
3.2.3 Sequence-to-sequence NMT with attention
3.2.4 Recent advances in NMT
3.2.5 Successes and limitations
3.3 Evaluating Machine Translation
3.3.1 Issues in human evaluation of MT quality
3.3.2 Standard automatic evaluation metrics
3.3.3 Discussion
4 Contextual Machine Translation
4.1 Evaluating contextual MT
4.1.1 Problems associated with automatic evaluation of context
4.1.2 MT metrics augmented with discourse information
4.1.3 Conclusion
4.2 Modelling context for MT
4.2.1 Modelling context for SMT
4.2.2 Modelling context for NMT
4.3 Translation using structured linguistic context
4.3.1 Anaphoric pronouns
4.3.2 Lexical choice
4.3.3 Discourse connectives
4.3.4 Whole document decoding
4.4 Translation using unstructured linguistic context
4.5 Translation using extra-linguistic context
4.6 Conclusion on evaluating contextual MT
II Using contextual information for Machine Translation: strategies and evaluation
5 Adapting translation to extra-linguistic context via pre-processing
5.1 Integrating speaker gender via domain adaptation
5.1.1 Annotating the The Big Bang Theory reproducible corpus
5.1.2 SMT models: baselines and adaptations
5.1.3 Manual analysis and discussion
5.1.4 Conclusion on data partitioning
5.2 Conclusion
6 Improving cohesion-based translation using post-processing
6.1 Preserving style in MT: generating English tag questions
6.1.1 Tag questions (TQs) and the difficulty for MT
6.1.2 Improving TQ generation in MT into English: our post-edition approach
6.1.3 Results, analysis and discussion
6.1.4 Conclusion to our tag-question experiments
6.2 Anaphoric pronoun translation with linguistically motivated features
6.2.1 Classification system: description and motivation
6.2.2 Results, analysis and discussion
6.2.3 Conclusion to pronoun translation via post-edition
6.3 General conclusion on post-edition approaches
7 Context-aware translation models
7.1 Translating discourse phenomena with unstructured linguistic context
7.1.1 Hand-crafted test sets for contextual MT evaluation
7.1.2 Modifying the NMT architecture
7.1.3 Evaluation results and analysis
7.1.4 Conclusion and perspectives
7.2 Contextual NMT with extra-linguistic context
7.2.1 Creation of extra-linguistically annotated data
7.2.2 Contextual strategies
7.2.3 Experiments
7.2.4 BLEU score results
7.2.5 Targeted evaluation of speaker gender
7.2.6 Conclusion and perspectives
7.3 Conclusion
8 DiaBLa: A corpus for the evaluation of contextual MT
8.1 Dialogue and human judgment collection protocol
8.1.1 Participants
8.1.2 Scenarios
8.1.3 Evaluation
8.1.4 MT systems and setup
8.2 Description of the corpus
8.2.1 Overview of translation successes and failures
8.2.2 Comparison with existing corpora
8.3 Evaluating contextual MT with the DiaBLa corpus
8.3.1 Overall MT quality
8.3.2 Focus on a discourse-level phenomenon
8.4 Perspectives
8.4.1 Language analysis of MT-assisted interaction
8.4.2 MT evaluation
Conclusion and Perspectives
9 Conclusion and Perspectives
9.1 Conclusion
9.1.1 Trends in contextual MT and the impact on our work
9.1.2 Review of our aims and contributions
9.2 Perspectives
9.2.1 Evaluation of MT
9.2.2 Interpretability of contextual NMT strategies
9.2.3 Contextual MT for low resource language pairs
9.2.4 Contextual MT to Multimodal MT
9.2.5 Conclusion: To the future and beyond the sentence
Appendices
A Context-aware translation models
A.1 Translating discourse phenomena with unstructured linguistic context
A.1.1 Training and decoding parameters
A.1.2 Visualisation of hierarchical attention weights
A.2 Contextual NMT with extra-linguistic context
A.2.1 Experimental setup
B DiaBLa: A corpus for the evaluation of contextual MT
B.1 Role-play scenarios
B.2 Dialogue collection: Final evaluation form
Bibliography