EXISTING CORPORA OF ANNOTATED COMPARISONS AND SIMILES  Automatic Annotation of Similes in Literary Texts

Tversky’s Contrast Model

Tversky (1977) proposes to measure the similarity S of two elements a and b compared in the sentence “a is like b” by taking into account both their similarities and their differences:
S(a, b) = θ f(A ∩ B) – α f(A – B) – β f(B – A)
where A ∩ B corresponds to the set of features that are common to both a and b, A – B to the features that belong only to a, and B – A to the features that belong only to b; θ, α and β are non-negative weights and f measures the salience of a set of features. If all the features of a and b are known, this model makes it possible to determine which features are the most decisive in similarity statements. Consider these two sentences: [s5] “This chair is like an armchair” and [s6] “This chair is like a boulder”. According to Goatly (2011), [s6] would be a simile as A – B2 does not equal zero. Table 2.5 lists the salient features of all the elements compared, while the similarities and differences between the objects compared are rendered in Figure 2.3.
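The contrast model can be made concrete with a minimal Python sketch. The feature sets below are invented for illustration and are not the ones listed in Table 2.5; the weights θ = 1 and α = β = 0.5 are likewise arbitrary assumptions, and f is simply taken to be the number of features in a set.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Contrast model: S(a, b) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A).

    a and b are sets of features; f measures a feature set (here: its size).
    """
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


# Hypothetical feature sets (assumptions, not taken from the source).
chair = {"furniture", "has legs", "has a back", "can be sat on"}
armchair = chair | {"has armrests", "upholstered"}
boulder = {"made of stone", "heavy", "natural", "can be sat on"}

s5 = contrast_similarity(chair, armchair)  # "This chair is like an armchair"
s6 = contrast_similarity(chair, boulder)   # "This chair is like a boulder"

print("s5:", s5, "| features only in chair:", chair - armchair)
print("s6:", s6, "| features only in chair:", chair - boulder)
```

With these invented features, [s5] receives a higher score than [s6], and the set A – B is empty for the armchair but not for the boulder, which is the condition the text associates with a simile.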

Ortony’s Salience Imbalance Model

Examining the importance of similes in languages, Ortony (1975) observes that they help to achieve three main goals: compactness, vividness and formulating the inexpressible. Compactness refers here to the fact that similes make it possible to pack a whole range of implied meanings into a single word. These meanings are filtered in two steps: they are chosen first by salience and then by tension elimination, so that only the most distinctive traits of the standard of comparison that can be transferred to the comparee NP remain. According to Ortony, Vondruska, Foss and Jones (1985), the term salience has two senses:
– the relevance of an attribute in making a judgement in a particular domain.
– the importance given to an attribute of an object or a category.
Ortony (1979) clearly uses the latter sense when he treats salience as the factor that distinguishes comparisons from similes. According to him, for a statement “a is like b” to be a comparison, A and B must share features that are highly salient in both elements. Spoons and forks, for instance, are both utensils that are held in the hand and used for eating. In contrast, in a simile, the features that A and B have in common should be high-salient in B, the standard of comparison, and low-salient in A, the comparee NP. For instance, in “The girl is like a butterfly”, fluidity, flittiness, lightness and transience are more readily associated with butterflies than with girls. Therefore, to take feature salience into account, Ortony (1979) transforms the contrast model into the imbalance model:
S(a, b) = θ fB(A ∩ B) – α fA(A – B) – β fB(B – A)
where fA and fB correspond to the measures of salience of the set of features of A and B respectively.
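The following Python sketch illustrates the imbalance model under simplifying assumptions: each element is represented as a dictionary mapping features to an invented salience score from 1 to 10, and the measure of a feature set is taken to be the sum of those scores. Neither the scores nor the helper `shared_salience_gap` come from Ortony; they are hypothetical and serve only to show how salience enters the formula.

```python
def salience(features, weights):
    """Total salience of a feature set, using one element's own weights."""
    return sum(weights[x] for x in features)


def imbalance_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """S(a, b) = theta*fB(A & B) - alpha*fA(A - B) - beta*fB(B - A)."""
    A, B = set(a), set(b)
    return (theta * salience(A & B, b)
            - alpha * salience(A - B, a)
            - beta * salience(B - A, b))


def shared_salience_gap(a, b):
    """Hypothetical helper: mean salience of the shared features in b minus in a."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(b[x] - a[x] for x in shared) / len(shared)


# Invented salience scores (assumptions for illustration only).
girl = {"human": 9, "young": 8, "light": 2, "transient": 1}
butterfly = {"insect": 9, "winged": 9, "light": 8, "transient": 7}
spoon = {"utensil": 9, "hand-held": 8, "used for eating": 9, "has a bowl": 7}
fork = {"utensil": 9, "hand-held": 8, "used for eating": 9, "has tines": 7}

print(shared_salience_gap(girl, butterfly))   # positive: shared features salient in B only
print(shared_salience_gap(spoon, fork))       # zero: shared features equally salient
print(imbalance_similarity(girl, butterfly))  # "The girl is like a butterfly"
print(imbalance_similarity(butterfly, girl))  # reversed statement
```

Because the shared features are weighted by their salience in the standard of comparison, the measure is asymmetric: with these scores, “The girl is like a butterfly” obtains a higher value than the reversed statement.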
Ortony (1978) also specifies that even though the compared elements in a similarity statement do not come from exactly the same domain, they can nonetheless be grouped together under a higher specific domain. Consequently, “Billboards are like spoons” could not be called a “sensible similarity statement”, as its two terms could not be brought together under a single domain or category (p. 36). In contrast, in “Sally is like a block of ice”, “Sally” and “block of ice” both describe elements that can exhibit stiffness. In addition, in this last sentence, a transfer occurs between “coldness” referring to temperature and “coldness” associated with a lack of emotional response. In the sentences given as examples, it is also possible to notice what Ortony (1978) refers to as “domain incongruence”, i.e. the comparee NP and the standard of comparison belong to distinct semantic categories. However, instead of being described as the source of figurativeness, “domain incongruence” is perceived as enhancing figurativeness in a similarity statement.
Although Ortony’s theory characterises comparisons and similes, it fails to do the same for the various structures in between: those whose terms share only low-salient features (“Billboards are like pears”), those whose terms share no features at all (“Chairs are like syllogisms”), and those whose common features are high-salient in A and low-salient in B (“Sleeping pills are like sermons”). Moreover, since this last type of similarity statement is described as being metaphorical, even though its metaphoricity is very low, why can it not be considered a simile?
According to Weiner (1984), a simile cannot be recognised only in terms of its low- and high-salient attributes, but rather by the fact that the attributes shared by the comparee NP and the standard of comparison are not strictly identical: the comparee NP can never possess those attributes exactly as the standard of comparison does, but only in an approximate way. In this respect, “Blood vessels are like aqueducts” is a literal comparison and not a simile, as Ortony (1978) claims, because blood vessels and aqueducts function identically as channels. Similarly, Fishelov (1993) reveals some of the limits of Ortony’s theory (1978) when he considers the sentence “Goliath is like the Empire State Building” to be a simile: although “height” is a salient attribute of both “Goliath” and “the Empire State Building”, the comparee NP is animate whereas the standard of comparison is inanimate.
The different cognitive theories presented in this section are clearly oriented towards simile understanding and have in common the prominence they give to the standard of comparison, which is invariably described as the element that decides whether a statement is a simile or a comparison. Both Ortony (1978) and Glucksberg and Keysar (1990) analyse simile components in terms of the “given-new strategy” (Clark & Haviland, 1977): while the comparee NP is known, the sentence segment containing the quality/quantity and the standard of comparison carries the new information that is conveyed about it. In this respect, they agree with rhetoricians on the pragmatic use of similes.

The Polysemy of “like” and “as”

The prototypical simile markers in English, “like” and “as”, can also have pragmatic meanings other than comparison. Of course, “like” can be an inflected form of the verb “to like”. Besides, as a preposition or conjunction, it can introduce:
– a quotation: and then, and then Kevin came up to me and said erm … if you if you go and see Mark this afternoon erm he would like to speak to you, I was like, he should come and speak to me.
– an approximation: My lowest ever [score] was like forty.
– an exemplification: I know but it wouldn’t be any point if someone wanted to be, like, a doctor and they got into a nursery place.
– hesitation: Alright. Erm, well like, I usually take the train about… twenty past.
– a metaphor: She’s like tearing the wall down.
– a hyperbole: We can like endlessly swear on it. (Andersen, as cited in Walaszewska, 2013, pp. 329-330).
According to the Oxford Advanced Learner’s Dictionary of Current English (Hornby, 2000, p. 54), the morpheme “as” can be used as:
– a preposition signalling what somebody or something appears to be (e.g. They were all dressed as clowns. The bomb was disguised as a package), somebody’s job or role (I respect him as a doctor. Treat me as a friend) or something’s function (The news came as a shock).
– an adverb to signify a similarity in a situation (As always, he said little.).
– a conjunction that marks temporal simultaneity (As she grew older, she gained in confidence), causality (As you were out, I left a message), conformity in manner (I did as he asked), a comment or additional information (As you know, Julia is leaving soon) and contrast (“as” means “though”: Happy as they were, there was something missing.).

Comparative Mining from a Semantic Perspective

Since comparative statements have been widely discussed by grammarians, it is not surprising that grammar plays a crucial role in the early computational approaches to comparative statements. Most of these proposed grammars, however, are generally oriented towards semantics and mainly geared towards comparative statement understanding. As far as such accounts are concerned, comparative detection is not meant as a separate task, but as a part of a whole system that works in combination with other language processing tools. Ballard (1988), for example, handles comparatives with “less than”, “more than”, “as long as”, “as many as” inside TELI, a question-answering system: the method he proposes uses rules and conceptual knowledge to simplify and rewrite the output of a sentence parse tree in order to obtain a logical expression that can be easily read by a computer (see Figure 3.2).
Like Ballard’s method, most early works on comparatives in computational linguistics involve two main phases: the production of an intermediary representation of the comparative sentence and the transformation of this representation into a logical expression using interpretation or writing rules (Staab, 1998). However, apart from Staab and Hahn (1997a, 1997b), the proposed model of semantic interpretation is not evaluated. These works also underline the strong connection between the syntax of a comparative statement and its semantics. As a matter of fact, several of these early research endeavours rely on linguistic theoretical descriptions of comparative constructions.

Jindal and Liu’s Approach

Jindal and Liu (2006a) also use patterns to identify comparative sentences of the type “Car X is much better than car Y” in text documents. As they are particularly interested in opinions expressed with comparative sentences, they manually compile a list of 83 triggers that includes “beat”, “exceed”, “outperform”, “number one”, “set against”, “but”, “whereas”, “on the other hand”, “favour”, “prefer”, “win”, and of course “more than”, “less than”, “as…as”. With this list of markers alone, they report that they could identify 94% of the comparative sentences in their data set. The precision, however, was far lower (32%), which means that many of the sentences captured are not really comparative sentences. To solve this issue, Jindal and Liu (2006a) investigate manual rules, sequential rules based on part-of-speech tags and machine learning techniques. To generate their sequential rules, they consider the part-of-speech tags of the three words before and after each trigger. Then, the generated sequence is labelled as either comparative or non-comparative and stored in a database. In the last step, class sequential rules with a minimum confidence threshold are derived from the dataset. However, class sequential rules alone prove insufficient to recognise comparative sentences accurately because a single sentence can satisfy several conflicting rules. Machine learning classifiers such as Naive Bayes were therefore used to tackle this problem. Combined with class sequential rules and manual rules, they substantially outperform all other methods, with an average precision of 77.3%, a recall of 81% and an F-score of 79% on manually labelled sentences from three types of texts: reviews, articles and forums. Tested on other languages such as Korean (Yang & Ko, 2009), this method has also significantly improved the precision initially obtained with triggers alone.
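The first two steps described above, trigger matching and the extraction of a part-of-speech window around each trigger, can be sketched in Python as follows. This is a simplified illustration, not Jindal and Liu’s implementation: the trigger list is a small subset of the examples quoted above, matching is done on exact lowercase word forms, the sentence is assumed to be already tokenised and tagged, and keeping the trigger itself as the middle element of the sequence is an assumption made here for readability. The last function only recomputes an F-score from a precision and a recall.

```python
# Small illustrative subset of the 83 triggers (the full list is not reproduced here).
TRIGGERS = ("more than", "less than", "on the other hand", "whereas", "prefer", "beat")


def find_triggers(tokens):
    """Return sorted (start, end) token spans that match a trigger exactly."""
    words = [w.lower() for w, _ in tokens]
    spans = []
    for trigger in TRIGGERS:
        parts = trigger.split()
        n = len(parts)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == parts:
                spans.append((i, i + n))
    return sorted(spans)


def pos_window(tokens, span, radius=3):
    """POS tags of up to `radius` words before and after the trigger span."""
    start, end = span
    before = [tag for _, tag in tokens[max(0, start - radius):start]]
    trigger = "_".join(w.lower() for w, _ in tokens[start:end])
    after = [tag for _, tag in tokens[end:end + radius]]
    return before + [trigger] + after


def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical pre-tagged sentence (Penn Treebank-style tags assumed).
sentence = [("This", "DT"), ("camera", "NN"), ("costs", "VBZ"), ("more", "JJR"),
            ("than", "IN"), ("the", "DT"), ("older", "JJR"), ("model", "NN")]

for span in find_triggers(sentence):
    print(pos_window(sentence, span))

print(round(f_score(0.773, 0.81), 2))
```

Each extracted sequence would then be labelled as comparative or non-comparative before rules are mined from the collection. Recomputing the F-score from the reported precision (77.3%) and recall (81%) gives a value that rounds to the reported 79%.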

Table of contents:

1 INTRODUCTION
1.1 STYLISTICS AND THE STUDY OF LITERATURE
1.2 INTRODUCING RHETORICAL FIGURES
1.3 RHETORICAL FIGURES AND COMPUTER-ASSISTED STUDIES OF LITERARY TEXTS
1.4 SCOPE OF THE THESIS
1.5 MOTIVATION OF THE STUDY
1.6 ORGANISATION OF THE THESIS
2 SIMILES, COMPARISONS, METAPHORS AND FIGURATIVENESS
2.1 COMPARISON: SEMANTICS AND SYNTAX
2.1.1 Comparison in Rhetoric
2.1.2 Grammatical Expressions of Comparisons
2.2 COMPARISONS AND SIMILES
2.2.1 Comparisons of Inequality and Similes
2.2.2 Cognitive Accounts of Similes and Comparisons
2.3 FIGURATIVE SIMILES
2.4 METAPHOR AND SIMILES
3 COMPUTATIONAL APPROACHES TO SIMILE DETECTION
3.1 CHALLENGES OF COMPUTATIONAL DETECTION OF SIMILES
3.1.1 Markers’ Polysemy
3.1.2 Comparison and Ellipsis
3.2 COMPUTATIONAL APPROACHES
3.2.1 Automatic Detection of Comparatives
3.2.2 Detection and Analysis of Non-Literal Comparisons
3.2.3 Automatic Detection of Similes
4 SIMILE ANNOTATION
4.1 PRINCIPLES
4.1.1 Types of Linguistic Annotations
4.1.2 The TEI as the Annotation Standard in the Humanities
4.2 SIMILE DESCRIPTION IN LITERARY STUDIES
4.2.1 The Structural Dimension
4.2.2 The Semantic Dimension
4.3 EXISTING CORPORA OF ANNOTATED COMPARISONS AND SIMILES
4 Suzanne Mpouli – November 2016
5 THE PROPOSED APPROACH
5.1 A GRAMMAR OF THE SIMILE
5.2 THE SYNTACTIC MODULE
5.3 THE SEMANTIC MODULE
5.4 THE ANNOTATION MODULE
6 TOWARDS AN ANNOTATED LITERARY CORPUS OF SIMILES
6.1 CORPUS PRESENTATION
6.2 EXPERTS’ ANNOTATION
6.3 THE CROWDSOURCING ANNOTATION PLATFORM
7 CORPUS-BASED APPLICATIONS
7.1 CORPUS DESCRIPTION
7.2 STEREOTYPICAL FROZEN LITERARY SIMILES
7.3 COLOURS AND SIMILES IN THE ENGLISH CORPUS
7.3.1 Why Study Colours in relation to Similes?
7.3.2 Basic Colour Terms and English Literature
7.3.3 Fully Fledged Colour Similes vs. Noun+CT Similes: Frequency and Stylistic Usage
7.3.4 Creativity and Noun+CT Similes
7.4 ON PROPER NOUNS IN COMPARATIVE CONSTRUCTIONS
8 CONCLUSION AND FUTURE WORK
9 REFERENCES
10 APPENDICES
