Corpora and Corpus Linguistics 

Chapter 4 Formulaic language and multiword units

The realisation that words act less as individual units and more as parts of lexical phrases in interconnected discourse is one of the most important new trends in vocabulary studies (Schmitt 2000:78).

4.1 Introduction


This chapter forms the final part of the literature review in this study. It discusses what is, in effect, the main focus of Phase 2, the delexical multiword unit (MWU). Accordingly, the chapter starts by providing some background to the growth in interest in the phenomenon of formulaic language in general in research over the last 30 years or so (§4.2). It documents the move from an awareness of the fixedness, or ‘formulaicity’ (Wray and Perkins 2000:1), of some language to the understanding expressed in the present study that formulaic language is not ‘a single category’ different from language freely generated by rules, but rather a term which covers all ‘significant features of word combinations’ (Howarth 1998a:25).
In the next section, the focus moves to collocation, a specific type of word combination which relates closely to the type of delexical MWU investigated in this study. Issues of definition of this construct and learning issues are discussed and influential studies reviewed (§4.3). In the last section, the focus narrows to the circumscription of the main construct under investigation in this study, the delexical MWU (§4.4). The definition of this combination as it is used here and the researchers who were most influential in this process are discussed, and the section ends with a review of the research studies on high-frequency verbs in general and on delexical uses of such verbs in particular that have informed this study.
By focusing on formulaic language in general and on MWUs containing delexicalised high-frequency verbs in particular, this chapter ties together the discussions in the previous two chapters. It reflects on the shift that has taken place in vocabulary studies, from discrete-item tests and the view of lexis as individual words towards the notion that words are integral parts of larger discourse. It also documents the influence that corpus studies have had on this shift and the way in which corpus analysis has allowed researchers to isolate the type of delexicalised MWU studied in this chapter.

4.2 Formulaic language


As noted in Chapter 3 (§3.4), one of the most important findings to come out of studies of vocabulary in the last few decades, and from corpus research in particular, is that ‘language is made up of not only individual words, but also a great deal of formulaic language’ (Martinez and Schmitt 2012:299). This section provides some background to this development by discussing the growing awareness of the all-pervasive nature of formulaicity in language (§4.2.1), the challenges scholars have faced when trying to define formulaic language (§4.2.2) and the difficulties formulaic language poses for learners of English (§4.2.3).

4.2.1 The ubiquity of formulaic language

Research into formulaic language and MWUs has increased significantly in the last three decades, influenced in no small part by the increased use of computerised methods and corpora in linguistic studies (see Chapter 2). The focus of vocabulary studies has changed; a great deal of research has been devoted to explaining various lexical patterns (formulaic sequences, idioms, collocations, sentence stems, for example) based increasingly on corpus evidence. Software has been developed which allows researchers to lemmatise their corpora, to establish frequencies and generate concordances of specific words, and to identify collocational tendencies and many other aspects of their corpora. As Kaszubski (2000:2) observes, ‘theory and corpus-based practice have shown that aspects of lexicon, phraseology and style are intertwined’.
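The kinds of operation described above (frequency lists, concordances and a first look at collocational tendencies) can be illustrated with a minimal Python sketch. The sample text, the function names and the two-word context window are assumptions made purely for illustration; they are not drawn from any of the studies cited, and the sketch omits lemmatisation.

```python
import re
from collections import Counter

# Toy text invented for illustration; a real study would load a corpus file.
TEXT = (
    "Students make mistakes when they take notes. "
    "Good writers make decisions early and take notes often. "
    "We make progress when we take care."
)

def tokenize(text):
    """Lowercase the text and split it into word tokens (no lemmatisation)."""
    return re.findall(r"[a-z]+", text.lower())

def concordance(tokens, node, span=2):
    """Return keyword-in-context lines: `span` words either side of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left} [{tok}] {right}")
    return lines

def right_collocates(tokens, node):
    """Count the words that immediately follow `node`."""
    return Counter(b for a, b in zip(tokens, tokens[1:]) if a == node)

tokens = tokenize(TEXT)
freq = Counter(tokens)              # word-frequency list

print(freq.most_common(3))          # the most frequent word forms
for line in concordance(tokens, "make"):
    print(line)                     # each occurrence of "make" in context
print(right_collocates(tokens, "take"))
```

Even on a toy text of this size, the concordance lines and the collocate counts show a high-frequency verb such as make or take recurring with a small set of nouns, which is the kind of pattern that corpus software surfaces at scale.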
Researchers have focused on MWUs and formulaic language, what Granger (1998b:145) calls prefabricated language or ‘prefabs’, and ‘conventionalized language’ (1998b:146), because of their frequency and because they are important to the native-like production of language (Cowie 1992; Fan 2009; Granger 1998b; Hunston and Francis 2000; Nesselhauf 2003; Wray 2002). Such chunks of language are also important to idiomaticity, which Kaszubski (2000:1) says is operationalised in the literature by the properties of ‘non-compositionality of meaning and structure […] and conventionality and naturalness’, or salience. In the words of Pawley and Syder (1983:91), ‘fluent and idiomatic control of a language rests to a considerable extent on knowledge of a body of “sentence stems” which are “institutionalised” or “lexicalised”’. Cowie (1992:10) adds that ‘it is impossible to perform at a level acceptable to native users, in writing or in speech, without controlling an appropriate range of multiword units’. And Renouf and Sinclair (1991:143) provide ‘evidence in support of a growing awareness that the normal use of language is to select more than one word at a time, and to blend such selections with each other’.
Sinclair’s work and the COBUILD project (Sinclair 1991) made his approach to phraseology familiar to researchers in the field of English language and linguistic studies. Sinclair focused on recurrent co-occurrences of words in a body of texts and drew on Firth’s concept of ‘meaning by collocation’ (Howarth 1998a:26). The more frequent such a co-occurrence, the more significant it was considered to be in the language; for this reason, larger corpora that provided more data produced more reliable results. Howarth (1998a:26) sounds a word of caution here, however, in that ‘a notion of significance based solely on frequency risks giving unwarranted emphasis to completely transparent collocations such as “have children”’. In other words, the researcher needs to establish criteria to identify exactly what sort of combination qualifies as a meaningful, non-compositional collocation; if it is to be useful, ‘the notion of phraseological significance needs to take into account the differences between phraseological types and to consider how they are processed by native and non-native speakers and writers in production’ (Howarth 1998a:27).
Sinclair’s (1991) open-choice and idiom principles helped to concretise the growing awareness of the formulaic nature of language. These principles are based on the notion that, on the one hand, a language user has a huge choice of what words to use when saying or writing something, restricted only by the grammatical acceptability of the production, but that, on the other hand, there are also a large number of semi-preconstructed phrases, constituting single choices, which the language user could choose (Sinclair 1991:109–110). This is now supported by research findings: although studies differ hugely in the proportions of formulaic language they report, it is now generally accepted that there is far more lexical patterning and widespread collocation in language than was previously realised (Howarth 1998b; Hunston and Francis 2000). For instance, Altenberg (1998:102) estimates that over 80% of the words in the London-Lund Corpus ‘form part of recurrent word-combination in some way or another’. Moon (1998b), on the other hand, found that only 4% and 5% of the Oxford Hector pilot corpus of over 18 million words was made up of fixed expressions and idioms respectively; part of this discrepancy may lie in her more narrowly defined concepts. In a study conducted by Erman and Warren (2000) to explore the impact that prefabricated language has on the structure of a text and on the effort involved in encoding and decoding it, the authors found that there were large amounts of prefabricated language in both spoken and written texts (making up on average more than half of the texts they investigated: 58.6% and 52.3% respectively). This ‘makes it impossible to consider idioms and other multi-word combinations as marginal phenomena’ (Erman and Warren 2000:29).
These variations in the estimates of the proportion of formulaic language in any given corpus are a reminder of the complexities of formulaic language and its research: there is a host of definitions, many of which are superficially very similar.
Such formulaic expressions are often difficult for learners to understand, even when native speakers would regard them as fairly transparent (Martinez and Schmitt 2012). They also occur frequently in academic discourse, making them particularly important for learners of English in higher education contexts. Studies in phraseology such as those by Altenberg (1998), Gläser (1998) and Howarth (1998a, b) have revealed the blurring of the boundaries between grammar and lexis. Because of their functional importance, knowledge of MWUs is essential for pragmatic competence (Schmitt 2000:101). Shirato and Stapleton (2007:409), citing McCarthy and Carter (2002), claim that many high-frequency clusters occur with greater frequency than some common single words and pose great difficulties for ESL learners (see §4.2.3).
Thus, scholars are in agreement on the importance of formulaic language, but as they have used different criteria to establish exactly what makes something formulaic and may apply different terminology to these units, studies in this area are very difficult to compare (Wray 2002:28). For this reason, in the next section of this chapter an attempt is made to describe various researchers’ conceptualisations of word combinations in general and to provide a clearer picture of the definitions and explanations these scholars have settled on.

4.2.2 Pinning down the phenomenon

As observed above, research in the last three decades or so has seen growing consensus on the formulaic nature of language, and the view that a great deal of text is made up of ‘non-arbitrary and non-random phrases and patterns’ (Kaszubski 2000:2) is generally accepted by scholars. With this consensus has come increased research and a plethora of terms and definitions for such patterns. Some studies have focused mainly on spoken data, and Wray (2000, 2002) is a particularly authoritative voice here. Studies in this area (Nattinger and De Carrico 1992; Schmitt and Carter 2004; Wray 2000, 2002; Wray and Perkins 2000) tend to focus on the pragmatic aspect of what are often termed formulaic sequences. Other scholars have focused more on written data, and in these studies a great deal of work has been done on lexical collocations. Such studies include those by Howarth (1998a, b), Granger (1998a, b), Altenberg and Granger (2001) and Nesselhauf (2003, 2004, 2005). Finally, there are studies, for instance those by Sinclair (1991) and by later scholars such as Biber et al. (1999) and Biber (2009), in which both spoken and written data have been investigated.
Over the years, these studies of various manifestations of formulaic language have given rise to many different names and definitions for these combinations. In fact, Wray (2002:9) found more than 50 terms to describe these chunks of language. These include prefabricated patterns (Hakuta 1974); chunks (Peters 1983, cited in Shirato and Stapleton 2007:395); lexical phrases (Nattinger and De Carrico 1992); recurrent sequences (De Cock 1998; De Cock and Granger 2004); prefabricated language or ‘prefabs’ (Granger 1998b); recurrent word-combinations (Altenberg 1998); lexical bundles (Biber et al. 1999); multiword units (MWUs) (Schmitt 2000); formulaic sequences (Schmitt and Carter 2004; Wray 2000, 2002); as well as idioms, collocations, formulas, formulaic speech, prefabricated routines, and ready-made utterances.

Chapter 1 Introduction 
1.1 Introduction
1.2 Contextualisation of the study
1.3 Focus of the study
1.4 Rationale for the study
1.5 Research aims and research questions
1.6 Methodology
1.7 Structure of the thesis
Chapter 2 Corpora and Corpus Linguistics 
2.1 Introduction
2.3 Corpora for specific purposes
2.4 What is corpus linguistics?
2.5 Some corpus research relevant to student writing
2.6 Conclusion
Chapter 3 Vocabulary and vocabulary studies 
3.1 Introduction
3.2 Establishing the size of students’ vocabulary
3.3 Depth of vocabulary knowledge
3.4 Word lists and academic vocabulary
3.5 The relationship between vocabulary and academic performance
3.6 Conclusion
Chapter 4 Formulaic language and multiword units 
4.1 Introduction
4.2 Formulaic language
4.3 Collocation
4.4 Delexical MWUs
4.5 Conclusion
Chapter 5 Methodology 
5.1 Introduction
5.2 Research questions and phases
5.3 Research design
5.4 Methodological rigour
5.5 Pilot study
5.6 Main study
5.7 Phase 1: Vocabulary size and academic performance
5.8 Phase 2: Comparison of student and expert writers’ use of selected verbs and MWUs within and across academic genre
5.9 Phase 3: The relationships between students’ vocabulary size, production of MWUs and academic performance
5.10 Justification of techniques
5.11 Ethical considerations
5.12 Conclusion
Chapter 6 Analysis and Discussion of Findings 
6.1 Introduction
6.2 Phase 1 Size of productive vocabulary (RQ1) and its relationship to academic performance (RQ2)
6.3 Phase 2 Corpus analysis: Functional distribution of selected verbs (RQ3) and their delexical use in MWUs (RQ4)
6.4 Phase 3 The relationships between vocabulary size, vocabulary depth and academic performance
6.5 Conclusion
Chapter 7 Conclusion 
7.1 Introduction
7.2 Review
7.3 Contributions of the study
7.4 Pedagogical implications
7.5 Recommendations
7.6 Limitations of the study and suggestions for further research
7.7 Conclusion
Lexical Levels and Formulaic Language: An Exploration of Undergraduate Students’ Vocabulary and Written Production of Delexical Multiword Units
