The respiratory system

According to Clark, Yallop and Fletcher (2007), most of the respiratory system is contained within the thorax, apart from the upper airways in the subglottal vocal tract. The thoracic cage is made up of 12 pairs of ribs, which are attached by muscle and connective tissue posteriorly to the vertebral column and anteriorly to the breastbone (sternum). On the vertical axis, the thoracic cage is limited by the shoulder blades (scapulae) on the posterior side and the collar bones (clavicles) on the anterior side, and by the diaphragm as its base (Clark et al., 2007:168).
Clark et al. (2007) further explain that located inside the thoracic cage are the lungs, spongy cone-shaped organs connected to the windpipe (trachea) by two bronchial tubes. Both lungs consist of smaller tubes (bronchioles) ending with tiny air sacs (alveoli). The lungs are connected to the thoracic cage by the pleural linkage, thus forming a single mechanical unit capable of changing the air volume in the lungs when the volume of the thoracic cage changes during the respiratory cycle. Besides performing the vital function of replenishing oxygen and removing carbon dioxide from blood, the lungs provide most of the airflow reservoir necessary for speech production (Clark et al., 2007:168–169).
Clark et al. (2007) describe how the volume of the thoracic cavity is enlarged in two ways during inhalation: firstly, the rib cage is lifted upwards and outwards, and secondly, the floor of the cavity is lowered. The exact balance between the two movements depends on posture, individual habit and respiratory demands. In general, the external intercostal muscles situated between the ribs are responsible for the control of the rib cage dynamics during inhalation. When they contract, the distance between adjacent ribs is shortened, raising the rib cage structure and increasing the thoracic cavity volume. When the diaphragm contracts, it lowers the floor of the thoracic cavity. This action is responsible for inspiratory thoracic cavity changes during quiet breathing. In running speech, the diaphragm controls the increase in thoracic cavity volume for inhalations. Any enlargement of the thoracic cavity results in an increase in lung volume (Clark et al., 2007:169–170).
Reducing thoracic volume, and consequently lung volume, increases the internal air pressure and results in air flowing out of the lungs in order to equalise the internal and external air pressures. The elastic recoil forces set up by the expansion and movement of muscles during inhalation are enough to achieve the necessary lung volume reduction towards relaxation pressure (Clark et al., 2007:170–171). Hixon (1987) proposes that these forces governing both expansion and reduction can be thought of as a spring-like force: if stretched and then released, the spring will rapidly recoil back to its original position (Hixon, 1987:31–33). At lung volumes above the resting level, this relaxation is the result of a passive exhalation generated by positive (i.e. above atmospheric) pressure towards resting level. When lung volumes are below resting level, this process is reversed, as lungs inflate from residual volume to the resting expiratory level due to the increasing magnitude of subatmospheric pressure (Hixon, 1987:25).
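As a rough illustration of the pressure-volume relationship described above, Boyle's law can be applied. This is a sketch under two simplifying assumptions that are not made in the source: the temperature is constant, and the airway is momentarily closed so that the amount of gas in the lungs is fixed. The symbols $P_1$, $V_1$ (pressure and volume before the reduction) and $P_2$, $V_2$ (after it) are introduced here for illustration only.

```latex
P_1 V_1 = P_2 V_2
\quad\Longrightarrow\quad
P_2 = P_1 \,\frac{V_1}{V_2}
```

For example, compressing the gas to 99% of its starting volume ($V_2 = 0.99\,V_1$) gives $P_2 = P_1 / 0.99 \approx 1.01\,P_1$, i.e. a pressure roughly 1% above the starting value. With an open airway, this excess pressure drives air out of the lungs until the internal and external pressures are equal again.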

Lung volume and capacity

According to Hixon (2006), volume is one of the key variables controlling respiration. The air displaced by the respiratory apparatus is called lung volume and it corresponds to the change of volume of the thoracic cage. There are four lung volumes, each exclusive of the other, and four lung capacities which are the sum of two or more lung volumes (for more detail see Fig. 1) (Hixon, 2006:45).
The tidal volume (TV) is the volume of air inhaled or exhaled during breathing, measured while resting (Hixon, 2006:45). The resting point of the rib cage and lungs is just at the expiratory-end level of the TV and is referred to as the resting expiratory level (REL) (Cleveland, 1998:47). Hixon (2006) reports that the maximum volume of air that can be inhaled from the tidal end-inspiratory level is the inspiratory reserve volume (IRV), and the maximum volume that can be exhaled from the tidal end-expiratory level is the expiratory reserve volume (ERV). At the end of a maximum exhalation, the volume of air left in the pulmonary apparatus is called the residual volume (RV) (Hixon, 2006:45–46).
The four lung capacities, as described by Hixon (2006), are as follows. Vital capacity (VC) is the maximum volume of air that can be exhaled after a maximum inhalation. Inspiratory capacity (IC) is the maximum volume of air inhalable from the resting level, functional residual capacity (FRC) denotes the volume of air in the pulmonary apparatus at the resting tidal end-expiratory level, and total lung capacity (TLC) refers to the maximum amount of air in the pulmonary apparatus achievable after a maximum inhalation (Hixon, 2006:46–47).
Although lung capacities vary greatly depending on age, gender, posture and body type, a typical total lung capacity for an adult male is 5–7 litres of air. In that case, the vital capacity ranges from 3.5 to 5 litres. During quiet breathing, the volume of air inhaled and exhaled is around 0.5 litres, and it normally makes up about 10%–15% of vital capacity (Clark et al., 2007:173).
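The relationship between the four volumes and the four capacities can be sketched in a few lines of Python. This is a minimal illustration, not part of the cited sources: the decomposition of each capacity follows the standard textbook sums, which are consistent with the descriptions above, and the numerical values are hypothetical, chosen only so that the results fall within the ranges quoted from Clark et al. (2007). The class and attribute names are likewise assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LungVolumes:
    """The four mutually exclusive lung volumes, in litres."""
    irv: float  # inspiratory reserve volume
    tv: float   # tidal volume
    erv: float  # expiratory reserve volume
    rv: float   # residual volume

    @property
    def vital_capacity(self) -> float:
        # VC: air that can be exhaled after a maximum inhalation
        return self.irv + self.tv + self.erv

    @property
    def inspiratory_capacity(self) -> float:
        # IC: maximum inhalation starting from the resting level
        return self.tv + self.irv

    @property
    def functional_residual_capacity(self) -> float:
        # FRC: air in the lungs at the resting tidal end-expiratory level
        return self.erv + self.rv

    @property
    def total_lung_capacity(self) -> float:
        # TLC: all four volumes together
        return self.irv + self.tv + self.erv + self.rv


# Hypothetical values (litres), not measurements from any cited study.
example = LungVolumes(irv=3.0, tv=0.5, erv=1.0, rv=1.5)

# Share of vital capacity used by a quiet-breathing tidal volume.
tidal_share = example.tv / example.vital_capacity

print(f"VC  = {example.vital_capacity:.1f} l")
print(f"IC  = {example.inspiratory_capacity:.1f} l")
print(f"FRC = {example.functional_residual_capacity:.1f} l")
print(f"TLC = {example.total_lung_capacity:.1f} l")
print(f"TV as share of VC = {tidal_share:.0%}")
```

With these assumed inputs, the total lung capacity falls inside the 5–7 litre range, the vital capacity inside the 3.5–5 litre range, and the tidal volume makes up between 10% and 15% of vital capacity.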

Speech breathing

Speech breathing commonly refers to the special manner of using the respiratory mechanisms to produce airflow for phonation. According to Euler (1982), speech production usually demands more effort than quiet breathing, and the system is optimised to provide the required airflow. During speech, the rate and volume of inhalations and the rate of exhalations are mostly governed by the speech controlling system. For example, this system takes into account requirements for phrasing, loudness and articulation. An important aspect distinguishing automatic or metabolic breathing from voluntary and controlled speech breathing is the brain structure responsible for these mechanisms (see e.g. McKay, Evans, Frackowiak & Corfield, 2003). The first is controlled primarily by the bulbopontine centres in the brainstem, whereas the second also involves cortical structures (Euler, 1982:95–97). The significance of this difference is that the cerebral cortex and other forebrain structures control the respiratory system on a higher organisational level. As other speech functions are also controlled by the cerebral and cerebellar regions of the brain (see e.g. Blank, Scott, Murphy, Warburton & Wise, 2002), this connects the organisation of speech breathing to other aspects of speech production. Metabolic breathing, on the other hand, is part of the optimal gas-exchange system for life purposes (Euler, 1982:97).
Both the rib cage and abdomen can be used to displace air during speech. Some speakers rely more strongly on the rib cage than on the abdomen, and some speakers show a relatively equal contribution from both (Hixon, 1982:82). In general, the type of articulation involved, overall vocal effort and the habits of the individual speaker have an effect on the aerodynamic demands of speech on the respiratory system (Clark et al., 2007:172–173).
According to Hixon (1987), speech breathing demands the necessary amount of alveolar pressure to ensure the steady production of utterances. Alveolar pressure is held constant during both sustained utterances and conversational speech, but maintaining it depends on several variables. For example, muscular pressure and relaxation pressure need to be balanced for alveolar pressure to stay constant. More specifically, at high lung volumes, a net inspiratory force is added to the relaxation pressure, but the magnitude of this force decreases as the amount of air in the lungs decreases during speech. At around half of the VC, the net force value is zero. Accordingly, when the volume of air in the lungs falls below that level, an increasing positive muscular pressure needs to be applied so that pressure is maintained while the lung volume steadily decreases (Hixon, 1987:27–35; 46–48).
Although speech breathing demands more effort than normal quiet breathing, Clark et al. (2007) observe that both operate in the lower midrange of vital capacity and the minimum respiratory volumes at the end of the exhalation phase tend to be around 30%–40% of VC. However, the tidal peak after inhaling can range from 45% of VC in quiet breathing to 80% of VC in loud speech (Clark et al., 2007:173). According to Hixon (1982), conversational speech normally takes place within approximately 40%–60% of VC, while most utterances begin from around twice the resting tidal volume and end just above FRC (Hixon, 1982:82). Hixon, Goldman and Mead (1973) have also compared read speech and conversations and found that regardless of the condition, in most utterances, speech was initiated at 50%–60% of VC and terminated at approximately 30%–50% of VC in the upright position (Hixon et al., 1973:93), while a typical speech breathing exhalation phase had an amplitude of approximately 10%–20% of VC, in some cases reaching 30% of VC (Hixon et al., 1973:95). Occasionally, speech ends even lower, in the expiratory reserve level, because speakers aim to finish utterances without inspiratory interruption (Hixon, 1987:44).
The differences between the quiet and speech breathing cycles have been described by Hixon (1987) as follows. While the quiet breathing cycle has been claimed to repeat 12 or more times per minute with exhalations lasting slightly longer than inhalations, the frequency of inhalations and exhalations in speech breathing is lower. The relative durations of the phases change because inhalations become considerably shorter than exhalations to minimise interruptions to the speech flow. Exhalations, on the other hand, become much longer due to higher resistances in the upper airway that prevent air from quickly flowing out. Therefore, the patterns of quiet breathing and speech breathing are very different: quiet breathing encompasses relatively equal phases of inhalation and exhalation in terms of duration, amplitude and velocity, whereas speech breathing is characterised by short inhalations and long exhalations (see Fig. 2 for example). As Hixon points out, the hallmark of the volume changes of conversational speech is in fact the irregularity of the breathing cycle (Hixon, 1987:45–46).
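The contrast between the two cycle types can be made concrete with a small calculation. The sketch below is only an illustration: the function names are invented for this example, and the phase durations are hypothetical values chosen to match the qualitative description above (a quiet cycle of about 12 per minute with a slightly longer exhalation, and a slower speech cycle with a short inhalation and a long exhalation). They are not measurements from Hixon (1987).

```python
def cycles_per_minute(inhale_s: float, exhale_s: float) -> float:
    """Breathing rate implied by one inhalation plus one exhalation (seconds)."""
    return 60.0 / (inhale_s + exhale_s)


def inhalation_share(inhale_s: float, exhale_s: float) -> float:
    """Fraction of the whole cycle spent inhaling."""
    return inhale_s / (inhale_s + exhale_s)


# Hypothetical phase durations in seconds: (inhalation, exhalation).
quiet_cycle = (2.0, 3.0)    # nearly symmetric cycle
speech_cycle = (0.5, 5.5)   # short inhalation, long exhalation

for label, (inhale, exhale) in (("quiet", quiet_cycle), ("speech", speech_cycle)):
    rate = cycles_per_minute(inhale, exhale)
    share = inhalation_share(inhale, exhale)
    print(f"{label}: {rate:.1f} cycles/min, inhalation = {share:.0%} of the cycle")
```

Under these assumed durations the speech cycle is both slower and far more asymmetric than the quiet cycle, which is the pattern the paragraph describes. Real conversational data would show considerable cycle-to-cycle irregularity that a single pair of durations cannot capture.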

Evidence of speech planning

It is generally agreed that language production is incremental (e.g. Kempen & Hoenkamp, 1987:203–204) – processing one level of information triggers activity on the next level of the production system. In a simplified manner, it is the piece-by-piece process that guides an idea or thought all the way to articulation and thus results in language and speech production, corresponding to specific communicative demands (Ferreira & Swets, 2002:57). However, it is not completely clear how incremental language production exactly is. Some (e.g. Wheeldon & Lahiri, 1997; Levelt, 1989) suggest it is radically incremental in that speakers start articulating when they know the first word of their utterance and during that, the planning of the next phonological word takes place. Others have found evidence that language planning can be more flexible and speakers tend to look for balance between planning and initiating speech quickly, indicating that we are capable of planning larger portions of the utterance beyond the immediate phonological word (Ferreira & Swets, 2002:76–80). That can be exemplified by the use of common expressions, collocations or idioms that tend to form single fixed units in the speakers’ processing memory as opposed to being simply strings of words (see e.g. Wray & Perkins, 2000).
Much of the system for speech planning and organisation has been studied by comparing the proportions and timing of speech and pausing. Pauses in speech have been claimed to be controlled by different variables, such as the rate of speaking, syntactic strength of boundaries, emphatic stress, sentence length, etc. According to Cruttenden (1986), pauses occur at either major constituent boundaries, before words of high lexical content, or after the first word in an intonation group, while the last two types are interpreted as hesitation pauses (Cruttenden, 1986:30–31). Butterworth (1975) studied how strongly pause locations correspond to phonemic clause boundaries and suggests that speakers plan ahead in terms of clauses and sentences but also have the ability to plan superordinate units consisting of multiple clauses and sentences that form a kind of semantic unit (Butterworth, 1975:84). Krivokapić (2010) reports, based on a study on prosodic phrase length effects on pause durations in read English, that speech is planned quite far ahead beyond the extent of the first phrase but the exact extent depends on the particular speaker (Krivokapić, 2010:3). In addition, Butterworth has pointed out that some pauses could have the communicative function of helping the listener segment the speech (Butterworth, 1975:84). Therefore, pauses not only have the purpose of providing time for cognitive processing to formulate speech, but help guide the listener’s interpretation as well.
Respiratory patterns occurring during read speech have been investigated extensively and a number of respiratory variables connected to speech planning have been looked at closely. Analyses of read texts have concluded that almost 100% of inhalations occur at syntactic boundaries marked with punctuation or conjunctions (Rochet-Capellan & Fuchs, 2013:1128). More precisely, it has been determined that in read speech, speakers always inhale between paragraphs, very likely inhale between sentences and sometimes also inhale in complex sentences before a comma or connectors (Conrad et al., 1983:224). Remarkably, even reading tasks where speakers are asked to produce only silent, inner speech result in a speech-like respiratory pattern, indicating that breathing is controlled by the cortical structures even when speech is not actually articulated (Conrad & Schönle, 1979:266). Fuchs et al. (2013) have experimented with read German, demonstrating a series of respiratory patterns connected to the syntactic content of the text. For example, they conclude that longer sentences are preceded by longer pauses and deeper inhalations, whereas syntactic complexity does not have the same effect and is instead compensated for by more frequent inhalations (Fuchs et al., 2013:38). Whalen and Kinsella-Shaw (1997) observe a similar effect of utterance length (in terms of durational and syllabic length) on inhalation duration, regardless of whether inhalations were measured acoustically or physiologically (Whalen & Kinsella-Shaw, 1997:145–146). Grosjean and Collins (1979) also investigated the syntactic nature of breathing pauses and set speech rate as a variable. They discovered that at slow and normal rates, speakers prefer to inhale at major constituent breaks, but when the rate is increased, there are fewer breathing pauses and they occur whenever speakers have the need for air. This is caused by the speakers’ wish to minimise the number of pauses and maximise the speech rate by inhaling very quickly and only when absolutely necessary (Grosjean & Collins, 1979:104–107). Hixon et al. (1973) report similar behaviour from an experiment where speakers were asked to read long sentences at lower lung volumes. They conclude that in those cases, the mechanical aspects of breathing become more important than speech phrasing. According to this study, low levels of air forced the speakers to inhale at unconventional locations in the reading passage to attend to the system’s demands in order to continue the utterance (Hixon et al., 1973:107).
By contrast, spontaneous speech is less predictable than reading – speakers do not have a prepared text to check before producing utterances. Differences in respiratory patterns between read and spontaneous speech have been explored by, for example, Winkworth et al. (1995), who investigated whether the associations known to exist between linguistic factors and lung volumes in read speech also hold in spontaneous speech. They conclude that the location of inhalations mostly (72%) follows clause structure and that longer breath groups have a higher initiation lung volume than short breath groups (Winkworth et al., 1995:132–139). The term “breath group” (Lieberman, 1967:2) itself was introduced to denote the boundaries of a prosodic pattern of simple declarative sentences in normal speech, mainly defined by the fact that they are uttered on a single exhalation. Findings on inhalation locations indicate that speech is mostly structured into breath groups taking into account not only respiratory demands, but also grammatical structure, and, as such, breath groups tend to consist of relatively complete clauses, phrases or sentences. Indeed, grammatically inappropriate inhalations have been found to account for around 2% of inhalations in reading and 13% in spontaneous speech in English (Wang et al., 2010:300). However, Winkworth et al. (1995) claim that there is large individual variation in these numbers and that in spontaneous speech, breath groups rather reflect units of meaning (Winkworth et al., 1995:139–140). Rochet-Capellan and Fuchs (2013) investigated spontaneous German in terms of how inhalation depth and duration were connected to the syntactic content of breath groups. Their results show that both the amplitude and duration of inhalation depend on the length of the following breath group, but also on whether it started with a matrix clause or some other clause type. If the breath group started with a matrix clause, the preceding inhalation was deeper. In addition, inhalations were found to be deeper when the breath groups contained at least one hesitation (Rochet-Capellan & Fuchs, 2013:1130–1131).

Turn-organisation and respiratory cues

Spontaneous speech usually occurs in the form of a conversation between two or more interlocutors. Conversation, as defined by Jaffe and Feldstein (1970), is a sequence of sounds and silences generated by two or more interacting speakers (Jaffe & Feldstein, 1970:19). A key feature of conversational rhythm is its oscillating pattern – one speaker at a time has the speaking turn and simultaneous speech is generally avoided (Jaffe & Feldstein, 1970:3). Therefore, conversational exchange requires precisely coordinated collaboration in the form of turn-taking movements between the partners – one speaker holds the floor while the other(s) listen (McFarland, 2001:128). Most studies on spontaneous speech involve two speakers who exchange turns. Although such two-party conversations are an undeniably useful source of data for studying mechanisms of turn exchange, the mechanisms used to organise turn-taking become more complicated in multiparty conversations. There are a number of intricate strategies for managing the conversational floor, even though speakers might not be aware of using these devices. Some strategies are used in order to claim the floor, others to keep the floor by avoiding interruptions, and a third type to hand over the speaking turn to another participant. The signals guiding conversational interpretation are said to be empirically detectable as interactional intentions need to be clearly identifiable during conversations (Gumperz, 1982:159). Listeners are known to turn their attention to stimuli which seem relevant for processing, and, as such, these must be communicated as relevant by the speaker (Wharton, 2009:40). Turn-taking events such as a speaker switch, where one person loses the possession of the floor and another gains it (Jaffe & Feldstein, 1970:19), can be achieved by using certain turn-organising cues or a combination of them.
The turns speakers take are usually defined as turn constructional units – various unit-types like sentential, clausal or lexical constructions speakers use to construct a turn (Sacks, Schegloff and Jefferson, 1974:702). Feldstein (1973), however, has used the term utterance when determining turn-taking events. According to him, an utterance is made up of sequences of pauses and vocalisations of one speaker that are bounded by switching pauses, where a speaker switch occurs, or vocalisations by other speakers at both ends (Feldstein, 1973:95). The borders of turn-constructional units or utterances can be intensified with the help of a number of prosodic markers, for example, intonation, stress, intensity, voice quality, and the rhythm of phrasing, or pausing and speaking. The extent and manner of use for each feature depends on what the speaker wishes to convey and is affected by the incremental nature of speech: the prosodic content of an utterance is created continuously by moment-to-moment decisions about if and how to continue (Couper-Kuhlen & Selting, 1996:29–30). For example, if the final accented syllable of an utterance in German is said on a mid-level pitch, it is perceived as incomplete, whereas a lowering pitch at the same location would demonstrate the ending of an utterance (Selting, 1995:206). This is connected to declination – a phenomenon whereby pitch lowers during an intonation-group due to a decline in transglottal pressure caused by using up the air in the lungs (Cruttenden, 1986:162). As such, falling intonation usually signals the end of a sentence or utterance in many of the world’s languages. Another connection here can be made to voice quality: modal voice at the end of an utterance tends to signal incompleteness while irregular phonation or creakiness has been reported to be a phrase-end or turn-end marker due to the very low fundamental frequency accompanying it (Slifka, 2007:232). For example, creaky voice as a turn-ending marker is used in Finnish (Ogden, 2001:139–140), Swedish (Carlson, Hirschberg & Swerts, 2005:330), English Received Pronunciation (Laver, 1994:196–197) and American English when combined with yeah (Grivičić & Nilep, 2004:8), but curiously not so clearly in Estonian (Aare, Lippus & Šimko, 2014:34). When a speaker’s voice becomes creaky, it can therefore be interpreted as a signal that they have exhausted the air in their lungs and need to inhale soon – providing a convenient location for taking over the conversational floor.
Cues important for the organisation of conversational dynamics can also be inferred from visual signals. The devices people use include gestures, eye-gaze and facial expressions, all of which contribute to the interpretation of speaker and/or listener intentions in conversation. The loss of visual-gestural cues, as happens in phone conversations, has been reported to alter the temporal patterns of interaction: pause durations and stretches of simultaneous speech become shorter (Jaffe & Feldstein, 1970:42). The purpose of spontaneous movements that accompany speech is claimed to constrain the inferential process by triggering a variety of emotion or attitudinal concepts, and altering the salience of linguistically possible alternatives (Wharton, 2009:130; 140–141). According to Bavelas et al. (1995), such interactive gestures can coordinate speaking turns. In fact, they propose that speakers can gesturally take the turn, give away the turn or indicate the floor is free for taking. They also suggest that the words and gestures in spontaneous dialogues are not separate channels but function as a whole (Bavelas, Chovil, Coates & Roe, 1995:397–398).
Respiratory activity during speech can be both visible and audible. Schegloff (1996) has suggested that an audible inhalation functions as a pre-beginning element in turn-taking and projects the onset of talk (Schegloff, 1996:92–93). It is also known that breath holding can function as a marker of turn incompleteness and exhaling can be a turn-yielding device (French & Local, 1983:33–35; Edlund, Heldner & Włodarczak, 2014:36). Before initiating speech, inhalations can be produced with strong frication or in a way that extensively stretches the rib cage to show the intention of speaking with body language. For example, pre-speech inhalatory noise has been found to be audible before short sentences, but single words are usually preceded by silence (Scobbie, Schaeffler & Mennen, 2011:1784). In addition, research has shown that breathing adapts to dialogue turns and there is some evidence for inter-personal coordination of breathing in turn-taking at a global level (Rochet-Capellan & Fuchs, 2014:5). Rochet-Capellan and Fuchs (2014) have also looked more closely at how breathing cycles might adapt to dialogue events. Their analysis shows that in order to hold a turn, speakers reduced inhalation durations compared to those coinciding with speaker change and thus preserved their turn. They also explain that breathing profiles are different depending on whether speakers are trying to claim the turn or if they are holding the turn: in general, respiratory cycles in turn-taking were longer than in turn-holding and therefore, the breathing pattern of turn-holding was more asymmetrical than for turn-taking. Furthermore, their data on spontaneous German speech demonstrated that turn-taking was more successful after a new inhalation, indicating that speakers coordinate their breathing with turn-taking (Rochet-Capellan & Fuchs, 2014:12–13).
McFarland (2001) investigated the possible influence of turn-taking on respiratory kinematics by comparing the mean inhalatory and exhalatory durations for three breathing cycles directly before and after the onset of speech in scripted dialogues. His results did not reveal a significant influence of upcoming speech on inhalation duration. However, after the onset of speech, the first inhalation was significantly longer than the following two, which in turn were comparable in duration (McFarland, 2001:136).
By contrast, some contributions to spontaneous conversations do not need to be planned. An example of this is the occurrence of backchannels (Yngve, 1970:568), which are short unplanned listener responses indicating that the listener is understanding and following the speaker (e.g. Heldner, Hjalmarsson & Edlund, 2013:137). Due to the relative unpredictability of backchannels and laughter, they are generally regarded as non-interruptive and are not considered attempts to claim the conversational floor (Heldner et al., 2013:137). As research has shown, vocalised backchannels tend to occur around the exhalation offset in the listener’s respiratory cycle, and are often located near the onset of the listener’s inhalation phase. Occasionally, backchannels also occur in the inhalation phase of the listener and could, in theory, be located almost anywhere in the listener’s breathing cycle (Aare, Włodarczak & Heldner, 2014:50–51). Recent research has also provided evidence that backchannels and other very short utterances, such as short answers to questions (Torreira, Bögels & Levinson, 2015:5), can be produced on residual air. After taking into account the respiratory needs for the upcoming utterance, speakers can choose not to inhale if they already have enough air in their lungs to be able to produce the entire utterance (Włodarczak & Heldner, 2015).
All of these markers are combined with the syntactic content of utterances and help determine the turn-taking intentions of participants. From a practical point of view, the results from studies on speech respiration provide information for applications such as human information processing in human-computer interaction. For example, Ishii et al. (2014) have investigated how to predict the next speaker in a multiparty conversation based on the participants’ respiratory patterns. They observed that the person who wants to hold the floor inhales more quickly and with a larger amplitude than the subsequent listeners, and that the new speaker takes a bigger breath than listeners in a turn-changing event (Ishii et al., 2014:23). Similarly, it is known that in a question-answer situation, short replies are mostly produced on residual breath whereas longer responses are preceded by an inhalation (Torreira et al., 2015:7), which can be audible and could help narrow down the possible sequential alternatives in conversations.

Table of contents:

1 Introduction
2 Background
2.1 The respiratory system
2.1.1 Lung volume and capacity
2.2 Speech breathing
2.3 Evidence of speech planning
2.3.1 Turn-organisation and respiratory cues
3 Aims and research questions
4 Method and data
4.1 Speakers
4.2 Experiment procedure
4.3 Annotation
4.4 Ethical considerations
5 Results
5.1 Data distribution
5.2 Lung volume levels in inhalations
5.2.1 Inhalation start level
5.2.2 Inhalation end level
5.2.3 Inhalation amplitude
6 Discussion
6.1 Discussion of results
6.1.1 Some theoretical implications
6.2 Discussion of methodology
6.2.1 Notes on experiment set-up
6.2.2 Notes on data analysis
6.3 Future research possibilities
7 Conclusions
References
Register
Figures
Tables
Appendix: Consent form
