Environmental sounds (ES)
Environmental sounds (ES) are usually defined as sounds generated by real events, such as a bell ringing, water splashing, a baby crying, a cow mooing or a cat hissing. Unlike speech sounds, which are arbitrarily assigned to their referents, each ES refers to a very small set of referents (Cummings, Čeponienė, Koyama, Saygin, Townsend & Dick, 2006, p.93). The comprehension of ES varies between individuals, but it draws on the same cognitive mechanisms and/or resources as auditory language comprehension (Saygin, 2003, p.929).
In a study conducted by Van Petten & Rheinfelder (1995), participants were presented with a number of ES, each followed by a spoken word that was related or unrelated to the sound, or by a non-word. According to the authors, their study was the first to investigate the processing of conceptual relationships between words and ES, and it demonstrated a context effect of ES on words (p.485, 489). In the first part of their study, they used animal sounds, non-speech human sounds such as coughing, musical instruments (e.g., piano), and ‘events’, as they called them (e.g., glass breaking, stream flowing), as their ES stimuli. Thus, an ES such as ‘meow’ would be followed by the related word ‘cat’, by an unrelated word or by a non-word. Following the first part, a second experiment was conducted with a non-invasive technique to investigate brain responses to both words and sounds in related and unrelated pairs, using the same stimuli as in the first part of the study (p.488). Participants were divided into two groups. The first group was presented with a sound, a word and a word fragment (e.g., ‘meow-cat-ca’) as stimuli, and the second group was presented with a word, a sound and a sound fragment. The fragments were taken from the beginning, the end or the middle of each stimulus and occurred both as a match and as a mismatch (p.491). Their results show faster decision times for words preceded by a related ES than for words preceded by an unrelated ES (p.489). The authors claim that conceptual relationships between spoken words and ES that are related in meaning influence the processing of both words and sounds (p.504).
Cummings et al. (2006) conducted a similar study to investigate the processing of ES with nouns and verbs in an audio-visual, cross-modal sound–picture match/mismatch paradigm (p.94). First, they presented participants with a visual stimulus, a picture on a screen; then, whilst looking at the picture, participants heard an auditory stimulus that either matched the picture or did not. After the sound offset, participants were asked to press a button to indicate whether the picture and the sound matched or mismatched (p.103). They found an interaction between sound type and word class, in which the response times for noun stimuli were faster for matched auditory words and ES. Furthermore, the data of Cummings et al. (2006) displayed earlier effects in brain responses for the recognition of ES than for words. In conclusion, they claimed that ES directly activate related semantic representations and that their processing may be faster than the processing of lexical stimuli due to their internal acoustic variability (e.g., bass & treble ratio, strength) (p.100).
Languages of interest
In the present study, the investigated languages are English and Turkish, and the investigation is limited to the features that are relevant for this study. The presented lexical stimuli are the interlingual homophones ‘car’ /ka:r/ from English and ‘kar’ /kar/ from Turkish. According to WordNet (2010), a lexical database of English, the concrete noun ‘car’ denotes a vehicle, usually propelled by an internal combustion engine. The word ‘kar’ in Turkish is also a concrete noun, meaning “snow”.
According to the World Atlas of Language Structures (WALS), English, a West Germanic language of the Indo-European language family, has 24 consonants in its inventory when sounds used only in borrowed words are excluded. Among these, the liquid /r/ is argued to have the most phonetic variants of all English consonants (McCully, 2009, p.44). Its variants include a uvular, an alveolar trill, a tap and a post-alveolar realization, and it is also described as a voiced post-alveolar approximant. Moreover, it is the only retroflex in American English (Yavas, 2011, p.7). Turkish, a member of the Turkic language family, is typologically distinct from English and has 29 letters in its alphabet. According to the University of California Los Angeles Phonological Segment Inventory Database (UPSID) (2019), the total number of segments in Turkish is 33.
Being a phonemic language, Turkish has a closer correspondence between letters and sounds than English; however, the pronunciation of its sounds may vary depending on the vowel that precedes or follows them (Göksel & Kerslake, 2011, p.2). One feature relevant to the present study concerns plosives. For example, the voiceless velar plosive /k/ is aspirated in onset position, word-initially before a vowel. A similar aspiration pattern is also common in American English (Yavas, 2011, p.58). Furthermore, the articulation of /k/ varies depending on the vowel it precedes: if it is followed by a back vowel, as in the word [kɑɾ] “snow”, the tongue moves backward because of the back vowel /ɑ/ (Göksel & Kerslake, 2011, p.3). According to Yavas (2011), American English has a similar feature for velars (p.62).
One of the most problematic sounds is the consonant ‘r’. Göksel & Kerslake (2011) describe the production of the Turkish /r/ as follows: “[b]y touching the tip of the tongue on the medial part of the palate” (p.2), and Yavas (2011) argues that a significant difference can be found in liquids such as the alveolar tap /ɾ/ in final position. According to him, even though the distributions are different, they create mismatches (Yavas, 2011, p.191).
Vowel sounds are classified according to their height, backness and lip-rounding (Maddieson, 1984, p.123). Thus, the tongue is the primary articulator, while movements of the lips and the jaw are also necessary for their articulation. The vowel in both the English and the Turkish word is the unrounded back vowel [ɑ]. In the presented stimuli, the Turkish word ‘kar’ was realized with [ɐ]; however, this does not change the meaning of the word.
Event-Related Potentials (ERPs)
The non-invasive neurophysiological method of recording electrical activity from the scalp is known as electroencephalography (EEG). EEG recordings show neuronal activity related to language processing at millisecond resolution. However, it is difficult to distinguish the investigated processes from other ongoing brain activity that is unrelated to the investigated phenomena (Luck, 2014, p.4). Event-related potentials, on the other hand, make it possible to isolate sensory, cognitive and motor responses from the overall EEG data. Luck (2014) describes ERPs as electrical potentials that are time-locked to specific events (p.4). They originate mostly as post-synaptic potentials (PSPs), which occur within a single neuron (Luck, 2014, p.12-13). These PSPs create a flow of tiny electrical current, a dipole, and ERPs can only be measured from the scalp when the dipoles of many individual neurons sum up in response to a specific event (Luck, 2014, p.13).
Some ERP components are known to be language related, such as the N400, which is sensitive to semantic violations (Kutas & Federmeier, 2000, p.463), and the P600, which is sensitive to syntactic violations such as gender agreement (Urbach & Kutas, 2018, p.14). In the present study, the focus will be on the Mismatch Negativity (MMN) component of ERPs. Previous studies show evidence for MMN being sensitive to all linguistic processes (Shtyrov & Pulvermüller, 2007, p.179; Zora et al., 2015; Zora et al., 2016a; Zora et al., 2016b).
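The idea of time-locking can be illustrated with a minimal sketch. It is not the analysis pipeline of the present study: the function name `average_epochs`, the synthetic single-channel signal and the assumed response shape are all hypothetical and serve only to show how averaging segments aligned to event onsets lets unrelated activity cancel out.

```python
import random

def average_epochs(eeg, event_samples, pre, post):
    """Average EEG segments time-locked to events.

    eeg: list of voltage samples from one channel (hypothetical data).
    event_samples: sample indices at which the events occurred.
    pre, post: number of samples to keep before and after each event.
    """
    # Keep only events whose full segment lies inside the recording.
    epochs = [eeg[s - pre:s + post] for s in event_samples
              if s - pre >= 0 and s + post <= len(eeg)]
    if not epochs:
        raise ValueError("no complete epochs available")
    n = len(epochs)
    # Point-by-point mean across epochs: activity that is not time-locked
    # to the event tends to cancel, the event-related part remains.
    return [sum(ep[i] for ep in epochs) / n for i in range(pre + post)]

# Synthetic demonstration: a fixed response buried in random noise.
random.seed(0)
response = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]   # assumed "true" ERP shape
events = list(range(10, 4010, 20))           # 200 event onsets
eeg = [random.gauss(0.0, 2.0) for _ in range(4100)]
for s in events:
    for i, v in enumerate(response):
        eeg[s + i] += v

erp = average_epochs(eeg, events, pre=0, post=len(response))
peak_index = max(range(len(erp)), key=lambda i: erp[i])
```

With 200 events, the noise in the average is much smaller than in any single segment, so the assumed response shape becomes visible in `erp` although it is hidden in the raw signal.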
The Mismatch Negativity (MMN)
ERP investigations focusing on the brain mechanisms involved in auditory processing are widely used in cognitive and psycholinguistic studies. One of the robust ERP components in auditory processing is the so-called Mismatch Negativity (MMN). This widely accepted event-related potential, which is elicited by any discriminable auditory change in stimulation, is also a well-known index of long-term memory traces in the human brain (Shtyrov & Hauk, 2004, p.1089). MMN is mostly investigated with the so-called oddball paradigm, which was first introduced by Squires, Squires & Hillyard (1975). It is an experimental design in which a frequently repeated stimulus, the ‘standard’, is interrupted by a rarely presented stimulus. These rare stimuli are called deviants or oddballs. The standard stimulus forms a representation, a memory trace of the heard sound that lasts for a few seconds, and this short-term memory trace is violated by the presentation of the deviant stimulus, resulting in an MMN response (Näätänen et al., 2007, p.2545-6, 2548). The MMN is obtained by calculating the difference between the two responses, that is, by subtracting the response to the standards from the response to the deviants. The deviations can be in basic acoustic features such as pitch, intensity, duration and frequency with respect to the frequently presented stimuli (Shtyrov & Pulvermüller, 2007, p.176).
MMN has a frontocentral scalp distribution (Näätänen et al., 2007, p.2549). It usually peaks 150-250 ms after change onset. The latency of the peak is sensitive to the magnitude of the change in the stimulus, and the response can emerge as early as 100 ms (Näätänen et al., 2007, p.2545).
The pre-attentive nature of the MMN makes it widely applicable, since no task is required to elicit it. That is, the MMN is elicited regardless of the participant’s attention and is considered an automatic component (Shtyrov & Pulvermüller, 2007, p.178). In fact, it has been argued that its amplitude becomes larger when attention is directed away from the intended stimuli. To draw attention away from the intended stimuli, simultaneous tasks are usually applied (e.g., watching a silent movie, reading a book) (Näätänen, 1984, p.286; Tamminen, Peltola, Kujala, & Näätänen, 2015, p.23).
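The oddball design and the deviant-minus-standard subtraction can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: the function names, the deviant probability, the rule that at least two standards separate deviants, and the ERP values are all hypothetical.

```python
import random

def oddball_sequence(n_trials, deviant_prob, min_standards=2, seed=None):
    """Return a list of 'standard'/'deviant' labels.

    Assumptions (not from the study): deviants are drawn at random with
    probability `deviant_prob`, and at least `min_standards` standards
    open the sequence and separate any two deviants.
    """
    rng = random.Random(seed)
    sequence = []
    since_deviant = 0  # standards presented since the last deviant
    for _ in range(n_trials):
        if since_deviant >= min_standards and rng.random() < deviant_prob:
            sequence.append("deviant")
            since_deviant = 0
        else:
            sequence.append("standard")
            since_deviant += 1
    return sequence

def difference_wave(deviant_erp, standard_erp):
    """MMN difference wave: deviant ERP minus standard ERP, per sample."""
    if len(deviant_erp) != len(standard_erp):
        raise ValueError("ERPs must have the same number of samples")
    return [d - s for d, s in zip(deviant_erp, standard_erp)]

seq = oddball_sequence(n_trials=500, deviant_prob=0.15, seed=1)

# Hypothetical averaged ERPs in microvolts, one value per time sample.
standard_erp = [0.0, -1.0, -2.0, -1.0, 0.5]
deviant_erp = [0.0, -1.5, -4.0, -2.5, 0.0]
mmn = difference_wave(deviant_erp, standard_erp)
```

Because deviants are blocked until enough standards have been presented, the realized proportion of deviants in `seq` is somewhat lower than `deviant_prob`. A negative value in `mmn` means that the deviant response was more negative than the standard response at that sample.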
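A latency window of this kind is typically used to locate the MMN peak in a difference wave. The sketch below shows one simple way to do so; the function name `peak_negativity`, the sampling rate and the waveform values are hypothetical, and the 100-250 ms window is only an assumption based on the latencies mentioned above.

```python
def peak_negativity(wave, sfreq, tmin, window):
    """Find the most negative point of `wave` inside a time window.

    wave: difference-wave samples (e.g. in microvolts).
    sfreq: sampling rate in Hz.
    tmin: time in seconds of the first sample relative to change onset.
    window: (start, end) of the search window in seconds.
    Returns (latency_in_seconds, amplitude).
    """
    start, end = window
    candidates = [(i, v) for i, v in enumerate(wave)
                  if start <= tmin + i / sfreq <= end]
    if not candidates:
        raise ValueError("window lies outside the epoch")
    i, v = min(candidates, key=lambda pair: pair[1])
    return tmin + i / sfreq, v

# Hypothetical difference wave sampled at 100 Hz, starting at change onset.
wave = [0.0] * 30
wave[5] = -5.0    # 50 ms: large deflection outside the window, ignored
wave[18] = -2.0   # 180 ms: the most negative point inside the window
latency, amplitude = peak_negativity(wave, sfreq=100.0, tmin=0.0,
                                     window=(0.100, 0.250))
```

Restricting the search to a window keeps earlier deflections, such as the one placed at 50 ms in this example, from being mistaken for the MMN peak.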
Previous MMN studies
MMN is sensitive to all linguistic processes (Shtyrov & Pulvermüller, 2007, p.179; Zora et al., 2015; Zora et al., 2016a; Zora et al., 2016b). In a study conducted with groups of native speakers of Finnish and Estonian, Näätänen et al. (1997) showed evidence for native-language memory traces. Their interpretation was based on enhanced MMN responses to language-specific phoneme representations (Näätänen et al., 1997, p.433). In previous studies comparing pseudowords to meaningful words, a word-related MMN enhancement was found (Shtyrov & Pulvermüller, 2002, p.525). This enhancement is argued to reflect the activation of memory traces of strongly populated neurons for the meaningful words (Shtyrov & Hauk, 2004, p.1085).
MMN’s sensitivity is not specific to one’s native language. Perception, and thus discrimination, of auditory input has also been investigated among L2 listeners. After a couple of days of training, native speakers of Finnish showed evolved memory traces, which were indexed by behavioral measures as well as by the MMN (Tamminen et al., 2015, p.23). In an MMN study conducted with Finnish monolinguals and Swedish-Finnish bilinguals, Tamminen, Peltola, Toivonen, Kujala & Näätänen (2013) compared phonological processing between monolinguals and bilinguals. Their results showed smaller amplitudes and longer MMN latencies in bilinguals than in monolinguals, which is argued to be the result of an “[e]xtensive intertwined phonological system where both languages are active all the time” (p.12).
MMN has been used in a magnetoencephalographic (MEG) study by Menning, Zwitserlood, Schöning, Hihn, Bölte, Dobel, Mathiak & Lütkenhöner (2005) to investigate the early processing of semantics and syntax. The study was divided into two parts. In the first part, a phonemic contrast was presented in the sentence ‘You have just heard lawn/giants/roses’, in which the critical words are /ra:sen/, /ri:sen/ and /ro:sen/ in the original German (p.78). In the second part, the sentence was ‘The woman fertilizes the lawn/giants/roses in May’, in which the word ‘riesen’ (giants) created a semantic error, because one cannot fertilize giants, and the word ‘rosen’ (roses) created a morphosyntactic error due to the rules of German grammar (p.78). The MEG data showed a larger MMN to the semantic error than to the syntactic one, which the authors interpreted as showing, first and foremost, that semantic and syntactic processes were different and that “MMN reflected detection of meaningfulness” (Menning et al., 2005, p.79-80).
In the present study, MMN’s sensitivity to meaning has been taken into account to investigate brain responses by presenting the participants with different types of auditory stimuli that share a semantic relation and thus denote the same and/or a similar referent.
Other ERP components of interest
In the previously mentioned ERP study of Cummings et al. (2006) (section 2.4), all of the presented stimuli elicited an N1-P2 complex. The difference was that the semantically matching pairs of ES and picture stimuli elicited an N1-P2 complex followed by a positive wave, whereas the mismatching pairs elicited an N1-P2 complex followed by a negative wave (p.95). The N1 component of this complex is known as an auditory evoked potential that is usually seen at a latency of around 100 ms, followed by a positive P2 deflection at about 175 ms (Hyde, 1997, p.282). A general view in the literature is that, unlike the MMN, the N1-P2 amplitude increases for attended stimuli (Hyde, 1997, p.289). The auditory N1 response is underlain by a combination of processes and does not reflect one single event (Näätänen & Picton, 1987, p.386). It also differs from the MMN: N1 is sensitive to an individual stimulus, in contrast to MMN, which is sensitive to the relations between the present and the preceding stimuli (Näätänen & Picton, 1987, p.389). Unlike the memory-dependent MMN, N1 will be small for stimuli with an already existing memory trace; if there is no trace, there will be no MMN and the N1 amplitude will be large (Näätänen, 2019, p.42).
The N200 complex consists of two components, N2a (MMN) and N2b (Näätänen, Simpson & Loveless, 1982, p.87). N2b is the endogenous component, which, unlike MMN, is elicited when the participants attend to a stimulus (Näätänen et al., 1982, p.53). Attention is not the only difference between N2a (MMN) and N2b. For instance, N2b peaks at around 200-300 ms, which is later than the N2a (MMN) component (Näätänen, 1984, p.291), and it has a posterior distribution, in contrast to MMN, which is known to be elicited in the fronto-central regions (Näätänen et al., 2007, p.2254).
Furthermore, N2 has been argued to be an index of inhibition during bilingual language production, but not during comprehension (Misra, Guo, Bobb, & Kroll, 2012, p.234). However, further investigation of inhibition in bilingual language processing has been recommended (Chen, Bobb, Hoshino, & Marian, 2017, p.52). In their study, Misra et al. (2012) investigated whether late but relatively proficient Chinese–English bilinguals inhibit the L1 in order to name pictures in the L2 (p.233). The first group named pictures first in their L1 and then in their L2. The second group did the opposite and named the pictures first in the L2 and then in the L1 (p.228). Misra et al. (2012) expected facilitation due to priming effects if no inhibition occurred; otherwise, priming was expected to be reduced or eliminated due to an inhibition effect (p.27). Their results displayed two negative activations, one at approximately 100 ms at fronto-central regions, which was followed by a positive peak, and a second at 150 ms, again followed by a positive peak at around 200 ms. The ERPs also displayed more positivity for the L1 when it was followed by the L2 and more negativity for the L2 when it was followed by the L1. They interpreted their results as showing that priming occurred only for the L2 and that the N2 was enhanced when bilinguals switched languages (p.234).
Purpose and research questions
The goal of this study is to investigate the neuronal activity elicited when two semantically related auditory inputs, ES and lexical stimuli, are presented in an oddball paradigm, focusing on the negative event-related potential Mismatch Negativity (MMN). The assumption is that the stored language representations and the established connections in semantic memory for those representations would lead to distinct brain responses when comparing bilinguals to monolinguals.
Research questions and hypotheses
Question 1: Would an MMN response be elicited in a design where the two stimuli are distinct (i.e., a speech sound in contrast to an environmental sound)?
Hypothesis 1: An MMN response would be elicited in all groups, since MMN is a change-dependent component that is elicited by any discriminable change in the auditory stimulation (Näätänen, 1978). In the present study, the presented auditory stimuli have discriminable acoustic features.
Question 2: How would semantically related ES and word stimuli affect brain responses of monolinguals and bilinguals?
Hypothesis 2: The greatest MMN response is expected to be among the native speakers of English and Turkish for the conditions in which ES and word stimuli match in their semantic representations.
Question 3: Can bilinguals access their L2 when an auditory word stimulus is presented in their L1?
Hypothesis 3: The bilingual participants of this study will use sub-phonemic cues to activate their L2 English.
The participants were a native speaker of American English (n=1, female) and late, unbalanced Turkish–English bilinguals (n=3, males). Their age ranged between 29 and 48 (M=38.5; SD=7.43). All of the participants signed an informed consent form prior to the experiment and filled out the Edinburgh Inventory (Oldfield, 1971). They were all right-handed; the laterality index for all four participants was 100% for the 15 items selected and adapted from the Edinburgh Inventory. Participants reported no neurological disorders or language-related deficiencies. The Turkish native speakers also completed an online English vocabulary test, LexTALE (Lemhöfer & Broersma, 2012), which is designed specifically for cognitive studies, in order to be able to match the participants’ proficiency levels within the group. In the present study, the bilinguals’ test results were 80%, 48.75% and 68.75% (SD=12.9). Participants were recruited via social media and among available acquaintances, who volunteered for the experiment without any compensation.
The experiment contained four different stimuli, two of which were the ES ‘engine’ and ‘rain’. The other two were lexical stimuli: the Turkish word ‘kar’ [kʰɐɾ], which means “snow”, and the English word ‘car’ [kʰɑ:ɹ]. The ES were selected from copyright-free material available online, choosing the sounds that were most recognizable as such within a duration of a couple of hundred milliseconds. The lexical stimuli were recorded by a female native American English speaker in a sound-proof recording studio in the Phonetics Lab of the Department of Linguistics, Stockholm University. REAPER digital audio workstation software was used for the recordings. The Brüel & Kjær 1/2′′ Free-field Microphones (Type 4189) with preamplifiers (Type 2669) were connected to a Brüel & Kjær NEXUS Conditioning Amplifier (Type 2692), which in turn was connected to a Motu 8M audio interface.
The presented lexical stimuli were analyzed with the speech analysis software Praat (Boersma & Weenink, 2019). Their spectrogram and waveform images can be seen in Figure 1.
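The spread of the three LexTALE scores can be recomputed from the values given above. The short check below uses only the Python standard library; the variable names are illustrative. It suggests that the reported SD of about 12.9 corresponds to the population formula (dividing by N), whereas the sample formula (dividing by N - 1) gives about 15.8.

```python
import statistics

# LexTALE scores (percent) reported for the three bilingual participants.
scores = [80.0, 48.75, 68.75]

mean_score = statistics.mean(scores)
sd_population = statistics.pstdev(scores)  # divides by N
sd_sample = statistics.stdev(scores)       # divides by N - 1
```

With only three participants, the two formulas differ noticeably, so it is worth stating which one a reported SD refers to.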
Table of contents:
2.2 Interlingual homophones
2.3 Semantic memory
2.4 Environmental sounds (ES)
2.5 Languages of interest
2.6 Event-Related Potentials (ERPs)
2.7 The Mismatch Negativity (MMN)
2.7.1 Previous MMN studies
2.8 Other ERP components of interest
2.8.1 N1 – P2 complex
2.8.2 N200 complex
3 Purpose and research questions
3.1 Research questions and hypotheses
4 Method and material
4.3 Setting up the EEG
4.5 EEG Data Analysis
5.1 ERP data for the L1 English speaker
5.2 ERP data for the Turkish-English bilinguals
5.3 Topographical distribution
6.1 Result Discussion
6.2 Method Discussion
6.3 Further Research