Schools as sites for the playing out of gender

Chapter 3: Study 1

This study tested the hypothesis that gender stereotype threat causes performance decrement for adolescent males who sing in secondary school choirs and examined whether strength of salience may alter awareness of stereotype threat.


A stratified, purposively selected sample of Auckland secondary school choral students (N = 737) was drawn from schools with a strong choral background, resulting in a final number of 16 choirs (comprising 12 groups of male choristers and 12 groups of female choristers). The choirs reflected a mixture of deciles (socioeconomic categories), ethnic and cultural backgrounds, and gender mixes, and came from private, independent, religious, state secular, urban and rural schools (see Table 1).


Permission to undertake the study was obtained first from the University of Auckland Human Participants Ethics Committee (reference 2007/446), and then from the schools and the participants (examples of Participant Information Sheets and Consent Forms are included as Appendices A 1 and A 2). The study took the form of a quasi-experiment with a 2 x 2 factorial design. The factors were the gender of the participants (male and female) and the performance condition, with performance quality described after each of two performances: (a) a performance to the whole school (task condition, where stereotype threat was made salient to the males, but not the females) and (b) a performance to an audience of arts peers (fun condition, where stereotype threat was salient for neither group).
Before the performance to the whole school (task condition), the male choristers (the intervention group) were told by their choir directors that the researcher would like to see whether males have a different experience of singing in choirs from that of females, and that they should sing to the best of their ability. This comment was designed to trigger the salience of gender stereotype threat. The female choristers (the control group) also performed to the whole school (task condition). They were also told by their choir directors that they should sing to the best of their ability, but with the confidence that they could enjoy the support traditionally bestowed upon them as young women who sing in the choir. There was no triggering of salient gender stereotype threat in this case, nor was any expected to be present.
In both the task and fun conditions, each choir entered the stage, performed two items, and was awarded a mark for its group performance quality. The choir then left the stage, and the individual choristers filled out a self-report questionnaire in a quiet room immediately after each performance. It should be noted that for three of the choirs (Choirs 15, 16 and 17), the fun condition preceded the task condition, whereas for all the other choirs, the task condition preceded the fun condition. The results did not suggest any effect of time as a confounding variable. In addition, the males in Choir 2 could not be marked in the fun condition, as they had all left the choir by the time that performance took place. Thus, data from Choir 2 were not included in any final analyses, although this choir and the incident are discussed as a case study.
Data collection occurred between March and November of 2008 and all statistical analyses were carried out using SPSS v 16.0 (2007).

Marking schedule for performance quality.

Marks of performance quality (MPQ¹ for the task condition, and MPQ² for the fun condition) were awarded by the researcher (Rater 1) for each group of males and females in each of the two performance conditions. An independent rater (Rater 2) marked eight of thirty-four performances, involving Choirs 1, 2, 6, and 10. The raters derived a total mark out of 100 from nine categories on a marking schedule (see Table 2) based on that developed by the New Zealand Choral Federation Inc., Te Kotahitanga Manu Reo O Aotearoa (2007b), to assess secondary school students. The greatest weighting was apportioned to those categories which could be used to indicate technical measures reflecting anxiety: for example, posture, breath management, intonation and accuracy. Where both raters marked a performance, a mean mark was recorded as the final score for that group (see Table 3). Cohen’s kappa was computed to check the reliability of the observer ratings, and the kappa of 0.98 indicated that there was strong agreement between the two raters.
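The inter-rater check described above can be illustrated with a minimal sketch of Cohen’s kappa. It assumes that each rater’s marks have been grouped into categorical bands; the source does not state how the marks were categorised before kappa was computed, so the bands, the function name `cohen_kappa`, and the ratings below are hypothetical and are not the study’s data.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items categorically."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("Both raters must rate the same non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    counts1, counts2 = Counter(rater1), Counter(rater2)
    p_e = sum(counts1[label] * counts2[label] for label in counts1) / (n * n)
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Made-up performance bands for eight performances (NOT the study's data).
rater_1 = ["high", "high", "mid", "mid", "low", "low", "mid", "high"]
rater_2 = ["high", "high", "mid", "mid", "low", "mid", "mid", "high"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it is usually preferred to simple percentage agreement when two observers score the same performances.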

Self-report questionnaires.

All participants filled out identical self-report questionnaires comprising 32 questions after their performances in both the task condition (SRQ¹) and the fun condition (SRQ²). Responses were made using a five-point Likert scale (i.e., 1=Strongly Agree, 2=Agree, 3=Not sure, 4=Disagree, 5=Strongly Disagree). The content, style and format for the questionnaire were drawn from several sources: the self-report measure used by Steele and Aronson (1995), a version of the Spielberger State Anxiety Inventory (Geen, 1991; Sarason, 1980), the Sarason Test Anxiety Scale (Geen, 1985; Sarason, 1978), and the FRIEDBEN Test Anxiety Scale (the FTA) (Friedman & Bendas-Jacob, 1997). From these, self-measures of vocal competence, self-worth and cognitive interference were derived in order to measure the effect on performance quality of enhanced immediate situational threat derived from stereotype threat. Participants also evaluated their own performance.
Steele and Aronson (1995) specified processes which could affect performance in the context of stereotype threat: arousal (stimulated activity which reduces the range of cues participants are then able to use [Easterbrook, 1959]); diverting attention to task-irrelevant worries (Sarason, 1972; Wine, 1971), causing an interfering self-consciousness (Baumeister, 1984); over-cautiousness (Geen, 1985); and low performance expectations resulting in withdrawal of effort (Bandura, 1977).
Friedman and Bendas-Jacob (1997) identified three subscales in their 23-item scale (the FTA): “(a) Social Derogation (worries about being socially belittled and deprecated by significant others following failure on a test), (b) Cognitive Obstruction (poor concentration, failure to recall, difficulties in effective problem solving, before or during a test), and (c) Tenseness (bodily and emotional discomfort).” The FTA was drawn on for the current study as it offered the potential to identify particular worries as anxiety-causing, was specifically tailored to adolescents, and acknowledged that a test situation could be perceived to threaten “students’ social standing in the eyes of significant others” (Friedman & Bendas-Jacob, 1997, p. 1037).
Spielberger and Sarason (1989) considered that measures of self-concept and self-awareness had a valid place in anxiety scales. This view was rooted in the idea that social anxiety (the worry that the need to make a good impression will not be met [Schlenker & Leary, 1982]) is embedded in the process of self-presentation (Geen, 1991; Geen & Gange, 1977). The Sarason Test Anxiety Scale (TAS) (Sarason, 1978) and the Test Anxiety Inventory (TAI) (Spielberger, 1980) included measures relating to self-concept and awareness of the self. Friedman and Bendas-Jacob (1997) also acknowledged these components, linking worry to self-concept. Thus, questions based on such measures were included in the self-report questionnaire for this study.
The researcher expected to find that males would experience more anxiety and greater performance decrement in the task condition than in the fun condition, and would recognise the presence of gender stereotype threat only in the task condition. It was expected that females would experience less anxiety in the task condition than males, show no significant difference in performance quality between the two conditions, and recognise no presence of gender stereotype threat in either condition.

Marks of performance quality.

A paired samples t-test was conducted to measure differences in observed performance quality (MPQ¹ and MPQ²) for the 16 choirs that completed performances in both conditions. The scores for each group are noted in Table 3. The males were found to have statistically significantly lower means in the task condition
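As an illustration of the paired comparison described above, the following minimal sketch computes the paired samples t statistic from the task and fun marks of the same groups. The study itself used SPSS; the function name `paired_t` and the marks below are hypothetical and are not the study’s data.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired samples t-test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("Need at least two matched pairs of scores")
    # Each group contributes one difference score: first minus second.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    # t = mean difference divided by its standard error.
    t = statistics.mean(diffs) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up marks out of 100 for four groups (NOT the study's data).
task_marks = [62, 70, 58, 66]
fun_marks = [68, 74, 65, 69]
t_value, df = paired_t(task_marks, fun_marks)
print(f"t({df}) = {t_value:.2f}")
```

A negative t here means the marks were lower in the first-listed (task) condition. Converting t to a p-value requires the t distribution with the stated degrees of freedom, which a statistics package such as SPSS reports directly.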

Self-report questionnaire data.

After initial cleaning of the data, a maximum-likelihood factor analysis with Oblimin rotation and Kaiser normalisation was used to estimate the factor loadings of items derived from the self-report questionnaire. Every negatively worded item was reverse-scored so that Strongly Agree became 5, Agree became 4, Not Sure remained as 3, Disagree became 2, and Strongly Disagree became 1, as this aided a clearer visual representation of the data. The number of factors was determined by ensuring that there were sufficient items loaded on each factor (at least 5), that all factors could be meaningfully interpreted, and that each factor explained more variance than a single variable could (i.e., eigenvalue > 1).
Three items did not clearly relate to any factor. Two of these items, designed as a manipulation check (“The performance was to see if boys experience singing in choirs differently to girls” and “The performance was to celebrate singing”), were not used any further. The remaining item (“I felt proud to be singing”) was considered ambiguous in meaning and was therefore deleted from further analyses.
The final factor analysis of the 32 items revealed four clear factors: Performance Quality, Situational Awareness, Technical Aspects and Social Awareness (see Table 4).
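The reverse-scoring step described in this section can be sketched as follows, assuming the five-point coding given for the questionnaire (1 = Strongly Agree to 5 = Strongly Disagree). The item identifiers and the choice of which item is negatively worded are hypothetical, since the source does not list them.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale used in the questionnaire


def reverse_score(response):
    """Mirror a response around the scale midpoint (1<->5, 2<->4, 3 stays 3)."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"Response {response} is outside the scale")
    return SCALE_MIN + SCALE_MAX - response


def recode(responses, negatively_worded):
    """Reverse-score only the items flagged as negatively worded."""
    return {
        item: reverse_score(score) if item in negatively_worded else score
        for item, score in responses.items()
    }


# Hypothetical item identifiers and responses (NOT the study's data).
raw = {"q1": 1, "q2": 4, "q3": 3}
recoded = recode(raw, negatively_worded={"q1"})
print(recoded)
```

Recoding in this way puts all items on a common direction before factor loadings and factor means are interpreted.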

A case study of three anomalous choirs.

For the male participants overall, the performance decrement recorded in the task condition did not appear to be accompanied by the expected raised levels of anxiety pertaining to stereotype threat. However, this was not the case for the males in two specific choirs. Although the peer audience had not been included as an explicitly active trigger of salience of stereotype threat, two unplanned incidents offered additional insight into the role of the peer audience in this regard. During Choir 3’s task condition performance, the audience was observed by Rater 1 to chat loudly, jeer at the choir, and laugh. Staff members who attempted to restrain the audience were continually ignored. Salience of threat, then, was not only triggered by the choir director before the performance, but also reinforced overtly by the peer audience during it. There was an increase in observed performance quality for both genders in the fun condition for Choir 3 (see Table 3), but markedly so for the males, who exhibited the greatest increase of all the male groups between the task and fun conditions. There was no dissonance between observed and reported data for this choir. In line with expected results, the males exhibited increased means for Situational Awareness and Social Awareness in the task condition (see Figures 1 and 2), suggesting an increased level of anxiety and concern about audience thoughts.


The paired samples t-test conducted to determine differences in means for observed performance quality revealed results in line with those expected: a significantly lower mean in the task condition was found for the males, but no significant difference between the means for the two conditions was found for the females. However, the two-way MANOVA conducted across the four factors found no significant effect for gender and condition. Thus, the observed results differed from those self-reported by the participants, which was not what the researcher expected.
This raised the question of the cause of this difference. Previous research had indicated that there could be dissonance between observed and reported experiences of stereotype threat (Bosson, Haymovitz, & Pinel, 2004), and suggested that the implicit experience of stereotype threat could explain this dissonance. The findings of the first study for this thesis appeared to support this assertion. Despite this, such research did not remove doubt about the reliability of the use of self-report questionnaires, per se, in measuring the effects of stereotype threat, as self-awareness seemed to be impaired.
However, while there was dissonance between the observed and self-reported results overall, this was not the case for the males in two choirs. For these choirs threat from the audience was noted to be blatantly salient in the task condition and, importantly, made so by their peers. This may have indicated that for the choirs where threat was not made blatantly salient, level of salience of threat affected the accuracy with which participants’ threat was self-reported. Thus, an insufficient level of salience of threat, rather than reliability of self-reports, may have contributed to dissonance between the majority of observed and reported performances in this study.
Importantly, for Choirs 2 and 3 it was the out-group peer audiences, rather than the choir directors, who had been more successful in making threat blatantly salient. This observation strengthened the idea that peer influence represented a more powerful force within the lives of adolescents than that of adults (Harris, 1998; Wigfield & Wagner, 2005), and indeed gave weight to Collins’ (2009) assertion that this was particularly so for adolescent males. Furthermore, the overt disapproval of gender-role non-congruence registered by peer audiences for Choirs 2 and 3 supported the idea of Martino and Pallotta-Chiarolli (2003) that school assemblies can function as a site where gender norms, practices and relationships can be played out.
The male choristers had openly participated in a domain considered feminine, and had displayed expressive traits rather than the agency associated with the ideal form of masculinity. As such, the male choristers could have been considered to represent a challenge to out-group male peers, in offering a visibly alternative and softer form of masculinity to that commonly considered by previous researchers (e.g., Mac an Ghaill, 1994) to be dominant. The response to this challenge was expressed by vocal and aggressive audience demonstrations of the “hassling” described by Rout (1997), particularly from out-group male peers. This demonstration supported the finding of Vandello et al. (2008) that gender-atypical performances incited responses of aggression in male audiences. This behaviour seemed to bear out both the lack of tolerance for alternative forms of masculinity reported by Roulston and Mills (1998) and the clear policing of a preferred type of masculinity described by Martino and Pallotta-Chiarolli (2003).
The resulting gender stereotype threat which stood to be mediated by such behaviour appeared ultimately to be manifest in the performance decrement recorded for males in the task condition. However, an equally detrimental result of stereotype threat was demonstrated uniquely by the males in Choir 2. These males were observed to disengage from the choir during their task condition performance and to join their out-group peers in ridiculing their fellow choristers, and they subsequently withdrew from the choir altogether. These actions exemplified and added weight to the suggestions of Steele et al. (2002) that stereotype threat could also result in protective disidentification, and ultimately the avoidance of a domain. Importantly, too, the results of stereotype threat found for the males invited further research to investigate what coping strategies and traits might be shared by those males who remained in their choirs despite overt and subtle messages that it was not acceptable to do so.
The males in Choir 1 comprised the only group where performance quality was unaffected by stereotype threat and, surprisingly, the females in this choir demonstrated performance decrement instead. The choir consisted largely of Pasifika participants, most of whom were Samoan, and for whom singing was an integral part of cultural life, especially at church. Therefore it did not seem unusual that the males in this choir were not threatened, since for them, singing was considered acceptable for both sexes. Thus cultural context would appear to have a bearing upon stereotype threat, supporting the conception of Steele et al. (2002) that stereotype threat is situation specific.
However, the unexpected performance decrement recorded for the females in Choir 1 presented a conundrum. These females appeared to underperform in comparison with their male counterparts, and yet there was no reported cultural climate of singing being inappropriate for females in this context. An exploration of literature surrounding gender expectations of Pasifika cultures (e.g., Shore, 1981), and specifically that relating to Samoan gender expectations (e.g., Park, Sualii-Sauni, Anae, Lima, Fuamatu, & Mariner, 2002), shed light on the underperformance of the females in Choir 1. In Samoan culture (as was generally found in other Pasifika cultures), females were protected by their menfolk (Park et al., 2002) in order to maintain reproductive purity (Shore, 1981). While young women were tightly controlled by the men to whom they deferred, young men were privileged in terms of agency and freedom (Park et al., 2002).
In view of the cultural expectations explained by this literature, it seemed that stereotype lift may have advantaged the performance of the males in Choir 1 in the task condition. “When a negative stereotype impugns the ability or worth of an out-group, people may experience stereotype lift – a performance boost that occurs when downward comparisons are made with a denigrated group.” (Walton & Cohen, 2003, p. 456). The dominant role accorded to young Samoan males, as protectors of their sisters (Park et al., 2002), may well have enabled the males in Choir 1 to engage in downward comparison and experience the associated boost in performance.
The data from this first study prompted the need for two further studies. Firstly, a further study was called for to extend and triangulate the data of the initial study, using case study choirs from a range of schools. Secondly, a third and final study was needed to explore beliefs and expectations held in school cultural contexts, which might promote or limit gender stereotype threat. Such research afforded the potential to extend the literature thus far generated on prejudicial and stereotypic beliefs and expectations, and on stereotype threat itself.
While the findings of the first study for this thesis were inconclusive in terms of supporting the hypothesis, they threw light on the research question, suggesting that salience of stereotype threat increased participant awareness of it. Importantly, it may be peers who have the greatest effect in triggering that salience, making blatantly apparent beliefs and expectations which would otherwise only be ambiguously suspected by the targets of stereotype threat. As well, the data suggested that there may be important differences in school cultures which affect how males in choirs are accepted and supported. Moreover, male choristers themselves may share common traits which enable them to persist in a domain where peer pressure asserts the idea that participation is a threat to identity as a male. The following chapter reports on qualitative data collected from focus groups in a second study. This study investigated the possibility of common male-chorister traits, and differing contextual experiences of stereotype threat for males in choirs, extending and triangulating the findings of the first study.


