Mann-Whitney U Test



The research methodology presented in this Chapter was designed to test both the hypothesis derived from P-O fit literature and the hypotheses derived from value change literature. What follows is a description of the methods used to determine the existence of the perceived problem (Pilot Study), and once determined, to describe the methods used to define the extent of the problem (P-O Fit Study), and, lastly, to describe the methods used to determine the effectiveness of the interventions proposed to rectify the problem.
The P-O fit of accounting students in Georgia with the accounting profession in Georgia is assessed with a survey; and the effectiveness of using a value change intervention, to improve (both short-term and long-term) the P-O fit of accounting students, is evaluated with an experiment.
This Chapter proceeds as follows: the chronology of the research process (3.1), in which the sequencing in conducting the Pilot Study, P-O Fit Study, and Value Change Study is explained; descriptions of the students who responded to surveys in the Pilot and P-O Fit Studies and of the students who participated in the Value Change Study (3.2); details regarding the measuring instrument used in all three studies (the Rokeach Value Survey; RVS), as well as details pertaining to the demographic questionnaire (3.3); and the statistical methods used to present and analyze the data (3.4).
Also presented in this Chapter are the Pilot Study method and results (3.5), as this study was seen as separate from the major empirical study. It is included in this Chapter because it informs the methodological choices made later in this Chapter. The methodology of the P-O Fit Study is described in detail (3.6) (including background, method, and procedures), as is the nature of the Value Change Study (3.7) (including an explanation of the values targeted for change, short-term and long-term value change testing procedures, and the Curriculum Modification and Value Self-Confrontation (VSC) procedures).


This research was conducted in three phases. First, the Pilot Study was conducted during the first semester of 2013; second, the Value Change Study was conducted during the second semester of 2013; and third, the P-O Fit Study was conducted during the first semester of 2014. The order of these three phases was dictated by the availability of subjects for the value change experiment and time limitations. A professional judgment class taught by this researcher, which was ideal for conducting the Value Change Study, was only available to students during the summer semester of each year. The professional judgment class was the only accounting class in Southern Polytechnic State University’s (SPSU; now Kennesaw State University) Master of Science in Accounting (MSA) program in which a focus on personal and professional values readily fit the curriculum. The time constraints in setting up and running the Value Change Study necessitated waiting until the following semester to conduct the P-O Fit Study.


The RVS respondents in the three reported studies (Pilot Study, P-O Fit Study and Value Change Study) were upper level accounting students attending higher education institutions in Georgia.

Pilot Study Respondents

The subjects for the Pilot Study were selected from junior and senior level accounting students (N = 30) taking an upper level accounting course and graduate level students taking three MSA classes (N = 97). All of these students attended SPSU located in Marietta, Georgia (a suburb of Atlanta, Georgia). This convenience sample of accounting students, which did not include all of the accounting students at this university, was used to determine whether or not the P-O fit of accounting students merited further investigation.

P-O Fit Study Respondents

The accounting students in the P-O Fit Study attended universities of higher education in Georgia. These students responded to the RVS study solicitations, which were made state-wide, either by the Georgia Society of CPAs (GSCPA), this researcher, or by a member of the accounting faculty at the university the students attended. No attempt was made to randomly select the participants.

Value Change Study Participants

Of the 34 prior VSC studies reviewed in Chapter 2 (2.2.3), 22 were conducted with student subjects. The subjects in the present study are also college students, who were enrolled in one of two sections of an MSA course in professional judgment taught by this researcher during the summer semester of 2013 (the last semester of the school year). These students were randomly assigned to the sections by the administrative assistant in charge of registering students for business administration classes. She equalized the sections for both number and gender. The effect of the Value Change Study was measured with the RVS.


This research is operationalized with the RVS (Exhibit 1) and the accompanying demographic questionnaire (Exhibit 2).

Rokeach Value Survey

The RVS was developed by Milton Rokeach (1968, 1973) as a method for measuring values and value systems as conceptualized in belief system theory, which includes a theory of value change.

Previous Use

Since its creation by Milton Rokeach (1968, 1973), the RVS has for four decades been utilized in numerous values studies with non-accounting subjects (e.g., Bocsi, 2012; Uy, 2011), business students (e.g., Gervazio, Giraldi, & Costa, 2012; Giacomino, Li, & Akers, 2013), and various groups of accounting subjects: for example, accounting students (e.g., Baker, 1976; Swindle & Phelps, 1984; Eaton & Giacomino, 2000; Abdolmohammadi & Baker, 2006; Liu, 2011; Wen, 2012), accounting professors (e.g., Pinac-Ward, Ward, & Wilson, 1995), accounting alumni (Giacomino & Eaton, 2003), and CPAs (Swindle et al., 1987; Wilson et al., 1998; Ariail et al., 2012; Ariail et al., 2013).

Composition of the Rokeach Value Survey Instrument

The RVS (Exhibit 1) is composed of 18 terminal values (goals in life/desirable ends) and 18 instrumental values (desirable means for reaching one’s goals) which subjects rank order from 1-18 (1 = most important; 18 = least important) based on, as indicated in the RVS instructions, “. . . their importance to [the subject] as guiding principles in [their] life” (Rokeach, 1973, p. 358).

Validity and Reliability

Investigators have found the RVS to be both valid and reliable. According to Bearden and Netemeyer (1999), “the scale has undergone numerous reliability and validity checks across various samples” (Bearden & Netemeyer, 1999, p. 121). Reliability coefficients based on test-retest data have been reported in a number of studies: for example, after three weeks, Rokeach (1973) reported coefficients of .74 for the terminal values and .65 for the instrumental values; after five weeks, Feather (1971, 1975) reported coefficients of .74 for the terminal values and .70 for the instrumental values; after two weeks, Munson and McIntyre (1979) reported coefficients of .82 for the terminal values and .76 for the instrumental values; and Reynolds and Jolly (1980) reported a reliability coefficient for the terminal values of .78. Mueller (1984) suggested that the reliability coefficients reported for the RVS indicated that it should not “. . . be utilized in the interpretation or comparison of individual respondents. . . [but that] descriptive and comparison of group values . . . is an acceptable use” (Mueller, 1984, p. 552).
Prior research has addressed the predictive, concurrent, and construct validity of the RVS. Subsets of RVS values have been found to predict religiosity (Rokeach, 1973), to predict cheating behavior (Shotland & Berger, 1970; Homant & Rokeach, 1970), to predict social activism (Thomas, 1986), and to be significantly related to over 20 behaviors (Rokeach, 1973). Concurrent validity has been reported between the RVS and various other values research instruments including the England Personal Value Questionnaire (Munson & Posner, 1980) and the List of Values (Beatty, Kahle, Homer, & Misra, 1985). In addition, the construct validity of the RVS has been addressed by a number of researchers: Braithwaite and Law (1985) reported that the RVS is “. . . successful in covering the many and varied facets of the value domain” (Braithwaite & Law, 1985, p. 260); Thompson, Levitov, and Miederhoff (1982) found the RVS to have situation-specific construct validity; Homant (1969) indicated that the RVS values correlated “. . . with the evaluative dimensions of [their] connotative meanings. . . ” (Homant, 1969, p. 886); and Rokeach (1973) reported that the RVS values “. . . are not readily reducible to some smaller number” (Rokeach, 1973, p. 48).


Demographic Questionnaire

The demographic questionnaire (Exhibit 2) was developed by this researcher to solicit information regarding various dependent variables: gender, years of work experience, student status, college major, ethics education, place of birth, residence at the age of 16, and political orientation. In prior studies (e.g., Rokeach, 1973; Feather, 1975; Wilson et al., 1998; Giacomino & Eaton, 2003; Ariail, 2005; Ariail et al., 2012) most of these variables have been directly or indirectly investigated.
Since this study is focused on the P-O fit of upper level accounting students, question four, regarding student status (freshman, sophomore, junior, senior, and graduate), and question five, regarding college/university major (accounting, management, etc.) are variables used in this study. Question 10 asks students to respond yes or no to whether or not they have previously taken the survey. The answer to this question allowed for the purging of multiple survey responses. While the results for the additional variables of age, gender, years of work experience, ethics training, country of birth, and country of residency at the age of 16 are reported in describing the demographics of the three samples of accounting students (Pilot Study, P-O Fit Study, and Value Change Study), the analysis of the results for several of these variables is reserved for future iterations of this research. Student respondents were instructed to answer the 10 questions included in the demographic questionnaire prior to completing the RVS.


In the P-O Fit Study and the Value Change Study, either the chi-square test or the t-test was used to compare the demographic variables. Since the RVS generates ranked data, non-parametric statistical methods were primarily used in the analysis. Medians and composite rank orders were computed for each of the 36 RVS values. The Mann-Whitney U test, the primary statistic, and the median test, the secondary (supplemental) statistic, were used to compare the medians of the two independent groups (CPA leaders and accounting students). The Wilcoxon signed-rank test (where applicable) and the paired-samples sign test (the alternate statistic) were used to measure the significance of the differences in the medians from Pretest to Posttest 1 for both Group 1 and Group 2. Friedman’s ANOVA test with post hoc paired-samples sign tests was used with Group 2 to compare the medians of the targeted values at Pretest to the medians at the three Posttests. Each of these tests is discussed below.

Chi-Square Test

Field (2009) explains that the term chi-square . . . can apply to any test statistic having a chi-square distribution, it generally refers to Pearson’s chi-square test of the independence of two categorical variables. Essentially it tests whether two categorical variables forming a contingency table are associated. . . .What we mean by an association is the pattern of responses . . . in the two . . . conditions is significantly different. (Field, 2009, pp. 697, 783)
Accordingly, in this study the Chi-square test of independence is used to compare the demographic variables of CPAs and students in the P-O Fit Study and selected demographic variables of Group 1 and Group 2 in the Value Change Study. The purpose of these comparisons is to determine the congruence on these variables of the two independent groups. In the Value Change Study, the Chi-square test was utilized to determine whether or not subjects in Group 1 and Group 2 matched on these variables at the beginning of the experiment.
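The Pearson chi-square test of independence described above can be illustrated with a minimal sketch. The study itself ran its tests in SPSS; the function names `chi2_sf` and `chi_square_independence` and the counts below are hypothetical and are not taken from the study data. The sketch applies no continuity correction, which is an assumption about the variant used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2:
        # Odd df: start from the closed form for df = 1.
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        # Even df: start from the closed form for df = 2.
        a, q = 1.0, math.exp(-z)
    while 2 * a < df:
        # Raise df by 2 with the incomplete-gamma recurrence.
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q

def chi_square_independence(table):
    """Return (chi-square, df, p) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2_sf(chi2, df)

# Hypothetical counts: rows are the two groups, columns are gender categories.
counts = [[10, 20], [20, 10]]
chi2, df, p = chi_square_independence(counts)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.4f}")
```

A p-value at or above .05 in such a comparison would suggest that the two groups do not differ significantly on the demographic variable.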


t-Test

According to Field (2009), the “. . . t-statistic . . . in the context of experimental work . . . is used to test whether the differences between two means are significantly different from zero” (Field, 2009, p. 795).
In the Value Change Study the t-test was used to compare the mean grade point averages of the students in Group 1 and Group 2. The purpose of this test was to determine whether or not these two groups of students differed in this variable at the beginning of the experiment.
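As an illustration of the comparison of mean grade point averages, the following sketch computes an independent-samples t statistic. It assumes the pooled-variance (equal variances) form, which the text does not specify; the function name `pooled_t` and the GPA values are hypothetical. Only the statistic and its degrees of freedom are computed; the p-value would come from a t table or statistical software.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

# Hypothetical grade point averages for the two sections.
gpa_group1 = [3.1, 3.4, 3.6, 3.2, 3.8]
gpa_group2 = [3.3, 3.5, 3.0, 3.7, 3.4]
t, df = pooled_t(gpa_group1, gpa_group2)
print(f"t = {t:.3f}, df = {df}")
```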


Median

Rokeach (1973) indicated that since the frequency distributions of each of the values in the RVS may not produce a normal distribution, the median is the appropriate measure of central tendency. He stated the following:
Each value seems to have its own distinctive nonparametric distribution, and many of them show frequency distributions that are highly skewed in one direction or the other. The frequency distribution for a world at peace and family security, for instance, are heavily skewed toward the higher ranks. Pleasure and an exciting life, on the other hand, show distributions that are heavily skewed in the other direction, with most of the rankings piling up at the low end. Because these frequency distributions deviate so markedly from normality and from one another, a circumstance to be expected with ranked data, the measure of central tendency that was considered to be most appropriate is the median rather than the mean, and the nonparametric median test. (Rokeach, 1973, p. 56)
Accordingly, in the present study the median is used as the measure of central tendency. The median, which is “. . . the middle score when scores are ranked in order of magnitude” (Field, 2009, p. 21), is computed in the present study using SPSS. However, it can also be located manually: the median is the score at position (n + 1)/2 in the ordered scores, where n is the number of scores (Field, 2009).

Grouped Medians Composite Rank Order

In order to determine higher degrees of discrimination between the medians (and thus facilitate production of the RVS value rankings for each group of subjects), grouped medians (Black, 2011) were calculated for the aggregated data for each of the RVS value scores obtained from the Pilot Study, the P-O Fit Study, and the Value Change Study. Grouped median results have been reported in several prior studies using the RVS (e.g., Rokeach, 1973; Feather, 1975; Rokeach, 1979), and are calculated here. Grouped medians, which are computed in the present study using SPSS, can also be manually computed using cumulative frequency tables and the following formula:
$$\text{grouped median} = L + \left(\frac{n/2 - cf}{f}\right) \times h$$
where
L = lower limit of the median class
n = number of observations
cf = cumulative frequency of the class preceding the median class
f = frequency of the median class
h = class size
Whenever the grouped medians are presented in tabular form, the rank order of the grouped medians is shown in parentheses in front of the grouped median value, which is reported to between two and four decimal places. This rank order of grouped medians is referred to by Rokeach (1973) as the composite rank order. As indicated by Rokeach (1973), “the composite rank order . . . [is] useful not only as a general index of the relative position of a particular value in the total hierarchy of values but also when comparing the position of a particular value across groups” (Rokeach, 1973, p. 56).
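The grouped median interpolation and the resulting composite rank order can be sketched as follows. This is an illustration only: it assumes that each whole-number rank r stands for a class running from r - 0.5 to r + 0.5 (class size h = 1), and the function name `grouped_median` and the rankings are hypothetical rather than study data.

```python
from collections import Counter

def grouped_median(ranks, h=1.0):
    """Interpolated median, treating each rank r as the class [r - h/2, r + h/2)."""
    if not ranks:
        raise ValueError("grouped_median requires at least one rank")
    n = len(ranks)
    counts = Counter(ranks)
    cf = 0  # cumulative frequency of the classes before the current one
    for value in sorted(counts):
        f = counts[value]
        if cf + f >= n / 2:
            lower = value - h / 2  # lower limit of the median class
            return lower + ((n / 2 - cf) / f) * h
        cf += f

# Hypothetical rankings of three values by six respondents.
rankings = {
    "honest": [1, 2, 2, 3, 1, 2],
    "capable": [4, 3, 5, 4, 4, 6],
    "courageous": [9, 7, 8, 8, 10, 7],
}
medians = {value: grouped_median(r) for value, r in rankings.items()}
# Composite rank order: 1 for the smallest grouped median (most important value).
ordered = sorted(medians, key=medians.get)
composite = {value: position + 1 for position, value in enumerate(ordered)}
for value in rankings:
    print(f"{value}: ({composite[value]}) {medians[value]:.2f}")
```

The printed layout mirrors the tabular convention described above, with the composite rank in parentheses in front of the grouped median. For unit-width classes of consecutive ranks, Python's `statistics.median_grouped` should give the same interpolated result.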

Mann-Whitney U Test

The Mann-Whitney U test is, according to Field (2009),
a non-parametric test that looks for differences between two independent samples. That is, it tests whether the populations from which the two samples are drawn have the same location. It is functionally the same as the Wilcoxon’s rank-sum test, and both tests are non-parametric equivalents of the independent t-test. (Field, 2009, p. 789)
This test is the primary statistic used in the Pilot Study and the P-O Fit Study to compare the RVS values of these independent groups of accountants (students and CPAs) and to test Hypothesis 1.
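A minimal sketch of the Mann-Whitney U computation follows. It ranks the pooled observations (tied values share the average rank), derives U from the rank sum of the first group, and approximates the two-sided p-value with a tie-corrected normal distribution and no continuity correction. These choices, the function name `mann_whitney_u`, and the example ranks are assumptions for illustration; the exact procedure used by SPSS may differ.

```python
import math
from collections import Counter

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using midranks and a normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)

    def midrank(v):
        # Average rank of v in the pooled sample; ties share the same rank.
        less = sum(p < v for p in pooled)
        equal = sum(p == v for p in pooled)
        return less + (equal + 1) / 2

    rank_sum_x = sum(midrank(v) for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    # Tie-corrected variance of U under the null hypothesis.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # every observation is tied, so there is no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical ranks (1 = most important) given to one RVS value by each group.
cpa_ranks = [2, 3, 1, 4, 2, 5, 3]
student_ranks = [6, 4, 8, 5, 9, 7, 6]
u, p = mann_whitney_u(cpa_ranks, student_ranks)
print(f"U = {u}, p = {p:.4f}")
```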

Median Test

The median test has been the test of statistical significance used in a number of prior RVS studies (e.g., Rokeach, 1973; Feather, 1975; Swindle & Phelps, 1984; Wilson et al., 1998) where the medians of groups were compared. “The median test is a chi-squared test of the significance of difference between the number of persons in two or more subgroups who score above and below the group median” (Rokeach, 1973, p. 56). More specifically, the median test
. . . compares the medians of two independent samples. The null hypothesis is that no difference exists between the medians of the population from which the samples are drawn. . . . The median test is based on the idea that in two samples drawn from the same population the expectation is that as many observations in each sample will fall above as below the median. (Ferguson & Takane, 1989, p. 433)
While the median test has historically been the primary statistic used to analyze the ranked data generated by the RVS, Freidlin and Gastwirth (2000) suggested that it be retired in favor of the Mann-Whitney U test. The Mann-Whitney U test is considered a better test by modern statisticians because it considers the ranks of all observations in the sample together, instead of separately comparing each observation to the median value (Gibbons & Chakraborti, 2010).
Accordingly, the Mann-Whitney U test is the primary statistic used to compare the medians of the RVS values of different groups of accountants. In order to allow for comparability of the present results with those obtained in prior research, median test results are presented as supplemental information.
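The logic of the median test can be sketched for the two-group case: each observation is classified as above, or at or below, the median of the combined sample, and the resulting 2 x 2 table is tested with a chi-square statistic on one degree of freedom. The sketch assumes no continuity correction and counts scores equal to the grand median as not above it; the function name `median_test` and the ranks are hypothetical.

```python
import math
from statistics import median

def median_test(x, y):
    """Return (grand median, chi-square, p) for two independent samples."""
    grand = median(list(x) + list(y))
    a = sum(v > grand for v in x)   # group 1, above the grand median
    b = len(x) - a                  # group 1, at or below
    c = sum(v > grand for v in y)   # group 2, above the grand median
    d = len(y) - c                  # group 2, at or below
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return grand, 0.0, 1.0      # a margin is empty, so no test is possible
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return grand, chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical ranks given to one RVS value by two groups.
grand, chi2, p = median_test([1, 2, 3, 4], [5, 6, 7, 8])
print(f"grand median = {grand}, chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because only the above/below classification enters the statistic, how far each observation lies from the median is ignored, which is the information the Mann-Whitney U test retains through its ranks.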


Friedman’s ANOVA Test

In the Value Change Study, Friedman’s ANOVA test is used to compare the medians of the rank-ordered (nonparametric) RVS values of Group 2 from Pretest to Posttest 1 to Posttest 2 (three time periods) and from Pretest to Posttest 1 to Posttest 2 to Posttest 3 (four time periods). According to Field (2009), Friedman’s ANOVA
. . . is used for testing differences between conditions when there are more than two conditions and the same participants have been used in all conditions (each case contributes several scores to the data). If you have violated some assumption of parametric tests then this test can be a useful way around the problem. (Field, 2009, p. 573)
The instrumental portion of the RVS was administered to Group 2 at four time periods (Pretest at the beginning of the course, Posttest 1 after intervention at the end of the course, Posttest 2 at five to six weeks after intervention, and Posttest 3 at 15 to 16 weeks after intervention).
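A sketch of the Friedman statistic for repeated measurements follows. Each participant's scores are ranked across the time periods (ties share the average rank), the ranks are summed per time period, and the statistic is compared with the chi-square critical value at the .05 level for k - 1 degrees of freedom. No tie correction is applied to the statistic, which is a simplifying assumption; the function name `friedman_statistic` and the data are hypothetical.

```python
def friedman_statistic(rows):
    """Return (Friedman chi-square, df); rows hold one participant's scores per period."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, v in enumerate(row):
            # Rank of v within this participant's own scores (midrank for ties).
            less = sum(w < v for w in row)
            equal = sum(w == v for w in row)
            rank_sums[j] += less + (equal + 1) / 2
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return stat, k - 1

# Chi-square critical values at the .05 level for 2 and 3 degrees of freedom.
CRITICAL_05 = {2: 5.991, 3: 7.815}

# Hypothetical ranks given to one targeted value by five participants at
# Pretest, Posttest 1, Posttest 2, and Posttest 3.
honest_ranks = [
    [6, 4, 3, 3],
    [8, 5, 5, 4],
    [5, 5, 4, 2],
    [9, 6, 7, 5],
    [7, 3, 2, 2],
]
stat, df = friedman_statistic(honest_ranks)
print(f"Friedman chi-square = {stat:.3f}, df = {df}, "
      f"significant at .05: {stat > CRITICAL_05[df]}")
```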

Wilcoxon Signed-Rank Test

Field (2009) indicates that the Wilcoxon signed-rank test works in a fairly similar way to the dependent t-test . . . in that it is based on the difference between scores in the two conditions you’re comparing. Once these differences have been calculated they are ranked . . . but the sign of the difference (positive or negative) is assigned to the rank. (Field, 2009, p. 552)
According to the University of Dayton (2014), the Wilcoxon signed-rank test is a “. . . nonparametric statistic that can be used with ordinally . . . scaled dependent variable[s] when the independent variable has two levels and the participants have been matched or the samples correlated” (University of Dayton, 2014, no page).
In the Value Change Study the dependent variables are the four targeted values, and the independent variable is the time of measurement for matched (repeated) observations: Pretest and Posttest 1 for Group 1, and Pretest, Posttest 1, Posttest 2, and Posttest 3 for Group 2. Therefore, where applicable, the Wilcoxon signed-rank test was used to compare the Pretest and Posttest 1 medians within Group 1 and within Group 2.
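The signed-rank procedure can be sketched as follows: zero differences are dropped, the absolute differences are ranked, and the ranks are summed separately for positive and negative differences. The two-sided p-value uses a normal approximation without tie or continuity corrections, which is an assumption made for brevity; the function name `wilcoxon_signed_rank` and the data are hypothetical.

```python
import math

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-, two-sided p) for paired observations."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    abs_d = [abs(d) for d in diffs]

    def midrank(v):
        # Rank of an absolute difference; tied values share the average rank.
        less = sum(w < v for w in abs_d)
        equal = sum(w == v for w in abs_d)
        return less + (equal + 1) / 2

    w_plus = sum(midrank(abs(d)) for d in diffs if d > 0)
    w_minus = sum(midrank(abs(d)) for d in diffs if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    return w_plus, w_minus, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical Pretest and Posttest 1 ranks of one targeted value for five students.
pretest = [5, 7, 9, 12, 14]
posttest1 = [4, 5, 6, 8, 9]
w_plus, w_minus, p = wilcoxon_signed_rank(pretest, posttest1)
print(f"W+ = {w_plus}, W- = {w_minus}, p = {p:.4f}")
```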

Paired-Samples Sign Test

The Wilcoxon signed-rank test requires that “the distribution between the two related groups (i.e., the distribution of the differences between the scores of both groups. . .) needs to be symmetrical in shape” (Laerd Statistics, 2014, Wilcoxon signed-rank test, para. 4). Problems were experienced when neither a natural log, a log 10, nor a square root transformation worked to correct the skew from Pretest to Posttest 1 of the median differences of the targeted values of courageous and responsible for Group 1 and of capable, courageous, and honest for Group 2. Therefore, the paired-samples sign test (an alternate statistic) was used to compare the medians from Pretest to Posttest 1 of these targeted values.
According to Laerd Statistics (2014), the paired-samples sign test . . . is used to determine whether there is a median difference between paired or matched observations. This test can be considered as an alternative to the . . . Wilcoxon signed-rank test when the distribution of differences between paired observations is neither normal nor symmetrical, respectively. Most commonly, participants are tested at two time points or under two different conditions on the same continuous dependent variable. (Laerd Statistics, 2014, Sign Test in SPSS, para. 1)
Since the paired-samples sign test can be conducted whether or not the distribution of the differences in rankings is skewed, it was also used to augment the Wilcoxon signed-rank test results from Pretest to Posttest 1 for the Group 1 values of capable and honest and for the Group 2 value of responsible. Therefore, the paired-samples sign test was used to analyze the median differences from Pretest to Posttest 1 of all four of the targeted values for both Group 1 and Group 2.
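Because the sign test uses only the direction of each paired difference, it can be sketched with an exact binomial calculation. The sketch below drops tied pairs and doubles the smaller tail probability for a two-sided p-value; the function name `sign_test` and the data are hypothetical, and SPSS may report an asymptotic rather than an exact p-value for larger samples.

```python
from math import comb

def sign_test(before, after):
    """Return (number of increases, number of decreases, exact two-sided p)."""
    pos = sum(a > b for b, a in zip(before, after))
    neg = sum(a < b for b, a in zip(before, after))
    n = pos + neg  # tied pairs carry no sign and are excluded
    if n == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    # Probability of k or fewer of the rarer sign when both signs are equally likely.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)

# Hypothetical Pretest and Posttest 1 ranks of one targeted value for five students.
pos, neg, p = sign_test([5, 7, 9, 12, 14], [4, 5, 6, 8, 9])
print(f"increases = {pos}, decreases = {neg}, p = {p}")
```

With only five untied pairs, the smallest attainable two-sided p-value is .0625, so at least six untied pairs are needed before this exact version can reach significance at the .05 level.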

Post Hoc Tests

In order to adequately address Hypothesis 4, regarding long-term value change, it was desirable to augment Friedman’s ANOVA test results with post hoc tests. The Friedman’s ANOVA test “. . . is an omnibus test . . . [that] . . . tells you whether there are overall differences, but does not pinpoint which groups in particular differ from each other. To do this you need to run post hoc tests. . .” (Laerd Statistics, 2014, Friedman Test in SPSS, no page). Therefore, in order to further delineate how the targeted values changed over the four time periods in the Value Change Study, post hoc tests with the paired-samples sign test were run on the six possible combinations of the Group 2 results: Pretest to Posttest 1, Pretest to Posttest 2, Pretest to Posttest 3, Posttest 1 to Posttest 2, Posttest 1 to Posttest 3, and Posttest 2 to Posttest 3.
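The six pairings follow from choosing two of the four time periods, which can be confirmed with a short sketch. The variable names are illustrative only; each pairing would then be passed to the paired-samples sign test.

```python
from itertools import combinations

periods = ["Pretest", "Posttest 1", "Posttest 2", "Posttest 3"]
# Every unordered pair of time periods, earlier period first.
pairs = list(combinations(periods, 2))
for earlier, later in pairs:
    print(f"{earlier} to {later}")
```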

Tests of Significance

A value of p < .05 was adopted to determine statistical significance in all of the statistical tests: the Chi-square test, t-test, Mann-Whitney U test, median test, Friedman’s ANOVA test, Wilcoxon signed-rank test, and the paired-samples sign test.
Conventions regarding adopting a level of significance are explained as follows by Ferguson and Takane (1989):
The probability of a Type 1 error is called the level of significance of a test. Ordinarily, the investigator adopts . . . a level of significance. It is a common convention to adopt levels of significance of either .05 or .01. If the probability is equal to or less than .05 of asserting that there is a difference between two means, for example, when no such difference exists, then the difference is said to be significant at the .05 or 5 percent level or less. Here the chances are 5 in 100, or less, that the difference could result when there is no difference in the population values. (Ferguson & Takane, 1989, p. 182)

