[This piece started as a restatement of a University of Minnesota writeup, in Science Daily, of research conducted by Sehoya Cotner and Cissy J. Ballen and published in PLOS ONE (2017). As with other restatements in this blog, the purpose here was to highlight the distortion of research findings, in science media outlets like Science Daily, to serve a preexisting feminist agenda.
There is no denying that life is hard, nor that women, like men, do face forms of unfair treatment specific to their gender. Nor is it clear that women are, in fact, inferior in general ability to learn and use scientific knowledge. The concern driving this post, like other posts in this blog, is just that feminist ideology, still dominant in the university and in media, often disserves the public — notably, women themselves — in places like Science Daily.
No doubt Science Daily sees itself as merely passing along what it receives from universities. Of course, such an excuse would be completely unacceptable for the transmission of other discriminatory and unscientific ideologies, such as Nazi racism or religious dogma. To avoid corruption by any such ideology, the more responsible approach would be to solicit and rate volunteers capable of offering more critical analyses.
In writing this piece, I began — as with my other restatements — by using words from the Science Daily article to demonstrate that the same material could have been presented in support of conclusions very different from those adopted by Science Daily. Some of these alternate conclusions reflect my own views; some do not. My views are not the point. The point is that good science, and good reporting on it, require openness to the possibilities, not commitment to a preconceived belief.
In this case, the Science Daily article being restated was rather brief; it left out much relevant information. Thus, this restatement differs from my others by adding and discussing a fair amount of material from the original research report and from other sources, to highlight relevant facts and issues not mentioned in the Science Daily article.]
* * * * *
A new study published in PLOS ONE confirms that male students tend to do better than female students on high-stakes tests in biology courses. In the study, performance gaps between male and female students increased or decreased based on whether instructors emphasized or de-emphasized the value of exams.
Sehoya Cotner, associate professor in the College of Biological Sciences at the University of Minnesota, and Cissy Ballen, a postdoctoral associate in Cotner’s lab, based their findings on a year-long study of students in nine introductory biology courses. They found that female students performed relatively poorly in courses where exams counted for at least half of the total course grade.
“When the value of exams is changed, performance gaps increase or decrease accordingly,” says Cotner.
These findings build on recent research by Cotner and Ballen showing that, on average, women’s exam performance is adversely affected by test anxiety, comparable to the performance anxiety that prevents some people from coping effectively with stressful real-world situations. In this latest publication, Cotner and Ballen did not look into the sources of such anxiety, discuss possible remedies, or ask whether female test anxiety in scientific fields derives from the long-established female inferiority in mathematics.
Those oversights constitute a major weakness in this research. In the words of Ceci et al. (2014, p. 76), summarizing an extensive review of relevant scientific literature,
Although in the past, gender discrimination was an important cause of women’s underrepresentation in scientific academic careers, this claim has continued to be invoked after it has ceased being a valid cause of women’s underrepresentation in math-intensive fields. Consequently, current barriers to women’s full participation in mathematically intensive academic science fields are rooted in pre-college factors and the subsequent likelihood of majoring in these fields, and future research should focus on these barriers rather than misdirecting attention toward historical barriers that no longer account for women’s underrepresentation in academic science.
Consider, for example, research suggesting that children acquire relevant stereotypes as early as age 6 (Bian et al., 2017). By concentrating on the performance of women in STEM (science, technology, engineering, and mathematics) fields at the college level, Cotner and Ballen raised the question of whether women are the problem. That was unfortunate. As the University of Minnesota itself admits, some women do excel in STEM fields — and, of course, many men do not. Women who excel in STEM fields do not seem to need, and may not appreciate, a deliberate skewing of student assessment that would obscure their own superiority and plague them with poorly prepared colleagues and employees. Nor would such an outcome improve long-term public perceptions of women in STEM fields. In lieu of a sexist obsession with the myth of discrimination against women at the college level, researchers might be well advised to focus on those individuals, male and female alike, who find science difficult.
To help instructors conceal the problem of inferior female performance in STEM classes, Cotner and Ballen recommended switching to a “mixed model” of student assessment. This mixed model, in their view, should reduce or eliminate the use of exams that require cold mastery of large amounts of material, and should instead favor smaller exams, quizzes, and non-exam methods of monitoring student progress. The authors’ message seemed to be that, when facing major challenges, such as midterm and final exams, male students were more likely to rise to the occasion, while female students (and researchers) preferred to ask if the challenge could be changed to something more pleasant.
Cotner and Ballen found that “the shift away from an exam emphasis . . . benefits female students.” Unfortunately, these authors’ concept of “benefit” for female students was limited to the achievement of higher final grades, in a grade-inflated milieu. They did not examine the question of whether such a shift produced graduates better prepared for graduate school, or for the demands placed upon science practitioners in the working world.
Examples of non-exam assessment methods include what Cotner and Ballen characterized as “lab work” and take-home “written assignments.” Such tasks could give students unequal opportunities to obtain information and assistance from others. For example, another post in this blog mentions a group of female students who credited their “nerd herd” group with helping them get through their STEM studies. That “nerd herd” seemed to be intended to exclude male students, on grounds of gender, even if they were well suited for collaboration. In that case, assignments that reward collaboration would devalue male students of equal ability facing an unsupportive STEM culture. Teamwork aside, it is also likely that some female students enjoy unequal access to free assistance from male professors or PhD students in the predominantly male STEM fields. Such possibilities may explain why Cotner and Ballen found that women tended to perform better, relative to male students, when exams counted for less toward the final grade.
At the end of their paper, Cotner and Ballen offered their opinions on two other topics that were not part of their study. One was “active learning.” The authors explained that active learning shifts the focus away from lectures and lecture halls to more collaborative spaces and group-based work, and that this shift tends to result in greater use of non-exam methods of assessment. Cotner and Ballen did not offer data to support the latter belief. It seemed obvious that a course could feature mixed tasks and still weight exams heavily in determining the final grade.
No doubt active learning approaches can be helpful when chosen and implemented well. Yet Hopper (2017, p. 89) cited several reports of failed active learning efforts, and observed that active learning is not a panacea — that, among other things, its effectiveness depends upon the quality of instructor training and the level of instructor commitment. Similarly, Cleveland et al. (2017) concluded that “more research is needed to examine the differential effects of varying active-learning strategies on students’ attitudes and motivations.”
Students surely do tend to learn better if they engage with the material in multiple ways, such as those suggested by Wikipedia (e.g., class discussions, debates, games, learn by teaching). For that matter, most students would do better if they were given private tutors. But most such methods would be infeasible in the “large introductory courses” (consisting of 90 to 239 students each) that Cotner and Ballen studied. An effective argument on such matters would need to include some attention to the costs of providing that level of instruction.
It is true that take-home assignments could be better than exams for testing whether students can assemble a complete solution to a complex problem: to demonstrate, for instance, that they know how to design a bridge that won’t fall down. Working in the comfort of one’s home might also help students avoid making dumb mistakes in simple calculations (e.g., of a vaccine’s efficacy) that could occur under the stress of an exam. It is also true, however, that take-home assignments are typically open to cheating. Such are the concerns with which any competent instructor wrestles. It seemed remarkable that Cotner and Ballen devoted no attention to such concerns. There seemed to be a question of whether our authors believed that female students should be helped to get through STEM courses by any means available, fair or foul. Only a startlingly twisted feminism would favor such an outcome at the risk of bridges collapsing or vaccines failing.
Cotner and Ballen seemed to be seeking a stark demonstration of the difference in female student performance in courses whose final grades did, or did not, depend heavily on high-stakes exams. Unfortunately, the authors did not investigate a meaningful variety of courses to make their case. They chose to study grades given to male and female students in only nine courses, and in each course the portion of the final grade based on exams fell within the narrow range of 42% to 51%. That is, non-exam assessment methods accounted for 49% to 58% of the grade in each course. Thus, none of these courses seemed to provide a good example of what happens when exams largely determine the grade or are substantially eliminated. The authors admitted, further, that variations in the outcomes in this small sample of courses may have been due to “other variables (different instructors, different student populations, variable in-class teaching techniques).”
Possibly the authors would have done a better job of their main investigation if they had not distracted themselves by trying to mix in a little study of three courses where instructors taught their courses with more of an emphasis on exams in the fall semester and less of an emphasis on exams in the spring semester, or vice versa. This separate study found that female students still did significantly worse than male students, even when exams counted for only 42-44% of the final grade — percentages that were at the bottom end of the range found in the larger courses (above). Female performance seemed to match or exceed male performance only when exams accounted for less than about 30% of final grades. The sense emerging from this small study was that women would be able to keep up only if at least two-thirds of their grade came from more enjoyable and/or manipulable forms of assessment.
Such data suggested that these studies should have been informed by ACT or SAT quantitative scores, to ensure that we were, in fact, seeing an apples-to-apples comparison of equally qualified male and female students. The desperate effort to get female students into STEM courses may have led Cotner and Ballen to blame courses, and instructors, for a simple disparity in the qualifications of male vs. female STEM students. The authors did collect ACT score data for the students participating in this research, but declined to disclose male-female differences in those scores in the nine courses examined in their primary study. The authors used ACT score data to conclude that male and female students “were comparable in their preparation” in the secondary study. Yet, even there, the authors did not refer specifically to math and science ACT scores. Rather, they said they used “comprehensive” ACT scores, suggesting that they considered reading subscores equal to science subscores, for purposes of gauging student qualification for science study. (Remarkably, ACT’s own 2017 report provided an extensive nationwide breakdown of ACT scores in each of the exam’s four sub-areas by race, field of study, educational aspirations, and state of residence, but said not a word about gender differences.)
In multiple regards, then, Cotner and Ballen presented a rather scattered analysis, capped by two half-baked afterthoughts. The first of these two, as just discussed, was that we should turn aside to consider the separate topic of active learning, because it might lead to greater use of non-exam forms of assessment, in courses that were already using non-exam assessment for at least half of the final grade — because, even at that level, the female students were not doing well. The second of these afterthoughts involved a contrast between what Cotner and Ballen characterized as the “student deficit model” vs. the “course deficit model.” The authors expressed the view that, when students fail, it is probably because of a failure in the design of the course rather than in the ability, motivation, or effort of the student — if the student is female. Cotner said, “We conclude by challenging the student deficit model, and suggest a course deficit model as explanatory of these performance gaps, whereby the microclimate of the classroom can either raise or lower barriers to success for underrepresented groups in STEM.”
Such a suggestion would have made sense if Cotner and Ballen had made a clear case for it — if, for instance, they had provided supporting data indicating that similarly qualified male and female students were receiving dissimilar grades due to specific flaws in course design, with an explanation of how it would be inappropriate to administer exams to comparably prepared male and female students. Without such data, this suggestion was on a par with the opinions spouted by some random layperson over a beer. It was simply absurd to suggest, in a science periodical, without evidence, that STEM courses need to be reconceived across the board, worldwide, so as to place the blame on instructors when female students of unspecified ability receive grades lower than those given to male students.
It was particularly dismaying that Cotner and Ballen advanced such a vacuous argument at this point, at the end of their paper. They had just finished devoting pages to a display of scientific investigation. They had imitated scientific procedure pretty well, at least if you ignore their obvious bias favoring potentially unqualified female STEM students. In these pages of analysis, Cotner and Ballen did seem to be people who realized that, in science, conclusions depend on evidence. But now, at the end, evidently they could not restrain themselves from launching into these other topics and offering these completely speculative beliefs. It was as if our authors were children: they had put on an imitation of adult science, but ultimately they could not help having fun and making a goof of it.
Readers searching for insight on student vs. course deficit models at this point, near the end of the paper, might conclude that Cotner and Ballen were pulling it out of thin air. The authors may have felt, to the contrary, that they were actually looping back to the start of their article. There, they had cited Valencia’s Evolution of Deficit Thinking (1997, evidently reprinted in 2012). In their reading, it seemed, Valencia rejected the propositions that “some students enter college lacking the academic resources necessary to succeed in an otherwise fair learning environment” and that “high achievement is the direct result of hard work and inherent abilities.” It is true that, in some situations, some types of persons face unequal barriers. But that would hardly negate the obvious fact that students really do differ in ability, preparation, and inclination to work hard.
In these remarks, as I say, I am guessing that Cotner and Ballen mentioned the student and course deficit models, at the end of their article, because they wanted us to recall that they had also mentioned those models, and had cited Valencia, at the start of their paper. I might be guessing wrong. At the start, they had cited Valencia as just one among a number of sources offering divergent explanations of inferior female performance in STEM courses. A reader might infer that, actually, they didn’t buy his hypothesis, because they devoted only three sentences to it. By comparison, they spent two full introductory paragraphs on the hypothesis of “stereotype threat,” in which people essentially live up (or down) to the expectations that others have of them. It was conceivable that stereotype threat would impact female exam performance. But in their closing remarks, Cotner and Ballen admitted,
[W]e did not establish — via surveys, interviews, or any sort of contextual manipulation — the salience of a stereotype about female deficiencies in biology. Thus, we are reluctant to make any claims about stereotype threat affecting females in these introductory biology courses . . . .
It seems, in short, that Cotner and Ballen did not have a clear hypothesis in mind at the start, but were rather just hoping the data would support one hypothesis or another. There’s nothing wrong with pursuing one’s curiosity. The problem here is that the authors began with a desire to find data supporting a predetermined conclusion. Inevitably, some data will, if you look long enough and try enough different lenses. This is how true believers approach matters of politics and religion, as distinct from science.
The article by Cotner and Ballen was published in PLOS ONE. That journal’s webpage said,
Each submission to PLOS ONE passes through a rigorous quality control and peer-review evaluation process before receiving a decision. . . . Once each manuscript has passed quality control, it is assigned to a member of the Editorial Board, who takes responsibility as the Academic Editor for the submission. The Academic Editor is responsible for conducting the peer-review process and for making a decision to accept, invite revision of, or reject the article.
Those lofty claims were perhaps undercut by the reality that, as of September 2017, PLOS ONE had published more than 200,000 articles. Maybe things slip through the cracks when a journal handles such volume. Or, if the journal did manage to maintain high standards despite such numbers, perhaps the explanation was that Cotner and Ballen were essentially exempted from close scrutiny because they were women, writing about women, within the supposedly woman-unfriendly STEM fields. Maybe the article, and its publication, were really about the same thing: a benevolent sexism, riding forth in defense of damsels in distress, of whom one dare not expect too much.