Arum and Roksa (p. 7) say:
Research on course evaluations by Valen Johnson has convincingly demonstrated that "higher grades do lead to better course evaluations" and "student course evaluations are not very good indicators of how much students have learned."
I don't have access to Johnson's book, but a review states:
[Johnson] found the "grade-attribution" theory the most useful: "Students attribute success in academic work to themselves, but attribute failure to external sources" (96). Regardless of the reason, the analysis provides "conclusive evidence of a biasing effect of student grades on student evaluations of teaching" (118).
Johnson did his work in the US. If I'm understanding correctly based on the fairly brief descriptions I have available, he managed to get permission to spy on students' actions over time, so that he could actually detect not just correlations but the time-ordering of events, which could help to tease apart questions of causation.
Johnson says that evaluations are "not very good" indicators of learning. My question is basically on what the available evidence is as to what "not very good" means. It's possible that someone could answer this simply by having access to Johnson's book and flipping to p. 118.
If "not very good" means low correlation, then it would be interesting to know whether the correlation is statistically different from zero, and, if so, what its sign is. My guess, which encountered a very skeptical reaction in comments here, was that the correlation might be negative, since improved learning might require higher standards, which would tend to result in lower grades.
If the correlation is nonzero, it would also be interesting to understand whether one can infer that learning has any causal effect on evaluations. These two variables could be correlated due to the grade-attribution effect, but that wouldn't mean higher learning caused higher evaluations; it could just mean that better students learn more, and better students also give higher evaluations.
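To make that confounding worry concrete, here is a minimal simulation (purely illustrative; it is not based on Johnson's data or on any study cited here) in which an unobserved "student ability" variable drives both learning and evaluations, producing a strong positive correlation even though learning has no causal effect on evaluations at all:

```python
# Illustrative only: ability drives both variables; learning does not cause evaluations.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

ability = rng.normal(size=n)                          # unobserved confounder
learning = ability + rng.normal(scale=0.5, size=n)    # depends on ability only
evaluation = ability + rng.normal(scale=0.5, size=n)  # depends on ability only

r = np.corrcoef(learning, evaluation)[0, 1]
print(f"correlation(learning, evaluation) = {r:.2f}") # ~ +0.8, with zero causal link
```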
If we had, for example, a study in which students were randomly assigned to different sections of a course, we might be able to tell whether differences between sections in learning were correlated with differences between sections in evaluations. However, my understanding is that most of these "value added" analyses (which are often done in K-12 education) are statistically bogus. Basically you're subtracting two measurements from one another, and the difference is very small compared to the random and systematic errors.
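Here is a rough sketch, with invented numbers, of the noise problem with such section-level comparisons: even ignoring systematic errors, the standard error of the difference between two sections' mean gains can easily exceed the true difference it is supposed to detect.

```python
# Invented numbers: small true gap between sections, realistic spread in individual gains.
import numpy as np

rng = np.random.default_rng(1)
n_per_section = 30      # students per section (assumed)
true_gap = 0.10         # assumed true difference in mean gain, in SD units
noise_sd = 1.0          # spread of individual gains, in SD units

estimates = []
for _ in range(10000):
    gain_a = rng.normal(true_gap, noise_sd, n_per_section)
    gain_b = rng.normal(0.0, noise_sd, n_per_section)
    estimates.append(gain_a.mean() - gain_b.mean())

estimates = np.asarray(estimates)
print(f"true gap            = {true_gap:.2f}")
print(f"std. error of estimate = {estimates.std():.2f}")     # ~0.26, much larger than 0.10
print(f"wrong-sign rate     = {(estimates < 0).mean():.0%}") # a sizable fraction
```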
My anecdotal experience is that when I first started teaching, I was a relatively easy grader, I got very high teaching evaluations, and my students did badly on an internationally standardized test that I gave at the end of the term. Over time, I got confident enough to raise my standards, my teaching evaluations went down, and my students' learning got dramatically better, as measured by this test.
References
Arum and Roksa, Academically Adrift: Limited Learning on College Campuses
Valen Johnson, Grade Inflation: A Crisis in College Education, 2003
related: Do teaching evaluations lead to lower standards in class?
Answer
The answer, from new research approaches dating from 2010, seems to be that increased learning tends to cause lower scores on students' evaluations of teaching (SET), but this is a complicated issue that has historically been a bone of contention.
There is a huge literature on this topic. The people who study this kind of thing most intensely are psychometricians. There are several points on which they seem to agree nearly universally; much of the following simply represents the consensus view of professional psychometricians about their own field:
The surveys used for students' evaluations of teaching (SET) should be designed by professionals; surveys written by people without training in psychometrics are basically useless. Certain common practices, such as treating evaluation scores as if they were linear (and could therefore meaningfully be averaged), show a lack of competence in measurement (a small illustration follows these points).
It's a terrible idea to use SETs as the sole measure of a teacher's effectiveness. Multiple measures are always better than a single measure. But, as is often the case, administrators tend to prefer a single measure that is cheap to administer and superficially appears impartial and scientific.
SETs are increasingly being given online rather than being administered in class on paper. This is a disaster, because the response rates for the online evaluations are extremely low (usually 20-40%), so the resulting data are basically worthless.
The difficulty of a course or the workload, as measured by SET scores, has nearly zero correlation with achievement.
SET scores are multidimensional measures of multidimensional traits, but they seem to break down into two main dimensions, professional and personal, which are weighted about the same. The personal dimension is subject to biases based on sex, race, ethnicity, and sexual orientation (Calkins).
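Here is the small illustration promised in the first point above: averaging ordinal ratings treats the 1-5 labels as if they were evenly spaced numbers, so the ranking of two instructors by mean score can flip under a perfectly order-preserving relabeling of the same scale. The ratings below are invented.

```python
# Why averaging ordinal ratings is dubious: the ranking depends on an arbitrary
# choice of numeric labels, not just on the ratings themselves.
import numpy as np

ratings_a = np.array([2, 2, 2, 2])   # instructor A's ratings on a 1-5 scale
ratings_b = np.array([1, 1, 1, 4])   # instructor B's ratings on the same scale

print(ratings_a.mean(), ratings_b.mean())   # 2.00 vs 1.75  -> A "beats" B

# Relabel the categories with a different (still order-preserving) scoring.
relabel = {1: 1, 2: 2, 3: 3, 4: 10, 5: 11}
a2 = np.array([relabel[r] for r in ratings_a])
b2 = np.array([relabel[r] for r in ratings_b])

print(a2.mean(), b2.mean())                 # 2.00 vs 3.25  -> now B "beats" A
```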
Getting down to the main question: does better learning affect teaching evaluations?
Before 2010, the best studies on this topic were ones in which students were randomly assigned to different sections of the same course and then given an identical test at the end to measure achievement. These studies tended to show that SET ratings had correlations with achievement of about +0.30 to +0.44. But Cohen says, "There is one study finding of a strong negative relationship between ratings and [achievement]: the highest rated instructors had the lowest performing students. There is also one study finding showing the opposite, a near perfect positive relationship between ratings and achievement." This lack of consistency is not surprising, because we're talking about different fields of education and different SET forms. A typical positive correlation of +0.4 would indicate that 16% of the variance in students' performance could be attributed to differences between teachers that could be measured by SETs. Although 16% isn't very high, the sign of the correlation in most of the studies is positive and statistically significant.
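For reference, the 16% figure is just the square of the correlation; the same conversion for the range of correlations quoted above:

```python
# Shared variance implied by a correlation coefficient: r squared.
for r in (0.30, 0.40, 0.44):
    print(f"r = {r:+.2f}  ->  shared variance = {r**2:.0%}")
# +0.30 -> 9%, +0.40 -> 16%, +0.44 -> 19%
```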
But starting in 2010, new evidence arrived that turned this whole picture upside-down (Carrell, Braga). In these newer studies, students were randomly assigned to different sections of a class such as calculus, and were then followed later in their academic careers as they took required follow-on classes such as aeronautical engineering. The Carrell study was done at the US Air Force Academy, where, because of the academy's structure, attrition was low and students could be required to take the follow-on courses.
Carrell constructed a measure of added value for each teacher based on their students' performance on a test given at the end of the class (contemporaneous value-added), and a different measure (follow-on course value-added) based on performance in the later, required follow-on courses.
Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value-added but positively correlated with follow-on course value-added.
In the authors' words: "We find that less experienced and less qualified professors produce students who perform significantly better in the contemporaneous course being taught, whereas more experienced and highly qualified professors produce students who perform better in the follow-on related curriculum."
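As a rough sketch (not the authors' actual specification, and with hypothetical column names), a value-added measure of this kind can be computed by regressing a standardized outcome on student controls plus instructor dummies and reading off the estimated instructor effects:

```python
# Sketch of a value-added calculation in the spirit of Carrell and West (not their
# exact specification).  Column names (score, followon_score, sat_math, sat_verbal,
# instructor) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def value_added(df: pd.DataFrame, outcome: str) -> pd.Series:
    """Instructor fixed effects on a standardized outcome, net of student controls."""
    z = (df[outcome] - df[outcome].mean()) / df[outcome].std()
    data = df.assign(z_outcome=z)
    model = smf.ols("z_outcome ~ sat_math + sat_verbal + C(instructor)", data=data).fit()
    # Effects are measured relative to an omitted baseline instructor.
    fe = model.params.filter(like="C(instructor)")
    fe.index = fe.index.str.extract(r"\[T\.(.+)\]")[0]   # recover instructor labels
    return fe

# df = pd.read_csv("students.csv")                # one row per student (hypothetical file)
# contemporaneous = value_added(df, "score")      # exam at the end of the course itself
# follow_on = value_added(df, "followon_score")   # grade in the required follow-on course
# print(contemporaneous.corr(follow_on))          # compare the two value-added measures
```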
Braga's study at Bocconi University in Italy produces similar findings:
[We] find that our measure of teacher effectiveness is negatively correlated with the students' evaluations: in other words, teachers who are associated with better subsequent performance receive worst evaluations from their students. We rationalize these results with a simple model where teachers can either engage in real teaching or in teaching-to-the-test, the former requiring higher students' effort than the latter.
References
Abrami, d'Apollonia, and Rosenfield, "The dimensionality of student ratings of instruction: what we know and what we do not," in The Scholarship of Teaching and Learning in Higher Education: An Evidence-Based Perspective, eds. Perry and Smart, Springer 2007 - link
Braga, Paccagnella, and Pellizzari, "Evaluating Students' Evaluations of Professors," IZA Discussion Paper No. 5620, April 2011 - link
Calkins and Micari, "Less-Than-Perfect Judges: Evaluating Student Evaluations," Thought & Action, fall 2010, p. 7 - link
Carrell and West, "Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors," J Political Economy 118 (2010) 409 - link
Marsh and Roche, "Making Students' Evaluations of Teaching Effectiveness Effective: The Critical Issues of Validity, Bias, and Utility," American Psychologist, November 1997, p. 1187 - link
Stark and Freishtat, "An Evaluation of Course Evaluations," ScienceOpen, https://www.scienceopen.com/document/vid/42e6aae5-246b-4900-8015-dc99b467b6e4?0 - link