Researchers in education must have thought about this, so I hope to find some pointers here. Suppose I am grading students on a scale from 1 to 10. I have no prior knowledge of the students' abilities, but I have obtained their test scores (say, from 1 to 100).
Suppose these test scores have only ordinal meaning: a student with a higher score can be said to have achieved better mastery of the course, but having double the score does not mean that a student has achieved twice as much as another. We may also assume that a (statistically speaking) large number of students, representative of the student population, took the exam.
What should be the optimal grading distribution?
We may also assume that the grading distribution/scale should achieve two goals: a) it should be informative about students' grasp of the material, and b) it should incentivize students to study the material.
Regarding a), from an information-theoretic perspective we may want to maximize the information (entropy) of the grade distribution. Thus, we would choose a scale that yields a uniform distribution. However, in practice most teachers implement peaked distributions. What is the motivation behind this?
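(To make the premise concrete, here is a minimal Python sketch; the grade frequencies are made up for illustration. A uniform distribution over ten grades attains the maximum entropy, log2(10) ≈ 3.32 bits, while a peaked "curve" carries strictly less.)

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # convention: 0 * log(0) = 0
    return -np.sum(p * np.log2(p))

uniform = np.full(10, 0.1)          # every grade 1..10 equally likely
peaked = np.array([0.01, 0.02, 0.05, 0.10, 0.22,
                   0.25, 0.18, 0.10, 0.05, 0.02])  # a typical "curve"

print(shannon_entropy(uniform))     # log2(10) ≈ 3.32 bits, the maximum
print(shannon_entropy(peaked))      # ≈ 2.81 bits, strictly less
```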
Answer
Assigning grades to fit some "optimal distribution" is misguided. We don't want to maximize the entropy of the grades in a particular course; entropy is not a very useful measure of a "good" set of grades. To quote from an answer by Anonymous Mathematician:
Strictly speaking, Shannon entropy pays no attention to the distance between scores, just to whether they are exactly equal. I.e., you can have high entropy if every student gets a slightly different score, even if the scores are all very near to each other and thus not useful for distinguishing students.
(Note that this was in answer to a question that asked about using the entropy of exam scores as an indication of how good an exam is at distinguishing between different levels of mastery. It wasn't suggesting that grades should be curved a posteriori to maximize entropy.)
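To see the quoted caveat in action, consider a toy example with invented scores: ten tightly clustered scores and ten widely spread scores produce exactly the same, maximal entropy, even though only the latter usefully separates students.

```python
import numpy as np

clustered = 89.0 + 0.1 * np.arange(10)   # 89.0, 89.1, ..., 89.9
spread = 10.0 * np.arange(1, 11)         # 10, 20, ..., 100

# All scores are distinct in both cases, so each empirical distribution
# is uniform over 10 values and the entropy is identical: log2(10) bits.
for scores in (clustered, spread):
    _, counts = np.unique(scores, return_counts=True)
    p = counts / counts.sum()
    print(-np.sum(p * np.log2(p)))       # ≈ 3.32 bits for both

print(np.ptp(clustered), np.ptp(spread))  # score ranges: 0.9 vs. 90.0
```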
What we actually want is for grades to signal as closely as possible the students' mastery of the course material. If every student in the course has achieved truly excellent mastery, they should all get high scores. From a pure entropy standpoint, that grade distribution carries very little information. But in reality, it is much more useful to say that all students in this particular class achieved excellence and deserve a 10/10 than it would be to maximize the "information" carried by the grade and give some students a 1/10 because their performance was slightly less excellent than the highest level of excellence achieved by a student that year. This scenario (where all students achieve excellent or very good grades) is not even that unusual, as Michael Covington points out:
In advanced courses, it can be quite proper for all students to get A’s and B’s, because weak students would not take the course in the first place.
For an individual student, the grade should depend on that student's demonstrated mastery of course material, and hopefully not at all (or as little as possible) on the other students in the class.
If you insist on thinking about it from an information-theoretic perspective, what we really want is to minimize the Kullback–Leibler divergence between the distribution of students' achievements and the distribution of students' grades.
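For instance (a sketch with made-up numbers), compare a grading scheme that mirrors an assumed achievement distribution against one that forces grades toward a uniform, "maximum-entropy" shape:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in bits, for discrete distributions p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                    # terms with p = 0 contribute nothing
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Hypothetical class in which most students genuinely mastered the
# material, binned into four levels from "poor" to "excellent".
achievement = np.array([0.05, 0.15, 0.30, 0.50])

faithful = np.array([0.05, 0.15, 0.30, 0.50])   # grades mirror achievement
curved = np.array([0.25, 0.25, 0.25, 0.25])     # grades forced to be uniform

print(kl_divergence(achievement, faithful))     # 0.0  -- perfect match
print(kl_divergence(achievement, curved))       # ≈ 0.35 bits of mismatch
```

The curved scheme has the higher entropy, but it is the faithful scheme, with zero divergence from the achievement distribution, that tells you what the students actually know.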
If you have no information about the exam and what it measures, you cannot assign meaningful grades based on the exam score. If you do know about the exam, you can assign meaningful grades based on exam scores, but not according to any optimal distribution: you would assign grades based on how much of the exam students are expected to know in order to demonstrate various levels of mastery, not by mathematically shaping the grades into some predetermined "optimal distribution."
Edit: Suppose these test scores have only ordinal meaning: a student with a higher score can be said to have achieved better mastery of the course, but having double the score does not mean that a student has achieved twice as much as another.
Your edits are not going to change the answer; there's still not going to be an optimal distribution. If I have many excellent students in the class and I do an exceptionally good job teaching them, I'll give out many excellent grades, no matter how they rank with respect to one another. If all of my students are terrible and do poorly, they'll all get low grades even if one manages to get in a few more points than another (although in that case, I'll also take a closer look at the class to see why students are doing so poorly). If half of my class excels and the other half fails to meet minimum standards, I'll give out 50% top grades and 50% failing grades. If their abilities happen to be normally distributed, their grades will be as well. You get the idea.
The distribution of student grades should follow the distribution of demonstrated achievements. Any grade distribution that doesn't is definitely not optimal.
The idea of grading students to some predetermined distribution (of any shape) is known as "norm-referenced grading." For more information about alternatives to norm-referenced grading, see:
- Sadler, D. Royce. "Interpretations of criteria-based assessment and grading in higher education." Assessment & Evaluation in Higher Education 30.2 (2005): 175-194.
- Aviles, Christopher B. "Grading with norm-referenced or criterion-referenced measurements: to curve or not to curve, that is the question." Social Work Education 20.5 (2001): 603-608.
- Rose, Leslie. "Norm-referenced grading in the age of Carnegie: why criteria-referenced grading is more consistent with current trends in legal education and how legal writing can lead the way." Journal of the Legal Writing Institute 17 (2011): 123.
- Guskey, Thomas R. "Grading policies that work against standards... and how to fix them." NASSP Bulletin 84.620 (2000): 20-29.
These go into some more detail about the problems with norm-referenced grading. You asked for a grade distribution that is "informative about students' grasp of the material." The literature cited above explains that pure norm-referenced grading is not informative about that; it can only inform about a student's relative grasp of the material, compared to others in the same group who took the same exam. In other words (emphasis mine):
It can be useful for selective purposes (e.g. for the distribution of a scholarship to the 5 best students, or extra tuition to the 5 which are struggling most), but gives little information about the actual abilities of the candidates.
Source: McAlpine, Mhairi. Principles of assessment. CAA Centre, University of Luton, 2002.
In the mid-1990s, the Swedish secondary school grading system was changed from a norm-referenced to a criterion-referenced system. This made it possible to compare, for the same population, how well a norm-referenced and a criterion-referenced grading system predict academic success.
The first paper examining this Swedish data set is not written in English: Cliffordson, C. (2004). De målrelaterade gymnasiebetygens prognosförmåga [The predictive validity of goal-related grades from upper secondary school]. Pedagogisk Forskning i Sverige, 9(2), 129-140. However, in a later paper, the author describes those results as follows:
Cliffordson (2004b) showed in a study of 1st-year achievement in the Master of Science programs in Engineering that the predictive validity of CRIT-GPA was somewhat higher than it was for NORM-GPA.
In that later study, Cliffordson found (consistent with the earlier study)
a somewhat higher prognostic validity for CRIT-GPA than for NORM-GPA.
and, looking across a variety of academic programs, concluded:
Despite differences in both design and purpose, the predictive efficacy of criterion-referenced grading is at least as good, or indeed is somewhat better, than that of norm-referenced grades.
For details, see:
Cliffordson, Christina. "Differential prediction of study success across academic programs in the Swedish context: The validity of grades and tests as selection instruments for higher education." Educational Assessment 13.1 (2008): 56-75.