Why the mean and median statistics that are commonly published are misleading and why reporting distributions provides more accurate and useful information
Ronald G Corwin, Professor Emeritus
Department of Sociology, Ohio State University
How often do we see studies of this and that which report only the mean (or median) outcomes? For example, newspapers reporting home sales figures usually tell us only about changes in the mean prices and average volume of home sales. But what any thinking reader wants to know is whether prices in various price ranges have changed, not merely the typical price. A surge in sales of low-end homes could be pulling the average price down while high-end homes remain stable or increase in value. In other words, we want to see how the distribution of prices changes from one time to another.
Most social scientists, including those who study education, have typically fixed their attention on average (mean or median) differences between whatever it is they are studying: for example, differences among large pools of schools. However, summary measures of central tendency mask the enormous variance among schools. Most studies combine average test scores across all schools in a sample without breaking out important differences among various types of schools and without taking into account the range of schools included and how they differ from the average. A typical study of charter schools, for example, will lump together all schools in the sample, across all districts and states represented, and then pretend the calculated average represents the population of charter schools. It doesn’t. It only obscures differences within schools, within districts, and within each program or comparison group. A rare analysis based on distributions, reported by WestEd, divided scores at the 50th percentile. Schools were then classified based on the percentage of students below that mark. The analysis examined the percentage of students within each school who were originally on the low side and crossed over to the high side, as well as those who changed in the reverse direction. This approach makes more sense than comparing school means.
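The crossover idea described above can be sketched in a few lines of Python. This is only an illustration and not WestEd's actual procedure: the records are invented, the cut point is assumed to be the pooled median of the baseline scores, and the rates are assumed to be expressed as a share of all students in each school.

```python
from statistics import median

# Hypothetical (school, score_before, score_after) records; not real data.
records = [
    ("A", 40, 55), ("A", 45, 44), ("A", 60, 48), ("A", 70, 72),
    ("B", 35, 53), ("B", 48, 52), ("B", 55, 58), ("B", 65, 66),
]

# Assumption: the dividing line is the median of all baseline scores.
cut = median(before for _, before, _ in records)

def crossover_rates(records, cut):
    """Percent of each school's students who crossed the cut upward or downward."""
    rates = {}
    for school in sorted({r[0] for r in records}):
        rows = [r for r in records if r[0] == school]
        up = sum(1 for _, b, a in rows if b < cut <= a)
        down = sum(1 for _, b, a in rows if b >= cut > a)
        rates[school] = (100 * up / len(rows), 100 * down / len(rows))
    return rates

rates = crossover_rates(records, cut)
for school, (up, down) in rates.items():
    print(f"School {school}: {up:.0f}% crossed up, {down:.0f}% crossed down")
```

In this made-up example, School A has as many students falling below the line as rising above it, a pattern that a comparison of school means would largely hide.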
If it is the school that is producing the outcomes, then the focus should be on all of the outcomes, not only the typical performances. Suppose a school introduces a course, or more specialized teachers, for the purpose of helping all students. Then suppose the mean test scores do not change. That would occur if everyone is doing about the same as before. But it could also occur if improvement by students at the bottom of the distribution was being cancelled out by a comparable decline among students at the top, or the reverse. So, the main question is not, “how are typical students doing?” It is, “who benefits and who does not?” The most difficult policy choices often involve making trade-offs between winners and losers, between “elitist” and “leveling” strategies. It is possible for a treatment to affect any one or all parts of the distribution of students. Only analyses of the distribution can sort that out. To evaluate the impact of a program like Head Start, for example, we need to know whether children with the lowest test scores advance as much as those with average or high scores.
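A small numerical sketch shows how offsetting changes can leave the mean untouched. The ten scores below are invented for illustration; each position in the two lists is assumed to be the same student before and after the new course.

```python
from statistics import mean

# Hypothetical scores for ten students, sorted by baseline score.
before = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
# Assumption: the bottom five each gain 5 points, the top five each lose 5.
after = [45, 50, 55, 60, 65, 60, 65, 70, 75, 80]

changes = [a - b for a, b in zip(after, before)]
half = len(changes) // 2

overall_change = mean(after) - mean(before)   # change in the school mean
bottom_change = mean(changes[:half])          # average change, lower half
top_change = mean(changes[half:])             # average change, upper half

print(overall_change, bottom_change, top_change)
```

The school mean is identical in both years, yet every student's score moved. Only the breakdown by position in the distribution shows who gained and who lost.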
Statistical means do not reveal the fact that many public schools are as good as private schools, and that students who attend the best public schools outperform most private school students. Reporting averages that compare schools, programs, or states doesn’t help parents who are trying to find a suitable school for their children. Everyone might want to believe that an increase in a school’s mean scores implies that all students in that school are benefiting. But the reality is that a school can increase its average score even if it is only the students in the upper part of the distribution who improve; this pattern would increase the gap between the top and bottom while still reflecting well on the school. Rosenberg observed that sophomores in the High School and Beyond Survey answered 6.6 more questions correctly by their senior year, which is a small gain. However, the lowest quartile of students answered, on average, 4.66 fewer questions correctly. By his calculations, the top quartile answered 18.13 more questions correctly, which had the effect of widening the achievement gap between the bottom and top groups by 22.79 questions, or 6.33 years.
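The arithmetic behind the widening gap can be checked directly, and the same pattern is easy to reproduce with a toy school. The quartile changes are the figures quoted above; the eight-student school is hypothetical and serves only to show a rising mean alongside a growing gap.

```python
from statistics import mean

# Quartile changes quoted in the text (questions answered correctly).
top_quartile_change = 18.13
bottom_quartile_change = -4.66
gap_widening = round(top_quartile_change - bottom_quartile_change, 2)
print(gap_widening)  # 22.79

# Hypothetical school: eight students sorted by score; only the top two improve.
before = [30, 35, 45, 50, 60, 65, 80, 85]
after = [30, 35, 45, 50, 60, 65, 90, 95]

mean_gain = mean(after) - mean(before)
gap_before = mean(before[-2:]) - mean(before[:2])  # top quartile minus bottom
gap_after = mean(after[-2:]) - mean(after[:2])

print(mean_gain, gap_before, gap_after)
```

In the toy school the mean rises by 2.5 points, which would look like progress in a report of averages, while the distance between the top and bottom quartiles grows from 50 to 60 points.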
Now extend this principle to a range of charter schools. Suppose that charter schools within a state have higher mean scores than regular schools. Shall we infer that all charter schools are better? Of course not. Some are better, and some are worse. The question is how many are better. It is always possible that a few schools are pulling up the average, masking the poor performance of most other schools. Given the enormous differences among choice schools, averages and other measures of central tendency are not only meaningless, but also deceptive. What we need to know is the percentage of schools in the top, middle, and bottom of the distribution of charter schools, and how those percentages compare with the distribution of public schools. Knowing the percentages of schools with rising and falling test scores would help parents assess the risks and benefits of sending their child to a choice school.
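The point about a few schools pulling up the average can be illustrated with invented school-level scores. Everything in this sketch is an assumption made for illustration: the numbers, the use of the regular-school median as a benchmark, and the choice of regular-school tertiles as the cut points for top, middle, and bottom.

```python
from statistics import mean, median, quantiles

# Hypothetical school-level mean scores; not real data.
charter = [48, 50, 51, 52, 53, 54, 55, 90, 95, 98]
regular = [55, 57, 58, 59, 60, 61, 62, 63, 64, 65]

# Share of charter schools scoring above the typical regular school.
benchmark = median(regular)
pct_above = 100 * sum(s > benchmark for s in charter) / len(charter)

# Place each charter school in the bottom, middle, or top third
# of the regular-school distribution.
low_cut, high_cut = quantiles(regular, n=3)
bottom = sum(s < low_cut for s in charter)
top = sum(s > high_cut for s in charter)
middle = len(charter) - bottom - top

print(f"charter mean {mean(charter):.1f} vs regular mean {mean(regular):.1f}")
print(f"{pct_above:.0f}% of charter schools beat the regular-school median")
print(f"bottom/middle/top counts: {bottom}/{middle}/{top}")
```

Here the charter mean is higher than the regular mean, yet seven of the ten hypothetical charter schools fall in the bottom third of the regular-school distribution. Three very high scorers produce the favorable average.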
Driven charter school advocates like to pound their chests over inflated averages, while the other side natters on about trifling differences in mean scores. The fact that some schools may be okay, even exceptional, is little comfort for parents who must decide whether to risk sending their children to a particular school. Given the wide differences among schools, the risk is high. For parents, average scores amount to a crapshoot. However, rather than sorting it out, researchers remain obsessed with meaningless averages, and then compound the ignorance by aggregating data collected from individuals in classrooms to the lofty levels of districts, states, and programs. Standardized test scores were not constructed for the purpose of comparing types of schools and other macro units. It is that simple!
See also:
Ronald G. Corwin and Krishnan Namboodiri, “Have Test Scores and Individuals Been Overemphasized in the Research on Schools?” in R. G. Corwin, ed., Research in the Sociology of Education and Socialization 8 (Greenwich, CT: JAI Press, 1989): 141-176.
Karl L. Alexander and Larry J. Griffin, “School District Effects on Academic Achievement: A Reconsideration,” American Sociological Review 41 (1976): 144-152.