Testing Out of Control
Michael Diedrich, Minnesota 2020, February 29, 2012
Late last week, in an absurd fit of data-mania, New York City made the shameful decision to release three years’ worth of test scores attached to individual teachers’ names. This is a gross violation of any conceivable standard of professionalism and a sign that the conservative war on teachers has reached new heights of repulsiveness. There is no acceptable justification for this disgraceful behavior.
New York is the second city to make this reprehensible decision. Los Angeles made a similar dump in 2010 and faced righteous indignation as a result. You would think New York would have learned from that example, but the city apparently decided that hurting teachers with bogus scores was worth the public outcry.
There is no question about the bogusness of these scores. Indeed, their bogusitude knows no bounds. Want to know the average teacher’s variation for math scores from one year to the next? 35 percentage points. The average variation in English? 53 percentage points. These “data” are so worthless, other statistics hide in damp basements rather than be seen with them in public.
Of the 18,000 teachers whose names are attached to this travesty, only 521 were ranked in the bottom 5% for two or more years, and only 696 were in the top 5% with the same consistency. This is what you’d expect if test scores were allocated at random instead of being indicative of a teacher’s actual performance. (And looking at those average variations, it’s clear that these scores are being allocated at random.)
On a 100-point scale, the margin of error required for 95% confidence is 54 points. That means that a teacher with a score of 50 could actually be anywhere between -4 and 104 on a 100-point scale. To quote Douglas Harris, the University of Wisconsin economist who helped design the system, releasing this “strikes me as at best unwise, at worst, absurd.”
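For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function names are hypothetical and have nothing to do with New York's actual system, and the sketch assumes the 54-point margin applies symmetrically around the reported score.

```python
def confidence_range(score, margin):
    """Return the (low, high) range implied by a symmetric margin of error."""
    return score - margin, score + margin


def clamp_to_scale(low, high, scale_min=0, scale_max=100):
    """Trim a range to the values the scale can actually take."""
    return max(low, scale_min), min(high, scale_max)


# Figures from the paragraph above: a score of 50 with a 54-point margin.
low, high = confidence_range(50, 54)
print(low, high)                  # -4 104
print(clamp_to_scale(low, high))  # (0, 100)
```

Once the range is trimmed to scores that can actually occur, a teacher rated at 50 could sit anywhere on the entire scale. The number rules nothing out.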
Absurd is right. This is a meaningless collection of numbers with delusions of accuracy, yet it will be used to publicly humiliate hard-working teachers in difficult classrooms. Those of us who call this farce out for what it is will be pilloried for trying to hide failures or cover up for bad teachers when what we’re really doing is calling out a pile of bull pucky for the rancid mess it is.
The ones doing the pillorying have bought into a vision of education reform built around these unreliable tests. They come from all points on the political spectrum and both major parties, but what they’re really advancing is a slipshod conservative model of “accountability” built on measurements that can’t carry the weight ascribed to them.
There are real problems in our school system’s outcomes. The achievement gaps between the poor, the middle class, and the rich are morally wrong and bad for our social and economic stability. The route to fixing these gaps runs through a strong public school system with robust support, actual professional respect for its teachers, and the flexibility to offer a multitude of different educational approaches within a unified system.
What we have right now is a hunt for “bad teachers” advanced by people who hurl that label around using bad tests. When the mathematicians [PDF], economists, and education researchers [PDF] who understand these subjects all agree that we need a lot more work before “value-added measurements” could possibly be used in a fair evaluation system, maybe we should listen to them.
The most bitterly amusing part of this whole misadventure is New York’s “chief academic officer” Shael Polakow-Suransky’s disclaimer, “The purpose of these reports is not to look at any individual score in isolation ever…. [W]e would never invite anyone — parents, reporters, principals, teachers — to draw a conclusion based on this score alone.”
Then why did you release these scores this way? If the purpose was to make the “data” (such as they are) available for analysis, there was no need to attach names to scores. Now, however, you can bet that these scores will be looked at in isolation and people will draw conclusions based on these scores alone. To link individual names to terrible “data” and then say, “Don’t draw any conclusions about these individuals based on these ‘data,’” is clearly a laughable attempt to cover one’s butt.
This is what our testing obsession has wrought. It is out of control, and it must be stopped.