Who Questions the Test Questioners?
Michael Diedrich, Minnesota 2020, October 10, 2013

For a public service, too much standardized testing operates behind a veil of secrecy.
We have a more polite term for it, “test security,” but what it amounts to is secrecy. When I was teaching, sealed MCA testing packets would arrive in boxes at our school. At the beginning of each test, I would have students unseal the packets, take the test, and re-seal the packets, which I then collected into a box. The packets went back to the testing company (American Institutes for Research, a non-profit) for scoring and analysis. Even though my school was being held “accountable” for how students did on these tests, the only people who ever saw the tests in their final form were the testing company and the students themselves.
Couldn’t I have snuck a peek at the questions? Sure, if I wanted to violate testing procedures. Check out the 2012-13 procedures manual [PDF] for MCA testing. Those administering tests are specifically told not to “View test items for any reason except as allowed in the administration of an assessment,” nor to “‘Look over the shoulder’ to read test items when monitoring students taking a test.” Also on the “naughty” list: “Offer an opinion to a student, class or other staff member that a question is ‘bad’ or doesn’t have a correct answer.” (Apparently singing the praises of a question is totally fine.)
In other words, too much testing operates in a black box, with limited opportunities for oversight. Even where some oversight exists, it often proves insufficient.
As the Atlanta Journal-Constitution has found, several states have administered, and continue to administer, questions of dubious quality. These include questions with no right answer and questions that tend to be answered incorrectly by otherwise high-scoring students but correctly by otherwise low-scoring students. While Minnesota does operate Item Review panels to advise AIR on question quality, we have to take it on faith that AIR is responding appropriately to the panels’ concerns and screening out questions with weird statistics.
We also have reason to be concerned about cultural and class bias on our tests. I’m not talking about active racism here, but rather the passive bias that tends to come with privilege. Minnesota does operate Bias Review advisory panels (which you can register to join), but again we have to take it on trust that AIR is responding appropriately to those concerns.
Minnesota appears to be in better shape than many other states, though we’re still far from ideal. We’re also fortunate to have a progressive administration that brings at least some reasonable skepticism about the appropriate scope and purpose of testing. The demise of the GRAD tests for graduation in the last legislative session was a good start; I’m hoping to see further critical evaluations of our testing mindset and approach.
It’s important to remember, though, that no matter how lousy tests can be proven to be, and no matter how bad they are for kids, there will always be people with an incentive to argue for more testing. I’ll discuss two groups: testing-as-ends and testing-as-means.
The testing-as-ends interests are the ones who directly benefit from a large amount of testing. Primarily, I’m referring to the testing companies that collect big checks from the state for developing and scoring the tests, and that often collect plenty of not-so-small checks from districts for “aligned” curricula (i.e., test-prep curricula). While some may show restraint, all of them have an incentive to advocate for more testing.
The testing-as-means crowd are the ones who think that a large number of tests, administered regularly and changed every few years to keep everyone off their game, are a great way to slam the school system. Never mind that 70% of variation in test scores is driven by out-of-school factors. What matters to these folks is that there always be a reason to slam the people working in the public schools.
The best way to counter these forces is by having a clear, limited definition of what we want testing to accomplish. There are valid reasons to support a baseline of standardized assessment. Disaggregation of test scores by student group helps us identify areas where we as a society (including but not limited to our school system) have equity gaps.
Used as a red flag instead of a trigger for punishments, a small amount of testing that identifies schools and districts in need of more support can be a useful tool for promoting equity. We need to stay vigilant that this testing is used to build awareness and support. If it is linked to dire consequences, it will lead to a narrowing of curriculum, developmentally inappropriate instruction, and increased pressure to cheat.
We also need to increase public scrutiny of testing companies at a national level. I believe most of the people working for them do so for the right reasons, but we need to demand more sunlight on the final products and how they’re being used.