Curiouser and Curiouser: Alice in Testingland

By Linda Darling-Hammond

Once upon a time in Wonderland, a prestigious national commission declared that the state of health care in that country was abominable. There were so many unhealthy people walking around that the commission declared the nation at risk and called for sweeping reforms. In response, a major hospital decided to institute performance measures of patient outcomes and to tie decisions on patient dismissals as well as doctors’ salaries to those measures. The most widely used instrument for assessing health in Wonderland was a simple tool that produced a single score with proved reliability. That instrument, called a thermometer, had the added advantage of being easy to administer and record. No one had to spend a great deal of time trying to decipher doctors’ illegible handwriting or soliciting their subjective opinions about patient health.

When the doctors discovered that their competence would be judged by how many of their patients had temperatures measured by the thermometer as normal or below, some complained that it was not a comprehensive measure of health. Their complaints were dismissed as defensive and self-serving. The administrators, to ensure that their efforts would not be subverted by recalcitrant doctors, then specified that subjective assessments of patient well-being would not be used in making decisions. Furthermore, any medicines or treatment tools not known to directly influence thermometer scores would no longer be purchased.

After a year of operating under this new system, more patients were dismissed from the hospital with temperatures at or below normal. Prescriptions of aspirin had skyrocketed, and the uses of other treatments had substantially declined. Many doctors had also left the hospital. Heart-disease and cancer specialists left in the greatest numbers, arguing obtusely that their obligation to patients required them to pay more attention to other things than to scores on the thermometer. Since thermometer scores were the only measure that could be used to ascertain patient health, there was no way to argue whether they were right or wrong.

Some years later, during the centennial Wonderland census, the census takers discovered that the population had declined dramatically and that the mortality rates had increased. As people in Wonderland were wont to do, they shook their heads and sighed, “Curiouser and curiouser.” And they appointed another commission. 

The misuse of such a criterion to measure performance has relevance, of course, in the current effort to improve the schools. One remarkable answer to the quest for better teachers and better teaching was recently devised along the above lines by Superintendent Linus Wright of the Dallas public schools. Beginning next spring, the Dallas merit-pay plan will award bonuses to teachers on the basis of students’ scores on standardized achievement tests. Eschewing other forms of teacher evaluation because of their expense and subjectivity, Dallas has made test scores the single measure of teacher competence.

The Dallas plan is only one of a rapidly proliferating group of reform proposals triggered by the recent series of commission reports deploring the quality of American education. Unfortunately, two important questions have been largely ignored in these reports: namely, “What is excellence?” and “How do we know when we’ve got it?” This is a curious situation, which in Lewis Carroll’s words is growing curiouser and curiouser as the reform movement gathers speed without pausing to define its goals. The Wonderland quality of this movement results from the fact that although numerous concepts of excellence have been advanced, only one measure of excellence is used to frame the debate. That measure – student scores on standardized, multiple-choice achievement tests – is used to establish that we don’t have excellence now, and it will be the means for knowing when we have excellence once again. There is only one problem with this measure. It is largely unrelated to most of the things that we say we want when we set out in pursuit of educational excellence.

Educational excellence, according to the commission reports, involves the teaching of higher-order intellectual skills, such as the abilities to analyze, draw inferences, solve problems and create. It entails abilities to speak, write and reason intelligently. It includes proficiencies in advanced science, mathematics, foreign language, the humanities and the arts. In short, educational excellence is different from educational mediocrity because it emphasizes students’ ability to think well and perform challenging tasks rather than merely decode and compute.

Using standardized tests as the sole measure of educational excellence, however, confuses the medium and the message. The measure is ill-suited to the goal. Standardized, multiple-choice achievement tests do not, of course, measure creativity. They assess one’s ability to find what someone else has already decided is the one best answer to a predetermined question. The tests do not measure the most important aspects of problem-solving ability – the ability to consider and evaluate alternatives, to speculate on the meaning of an idea based on first-hand knowledge of the world, to synthesize and interpret diverse kinds of information, to develop original solutions to problems.

Moreover, the tests do not really measure performance of any kind. Performance, of course, means the ability to do something; it is active and creative. Recognizing a correct answer out of a predetermined list of responses is fundamentally different from the act of reading, or writing, or speaking, or reasoning, or dancing, or anything else that human beings do in the real world. 

Being able to recognize misspelled words and identify synonyms does not necessarily mean that a person can write coherently or even grammatically. Being able to conjugate verbs or decode passages in a foreign language does not mean that a person can speak or write in that language. The converse of these statements is also true. One can speak a foreign language fluently without understanding what it means to conjugate a verb, or write well without knowing what synonyms are. It is even true, as the International Reading Association concluded a decade ago, that one can master the subskills tested on standardized tests of reading achievement without being a good reader, and vice versa. That is, there is no clear, causal connection between an identifiable group of subskills and the actual act of reading.

Standardized tests do measure something. They measure the very particular recognition of some very particular skill applications pretty well. They can tell you if a test-taker can recognize correct punctuation or spelling, if he or she can find what the test-maker considers to be the topic sentence in a paragraph or the correct answer to an arithmetic problem or the closest synonym to a given word. They will not tell you the full range of a child’s achievement even in these areas, however. Because of the way the tests are constructed, they don’t include questions to which too many, too few, or the “wrong” subset of students know the answers. In the final analysis, standardized tests turn out to be a very narrow gauge of what students actually know, either individually or collectively.

Despite these limitations of standardized tests, we have adopted them as the single relevant performance measure for schools, students, and teachers. We use this measure because it is cheap, easy and convenient. It seems to be objective. It is a nice tidy variable for data collectors and decision makers. It is simpler than spending the time and energy to make complicated human judgments about what students are learning and teachers are teaching. We use this measure increasingly to make decisions about students, about educational adequacy, about how to design curriculum, and about how to manage schools. In Dallas, it will be the sole measure of teacher competence.

Unfortunately, when standardized tests are used as management-control devices, rather than as sometimes useful sources of information, a set of bureaucratic incentives is created that distorts the educational process as well as the curriculum. Rather than being a sample of what students know, test items soon become the universe of what is taught and learned. This is true not only of the topics that are tested but also of the types of thinking and the modes of performance required by the tests.

Researchers are discovering that, as more and more important decisions are based on test scores, teachers are more likely to teach to the tests, for the tests and the tests themselves. The more a school district designs its curriculum around standardized tests, the less teachers are encouraged or even allowed to spend time on nontested subjects (science and social studies are big losers here, along with the arts) or on nontested activities, such as writing, speaking, problem-solving or real reading of real books.

After recently completing a massive study of more than 1,000 American classrooms, John Goodlad confirmed that this was just what had happened in our schools. He found that students listen, respond briefly to questions, read short sections in textbooks and take multiple-choice quizzes. They rarely plan or initiate anything, create their own products, read or write anything substantial or engage in analytic discussions. In Goodlad’s words, we have drowned out the message that “there are goals beyond what the tests measure” and that “pursuing these goals calls for alternative teaching strategies.” That many creative, innovative teachers are frustrated with this state of affairs seems to trouble test-using policy makers not at all.

The Dallas school administration has only extended the logic of American educational reform to its outer limit. Having forgotten the history of Wonderland, it seems doomed to repeat it.