Time to Get Off the Testing Train

By Stan Karp

Illustrator: Boris Séméniako

In mid-March, the lid blew off a high-profile cheating scandal that saw dozens of overprivileged families scam their way into the nation’s most elite colleges with phony résumés, bribes, and falsified test scores.

A week later, New York City released standardized test results that single-handedly determine admission to the city’s most selective high schools. Although 70 percent of New York City students are Black and Latino, they made up just 10 percent of those admitted.

At the elite Stuyvesant High School, 895 students were accepted; only seven were Black.

Scandalous as these revelations were, they are far from unique. In the decade after the federal No Child Left Behind law was passed in 2001, nearly 200 school districts in more than a dozen states faced cheating scandals. The watchdog group FairTest identified 65 different ways schools cheated to boost test scores, from cooking the testing pool to changing answer sheets.

But while sensational stories like these produce headlines, it’s the routine functioning of standardized testing that erodes daily experience in our schools.

Tests have always been part of schooling. Just about everyone who has gone to school can remember some version of spelling tests, pop quizzes, take-home essays, and final exams. Assessments that provide feedback about student learning, including different kinds of tests, can be powerful tools for students, teachers, even whole schools and districts. Assessments can also provide important information for parents about what their children are learning.

But something fundamental has gone wrong with testing in schools. In recent decades, a sprawling, suffocating — and highly profitable — apparatus of standardized testing has replaced teacher-designed assessments with a “data-driven” mania that is the engine of test-and-punish reform. Data from deeply flawed multiple-choice tests, commercially designed and often scored by machines, is used by policymakers far removed from classrooms to make decisions about whether students get promoted or graduate, whether teachers keep their jobs, even whether whole schools are closed.

Like weeds in a garden, the spread of testing strangles curriculum, narrows the range of what is taught, and impoverishes school experience. Children who need music, art, play, and poetry are instead getting worksheets and test prep. Students who need to critically explore gender, climate, and race issues are being taught to dissect multiple-choice questions. Active learning that helps students find meaning and purpose in their education is being replaced by standardized, scripted curriculum, often tethered to computer screens and produced by the same companies providing the tests.

Test-driven schooling also undermines professional development. Instead of collaboratively developing engaging curriculum, building bridges to families, and undoing racism, teachers are spending valuable time parsing student growth scores and building data walls.

To be sure, data can be a useful tool. It can inform discussions about learning progress and make visible patterns of discrimination and inequality. Some districts, seeking balance, strive to be “child-driven, data informed.”

But after the bipartisan No Child Left Behind (NCLB) law was passed, test scores became the most powerful force shaping federal and state education policy. Gaps in scores among student subgroups were used to label schools as failures without providing the resources or strategies needed to address the gaps. State and national policymakers used the inequalities reflected in the test results to create a narrative of failure that undermined support for public education and drove educational policy away from equity concerns and toward punishment and privatization. Poor communities of color were especially targeted by waves of school closings, the spread of privatized, unaccountable charter schools, and systematic disinvestment in public education.

It is not an accident that modern standardized testing has roots in theories of white supremacy and eugenics. During World War I, standardized “intelligence” tests were used to sort and segregate racial and immigrant groups. This led to expanded use of such tests in schools and colleges. The results have been disguising race and class privilege as “merit” ever since.

The limitations of using “quantitative metrics” like test scores to measure complex activities like learning were summed up by social scientist Donald Campbell in a famous maxim known as Campbell’s Law: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures, and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

The cheating scandals, drill-and-kill pedagogy, testing company blunders, and destructive policy choices of the past several decades provide ample evidence of the counterproductive impact of test-driven reform.

The development of a new generation of “Common Core” tests doubled down on these practices. Millions of public and private dollars that might have been used to reduce class size, fight racial segregation, or build better school facilities were instead poured into creating harder standardized tests. The results were so disastrous they sparked a national revolt. A test “opt out” movement mobilized millions of parents and students. This activism helped change both attitudes and policy. Researchers debunked and teachers rebelled against efforts to extend test-based assessment to teacher evaluation. Polls have repeatedly shown that parents think there is too much testing and too much emphasis on the results. A summary of the 2017 Phi Delta Kappan annual poll of attitudes on public education noted, “Parents say standardized tests don’t measure what’s important to them, and they put such tests at the bottom of a list of indicators of school quality.”

The limits of what standardized testing can tell us are also well documented. The National Assessment of Educational Progress (NAEP), often called “the nation’s report card,” is the only long-term measure of student test score outcomes. The relative consistency of the NAEP math and English language arts tests over several decades has produced comparative scores that show mostly predictable results. Overall, the math and ELA scores of public school students have risen modestly and gaps between racial and socioeconomic groups have persisted.

But it’s also worth noting the differences between the way NAEP is used and the way individual state and commercially produced exams have been misused. NAEP uses statistical sampling to identify representative student subgroups (e.g., gender, race and ethnicity, school location). NAEP tests are administered to different sets of 4th- and 8th-grade students every other year and at least once every four years to a sample of 12th graders. By contrast, NCLB required states to test every student every year in every grade from 3rd through 8th and once in high school. This led to dramatic increases in the amount and frequency of testing.

Moreover, mandated federal tests are often just the beginning. In many elementary and middle schools, students, sometimes as young as 4 and 5 years old, are regularly tested on computers on a biweekly or monthly basis, and then subjected to computerized drills “at their level” as part of state- and federally mandated “interventions.” There are pre-tests, interim tests, post-tests, and practice tests. It’s the difference between giving a patient a blood test and draining the patient’s blood.

Federal law also prevents NAEP from identifying the results from specific schools, students, or educators. NAEP’s anonymous sampling provides comparative results about student performance and trends without labeling or ranking individual students or schools. By contrast, the NCLB-induced testing juggernaut aims to tattoo a score every year on every student’s forehead and every school’s data wall as a basis for making high-stakes decisions about school practice and policy.

Like all tests, NAEP has limitations and arbitrary features. For example, in the 1980s the “proficiency” levels for NAEP were set artificially high, well above “average performance” for particular grade levels. As a result, even in states with the highest NAEP scores (typically Massachusetts and New Jersey), the majority of students still usually fail to score at “proficient” levels. Instead they are categorized as “Basic” or “Below Basic” based on arbitrary and unrealistic scoring standards no real schools have ever met. As James Harvey, executive director of the National Superintendents Roundtable, wrote in a 2018 analysis of NAEP scoring levels, “The vast majority of students in the vast majority of nations would not clear the NAEP bar for proficiency in reading, mathematics, or science. And the same is true of the ‘career and college-readiness’ benchmarks in mathematics and English language arts that are used by the major Common Core-aligned assessments.”

Still, the many flaws of standardized testing do not mean we should ignore the real inequalities reflected in the results or the need for schools and educators to be accountable to the communities and students they serve. Transparent assessment practices have a central role to play in making sure schools work for all children. But democratic schooling for social justice needs more authentic types of assessment.

Assessments for democratic schooling should be designed to improve teaching and learning, not primarily to produce data for external monitoring. They should include real tasks for real audiences and be thoroughly integrated with the curriculum. To be authentic and useful, results should come from multiple sources and be used to inform collaborative discussion among educators, parents, and students. Assessment outcomes should also be used to generate new learning strategies and new curriculum goals, not labels and punishments. While they may include various types of tests, they should not be defined by any single format, particularly standardized multiple-choice tests. (For specific examples of better assessment models in real schools, see “Authentic Assessment for Learning” in the recently published third edition of The New Teacher Book.)

The assessments our schools and communities need will not be found on the testing train. Educators seeking a future of hope and justice for our children will need to get off, and in the words of the legendary organizer Myles Horton, “make the road by walking.”

Stan Karp (stan.karp@gmail.com) is a Rethinking Schools editor. He is a co-editor of the recently published third edition of The New Teacher Book, where a version of this piece originally appeared.

Included in:

Volume 33, No. 4

Summer 2019