The New Model of Teacher Evaluation
How Would Ms. Frizzle Fare?
Illustrator: Roxanna Bikadoroff
The two of us were reflecting recently on the portrayal of teachers on-screen these days. There’s the snidely animated “dance of the lemons” and Michelle Rhee’s teacher bashing in Waiting for “Superman.” Now comes Cameron Diaz in Bad Teacher, portraying an impossibly horrifying educator. What happened to the teacher as guide? Or the teacher as inspiration? What happened to Ms. Frizzle?
We remember watching episodes of The Magic School Bus with our children, hoping that our toddlers would someday have teachers as dynamic, quirky, creative, and flamboyant as Ms. Frizzle. But it seems like today’s teachers are getting all the Ms. Frizzle drilled out of them, both on-screen and off.
Which got us thinking about teacher evaluations and how, like everything else, what you get depends on what you measure.
We both live in Washington, D.C. The 2010–11 school year marked the second under the District of Columbia’s new evaluation system, called IMPACT. In July, the school district announced that more than 200 teachers had been fired for flunking IMPACT.
IMPACT was launched in the fall of 2009 by former D.C. Chancellor Michelle Rhee, and was immediately lauded as a model for the rest of the nation. Much of the media focus on IMPACT has been on its use of test scores—so-called value-added measures—to judge teacher effectiveness. But the majority of teachers in D.C. are not subject to the value-added components of IMPACT. They teach grade levels or subject areas that are not tested (yet). For these teachers, 50 percent of their evaluation depends on two unannounced 30-minute observations conducted by “master educators,” known as “MEs.” Three additional observations are conducted by the school’s principal.
What are these evaluators looking for? What gets measured? IMPACT established a “Teaching and Learning Framework”—essentially a checklist of nine teaching practice areas that each teacher is expected to demonstrate during a 30-minute surprise observation. Within each practice area is a set of specific skills that must be demonstrated to qualify for an “effective” rating, and additional skills that must be present for the teacher to be rated “highly effective.” In all, to receive a perfect score on an observation, a teacher must demonstrate more than 60 strategies and skills in the course of 30 minutes.
Partly because of work by Michelle Rhee and her new organization, StudentsFirst, nine states (Ohio, Michigan, Illinois, Florida, Georgia, Indiana, Minnesota, Nevada, and New Jersey) are either adopting or considering new evaluation systems this year that are based on D.C.’s IMPACT. We predict that these new systems will drastically change teaching practice across the country. The question is: Is that change for the better?
Marni is an instructional coach in a D.C. elementary school. Her role used to be helping teachers become better educators. Under IMPACT, her job is now defined as helping teachers pass their IMPACT observations. We thought about the effect of that change on teachers. And we thought of Ms. Frizzle.
Rating Ms. Frizzle
Could Ms. Frizzle teach in D.C.? How would she fare on IMPACT? We decided to find out by conducting two formal observations using IMPACT’s nine-part rubric. Assessing teachers’ preparedness for their IMPACT observations is Marni’s job. She relished the chance to be an ME for the day.
We popped in on “the Frizz” at Walkerville Elementary School for our first observation. We found her herding her 3rd-grade students onto the Magic School Bus for a trip into the solar system. As her students traveled from Mercury to Jupiter to Saturn to Neptune, Ms. Frizzle allowed them to see, feel, and learn. They determined the gas, oxygen, hydrogen, and water levels of each planet they visited. They collected rocks and analyzed their composition. They worked collaboratively, sharing their knowledge with each other. The students themselves gently prodded one disengaged peer to rejoin the learning experience. Ms. Frizzle helped guide the students—at one point by becoming “lost” herself, and forcing her students to figure out which planet she was on based on scientific clues. They found her.
It was quite a lesson. But IMPACT’s rubric gave Ms. Frizzle no credit for the experiential, self-guided nature of this exploration of the solar system. She failed to announce an objective at the beginning of the lesson. She did not provide “scaffolded” prompts, or link the students’ learning that day to previous lessons. Although she had let her students experience the solar system through a variety of senses and learning styles, she missed several requirements on the IMPACT checklist.
Under IMPACT, a teacher must be scored strictly against the rubric. The Frizz scored only a 2.2 on our first observation. She was “minimally effective.” No matter that her students had had the experience of a lifetime and demonstrated extensive knowledge of the subject matter at hand. Under IMPACT, a teacher could literally take her students to the moon and still be minimally effective. We decided to give her another chance.
The next time we randomly popped in on Ms. Frizzle, she had planned an extraordinary lesson on asteroids. For this, her students were required to intercept and redirect an asteroid that was hurtling toward Earth, threatening a direct impact on Walkerville Elementary School! The students launched into space, where they encountered several extraterrestrial objects (a comet, space junk). How would they know whether each was the ominous asteroid? The kids realized they needed to analyze the object’s composition, trajectory, and speed. When they finally found the asteroid, they figured out that it was made of iron and therefore could be thrown off its course by a magnet. Mission accomplished!
Ms. Frizzle had prepared well for the lesson, having all the appropriate equipment on the bus for the students’ discovery process and eventual success. She did better on this evaluation, but she still fell short of “highly effective.” For example, the Frizz did not ask the students any questions; rather, she provided them with opportunities to determine the relevant questions and then answer them themselves. That sank her on IMPACT.
The overall average of our dear teacher’s two scores was 2.6—barely into the “effective” range. If we were to conduct three more IMPACT evaluations, for a total of five (the number of times DCPS teachers are formally observed each year), the outcome for Ms. Frizzle could be dicey. If her average were to drop to even a 2.59, she would be rated minimally effective and subject to dismissal, as so many D.C. teachers were last summer.
Something’s Wrong Here
A teacher who can create a learning environment that is student-led and teacher-facilitated is considered a master of the craft by the education community. But not by D.C.’s IMPACT rubric.
Of course, Ms. Frizzle is fictional, and her extraordinary field trips aren’t really possible in today’s under-resourced classrooms. (No funds for magic school buses in most districts!) But our little exercise of conducting formal IMPACT observations of Ms. Frizzle helped identify a troubling aspect of DCPS’ teacher evaluation system. It’s not that the Teaching and Learning Framework is a bad thing. Particularly for new teachers, having a framework of good practices (stating objectives, checking students’ comprehension throughout the lesson, etc.) is critical. In a strong professional growth system, teachers would not only be given such a framework, but also carefully constructed supports and extensive professional development in the areas where they seemed to be struggling. (IMPACT provides only rudimentary feedback from MEs.)
But for creative and dynamic teachers like Ms. Frizzle, the IMPACT rubric is a death knell. Teachers in D.C., according to several we have talked to, are now changing their practice to conform to IMPACT’s checklist. Their salaries and their jobs depend on it. Some are tossing out their most creative lesson plans, knowing that if an ME walked in on such a lesson, their jobs could be put at risk. A history teacher at one of our children’s schools stopped organizing a mock trial of accused witches in Salem, Mass., after an ME popped in to observe her and found 8th-grade students debating colonial justice and burdens of proof. The lesson didn’t fit the rubric. A colleague of Marni’s got poor IMPACT scores when an ME arrived during a 3rd-grade lesson on editing and constructing essays. This teacher’s student essays had been recognized districtwide for their excellence. But the lesson didn’t correlate with the framework. We’re forcing some of our best teachers to be less creative, to dumb down their practice—or even to leave the classroom altogether. And yes, some of the city’s most dynamic and popular teachers have been fired because their lessons didn’t adhere to the IMPACT rubric.
Evaluation systems should be part of the process of building great, creative, and effective teachers. They shouldn’t be designed with the inflexibility of a mousetrap: “Snap! Gotcha!” Students, parents, and teachers in the D.C. public schools are struggling with the impact of IMPACT. Now, additional states are using it as a model for the redesign of teacher evaluation systems, and there are efforts to tie federal education dollars to the establishment of such systems in every state.
We hope that our children will have teachers with the breadth of skills identified on the IMPACT checklist. But we also hope that our kids will be in classrooms with the many Ms. Frizzles of Washington, D.C.—teachers who don’t just talk about the planets, but take their students to them. Without recognition that sometimes great teaching doesn’t conform to a checklist, we worry that Ms. Frizzle, and teachers like her, may be getting thrown under the bus.