Michelle Rhee, chancellor of the Washington, D.C., public schools, made a splash last week when she fired 241 teachers in her effort to overhaul a system in which just eight percent of eighth-graders perform at grade level in math, yet nearly all teachers are rated as excellent.
But what if Rhee fired the wrong teachers? That’s a scenario that seems very possible in districts using performance-rating systems, according to a new report by Mathematica researchers that the U.S. Department of Education released yesterday.
The warning was buried inside a rather dry report that hasn’t received much attention, but the findings are startling: in a typical rating system aimed at identifying poorly performing teachers, one in four teachers whose performance is actually fine could be misidentified as bad. At the same time, teachers whose students genuinely underperform have a one in four chance of being mislabeled as average performers.
This could be a big deal if Rhee’s moves to rid the D.C. public schools of bad teachers are replicated around the country. The Obama administration is encouraging school districts to push for similar policies, and school systems from New York City to Dallas are already experimenting with “value-added” measures, based on test scores, to make decisions about teacher compensation, as well as which teachers get tenure and which are fired.
Although it’s unclear exactly how D.C.’s teacher rating system works, and the new report doesn’t address D.C. specifically, it underscores how difficult it is to create reliable measures for judging teachers. (The study points out that previous research has found, rather depressingly, that 90 percent of a student’s performance is outside a teacher’s control.)
Part of the problem in getting an accurate read on teacher performance is the small sample of students used to judge how teachers are doing: just a classroom’s worth of students each year for elementary school teachers, the group examined in the study.
Designing better performance-rating systems isn’t impossible, however. In a system that uses three years of test score data, teacher misclassification rates are 26 percent. If a school district instead uses 10 years of data, misclassification rates drop to about 12 percent. That’s better, but not perfect, and a 10-year track record would necessarily exclude newer teachers, including those vying for tenure.
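The statistical intuition behind those numbers is easy to sketch, even without Mathematica’s actual model: if a teacher’s measured value-added each year equals her true effect plus substantial random noise, then averaging more years of data shrinks the noise, and fewer teachers land on the wrong side of the cutoff. The toy simulation below illustrates the effect; the effect sizes, noise level, and 20 percent cutoff are illustrative assumptions, not figures from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 100_000
true_effect = rng.normal(0.0, 1.0, n_teachers)  # each teacher's "true" quality
noise_sd = 2.0  # assumed: yearly noise is large relative to the true signal

# Call the bottom 20 percent of true effects the "truly poor" teachers
truly_poor = true_effect < np.percentile(true_effect, 20)

for years in (1, 3, 10):
    # Each year's measurement = true effect + classroom-level noise;
    # the rating is the average over the available years.
    yearly = true_effect[:, None] + rng.normal(0.0, noise_sd, (n_teachers, years))
    rating = yearly.mean(axis=1)
    flagged = rating < np.percentile(rating, 20)  # flag the bottom 20 percent

    false_flag = (flagged & ~truly_poor).sum() / (~truly_poor).sum()
    missed = (~flagged & truly_poor).sum() / truly_poor.sum()
    print(f"{years:>2} years: {false_flag:.0%} of fine teachers flagged, "
          f"{missed:.0%} of poor teachers missed")
```

More years of averaging pulls both error rates down, which is the same direction of improvement the report describes in moving from three years of data to ten.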
Despite these issues, the researchers point out that value-added rating systems tend to be better at revealing which teachers are effective than, say, looking at credentials or classroom observations would. They suggest using a variety of measures that include — but don’t rely exclusively on — value-added measures, which could reduce the risk of firing good teachers based on bad data.
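As a back-of-the-envelope sketch of that recommendation, a district might blend several normalized measures into a single score, so that no one noisy number drives a firing decision. The function and weights below are hypothetical illustrations, not anything the researchers specify.

```python
def composite_score(value_added, observation, credentials,
                    w_va=0.5, w_obs=0.35, w_cred=0.15):
    """Blend measures (each normalized to a 0-1 scale) with assumed weights."""
    return w_va * value_added + w_obs * observation + w_cred * credentials

# A teacher with a weak value-added score but strong classroom observations
# lands in the middle of the scale rather than being flagged outright.
print(composite_score(0.30, 0.70, 0.60))  # 0.485
```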