that states and districts can increase accuracy. One basic step would be to require that at least two or three years of data be accumulated for teachers before counting their value-added scores toward their evaluation (or, alternatively, varying the weight of value-added measures by sample size). Larger samples make for more precise estimates and have also been shown to mitigate some forms of systematic error (Koedel & Betts, 2011). Value-added estimates can also be adjusted (“shrunken”) according to sample size, which can reduce the noise from random error (Ballou, Sanders, & Wright, 2004).

Second, even when sample sizes are larger, states and districts should directly account for the aforementioned confidence intervals. One of the advantages of value-added models is that, unlike with observations, you can actually measure some of the error in practice. Accounting for it does not, of course, ensure that the estimates are valid (that is, that the models are measuring unbiased causal effects), but it at least means you will be interpreting the information you have in the best possible manner. The majority of states and districts are ignoring this basic step.

Continually monitor results and evaluate the evaluations.
This final recommendation may sound like a platitude in the era of test-based accountability, but it is too important to omit. States and districts that implement new systems must thoroughly analyze the results every single year. They need to check whether value-added estimates (or evaluation scores in general) vary systematically by student, school, or teacher characteristics; how value-added scores match up with the other components (see Jacob & Lefgren, 2008); and how sensitive final ratings are to changes in the weighting and scoring of the components. States also need to monitor how stakeholders, most notably teachers and administrators, are responding to the new systems.

Another important detail is the accuracy of the large administrative data sets used to calculate value-added scores. These data sets must be continually checked for errors (for example, in the correct linking of students with teachers), and teachers must have an opportunity to review their class rosters every year to ensure they are being evaluated for the progress of students they teach.

Finally, each state should arrange for a thorough, long-term, independent research evaluation of new systems, starting right at the outset. There are few prospects more disturbing than the idea of making drastic, sweeping changes in how teachers are evaluated but never knowing how these changes have worked out.

All these exercises should be accompanied by a clear path to making changes based on the results. It is difficult to assess the degree to which states and districts are fulfilling this recommendation. No doubt all of them are performing some of these analyses and would do more if they had the capacity.

If We Do This, Let’s Do It Right
Test-based teacher evaluations are probably the most controversial issue in U.S. education policy today. In the public debate, both sides have focused almost exclusively on whether to include value-added measures in new evaluation systems. Supporters of value-added scoring say it should dominate evaluations, whereas opponents say it has no legitimate role at all. It is as much of a mistake to use value-added estimates carelessly as it is to refuse to consider them at all.

Error is inevitable, no matter which measures you use and how you use them. But responsible policymakers will do what they can to mitigate imprecision while preserving the information the measures transmit. It is not surprising that many states and districts have neglected some of these steps. They were already facing budget cuts and strained capacity before having to design and implement new teacher evaluations in a short time frame. This was an extremely difficult task. Luckily, in many places, there is still time. Let’s use that time wisely. EL

For another perspective on the use of value-added data, see the online-only article “Value-Added: The Emperor with No Clothes” by Stephen J. Caldas at
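The two statistical steps recommended above, adjusting (“shrinking”) value-added estimates according to sample size and accounting for confidence intervals, can be illustrated with a simple normal model often called empirical Bayes shrinkage. The Python sketch below is a hypothetical illustration, not the method any particular state or district uses: the function names `shrink` and `classify`, the assumed variance of true teacher effects (`signal_var`), and the assumed student-level noise variance (`noise_var`) are all invented for the example. It shows how the same raw estimate is pulled toward the average more strongly when it rests on fewer students, and how a teacher is labeled above or below average only when the roughly 95 percent interval excludes the average.

```python
import math

# Hypothetical sketch: simple normal ("empirical Bayes") shrinkage of
# value-added estimates toward the average, plus interval-based labeling.
# All parameter values used below are illustrative assumptions.


def shrink(raw_effect, n_students, grand_mean, signal_var, noise_var):
    """Return (shrunken estimate, standard error) for one teacher.

    signal_var: assumed variance of true teacher effects
    noise_var:  assumed variance of one student's score around the teacher effect
    """
    sampling_var = noise_var / n_students                   # fewer students -> noisier raw estimate
    reliability = signal_var / (signal_var + sampling_var)  # between 0 and 1
    shrunken = grand_mean + reliability * (raw_effect - grand_mean)
    std_err = math.sqrt(reliability * sampling_var)         # posterior SD under the normal model
    return shrunken, std_err


def classify(estimate, std_err, grand_mean, z=1.96):
    """Label a teacher only when the ~95% interval excludes the average."""
    lower, upper = estimate - z * std_err, estimate + z * std_err
    if lower > grand_mean:
        return "above average"
    if upper < grand_mean:
        return "below average"
    return "indistinguishable from average"


if __name__ == "__main__":
    params = dict(grand_mean=0.0, signal_var=0.01, noise_var=0.25)
    # Same raw estimate (0.20 test-score SDs), different numbers of students
    for n in (20, 100):
        est, se = shrink(0.20, n, **params)
        print(n, round(est, 3), round(se, 3), classify(est, se, params["grand_mean"]))
```

With these assumed values, the estimate based on 20 students is shrunk to about 0.09 and cannot be distinguished from average, while the one based on 100 students is shrunk only to 0.16 and is labeled above average. The `reliability` term is one possible basis for the alternative mentioned in the article, weighting value-added measures by sample size, although the article does not specify any formula.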
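The recommendation to check how sensitive final ratings are to the weighting of components can also be made concrete. The Python sketch below uses three hypothetical teachers, two components (a value-added score and an observation score on an assumed common 1-to-4 scale), and invented rating cutoffs; none of these numbers, labels, or function names come from the article or from any real evaluation system. It recomputes each teacher's rating under several value-added weights and lists the teachers whose category changes.

```python
# Hypothetical sketch: how sensitive are final ratings to component weights?
# Scores, cutoffs, and weights are invented for illustration only.

# (value-added score, observation score), both on an assumed 1-4 scale
TEACHERS = {
    "A": (1.8, 3.4),
    "B": (3.6, 2.6),
    "C": (2.6, 2.8),
}

# Assumed rating cutoffs, checked from highest to lowest
CUTOFFS = [(3.0, "effective"), (2.0, "developing")]
LOWEST = "ineffective"


def rating(score):
    """Map a composite score to a rating category."""
    for threshold, label in CUTOFFS:
        if score >= threshold:
            return label
    return LOWEST


def composite(va_score, obs_score, va_weight):
    """Weighted average of the two components."""
    return va_weight * va_score + (1 - va_weight) * obs_score


def sensitivity(teachers, va_weights):
    """Each teacher's rating under each candidate value-added weight."""
    return {
        name: [rating(composite(va, obs, w)) for w in va_weights]
        for name, (va, obs) in teachers.items()
    }


def unstable(teachers, va_weights):
    """Teachers whose rating category depends on the chosen weight."""
    return sorted(
        name for name, labels in sensitivity(teachers, va_weights).items()
        if len(set(labels)) > 1
    )


if __name__ == "__main__":
    weights = [0.2, 0.35, 0.5]
    for name, labels in sensitivity(TEACHERS, weights).items():
        print(name, labels)
    print("rating changes with weight:", unstable(TEACHERS, weights))
```

With these made-up numbers, teacher A drops from effective to developing as the value-added weight rises, teacher B moves the other way, and teacher C is unaffected. A real check would use the actual score distributions and the state's own cutoffs and scoring rules.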
Baker, E., Barton, P., Darling-Hammond, L., Haertel, E., Ladd, H., Linn, R., et al. (2010). Problems with the use of student test scores to evaluate teachers (Briefing paper 278). Washington, DC: Economic Policy Institute.
Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.
Chetty, R., Friedman, J., & Rockoff, J. (2011). The long-term impacts of teachers: Teacher value-added and student outcomes in adulthood (NBER Working Paper 17699). Cambridge, MA: National Bureau of Economic Research.
Corcoran, S. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value-added measures of teacher effectiveness in policy and practice. New York: Annenberg Institute.
Goldhaber, D., & Hansen, M. (2008). Is it just a bad class? Assessing the stability of measured teacher performance (Working