evaluations that require a specific and relatively high weight (usually 35–50 percent). Some states do not specify a weight but employ a matrix by which different combinations of value-added scores, observations, and other components generate final ratings; in these systems, value-added scores still tend to be a driving component. Because there will be minimal variation between districts, there will be little opportunity to test whether outcomes differ for different designs.

A more logical approach would be to set a lower minimum weight (say, 10–20 percent) and let districts experiment with going higher. Such variation could be useful in assessing whether and why different configurations lead to divergent results, and this information could then be used to make informed decisions about increasing or decreasing weights in the future.

Pay attention to all components of the evaluation.

No matter what the weight of value-added measures may be on paper, their actual importance will depend in no small part on the other components chosen and how they are scored. Consider an extreme hypothetical example: If an evaluation is composed of value-added data and observations, with each counting for 50 percent, and a time-strapped principal gives all teachers the same observation score, then value-added measures will determine 100 percent of the variation in teachers’ final scores.

System designers must pay close attention to how raw value-added scores are converted into evaluation ratings and how those ratings are distributed in relation to other components. This attention is particularly important given that value-added models, unlike many other measures (such as observations), are designed to produce a spread of results: some teachers at the top, some at the bottom, and some in the middle. This imposed variability will increase the impact of value-added scores if other components do not produce much of a spread.

Some states and districts that have already determined scoring formulas do not seem to be paying much attention to this issue. They are instead relying on the easy way out. For example, they are converting scores to simplistic, seemingly arbitrary four- or five-category sorting schemes (perhaps based on percentile ranks) with little flexibility or guidance on how districts might calibrate the scoring to suit the other components they choose.

Don’t ignore error: address it.

Although the existence of error in value-added data is discussed continually, there is almost never any discussion, let alone action, about whether and how to address it. There are different types of error, although they are often conflated.

Some of the imprecision associated with value-added measures is systematic. For example, there may be differences between students in different classes that are not measurable, and these differences may cause some teachers to receive lower (or higher) scores for reasons they cannot control (Rothstein, 2009).

In practice, systematic error is arguably no less important than random error, that is, statistical noise due largely to small samples. Even a perfect value-added model would generate estimates with random error.

Think about the political polls cited almost every day on television and in newspapers. A poll might show a politician’s approval rating at 60 percent, but there is usually a margin of error accompanying that estimate. In this case, let’s say it is plus or minus four percentage points. Given this margin of error, we can be confident that the “true” rating is somewhere between 56 and 64 percent (though more likely closer to 60 than to 56 or 64). This range is called a confidence interval.

In polls, this confidence interval is usually relatively narrow because polling companies use very large samples, which reduces the chance that anomalies will influence the results. Classes, on the other hand, tend to be small: a few dozen students at most. Thus, value-added estimates (especially those based on one year of data, small classes, or both) are often subject to huge margins of error; 20 to 40 percentage points is not unusual (see, for example, Corcoran, 2010).

If you were told that a politician’s approval rating was 60 percent, plus or minus 30 percentage points, you would laugh off the statistic. You would know that it is foolish to draw any strong conclusions from a rating so imprecise. Yet this is exactly what states and districts are doing with value-added estimates. It is at least defensible to argue that these estimates, used in this manner, have no business driving high-stakes decisions.

There are relatively simple ways