I've been teaching in Colorado for six years, and there's always been a troubling pattern in our state's standardized math scores: as students progress from 3rd to 10th grade, the percentage scoring proficient or advanced declines dramatically. Here are the percentages of students scoring proficient or advanced at each grade level, averaged over all the years the test has been given (typically 2002-2008):
| Grade | Avg. % P+A |
|---|---|
The easiest explanation (and the one I've tended to believe) is that students' abilities really are slipping as they get older. That would be a reasonable assumption if the test at each grade level were equally difficult. But what if the test questions were, on average (and adjusted for grade level), more difficult for older students? Is it fair to assume that a test with increasingly difficult questions would produce lower scores, even with sophisticated score-scaling systems that take question difficulty into account?
Fortunately, the state releases "item maps" that describe the difficulty of each item on every test. By assigning 4 points to an advanced item, 3 points to a proficient item, 2 points to a partially proficient item, and 1 point to an unsatisfactory item, we can compute an average difficulty for the CSAP (Colorado Student Assessment Program) at each grade level. Let's add that column to our table:
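The averaging step above is simple enough to sketch in a few lines. This is a minimal illustration, not the author's actual spreadsheet: it assumes each item on a test is read off the item map as a single level label and that every item counts equally (the post doesn't say whether items were weighted by their point value on the test). The names `LEVEL_POINTS` and `average_difficulty` are hypothetical.

```python
# Points per item level, following the 4/3/2/1 scheme described above.
LEVEL_POINTS = {
    "advanced": 4,
    "proficient": 3,
    "partially proficient": 2,
    "unsatisfactory": 1,
}


def average_difficulty(item_levels):
    """Return the mean difficulty of one test.

    Assumption: item_levels holds one level label per item (as read from
    an item map), and every item counts equally toward the average.
    """
    if not item_levels:
        raise ValueError("need at least one item")
    total = sum(LEVEL_POINTS[level] for level in item_levels)
    return total / len(item_levels)


# Toy input (not real CSAP data): four items at mixed levels.
print(average_difficulty(["proficient", "advanced", "partially proficient", "proficient"]))
```

For the toy input, the points are 3 + 4 + 2 + 3 = 12 over 4 items, so the average difficulty is 3.0. Running the same function once per grade level would fill in the new column.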
| Grade | Avg. Difficulty | Avg. % P+A |
|---|---|---|
This calls for a regression analysis. How strong is the correlation between the difficulty of the questions and the scores?
The correlation is surprisingly strong: the coefficient of determination (R²) is 0.88, meaning that average item difficulty statistically accounts for 88% of the variance in the test scores. 88%? That's big. Statistics rarely tell the whole story, but 88% raises serious doubts that the decline is just a matter of slipping math students. Why wouldn't the state want to keep the average difficulty steady from year to year? Wouldn't that make year-to-year performance comparisons more reliable?
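For readers who want to reproduce this kind of figure, here is a minimal sketch of a simple linear regression and its R². It assumes one (average difficulty, % P+A) pair per grade level and an ordinary least-squares fit. The post doesn't say exactly which tool produced the 0.88, and the function name `linear_fit` is hypothetical. No CSAP numbers are included; the demo uses toy values only.

```python
from statistics import mean


def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept.

    Returns (slope, intercept, r_squared). For a one-predictor linear fit,
    R^2 equals the squared Pearson correlation: Sxy^2 / (Sxx * Syy).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)  # spread of the predictor
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variance to explain
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy values only: x stands in for average difficulty, y for % P+A.
slope, intercept, r2 = linear_fit([1, 2, 3], [1, 3, 2])
print(slope, intercept, r2)
```

On Python 3.10 and later, the standard library's `statistics.linear_regression` and `statistics.correlation` give the same slope, intercept, and r (square r to get R²). With only eight grade levels (3rd through 10th) as data points, each grade has a large influence on the result, so a single unusual test can move R² noticeably.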