The True Costs of Low-Quality Assessments

Individual states made difficult decisions this spring over whether and how to administer state summative tests. In many cases, students had just returned to in-person instruction (or were planning to soon), with only weeks left in the school year. This spring more than ever, educators, parents, and students felt the acute tension between time spent testing and time spent reconnecting with teachers and friends in order to reset learning goals and routines.

As a leader of an assessment organization and a parent, I experienced these same conflicts. I am eager to know whether my daughter has mastered her fifth-grade curriculum this year, a year in which she adapted to a hybrid learning model, attending alternating weeks in school and at home. But I also wanted her to have every possible minute of instructional time with her in-school teacher and cohort of friends. Should she spend five to six hours of that limited time testing?

One thing this year has made clear to me is that the time and money we invest in assessments have to deliver at least as much value for student learning as the time and money we invest in teaching and instruction. These can’t be tradeoffs. They have to be aligned and complementary investments that inform one another in a virtuous cycle to increase student learning. And for me, that means every assessment students take, whether local or at the state level, has to be of the highest quality.

The Case for Quality

In the assessment community, we talk often about “quality” when it comes to summative and interim testing. But what exactly does that mean? What makes a test high-quality and why does it matter?

The truth is that the difference between high-quality and low-quality assessment is both dramatic and consequential, and educators need to understand the systemic impact of quality—especially in today’s environment. To do so, it helps to draw some contrasts.

Low-quality assessments measure superficial skills and provide little insight into students’ progress toward mastery. They favor “drill and kill” over critical thinking and rote memorization over deeper thinking. In other words, low-quality assessments reinforce low-quality learning experiences, and the impact can be profound. Testing affects the classroom both directly, when assessments inform curriculum and instructional decisions, and indirectly, when educators teach to a test. Low-quality tests reduce the complexities of learning to a single score and trap students in unengaging, one-dimensional experiences that do little to promote an interest in learning.

High-quality assessments are the very opposite, focusing on the skills that matter most: critical thinking, problem-solving, and effective communication. They engage students with compelling content that promotes deeper thinking; increase equity by providing appropriate accommodations and ensuring test questions are free of bias; and give educators the information they need to make solid, data-driven decisions. High-quality assessments promote and reinforce high-quality learning practices in the classroom and provide insights into students’ increasing progress toward greater mastery of grade-level knowledge and skills. They provide the building blocks for more individualized instruction and support.

To help states better evaluate the quality of their assessment programs, a panel of education and assessment researchers published the Criteria for High-Quality Assessments in 2013 as a guide that has since informed the criteria for peer review. Those principles still hold today. Assessments should:

Be aligned to empirically validated benchmarks for college and career readiness.
Focus on increasing complexity of literary and informational texts.
Require students to read closely and use evidence from texts.
Focus on higher-order, analytical thinking and application of knowledge and skills.
Assess research and inquiry.
Emphasize writing and communication.
Focus deeply on core math concepts and connect them to practices, application, and procedures.
Emphasize modeling, design, and problem solving using quantitative, mathematical tools.

The point of publishing these criteria at the time was that low-quality, simple multiple-choice tests cannot live up to these standards for quality, and that states and testing vendors needed to get creative to develop next-generation assessments that could. Consortia of states indeed collaborated to develop new test designs to reflect the depth and rigor of the new standards. Thousands of teachers from states across the country created these new tests, which met the standards’ expectations for deeper engagement with informational content, application of knowledge to solve problems, and the ability of students to express themselves effectively in writing. Students were expected to read and analyze multiple sources and develop a coherent point of view on a topic. No state test had ever done anything like that before.

Commitment to Quality

So, in our rush to reimagine the next next-generation assessments, with uncertainty about future waivers and federal dollars flowing in abundance, let us keep these criteria for quality in mind.

We have the chance to boldly re-imagine assessment and create systems that increase equity, promote deeper student engagement, and embrace a more holistic view of learning. With investment in R&D, we can embed more authentic assessments into instruction to increase their formative value, align them to local curriculum, make them more culturally affirming and sustaining, and overall ensure that we are delivering as much value for time spent on assessment as time spent on instruction.

But we cannot do any of that well if we don’t maintain a commitment to measuring what matters: the critical thinking and problem-solving skills that all students need to thrive in our increasingly complex world. In short, a commitment to quality.