Statewide Assessments Don’t Just Help Students

Each fall, as we approach the end of the season for statewide score reporting, I find it helpful to reflect on trends in how students performed across the country and lessons learned from another year of testing. This year’s results may be an early indicator that historical progress in mathematics is at risk.

Roughly 27 million students took a statewide summative assessment last spring, spending about four hours—or less than one percent of their academic year—demonstrating their mastery of grade-level learning standards and their readiness to take on the challenges of this new school year.

Statewide assessments serve an important purpose beyond helping individual students and their families gauge how well they are meeting grade-level expectations. We use state assessments not to evaluate the students but to evaluate ourselves. Are we as educators meeting the sacred charge that we have been given to prepare students for the future—to build an equitable society and face the global challenges that lie before us? Certainly, one test score cannot possibly measure such an important commitment—but statewide assessments are the best measure we have to ensure every school is doing its utmost to serve our students, and our future.

So how did we do this year? Twenty-seven million students means 27 million stories. Some students found inspiration from a great teacher, had the opportunity to engage with rich curriculum and thrived—and their test scores reflected that. Teachers like Michael Dunlea, a third-grade teacher in Tabernacle, New Jersey, who helped write both the state standards and the district’s curriculum in math, make math accessible. Mr. Dunlea won a Presidential Award for Excellence in Mathematics and Science Teaching this month. “As an early elementary teacher, I try to foster a love for mathematics by making the learning interactive, relevant, and fun,” he said.

Other students did not have access to an inspiring teacher like Mr. Dunlea or access to advanced coursework because teacher shortages in their state made it impossible for principals to hire the effective teachers these students need and deserve. A thousand different variables affected the learning opportunities students had last year, and that’s just within the school setting.

Overall, however, students generally improved last year in their English language arts proficiency and held steady or declined in math. These trends appear to hold generally for states working both with New Meridian and with Smarter Balanced—the only states that still are able to compare how well they are preparing students to meet international benchmarks for college and career readiness beyond locally developed expectations.

The continued improvement in English language arts is encouraging: Students are learning to critically read meaningful, important texts, develop a point of view, and make compelling arguments based on evidence and facts—all critical skills we need students to develop as they prepare to address the challenges of the future.

We have to ask ourselves, however, why are more students not mastering the language of mathematics and statistics to model and solve real-world problems in ways we know will be critical for addressing the challenges of an increasingly quantitative world? From coding and artificial intelligence, to medicine, business, economics, climate science and public policy—all of these fields increasingly require a mastery of mathematics and mathematical thinking to engage, define, and solve complex problems quantitatively.

As Michael Kirst, the former State Board of Education president, acknowledged in reviewing this year’s Smarter Balanced results in California, “We’re doing better in English language arts than I predicted and worse in math. The problem is serious.”

The lack of progress in mathematics suggests at a minimum that we have likely harvested the “low hanging fruit” on the pathway to preparing students for college and career readiness, and that we have to deepen student engagement in more meaningful and challenging tasks if we are going to develop students’ critical thinking, reasoning, and problem-solving skills to meet the more rigorous expectations of college and career ready standards.

This is especially true in mathematics. Experts cite the fact that development in mathematics builds sequentially across the grades, and gaps in foundational understanding taught in elementary grades have a large impact on students’ readiness to grapple with more advanced concepts and skills in middle school and beyond. Declining proficiency rates from grade 3 to Algebra I reflect this snowballing effect as expectations rise and students fall behind.

The importance of high-quality classroom tasks was highlighted in a recent study published by TNTP, The Opportunity Myth. Researchers observed and analyzed nearly 1,000 lessons over two years and found the majority of classroom assignments are not aligned to grade-level expectations. “Students spent more than 500 hours per school year on assignments that weren’t appropriate for their grade and with instruction that didn’t ask enough of them—the equivalent of six months of wasted class time in each core subject,” the study said. Students in lower income neighborhoods were twice as likely as those from higher income neighborhoods to spend their days working diligently on low-level tasks that neither challenge nor engage them to meet grade-level standards.

To help address this epidemic of misaligned, low-level curriculum, New Meridian is making available exemplar classroom tasks aligned to grade-level standards and assessments to help teachers and students calibrate their expectations of what is required to master grade-level standards. This is only one piece of the puzzle we face in deepening instruction to improve student learning outcomes, but an important one. Together with trained teachers who know how to structure meaningful and engaging learning opportunities, quality tasks are at the heart of the instructional core that has been shown to accelerate and deepen student learning.


Examining how states are rethinking education, assessments and accountability to improve student outcomes

New Tool Helps Educators Select Quality Interim Assessments

Education experts from all over the country convened in New Hampshire in late September to discuss how states and school districts can make better use of interim assessments.

While most states conduct an annual summative assessment, and many school districts use classroom assessments to help inform instruction, interim assessments may help provide a bridge between these two practices. When state and local education agencies collaborate on interim assessments, the result can be better, more comprehensive feedback to educators.

In many states, students take summative tests at the end of the year and the results are often not available until the start of the next year, too late to make meaningful adjustments. “The cycle by which state assessments can inform instruction and school decisions is slow and limited,” according to the State of Assessment report produced this summer by Bellwether Education Partners.

Interim assessments may help change that, provided they are part of a portfolio of measures purposefully designed to give states and districts the information they need to improve teaching and learning.

The workshop, part of the 2019 Reidy Interactive Lecture Series hosted by the National Center for the Improvement of Educational Assessment, more commonly called the Center for Assessment, focused on helping states and districts make better decisions on the role of interim assessments. Speakers included assessment experts and developers; officials from state education departments and school districts; and educators with experience selecting and implementing interim assessments.

The participants worked to refine a tool created by the Center for Assessment to guide states and districts through a process to evaluate interim assessments and identify those that best meet their needs.

“Interim assessment must be evaluated with a clear understanding of the purposes for including the assessment and how the results will be used,” said Scott Marion, executive director of the Center. “This year’s conference focused on interim assessment to help state and district officials improve the way they select and use interim assessments to improve the education they provide for their students.”

Kristopher John, Vice President of Product Strategy at New Meridian, who was at the conference, said interim assessment is unlikely to replace annual state testing in the near future but that it will play an increasingly vital role in state assessment regimes.

“Interim assessment is an important tool and this conference highlighted exactly the type of work we need to do to develop the right interim measures,” he said.

The tool created by the Center for Assessment contains a three-phase process designed to guide those who are evaluating interim assessments toward solid decisions. The first phase identifies assessment gaps and needs. It asks states and districts to specify a vision for teaching and learning; evaluate how current tools align with that vision; and identify gaps in the information needed for students and educators to improve each year.

The second phase is designed to identify characteristics of an interim assessment that will fill in the gaps. To do so, it poses a series of questions about what information must be supplied for the interim assessment to play its role; what design features are necessary, such as test length and item format; what types of score comparisons are required; how and when the test should be administered; what information should be reported; and other important factors.

The third phase, which addresses evaluation, is forthcoming. The work done at the conference helped refine the tool, which will eventually be distributed widely.

Tracy Gardner, a psychometrician and Chief Assessment Officer at New Meridian, who also attended the conference, said it focused on issues that are vital to the future of assessment.

“No one test can meet all needs,” she said. “But a comprehensive portfolio of assessments that contain multiple measures can create a highly accurate profile of student learning. It can identify strengths and weaknesses and point out areas that require more focus. Interim assessment will increasingly be part of that portfolio. That’s why this work is so important.”


‘New Testing Landscape’ Shows Innovation and Experimentation Amid Change in State Assessments

A report called The New Testing Landscape sketches a portrait of states undergoing massive change in their assessment programs as they seek to reconcile educators’ needs to understand students’ mastery of state education standards, federal reporting requirements, and political resistance from families and communities that are increasingly skeptical of mandated testing.

“Political opposition and the opt-out movement it spawned have turned testing into a third-rail for policymakers,” wrote Thomas Toch, director of FutureEd, the education policy think tank at Georgetown University that produced the report in September. “And while annual testing survived under the federal Every Student Succeeds Act (ESSA), the national consensus on testing’s importance in school reform has largely dissolved.”

While political support for summative assessments may be waning, the crucible of competing perspectives on assessment has produced a wave of innovation and experimentation as education leaders look for solutions, the report notes. “There’s still more talk than action,” but states across the country are trying new strategies, from distributing assessment services across multiple vendors to increasing formative and interim testing.

Separating Test Services

For example, many states are licensing content, rather than incurring the expense of creating their own test items, and then contracting with separate vendors to handle test administration, scoring and reporting.

“We’ve been seeing an increasing interest among states in licensing this high-quality content, but incorporating it into their own custom design,” said Arthur VanderVeen, the CEO of New Meridian. “It’s a strong trend and it makes a lot of sense. As a country, we spend ridiculous amounts of money every year developing new custom content for each individual state, basically covering the same learning standards.”

Illinois, for example, released an RFP to create a computer-adaptive assessment, using New Meridian content but contracting with separate vendors for administration, scoring and reporting. Texas issued an RFP that separated its testing program into separate components, hoping to attract innovative approaches to test development, administration and other necessities.

“The decoupling of test design and administration is going to open up greater flexibility for states to continue to competitively bid the large bulk of their testing program—administration, scoring, and reporting—while maintaining continuity in their scale and test content,” VanderVeen said.

‘Budgets Have Held Steady’

The New Testing Landscape reported that the quality and rigor of state testing has increased in the last decade, with a continued focus on college and career readiness.

“States haven’t gone back to the days of low cut scores when almost all kids were proficient,” says Scott Marion, president of the National Center for the Improvement of Educational Assessment, known more commonly as the Center for Assessment. “That’s been, perhaps, a lasting positive effect that they’ve held the line pretty well.”

Though federal money for assessment innovation is limited, and philanthropic funding for such efforts has decreased in the last decade, state budgets for assessment have remained stable. “While the picture for state assessment budgets varies state by state, officials in most states said those budgets have held steady,” said the report, which was authored by FutureEd Senior Fellow Lynn Olson.

One area of concern the report did flag was “churn,” the turnover among testing vendors as states make changes to their systems. “Turnover comes with a price,” the report said. “The disruption makes it hard to track trends in student learning, maintain stability in accountability systems, and build parent and educator trust and support.”

Advances in Formative Assessment

Several states are also attempting to advance formative and interim tests that take place as students learn. “We’re seeing much more interest in richer forms of assessment closer to the classroom,” Marion said.

Because most local superintendents are sold on the value of interim assessments to measure growth and inform instruction, they would like to see those same assessments be used for state and federal accountability purposes and are pressuring states to move toward interim-as-summative assessment designs. The technical challenges of using interim assessments for summative reporting purposes are significant, however. “There needs to be R&D funding to work with states to design new innovative assessment models that can achieve these goals of bringing testing back more closely to teaching and learning while maintaining rigor and quality and consistency,” VanderVeen said.

Overall, while some question the future of state summative tests, arguing they could be rendered obsolete by technology or other innovations, Toch says there will always be a role for standardized testing.

“Standardized testing’s importance—as a backstop to ambitious standards, as a window into school performance, as a driver of improvement, and as a linchpin of educational equity—remains undiminished,” he wrote. “The education sector abandons its commitment to effective assessment at its peril, even as it must address the flaws of standardized testing and test-based accountability.”

Item Spotlight

A look at the unique characteristics that make New Meridian assessment items the best in the industry

Several times a year, The Prime gives readers an in-depth look at the thought process that goes into the production of New Meridian test questions and Literary Analysis Tasks. The goal is to better understand how test items align with academic standards to produce high-quality assessments.

The Question: In a retired Grade 4 Literary Analysis Task (LAT), The Peacock and Juno and the Peacock, text and video are intentionally paired. Students are presented with standards-based questions directed at each medium and asked to use the text and video to write an essay.

The Expert: Laura Beltchenko is a literacy and educational professional from Libertyville, Illinois, one of several dozen educators involved in New Meridian’s collaborative process to develop high-quality, engaging assessment items.

The Analysis: Quality literacy instruction calls for students to analyze text. To accomplish this task, students need to understand what the text says both explicitly (clearly stated textual evidence) and implicitly (suggested information that must be derived from context clues and cues) to fully interpret what the text is revealing to the reader. This also holds true as students interpret information from varying forms of multimedia.

By Grade 4, students should be able to demonstrate the fundamentals of text analysis to participate in discussions or to answer a variety of questions both explicitly and implicitly stated. In The Peacock and Juno and the Peacock, text and video are paired and students are presented with questions and an essay that presents an issue to analyze.

Summarization is a lifelong skill that is important in a student’s school career and beyond. One of the questions posed to the students in this LAT asks them to recognize information that would be included in a summary. Selecting from a predetermined list of grouped items, students determine what items should be included by dragging and dropping to create a concluding statement.

In the case of this LAT, students will select quotes that should be included in a summary. Knowing the central message and the key ideas and details (RL.4.2) fulfills the goal of knowing what to include. Students must understand that the design of a summary is not like writing the conclusion of a book report that generally includes opinions on what you liked or disliked. Summaries are succinct, including the most important information such as facts or quotes from the plot, setting, character or narrator.

Although students in this LAT are not required to place the answer choices in a correct order, the developmental skill of knowing what to include will follow a progression where the order of events or details becomes an important part of writing a satisfying summarization. This aspect of the LAT sets the pace for what students should do when writing their concluding statement or summary for their Literary Analysis Task.

Inside New Meridian

Ask Us
A Look Into the Future of Assessment

Expert: Tracy Gardner, Chief Assessment Officer, New Meridian

Question: What do you think testing will look like 20 years from now? What are the emerging trends that you believe are most likely to endure?

Answer: I see the pendulum shifting again. States are looking at new assessment models that allow more choices for local education agencies. I see performance-based assessment, through-course assessment, and formative assessment making new strides. Adaptive assessments are also gaining ground in the K-12 space.

During the last 25 years, I have seen performance-based assessments replaced with more traditional machine-scorable items that are more reliable and efficient, but now I am seeing a trend back to more authentic assessment again, at least in some states.

If we expand to include trends outside traditional public education, I also see content standards based on classic and ancient literature, poetry, science, and the arts. For example, much of the CCSS Appendix B Recommended Reading List is based on modern authors, but classical education approaches that are being utilized in many public charter schools, private schools, and homeschools focus on classic liberal arts philosophies and thinkers. It will be interesting to see if these classical education models make their way into the modern, traditional public-school curriculum.

Regardless of curriculum choices, I think in the next 20 years, we will continue to see more adaptive assessments and item types that have gaming and virtual reality capabilities. I do hope we see a movement toward using multiple assessment measures that are used to create profiles for students. Every psychometrician knows that one test cannot do all things, so the more that we can move toward triangulation of evidence by collecting multiple sources of data, the better I think we can equip students, parents, teachers and educational decision makers with meaningful information to support more informed decision making.

Do you have a question about assessment that you want answered? Tell us your question and a New Meridian expert may take it on. Contact us at

Must Reads

In case you missed it, assessment-related news worth reading

Want a Better Assessment? Start by Asking the Right Question