Specialization is the Answer to Complex Assessment Systems

Last spring, New Meridian hosted a dozen chief state school officers at our annual convening on assessment literacy, which focused on assessment as a lever for equity and change. Interestingly, the topic that prompted the most interest this year was risk and how to manage the many complex components of a state testing program.

Statewide testing is a complex, large-scale operation for every state. Assessment companies take extraordinary steps to ensure every student has an opportunity to demonstrate his or her mastery of the state’s learning standards in ways that are fair, consistent, unbiased, and appropriate. Testing has become even more demanding as assessments have moved online. Computer-based testing allows students to demonstrate their knowledge and skills in ways that better reflect how they will process information and solve problems in the real world, but such systems have many moving parts, and things inevitably go wrong.

Several high-profile incidents over the last two years have either delayed reporting of test scores or even invalidated statewide test scores entirely, resulting in a significant waste of resources and time, causing frustration among students and parents, and raising questions as to whether the state can competently manage its budget and operations. Chief state school officers are keenly aware of these responsibilities and rightly concerned about how to reduce the risk associated with assessing students with high-quality tests.

Some states are addressing this problem by separating test design and development from test administration, scoring, and reporting. Texas recently issued an RFP that broke out its testing program into 10 separate components, with test development separated from test administration. The state took this approach—unprecedented in its history—explicitly to attract innovative new approaches from a broader pool of providers, beyond the handful that can manage the massive operational scale required to administer a statewide test to 3.5 million students annually.

The states in the Smarter Balanced Assessment Consortium have always taken this approach: SBAC designs and develops the assessment in cooperation with its member states, and states procure their test administration vendors separately. Christyan Mitchell, chief operating officer at SBAC, says that such an approach has enabled them to focus on quality rather than having to invest in large-scale operational capacity.

Specialization like this is a proven approach to managing complex development processes. In the construction industry, you have architects and builders. A good architect will propose solutions and develop a design that reflects a client’s vision and goals. They will consider innovative developments in construction technologies that builders may not be aware of. (In assessment terms, this would be similar to considering an array of possible item and test designs, including technology-enhanced items, multimedia-based items, state-of-the-art accommodations, fixed test forms, item-level or multistage adaptive test forms, etc.) They will also help the client anticipate and manage the implications of their design for the construction phase in terms of cost, technical challenges, materials availability, and timeline. Finally, they will act as the client’s agent throughout the construction process, making sure the building is constructed to spec and performs as intended.

Having a partner that specializes in test design and development—like an architect—dramatically reduces risk and brings greater peace of mind. “It’s one of the best decisions we ever made,” notes John White, state superintendent of education for Louisiana. “We have greater control over the test design and a very responsive partner who knows our goals and priorities.”

Finally, as in construction, architects and builders must work as a cooperative team on behalf of the client. Tests must be designed and developed to function flawlessly in the administration vendor’s system. The test developer and the test delivery vendor must work together to help the state evaluate results and ensure technical quality. As states move toward greater specialization across their assessment partners, new models for cooperation and coordination will be required.

As a test design and development architect, New Meridian works with states to adopt this new approach to ensure they are getting the highest-quality assessment program for their significant investment of tax dollars. We work closely with every major test administration vendor to ensure seamless coordination across workflows and systems. Ultimately, our goal is to help states develop the highest-quality assessment for the best value to provide educators, families, and students with the information they need to keep learning and growing.

Changemakers

Examining how states and systems are rethinking education, assessments and accountability to improve student outcomes

Report Highlights Innovation and Opportunity in State Assessment

A new report published by Bellwether Education Partners showcases “trailblazing states” that are making big, public reforms around innovation in state assessment. “States are continuing to rethink their roles in assessment and their assessment systems in ways that may benefit teaching, learning, transparency, and equity,” the report said.

Bellwether Education Partners, a nonprofit focused on improving education outcomes for under-served children, published the report to showcase innovative new approaches states are employing to get more value out of their large-scale assessment systems and advance their equity agendas. Several trends noted in the report should interest state chiefs, assessment directors, and education policy leaders.

Shared Item Banks Increase Quality and Lower Costs

Developing assessments is expensive. Tests have dozens of items, each of which can cost thousands of dollars to develop, field test, and validate. While states once joined consortia to share costs, the lack of flexibility became problematic.

“Instead of sharing whole tests, more states now mix, match, and share large groups of items via shared item banks,” the report said, adding that this type of collaboration maximizes benefits and minimizes risks. “States face bigger risks going it alone,” the report said.

“Beyond cost-sharing benefits, this model is cause for optimism because it can support higher-quality tests and interstate collaboration,” the report said, adding that, “Shared item banks and hybrid test designs may also give states the ability to maintain stability and comparability in their scores over time, even if their tests change.”

The report cites New Meridian’s work to foster sharing of content among states as an innovative new approach to increasing quality while reducing costs. New Meridian now manages a bank of more than 14,000 operationally ready test questions that have been developed by states and are available for licensing.

“States can license … that content from New Meridian for their own tests, along with options for various kinds of advisory and test development support,” the report said. “New Meridian’s approach [was] developed in direct response to states’ need for high-quality assessment content at a reasonable cost with autonomy around design decisions, especially among states leaving the consortia.”

The conclusion: “States still have access to some of the advantages of a consortium (high-quality items at a reduced cost), and continuity in their test design and content, with the flexibility to make individual choices.”

Science and Social Studies

Though the Every Student Succeeds Act (ESSA) requires a science assessment once in each grade band, there are no such requirements for social studies. Moreover, states are not required to track either discipline in their accountability systems, though some do. Yet, the report concludes, having more flexibility for local design is enabling states to innovate in their approach to science and social studies assessments, and more opportunities lie ahead.

“Science and social studies assessments are ripe for state innovation and experimentation,” the report said. “These subjects could be a laboratory of assessment innovation, with wide flexibility in law and subject areas that have been somewhat neglected in the era of math and reading assessments.”

Despite the lack of a federal requirement, 33 states offer a social studies assessment of some type. In science, 40 states have adopted new standards in the last five years. The adoption has been driven by the creation of two sets of national standards: the Framework for K–12 Science Education created by the National Research Council (used in 21 states) and the Next Generation Science Standards created by a group that included the National Academies of Science (used in 19 states).

Several states are leading the way in innovative science and social studies assessment design. Louisiana has created a pilot program that mixes social studies and English language arts in a single assessment. New Hampshire combines classroom-based performance assessments in science in most grades with a single statewide summative assessment in grade five and high school.

“Science and social studies content are both especially well-suited to performance-based, hands-on activities and tasks, which can be engaging for students and provide deeper information on learning to teachers,” the report said.

Interim and Formative Assessment

The report also highlighted efforts to augment summative testing with interim testing, which administers assessments in a more compressed time frame to provide more timely feedback, and formative assessment, administered as instruction and learning take place.

Both strategies can impact instruction more frequently and more quickly, and the renewed interest in adding interim and formative components can bring balance to state assessments. Companies like New Meridian are working with states to formulate such plans, and several states have made major strides.

“Whereas once the state role in assessment was almost entirely limited to developing and administering traditional summative tests, states are thinking about ways to build more comprehensive assessment systems that include different kinds of tests and align with parallel efforts to improve instruction, professional development, standards, and curriculum,” the report said.

In Practice

A look at how educators are developing innovative strategies to use data to inform instruction in the classroom

Superintendent of Lighthouse District: ‘We Live By Data’

Data informs almost everything at Little Falls Township Public Schools in New Jersey, from the electives that are offered to the extracurricular activities available after school.

“We live by data,” said Superintendent Tracey Marinelli. And she means it.

It was a data-driven culture that led to massive improvements in state math and English language arts assessment scores in recent years—and to a Lighthouse District designation from the state Department of Education last month.

In a state with more than 670 districts serving almost 1.4 million students, only nine districts and two charter schools were awarded a Lighthouse designation on July 31. The award highlights districts that use multiple measures to assess students, create individualized instruction, and increase performance among diverse groups of students.

“The aim of the Lighthouse initiative is not to identify the districts with the highest test scores, but rather districts that have shown the greatest improvement,” Kathy Goldenberg, president of the state Board of Education, said in a statement. “By doing so, we give these local educators a platform in which they can share their initiatives, programs and successes with school leaders in other communities.”

Massive Improvement

In Little Falls, a district serving 900 students in kindergarten through eighth grade near Paterson, New Jersey, numbers on the state’s annual assessment tell the story.

In English language arts, 67 percent of students in grades 3 to 8 were meeting or exceeding expectations in the 2014-2015 school year. That rose to 83 percent by the 2017-2018 school year, Marinelli said. In math, the numbers went from 48 percent to 68 percent in the same period.

Among the district’s special education students, the percentage performing at or above expectations in English language arts more than doubled to 61 percent. The students also gained in math, with the percentage growing to 41 percent from 22 percent.

A Valuable Tool

When Marinelli arrived five years ago, the picture was different. “Every teacher was functioning in a silo,” she said. “Nobody was using the same data. We needed a new start to get everyone on the same page.”

The district contracted with a company to add formative assessments, while continuing to monitor performance on summative assessments and exposing students to “the type of questions that foster higher thinking skills.” For example, after reading a passage, students were no longer confronted with a multiple-choice question offering characterizations of how the main character felt. Instead, Marinelli said, students were asked to determine themselves how the character felt and provide text evidence to support their conclusion.

“We need to know that they can take what they learned since kindergarten and apply it,” she said. The district focused on student ability to meet state standards. Formative and summative measures “give us a gauge of the whole student.”

Along the way, Little Falls changed its approach to supporting its students with special needs—20 percent of its student population—to full inclusion. More than half the classrooms in the district now have two teachers.

Marinelli also increased professional development to equip all teachers to support all students in an inclusive classroom. The district adopted a motto: “One District. One Team. One Vision.”

“It’s a collaborative effort. We built teams that used student and staff voices to get buy-in and build trust,” she said, adding that, “when people start to see success, they want more and they work harder.”

Little Falls has gone so far as to put data in the hands of students. Each student has a data folder showing his or her progress against state education standards. Teachers use that data to create individualized plans for instruction. The next step, Marinelli said, is parent-teacher conferences led by the students—and the data.

“People think about assessment and they get afraid of it,” she said. “But if you look at it as your friend, you see what a valuable tool it is.”

Inside New Meridian

Q&A: Tracy Gardner, Chief Assessment Officer at New Meridian

Dr. Tracy Gardner, who leads New Meridian’s test design and development efforts, has more than 20 years of operational experience with considerable expertise in measurement, assessment design and development, and psychometrics. Prior to joining New Meridian, she was Senior Director of Assessment for the next-generation GED assessment, where she contributed to all aspects of the assessment development process at GED Testing Service. Previously, she was Senior Psychometrician and Manager of Psychometrics and Research Services at Pearson, where she led and supported the psychometrics for more than two dozen large-scale testing programs.

Thinking back to your high school and college days, was there a particular moment or seminal event that led you into this field of study?

I was a double major in psychology and mathematics in college. My favorite areas were developmental psychology and cognitive psychology because I was passionate about studying how children learn. Psychology has a strong focus on research, and since I had a strong mathematics background, I found the statistics courses in the mathematics department to be particularly interesting.

Statistics and research methods classes lit my fire, and a professor of mine recognized this passion in me (apparently, that was rare—who knew?). During my junior year, my advisor suggested that I look into quantitative psychology and/or research methodology Ph.D. programs. As soon as I saw the course listings for research methodology/psychometrics, I knew it was the path for me. As a Ph.D. student at the University of Pittsburgh, I got to take courses in educational statistics, measurement, psychometrics, and test design. I always loved taking tests as a kid (particularly the New York State Regents exams), so this field that combined my love of psychology, mathematics, and statistics was a perfect path.

You’ve been on the vanguard of assessment design and development for the better part of two decades. How has test design and assessment development changed from the time you first entered the field?

As a graduate student in the 1990s, I was blessed to have an internship position on the Maryland State Performance Assessment Program (MSPAP) led by Dr. Suzanne Lane. MSPAP was an innovative, interactive, multi-disciplinary assessment given to groups of students in classrooms.

Students worked in groups on complex, multi-disciplinary, performance-assessment tasks over the course of several days. These challenging tasks had students read for information, use manipulatives, interpret graphics, make predictions, perform calculations, show their work, and explain their reasoning. While students did not receive individual scores, schools did get scores, which they used to evaluate the strength of their curriculum and instruction.

This innovative program was the first assessment that I ever worked on, and I became an immediate supporter of performance-based assessment. I graduated in 2000, before the No Child Left Behind (NCLB) legislation was approved. NCLB had lofty goals to get all students proficient and brought much more federal oversight to the assessment process. For the first 10 years of my career, I worked entirely on NCLB-type tests, which required reliable and valid student scores and annual testing in grades 3-8 and at least one high-school grade. Performance assessments like MSPAP were no longer an option for testing within the new federal guidelines.

Testing changed again with the introduction of the Common Core State Standards and ESSA, and it is continuing to change as the field swings back to researching more innovative approaches to testing (e.g., the Innovative Assessment Pilot). Change is inevitable, but the good news is that the field is growing, and technology is improving. Ideally, I would love to see the field continue to move toward incorporating a system of assessments that will allow for multiple measures to be combined into a portfolio of evidence.

One of the criticisms of assessments, particularly in the context of accountability, is that too much time is diverted from instruction toward test preparation. How do you respond to critics who say high-stakes, summative tests force teachers to teach to the test?

As a mother of six children who range in age from 3 to 14, I find this question particularly important, both as a parent and as an assessment expert. In my opinion, a good assessment enables teachers to be more informed about students’ understanding so the teacher can be a better guide to help students learn. Instruction that is better informed by good assessment is time better spent in the classroom.

For example, the ELA assessments that New Meridian develops measure students’ ability to read literary and informational texts critically, make inferences, draw conclusions, and then write extended responses while citing evidence from the text. I believe that classroom time spent on developing these types of skills is time well spent. As a parent of school-age children, I want my children to be working on exactly these kinds of skills in their daily classroom instruction because these are the kinds of skills that they will likely need to succeed in their future careers.

There are more than 14,000 school districts in the U.S. and each of them has the autonomy to develop its own curricula. With so many different curricula in use around the country, is it realistic to believe that a single test—or tests produced from a common pool of performance tasks—will work for every school district?

I can’t help but reflect back on my time in graduate school when I was evaluating the validity of the MSPAP. We found that the extended tasks benefited students from all walks of life. The tasks represented the same kinds of challenges and assignments that they were likely to face as adults in real-world jobs. New Meridian tasks are similar in that they require critical thinking, modeling, reasoning, communication, and research skills. Whether a student attends public school, private school, or a charter/magnet school, these skills are critically important for career and college readiness.

As you look at the education landscape in America, what are you most optimistic about?

I am excited to see the pendulum shifting back to having more local control of assessment practices. While I definitely see the benefit of standard assessment and accountability practices that have been introduced through NCLB and ESSA legislation, I am excited to see more opportunities for through-course, interim, and formative assessments that may allow a swing back toward more performance-based assessment as one piece of the assessment system.

As a psychometrician and measurement professional, I have spent my career trying to educate stakeholders on the need for multiple measures. I learned more than 25 years ago that one assessment can’t do everything, but a comprehensive portfolio of multiple measures can provide an informative profile of student strengths and weaknesses. I am optimistic we are moving to a system of assessments that will take all the pressure off one assessment and allow for multiple measures to build out a more complete profile of student achievement.