This case illustrates how the work of leaders and analysts in the Delaware Department of Education (DDOE) and the agency’s partnership with the Strategic Data Project (SDP), a program of the Center for Education Policy Research at Harvard University, created momentum for statewide policy change. By exploring Delaware leaders’ use of data and analytics to challenge assumptions and inform the development of better policies and practices, the case illustrates the importance of leadership, analytic and technical competency, and strategic partnerships when leading education reform. The case specifically highlights the power of human capital analytics to diagnose the current status of Delaware’s educator pipeline, from preparation through development and retention, and how effectively communicating these analyses built coalitions of support and drove a culture of data use at both the state and district level.
In this study we address the questions: Is teacher knowledge of the content and its teaching multidimensional, as advanced by different theoretical frameworks (e.g., Shulman, 1986), or does it comprise a single construct? If teacher knowledge consists of multiple dimensions, which is more predictive of student outcomes? If it is not multidimensional, does it predict student outcomes? To address these questions, we developed teacher surveys that included items measuring teacher content knowledge drawn from two sources: released items of the Massachusetts Test for Educator Licensure; and items tapping into teachers' mathematical knowledge for teaching drawn from the work of the Learning Mathematics for Teaching project.
We extend this line of research by investigating teacher career and background characteristics, personal resources, and school and district resources that predict an array of instructional practices identified on a mathematics-specific observational instrument, MQI, and a general instrument, CLASS. To understand these relationships, we use correlation and regression analyses. For a subset of teachers for whom we have data from multiple school years, we exploit within-teacher, cross-year variation to examine the relationship between class composition and instructional quality that is not confounded with the sorting of "better" students to "better" teachers. We conclude that multiple teacher- and school-level characteristics--rather than a single factor--are related to teachers' classroom practices.
In this paper, we use measures drawn from multiple studies to estimate the amount of classroom-level variation in student outcomes explained. We also explore whether a relatively small or relatively large number of variables explains this variance; the latter would suggest teaching is multidimensional. To instrument this theory, we collected different types of measures from several different research groups. Initial results demonstrate significant but weak relationships between our independent variables and student outcomes, and suggest that the amount of variance explained is low.
Using data from elementary mathematics teachers, we examine the correspondence between self-reports and observational measures of two instructional dimensions--reform-orientation and classroom climate--and the relative ability of these measures to predict teachers' contributions to student learning.
As many states are slated to soon use scores derived from classroom observation instruments in high-stakes decisions, developers must cultivate methods for improving the functioning of these instruments. We show how multidimensional, multilevel item response theory models can yield information critical for improving the performance of observational instruments.
While considerable variance in teachers' scores on observational instruments is attributed to raters, rater accuracy and its impact on score quality remains underexplored. Using student achievement data and ratings of mathematics instruction, we study methods for differentiating raters by accuracy and investigate whether these differences affect reliability and validity.
The purpose of this study is to investigate three aspects of construct validity for the Mathematical Quality of Instruction classroom observation instrument: (1) the dimensionality of scores, (2) the generalizability of these scores across districts, and (3) the predictive validity of these scores in terms of student achievement.
Descriptive evidence highlighting the potential sorting of teachers to districts and differences in resources to support instructional quality across districts (Hill, Kapitula, & Umland, 2011; Lankford, Loeb, & Wyckoff, 2002; Spillane, 2000) suggests that value-added rankings may not mean the same thing from one district to another. Therefore, in this study, we examine if being classified as a "high" or "low" value-added teacher captures a single underlying trait and common instructional practices across districts. Initial results suggest that measures of teacher quality cannot and should not be thought of as a common metric with a universal definition.
The authors used self-report surveys to gather information on a broad set of non-cognitive skills from 1,368 eighth-grade students attending Boston Public Schools and linked this information to administrative data on their demographics and test scores. At the student level, scales measuring conscientiousness, self-control, grit, and growth mindset are positively correlated with attendance, behavior, and test-score gains between fourth and eighth grade. Conscientiousness, self-control, and grit are unrelated to test-score gains at the school level, however, and students attending over-subscribed charter schools with higher average test-score gains score lower on these scales than do students attending district schools. Exploiting charter school admissions lotteries, the authors replicate previous findings indicating positive impacts of charter school attendance on math achievement, but find negative impacts on these non-cognitive skills. The authors provide suggestive evidence that these paradoxical results are driven by reference bias, or the tendency for survey responses to be influenced by social context. The results therefore highlight the importance of improved measurement of non-cognitive skills in order to capitalize on their promise as a tool to inform education practice and policy.
This guide provides an overview of the unique set of resources developed by SDP to promote data use in the education sector. It highlights each SDP resource, including what it is, why you would want to use it, what skills are needed, and where you can find it.
This toolkit provides useful resources for designing and rolling out a high school graduate exit survey, as well as effectively analyzing survey results in a school district. Anyone who is interested in implementing a high school exit survey, reworking a current exit survey, or effectively analyzing survey results in a school district can leverage this resource.
In this paper, the authors propose that an important determinant of value-added model choice should be alignment with alternative indicators of teacher and teaching quality. Such alignment makes sense from a theoretical perspective because better alignment is thought to indicate more valid systems. To provide initial evidence on this issue, they first calculated value-added scores for all fourth and fifth grade teachers within four districts, then extracted scores for 160 intensively studied teachers. Initial analyses using a subset of alternative indicators suggest that alignment between value-added scores and alternative indicators differs by model, though not significantly.
In this study we ask: Do observational instruments predict teachers' value-added equally well across different state tests and district/state contexts? And, to what extent are differences in these correlations a function of the match between the observation instrument and tested content? We use data from the Gates Foundation-funded Measures of Effective Teaching (MET) Project (N=1,333) study of elementary and middle school teachers from six large public school districts, and from a smaller (N=250) study of fourth- and fifth-grade math teachers from four large public school districts. Early results indicate that estimates of the relationship between teachers' value-added scores and their observed classroom instructional quality differ considerably by district.
In this study, we use value-added scores and video data in order to mount an exploratory study of high- and low-VAM teachers' instruction. Specifically, we seek to answer two research questions: First, can expert observers of mathematics instruction distinguish between high- and low-VAM teachers solely by observing their instruction? Second, what instructional practices, if any, consistently characterize high- but not low-VAM teachers' classrooms? To answer these questions, we use data generated by 250 fourth- and fifth-grade math teachers and their students in four large public school districts. Preliminary analyses indicate that a teacher's value-added rank was often not obvious to this team of expert observers.
The School District of Philadelphia partnered with SDP to produce the SDP College-Going Diagnostic. The diagnostic analyses summarized in this report focus on 1) student performance in the district during high school and into college, 2) critical junctures along the way that affect student success, and 3) student characteristics and other factors that are most strongly related to college enrollment and persistence.
The SDP Toolkit for Effective Data Use is a resource guide for education agency analysts who collect and analyze data on student achievement. Completing the toolkit produces a set of basic, yet essential, human capital and college-going analyses that every education agency should have as a foundation to inform strategic management and policy decisions.
Boston Public Schools collaborated with SDP to produce the SDP Human Capital Diagnostic for its district. The diagnostic is designed to identify patterns of teacher effectiveness and areas for policy change that could leverage teacher effectiveness to improve student achievement. It is also intended to demonstrate how districts can capitalize on existing data to understand their current performance, set future goals, and strategically plan responses.
Summer melt is a phenomenon where seemingly college-intending students fail to enroll the fall after high school graduation. The handbook is a resource for education leaders that contains guidance on how to measure the magnitude of summer melt among high school graduates, provides resources and tools to help design a summer intervention customized to the needs and realities of school communities, and documents the extent of the summer melt problem across several large school districts. It also provides evidence of the positive impact of additional outreach and support for students during the post-high school summer.
Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers, and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more than teachers in schools at the 25th percentile after ten years.
In this article, Heather Hill and Pam Grossman discuss the current focus on using teacher observation instruments as part of new teacher evaluation systems being considered and implemented by states and districts. They argue that if these teacher observation instruments are to achieve the goal of supporting teachers in improving instructional practice, they must be subject-specific, involve content experts in the process of observation, and provide information that is both accurate and useful for teachers. They discuss the instruments themselves, raters and system design, and timing of and feedback from the observations. They conclude by outlining the challenges that policy makers face in designing observation systems that will work to improve instructional practice at scale.
Los Angeles Unified School District (LAUSD) partnered with SDP to produce the SDP College-Readiness Diagnostic for its district. The diagnostic analyses focus on 1) how students across the district progress toward high school graduation, 2) whether and how students who fall off track for graduation recover and go on to graduate, and 3) the progress of students toward the completion of A-G requirements.
Capstone reports mark the end of the two-year SDP Fellowship experience and capture a portion of the work fellows have led in their agencies. The capstones demonstrate the impact fellows make and the role of SDP in supporting their growth as data strategists. They also provide recommendations to the fellows’ agencies on best practices for sustaining key projects. Additionally, the reports will serve as guides to other agencies, future fellows, and researchers seeking to do similar work.
In Off-Track Status in High School, SDP analysts observed that almost all students who fall academically off track in high school as measured by credits attained are already off track by the end of ninth grade. This SPI, which tracks the proportion of students who move from being “off track” to being “on track”, can serve to focus a district’s attention on at-risk students while there is time to intervene and improve a student’s chances of timely high school graduation.
The High School Effect highlights the wide variation in college-going rates for students with similar levels of eighth grade academic achievement who attend different high schools within a district. The analysis reveals that some schools are better at helping lower performing students successfully enroll in college. This indicator suggests the importance of individual schools in meaningfully influencing their students’ likelihood of enrolling in college.
College Choice uncovers a group of highly successful students in each district who do not attend college at all or enroll in colleges and universities that are less challenging than those for which they are academically prepared. In fact, across districts examined in these analyses, between 7 and 16 percent of high performing students do not enroll in college. Other high achieving students opt to attend less selective postsecondary institutions. Additional analyses, conducted by SDP and others, confirm that students are more likely to drop out from colleges that are not sufficiently academically challenging for them.
Los Angeles Unified School District (LAUSD) partnered with SDP to produce the SDP Human Capital Diagnostic for its district. The diagnostic is designed to identify patterns of teacher effectiveness and areas for policy change that could leverage teacher effectiveness to improve student achievement. It is also intended to demonstrate how districts can capitalize on existing data to understand their current performance, set future goals, and strategically plan responses.
Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be used to assess a validity argument constructed for the Mathematical Quality of Instruction instrument, focusing in particular on the effects of varying the rater pool, subject matter content, observation procedure, and district context. Variation in the subject matter content of lessons did not affect rater agreement with master scores, but the evaluation of other portions of the validity argument varied according to the composition of the rater pool, observation procedure, and district context. These results demonstrate the need for conducting such analyses, especially for classroom observation instruments that are subject to multiple sources of variation.
Gwinnett County Public Schools worked with SDP to produce the SDP Human Capital Diagnostic for its district. The diagnostic is designed to identify patterns of teacher effectiveness and areas for policy change that could leverage teacher effectiveness to improve student achievement. It is also intended to demonstrate how districts can capitalize on existing data to understand their current performance, set future goals, and strategically plan responses.
Gwinnett County Public Schools worked with SDP to create the SDP College-Going Diagnostic for its district. The diagnostic is designed to identify potential areas for action to increase students’ levels of academic achievement, preparedness for college, and postsecondary attainment. It is also intended to demonstrate how districts can capitalize on existing data to understand their current performance, set future goals, and strategically plan responses.
The Effective Teacher Retention Rate examines how retention rates for novice teachers differ by level of effectiveness. This indicator reveals that there is very little difference in retention rates between the most effective teachers and the least effective ones, and that this difference virtually disappears after the first year. Since districts should ideally try to retain their most effective teachers, and counsel out their least effective ones, this suggests that districts are not yet differentiating retention strategies by teacher effectiveness.
The Novice Teacher Placement Pattern examines whether lower-performing students are disproportionately placed in classrooms of novice teachers. SDP researchers observed that first-year teachers are systematically being placed with students who start the year performing considerably behind their peers. These results were seen in each of the districts studied, regardless of the demographic make-up of that district, across all schools in the district. In three of the four districts examined, these patterns persisted within each of the schools as well.
This case study, published by Harvard Education Press, describes how to use data to challenge assumptions, reveal student needs, address these needs programmatically, and evaluate results. It shows a team of data specialists and educators working together, across institutional and departmental boundaries, to determine why some high school seniors who intend to go to college after graduation do not enroll in the fall. Together, they develop, implement, and evaluate a summer counseling intervention program called Summer PACE to ensure that more students enroll seamlessly in college.
The CEPR report, “Are Practice-Based Teacher Evaluations and Teacher Effectiveness Linked in TNTP’s Performance Assessment System (PAS)?” examines the evaluation system for first-year Louisiana teachers trained by TNTP, a national nonprofit organization focused on improving teacher performance. The authors conclude that there is a modest positive relationship between teachers’ PAS scores and actual student achievement growth in math and reading. The analysis also suggests that, with some technical improvements, the PAS could become an even better predictor of student academic outcomes.
Educational interventions are often evaluated and compared on the basis of their impacts on test scores. Decades of research have produced two empirical regularities: interventions in later grades tend to have smaller effects than the same interventions in earlier grades, and the test score impacts of early educational interventions almost universally “fade out” over time. This paper explores whether these empirical regularities are an artifact of the common practice of rescaling test scores in terms of a student’s position in a widening distribution of knowledge. If a standard deviation in test scores in later grades translates into a larger difference in knowledge, an intervention’s effect on normalized test scores may fall even as its effect on knowledge does not. We evaluate this hypothesis by fitting a model of education production to correlations in test scores across grades and with college-going using both administrative and survey data. Our results imply that the variance in knowledge does indeed rise as children progress through school, but not enough for test score normalization to fully explain these empirical regularities.
SDP published this brief to provide leaders and decision-makers with a well-rounded understanding of value-added measures to inform policy and management changes in their school districts and state education agencies. Since value-added measures are integral to the Human Capital Diagnostic, SDP is providing background on how they are calculated, why they are used, and how they compare to other measures of teacher effectiveness. During a time when there is a lot of controversy around teacher evaluation, this brief aims to increase awareness and understanding of the benefits and limitations of value-added measures in a format that is accessible to general audiences.
In recent years, interest has grown in using classroom observation as a means to several ends, including teacher development, teacher evaluation, and impact evaluation of classroom-based interventions. Although education practitioners and researchers have developed numerous observational instruments for these purposes, many developers fail to specify important criteria regarding instrument use.
Center researchers John Papay, Martin West, Jon Fullerton, and Thomas Kane investigate the effectiveness of the Boston Teacher Residency (BTR) in their working paper Does Practice-Based Teacher Preparation Increase Student Achievement? Early Evidence from the Boston Teacher Residency. BTR is an innovative practice-based preparation program in which candidates work alongside a mentor teacher for a year before becoming a teacher of record in Boston Public Schools.
Fort Worth Independent School District (FWISD) collaborated with SDP to create the SDP College-Going Diagnostic to examine the district’s college-going enrollment and persistence rates. The diagnostic is designed to identify potential areas for action to increase students’ levels of academic achievement, preparedness for college, and postsecondary attainment.
Teachers are the most important school-level factor in student success--but as any parent knows, all teachers are not created equal. Reforms to the current quite cursory teacher evaluation system, if done well, have the potential to remove the worst-performing teachers and, even more important, to assist the majority in improving their craft. However, the US educational system often cannibalizes its own innovations, destroying their potential with a steady drip of rules, regulations, bureaucracy, and accommodations to the status quo. Because that status quo sets an unacceptably low bar for teaching quality, missing this opportunity now means new generations of students may suffer mediocre—or worse—classrooms.
The effect of evaluation on employee performance is traditionally studied in the context of the principal-agent problem. Evaluation can, however, also be characterized as an investment in the evaluated employee’s human capital. We study a sample of mid-career public school teachers where we can consider these two types of evaluation effect separately. Employee evaluation is a particularly salient topic in public schools where teacher effectiveness varies substantially and where teacher evaluation itself is increasingly a focus of public policy proposals. We find evidence that a quality classroom-observation-based evaluation and performance measures can improve mid-career teacher performance both during the period of evaluation, consistent with the traditional predictions; and in subsequent years, consistent with human capital investment. However, the estimated improvements during evaluation are less precise. Additionally, the effect sizes represent a substantial gain in welfare given the program’s costs.
Fulton County Schools (FCS) partnered with SDP to produce the SDP College-Going and Human Capital Diagnostics for its district. The diagnostics are meant to demonstrate how districts can capitalize on existing data to understand their current performance, set future goals, and strategically plan responses. The College-Going Diagnostic report illuminates students’ enrollment over time and compares these patterns across a variety of student characteristics and academic experiences. The Human Capital Diagnostic report investigates teacher effectiveness with the intention of informing district leaders about patterns of teacher effectiveness and identifying areas for policy change that could leverage teacher effectiveness to improve student achievement.
Researchers from the Harvard Graduate School of Education, MIT, and the University of Michigan have released the results of a new study that suggests that urban charter schools in Massachusetts have large positive effects on student achievement at both the middle and high school levels. Results for nonurban charter schools were less clear; some analyses indicated positive effects on student achievement at the high school level, while results for middle school students were much less encouraging.
This paper combines information from classroom-based observations and measures of teachers’ ability to improve student achievement as a step toward addressing the challenge of identifying effective teachers and teaching practices. The authors find that classroom-based measures of teaching effectiveness are related in substantial ways to student achievement growth. The authors conclude that the results point to the promise of teacher evaluation systems that would use information from both classroom observations and student test scores to identify effective teachers. Information on the types of practices that are most effective at raising achievement is also highlighted.
The authors administered an in-depth survey to new math teachers in New York City and collected information on a number of non-traditional predictors of effectiveness: teaching specific content knowledge, cognitive ability, personality traits, feelings of self-efficacy, and scores on a commercially available teacher selection instrument. They find that a number of these predictors have statistically and economically significant relationships with student and teacher outcomes. The authors conclude that, while there may be no single factor that can predict success in teaching, using a broad set of measures can help schools improve the quality of their teachers.
Evidence from a new random assignment experiment in Los Angeles elementary schools finds students assigned to teachers rated highly by the National Board for Professional Teaching Standards (NBPTS) outperform students in comparison classrooms. The NBPTS has developed a rigorous process to identify exemplary teachers. Yet, little has been done to connect the NBPTS's assessments to student achievement outcomes. The Center, working with the Los Angeles Unified School District, studied the extent to which the NBPTS process identifies teachers who produce the largest gains in student achievement. We compared the performance of classrooms randomly assigned to either an NBPTS applicant or a comparison teacher. The report includes a number of suggestions for improving the predictive power of the NBPTS scaling process.
The authors used a random-assignment experiment in Los Angeles Unified School District to evaluate various non-experimental methods for estimating teacher effects on student test scores. Having estimated teacher effects during a pre-experimental period, they used these estimates to predict student achievement following random assignment of teachers to classrooms. While all of the teacher effect estimates considered were significant predictors of student achievement under random assignment, those that controlled for prior student test scores yielded unbiased predictions and those that further controlled for mean classroom characteristics yielded the best prediction accuracy. In both the experimental and non-experimental data, the authors found that teacher effects faded out by roughly 50 percent per year in the two years following teacher assignment.
The National Board for Professional Teaching Standards (NBPTS) assesses teaching practice based on videos and essays submitted by teachers. For this study, the authors compared the performance of classrooms of elementary students in Los Angeles randomly assigned to NBPTS applicants and to comparison teachers. The authors conclude that students assigned to highly-rated applicants outperformed those in the comparison classrooms by more than those assigned to poorly-rated teachers. Moreover, the estimates with and without random assignment were similar.
As new teacher evaluation systems are rolled out across the country, we hope that many agencies follow TNTP’s lead to carefully examine the results of their system, learn what is working and what is not, and make changes to improve.