Harvard Graduate School of Education

Daniel Koretz

Henry Lee Shattuck Professor of Education

Degree:  Ph.D., Cornell University (1978)
Phone:  617.384.8090
Fax:  617.496.3095
Vitae/CV:   Daniel Koretz.pdf
Office:  Gutman 415
Office Hours:   http://danielkoretzofficehours.wikispaces.com/home
Office Hours Contact:  Online Sign-up
Faculty Assistant:  Stephen Martin

Profile

Daniel Koretz is an expert on educational assessment and testing policy. A primary focus of his work has been the impact of high-stakes testing. His research has included studies of score inflation, the effects of testing programs on educational practice, the assessment of students with disabilities, international differences in the variability of student achievement, the application of value-added models to educational achievement, and the development of methods for validating scores under high-stakes conditions. His current work focuses on the equity implications of high-stakes testing, the effects of high-stakes testing on postsecondary outcomes, the characteristics of traditional tests that encourage inappropriate test preparation, and the design and evaluation of new assessments tailored for accountability. Professor Koretz is a member of the National Academy of Education and a Fellow of the American Educational Research Association. His doctorate is in developmental psychology from Cornell University. Before obtaining his degree, Koretz taught emotionally disturbed students in public elementary and junior high schools.

Areas of Expertise
Research

Professor Koretz’s current research focuses on large-scale educational assessments, particularly as they are used for monitoring and accountability.
A major strand of his research examines the effects of high-stakes testing, including score inflation and the instructional responses to testing that generate it. His current research (the Education Accountability Project) examines the equity effects of test-based accountability, the effects of testing on postsecondary performance and other later outcomes, the characteristics of traditional tests that encourage inappropriate test preparation, and the design and evaluation of new assessments tailored for accountability. Professor Koretz has also conducted research on the assessment of students with disabilities, international differences in student achievement, the application of value-added models to educational achievement, and the development of methods for validating scores under high-stakes conditions.

Sponsored Projects


Education Accountability Project: Creating Self-Monitoring Assessments (2011-2012)
Spencer Foundation

Research extending back two decades has shown consistent evidence of negative side-effects of test-based accountability: undesirable changes in instruction (including diverse types of inappropriate test preparation) and inflation of test scores (e.g., Klein et al., 2002; Koretz & Barron, 1998; Stecher, 2002). These problems are a predictable instantiation of a general problem with performance-based accountability, known as Campbell’s Law: “The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell, 1979, p. 87). With support from the Spencer Foundation, the Education Accountability Project (EAP) has undertaken research intended to help lessen these side-effects and contribute to the design of more effective accountability systems.
One component of this work has been the design of “self-monitoring assessments” (SMAs; see Koretz & Beguin, 2010). Inappropriate test preparation and score inflation are facilitated by predictable recurrences and omissions in tests, including patterns in both content and representation. Test-preparation activities capitalize on these, producing score gains that shrink or vanish when different content or representations are sampled. Until recently, the extent of this problem has been evaluated by comparing score trends on a high-stakes test with trends on a comparable low-stakes audit test, such as the National Assessment of Educational Progress (NAEP). However, there are numerous disadvantages to relying on separate audit tests for this purpose. In contrast, SMAs incorporate audit components—novel items assessing appropriate content—into ongoing, operational high-stakes tests.
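
The audit logic behind SMAs can be shown with a minimal sketch (in Python; the school-level scores, variable names, and flag threshold below are illustrative assumptions, not data or code from the project): if gains on the operational portion of a test far outpace gains on the embedded audit items, the gap is a warning sign of possible score inflation.

# A minimal sketch of the audit comparison described above; the data,
# variable names, and flag threshold are illustrative assumptions.

def mean(xs):
    return sum(xs) / len(xs)

def inflation_signal(op_year1, op_year2, audit_year1, audit_year2, threshold=0.2):
    """Return the gap between operational and audit gains, and whether it exceeds the threshold."""
    operational_gain = mean(op_year2) - mean(op_year1)
    audit_gain = mean(audit_year2) - mean(audit_year1)
    gap = operational_gain - audit_gain
    return gap, gap > threshold

# Hypothetical school-level mean scores (standardized units) in two successive years
gap, flagged = inflation_signal(
    op_year1=[0.10, 0.05, 0.00],     # operational-item means, year 1
    op_year2=[0.55, 0.40, 0.35],     # operational-item means, year 2
    audit_year1=[0.08, 0.02, 0.01],  # embedded audit-item means, year 1
    audit_year2=[0.15, 0.10, 0.06],  # embedded audit-item means, year 2
)
print(f"gain gap = {gap:.2f}, flagged = {flagged}")

Because the audit items sit inside the operational test itself, this kind of comparison can be made school by school rather than only against a separate statewide audit test.
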
During the 2010-2011 academic year, the EAP produced SMA components for use in New York. We produced 11 short test forms in mathematics for grades 4, 7, and 8 that were administered in a statewide field test in May 2011. Because item writing requires both specialized knowledge and pretesting, we relied primarily on items in the public domain, modifying them as little as necessary and writing new items only in the few instances in which there was no alternative. This approach seriously limits our development of SMAs because we are constrained to grades and subjects in which there is an ample supply of items in the public domain, and even in those cases, testing some important content is difficult.
Accordingly, we have decided to have items written for us by Cito, a testing firm in the Netherlands. Cito’s director of research, Anton Beguin, has been involved in the design of SMAs for several years. Moreover, Cito has a very limited presence in the US market and therefore will not be in the position of bidding against US testing firms operating the high-stakes tests into which our audit items must be embedded.


Developing more effective test-based accountability by improving validity under high-stakes conditions (2011-2015)
U.S. Department of Education, Institute of Education Sciences

Since the 1970s, policymakers have relied on test-based accountability (TBA) as a primary tool for improving student achievement and for reducing racial and socioeconomic achievement gaps. Despite the accumulating evidence of threats to validity caused by educators’ responses to TBA, which in some cases have produced severe inflation of scores, research has not provided guidance for the development of tests better suited for use in accountability systems. This proposal requests funding for a research program that includes:

1. Evaluating the limitations of current assessment systems, including the features of tests that are most vulnerable to inflation and the types of schools and students most affected by inflation, so that these problems can be lessened in new designs.
2. Developing and evaluating new approaches for validating inferences based on scores on tests used for accountability.
3. Designing new, research-based approaches to assessment to lessen the unwanted side-effects of TBA and increase the desired effects, evaluating their validity, and using the results of this ongoing research to develop needed modifications.

Although this work addresses issues of national importance, it uses student-by-test-item data from three states—New York, Massachusetts, and Texas—because assessments are currently state-specific. The core is an agreement with the New York State Education Department (NYSED) that calls for us to conduct this work as an independent research group and that commits NYSED to providing us with data and the opportunity to field new assessment designs.

Our work extends traditional validation methods by examining variations in generalizability within tests rather than only the generalizability of total scores. First, we examine test forms to identify predictable patterns that could create inflation. Second, we examine differential gains on parts of the assessments, since preparation focused on predictable patterns within the test should be accompanied by differentially large performance gains on items reflecting these opportunities. Third, we develop and evaluate “self-monitoring assessments” (SMAs)—assessments that incorporate audit components directly into operational assessments and thus can localize inflation at the level of schools—and compare performance on parts of the operational assessment and these audit components. Fourth, we compare performance on parts of the operational assessment and parts of NAEP. Finally, we evaluate the extent to which trends in performance on accountability tests generalize to later outcomes, such as high school and college performance.

We use several methods to analyze differential trends in performance. One approach is an adaptation of the differential item functioning (DIF) methods commonly used to flag items for potential bias. This approach allows us to match groups, such as high-gain and lower-gain schools, on performance on novel or inflation-resistant items and examine differential gains on items hypothesized to be most vulnerable to inappropriate test preparation (e.g., item clones). To make the work more accessible, we are also applying more widely used models, such as multi-level mixed models, to evaluate these contrasts. This is the first effort to systematically identify opportunities for score inflation, to explore variations in validity across parts of tests and across types of students and schools, and to implement and evaluate SMAs.
Perhaps most important, this is the first investigation of variations in validity that is directly tied to efforts to lessen the problem of score inflation.
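
As a concrete illustration of the DIF-style contrast described above, the following minimal Python sketch computes the Mantel-Haenszel common odds ratio, a standard DIF statistic, for one item hypothesized to be vulnerable to inappropriate test preparation, after matching students from high-gain and lower-gain schools on their scores on inflation-resistant audit items. The records, the school grouping, and the stratum definitions are illustrative assumptions, not the project's actual data or procedures.

# A minimal sketch of a Mantel-Haenszel DIF contrast between high-gain and
# lower-gain schools; all data and groupings here are illustrative assumptions.

from collections import defaultdict

def mantel_haenszel_or(records):
    """records: iterable of (stratum, group, correct), with group in {'high_gain', 'low_gain'}
    and correct in {0, 1}. Returns the MH common odds ratio (high_gain vs. low_gain)."""
    tables = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})  # per-stratum 2x2 counts
    for stratum, group, correct in records:
        cell = tables[stratum]
        if group == "high_gain":
            cell["A" if correct else "B"] += 1  # A: high-gain correct, B: high-gain incorrect
        else:
            cell["C" if correct else "D"] += 1  # C: low-gain correct,  D: low-gain incorrect
    num = den = 0.0
    for t in tables.values():
        n = t["A"] + t["B"] + t["C"] + t["D"]
        if n == 0:
            continue
        num += t["A"] * t["D"] / n
        den += t["B"] * t["C"] / n
    return num / den if den else float("inf")

# Illustrative records: (audit-score stratum, school group, correct on the vulnerable item)
records = [
    (0, "high_gain", 1), (0, "high_gain", 0), (0, "low_gain", 1), (0, "low_gain", 0),
    (1, "high_gain", 1), (1, "high_gain", 1), (1, "low_gain", 1), (1, "low_gain", 0),
    (2, "high_gain", 1), (2, "high_gain", 1), (2, "low_gain", 1), (2, "low_gain", 1),
]
print(f"MH odds ratio (high-gain vs. low-gain schools): {mantel_haenszel_or(records):.2f}")

An odds ratio well above 1 after matching would suggest that the vulnerable item is differentially easier for students in high-gain schools, a pattern consistent with coaching on predictable content rather than genuine improvement.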

Courses
Publications

Holcombe, R., Jennings, J., & Koretz, D. (2013). The roots of score inflation: An examination of opportunities in two states’ tests. In G. Sunderman (Ed.), Charting reform, achieving equity in a diverse nation. Greenwich, CT: Information Age Publishing.

Guo, Q., & Koretz, D. (2013). Estimating the impact of the English immersion law on limited English proficient students’ reading achievement. Education Policy, 27(1), 121-149.

Koretz, D. (2011). Lessons from test-based education reform in the U.S. Zeitschrift für Erziehungswissenschaft (Journal of Educational Science), Special Issue 13, 9-24.

National Research Council (incl. D. Koretz). (2011). Incentives and Test-Based Accountability in Public Education. Committee on Incentives and Test-Based Accountability in Public Education, M. Hout and S. W. Elliott (Eds.), Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Koretz, D. (2010). The validity of score gains on high-stakes tests. In B. McGaw, P. L. Peterson, and E. Baker (Eds.), International Encyclopedia of Education, 3rd Edition, Vol. 4, pp. 186-192. Oxford: Elsevier.

Koretz, D., and Beguin, A. (2010). Self-monitoring assessments for educational accountability systems. Measurement: Interdisciplinary Research and Perspectives, 8(2-3, special issue), 92-109.

Koretz, D. (2009). How do American students measure up? Making sense of international comparisons. Future of Children, 19(1), 37-51.

Koretz, D. (2009). Moving past No Child Left Behind. Science, 326, 803-804 (November 6).

Koretz, D. (2008). Measuring Up: What Educational Testing Really Tells Us. Cambridge, MA: Harvard University Press. http://www.hup.harvard.edu/catalog.php?isbn=9780674035218

Koretz, D. (2008). Test-based educational accountability: Research evidence and implications. Zeitschrift für Pädagogik (Journal of Pedagogy), 54(6), 777-790.

Koretz, D. (2008). A measured approach: Maximizing the promise, and minimizing the pitfalls, of value-added models. American Educator, Fall, 18-27, 39.

Koretz, D. (2008). Further steps toward the development of an accountability-oriented science of measurement. In K. E. Ryan & L. A. Shepard (Eds.), The Future of Test-Based Educational Accountability. Mahwah, NJ: Lawrence Erlbaum Associates, 71-91.

Koretz, D. (2008). The pending reauthorization of NCLB: An opportunity to rethink the basic strategy. In G. L. Sunderman (Ed.), Holding NCLB Accountable: Achieving Accountability, Equity, and School Reform. Thousand Oaks, CA: Corwin Press, 9-26.

Koretz, D. (2007). Using aggregate-level linkages for estimation and validation: Comments on Thissen and Braun & Qian. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and Aligning Scores and Scales. New York: Springer-Verlag, 339-353.

Koretz, D., and Kim, Y-K. (2007). Changes in the Black-White Performance Gap in the Elementary School Grades. CSE Technical Report 715. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D., and Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational measurement (4th ed.), 531-578. Westport, CT: American Council on Education/Praeger.

Koretz, D. (2006). Steps toward more effective implementation of the Standards. Educational Measurement: Issues and Practice, 25(3), 46-50.

Hamilton, L. S., McCaffrey, D. F., and Koretz, D. (2006). Validating Achievement Gains in Cohort-to-Cohort and Individual Growth-Based Modeling Contexts. In R. Lissitz (Ed.), Longitudinal and Value Added Modeling of Student Performance, 407-434. Maple Grove, MN: JAM Press.

Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. In J. Herman and E. Haertel (Eds.), Uses and misuses of data in accountability testing. Yearbook of the National Society for the Study of Education, vol. 104, Part 2, 99-118. Malden, MA: Blackwell Publishing.

Koretz, D., and McCaffrey, D. (2005). Using IRT DIF Methods to Evaluate the Validity of Score Gains. CSE Technical Report 660. Los Angeles: Center for the Study of Evaluation, University of California.

Price, J., and Koretz, D. (2005). Building assessment literacy. In K. P. Boudett, E. A. City, and R. M. Murnane (Eds.), Data Wise: A Step-by-Step Guide to Using Assessment Results to Improve Teaching and Learning. Cambridge: Harvard Education Press.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., and Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67-101.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., and Hamilton, L. S. (2004). Let's see more empirical studies of value-added models of teacher effects: A reply to Raudenbush, Rubin, Stuart and Zanuto. Journal of Educational and Behavioral Statistics, 29(1), 139-144.

Koretz, D., and Barton, K. (2003-2004). Assessing students with disabilities: Issues and evidence. Educational Assessment, 9(1&2), 29-60.

McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., and Hamilton, L. S. (2003). Evaluating value-added models for teacher accountability. Santa Monica: RAND, MG-158-EDU.

Koretz, D. (2003). Using multiple measures to address perverse incentives and score inflation. Educational Measurement: Issues and Practice, 22(2), 18-26.

Koretz, D., and Hamilton, L. S. (2003). Teachers’ responses to high-stakes testing and the validity of gains: A pilot study. CSE Technical Report 610. Los Angeles: Center for the Study of Evaluation, University of California.

Wainer, H., and Koretz, D. (2003). A political statistic. Chance, 16(4), 45-47.

Baker, E. L., Linn, R. L., Herman, J. L., and Koretz, D. (2002). Standards for educational accountability systems (CRESST Policy Brief 5). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing. (http://www.cse.ucla.edu/products/policy/cresst_policy5.pdf)

Koretz, D., Russell, M., Shin, D., Horn, C., and Shasby, K. (2002). Testing and Diversity in Postsecondary Education: The Case of California. Education Policy Analysis Archives, 10(1) (January). http://eric.ed.gov/?id=EJ658443

Koretz, D. (2002). Exit tests and accountability at the high-school level. In The New Challenge for Public Education: Secondary School Reform—Designs, Standards, and Accountability. The Aspen Institute Congressional Program, vol. 17, no. 2. Washington: Author, 39-48.

Koretz, D. (2002). Limitations in the use of achievement tests as measures of educators' productivity. In E. Hanushek, J. Heckman, and D. Neal (Eds.), Designing Incentives to Promote Human Capital. Special issue of The Journal of Human Resources, 37(4, Fall), 752-777.

Koretz, D., McCaffrey, D., and Hamilton, L. (2001). Toward a Framework for Validating Gains Under High-Stakes Conditions. CSE Technical Report 551. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D., and Berends, M. (2001). Changes in High School Grading Standards in Mathematics, 1982-1992. Santa Monica: RAND (MR-1445-CB).

Blue Ribbon Panel on the New York Performance Standards Consortium (Everson, H. T., Koretz, D. M., Linn, R. L., Phillips, S. E., Qualls, A. L., and Stake, R.) (2001). New York Performance Standards Consortium Schools’ Alternative Assessment Systems: An Evaluation Report. New York City: Author.

Koretz, D., McCaffrey, D., and Sullivan, T. (2001). Predicting variations in mathematics performance in four countries using TIMSS. Education Policy Analysis Archives, 9(34) (September). http://epaa.asu.edu/ojs/article/viewFile/363/489

Koretz, D., McCaffrey, D., and Sullivan, T. (2001). Using TIMSS to Analyze Correlates of Performance Variation in Mathematics. Working Paper No. 2001-095. Washington, D.C.: U.S. Department of Education, National Center for Education Statistics.

Koretz, D. (2000). The Impact of Score Differences on the Admission of Minority Students: An Illustration. Chestnut Hill, MA: National Board on Educational Testing and Public Policy. NBETPP Statements, 1(5), June.

Koretz, D., and Hamilton, L. (2000). Assessment of Students with Disabilities in Kentucky: Inclusion, Student Performance, and Validity. Educational Evaluation and Policy Analysis, 22(3), 255-272.

National Research Council, Committee on Equivalency and Linkage of Educational Tests (incl. D. Koretz). (1999). Uncommon Measures: Equivalence and Linkage Among Educational Tests. Washington, DC: National Academy Press.

Koretz, D., and Hamilton, L. (1999). Assessing Students with Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject. CSE Technical Report 498. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D. M., Bertenthal, M. W., and Green, B. (Eds.) (1999). Embedding Common Test Items in State and District Assessments. (National Research Council, Committee on Embedding Common Test Items in State and District Assessments.) Washington: National Academy Press.

Koretz, D. (1998). Large-scale portfolio assessments in the US: Evidence pertaining to the quality of measurement. In D. Koretz, A. Wolf, and P. Broadfoot (Eds.), Records of Achievement. Special issue of Assessment in Education, 5(3), 309-334. Reprinted in W. Harlen (Ed.) (2008), Student Assessment and Learning. London: Sage.

Koretz, D., and Barron, S. I. (1998). The Validity of Gains on the Kentucky Instructional Results Information System (KIRIS). MR-1014-EDU. Santa Monica: RAND.

Associations

Kentucky National Technical Advisory Panel on Assessment and Accountability (2011-present)

National Academy of Education (2008-present)

Faculty Associate and Member of the Steering Committee, Institute for Quantitative Social Science, Harvard University (2006-present)

Assessment Technical Advisory Group, New York State Education Department (1996-present)

National Council on Measurement in Education (1988-present)

American Educational Research Association (1986-present)

Founder and Chair, International Project for the Study of Educational Accountability (2008-2012)

Program Chair, American Educational Research Association (2006-2007)

Board on Testing and Assessment, National Research Council (1999-2006)

Blue Ribbon Panel, NY Performance Assessment Consortium (2000-2001)

Advisory Panel on Research and Development, The College Entrance Examination Board (1996-1999)

National Research Council, Committee on Goals 2000 and the Inclusion of Students with Disabilities (1995-1997)

National Research Council, Committee on Incentives and Test-Based Accountability in Education (2006-)

Chair, National Research Council, Committee on Embedding Common Test Items in State and District Assessments (1999-)

National Research Council, Committee on Equivalency and Linkage of Educational Tests (1998-)
