Harvard Graduate School of Education

Daniel Koretz

Henry Lee Shattuck Professor of Education

Degree:  Ph.D., Cornell University (1978)
Phone:  617.384.8090
Fax:  617.496.3095
Vitae/CV:   Daniel Koretz.pdf
Office:  Gutman 415
Office Hours:   http://danielkoretzofficehours.wikispaces.com/home
Office Hours Contact:  Online Sign-up
Faculty Assistant:  Talicha Vaval

Profile

Daniel Koretz is an expert on educational assessment and testing policy. A primary focus of his work has been the impact of high-stakes testing. His research has included studies of score inflation, the effects of testing programs on educational practice, the assessment of students with disabilities, international differences in the variability of student achievement, the application of value-added models to educational achievement, and the development of methods for validating scores under high-stakes conditions. His current work focuses on the equity implications of high-stakes testing, the effects of high-stakes testing on postsecondary outcomes, the characteristics of traditional tests that encourage inappropriate test preparation, and the design and evaluation of new assessments tailored for accountability. Koretz is a member of the National Academy of Education and a Fellow of the American Educational Research Association. His doctorate is in developmental psychology from Cornell University. Before obtaining his degree, Koretz taught emotionally disturbed students in public elementary and junior high schools.


Research

Professor Koretz’s current research focuses on large-scale educational assessments, particularly as they are used for monitoring and accountability.
A major strand of his research examines the effects of high-stakes testing, including score inflation and the instructional responses to testing that generate it. His current research, the Educational Accountability Project, examines the equity effects of test-based accountability, the effects of testing on postsecondary performance and other later outcomes, the characteristics of traditional tests that encourage inappropriate test preparation, and the design and evaluation of new assessments tailored for accountability. Professor Koretz has also conducted research on the assessment of students with disabilities, international differences in student achievement, the application of value-added models to educational achievement, and the development of methods for validating scores under high-stakes conditions.

Sponsored Projects


Developing more effective test-based accountability by improving validity under high-stakes conditions (2011-2015)
U.S. Department of Education, Institute of Education Sciences

Since the 1970s, policymakers have relied on test-based accountability (TBA) as a primary tool for improving student achievement and for reducing racial and socioeconomic achievement gaps. Despite the accumulating evidence of threats to validity caused by educators’ responses to TBA, which in some cases have produced severe inflation of scores, research has not provided guidance for the development of tests better suited for use in accountability systems. This proposal requests funding for a research program that includes:

1. Evaluating the limitations of current assessment systems, including the features of tests that are most vulnerable to inflation and the types of schools and students most affected by inflation, so that these problems can be lessened in new designs.

2. Developing and evaluating new approaches for validating inferences based on scores on tests used for accountability.

3. Designing new, research-based approaches to assessment to lessen the unwanted side effects of TBA and increase the desired effects, evaluating their validity, and using the results of this ongoing research to develop needed modifications.

Although this work addresses issues of national importance, it uses student-by-test-item data from three states—New York, Massachusetts, and Texas—because assessments are currently state-specific. The core is an agreement with the New York State Education Department (NYSED) that calls for us to conduct this work as an independent research group and commits NYSED to providing us with data and the opportunity to field new assessment designs.

Our work extends traditional validation methods by examining variations in generalizability within tests rather than only the generalizability of total scores. First, we examine test forms to identify predictable patterns that could create inflation. Second, we examine differential gains on parts of the assessments, since preparation focused on predictable patterns within the test should be accompanied by differentially large performance gains on items reflecting these opportunities. Third, we develop and evaluate “self-monitoring assessments” (SMAs)—assessments that incorporate audit components directly into operational assessments and thus can localize inflation at the level of schools—and compare performance on parts of the operational assessment and these audit components. Fourth, we compare performance on parts of the operational assessment and parts of NAEP. Finally, we evaluate the extent to which trends in performance on accountability tests generalize to later outcomes, such as high school and college performance.

We use several methods to analyze differential trends in performance. One approach is an adaptation of the differential item functioning (DIF) methods commonly used to flag items for potential bias. This approach allows us to match groups, such as high-gain and lower-gain schools, on performance on novel or inflation-resistant items and examine differential gains on items hypothesized to be most vulnerable to inappropriate test preparation (e.g., item clones). To make the work more accessible, we are also applying more widely used models, such as multi-level mixed models, to evaluate these contrasts. This is the first effort to systematically identify opportunities for score inflation, to explore variations in validity across parts of tests and across types of students and schools, and to implement and evaluate SMAs. Perhaps most important, this is the first investigation of variations in validity that is directly tied to efforts to lessen the problem of score inflation.
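
As a rough illustration of the DIF-based contrast described above, the sketch below uses a logistic-regression form of DIF, a common variant of the method and not necessarily the project's exact specification, on simulated data. All variable names and the data are hypothetical, not drawn from the project.

```python
# A minimal, hypothetical sketch of a logistic-regression DIF check for
# score inflation: after matching students on a score built from novel /
# inflation-resistant items, does attending a high-gain school still
# predict success on an item hypothesized to be vulnerable (e.g., a clone)?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Simulated student-level data (purely illustrative, not project data).
df = pd.DataFrame({
    "resistant_score": rng.normal(0.0, 1.0, n),   # matching variable
    "high_gain_school": rng.integers(0, 2, n),    # 1 = high-gain school
})

# Build in an inflation-like effect: students in high-gain schools do
# better on the vulnerable item even conditional on the resistant score.
true_logit = 0.8 * df["resistant_score"] + 0.6 * df["high_gain_school"]
df["vulnerable_item_correct"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))

# DIF-style test: a significant positive coefficient on the school-group
# indicator, conditional on the matching score, flags differential gains
# consistent with inappropriate test preparation.
model = smf.logit(
    "vulnerable_item_correct ~ resistant_score + high_gain_school",
    data=df,
).fit(disp=0)
print(model.params)
print(model.pvalues)
```

In the project's terms, repeating this contrast across many vulnerable and resistant item sets, or replacing the single-level logit with a multi-level mixed model with students nested in schools, would help localize where inflation appears.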

Publications

Koretz, D., Yu, C., Mbekeani, P., Langi, M., Dhaliwal, T., and Braslow, D. (2016, September). Predicting freshman grade-point average from college-admissions and state high-school test scores. AERA Open. http://ero.sagepub.com/content/2/4/2332858416670601. DOI: 10.1177/2332858416670601.

Koretz, D. (2016). Making the term “validity” useful. Assessment in Education: Principles, Policy & Practice, 23(2), 290-292. DOI: 10.1080/0969594X.2015.1111193.

Koretz, D. (2015). Adapting the practice of measurement to the demands of test-based accountability: Response to commentaries. Measurement: Interdisciplinary Research and Perspectives, 13(3), 1-6.

Koretz, D. (2015). Adapting the practice of measurement to the demands of test-based accountability. Measurement: Interdisciplinary Research and Perspectives, 13, 1-25.

Ng, H. L., and Koretz, D. (2015). Sensitivity of school-performance ratings to scaling decisions. Applied Measurement in Education, 28(4), 330-349. https://dash.harvard.edu/handle/1/13360004.

Guo, Q., & Koretz, D. (2013). Estimating the impact of the English immersion law on limited English proficient students’ reading achievement. Educational Policy, 27(1), 121-149.

Holcombe, R., Jennings, J., & Koretz, D. (2013). The roots of score inflation: An examination of opportunities in two states’ tests. In G. Sunderman (Ed.), Charting reform, achieving equity in a diverse nation (pp. 163-189). Greenwich, CT: Information Age Publishing.

Koretz, D. (2013). Commentary on E. Haertel, “How is testing supposed to improve schooling?” Measurement: Interdisciplinary Research and Perspectives, 11(1-2), 40-43.

Koretz, D. (2011). Lessons from test-based education reform in the U.S. Zeitschrift für Erziehungswissenschaft (Journal of Educational Science), Special Issue 13, 9-24.

National Research Council (2011) (incl. D. Koretz). Incentives and Test-Based Accountability in Public Education. Committee on Incentives and Test-Based Accountability in Public Education, M. Hout and S. W. Elliott (Eds.), Board on Testing and Assessment, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.

Koretz, D., and Beguin, A. (2010). Self-monitoring assessments for educational accountability systems. Measurement: Interdisciplinary Research and Perspectives, 8(2-3, special issue), 92-109.

Koretz, D. (2010). The validity of score gains on high-stakes tests. In B. McGaw, P. L. Peterson, and E. Baker (Eds.), International Encyclopedia of Education (3rd ed., Vol. 4, pp. 186-192). Oxford: Elsevier.

Koretz, D. (2009). How do American students measure up? Making sense of international comparisons. Future of Children, 19(1), 37-51.

Koretz, D. (2009). Moving past No Child Left Behind. Science, 326, 803-804 (November 6).

Koretz, D. (2008). Test-based educational accountability: Research evidence and implications. Zeitschrift für Pädagogik (Journal of Pedagogy), 54(6), 777-790.

Koretz, D. (2008). A measured approach: Maximizing the promise, and minimizing the pitfalls, of value-added models. American Educator, Fall, 18-27, 39.

Koretz, D. (2008). Further steps toward the development of an accountability-oriented science of measurement. In K. E. Ryan & L. A. Shepard (Eds.), The Future of Test-Based Educational Accountability (pp. 71-91). Mahwah, NJ: Lawrence Erlbaum Associates.

Koretz, D. (2008). The pending reauthorization of NCLB: An opportunity to rethink the basic strategy. In G. L. Sunderman (Ed.), Holding NCLB Accountable: Achieving Accountability, Equity, and School Reform (pp. 9-26). Thousand Oaks, CA: Corwin Press.

Koretz, D. (2008). Measuring Up: What Educational Testing Really Tells Us. Cambridge, MA: Harvard University Press. http://www.hup.harvard.edu/catalog.php?isbn=9780674035218

Koretz, D. (2007). Using aggregate-level linkages for estimation and validation: Comments on Thissen and Braun & Qian. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and Aligning Scores and Scales (pp. 339-353). New York: Springer-Verlag.

Koretz, D., and Kim, Y-K. (2007). Changes in the Black-White Performance Gap in the Elementary School Grades. CSE Technical Report 715. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D. (2006). Steps toward more effective implementation of the Standards. Educational Measurement: Issues and Practice, 25(3), 46-50.

Koretz, D., and Hamilton, L. S. (2006). Testing for accountability in K-12. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 531-578). Westport, CT: American Council on Education/Praeger.

Hamilton, L. S., McCaffrey, D. F., and Koretz, D. (2006). Validating achievement gains in cohort-to-cohort and individual growth-based modeling contexts. In R. Lissitz (Ed.), Longitudinal and Value Added Modeling of Student Performance (pp. 407-434). Maple Grove, MN: JAM Press.

Koretz, D., and McCaffrey, D. (2005). Using IRT DIF Methods to Evaluate the Validity of Score Gains. CSE Technical Report 660. Los Angeles: Center for the Study of Evaluation, University of California.

Price, J., and Koretz, D. (2005). Building assessment literacy. In K. P. Boudett, E. A. City, and R. M. Murnane (Eds.), Data Wise: A Step-by-Step Guide to Using Assessment Results to Improve Teaching and Learning. Cambridge: Harvard Education Press.

Koretz, D. (2005). Alignment, high stakes, and the inflation of test scores. In J. Herman and E. Haertel (Eds.), Uses and Misuses of Data in Accountability Testing. Yearbook of the National Society for the Study of Education, vol. 104, part 2, 99-118. Malden, MA: Blackwell Publishing.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., and Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67-101.

McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., and Hamilton, L. S. (2004). Let's see more empirical studies of value-added models of teacher effects: A reply to Raudenbush, Rubin, Stuart and Zanutto. Journal of Educational and Behavioral Statistics, 29(1), 139-144.

Koretz, D., and Barton, K. (2003-2004). Assessing students with disabilities: Issues and evidence. Educational Assessment, 9(1&2), 29-60.

Koretz, D., and Hamilton, L. S. (2003). Teachers’ responses to high-stakes testing and the validity of gains: A pilot study. CSE Technical Report 610. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D. (2003). Using multiple measures to address perverse incentives and score inflation. Educational Measurement: Issues and Practice, 22(2), 18-26.

Wainer, H., and Koretz, D. (2003). A political statistic. Chance, 16(4), 45-47.

McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., and Hamilton, L. S. (2003). Evaluating Value-Added Models for Teacher Accountability. Santa Monica: RAND, MG-158-EDU.

Koretz, D., Russell, M., Shin, D., Horn, C., and Shasby, K. (2002). Testing and diversity in postsecondary education: The case of California. Education Policy Analysis Archives, 10(1) (January). http://eric.ed.gov/?id=EJ658443

Koretz, D. (2002). Exit tests and accountability at the high-school level. In The New Challenge for Public Education: Secondary School Reform—Designs, Standards, and Accountability. The Aspen Institute Congressional Program, vol. 17, no. 2. Washington: Author, 39-48.

Koretz, D. (2002). Limitations in the use of achievement tests as measures of educators' productivity. In E. Hanushek, J. Heckman, and D. Neal (Eds.), Designing Incentives to Promote Human Capital. Special issue of The Journal of Human Resources, 37(4, Fall), 752-777.

Baker, E. L., Linn, R. L., Herman, J. L., and Koretz, D. (2002). Standards for educational accountability systems (CRESST Policy Brief 5). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing. http://www.cse.ucla.edu/products/policy/cresst_policy5.pdf

Koretz, D., McCaffrey, D., and Hamilton, L. (2001). Toward a Framework for Validating Gains Under High-Stakes Conditions. CSE Technical Report 551. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D., and Berends, M. (2001). Changes in High School Grading Standards in Mathematics, 1982-1992. Santa Monica: RAND (MR-1445-CB).

Koretz, D., McCaffrey, D., and Sullivan, T. (2001). Using TIMSS to Analyze Correlates of Performance Variation in Mathematics. Working Paper No. 2001-095. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Koretz, D., McCaffrey, D., and Sullivan, T. (2001). Predicting variations in mathematics performance in four countries using TIMSS. Education Policy Analysis Archives, 9(34) (September). http://epaa.asu.edu/ojs/article/viewFile/363/489

Blue Ribbon Panel on the New York Performance Standards Consortium (Everson, H. T., Koretz, D. M., Linn, R. L., Phillips, S. E., Qualls, A. L., and Stake, R.) (2001). New York Performance Standards Consortium Schools’ Alternative Assessment Systems: An Evaluation Report. New York City: Author.

Koretz, D. (2000). The Impact of Score Differences on the Admission of Minority Students: An Illustration. Chestnut Hill, MA: National Board on Educational Testing and Public Policy. NBETPP Statements, 1(5), June.

Koretz, D., and Hamilton, L. (2000). Assessment of students with disabilities in Kentucky: Inclusion, student performance, and validity. Educational Evaluation and Policy Analysis, 22(3), 255-272.

National Research Council, Committee on Equivalency and Linkage of Educational Tests (1999) (incl. D. Koretz). Uncommon Measures: Equivalence and Linkage Among Educational Tests. Washington, DC: National Academy Press.

Koretz, D., and Hamilton, L. (1999). Assessing Students with Disabilities in Kentucky: The Effects of Accommodations, Format, and Subject. CSE Technical Report 498. Los Angeles: Center for the Study of Evaluation, University of California.

Koretz, D. M., Bertenthal, M. W., and Green, B. (Eds.) (1999). Embedding Common Test Items in State and District Assessments. (National Research Council, Committee on Embedding Common Test Items in State and District Assessments.) Washington: National Academy Press.

Koretz, D. (1998). Large-scale portfolio assessments in the US: Evidence pertaining to the quality of measurement. In D. Koretz, A. Wolf, and P. Broadfoot (Eds.), Records of Achievement. Special issue of Assessment in Education, 5(3), 309-334. Reprinted in W. Harlen (Ed.) (2008), Student Assessment and Learning. London: Sage.

Koretz, D., and Barron, S. I. (1998). The Validity of Gains on the Kentucky Instructional Results Information System (KIRIS). MR-1014-EDU. Santa Monica: RAND.

Associations

Fellow of the American Educational Research Association (2010-present)

New York State Department of Education Technical Advisory Group (2010-present)

Kentucky National Technical Advisory Panel on Assessment and Accountability (2009-present)

National Academy of Education (2008-present)

Faculty Associate and Member of the Steering Committee, Institute for Quantitative Social Science, Harvard University (2006-present)

Assessment Technical Advisory Group, New York State Education Department (1996-present)

National Council on Measurement in Education (1988-present)

American Educational Research Association (1986-present)

Research Advisory Board, Centre for Evaluation and Monitoring, Durham University, England (2012-2015)

Founder and Chair, International Project for the Study of Educational Accountability (2008-2012)

National Research Council, Committee on Incentives and Test-Based Accountability in Education (2010-2011)

Board of Directors, Graduate Record Examination (2007-2011)

Program Chair, American Educational Research Association (2006-2007)

Board on Testing and Assessment, National Research Council (1999-2006)

Blue Ribbon Panel, NY Performance Assessment Consortium (2000-2001)

Advisory Panel on Research and Development, The College Entrance Examination Board (1996-1999)

National Research Council, Committee on Goals 2000 and the Inclusion of Students with Disabilities (1995-1997)

Chair, National Research Council, Committee on Embedding Common Test Items in State and District Assessments (1999-)

National Research Council, Committee on Equivalency and Linkage of Educational Tests (1998-)
