Chapter 61 Using Randomization in Development Economics Research: A Toolkit

https://doi.org/10.1016/S1573-4471(07)04061-2

Abstract

This paper is a practical guide (a toolkit) for researchers, students and practitioners wishing to introduce randomization as part of a research design in the field. It first covers the rationale for the use of randomization, as a solution to selection bias and a partial solution to publication bias. Second, it discusses various ways in which randomization can be practically introduced in field settings. Third, it discusses design issues such as sample size requirements, stratification, the level of randomization and data collection methods. Fourth, it discusses how to analyze data from randomized evaluations when there are departures from the basic framework. It reviews in particular how to handle imperfect compliance and externalities. Finally, it discusses some of the issues involved in drawing general conclusions from randomized evaluations, including the necessary use of theory as a guide when designing evaluations and interpreting results.

Introduction

Randomization is now an integral part of a development economist's toolbox. Over the last ten years, a growing number of randomized evaluations have been conducted by economists or with their input. These evaluations, on topics as diverse as the effect of school inputs on learning (Glewwe and Kremer, 2006), the adoption of new technologies in agriculture (Duflo, Kremer and Robinson, 2006), corruption in the administration of driving licenses (Bertrand et al., 2006), or moral hazard and adverse selection in consumer credit markets (Karlan and Zinman, 2005b), have attempted to answer important policy questions and have also been used by economists as a testing ground for their theories.

Unlike the early “social experiments” conducted in the United States – with their large budgets, large teams, and complex implementations – many of the randomized evaluations that have been conducted in recent years in developing countries have had fairly small budgets, making them affordable for development economists. Working with local partners on a smaller scale has also given more flexibility to researchers, who can often influence program design. As a result, randomized evaluation has become a powerful research tool.

While research involving randomization still represents a small proportion of work in development economics, there is now a considerable body of theoretical knowledge and practical experience on how to run these projects. In this chapter, we attempt to draw together in one place the main lessons of this experience and provide a reference for researchers planning to conduct such projects. The chapter thus provides practical guidance on how to conduct, analyze, and interpret randomized evaluations in developing countries and on how to use such evaluations to answer questions about economic behavior.

This chapter is not a review of research using randomization in development economics.1 Nor is its main purpose to justify the use of randomization as a complement to, or substitute for, other research methods, although we touch upon these issues along the way.2 Rather, it is a practical guide, a "toolkit," which we hope will be useful to those interested in including randomization as part of their research design.

The outline of the chapter is as follows. In Section 2, we use the now standard "potential outcome" framework to discuss how randomized evaluations overcome a number of the problems endemic to retrospective evaluation. We focus on the issue of selection bias, which arises when individuals or groups are selected for treatment based on characteristics that may also affect their outcomes, making it difficult to disentangle the impact of the treatment from the factors that drove selection. This problem is compounded by a natural publication bias towards retrospective studies that support prior beliefs and present statistically significant results. We discuss how carefully constructed randomized evaluations address these issues.

In Section 3, we discuss how randomization can be introduced in the field. Which partners should one work with? How can pilot projects be used? What are the various ways in which randomization can be introduced in an ethically and politically acceptable manner?

In Section 4, we discuss how researchers can affect the power of the design, that is, the chance of arriving at statistically significant conclusions. How should sample sizes be chosen? How do the level of randomization, the availability of control variables, and the possibility of stratifying affect power?

In Section 5, we discuss practical design choices researchers will face when conducting randomized evaluations: At what level should one randomize? What are the pros and cons of factorial designs? When and what data should be collected?

In Section 6, we discuss how to analyze data from randomized evaluations when there are departures from the basic framework. We review how to handle different probabilities of selection across groups, imperfect compliance, and externalities.

In Section 7, we discuss how to accurately estimate the precision of estimated treatment effects when the data are grouped and when multiple outcomes or subgroups are being considered. Finally, in Section 8, we conclude by discussing some of the issues involved in drawing general conclusions from randomized evaluations, including the necessary use of theory as a guide when designing evaluations and interpreting results.


The problem of causal inference

Any attempt to answer a causal question such as "What is the causal effect of education on fertility?" or "What is the causal effect of class size on learning?" requires answering essentially counterfactual questions: How would individuals who participated in a program have fared in the absence of the program? How would those who were not exposed to the program have fared in the presence of the program? The difficulty with these questions is immediate. At a given point in time, an individual is either exposed to the program or not, so only one of these potential outcomes can be observed.
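To fix ideas, in the standard potential outcomes notation (a notational sketch in conventional notation, not necessarily the chapter's own), the naive comparison between participants and non-participants decomposes as

E[Y_i | T_i = 1] − E[Y_i | T_i = 0]
    = E[Y_i(1) − Y_i(0) | T_i = 1]                        (effect of the treatment on the treated)
    + { E[Y_i(0) | T_i = 1] − E[Y_i(0) | T_i = 0] }        (selection bias),

where Y_i(1) and Y_i(0) are the potential outcomes with and without the program and T_i indicates participation. Random assignment makes T_i independent of the potential outcomes, so the selection bias term is zero in expectation.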

Incorporating randomized evaluation in a research design

In the rest of this chapter, we discuss how randomized evaluations can be carried out in practice. In this section, we focus on how researchers can introduce randomization in field research in developing countries. Perhaps the most widely used model of randomized research is that of clinical trials conducted by researchers working in laboratory conditions or with close supervision. While there are examples of research following similar templates in developing countries,8

Sample size, design, and the power of experiments

The power of the design is the probability that, for a given effect size and a given statistical significance level, we will be able to reject the hypothesis of zero effect. Sample sizes, as well as other design choices, will affect the power of an experiment.

This section does not intend to provide a full treatment of the question of statistical power or the theory of the design of experiments.12
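As a concrete illustration, the sketch below (ours, not the chapter's; the parameter values and the normal approximation are illustrative assumptions) computes the power and the minimum detectable effect for a two-arm, individually randomized design with equal allocation:

from scipy.stats import norm

def power(effect_size, n_per_arm, sigma=1.0, alpha=0.05):
    # Probability of rejecting H0: effect = 0 with a two-sided test of size alpha,
    # given the true standardized effect size and the sample size per arm.
    se = sigma * (2.0 / n_per_arm) ** 0.5            # standard error of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)                 # two-sided critical value
    return 1 - norm.cdf(z_crit - effect_size / se)   # ignores the (negligible) lower rejection region

def minimum_detectable_effect(n_per_arm, sigma=1.0, alpha=0.05, target_power=0.8):
    # Smallest true effect detectable with the target power under the normal approximation.
    se = sigma * (2.0 / n_per_arm) ** 0.5
    return (norm.ppf(1 - alpha / 2) + norm.ppf(target_power)) * se

print(power(effect_size=0.2, n_per_arm=200))         # power to detect a 0.2 s.d. effect with 200 per arm
print(minimum_detectable_effect(n_per_arm=200))      # minimum detectable effect at 80 percent power

When randomization is at the group rather than the individual level, the same logic applies after inflating the variance of the estimated difference by the design effect 1 + (m − 1)ρ, where m is the cluster size and ρ the intra-cluster correlation.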

Practical design and implementation issues

This section discusses various design and implementation issues faced by those conducting randomized evaluations. We begin with the choice of randomization level. Should one randomize over individuals or some larger group? We then discuss cross-cutting designs that test multiple treatments simultaneously within the same sample. Finally, we address some data collection issues.
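For example, a minimal sketch of one common implementation choice, randomizing at the cluster (e.g., school) level within strata, with a second cross-cutting treatment, might look as follows (the data frame and column names are illustrative assumptions, not from the chapter):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=12345)  # fix the seed so the assignment is reproducible

# Hypothetical sampling frame: 40 schools spread across 4 districts (the strata).
schools = pd.DataFrame({
    "school_id": range(40),
    "district": np.repeat(["A", "B", "C", "D"], 10),
})

def assign_within_strata(df, strata_col, n_arms=2):
    # Randomly assign clusters to arms, keeping arm sizes balanced within each stratum.
    assignments = []
    for _, group in df.groupby(strata_col):
        arms = np.resize(np.arange(n_arms), len(group))  # as balanced as the stratum size allows
        assignments.append(pd.Series(rng.permutation(arms), index=group.index))
    return pd.concat(assignments).sort_index()

schools["treatment_1"] = assign_within_strata(schools, "district")   # first intervention
schools["treatment_2"] = assign_within_strata(schools, "district")   # cross-cutting second intervention
print(pd.crosstab(schools["treatment_1"], schools["treatment_2"]))   # roughly balanced 2x2 design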

Analysis with departures from perfect randomization

This section discusses potential threats to the internal validity of randomized evaluation designs, and ways to either eliminate them ex ante or handle them ex post in the analysis. Specifically, we discuss how to analyze data when the probability of selection depends on the strata; analysis of randomized evaluations with imperfect compliance; externalities; and attrition.
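To illustrate the imperfect-compliance case, here is a minimal sketch (ours, with simulated data; the variable names are assumptions) of the Wald estimator, which scales the intent-to-treat effect on the outcome by the effect of assignment on take-up and, under the usual instrumental-variables assumptions, recovers the average effect for compliers:

import numpy as np

def wald_estimate(y, d, z):
    # Ratio of intent-to-treat effects: assignment's effect on the outcome over its effect on take-up.
    itt_outcome = y[z == 1].mean() - y[z == 0].mean()   # intent-to-treat effect on the outcome
    itt_takeup = d[z == 1].mean() - d[z == 0].mean()    # first stage: effect of assignment on take-up
    return itt_outcome / itt_takeup                     # local average treatment effect for compliers

# Illustrative fake data: assignment z raises take-up d from 10% to 60%; treatment raises y by 1.
rng = np.random.default_rng(0)
n = 10_000
z = rng.integers(0, 2, n)
d = (rng.random(n) < np.where(z == 1, 0.6, 0.1)).astype(int)
y = 1.0 * d + rng.normal(size=n)
print(wald_estimate(y, d, z))  # should be close to 1.0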

Inference issues

This section discusses a number of the key issues related to conducting valid inference from randomized evaluations. We begin by returning to the issue of grouped data, addressing how to compute standard errors that account for the grouped structure. We then consider the situation in which researchers are interested in assessing a program's impact on several (possibly related) outcome variables. We next turn to evaluating heterogeneous treatment effects across population subgroups, and finally discuss
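As a sketch of the first of these issues, the snippet below (ours, not the chapter's; the simulated data and column names are illustrative) estimates a treatment effect with standard errors clustered at the level at which the treatment was randomized, using statsmodels' cluster-robust covariance option:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_villages, n_per_village = 50, 20
village = np.repeat(np.arange(n_villages), n_per_village)
treated = np.repeat(rng.integers(0, 2, n_villages), n_per_village)           # treatment assigned at the village level
village_effect = np.repeat(rng.normal(scale=0.5, size=n_villages), n_per_village)
y = 0.3 * treated + village_effect + rng.normal(size=n_villages * n_per_village)
df = pd.DataFrame({"y": y, "treated": treated, "village": village})

# OLS of the outcome on treatment, with standard errors clustered at the level of randomization.
model = smf.ols("y ~ treated", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["village"]}
)
print(model.summary().tables[1])  # clustered standard errors are noticeably larger than the naive ones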

External validity and generalizing randomized evaluations

Up until now we have mainly focused on issues of internal validity, i.e., whether we can conclude that the measured impact was indeed caused by the intervention in the sample studied. In this section we discuss external validity: whether the impact we measure would carry over to other samples or populations, in other words, whether the results are generalizable and replicable. While internal validity is necessary for external validity, it is not sufficient. This question has received a lot of attention

References (116)

  • Angrist, J.D., et al. (2002). "Vouchers for private schooling in Colombia: Evidence from a randomized natural experiment". American Economic Review.
  • Ashraf, N., et al. (2006). "Tying Odysseus to the mast: Evidence from a commitment savings product in the Philippines". Quarterly Journal of Economics.
  • Attanasio, O., Meghir, C., Santiago, A. (2005). "Education choices in Mexico: Using a structural model and a randomised...
  • Banerjee, A. (2006). "New development economics and the challenge to theory". Economic and Political Weekly.
  • Banerjee, A., et al. (2006). "Addressing absence". Journal of Economic Perspectives.
  • Banerjee, A., Bardhan, P., Basu, K., Kanbur, R., Mookherjee, D. (2005). "New directions in development economics:...
  • Banerjee, A., et al. (2007). "Remedying education: Evidence from two randomized experiments in India". Quarterly Journal of Economics.
  • Bardhan, P. (2005). "Theory or empirics in development economics". Mimeo. University of California at...
  • Basu, K. (2005). "The new empirical development economics: Remarks on its philosophical foundations". Economic and...
  • Bertrand, M., et al. "What's psychology worth? A field experiment in the consumer credit market".
  • Bertrand, M., Djankov, S., Hanna, R., Mullainathan, S. (2006). "Does corruption produce unsafe drivers?" Working paper...
  • Bhushan, I., Keller, S., Schwartz, B. (2002). "Achieving the twin objectives of efficiency and equity: Contracting...
  • Bloom, E., Bhushan, I., Clingingsmith, D., Hung, R., King, E., Kremer, M., Loevinsohn, B., Schwartz, B. (2006)....
  • Bloom, H.S. (1995). "Minimum detectable effects: A simple way to report the statistical power of experimental designs". Evaluation Review.
  • Bloom, H.S. "Learning more from social experiments".
  • Bobonis, G.J., Miguel, E., Sharma, C.P. (2004). "Iron deficiency anemia and school participation". Paper No. 7. Poverty...
  • Buddlemeyer, H., Skofias, E. (2003). "An evaluation on the performance of regression discontinuity design on progresa"....
  • Campbell, D.T. (1969). "Reforms as experiments". American Psychologist.
  • Card, D., et al. (1995). Myth and Measurement: The New Economics of the Minimum Wage.
  • Chattopadhyay, R., et al. (2004). "Women as policy makers: Evidence from a randomized policy experiment in India". Econometrica.
  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences.
  • Cook, T.D., Shadish, W.R., Wong, V.C. (2006). "Within study comparisons of experiments and non-experiments: Can they...
  • Cox, D., et al. (2000). Theory of the Design of Experiments.
  • Das, J., Krishnan, P., Habyarimana, J., Dercon, S. (2004). "When can school inputs improve test scores?" Working paper...
  • Deaton, A. (1997). The Analysis of Household Surveys.
  • Dehejia, R.H., et al. (1999). "Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs". Journal of the American Statistical Association.
  • DeLong, J.B., et al. (1992). "Are all economic hypotheses false?" Journal of Political Economy.
  • Diaz, J.J., et al. (2006). "An assessment of propensity score matching as a non-experimental impact estimator: Evidence from Mexico's Progresa program". Journal of Human Resources.
  • Dickson, R., et al. (2000). "Effect of treatment for intestinal helminth infection on growth and cognitive performance in children: Systematic review of randomized trials". British Medical Journal.
  • Donald, S., Lang, K. (2001). "Inference with differences-in-differences and other panel data". Discussion paper. Boston...
  • Duflo, E. "Accelerating development".
  • Duflo, E. (2006). "Field experiments in development economics". Discussion...
  • Duflo, E., Hanna, R. (2006). "Monitoring works: Getting teachers to come to school". Working paper No. 11880....
  • Duflo, E., et al. "Use of randomization in the evaluation of development effectiveness".
  • Duflo, E., Kremer, M., Robinson, J. (2006). "Understanding technology adoption: Fertilizer in western Kenya,...
  • Duflo, E., et al. (2004). "How much should we trust difference-in-differences estimates?" Quarterly Journal of Economics.
  • Duflo, E., et al. (2003). "The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment". Quarterly Journal of Economics.
  • Duflo, E., Dupas, P., Kremer, M., Sinei, S. (2006). "Education and HIV/AIDS prevention: Evidence from a randomized...
  • Dupas, P. (2006). "Relative risks and the market for sex: Teenagers, sugar daddies, and HIV in Kenya". Mimeo, Dartmouth...
  • Fisher, R.A. (1926). "The arrangement of field experiments". Journal of the Ministry of Agriculture.

We thank the editor T. Paul Schultz, as well as Abhijit Banerjee, Guido Imbens and Jeffrey Kling for extensive discussions; David Clingingsmith, Greg Fischer, Trang Nguyen and Heidi Williams for outstanding research assistance; and Paul Glewwe and Emmanuel Saez, whose previous collaboration with us inspired parts of this chapter.
