Chapter 61 Using Randomization in Development Economics Research: A Toolkit★
Introduction
Randomization is now an integral part of a development economist's toolbox. Over the last ten years, a growing number of randomized evaluations have been conducted by economists or with their input. These evaluations, on topics as diverse as the effect of school inputs on learning (Glewwe and Kremer, 2006), the adoption of new technologies in agriculture (Duflo, Kremer and Robinson, 2006), corruption in the administration of driving licenses (Bertrand et al., 2006), or moral hazard and adverse selection in consumer credit markets (Karlan and Zinman, 2005b), have attempted to answer important policy questions and have also been used by economists as a testing ground for their theories.
Unlike the early “social experiments” conducted in the United States – with their large budgets, large teams, and complex implementations – many of the randomized evaluations that have been conducted in recent years in developing countries have had fairly small budgets, making them affordable for development economists. Working with local partners on a smaller scale has also given more flexibility to researchers, who can often influence program design. As a result, randomized evaluation has become a powerful research tool.
While research involving randomization still represents a small proportion of work in development economics, there is now a considerable body of theoretical knowledge and practical experience on how to run these projects. In this chapter, we attempt to draw together in one place the main lessons of this experience and provide a reference for researchers planning to conduct such projects. The chapter thus provides practical guidance on how to conduct, analyze, and interpret randomized evaluations in developing countries and on how to use such evaluations to answer questions about economic behavior.
This chapter is not a review of research using randomization in development economics.1 Nor is its main purpose to justify the use of randomization as a complement or substitute to other research methods, although we touch upon these issues along the way.2 Rather, it is a practical guide, a “toolkit,” which we hope will be useful to those interested in including randomization as part of their research design.
The outline to the chapter is as follows. In Section 2, we use the now standard “potential outcome” framework to discuss how randomized evaluations overcome a number of the problems endemic to retrospective evaluation. We focus on the issue of selection bias, which arises when individuals or groups are selected for treatment based on characteristics that may also affect their outcomes and makes it difficult to disentangle the impact of the treatment from the factors that drove selection. This problem is compounded by a natural publication bias towards retrospective studies that support prior beliefs and present statistically significant results. We discuss how carefully constructed randomized evaluations address these issues.
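The selection problem described here can be seen in a small simulation. The setup below is purely hypothetical (a tutoring program whose take-up rises with an unobserved "motivation" that also raises test scores); it is a sketch of the logic, not an estimate from any real study.

```python
import random

random.seed(0)

# Hypothetical setup: each child has an unobserved "motivation" that raises
# test scores AND (under self-selection) the chance of joining the program.
# The program's true effect on scores is +5 points.
TRUE_EFFECT = 5.0
N = 100_000

def simulate(randomized: bool) -> float:
    """Return the naive treated-minus-control difference in mean scores."""
    treated_scores, control_scores = [], []
    for _ in range(N):
        motivation = random.gauss(0, 1)
        if randomized:
            treated = random.random() < 0.5  # coin-flip assignment
        else:
            # Self-selection: motivated children are more likely to enroll.
            treated = random.random() < 0.5 + 0.3 * (motivation > 0)
        score = (50 + 10 * motivation
                 + (TRUE_EFFECT if treated else 0)
                 + random.gauss(0, 5))
        (treated_scores if treated else control_scores).append(score)
    return (sum(treated_scores) / len(treated_scores)
            - sum(control_scores) / len(control_scores))

naive_gap = simulate(randomized=False)        # biased well above TRUE_EFFECT
experimental_gap = simulate(randomized=True)  # close to TRUE_EFFECT
```

Under self-selection the naive comparison attributes part of the motivation gap to the program; under random assignment the two groups have the same distribution of motivation in expectation, so the simple difference in means recovers the true effect.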
In Section 3, we discuss how randomization can be introduced in the field. Which partners should one work with? How can pilot projects be used? What are the various ways in which randomization can be introduced in an ethically and politically acceptable manner?
In Section 4, we discuss how researchers can affect the power of the design, i.e., the probability of detecting a statistically significant effect. How should sample sizes be chosen? How do the level of randomization, the availability of control variables, and the possibility of stratifying affect power?
In Section 5, we discuss practical design choices researchers will face when conducting randomized evaluation: At what level to randomize? What are the pros and cons of factorial designs? When and what data to collect?
In Section 6 we discuss how to analyze data from randomized evaluations when there are departures from the simplest basic framework. We review how to handle different probabilities of selection across groups, imperfect compliance, and externalities.
In Section 7 we discuss how to accurately estimate the precision of estimated treatment effects when the data are grouped and when multiple outcomes or subgroups are being considered. Finally, in Section 8, we conclude by discussing some of the issues involved in drawing general conclusions from randomized evaluations, including the necessary use of theory as a guide when designing evaluations and interpreting results.
Section snippets
The problem of causal inference
Any attempt to answer a causal question such as "What is the causal effect of education on fertility?" or "What is the causal effect of class size on learning?" requires answering essentially counterfactual questions: How would individuals who participated in a program have fared in the absence of the program? How would those who were not exposed to the program have fared in its presence? The difficulty with these questions is immediate. At a given point in time, an …
Incorporating randomized evaluation in a research design
In the rest of this chapter, we discuss how randomized evaluations can be carried out in practice. In this section, we focus on how researchers can introduce randomization in field research in developing countries. Perhaps the most widely used model of randomized research is that of clinical trials conducted by researchers working in laboratory conditions or with close supervision. While there are examples of research following similar templates in developing countries,8 …
Sample size, design, and the power of experiments
The power of the design is the probability that, for a given effect size and a given statistical significance level, we will be able to reject the hypothesis of zero effect. Sample sizes, as well as other design choices, will affect the power of an experiment.
This section does not intend to provide a full treatment of the question of statistical power or the theory of the design of experiments.12 …
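The basic power and minimum-detectable-effect calculations discussed in this section (the MDE formula is the one popularized by Bloom, cited below) can be sketched as follows. This is a normal-approximation sketch for a two-arm, individual-level design with a two-sided test at α = 0.05 and (for the MDE) 80% power; it is not a substitute for a full power calculation.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power(effect: float, sigma: float, n_treat: int, n_control: int) -> float:
    """Approximate power of a two-sided test at alpha = 0.05,
    for a true effect `effect` and outcome standard deviation `sigma`."""
    se = sigma * math.sqrt(1 / n_treat + 1 / n_control)
    z_crit = 1.96  # z_{1 - alpha/2} for alpha = 0.05
    return normal_cdf(effect / se - z_crit)

def mde(sigma: float, n_treat: int, n_control: int) -> float:
    """Minimum detectable effect at 80% power, alpha = 0.05 (two-sided):
    MDE = (z_{1-alpha/2} + z_{power}) * SE."""
    z_crit, z_power = 1.96, 0.84
    se = sigma * math.sqrt(1 / n_treat + 1 / n_control)
    return (z_crit + z_power) * se
```

For example, with 200 units per arm and outcomes measured in standard-deviation units (`sigma = 1`), a true effect of 0.2 standard deviations yields power of only about one half, and the MDE is about 0.28 standard deviations.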
Practical design and implementation issues
This section discusses various design and implementation issues faced by those conducting randomized evaluations. We begin with the choice of randomization level. Should one randomize over individuals or some larger group? We then discuss cross-cutting designs that test multiple treatments simultaneously within the same sample. Finally, we address some data collection issues.
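As an illustration of one design choice mentioned in this section, stratified (blocked) randomization within groups can be sketched as below. The function name and the stratum labels are hypothetical; the point is that randomizing separately within each stratum guarantees treatment/control balance on the stratifying variable.

```python
import random

def stratified_assignment(units, stratum_of, seed=0):
    """Randomly assign half of each stratum to treatment.

    `units` is a list of unit ids; `stratum_of` maps each id to a stratum
    label (e.g., region or baseline-score quartile)."""
    rng = random.Random(seed)
    strata = {}
    for u in units:
        strata.setdefault(stratum_of[u], []).append(u)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, u in enumerate(members):
            assignment[u] = "treatment" if i < half else "control"
    return assignment
```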
Analysis with departures from perfect randomization
This section discusses potential threats to the internal validity of randomized evaluation designs, and ways to either eliminate them ex ante or handle them ex post in the analysis. Specifically, we discuss how to analyze data when the probability of selection depends on the strata; analysis of randomized evaluations with imperfect compliance; externalities; and attrition.
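With imperfect compliance, the effect of treatment on compliers is commonly recovered with the Wald/IV estimator: the intention-to-treat effect on the outcome divided by the effect of assignment on take-up. A minimal sketch, assuming binary assignment and binary take-up:

```python
def wald_estimate(assigned, treated, outcome):
    """Wald/IV estimate of the local average treatment effect:
    (ITT effect on the outcome) / (ITT effect on take-up).

    `assigned` and `treated` are 0/1 lists (random assignment and actual
    take-up); `outcome` is a numeric list of the same length."""
    def mean(xs):
        return sum(xs) / len(xs)
    y1 = mean([y for z, y in zip(assigned, outcome) if z == 1])
    y0 = mean([y for z, y in zip(assigned, outcome) if z == 0])
    d1 = mean([d for z, d in zip(assigned, treated) if z == 1])
    d0 = mean([d for z, d in zip(assigned, treated) if z == 0])
    return (y1 - y0) / (d1 - d0)
```

For instance, if 60% of the assigned group takes up the treatment, no one in the control group does, and the treatment raises the outcome by 10 for every taker, the ITT effect is 6 and the Wald estimate recovers the per-complier effect of 10.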
Inference issues
This section discusses a number of the key issues related to conducting valid inference from randomized evaluations. We begin by returning to the issue of group data, addressing how to compute standard errors that account for the grouped structure. We then consider the situation when researchers are interested in assessing a program's impact on several (possibly related) outcome variables. We next turn to evaluating heterogeneous treatment effects across population subgroups, and finally discuss …
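For grouped data, a back-of-the-envelope correction inflates the naive i.i.d. standard error by the Moulton design effect for equal-sized clusters; the sketch below shows that adjustment (full cluster-robust inference would of course use the micro-data, not just this correction):

```python
import math

def clustered_se(naive_se, cluster_size, icc):
    """Inflate a naive (i.i.d.) standard error by the design effect for
    equal-sized clusters: DEFF = 1 + (m - 1) * rho, where m is the cluster
    size and rho the intra-cluster correlation of the outcome."""
    deff = 1 + (cluster_size - 1) * icc
    return naive_se * math.sqrt(deff)
```

Even a modest intra-cluster correlation matters: with clusters of 40 and rho = 0.05, the standard error is inflated by a factor of sqrt(2.95), roughly 1.7, which is why randomizing at the group level typically requires much larger samples.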
External validity and generalizing randomized evaluations
Up until now we have mainly focused on issues of internal validity, i.e., whether we can conclude that the measured impact is indeed caused by the intervention in the sample. In this section we discuss external validity – whether the impact we measure would carry over to other samples or populations. In other words, whether the results are generalizable and replicable. While internal validity is necessary for external validity, it is not sufficient. This question has received a lot of attention …
References (116)
- et al., "A review of estimates of the schooling/earnings relationship," Labour Economics (1999)
- "The causal effect of education on earnings"
- et al., "Retrospective vs. prospective analyses of school inputs: The case of flip charts in Kenya," Journal of Development Economics (2004)
- "Schooling as experimentation: A reappraisal of the postsecondary dropout phenomenon," Economics of Education Review (1989)
- "Lifetime earnings and the Vietnam era draft lottery: Evidence from social security administrative records," American Economic Review (1990)
- "Estimating the labor market impact of voluntary military service using social security data on military applicants," Econometrica (1998)
- et al., "Long-term educational consequences of secondary school vouchers: Evidence from administrative records in Colombia," American Economic Review (2006)
- et al., "Identification and estimation of local average treatment effects," Econometrica (1994)
- et al., "Two-stage least squares estimation of average causal effects in models with variable treatment intensity," Journal of the American Statistical Association (1995)
- et al., "Identification of causal effects using instrumental variables," Journal of the American Statistical Association (1996)
- "Vouchers for private schooling in Colombia: Evidence from a randomized natural experiment," American Economic Review
- "Tying Odysseus to the mast: Evidence from a commitment savings product in the Philippines," Quarterly Journal of Economics
- "New development economics and the challenge to theory," Economic and Political Weekly
- "Addressing absence," Journal of Economic Perspectives
- "Remedying education: Evidence from two randomized experiments in India," Quarterly Journal of Economics
- "What's psychology worth? A field experiment in the consumer credit market"
- "Minimum detectable effects: A simple way to report the statistical power of experimental designs," Evaluation Review
- "Learning more from social experiments"
- "Reforms as experiments," American Psychologist
- "Myth and Measurement: The New Economics of the Minimum Wage"
- "Women as policy makers: Evidence from a randomized policy experiment in India," Econometrica
- "Statistical Power Analysis for the Behavioral Sciences"
- "Theory of the Design of Experiments"
- "The Analysis of Household Surveys"
- "Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs," Journal of the American Statistical Association
- "Are all economic hypotheses false?" Journal of Political Economy
- "An assessment of propensity score matching as a non-experimental impact estimator: Evidence from Mexico's Progresa program," Journal of Human Resources
- "Effect of treatment for intestinal helminth infection on growth and cognitive performance in children: Systematic review of randomized trials," British Medical Journal
- "Accelerating development"
- "Use of randomization in the evaluation of development effectiveness"
- "How much should we trust difference-in-differences estimates?" Quarterly Journal of Economics
- "The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment," Quarterly Journal of Economics
- "The arrangement of field experiments," Journal of the Ministry of Agriculture
★ We thank the editor T. Paul Schultz, as well as Abhijit Banerjee, Guido Imbens and Jeffrey Kling for extensive discussions; David Clingingsmith, Greg Fischer, Trang Nguyen and Heidi Williams for outstanding research assistance; and Paul Glewwe and Emmanuel Saez, whose previous collaboration with us inspired parts of this chapter.