Abstract
Systematic reviews summarize evidence about the effects of social interventions on crime, health, education, and social welfare. Social scientists should also use systematic reviews to study risk factors, which are naturally occurring predictors of these outcomes. To do this, the quality of risk factor research needs to be evaluated. This paper presents three new methodological quality checklists to identify high-quality risk factor research. They are designed so that reviewers can separately summarize the best evidence about correlates, risk factors, and causal risk factors. Studies need appropriate samples and measures to draw valid conclusions about correlates. Studies need prospective longitudinal data to draw valid conclusions about risk factors. And, in the absence of experimental evidence, controlled studies need to compare changes in risk factors over time with changes in outcomes to draw valid conclusions about causal risk factors.
Notes
By ‘observational’ we do not mean research based on systematic observation. We mean research on naturally occurring events and experiences, without experimental manipulation of participants.
Factor is another word for variable. Factors can be measured on a continuous or interval scale, for example, intelligence quotient (IQ) scores. They can be nominal, with several categories, for example, social class. Factors can also be dichotomous, for example, living in a ‘broken’ or ‘intact’ home.
A distinction can be made between the time that a variable is measured and the time period that it refers to. For example, a measure of self-reported delinquency administered at age 18 years may refer to offending behavior between the ages of 14 years and 18 years. This would mean that, for example, joining a gang at age 16 years may not strictly be a risk factor for self-reported delinquency measured at age 18 years. The relevant time dimension of a variable is not when it was measured but the time period that it refers to.
Some commentators argue that each score derived from a scale should only reflect one underlying construct (e.g., statistical conclusion validity). However, the aim of a quality scale is not to measure a common underlying trait but, rather, to measure a common effect—confidence that the study results are accurate.
We use some simple quality cut-offs to make the checklist easy to use and understand, although we recognize that they do not capture all issues of methodological quality. For example, a measurement scale can be made more reliable by using many items, excluding inverse items, and presenting all items in the same sequence, although this would result in a poorer quality measure.
Some people argue that fixed risk factors, like a person’s gender, cannot be causal because they cannot change within an individual over time (Holland 1986; Kraemer et al. 2005). Others disagree, based on a counterfactual view of causality (Glymour 1986) and possible mechanisms linking fixed characteristics and behavior (Rutter 2003b). The resolution to this debate lies in philosophical issues beyond the scope of this paper. However, we note that, even if fixed risk factors could be causal, the fact that they are fixed precludes empirical tests of whether changing them causes changes in outcomes. Therefore, for most practical purposes, fixed risk factors cannot be shown to be causal and cannot be targeted in interventions (see, e.g., Farrington 1988).
As well as comparing risk-exposed individuals with an unexposed comparison group, researchers can study variation in risk exposure on an ordinal, interval or continuous scale. A completely unexposed ‘control group’ is not needed in risk research, so long as there is variation in risk exposure that can be compared with variation in the outcome. Hence, when we refer to studies needing to include comparison groups, this condition can also be met by the use of risk variables that capture variation in the level of risk exposure (for example, IQ scores, levels of social class, or intervals of family income).
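As a hedged illustration of this point, the following Python sketch (simulated data; the variable names and numbers are hypothetical, not taken from the paper) estimates a risk–outcome association from variation in exposure alone, by correlating a continuous risk variable with the outcome, with no unexposed control group:

```python
# Hypothetical sketch (simulated data): an association can be estimated from
# variation in the level of risk exposure alone; no unexposed control group
# is needed, only variation in exposure to compare with variation in outcome.
import math
import random

random.seed(2)

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Simulated continuous risk variable (IQ-like scores) and an outcome that
# depends weakly on it; every participant has some level of exposure.
risk = [random.gauss(100, 15) for _ in range(200)]
outcome = [0.02 * r + random.gauss(0, 1) for r in risk]

print(f"r = {pearson_r(risk, outcome):.2f}")
```

The same logic applies to ordinal risk variables such as levels of social class or intervals of family income: what matters is that variation in exposure can be compared with variation in the outcome.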
Most of the methodological findings that we cite here come from intervention research and may or may not be replicated in risk factor research. We encourage work on similar methodological issues (such as the relative validity of one group before–after studies and control group studies) in risk factor research, to evaluate whether the same findings would apply to risk factor research.
Sometimes researchers test whether risk-exposed and comparison groups are similar on covariates without matching. If it is demonstrated that risk-exposed and comparison groups are similar on covariates, the groups can be treated as matched on those covariates. However, it is important to measure effect size as well as statistical significance in this comparison, to guard against the danger of concluding that there is no difference because of low statistical power.
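The balance check described above can be sketched as follows (a minimal Python illustration with simulated data; all names and numbers are hypothetical, not from the paper). It computes a standardized mean difference (Cohen’s d) alongside a Welch t statistic, so that similarity of groups is judged by effect size as well as by statistical significance:

```python
# Hypothetical sketch (simulated data): with small groups, a non-trivial
# covariate difference can fail to reach statistical significance, so
# balance should be judged by effect size as well as by a significance test.
import math
import random

random.seed(1)

def cohens_d(a, b):
    """Standardized mean difference between groups, using the pooled SD."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    pooled = math.sqrt(((len(a) - 1) * va + (len(b) - 1) * vb)
                       / (len(a) + len(b) - 2))
    return (ma - mb) / pooled

def welch_t(a, b):
    """Welch t statistic; |t| below roughly 2 is conventionally
    non-significant at the 5% level."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Small risk-exposed and comparison groups with a true covariate difference
exposed = [random.gauss(100, 15) for _ in range(15)]
comparison = [random.gauss(106, 15) for _ in range(15)]

d = cohens_d(exposed, comparison)
t = welch_t(exposed, comparison)
print(f"standardized mean difference d = {d:.2f}, t = {t:.2f}")
# Judge balance by |d| (e.g., |d| > 0.2 as meaningful imbalance), not only
# by whether |t| reaches conventional significance.
```

With samples this small, a covariate difference large enough to matter can easily produce a non-significant t statistic, which is exactly the low-power danger the note warns against.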
Selection models are another statistical method for adjusting for covariates. They attempt to estimate both observed and unobserved bias caused by covariates (McCartney et al. 2006; Winship and Morgan 1999). However, when results from selection models are compared with those obtained from randomized experiments and from other matching and statistical adjustment methods, selection models appear to be not much more valid than conventional matching and modeling procedures (Glazerman et al. 2003; Stolzenberg and Relles 1997).
We note that controlled interrupted time-series studies and regression discontinuity designs also potentially provide strong grounds for inferences about causal relationships (Shadish et al. 2002), although they are not included in this checklist. Interrupted time-series studies examine a large number of repeated outcome measurements from before to after an intervention, in control and treatment conditions. Regression discontinuity designs use knowledge about how treatment assignment was made to draw causal conclusions about its effects. Because these methods were designed for investigating the effects of intervention programs and are not used in risk factor research, we do not include them in the checklist.
References
Altman, D. G. (2001). Systematic reviews in health care—systematic reviews of evaluations of prognostic variables. British Medical Journal, 323, 224–228.
Arceneaux, K., Gerber, A. S., & Green, D. P. (2006). Comparing experimental and matching methods using a large-scale voter mobilization experiment. Political Analysis, 14, 37–62.
Bloom, H. S., Michalopoulos, C., Hill, C. J., & Lei, Y. (2002). Can nonexperimental comparison group methods match the findings from a random assignment evaluation of mandatory welfare-to-work programs? (working paper). New York: Manpower Demonstration Research Corporation.
Boruch, R. F. (1997). Randomized experiments for planning and evaluation. Thousand Oaks, CA: Sage.
Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment (Sage university paper series on quantitative applications in the social sciences no. 17). Thousand Oaks, CA: Sage.
Chalmers, T. C., Smith, H., Blackburn, B., Silverman, B., Schroeder, B., Reitman, D., et al. (1981). A method for assessing the quality of a randomized control trial. Controlled Clinical Trials, 2, 31–49.
Christenfeld, N. J. S., Sloan, R. P., Carroll, D., & Greenland, S. (2004). Risk factors, confounding, and the illusion of statistical control. Psychosomatic Medicine, 66, 868–875.
Concato, J., Feinstein, A. R., & Holford, T. R. (1993). The risk of determining risk with multivariable models. Annals of Internal Medicine, 118, 201–210.
Conn, V. S., & Rantz, M. J. (2003). Research methods: managing primary study quality in meta-analyses. Research in Nursing and Health, 26, 322–333.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: design and analysis issues for field settings. Chicago, IL: Rand-McNally.
Deeks, J. J., Dinnes, J., D’Amico, R., Sowden, A. J., Sakarovitch, C., Song, F., et al. (2003). Evaluating non-randomised intervention studies. Health Technology Assessment, 7(iii-x), 1–173.
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94, 1053–1062.
Downs, S. H., & Black, N. (1998). The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. Journal of Epidemiology and Community Health, 52, 377–384.
Egger, M., Davey Smith, G., & Altman, D. G. (Eds.) (2001). Systematic reviews in health care: meta-analysis in context (2nd ed.). London: BMJ.
Farrington, D. P. (1988). Studying changes within individuals: the causes of offending. In M. Rutter (Ed.), Studies of psychosocial risk: the power of longitudinal data (pp. 158–183). Cambridge: Cambridge University Press.
Farrington, D. P. (1989). Self-reported and official offending from adolescence to adulthood. In M. W. Klein (Ed.), Cross-national research on self-reported crime and delinquency (pp. 399–423). Dordrecht, Netherlands: Kluwer.
Farrington, D. P. (2000). Explaining and preventing crime: the globalization of knowledge—the American Society of Criminology 1999 presidential address. Criminology, 38, 1–24.
Farrington, D. P. (2003). Methodological quality standards for evaluation research. Annals of the American Academy of Political and Social Science, 587, 49–68.
Farrington, D. P., & Petrosino, A. (2001). The Campbell Collaboration Crime and Justice Group. Annals of the American Academy of Political and Social Science, 578, 35–49.
Farrington, D. P., Gottfredson, D. C., Sherman, L. W., & Welsh, B. C. (2002a). The Maryland scientific methods scale. In L. W. Sherman, D. P. Farrington, B. C. Welsh, & D. L. MacKenzie (Eds.), Evidence-based crime prevention (pp. 13–21). London: Routledge.
Farrington, D. P., Loeber, R., Yin, Y., & Anderson, S. J. (2002b). Are within-individual causes of delinquency the same as between-individual causes? Criminal Behaviour and Mental Health, 12, 53–68.
Ferriter, M., & Huband, N. (2005). Does the non-randomized controlled study have a place in the systematic review? A pilot study. Criminal Behaviour and Mental Health, 15, 111–120.
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Forgatch, M. S., & DeGarmo, D. S. (1999). Parenting through change: an effective prevention program for single mothers. Journal of Consulting and Clinical Psychology, 67, 711–724.
Glasziou, P., Vandenbroucke, J., & Chalmers, I. (2004). Assessing the quality of research. British Medical Journal, 328, 39–41.
Glazerman, S., Levy, D. M., & Myers, D. (2003). Nonexperimental versus experimental estimates of earnings impacts. Annals of the American Academy of Political and Social Science, 589, 63–93.
Glymour, C. (1986). Comment: Statistics and metaphysics. Journal of the American Statistical Association, 81, 964–966.
Hardt, J., & Rutter, M. (2004). Validity of adult retrospective reports of adverse childhood experiences: review of the evidence. Journal of Child Psychology and Psychiatry, 45, 260–273.
Hawton, K., Sutton, L., Haw, C., Sinclair, J., & Deeks, J. J. (2005). Schizophrenia and suicide: systematic review of risk factors. British Journal of Psychiatry, 187, 9–20.
Henry, B., Moffitt, T. E., Caspi, A., & Silva, P. A. (1994). On the remembrance of things past: a longitudinal evaluation of the retrospective method. Psychological Assessment, 6, 92–101.
Higgins, J. P. T., & Green, S. (Eds.). (2006). Cochrane handbook for systematic reviews of interventions 4.2.6 (updated September 2006). In: The Cochrane Library, issue 4, 2006. Chichester, UK: Wiley.
Hill, A. B. (1965). The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58, 295–300.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Jolliffe, D., & Farrington, D. P. (2004). Empathy and offending: a systematic review and meta-analysis. Aggression and Violent Behavior, 9, 441–476.
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring the quality of clinical trials for meta-analysis. Journal of the American Medical Association, 282, 1054–1060.
Kazdin, A. E., Kraemer, H. C., Kessler, R. C., Kupfer, D. J., & Offord, D. R. (1997). Contributions of risk-factor research to developmental psychopathology. Clinical Psychology Review, 17, 375–406.
Khan, K. S., ter Riet, G., Popay, J., Nixon, J., & Kleijnen, J. (2001). Study quality assessment. In Centre for Reviews and Dissemination (Ed.), Undertaking systematic reviews of research on effectiveness: CRD’s guidance for those carrying out or commissioning reviews (2nd ed.). York, England: York Publishing Services.
Kraemer, H. C., Kazdin, A. E., Offord, D., Kessler, R. C., Jensen, P. S., & Kupfer, D. J. (1997). Coming to terms with the terms of risk. Archives of General Psychiatry, 54, 337–343.
Kraemer, H. C., Lowe, K. K., & Kupfer, D. J. (2005). To your health: how to understand what research tells us about risk. New York: Oxford University Press.
Labouvie, E. W. (1986). Methodological issues in the prediction of psychopathology: a life span perspective. In L. Erlenmeyer-Kimling & N. E. Miller (Eds.), Life span research on the prediction of psychopathology (pp. 137–155). Hillsdale, NJ: Erlbaum.
Lalonde, R. J. (1986). Evaluating the econometric evaluations of training-programs with experimental data. American Economic Review, 76, 604–620.
Lieberson, S. (1985). Making it count: the improvement of social research and theory. Berkeley, CA: University of California Press.
Lipsey, M. W., & Derzon, J. H. (1998). Predictors of violent or serious delinquency in adolescence and early adulthood: a synthesis of longitudinal research. In D. P. Farrington & R. Loeber (Eds.), Serious and violent juvenile offenders: risk factors and successful interventions (pp. 86–105). Thousand Oaks, CA: Sage.
Lipsey, M. W., & Landenberger, N. A. (2006). Cognitive-behavioral interventions. In B. C. Welsh & D. P. Farrington (Eds.), Preventing crime: what works for children, offenders, victims, and places (pp. 57–71). Dordrecht, The Netherlands: Springer.
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment—confirmation from metaanalysis. American Psychologist, 48, 1181–1209.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Loeber, R., & Farrington, D. P. (2008). Advancing knowledge about causes in longitudinal studies: experimental and quasi-experimental methods. In A. M. Liberman (Ed.), The long view of crime: a synthesis of longitudinal research (pp. 257–279). New York: Springer.
Lösel, F., & Beelman, A. (2006). Child social skills training. In B. C. Welsh & D. P. Farrington (Eds.), Preventing crime: what works for children, offenders, victims, and places (pp. 33–54). Dordrecht, Netherlands: Springer.
Lösel, F., & Köferl, P. (1989). Evaluation research on correctional treatment in West Germany: a meta-analysis. In H. Wegener, F. Lösel, & J. Haisch (Eds.), Criminal behavior and the justice system: psychological perspectives (pp. 334–355). New York: Springer.
McCartney, K., Bub, K. L., & Burchinal, M. R. (2006). Selection, detection, and reflection. In K. McCartney, M. R. Burchinal, & K. L. Bub (Eds.), Best practices in quantitative methods for developmentalists. Monographs of the Society for Research in Child Development, Vol. 71, No. 3 (pp. 105–126). Boston, MA: Blackwell.
Moher, D., Jadad, A. R., Nichol, G., Penman, M., Tugwell, P., & Walsh, S. (1995). Assessing the quality of randomized controlled trials. Controlled Clinical Trials, 16, 62–73.
Pelz, D. C., & Andrews, F. M. (1964). Detecting causal priorities in panel study data. American Sociological Review, 29, 836–848.
Perry, A., & Johnson, M. (2008). Applying the consolidated standards of reporting trials (CONSORT) to studies of mental health provision for juvenile offenders: a research note. Journal of Experimental Criminology, 4, 165–185.
Petrosino, A. (2003). Estimates of randomized controlled trials across six areas of childhood intervention: a bibliometric analysis. Annals of the American Academy of Political and Social Science, 589, 190–202.
Petrosino, A., Boruch, R. F., Farrington, D. P., Sherman, L. W., & Weisburd, D. (2003a). Toward evidence-based criminology and criminal justice: systematic reviews, the Campbell Collaboration, and the Crime and Justice Group. International Journal of Comparative Criminology, 3, 42–61.
Petrosino, A., Turpin-Petrosino, C., & Buehler, J. (2003b). Scared Straight and other juvenile awareness programs for preventing juvenile delinquency: a systematic review of the randomized experimental evidence. Annals of the American Academy of Political and Social Science, 589, 41–62.
Petticrew, M., & Roberts, H. (2006). Systematic reviews in the social sciences: a practical guide. Oxford: Blackwell.
Pratt, T. C., McGloin, J. M., & Fearn, N. E. (2006). Maternal cigarette smoking during pregnancy and criminal/deviant behavior: a meta-analysis. International Journal of Offender Therapy and Comparative Criminology, 50, 672–690.
Rhee, S. H., & Waldman, I. D. (2002). Genetic and environmental influences on antisocial behavior: a meta-analysis of twin and adoption studies. Psychological Bulletin, 128, 490–529.
Robins, L. N. (1992). The role of prevention experiments in discovering causes of children’s antisocial behavior. In J. McCord & R. E. Tremblay (Eds.), Preventing antisocial behavior: interventions from birth through adolescence (pp. 3–18). New York: Guilford.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rubin, D. B., & Thomas, N. (1996). Matching using propensity scores: relating theory to practice. Biometrics, 52, 249–264.
Rutter, M. (1981). Epidemiological/longitudinal strategies and causal research in child-psychiatry. Journal of the American Academy of Child and Adolescent Psychiatry, 20, 513–544.
Rutter, M. (1988). Longitudinal data in the study of causal processes: some uses and some pitfalls. In M. Rutter (Ed.), Studies of psychosocial risk: the power of longitudinal data (pp. 1–28). Cambridge: Cambridge University Press.
Rutter, M. (2003a). Crucial paths from risk indicator to causal mechanism. In B. B. Lahey, T. E. Moffitt, & A. Caspi (Eds.), Causes of conduct disorder and juvenile delinquency (pp. 3–24). New York: Guilford.
Rutter, M. (2003b). Using sex differences in psychopathology to study causal mechanisms: unifying issues and research strategies. Journal of Child Psychology and Psychiatry, 44, 1092–1115.
Sanderson, S., Tatt, I. D., & Higgins, J. P. T. (2007). Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. International Journal of Epidemiology, 36, 666–676.
Shadish, W. R., & Ragsdale, K. (1996). Random versus nonrandom assignment in controlled experiments: do you get the same answer? Journal of Consulting and Clinical Psychology, 64, 1290–1305.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Shah, B. R., Laupacis, A., Hux, J. E., & Austin, P. C. (2005). Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. Journal of Clinical Epidemiology, 58, 550–559.
Sherman, L. W., Gottfredson, D., MacKenzie, D., Eck, J., Reuter, P., & Bushway, S. (1997). Preventing crime: what works, what doesn’t, what’s promising. Report to the U.S. Congress. Washington, DC: US Department of Justice.
Smith, J. A., & Todd, P. E. (2001). Reconciling conflicting evidence on the performance of propensity-score matching methods. American Economic Review, 91, 112–118.
Stolzenberg, R. M., & Relles, D. A. (1997). Tools for intuition about sample selection bias and its correction. American Sociological Review, 62, 494–507.
The Cochrane Collaboration (2007). The name behind the Cochrane Collaboration. Retrieved July 2007, from http://www.cochrane.org/docs/archieco.htm.
Valentine, J. C., & Cooper, H. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: the study design and implementation assessment device (Study DIAD). Psychological Methods, 13, 130–149.
Wakschlag, L. S., Pickett, K. E., Cook, E., Benowitz, N. L., & Leventhal, B. L. (2002). Maternal smoking during pregnancy and severe antisocial behavior in offspring: a review. American Journal of Public Health, 92, 966–974.
Weisburd, D., Lum, C. M., & Petrosino, A. (2001). Does research design affect study outcomes in criminal justice? Annals of the American Academy of Political and Social Science, 578, 50–70.
Wells, L. E., & Rankin, J. H. (1991). Families and delinquency: a meta-analysis of the impact of broken homes. Social Problems, 38, 71–93.
Wikström, P.-O. H. (2007). In search of causes and explanations of crime. In R. D. King & E. Wincup (Eds.), Doing research on crime and justice (2nd ed., pp. 117–139). Oxford: Oxford University Press.
Wilson, D. B., & Lipsey, M. W. (2001). The role of method in treatment effectiveness research: evidence from meta-analysis. Psychological Methods, 6, 413–429.
Winship, C., & Morgan, S. L. (1999). The estimation of causal effects from observational data. Annual Review of Sociology, 25, 659–706.
Yarrow, M. R., Campbell, J. D., & Burton, R. V. (1970). Recollections of childhood: a study of the retrospective method. Monographs of the Society for Research in Child Development, 35(iii-iv), 1–83.
Acknowledgments
The authors are grateful to David Humphreys for his help with this paper and to the British Academy and the UK Economic and Social Research Council (grant RES-000-22-2311) for financially supporting the research.
Cite this article
Murray, J., Farrington, D.P. & Eisner, M.P. Drawing conclusions about causes from systematic reviews of risk factors: The Cambridge Quality Checklists. J Exp Criminol 5, 1–23 (2009). https://doi.org/10.1007/s11292-008-9066-0