A simulation based technique to estimate intracluster correlation for a binary variable

https://doi.org/10.1016/j.cct.2008.07.008

Abstract

Cluster randomized trials have become the design of choice for evaluating the effect of selected interventions on well-known health indicators such as neonatal mortality rate, episiotomy rate, and postpartum hemorrhage rate in a community setting. Determining the sample size of a cluster randomized trial requires a reliable estimate of cluster size and the intracluster correlation (ICC), because sample size can be substantially impacted by these parameters. During the design phase of a trial, the investigators may have estimates of the valid range of the health indicator that serves as the primary outcome variable. Furthermore, investigators often have an estimate of the average cluster size or range of cluster sizes among the samples they plan to include in the trial. In this article, we present a simulation technique to estimate the ICC value and its distribution for known binary outcome variables and a varying number of clusters and cluster sizes. We applied this technique to estimate ICC values and confidence intervals for a multi-country trial assessing the effect of neonatal resuscitation to decrease seven-day neonatal mortality, where communities within a country were clusters. This simulation technique can be used to estimate the possible ranges of the ICC values and to help design an appropriately powered trial.
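The general idea of such a simulation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sampling scheme (cluster sizes and cluster-level event rates drawn uniformly from investigator-supplied ranges), the choice of the ANOVA estimator, and the names `anova_icc` and `simulate_icc` are all assumptions made here for illustration, since the preview does not give those details.

```python
import numpy as np


def anova_icc(events, sizes):
    """ANOVA estimator of the ICC for a binary outcome.

    events[i] is the number of events in cluster i, sizes[i] its size.
    Requires at least two clusters.
    """
    y = np.asarray(events, dtype=float)
    n = np.asarray(sizes, dtype=float)
    k = len(n)          # number of clusters
    total = n.sum()     # total number of subjects

    # Mean squares between and within clusters for 0/1 data.
    msb = (np.sum(y**2 / n) - y.sum() ** 2 / total) / (k - 1)
    msw = (y.sum() - np.sum(y**2 / n)) / (total - k)

    # Adjusted average cluster size for unequal cluster sizes.
    n0 = (total - np.sum(n**2) / total) / (k - 1)

    denom = msb + (n0 - 1) * msw
    # Degenerate case (e.g. no events anywhere): report zero correlation.
    return (msb - msw) / denom if denom > 0 else 0.0


def simulate_icc(n_clusters, size_range, rate_range, n_sims=1000, seed=0):
    """Return the mean ICC estimate and a 2.5%-97.5% percentile interval.

    ASSUMPTION: sizes and event rates are drawn uniformly from the given
    inclusive ranges; this is an illustrative choice, not the paper's.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_sims)
    for s in range(n_sims):
        sizes = rng.integers(size_range[0], size_range[1] + 1, size=n_clusters)
        rates = rng.uniform(rate_range[0], rate_range[1], size=n_clusters)
        events = rng.binomial(sizes, rates)  # binary outcomes per cluster
        estimates[s] = anova_icc(events, sizes)
    return estimates.mean(), np.percentile(estimates, [2.5, 97.5])
```

A call such as `simulate_icc(10, (200, 400), (0.02, 0.04))` uses hypothetical inputs (10 clusters, 200 to 400 subjects each, event rates between 2% and 4%) and returns the mean ICC estimate together with a percentile interval. Because the ANOVA estimator can be negative, a lower limit below zero would typically be truncated to zero when the result feeds a sample size calculation.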

Section snippets

Background

Cluster randomized trials randomize groups such as clinicians, families, medical practices, schools, and communities rather than individuals [1], [2], [3], [4]. There are several advantages to using a cluster randomized design. First, cluster randomized trials help eliminate the inadvertent spread of an intervention to the control group, i.e., contamination, by physically separating control and intervention subjects [5], [6], [7]. Furthermore, cluster randomized designs may aid in cost

Point estimate for the ICC

Several methods have been proposed to estimate the ICC for binary data. These methods include an analysis of variance (ANOVA) estimator, moment estimators, estimators with a direct probabilistic interpretation, estimators based on direct calculation of correlation within each group, and extended quasi-likelihood and pseudo-likelihood estimators. Ridout, Demetrio, and Firth [13] performed an extensive simulation to compare several of these methods for binary data. Their simulation results showed

An example

The FIRST BREATH trial is a cluster randomized controlled trial to assess the effect of training and implementation of a neonatal resuscitation education program for all birth attendants in intervention clusters on seven-day neonatal mortality in communities across the Global Network for Women's and Children's Health Research (Global Network). The Global Network is a multisite, international research network funded by the National Institutes of Health (NIH) to conduct maternal and child health

Results

In general, we found that the ICC estimates were very small, ranging from 0.0001 to 0.0015, with the mean 95% upper confidence limit ranging from 0.0011 to 0.0043, consistent with previously published ICC estimates [24], [25]. Because the mean 95% lower confidence limit in all simulations was less than zero, it was assumed to be zero in all cases. Table 1 presents the mean ICC estimates for 6 to 10 participating research units with 6 to 18 communities per research unit averaging 300 to

Limitations and conclusions

This simulation technique provides a clear estimate of the possible ranges of the ICC values and will assist in designing an appropriately powered trial. This technique is useful during the initial phase of a study when the number of clusters required for the study is unknown and when an ICC estimate is not available from previous studies. Using this technique, some studies may save trial costs by ensuring an adequate sample size with limited baseline data. However, there are a few limitations

References (28)

  • M. Campbell et al.

    Sample size calculator for cluster randomized trials

    Comput Biol Med

    (2004)
  • A. Donner et al.

    Methodology for inferences concerning familial correlations: a review

    J Clin Epidemiol

    (1991)
  • M. Campbell et al.

    Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported

    BMC Med Res Methodol

    (2004)
  • H. Chakraborty

    The design and analysis aspects of cluster randomized trials

  • A. Donner

    Some aspects of the design and analysis of cluster randomized trials

    Appl Stat

    (1998)
  • A. Donner et al.

    Pitfalls of and controversies in cluster randomized trials

    Am J Public Health

    (2004)
  • J. Chuang et al.

    Design and analysis of controlled trials in naturally clustered environments

    J Am Med Inform Assoc

    (2002)
  • S. Killip et al.

    What is intracluster correlation coefficient? Crucial concepts for primary care researchers

    Ann Fam Med

    (2004)
  • R. Reading et al.

    Cluster randomised trials in maternal and child health: implications for power and sample size

    Arch Dis Child

    (2000)
  • A. Donner et al.

    Analysis of data arising from a stratified design with the cluster as unit of randomization

    Stat Med

    (1987)
  • R. Hayes et al.

    Simple sample size calculations for cluster randomized trials

    Int J Epidemiol

    (1999)
  • S. Kerry et al.

    Trials which randomize practices I: how should they be analysed?

    Fam Pract

    (1998)
  • B. Baskerville et al.

    The effect of cluster randomization on sample size in prevention research

    J Fam Pract

    (2001)
  • M. Ridout et al.

    Estimating intraclass correlation for binary data

    Biometrics

    (1999)
  • Cited by (23)

    • Faecal contamination of household drinking water in Rwanda: A national cross-sectional study

      2016, Science of the Total Environment
      Citation Excerpt :

      For this purpose, we used thermotolerant coliforms (TTC), a WHO-approved indicator of faecal contamination (WHO, 2011). We used a Monte Carlo simulation in order to generate within-village variance and between-village variance estimates necessary for sample size calculations (Chakraborty et al., 2009). Based on previously collected water quality data from Rwanda (Rosa et al., 2014b), we estimated an average within-village proportion of households with TTC-free drinking water of 40%, with a range of 0% to 100% as parameters for the simulation, as well as average size of a village and variation in size of villages based on a national database (Rwanda Ministry of Local Government, 2011).

    • Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials

      2012, Contemporary Clinical Trials
      Citation Excerpt :

      In this paper, we compare methods of estimating the ICC for binary data, with a focus on application of these methods to community-based cluster randomized trials of cancer prevention interventions with self-reported screening outcomes. There is a profusion of point and interval estimators of the ICC for binary data in the literature; examples include Pendergast et al. [6], Ridout et al. [7], Zou and Donner [8], Turner et al. [9] and Chakraborty et al. [10]. A number of authors have compared the performance of various estimators.

    • Intracluster correlation adjustments to maintain power in cluster trials for binary outcomes

      2009, Contemporary Clinical Trials
      Citation Excerpt :

      The larger the design effect the greater is the number of participants needed for the study [27]. Statistical power can be increased by increasing the average cluster size, but this will only increase the power to a certain point [28,29] after which the increase in power is negligible. The clustered study design requires an estimate of the ICC to determine the required sample size.


    Grant support: This work was funded through grants from the National Institute of Child Health and Human Development (NICHD) U01 HD40636 and U01 HD043464-01 and the Bill and Melinda Gates Foundation.
