Elsevier

The Lancet

Volume 365, Issue 9471, 7–13 May 2005, Pages 1657-1661
Series
Multiplicity in randomised trials II: subgroup and interim analyses

https://doi.org/10.1016/S0140-6736(05)66516-6

Summary

Subgroup analyses can pose serious multiplicity concerns. If investigators test enough subgroups, a false-positive result will probably emerge by chance alone. Investigators might undertake many analyses but report only the significant effects, distorting the medical literature. In general, we discourage subgroup analyses. However, if they are necessary, researchers should do statistical tests of interaction, rather than analyse every subgroup separately. Investigators cannot avoid interim analyses when data monitoring is indicated. However, repeated testing at every interim analysis raises multiplicity concerns, and not accounting for multiplicity inflates the false-positive error rate. Statistical stopping methods must be used. The O'Brien-Fleming and Peto group sequential stopping methods are easily implemented and preserve the intended α level and power. Both adopt stringent criteria (low nominal p values) during the interim analyses. A trial run under these stopping rules resembles a conventional trial, except that it can be terminated early should a treatment prove greatly superior. Investigators and readers, however, need to grasp that the estimated treatment effects are prone to exaggeration (a random high) when a trial stops early.
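To illustrate why repeated interim testing needs a stopping method, the following minimal simulation sketch estimates the overall false-positive rate under the null hypothesis for two strategies: testing at a nominal p of 0.05 at every look, and a Peto-type rule. The specific choices here are assumptions for illustration rather than details taken from this summary: five equally spaced looks, a two-sided z test, and a Peto-type threshold of p < 0.001 at interim looks with p < 0.05 at the final analysis. The function name `false_positive_rate` is hypothetical.

```python
import random
from statistics import NormalDist


def false_positive_rate(interim_alpha, final_alpha, looks=5, n_sim=20000, seed=1):
    """Estimate the overall two-sided type I error when there is no true effect.

    Assumes equally spaced looks; each look adds one standardised block of data.
    """
    rng = random.Random(seed)
    nd = NormalDist()
    z_interim = nd.inv_cdf(1 - interim_alpha / 2)  # critical value at interim looks
    z_final = nd.inv_cdf(1 - final_alpha / 2)      # critical value at the final look
    rejections = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for k in range(1, looks + 1):
            cumulative += rng.gauss(0.0, 1.0)  # new block of data under the null
            z = cumulative / k ** 0.5          # z statistic on all data so far
            threshold = z_final if k == looks else z_interim
            if abs(z) >= threshold:
                rejections += 1                # trial "stops" with a false positive
                break
    return rejections / n_sim


# Naive approach: nominal 0.05 at every look (assumed five looks).
naive = false_positive_rate(interim_alpha=0.05, final_alpha=0.05)
# Peto-type rule (assumed thresholds): 0.001 at interims, 0.05 at the end.
peto = false_positive_rate(interim_alpha=0.001, final_alpha=0.05)

print(f"naive repeated testing: {naive:.3f}")
print(f"Peto-type stopping rule: {peto:.3f}")
```

Under these assumptions, the naive strategy should give an overall error rate well above the intended 0.05, whereas the stringent interim threshold should keep it close to 0.05. The exact figures depend on the number of looks and on simulation noise.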

Subgroup analyses

Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature. Even after many warnings,2 some investigators doggedly persist in undertaking excessive subgroup analyses.

Investigators define subgroups of participants by characteristics at baseline. They then do analyses to assess whether treatment effects differ in these subgroups. The major problems stem from investigators undertaking statistical tests within every subgroup examined.
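A short sketch can make both points concrete: how quickly the chance of at least one false-positive result grows with the number of subgroup tests, and how a single test of interaction compares two subgroup estimates directly. The sketch assumes independent tests for the first calculation and approximately normal, independent estimates for the second. The function names and the example numbers (log relative risks and standard errors) are hypothetical and are not taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def interaction_test(est_a, se_a, est_b, se_b):
    """Two-sided z test for the difference between two independent estimates."""
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)  # standard error of the difference
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Ten independent subgroup tests at alpha = 0.05 (illustrative number of tests).
fwe_10 = familywise_error(10)
print(f"chance of at least one false positive in 10 tests: {fwe_10:.2f}")

# Hypothetical log relative risks: subgroup A looks "significant" on its own
# (z = -2.5), subgroup B does not (z = -0.4).
diff, z, p = interaction_test(est_a=-0.5, se_a=0.2, est_b=-0.1, se_b=0.25)
print(f"interaction: difference = {diff:.2f}, z = {z:.2f}, p = {p:.2f}")
```

In this hypothetical example, one subgroup would be declared significant and the other not if each were tested separately, yet the interaction test gives little evidence that the treatment effect truly differs between them.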

Interim analyses

Appropriate monitoring of trials involves more than statistical warnings for stopping. Indeed, the superiority or inferiority of the studied treatment has a major role. However, slow accrual, poor data quality, poor adherence, resource deficiencies, unacceptable adverse effects, fraud, and emerging information that makes the trial irrelevant, unnecessary, or unethical, all could lead to stopping a trial. The decision process is clearly complex.12, 13 It best resides with an independent data

References (27)

  • SJ Pocock et al. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med (2002)
  • DG Altman et al. Interaction revisited: the difference between two estimates. BMJ (2003)
  • SJ Pocock. Clinical trials: a practical approach. (1983)