Table 3

Key considerations for rating certainty in systematic reviews on the effects of complex interventions

Recommendation | Rationale
Deciding on the scope of the review
 1. Use logic models to develop PICO and review questions
  • Logic models help in scoping, defining and conducting the review and in making the review relevant to policy and practice. Approaches have been developed to assist with this10 33 58 70

 2. Identify which tools to use to best describe the sources of complexity that users will require
  • Several newly developed tools support a complexity perspective in systematic reviews, such as the approach by Petticrew et al 2019,10 iCAT-SR, the CICI framework, TIDieR and PRISMA-CI6 63 65 66

 3. Using these tools, identify contextual and implementation factors and other moderators of effect that may help explain heterogeneity and that will need separate GRADE certainty ratings
  • In addition to the standard PICO question, identify in both the intervention and the system in which it is being used all the complexities and interactions that review users will want to know about10

  • Under intervention complexities, consider all aspects of its implementation, including the theory of why and how the intervention is expected to work, the process, the components, implementers, mediators, moderators, causal pathways (linear and non-linear) and important process outcomes

  • Under system complexities, consider context, setting (eg, individual or population level) and any other independent interventions taking place

Defining thresholds or ranges for certainty of evidence ratings
 Define ‘certainty’ in a manner that matches the needs of the intended users of the review
  • Decide among the three approaches to defining certainty of evidence: ‘non-contextualised’, ‘partly contextualised’ and ‘fully contextualised’38

  • In each case, specify the threshold or ranges used to rate certainty of evidence

  • For ‘non-contextualised’ reviews, consider the utility of using GRADE for the ‘non-null’ effect

Rating certainty of evidence using GRADE
 1. Initially rate any body of evidence as ‘high’ if a rigorous tool is used to assess risk of bias in NRSs (ie, ROBINS-I); otherwise, use the ‘standard’ GRADE guidance
  • Consider using Cochrane Risk of Bias (RoB V.2.0) tool for randomised controlled trials42

  • Consider using ROBINS-I for cohort-type studies41

 2. Give extra scrutiny to the impact of lack of blinding providers/participants on overall risk of bias for outcomes
  • If lack of blinding of either participants or providers is unlikely to affect assessment of the outcome (such as when using objective outcome measures, for example, mortality), then consider not downgrading evidence for lack of blinding for that outcome

 3. Consider the effect of bias associated with deviation from the intended intervention
  • Deviations, such as poor adherence, poor implementation and cointerventions in relation to the effect of starting and adhering to an intervention, may lead to bias, and the evidence may be downgraded by one level

  • Consider not downgrading if assessing the effect of assignment to the intervention, when deviations do not occur in relation to usual practice and groups remain balanced

 4. Consider multiple criteria for judging inconsistency of evidence
  • Assessment of heterogeneity should always start with an appraisal of study heterogeneity, including heterogeneity in PICO elements as well as methodological aspects

  • Assessment of heterogeneity should take account of multiple rather than single criteria for inconsistency (eg, I² and its p value, overlap of CIs and degree of variation within chosen thresholds)

  • Consider whether definition of certainty of evidence influences nature of inconsistency assessment (eg, when effect sizes across all studies are consistently in the same direction outside of the null effect or a given threshold of interest, then downgrading for inconsistency is not warranted despite other measures)

  • Consider different analytical methods to explain heterogeneity (eg, subgroup analysis, meta-regression and qualitative comparative analysis)

 5. Rate imprecision of evidence with regard to the adopted definition of ‘certainty’
  • Consider whether definition of certainty of evidence influences nature of imprecision assessment38

  • For ‘non-contextualised’ systematic reviews, where certainty is defined as certainty that the effect lies within the estimated CIs or prediction intervals, a GRADE assessment for imprecision can usually be omitted, as the assessment of precision depends on the chosen range

  • For ‘partly contextualised’ systematic reviews, consider whether the point estimate would represent a trivial, small, moderate or large absolute effect

  • For ‘fully contextualised’ systematic reviews, simultaneously consider all important outcomes to determine precision of the effect estimate

 6. Examine indirectness of evidence by way of assessing important differences in the evidence base beyond what is expected
  • Consider grouping studies, synthesising evidence and rating certainty in the estimates of effect for separate outcomes according to the relevant sources of complexity identified at the start of the review

  • Consider splitting the questions to answer subset conditions, downgrading only for those with less certain evidence. Do not downgrade for indirectness if observed differences are unlikely to affect the outcome

 7. Consider publication bias
  • Conduct extensive grey literature searches and expert contacts to identify reports and working papers

  • Consider sponsorship of studies by any vested industries as well as potential ‘allegiance bias’

 8. Upgrading evidence
  • Consider upgrading certainty of evidence for a dose–response relationship related to the level of implementation

  • Consider upgrading a body of evidence from studies that show positive results despite low implementation fidelity, as such results counteract plausible residual bias or confounding

 Use logic models to investigate coherence of evidence across the causal pathway
  • Consider assessing the coherence of evidence across different links in the causal pathway at the end of evidence synthesis. This judgement should be made outside of the GRADE framework

CICI, Context and Implementation of Complex Interventions; GRADE, Grading of Recommendations Assessment, Development and Evaluation; iCAT-SR, Intervention Complexity Assessment Tool for Systematic Reviews; NRS, non-randomised study; PICO, Population, Intervention, Comparison, Outcome; PRISMA-CI, Preferred Reporting Items for Complex Interventions for Systematic Reviews and Meta-analyses; ROBINS-I, risk of bias in non-randomised studies; TIDieR, Template for Intervention Description and Replication.
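The recommendation in the table to judge inconsistency on multiple criteria rather than a single one can be illustrated with a minimal sketch. It is not part of the table or of the GRADE guidance. It assumes inverse-variance (fixed-effect) weights, effects on an additive scale such as a log risk ratio, and the usual definition I² = max(0, (Q − df)/Q) × 100, where Q is Cochran's Q and df is the number of studies minus one. The names Study, cochran_q, i_squared, cis_all_overlap and all_beyond_threshold are hypothetical, and the p value for Q is left out because it needs a chi-squared distribution function that the standard library does not provide.

```python
from dataclasses import dataclass


@dataclass
class Study:
    effect: float  # effect estimate on an additive scale (assumed, eg, log risk ratio)
    se: float      # standard error of the estimate


def confidence_interval(study, z=1.96):
    """Approximate 95% CI under a normal approximation."""
    return (study.effect - z * study.se, study.effect + z * study.se)


def cochran_q(studies):
    """Cochran's Q using inverse-variance weights."""
    weights = [1.0 / s.se ** 2 for s in studies]
    pooled = sum(w * s.effect for w, s in zip(weights, studies)) / sum(weights)
    return sum(w * (s.effect - pooled) ** 2 for w, s in zip(weights, studies))


def i_squared(studies):
    """I² as a percentage: max(0, (Q - df) / Q) * 100."""
    q = cochran_q(studies)
    df = len(studies) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def cis_all_overlap(studies):
    """True if there is a value contained in every study's CI."""
    intervals = [confidence_interval(s) for s in studies]
    return max(lo for lo, _ in intervals) <= min(hi for _, hi in intervals)


def all_beyond_threshold(studies, threshold=0.0):
    """True if every point estimate lies on the same side of the threshold."""
    return (all(s.effect > threshold for s in studies)
            or all(s.effect < threshold for s in studies))


if __name__ == "__main__":
    # Illustrative numbers only, not data from any review.
    example = [Study(0.2, 0.1), Study(0.4, 0.1)]
    print("Q:", cochran_q(example))
    print("I² (%):", i_squared(example))
    print("CIs share a common region:", cis_all_overlap(example))
    print("All effects on one side of the null:", all_beyond_threshold(example))
```

A reviewer would still weigh these outputs together with a judgement about PICO and methodological heterogeneity. In particular, a high I² with all effects on the same side of the chosen threshold may not warrant downgrading, as the table notes.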