Predictors of disease severity in children presenting from the community with febrile illnesses: a systematic review of prognostic studies
  1. Arjun Chandna1,2,
  2. Rainer Tan3,4,5,
  3. Michael Carter6,
  4. Ann Van Den Bruel7,
  5. Jan Verbakel7,8,
  6. Constantinos Koshiaris8,
  7. Nahya Salim9,10,
  8. Yoel Lubell2,11,
  9. Paul Turner1,2,
  10. Kristina Keitel5,12
  1. 1Cambodia-Oxford Medical Research Unit, Angkor Hospital for Children, Siem Reap, Cambodia
  2. 2Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
  3. 3Unisanté Centre for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
  4. 4University of Basel, Basel, Switzerland
  5. 5Swiss Tropical and Public Health Institute, Basel, Basel-Stadt, Switzerland
  6. 6Department of Women and Children's Health, King's College London, London, UK
  7. 7Academic Centre of General Practice, University of Leuven, Leuven, Flanders, Belgium
  8. 8Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
  9. 9Ifakara Health Institute, Dar-es-Salaam, Tanzania
  10. 10Department of Pediatrics and Child Health, Muhimbili University of Health and Allied Sciences, Dar-es-Salaam, Tanzania
  11. 11Mahidol-Oxford Tropical Medicine Research Unit, Bangkok, Thailand
  12. 12Division of Emergency Medicine, Department of Pediatrics, University Children's Hospital, Inselspital, University of Bern, Bern, Switzerland
  1. Correspondence to Dr Arjun Chandna; arjun{at}tropmedres.ac; arjunchandna{at}gmail.com

Abstract

Introduction Early identification of children at risk of severe febrile illness can optimise referral, admission and treatment decisions, particularly in resource-limited settings. We aimed to identify prognostic clinical and laboratory factors that predict progression to severe disease in febrile children presenting from the community.

Methods We systematically reviewed publications retrieved from MEDLINE, Web of Science and Embase between 31 May 1999 and 30 April 2020, supplemented by hand search of reference lists and consultation with an expert Technical Advisory Panel. Studies evaluating prognostic factors or clinical prediction models in children presenting from the community with febrile illnesses were eligible. The primary outcome was any objective measure of disease severity ascertained within 30 days of enrolment. We calculated unadjusted likelihood ratios (LRs) for comparison of prognostic factors, and compared clinical prediction models using the area under the receiver operating characteristic curves (AUROCs). Risk of bias and applicability of studies were assessed using the Prediction Model Risk of Bias Assessment Tool and the Quality In Prognosis Studies tool.

Results Of 5949 articles identified, 18 studies evaluating 200 prognostic factors and 25 clinical prediction models in 24 530 children were included. Heterogeneity between studies precluded formal meta-analysis. Malnutrition (positive LR range 1.56–11.13), hypoxia (2.10–8.11), altered consciousness (1.24–14.02), and markers of acidosis (1.36–7.71) and poor peripheral perfusion (1.78–17.38) were the most common predictors of severe disease. Clinical prediction model performance varied widely (AUROC range 0.49–0.97). Concerns regarding applicability were identified and most studies were at high risk of bias.

Conclusions Few studies address this important public health question. We identified prognostic factors from a wide range of geographic contexts that can help clinicians assess febrile children at risk of progressing to severe disease. Multicentre studies that include outpatients are required to explore generalisability and develop data-driven tools to support patient prioritisation and triage at the community level.

PROSPERO registration number CRD42019140542.

  • paediatrics
  • child health
  • public health
  • infections
  • diseases
  • disorders
  • injuries
  • systematic review

Data availability statement

Data are available upon request. All data relevant to the study are included in the article or uploaded as supplementary information. The protocol for the study is available from: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=140542.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.

Key questions

What is already known?

  • An increasing number of clinical decision-support algorithms and risk stratification tools integrate clinical and laboratory predictors to guide healthcare workers in their assessment of febrile children.

  • Which prognostic factors—alone or as components of clinical prediction models—best identify children at risk of developing severe febrile illness is not clear.

  • Previous systematic reviews have focused on diagnostic studies and used imperfect reference standards for severe disease.

What are the new findings?

  • Malnutrition, hypoxia, altered consciousness, and bedside markers of acidosis and poor peripheral perfusion were the most commonly identified predictors of severe disease.

  • Clinical prediction model performance varied—the best performing models being those evaluated in similar settings and using similar outcomes as the original derivation studies.

  • The prognostic factors and clinical prediction models identified in this study reflect children with relatively advanced illnesses and hence the degree to which they can inform community triage and prioritisation strategies is unclear.

What do the new findings imply?

  • The studies included in this systematic review, together with other studies, highlight the importance of not over-interpreting the prognostic performance of individual predictors, which varies across different epidemiological contexts.

  • If prediction models and decision-support algorithms are to be used as an adjunct to clinical assessment, they must be derived and validated using populations and outcomes appropriate to the clinical problem.

  • Improving identification of children at risk of developing severe febrile illness will require multiple, large, collaborative research initiatives that collect harmonised yet contextualised data on predictors and outcomes, and include unselected children presenting from the community.

Introduction

Acute febrile illnesses are among the most common reasons that parents seek medical care for their children.1 2 While most episodes are mild, an important minority of children progress to severe disease. Early recognition of low-incidence serious disease is challenging,3 especially in many tropical settings where health workers receive limited training, patient volumes are high, diagnostic capacity is poor and different acute febrile syndromes are often clinically indistinguishable.4 5

Clinical and laboratory prognostic factors that enable early and accurate identification of children at risk of developing severe disease could improve patient outcomes and reduce resource misallocation.6 7 An increasing number of clinical decision-support algorithms and risk stratification tools integrate clinical and laboratory predictors to guide referral, admission and treatment decisions.8 While no unified strategy exists to guide selection of candidate predictors, those already reported as prognostic should normally be considered.9

Previous reviews have evaluated predictors of ‘serious bacterial infections’.10 11 However, these studies are diagnostic rather than prognostic.9 Furthermore, ‘serious bacterial infection’ is an imperfect measure of disease severity: microbiological tests for bacterial infections lack sensitivity, especially in settings with high antibiotic consumption; ‘serious bacterial infections’ are not always severe (eg, children with enteric fever are often successfully managed as outpatients) and severe febrile illnesses are frequently caused by non-bacterial pathogens, especially in low/middle-income countries (LMICs),4 12 in part secondary to the introduction of widespread vaccination against prevalent bacterial pathogens of childhood.13

We performed a systematic review to identify which clinical and laboratory factors—alone or as part of clinical prediction models—predict progression to severe disease in febrile children presenting from the community to a community health worker, primary health centre or hospital outpatient or emergency department. Our aim was to understand which prognostic factors might support health workers faced with this difficult and common clinical question and to inform variable selection for future prospective studies aiming to develop data-driven triage tools.

Methods

Protocol and registration

The methods for this systematic review were specified in advance (PROSPERO protocol: CRD42019140542) and adhere to the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS),14 a modification of CHARMS for prognostic factor studies (CHARMS-PF),15 Quality In Prognosis Studies (QUIPS)16 and Prediction Model Risk of Bias Assessment Tool (PROBAST) guidelines.17 The report has been prepared in accordance with Preferred Reporting Items for Systematic Review and Meta-Analysis guidelines.18

Eligibility criteria

All prognostic studies (prognostic factor and clinical prediction model) including ≥20 patients were eligible. Our target population was children aged >28 days and <19 years, presenting from the community with an acute febrile illness (documented abnormal temperature (fever or hypothermia) or history of fever) or suspected sepsis. While sepsis is not always well defined in children,19 ‘suspected sepsis’ was included along with febrile children so as to include all children with suspected infection. Studies were excluded if disaggregated paediatric data were not presented or patients were recruited partway through receipt of inpatient treatment, as the aim of the review was to identify prognostic variables measured at presentation. Studies that only evaluated specific clinical syndromes (eg, neurological presentations, acute respiratory infections and so on) or particular pathogens (eg, Plasmodium spp, influenza and so on) were not included.

Studies measuring predictors at presentation to care were included. Studies where authors identified that a substantial proportion of participants were recruited following transfer from another health facility were excluded. Demographic, anthropometric, socioeconomic, clinical and historical variables were considered, as well as laboratory parameters measured at presentation to care. Studies only reporting variables that would not be available at the time of presentation to care (eg, blood culture results) were excluded.

The primary outcome was any objective measure of disease severity occurring within 30 days of measurement of the predictors or during hospitalisation. Studies assessing outcome at the same time point as baseline predictor measurements (diagnostic studies) were excluded.

Search strategy and selection criteria

We searched MEDLINE, Embase and Web of Science databases, without language restriction, for publications between 31 May 1999 and 30 April 2020 (initial search to 31 May 2019; updated search to 30 April 2020). We followed Cochrane Prognosis Methods Group recommendations to build our search strategy (online supplemental appendix S1), structured according to the ‘populations, interventions, comparators, outcomes, timing and setting’ (PICOTS) framework and adapted published search strings as appropriate.20–22 The search strategy was peer-reviewed by an independent Technical Advisory Panel (online supplemental appendix S2).

Study selection

Title, abstract and full-text screening were performed independently by two reviewers (AC and RT). Agreement was checked after the first 20 and 250 articles. Discrepancies were resolved by discussion or independent assessment by a third reviewer (KK).

Eligible studies and relevant review articles were ‘snowballed’ (forward and reverse crosschecking of reference lists) to identify additional studies. The list of eligible studies was presented to the Technical Advisory Panel who were asked to identify obvious omissions and suggest key authors whose publication lists were subsequently reviewed for additional eligible studies (online supplemental appendix S2).

Data collection process

Data extraction sheets were developed based on the CHARMS and CHARMS-PF checklists (online supplemental appendix S3).14 15 Data were extracted independently by one reviewer (AC or RT) and checked by the other. Discrepancies were discussed and resolved between the two reviewers. Authors of studies not reporting likelihood ratios (LRs) (prognostic factors) or area under the receiver operating characteristic curves (AUROCs) (clinical prediction models), or the data to allow their calculation, were contacted. Seven authors responded to requests for clarifications and six provided additional data not available in the published manuscript. All predictors were harmonised using the Systematised Nomenclature of Medicine Clinical Terms (SNOMED-CT).

Data analysis: prognostic factors

Contingency tables were constructed and positive likelihood ratio (PLR) and negative likelihood ratio (NLR) calculated for each prognostic factor. In the case of an empty cell, 0.5 was added to each cell (Haldane-Anscombe correction). CIs were calculated on the basis of the SE of a proportion (Stata V.16.0). LRs were selected as the principal effect estimate as they allow estimation of post-test probabilities, are independent of prevalence, are intuitive for clinicians and are frequently used to compare performance of predictors in diagnostic and prognostic studies.10 11 23 24 Prognostic factors are presented in the main analysis if at least one study reported a PLR ≥5.0 (ie, a rule-in test), or a NLR ≤0.2 (ie, a rule-out test).23 To contextualise the results, we used the outcome prevalence of individual studies to calculate the pre-test probability, and display positive and negative post-test probabilities on dumbbell plots (R V.3.6.1).
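The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Stata or R code: the class and function names and the example counts are hypothetical, and CIs are omitted because the exact method is not specified beyond the SE of a proportion.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """2x2 table of predictor status against outcome (hypothetical structure)."""
    tp: float  # predictor present, severe disease
    fp: float  # predictor present, no severe disease
    fn: float  # predictor absent, severe disease
    tn: float  # predictor absent, no severe disease


def likelihood_ratios(t):
    """Return (PLR, NLR); add 0.5 to every cell if any cell is empty."""
    cells = [t.tp, t.fp, t.fn, t.tn]
    if any(c == 0 for c in cells):
        # Haldane-Anscombe correction
        cells = [c + 0.5 for c in cells]
    tp, fp, fn, tn = cells
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)
    nlr = (1 - sensitivity) / specificity
    return plr, nlr


def post_test_probability(pre_test, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


def clinical_value(plr, nlr):
    """Apply the review's thresholds: PLR >= 5.0 (rule-in), NLR <= 0.2 (rule-out)."""
    labels = []
    if plr >= 5.0:
        labels.append("rule-in")
    if nlr <= 0.2:
        labels.append("rule-out")
    return labels


# Hypothetical counts, for illustration only (not data from any included study).
table = ContingencyTable(tp=30, fp=70, fn=20, tn=880)
plr, nlr = likelihood_ratios(table)
prevalence = (table.tp + table.fn) / (table.tp + table.fp + table.fn + table.tn)
print(round(plr, 2), round(nlr, 2), clinical_value(plr, nlr))
print(round(post_test_probability(prevalence, plr), 3))
print(round(post_test_probability(prevalence, nlr), 3))
```

When the study's own outcome prevalence is used as the pre-test probability, the positive post-test probability equals the proportion of predictor-positive children who went on to develop severe disease (here 30 of 100).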

Data analysis: clinical prediction models

For clinical prediction models, AUROCs are presented on forest plots (Stata V.16.0). When available, we present LRs for different thresholds of the models in online supplemental appendix S4.
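As a reminder of what the AUROC measures, the sketch below computes it as the probability that a randomly chosen child with severe disease receives a higher model score than a randomly chosen child without severe disease, with ties counted as one half. The function name and the scores are hypothetical and serve only to illustrate the quantity; the included studies reported AUROCs from their own analyses.

```python
def auroc(scores_severe, scores_not_severe):
    """Concordance probability: P(score of severe case > score of non-severe case)."""
    wins = 0.0
    for s in scores_severe:
        for n in scores_not_severe:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(scores_severe) * len(scores_not_severe))


# Hypothetical model scores, for illustration only.
print(auroc([3, 4, 5], [1, 2, 3]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.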

Synthesis of results

Because of expected heterogeneity between studies (arising from variations in case-mix and baseline risk), the small number of common predictors available for comparison and the absence of well-defined subgroups, neither formal meta-analysis nor comparison of variability and bias between studies was planned, as such comparisons are recognised as being prone to bias.25 Qualitative comparisons are described considering major differences between populations and study design. Prevalence of severe disease was used to group studies into low (<2.5%), moderate (2.5%–7.5%) and high (>7.5%) prevalence settings, as a proxy for the case-mix and level of care.
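The prevalence grouping can be expressed as a simple rule. The sketch below is illustrative only: the function name is hypothetical, and the counts in the usage line are invented to give a prevalence of 4.7%.

```python
def prevalence_setting(n_severe, n_total):
    """Classify a study by prevalence of severe disease (thresholds from the text)."""
    prevalence_pct = 100 * n_severe / n_total
    if prevalence_pct < 2.5:
        return "low"
    if prevalence_pct <= 7.5:
        return "moderate"
    return "high"


# Hypothetical counts: 47 severe outcomes among 1000 children (4.7%).
print(prevalence_setting(47, 1000))
```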

Quality assessment

Risk of bias and applicability of studies were assessed using the QUIPS tool for prognostic factor studies,16 and PROBAST for studies developing, validating or updating prediction models.17 Each study was independently assessed using QUIPS or PROBAST by two reviewers (AC and RT), as well as an independent senior reviewer (MC, AVDB or JV). All discrepancies were resolved by discussion. For prognostic factor studies (QUIPS), risk of bias was categorised as low, medium or high, while in clinical prediction model studies (PROBAST) risk was categorised as low, high or unclear. For all studies, applicability was assessed as being of high, low or unclear concern.

Role of the funding source

The funders had no role in study design, data collection, data analysis, data interpretation or writing of the report. The co-primary authors (AC and RT) had full access to the data and final responsibility for the decision to submit for publication.

Patient and public involvement

Neither patients nor members of the public were directly involved in the conduct of this work.

Results

The electronic search retrieved 5930 articles, and 19 additional articles were identified through snowballing and expert consultation (figure 1). Eighteen studies were included in the review: 16 studies evaluated 200 prognostic factors, from 75 SNOMED-CT categories,12 26–38 and eight evaluated 33 clinical prediction model/outcome pairs, using 25 distinct models.27 29 31 38–42

Figure 1

Selection of studies. Only one reason for exclusion per study is listed. CPM, clinical prediction model; PF, prognostic factor.

In total 24 530 children were included, with overlap across eight studies.26 31 32 34–37 40 The majority (11/18) included only hospitalised patients. Two studies recruited children from primary care,29 33 and five recruited both children admitted and those sent home directly from hospital outpatient or emergency departments.28 35–37 40 Seven studies included children aged 5 years and under,27 30 32–34 39 42 with the remainder including patients up to 19 years of age. Definition of fever varied between studies, ranging from an axillary temperature (or equivalent) of ≥37.5°C to >38.1°C. Five studies did not include a temperature measurement in their eligibility criteria and enrolled children on the basis of suspected infection or sepsis.35 38 40–42 Eight studies were conducted in sub-Saharan Africa,26 27 31–34 41 42 four in North America,35–37 40 three in Europe,29 30 38 two in Asia12 39 and one in Latin America.28 Six were multicentre studies.12 26 31 33 40 42 Most used ‘hard’ outcomes to define severe disease, such as mortality, organ dysfunction or need for organ support, while four used ‘softer’ outcomes, such as prolonged length of stay or persistence of symptoms.29 30 33 38 Characteristics of the 18 studies are summarised in table 1.

Table 1

Characteristics of included studies

Prognostic factors

Figures 2–4 present prognostic factors identified as having rule-in (PLR ≥5.0) or rule-out (NLR ≤0.2) value in at least one study. Prognostic factors that met neither of these pre-specified cut-offs are presented in online supplemental appendix S5. In settings with moderate prevalence of severe disease, both high lactate (PLR range 4.97–5.13) and hypoglycaemia (PLR range 12.63–13.36) were useful for ruling in severe disease,32 34 37 whereas a lactate ≤5 mM was more useful as a rule-out test (NLR 0.13) among a population in whom prevalence of severe disease was high (febrile children with signs of poor organ perfusion).26

Figure 2

Prognostic factors identified as having rule-in (PLR ≥5.0) or rule-out (NLR ≤0.2) value for severe disease in at least one study—laboratory tests. mM, millimolar; NLR, negative likelihood ratio; PICU, paediatric intensive care unit; PLR, positive likelihood ratio.

Figure 3

Prognostic factors identified as having rule-in (PLR ≥5.0) or rule-out (NLR ≤0.2) value for severe disease in at least one study—cardiovascular, respiratory or neurological signs. In the study by Costa et al, ‘sepsis’ was defined according to the systemic inflammatory response syndrome (SIRS), requiring measurement of heart rate, respiratory rate, temperature and leucocyte count. In the study by Kwizera et al, ‘sepsis’ was defined according to the qSOFA Score in children aged ≥15 years, and using a combination of temperature, mental status, respiratory distress, prostration and seizures in children aged <15 years. AVPU, alert, voice, pain or unresponsive; BCS, Blantyre Coma Score; bpm, beats per minute; CRT, capillary refill time; GCS, Glasgow Coma Score; HR, heart rate; LLTG, lower limb temperature gradient; NLR, negative likelihood ratio; PICU, paediatric intensive care unit; PLR, positive likelihood ratio; qSOFA, quick Sequential Organ Failure Assessment.

Figure 4

Prognostic factors identified as having rule-in (PLR ≥5.0) or rule-out (NLR ≤0.2) value for severe disease in at least one study—historical, anthropometric and metabolic variables. *Children with visible wasting or nutritional oedema were also classified as having severe malnutrition. In the study by Elshout et al, ‘comorbidity’ was defined as being under routine care of a paediatrician or ENT specialist. ENT, ear, nose and throat; MUAC, mid-upper arm circumference; NLR, negative likelihood ratio; PICU, paediatric intensive care unit; PLR, positive likelihood ratio; WAZ, weight-for-age z-score.

Hypoxia was most useful for ruling in severe disease in moderate prevalence settings (PLR range 8.11–9.49).27 34 Some studies found hypotension and bedside markers of poor peripheral perfusion to have useful rule-in value, but this was inconsistent (PLR range 1.89–9.57 and 1.78–17.38, respectively).26 27 31 32 34–36 38 Bradycardia was evaluated in a multicentre study conducted across three East African countries and found to have useful rule-in value (PLR range 5.95–14.59) for severe disease in those high prevalence settings.26 31 Impaired consciousness, assessed using bedside coma scales, was a useful predictor of severe disease, particularly in low and moderate prevalence settings (PLR range 3.38–14.02), with the post-test probability of poor outcome increasing with the degree of neurological impairment.27 32 34 36 38 41 42

In sub-Saharan African settings, severe malnutrition (PLR range 1.56–11.23),26 27 32 34 41 HIV positive status (PLR range 2.32–12.48)26 27 41 42 and bedside correlates of metabolic derangement such as deep breathing and jaundice (PLR range 3.57–7.71) were useful rule-in predictors, across a range of prevalence settings.27 32 34

Very few prognostic factors could satisfactorily rule out progression to severe disease: presence of comorbidities (NLR range 0.12–1.04), sepsis at admission (NLR 0.19) and prostration (NLR range 0.18–1.23) were each identified in only one study.27 28 35

Clinical prediction models

Figure 5 illustrates the discrimination (AUROC) of 25 clinical prediction models for 33 different outcomes assessed in eight studies: most (18/33) were external validations of existing models27 31 38 39; 13 were newly derived models29 31 40–42 and two were updates and external validations of an existing model.38 Components of the clinical prediction models are summarised in table 2.

Table 2

Components of clinical prediction models evaluated in the included studies

Figure 5

Discrimination of clinical prediction models to identify children at risk of severe disease. Individual studies evaluated different clinical prediction models using datasets with varying numbers of children with severe disease, depending on the data available. The outcome prevalence reflects the proportion of children with severe disease in the dataset used to evaluate that particular prediction model/outcome pair. This may be different from the overall prevalence of children with severe disease in the study, which is listed in table 1 and used to classify studies into low, moderate or high prevalence settings. No CIs were provided for the AUROC estimates in the study by Walia et al. AQUAMAT, African Quinine Artesunate Malaria Trial; AUROC, area under the receiver operating characteristic curve; FEAST-PET, FEAST-Paediatric Emergency Triage; FEAST-PETaL, FEAST-Paediatric Emergency Triage and Laboratory; LODS, Lambaréné Organ Dysfunction Score; PEDIA, Paediatric Early Death Index for Africa; PEWS, Paediatric Early Warning Score; PICU, paediatric intensive care unit; PRISM III, Paediatric Risk of Mortality; qPELOD-2, quick Paediatric Logistic Organ Dysfunction; qSOFA, quick Sequential Organ Failure Assessment; SICK, Signs of Inflammation in Children that Kill; SIRS, Systemic Inflammatory Response Syndrome; YOS, Yale Observation Score.

Three models, Lambaréné Organ Dysfunction Score (LODS), Paediatric Early Death Index for Africa (early death score) (PEDIA-e) and Signs of Inflammation in Children that Kill (SICK), showed good (AUROC ≥0.80) discrimination in a Ugandan setting where in-hospital mortality occurred at a prevalence of 4.7% (AUROC range 0.85–0.90).27 Two of these (LODS and PEDIA-e) were also assessed in a multicentre study in East Africa where discrimination was lower (AUROCs of 0.77 and 0.70).31 This study also derived two models, the FEAST-Paediatric Emergency Triage (FEAST-PET) and FEAST-Paediatric Emergency Triage and Laboratory (FEAST-PETaL) scores, which showed good discrimination (AUROCs of 0.86 and 0.82).31 Two other East African studies used combinations of simple clinico-demographic variables to derive a number of prediction models, four of which had AUROCs ≥0.80.41 42

One North American study derived a model to predict hypotensive shock in unselected children presenting with suspected sepsis, which showed good discrimination in an external geographic validation (AUROC 0.87).40 The Yale Observation Score also showed high discrimination for mortality (AUROC 0.97) and mechanical ventilation (AUROC 0.89) in India; however, the small sample size (n=100) renders the results difficult to interpret.39 In general, models assessed against ‘softer’ outcomes (eg, persistence of symptoms or length of stay) had poorer discrimination, and a more distal temporal relationship between measurement of predictors and ascertainment of outcome.

Quality assessment

Only one prognostic factor study was at low risk of bias,35 while another was judged to be at low risk of bias in all but one domain.26 The domains at highest risk of bias were study confounding, related to omission of important covariates; study participants, often due to requirement for the measurement of specific laboratory parameters (eg, leucocyte count); and statistical analysis, as a result of inadequate reporting or inappropriate exclusion of participants from the analysis (figure 6).

Figure 6

Risk of bias and applicability assessments for included studies using (A) the QUIPS tool (n=11 studies) and (B) PROBAST (n=33 clinical prediction model/outcome pairs from seven studies). All studies evaluating clinical prediction models were assessed using PROBAST, except for the study by Elshout et al, which was primarily a prognostic factor study and was therefore assessed using QUIPS. PROBAST, Prediction Model Risk of Bias Assessment Tool; QUIPS, Quality In Prognosis Studies.

Each clinical prediction model/outcome pair was assessed independently and all were judged to be at high risk of bias (figure 6). Most often this was due to inadequate reporting of model performance (studies reporting discrimination but not calibration), circularity between predictors and outcomes or having fewer than 100 participants with severe outcomes for model validation. It is noteworthy that one study which externally validated three models included 99 children who died.27 Another study which derived and/or validated nine models undertook an additional external validation in a population of acutely unwell but non-febrile children (and thus not eligible for consideration in this review), which included more than 100 children who died.31

In all but one study there was high concern regarding applicability to the review question.40 This was largely due to the majority of studies including only children requiring hospitalisation, with recruitment occurring after the decision to admit had been made by the treating physician. Full details on risk of bias and applicability assessments are provided in online supplemental appendix S6.

Discussion

This systematic review of prognostic factors and clinical prediction models assessing severity of disease in febrile children highlights that few well-conducted studies address this important public health question, particularly in unselected children presenting from the community. One of its main strengths is the inclusion of studies from a wide geographic context, aiding understanding of how predictive performance can vary across settings. By focusing on prognosis, we identified features that predict the likelihood that a child’s illness might progress, rather than features associated with illness severity at the moment of assessment.

Most prognostic factors identified as valuable for predicting severe childhood febrile illness (PLR ≥5.0) overlapped with individual components of the most promising clinical prediction models (AUROC ≥0.80): nutritional and HIV status, hypoxia, altered consciousness, and markers of acidosis (raised venous lactate or deep breathing) and poor peripheral perfusion (weak pulse, limb-core temperature gradient or prolonged capillary refill time).27 31 32 34 36 38 42 Hypoglycaemia was a useful prognostic factor identified in our review, but omitted in most clinical prediction models. Many of these features, however, indicate a child who is already very unwell, reflecting the fact that most studies included only hospitalised children and focused on predicting mortality. Few prognostic factors adequately ruled out (NLR ≤0.2) the possibility of progression to severe disease, a finding consistent with a previous systematic review evaluating the diagnostic utility of clinical features for serious bacterial infections.10

The major limitation of our work arises from the heterogeneity of studies, which precludes comparison of effect estimates. Second, it is difficult to determine if studies included children presenting to first-line health workers. We did not exclude studies solely based on the designated ‘level’ of a health facility: concerned parents in all settings use primary, secondary and tertiary care facilities as their first point-of-access. Third, most studies included only hospitalised children. This is a major barrier to understanding the potential for prognostic factors and prediction models to guide referral or admission decisions. Follow-up of children assessed as ‘low-risk’ (ie, those managed in the community) must be a priority for future studies seeking to determine the validity of prognostic factors and prediction models in outpatient settings.43 Fourth, in line with other reviews we found most studies to be of low quality.44 Recent guidance may help address this.17 Finally, we framed the review around ‘febrile illness’, rather than, for example, ‘clinically-suspected infection’. Our rationale was to ensure the findings were as relevant as possible for lesser-trained community health workers in resource-constrained settings, for whom a presumptive diagnosis of suspected infection can be challenging. Febrile illness is an accepted ‘pragmatic point-of-entry’ in these settings;45 however, we acknowledge that some children (particularly younger infants) may not mount a fever in response to serious infection. Therefore, despite our deliberately broad definition of febrile illness (documented abnormal temperature or history of fever), and the inclusion of studies of children with ‘suspected sepsis’, relevant studies may have been missed. Of note, in view of a suggestion arising during the peer-review process we also performed a second MEDLINE search, using alternate search strings, which did not yield any additional eligible articles (online supplemental appendix S7).

Thirty out of 200 (15%) prognostic factors met our pre-specified threshold for clinical relevance (PLR ≥5.0 or NLR ≤0.2). This may reflect the difficulty of identifying parsimonious predictors for all febrile children. While common pathophysiological pathways for severe disease have been identified across a spectrum of microbial aetiologies,46 47 certain predictors may perform better for specific syndromes or pathogens, compared with all-cause febrile illness. Five studies in our review reported a high proportion of children as being either slide-positive or rapid diagnostic test-positive for malaria. Notwithstanding the issues of co-infection and/or concomitant incidental parasitaemia in settings of high malaria endemicity, it is possible that the findings of these studies are more pertinent to children with malaria. However, four of these studies compared the prognostic performances of hyperlactaemia, hypoglycaemia and the prediction models SICK, LODS and PEDIA, and found them to be broadly consistent between children with malaria, non-malarial fever and invasive bacterial disease.26 27 32 34 Furthermore, as can be seen in figures 2–4, a number of predictors identified in malaria endemic regions also demonstrated prognostic utility in contexts where malaria is not endemic (eg, venous lactate, impaired peripheral perfusion, hypotension and altered consciousness). This, in conjunction with the subgroup analyses performed in the original studies, gives us a degree of confidence that the prognostic factors that we have identified are generalisable across different infecting pathogens. Nonetheless, future reviews using search strategies developed to retrieve syndrome-specific or pathogen-specific studies should explore this.

Another potential explanation for the relatively few valuable prognostic factors identified is work-up bias. In most studies, predictors were available to the treating clinicians: abnormal values are likely to have been acted on and predictive performance underestimated. For most predictors, particularly clinical signs, this is unavoidable as blinding is often neither possible nor ethical. When feasible, randomisation is required to definitively assess their potential impact.48 This is particularly important for new tests proposed in resource-limited settings. For example, both lactate and hypoxia were identified as potentially of value in this review, but introducing tests for these parameters at all first-line health facilities across the tropics would incur substantial cost and, as their predictive value may vary in different settings, could result in unnecessary or missed referrals. Randomisation can help determine whether new tests such as these add value to simple clinical assessment.49

Clinical prediction models performed better when derived and validated in similar populations27: in East Africa LODS and PEDIA-e (both derived in sub-Saharan Africa)50 51 were superior to SICK (originally derived in India).52 Model performance also improved when predicting the same outcome as the derivation study: quick Sequential Organ Failure Assessment and quick Paediatric Logistic Organ Dysfunction, derived to predict mortality, performed poorly when predicting prolonged length of stay.38 53 54 These findings highlight the importance of deriving prediction models using populations and outcomes appropriate to the clinical question. While mortality is a ‘hard’ outcome, it seldom occurs in primary care. Furthermore, its reflection of disease severity is influenced (mediated) by the level of care. It is striking that in Tanzania a raised lactate conveyed a post-test probability of in-hospital mortality comparable to that of ‘organ dysfunction within 24 hours of arrival’ in a similar prevalence setting in the USA.32 34 37 Rather than relying on models derived in secondary care to generalise to outpatient settings across different epidemiological landscapes, alternative ways to quantify disease severity, which consider local context yet avoid circularity between predictor variables and outcome definitions, will be important to facilitate comparisons across settings and explore generalisability of risk prediction tools. Finally, the fact that most studies summarised model performance using only the AUROC means that it is difficult to appreciate what the impact might be on clinical decision making.55
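
The dependence of post-test probability on outcome prevalence, referred to above, follows from the standard conversion between probability and odds. The sketch below is illustrative only: the function name, the likelihood ratio and the prevalence values are hypothetical and are not taken from the included studies.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability.

    Standard odds form of Bayes' theorem:
    post-test odds = pre-test odds x likelihood ratio.
    Assumes 0 <= pre_test_probability < 1.
    """
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical values: the same predictor (LR = 5) applied at three
# different outcome prevalences
for prevalence in (0.01, 0.05, 0.20):
    probability = post_test_probability(prevalence, likelihood_ratio=5.0)
    print(f"prevalence={prevalence:.2f} -> post-test probability={probability:.3f}")
```

The same likelihood ratio yields very different post-test probabilities as prevalence changes, which is one reason why comparisons of predictors across settings need to take outcome prevalence into account.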

In LMIC primary care contexts, many variables are not feasible to collect,56 and as noted above, some may incur substantial cost. Interestingly, HIV and nutritional status were both identified in our review and represent the only prognostic factors meeting our threshold for clinical relevance that may not necessarily reflect a child who is overtly very unwell. While biological plausibility for the prognostic utility of these two variables is high, it should be noted that the study which identified them was small and, correspondingly, the CI for the PLR is wide.41 The WHO’s Integrated Management of Childhood Illnesses ‘Danger Signs’ are recommended to guide referrals from community healthcare providers in resource-constrained settings.57 Of these, only altered consciousness was widely represented among included studies, and most found it to be a good predictor of severe disease.26 27 31 32 34 36 38 41 42 History of convulsions was examined in two studies while other ‘Danger Signs’ were not evaluated.26 27

Conclusion

Our findings emphasise the limitations of individual prognostic factors. Performance varies widely across settings and clinicians must be careful not to overinterpret individual predictors. While prediction models can support clinical decision making, they must be derived and validated using appropriate methodology, and populations and outcomes relevant to the clinical problem. For the identification of children at risk of severe febrile illness, this will require multiple large, collaborative research initiatives across different settings, which collect harmonised data on predictors and outcomes,58 59 and include unselected children presenting from the community.

Data availability statement

Data are available upon request. All data relevant to the study are included in the article or uploaded as supplementary information. The protocol for the study is available from: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=140542.

Acknowledgments

We acknowledge the support of Nia Roberts and Jolanda Elmers in helping to translate MEDLINE search terms and access certain articles. We are grateful to Hanne Boon for assisting with the R code to create the dumbbell plots.

Footnotes

  • Handling editor Soumyadeep Bhaumik

  • Twitter @arji_barji, @DrRainerTan, @PaulTurnerMicro

  • AC and RT contributed equally.

  • PT and KK contributed equally.

  • Contributors AC conceived the study; AC, RT, YL, PT and KK defined the review strategy; AC and RT conducted the search, screened retrieved articles and extracted the data; AC, RT, MC, AVDB and JV assessed quality of included articles; AC and RT analysed the data and drafted the report; CK provided statistical oversight; PT and KK commented on the drafted report; AC, RT, MC, AVDB, JV, CK, NS, YL, PT and KK commented on and approved the final manuscript.

  • Funding AC is supported by a Wellcome Trust Doctoral Training Fellowship. RT is supported by the Botnar Foundation. The Cambodia Oxford Medical Research Unit and Mahidol-Oxford Tropical Medicine Research Unit are part of the Wellcome Trust Thailand Africa and Asia Programme, which receives core-funding from the UK Wellcome Trust (106 698Z/14/Z). The project also received seed funding from the Tropical Health Education Trust in the UK.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Author note Technical Advisory Panel members: Jalemba Aluvaala, Quique Bassat, David Bell, John Crump, W. Conrad Liles, Rianne Oostenbrink and Shunmay Yeung. Affiliations are listed in online supplemental appendix S2.
