Predictors of neonatal mortality: development and validation of prognostic models using prospective data from rural Bangladesh

Objective To assess the extent to which maternal histories of newborn danger signs independently or combined with birth weight and/or gestational age (GA) can capture and/or predict postsecond day (age>48 hours) neonatal death.

Methods Data from a cluster-randomised trial conducted in rural Bangladesh were split into development and validation sets. The prompted recall of danger signs and birth weight measurements were collected within 48 hours postchildbirth. Maternally recalled danger signs included cyanosis (any part of the infant’s body was blue at birth), non-cephalic presentation (part other than head came out first at birth), lethargy (weak or no arm/leg movement and/or cry at birth), trouble suckling (infant unable to suckle/feed normally in the 2 days after birth or before death, collected 1-month postpartum or from verbal autopsy). Last menstrual period was collected at maternal enrolment early in pregnancy. Singleton newborns surviving 2 days past childbirth were eligible for analysis. Prognostic multivariable models were developed and internally validated.

Results Recalling ≥1 sign of lethargy, cyanosis, non-cephalic presentation or trouble suckling identified postsecond day neonatal death with 65.3% sensitivity, 60.8% specificity, 2.1% positive predictive value (PPV) and 99.3% negative predictive value (NPV) in the development set. Requiring either lethargy or weight <2.5 kg identified 89.1% of deaths (at 39.7% specificity, 1.9% PPV and 99.6% NPV) while lethargy or preterm birth (<37 weeks) captured 81.0% of deaths (at 53.6% specificity, 2.3% PPV and 99.5% NPV). A simplified model (birth weight, GA, lethargy, cyanosis, non-cephalic presentation and trouble suckling) predicted death with good discrimination (validation area under the receiver-operator characteristic curve (AUC) 0.80, 95% CI 0.73 to 0.87). A further simplified model (GA, non-cephalic presentation, lethargy, trouble suckling) predicted death with moderate discrimination (validation AUC 0.74, 95% CI 0.66 to 0.81).

Conclusion Maternally recalled danger signs, coupled to either birth weight or GA, can predict and capture postsecond day neonatal death with high discrimination and sensitivity.

What is already known?
► Neonatal mortality declines in low-income countries have slowed significantly.
► Current approaches to identifying illness in communities heavily rely on infrequent visits by overburdened community health workers; first contact with a trained provider may be long after the period of greatest danger has passed.
► Reaching and assessing newborns in the first week of life can be challenging, but is also likely to have the greatest survival benefit.
What are the new findings?
► Neonatal danger signs ascertained through prompted maternal recall shortly after childbirth, if coupled with either birth weight or gestational age (GA), can predict and capture neonatal death with good discrimination and high sensitivity, respectively.
What do the new findings imply?
► Given the severe resource limitations experienced in many communities, neonatal illness screening and referral may potentially be improved on the basis of a limited set of predictors which include maternal recall of the newborn's condition at birth, birth weight and GA.
► Moreover, these findings provide support for systematically capturing GA and birth weight in low-income and middle-income countries.


INTRODUCTION
Despite dramatic reductions in under-five mortality in recent years, progress has largely benefitted infants surviving past the neonatal period, with a marked plateau in neonatal mortality declines. 1 According to 2015 estimates, 45% of deaths among children under five occurred in the neonatal period. 2 As the greatest number of deaths happen during the first hours of life, 1 there is an urgent need to identify sick newborns and deliver efficacious interventions in a timely manner. However, identifying and treating neonatal illnesses in this window is challenging because neonatal deaths are largely concentrated in low-income and middle-income countries, with half of these deaths stemming from childbirths occurring in the home, 3 usually far from skilled obstetric care and where resources to intervene in the event of an emergency are limited. 4 5 Increasing rates of facility-based births have not resulted in marked improvements in neonatal survival outcomes, likely because poor quality of care persists and the onset of complications begins after discharge from the facility. 6 To ensure high-quality care, processes including systematic assessment and correct diagnosis need to be improved. 7 Risk assessment tools that evaluate the condition of newborns can aid facility health workers in determining whether intervention is necessary. 8 Many of these tools are quite sophisticated, requiring skilled clinical personnel, or are targeted towards neonatal intensive care units; 9 10 their need for complex, resource-intensive inputs limits their applicability in low-income settings. Even when births occur in rural facilities, these facilities are often under-resourced and may be unable to conduct routine assessments that require temperature or even weight. 11 Although a prediction model for neonatal mortality in low-income and middle-income countries was developed recently using surveillance data from sites in India, Nepal and Bangladesh, 12 alternative models and risk assessment tools may be useful where severe data gaps exist.

BMJ Global Health
At the community level, home-based neonatal care programmes have been used to identify and treat neonatal illnesses. 13 14 However, attending births and completing postnatal visits in the hours following birth can be challenging for community health workers (CHW), 15 who are often in short supply and overtaxed with many health provision responsibilities. Although prompt warning sign recognition may potentially translate into timely care seeking and intervention, it is generally difficult for mothers to spontaneously recognise danger signs. 16-19 In circumstances where danger signs are recognised early, misperceptions regarding severity may inhibit caretakers from seeking timely care. 20 21 We therefore propose that there is programmatic value in identifying a minimum set of readily observable predictors that can be used to aid neonatal illness screening in the first few highest-risk days of life. However, such a set of predictors would need to capture infants at risk of dying with high sensitivity, be readily observable or promptable with minimal or no training, and be collectable within hours of childbirth.
We sought to assess, through a secondary analysis of a large, prospective dataset from a cluster-randomised trial conducted in Gaibandha and Rangpur, northwestern Bangladesh, whether the prompted maternal recall of neonatal danger signs (lethargy, cyanosis, non-cephalic presentation and trouble suckling), defined in local terms, could be predictive of postsecond day neonatal death and whether predictive capacity was improved by combining these risk factors with birth weight and gestational age (GA). The aims of this study were to characterise the association between these danger signs and postsecond day neonatal mortality; to describe the sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) at which combinations of these danger signs with GA and birth weight captured mortality; and to develop and internally validate prognostic multivariable models using birth weight, GA and maternally reported danger signs as predictors.

METHODS
Data sources, participants and outcome
This is a secondary analysis of the JiVitA-3 cluster-randomised trial dataset, of which the protocol and methods are published elsewhere. 22 The parent study, which evaluated the efficacy of antenatal multiple micronutrient supplementation versus iron-folic acid supplementation on all-cause 6-month infant mortality in a community-based setting, was conducted in 596 sectors across 19 unions of Gaibandha and Rangpur districts of northwestern Bangladesh between January 2008 (start of pregnancy surveillance) and August 2012 (end of follow-up). Enrolment of mothers took place early in pregnancy. The population-based cohort enrolled 44 567 pregnant women, with 28 516 live-born infants. Infants were excluded from this analysis if they died within 2 days postbirth (age of death ≤48 hours), if an assessment following birth was absent/incomplete or performed beyond 2 days past birth (age >48 hours), if the age of assessment was implausible (ie, negative), if they were born from a multiple pregnancy or if they were delivered through caesarean section (figure 1). Once the final set of eligible records was reached, the analytic cohort was randomly split into a model development set (75%) and a model validation set (25%), stratified by neonatal death (figure 1).
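The stratified 75%/25% split described above can be illustrated with a minimal sketch. It assumes each record is a dictionary with a hypothetical `died` field, and it is not the authors' Stata procedure; the function name, the seed and the toy data are illustrative assumptions.

```python
import random

def stratified_split(records, outcome_key="died", dev_fraction=0.75, seed=0):
    """Randomly split records into development and validation sets,
    shuffling deaths and survivors separately so that both sets keep
    a similar mortality rate."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    dev, val = [], []
    for stratum in (True, False):
        group = [r for r in records if bool(r[outcome_key]) is stratum]
        rng.shuffle(group)
        cut = round(len(group) * dev_fraction)
        dev.extend(group[:cut])
        val.extend(group[cut:])
    return dev, val

# Illustrative data only (not study data): 1000 infants, 20 of whom died.
infants = [{"id": i, "died": i < 20} for i in range(1000)]
dev_set, val_set = stratified_split(infants)
```

Stratifying on the outcome matters here because deaths are rare; an unstratified split could leave the validation set with too few events to estimate discrimination.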
The outcome of interest, postsecond day neonatal death (age>48 hours), was ascertained for all infants through a 1-month vital status visit, with details captured in an extensive verbal autopsy performed within 30-45 days after an infant death was reported. Few (n=5) infants were lost to follow-up prior to the final 1-month visit, and for this analysis were presumed to have survived. No actions were performed to blind the assessment of the outcome.

Predictors
An assessment of the infant was performed at home as soon as possible after birth, where measurements (birth weight, length, mid-upper arm circumference) were collected by a trained anthropometrist and a simple interview regarding the condition of the infant at birth and newborn care practices was performed. 22 The mother was asked which part of the baby came out first (head, cord, arm/leg, buttocks, c-section, other, don't know). Non-cephalic presentation was defined as any part that was not the head coming out first. Cyanosis was defined as any part of the infant's body being blue at birth, as reported by the mother. The strength with which the baby cried and/or moved at birth (either: strong, weak, none) was ascertained. Lethargy was defined in this analysis as the infant either not moving, moving weakly, not crying or crying weakly. Suckling vigour was captured either during a 1-month postpartum assessment or during a verbal autopsy if death occurred first; poor suckling was first defined by the infant being unable to suckle or feed normally in the 2 days after birth, as reported during the 1-month postpartum assessment. For infants with missing data from the 1-month assessment, poor suckling was then ascertained from a verbal autopsy in which the mother was asked if the infant was able to suckle or breastfeed normally before death. Home births were defined as births occurring in the home of the mother's husband, mother/father, neighbour/friend/relative or nurse/family welfare visitor.
Baseline maternal characteristics were collected at the time pregnant women were enrolled into the parent trial. Information included age at pregnancy identification (ie, positive urine test), gravidity, pregnancy history (prior stillbirth, prior abortion, prior miscarriage, prior infant death), anthropometry (height, weight), tobacco exposure (chewed betel nut, chewed tobacco, husband smoking) and education. A living standards index capturing socioeconomic status was calculated from reported household assets and household construction materials by principal components analysis. 23 Women were categorised as chewing betel nut or tobacco if they reported chewing at least once a week. Husbands were categorised as smokers if pregnant women reported them smoking every day or nearly every day. A woman was considered to have a prior infant death, stillbirth, miscarriage or abortion/menstrual regulation if she reported it as an outcome for any prior pregnancy. A self-reported date of last menstrual period (LMP) was recorded at enrolment and used to estimate GA at delivery. At this site, LMP dates are more precise than usually expected because an ongoing prospective pregnancy surveillance system visits eligible women on a monthly basis to capture amenorrhea.
No actions were performed to blind the assessment of any predictors of the outcome. The parent trial's allocation of multiple micronutrient or iron-folic acid supplementation was double-masked. However, the parent trial intervention was not a predictor in this analysis.

Sample size and missing data
No sample size calculations were performed; all available data from the parent trial dataset were used in this analysis. Analyses were performed and models were developed and validated using participants with data available for all predictors (complete case analysis).

Statistical analysis methods
Maternal and infant baseline characteristics were compared between the development and validation sets. Groups were compared using χ² tests for categorical variables and t-tests for continuous variables. The variables used to model associations were handled as categorical. Associations between risk factors and outcomes were calculated using the development set only. The sensitivity, specificity, PPV and NPV of individual risk factors and combinations of risk factors to classify/predict neonatal death were calculated for infants in the development and validation sets, separately.
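As a minimal sketch of how these four classification measures follow from a binary screening rule, the example below tabulates true and false positives and negatives. The field names (`lethargy`, `birth_weight_kg`, `died`) and the records are hypothetical and are not study data; the rule shown is an "either/or" combination of the kind evaluated in this analysis.

```python
def screening_metrics(flagged, died):
    """Return sensitivity, specificity, PPV and NPV for a binary rule.

    flagged: list of bools, True if the rule classifies the infant as at risk
    died:    list of bools, True if the infant died after the second day
    Assumes every cell margin is non-zero (no zero-division handling).
    """
    pairs = list(zip(flagged, died))
    tp = sum(f and d for f, d in pairs)
    fp = sum(f and not d for f, d in pairs)
    fn = sum((not f) and d for f, d in pairs)
    tn = sum((not f) and (not d) for f, d in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # deaths correctly flagged
        "specificity": tn / (tn + fp),  # survivors correctly not flagged
        "ppv": tp / (tp + fp),          # flagged infants who died
        "npv": tn / (tn + fn),          # unflagged infants who survived
    }

def lethargy_or_low_birth_weight(infant, cutoff_kg=2.5):
    """'Either/or' rule: flag if lethargic or birth weight below the cutoff."""
    return infant["lethargy"] or infant["birth_weight_kg"] < cutoff_kg

# Illustrative records only (not study data).
infants = [
    {"lethargy": True,  "birth_weight_kg": 2.9, "died": True},
    {"lethargy": False, "birth_weight_kg": 2.1, "died": True},
    {"lethargy": False, "birth_weight_kg": 3.0, "died": True},
    {"lethargy": False, "birth_weight_kg": 2.3, "died": False},
    {"lethargy": False, "birth_weight_kg": 3.1, "died": False},
    {"lethargy": False, "birth_weight_kg": 2.8, "died": False},
]
flags = [lethargy_or_low_birth_weight(i) for i in infants]
metrics = screening_metrics(flags, [i["died"] for i in infants])
```

An "either/or" rule can only flag more infants than either sign alone, so it trades specificity for sensitivity; a "both/and" rule does the reverse.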
For multivariable prediction models, birth weight was modelled as a continuous variable through a linear spline with knot at 1.5 kg, GA was modelled as a continuous variable and all other predictors were modelled as categorical. Birth weight was analysed in kilograms as opposed to grams in order to improve the interpretability of model parameters. Three multivariable models were fitted on the development set. Model coefficients (β) and odds ratio (OR) estimates were calculated using multivariable logistic regression models.
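To make the model structure concrete, the sketch below evaluates a logistic risk equation with a linear spline for birth weight (knot at 1.5 kg). All coefficients are hypothetical placeholders, not the published estimates, and the spline parameterisation (overall slope plus a change in slope above the knot) is one common choice that is assumed here rather than taken from the source.

```python
import math

KNOT_KG = 1.5  # knot location stated in the text

# HYPOTHETICAL coefficients for illustration only; not the published estimates.
COEF = {
    "intercept": -1.0,
    "bw": -1.5,             # slope per kg of birth weight
    "bw_above_knot": 1.0,   # change in slope per kg above the knot
    "ga_weeks": -0.05,      # GA modelled as continuous
    "lethargy": 0.7,
    "cyanosis": 0.3,
    "non_cephalic": 0.8,
    "trouble_suckling": 1.2,
}
SIGNS = ("lethargy", "cyanosis", "non_cephalic", "trouble_suckling")

def birth_weight_spline(weight_kg, knot=KNOT_KG):
    """Linear spline basis: the weight itself and the excess above the knot."""
    return weight_kg, max(0.0, weight_kg - knot)

def predicted_probability(infant, coef=COEF):
    """Inverse logit of the linear predictor for one infant."""
    bw, bw_above = birth_weight_spline(infant["birth_weight_kg"])
    linear_predictor = (
        coef["intercept"]
        + coef["bw"] * bw
        + coef["bw_above_knot"] * bw_above
        + coef["ga_weeks"] * infant["ga_weeks"]
        + sum(coef[s] for s in SIGNS if infant.get(s, False))
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))

baseline = {"birth_weight_kg": 2.8, "ga_weeks": 39}
lethargic = dict(baseline, lethargy=True)
p_base = predicted_probability(baseline)
p_lethargic = predicted_probability(lethargic)
```

With this parameterisation, the odds ratio for a danger sign is the exponential of its coefficient, which is how the β and OR estimates mentioned above relate to one another.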
The rationale for developing prediction models for this study was to assess the predictive performance of maternally reported danger signs in relation to birth weight and/or GA. Therefore, models whose variables were selected before modelling were based on demographic factors, common risk factors for adverse pregnancy outcomes observed in the literature (including birth weight and GA) and, purposively, the maternally reported danger signs of interest. The first model (Model 1) was based on an expanded set of risk factors for neonatal death: birth weight, GA, lethargy, cyanosis, non-cephalic presentation, trouble suckling, infant sex, maternal age at positive urine test, primigravidity, maternal body mass index (BMI), maternal betel nut chewing, maternal tobacco chewing, husband smoking, maternal education and living standards index (above median). The second model (Model 2) focused on a limited set of risk factors for neonatal distress and death: lethargy, cyanosis, non-cephalic presentation, trouble suckling, GA and birth weight.
The variables for the third model (Model 3) were selected during modelling. This further simplified model (Model 3) is based on a limited set of risk factors that may be approximated when birth weight cannot be measured: GA, lethargy, non-cephalic presentation and suckling. Cyanosis was excluded from Model 3 because its parameter estimate was not statistically significant in multivariable analyses (ie, Model 2). A subgroup analysis was also performed to assess if prior infant loss, prior stillbirth, prior miscarriage or prior abortion/menstrual regulation improved model discrimination among women reporting a previous pregnancy. The predictors in these models are specified in the online supplementary data.
Model fit (calibration) was assessed with a Hosmer-Lemeshow goodness of fit test, where observations were divided by predicted probability into ten quantiles unless otherwise specified. The degrees of freedom for the Hosmer-Lemeshow test were adjusted for tests on the validation sample. Discrimination was assessed through the area under the receiver-operator characteristic curve (AUC) with asymptotic normal CIs calculated using the DeLong method for estimating standard errors (SEs). 24 The validation set (25% of the analytical cohort) was used to assess the internal validity of the three models. We chose this split ratio largely to align with common practice. Predicted probabilities were calculated, using all three models, for each observation in the validation set. These predictions were then used to calculate discrimination using AUC. Discrimination was compared across all three models.
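For readers less familiar with the AUC, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen death received a higher predicted risk than a randomly chosen survivor. This is an illustrative pure-Python version, not the Stata routine used in the analysis, and it does not reproduce the DeLong confidence intervals; the predicted values are made up.

```python
def auc(scores, died):
    """Area under the ROC curve via pairwise comparison.

    Each (death, survivor) pair scores 1 if the death has the higher
    predicted risk, 0.5 if the two are tied and 0 otherwise.
    """
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in deaths for n in survivors)
    return wins / (len(deaths) * len(survivors))

# Illustrative predicted probabilities and outcomes (not study data).
predicted = [0.9, 0.4, 0.35, 0.1]
outcome = [True, False, True, False]
validation_auc = auc(predicted, outcome)  # 3 of 4 pairs correctly ordered
```

An AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation, which is the scale on which the three models are compared.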
All statistical tests were two-sided. All analyses were performed using Stata 15 (Stata, College Station, Texas, USA).

Informed consent, ethical approval and trial registration
Once pregnancy was identified through population-based surveillance, oral consent to participate in the parent trial was obtained from pregnant women before a witness. The parent trial was approved by the Institutional Review Board at the Johns Hopkins Bloomberg School of Public Health (Baltimore, Maryland, USA) under registration number IRB00000570 and the Bangladesh Medical Research Council (Dhaka, Bangladesh). The parent trial is registered at ClinicalTrials.gov (Identifier: NCT00860470). The reporting of this analysis followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines (online supplementary data, table S1). 25 26

Patient and public involvement
Patients and/or the general public were not involved in the design, execution or drafting of this secondary analysis.

RESULTS
Study population
The cohort of infants in the parent trial consisted of 28 516 live births including 1246 neonatal deaths (figure 1). Excluding 714 infants that did not survive 2 days past childbirth resulted in an analytic cohort with a postsecond day neonatal mortality rate of 19.1 deaths per 1000 live births. After all exclusion criteria were applied, the cohort was randomly split into development and validation sets. The development set included 192 neonatal deaths among 14 944 live births (12.8 postsecond day neonatal deaths per 1000 live births) and the validation set included 65 neonatal deaths among 4983 live births (13.0 postsecond day neonatal deaths per 1000 live births) (figure 1). In the development set, infants that died during the postsecond day neonatal period differed from those that survived with respect to the following: [...] (table 2). Few deaths were observed for cyanotic newborns (9 out of 317) and for newborns presenting non-cephalically (12 out of 309) in the development set (table 2).

Classification and multivariable prediction models
The sensitivity, specificity, PPV and NPV of neonatal death were calculated for each risk factor and combinations of risk factors (table 3). [...] Adding cyanosis, non-cephalic presentation or trouble suckling to lethargy increased the sensitivity to 65.3% in the development set (table 3). A notable increase in the sensitivity of predictor combinations was observed when low birth weight (<2.5 kg) was added to combinations of maternally recalled danger signs. A combination of either lethargy or birth weight <2.5 kg classified neonatal deaths with 89.1% sensitivity (39.7% specificity, 1.9% PPV and 99.6% NPV) in the development set and 84.6% sensitivity in the validation set (table 3). Low birth weight (birth weight <2.5 kg) alone captured deaths at 74.1% sensitivity, 58.0% specificity, 2.2% PPV and 99.4% NPV (table 3). A combination of either lethargy or GA<37 weeks classified neonatal deaths with 81.0% sensitivity (53.6% specificity, 2.3% PPV and 99.5% NPV) in the development set and 71.4% sensitivity in the validation set (table 3). Preterm birth alone (GA<37 weeks) captured deaths at 54.1% sensitivity, 80.9% specificity, 3.6% PPV and 99.3% NPV (table 3). Because the PPV observed among the more sensitive predictor combinations did not generally exceed 5%, we examined whether more specific or restrictive combinations of predictors could result in increases in PPV. Reporting both lethargy and birth weight <2.0 kg captured infants at 25.9% sensitivity and 96.9% specificity, and predicted deaths at 9.6% PPV (table 3).
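The low PPVs alongside high NPVs follow from the low prevalence of death. As an approximate check under Bayes' theorem, take the "lethargy or birth weight <2.5 kg" rule (sensitivity Se = 0.891, specificity Sp = 0.397) and assume the prevalence p equals the development-set mortality of 12.8 per 1000. The reported values were computed on complete cases, so agreement is expected only to rounding.

```latex
\begin{align*}
\mathrm{PPV} &= \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}
  = \frac{0.891 \times 0.0128}{0.891 \times 0.0128 + 0.603 \times 0.9872}
  \approx 0.019 \\[4pt]
\mathrm{NPV} &= \frac{Sp\,(1 - p)}{Sp\,(1 - p) + (1 - Se)\,p}
  = \frac{0.397 \times 0.9872}{0.397 \times 0.9872 + 0.109 \times 0.0128}
  \approx 0.996
\end{align*}
```

These approximations agree with the reported 1.9% PPV and 99.6% NPV for this rule.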
Three multivariable prediction models were developed; model fit and discrimination were compared across development and validation sets. All models fit the data across development and validation sets (Hosmer-Lemeshow p>0.05). Model 1 (expanded set of risk factors) predicted neonatal death with good discrimination in the development set (AUC 0.81, 95% CI 0.77 to 0.84) and the validation set (AUC 0.80, 95% CI 0.73 to 0.88) (table 4). Model 2 (limited set of risk factors for neonatal distress and death) predicted neonatal death with good discrimination in the development set (AUC 0.80, 95% CI 0.76 to 0.84) and in the validation set (AUC 0.80, 95% CI 0.73 to 0.87), performing as well as Model 1 (table 4). Model 3 (GA, lethargy, trouble suckling and non-cephalic presentation) retained moderate discrimination in both the development set (AUC 0.75, 95% CI 0.71 to 0.80) and the validation set (AUC 0.74, 95% CI 0.66 to 0.81) despite the exclusion of birth weight (table 4). Receiver-operator characteristic curves and AUC were produced for the three models (online supplementary data, figure S1). A subgroup analysis among multigravidae revealed that prior infant loss, previous abortion, previous miscarriage and previous stillbirth did not increase discrimination beyond birth weight, GA, non-cephalic presentation, lethargy, cyanosis and trouble suckling (online supplementary data, table S4). An individual newborn's predicted probability of death in the neonatal period can be calculated using the provided risk equations for Models 1 and 2 (online supplementary data, table S5).

†Multivariable models for individual predictors include lethargy, cyanosis, malpresentation, poor suckling, birth weight <2.5 kg and gestational age <37 weeks.
‡Missing data on lethargy for 93 infants (0 of whom died), cyanosis for 48 infants (0 of whom died), non-cephalic presentation for 16 infants (0 of whom died), trouble suckling for 153 infants (10 of whom died), birth weight for 36 infants (3 of whom died) and gestational age for 638 infants (7 of whom died).
§Multivariable model includes birth weight <1.5 kg or <2.0 kg in lieu of birth weight <2.5 kg.

DISCUSSION
Summary of main findings
This analysis provides evidence on the ability of maternally recalled danger signs (lethargy, cyanosis, non-cephalic presentation, trouble suckling), birth weight and GA to predict postsecond day neonatal mortality. It is important to note that mothers were neither trained nor standardised in the detection or reporting of these danger signs, but that these findings are based on over 19 000 such reports captured in a typical rural setting in Bangladesh. 27 Associations were observed between these danger signs and neonatal death in bivariate analyses. With the exception of cyanosis (maternal recall that any part of the baby's body was blue at birth), these associations retained their statistical significance in multivariable, adjusted analyses. Moreover, the magnitude of association increased as the number of reported danger signs increased. These maternally recalled signs, however, did not capture subsequent neonatal death with high sensitivity when modelled on their own. Recalling at least one sign of lethargy, cyanosis, non-cephalic presentation or trouble suckling in the development cohort classified neonatal death at 65% sensitivity. In contrast, a decision rule that classified neonatal death for infants who were either lethargic or weighed less than 2.5 kg at birth increased sensitivity to 89% in the development cohort. The specificity of this decision rule was 40% (with 60% of all infants surviving the neonatal period being either lethargic or weighing less than 2.5 kg). Multivariable prediction models that combined danger signs with birth weight and/or GA, fitted on the development set, predicted neonatal death in the validation set with moderate to good discrimination (validation AUC 0.74 to 0.80). These findings are consistent with the literature indicating that low birth weight and preterm birth are strong risk factors for neonatal death. 1

Implications for public health practice
Predictive models of likely adverse pregnancy outcomes or increased propensity for neonatal distress, prior to the onset of clinical symptoms, have been highly sought after for decades. Scores which discriminate healthy neonates from those at higher risk of early newborn illness, susceptibility to infection or failure to thrive have been attempted, often with variable performance depending on the skill of the observer and the context of use. The Apgar score, captured by trained clinical staff at 1 and 5 min postnatally, is among the most-used metrics for newborn assessments, with demonstrated predictive reliability in clinical settings. 28 However, in low-resource community settings, where neonatal mortality remains high, where improvement in this critical development metric has stagnated and where achieving full coverage of institutional deliveries is a long-term proposition, innovation in triaging limited resources to high-risk neonates is required. In this rural Bangladesh context, where home-based delivery still occurs 69% of the time, 29 and where even mothers and newborns who deliver in facilities are discharged shortly after birth, contact with qualified healthcare providers in the highest-risk postnatal period is rare. Although home-based neonatal care packages have emerged as a remedy for these coverage gaps, ensuring that all infants receive timely postnatal care visits remains a challenge under both controlled and programmatic contexts. 15 30 Despite a growing body of evidence that suggests early CHW screening can significantly improve outcomes, from early initiation of breastfeeding to thermal care and improved care-seeking behaviours, 30 31 it is clear that the limited availability of health workers in low-level facilities and of community health workers hinders their ability to recognise the currently prioritised warning signs.
These challenges underscore the importance of seeking a minimum set of maternally observable danger signs which can reliably distinguish neonates at high risk of mortality. While efforts are made to improve the coverage of institutional delivery and postnatal care in the first week of life, these findings suggest that maternally reportable indicators are assessable and could help reduce preventable neonatal mortality in these settings in the interim. Specifically, our findings help to identify a minimum set of key danger signs worthy of triggering referral that can be highlighted in a late-pregnancy antenatal care visit or that can be used to counsel recently delivered women and/or their family members, given infrequent CHW postnatal visits. For deliveries that occur in the home, caretakers are generally advised to seek immediate, qualified care from a CHW or health centre if they recognise danger signs. However, given the challenges associated with spontaneous warning sign recognition, caretakers' recall may need to be prompted. A recent study in this population demonstrated that a mobile health package of short message service and home visits was a cost-effective addition to census enumeration and pregnancy surveillance. Such a package could potentially be leveraged to prompt recall of these danger signs. 32 Concerns about maternal capacity to recognise a newborn in danger, nonetheless, exist. Maternal recall of lethargy, non-cephalic presentation or trouble suckling has been shown in these data to be reliable, despite likely variability in maternal interpretation of their own child's condition. Coupling these perceived conditions to information about GA or birth weight could provide a powerful risk index to trigger a CHW home visit or immediate care seeking by the family.
Although the PPV of individual and combined danger signs (ranging from 1% to 10%) may be clinically relevant given the low absolute risk of mortality in this population, it may seem too low for a standalone, one-time screening test. However, we do not see this as problematic. A programmatic approach that aims for high sensitivity, so that few babies at risk of dying are missed while overburdened health workers are helped to target their resources and time, can be acceptable despite the loss of specificity, but only if it is followed by a more specific screening test or examination on referral. Any infants in the community that are falsely classified as at risk of death, owing to both the loss of specificity and the low absolute risk of mortality, could therefore be introduced into the continuum of care and receive interventions that may yield benefits beyond immediate survival (ie, vital registration, vaccination scheduling, postnatal counselling). This approach may be acceptable because the NPV exceeds 98%, suggesting that most infants that screen negative will survive the neonatal period.
Maternal histories of danger signs can be used to support a highly sensitive first test in a two-stage sequential screening process if birth weight and/or GA measurements are available. Because countries with the highest burden of birth weight tend to also lack reliable birth weight data, 33 efforts are needed to systematically measure and record GA in the long-term (as programmes aim for full coverage of facility-based deliveries) as well as the short-term. Capturing birth weight will be increasingly feasible in low-income and middle-income settings as the cost of robust, portable weighing scales drops with time and as investments are committed towards strengthening health information systems through digital tools. 34 However, for circumstances in which measuring birth weight may be cost prohibitive, our findings suggest that an estimate of GA can serve as a reasonable alternative. The capture of LMP can be used to estimate GA; our group has previously validated this method as a reliable approach to determining preterm status in this community. 35 The deployment of such a preliminary screener would likely be by phone, dramatically altering the cost-benefit calculus since mobile birth notification with GA capture, followed by a mobile short interview to assess risk, is increasingly feasible across most of rural Bangladesh and South Asia given trends in mobile phone ownership. 36 For facilities with the resources to ascertain valid birth weight and GA measurements, a model that also includes lethargy, cyanosis, non-cephalic presentation and trouble suckling will be able to discriminate infants as well as a model with other risk factors for neonatal mortality among primigravidae and multigravidae. As the quality of care in lower-level facilities increases with time and infants are triaged on the basis of more sophisticated risk assessment tools, the predictors we present may potentially be used for purposes other than immediate intervention. 
For example, these predictors may be used to flag infants for additional monitoring before discharge. If infants are discharged, community health workers may be instructed to selectively follow up with those born preterm, with low birth weight or presenting as lethargic on delivery.
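A selective follow-up rule of this kind could be expressed as a simple flag. The sketch below is illustrative only and is not a validated tool: the function name `needs_follow_up` and its parameters are hypothetical, the cut-offs are those used in this analysis (birth weight <2.5 kg, GA <37 weeks), and a missing measurement is assumed not to trigger a flag on its own.

```python
from typing import Optional


def needs_follow_up(
    lethargic_at_birth: bool,
    birth_weight_kg: Optional[float] = None,
    ga_weeks: Optional[float] = None,
) -> bool:
    """Flag an infant if lethargic, low birth weight or preterm.

    A measurement passed as None (not available) is treated as
    not meeting that criterion; this is an assumption of the sketch.
    """
    low_birth_weight = birth_weight_kg is not None and birth_weight_kg < 2.5
    preterm = ga_weeks is not None and ga_weeks < 37
    return lethargic_at_birth or low_birth_weight or preterm


# Example: an active infant weighing 2.3 kg with no GA estimate is flagged.
print(needs_follow_up(lethargic_at_birth=False, birth_weight_kg=2.3))
```

Combining criteria with "or" favours sensitivity over specificity, which is consistent with using such a rule as the first stage of a sequential screen rather than as a standalone test.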

Study strengths and limitations
The strengths of this analysis include a large sample size and complete follow-up for infant outcomes in a rigorous research environment, ensuring high-quality data capture. The risk of recall bias and misclassification was reduced because the analytic sample was restricted to infants assessed either on the date of delivery or the day after. Assessments were performed before infant deaths, and the predictors selected for analysis were expressly chosen for their easy observability. There is, however, a risk of recall bias for poor suckling because it was collected either at a 1-month postpartum assessment or during a verbal autopsy. Other predictors of neonatal illness that could be ascertained through maternal report, such as temperature, fast breathing and severe chest indrawing, 37 were not measured during assessment. The maternally reported predictors that were measured were not standardised or externally validated, although we consider this an important source of the natural variation expected in a self-reported score; as mentioned earlier, this variability enhances the generalisability of these findings. Although maternal recall was prompted in these data, other studies report poor caregiver knowledge or spontaneous recall of danger signs for neonatal death. 16 19 Although the risk of recurrence of adverse birth outcomes has been previously reported for stillbirth, 38 low birth weight, 39 small-for-GA, 40 and perinatal mortality, 41 these results indicate that experiencing a prior infant death, previous miscarriage, previous abortion or previous stillbirth does not increase predictive discrimination beyond birth weight, GA and maternally recalled risk factors. The inability of these risk factors to discriminate mortality in these data may be driven by their relatively low prevalence in this study population.
A higher proportion of infants that died during the neonatal period were not assessed on the day of their birth or the day after (within age ≤48 hours) compared with those infants that survived the neonatal period. Although we attempted to mitigate this survivorship bias by restricting any inferences drawn to those infants that survived past 48 hours following childbirth, the postsecond day mortality rate in the development set was 19 deaths per 1000 live births before exclusions and 13 deaths per 1000 live births after exclusions. We speculate that this bias attenuated model coefficient estimates because neonatal deaths were differentially depleted from the study population.
Nevertheless, the most important limitation of this analysis is that it attempts to use predictors that are probably most useful at the point of delivery (non-cephalic presentation, cyanosis) to predict mortality during a period in which they may be less relevant, past 48 hours postbirth. Non-cephalic presentation, for example, is strongly associated with birth asphyxia and stillbirth. 42 43 Few deaths were observed beyond the intrapartum period among infants who had presented non-cephalically or among infants who were cyanotic at birth. Late neonatal deaths, however, are predominantly caused by preterm birth (21%) or infection (48%), where infection includes sepsis (37%), pneumonia (5%) and diarrhoea (2%). 1 The general symptoms of neonatal sepsis include fever, temperature instability and 'not doing well'. 44 Furthermore, the use of split sample validation rather than cross-validation or temporal validation for internal validation, the use of Hosmer-Lemeshow tests to assess the calibration of a model with a binary outcome, the relatively few events per predictor and the absence of imputation methods for predictors are major weaknesses for the model's internal validity. However, the purpose of developing prediction models was to assess how well maternally observed danger signs, in addition to birth weight and/or GA, discriminate infants. Given these results, we would recommend that these data be used to inform the future development of prediction models that prospectively capture and assess how maternally reported danger signs may improve discrimination among all neonates (including outcomes occurring within 48 hours of childbirth). Moreover, the generalisability and programmatic applications of these predictors and models will ultimately depend on their validation in other populations.

Conclusion
Maternal histories of lethargy, non-cephalic presentation and trouble suckling in their newborn shortly after birth were, in this rural Bangladeshi setting, reasonably predictive of postsecond day neonatal death, particularly when coupled to a measurement of birth weight and/or GA. These findings may be used to support efforts to triage and focus care on high-risk newborns, to advocate for the systematic capture of birth weight and GA at birth to improve prediction and, ultimately, to reduce preventable neonatal mortality in resource-constrained settings. As Bangladesh and neighbouring countries struggle to cover existing workforce gaps, strategies that enable resource prioritisation and target care to those at greatest risk may be useful in accelerating reductions in neonatal mortality.