Discussion
This study extends prior validation research on maternal and newborn care by assessing the validity of ANC and PNC service coverage indicators16 using data from three additional LMIC settings. Our results show that, at discharge, women can report accurately on multiple aspects of antenatal and postnatal care, particularly in contrast to the very few intrapartum and immediate postnatal (within 1 hour of birth) indicators that have met validation criteria.13–17 However, study findings also demonstrate considerable variability by survey question and setting. Despite this heterogeneity, we identified several broad patterns by type of indicator.
The first is a general pattern of higher reporting accuracy for indicators related to concrete, observable interventions. For example, most ANC indicators that met both accuracy criteria in at least two countries reflected health checks during a physical examination of the mother (blood pressure check, abdominal examination, anaemia screening, urine test and fetal heart rate monitoring). The same trend was observed for PNC indicators: indicators reflecting physical examination of the mother were more likely than indicators of counselling interventions, provider type or newborn care to meet both validation criteria in half or more of the settings tested.
In contrast, ANC and PNC indicators that reflected more abstract concepts, particularly those pertaining to counselling or advice given, performed less reliably. None of the following met both validation criteria in any of the countries tested: in ANC, recall of being counselled on the status of the pregnancy or given a return date; in PNC, recall of whether the provider gave information on sexually transmitted infections (STIs) and HIV/AIDS, discussed danger signs for the mother or gave information on the baby’s sickness signs. For nearly all these indicators, the AUC fell below the benchmark of 0.70, the exception being information given on STIs and HIV/AIDS. Notably, at the time of data collection in Kenyan facilities there was an emphasis on provider-initiated HIV counselling and testing, which may have enhanced the quality of counselling provision. These results raise questions about how counselling is conducted, how well women understand and retain the information they are given, and what makes counselling memorable to women.
An analysis of DHS service provision assessment data from Haiti, Malawi and Senegal collected between 2012 and 2014 found that the strongest predictor of client ANC knowledge was agreement between client and observer reports that counselling on a given topic had been performed.29 That study also found that client–observer agreement that counselling had taken place was generally low and suggested that poor-quality counselling was a factor in whether clients acknowledged that counselling had occurred. Two exceptions to the relatively low validity of counselling-type indicators documented in this study were counselling related to family planning and whether the provider discussed breast feeding or infant feeding with the mother, each of which met both validation criteria in two countries. Notably, these two indicators had higher SE than other counselling-type indicators. It is possible that in these cases counselling was paired with an observable action, such as a breastfeeding demonstration or being shown or receiving a family planning method.
The more accurate recall of physical aspects of care, as opposed to advice or information given, has mixed support in prior validation studies. A PNC recall study of similar design in Kenya and eSwatini found that five of the same six indicators of maternal physical examination measured in this study met both validation criteria in at least one of the two countries.16 However, that study also found somewhat better recall of counselling indicators: two (whether the provider discussed danger signs for the mother or gave information on STIs and HIV) met both validation criteria in either Kenya or eSwatini, whereas neither met the benchmark in any setting of the present study. Another study, in China, which compared women’s reports with facility records, found generally lower SP for indicators of maternal physical examination during ANC and PNC than both the present study and the only other known PNC validation study.19 The China study did not include counselling-type indicators. Notably, women included in that survey had delivered at least one live birth in the country in the 5 years preceding the survey and therefore had a substantially longer recall period.
Another notable trend was that, in PNC, indicators of care for the mother herself were more accurately recalled than indicators of care for the newborn. Across indicators, SP was on average lower for newborn PNC interventions than for maternal PNC interventions, indicating that women tended to over-report newborn interventions received. The same trend of lower SP for newborn postnatal interventions was observed in a prior validation study conducted in Kenya and eSwatini.16 Overall, a similar proportion of newborn indicators met both validation criteria in at least one country in the present study (five of six) as in the McCarthy et al study16 conducted in Kenya and eSwatini (four of five). Three indicators (whether the provider discussed breast feeding/infant feeding with the mother, weighed the baby or immunised the baby) had acceptable validity in at least one setting in each of the studies.
While broad trends in the types of indicators accurately recalled are apparent, the substantial variation across samples raises questions about which characteristics of the respondent or setting influence recall accuracy. For example, for both ANC and PNC, indicators tended to have the highest SE and SP in Bangladesh and lower values in Cambodia and Kenya. These findings raise the question of whether differences in respondent characteristics contribute to differences in recall accuracy. For example, are women with a greater number of prior births primed to recall interventions received because they are more knowledgeable about the standard of care? Or does the expectation of the type of care that should be received (eg, in a higher-tier facility, or in a setting where an intervention is near universally vs rarely practised) matter more? It is also possible that respondent expectations of care based on facility attributes, random variation or differences in data collection training protocols account for discrepancies by setting. Such questions warrant further investigation and highlight the need for validation research in additional settings.
A strength of this study is the inclusion of a large number and range of facility types in each setting. Greater variation in facility practices addresses sample size limitations arising from the lack of such variation in prior validation research.13 14 16 Examination of validation results in three different countries also gives insight into how the validity of individual indicators may vary by setting and which indicators tended to perform most consistently. However, this study also has several limitations. While direct observation by a third-party observer is considered the gold standard, it may also be imperfect. Differences in observer training protocols, facility practices or how apparent it was that a given intervention was implemented, among other factors, may contribute to differences in observer ratings across settings. For example, observers’ judgements of whether counselling took place may have been more subjective than for physical interventions, which may have contributed to lower SE and SP for counselling-type indicators. Finally, summary accuracy criteria should be interpreted with care: global measures of test accuracy fail to distinguish between false negative and false positive errors. We also caution against generalising the population-based results of this study to settings where the prevalence of the intervention differs. We have previously demonstrated how indicator properties established in this study can be extended to contexts with varying levels of intervention coverage.13
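The accuracy measures discussed throughout (SE, SP and AUC, with the 0.70 AUC benchmark) can be made concrete with a short sketch. For a single binary indicator validated against direct observation, SE and SP come from the 2×2 agreement table, and the AUC of a single binary test equals (SE + SP)/2. The counts below are hypothetical for illustration only, not study data.

```python
def validation_metrics(tp, fn, fp, tn):
    """Compute SE, SP and AUC for a binary indicator
    validated against direct observation.

    tp: woman reported yes, observer recorded yes
    fn: woman reported no,  observer recorded yes
    fp: woman reported yes, observer recorded no
    tn: woman reported no,  observer recorded no
    """
    se = tp / (tp + fn)   # sensitivity: correct recall of care received
    sp = tn / (tn + fp)   # specificity: correct recall of care not received
    auc = (se + sp) / 2   # AUC of a single binary test
    return se, sp, auc

# Hypothetical counts for one indicator (not study data)
se, sp, auc = validation_metrics(tp=80, fn=20, fp=70, tn=30)
print(f"SE={se:.2f} SP={sp:.2f} AUC={auc:.2f}")  # SE=0.80 SP=0.30 AUC=0.55
```

In this illustration the low SP mirrors the over-reporting pattern described above (women report an intervention the observer did not record), and the resulting AUC of 0.55 would fall below the 0.70 benchmark.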
An important consideration for the relevance of study findings to national and global monitoring efforts is that this study assessed women’s immediate recall accuracy (at facility discharge). Results may not be directly generalisable to the DHS and MICS, which typically ask women to recall events related to a birth 2–5 years prior. While immediate recall may represent a best-case scenario in terms of accuracy, the findings indicate the degree to which women perceived that specific interventions took place. Prior evidence suggests that, unless interventions are recalled with high accuracy at facility discharge, recall generally declines with time. For example, validation analysis of facility-based interventions received in the intrapartum and immediate postnatal periods in Kenya showed that the few interventions recalled with high accuracy at facility discharge maintained acceptable accuracy at 13–15 months’ follow-up.15 For most indicators, however, recall accuracy was poor and either remained the same or declined with time. Another study, which assessed maternal recall of infant birth weight among women in Nepal at recall periods ranging from 1 to 24 months post delivery, found that recall was generally poor and relatively uninfluenced by length of follow-up.30 While additional research evaluating different recall periods is warranted, it is possible that high immediate recall is necessary to ‘code’ certain events into memory for later reporting.
In the context of calls for enhanced measurement of the components that lead to effective coverage, study findings such as these suggest that careful consideration of the type of information women are asked to recall is needed. While household survey programmes such as the DHS and MICS are frequently relied on as data sources for measuring intervention coverage, findings should be triangulated with other data sources such as routine data from health information systems. As new indicators are proposed, they should be subject to validity tests in a range of settings and recall periods. Variations in question wording and sequence should be systematically tested to investigate whether validity can be improved.